NCLT Seminar Series 2013/2014
The NCLT seminar series usually takes place every second Wednesday from 4-5 pm in Room L2.21 (School of Computing).
The schedule of presenters will be added below as they are confirmed. Please contact John Judge if you have any queries about the NCLT 2013/2014 Seminar Series.
We present a number of semi-supervised parsing experiments on the Irish language carried out using a small seed set of manually parsed trees and a larger, yet still relatively small, set of unlabelled sentences. We take two popular dependency parsers -- one graph-based and one transition-based -- and compare results for both. Results show that using semi-supervised learning in the form of self-training and co-training yields only very modest improvements in parsing accuracy. We also try to use morphological information in a targeted way and fail to see any improvements.
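As a rough illustration of the self-training idea mentioned in the abstract above, the following sketch repeatedly trains on the labelled data, labels the unlabelled pool, and keeps only the confident predictions as new training data. It is not the setup used in the talk: a toy one-dimensional nearest-centroid classifier stands in for the dependency parser, and the names `train`, `predict`, `self_train` and the `threshold` parameter are assumptions made for this example.

```python
from statistics import mean


def train(labelled):
    """Fit a toy nearest-centroid model: one mean value per label."""
    by_label = {}
    for x, y in labelled:
        by_label.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_label.items()}


def predict(model, x):
    """Return (label, confidence).

    Confidence is the margin between the distance to the second-nearest
    centroid and the distance to the nearest one (a hypothetical choice).
    """
    ranked = sorted(model, key=lambda y: abs(x - model[y]))
    best = ranked[0]
    if len(ranked) == 1:
        return best, float("inf")
    margin = abs(x - model[ranked[1]]) - abs(x - model[best])
    return best, margin


def self_train(seed, unlabelled, threshold, rounds):
    """Self-training loop: grow the labelled set with confident predictions."""
    labelled = list(seed)
    pool = list(unlabelled)
    for _ in range(rounds):
        model = train(labelled)
        confident, remaining = [], []
        for x in pool:
            label, conf = predict(model, x)
            if conf >= threshold:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing new to learn from; stop early
        labelled.extend(confident)
        pool = remaining
    return train(labelled), labelled, pool


# Toy data: two seed items and four unlabelled items.
seed = [(0.0, "a"), (10.0, "b")]
model, labelled, pool = self_train(seed, [1.0, 9.0, 5.2, 3.0],
                                   threshold=2.0, rounds=3)
```

In this toy run the ambiguous item `5.2` never reaches the confidence threshold, so it stays in the unlabelled pool. Co-training follows the same loop but uses the confident predictions of one model as training data for a second, different model.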
In this work we propose a probabilistic graphical model as an innovative framework for studying typological universals. We view language as a system and linguistic features as its components, whose relationships are encoded in a Directed Acyclic Graph (DAG). Treating the discovery of word order universals as a knowledge discovery task, we learn the graphical representation of a word order sub-system, which reveals finer structure such as direct and indirect dependencies among word order features. Probabilistic inference then enables us to see the strength of such relationships: given the observed value of one feature (or a combination of features), the probabilities of the values of other features can be calculated. Our model is not restricted to using only two values of a feature. Using an imputation technique and the EM algorithm, it can handle missing values well. A model averaging technique addresses the problem of limited data. In addition, the incremental and divide-and-conquer method addresses the areal and genetic effects simultaneously instead of separately as in Daumé III and Campbell (2007).
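To make the inference step concrete, here is a minimal sketch of computing the probability of one word order feature given an observed value of another in a small DAG. The graph structure (verb-object order as the parent of adposition type and genitive order) and every probability in the tables are invented placeholders for illustration only; they are not taken from the talk or from any typological database.

```python
from itertools import product

# Hypothetical DAG: vo -> adp and vo -> gen.
# All numbers below are made-up placeholders, not empirical estimates.
P_VO = {"VO": 0.5, "OV": 0.5}
P_ADP_GIVEN_VO = {
    "VO": {"prep": 0.8, "post": 0.2},
    "OV": {"prep": 0.1, "post": 0.9},
}
P_GEN_GIVEN_VO = {
    "VO": {"NGen": 0.7, "GenN": 0.3},
    "OV": {"NGen": 0.2, "GenN": 0.8},
}

NAMES = ("vo", "adp", "gen")
DOMAINS = {
    "vo": ["VO", "OV"],
    "adp": ["prep", "post"],
    "gen": ["NGen", "GenN"],
}


def joint(vo, adp, gen):
    """Joint probability factorised along the edges of the DAG."""
    return P_VO[vo] * P_ADP_GIVEN_VO[vo][adp] * P_GEN_GIVEN_VO[vo][gen]


def posterior(query, evidence):
    """P(query | evidence) by brute-force enumeration of all assignments."""
    scores = {value: 0.0 for value in DOMAINS[query]}
    for values in product(*(DOMAINS[name] for name in NAMES)):
        assignment = dict(zip(NAMES, values))
        if all(assignment[k] == v for k, v in evidence.items()):
            scores[assignment[query]] += joint(**assignment)
    total = sum(scores.values())
    return {value: score / total for value, score in scores.items()}


# Indirect dependency: adposition type informs genitive order via vo.
result = posterior("gen", {"adp": "post"})
```

With these placeholder tables, observing postpositions raises the probability of genitive-noun order to about 0.71, even though there is no direct edge between the two features. Enumeration is only practical for tiny graphs; it is used here purely to show what "probabilistic inference" computes.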
The open source Unstructured Information Management Architecture (Apache UIMA), which IBM Research donated to the Apache Software Foundation in 2006, is what makes Watson’s hundreds of independent algorithms work together. UIMA is an architectural and software framework that supports the creation, discovery, composition, and deployment of a broad range of analysis capabilities. It provides a run-time environment in which developers can plug in and run their UIMA component implementations, along with other independently developed components, and with which they can build and deploy UIM applications. This talk introduces the framework and what it provides to the natural language processing community of researchers and engineers.
Rule-based machine translation (MT) is the paradigm of choice when the amount of bilingual resources available is not large enough to train a full-fledged statistical MT system. Building a rule-based MT system usually implies a considerable investment in the development of linguistic resources. However, even in those cases in which bilingual parallel corpora are scarce, automatic methods can be used to infer structural transfer rules. In this talk I will present the current developments at Universitat d'Alacant aimed at learning shallow-transfer MT rules from small parallel corpora for use by the shallow-transfer MT platform Apertium. Inspired by the work of Sánchez-Martínez & Forcada (2009), our method uses alignment templates (ATs), like those used in statistical MT, and overcomes the main limitations of their approach: the inability to find the appropriate level of generalisation for the ATs from which rules are generated; the inability to perform context-dependent lexicalisations, which give a different treatment to those words that are incorrectly translated by more general ATs; and the deficient selection of the sequences of lexical categories for which transfer rules are generated. Preliminary experiments show that translation quality improves compared to the method of Sánchez-Martínez & Forcada (2009), and that the number of inferred rules is considerably smaller.
As some of you may know, the University of Helsinki is mostly known for its strictly rule-based approach to computational linguistics, with main contributions like the TWOL system by Prof. Koskenniemi in 1983 and the CG system by Prof. Karlsson in 1995. In my doctoral dissertation I experimented with some basic approaches to combining statistical information with weighted finite-state models of language (cf. OpenFst and Mohri's academic papers), especially for morphologically complex languages with limited resources (e.g. Greenlandic). The presentation will consist of some slides from my FSMNLP 2012 tutorial and parts of the lectio praecursoria for my PhD.
Last update: 10th March 2014