Machine Translation @ the NCLT & the CNGL

Dublin City University, Ireland

NCLT / CNGL Workshop

Wednesday 23 to Thursday 24 July 2008, 1:30-5pm, rooms N209 (23rd) and L221 (24th)
Wednesday (N209)
Previous and future MT work [slides]
Word alignment [slides]
Transfer talk + DOT + OpenMP
Transfer talk
MaTrEx system [slides]
Coffee break!
Syntax for SMT Phrase Extraction [slides]
PhD thesis [slides]
The TALP MT system [slides]
PhD thesis [slides]
PhD thesis [slides]
The CASIA MT system [slides]

Thursday (L221)
Previous MT + GramLab [slides]
Work plans in the CSET [slides]
Undergraduate Thesis [slides]
MSc Thesis [slides]
Previous exposure to MT
Current Research [slides]
Coffee break!
Probabilistic Transfer-based MT [slides]
Automatic grammaticality judgements [slides]
Joint project with Jennifer
Computing infrastructures [slides]
PBS jobs and Taskfarming [slides]
Conclusion talk/Discussion [slides]
Wednesday 23 July, room N209
First session (responsible: Jinhua)
Andy Way
Introduction Talk
In this talk I'll give an overview of the MT research we aim to carry out in the NGL CSET, together with pointers to related work that has been done here in the NCLT.
Yanjun Ma
Word alignment
Firstly, I will give a brief overview of the task of word alignment and relate it to statistical machine translation. Secondly, I will explore two factors that influence the performance of word aligners: word segmentation (tokenisation) and syntax. I will briefly introduce two novel approaches: word packing, which investigates the role of segmentation in word alignment, and syntax-enhanced word alignment, which justifies the use of syntax. Then, I will say a few words about integrating these approaches into the MaTrEx system and using them in MT evaluation. Finally, I will point out some future work along these lines.
Ventsislav Zhechev
Tree-to-tree alignment and Data Oriented Translation
The talk will be about the tree-to-tree alignment system I developed, but will also include a short overview of DOT. If time permits, I will give a presentation on using the OpenMP APIs in C++ to parallelise existing software. This will focus mainly on giving links to resources with information on this topic, but I'll also give an example of how I used this technology.
John Tinsley
Following Ventsi's discussion of the aligner, I will describe how I've used resulting treebanks as training data in phrase-based SMT and mention some outstanding issues with this approach. I will then discuss how to further exploit treebanks in syntax-aware MT.
John Tinsley
The MaTrEx Translation System
I will give an overview of our MT system, MaTrEx, describing its general functionality and capabilities. I will then discuss the MT wiki in terms of MaTrEx and talk about some work that could be done with the system.
Second session (responsible: Andy)
Sylwia Ozdowska
Constituency and Dependency Representations for SMT Phrase Extraction
The talk will focus on the value of replacing and/or combining string-based methods with syntax-based methods for PB-SMT, and the relative merits of using constituency-annotated vs. dependency-annotated training data.
Sara Morrissey
Data-Driven Machine Translation for Sign Languages
My thesis explores the application of data-driven machine translation (MT) to sign languages (SLs) to facilitate communication between Deaf and hearing people by translating information into the native and preferred language of the individual. In this talk, I will first overview sign languages and outline previous approaches and problems that arise. I will then describe the experiments performed translating both to and from SLs along with automatic evaluation scores. I will finish by describing the SL animation process and manual evaluations performed on this task.
Patrik Lambert
The N-gram based machine translation system
I will describe the N-gram based machine translation framework, developed in the TALP group at UPC, Barcelona. In this approach, the joint translation probability is modelled via a log-linear combination of a bilingual N-gram model and additional feature functions.
Patrik Lambert
Exploiting Lexical Information and Discriminative Alignment Training in SMT
My thesis work focused on three aspects of statistical machine translation: the use of lexical information such as basic lexical models and multi-word expressions, minimum error training strategies, and word alignment models. I proposed a novel framework for discriminative training of alignment models with automated translation metrics as the maximisation criterion.
Jinhua Du
PhD thesis
This talk focuses on one topic: multiple system combination. I will present an improved combination framework which uses Minimum Bayes Risk (MBR) decoding, the GIZA-TER alignment metric and confusion network decoding to generate an optimal translation hypothesis.
Jinhua Du
The CASIA Translation System
This talk will describe the MT system which participates in many international and domestic MT evaluations on behalf of the Institute of Automation. The system is a complete MT platform, including modules for automatic preprocessing, word alignment and phrase generation, decoding and minimum error rate (MER) training, and multiple system combination. I'll focus on the key modules and the system configuration.
Thursday 24 July, room L221 (responsible: Patrik)
Josef van Genabith
Previous MT at DCU, LFG parsing and Generation technologies
Two topics will be covered: (i) previous MT at DCU (TransBooster, MT evaluation), and (ii) GramLab LFG parsing and generation technologies for English, Spanish, Chinese, German, French, Japanese and Arabic.
Sara Morrissey
Sergio Penkale
Genetic Algorithms for Syntactic Parsing
Traditional syntactic parsers define probabilistic models that allow them to exhaustively explore the search space in reasonable time. In this work we propose and evaluate a search method that performs a non-exhaustive search using heuristics.
Ankit Srivastava
Learning a Translation Lexicon from non-Parallel Corpora
This project evaluates the performance of syntactic context windows against positional context windows in extracting word translations from non-parallel English and German newswire corpora.
Tsuyoshi Okita
Distance in SMT
This study is about incorporating syntactic information into n-gram-based MT, with a focus on monolingual symmetries. The central question is how to integrate distances defined over these two spaces of a very different nature.
Stephen Doherty
Current Research
This talk will give an overview of the research proposal for my PhD, which will compare readability and comprehensibility of RBMT and SMT output for controlled and uncontrolled input. The focus here will be on readability and comprehensibility, controlled language, and eye tracking.
Second session (responsible: Sylwia)
Yvette Graham
Probabilistic Transfer-based Machine Translation
Probabilistic Transfer-based Machine Translation involves automatically inducing transfer rules from parsed bilingual corpora. In my work, I use Lexical Functional Grammar (LFG) f-structures as the intermediate representation for transfer. In this talk I describe an algorithm for inducing transfer rules automatically from the f-structures of an LFG-parsed corpus. The transfer rule induction algorithm uses an efficient packed representation that stores multiple rules (up to O(2^n)) in a single structure of O(n) size. I will present recent experimental results on the German-English Europarl corpus showing a vast reduction in the amount of resources that the rule induction algorithm requires. I will also briefly describe a chart-based decoder used for translating unseen sentences with transfer rules induced by the rule induction algorithm.
Jennifer Foster
Automatic grammaticality judgements
I will present joint work with Joachim on our method for automatic grammaticality judgements, and on how it might be useful for ranking MT output.
Deirdre Hogan
New project
I will talk about the new joint project with Jennifer and how it might link in with MT work.
Joachim Wagner
Computing Resources
MT research often demands resources not available on a single desktop PC. Training models can be intensive in both RAM and disk usage, and decoding requires a lot of CPU time. In this talk I will give an overview of the existing MT group cluster, ICHEC resources, and plans for additional machines.
Joachim Wagner
PBS jobs and taskfarming example
If many users share the same machines for their experiments without any job management, there will be resource conflicts: for example, two processes competing for RAM cause heavy swapping and slow both processes almost to a standstill. To address these problems, 5 machines of the MT group have been organised into a cluster. A PBS job management system manages the resources centrally and allocates exclusive access to machines for experiments. I will show some examples of how to use it.
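As a hedged sketch of the kind of submission described above (the job name, resource limits and experiment command are illustrative placeholders, not the group's actual cluster configuration), a minimal PBS job script might look like:

```shell
#!/bin/sh
# Minimal PBS job script -- illustrative values only; adapt to the local cluster.
#PBS -N mt-experiment        # job name shown by qstat
#PBS -l nodes=1:ppn=1        # request one CPU on one node
#PBS -l mem=2gb              # reserve RAM up front so two jobs never compete and swap
#PBS -j oe                   # merge stdout and stderr into a single log file

# PBS starts jobs in $HOME; change back to the directory qsub was called from.
cd "${PBS_O_WORKDIR:-.}"
echo "Running on $(hostname)"
# ./run_experiment.sh        # the actual training/decoding command would go here
```

The script would be submitted with `qsub job.sh` and monitored with `qstat`; PBS queues it until a machine with the requested resources is free, which is what prevents the RAM contention described above.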
Andy Way
Conclusion talk
Having heard about the MT research being/about to be carried out in the NCLT and NGL CSET, we'll attempt to identify trends, convergences, and any gaps that need filling. This will, hopefully, provide strong pointers to the future direction of our research, in the short- to medium-term, at least.
Last update: August 6 2008