NCLT / CNGL Workshop
|Wednesday 23 - Thursday 24 July 2008, 1:30-5pm, rooms N209 (23rd) and L221 (24th)|
|Wednesday (N209)||Thursday (L221)|
Transfer talk + DOT+ OpenMPI
Previous exposure to MT
Syntax for SMT Phrase Extraction [slides]
Probabilistic Transfer-based MT [slides]
Automatic grammaticality judgements [slides]
Joint project with Jennifer
|Wednesday 23 July, room N209|
|First session (responsible: Jinhua)|
In this talk I'll give an overview of the MT research we aim to carry out in the NGL CSET, together with pointers to related work that has been done here in the NCLT.
First, I will give a brief overview of the task of word alignment and relate it to statistical machine translation. Second, I will explore two factors that influence the performance of a word aligner: word segmentation (tokenisation) and syntax. I will briefly introduce two novel approaches: word packing, which investigates the role of segmentation in word alignment, and syntax-enhanced word alignment, which justifies the use of syntax. Then, I will say a few words about integrating these approaches into the MaTrEx system and using them in MT evaluation. Finally, I will point out some future work along these lines.
Tree-to-tree alignment and Data Oriented Translation
The talk will be about the tree-to-tree alignment system I developed, but will also include a short overview of DOT. If time permits, I will give a presentation on using the OpenMP APIs in C++ to parallelise existing software. This will focus mainly on giving links to resources with information on this topic, but I'll also give an example of how I used this technology.
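As a rough illustration of the kind of OpenMP parallelisation in C++ mentioned in this abstract (this is not the speaker's own example; the function name `sum_of_squares` is hypothetical), a serial loop with independent iterations can often be parallelised by adding a single pragma with a reduction clause:

```cpp
#include <vector>

// Hypothetical example: sum of squares of the input values.
// The pragma asks OpenMP to share the loop iterations among threads and
// to combine each thread's partial total with "+" at the end.
// Compile with -fopenmp (GCC/Clang) to enable it; without the flag the
// pragma is ignored and the loop runs serially with the same result.
long long sum_of_squares(const std::vector<int>& values) {
    long long total = 0;
    const long long n = static_cast<long long>(values.size());
    #pragma omp parallel for reduction(+:total)
    for (long long i = 0; i < n; ++i) {
        total += static_cast<long long>(values[i]) * values[i];
    }
    return total;
}
```

The appeal of this style for existing software is that the loop body is left unchanged; the main requirement is that iterations do not depend on one another except through the reduced variable.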
Following Ventsi's discussion of the aligner, I will describe how I've used resulting treebanks as training data in phrase-based SMT and mention some outstanding issues with this approach. I will then discuss how to further exploit treebanks in syntax-aware MT.
The MaTrEx Translation System
I will give an overview of our MT system, MaTrEx, describing its general functionality and capabilities. I will then discuss the MT wiki in terms of MaTrEx and talk about some work that could be done with the system.
|Second session (responsible: Andy)|
Constituency and Dependency Representations for SMT Phrase Extraction
The talk will focus on the value of replacing and/or combining string-based methods with syntax-based methods for PB-SMT, and the relative merits of using constituency-annotated vs. dependency-annotated training data.
Data-Driven Machine Translation for Sign Languages
My thesis explores the application of data-driven machine translation (MT) to sign languages (SLs) to facilitate communication between Deaf and hearing people by translating information into the native and preferred language of the individual. In this talk, I will first give an overview of sign languages and outline previous approaches and the problems that arise. I will then describe the experiments performed translating both to and from SLs, along with automatic evaluation scores. I will finish by describing the SL animation process and the manual evaluations performed on this task.
The N-gram based machine translation system
I will describe the N-gram based machine translation framework, developed in the TALP group at UPC, Barcelona. In this approach, the joint translation probability is modelled via a log-linear combination of a bilingual N-gram model and additional feature functions.
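The log-linear combination described above can be written roughly as follows. The notation is an assumption made here for illustration and is not taken from the talk: $f$ is the source sentence, $e$ a candidate translation, $h_m$ the feature functions with weights $\lambda_m$, and $t_1, \ldots, t_K$ the sequence of bilingual units over which the bilingual N-gram model is estimated.

```latex
\hat{e} = \arg\max_{e} \sum_{m=1}^{M} \lambda_m \, h_m(e, f),
\qquad
h_{\mathrm{BM}}(e, f) = \log \prod_{k=1}^{K} p\left(t_k \mid t_{k-N+1}, \ldots, t_{k-1}\right)
```

Here $h_{\mathrm{BM}}$ is the bilingual N-gram model, treated as one feature among the $M$ that are combined.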
Exploiting Lexical Information and Discriminative Alignment Training in SMT
The thesis work mainly focused on three aspects of statistical machine translation: the use of lexical information such as basic lexical models and multi-word expressions, minimum error training strategies, and word alignment models. We proposed a novel framework for discriminative training of alignment models with automated translation metrics as the maximisation criterion.
This talk mainly focuses on one topic: multiple system combination. We proposed an improved combination framework which uses MBR decoding, the GIZA-TER alignment metric and confusion network decoding to generate an optimal translation hypothesis.
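For readers unfamiliar with MBR (minimum Bayes risk) decoding, the general decision rule can be sketched as below. This is the textbook form, not necessarily the exact formulation used in the talk; the symbols are assumptions: $\mathcal{E}$ is the set of candidate translations pooled from the individual systems, $P(e \mid f)$ their posterior probabilities, and $L$ a loss function such as one derived from TER.

```latex
\hat{e} = \arg\min_{e' \in \mathcal{E}} \sum_{e \in \mathcal{E}} P(e \mid f) \, L(e, e')
```

In words, the selected hypothesis is the one with the lowest expected loss against all other candidates.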
The CASIA Translation System
This talk will describe the MT system which participates in many international and domestic MT evaluations on behalf of the Institute of Automation. This system is a complete MT platform, including an automatic preprocessing module, a word alignment processing and phrase generation module, a decoding and MER training module, and a multiple system combination module. I'll focus on the key modules and system configuration.
|Thursday 24 July, room L221 (responsible: Patrik)|
|Josef van Genabith|
Previous MT at DCU, LFG parsing and Generation technologies
Two topics will be covered: (i) Previous MT at DCU (Transbooster, MT Evaluation), and (ii) GramLab LFG parsing & generation technologies for English, Spanish, Chinese, German, French, Japanese and Arabic.
Genetic Algorithms for Syntactic Parsing
Traditional syntactic parsers define probabilistic models that allow them to exhaustively explore the search space in reasonable time. In this work we propose and evaluate a search method that performs a non-exhaustive search using heuristics.
Learning a Translation Lexicon from non-Parallel Corpora
This project evaluates the performance of syntactic context windows against positional context windows in extracting word translations from non-parallel English and German newswire corpora.
Distance in SMT
This study is about incorporating syntactic information into n-gram-based MT, focusing on monolingual symmetries. The topic is how to integrate distances in these two spaces of distinct nature.
This talk will give an overview of the research proposal for my PhD, which will compare readability and comprehensibility of RBMT and SMT output for controlled and uncontrolled input. The focus here will be on readability and comprehensibility, controlled language, and eye tracking.
|Second session (responsible: Sylwia)|
Probabilistic Transfer-based Machine Translation
Probabilistic Transfer-based Machine Translation involves automatically inducing transfer rules from parsed bilingual corpora. In my work, I use Lexical Functional Grammar (LFG) f-structures as the intermediate representation for transfer. In this talk I describe an algorithm for inducing transfer rules automatically from the f-structures of an LFG-parsed corpus. The transfer rule induction algorithm uses an efficient packed representation that stores multiple rules (up to O(2^n)) in a single structure (O(n) size). I present recent experimental results on the German-English Europarl corpus showing a vast reduction in the amount of resources that the rule induction algorithm requires. I also briefly describe a chart-based decoder used for translating unseen sentences using transfer rules induced by the rule induction algorithm.
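To give an intuition for how a structure of size O(n) can stand for up to O(2^n) rules, here is a toy sketch. It is not the speaker's data structure or algorithm: `PackedRule`, `count_rules` and `unpack` are hypothetical names, rules are reduced to plain strings, and the only assumption is that a packed rule has one shared core plus n parts that can each be independently included or left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical packed rule: storage grows linearly with the number of
// optional parts, yet every subset of those parts denotes a distinct rule.
struct PackedRule {
    std::string core;                   // material shared by all rules
    std::vector<std::string> optional;  // n independently optional parts
};

// Number of rules represented: 2^n (this sketch assumes n < 64).
std::uint64_t count_rules(const PackedRule& p) {
    return std::uint64_t{1} << p.optional.size();
}

// Enumerate every rule explicitly, one per bitmask over the optional parts.
// Only sensible for small n; the point of packing is to avoid doing this.
std::vector<std::string> unpack(const PackedRule& p) {
    std::vector<std::string> rules;
    const std::size_t n = p.optional.size();
    for (std::uint64_t mask = 0; mask < (std::uint64_t{1} << n); ++mask) {
        std::string rule = p.core;
        for (std::size_t i = 0; i < n; ++i) {
            if (mask & (std::uint64_t{1} << i)) {
                rule += " " + p.optional[i];
            }
        }
        rules.push_back(rule);
    }
    return rules;
}
```

For example, a packed rule with core `pred` and optional parts `subj`, `obj` and `adj` occupies four strings but represents eight rules.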
Automatic grammaticality judgements
Joint work with Joachim on our method for automatic grammaticality judgements and how that might be useful for ranking MT output.
Talk about the new project jointly with Jennifer and how that might link in with MT work.
MT research often demands resources not available on a single desktop PC. Training models can be memory-intensive both in RAM and on disk. Decoding requires lots of CPU time. In this talk I will give an overview of the existing MT group cluster, ICHEC resources, and plans for more new machines.
PBS job and taskfarming example
If many users share the same machines for their experiments without any job management, there will be resource conflicts: for example, two processes competing for RAM cause heavy swapping and slow both processes almost to a standstill. To address these needs and problems, five machines of the MT group have been organised in a cluster. A PBS job management system manages the resources centrally and allocates exclusive access to machines for experiments. I will show some examples of how to use it.
Having heard about the MT research being/about to be carried out in the NCLT and NGL CSET, we'll attempt to identify trends, convergences, and any gaps that need filling. This will, hopefully, provide strong pointers to the future direction of our research, in the short- to medium-term, at least.
| Last update: August 6 2008 |