Machine Translation @ the NCLT & the CNGL

Dublin City University, Ireland

Title: Attempt: "All Trees" Efficient Models of Parsing and Translation
Duration: October 1st 2006 -- September 30th 2009
Funded by: Science Foundation Ireland's Research Frontiers Programme
People: Ventsislav Zhechev, John Tinsley, Mary Hearne, Andy Way
Collaboration: Khalil Sima'an, University of Amsterdam
High-level Description: Current statistical approaches to Machine Translation often produce 'word salad'. Although knowledge of syntax has been shown to be useful in other MT paradigms, no one has successfully incorporated syntactic models into today's leading SMT systems. Example-based models currently achieve state-of-the-art performance in both parsing and translation, but computational efficiency can be a problem. This project proposes a number of efficient approaches to the problem of Translation by Parsing using all training examples ("all trees"), focusing in the first instance on the underlying monolingual parsing models and scaling them in subsequent phases to the bilingual case.
Detailed Description:

The starting point of this project was the development of a sub-tree alignment tool for the automatic induction of translational equivalence links between pairs of trees. This allows for the automatic construction of parallel treebanks, which can subsequently be used in a number of NLP applications, most significantly for us as training data for MT.
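To make the idea of inducing links between tree pairs more concrete, here is a minimal toy sketch in Python. It is not the project's aligner: the function names (node_spans, leaves, align), the hand-written word lexicon and the scoring rule (count word links inside a node pair, reject pairs with links crossing their boundary, then pick greedily) are all assumptions chosen for illustration.

```python
from itertools import product

# A tree is a nested tuple (label, child, ...); a leaf is a plain string.
# Nodes are identified by (label, start, end), which is enough for this toy
# example but would collide for unary chains that repeat a label.

def leaves(tree):
    """Return the words of a tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1:] for w in leaves(child)]

def node_spans(tree, start=0):
    """Return (spans, end), where spans lists (label, start, end) per internal node."""
    if isinstance(tree, str):
        return [], start + 1
    label, *children = tree
    spans, pos = [], start
    for child in children:
        child_spans, pos = node_spans(child, pos)
        spans.extend(child_spans)
    spans.append((label, start, pos))
    return spans, pos

def align(src_tree, tgt_tree, lexicon):
    """Greedily link node pairs whose word links all stay inside both spans."""
    src_words, tgt_words = leaves(src_tree), leaves(tgt_tree)
    word_links = [(i, j)
                  for i, s in enumerate(src_words)
                  for j, t in enumerate(tgt_words)
                  if (s, t) in lexicon]
    src_nodes, _ = node_spans(src_tree)
    tgt_nodes, _ = node_spans(tgt_tree)
    candidates = []
    for s, t in product(src_nodes, tgt_nodes):
        inside = crossing = 0
        for i, j in word_links:
            in_s = s[1] <= i < s[2]
            in_t = t[1] <= j < t[2]
            if in_s and in_t:
                inside += 1
            elif in_s or in_t:
                crossing += 1
        if inside and not crossing:
            candidates.append((inside, s, t))
    # Best-supported pairs first; each node may be linked at most once.
    candidates.sort(key=lambda c: -c[0])
    links, used_src, used_tgt = [], set(), set()
    for _, s, t in candidates:
        if s not in used_src and t not in used_tgt:
            links.append((s, t))
            used_src.add(s)
            used_tgt.add(t)
    return links

# Hypothetical toy data: an English-French tree pair and a tiny word lexicon.
src = ("S", ("NP", "John"), ("VP", ("V", "loves"), ("NP", "Mary")))
tgt = ("S", ("NP", "Jean"), ("VP", ("V", "aime"), ("NP", "Marie")))
lexicon = {("John", "Jean"), ("loves", "aime"), ("Mary", "Marie")}

links = align(src, tgt, lexicon)
for s, t in links:
    print(s, "<->", t)
```

A real aligner would score candidates with probabilistic lexical models rather than a fixed word list, but the output has the same shape: a set of node-to-node links that turns a pair of parsed sentences into one entry of a parallel treebank.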

Following on from this, we have incorporated syntax-based phrases extracted from automatically built parallel treebanks into state-of-the-art phrase-based SMT systems and shown them to significantly improve translation accuracy. More recent work has focussed on optimising the use of parallel treebanks to this end.
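The following Python sketch illustrates one way syntax-based phrase pairs could be read off linked tree nodes and combined with an existing phrase table. It rests on stated assumptions only: node links are given as pairs of word spans, the baseline table is a toy Counter of phrase-pair counts, and translation probabilities are re-estimated by plain relative frequency. The function names and data are hypothetical and do not describe the project's actual pipeline.

```python
from collections import Counter

def extract_phrase_pairs(src_words, tgt_words, node_links):
    """Turn each linked node pair into the phrase pair its two spans cover."""
    pairs = Counter()
    for (s_start, s_end), (t_start, t_end) in node_links:
        src = " ".join(src_words[s_start:s_end])
        tgt = " ".join(tgt_words[t_start:t_end])
        pairs[(src, tgt)] += 1
    return pairs

def relative_frequencies(counts):
    """Estimate p(target | source) as count(source, target) / count(source)."""
    totals = Counter()
    for (src, _), n in counts.items():
        totals[src] += n
    return {(src, tgt): n / totals[src] for (src, tgt), n in counts.items()}

# Hypothetical toy data: one sentence pair with three linked node spans.
src_words = ["John", "loves", "Mary"]
tgt_words = ["Jean", "aime", "Marie"]
node_links = [((0, 3), (0, 3)), ((1, 3), (1, 3)), ((0, 1), (0, 1))]

# Hypothetical baseline counts, standing in for an SMT phrase extractor.
baseline = Counter({
    ("loves Mary", "aime Marie"): 1,
    ("loves Mary", "adore Marie"): 1,
})

# Pool both sources of phrase pairs, then re-estimate the probabilities.
combined = baseline + extract_phrase_pairs(src_words, tgt_words, node_links)
table = relative_frequencies(combined)
```

Pooling the counts before re-estimation means a phrase pair supported by both the baseline extractor and the treebank gains probability mass, which is one simple way of letting syntactic evidence influence a phrase-based system.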

Work is also being carried out on extending the sub-tree aligner to align not only pairs of trees but also string-tree, tree-string and string-string pairs, as well as on further optimising the tree-to-tree aligner itself.

Some work has also been carried out on using the aligner to align dependency structures and on investigating the relative merits of dependency-based and phrase-structure-based phrase pairs in phrase-based SMT.

Support: ICHEC: Irish Centre for High-End Computing
Last update: May 7 2008