Machine Translation @ the National Centre for Language Technology

Dublin City University, Ireland

Title: TransBooster: boosting the performance of existing MT by complex sentence reduction
Duration: October 1st 2003 - September 30th 2006
Funded by: Enterprise Ireland
People: Bart Mellebeek, Karolina Owczarzak, Josef van Genabith, Andy Way
Description: Machine Translation (MT) systems tend to underperform when faced with long, linguistically complex sentences. Rule-based systems often settle for broad but shallow linguistic coverage instead of deep, fine-grained analysis, since hand-crafting rules based on detailed linguistic analyses is time-consuming, error-prone and expensive. Most data-driven systems lack the syntactic knowledge needed to deal effectively with non-local grammatical phenomena. Therefore, both rule-based and data-driven MT systems are better at handling short, simple sentences than linguistically complex ones.
 This thesis proposes a new, modular approach to help MT systems improve their output quality by reducing the complexity of the input. Instead of trying to reinvent the wheel by proposing yet another approach to MT, we build on the strengths of existing MT paradigms while trying to remedy their shortcomings as much as possible. We do this by developing TransBooster, a wrapper technology that reduces the complexity of the MT input with a recursive decomposition algorithm that produces simple input chunks, which are spoon-fed to a baseline MT system. TransBooster is not an MT system itself: it does not perform automatic translation, but operates on top of an existing MT system, guiding it through the input and helping the baseline system improve the quality of its own translations through automatic complexity reduction.
 In this dissertation, we outline the motivation behind TransBooster, explain its development in depth and investigate its impact on the three most important paradigms in the field: Rule-based, Example-based and Statistical MT. In addition, we use the TransBooster architecture as a promising alternative to current Multi-Engine MT techniques. We evaluate TransBooster on the language pair English→Spanish with a combination of automatic and manual evaluation metrics, providing a rigorous analysis of the potential and shortcomings of our approach.
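 The wrapper idea described above can be illustrated with a toy sketch. This is not the TransBooster implementation: the nested-list tree representation, the `max_words` threshold, the function names and the naive source-order recombination are all assumptions made for illustration, and the baseline MT system is a stand-in callable. The sketch only shows the general shape of recursively decomposing a complex input into simple chunks that are passed one by one to an existing MT system.

```python
from typing import Callable, List, Union

# Hypothetical input format: a constituent is either a plain string (a chunk
# of words) or a list of sub-constituents. A real system would obtain this
# structure from a syntactic parser.
Tree = Union[str, List["Tree"]]


def flatten(tree: Tree) -> str:
    """Return the surface string covered by a constituent."""
    if isinstance(tree, str):
        return tree
    return " ".join(flatten(child) for child in tree)


def boosted_translate(tree: Tree,
                      baseline: Callable[[str], str],
                      max_words: int = 4) -> str:
    """Translate `tree` by feeding simple chunks to `baseline`.

    A constituent that is a leaf, or that covers at most `max_words` words,
    is considered simple and is sent to the baseline system as a whole.
    Anything larger is decomposed recursively. Chunk translations are joined
    in source order, which is a simplifying assumption.
    """
    text = flatten(tree)
    if isinstance(tree, str) or len(text.split()) <= max_words:
        return baseline(text)
    return " ".join(boosted_translate(child, baseline, max_words)
                    for child in tree)


if __name__ == "__main__":
    # Stand-in for an existing MT system: it only marks what it was given.
    def fake_mt(chunk: str) -> str:
        return "<" + chunk + ">"

    sentence = ["the chairman",
                ["who", "was elected last year"],
                ["resigned", "yesterday"]]
    print(boosted_translate(sentence, fake_mt))
```

 Keeping the baseline system behind a plain callable mirrors the point made in the description: the wrapper performs no translation itself, so the same decomposition step could in principle sit on top of a rule-based, example-based or statistical engine.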
Last update: Sep 19 2007