| Title: | TransBooster: boosting the performance of existing MT by complex sentence reduction |
| Duration: | October 1st 2003 - September 30th 2006 |
| Funded by: | Enterprise Ireland |
| People: | Bart Mellebeek, Karolina Owczarzak, Josef van Genabith, Andy Way |
| Description: | Machine Translation (MT) systems tend to underperform when faced with
long, linguistically complex sentences. Rule-based systems often settle for
broad but shallow linguistic coverage rather than deep, fine-grained
analysis, since hand-crafting rules based on detailed linguistic
analyses is time-consuming, error-prone and expensive. Most
data-driven systems lack the syntactic knowledge needed to deal
effectively with non-local grammatical phenomena. As a result, both
rule-based and data-driven MT systems handle short, simple sentences
better than linguistically complex ones. |
| | This thesis proposes a new, modular approach that helps MT systems
improve their output quality by reducing the complexity of their
input. Instead of reinventing the wheel with yet another approach to
MT, we build on the strengths of existing MT paradigms while remedying
their shortcomings as far as possible. To this end, we develop
TransBooster, a wrapper technology that reduces the complexity of the
MT input by means of a recursive decomposition algorithm, which
produces simple input chunks that are spoon-fed to a baseline MT
system. TransBooster is not itself an MT system: it does not perform
automatic translation, but operates on top of an existing MT system,
guiding it through the input and helping it improve the quality of
its own translations through automatic complexity reduction. |
| | In this dissertation, we outline the motivation behind TransBooster,
explain its development in depth and investigate its impact on the
three major paradigms in the field: rule-based, example-based and
statistical MT. In addition, we explore the TransBooster architecture
as a promising alternative to current multi-engine MT techniques. We
evaluate TransBooster on the language pair English→Spanish using a
combination of automatic and manual evaluation metrics, and provide a
rigorous analysis of the potential and shortcomings of our approach. |
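As a rough illustration of the wrapper idea behind the recursive decomposition described above, the following Python sketch splits a parsed sentence into its constituents until each chunk is short enough, passes each chunk to a baseline MT function, and joins the outputs. It is a simplified stand-in, not the TransBooster algorithm itself: the nested-list tree format, the functions `leaves`, `boost` and `toy_mt`, the `max_len` threshold and the tiny English→Spanish lexicon are all hypothetical, and the sketch does not model how a real wrapper would preserve context or word order across chunks.

```python
from typing import Callable, Dict, List, Union

# Hypothetical parse-tree format: a leaf is a word (str); an internal
# node is a list of child subtrees (its constituents).
Tree = Union[str, List["Tree"]]


def leaves(tree: Tree) -> List[str]:
    """Return the words covered by a subtree, in source order."""
    if isinstance(tree, str):
        return [tree]
    words: List[str] = []
    for child in tree:
        words.extend(leaves(child))
    return words


def boost(tree: Tree, baseline_mt: Callable[[str], str], max_len: int = 4) -> str:
    """Recursively split `tree` into constituents until each chunk has at
    most `max_len` words, translate each chunk with `baseline_mt`, and
    join the translated chunks in source order (no reordering is modelled)."""
    words = leaves(tree)
    # A single word or a short enough chunk is sent to the baseline as-is.
    if isinstance(tree, str) or len(words) <= max_len:
        return baseline_mt(" ".join(words))
    # Otherwise decompose into the node's constituents and recurse.
    return " ".join(boost(child, baseline_mt, max_len) for child in tree)


# Hypothetical toy stand-in for a baseline MT system: word-for-word lookup.
LEXICON: Dict[str, str] = {"the": "el", "cat": "gato", "sees": "ve", "a": "un", "dog": "perro"}


def toy_mt(text: str) -> str:
    return " ".join(LEXICON.get(word, word) for word in text.split())


tree: Tree = [["the", "cat"], ["sees", ["a", "dog"]]]
print(boost(tree, toy_mt))  # el gato ve un perro
```

With the default `max_len=4`, the five-word sentence exceeds the threshold, so the baseline receives the two chunks "the cat" and "sees a dog" separately; raising `max_len` to 5 or more sends the whole sentence in one call.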