| Title: | Data-Oriented Models of Parsing and Translation |
| Duration: | October 1st 2001 - September 30th 2004 |
| Funded by: | DCU School of Computing Scholarship |
| People: | Mary Hearne, Andy Way |
| Description: | The merits of combining the positive elements of the rule-based and data-driven approaches to MT are clear: a combined model has the potential to be highly accurate, robust, cost-effective to build and adaptable. However, how best to combine these techniques into a model that retains the strengths of each approach while inheriting as few of their weaknesses as possible remains an open problem. One possible solution is the Data-Oriented Translation (DOT) model originally proposed by Poutsma (1998, 2000, 2003), which is based on Data-Oriented Parsing (DOP) (e.g. Bod, 1992; Bod et al., 2003) and combines examples, linguistic information and a statistical translation model. |
| | In this work, we seek to establish how the DOT model of translation relates to the other main MT methodologies currently in use. We find that this model differs from other hybrid models of MT in that it inextricably interweaves the philosophies of the rule-based, example-based and statistical approaches in an integrated framework. |
| | We examine the solutions developed to meet the challenges of implementing the DOP model and investigate how they apply to DOT. This investigation culminates in a DOT system that allows us to run translation experiments on a larger scale, and with greater translational complexity, than earlier experiments. Our evaluation indicates that the strengths of the model identified at a theoretical level are also evident under empirical assessment. For example, in terms of exact-match accuracy, the DOT model outperforms an SMT model trained and tested on the same data by up to 89.73%. |
| | The DOP and DOT models that we evaluate empirically assume context-free phrase-structure tree representations. However, such models can also be developed for more sophisticated linguistic formalisms. In this work, we also review and extend previous efforts to integrate the representations of Lexical-Functional Grammar (LFG) with DOP and DOT. |
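DOP, on which DOT builds, treats every fragment of every corpus tree as a reusable example. The Python sketch below illustrates the relative-frequency estimator of the DOP1 model (e.g. Bod, 1992) for monolingual parsing only. A fragment keeps either all or none of a node's children. Its probability is its relative frequency among fragments with the same root label, and a derivation (a sequence of leftmost substitutions) has the product of its fragments' probabilities as its probability. The nested-tuple tree encoding and all function names are hypothetical choices for illustration, not the implementation of this project's DOT system. Summing over derivations to obtain a parse probability, and DOT's pairing of linked source and target fragments, are not shown.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical encoding: a tree is a tuple (label, child, ...); a bare string is a word.
# In a fragment, an open substitution site is a 1-tuple holding only its label, e.g. ("NP",).


def fragments_at(node):
    """All fragments rooted at `node`: the root keeps all of its children, and each
    nonterminal child is either cut off (left open) or expanded into one of its own fragments."""
    label, *children = node
    options = []
    for child in children:
        if isinstance(child, str):
            options.append([child])  # words are always kept
        else:
            options.append([(child[0],)] + fragments_at(child))
    return [(label, *combo) for combo in product(*options)]


def all_fragments(tree):
    """Fragments rooted at every nonterminal node of a corpus tree."""
    found = fragments_at(tree)
    for child in tree[1:]:
        if not isinstance(child, str):
            found.extend(all_fragments(child))
    return found


def fragment_probabilities(treebank):
    """Relative frequency of each fragment among fragments sharing its root label."""
    counts = Counter(f for tree in treebank for f in all_fragments(tree))
    root_totals = Counter()
    for frag, count in counts.items():
        root_totals[frag[0]] += count
    return {frag: count / root_totals[frag[0]] for frag, count in counts.items()}


def substitute(frag, sub):
    """Leftmost substitution: plug `sub` into the leftmost open site of `frag`.
    Returns None if `frag` has no open site."""
    label, *children = frag
    for i, child in enumerate(children):
        if isinstance(child, str):
            continue
        if len(child) == 1:  # open site found
            if child[0] != sub[0]:
                raise ValueError(f"cannot substitute {sub[0]} at a {child[0]} site")
            return (label, *children[:i], sub, *children[i + 1:])
        filled = substitute(child, sub)
        if filled is not None:
            return (label, *children[:i], filled, *children[i + 1:])
    return None


def derivation_probability(derivation, probs):
    """Probability of one derivation: the product of its fragments' probabilities."""
    return math.prod(probs[f] for f in derivation)
```

With a one-sentence treebank containing `("S", ("NP", "John"), ("VP", ("V", "likes"), ("NP", "Mary")))`, the sketch extracts 17 fragments, 10 of them rooted in S. Substituting `("NP", "John")` and then the full VP fragment into `("S", ("NP",), ("VP",))` rebuilds the original tree, and that derivation receives probability 0.1 × 0.5 × 0.25 = 0.0125.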