| Title: | Example-Based Machine Translation using the Marker Hypothesis |
| Duration: | October 1st 2001 - September 30th 2004 |
| Funded by: | DCU School of Computing Scholarship; IBM Fellowship |
| People: | Nano Gough, Andy Way |
| Description: | This work centres on the development of a 'linguistics-lite' EBMT system that does not rely on extensive linguistic resources. We apply the Marker Hypothesis (Green, 1979), a psycholinguistic theory stating that all natural languages are 'marked' for complex syntactic structure at surface form by a closed set of specific lexemes and morphemes. We use these marker words to segment aligned sentence pairs, and then apply an alignment algorithm which derives smaller aligned chunk and word pairs. We generalise these alignments by replacing certain function words with an associated tag, clustering on marker words and thus adding flexibility to the matching process. In a post-hoc stage we treat the World Wide Web as a large corpus in order to validate and correct instances of determiner-noun and noun-verb boundary friction. We have applied this system to a variety of bitexts and achieved competitive results. |
| | It has been suggested that EBMT is better suited to controlled translation than rule-based MT (RBMT), as it has been known to overcome the 'knowledge acquisition bottleneck'. To this end, we developed the first controlled EBMT system and used it to perform experiments in controlled analysis and generation. We show that our controlled EBMT system can outperform an RBMT system, and also a statistical MT (SMT) system trained and tested on the same data. |
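The marker-based segmentation and generalisation steps described above can be illustrated with a minimal sketch. It is not the project's implementation: the marker lexicon, the tag names (`<DET>`, `<PREP>`, and so on), the function names `segment` and `generalise`, and the rule that a chunk is only closed once it holds at least one non-marker word are all assumptions made for illustration. Chunk alignment across the two languages and the Web-based validation stage are not covered.

```python
# Hypothetical, deliberately tiny marker lexicon: tag -> closed-class words.
# A real system would use a much larger set for each language.
MARKERS = {
    "<DET>": {"the", "a", "an", "this", "that"},
    "<PREP>": {"in", "on", "at", "to", "of", "with", "from"},
    "<CONJ>": {"and", "or", "but"},
    "<PRON>": {"he", "she", "it", "we", "they", "you", "i"},
}
WORD_TO_TAG = {word: tag for tag, words in MARKERS.items() for word in words}


def segment(sentence):
    """Split a sentence into chunks, opening a new chunk at each marker word.

    Assumption: a chunk is only closed once it contains at least one
    non-marker word, so runs of markers ("on the") stay in one chunk.
    """
    chunks, current, has_content = [], [], False
    for token in sentence.lower().split():
        is_marker = token in WORD_TO_TAG
        if is_marker and has_content:
            chunks.append(current)
            current, has_content = [], False
        current.append(token)
        if not is_marker:
            has_content = True
    if current:
        chunks.append(current)
    return chunks


def generalise(chunk):
    """Replace a chunk-initial marker word with its tag, e.g. 'the' -> '<DET>'."""
    head = chunk[0]
    if head in WORD_TO_TAG:
        return [WORD_TO_TAG[head]] + chunk[1:]
    return list(chunk)


chunks = segment("The dog sat on the mat with a hat")
templates = [generalise(chunk) for chunk in chunks]
```

Here `chunks` holds three marker-headed segments (`the dog sat`, `on the mat`, `with a hat`), and `templates` holds the same segments with the leading marker word replaced by its tag. A template such as `<PREP> the mat` can then match any input chunk that begins with a preposition, which is one way to read the added flexibility in matching mentioned in the description.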