Title: Example-Based Machine Translation using the Marker Hypothesis
Duration: October 1st 2001 - September 30th 2004
Funded by: DCU School of Computing Scholarship; IBM Fellowship
People: Nano Gough, Andy Way
Description: This work centres on the development of a 'linguistics-lite' Example-Based Machine Translation (EBMT) system that does not rely on extensive linguistic resources. We apply the Marker Hypothesis (Green, 1979), a psycholinguistic theory stating that all natural languages are 'marked' for complex syntactic structure at surface form by a closed set of specific lexemes and morphemes. We use these marker words to segment aligned sentence pairs, and then apply an alignment algorithm which derives smaller aligned chunk pairs and word pairs. We generalise these alignments by replacing certain function words with an associated tag, clustering on marker words and thus adding flexibility to our matching process. In a post-hoc stage we treat the World Wide Web as a large corpus, using it to validate and correct instances of boundary friction at determiner-noun and noun-verb boundaries. We have applied this system to a variety of bitexts and achieved competitive results.
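The segmentation and generalisation steps described above can be illustrated with a minimal sketch. It is not the system's actual implementation: the marker lexicon, the tag names, and the function names `marker_chunks` and `generalise` are all hypothetical, and the rule that a chunk must contain at least one non-marker word before it is closed is an assumption made for this example.

```python
# Toy marker lexicon: an illustrative assumption, not the closed-class
# word list used by the system described here.
MARKERS = {
    "the": "<DET>", "a": "<DET>", "an": "<DET>",
    "in": "<PREP>", "on": "<PREP>", "to": "<PREP>", "of": "<PREP>",
    "and": "<CONJ>", "or": "<CONJ>",
    "you": "<PRON>", "it": "<PRON>",
}


def marker_chunks(sentence):
    """Split a sentence into chunks, opening a new chunk at each marker word.

    Assumption: a chunk is only closed once it holds at least one
    non-marker word, so runs of adjacent markers stay in one chunk.
    """
    chunks = []
    current = []
    has_content = False
    for token in sentence.lower().split():
        if token in MARKERS and has_content:
            chunks.append(current)
            current = []
            has_content = False
        current.append(token)
        if token not in MARKERS:
            has_content = True
    if current:
        chunks.append(current)
    return chunks


def generalise(chunk):
    """Replace a chunk-initial marker word with its (hypothetical) tag."""
    if chunk and chunk[0] in MARKERS:
        return [MARKERS[chunk[0]]] + chunk[1:]
    return list(chunk)


if __name__ == "__main__":
    for chunk in marker_chunks("Click on the button to open the file"):
        print(" ".join(chunk), "=>", " ".join(generalise(chunk)))
```

Running the same segmentation over both sides of an aligned sentence pair would yield the source and target chunks that an alignment step could then pair up. The generalised form (for example `<PREP> the button`) is what allows a new input to match a stored chunk that begins with a different marker word of the same category.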
It has been suggested that EBMT is better suited to controlled translation than rule-based MT (RBMT), as it has been known to overcome the 'knowledge acquisition bottleneck'. To this end, we developed the first controlled EBMT system and used it to perform experiments in controlled analysis and generation. We show that our controlled EBMT system can outperform an RBMT system, and also a statistical MT (SMT) system trained and tested on the same data.
