Machine Translation @ the National Centre for Language Technology

Dublin City University, Ireland

Title: GF-DOP: Grammatical Feature Data-Oriented Parsing
Duration: October 1st 2004 - September 30th 2006
Funded by: IRCSET's Embark Initiative
People: Ríona Finn, Mary Hearne, Josef van Genabith, Andy Way
Description: Data-Oriented Parsing (DOP) is a hybrid, language-independent parsing formalism. It combines rules, statistics and linguistics, and learns all of its parsing knowledge from existing texts. However, the expressive power of the DOP model is limited by the corpus representations it assumes. DOP makes use of context-free phrase-structure trees, which characterise phrasal and sentential syntax but cannot reflect linguistic phenomena at deeper levels. Integrating Lexical Functional Grammar (LFG), which is known to be beyond context-free, enables the development of a model that produces more linguistically detailed descriptions of language, known as LFG-DOP (Lexical Functional Grammar DOP). Because of difficulties inherent in building an LFG-DOP system whose probability model corresponds to the probability distribution of derivations, there are currently no satisfactory implementations.
 The GF-DOP model can be seen as an extension of the Tree-DOP model and an approximation of LFG-DOP. It combines the robustness of the DOP model with some of the linguistic competence of LFG. The model exploits a corpus of annotated c-structures: features are extracted from f-structures and appended to the c-structure category labels. We aim to identify constituent features and functions accurately and, by modelling this grammatical information, to improve the quality of the c-structures generated.
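 The annotation step can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the project's implementation: the `Node` class, the choice of the features `num` and `pers`, and the label format `NP[num=sg,pers=3]` are all hypothetical, and each node is assumed to already carry the features of the f-structure it is linked to.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                     # c-structure category, e.g. "NP"
    children: list = field(default_factory=list)   # Node objects or word strings
    fstruct: dict = field(default_factory=dict)    # features of the linked f-structure (assumed given)


def annotate(node, features=("num", "pers")):
    """Return a copy of the tree whose labels carry the selected features."""
    if isinstance(node, str):                      # a word: nothing to annotate
        return node
    parts = [f"{k}={node.fstruct[k]}" for k in features if k in node.fstruct]
    label = node.label + (f"[{','.join(parts)}]" if parts else "")
    return Node(label, [annotate(c, features) for c in node.children], node.fstruct)


def show(node):
    """Render a tree in bracketed notation."""
    if isinstance(node, str):
        return node
    return "(" + " ".join([node.label] + [show(c) for c in node.children]) + ")"


# Hypothetical toy tree; the feature values are invented for illustration.
agr = {"num": "sg", "pers": "3"}
tree = Node("S", [
    Node("NP", ["John"], agr),
    Node("VP", [Node("V", ["sleeps"], agr)], agr),
])

print(show(annotate(tree)))
# (S (NP[num=sg,pers=3] John) (VP[num=sg,pers=3] (V[num=sg,pers=3] sleeps)))
```

 Nodes without any of the selected features, such as `S` here, keep their plain category label, and the original tree is left unmodified.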
 The GF-DOP model improves over the Tree-DOP model in that it uses additional grammatical information to rule out derivations which the Tree-DOP model would consider valid, as can be seen in the example below:
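 The page's own example is not reproduced in this text. As a stand-in, the following minimal Python sketch shows how feature-annotated labels can block a composition that plain category labels allow. It relies on the standard DOP composition operation (leftmost substitution, which requires the root label of the substituted fragment to equal the label of the open site). The tuple encoding of fragments, the number-agreement features and the toy fragments themselves are hypothetical.

```python
def leftmost_site(frag):
    """Return the label of the leftmost open substitution site, or None.

    A fragment is a tuple (label, child, ...); a word is a str;
    an open substitution site is a one-element tuple (label,).
    """
    for child in frag[1:]:
        if isinstance(child, str):       # a word, not a site
            continue
        if len(child) == 1:              # an open substitution site
            return child[0]
        found = leftmost_site(child)
        if found is not None:
            return found
    return None


def can_compose(frag, sub):
    """Leftmost substitution is possible only if the two labels are identical."""
    return leftmost_site(frag) == sub[0]


# Tree-DOP style fragments: plain category labels.
s_plain = ("S", ("NP",), ("VP", ("V", "sleeps")))
np_plain = ("NP", ("D", "the"), ("N", "dogs"))

# GF-DOP style fragments: labels carry a (hypothetical) number feature.
s_sg = ("S", ("NP[num=sg]",), ("VP[num=sg]", ("V[num=sg]", "sleeps")))
np_pl = ("NP[num=pl]", ("D", "the"), ("N[num=pl]", "dogs"))
np_sg = ("NP[num=sg]", ("D", "the"), ("N[num=sg]", "dog"))

print(can_compose(s_plain, np_plain))  # True: "the dogs sleeps" is derivable
print(can_compose(s_sg, np_pl))        # False: the number features clash
print(can_compose(s_sg, np_sg))        # True: "the dog sleeps"
```

 With plain labels the ill-formed "the dogs sleeps" has a derivation; once number is part of the label, that derivation is no longer available.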
 One weakness of the GF-DOP model is that it may be slightly less robust than the Tree-DOP model, since more specific category labels allow fewer fragments to combine. If necessary, however, we can back off to unannotated fragments to generate a parse.
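 The back-off idea can be sketched as follows. This is a hedged illustration only: the `toy_parse` function is a hypothetical stand-in that tries a single substitution step, not a DOP parser, and the label format `NP[num=sg]` and the fragments are invented for the example.

```python
import re


def strip_features(label):
    """Drop a trailing feature annotation: 'NP[num=sg]' -> 'NP'."""
    return re.sub(r"\[[^\]]*\]$", "", label)


def unannotate(frag):
    """Return a copy of a fragment with plain category labels."""
    if isinstance(frag, str):
        return frag
    return (strip_features(frag[0]), *[unannotate(c) for c in frag[1:]])


def has_site(frag):
    """True if the fragment contains an open site, i.e. a tuple (label,)."""
    if isinstance(frag, str):
        return False
    return len(frag) == 1 or any(has_site(c) for c in frag[1:])


def words(frag):
    """Return the words at the frontier of a fragment, left to right."""
    if isinstance(frag, str):
        return [frag]
    return [w for c in frag[1:] for w in words(c)]


def substitute(frag, sub):
    """Fill the leftmost open site of frag with sub; None if impossible."""
    if isinstance(frag, str):
        return None
    if len(frag) == 1:                              # frag is itself the site
        return sub if frag[0] == sub[0] else None
    label, *children = frag
    for i, child in enumerate(children):
        if has_site(child):                         # leftmost child with a site
            new = substitute(child, sub)
            if new is None:
                return None
            return (label, *children[:i], new, *children[i + 1:])
    return None


def toy_parse(sentence, fragments):
    """Hypothetical stand-in parser: one substitution step only."""
    for frag in fragments:
        for sub in fragments:
            tree = substitute(frag, sub)
            if tree is not None and not has_site(tree) and words(tree) == sentence:
                return tree
    return None


def parse_with_backoff(sentence, fragments, parse=toy_parse):
    """Try annotated fragments first; fall back to unannotated ones."""
    tree = parse(sentence, fragments)
    if tree is not None:
        return tree, "annotated"
    plain = [unannotate(f) for f in fragments]
    return parse(sentence, plain), "backed-off"


fragments = [
    ("S", ("NP[num=sg]",), ("VP[num=sg]", ("V[num=sg]", "sleeps"))),
    ("NP[num=pl]", ("D", "the"), ("N[num=pl]", "dogs")),
]
tree, mode = parse_with_backoff(["the", "dogs", "sleeps"], fragments)
print(mode, tree)
```

 For this ill-formed input the annotated fragments yield no parse, so the sketch falls back to the plain labels and still returns a tree, which is the robustness trade-off described above.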
Last update: June 16 2007