|
| Title: | GF-DOP: Grammatical Feature Data-Oriented Parsing |
| Duration: | October 1st 2004 - September 30th 2006 |
| Funded by: | IRCSET's Embark Initiative |
| People: | Ríona Finn, Mary Hearne, Josef van Genabith, Andy Way |
| Description: | Data-Oriented Parsing (DOP) is a hybrid, language-independent parsing formalism. It combines rules, statistics and linguistics, and learns all of its parsing knowledge from existing texts. However, the expressive power of the DOP model is limited by the corpus representations it assumes. DOP makes use of context-free phrase-structure trees, which characterise phrasal and sentential syntax but cannot reflect linguistic phenomena at deeper levels. Integrating Lexical Functional Grammar (LFG), which is known to be beyond context-free, enables the development of a model which produces more linguistically detailed descriptions of language, known as LFG-DOP (Lexical Functional Grammar DOP). Because of difficulties inherent in building an LFG-DOP system whose probability model corresponds to the probability distribution of derivations, there are currently no satisfactory implementations. |
| | The GF-DOP model can be seen as an extension of the Tree-DOP model and an approximation of LFG-DOP. It combines the robustness of the DOP model with some of the linguistic competence of LFG. The model exploits a corpus of annotated c-structures: features are extracted from f-structures and appended to the c-structure category labels. We aim to identify constituent features and functions accurately and, by modelling this grammatical information, to improve the quality of the c-structures generated. |
| | The GF-DOP model improves over the Tree-DOP model in that it uses additional grammatical information to rule out derivations which the Tree-DOP model would consider valid, as can be seen in the example below: |
| | One weakness of the GF-DOP model is that it may be slightly less robust than the Tree-DOP model, but if necessary, we can back off to unannotated fragments to generate a parse. |
|
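The description above mentions three mechanisms: appending f-structure features to c-structure category labels, using those labels to rule out derivations that Tree-DOP would accept, and backing off to unannotated fragments. The following is a minimal sketch of how these could fit together, not the project's implementation. The label format `NP[num=sg]`, the feature names, and the helper names `annotate`, `strip_annotation` and `can_substitute` are all assumptions made for illustration.

```python
from typing import Dict


def annotate(category: str, features: Dict[str, str]) -> str:
    """Append features to a category label (hypothetical label format)."""
    if not features:
        return category
    # Sort the features so that equal feature sets always yield equal labels.
    feats = ",".join(f"{name}={value}" for name, value in sorted(features.items()))
    return f"{category}[{feats}]"


def strip_annotation(label: str) -> str:
    """Recover the bare category, i.e. the label Tree-DOP would see."""
    return label.split("[", 1)[0]


def can_substitute(site_label: str, root_label: str, backoff: bool = False) -> bool:
    """Decide whether a fragment rooted in root_label may fill an open
    substitution site labelled site_label.

    Without back-off the full annotated labels must match. With back-off
    only the bare categories are compared, as in plain Tree-DOP.
    """
    if site_label == root_label:
        return True
    if backoff:
        return strip_annotation(site_label) == strip_annotation(root_label)
    return False


# A singular NP site and a plural NP fragment (illustrative features only).
site = annotate("NP", {"num": "sg"})
fragment_root = annotate("NP", {"num": "pl"})

# Tree-DOP view: both labels reduce to the same bare category.
tree_dop_ok = strip_annotation(site) == strip_annotation(fragment_root)
# GF-DOP view: the annotated labels differ, so the composition is ruled out.
gf_dop_ok = can_substitute(site, fragment_root)
# Back-off: annotations are ignored when no annotated parse is available.
backoff_ok = can_substitute(site, fragment_root, backoff=True)
```

In this sketch the stricter label match is what removes derivations, and it is also why an annotated model may find no parse where Tree-DOP would: back-off simply relaxes the match to the bare category.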