My PhD research

I built a dependency treebank for the Irish language. The dependency labelling scheme is inspired by LFG (Lexical Functional Grammar) and is similar to that of Cetinoglu et al. (2010), with some amendments to suit Irish.

A treebank is a corpus (a large collection of text) that has been annotated (tagged) with linguistic information about the structure of its sentences. Treebanks are valuable linguistic resources and are used as training data by data-driven parsers, which learn syntactic (structural) patterns from the annotated text through statistical methods.
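To make the idea of a dependency annotation concrete, here is a minimal Python sketch of how one annotated sentence might be represented. It is illustrative only: the `Token` class, the helper functions, the English example sentence and the label names (`subj`, `obj`, `root`) are all assumptions made for this sketch, and they are not the labelling scheme or file format of the Irish treebank.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int    # 1-based position of the word in the sentence
    form: str   # the word as it appears in the text
    head: int   # idx of the word this token depends on; 0 marks the root
    label: str  # dependency relation (illustrative names, not the real scheme)


# Hypothetical annotation of a short English sentence.
sentence = [
    Token(1, "She", 2, "subj"),
    Token(2, "reads", 0, "root"),
    Token(3, "books", 2, "obj"),
]


def is_well_formed(tokens):
    """Return True if there is exactly one root and every token
    reaches it by following head links without hitting a cycle."""
    heads = {t.idx: t.head for t in tokens}
    if sum(1 for h in heads.values() if h == 0) != 1:
        return False
    for start in heads:
        seen = set()
        node = start
        while node != 0:
            # A repeated node means a cycle; an unknown node is a dangling head.
            if node in seen or node not in heads:
                return False
            seen.add(node)
            node = heads[node]
    return True


def dependents(tokens, head_idx):
    """List the word forms that depend directly on the given token."""
    return [t.form for t in tokens if t.head == head_idx]
```

Each word points to exactly one head, so the annotation of a sentence forms a tree rooted at the main verb. A data-driven parser is trained on many such trees and learns to predict the head and label of each word in unseen text.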

As Irish is an under-resourced minority language, I took a bootstrapping approach to building the treebank, with the aim of annotating 3,000 sentences in phase 1. The treebank was then used to induce a statistical parser for Irish.

The intended applications of this work are the future development of Irish-language e-learning and CALL (Computer-Aided Language Learning) systems, hybrid English-Irish machine translation systems, improved Irish-language search engines and Irish-language proofing tools.

My work was based on the research and Irish NLP (Natural Language Processing) tools developed by Elaine Uí Dhonnchadha of Trinity College Dublin. It involved extensive research on Irish syntax, treebank development, annotation scheme design, parser induction and empirical methods.

My PhD supervisors were:

  • Josef van Genabith, Dublin City University
  • Jennifer Foster, Dublin City University
  • Mark Dras, Macquarie University, Sydney