The starting point of this project was the development of a sub-tree alignment tool for the automatic induction of translational equivalence links between pairs of trees. This allows parallel treebanks to be constructed automatically. These treebanks can subsequently be used in a number of NLP applications, most significantly for us as training data for MT.
Following on from this, we have incorporated syntax-based phrases extracted from automatically built parallel treebanks into state-of-the-art phrase-based SMT systems and shown that they significantly improve translation accuracy. More recent work has focused on optimising the use of parallel treebanks to this end.
Work is also being carried out on extending the sub-tree aligner to align not only pairs of trees but also string-tree, tree-string and string-string pairs, as well as on further optimising the tree-to-tree aligner itself.
Some work has also been carried out on using the aligner to align dependency structures, and on investigating the relative merits of dependency-based and phrase-structure-based phrase pairs in phrase-based SMT.