Machine Translation @ the National Centre for Language Technology

Dublin City University, Ireland

Title: Example-based Sign Language Machine Translation
Duration: October 1st 2004 - September 30th 2007
Funded by: IBM-IRCSET Fellowship
People: Sara Morrissey, Andy Way
Description: Sign languages (SLs) are the first and preferred languages of the Deaf Community worldwide. As with other minority languages, they are often poorly resourced and in many cases lack political and social recognition. As with speakers of minority languages, Deaf people are often required to access documentation or communicate in a language that is not natural to them. In an attempt to alleviate this problem we are developing an example-based machine translation (EBMT) system to allow Deaf people to access information in the language of their choice. While some research exists on translating between natural and sign languages, we believe ours is the first attempt to tackle this problem using an EBMT approach.
 An EBMT approach necessitates a bilingual data set aligned sententially and sub-sententially using a predefined method. The lack of a formally adopted, or even widely recognised, writing system for SLs makes finding a dataset suited to our method difficult. Of the few transcription methods available, we have chosen to use annotated video data to construct our bilingual corpus. An example of such data may be seen below, where the video of SL utterances appears in the upper left corner and the corresponding annotations are presented horizontally below it, aligned with a timeline.
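 As a rough illustration, such timeline-aligned annotations might be represented as time-stamped entries on named tiers. The class and field names below are illustrative assumptions, not the project's actual annotation format:

```python
# Minimal sketch of timeline-aligned SL annotations: each tier
# (right hand, left hand, NMFs) holds time-stamped glosses.
# All names and example glosses here are hypothetical.

from dataclasses import dataclass

@dataclass
class Annotation:
    tier: str       # e.g. "right_hand", "left_hand", "nmf"
    gloss: str      # gloss label for the articulation
    start_ms: int   # onset on the video timeline
    end_ms: int     # offset on the video timeline

utterance = [
    Annotation("right_hand", "FLIGHT", 0, 480),
    Annotation("left_hand", "FLIGHT", 0, 480),
    Annotation("nmf", "brow_raise", 0, 900),
    Annotation("right_hand", "WHEN", 520, 900),
]

# Printing in onset order mirrors the horizontal timeline layout
# described above.
for a in sorted(utterance, key=lambda a: a.start_ms):
    print(f"{a.start_ms:>4}-{a.end_ms:<4} {a.tier:>10}: {a.gloss}")
```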
 The annotations comprise a gloss for the articulations of the right and left hands, with the possibility of including non-manual feature (NMF) details such as head nods and eyebrow movements that can alter the semantics of a sentence. One of the main advantages of using annotated data is that all features (i.e. glosses, NMFs and a phonetic description of the signs in terms of handshape, orientation, etc.) can be included and temporally aligned. This allows the annotations to be bound together according to their time frames to form chunks that can correspond to the chunks formed on the spoken-language side of the text. The Marker Hypothesis is used to chunk the spoken-language side of the texts. Despite the different chunking methods, manual examination of both chunk sets showed that a large number of potentially alignable chunks are produced.
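 The Marker Hypothesis chunking step can be sketched as follows: a new chunk opens at each closed-class "marker" word. The marker set and the example sentence below are small illustrative assumptions, not the project's actual marker lists or data:

```python
# Hedged sketch of Marker Hypothesis chunking for the
# spoken-language side: a new chunk opens at each marker word.
# The marker set is an illustrative sample only.

MARKERS = {"the", "a", "an", "in", "on", "at", "to", "from",
           "and", "or", "i", "you", "what", "which"}

def marker_chunk(sentence):
    """Split a tokenised sentence into chunks opening at marker words."""
    chunks, current = [], []
    for token in sentence.lower().split():
        if token in MARKERS and current:
            chunks.append(current)
            current = []
        current.append(token)
    if current:
        chunks.append(current)
    return [" ".join(c) for c in chunks]

print(marker_chunk("what flights go from dublin to london"))
# → ['what flights go', 'from dublin', 'to london']
```

Chunks obtained this way can then be aligned against the time-frame-bound annotation chunks on the SL side.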
 We have developed an EBMT system using data in Dutch Sign Language/Nederlandse Gebarentaal (NGT). That dataset comprises only 561 sentences of poetry and children's fables, a domain not well suited to machine translation. For this reason we have created a dataset of Irish Sign Language (ISL) videos with corresponding annotations, three times the size of the NGT corpus and on the better-suited closed-domain topic of flight information queries.
 Currently, output is in the form of the SL video annotations. In future work, we intend to use the phonetic details added to the annotations, in combination with the glosses and NMFs, to automatically produce sign language with a signing avatar like the one below.
Last update: June 16 2007