Generating Rich Morphology
Rich morphology is especially a problem on the target side, since choosing the right morphological variant depends on various factors, such as agreement constraints and grammatical gender. The relevant information is often distributed widely over the input sentence or missing altogether.
Generating Rich Morphology is the main subject of 22 publications.
Minkov et al. (2007) use a maximum entropy model to generate rich Russian morphology and show improved performance over the standard approach of relying on the language model. Such a model may be used for statistical machine translation by adjusting the inflections in a post-processing stage (Toutanova et al., 2008). Similarly, Fraser et al. (2012) use a conditional random field model for each morphological feature of target-side lemmas in post-processing. Weller et al. (2013) show that prediction of the case of German noun phrases can be improved by learning subcategorization frames for verbs.
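The post-processing idea above can be illustrated with a small log-linear (maximum entropy) classifier that, given a target lemma and its context, picks the most probable inflected form. This is a minimal sketch under stated assumptions: the feature names, the hand-set weights in `WEIGHTS`, and the transliterated Russian paradigm are hypothetical and chosen for illustration only; a real model would learn its weights from data and use far richer source and target features.

```python
import math

# Hypothetical hand-set weights for (inflection label, context feature) pairs.
# A real maximum entropy model would estimate these from training data.
WEIGHTS = {
    ("case=acc", "prev_is_verb"): 1.5,
    ("case=nom", "prev_is_verb"): -0.5,
    ("case=nom", "sentence_initial"): 2.0,
    ("case=gen", "prev_is_noun"): 1.2,
}


def context_features(context):
    """Extract binary features describing the position of the target lemma."""
    feats = []
    if context.get("position") == 0:
        feats.append("sentence_initial")
    if context.get("prev_tag") == "VERB":
        feats.append("prev_is_verb")
    elif context.get("prev_tag") == "NOUN":
        feats.append("prev_is_noun")
    return feats


def inflection_probs(labels, context):
    """Log-linear model: p(label | context) is proportional to exp(sum of weights)."""
    feats = context_features(context)
    scores = {lab: sum(WEIGHTS.get((lab, f), 0.0) for f in feats) for lab in labels}
    z = sum(math.exp(s) for s in scores.values())
    return {lab: math.exp(s) / z for lab, s in scores.items()}


def choose_form(forms, context):
    """Post-processing step: replace a lemma by its most probable inflected form."""
    probs = inflection_probs(forms.keys(), context)
    return forms[max(probs, key=probs.get)]


# Transliterated Russian paradigm of 'kniga' (book).
KNIGA = {"case=nom": "kniga", "case=acc": "knigu", "case=gen": "knigi"}
print(choose_form(KNIGA, {"position": 2, "prev_tag": "VERB"}))  # knigu
print(choose_form(KNIGA, {"position": 0}))  # kniga
```

A conditional random field as used by Fraser et al. (2012) would additionally score transitions between the predictions of neighbouring words; the independent per-token classifier here keeps the sketch short.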
Clifton and Sarkar (2011) overcome the need for morphological analyzers in this approach by using unsupervised morphology induction, and use automatically generated suffix classes as tags.
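To make the notion of automatically generated suffix classes concrete, the sketch below induces suffixes from raw words and uses them as tags. It is a deliberately crude frequency heuristic, not the unsupervised induction method used by Clifton and Sarkar (2011); the thresholds (`max_len`, `min_stem_len`, `min_stems`) and the tiny Finnish word list are assumptions for illustration.

```python
from collections import defaultdict


def induce_suffixes(words, max_len=3, min_stem_len=4, min_stems=2):
    """Crude unsupervised suffix induction: a candidate ending counts as a
    suffix if it follows at least `min_stems` stems that each occur with
    two or more different endings (a rough signal of an inflectional paradigm)."""
    endings = defaultdict(set)  # stem -> endings observed after it
    for w in set(words):
        for k in range(max_len + 1):
            stem = w[: len(w) - k]
            if len(stem) < min_stem_len:
                break
            endings[stem].add(w[len(w) - k:])
    stems_per_suffix = defaultdict(set)
    for stem, ends in endings.items():
        if len(ends) >= 2:
            for e in ends:
                stems_per_suffix[e].add(stem)
    return {e for e, stems in stems_per_suffix.items() if e and len(stems) >= min_stems}


def suffix_tag(word, suffixes, min_stem_len=4):
    """Split a word into its stem and a suffix-class tag (longest suffix wins)."""
    for e in sorted(suffixes, key=len, reverse=True):
        if word.endswith(e) and len(word) - len(e) >= min_stem_len:
            return word[: len(word) - len(e)], "+" + e
    return word, "+0"  # no induced suffix: null class


corpus = ["talo", "talon", "talossa", "auto", "auton", "autossa", "kissa", "kissan"]
SUFFIXES = induce_suffixes(corpus)
print(sorted(SUFFIXES))  # ['n', 'ssa']
print([suffix_tag(w, SUFFIXES) for w in ["talossa", "auton", "kissa"]])
```

The resulting tags (`+ssa`, `+n`, `+0`) play the role that analyzer-produced morphological features would otherwise play in the prediction model.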
Chahuneau et al. (2013) use a morphological prediction model to extend the phrase dictionary with inflected forms, initially for the insertion of determiners (Tsvetkov et al., 2013). This approach is available as a toolkit (Schlinger et al., 2013).
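Extending the phrase dictionary with inflected forms can be sketched as follows: for a phrase pair whose target side is a lemma, every inflected form of that lemma is added as a synthetic translation option, scored by a prediction model. The paradigm table, the `predict_inflection` stand-in with its hand-set probabilities, and the `PhrasePair` fields are hypothetical; this is not the API of the toolkit cited above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhrasePair:
    source: str
    target: str
    p_inflection: float  # score from the (hypothetical) inflection model
    synthetic: bool      # True for pairs generated rather than extracted


# Hypothetical paradigm table: target lemma -> {inflection label: surface form}.
PARADIGMS = {"kniga": {"nom": "kniga", "acc": "knigu", "gen": "knigi"}}


def predict_inflection(source_sentence):
    """Stand-in for a trained morphological prediction model that scores
    target inflections from the source context (hand-set for illustration)."""
    if any(tok.startswith("read") for tok in source_sentence.split()):
        return {"nom": 0.1, "acc": 0.8, "gen": 0.1}
    return {"nom": 0.6, "acc": 0.2, "gen": 0.2}


def synthesize(source_phrase, target_lemma, source_sentence):
    """Add one synthetic phrase pair per inflected form of the target lemma,
    each carrying the model's probability as an extra feature."""
    probs = predict_inflection(source_sentence)
    return [
        PhrasePair(source_phrase, form, probs[label], True)
        for label, form in PARADIGMS[target_lemma].items()
    ]


options = synthesize("book", "kniga", "she reads the book")
best = max(options, key=lambda p: p.p_inflection)
print(best.target)  # knigu
```

Because the options depend on the source sentence, they are naturally generated per sentence rather than stored once in a global phrase table; the decoder still makes the final choice together with the language model.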
Translation between related, morphologically rich languages may model the lexical translation step as a morphological analysis, transfer, and generation process using finite state tools (Tantug et al., 2007). Splitting words into stems and morphemes is also a valid strategy for translating into a language with rich morphology, as demonstrated for English–Turkish (Oflazer and El-Kahlout, 2007) and English–Arabic (Badr et al., 2008), and for translating between two highly inflected languages, as in the case of the Turkmen–Turkish language pair (Tantug et al., 2007).
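The splitting strategy above needs two steps around the translation model: segmenting target words into stem and morpheme tokens for training, and reattaching the morphemes in the output. The sketch assumes a hypothetical lookup table `ANALYSES` of precomputed segmentations and a `+` marker on suffix tokens; it also assumes the morphemes are already in their surface form (Turkish vowel harmony applied), whereas a system using abstract morphemes would need a generation step instead of plain concatenation.

```python
def segment(word, analyses):
    """Split a word into a stem token and '+'-marked suffix tokens using a
    (hypothetical) table of precomputed analyses; unknown words stay whole."""
    parts = analyses.get(word)
    if parts is None:
        return [word]
    stem, *suffixes = parts
    return [stem] + ["+" + s for s in suffixes]


def desegment(tokens):
    """Reattach '+'-marked morpheme tokens to the preceding token."""
    words = []
    for tok in tokens:
        if tok.startswith("+") and words:
            words[-1] += tok[1:]
        else:
            words.append(tok.lstrip("+"))
    return words


# Turkish: evlerde 'in the houses' = ev+ler+de, okudum 'I read' = oku+du+m.
ANALYSES = {"evlerde": ["ev", "ler", "de"], "okudum": ["oku", "du", "m"]}
sentence = ["evlerde", "okudum"]
tokens = [t for w in sentence for t in segment(w, ANALYSES)]
print(tokens)             # ['ev', '+ler', '+de', 'oku', '+du', '+m']
print(desegment(tokens))  # ['evlerde', 'okudum']
```

Keeping the marker on the suffix tokens is what makes desegmentation deterministic: the translation model can place morphemes freely, and the post-processor only has to glue each marked token to its left neighbour.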
Related work also addresses the translation of definite noun phrases into Scandinavian languages, where definiteness is expressed either by determiners or by noun suffixes.
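The Danish case shows the phenomenon clearly: "the house" is the single word *huset* (definite suffix), while "the big house" is *det store hus* (definite determiner, no suffix). The toy rule below only illustrates this split for English NPs of the form "the (ADJ) NOUN"; it is not the method of the work cited above, and the two-word lexicon with its gender labels is a hypothetical example.

```python
# Hypothetical toy lexicon: English noun -> (Danish lemma, gender), where
# 'c' = common gender and 'n' = neuter; adjectives map to their base form.
NOUNS = {"house": ("hus", "n"), "car": ("bil", "c")}
ADJECTIVES = {"big": "stor"}


def translate_definite_np(tokens):
    """Translate an English NP 'the (ADJ ...) NOUN' into Danish.
    Without an adjective, definiteness is a noun suffix (-et / -en);
    with an adjective, it is a determiner (det / den) and the noun stays bare."""
    if not tokens or tokens[0] != "the":
        raise ValueError("expected a definite NP starting with 'the'")
    *adjectives, noun = tokens[1:]
    lemma, gender = NOUNS[noun]
    if not adjectives:
        return lemma + ("et" if gender == "n" else "en")
    determiner = "det" if gender == "n" else "den"
    # Definite (weak) adjective form: base + 'e', which holds for 'stor' -> 'store'.
    adj_forms = [ADJECTIVES[a] + "e" for a in adjectives]
    return " ".join([determiner, *adj_forms, lemma])


for np in (["the", "house"], ["the", "car"], ["the", "big", "house"], ["the", "big", "car"]):
    print(" ".join(np), "->", translate_definite_np(np))
```

Swedish differs from Danish here: it marks definiteness twice in the adjectival case (*det stora huset*), so a rule like this one is language-specific.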
Translations of unknown morphological variants may be learned by analogy with other morphological spelling variations (Langlais and Patry, 2007).
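Translation by analogy can be sketched as solving an analogy a : b :: c : d on the source side, where d is the unknown word, and then solving the corresponding analogy on the translations of a, b, and c. The solver below only handles analogies that rewrite a word ending, a restricted case of the general formal analogies; the brute-force search over all triples and the three-word French–English lexicon are assumptions for illustration.

```python
from itertools import product


def common_prefix_len(a, b):
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def solve_suffix_analogy(a, b, c):
    """Solve a : b :: c : ? when a and b differ only in their endings:
    with a = p + s_a and b = p + s_b, replace the ending s_a of c by s_b."""
    p = common_prefix_len(a, b)
    s_a, s_b = a[p:], b[p:]
    if not c.endswith(s_a):
        return None
    return c[: len(c) - len(s_a)] + s_b


def translate_by_analogy(unknown, lexicon):
    """Find known a, b, c with a : b :: c : unknown, then solve the same
    analogy over their translations."""
    for a, b, c in product(lexicon, repeat=3):
        if solve_suffix_analogy(a, b, c) == unknown:
            result = solve_suffix_analogy(lexicon[a], lexicon[b], lexicon[c])
            if result is not None:
                return result
    return None


LEXICON = {"marcher": "walk", "marché": "walked", "sauter": "jump"}
print(translate_by_analogy("sauté", LEXICON))  # jumped
```

The cubic search is fine for a toy lexicon but would need indexing of candidate triples for realistic vocabularies.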
For very closely related languages such as Catalan and Spanish, a phrase-based approach that translates chunks of letters rather than chunks of words achieves decent results and handles the problem of unknown words well (Vilar et al., 2007).
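Letter-based translation works by changing the unit of the corpus: each sentence is rewritten as a sequence of letters, with an explicit word-boundary symbol, so that a standard phrase-based pipeline learns "phrases" that are letter n-grams. The sketch shows only this pre- and post-processing; the boundary symbol `_` is an assumption and must not occur in the text itself.

```python
def to_letters(sentence, boundary="_"):
    """Represent a sentence as a letter sequence, marking word boundaries
    with an explicit symbol so that words can be restored afterwards."""
    return [boundary if ch == " " else ch for ch in " ".join(sentence.split())]


def from_letters(letters, boundary="_"):
    """Invert to_letters: join the letters and turn boundary symbols into spaces."""
    return "".join(" " if ch == boundary else ch for ch in letters)


def letter_ngrams(letters, n):
    """All letter n-grams: the kind of units a letter-level phrase table holds."""
    return [tuple(letters[i:i + n]) for i in range(len(letters) - n + 1)]


catalan = to_letters("el cotxe")  # Spanish: 'el coche'
print(" ".join(catalan))          # e l _ c o t x e
print(from_letters(catalan))      # el cotxe
```

Since an unseen word is still a sequence of known letters, the model can often translate it through letter chunks shared with seen words, which is why this representation helps with unknown words between very similar languages.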
- Kirchhoff et al. (2015)
- Gandhe and Gangadharaiah (2013)
- Salameh et al. (2013)
- Al-Haj and Lavie (2010)
- Jeong et al. (2010)
- Kholy and Habash (2012)
- Gros and Gruden (2007)