Suffix Array Translation Models
Large translation models take a long time to train and often exceed the available working memory of current machines. Storing the word-aligned parallel corpus in a suffix array and retrieving translation options on demand offers an alternative.
Suffix Arrays is the main subject of 11 publications.
The translation table may be represented in a suffix array, as proposed for a searchable translation memory (Callison-Burch et al., 2005) and integrated into the decoder (Zhang and Vogel, 2005). Callison-Burch et al. (2005) propose a suffix-tree structure to keep corpora in memory and extract phrase translations on the fly. Suffix arrays may also be used to quickly learn phrase alignments from a parallel corpus without the use of a word alignment (McNamee and Mayfield, 2006). Related to this is the idea of prefix data structures for the translation model, which allow quicker access and make it possible to store the model on disk for on-demand retrieval of applicable translation options (Zens and Ney, 2007). Hierarchical phrase-based models may also be stored in such a way (Lopez, 2007), which allows for much bigger models (Lopez, 2008).
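As a rough illustration of on-demand lookup, the following minimal sketch indexes the source side of a toy parallel corpus with a suffix array and finds every occurrence of a source phrase by binary search. It is not taken from any of the cited systems. All names (`build_corpus`, `build_suffix_array`, `find_occurrences`, `lookup_targets`, the `</s>` boundary token) and the example sentences are assumptions made for this sketch. It returns whole aligned target sentences only; a real system would go on to use the word alignment to extract the matching target phrase, which is omitted here.

```python
from bisect import bisect_right

SENTENCE_END = "</s>"  # hypothetical boundary marker, assumed absent from queries


def build_corpus(sentences):
    """Flatten tokenized sentences into one token list plus sentence start offsets."""
    tokens, starts = [], []
    for sentence in sentences:
        starts.append(len(tokens))
        tokens.extend(sentence.split())
        tokens.append(SENTENCE_END)
    return tokens, starts


def build_suffix_array(tokens):
    """Sort all suffix start positions lexicographically (naive; toy data only)."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find_occurrences(tokens, suffix_array, phrase):
    """Binary-search the suffix array for every position where `phrase` starts."""
    n = len(phrase)

    def boundary(strict):
        lo, hi = 0, len(suffix_array)
        while lo < hi:
            mid = (lo + hi) // 2
            prefix = tokens[suffix_array[mid]:suffix_array[mid] + n]
            if prefix < phrase or (not strict and prefix == phrase):
                lo = mid + 1
            else:
                hi = mid
        return lo

    first = boundary(strict=True)   # first suffix whose prefix is >= phrase
    last = boundary(strict=False)   # first suffix whose prefix is > phrase
    return sorted(suffix_array[first:last])


def lookup_targets(phrase, tokens, starts, suffix_array, target_sentences):
    """Return the target sentence paired with each occurrence of `phrase`."""
    hits = find_occurrences(tokens, suffix_array, phrase.split())
    sentence_ids = [bisect_right(starts, pos) - 1 for pos in hits]
    return [target_sentences[i] for i in sentence_ids]


# Toy parallel corpus (invented for this example).
source = ["das haus ist klein", "das haus ist alt", "ein haus ist teuer"]
target = ["the house is small", "the house is old", "a house is expensive"]

tokens, starts = build_corpus(source)
suffix_array = build_suffix_array(tokens)
print(lookup_targets("das haus", tokens, starts, suffix_array, target))
```

Because all suffixes that begin with the query phrase are adjacent in the sorted array, two binary searches are enough to delimit them, and nothing has to be precomputed per phrase. The naive construction used here sorts whole suffixes and would be far too slow for a real corpus; it only keeps the sketch short.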
- Germann (2015)
- Denkowski et al. (2014)
- Germann (2014)
- Cromieres and Kurohashi (2011)