Suffix Array Translation Models

Large translation models take a long time to train and often exceed the available working memory of current machines. Storing the word-aligned parallel corpus in a suffix array and retrieving translation options on demand offers an alternative.

Suffix Arrays is the main subject of 11 publications.


The translation table may be represented in a suffix array, as proposed for a searchable translation memory (Callison-Burch et al., 2005) and integrated into the decoder (Zhang and Vogel, 2005). Callison-Burch et al. (2005) propose a suffix-array structure that keeps the corpus in memory and extracts phrase translations on the fly.
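As a rough illustration of the on-demand lookup described above, the following sketch builds a token-level suffix array over a tiny source corpus and finds all occurrences of a phrase by binary search. The function names and the toy corpus are assumptions made for this example, and the naive sort-based construction is used only for brevity; it is not the method of any of the cited papers. A real system would go on to use the word alignment at each matched position to extract the target side of the phrase pair.

```python
from typing import List


def build_suffix_array(tokens: List[str]) -> List[int]:
    """Return suffix start positions sorted by the token sequence that follows.

    Naive construction (sorting full suffixes); fine for a toy corpus only.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find_occurrences(tokens: List[str], sa: List[int], phrase: List[str]) -> List[int]:
    """Return all corpus positions where `phrase` starts, in corpus order."""
    n = len(phrase)

    # Lower bound: first suffix whose first n tokens are >= phrase.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if tokens[sa[mid]:sa[mid] + n] < phrase:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose first n tokens are > phrase.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if tokens[sa[mid]:sa[mid] + n] <= phrase:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in sa[start:lo] begin with the phrase.
    return sorted(sa[start:lo])


# Toy source-side corpus (hypothetical data), sentences simply concatenated.
corpus = "the house is small . the house is big .".split()
suffix_array = build_suffix_array(corpus)

print(find_occurrences(corpus, suffix_array, ["the", "house"]))        # [0, 5]
print(find_occurrences(corpus, suffix_array, ["house", "is", "big"]))  # [6]
print(find_occurrences(corpus, suffix_array, ["green"]))               # []
```

Because every phrase of any length can be located this way, no phrase table has to be precomputed; the cost is paid at lookup time instead.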
Suffix arrays may also be used to quickly learn phrase alignments from a parallel corpus without the use of a word alignment (McNamee and Mayfield, 2006). Related to this is the idea of prefix data structures for the translation table, which allow quicker access and make it possible to store the model on disk and retrieve applicable translation options on demand (Zens and Ney, 2007).
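The prefix data structure idea can be sketched with a small in-memory prefix tree over source phrases. This is only an illustration under stated assumptions: the class and function names, the phrase pairs, and the scores are invented for the example, and the on-disk storage described above is not modelled here. The point it shows is that all translation options for phrases starting at one sentence position can be collected in a single walk down the tree.

```python
from typing import Dict, List, Tuple

# A translation option: (target phrase, score). Scores here are made up.
Option = Tuple[str, float]


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.translations: List[Option] = []


def insert(root: TrieNode, source: str, target: str, score: float) -> None:
    """Add one phrase pair, keyed by the source phrase's word sequence."""
    node = root
    for word in source.split():
        node = node.children.setdefault(word, TrieNode())
    node.translations.append((target, score))


def options_from(root: TrieNode, words: List[str], start: int) -> List[Tuple[int, List[Option]]]:
    """Collect options for every source phrase that begins at `start`.

    Returns (end, options) pairs, where words[start:end] is the matched phrase.
    The walk stops as soon as no stored phrase continues with the next word.
    """
    found = []
    node = root
    for end in range(start, len(words)):
        node = node.children.get(words[end])
        if node is None:
            break
        if node.translations:
            found.append((end + 1, node.translations))
    return found


# Hypothetical toy phrase table.
table = TrieNode()
insert(table, "das", "the", 0.7)
insert(table, "haus", "house", 0.8)
insert(table, "das haus", "the house", 0.9)

sentence = "das haus ist klein".split()
print(options_from(table, sentence, 0))  # [(1, [('the', 0.7)]), (2, [('the house', 0.9)])]
print(options_from(table, sentence, 1))  # [(2, [('house', 0.8)])]
print(options_from(table, sentence, 2))  # []
```

Only the nodes along the paths that match the input sentence are visited, which is what makes it practical to load just those parts of a large model when they are needed.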
Hierarchical phrase-based models may also be stored in suffix arrays (Lopez, 2007), which allows for much bigger models (Lopez, 2008).



New Publications

  • Germann (2015)
  • Denkowski et al. (2014)
  • Germann (2014)
  • Cromieres and Kurohashi (2011)