Building machine translation systems for under-resourced languages, or under sparse data conditions that arise for other reasons, is a special challenge and may require special methods.
Sparse Data is the main subject of 15 publications. 11 are discussed here.
Sparse data aggravates the problem of Unknown Words, which may be replaced by known words or phrases through Paraphrasing. If training data that connects the languages to a bridge language is available, such Pivot Languages can be exploited. The need to make use of any available data resources, even Comparable Corpora, also becomes more urgent.
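One way a pivot language can be exploited is to combine two translation tables through the bridge language, an approach often called triangulation. The sketch below is only an illustration under stated assumptions: the function name `triangulate`, the nested-dictionary table layout, and the toy probabilities are all invented for this example and do not come from any particular toolkit. It assumes the source-to-target probability can be approximated by summing, over pivot entries, the product of the source-to-pivot and pivot-to-target probabilities.

```python
from collections import defaultdict


def triangulate(src_to_pivot, pivot_to_tgt):
    """Approximate p(tgt | src) as the sum over pivots of
    p(pivot | src) * p(tgt | pivot).

    Both inputs are hypothetical nested dicts of the form
    {left_phrase: {right_phrase: probability}}.
    """
    src_to_tgt = defaultdict(lambda: defaultdict(float))
    for src, pivots in src_to_pivot.items():
        for pivot, p_pivot in pivots.items():
            # Pivot phrases missing from the second table contribute nothing.
            for tgt, p_tgt in pivot_to_tgt.get(pivot, {}).items():
                src_to_tgt[src][tgt] += p_pivot * p_tgt
    # Convert to plain dicts so the result is easy to inspect.
    return {src: dict(tgts) for src, tgts in src_to_tgt.items()}


# Toy tables with invented probabilities (French -> English -> German).
src_to_pivot = {"maison": {"house": 0.8, "home": 0.2}}
pivot_to_tgt = {
    "house": {"Haus": 0.9, "Bau": 0.1},
    "home": {"Haus": 0.5, "Heim": 0.5},
}

table = triangulate(src_to_pivot, pivot_to_tgt)
for tgt, prob in sorted(table["maison"].items(), key=lambda kv: -kv[1]):
    print(f"maison -> {tgt}: {prob:.2f}")
```

In this toy case the source word has no direct target-language entry, yet it receives target candidates through the pivot. Real systems would also have to handle noise introduced by ambiguous pivot phrases, which this sketch ignores.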
In general, since many methods in statistical machine translation are geared towards making effective use of the training data, they are more likely to make a difference in a sparse data scenario.