Neural Machine Translation
Statistical Machine Translation
The key data resource for statistical machine translation is the parallel corpus: a collection of texts and their translations, aligned at the sentence level. Lower-level data preparation steps include splitting sentences into words (tokenization or segmentation), spelling correction, and truecasing (restoring the appropriate lowercase or uppercase form of each word).
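Two of these preparation steps, tokenization and truecasing, can be illustrated with a minimal Python sketch. It is not the method of any particular toolkit: the regular expression, the function names (`tokenize`, `train_truecaser`, `truecase`), and the heuristic of skipping sentence-initial words when collecting casing statistics are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

# Assumed tokenization rule: a token is either a run of word characters
# or a single non-space, non-word character (punctuation).
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(sentence):
    """Split a sentence into word and punctuation tokens."""
    return TOKEN_RE.findall(sentence)


def train_truecaser(sentences):
    """Learn the most frequent surface form of each lowercased word.

    Sentence-initial tokens are skipped because their capitalization
    is determined by position, not by the word itself.
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        for token in tokenize(sentence)[1:]:
            counts[token.lower()][token] += 1
    return {lower: forms.most_common(1)[0][0] for lower, forms in counts.items()}


def truecase(sentence, model):
    """Replace each token by its most frequent casing; keep unknown tokens as-is."""
    return [model.get(token.lower(), token) for token in tokenize(sentence)]


# Toy training data (hypothetical, for illustration only).
corpus = [
    "We saw the house in Paris.",
    "Then the train left Paris.",
    "He likes the city.",
]
model = train_truecaser(corpus)
result = truecase("The train to paris left.", model)
```

With this toy corpus, `result` is `['the', 'train', 'to', 'Paris', 'left', '.']`: the sentence-initial "The" is lowercased, "paris" is restored to "Paris", and the unseen word "to" is left unchanged. A realistic truecaser would be trained on far more text and would need a policy for words never observed outside sentence-initial position.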
Data and its 11 sub-topics are the main subject of 423 publications.
Topics in Data
Parallel Corpora | Comparable Corpora | Dictionaries | Corpus Cleaning | Sentence Alignment | Truecasing | Word Segmentation | Spelling Correction | Sparse Data | Pivot Languages | Domain Adaptation