Statistical and Neural Machine Translation
This website contains resources for research in statistical and neural machine
translation, i.e. the translation of text from one human language to
another by a computer that learned how to translate from vast amounts
of translated text.
Events
- Conference on machine translation:
2022,
2021,
2020,
2019,
2018,
2017,
2016.
- Workshop on machine translation:
2015,
2014,
2013,
2012,
2011,
2010,
2009,
2008,
2007,
2006.
- Workshop on building and using parallel text 2015
- Machine Translation Marathon:
2022,
2019,
2018,
2017,
2016,
2015,
2014,
2013,
2012,
2011b,
2011a,
2010,
2009,
2008,
2007.
- Machine Translation Marathon of the Americas:
2022,
2019,
2018,
2017,
2016,
2015.
Resources
- Textbook: Neural Machine Translation (2020)
- Textbook: Statistical Machine Translation (2010)
- Moses statistical machine translation toolkit
- Machine Translation Research Survey Wiki
- European Parliament Proceedings (Europarl)
- 1 Billion Word Language Model Benchmark
- News Commentary
- N-gram counts and language models from the CommonCrawl (2014)
- SIGIR 2020 Tutorial: Searching the Web for Cross-lingual Web Data
- Data for "On the Impact of Various Types of Noise on Neural Machine Translation" (2018)
- Early Release of Parallel Data of Paracrawl (2016)
- Benchmark data for "Paracrawl: Web-Scale Acquisition of Parallel Corpora" (2020)
- Code and data for "Simulated Multiple Reference Training (SMRT) Improves Low-Resource Machine Translation" (2020)
- Parallel Named Entity Corpus for "XLEnt: Mining a Large Cross-lingual Entity Dataset with Lexical-Semantic-Phonetic Word Alignment" (2021)
- Data for "Improving Word Sense Disambiguation in Neural Machine Translation with Sense Embeddings" (2017)
- Data for experiments on context-aware neural machine translation (2018)
- CC-100: Monolingual data used to train XLM-R extracted from CommonCrawl (2020)
- CC-Matrix
- Translation Service Containers for the European Language Grid
- Monolingual News Crawl used for WMT
- Monolingual News Discussions used for WMT 2020
- Data for "PMIndia - A Collection of Parallel Corpora of Languages of India" (2020)
- PRISM: Data for "Automatic Machine Translation Evaluation in Many Languages via Zero-Shot Paraphrasing" (2020)
- Wikititles used for WMT
- University of Edinburgh's models from WMT 2020,
2019,
2017,
2016.
- Data resources for WMT 2022,
2021,
2020,
2019,
2018,
2017,
2016,
2015,
2013.
- CC-Aligned: A Massive Collection of Cross-lingual Web-Document Pairs (2020)
- Resources for the paper "When Does Unsupervised Machine Translation Work?" (Marchisio et al., 2020)
- Wiki of the Machine Translation Research Group at Johns Hopkins University
External Historic Links: Introduction to Statistical MT Research
- The Mathematics of Statistical Machine Translation by Brown, Della Pietra, Della Pietra, and Mercer
- Statistical MT Handbook by Kevin Knight
- SMT Tutorial (2003) by Kevin Knight and Philipp Koehn
- ESSLLI Summer Course on SMT (2005), day 1,
2,
3,
4,
5 by Chris Callison-Burch and Philipp Koehn.
- MT Archive by John Hutchins, electronic repository and bibliography of articles, books and papers on topics in machine translation and computer-based translation tools
External Historic Software
External Parallel Corpora
maintained by Philipp Koehn