Statistical and Neural Machine Translation
This website contains resources for research in statistical and neural machine
translation, i.e. the translation of text from one human language to
another by a computer that learned how to translate from vast amounts
of translated text.
Events
Conference on machine translation:
2022,
2021,
2020,
2019,
2018,
2017,
2016.
Workshop on machine translation:
2015,
2014,
2013,
2012,
2011,
2010,
2009,
2008,
2007,
2006.
Workshop on building and using parallel text 2015
Machine Translation Marathon:
2022,
2019,
2018,
2017,
2016,
2015,
2014,
2013,
2012,
2011b,
2011a,
2010,
2009,
2008,
2007.
Machine Translation Marathon of the Americas:
2022,
2019,
2018,
2017,
2016,
2015.
Resources
Textbook: Neural Machine Translation (2020)
Textbook: Statistical Machine Translation (2010)
Moses statistical machine translation toolkit
Machine Translation Research Survey Wiki
Proceedings of the European Parliament (Europarl)
1 Billion Word Language Model Benchmark
News Commentary
N-gram counts and language models from the CommonCrawl (2014)
SIGIR 2020 Tutorial: Searching the Web for Cross-lingual Web Data
Data for "On the Impact of Various Types of Noise on Neural Machine Translation" (2018)
Early Release of Parallel Data from ParaCrawl (2016)
Benchmark data for "ParaCrawl: Web-Scale Acquisition of Parallel Corpora" (2020)
Code and data for "Simulated Multiple Reference Training (SMRT) Improves Low-Resource Machine Translation" (2020)
Parallel Named Entity Corpus for "XLEnt: Mining a Large Cross-lingual Entity Dataset with Lexical-Semantic-Phonetic Word Alignment" (2021)
Data for "Improving Word Sense Disambiguation in Neural Machine Translation with Sense Embeddings" (2017)
Data for experiments on context-aware neural machine translation (2018)
CC-100: Monolingual data used to train XLM-R extracted from CommonCrawl (2020)
CC-Matrix
Translation Service Containers for the European Language Grid
Monolingual News Crawl used for WMT
Monolingual News Discussions used for WMT 2020
Data for "PMIndia - A Collection of Parallel Corpora of Languages of India" (2020)
PRISM: Data for "Automatic Machine Translation Evaluation in Many Languages via Zero-Shot Paraphrasing" (2020)
Wikititles used for WMT
University of Edinburgh's models from WMT 2020,
2019,
2017,
2016.
Data resources for WMT 2022,
2021,
2020,
2019,
2018,
2017,
2016,
2015,
2013.
CC-Aligned: A Massive Collection of Cross-lingual Web-Document Pairs (2020)
Resources for the paper "When Does Unsupervised Machine Translation Work?" (Marchisio et al., 2020)
Wiki of the Machine Translation Research Group at Johns Hopkins University
External Historic Links: Introduction to Statistical MT Research
The Mathematics of Statistical Machine Translation by Brown, Della Pietra, Della Pietra, and Mercer
Statistical MT Handbook by Kevin Knight
SMT Tutorial (2003) by Kevin Knight and Philipp Koehn
ESSLLI Summer Course on SMT (2005), day 1,
2,
3,
4,
5 by Chris Callison-Burch and Philipp Koehn.
MT Archive by John Hutchins, electronic repository and bibliography of articles, books and papers on topics in machine translation and computer-based translation tools
External Historic Software
External Parallel Corpora
maintained by Philipp Koehn