General Introduction

The field of statistical machine translation is concerned with methods that automatically learn how to translate from previously translated texts (so-called parallel corpora).

Introduction and its 6 sub-topics are the main subject of 800 publications.


Hutchins (2007) gives a concise overview of the history of machine translation. Jelinek (2009) recalls the birth of statistical machine translation at IBM and, before it, of statistical speech recognition. See also the famous ALPAC report (Pierce and Carroll, 1966). Gaspari and Hutchins (2007) report on the recent rise of online machine translation services and their usage patterns.
Recently, a textbook about the field was published (Koehn, 2010). A survey of work in statistical machine translation is presented by Lopez (2008). For non-statistical approaches to machine translation, refer to the books by Arnold et al. (1994) and by Hutchins and Somers (1992).
A good introduction to probability theory and information theory is given by Cover and Thomas (1991). For an application of probabilistic methods to the related field of speech recognition, see the book by Jelinek (1998).
There are several textbooks on natural language processing that may serve as background to the material presented here. Good general introductions are given by Manning and Schütze (1999) as well as Jurafsky and Martin (2008).


Each year, a few evaluation campaigns are staged whose aim is to assess the validity of novel methods in competitive systems.


New Publications

  • Morrissey and Way (2013)
  • Lopez et al. (2013)
  • Lopez et al. (2013)
  • Cancedda (2012)
  • Somers (1992)
  • Church and Hovy (1993)