Data

The key data resources for statistical machine translation are parallel corpora, aligned at the sentence level. Low-level data preparation steps include splitting sentences into words (tokenization or segmentation), spelling correction, and truecasing (restoring the appropriate lowercase/uppercase form of words).
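
As a rough illustration of the tokenization and truecasing steps named above, here is a minimal Python sketch. It is not taken from any of the listed publications: the function names (tokenize, train_truecaser, truecase) are hypothetical, the regex rule is a deliberate simplification of real tokenizers, and the frequency-based truecasing of the sentence-initial word is only one common approach, assumed here for the example.

```python
import re
from collections import Counter, defaultdict


def tokenize(sentence):
    """Split a sentence into word and punctuation tokens.

    Assumption: a crude regex rule; real tokenizers also handle
    abbreviations, clitics, numbers, URLs, and language-specific cases.
    """
    return re.findall(r"\w+|[^\w\s]", sentence)


def train_truecaser(sentences):
    """Learn the most frequent casing of each word.

    Sentence-initial tokens are skipped because their capitalization
    is positional and says little about the word's usual form.
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        for token in tokenize(sentence)[1:]:
            counts[token.lower()][token] += 1
    return {word: forms.most_common(1)[0][0] for word, forms in counts.items()}


def truecase(tokens, model):
    """Replace the sentence-initial token with its most frequent casing.

    Words unseen in training are left unchanged.
    """
    if not tokens:
        return tokens
    first = model.get(tokens[0].lower(), tokens[0])
    return [first] + tokens[1:]


# Toy training data (invented for this example).
corpus = [
    "We saw the house .",
    "They like the house in Paris .",
    "Paris is large .",
]
model = train_truecaser(corpus)

lowered = truecase(tokenize("The house is large."), model)
kept = truecase(tokenize("Paris is large."), model)
```

In this toy setup, "The" is lowercased at the start of a sentence because "the" is its usual form in the training data, while "Paris" keeps its capital. Skipping sentence-initial tokens during training is the design choice that makes this distinction possible.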

Data and its 11 sub-topics are the main subject of 354 publications.

New Publications

  • Pavlick et al. (2014)
  • Aydın (2014)
  • Aydın (2014)
  • Lewis and Eetemadi (2013)
  • Kurokawa et al. (2009)
  • Zhu et al. (2007)
