Moses statistical machine translation system

Parallel Corpora Available On-Line

This page is your 'shopping list' for parallel texts. Let us know if we're missing something.

  • We make no claims about copyright. Make sure you don't violate any usage restrictions.
  • We make no claims about the alignment of the collections. Some sources might need more work from you, others less.

And remember, we're interested in any tools you create to extract clean data from not-so-clean collections.
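
As a rough illustration of what such a cleaning tool might do, here is a minimal Python sketch that filters sentence pairs from an already sentence-aligned corpus. The function name `clean_parallel` and the thresholds (`max_len=80`, `max_ratio=9.0`) are assumptions chosen for illustration, not values prescribed by this page or by any particular collection.

```python
def clean_parallel(pairs, max_len=80, max_ratio=9.0):
    """Yield (source, target) pairs that pass simple sanity checks.

    Hypothetical helper: the thresholds are illustrative assumptions.
    """
    for src, tgt in pairs:
        # Tokenize on whitespace; this also collapses repeated spaces.
        src_tokens = src.split()
        tgt_tokens = tgt.split()
        n_src, n_tgt = len(src_tokens), len(tgt_tokens)

        # Drop pairs where either side is empty.
        if n_src == 0 or n_tgt == 0:
            continue
        # Drop pairs where either side is too long.
        if n_src > max_len or n_tgt > max_len:
            continue
        # Drop pairs whose lengths differ too much to be plausible translations.
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue

        yield " ".join(src_tokens), " ".join(tgt_tokens)


if __name__ == "__main__":
    sample = [
        ("  hello   world ", "hallo welt"),
        ("", "leer"),
        ("a", "b c d e f g h i j k l"),
    ]
    for src, tgt in clean_parallel(sample):
        print(src, "|||", tgt)
```

A filter like this only removes obviously bad pairs. It does not check whether the remaining pairs are actually translations of each other, so collections with poor alignment still need a proper sentence aligner first.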

Multi-language

Bi-language

Other

Besides the collections mentioned above, the LDC (Linguistic Data Consortium) has heaps of data available.

Page last modified on January 31, 2014, at 02:26 PM