Shared Task: Exploiting Parallel Texts for Statistical Machine Translation

June 8 and 9, 2006, in conjunction with NAACL 2006 in New York City


The shared task of the workshop is to build a probabilistic phrase translation table for phrase-based statistical machine translation (SMT). Evaluation is translation quality on an unseen test set. We provide a parallel corpus as training data (with word alignment), a baseline statistical machine translation system, and additional resources. Participants may augment this system or use their own system.


By staging this shared task, we hope that both beginners and established research groups will participate.

Task Description

We provide training data for three European language pairs, and a common framework (including a language model and a baseline system). The task is to improve methods for building a phrase translation table (e.g. by better word alignment, phrase extraction, or phrase scoring), to augment the system otherwise (e.g. by preprocessing), or to build entirely new translation systems.
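To illustrate one of the steps mentioned above, here is a minimal sketch of consistent phrase-pair extraction from a word alignment, one common way to populate a phrase table. The function name, the toy sentence pair, and the alignment links are illustrative, not part of the provided baseline.

```python
def extract_phrases(src, tgt, alignment, max_len=4):
    """Extract all phrase pairs consistent with a word alignment.

    alignment: set of (src_index, tgt_index) links (0-based).
    A (source span, target span) pair is consistent if no alignment
    link connects a word inside one span to a word outside the other.
    """
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # target positions linked to the source span [i1, i2]
            tgt_points = [t for (s, t) in alignment if i1 <= s <= i2]
            if not tgt_points:
                continue
            j1, j2 = min(tgt_points), max(tgt_points)
            if j2 - j1 >= max_len:
                continue
            # consistency check: no link from outside the source span
            # may point inside the target span
            if any(j1 <= t <= j2 and not (i1 <= s <= i2)
                   for (s, t) in alignment):
                continue
            pairs.append((" ".join(src[i1:i2 + 1]),
                          " ".join(tgt[j1:j2 + 1])))
    return pairs

src = "michael geht davon aus".split()
tgt = "michael assumes".split()
links = {(0, 0), (1, 1), (2, 1), (3, 1)}
print(extract_phrases(src, tgt, links))
```

For this example the extractor yields, among others, the pair ("geht davon aus", "assumes"), since all three German words link only into that target span. Phrase scoring (e.g. by relative frequency over all extracted pairs) would then turn such counts into translation probabilities.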

The participants' systems are used to translate a test set of unseen sentences in the source language. Translation quality is measured by the BLEU score, which measures n-gram overlap with a reference translation, and by manual evaluation. Participants agree to contribute about eight hours of work to the manual evaluation.
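To give an intuition for the automatic metric, here is a toy single-sentence BLEU computation: the geometric mean of modified n-gram precisions times a brevity penalty. This is a simplified sketch (real BLEU is computed over the whole test corpus against one or more references), and the smoothing constant is an assumption to avoid log(0) on short sentences.

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Toy sentence-level BLEU against a single reference."""
    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i + n])
                              for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n])
                             for i in range(len(ref) - n + 1))
        # clipped counts: each candidate n-gram is credited at most
        # as often as it occurs in the reference
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        log_prec += math.log(max(overlap, 1e-9) / total)  # smoothed
    # brevity penalty: punish candidates shorter than the reference
    bp = min(1.0, math.exp(1 - len(ref) / len(cand)))
    return bp * math.exp(log_prec / max_n)

print(bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
```

A perfect match scores 1.0; any missing or spurious n-gram lowers the score, which is why BLEU rewards fluent, reference-like output rather than word-for-word accuracy.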

To have a common framework that allows for comparable results, and also to lower the barrier to entry, we provide

Most current methods for training phrase translation tables build on a word alignment (i.e., a mapping of each word in the source sentence to words in the target sentence). Since word alignment is itself a difficult task, we provide word alignments. These word alignments were acquired by automatic methods and hence contain errors; you may get better performance by producing your own word alignment.
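Word alignments of this kind are commonly distributed one sentence pair per line as space-separated "srcindex-tgtindex" links (0-based); whether the provided files use exactly this format is an assumption here, so check the documentation shipped with the data. A minimal parser for that representation:

```python
def parse_alignment(line):
    """Turn a line like '0-0 1-2 2-1' into a set of
    (source_index, target_index) link pairs (0-based)."""
    # format assumption: space-separated "i-j" tokens per sentence pair
    return {tuple(map(int, link.split("-"))) for link in line.split()}

links = parse_alignment("0-0 1-2 2-1")
print(sorted(links))  # [(0, 0), (1, 2), (2, 1)]
```

The resulting set of index pairs is exactly the input that phrase extraction and phrase scoring methods consume.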

We also strongly encourage your participation if you use

Your submission report should highlight the ways in which your methods and data differ from the standard task. We may break down submitted results into different tracks, based on what resources were used.

Provided Data

The provided data is taken from the Europarl corpus, which is freely available. Please click on the links below to download the data. If you prepare training data from the Europarl corpus directly, please do not take data from Q4/2000 (October-December), since it is reserved for development and test data. Note that the training data is not lowercased, which may be useful for tagging and parsing tools. However, the phrase translation tables and language model use lowercased text. Since the provided development test set and final test set are mixed-case, they have to be lowercased before translating.
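The required preprocessing step is a one-liner; a minimal sketch (the sample sentence is illustrative, not taken from the test set):

```python
def lowercase_lines(lines):
    """Lowercase each input sentence so it matches the casing of the
    phrase translation tables and the language model."""
    return [line.lower() for line in lines]

print(lowercase_lines(["Resumption of the Session"]))
# ['resumption of the session']
```

Systems that exploit case information (e.g. for named entities) should do so before this step, since the baseline pipeline only ever sees lowercased input.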

Development Data

To tune your system during development, we provide a development set of 2000 sentences. This data is identical with the 2005 development test data.

Development Test Data

To test your system during development, we provide a development test set of 2000 sentences. This data is identical with the 2005 test data.

Test Data

To test your system, translate the following 3064 sentences and send the output by email to


Evaluation will be done both automatically and by human judgement.


March 20: Test data released (available on this web site)
March 31: Results submissions (by email to
April 7: Short paper submissions (4 pages)


Philipp Koehn (University of Edinburgh)
Christof Monz (University of London)