Shared Task: Exploiting Parallel Texts for Statistical Machine Translation

June 30, 2005, in conjunction with ACL 2005 in Ann Arbor, Michigan

Test, reference data and results are now available!

The second shared task of the workshop is to build a probabilistic phrase translation table for phrase-based statistical machine translation (SMT). Submissions are evaluated by translation quality on an unseen test set. We provide a parallel corpus as training data (with word alignment), a statistical machine translation decoder, and additional resources. Participants who use their own system are also very welcome.


Phrase-based SMT is currently the best-performing method in statistical machine translation. In short, the input is segmented into arbitrary multi-word units ("phrases", "segments", "blocks", "clumps"). Each unit is translated into a target-language unit, and the units may be reordered.

The core of a phrase-based statistical machine translation system is the phrase translation table: a lexicon of phrases that translate into each other, with a probability distribution, or any other arbitrary scoring method. The phrase translation table is trained from a parallel corpus.
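As a rough illustration of what such a table holds, the sketch below estimates p(target | source) by relative frequency from a list of already extracted phrase pairs. This is one common scoring choice, not a requirement of the task; the function name and the toy German-English pairs are made up for the example.

```python
from collections import Counter, defaultdict

def estimate_phrase_table(phrase_pairs):
    """Relative-frequency estimate of p(target | source) from phrase pairs."""
    pair_counts = Counter(phrase_pairs)
    source_counts = Counter(src for src, _ in phrase_pairs)
    table = defaultdict(dict)
    for (src, tgt), count in pair_counts.items():
        table[src][tgt] = count / source_counts[src]
    return dict(table)

# Toy, made-up phrase pairs, purely for illustration.
pairs = [
    ("das haus", "the house"),
    ("das haus", "the house"),
    ("das haus", "the building"),
    ("ist klein", "is small"),
]
table = estimate_phrase_table(pairs)
```

Here `table["das haus"]` maps "the house" to 2/3 and "the building" to 1/3. Any other scoring method could be stored in the same structure.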

You can find some more information on phrase-based SMT in the paper Statistical Phrase-Based Translation or the manual for the Pharaoh decoder.


The goals of staging this shared task are:

We hope that both beginners and established research groups will participate in this task.

Task Description

We provide training data for four European language pairs, and a common framework (including a language model and a decoder). The task is to learn a phrase translation table. Given the provided framework, this table is used to translate a test set of unseen sentences in the source language. The translation quality is measured by the BLEU score, which measures overlap with a reference translation.
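To make the metric concrete, here is a minimal sketch of corpus-level BLEU with a single reference per sentence: clipped n-gram precisions up to 4-grams, combined by a geometric mean and multiplied by a brevity penalty. It assumes whitespace-tokenized input and is not the official scoring script, so its numbers may differ slightly from the official ones.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with one reference per sentence, in the range [0, 1]."""
    matched = [0] * max_n
    total = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matched[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            total[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matched) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Penalize output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)
```

A hypothesis identical to its reference scores 1.0; a hypothesis with no matching 4-gram scores 0.0 in this unsmoothed version.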

To have a common framework that allows for comparable results, and also to lower the barrier to entry, we provide

Optionally, you may use

Most current methods to train phrase translation tables build on a word alignment (i.e., the mapping of each word in the source sentence to words in the target sentence). Since word alignment is by itself a difficult task, we provide word alignments. These word alignments are acquired by automatic methods, hence they contain errors. You may get better performance by coming up with your own word alignment.
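One widely used way to build on a word alignment is to collect every phrase pair that is consistent with it, meaning that no word inside the pair is aligned to a word outside it. The sketch below shows that idea under simplifying assumptions: alignments are given as a set of (source index, target index) links, the length limit `max_len=7` is an arbitrary choice, and spans are not extended over unaligned target words. It is not necessarily how the provided baseline table was built.

```python
def extract_phrase_pairs(source, target, alignment, max_len=7):
    """Collect phrase pairs consistent with a word alignment.

    source, target: lists of tokens.
    alignment: set of (source_index, target_index) links.
    """
    pairs = set()
    for s_start in range(len(source)):
        for s_end in range(s_start, min(s_start + max_len, len(source))):
            # Target positions linked to any word in the source span.
            linked = [t for s, t in alignment if s_start <= s <= s_end]
            if not linked:
                continue
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start + 1 > max_len:
                continue
            # Consistency: no target word in the span may link outside the source span.
            if any(t_start <= t <= t_end and not s_start <= s <= s_end
                   for s, t in alignment):
                continue
            pairs.add((" ".join(source[s_start:s_end + 1]),
                       " ".join(target[t_start:t_end + 1])))
    return pairs

# Toy sentence pair with a monotone one-to-one alignment (made up for illustration).
src = "das haus ist klein".split()
tgt = "the house is small".split()
links = {(0, 0), (1, 1), (2, 2), (3, 3)}
pairs = extract_phrase_pairs(src, tgt, links)
```

With this toy alignment every contiguous source span yields one pair, for example ("das haus", "the house"). Errors in the alignment change which spans pass the consistency check, which is why a better alignment may give a better table.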

We also encourage your participation, if you use

Your submission report should highlight in which ways your own methods and data differ from the standard task. We may break down submitted results in different tracks, based on what resources were used.

Provided Data

The provided data is taken from the Europarl corpus, which is freely available. Please click on the links below to download the data. Note that the training data is not lowercased. This may be useful for tagging and parsing tools. However, the phrase translation tables and language model use lowercased text. Since the provided development test set and final test set are mixed-cased, they have to be lowercased before translating.
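Lowercasing the test input can be done with any tool. A minimal Python sketch follows; the file names and the UTF-8 encoding in the usage comment are assumptions, so adjust them to the files you actually downloaded.

```python
def lowercase_lines(lines):
    """Yield each line lowercased, leaving tokenization and line breaks untouched."""
    for line in lines:
        yield line.lower()

# Usage with hypothetical file names and an assumed encoding:
# with open("test.fr", encoding="utf-8") as src, \
#         open("test.lowercased.fr", "w", encoding="utf-8") as out:
#     out.writelines(lowercase_lines(src))
```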

Available Software (Linux)

Development Test Data

To test your phrase table during development, we provide a development test set of 2000 sentences.

Test Data

This is the official test data. The official competition is over, but you may use the data for testing your own system.

Top Performances on Test Data

The test and training data will be kept available. You may want to compare your system with the results in the official competition. If you want to have your system score reported here, you must have it published in a workshop, conference, or journal paper, so we can link to it. Please send an email to


System     BLEU    1/2/3/4-gram precision
uw         30.27   64.8/36.8/23.8/16.0 (BP=0.981)
upc-r      30.20   63.9/36.2/23.3/15.6 (BP=0.998)
nrc        29.53   63.7/35.8/22.7/14.9 (BP=0.997)
rali       28.89   62.6/34.7/22.0/14.6 (BP=1.000)
cmu-bing   27.65   63.1/34.0/20.9/13.3 (BP=0.995)
cmu-joy    26.71   61.9/33.0/20.3/13.1 (BP=0.984)
saar       26.29   60.8/32.5/20.1/12.9 (BP=0.982)
glasgow    23.01   57.3/28.0/16.7/10.5 (BP=1.000)
uji        21.25   59.8/27.7/14.8/8.3 (BP=1.000)
cots1      20.29   55.5/26.4/14.2/8.1 (BP=1.000)
cots2      17.82   53.0/23.6/12.1/6.6 (BP=1.000)


System     BLEU    1/2/3/4-gram precision
uw         22.01   59.0/28.6/16.1/9.4 (BP=0.979)
nrc        20.95   57.8/27.2/14.8/8.4 (BP=0.996)
upc-r      20.31   56.6/26.0/14.3/8.3 (BP=0.993)
rali       18.87   55.2/24.7/13.1/7.1 (BP=0.998)
saar       16.76   58.4/26.3/14.2/8.0 (BP=0.819)
uji        13.79   60.0/23.2/10.8/5.3 (BP=0.821)
cmu-joy    12.66   53.9/21.7/10.7/5.7 (BP=0.775)


System     BLEU    1/2/3/4-gram precision
uw         30.95   64.1/36.6/24.0/16.3 (BP=1.000)
upc-r      30.07   63.1/35.8/23.2/15.6 (BP=1.000)
upc-m      29.84   63.9/35.5/23.0/15.5 (BP=0.995)
nrc        29.08   62.7/34.9/22.2/14.7 (BP=1.000)
rali       28.49   62.4/34.5/21.9/14.4 (BP=0.992)
upc-j      28.13   61.5/33.8/21.4/14.1 (BP=1.000)
saar       26.69   61.0/33.1/20.7/13.5 (BP=0.973)
cmu-joy    26.14   61.2/32.4/19.8/12.6 (BP=0.986)
uji        21.65   59.7/27.8/15.2/8.7 (BP=1.000)
cots1      17.38   52.7/23.1/11.7/6.4 (BP=1.000)
cots2      17.28   52.2/23.0/11.7/6.4 (BP=1.000)


System     BLEU    1/2/3/4-gram precision
uw         24.77   62.2/31.8/18.8/11.7 (BP=0.965)
upc-r      24.26   59.7/30.1/17.6/11.0 (BP=1.000)
nrc        23.21   60.3/29.8/17.1/10.3 (BP=0.979)
rali       22.91   58.9/29.0/16.8/10.3 (BP=0.982)
saar       20.48   58.0/27.5/15.5/9.2 (BP=0.938)
cmu-joy    18.93   59.2/26.8/14.3/8.1 (BP=0.914)
uji        18.89   59.3/25.5/13.0/7.2 (BP=0.976)
cots1      14.92   51.6/20.7/9.7/4.8 (BP=1.000)
cots2      13.97   49.9/19.5/8.9/4.4 (BP=1.000)
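The columns in the result tables fit together: the BLEU score is the brevity penalty (BP) times the geometric mean of the four n-gram precisions. The short sketch below recomputes BLEU from the listed columns; the function name is made up for the example, and because the published precisions are rounded to one decimal, the recomputed value can differ from the reported one in the last digit.

```python
import math

def bleu_from_columns(precisions_percent, brevity_penalty):
    """Combine n-gram precisions (in percent) and BP into a BLEU score in percent."""
    mean_log = sum(math.log(p / 100.0) for p in precisions_percent) / len(precisions_percent)
    return 100.0 * brevity_penalty * math.exp(mean_log)

# First row of the first table: uw, 64.8/36.8/23.8/16.0, BP=0.981, reported BLEU 30.27.
score = bleu_from_columns([64.8, 36.8, 23.8, 16.0], 0.981)
```

For this row the recomputed score lands within a few hundredths of the reported 30.27.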

Participating Teams

How to Get Started

Here are some quick steps to get started. We walk you through downloading the tools and data for French-English and running the decoder with them. Click on the links to get the necessary software and data.

The phrase translation table is very big (1.3 GB), which may pose problems for running the decoder on a machine with little RAM. However, for a given test corpus, only a fraction of that table is needed. The script run-filtered-pharaoh.perl filters the phrase table for needed entries and runs the decoder on a filtered translation table (104 MB).

Its syntax is:

For instance:
run-filtered-pharaoh.perl pharaoh "-monotone" >
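The filtering idea itself is simple and can be sketched in a few lines. The code below is not the provided Perl script; it is an illustration that assumes one table entry per line in the form `source ||| target ||| score`, whitespace-tokenized test sentences, and a maximum phrase length of 7 words. It keeps only entries whose source phrase occurs somewhere in the test set.

```python
def source_ngrams(sentences, max_len=7):
    """Collect every word sequence of up to max_len words found in the sentences."""
    grams = set()
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(len(tokens)):
            for j in range(i + 1, min(i + max_len, len(tokens)) + 1):
                grams.add(" ".join(tokens[i:j]))
    return grams

def filter_phrase_table(table_lines, test_sentences, max_len=7, sep=" ||| "):
    """Yield only table lines whose source phrase can match the test set."""
    needed = source_ngrams(test_sentences, max_len)
    for line in table_lines:
        # Assumed layout: the source phrase is the first field of each line.
        source_phrase = line.split(sep, 1)[0]
        if source_phrase in needed:
            yield line

# Toy table with made-up entries and scores.
toy_table = [
    "das haus ||| the house ||| 0.8",
    "der hund ||| the dog ||| 0.9",
    "haus ||| house ||| 0.7",
]
kept = list(filter_phrase_table(toy_table, ["das haus ist klein"]))
```

Because the table lines are streamed one at a time, only the set of test-set n-grams has to fit in memory, not the full table.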

You are now set to download the parallel corpora and build your own phrase translation table.


April 3: Test data released (available on this web site)
April 10: Results submissions (by email to
April 17: Short paper submissions (4 pages)


Philipp Koehn (University of Edinburgh)
Christof Monz (University of Maryland)