NAACL 2012

June 7-8, 2012
Montreal, Quebec, Canada


This workshop builds on six previous workshops on statistical machine translation:

The workshop is sponsored by the ACL's special interest group in machine translation (SIGMT).


Release of training data: December 9, 2011
Test set distributed for translation task: February 27, 2012
Submission deadline for translation task: March 2, 2012
System outputs distributed for metrics task: March 9, 2012
Submission deadline for metrics task: March 30, 2012
Paper submission deadline: April 6, 2012
Start of manual evaluation period: April 6, 2012
Notification of acceptance: April 24, 2012
End of manual evaluation: May 1, 2012
Camera-ready deadline: May 7, 2012
Papers available online: June 1, 2012
Workshop in Montreal following NAACL: June 7-8, 2012


This year's workshop will feature three shared tasks:

In addition to the shared tasks, the workshop will also feature scientific papers on topics related to MT. Topics of interest include, but are not limited to:

We encourage authors to evaluate their approaches to the above topics using the common data sets created for the shared tasks.


The first shared task will examine translation between the following language pairs:

Participants may submit translations for any or all of the language directions. In addition to the common test sets, the workshop organizers will provide optional training resources, including a newly expanded release of the Europarl corpora and out-of-domain corpora.

All participants who submit entries will have their translations evaluated. We will evaluate translation performance by human judgment. To facilitate the human evaluation, we will require participants in the shared tasks to manually judge some of the submitted translations.

We also provide baseline machine translation systems, with performance comparable to the best systems from last year's shared task.


A topic of increasing interest in MT is that of estimating the quality of translated texts. Unlike MT evaluation, quality estimation (QE) systems do not rely on reference translations; instead, they predict the quality of an unseen translated text (document, sentence, phrase) at system run-time. This topic is particularly relevant from a user perspective: among other applications, it can (i) help decide whether a given translation is good enough for publishing as is (Soricut and Echihabi, 2010); (ii) filter out sentences that are not good enough for post-editing (Specia, 2011); (iii) select the best translation among options from multiple MT and/or translation memory systems (He et al., 2010); and (iv) tell readers of the target language whether or not they can rely on a translation (Specia et al., 2011).

Although still very recent, research on this topic has shown promising results over the last couple of years. However, efforts are scattered across several groups and, as a consequence, comparing different systems is difficult, as there are neither well-established baselines nor standard evaluation metrics. In the Quality Estimation track of the WMT workshop and shared task, we will provide training and test sets, along with evaluation metrics and a baseline system. By providing a common ground for development and comparison, we expect to foster research on the topic, as well as to attract new people interested in the subject, who can build and evaluate new solutions using the provided resources.


The evaluation task will assess automatic evaluation metrics' ability to:

Participants in the shared evaluation task will use their automatic evaluation metrics to score the output from the translation task and the system combination task. They will be provided with the output from the other two shared tasks along with reference translations. We will measure the correlation of automatic evaluation metrics with the human judgments.
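As an illustration of what measuring this correlation can look like, the sketch below computes Spearman's rank correlation between a metric's system-level scores and human scores. This is a minimal sketch under stated assumptions: the text above does not say which correlation coefficient will be used, and the system names and all scores here are made up for the example.

```python
from typing import Dict, List


def average_ranks(values: List[float]) -> List[float]:
    """Rank values from highest (rank 1) to lowest; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; undefined (division by zero) if either list is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(metric_scores: List[float], human_scores: List[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))


# Hypothetical data: four systems with invented human and metric scores.
human: Dict[str, float] = {"sysA": 0.62, "sysB": 0.55, "sysC": 0.48, "sysD": 0.30}
metric: Dict[str, float] = {"sysA": 27.1, "sysB": 28.4, "sysC": 24.0, "sysD": 19.5}

systems = sorted(human)
rho = spearman([metric[s] for s in systems], [human[s] for s in systems])
print(round(rho, 3))  # 0.8
```

In this toy case the metric swaps the top two systems relative to the human ranking and agrees on the rest, so the correlation is high but below 1. A rank-based coefficient is used here because it compares orderings only and ignores the differing scales of metric and human scores.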


Submissions will consist of regular full papers of 6-10 pages, plus additional pages for references, formatted following the NAACL 2012 guidelines. In addition, shared task participants will be invited to submit short papers (4-6 pages) describing their systems or their evaluation metrics. Both submission and review processes will be handled electronically.

We encourage individuals who are submitting research papers to evaluate their approaches using the training resources provided by this workshop and past workshops, so that their experiments can be repeated by others using these publicly available corpora.


Subscribe to the announcement list for WMT12 by entering your e-mail address below. This list will be used to announce when the test sets are released, to indicate any corrections to the training sets, and to amend the deadlines as needed.

You can read past announcements on the Google Groups page for WMT12. These also include an archive of announcements from WMT10 and WMT11.


Chris Callison-Burch (Johns Hopkins University)
Philipp Koehn (University of Edinburgh)
Christof Monz (University of Amsterdam)
Matt Post (Johns Hopkins University)
Radu Soricut (SDL Language Weaver)
Lucia Specia (University of Sheffield)



For questions, comments, etc. please send email to

Supported by the EuroMatrixPlus project, funded by the European Commission under Framework Programme 7.