ACL 2014 Ninth Workshop on Statistical Machine Translation

26-27 June 2014
Baltimore, USA

This workshop builds on eight previous workshops on statistical machine translation, a series that has become one of the most prestigious venues for research in computational linguistics.

IMPORTANT DATES

Release of training data for translation task	Early December 2013
Release of training data for quality estimation task	January 15, 2014
Test set distributed for translation task	February 24, 2014
Submission deadline for translation task	February 28, 2014
System outputs distributed for metrics task	March 7, 2014
Test sets distributed for quality estimation task	March 7, 2014
Start of manual evaluation period	March 11, 2014
Submission deadline for metrics task	March 28, 2014
Submission deadline for quality estimation task	April 1, 2014
End of manual evaluation	April 1, 2014
Paper submission deadline	April 1, 2014
Notification of acceptance	April 21, 2014
Camera-ready deadline	April 28, 2014

OVERVIEW

This year's workshop will feature four shared tasks:

a translation task,
a quality estimation task,
a task to test automatic evaluation metrics, and
a medical text translation task.

In addition to the shared tasks, the workshop will also feature scientific papers on topics related to MT. Topics of interest include, but are not limited to:

word-based, phrase-based, syntax-based, semantics-based SMT
using comparable corpora for SMT
incorporating linguistic information into SMT
decoding
system combination
error analysis
manual and automatic methods for evaluating MT
scaling MT to very large data sets

We encourage authors to evaluate their approaches to the above topics using the common data sets created for the shared tasks.

TRANSLATION TASK

The first shared task will examine translation between the following language pairs:

English-German and German-English
English-French and French-English
English-Hindi and Hindi-English (new this year)
English-Czech and Czech-English
English-Russian and Russian-English

Participants may submit translations for any or all of the language directions. In addition to the common test sets, the workshop organizers will provide optional training resources, including a newly expanded release of the Europarl corpora and out-of-domain corpora.

All participants who submit entries will have their translations evaluated. We will evaluate translation performance by human judgment. To facilitate the human evaluation, we will require participants in the shared tasks to manually judge some of the submitted translations. For each team, this will amount to ranking 300 sets of 5 translations per language pair submitted.

We also provide baseline machine translation systems, with performance comparable to the best systems from last year's shared task.

QUALITY ESTIMATION TASK

A topic of increasing interest in MT is that of estimating the quality of translated texts. Unlike MT evaluation, quality estimation (QE) systems do not rely on reference translations; instead, they predict the quality of an unseen translated text (document, sentence, phrase) at system run-time. This topic is particularly relevant from a user perspective: among other applications, it can (i) help decide whether a given translation is good enough for publishing as is (Soricut and Echihabi, 2010); (ii) filter out sentences that are not good enough for post-editing (Specia, 2011); (iii) select the best translation among options from multiple MT and/or translation memory systems (He et al., 2010); and (iv) inform readers of the target language whether or not they can rely on a translation (Specia et al., 2011).

Although research on this topic is still very recent, it has shown promising results over the last couple of years. However, efforts are scattered across several groups and, as a consequence, comparing different systems is difficult, as there are neither well-established baselines nor standard evaluation metrics. In the Quality Estimation track of the WMT workshop and shared task, we will provide training and test sets, along with evaluation metrics and a baseline system. By providing a common ground for development and comparison, we expect to foster research on the topic and to attract newcomers to the subject, who can build and evaluate new solutions using the provided resources.

EVALUATION TASK

The evaluation task will assess automatic evaluation metrics' ability to:

Rank systems on their overall performance on the test set
Rank systems on a sentence-by-sentence level

Participants in the shared evaluation task will use their automatic evaluation metrics to score the output from the translation task. They will be provided with these system outputs along with the reference translations. We will measure the correlation of the automatic evaluation metrics with the human judgments.
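As a rough illustration of what "correlation with human judgments" can mean at the two levels listed above, the sketch below computes a system-level Pearson correlation between metric scores and human scores, and a segment-level Kendall's tau-like statistic over pairwise human preferences, tau = (concordant - discordant) / (concordant + discordant). The choice of these two statistics, the tie handling, and all function names and toy values are assumptions for illustration only; the official task description defines the exact scoring.

```python
from math import sqrt


def pearson(xs, ys):
    """System level: Pearson correlation between metric and human scores.

    xs, ys: equal-length lists, one score per system (hypothetical inputs).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (sx * sy)


def kendall_tau_like(pairs, metric_scores):
    """Segment level: agreement of a metric with human pairwise rankings.

    pairs: (better_id, worse_id) tuples; humans ranked better_id above
           worse_id for the same source sentence.
    metric_scores: dict mapping id -> metric score (higher = better).
    Assumption: a metric tie counts as discordant, so ties are penalized.
    """
    if not pairs:
        raise ValueError("need at least one human pairwise judgment")
    concordant = discordant = 0
    for better, worse in pairs:
        if metric_scores[better] > metric_scores[worse]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


if __name__ == "__main__":
    # Toy system-level scores for three hypothetical systems.
    print(pearson([0.1, 0.3, 0.5], [20.0, 22.0, 25.0]))
    # Toy human preferences among three outputs of one source sentence.
    human_pairs = [("sysA", "sysB"), ("sysA", "sysC"), ("sysB", "sysC")]
    scores = {"sysA": 0.9, "sysB": 0.5, "sysC": 0.7}
    print(kendall_tau_like(human_pairs, scores))
```

In the toy segment example, the metric agrees with the humans on two of the three pairs and disagrees on one, giving tau = (2 - 1) / 3. Penalizing metric ties, as assumed here, discourages metrics that cannot distinguish between outputs.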

MEDICAL TEXT TRANSLATION TASK

Details are available on the separate page for this task.

PAPER SUBMISSION INFORMATION

Submissions will consist of regular full papers of 6-10 pages, plus additional pages for references, formatted following the ACL 2014 guidelines. In addition, shared task participants will be invited to submit short papers (4-6 pages) describing their systems or their evaluation metrics. Both submission and review processes will be handled electronically. Note that regular papers must be anonymized, while system descriptions do not need to be.

We encourage individuals who are submitting research papers to evaluate their approaches using the training resources provided by this workshop and past workshops, so that their experiments can be repeated by others using these publicly available corpora.

POSTER FORMAT

The posters will be attached to self-standing poster boards measuring 3 ft high and 4 ft wide, placed on top of tables so that there will also be space for laptops and handouts. We will provide pushpins, double-sided tape, adhesive putty, and clips to affix the posters to the poster boards.

ANNOUNCEMENTS

Subscribe to the announcement list for WMT14 to receive updates. This list will be used to announce when the test sets are released, to indicate any corrections to the training sets, and to amend the deadlines as needed.

You can read past announcements on the Google Groups page for WMT. These also include an archive of announcements from earlier workshops.

INVITED TALK

TBC

ORGANIZERS

Ondřej Bojar (Charles University in Prague)
Christian Buck (University of Edinburgh)
Christian Federmann (MSR)
Barry Haddow (University of Edinburgh)
Philipp Koehn (University of Edinburgh / Johns Hopkins University)
Matouš Macháček (Charles University in Prague)
Christof Monz (University of Amsterdam)
Pavel Pecina (Charles University in Prague)
Matt Post (Johns Hopkins University)
Herve Saint-Amand (University of Edinburgh)
Radu Soricut (Google)
Lucia Specia (University of Sheffield)

PROGRAM COMMITTEE

Lars Ahrenberg (Linköping University)
Alexandre Allauzen (Université Paris-Sud / LIMSI-CNRS)
Tim Anderson (Air Force Research Laboratory)
Eleftherios Avramidis (German Research Center for Artificial Intelligence)
Wilker Aziz (University of Sheffield)
Daniel Beck (University of Sheffield)
José Miguel Benedí (Universitat Politècnica de València)
Nicola Bertoldi (FBK)
Alexandra Birch (University of Edinburgh)
Arianna Bisazza (University of Amsterdam)
Graeme Blackwood (IBM Research)
Phil Blunsom (University of Oxford)
Fabienne Braune (University of Stuttgart)
Chris Brockett (Microsoft Research)
Hailong Cao (Harbin Institute of Technology)
Michael Carl (Copenhagen Business School)
Marine Carpuat (National Research Council)
Francisco Casacuberta (Universitat Politècnica de València)
Daniel Cer (Google)
Boxing Chen (NRC)
Colin Cherry (NRC)
David Chiang (USC/ISI)
Vishal Chowdhary (Microsoft)
Steve DeNeefe (SDL Language Weaver)
Michael Denkowski (Carnegie Mellon University)
Jacob Devlin (Raytheon BBN Technologies)
Markus Dreyer (SDL Language Weaver)
Kevin Duh (Nara Institute of Science and Technology)
Marcello Federico (FBK)
Yang Feng (USC/ISI)
Andrew Finch (NICT)
Mark Fishel (University of Zurich)
José A. R. Fonollosa (Universitat Politècnica de Catalunya)
George Foster (NRC)
Michel Galley (Microsoft Research)
Juri Ganitkevitch (Johns Hopkins University)
Katya Garmash (University of Amsterdam)
Josef van Genabith (Dublin City University)
Ulrich Germann (University of Edinburgh)
Daniel Gildea (University of Rochester)
Kevin Gimpel (Toyota Technological Institute at Chicago)
Jesús González-Rubio (Universitat Politècnica de València)
Yvette Graham (The University of Melbourne)
Spence Green (Stanford University)
Francisco Guzmán (Qatar Computing Research Institute)
Greg Hanneman (Carnegie Mellon University)
Christian Hardmeier (Uppsala University)
Eva Hasler (University of Edinburgh)
Yifan He (New York University)
Kenneth Heafield (Stanford)
John Henderson (MITRE)
Felix Hieber (Heidelberg University)
Hieu Hoang (University of Edinburgh)
Stéphane Huet (Université d'Avignon)
Young-Sook Hwang (SKPlanet)
Gonzalo Iglesias (University of Cambridge)
Ann Irvine (Johns Hopkins University)
Abe Ittycheriah (IBM)
Laura Jehl (Heidelberg University)
Doug Jones (MIT Lincoln Laboratory)
Maxim Khalilov (BMMT)
Alexander Koller (University of Potsdam)
Roland Kuhn (National Research Council of Canada)
Shankar Kumar (Google)
Mathias Lambert (Amazon.com)
Philippe Langlais (Université de Montréal)
Alon Lavie (Carnegie Mellon University)
Gennadi Lembersky (NICE Systems)
William Lewis (Microsoft Research)
Lemao Liu (The City University of New York)
Qun Liu (Dublin City University)
Wolfgang Macherey (Google)
Saab Mansour (RWTH Aachen University)
José B. Mariño (Universitat Politècnica de Catalunya)
Mauro Cettolo (FBK)
Arne Mauser (Google, Inc)
Jon May (SDL Language Weaver)
Wolfgang Menzel (Hamburg University)
Shachar Mirkin (Xerox Research Centre Europe)
Yusuke Miyao (National Institute of Informatics)
Dragos Munteanu (SDL Language Technologies)
Markos Mylonakis (Lexis Research)
Lluís Màrquez (Qatar Computing Research Institute)
Preslav Nakov (Qatar Computing Research Institute)
Graham Neubig (Nara Institute of Science and Technology)
Jan Niehues (Karlsruhe Institute of Technology)
Kemal Oflazer (Carnegie Mellon University - Qatar)
Daniel Ortiz-Martinez (Copenhagen Business School)
Stephan Peitz (RWTH Aachen University)
Sergio Penkale (Lingo24)
Maja Popovic (DFKI)
Stefan Riezler (Heidelberg University)
Johann Roturier (Symantec)
Raphael Rubino (Prompsit Language Engineering)
Alexander M. Rush (MIT)
Anoop Sarkar (Simon Fraser University)
Hassan Sawaf (eBay Inc.)
Lane Schwartz (Air Force Research Laboratory)
Jean Senellart (SYSTRAN)
Rico Sennrich (University of Zurich)
Kashif Shah (University of Sheffield)
Wade Shen (MIT)
Patrick Simianer (Heidelberg University)
Linfeng Song (ICT/CAS)
Sara Stymne (Uppsala University)
Katsuhito Sudoh (NTT Communication Science Laboratories / Kyoto University)
Felipe Sánchez-Martínez (Universitat d'Alacant)
Jörg Tiedemann (Uppsala University)
Christoph Tillmann (TJ Watson IBM Research)
Antonio Toral (Dublin City University)
Hajime Tsukada (NTT Communication Science Laboratories)
Yulia Tsvetkov (Carnegie Mellon University)
Dan Tufis (Research Institute for Artificial Intelligence, Romanian Academy)
Marco Turchi (Fondazione Bruno Kessler)
Ferhan Ture (University of Maryland)
Masao Utiyama (NICT)
Ashish Vaswani (University of Southern California Information Sciences Institute)
David Vilar (Pixformance GmbH)
Haifeng Wang (Baidu)
Taro Watanabe (NICT)
Marion Weller (University of Stuttgart)
Philip Williams (University of Edinburgh)
Guillaume Wisniewski (Univ. Paris Sud and LIMSI-CNRS)
Hua Wu (Baidu)
Joern Wuebker (RWTH Aachen University)
Peng Xu (Google Inc.)
Wenduan Xu (Cambridge University)
Francois Yvon (LIMSI/CNRS)
Richard Zens (Google)
Hao Zhang (Google)
Zhanyi Liu (Baidu)

CONTACT

For questions, comments, etc. please send email to pkoehn@inf.ed.ac.uk.

Supported by the European Commission
under the
project (grant number 288487)