ACL 2016 First Conference on Machine Translation (WMT16)

This conference builds on ten previous workshops on statistical machine translation.

IMPORTANT DATES

Release of training data for shared tasks	January, 2016
Evaluation periods for shared tasks	April, 2016
Paper submission deadline (Research Papers)	May 8, 2016
Paper submission deadline (System Papers)	May 15, 2016
Notification of acceptance	June 5, 2016
Camera-ready deadline	June 22, 2016
Conference in Berlin	August 11-12, 2016

OVERVIEW

This year's conference will feature ten shared tasks:

a news translation task,
an IT domain translation task (NEW),
a biomedical translation task (NEW),
an automatic post-editing task,
a metrics task (assess MT quality given reference translation),
a quality estimation task (assess MT quality without access to any reference),
a tuning task (optimize a given MT system),
a pronoun translation task,
a bilingual document alignment task (NEW),
a multimodal translation task (NEW)

In addition to the shared tasks, the conference will also feature scientific papers on topics related to MT. Topics of interest include, but are not limited to:

word-based, phrase-based, syntax-based, semantics-based SMT
neural machine translation
using comparable corpora for SMT
incorporating linguistic information into SMT
decoding
system combination
error analysis
manual and automatic methods for evaluating MT
scaling MT to very large data sets

We encourage authors to evaluate their approaches to the above topics using the common data sets created for the shared tasks.

REGISTRATION

Registration will be handled by ACL 2016.

NEWS TRANSLATION TASK

The first shared task will examine translation between the following language pairs:

English-German and German-English
English-Finnish and Finnish-English
English-Czech and Czech-English
English-Romanian and Romanian-English (NEW)
English-Russian and Russian-English
English-Turkish and Turkish-English (NEW)

The text for all the test sets will be drawn from news articles. Participants may submit translations for any or all of the language directions. In addition to the common test sets the conference organizers will provide optional training resources.

All participants who submit entries will have their translations evaluated. We will evaluate translation performance by human judgment. To facilitate the human evaluation we will require participants in the shared tasks to manually judge some of the submitted translations. For each team, this will amount to ranking 300 sets of 5 translations, per language pair submitted.

We also provide baseline machine translation systems, with performance comparable to the best systems from last year's shared task.

IT TRANSLATION TASK

This task focuses on domain adaptation of MT to the IT domain for the following language pairs:

English-to-Bulgarian (EN-BG)
English-to-Czech (EN-CS)
English-to-German (EN-DE)
English-to-Spanish (EN-ES)
English-to-Basque (EN-EU)
English-to-Dutch (EN-NL)
English-to-Portuguese (EN-PT)

Parallel corpora (including in-domain training data) are available. Evaluation will be carried out both automatically and manually. See detailed information about the task.

BIOMEDICAL TRANSLATION TASK

In this first edition of this task, we will evaluate systems for the translation of scientific abstracts in biological and health sciences for the following language pairs:

English-French and French-English
English-Spanish and Spanish-English
English-Portuguese and Portuguese-English

Parallel corpora will be available for the above language pairs, along with monolingual corpora for each of the four languages. Evaluation will be carried out both automatically and manually.

AUTOMATIC POST-EDITING TASK

This shared task will examine automatic methods for correcting errors produced by machine translation (MT) systems. Automatic Post-editing (APE) aims at improving MT output in black box scenarios, in which the MT system is used "as is" and cannot be modified. From the application point of view APE components would make it possible to:

Cope with systematic errors of an MT system whose decoding process is not accessible
Provide professional translators with improved MT output quality to reduce (human) post-editing effort

In this second edition of the task, the evaluation will focus on one language pair (English-German), measuring systems' capability to reduce the distance (HTER) that separates an automatic translation from its human-revised version approved for publication. This edition will focus on IT domain data, and will provide post-editions (of MT output) collected from professional translators.
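As a rough illustration of the distance mentioned above, the sketch below approximates HTER as the word-level edit distance between an MT output and its human post-edit, divided by the length of the post-edit. This is a simplification under stated assumptions: full TER also allows block shifts, which are omitted here, and the function names `word_edit_distance` and `approx_hter` are hypothetical, not part of any official scoring tool.

```python
def word_edit_distance(hyp, ref):
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def approx_hter(mt_output, post_edit):
    """Edit distance normalised by post-edit length (no block shifts: an assumption)."""
    hyp = mt_output.split()
    ref = post_edit.split()
    if not ref:
        raise ValueError("post-edit must contain at least one token")
    return word_edit_distance(hyp, ref) / len(ref)
```

Under this approximation, an APE system helps when its output has a lower score against the post-edit than the original MT output does.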

METRICS TASK

The metrics task (also called evaluation task) will assess automatic evaluation metrics' ability to:

Rank systems on their overall performance on the test set
Rank systems on a sentence by sentence level

Participants in the shared evaluation task will use their automatic evaluation metrics to score the output from the translation task and the tunable metrics task. In addition to MT outputs from the other two tasks, the participants will be provided with reference translations. We will measure the correlation of automatic evaluation metrics with the human judgments.
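To make the correlation step concrete, the sketch below compares system-level metric scores with human scores. The announcement does not say which correlation coefficient will be used, so Pearson's r is an assumption here, and the function name `pearson` and the input layout (two equal-length lists, one score per system) are purely illustrative.

```python
import math


def pearson(metric_scores, human_scores):
    """Pearson correlation between two equal-length lists of per-system scores."""
    if len(metric_scores) != len(human_scores) or len(metric_scores) < 2:
        raise ValueError("need two equal-length lists with at least two systems")
    n = len(metric_scores)
    mean_m = sum(metric_scores) / n
    mean_h = sum(human_scores) / n
    cov = sum((m - mean_m) * (h - mean_h)
              for m, h in zip(metric_scores, human_scores))
    var_m = sum((m - mean_m) ** 2 for m in metric_scores)
    var_h = sum((h - mean_h) ** 2 for h in human_scores)
    if var_m == 0 or var_h == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_m * var_h)
```

A metric whose scores rise and fall with the human scores across systems yields a value near 1, while a metric that orders systems in the opposite way yields a value near -1.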

QUALITY ESTIMATION TASK

Quality estimation systems aim at producing an estimate of the quality of a given translation at system run-time, without access to a reference translation. This topic is particularly relevant from a user perspective. Among other applications, it can (i) help decide whether a given translation is good enough for publishing as is; (ii) filter out sentences that are not good enough for post-editing; (iii) select the best translation among options from multiple MT and/or translation memory systems; (iv) inform readers of the target language of whether or not they can rely on a translation; and (v) spot parts (words or phrases) of a translation that are potentially incorrect.

Research on this topic has been showing promising results in the last couple of years. Building on the last three years' experience, the Quality-Estimation track of the WMT16 conference and shared task will focus on English, Spanish and German as languages and provide new training and test sets, along with evaluation metrics and baseline systems for variants of the task at three different levels of prediction: word, sentence, and document.

TUNING TASK

This task will assess your team's ability to optimize the parameters of a given hierarchical MT system (Moses).

Participants in the tuning task will be given complete Moses models for English-to-Czech and Czech-to-English translation and the standard development sets from the translation task. The participants are expected to submit the moses.ini for one or both of the translation directions. We will use the configuration and a fixed revision of Moses to translate the official WMT15 test set. The outputs of the various configurations of the system will be scored using the standard manual evaluation procedure.

CROSS-LINGUAL PRONOUN PREDICTION TASK

Pronoun translation poses a problem for current state-of-the-art SMT systems as pronoun systems do not map well across languages, e.g., due to differences in gender, number, case, formality, or humanness, and to differences in where pronouns may be used. Translation divergences typically lead to mistakes in SMT, as when translating the English "it" into French ("il", "elle", or "cela"?) or into German ("er", "sie", or "es"?). One way to model pronoun translation is to treat it as a cross-lingual pronoun prediction task.

We propose such a task, which asks participants to predict a target-language pronoun given a source-language pronoun in the context of a sentence. We further provide a lemmatised target-language human-authored translation of the source sentence, and automatic word alignments between the source sentence words and the target-language lemmata. In the translation, the words aligned to a subset of the source-language third-person pronouns are substituted by placeholders. The aim of the task is to predict, for each placeholder, the word that should replace it from a small, closed set of classes, using any type of information that can be extracted from the documents.

The cross-lingual pronoun prediction task will be similar to the task of the same name at DiscoMT 2015:

http://www.idiap.ch/workshop/DiscoMT/shared-task

Participants are invited to submit systems for the English-French and English-German language pairs, for both directions.

BILINGUAL DOCUMENT ALIGNMENT TASK

The task is to identify pairs of English and French documents from a given collection of documents such that one document is the translation of the other. As possible pairs we consider all pairs of documents from the same web domain for which the source side has been identified as (mostly) English and the target side as (mostly) French.
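The candidate space described above can be sketched as follows. This is a minimal illustration under stated assumptions: documents arrive as hypothetical `(url, lang)` tuples with language labels already assigned, and the host part of the URL stands in for the web domain. The actual task data format may differ.

```python
from collections import defaultdict
from urllib.parse import urlparse


def candidate_pairs(docs):
    """Enumerate (English URL, French URL) pairs that share a web domain.

    docs: iterable of (url, lang) tuples; lang is assumed to be a
    pre-computed label such as "en" or "fr" (hypothetical input format).
    """
    by_domain = defaultdict(lambda: {"en": [], "fr": []})
    for url, lang in docs:
        if lang in ("en", "fr"):
            # Assumption: the URL host identifies the web domain.
            by_domain[urlparse(url).netloc][lang].append(url)

    pairs = []
    for domain in sorted(by_domain):
        langs = by_domain[domain]
        for en_url in langs["en"]:
            for fr_url in langs["fr"]:
                pairs.append((en_url, fr_url))
    return pairs
```

A participating system would then score or filter these candidates to decide which pairs are actual translations of each other.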

MULTIMODAL TRANSLATION TASK

This is a new task where participants are requested to generate a description for an image in a target language, given the image itself and one or more descriptions in a different (source) language.

PAPER SUBMISSION INFORMATION

Submissions will consist of regular full papers of 6-10 pages, plus additional pages for references, formatted following the ACL 2016 guidelines. In addition, shared task participants will be invited to submit short papers (suggested length: 4-6 pages, plus references) describing their systems or their evaluation metrics. Both submission and review processes will be handled electronically. Note that regular papers must be anonymized, while system descriptions do not need to be.

We encourage individuals who are submitting research papers to evaluate their approaches using the training resources provided by this conference and past workshops, so that their experiments can be repeated by others using these publicly available corpora.

POSTER FORMAT

A0, vertical. For details on posters, please check with the local ACL organisers.

ANNOUNCEMENTS

Subscribe to the announcement list for WMT by entering your e-mail address below. This list will be used to announce when the test sets are released, to indicate any corrections to the training sets, and to amend the deadlines as needed.

Email:

You can read past announcements on the Google Groups page for WMT. These also include an archive of announcements from earlier workshops.

INVITED TALK

Spence Green (Lilt)
Interactive Machine Translation: From Research to Practice

ORGANIZERS

Ondřej Bojar (Charles University in Prague)
Christian Buck (University of Edinburgh)
Rajen Chatterjee (FBK)
Christian Federmann (MSR)
Liane Guillou (University of Edinburgh)
Barry Haddow (University of Edinburgh)
Matthias Huck (University of Edinburgh)
Antonio Jimeno Yepes (IBM Research Australia)
Aurélie Névéol (LIMSI, CNRS)
Mariana Neves (Hasso-Plattner Institute)
Pavel Pecina (Charles University in Prague)
Martin Popel (Charles University in Prague)
Philipp Koehn (University of Edinburgh / Johns Hopkins University)
Christof Monz (University of Amsterdam)
Matteo Negri (FBK)
Matt Post (Johns Hopkins University)
Lucia Specia (University of Sheffield)
Karin Verspoor (University of Melbourne)
Jörg Tiedemann (University of Helsinki)
Marco Turchi (FBK)

PROGRAM COMMITTEE

Lars Ahrenberg (Linköping University)
Alexandre Allauzen (Université Paris-Sud / LIMSI-CNRS)
Tim Anderson (Air Force Research Laboratory)
Daniel Beck (University of Sheffield)
Jose Miguel Benedi (Universitat Politècnica de València)
Nicola Bertoldi (FBK)
Alexandra Birch (University of Edinburgh)
Arianna Bisazza (University of Amsterdam)
Graeme Blackwood (IBM Research)
Frédéric Blain (University of Sheffield)
Fabienne Braune (University of Stuttgart)
Chris Brockett (Microsoft Research)
José G. C. de Souza (eBay Inc.)
Michael Carl (Copenhagen Business School)
Marine Carpuat (University of Maryland)
Francisco Casacuberta (Universitat Politècnica de València)
Daniel Cer (Google)
Mauro Cettolo (FBK)
Rajen Chatterjee (Fondazione Bruno Kessler)
Boxing Chen (NRC)
Colin Cherry (NRC)
David Chiang (University of Notre Dame)
Eunah Cho (Karlsruhe Institute of Technology)
Kyunghyun Cho (New York University)
Vishal Chowdhary (Microsoft)
Praveen Dakwale (University of Amsterdam)
Steve DeNeefe (SDL Language Weaver)
Michael Denkowski (Amazon.com)
Jacob Devlin (Microsoft Research)
Markus Dreyer (Amazon.com)
Nadir Durrani (QCRI)
Marc Dymetman (Xerox Research Centre Europe)
Minwei Feng (IBM Watson Group)
Andrew Finch (NICT)
Orhan Firat (Middle East Technical University)
Marina Fomicheva (Universitat Pompeu Fabra)
José A. R. Fonollosa (Universitat Politècnica de Catalunya)
Mikel Forcada (Universitat d’Alacant)
George Foster (NRC)
Alexander Fraser (Ludwig-Maximilians-Universität München)
Markus Freitag (IBM Research)
Michel Galley (Microsoft Research)
Ekaterina Garmash (University of Amsterdam)
Daniel Gildea (University of Rochester)
Kevin Gimpel (Toyota Technological Institute at Chicago)
Jesús González-Rubio (Universitat Politècnica de València)
Francisco Guzmán (Qatar Computing Research Institute)
Thanh-Le Ha (Karlsruhe Institute of Technology)
Nizar Habash (New York University Abu Dhabi)
Keith Hall (Google Research)
Greg Hanneman (Carnegie Mellon University)
Christian Hardmeier (Uppsala universitet)
Saša Hasan (Lilt Inc.)
Eva Hasler (University of Cambridge)
Yifan He (New York University)
Kenneth Heafield (University of Edinburgh)
Carmen Heger (Iconic)
John Henderson (MITRE)
Felix Hieber (Amazon Research)
Hieu Hoang (University of Edinburgh)
Stéphane Huet (Université d’Avignon)
Young-Sook Hwang (SKPlanet)
Gonzalo Iglesias (University of Cambridge)
Abe Ittycheriah (IBM)
Laura Jehl (Heidelberg University)
Doug Jones (MIT Lincoln Laboratory)
Marcin Junczys-Dowmunt (Adam Mickiewicz University, Poznań)
Roland Kuhn (National Research Council of Canada)
Shankar Kumar (Google)
Mathias Lambert (Amazon.com)
Philippe Langlais (Université de Montréal)
William Lewis (Microsoft Research)
Lemao Liu (NICT)
Qun Liu (Dublin City University)
Shujie Liu (Microsoft Research Asia, Beijing, China)
Saab Mansour (eBay)
Daniel Marcu (ISI/USC)
Arne Mauser (Google, Inc)
Mohammed Mediani (Karlsruhe Institute of Technology)
Wolfgang Menzel (Hamburg University)
Abhijit Mishra (Indian Institute of Technology Bombay)
Yusuke Miyao (National Institute of Informatics)
Maria Nadejde (University of Edinburgh)
Preslav Nakov (Qatar Computing Research Institute, HBKU)
Graham Neubig (Nara Institute of Science and Technology)
ThuyLinh Nguyen (Carnegie Mellon University)
Jan Niehues (Karlsruhe Institute of Technology)
Kemal Oflazer (Carnegie Mellon University - Qatar)
Tsuyoshi Okita (Ludwig-Maximilians-Universität München)
Noam Ordan (University of Haifa)
Daniel Ortiz-Martínez (Technical University of Valencia)
Pavel Pecina (Charles University in Prague)
Stephan Peitz (Apple)
Sergio Penkale (Lingo24)
Martin Popel (Charles University in Prague, Faculty of Mathematics and Physics, UFAL)
Maja Popović (Humboldt University of Berlin)
Stefan Riezler (Heidelberg University)
Johann Roturier (Symantec)
Baskaran Sankaran (IBM T.J. Watson Research Center)
Hassan Sawaf (eBay Inc.)
Rico Sennrich (University of Edinburgh)
Kashif Shah (University of Sheffield)
Michel Simard (NRC)
Patrick Simianer (Heidelberg University)
Linfeng Song (University of Rochester)
David Steele (The University of Sheffield)
Sara Stymne (Uppsala University)
Katsuhito Sudoh (NTT Communication Science Laboratories)
Aleš Tamchyna (Charles University in Prague, UFAL MFF)
Christoph Tillmann (IBM Research)
Ke M. Tran (University of Amsterdam)
Yulia Tsvetkov (Carnegie Mellon University)
Dan Tufiș (Research Institute for Artificial Intelligence, Romanian Academy)
Ferhan Ture (Comcast Labs)
Masao Utiyama (NICT)
Ashish Vaswani (University of Southern California Information Sciences Institute)
Yannick Versley (University of Heidelberg)
David Vilar (Nuance)
Martin Volk (University of Zurich)
Taro Watanabe (Google)
Bonnie Webber (University of Edinburgh)
Marion Weller-Di Marco (Universität Stuttgart)
Philip Williams (University of Edinburgh)
Hua Wu (Baidu)
Joern Wuebker (Lilt, Inc.)
Peng Xu (Google Inc.)
Wenduan Xu (Cambridge University)
François Yvon (LIMSI/CNRS)
Hao Zhang (Google)
Joy Ying Zhang (Carnegie Mellon University)
Hai Zhao (Shanghai Jiao Tong University)
Tiejun Zhao (Harbin Institute of Technology)

CONTACT

For general questions, comments, etc. please send email to bhaddow@inf.ed.ac.uk.
For task-specific questions, please contact the relevant organisers.

ACKNOWLEDGEMENTS

This conference has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreements 645452 (QT21) and 645357 (Cracker).
We thank Yandex for their donation of data for the Russian-English and Turkish-English news tasks, and the University of Helsinki for their donation for the Finnish-English news tasks.