This conference builds on a series of annual workshops and conferences on statistical machine translation, going back to 2006:
|Release of training data for shared tasks||February/March, 2020|
|Evaluation periods for shared tasks||May/June, 2020|
|Paper submission deadline||July 15, 2020|
|Paper notification||August 17, 2020|
|Camera-ready version due||August 31, 2020|
|Conference in Punta Cana||November 11-12, 2020|
This year's conference will feature the following shared tasks:
In addition to the shared tasks, the conference will also feature scientific papers on topics related to MT. Topics of interest include, but are not limited to:
This shared task will examine translation between the following language pairs:
Additional language pairs are still to be confirmed.
The text for all the test sets will be drawn from news articles. Participants may submit translations for any or all of the language directions. In addition to the common test sets, the conference organizers will provide optional training resources.
Development sets for the new language pairs, and training data for all pairs, will be made available in January/February 2020. There will be a mixture of high- and low-resource language pairs, and we also expect to include an unsupervised translation task, as well as to allow multilingual systems.
All submitted systems will undergo human evaluation, and participating teams are expected to contribute to this evaluation.
The news task is supported by Microsoft, NTT and the University of Tokyo, Tilde, National Research Council of Canada, Yandex and the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825299 (Gourmet).
In the fifth edition of this task, we plan to evaluate systems for the translation of biomedical abstracts for the following language pairs:
As well as translation of biomedical terminologies for the following language pair:
Parallel corpora will be available for all language pairs, and monolingual corpora for some languages. Evaluation will be carried out both automatically and manually.
The task is organized to evaluate the performance of state-of-the-art MT systems on translating between pairs of languages from the same language family. We will provide participants with training and testing data for pairs of similar languages, with the pairs drawn from different language families. Evaluation will be carried out using automatic evaluation metrics and human evaluation.
This task will focus on the automatic correction of machine translation outputs given a corpus of (source, target, human post-edit) triplets as training material.
In this task, participants develop software that can assign a score to the output of MT, based on the reference translation or without access to the reference (the "Quality Estimation as a Metric" track). Metrics are assessed on their correlation with human judgement.
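As a rough illustration of what "correlation with human judgement" means in practice, the sketch below computes a Pearson correlation between metric scores and human scores for a handful of systems. The choice of Pearson correlation, the function name `pearson`, and all the numbers are assumptions made for this example; the task description above does not specify which correlation coefficient or which level of granularity is used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for five MT systems (invented for illustration only).
metric_scores = [0.31, 0.42, 0.55, 0.61, 0.78]  # output of a candidate metric
human_scores = [52.0, 60.5, 58.0, 71.0, 80.5]   # averaged human judgements

r = pearson(metric_scores, human_scores)
print(f"Pearson r = {r:.3f}")
```

A value of `r` close to 1 would indicate that the metric ranks and spaces the systems much as human annotators do; a value near 0 would indicate little linear relationship.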
This task consists of several sub-tasks, all of which are concerned with assessing the quality of MT output without using a reference, at different levels of granularity and for different language pairs, from low- to high-resource languages.
This task will address the issue of auto-adapting and auto-evaluating MT systems over time, i.e. with a stream of incoming data. It will be based on previous News MT tasks (EN-DE and EN-FR), with an evaluation protocol that takes system performance over time into account.
In the chat translation task we aim to address a different type of text, in which there is a dialogue between at least two speakers and, once a sentence is uttered, there is limited possibility to revise it. By its nature, such text tends to consist of very short sentences with many references to previous sentences. Translating these sentences therefore requires document-level information, which makes the task more challenging. The parallel data used for training and evaluating the systems belongs to the customer support domain and will be available for the English-German and English-French language pairs.
Submissions will consist of regular full papers of 6-10 pages, plus additional pages for references. Formatting will follow EMNLP guidelines (TBC). Supplementary material can be added to research papers. In addition, shared task participants will be invited to submit short papers (suggested length: 4-6 pages, plus references) describing their systems or their evaluation metrics. Both submission and review processes will be handled electronically. Note that regular papers must be anonymized, while system descriptions should not be.
Research papers that have been or will be submitted to other meetings or publications must indicate this at submission time, and must be withdrawn from the other venues if accepted and published at WMT 2020. We will not accept for publication papers that overlap significantly in content or results with papers that have been or will be published elsewhere. It is acceptable to submit work that has been made available as a technical report (or similar, e.g. in arXiv) without citing it. This double submission policy only applies to research papers, so system papers can have significant overlap with other published work, if it is relevant to the system description.
We encourage individuals who are submitting research papers to evaluate their approaches using the training resources provided by this conference and past workshops, so that their experiments can be repeated by others using these publicly available corpora.
|Subscribe to the announcement list for WMT by entering your e-mail address below. This list will be used to announce when the test sets are released, to indicate any corrections to the training sets, and to amend the deadlines as needed.|
|You can read past announcements on the Google Groups page for WMT. These also include an archive of announcements from earlier workshops.|
WMT follows the ACL's anti-harassment policy
For general questions, comments, etc. please send email
For task-specific questions, please contact the relevant organisers.