Shared Task: General Machine Translation

The News translation task is changing to the General MT task

With recent improvements in MT quality, we have decided to move away from testing only on the news domain and are shifting the WMT focus to testing the general capabilities of MT systems. Here are the main changes:

If you have any questions, please refer to this living document or email the organizers.

This year, the former WMT News translation task shifts its focus to evaluating general MT capabilities. The main difference from past years is that the test sets will contain multiple domains.
This year's language pairs are:

We provide parallel corpora for all language pairs as training data, as well as additional resources for download.

The following is a quick guide to the language pairs in terms of resource level and language similarity:

Closely related: uk-cs
Same family: en-de, en-cs, en-ru (high resource); fr-de, uk-en (medium resource); en>hr (low resource)
Distant: en-zh (high resource); en-ja (medium resource); liv-en, sah-ru (low resource)

GOALS

The goals of the shared translation task are:

We hope that both beginners and established research groups will participate in this task.

IMPORTANT DATES

Release of training data for shared tasks: most data are released
Test suite source texts must reach us: TBC - June
Test data released: 21st July
Translation submission deadline: 28th July (AoE)
Translated test suites shipped back to test suite authors: TBC - July
Abstract system description submission: 4th August

TASK DESCRIPTION

We provide training data for all language pairs and a common framework. The task is to improve current methods. We encourage broad participation: if you feel that your method is interesting but not state of the art, please participate anyway to disseminate it and measure progress. Participants will use their systems to translate a test set of unseen sentences in the source language. Translation quality is measured by manual evaluation and various automatic evaluation metrics.

You may participate in any or all of the language pairs. For all language pairs, we will test translation in both directions. To provide a common framework that allows for comparable results, and to lower the barrier to entry, we provide a common training set. You are not limited to this training set, nor to the training set provided for your target language pair. This means that multilingual systems are allowed, and they are classed as constrained as long as they use only data released for WMT22.

If you use additional training data (not provided by the WMT22 organisers) or existing translation systems, you must flag that your system uses additional data. We will distinguish system submissions that used only the provided training data (constrained) from submissions that used significant additional data resources (unconstrained). Note that basic linguistic tools such as taggers, parsers, and morphological analyzers are allowed in the constrained condition, as are pretrained language models released before February 2022.

Each participant is required to submit a system description paper, which should highlight how your own methods and data differ from the standard task. Make clear which tools and which training sets you used.
Each participant must also submit a one-page abstract of the system description one week after the translation submission deadline. It may be a full system description paper or only a draft that can later be revised into the final system description paper.

Document-level MT

We are interested in whether MT can be improved by using context beyond the sentence, and to what extent state-of-the-art MT systems can produce translations that are correct "in context". All of our development and test data contains full documents, and all of our human evaluation will be in context; in other words, the evaluators will view each sentence together with its surrounding context when evaluating.

Our training data retains context and document boundaries wherever possible; in particular, the following corpora keep the context intact:

DATA

LICENSING OF DATA

The data released for the WMT22 General MT task can be freely used for research purposes; we just ask that you cite the WMT22 shared task overview paper and respect any additional citation requirements of the individual data sets. For other uses of the data, you should consult the original owners of the data sets.

TRAINING DATA

We aim to use publicly available sources of data wherever possible. Our main sources of training data are the Europarl corpus, the UN corpus, the news-commentary corpus and the ParaCrawl corpus. We also release a monolingual News Crawl corpus. Other language-specific corpora will be made available.

You may also use the following monolingual corpora released by the LDC:

Note that the released data is not tokenized and includes sentences of any length (including empty sentences). All data is in Unicode (UTF-8) format. The following Moses tools can be used to tokenize the training data:

These tools are available in the Moses git repository.
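Because the released data keeps empty lines and sentences of any length, participants often filter it before training. The following is a minimal sketch of one way to do that with standard Unix tools. The helper `filter_parallel`, its arguments, and the length threshold are illustrative assumptions, not part of the Moses tools or the official data release. It assumes the two files are line-aligned and contain no tab characters.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not one of the Moses tools): keep only sentence pairs where
# both sides are non-empty and have at most MAX_WORDS whitespace-separated tokens.
# Assumes line-aligned UTF-8 files that contain no tab characters.
filter_parallel() {
  local src="$1" tgt="$2" max_words="$3" out_prefix="$4"
  # Create (or truncate) the outputs so they exist even if no pair survives.
  : > "${out_prefix}.src"
  : > "${out_prefix}.tgt"
  # paste joins line i of both files with a tab; awk counts tokens on each side.
  paste "$src" "$tgt" | awk -F'\t' -v max="$max_words" \
      -v s="${out_prefix}.src" -v t="${out_prefix}.tgt" '
    {
      ns = split($1, a, " ")   # a " " separator splits on runs of whitespace
      nt = split($2, b, " ")
      if (ns > 0 && nt > 0 && ns <= max && nt <= max) {
        print $1 > s
        print $2 > t
      }
    }'
}

# Example usage (hypothetical file names):
#   filter_parallel train.en train.de 200 train.clean
```

Whitespace token counts are only a rough length proxy: for scripts written without spaces, such as Chinese or Japanese, a whole sentence counts as a single token, so a character-based limit would be more appropriate on those sides.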

DEVELOPMENT DATA

To evaluate your system during development, we suggest using previous test sets. For automatic evaluation, we recommend sacreBLEU, which automatically downloads previous WMT test sets for you. You may also want to consider the COMET automatic metric, which has been shown to correlate highly with human judgments. We also release other dev and test sets from previous years.
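As a hedged sketch of this development workflow, the two hypothetical wrapper functions below call the sacreBLEU command-line tool: one prints the source side of a previous WMT test set, and the other scores your detokenized output against that test set's reference. This assumes sacreBLEU 2.x is installed (for example with `pip install sacrebleu`) and that network access is available the first time a test set is downloaded. The test set and language pair in the usage comments are only an example.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical wrappers around the sacreBLEU CLI (assumes sacreBLEU 2.x).
# sacreBLEU downloads the requested test set on first use, so network access is needed once.

# Print the source side of a previous WMT test set, e.g. get_source wmt21 en-de
get_source() {
  local testset="$1" langpair="$2"
  sacrebleu -t "$testset" -l "$langpair" --echo src
}

# Score a detokenized hypothesis file with BLEU and chrF against the test set reference(s)
score_output() {
  local testset="$1" langpair="$2" hyp="$3"
  sacrebleu -t "$testset" -l "$langpair" -i "$hyp" -m bleu chrf
}

# Example usage (not executed here):
#   get_source wmt21 en-de > wmt21.en-de.src
#   ...translate wmt21.en-de.src with your own system into wmt21.en-de.hyp...
#   score_output wmt21 en-de wmt21.en-de.hyp
```

sacreBLEU is given raw, detokenized output and applies its own standard tokenization internally, which keeps scores comparable across participants regardless of their preprocessing.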

The 2022 test sets will be created from a sample of up to four domains (most likely news, e-commerce, social, and conversational), with an equal number of sentences per domain. The sources of the test sets will be original text, whereas the targets will be human-produced translations. This is in contrast to the test sets up to and including 2018, which were a 50-50 mixture of test sets produced in this way and test sets produced in the reverse direction (i.e. with the original text on the target side).

DOWNLOAD

NEW: You can download all corpora from the command line (see the detailed instructions), except the three datasets marked as 'Register and Download': CzEng2.0, CCMT, and the Yandex Corpus. Usage:

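# Install the pinned mtdata version and fetch the WMT22 constrained recipe file
pip install mtdata==0.3.5
wget https://www.statmt.org/wmt22/mtdata/mtdata.recipes.wmt22-constrained.yml
# Download each recipe into a directory named after its recipe ID (brace expansion needs bash)
for rid in wmt22-{cs,de,ja,ru,hr}en wmt22-frde; do
  mtdata get-recipe -ri "$rid" -o "$rid"
done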

TEST SET SUBMISSION

TBA

EVALUATION

Primary systems will be included in the human evaluation. We will collect subjective judgments of translation quality from annotators, taking the document context into account.

In the unlikely event that we receive more system submissions than we can evaluate, we may preselect systems for human evaluation using automatic metrics (in particular, by not evaluating low-performing unconstrained systems). However, we do not expect this to be necessary, and we believe all primary systems will be evaluated by humans.

CONTACT

For queries, please use the mailing list or contact tomkocmi@microsoft.com.

ACKNOWLEDGEMENTS

This task would not have been possible without the sponsorship of monolingual data, test set translation, and evaluation from Microsoft, Facebook, NTT, the University of Tokyo, LinguaCustodia, and Webinterpret, as well as funding from the European Union's Horizon 2020 research and innovation programme under grant agreements 825299 (GoURMET) and 825460 (Elitr). The French-German test sets were funded by the French Ministry of Defense.