Shared Task: Large-Scale Multilingual Machine Translation

UPDATE: 7/6 Additional details on the evaluation server are up! Please submit early.

UPDATE: 6/4 FLORES 101 and devtest data are up!


One of the most exciting recent trends in NLP is training a single system on multiple languages at once. In particular, a multilingual machine translation system may be capable of translating a sentence into several languages, translating sentences from several languages into a given language, or any combination thereof.

This is a powerful paradigm for two reasons. First, from a practical perspective, it greatly simplifies system development and deployment, as only a single model needs to be built and used for all language pairs, as opposed to one for each language pair. Second, it has the potential to improve translation quality on low-resource language pairs by leveraging the single multilingual system's ability to transfer knowledge from similar but higher-resource language pairs, and from data in similar domains but in different languages.

However, to date, evaluation of multilingual machine translation systems has been hindered by the lack of high-quality evaluation benchmarks and of a standardized evaluation process.


The goal of this task is to bring the community together on the topic of low-resource multilingual machine translation for the first time, in the hope of fostering progress in this exciting direction. We do so by introducing a realistic benchmark as well as a fair and rigorous evaluation process, as described below.

Task Description

We are going to have three tracks: two small tracks and a large track.
The small tracks evaluate translation between fairly related languages and English (all pairs). The large track covers 101 languages.

The small tracks are an example of a practical MMT problem in similar languages, one that does not require very large computational resources at training time, particularly given the pretrained models we provide.

At the other end of the spectrum, the large track pursues the very ambitious goal of translating among a very large number of languages all at once, which may require a substantial amount of compute at training time.

Track Details

Allowed resources

Compute grants

We want to continue encouraging the research community to work on low-resource translation. As part of this, we encourage participants to apply for compute grants so that GPU compute is less of a barrier for translation research. You can see more detailed information and apply for the compute grants here.


The training data is drawn from the publicly available OPUS repository, which contains data of varying quality from a variety of domains. We also provide in-domain Wikipedia monolingual data for each language. All tracks will be fully constrained, so only the data that is provided can be used. This enables fairer comparison across methods. Check the multilingual data page for a detailed view of the resources.

The validation and test data are obtained from the FLORES 101 evaluation benchmark, which will be made available in June 2021.
This is a high-quality benchmark that enables evaluation of MMT systems in more than a hundred languages. It supports many-to-many evaluation, as all sentences are aligned across all languages. We will provide validation and validation-test datasets to aid the development of systems. The actual evaluation will be performed on a dedicated evaluation server, to which participants will upload their evaluation code.


We'll be using the SentencePiece BLEU (spBLEU) variant for evaluation. All scripts and instructions are available in the FLORES repository.
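To make the metric concrete: spBLEU is ordinary corpus-level BLEU computed over SentencePiece subword pieces rather than words, which sidesteps language-specific word tokenizers. Below is a minimal, unofficial sketch of corpus BLEU over sequences that are assumed to be already tokenized into pieces; the official implementation (including the shared SentencePiece model and smoothing details) is the one in the FLORES repository, and scores from this sketch are for illustration only.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100) over pre-tokenized sequences,
    e.g. SentencePiece pieces. No smoothing, single reference."""
    matches = [0] * max_n   # clipped n-gram matches per order
    totals = [0] * max_n    # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngrams(hyp, n)
            ref_ngrams = ngrams(ref, n)
            matches[n - 1] += sum(min(c, ref_ngrams[g])
                                  for g, c in hyp_ngrams.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # unsmoothed BLEU is zero if any precision is zero
    log_precision = sum(math.log(m / t)
                        for m, t in zip(matches, totals)) / max_n
    brevity_penalty = min(1.0, math.exp(1 - ref_len / hyp_len))
    return 100 * brevity_penalty * math.exp(log_precision)
```

With this sketch, a hypothesis identical to its reference scores 100; in practice one would first encode both sides with the benchmark's SentencePiece model and pass the resulting piece lists to a function like this.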


Submission to the leaderboard is available! See this guide for reference on how to do it.

There will be two submission periods.

Participants will be required to submit code that fits within the memory and compute limits of a p2.xlarge AWS instance; these constraints will be strictly enforced.



Important dates

Full language list

  • Afrikaans
  • Amharic
  • Arabic
  • Armenian
  • Assamese
  • Asturian
  • Azerbaijani
  • Belarusian
  • Bengali
  • Bosnian
  • Bulgarian
  • Burmese
  • Catalan
  • Cebuano
  • Chinese (Simplified)
  • Chinese (Traditional)
  • Croatian
  • Czech
  • Danish
  • Dutch
  • English
  • Estonian
  • Filipino (Tagalog)
  • Finnish
  • French
  • Fula
  • Galician
  • Ganda
  • Georgian
  • German
  • Greek
  • Gujarati
  • Hausa
  • Hebrew
  • Hindi
  • Hungarian
  • Icelandic
  • Igbo
  • Indonesian
  • Irish
  • Italian
  • Japanese
  • Javanese
  • Kabuverdianu
  • Kamba
  • Kannada
  • Kazakh
  • Khmer
  • Korean
  • Kyrgyz
  • Lao
  • Latvian
  • Lingala
  • Lithuanian
  • Luo
  • Luxembourgish
  • Macedonian
  • Malay
  • Malayalam
  • Maltese
  • Maori
  • Marathi
  • Mongolian
  • Nepali
  • Northern Sotho
  • Norwegian
  • Nyanja
  • Occitan
  • Oriya
  • Oromo
  • Pashto
  • Persian
  • Polish
  • Portuguese
  • Punjabi
  • Romanian
  • Russian
  • Serbian
  • Shona
  • Sindhi
  • Slovak
  • Slovenian
  • Somali
  • Sorani Kurdish
  • Spanish
  • Swahili
  • Swedish
  • Tajik
  • Tamil
  • Telugu
  • Thai
  • Turkish
  • Ukrainian
  • Umbundu
  • Urdu
  • Uzbek
  • Vietnamese
  • Welsh
  • Wolof
  • Xhosa
  • Yoruba
  • Zulu