Compute Grants: Large-Scale Multilingual Machine Translation

Overview

Machine Translation (MT) has made significant progress in recent years with the shift to neural models and the rapid development of new architectures such as the Transformer. High-quality benchmarks such as FLORES have made it possible to track improvements for low-resource language pairs such as Nepali-English, Sinhala-English, Pashto-English and Khmer-English. However, models trained on little parallel data still tend to produce poor-quality translations. Recent advances in massively multilingual translation models and semi-supervised techniques such as back-translation have made it possible to further improve the quality of translation systems. However, these techniques require large amounts of compute, which puts research in this area out of reach for smaller communities.

In conjunction with the Conference on Machine Translation (WMT) and Microsoft Azure, we are pleased to invite the academic community to respond to this call for compute credit grants, which aims to improve translation quality with particular emphasis on low-resource languages. Grant recipients will be expected to use their compute credits to train neural machine translation systems and to track their progress using the FLORES101 dashboard. The FLORES101 dashboard will be open from June 4 to August 13, and the winners of the different tracks will be announced alongside the WMT competition. Microsoft Azure will provide the credits awarded to the selected proposals. While all applications will be considered, priority will be given to researchers working on the lowest-resource languages in FLORES101: Albanian, Amharic, Armenian, Assamese, Asturian, Aymara, Azerbaijani, Belarusian, Bengali, Bosnian, Bulgarian, Burmese, Catalan, Cebuano, Chokwe, Croatian, Danish, Dioula (Romanized), Estonian, Filipino (Tagalog), Fula, Galician, Ganda, Georgian, Haitian Creole, Hausa, Hebrew, Hungarian, Icelandic, Igbo, Iloko, Indonesian, Irish, Javanese, Kabuverdianu, Kachin, Kamba, Kannada, Kazakh, Khmer, Kikongo, Kimbundu, Kurdish (Kurmanji), Kyrgyz, Lao, Lingala, Luo, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Māori, Marathi, Mongolian, Nepali, Northern Sotho, Norwegian, Nyanja, Occitan, Oriya, Oromo, Pashto, Persian, Polish, Punjabi, Quechua, Scottish Gaelic, Serbian, Shan, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Sorani Kurdish, Sundanese, Swahili, Swazi, Tajik, Tamil.

Applicants should submit a one-page proposal outlining their intended research, the languages for which they plan to submit a system, and the total amount of compute credits requested. Grant winners will be notified by email by May 28, 2021, and the credits will be made available at the start of the FLORES101 evaluation cycle.

Proposals should include

Eligibility

Timing and Dates