Compute Grants: Large-Scale Multilingual Machine Translation

Overview

Machine Translation (MT) has made significant progress in recent years with a shift to neural models and rapid development of new architectures such as the transformer. High-quality benchmarks such as FLORES have made it possible to track improvements for low-resource language pairs such as Nepali-English, Sinhala-English, Pashto-English and Khmer-English. However, current models trained on little parallel data tend to produce poor-quality translations. Recent advances in Massively Multilingual Translation Models and semi-supervised techniques (back translation) have made it possible to further improve the quality of translation systems. However, such techniques require large quantities of compute power, which makes it prohibitive for smaller communities to contribute to research in this area.

In conjunction with the Conference on Machine Translation (WMT) and Microsoft Azure, we are pleased to invite the academic community to respond to this call for compute credit grants with the aim of improving translation, with particular emphasis on low-resource languages. Applicants for the awards will be expected to use their compute credit grants to train neural machine translation systems and track their progress using the FLORES101 dashboard. The FLORES101 dashboard will be open from June 4 to August 13, and the winners of the different tracks will be announced alongside the WMT competition. Microsoft Azure will provide the credits that will be awarded to the selected proposals. While all applications will be considered, priority for the grants will be given to researchers working with the lowest-resource languages in FLORES101: Albanian, Amharic, Armenian, Assamese, Asturian, Aymara, Azerbaijani, Belarusian, Bengali, Bosnian, Bulgarian, Burmese, Catalan, Cebuano, Chokwe, Croatian, Danish, Dioula (Romanized), Estonian, Filipino (Tagalog), Fula, Galician, Ganda, Georgian, Haitian Creole, Hausa, Hebrew, Hungarian, Icelandic, Igbo, Iloko, Indonesian, Irish, Javanese, Kabuverdianu, Kachin, Kamba, Kannada, Kazakh, Khmer, Kikongo, Kimbundu, Kurdish (Kurmanji), Kyrgyz, Lao, Lingala, Luo, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Māori, Marathi, Mongolian, Nepali, Northern Sotho, Norwegian, Nyanja, Occitan, Oriya, Oromo, Pashto, Persian, Polish, Punjabi, Quechua, Scottish Gaelic, Serbian, Shan, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Sorani Kurdish, Sundanese, Swahili, Swazi, Tajik, Tamil.

Applicants should submit a one-page proposal outlining their intended research, the languages for which they plan to submit a system, and the total amount of compute credits requested. Compute credit grant winners will be announced by email by May 28, 2021, and the credits will be made available at the start of the FLORES101 evaluation cycle.

Proposals should include:

A summary of the project explaining the tracks of the large scale translation shared task in which the applicants wish to participate, a description of the techniques they plan to use, and the number of compute credit hours they expect to consume.
Curriculum Vitae for the main point of contact for the application.
Organization (University/Lab) details.
Proposals should be submitted online through this application form.

Eligibility

Applicants may submit one proposal per solicitation.
Organizations must be nonprofit or non-governmental organizations with recognized legal status in their respective country (equivalent to 501(c)(3) status under the United States Internal Revenue Code).
Government officials (excluding faculty and staff of public universities, to the extent they may be considered government officials), political figures, and politically affiliated businesses are not eligible.

Timing and Dates

Applications are now open. The deadline to apply is May 10, 2021, Anywhere on Earth.
Notifications will be sent by email to selected applicants by May 28, 2021.

For questions related to this application, please email the large scale translation shared task organizers.