Shared Task: Biomedical Translation Task

Task description

This task aims to evaluate systems on the translation of documents from the biomedical domain. The training and test data will consist of documents retrieved from various databases. This year, the biomedical translation task will address the following language pairs:

Data

Our training and development data comes from various sources, as listed below:

Evaluation

Evaluation will be carried out both automatically and manually. Automatic evaluation will make use of standard machine translation metrics, such as BLEU. Native speakers of each of the languages will manually check the quality of the translation for a small sample of the submissions. We plan to release test sets for the following language pairs and sources:

Submission format

The various datasets come in different formats. Please check the details on the website of the respective repository or collection, and contact us if you have any questions.

Submission Requirements

Please register your team using this form. You will receive an email confirming your registration, which also contains the submission link.

The test files are available in the WMT'17 biomedical task Google Drive folder. There are three folders:

Each submission file should be named with the original test file name, preceded by the team identifier (as registered in the form above) and the run number, following this example:

Each team is allowed to submit up to 3 runs per test file.

Please check the submission format of each dataset. The UFAL datasets (Cochrane and NHS) follow the format of the corresponding UFAL corpus, while the Scielo and EDP datasets follow the BioC format of the Scielo dataset.
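For teams new to BioC, the sketch below shows one way to pull the passage texts out of a BioC XML file with Python's standard library. It assumes only the generic BioC layout (a `collection` containing `document` elements, each with an `id` and one or more `passage` elements holding a `text`). The Scielo and EDP files may carry additional `infon` keys that are not handled here, and the function name `read_bioc_passages` is hypothetical, not part of any official task tooling.

```python
import xml.etree.ElementTree as ET

def read_bioc_passages(xml_text):
    """Return (document_id, passage_text) pairs from a BioC XML string.

    Assumes the generic BioC layout: collection > document > id / passage > text.
    Dataset-specific infon keys (e.g. language or section labels) are ignored.
    """
    root = ET.fromstring(xml_text)
    pairs = []
    for doc in root.iter("document"):
        doc_id = doc.findtext("id", default="")
        for passage in doc.findall("passage"):
            pairs.append((doc_id, passage.findtext("text", default="")))
    return pairs

# Hypothetical minimal BioC collection, for illustration only.
SAMPLE = """<collection>
  <source>hypothetical</source>
  <date>2017</date>
  <key>hypothetical.key</key>
  <document>
    <id>doc1</id>
    <passage>
      <infon key="type">title</infon>
      <offset>0</offset>
      <text>An example title</text>
    </passage>
  </document>
</collection>"""

print(read_bioc_passages(SAMPLE))
```

When writing submissions, the safest approach is still to copy the structure of the provided test file and only replace the passage texts with your translations.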

Results

Results for the biomedical task are available.
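To give a rough idea of how an automatic score such as BLEU (see Evaluation) is obtained, the sketch below computes corpus-level BLEU: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty. It is a simplified illustration, not the organisers' scoring script: it assumes one reference per segment, plain whitespace tokenisation and no smoothing, so its numbers will generally differ from the official results. The function names `ngrams` and `corpus_bleu` are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU (0-100).

    Assumes one reference per segment, whitespace tokenisation,
    uniform n-gram weights and no smoothing.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped counts: a hypothesis n-gram matches at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # some n-gram order has no match; unsmoothed BLEU is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: punish hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)

print(corpus_bleu(["the cat sat on the mat"], ["the cat sat on a mat"]))
```

Because the n-gram counts are summed over the whole test set before the precisions are taken, a single short or poorly matching segment does not zero out the score, which is why BLEU is normally reported at corpus level rather than averaged over sentences.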

The gold standard files are available in the WMT'17 biomedical task Google Drive folder.

Important dates

Release of training data: end of January 2017
Release of test data: May 2, 2017
Results submission deadline: May 12, 2017 (extended from May 8, 2017)

Organisers

Ondrej Bojar (Charles University in Prague, Czech Republic)
Antonio Jimeno Yepes (IBM Research Australia)
Aurélie Névéol (LIMSI, CNRS, France)
Mariana Neves (Federal Institute for Risk Assessment / Hasso Plattner Institute, Germany)
Pavel Pecina (Charles University in Prague, Czech Republic)
Karin Verspoor (University of Melbourne, Australia)


Please contact us at wmtbiomedical@gmail.com. Please also join our discussion forum.