Shared Task: Medical Translation

26-27 June 2014
Baltimore, USA


The medical text translation task of WMT14 focuses on translation of texts from the medical domain. The task is split into two subtasks:

  1. translation of sentences from summaries of medical articles,
  2. translation of queries entered by users of medical information search engines.

In each subtask, the translation quality will be evaluated on shared, unseen test sets, provided by the EU FP7 project Khresmoi. On top of the resources available for the translation task, we provide links to additional in-domain data for training and tuning. Participants may train/tune their system using the provided resources (constrained task) or any additional resource (unconstrained task).



The goal of the medical text translation task is to investigate the applicability of current MT techniques to the translation of domain-specific and genre-specific texts. We encourage both beginners and established research groups to participate in this novel task.


Texts from specific domains (such as medicine) and genres (such as search queries) are characterised by frequent occurrence of specific vocabulary and syntactic constructions which are rare or even absent in traditional general-domain training data and are therefore difficult for an SMT system to translate. In-domain training data for such specific purposes is usually scarce or not available at all.

Medicine is an example of a domain for which some in-domain training data is available. We provide links to such resources for four European languages: Czech, English, French, and German. These resources can be used to train an SMT system from scratch or to adapt an existing one. The task is to improve the current methods of machine translation and its domain/genre adaptation. Participants will use their systems to translate test sets consisting of unseen sentences in the source language. The translation quality will be measured by various automatic evaluation metrics.

For the first subtask, English test sentences were randomly sampled from automatically generated summaries of documents containing medical information aimed at the general public and medical professionals, found to be relevant to 50 topics provided for the CLEF 2013 eHealth Task 3. Out-of-domain and ungrammatical sentences were manually removed. The development and test sentences are provided with information on document ID and topic ID. The topic descriptions are provided as well. The sentences were translated by medical experts into Czech, French, and German. The translations were further reviewed.

For the second subtask, English test queries were randomly sampled from real user query logs provided by the Health on the Net foundation and the Trip database. The queries were translated into Czech, German, and French by medical experts and reviewed.

You may participate in any or all of the following language pairs (both directions): Czech-English, French-English, and German-English.

If you use additional training data (beyond the resources listed on this page below) or existing translation systems (e.g. on-line systems), you must indicate upon submission that your system uses additional resources. We will distinguish system submissions that used the provided in-domain training data and the data provided for the standard translation task (constrained) from submissions that used significant additional data resources. Note that basic linguistic tools such as taggers, parsers, or morphological analyzers are allowed in the constrained condition.


We provide links to several in-domain data resources that are freely available for research purposes. Some of the resources require user registration and licence agreement. To lower the barrier to entry, we provide a set of easy-to-use scripts to extract parallel data in the plain-text sentence-aligned format and monolingual plain texts for language modelling. See the download section below.

Parallel training data resources

Monolingual training data resources


The data is provided in plain text format and in an SGML format that suits the NIST scoring tool.


Parallel data

Data set          Parallel sentences  Links                Notes
EMEA              1M                  CS-EN, DE-EN, FR-EN  Direct download.
COPPA             1.6M                FR-EN                Provided on DVD, data sent on request. The extraction script splits the data into in-domain and out-of-domain.
MuchMore          29K                 DE-EN                Direct download (two files!).
PatTR             1.8M-2.2M*          DE-EN, FR-EN         Direct download. The extraction script splits the data into in-domain and out-of-domain.
UMLS              116K-675K*          ALL                  Provided upon registration (download the 2013AB Full Release). The script extracts term-to-term translation dictionary.
Wikipedia titles  3K-10K*             CS-EN, DE-EN, FR-EN  Direct download, provided by Charles University in Prague.

* depending on the language pair.

Monolingual data

Corpus              Sentences   Tokens    Links           Notes
AACT                >3.1M       58.7M     EN              Direct download.
DrugBank            23K         826K      EN              Direct download.
GENIA               18K         557K      EN              Direct download.
GREC                1K          62K       EN              Direct download.
FMA                 150K        884K      EN              Direct download.
PatTR descriptions  1-1.5M*     38M-52M*  DE-EN, FR-EN    Direct download (the same source as for the parallel data above). The script extracts monolingual sentences from the descriptions section. It splits the data into in-domain and out-of-domain.
PIL                 20K         567K      EN              Direct download.
UMLS descriptions   3K-200K*    1K-6.3M*  ALL             Provided upon registration (the same source as for the parallel data above). The script extracts monolingual sentences from term descriptions.
Wikipedia articles  50K-562K*   2M-23M*   EN, CS, DE, FR  Direct download, provided by Charles University in Prague.

* depending on the language.


A set of scripts that extract plain text sentence-aligned parallel data and plain-text monolingual data for language modelling from the original packages can be downloaded here.
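As an illustration of the plain-text sentence-aligned format, the following is a minimal Python sketch of how such data might be loaded, assuming one sentence per line and line i of the source file aligned with line i of the target file. The function name `read_parallel` and the sample sentences are hypothetical and are not part of the provided scripts.

```python
import io
from itertools import zip_longest


def read_parallel(src_lines, tgt_lines):
    """Pair up two line-aligned streams (assumed: one sentence per line).

    Raises ValueError if the two sides have different numbers of lines,
    and skips pairs in which either side is empty after stripping.
    """
    pairs = []
    for lineno, (src, tgt) in enumerate(zip_longest(src_lines, tgt_lines), start=1):
        if src is None or tgt is None:
            raise ValueError(f"line count mismatch at line {lineno}")
        src, tgt = src.strip(), tgt.strip()
        if src and tgt:
            pairs.append((src, tgt))
    return pairs


if __name__ == "__main__":
    # Hypothetical in-memory data standing in for two extracted files.
    english = io.StringIO("The patient was given aspirin.\nTake one tablet daily.\n")
    german = io.StringIO("Der Patient erhielt Aspirin.\nNehmen Sie täglich eine Tablette.\n")
    for en, de in read_parallel(english, german):
        print(en, "|||", de)
```

With real files, the two arguments would be file objects opened with an explicit encoding such as UTF-8. Failing on a line count mismatch is a deliberate choice here, since a silent truncation would shift every later sentence pair out of alignment.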


For intrinsic evaluation (translation quality), convert your output files into the SGML format required by the NIST evaluation tool (see the instructions for the standard task here), and upload your translations of the khresmoi-summary and khresmoi-query test sets (any translation direction) to the

  1. Go to the website
  2. Create an account under the menu item Account -> Create Account.
  3. Go to Account -> upload/edit content, and follow the link "Submit a system run"

The first submitted run will be considered primary. Other runs (if any) will be considered contrastive.

For extrinsic evaluation (cross-lingual information retrieval quality), provide your translations of the clir-query test sets (from any language to English) with the 10 best distinct translations of each query, in the following format based on the NIST SGML. Each "seg" element has a new "rank" attribute, an integer value ranging from 1 to 10 that corresponds to the rank of the translation variant.

<seg id="1" rank="1">test translation variant 1</seg>
<seg id="1" rank="2">test translation variant 2</seg>
<seg id="1" rank="10">test translation variant 10</seg>
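The ranked format above could be produced from an n-best list along the following lines. This is a minimal Python sketch, not an official tool: the function name `format_nbest` is hypothetical, and it assumes the candidates arrive already sorted from best to worst, that "distinct" means distinct after whitespace normalisation, and that XML special characters should be escaped.

```python
from xml.sax.saxutils import escape


def format_nbest(seg_id, candidates, n=10):
    """Return up to n ranked <seg> lines for one query.

    candidates: translation variants, best first (assumption).
    Duplicates are dropped so that the ranks cover distinct variants only.
    """
    seen = set()
    lines = []
    for candidate in candidates:
        text = " ".join(candidate.split())  # collapse runs of whitespace
        if not text or text in seen:
            continue
        seen.add(text)
        rank = len(lines) + 1
        lines.append(f'<seg id="{seg_id}" rank="{rank}">{escape(text)}</seg>')
        if len(lines) == n:
            break
    return lines


if __name__ == "__main__":
    variants = ["heart attack symptoms", "heart attack symptoms", "signs of heart attack"]
    for line in format_nbest(1, variants):
        print(line)
```

If a system yields fewer than ten distinct variants for a query, this sketch simply emits fewer lines; whether that is acceptable is not stated here and should be checked with the organisers.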

Submit your results via email to Pavel Pecina. We reserve the right to evaluate a limited number of contrastive submissions from each participant.


Evaluation will be done automatically using common evaluation metrics. We expect the translated submissions to be in recased, detokenized, SGML format.
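For orientation, wrapping recased, detokenized output lines in SGML might look like the Python sketch below. It assumes the commonly used NIST-style layout of a `tstset` element containing `doc` elements with numbered `seg` children. The function name `wrap_sgml`, the attribute values, and the single-document structure are assumptions for illustration; a real submission must follow the official instructions and keep the document IDs and segment IDs of the source test set.

```python
from xml.sax.saxutils import escape, quoteattr


def wrap_sgml(lines, setid, srclang, trglang, docid, sysid):
    """Wrap translated lines in an assumed NIST-style SGML test-set layout.

    All lines are placed in a single <doc>; segment ids start at 1.
    """
    out = [
        f"<tstset setid={quoteattr(setid)} srclang={quoteattr(srclang)} "
        f"trglang={quoteattr(trglang)}>",
        f"<doc docid={quoteattr(docid)} sysid={quoteattr(sysid)}>",
    ]
    for seg_id, line in enumerate(lines, start=1):
        out.append(f'<seg id="{seg_id}">{escape(line.strip())}</seg>')
    out.append("</doc>")
    out.append("</tstset>")
    return "\n".join(out)


if __name__ == "__main__":
    # Hypothetical identifiers; only the test set name comes from this page.
    print(wrap_sgml(
        ["Take one tablet daily.", "Dose < 5 mg."],
        setid="khresmoi-summary", srclang="de", trglang="en",
        docid="doc1", sysid="my-system",
    ))
```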


Task announcement                 December 12, 2013
Release of development test sets  January 6, 2014
Release of test sets              March 10, 2014
Submission of translations        March 14, 2014
Submission of papers              April 1, 2014


You are invited to submit a report about your approach. Your submission report should highlight in which ways your own methods and data differ from the standard approaches.


We thank all the data providers for granting the license, especially Health on the Net Foundation for granting the license for the English general public queries, TRIP database for granting the license for the English medical expert queries, and the other data providers for providing the summary sentences. We thank the expert translators for translating the data.


For questions, comments, etc., please send an email to Pavel Pecina.

Supported by the European Commission
under the Khresmoi project
(grant number 257528).