EMNLP 2011 SIXTH WORKSHOP
ON STATISTICAL MACHINE TRANSLATION

Shared Task: System Combination

July 30 - 31, 2011
Edinburgh, UK

The system combination task of the workshop will focus on processing all of the system translations produced in the translation task. You will be provided with the submissions of all entrants to this year's translation task, split into tuning and testing sets, as well as references for the tuning portion of the data. You will be asked to return your combination of translations of the test set.

Goals

The goals of the system combination task are:

We hope that both newcomers and established research groups will participate in this task, with either novel or well-established combination techniques. We welcome everything from simple translation output selection to advanced consensus decoding techniques. As with the shared translation task, participants agree to contribute about eight hours of work to the manual evaluation.
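
As a concrete starting point, here is a minimal sketch (in Python, with whitespace tokenization and illustrative file names, none of which are prescribed by the task) of simple sentence-level output selection: for each source sentence, it picks the system output that shares the most n-grams with the other systems' outputs, a crude consensus heuristic rather than any particular participant's method.

    from collections import Counter

    def ngrams(tokens, n):
        # Counter of all n-grams in a token list
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    def agreement(hyp, others, max_n=4):
        # Score a hypothesis by its n-gram overlap with the other systems' outputs
        hyp_tokens = hyp.split()
        score = 0.0
        for other in others:
            other_tokens = other.split()
            for n in range(1, max_n + 1):
                score += sum((ngrams(hyp_tokens, n) & ngrams(other_tokens, n)).values())
        return score

    def select(system_outputs):
        # system_outputs: one list of sentences per system, all line-aligned
        selected = []
        for hyps in zip(*system_outputs):
            best = max(hyps, key=lambda h: agreement(h, [x for x in hyps if x is not h]))
            selected.append(best)
        return selected

    if __name__ == "__main__":
        import sys
        # Usage (illustrative): python select.py sysA.en sysB.en sysC.en > combined.en
        outputs = [open(path, encoding="utf-8").read().splitlines() for path in sys.argv[1:]]
        for line in select(outputs):
            print(line)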

Task Description

The system combination shared task builds on the individual submissions to the shared translation task. Evaluation of system combination entries will be similar to that of translation entries, with both human judgements and automatic metrics. In last year's human evaluation, the system combinations were compared to individual systems. In this year's human evaluation, we will only compare system combinations to other system combinations and individual systems to other individual systems. The rationale is that it will be easier to achieve statistical significance if we reduce the number of entries in the n^2 pairwise comparisons.

The translation task submissions from the last three years have been made available as early-release training data. Obviously, this data was generated by different systems than those that will be submitted this year, so it may not be useful for tuning system-specific feature weights. Since there is a single test set across all language pairs this year, we will split the system outputs we receive into two sets: a tuning set of approximately 500 lines and a test set of approximately 2,500 lines. This will give system combination entrants the opportunity to learn weights for this year's systems.
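
For illustration only, a small sketch of the kind of split described above, assuming plain line-aligned text files and taking the first 500 lines as the tuning portion; the actual split will be distributed by the organisers, so you will not need to produce it yourself.

    # Sketch of the tuning/test split described above, for line-aligned files.
    # The 500/2500 sizes come from the task description; taking the *first* 500
    # lines as the tuning portion is an assumption for illustration only.

    def split_tuning_test(path, tuning_size=500):
        # Split one line-aligned file into a tuning part and a test part
        with open(path, encoding="utf-8") as f:
            lines = f.read().splitlines()
        return lines[:tuning_size], lines[tuning_size:]

    if __name__ == "__main__":
        # Hypothetical file name; apply the same split to every system's output
        # and to the source/reference files so everything stays line-aligned.
        tune, test = split_tuning_test("newstest2011.sysA.de-en.txt")
        print(len(tune), "tuning lines,", len(test), "test lines")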

System combination submissions will be accepted in all translation tasks for which we receive two or more entrants. We are evaluating in both directions on the following language pairs:

We will also provide the individual system outputs from the featured translation task of translating Haitian Creole SMS messages into English.

Any of the training data provided for the translation task can be used to train language models or other components needed for your system combination approach. We also allow unconstrained entries, taking special note of the following information from the translation task: your submission report should highlight which data you used for training and what unconstrained resources you used. We may break down submitted results into different tracks based on which resources were used. You may submit contrastive runs to demonstrate different techniques or variants of your system, but we cannot guarantee that contrastive systems will receive human evaluation scores.

Training Data

Any of the training data posted for the shared translation task may be used.

Development Data

Any of the development data posted for the shared translation task may be used. We also release the translation task submissions from the last three years (the data can be downloaded below). While this prior data won't be useful for tuning submission-specific weights for the 2011 competition, we are providing these sets as a way to jump-start the training of combination systems.

Test Data

Once we have received and processed all submissions to the translation task, we will split the data into approximately 500 lines of tuning data and 2,500 lines of test data, and provide sources and references for the tuning set. At that point you can use the tuning data to refine your weights if necessary and return your system combination entry. Note that we only require 1-best submissions from translation task entrants, so your system combination method should not rely on n-best data. We will request that participants provide n-best output if they can, but there is no guarantee that this data will be available. If your group is also participating in the translation task, please encourage them to send their n-best output to pkoehn@inf.ed.ac.uk.
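
As an illustration of one way to use the tuning portion, the following sketch estimates a simple per-system weight (average sentence-level unigram F1 against the tuning references) and then performs weighted 1-best selection on the test portion. The scoring function and selection rule are assumptions made for the sake of the example, not a recommended method.

    from collections import Counter

    def unigram_f1(hyp, ref):
        # Harmonic mean of unigram precision and recall against one reference
        h, r = Counter(hyp.split()), Counter(ref.split())
        overlap = sum((h & r).values())
        if overlap == 0:
            return 0.0
        p, q = overlap / sum(h.values()), overlap / sum(r.values())
        return 2 * p * q / (p + q)

    def learn_weights(tuning_outputs, references):
        # tuning_outputs: one list of sentences per system, aligned with references
        return [
            sum(unigram_f1(h, r) for h, r in zip(sys_out, references)) / len(references)
            for sys_out in tuning_outputs
        ]

    def combine(test_outputs, weights):
        # Per line, pick the hypothesis of the highest-weighted system that
        # produced a non-empty translation (weighted 1-best selection)
        ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
        combined = []
        for hyps in zip(*test_outputs):
            pick = next((hyps[i] for i in ranked if hyps[i].strip()), hyps[ranked[0]])
            combined.append(pick)
        return combined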

The test data will be released on March 11.

Download

Evaluation

Evaluation will be done both automatically and by human judgement.
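
For a rough idea of the automatic side, below is a minimal single-reference corpus BLEU sketch in Python. The official automatic scores will be computed with the organisers' own evaluation scripts, so treat this only as a local sanity check; whitespace tokenization is assumed.

    import math
    from collections import Counter

    def ngram_counts(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    def corpus_bleu(hypotheses, references, max_n=4):
        # hypotheses, references: aligned lists of whitespace-tokenized sentences
        matches = [0] * max_n
        totals = [0] * max_n
        hyp_len = ref_len = 0
        for hyp, ref in zip(hypotheses, references):
            h, r = hyp.split(), ref.split()
            hyp_len += len(h)
            ref_len += len(r)
            for n in range(1, max_n + 1):
                hc, rc = ngram_counts(h, n), ngram_counts(r, n)
                matches[n - 1] += sum((hc & rc).values())   # clipped n-gram matches
                totals[n - 1] += max(len(h) - n + 1, 0)
        if 0 in totals or 0 in matches:
            return 0.0
        log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
        bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
        return bp * math.exp(log_prec)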

Dates

Translations released for system combination (download tarball): March 25, 2011
System combination deadline: April 1, 2011
Paper submission deadline: May 19, 2011

supported by the EuroMatrixPlus project
FP7-IST-231720-STP
funded by the European Commission
under Framework Programme 7