Automatic Post-Editing task - EMNLP 2015 Tenth Workshop on Statistical Machine Translation

EMNLP 2015 TENTH WORKSHOP
ON STATISTICAL MACHINE TRANSLATION

Shared Task: Automatic Post-Editing

17-18 September 2015
Lisbon, Portugal

This shared task will examine automatic methods for correcting errors produced by an unknown machine translation (MT) system. Since the system itself is a "black box", automatic post-editing methods have to operate downstream (that is, after MT decoding), exploiting knowledge acquired from previous human post-editions provided as training material.

Goals

Automatic post-editing (APE) aims at improving MT output in black-box scenarios, in which the MT system is used "as is" and cannot be modified. From an application point of view, APE components would make it possible to:

Cope with systematic errors of an MT system whose decoding process is not accessible
Provide professional translators with improved MT output quality to reduce (human) post-editing effort
Adapt the output of a general-purpose system to the lexicon/style requested in a specific application domain

Task Description

In this pilot run of the shared task, we will provide you with training (source, target, human post-edition) triples, and you will return automatic post-editions for unseen (source, target) test pairs.

Data

Training and development data (the same data used for the Sentence-level Quality Estimation task) consist of 11,272 and 1,000 English-Spanish triples, respectively, in which:

The source is a tokenized English sentence from the NEWS domain, with a length of at least 4 tokens (that is, likely to be a grammatically correct full sentence)
The target is a tokenized Spanish translation of the source, produced by an unknown MT system
The human post-edition is a manual revision of the target, collected by means of the Unbabel crowd post-editing platform

Sources, targets and human post-editions are provided in separate files.
Download training and development data.

Test data consist of 1,817 tokenized (source, target) pairs with the same characteristics as the source and target sentences provided for training.
Download test data (the test set is now available).
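Because sources, targets and post-editions come in separate files, the first practical step is to pair them up line by line. The sketch below is illustrative only: it assumes one segment per line in UTF-8 files, and the function names and file names are hypothetical. It refuses to silently truncate misaligned files, and the same helper works for training triples and for test pairs.

```python
from typing import List, Sequence, Tuple


def align_lines(*columns: Sequence[str]) -> List[Tuple[str, ...]]:
    """Zip parallel line lists (e.g. source, target, post-edit) into one tuple per segment."""
    counts = sorted({len(column) for column in columns})
    if len(counts) != 1:
        # zip() would silently drop trailing lines, so fail loudly instead.
        raise ValueError(f"files are not line-aligned (line counts: {counts})")
    return list(zip(*columns))


def read_lines(path: str) -> List[str]:
    """Read one segment per line from a UTF-8 file."""
    # Iterating the file object splits only on \n, \r and \r\n, unlike
    # str.splitlines(), which would also split on characters such as U+2028.
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def read_aligned(*paths: str) -> List[Tuple[str, ...]]:
    """Read several parallel files and return their aligned segments."""
    return align_lines(*(read_lines(p) for p in paths))
```

For example, `read_aligned("train.src", "train.mt", "train.pe")` would return (source, target, post-edition) triples, and `read_aligned("test.src", "test.mt")` (source, target) pairs; these file names are placeholders, not the names used in the released archives.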

Any use of additional data for training your system is allowed (e.g. parallel corpora, post-edited corpora).

Evaluation

Systems' performance will be evaluated with respect to their ability to reduce the distance that separates an automatic translation from its human-revised version. This distance will be measured in terms of human-targeted TER (HTER).

HTER is normally calculated as the minimum edit distance between a machine translation and its manually post-edited version, normalized by the length of the post-edition, so that scores usually fall in [0,1]. In the APE task, it will instead be used to measure the edit distance between automatic and manual post-editions.
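As a worked illustration, assuming TER's usual edit operations (insertions, deletions, substitutions and block shifts, each costing one edit) and a single post-edited reference, the per-segment score can be written as below. In the APE setting the hypothesis is the automatic post-edition and the reference is the human post-edition; the Spanish sentences are invented for the example.

```latex
\mathrm{HTER} = \frac{N_{\mathrm{ins}} + N_{\mathrm{del}} + N_{\mathrm{sub}} + N_{\mathrm{shift}}}{N_{\mathrm{ref}}}

% Hypothesis:   "el gato negro duerme"          (4 tokens)
% Post-edition: "el gato negro está durmiendo"  (N_ref = 5 tokens)
% Minimum edits: substitute "duerme" -> "está", insert "durmiendo"
\mathrm{HTER} = \frac{1 + 0 + 1 + 0}{5} = 0.4
```

Since the numerator counts edits rather than matched words, a hypothesis much longer than its reference can score above 1, which is why [0,1] is only the usual range.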

Submitted runs will be ranked by their average HTER on the test set, computed with the tercom software.

IMPORTANT NEWS:
Each run will be evaluated in two modes: (i) case-insensitive and (ii) case-sensitive.

If participants specify it at the submission stage (see Submission Requirements), final results for a given run can be released for only one of the two modes.
Otherwise, final results will be released for both modes, as two separate scores.

In both cases, lower average HTER will correspond to a higher rank.

The evaluation scripts available for download allow participants to compute HTER scores in both modes.
Download the evaluation script.

The HTER calculated between the raw MT output and the human post-editions in the test set will be used as the baseline (i.e. the baseline is a system that leaves all test instances unmodified).
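The evaluation procedure above (average HTER, two case modes, lower is better, raw MT output as baseline) can be sketched in Python. This is a minimal illustration, not the official scoring script: it assumes the average is the mean of segment-level scores, implements case-insensitive mode by lowercasing, and omits tercom's block shifts, so it may overestimate the scores tercom reports. Run names and sentences are invented.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def _word_edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Word-level Levenshtein distance (no block shifts, unlike tercom)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(prev[j] + 1,              # delete h
                          curr[j - 1] + 1,          # insert r
                          prev[j - 1] + (h != r))   # substitute (or match)
        prev = curr
    return prev[-1]


def segment_hter(hyp: str, post_edit: str, case_sensitive: bool) -> float:
    """Edits needed to turn hyp into post_edit, divided by the post-edit length."""
    if not case_sensitive:
        hyp, post_edit = hyp.lower(), post_edit.lower()
    ref = post_edit.split()
    return _word_edit_distance(hyp.split(), ref) / len(ref)


def average_hter(outputs: List[str], post_edits: List[str], case_sensitive: bool) -> float:
    if len(outputs) != len(post_edits):
        raise ValueError("a run must cover exactly the test segments")
    return mean(segment_hter(h, r, case_sensitive) for h, r in zip(outputs, post_edits))


def rank_runs(runs: Dict[str, List[str]], post_edits: List[str],
              case_sensitive: bool) -> List[Tuple[str, float]]:
    """Lower average HTER means a higher rank, so sort in ascending order."""
    scores = {name: average_hter(out, post_edits, case_sensitive) for name, out in runs.items()}
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    post_edits = ["El gato duerme .", "Hola mundo ."]
    runs = {
        "baseline": ["el perro duerme", "Hola mundo ."],   # raw MT output, left unmodified
        "runA": ["el Gato duerme .", "Hola mundo ."],
        "runB": ["El gato duerme", "Hola mundo ."],
    }
    # Case-insensitive order: runA, runB, baseline. Case-sensitive order: runB, runA, baseline.
    print(rank_runs(runs, post_edits, case_sensitive=False))
    print(rank_runs(runs, post_edits, case_sensitive=True))
```

The toy data show why the two modes are reported separately: runA only gets the casing wrong, so it wins when case is ignored but falls behind runB when case counts.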

Submission Format

Your system should output automatic post-editions of the target sentences in the test set in the following format:

<METHOD NAME>   <SEGMENT NUMBER>   <APE SEGMENT>

Where:

METHOD NAME is the name of your automatic post-editing method.
SEGMENT NUMBER is the line number of the plain text target file you are post-editing.
APE SEGMENT is the automatic post-edition for the particular segment.

Each field should be delimited by a single tab character.
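A minimal sketch of writing and reading this tab-separated format. Function names are hypothetical, and it assumes segment numbers are 1-based line numbers of the target file and that no field may itself contain a tab or newline, since either would break the one-line-per-segment layout.

```python
from typing import Iterable, List, Tuple


def format_submission(method_name: str, ape_segments: Iterable[str]) -> str:
    """Build one METHOD NAME<TAB>SEGMENT NUMBER<TAB>APE SEGMENT line per segment."""
    if "\t" in method_name or "\n" in method_name:
        raise ValueError("method name must not contain a tab or newline")
    lines: List[str] = []
    # Assumption: segment numbers are 1-based line numbers of the target file.
    for number, segment in enumerate(ape_segments, start=1):
        if "\t" in segment or "\n" in segment:
            raise ValueError(f"segment {number} contains a tab or newline")
        lines.append(f"{method_name}\t{number}\t{segment}")
    return "".join(line + "\n" for line in lines)


def parse_submission_line(line: str) -> Tuple[str, int, str]:
    """Split one submission line back into (method name, segment number, APE segment)."""
    method_name, number, segment = line.rstrip("\n").split("\t", 2)
    return method_name, int(number), segment
```

For instance, `format_submission("pt_1_pruned", ["Hola mundo .", "El gato duerme ."])` yields two lines numbered 1 and 2, each with the method name in the first field.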

Submission Requirements

Each participating team can submit at most 3 systems, but must explicitly indicate which of them is its primary submission. If none of the runs is marked as primary, the latest submission received will be used as the primary submission.

Submissions should be sent via email to wmt-ape-submission@fbk.eu. Please use the following pattern to name your files:

INSTITUTION-NAME_METHOD-NAME_SUBTYPE_EVALTYPE, where:

INSTITUTION-NAME is an acronym/short name for your institution, e.g. "UniXY"

METHOD-NAME is an identifier for your method, e.g. "pt_1_pruned"

SUBTYPE indicates whether the submission is primary or contrastive, with two alternative values: PRIMARY, CONTRASTIVE.

EVALTYPE indicates whether the submission should be evaluated in only one of the two modes or in both, with three alternative values: INSENSITIVE, SENSITIVE, BOTH.

For instance, the name "UniXY_pt_1_pruned_PRIMARY_BOTH" could be used to indicate the primary submission from team UniXY, based on method "pt_1_pruned", to be evaluated in both case-insensitive and case-sensitive modes.
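The naming and primary-run rules can be checked mechanically before sending files. This is a hypothetical helper, not an official tool. Its assumptions: the institution name contains no underscores (method names such as "pt_1_pruned" may), an omitted EVALTYPE means BOTH, and a team marking more than one run PRIMARY is treated as an error, a case the rules above do not cover.

```python
from dataclasses import dataclass
from typing import List

SUBTYPES = {"PRIMARY", "CONTRASTIVE"}
EVALTYPES = {"INSENSITIVE", "SENSITIVE", "BOTH"}
MAX_RUNS_PER_TEAM = 3


@dataclass(frozen=True)
class SubmissionName:
    institution: str
    method: str
    subtype: str
    evaltype: str


def parse_submission_name(name: str) -> SubmissionName:
    """Split INSTITUTION-NAME_METHOD-NAME_SUBTYPE_EVALTYPE into its fields."""
    parts = name.split("_")
    # Assumption: a missing EVALTYPE means the run is scored in both modes.
    evaltype = parts.pop() if parts[-1] in EVALTYPES else "BOTH"
    if len(parts) < 3 or parts[-1] not in SUBTYPES:
        raise ValueError(f"not a valid submission name: {name!r}")
    subtype = parts.pop()
    # Assumption: the institution has no underscores, so everything between it
    # and SUBTYPE is the method name (e.g. "pt_1_pruned").
    return SubmissionName(parts[0], "_".join(parts[1:]), subtype, evaltype)


def pick_primary(names_in_arrival_order: List[str]) -> SubmissionName:
    """Return the run marked PRIMARY, or the latest run received if none is marked."""
    runs = [parse_submission_name(n) for n in names_in_arrival_order]
    if not 1 <= len(runs) <= MAX_RUNS_PER_TEAM:
        raise ValueError(f"a team must submit between 1 and {MAX_RUNS_PER_TEAM} runs")
    primaries = [r for r in runs if r.subtype == "PRIMARY"]
    if len(primaries) > 1:
        # Hypothetical policy: multiple primaries are rejected.
        raise ValueError("more than one run is marked PRIMARY")
    return primaries[0] if primaries else runs[-1]
```

Parsing from the right is the design choice that matters here: SUBTYPE and EVALTYPE come from closed vocabularies, so peeling them off the end leaves an unambiguous method name even when it contains underscores.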

You are also invited to submit a short paper (4 to 6 pages) to WMT describing your APE method(s). Submitting a paper is not required; if you choose not to, we ask you to provide an appropriate reference describing your method(s) that we can cite in the WMT overview paper.

Important dates

Release of training data	January 31, 2015
Test set distributed	April 27, 2015
Submission deadline	May 15, 2015
Paper submission deadline	June 28, 2015
Notification of acceptance	July 21, 2015
Camera-ready deadline	August 11, 2015

Organisers

Rajen Chatterjee (Fondazione Bruno Kessler)
Matteo Negri (Fondazione Bruno Kessler)
Marco Turchi (Fondazione Bruno Kessler)

Acknowledgements

All the APE task data are kindly provided by Unbabel.

Contact

Please send your questions, comments, etc. to wmt-ape@fbk.eu.
To stay up to date on this year's edition of the APE pilot task, you can also join the wmt-ape group.

Supported by the European Commission under the QT21 project (grant number 645452)
