Shared Task: Automatic Post-Editing

Latest updates

  • August 25, 2020: Results of the automatic evaluation for all submissions have been released (see the Results section below).
  • May 30, 2020: The system submission deadline has been extended. Please refer to the Important dates section below.


    The sixth round of the APE shared task follows the success of the previous rounds organised from 2015 to 2019. The aim is to examine automatic methods for correcting errors produced by an unknown machine translation (MT) system. This has to be done by exploiting knowledge acquired from human post-edits, which are provided as training material.


    The aim of this task is to improve MT output in black-box scenarios, in which the MT system is used "as is" and cannot be modified. From the application point of view, APE components would make it possible to:

    Task Description

    This year the task will use Wikipedia data for the English --> German and English --> Chinese language pairs. In these datasets, the source sentences have been translated into the target language by a state-of-the-art neural MT system unknown to the participants (in terms of system configuration) and then manually post-edited. This dataset is shared by both the Automatic Post-Editing and the Quality Estimation shared tasks.

    At the training stage, the collected human post-edits have to be used to learn correction rules for the APE systems. At the test stage, they will be used for system evaluation with automatic metrics (TER and BLEU).


    Compared to the previous round, the main differences are:


    Training, development and test data consist of (source, target, post-edit) triplets. The source sentences come from the English Wikipedia. The target sentences are automatic translations into either German (English --> German sub-task) or Chinese (English --> Chinese sub-task). The English --> German data is already truecased and tokenized with the Moses scripts (using the '-no-escape' argument). Similarly, the English side of the English --> Chinese language pair is tokenized with Moses, while the Chinese side is tokenized with the jieba tokenizer. The post-edits are human revisions of the target elements.
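
    As an illustration of the triplet structure, the sketch below pairs up three line-aligned inputs. It assumes that the source, target and post-edit sides come as parallel sequences of lines (one segment per line); the names Triplet and read_triplets are hypothetical and not part of the released data or tools.

```python
from typing import Iterable, List, NamedTuple


class Triplet(NamedTuple):
    """One (source, target, post-edit) training instance."""
    source: str      # English source sentence
    target: str      # raw MT output (German or Chinese)
    post_edit: str   # human revision of the MT output


def read_triplets(src_lines: Iterable[str],
                  mt_lines: Iterable[str],
                  pe_lines: Iterable[str]) -> List[Triplet]:
    """Pair up three line-aligned inputs into triplets.

    Assumption: the three inputs are parallel, one segment per line.
    The data is already tokenized, so lines are kept as-is apart from
    stripping the trailing newline.
    """
    src = [line.rstrip("\n") for line in src_lines]
    mt = [line.rstrip("\n") for line in mt_lines]
    pe = [line.rstrip("\n") for line in pe_lines]
    if not (len(src) == len(mt) == len(pe)):
        raise ValueError(
            f"misaligned inputs: {len(src)} source, {len(mt)} target, "
            f"{len(pe)} post-edit lines"
        )
    return [Triplet(s, t, p) for s, t, p in zip(src, mt, pe)]
```

    Any iterable of lines works as input, for example three open file handles.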

    To download the data, click on the links in the table below:

    Language pair          Data                                       Additional Resource
    English --> German     train, dev, test, test_with_gold_labels    artificial training data+, eSCAPE Corpus*
    English --> Chinese    train, dev, test, test_with_gold_labels

    +: This training data was created and used in "Log-linear Combinations of Monolingual and Bilingual Neural Machine Translation Models for Automatic Post-Editing"

    *: This corpus was created and used in "eSCAPE: a Large-scale Synthetic Corpus for Automatic Post-Editing". It contains data generated by both PBSMT and NMT systems.

    Any use of additional data for training your system is allowed (e.g. parallel corpora, post-edited corpora).

    Data Citation

    Please cite the following paper if you use the datasets released in this shared task:
    (will be added during the camera-ready period)


    Systems' performance will be evaluated with respect to their capability to reduce the distance that separates an automatic translation from its human-revised version.

    Such distance will be measured in terms of TER, which will be computed between automatic and human post-edits in case-sensitive mode.

    BLEU will also be considered as a secondary evaluation metric. To gain further insights into final output quality, a subset of the outputs of the submitted systems will also be manually evaluated, as in previous rounds.

    The submitted runs will be ranked based on the average HTER calculated on the test set by using the tercom software.

    The HTER calculated between the raw MT output and human post-edits in the test set will be used as baseline (i.e. the baseline is a system that leaves all the test instances unmodified).
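
    The comparison against the "do-nothing" baseline can be sketched as follows. This is not the tercom implementation: it uses a plain word-level edit distance (insertions, deletions, substitutions) and ignores the block shifts that TER also counts, so the scores only approximate what the official script would report. Averaging per-sentence scores is likewise an assumption about how the average is taken. All function names are hypothetical.

```python
from typing import List, Sequence


def word_edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Word-level Levenshtein distance (no shifts, unlike real TER)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1  # case-sensitive token comparison
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def approx_ter(hyp: str, ref: str) -> float:
    """Edits divided by reference length, for one tokenized sentence."""
    ref_tokens = ref.split()
    if not ref_tokens:
        raise ValueError("empty reference")
    return word_edit_distance(hyp.split(), ref_tokens) / len(ref_tokens)


def average_score(hyps: List[str], refs: List[str]) -> float:
    """Mean of per-sentence scores (assumed aggregation)."""
    if not hyps or len(hyps) != len(refs):
        raise ValueError("hypotheses and references must be parallel and non-empty")
    return sum(approx_ter(h, r) for h, r in zip(hyps, refs)) / len(hyps)


# Toy data: the baseline leaves the MT output unmodified.
mt_output = ["das ist ein Test"]
ape_output = ["das ist ein Test ."]
post_edits = ["das ist ein Test ."]

baseline = average_score(mt_output, post_edits)
system = average_score(ape_output, post_edits)
improved = system < baseline  # lower is better
```

    A system improves over the baseline when its score is lower, since the metric measures the distance from the human post-edit.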

    The evaluation script can be downloaded here

    Submission Format

    Your system should produce automatic post-edits of the target sentences in the test set in the following format (columns are tab-separated):


    Where: Each field should be delimited by a single tab character.
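
    A quick sanity check of the tab delimiting could look like the sketch below. The number of fields per line is passed in as a parameter, because the exact columns are not reproduced here; the function name check_tab_delimited is hypothetical, and treating an empty field as a problem is a heuristic for spotting doubled tabs, not an official rule.

```python
from typing import Iterable, List, Tuple


def check_tab_delimited(lines: Iterable[str],
                        expected_fields: int) -> List[Tuple[int, str]]:
    """Return (line number, message) pairs for lines that look malformed.

    expected_fields is supplied by the caller; this sketch does not
    assume what the individual columns contain.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != expected_fields:
            problems.append(
                (number,
                 f"expected {expected_fields} fields, found {len(fields)}")
            )
        elif any(field == "" for field in fields):
            # Heuristic: an empty field often means a doubled tab.
            problems.append((number, "empty field"))
    return problems
```

    An empty result means every line has the expected number of non-empty, tab-separated fields.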

    Submission Requirements

    Each participating team can submit at most 2 systems, but they have to explicitly indicate which of them represents their primary submission. In the case that none of the runs is marked as primary, the latest submission received will be used as the primary submission.

    Submissions should be sent via email. Please use the following pattern to name your files:


    INSTITUTION-NAME is an acronym/short name for your institution, e.g. "UniXY"

    METHOD-NAME is an identifier for your method, e.g. "pt_1_pruned"

    SUBTYPE indicates whether the submission is primary or contrastive with the two alternative values: PRIMARY, CONTRASTIVE.
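
    The rule for determining the primary submission can be sketched as follows. The Submission record and the pick_primary function are hypothetical. Rejecting a team that marks both runs as PRIMARY is an assumption, since the rules above do not cover that case.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Submission:
    method_name: str    # METHOD-NAME, e.g. "pt_1_pruned"
    subtype: str        # "PRIMARY" or "CONTRASTIVE"
    received: datetime  # when the organisers received the run


def pick_primary(submissions: List[Submission]) -> Submission:
    """Select the run that counts as a team's primary submission."""
    if not submissions:
        raise ValueError("no submissions")
    if len(submissions) > 2:
        raise ValueError("each team can submit at most 2 systems")
    for sub in submissions:
        if sub.subtype not in ("PRIMARY", "CONTRASTIVE"):
            raise ValueError(f"unknown SUBTYPE: {sub.subtype}")
    primaries = [s for s in submissions if s.subtype == "PRIMARY"]
    if len(primaries) > 1:
        # Assumption: the task rules do not say how to resolve this.
        raise ValueError("more than one run marked as PRIMARY")
    if primaries:
        return primaries[0]
    # No run marked as primary: the latest submission received is used.
    return max(submissions, key=lambda s: s.received)
```

    For example, two CONTRASTIVE runs received on different days resolve to the later one.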

    You are also invited to submit a short paper (4 to 6 pages) to WMT describing your APE method(s). You are not required to submit a paper if you do not want to. In that case, we ask you to give an appropriate reference describing your method(s) that we can cite in the WMT overview paper.


    The official results of the 2020 APE shared task will be available here

    Important dates

    Event                                       Original date           Revised date
    Release of training and development data    March 28, 2020
    Release of test data                        June 8, 2020            July 8, 2020
    APE system submission deadline              June 15, 2020           July 15, 2020
    Manual evaluation                           June 2020               July 2020
    Paper submission deadline                   July 15, 2020           August 15, 2020
    Notification of acceptance                  August 17, 2020         September 29, 2020
    Camera-ready deadline                       August 31, 2020         October 10, 2020
    Conference                                  November 11-12, 2020    November 19-20, 2020


    Rajen Chatterjee (Apple Inc.)
    Matteo Negri (Fondazione Bruno Kessler)
    Marco Turchi (Fondazione Bruno Kessler)


    For any information or question about the task, please send an email
    To stay up to date with this year's edition of the APE task, you can also join the wmt-ape group.