ACL 2008 THIRD WORKSHOP
ON STATISTICAL MACHINE TRANSLATION

Shared Task: Automatic Evaluation of Machine Translation

June 19, in conjunction with ACL 2008 in Columbus, Ohio


The shared evaluation task of the workshop will examine automatic evaluation metrics for machine translation. We will provide all of the translations produced in the shared translation task, as well as the reference translations. You will return rankings for each of the translations at the system level and/or at the sentence level. We will calculate the correlation of your rankings with the human evaluation once it is completed.
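As a rough illustration of how such a correlation can be computed, the Python sketch below calculates Spearman's rank correlation between a metric's ranking of systems and a human ranking. The system rankings and the tie-free formula are hypothetical; the official scoring procedure is determined by the organizers.

# Minimal sketch: Spearman rank correlation between a metric's ranking of
# systems and a human ranking (hypothetical data; ties are not handled).

def spearman_rho(rank_a, rank_b):
    """Spearman's rho for two rankings without ties (rank 1 = best)."""
    assert len(rank_a) == len(rank_b)
    n = len(rank_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d_squared / (n * (n ** 2 - 1))

# Hypothetical rankings of four systems, by the metric and by the judges.
metric_rank = [1, 2, 3, 4]
human_rank = [2, 1, 3, 4]
print(spearman_rho(metric_rank, human_rank))  # 0.8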

Goals

The goals of the shared evaluation task are to assess how well automatic evaluation metrics agree with human judgments of translation quality, at both the system level and the sentence level, and to compare the performance of different metrics against one another.

Submission Format

Once we receive the system outputs from the shared translation task we will post all of the system translations, along with source documents and reference translations, for you to evaluate with your metric. The translations will be available in two formats:

You can use either format as input to your software, which should produce scores for the translations at the system level, at the segment level, or (preferably) both.

Output file format for system-level rankings

The output files for system-level rankings should be formatted in the following way:

<TEST SET>   <SYSTEM>   <SYSTEM LEVEL SCORE>
where each field is delimited by a single tab character.
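A minimal Python sketch of producing a file in this format is shown below; the test-set name, system identifiers, and scores are hypothetical placeholders.

# Minimal sketch: write system-level scores in the tab-delimited format above.
# The test-set name, system identifiers, and scores are hypothetical.
system_scores = {
    "systemA": 0.4213,
    "systemB": 0.3987,
}

out = open("my-metric.system-level.scores", "w")
for system, score in sorted(system_scores.items()):
    out.write("newstest2008\t%s\t%.4f\n" % (system, score))
out.close()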

Output file format for segment-level rankings

The output files for segment-level rankings should be formatted in the following way:

<TEST SET>   <SYSTEM>   <DOCUMENT ID>   <SEGMENT ID>   <SEGMENT SCORE>
where each field is delimited by a single tab character.
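The segment-level file can be produced in the same way; in the sketch below the identifiers and scores are again hypothetical placeholders.

# Minimal sketch: write segment-level scores in the tab-delimited format above.
# All identifiers and scores are hypothetical.
segment_scores = [
    # (test set, system, document id, segment id, score)
    ("newstest2008", "systemA", "doc1", 1, 0.71),
    ("newstest2008", "systemA", "doc1", 2, 0.58),
]

out = open("my-metric.segment-level.scores", "w")
for test_set, system, doc_id, seg_id, score in segment_scores:
    out.write("%s\t%s\t%s\t%d\t%.4f\n" % (test_set, system, doc_id, seg_id, score))
out.close()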

The output file formats are identical to those that will be used in the NIST workshop on evaluation metrics for machine translation, which will be held at AMTA this year.

Development Data

System-level and segment-level development data is available for all of the language pairs featured in last year's workshop. The development data was compiled from the sentence-level rankings collected in last year's manual evaluation. You are welcome to create customized development data from the raw results of last year's human evaluation.
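One way to use the development data is to check how often a metric's segment scores reproduce the human pairwise preferences derived from the sentence-level rankings. The Python sketch below assumes a hypothetical pair list and score table; consult the released development data for the actual file format.

# Rough sketch: fraction of human pairwise preferences that a metric's
# segment-level scores reproduce (a simple consistency check).
# The pair list and score table are hypothetical.

# (segment id, system judged better, system judged worse)
human_pairs = [
    (1, "systemA", "systemB"),
    (2, "systemB", "systemA"),
]

# Metric scores indexed by (segment id, system); higher is better here.
metric_scores = {
    (1, "systemA"): 0.71, (1, "systemB"): 0.58,
    (2, "systemA"): 0.64, (2, "systemB"): 0.69,
}

agree = sum(1 for seg, better, worse in human_pairs
            if metric_scores[(seg, better)] > metric_scores[(seg, worse)])
print("consistency = %.2f" % (agree / float(len(human_pairs))))  # 1.00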

Dates

March 29: System translations released (tar file: wmt08-eval.tar.gz)
April 4: Deadline for short paper submissions (4 pages)
April 9 (extended): Deadline for submitting rankings (by email to ccb@cs.jhu.edu)

supported by the EuroMatrix project, P6-IST-5-034291-STP
funded by the European Commission under Framework Programme 6