Shared Task: Machine Translation Evaluation (MetricsMATR)
July 15-16, in conjunction with ACL 2010 in Uppsala, Sweden
This is a brief summary of the shared evaluation task, MetricsMATR10. NIST plans to make the official, detailed evaluation plan available in early January. Please check back here or on NIST's MetricsMATR page for a link to the plan. That page also has general information on NIST's Metrics for Machine Translation challenge. Contact firstname.lastname@example.org with any questions.
The shared evaluation task of this year's workshop will be implemented as NIST's second Metrics for Machine Translation (MetricsMATR) challenge, MetricsMATR10. MetricsMATR evaluates innovative automatic metrics of machine translation quality by comparing their scores with human assessments of the quality of the machine translation.
Joining the shared evaluation task of WMT with the MetricsMATR challenge will allow developers to focus on metric development for one evaluation rather than two similar ones. Combining the WMT and MetricsMATR data resources will also allow the metrics to be tested on a larger amount of data.
Evaluation data for MetricsMATR consists of a set of machine translations with their corresponding reference translations and a set of human assessments of the quality of the MT. This year's evaluation data will include:
- The MetricsMATR08 evaluation set from the last MetricsMATR evaluation (described in the evaluation plan, section 2.2)
- Possibly new MetricsMATR evaluation data with new and/or different human assessments
- Output from the WMT10 shared translation task for all of the WMT10 language pairs along with the corresponding human assessments collected for WMT10
Participants will submit their metric(s) as a software package to NIST. NIST will install the metrics, score the evaluation data with them, and correlate the automatic metric scores with the human assessments.
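The correlation step described in this section can be sketched as follows. This is only an illustration: the text here does not say which correlation coefficients NIST computes, so the choice of Pearson and Spearman is an assumption, and the score lists are made-up placeholder values, not evaluation data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical per-segment scores (placeholders, not real results).
metric_scores = [0.31, 0.45, 0.27, 0.52, 0.38]
human_scores = [3, 4, 2, 6, 5]

print("Pearson: ", pearson(metric_scores, human_scores))
print("Spearman:", spearman(metric_scores, human_scores))
```

A rank-based coefficient such as Spearman's is often preferred when human assessments are ordinal, since it does not assume a linear relationship between the two score scales.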
The MetricsMATR08 development set (see the evaluation plan, section 2.1) will be provided again. WMT08 and WMT09 data, including system output, source data, human reference translations, and human assessments, will be available for metric development purposes as well.
The input files containing the translation data to be scored by the submitted metrics will be in XML format. Metrics must be able to handle this format; those that cannot will not be accepted. The file format is defined by the current version of NIST's MT evaluation XML DTD, ftp://jaguar.ncsl.nist.gov/mt/resources/mteval-xml-v1.4.dtd.
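As a rough illustration of reading such input, the sketch below parses a small XML sample with Python's standard library. The element and attribute names used here (`mteval`, `tstset`, `doc`, `seg`, `sysid`, `docid`, `id`) and the sample content are assumptions made for the example; the DTD named above is the authoritative definition of the format.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input; names and values are illustrative only.
SAMPLE = """
<mteval>
  <tstset setid="example_set" srclang="Czech" trglang="English" sysid="system_A">
    <doc docid="doc_01" genre="nw">
      <seg id="1">This is the first translated segment .</seg>
      <seg id="2">This is the second one .</seg>
    </doc>
  </tstset>
</mteval>
"""


def read_segments(xml_text):
    """Return a list of (sysid, docid, segid, text) tuples from the XML text."""
    root = ET.fromstring(xml_text.strip())
    segments = []
    for tstset in root.iter("tstset"):
        sysid = tstset.get("sysid")
        for doc in tstset.findall("doc"):
            docid = doc.get("docid")
            for seg in doc.findall("seg"):
                # itertext() also collects text nested inside child elements.
                text = "".join(seg.itertext()).strip()
                segments.append((sysid, docid, seg.get("id"), text))
    return segments


for sysid, docid, segid, text in read_segments(SAMPLE):
    print(sysid, docid, segid, text, sep="\t")
```

Keying each segment by system, document, and segment identifiers makes it straightforward to line up system output with the matching reference segments before scoring.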
The format for the output files containing the metric scores remains unchanged from MetricsMATR08. It is described in the evaluation plan, section 2.3.2.
All metric developers participating in MetricsMATR10 must submit an informal metric description to NIST via e-mail to email@example.com. These descriptions will be made available to the workshop participants, but they will not be published in the ACL 2010 proceedings. Additionally, developers are encouraged to submit a short (4-6 pages) paper outlining their metric(s) for the workshop, following the ACL short paper guidelines and dates.
A section of the joint workshop will be dedicated to MetricsMATR. NIST will provide an evaluation overview and an overview of the metrics submitted, and report on the correlations found with human assessments. Select metrics will be presented in more detail by their developers.
January 11: MetricsMATR08 development data set re-release for MetricsMATR10
March 26: Metric submission commitment due at NIST
March 26-May 14: Metric submission period; metrics must be installed and operational at NIST by May 14, 5pm EDT
April 23: Official ACL short metric papers due via ACL website (online, 4-6 pages)
May 14: Official ACL short metric paper acceptance notification
June 18: Informal metric descriptions due at NIST (mandatory)
July 15-16: Workshop
The results of the evaluation are reported in the workshop overview paper.
Supported by the EuroMatrixPlus project, funded by the European Commission under Framework Programme 7.