This directory contains development data that may be useful for automatic evaluation metrics for statistical machine translation.

The files in the segment-rankings/ subdirectory contain segment-level rankings for each of the data conditions in the Workshop on Statistical Machine Translation at ACL 2007 (http://www.statmt.org/wmt07/). The files look like this:

% head -18 segment-rankings/cs-en.nc-test
1 cu = uedin
1 cu > pctranslator2007
1 uedin > pctranslator2007
1 umd > cu
1 umd > pctranslator2007
1 umd > uedin
14 cu > pctranslator2007
14 cu > uedin
14 cu > umd
14 uedin = umd
14 uedin > pctranslator2007
14 umd > pctranslator2007
16 cu = pctranslator2007
16 cu = uedin
16 cu = umd
16 pctranslator2007 = uedin
16 pctranslator2007 = umd
16 uedin = umd

The number indicates the segment being judged (indexed from 1, not zero). The text following the segment number gives the relative ranking of two systems. For instance, on the first segment the cu system was better than the pctranslator2007 system, equal to the uedin system, and worse than the umd system.

The system translations are provided in the submissions/ subdirectory.
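
The segment-ranking lines described above are easy to read programmatically. The following is a minimal Python sketch, not part of this distribution: the names PairwiseRank, parse_ranking_line and parse_rankings are hypothetical, and it assumes each line has exactly four whitespace-separated fields (segment number, system, relation, system). Only ">" and "=" appear in the excerpt above; accepting "<" as well is an assumption.

```python
from typing import Iterable, List, NamedTuple


class PairwiseRank(NamedTuple):
    segment: int   # 1-based segment number, as in the ranking files
    left: str      # system named before the relation
    relation: str  # ">" or "=" ("<" is also accepted, as an assumption)
    right: str     # system named after the relation


def parse_ranking_line(line: str) -> PairwiseRank:
    """Parse one line such as '1 cu > pctranslator2007'."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    segment, left, relation, right = fields
    if relation not in (">", "=", "<"):
        raise ValueError(f"unknown relation {relation!r} in {line!r}")
    return PairwiseRank(int(segment), left, relation, right)


def parse_rankings(lines: Iterable[str]) -> List[PairwiseRank]:
    """Parse an iterable of lines, skipping blank ones."""
    return [parse_ranking_line(line) for line in lines if line.strip()]


# Three of the lines shown in the excerpt above.
sample = """\
1 cu = uedin
1 cu > pctranslator2007
1 umd > cu
"""
for rank in parse_rankings(sample.splitlines()):
    print(rank)
```

To read a whole file, an open file object can be passed directly, for example parse_rankings(open("segment-rankings/cs-en.nc-test", encoding="utf-8")).
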
Here are the translations produced by the aforementioned four systems:

% head -1 submissions/cs-en/*
==> submissions/cs-en/wmt07.cu.nc-test.cs-en <==
Racially divided Europe

==> submissions/cs-en/wmt07.pctranslator2007.nc-test.cs-en <==
Racially fission Europe

==> submissions/cs-en/wmt07.uedin.nc-test.cs-en <==
A racially divided Europe

==> submissions/cs-en/wmt07.umd.nc-test.cs-en <==
A Racially Divided Europe

The corresponding reference segment is contained in the reference/ subdirectory:

% head -1 reference/nc-test2007.en
Europe's Divided Racial House

The source segment is in the source/ subdirectory:

% head -1 source/nc-test2007.cs
Rasově rozdělená Evropa

The rankings were produced by running the following script over the raw judgements file available at http://www.statmt.org/wmt07/judgements.gz

zcat judgements.gz | scripts/extract_segment_rank.perl | grep "WMT07 English-Czech News Commentary" | sort -n | cut -f1,3 > rankings/en-cz.nc-test

When there were multiple judgements for a pair of systems on a single segment, the script took the majority over the judgements.

The method for collecting these relative rankings of each segment is described in

@InProceedings{callisonburch-EtAl:2007:WMT,
  author    = {Callison-Burch, Chris and Fordyce, Cameron and Koehn, Philipp and Monz, Christof and Schroeder, Josh},
  title     = {(Meta-) Evaluation of Machine Translation},
  booktitle = {Proceedings of the Second Workshop on Statistical Machine Translation},
  month     = {June},
  year      = {2007},
  address   = {Prague, Czech Republic},
  publisher = {Association for Computational Linguistics},
  pages     = {136--158},
  url       = {http://www.aclweb.org/anthology/W/W07/W07-0218}
}

The system-rankings/ subdirectory contains system rankings which are based on the total number of times that one system's segments are ranked higher than another's.
Here's how those scores were calculated:

zcat ~/Downloads/judgements.gz | perl scripts/calculate_system_rank.perl

If you have any questions about any of this data, feel free to contact Chris Callison-Burch (http://cs.jhu.edu).
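
As a rough illustration of the two aggregation steps mentioned above (taking the majority over repeated judgements, and counting how often each system is ranked higher), here is a small Python sketch. It is not a port of the Perl scripts, whose internals are not shown here. The function names are hypothetical, the input is assumed to be (segment, system, relation, system) tuples, and recording a tied vote as "=" is an assumption. Whether the actual system rankings are computed from raw or majority-collapsed judgements is likewise not stated here, so the two functions are kept separate.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# (segment, system_a, relation, system_b); relation is ">", "<" or "=".
Judgement = Tuple[int, str, str, str]


def canonical(judgement: Judgement) -> Judgement:
    """Order the two systems alphabetically so that repeated
    judgements of the same pair can be grouped together."""
    seg, a, rel, b = judgement
    if a <= b:
        return seg, a, rel, b
    flipped = {">": "<", "<": ">", "=": "="}[rel]
    return seg, b, flipped, a


def majority(judgements: Iterable[Judgement]) -> List[Judgement]:
    """Keep the most common relation per (segment, system pair).

    Assumption: a tie between the most common relations becomes "=".
    """
    votes: Dict[Tuple[int, str, str], Counter] = {}
    for seg, a, rel, b in map(canonical, judgements):
        votes.setdefault((seg, a, b), Counter())[rel] += 1
    result = []
    for (seg, a, b), counts in sorted(votes.items()):
        ranked = counts.most_common()
        top_rel, top_n = ranked[0]
        if len(ranked) > 1 and ranked[1][1] == top_n:
            top_rel = "="  # tied vote (assumed behaviour)
        result.append((seg, a, top_rel, b))
    return result


def count_wins(judgements: Iterable[Judgement]) -> Counter:
    """Count, per system, the pairwise comparisons it won."""
    wins: Counter = Counter()
    for _seg, a, rel, b in judgements:
        wins[a] += 0  # make sure systems with no wins still appear
        wins[b] += 0
        if rel == ">":
            wins[a] += 1
        elif rel == "<":
            wins[b] += 1
    return wins


# The six segment-1 lines from segment-rankings/cs-en.nc-test.
segment_1 = [
    (1, "cu", "=", "uedin"),
    (1, "cu", ">", "pctranslator2007"),
    (1, "uedin", ">", "pctranslator2007"),
    (1, "umd", ">", "cu"),
    (1, "umd", ">", "pctranslator2007"),
    (1, "umd", ">", "uedin"),
]
wins = count_wins(majority(segment_1))
for system, n in wins.most_common():
    print(system, n)
```

Ordering each pair alphabetically before voting means that "umd > cu" and "cu < umd" are counted as the same judgement.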