| Event | Date |
|---|---|
| Release of the MT system to tune | |
| Submission deadline for tuning task | |
| Start of manual evaluation period | May 2, 2016 |
| Paper submission deadline | |
| End of manual evaluation | May 22, 2016 |
| Notification of acceptance | June 5, 2016 |
| Camera-ready deadline | June 22, 2016 |
The WMT16 tuning task is similar to last year's tuning task. We provide the participants with a complete SMT model for English-to-Czech and Czech-to-English translation (i.e. one file with all the model files for each translation direction) and a devset. A designated Moses GitHub revision will be used to run this model.
Participants are expected to incorporate their evaluation metric into the Moses scorer and then apply whichever Moses optimizer they like, or use any other tuning tricks, to arrive at their weight settings.
A submission to the tuning task consists of an updated version of the moses.ini file, an optional weights file for sparse features and, optionally, your outputs on the official test set.
We will run the designated Moses revision using your moses.ini to obtain your MT outputs. (Note that the evaluation metric and any tricks you used during tuning are neither needed nor used for this run.) The outputs will be manually ranked using the same scheme as in the main translation task.
For each run submitted to the tuning task, the team commits to joining the WMT manual evaluation and annotating at least 100 HITs (i.e. 300 5-way comparisons). This contribution to the manual evaluation can be made in whichever language pair you are able to evaluate and where help is needed most.
You are invited to submit a short paper (4 to 6 pages) describing your tuning technique. Submitting a paper is not required. If you do not submit one, please provide a description of your method (a few paragraphs) or a suitable reference that we can include or cite in the overview paper.
You can download the models to tune from the following locations:
To run your optimized configurations, we will use Moses GitHub revision 2d6f616 unless we run into some unexpected bug.
Details you may want to compare with your setup:
| Set | Side | English -> Czech | Czech -> English |
|---|---|---|---|
| Tuning set (newstest2015), prepared like the models | source | newstest2015.tok.en.gz | newstest2015.tok.cs.gz |
| | reference | newstest2015.tok.cs.gz | newstest2015.tok.en.gz |
| Sanity Check set (newstest2014), prepared like the models | source | newstest2014.tok.en.gz | newstest2014.tok.cs.gz |
| | reference | newstest2014.tok.cs.gz | newstest2014.tok.en.gz |
The BLEU score reported on the Sanity Check set is computed on the lowercased, tokenized text.
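As a rough way to compare your own sanity-check numbers with the reported ones, the sketch below computes corpus-level BLEU on lowercased, tokenized text with a single reference per sentence. It is a plain re-implementation of standard BLEU (clipped n-gram precision up to 4-grams, geometric mean, brevity penalty), not the scoring script used by the organizers, which this page does not name; small differences from the official numbers are therefore possible. The function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu_lowercased(hypotheses, references, max_n=4):
    """Corpus-level BLEU with one reference per sentence.

    Both sides are assumed to be tokenized already (space-separated);
    they are lowercased here before n-gram matching.
    """
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h = hyp.lower().split()
        r = ref.lower().split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ngrams = ngrams(h, n)
            r_ngrams = ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_ngrams[g]) for g, c in h_ngrams.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # a zero precision makes the geometric mean zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)
```

To score the Sanity Check set, you could read your output and the reference (for example newstest2014.tok.cs.gz, opened with Python's gzip.open in "rt" mode) line by line and pass both lists to the function.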
The CzEng 1.6pre corpus is used to train the translation models. The data is tokenized (with the Moses tokenizer) and lowercased, and sentences longer than 60 words or shorter than 4 words are removed before training. Alignment is done using fast_align, and the standard Moses pipeline is used for training.
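The length filter described above can be sketched as follows. This is a minimal illustration, assuming that the 4 to 60 word limits apply to each side of a sentence pair independently and that words are whitespace-separated tokens after tokenization; the page does not say which script performed the filtering. Moses ships a similar script, scripts/training/clean-corpus-n.perl, but whether it was used here is not stated. The function names are hypothetical.

```python
def keep_pair(src_line, tgt_line, min_len=4, max_len=60):
    """Return True if both sides have between min_len and max_len tokens.

    Assumption: the limits are checked on each side separately.
    """
    for line in (src_line, tgt_line):
        n = len(line.split())
        if n < min_len or n > max_len:
            return False
    return True


def filter_corpus(pairs, min_len=4, max_len=60):
    """Lowercase tokenized (source, target) pairs and yield those that pass."""
    for src, tgt in pairs:
        src, tgt = src.lower(), tgt.lower()
        if keep_pair(src, tgt, min_len, max_len):
            yield src, tgt
```

Lowercasing before the length check does not change the token counts; it is done first only to mirror the order of steps described above.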
You are supposed to optimize the weights of these standard dense features: the word penalty feature; the phrase penalty feature; 4 features for the translation table (inverse phrase translation probability, inverse lexical weighting, direct phrase translation probability, direct lexical weighting); 2 language model features (one for each of the 2 language models); the distortion feature of the distance-based reordering model; and 6 features for each lexicalized reordering model (monotone, swap and discontinuous orientation probabilities, each in both directions).
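To check that your tuned moses.ini still contains the expected number of dense weights, the sketch below parses the [weight] section (lines of the form Name= v1 v2 ...) and compares the number of values per feature with the counts listed above. The feature names (WordPenalty0, TranslationModel0, LM0, LexicalReordering0 and so on) are the usual Moses defaults and are an assumption here; the distributed moses.ini may name its features differently, in which case EXPECTED needs adjusting. The helper names are hypothetical.

```python
# Expected number of dense weights per feature type, from the task description.
# The name prefixes are assumed Moses defaults, not taken from the task files.
EXPECTED = {
    "WordPenalty": 1,
    "PhrasePenalty": 1,
    "TranslationModel": 4,
    "LM": 1,                 # one weight per language model (the task uses two)
    "Distortion": 1,
    "LexicalReordering": 6,
}


def read_dense_weights(ini_text):
    """Parse the [weight] section of a moses.ini into {feature: [floats]}."""
    weights = {}
    section = None
    for raw in ini_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "weight" and "=" in line:
            name, values = line.split("=", 1)
            weights[name.strip()] = [float(v) for v in values.split()]
    return weights


def check_counts(weights):
    """Return (feature, found, expected) for every feature with a wrong count.

    Features whose type is not listed in EXPECTED are ignored.
    """
    problems = []
    for name, values in weights.items():
        prefix = name.rstrip("0123456789")  # "TranslationModel0" -> "TranslationModel"
        expected = EXPECTED.get(prefix)
        if expected is not None and len(values) != expected:
            problems.append((name, len(values), expected))
    return problems
```

Usage: `check_counts(read_dense_weights(open("moses.ini").read()))` returns an empty list when every recognized feature has the expected number of weights.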
You may also add sparse features if you wish; see below.
There are two tracks of the tuning task:
When submitting your moses.ini, please indicate whether your submission is constrained or non-constrained.
You are allowed to modify the moses.ini in any way. You may delete or add features (but you cannot supply additional model files). You may also change the search algorithm or increase any limits, provided that we can reasonably expect to run the translation with these settings on our machines.
Based on the changes you make in the moses.ini, we will mark your submission with these flags (in both tracks):
See the Moses documentation for instructions on adding sparse features to your moses.ini. If you add sparse features, you will probably have to use kbmira or PRO to tune their weights.
For example, you can add sparse features for source word deletion by adding a SourceWordDeletionFeature line to the [feature] section of your moses.ini:
[feature]
SourceWordDeletionFeature factor=0
When you use sparse features, their weights are not stored in moses.ini but in an additional weights file. Make sure to include this weights file with your submission.
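Before submitting, you may want to verify that your moses.ini points at the sparse weights file and that the file parses. The sketch below assumes the Moses convention of listing the file path under a [weight-file] section and a file format of one "feature_name weight" pair per line; please check both against the Moses documentation for the designated revision. The helper names and the feature names used in the example are hypothetical.

```python
def sparse_weights_path(ini_text):
    """Return the first path listed under [weight-file] in a moses.ini, or None."""
    section = None
    for raw in ini_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
        elif section == "weight-file":
            return line
    return None


def read_sparse_weights(text):
    """Parse 'feature_name weight' lines into a dict (assumed file format)."""
    weights = {}
    for lineno, raw in enumerate(text.splitlines(), 1):
        parts = raw.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'name weight', got {raw!r}")
        weights[parts[0]] = float(parts[1])
    return weights
```

A missing path or a ValueError here is a sign that the weights file would not be picked up as intended.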
To allow broader participation in the English-to-Czech direction, each registered participant of the tuning task will be given a 'credit' of manual pairwise sentence comparisons by our Czech native speakers. The exact number of judgments we can provide will depend on the number of registered participants, but we expect no fewer than a few hundred sentence pair comparisons. Manual judging takes time and demand may peak as the submission deadline approaches, so please get in touch early.
To register for this complimentary English-to-Czech manual pre-evaluation, please send an e-mail to firstname.lastname@example.org.
To make use of some of your 'credit', simply send the following plain text files to email@example.com:
Our annotators will see the source, optionally the reference, and the two outputs. The two outputs will be shuffled so that the system cannot be identified from the order of the hypotheses. The order of the sentences, however, will not be shuffled; shuffle them yourself if you wish.
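If you want the sentence order shuffled, one option is to apply the same random permutation to every file you send (source, reference and both outputs) and keep the permutation so you can map the judgments back afterwards. A minimal sketch, assuming the files are already loaded as equally long lists of lines; the function names and the fixed seed are illustrative choices, not part of the task setup.

```python
import random


def shuffle_parallel(columns, seed=0):
    """Shuffle several parallel lists with one shared random permutation.

    columns: equally long lists (e.g. source, reference, output A, output B).
    Returns the shuffled columns and the permutation, where perm[i] is the
    original index of the line now at position i.
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all columns must have the same number of lines")
    perm = list(range(n))
    random.Random(seed).shuffle(perm)  # seeded, independent of global state
    shuffled = [[col[i] for i in perm] for col in columns]
    return shuffled, perm


def unshuffle(lines, perm):
    """Put lines (or per-sentence judgments) back into the original order."""
    original = [None] * len(perm)
    for new_pos, old_pos in enumerate(perm):
        original[old_pos] = lines[new_pos]
    return original
```

Using a seeded random.Random instance keeps the permutation reproducible, so it can be regenerated later if the stored permutation is lost.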
For each sentence, the annotator will mark one of the following:
Submissions should be sent as an e-mail to firstname.lastname@example.org.
If the above e-mail address does not work for you (Google seems to block postings from non-members even though we have configured the group to allow them), please contact us directly.
Supported by the European Commission under the project (grant number 645452)