|Release of the MT system to tune||February 9, 2015|
|Registration for complimentary manual evaluation||February 22, 2015|
|Submission deadline for tuning task||April 20, 2015|
|Start of manual evaluation period||May 4, 2015|
|End of manual evaluation||June 1, 2015|
|Paper submission deadline||June TBD, 2015|
The WMT15 tuning task is similar to the WMT11 tunable metrics task. We provide the participants with a complete hierarchical model for English-to-Czech and Czech-to-English translation (i.e. one moses.ini file with all the model files for each translation direction) and a devset. A designated Moses GitHub revision will be used to run this model.
The participants are expected to incorporate their evaluation metric into the Moses scorer, apply whichever Moses optimizer they like, or use any other tuning tricks to come up with their weight settings.
A submission to the tuning task consists of an updated version of the moses.ini file, an optional weights file for sparse features, and, optionally, your outputs on the official test set.
We will run the designated Moses revision using your moses.ini to obtain your MT outputs. (Note that the evaluation metric or tricks you used in the tuning are not needed and not used for the run.) The outputs will be manually ranked using the same scheme as the main translation task.
For each run submitted to this evaluation, the team promises to join the WMT manual evaluation and annotate at least 100 HITs (i.e. 300 5-way comparisons). This contribution to the manual evaluation can be made in whichever language pair you can evaluate and that is needed most.
No registration is needed to participate in the tuning task unless you would like to make use of our manual judgements of Czech; see Complimentary Manual Evaluation below.
You are invited to submit a short paper (4 to 6 pages) describing your tuning technique. You are not required to submit a paper if you do not want to. If you don't, we ask that you give an appropriate description (a few paragraphs) or an appropriate reference describing your method to include or cite in the overview paper.
This section contains the complete package of models to download.
|moses.ini preview||moses.ini preview|
|en2cs_model.tgz (1.2GB)||cs2en_model.tgz (1.2GB)|
|Devset (newstest2014 from translation task)|
|Original Corpora, Alignments (optional)|
The models are prepared for lowercase input tokenized with the standard Moses tokenizer.
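The preprocessing described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `MOSES_DIR` is a hypothetical variable pointing at your Moses checkout, `prepare_input` is a helper name introduced here (not part of Moses), and the file names in the usage comment are placeholders. The two scripts it chains are the tokenizer and lowercasing scripts shipped under `scripts/tokenizer/` in the Moses repository.

```shell
# Assumption: MOSES_DIR points at your Moses checkout (the default is hypothetical).
MOSES_DIR="${MOSES_DIR:-$HOME/moses}"

# prepare_input LANG: tokenize stdin for language LANG (e.g. en or cs),
# then lowercase it, writing the result to stdout.
prepare_input() {
  "$MOSES_DIR/scripts/tokenizer/tokenizer.perl" -l "$1" |
    "$MOSES_DIR/scripts/tokenizer/lowercase.perl"
}

# Usage (placeholder file names):
#   prepare_input en < input.en > input.tok.lc.en
```

Keeping both steps in one pipeline makes it harder to feed the model cased or untokenized text by accident.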
For completeness and training of some of the standard sparse features, we also provide the full corpora and alignments.
When evaluating your submission, we will use Moses Release 3.0, i.e. the GitHub commit 5244a7b607. It is also available as pre-compiled binaries on the Moses Releases page.
Note that we plan to ignore any subsequent commits to the RELEASE-3.0 branch (unless prohibitive bugs are spotted). So to obtain the right sources, use:
git clone https://github.com/moses-smt/mosesdecoder.git moses
cd moses
git checkout 5244a7b607 -b tuning-task-2015
## and *NOT*: git checkout RELEASE-3.0, which could be a newer version
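To double-check that a working copy really sits on the designated commit, something like the following sketch may help. The helper name `check_revision` is introduced here purely for illustration; it relies only on `git rev-parse`, and it assumes the clone lives in a directory you pass as the first argument.

```shell
# check_revision DIR PREFIX: succeed only if the commit checked out in DIR
# starts with the expected hash PREFIX.
check_revision() {
  head_hash=$(git -C "$1" rev-parse HEAD) || return 2
  case "$head_hash" in
    "$2"*) echo "OK: $1 is at $head_hash" ;;
    *) echo "MISMATCH: $1 is at $head_hash, expected $2" >&2
       return 1 ;;
  esac
}

# Usage, run from the directory that contains the clone:
#   check_revision moses 5244a7b607
```

Comparing against the hash rather than the branch name avoids silently picking up later commits on RELEASE-3.0.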
Prior to manual evaluation, we will run only the Moses standard
sentence beginnings. This will result in names not being uppercased, but also in fewer random effects due to the recaser. Talk to us if you think this is a bad idea.
There are two tracks of the tuning task:
When submitting your moses.ini, please indicate whether your submission is constrained or non-constrained.
You are allowed to modify the moses.ini in any way. You may delete or add features (but you cannot supply additional model files). You may also change the search algorithm or raise any limits, under the reasonable assumption that we will be able to actually run the translation with these settings on our machines.
Based on the changes you make in the moses.ini, we will mark your submission with these flags (within both tracks):
See the Moses documentation for instructions on adding sparse features to your moses.ini. If you add sparse features, you will probably have to use kbmira or PRO to tune their weights.
For example, you can add sparse features for target word insertion by adding the following line to your moses.ini:
[feature]
TargetWordInsertionFeature factor=0
When you use sparse features, the weights are not stored in moses.ini but in an additional weights file. Make sure to include this weights file with your submission.
To allow broader participation in the English-to-Czech direction, each registered participant of the tuning task will be given a 'credit' of manual pairwise sentence comparisons by our Czech native speakers. The exact number of judgments we can provide will be determined from the number of registered participants, but we expect no fewer than a few hundred sentence pair comparisons. Manual judging takes time and demand may peak as the submission deadline approaches, so remember to get in touch early.
To register for this English-to-Czech complimentary manual pre-evaluation, please send an e-mail to Ondřej Bojar.
To make use of some of your 'credit', simply send the following plain text files to Ondřej Bojar:
Our annotators will see the source, optionally the reference, and the two outputs. The outputs will be shuffled so that the system cannot be determined from the order of the hypotheses. The order of the sentences will not be shuffled, so do this yourself if you want to.
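If you do want the sentence order shuffled, the same permutation has to be applied to every file so that the lines stay aligned. A minimal sketch follows, assuming that all files have the same number of lines, that no line contains a tab character, and that GNU `shuf` is available. The helper name `shuffle_parallel` and the file names in the usage comment are hypothetical.

```shell
# shuffle_parallel SUFFIX FILE...: apply one random permutation to all FILEs,
# writing FILE+SUFFIX next to each input so that line i still corresponds
# across all outputs.
shuffle_parallel() {
  suffix=$1
  shift
  tmp=$(mktemp) || return 1
  # Glue corresponding lines together with tabs, then shuffle whole rows.
  paste "$@" | shuf > "$tmp"
  col=1
  for f in "$@"; do
    # Column number col of the shuffled rows belongs to the col-th input file.
    cut -f "$col" "$tmp" > "$f$suffix"
    col=$((col + 1))
  done
  rm -f "$tmp"
}

# Usage (placeholder file names):
#   shuffle_parallel .shuf source.txt reference.txt outputA.txt outputB.txt
```

Shuffling the pasted rows rather than each file separately is what keeps source, reference, and both outputs in step.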
For each sentence, the annotator will mark one of the following:
Supported by the European Commission
project (grant number 288487)