ACL 2016 First Conference on Machine Translation (WMT16)

WMT16 Tuning Task

Tuning Task Important Dates

Release of the MT system to tune	~~February~~ March 30, 2016
Submission deadline for tuning task	~~April 17~~ April 24, 2016
Start of manual evaluation period	May 2, 2016
Paper submission deadline	~~May 8~~ May 15, 2016
End of manual evaluation	May 22, 2016
Notification of acceptance	June 5, 2016
Camera-ready deadline	June 22, 2016

Tuning Task Overview

The WMT16 tuning task is similar to last year's tuning task. We provide the participants with a complete SMT model for English-to-Czech and Czech-to-English translation (i.e. one moses.ini file with all the model files for each translation direction) and a devset. A designated Moses GitHub revision will be used to run this model.

The participants are expected to incorporate their evaluation metric into the Moses scorer, apply whichever Moses optimizer they like, or use any other tuning tricks to come up with their weight settings.

A submission to the tuning task consists of an updated version of the moses.ini file, an optional weights file for sparse features, and (as an optional sanity check) your outputs on the official test set.

We will run the designated moses revision using your moses.ini file to obtain your MT outputs. (Note that the evaluation metric or tricks you used in the tuning are not needed and not used for the run.) The outputs will be manually ranked using the same scheme as the main translation task.

Other Requirements

For each run submitted to the tuning task, the team promises to join the WMT manual evaluation and annotate at least 100 HITs (i.e. 300 5-way comparisons). This contribution to the manual evaluation can be made in any language pair that you are able to evaluate, preferably the one where annotations are needed most.

You are invited to submit a short paper (4 to 6 pages) describing your tuning technique. You are not required to submit a paper if you do not want to. If you don't, we ask that you give an appropriate description (a few paragraphs) or an appropriate reference describing your method to include or cite in the overview paper.

The System to Tune

You can download the models to tune from the following locations:

English -> Czech
https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-1672

Czech -> English
https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-1671

To run your optimized configurations, we will use Moses GitHub revision 2d6f616 unless we run into some unexpected bug.

Details you may want to compare with your setup:

		English -> Czech	Czech -> English

Tuning set (newstest2015)	source	newstest2015.tok.en.gz	newstest2015.tok.cs.gz
(prepared as the models)	reference	newstest2015.tok.cs.gz	newstest2015.tok.en.gz

Sanity Check set (newstest2014)	source	newstest2014.tok.en.gz	newstest2014.tok.cs.gz
(prepared as the models)	reference	newstest2014.tok.cs.gz	newstest2014.tok.en.gz
	BLEU	22.26	30.40

The BLEU score reported on the Sanity Check set is based on the lowercased tokenized form.
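
To compare your own setup against the reported numbers, the hypotheses and references have to be lowercased and tokenized before scoring. The following is a minimal sketch of corpus-level BLEU with a single reference, assuming the input is already tokenized so that splitting on whitespace is enough. It is not necessarily the script used to produce the scores above, and the function names are illustrative only, so small differences from the reported values are to be expected.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (one reference per sentence), scaled to 0..100.

    Both sides are lowercased and split on whitespace, which assumes
    that the text has already been tokenized.
    """
    matched = [0] * max_n  # clipped n-gram matches, per n-gram order
    total = [0] * max_n    # hypothesis n-gram counts, per n-gram order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tok = hyp.lower().split()
        ref_tok = ref.lower().split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Counter intersection keeps the minimum count (clipping).
            matched[n - 1] += sum((hyp_counts & ref_counts).values())
            total[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any order with zero matches gives BLEU = 0.
    if hyp_len == 0 or min(matched) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    brevity_penalty = min(1.0, math.exp(1.0 - ref_len / hyp_len))
    return 100.0 * brevity_penalty * math.exp(log_precision)
```

The score is computed over the whole test set at once (n-gram counts are summed before the precisions are taken), not as an average of sentence-level scores.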

Description of the Models

The CzEng 1.6pre corpus is used for training the translation models. The data is tokenized (using the Moses tokenizer) and lowercased, and sentences longer than 60 words or shorter than 4 words are removed before training. Alignment is done using fast_align and the standard Moses pipeline is used for training.
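
If you want to prepare additional data in the same way as the models, the lowercasing and length filtering can be sketched as below. This is only an illustration: it assumes the text is already tokenized, and it assumes that a sentence pair is dropped when either side falls outside the 4 to 60 word range, which the description above does not state explicitly. The function names are hypothetical.

```python
def keep_pair(source, target, min_len=4, max_len=60):
    """Return True if both sides have between min_len and max_len words."""
    src_len = len(source.split())
    tgt_len = len(target.split())
    return min_len <= src_len <= max_len and min_len <= tgt_len <= max_len


def preprocess(pairs, min_len=4, max_len=60):
    """Lowercase tokenized sentence pairs and drop those outside the limits.

    Assumption: a pair is removed if EITHER side is too short or too long.
    """
    for source, target in pairs:
        source, target = source.lower(), target.lower()
        if keep_pair(source, target, min_len, max_len):
            yield source, target
```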

You are supposed to optimize the weights for these standard dense features: word penalty feature, phrase penalty feature, 4 features for a translation table (inverse phrase translation probability, inverse lexical weighting, direct phrase translation probability, direct lexical weighting), 2 features for 2 language models, distortion feature for distance-based reordering model, 6 features for each reordering model (bidirectional features for monotone, swap and discontinuous phrase translation probability).

Adding sparse features is optional; see below.

Tuning Task Tracks

There are two tracks of the tuning task:

Constrained: You may use only the official WMT16 dev set (i.e. WMT15 test set) to tune the system.
Unconstrained: You may include any other data for the tuning, for instance older WMT test sets, additional reference translations etc.

When submitting your moses.ini, please indicate whether your submission is constrained or unconstrained.

You are allowed to modify the moses.ini in any way. You may delete or add features (but you cannot supply additional model files). You may also change the search algorithm or increase whatever limits, under the reasonable assumption that we will be able to actually run the translation with these settings on our machines.

Based on the changes you make in the moses.ini, we will mark your submission with these flags (within both tracks):

Basic: No sparse features added, no custom settings or limits.
Sparse: Some sparse features added, no custom settings or limits.
Customized Basic: Other changes to the configuration made but no sparse features added.
Customized Sparse: Other changes to the configuration made, including some sparse features.
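
The four flags are simply the combinations of two yes/no properties of a submission. As a small illustration (the function and argument names are hypothetical and not part of any organizer tooling), the mapping can be written as:

```python
def submission_flag(has_sparse_features, has_custom_settings):
    """Map the two properties of a moses.ini to one of the four flags."""
    if has_custom_settings:
        return "Customized Sparse" if has_sparse_features else "Customized Basic"
    return "Sparse" if has_sparse_features else "Basic"
```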

How to add Sparse Features

Please follow Moses documentation for instructions on adding sparse features to your moses.ini. If you add sparse features then you will probably have to use kbmira or PRO for the tuning of their weights.

For example, you can add sparse features for source word deletion by adding the following line to the [feature] section of your moses.ini:

[feature]
SourceWordDeletionFeature factor=0

When you use sparse features, the weights are not stored in moses.ini but in an additional weights file. Make sure to include this weights file with your submission of moses.ini.

Complimentary Manual Evaluation of Translations into Czech

To allow a broader participation in the English-to-Czech direction, each registered participant of the tuning task will be given a 'credit' of manual pairwise sentence comparisons by our Czech native speakers. The exact number of judgments we can provide will be determined from the number of registered participants, but we expect no less than a few hundred sentence pair comparisons. Obviously, manual judging takes time and there can be a peak of demand as the submission deadline approaches, so remember to get in touch early.

To register for this English-to-Czech complimentary manual pre-evaluation, please send an e-mail to wmt-tuning-submissions@googlegroups.com.

Submitting Sentence Pairs for Czech Manual Evaluation

To make use of some of your 'credit', simply send the following plain text files to wmt-tuning-submissions@googlegroups.com:

The source English sentence.
The reference translation (if available).
System A output
System B output

Each sentence should be on a separate line, so all three or four files must have exactly the same number of lines.
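
Before sending the files, it may be worth verifying that they really are line-aligned. A minimal sketch, assuming UTF-8 plain text files (the function names and the file names in the usage note are placeholders):

```python
def count_lines(path):
    """Count the lines of a UTF-8 text file."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def check_parallel(paths):
    """Return the common line count, or raise ValueError if counts differ."""
    counts = {path: count_lines(path) for path in paths}
    if len(set(counts.values())) > 1:
        raise ValueError(f"Line counts differ: {counts}")
    return next(iter(counts.values()), 0)
```

For example, check_parallel(["source.en", "reference.cs", "systemA.cs", "systemB.cs"]) either returns the shared number of lines or reports the per-file counts in the error message.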

Our annotators will see the source, optionally the reference, and the two outputs. The outputs will be shuffled so that the system cannot be determined from the order of the hypotheses. The order of the sentences will not be shuffled, so do this yourself if you want to.
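
If you do want to shuffle the sentence order, the same permutation has to be applied to every file, otherwise the lines no longer correspond. One possible sketch, assuming each file has been read into a list of lines (the function name is hypothetical):

```python
import random


def shuffle_parallel(columns, seed=0):
    """Shuffle several line-aligned lists with one shared permutation.

    columns: a list of lists (e.g. source, reference, system A, system B),
    all of the same length. Returns new lists; the inputs are not modified.
    """
    if not columns:
        return []
    if len({len(column) for column in columns}) > 1:
        raise ValueError("All inputs must have the same number of lines")
    order = list(range(len(columns[0])))
    # A fixed seed makes the shuffle reproducible.
    random.Random(seed).shuffle(order)
    return [[column[i] for i in order] for column in columns]
```

Keeping the seed lets you regenerate the same order later, for instance to map the annotators' judgments back to the original sentence positions.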

For each sentence, the annotator will mark one of the following:

Exactly one candidate translation as being the better one.
Both candidates as being equally good, acceptable translations.
Both candidates as being equally bad, unacceptable translations.

How to submit

Submissions should be sent as an e-mail to wmt-tuning-submissions@googlegroups.com.

In case the above e-mail address doesn't work for you (Google seems to block postings from non-members even though we configured the group to allow them), please contact us directly.

Tuning Task Organizers

Miloš Stanojević (University of Amsterdam, ILLC)
Bushra Jawaid (University of Amsterdam, ILLC)
Amir Kamran (University of Amsterdam, ILLC)
Ondřej Bojar (Charles University in Prague)

Acknowledgement

Supported by the European Commission under the project (grant number 645452)