The training script train-factored-model.perl produces a configuration file moses.ini which has default weights of questionable quality. That's why we need to obtain better weights by optimizing translation performance on a development set.
This is done with the tuning script mert-moses-new.pl. This new version of the minimum error rate training script is based on new C++ software. Details about the new implementation are given in Bertoldi, Haddow, Fouet, "Improved Minimum Error Rate Training in Moses", In Proc. of 3rd MT Marathon, Prague, Czech Republic.
The new mert implementation is standalone open-source software. The only interaction between Moses and the new software is through the script mert-moses-new.pl itself.
This new implementation of mert stores feature scores and error statistics in separate files (possibly in a binary format) for each nbest-list (at each iteration), and uses (some of) these files to optimize weights. At the moment weight optimization can be based on either BLEU or PER.
Most features of the old code mert-moses.pl are maintained, and some new ones are added.
The script is run as follows:
mert-moses-new.pl input-text references decoder-executable decoder.ini
Parameters:
input-text and references are the development set, on which translation performance is optimized. The tuning script tries to find translations for input-text that best resemble the reference translations in references. The script also works with multiple reference files; these have to be called [references]0, [references]1, [references]2, etc.
decoder-executable is the location of the decoder binary to be used
decoder.ini is the location of the configuration file to be used
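The naming convention for multiple reference files can be illustrated with a small Python sketch. This is not the script's own code: the function name find_reference_files is hypothetical, and the rule that a single file named exactly like the stem takes precedence is an assumption made for illustration.

```python
import os


def find_reference_files(stem):
    """Collect the reference files that belong to one development set.

    Assumption (not taken from mert-moses-new.pl): a file named exactly
    `stem` is treated as the single reference; otherwise files named
    stem0, stem1, stem2, ... are collected until one is missing.
    """
    if os.path.isfile(stem):
        return [stem]
    paths = []
    index = 0
    while os.path.isfile(f"{stem}{index}"):
        paths.append(f"{stem}{index}")
        index += 1
    return paths
```

With files dev.ref0 and dev.ref1 on disk, find_reference_files("dev.ref") would return both paths in order.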
Options:
--working-dir=STRING (default mert-dir) directory that contains all files generated during the tuning process. Upon conclusion, it will contain a new moses.ini with better weights
--nbest=NUM (default 100) size of n-best list to be generated at each run of the decoder
--jobs=NUM if the script is run on a cluster, this specifies how many jobs to submit (default: serial execution, does not use qsub)
--queue-flags=STRING additional switches to pass to the parallelizer, e.g. '-qsub-prefix logname'
--decoder-flags=STRING additional parameters for the decoder
--lambdas=STRING default values and ranges for lambdas, a complex string such as 'd:1,0.5-1.5 lm:1,0.5-1.5 ...' (see below)
--average use the average (not the default, closest) reference length as effective reference length for BLEU score computation
--closest use the closest (default) reference length as effective reference length for BLEU score computation
--shortest use the shortest (not the default, closest) reference length as effective reference length for BLEU score computation
--nocase perform a case-insensitive evaluation between hypos and refs (default is false)
--activate-features=STRING perform optimization on a specified subset of features (default is the optimization of all features); see below for details and for the correct syntax
--continue continue the iterative optimization process from the last finished step (default is false); this is useful to recover an interrupted optimization process (for example after a system failure) without losing the steps that completed successfully (see the note below for more details)
--prev-aggregate-nbestlist=INT number of previous steps to consider when loading data (default is -1). -1 means all previous steps, i.e. from iteration 1; 0 means no previous data, i.e. only the current iteration; 1 means one previous step, i.e. the current iteration and the one before it; and so on.
--mertdir=STRING path to the new implementation of mert software
--mertargs=STRING extra arguments for mert, e.g. to specify the score type (which is BLEU by default)
--help gives a full list of options
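The three policies selected by --average, --closest and --shortest can be sketched in Python for a single hypothesis. This is an illustration only, not the scorer's code: the function name is hypothetical, and breaking ties in favour of the shorter reference under the closest policy is an assumption.

```python
def effective_reference_length(hyp_len, ref_lens, policy="closest"):
    """Pick the reference length used for the BLEU brevity penalty.

    hyp_len  -- length of the hypothesis in tokens
    ref_lens -- lengths of all references for the same sentence
    policy   -- "closest" (default), "shortest" or "average"
    """
    if not ref_lens:
        raise ValueError("at least one reference length is required")
    if policy == "shortest":
        return min(ref_lens)
    if policy == "average":
        return sum(ref_lens) / len(ref_lens)
    if policy == "closest":
        # Assumption: when two references are equally close,
        # the shorter one is preferred.
        return min(ref_lens, key=lambda r: (abs(r - hyp_len), r))
    raise ValueError(f"unknown policy: {policy}")
```

For a 10-token hypothesis with references of 8, 11 and 15 tokens, the closest policy picks 11 and the shortest policy picks 8.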
Note: the optimized final weights are L1-normalized to 1 (i.e. sum_i |w_i| =1).
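As a minimal sketch of what this normalization means, the following Python function divides every weight by the sum of absolute values. The function name l1_normalize is hypothetical and the code is not taken from the script.

```python
def l1_normalize(weights):
    """Scale weights so that the sum of their absolute values equals 1."""
    total = sum(abs(w) for w in weights)
    if total == 0:
        raise ValueError("cannot normalize an all-zero weight vector")
    return [w / total for w in weights]
```

Signs are preserved, so a negative weight stays negative after normalization.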
Note: the policy for computing the effective reference length in the BLEU score has changed (from revision 2461).
Note: the policy for case-sensitive/insensitive evaluation has changed (from revision 2461); now the default is case-sensitive.
Note: the --continue option relies on several files produced in the well-completed previous steps of the optimization process:
where T is the last well-completed step. The file "finished_step.txt" should contain the value T.
Example: to store feature scores and error statistics in binary files and to use PER (instead of the default BLEU), pass the following option. Quotation marks are required.
--mertargs "--binary --sctype PER"
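The reason the quotation marks matter is that the whole string must reach the script as one argument. The Python sketch below assembles a command line and lets shlex apply the quoting. All file names and paths are hypothetical placeholders; only the option names come from the list above.

```python
import shlex


def build_tuning_command(input_text, references, decoder, decoder_ini,
                         mertdir, mertargs):
    """Assemble a mert-moses-new.pl command line as an argument list."""
    return [
        "mert-moses-new.pl",
        input_text,
        references,
        decoder,
        decoder_ini,
        f"--mertdir={mertdir}",
        # Passed as a single argument, so it needs quoting in a shell.
        "--mertargs", mertargs,
    ]


# Hypothetical paths, for illustration only.
argv = build_tuning_command("dev.input", "dev.ref", "/path/to/moses",
                            "moses.ini", "/path/to/mert",
                            "--binary --sctype PER")
command = shlex.join(argv)  # requires Python 3.8 or later
print(command)
```

Splitting the printed string with shlex.split gives back the original argument list, which confirms that the mert arguments stay together.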
Example: to use only the nbest lists produced in the last 3 iterations of the mert process (plus the current one):
--prev-aggregate-nbestlist=3
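The meaning of the value can be sketched as a small Python function that lists which iterations contribute data. The function name iterations_to_load is hypothetical; the behaviour follows the description of --prev-aggregate-nbestlist given in the options, with the additional assumption that the window never reaches before iteration 1.

```python
def iterations_to_load(current, prev_aggregate=-1):
    """Return the iteration numbers whose nbest data are loaded.

    current        -- number of the current iteration (starting at 1)
    prev_aggregate -- -1 for all previous iterations, 0 for none,
                      N for the N most recent previous iterations
    """
    if prev_aggregate < 0:
        first = 1
    else:
        # Assumption: the window is clipped at the first iteration.
        first = max(1, current - prev_aggregate)
    return list(range(first, current + 1))
```

At iteration 5 with a value of 3, the data of iterations 2, 3, 4 and 5 are used.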
If you wish to optimize weights of all models your moses.ini mentions, and you want to use default values and intervals, you do not need to specify --lambdas at all.
--lambdas=STRING specifies the starting values and randomization ranges for the weights in a somewhat obtuse format. Each weight is specified as start,min-max, for instance 0.5,0.25-0.75 for a starting weight of 0.5 (used in the initial decoder run) and randomized values between 0.25 and 0.75 during the parameter search. Weights have to be defined for reordering (d), language model (lm), translation model (tm), generation model (g), and word penalty (w), for instance by d:1,0.5-1.5 for the reordering model. If there are multiple weights per component, these weights are specified in sequence, separated by semicolons (;).
Example:
d:1,0.5-1.5 lm:1,0.5-1.5 tm:0.3,0.25-0.75;0.2,0.25-0.75;0.2,0.25-0.75;0.3,0.25-0.75;0,-0.5-0.5 w:0,-0.5-0.5
sets
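To make the start,min-max format concrete, here is a minimal Python sketch that parses a --lambdas string into start values and ranges. It is not the script's own parser: the function name parse_lambdas and the regular expression are assumptions written to match the syntax described above, including negative bounds such as 0,-0.5-0.5.

```python
import re

# One weight: start,min-max where each number may be negative.
_NUMBER = r"-?\d+(?:\.\d+)?"
_WEIGHT = re.compile(rf"^({_NUMBER}),({_NUMBER})-({_NUMBER})$")


def parse_lambdas(spec):
    """Parse 'd:1,0.5-1.5 lm:...' into {name: [(start, min, max), ...]}."""
    parsed = {}
    for component in spec.split():
        name, sep, body = component.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in component {component!r}")
        triples = []
        for item in body.split(";"):
            match = _WEIGHT.match(item)
            if match is None:
                raise ValueError(f"malformed weight {item!r}")
            triples.append(tuple(float(g) for g in match.groups()))
        parsed[name] = triples
    return parsed
```

Applied to the example string, this yields one triple for d, lm and w, and five triples for tm, the last of which is (0.0, -0.5, 0.5).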
Sometimes it could be useful to optimize a subset of the feature weights.
mert-moses-new.pl allows this through the parameter --activate-features=list, where list is a non-empty comma-separated list of features.
Features are identified by name_index, where name is the group name and index is the position of the feature inside the group.
The group names are:
d distortion model
lm language models
tm translation models
w word penalty
I posterior probability for confusion network
and the index starts from 0; the index is mandatory even if only one feature occurs in a group.
If no features are specified (--activate-features='') or the parameter --activate-features is not set, mert-moses-new.pl performs optimization over all available features.
For instance, if the option is set as follows
--activate-features=d_0,d_4,lm_0,tm_3,w_0
only the following features will be optimized: the first (d_0) and the fifth (d_4) distortion model weights, the first language model weight (lm_0), the fourth (tm_3) translation model weight, and the first (w_0) word penalty.
IMPORTANT:
--lambdas parameter (or from their defaults), and not from the configuration file passed through the command line. Hence, if you want to assign specific (already optimized) values for NOT-activated features, please set them by means of the --lambdas parameter. For these NOT-activated features, you must also specify their ranges in the --lambdas parameter to maintain the correct syntax although they are not exploited.
mert-moses-new.pl fails if a wrong feature is specified; for instance lm_2 is not allowed if only one or two language models are used.
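The name_index syntax and the failure on a wrong feature can be sketched in Python as follows. This is an illustration under stated assumptions, not the script's code: the function name parse_activate_features is hypothetical, and the group sizes passed in (how many weights each group has) are made-up values for the example.

```python
def parse_activate_features(spec, group_sizes):
    """Turn 'd_0,lm_0,...' into a list of (group, index) pairs.

    spec        -- value of --activate-features; empty means all features
    group_sizes -- number of weights per group, e.g. {"lm": 1, "tm": 5}
    """
    if not spec:
        # No selection: every available feature is optimized.
        return [(name, i) for name, size in group_sizes.items()
                for i in range(size)]
    selected = []
    for item in spec.split(","):
        name, sep, index = item.rpartition("_")
        if not sep or not index.isdigit():
            raise ValueError(f"expected name_index, got {item!r}")
        position = int(index)  # indices start from 0
        if name not in group_sizes or position >= group_sizes[name]:
            raise ValueError(f"feature {item!r} does not exist")
        selected.append((name, position))
    return selected
```

With two language model weights, lm_0 and lm_1 are accepted while lm_2 raises an error, which mirrors the failure described above.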
mert-moses.pl):
--lambdas)
mert-moses.pl allows this through the parameter --activate=list, where list is a non-empty comma-separated list of features.
Features are identified by their names. The feature names can be found in the nbest list, in the file names.txt in the working directory during the minimum error training, and in the moses help (moses --help). Main features are:
d distortion model
lm language models
tm translation models
w word penalty
I posterior probability for confusion network
The parameter --activate=list activates the optimization of only the listed feature weights.
The ratios among the remaining ones are fixed, and are taken from the --lambdas parameter.
If a feature has several scores (e.g. tm with 5 scores), optimization can be performed on:
--activate=tm)
--activate=tm_2,tm_3)
The optimization of a subset of feature weights works as follows:
Note that if parameter --activate is not set, all weights are optimized.
Example:
Features are: d lm tm tm tm tm tm w
Parameters are set as follows:
--lambdas="d:1,0.5-1.5 lm:1,0.5-1.5 tm:0.3,0.25-0.75;0.2,0.25-0.75;0.2,0.25-0.75;0.3,0.25-0.75;0,-0.5-0.5 w:0,-0.5-0.5"
--activate=d,tm_1,tm_5
Features d, the first tm and the fifth tm will be optimized.
Features lm, the second, third and fourth tm and w will NOT be optimized. The ratios among them come from their initial values: 1, 0.2, 0.2, 0.3, and 0, respectively.
If optimal weights are 0.11, 0.54, and 0.09 respectively, and 0.26 for the extra feature, the final set of weights is
0.11 0.26(=0.26*1) 0.54 0.052(=0.26*0.2) 0.052(=0.26*0.2) 0.078(=0.26*0.3) 0.09 0(=0.26*0)
and after normalization
0.093 0.220 0.457 0.044 0.044 0.066 0.076 0
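The arithmetic of this example can be reproduced with a short Python sketch. The function name combine_weights and the use of 0-based positions are choices made for this illustration and are not part of the script. Activated features take their optimized value, the remaining ones are scaled by the weight of the extra feature, and the result is L1-normalized.

```python
def combine_weights(initial, optimized, frozen_scale):
    """Merge optimized and frozen weights, then L1-normalize.

    initial      -- starting weights of all features, in order
    optimized    -- {position: optimal weight} for activated features
    frozen_scale -- optimal weight of the extra feature that stands
                    for all non-activated features together
    """
    combined = [
        optimized[i] if i in optimized else frozen_scale * start
        for i, start in enumerate(initial)
    ]
    total = sum(abs(w) for w in combined)
    return combined, [w / total for w in combined]


# Feature order: d lm tm tm tm tm tm w (start values from --lambdas)
initial = [1, 1, 0.3, 0.2, 0.2, 0.3, 0, 0]
# Activated: d, first tm, fifth tm (0-based positions 0, 2 and 6)
optimized = {0: 0.11, 2: 0.54, 6: 0.09}
combined, normalized = combine_weights(initial, optimized, 0.26)
print([round(w, 3) for w in combined])
print([round(w, 3) for w in normalized])
```

The unnormalized weights sum to 1.182, so each normalized weight is the combined weight divided by 1.182.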
You can also use the old version of the MERT script, mert-moses.pl, but some features are not available.
If the script is run as a multi-job process (--jobs) on a Grid Engine cluster, you will run the script on the head node. The script submits all compute-heavy parts as jobs to the cluster. In other words, you do not submit the script itself as a job to the cluster.
Since there are currently three mert implementations in the moses trunk, after a short discussion on the moses developers' list in July 2010, it was decided that zmert and cmert would be removed, leaving "new mert" as the only mert implementation. The plan for this work is as follows (feel free to add comments)
Hieu)
that this works with syntax moses (Barry, assisted by Hieu and Phil W)
Barry or Nicola)
mert-moses.perl. The command line arguments are the same, except that you
need to specify -mertdir (Barry)
Barry)
Ondrej + students). This is
probably a low priority.
Barry)
and scorers. Make a video. (this could be a small mt marathon project)