The basic features of the decoder are explained in the Tutorial. Here, we describe some additional features that have been demonstrated to be beneficial in some cases.
The standard reordering model for phrase-based statistical machine translation is conditioned only on movement distance and nothing else. However, some phrases are reordered more frequently than others. For instance, a French adjective like extérieur typically gets switched with the preceding noun when translated into English.
Hence, we want to consider a lexicalized reordering model that conditions reordering on the actual phrases. One concern, of course, is the problem of sparse data. A particular phrase pair may occur only a few times in the training data, making it hard to estimate reliable probability distributions from these statistics.
Therefore, in the lexicalized reordering model we present here, we only consider three reordering types: (m) monotone order, (s) switch with previous phrase, or (d) discontinuous. See below for an illustration of these three different types of orientation of a phrase.

To put it more formally, we want to introduce a reordering model $p_o$ that predicts an orientation type given the phrase pair $(f, e)$ currently used in translation:
$$\text{orientation} \in \{m, s, d\}$$
$$p_o(\text{orientation} \mid f, e)$$
How can we learn such a probability distribution from the data? Again, we go back to the word alignment that was the basis for our phrase table. When we extract each phrase pair, we can also extract its orientation type in that specific occurrence.
Looking at the word alignment matrix, we note for each extracted phrase pair its corresponding orientation type. The orientation type can be detected by checking for a word alignment point to the top left or to the top right of the extracted phrase pair. An alignment point to the top left signifies that the preceding English word is aligned to the preceding Foreign word. An alignment point to the top right indicates that the preceding English word is aligned to the following Foreign word. See below for an illustration.

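The orientation type is defined as follows:
monotone (m): if a word alignment point to the top left exists, we have evidence for monotone orientation.
swap (s): if a word alignment point to the top right exists, we have evidence for a swap with the previous phrase.
discontinuous (d): if neither a word alignment point to the top left nor one to the top right exists, we have neither monotone order nor a swap, and hence evidence for discontinuous orientation.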
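The top-left/top-right check can be sketched in a few lines of Python. This is a minimal illustration, not the Moses extraction code. It assumes alignment points are (english_index, foreign_index) pairs, phrase spans are inclusive, and a virtual aligned point (-1, -1) sits before the sentence start so that a sentence-initial phrase pair counts as monotone (that convention is an assumption here).

```python
from typing import Iterable, Tuple

Point = Tuple[int, int]  # (english_index, foreign_index)


def orientation(alignment: Iterable[Point], e_start: int, f_start: int, f_end: int) -> str:
    """Return 'm', 's' or 'd' for a phrase pair whose English side starts at
    e_start and whose foreign side covers f_start..f_end (inclusive)."""
    # Assumed convention: a virtual alignment point before the sentence start.
    points = set(alignment) | {(-1, -1)}
    prev_e = e_start - 1  # the preceding English word (the row above the phrase pair)
    if (prev_e, f_start - 1) in points:  # top left: aligned to the preceding foreign word
        return "m"
    if (prev_e, f_end + 1) in points:  # top right: aligned to the following foreign word
        return "s"
    return "d"


if __name__ == "__main__":
    # French "le mur extérieur" -> English "the outer wall"
    # English: 0 the, 1 outer, 2 wall; foreign: 0 le, 1 mur, 2 extérieur
    align = {(0, 0), (1, 2), (2, 1)}
    print(orientation(align, 0, 0, 0))  # the / le -> m
    print(orientation(align, 1, 2, 2))  # outer / extérieur -> d
    print(orientation(align, 2, 1, 1))  # wall / mur -> s
```

In the example, the swapped adjective yields a discontinuous event for outer/extérieur and a swap event for wall/mur, while the two-word pair "outer wall"/"mur extérieur" is monotone.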
We count how often each extracted phrase pair is found with each of the three orientation types. The probability distribution po is then estimated based on these counts using the maximum likelihood principle:
$$p_o(\text{orientation} \mid f, e) = \frac{\text{count}(\text{orientation}, e, f)}{\sum_o \text{count}(o, e, f)}$$
Given the sparse statistics of the orientation types, we may want to smooth the counts with the unconditioned maximum-likelihood probability distribution, using some factor $\sigma$:
$$p(\text{orientation}) = \frac{\sum_f \sum_e \text{count}(\text{orientation}, e, f)}{\sum_o \sum_f \sum_e \text{count}(o, e, f)}$$
$$p_o(\text{orientation} \mid f, e) = \frac{\sigma\, p(\text{orientation}) + \text{count}(\text{orientation}, e, f)}{\sigma + \sum_o \text{count}(o, e, f)}$$
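The counting and smoothing steps can be sketched as follows. This is an illustrative sketch, not the Moses scoring tool: the input is assumed to be a stream of (f, e, orientation) events produced by extraction, and the default value of sigma is arbitrary. Setting sigma to 0 recovers the plain maximum-likelihood estimate.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

ORIENTATIONS = ("m", "s", "d")


def estimate(events: Iterable[Tuple[str, str, str]], sigma: float = 0.5):
    """Estimate smoothed p_o(orientation | f, e) from (f, e, orientation) events."""
    counts: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for f, e, o in events:
        counts[(f, e)][o] += 1

    # Unconditioned ML distribution p(orientation), pooled over all phrase pairs.
    pooled: Counter = Counter()
    for c in counts.values():
        pooled.update(c)
    total = sum(pooled.values())
    prior = {o: pooled[o] / total for o in ORIENTATIONS}

    # Smoothed conditional distribution for each phrase pair.
    table = {}
    for pair, c in counts.items():
        denom = sigma + sum(c.values())
        table[pair] = {o: (sigma * prior[o] + c[o]) / denom for o in ORIENTATIONS}
    return prior, table
```

Because the prior sums to one, each smoothed distribution still sums to one; rare phrase pairs are pulled toward the global orientation distribution, frequent ones barely move.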
There are a number of variations of this lexicalized reordering model based on orientation types:
These variations have occasionally been shown to be beneficial for certain training corpus sizes and language pairs. Moses allows the arbitrary combination of these decisions to define the reordering model type (e.g. bidirectional-monotonicity-f). See the training section of this manual for more on training these models.
For larger tasks, phrase tables usually become huge, typically too large to fit into memory. Therefore, Moses supports a binary phrase table with on-demand loading, i.e. only the part of the phrase table that is required to translate a sentence is loaded into memory.
You have to convert the standard ASCII phrase tables into the binary format. Here is an example (standard phrase table phrase-table, with 5 scores):
export LC_ALL=C
cat phrase-table | sort | mosesdecoder/misc/processPhraseTable \
  -ttable 0 0 - -nscores 5 -out phrase-table
Options:
-ttable int int string -- translation table file, use '-' for stdin
-out string -- output file name prefix for binary ttable
-nscores int -- number of scores in ttable
If you just want to convert a phrase table, the two ints in the -ttable option do not matter, so use 0s.
Important: If your data is encoded in UTF-8, make sure you set the environment variable LC_ALL=C (export LC_ALL=C) before sorting, so that sort orders lines by raw bytes. If your phrase table is already sorted, you can skip that.
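To decide whether an existing table can skip the sort step, you can check that it is already in C-locale (byte) order. A minimal sketch; the script name check_sorted.py in the usage comment is hypothetical. Python compares bytes objects by unsigned byte value, which matches what LC_ALL=C sort does.

```python
import sys
from typing import Iterable


def is_byte_sorted(lines: Iterable[bytes]) -> bool:
    """True if the lines are in the order `LC_ALL=C sort` produces (raw byte order)."""
    prev = None
    for line in lines:
        line = line.rstrip(b"\n")
        # bytes comparison is lexicographic on unsigned byte values, like the C locale.
        if prev is not None and line < prev:
            return False
        prev = line
    return True


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python check_sorted.py phrase-table
    with open(sys.argv[1], "rb") as fh:
        print("sorted" if is_byte_sorted(fh) else "not sorted: run LC_ALL=C sort first")
```

For example, a line starting with "ä" (UTF-8 bytes 0xC3 0xA4) sorts after "haus" in byte order, whereas a locale-aware sort would place it near "a".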
The output files will be:
phrase-table.binphr.idx
phrase-table.binphr.srctree
phrase-table.binphr.srcvoc
phrase-table.binphr.tgtdata
phrase-table.binphr.tgtvoc
In the moses config file, specify phrase-table (without the extensions) as the phrase table file name. Moses will check whether the binary version exists and, if so, use it.
Word-to-word alignment:
To include in the binary phrase table the word-to-word alignments between source and target phrases that are contained in the textual phrase table (see Training Step "Score Phrases"), specify the option -alignment-info in the processPhraseTable command.
The two output files ".srctree" and ".tgtdata" will then end with the suffix ".wa".
Note:
Moses will check whether the binary version with word-to-word alignments exists and can optionally use it through the options -use-alignment-info, -print-alignment-info and -print-alignment-info-in-n-best.
The reordering tables may also be converted into a binary format. The command is slightly simpler:
mosesdecoder/misc/processLexicalTable -in reordering-table -out reordering-table
The file names for input and output are typically the same, since the actual output file names have similar extensions to the phrase table file names.
Sometimes we have external knowledge that we want to bring to the decoder. For instance, we might have a better translation system for translating numbers or dates. We would like to plug in these translations to the decoder without changing the model.
The -xml-input flag is used to activate this feature. It can have one of four values:
exclusive Only the XML-specified translation is used for the input phrase. Any phrases from the phrase table that overlap with that span are ignored.
inclusive The XML-specified translation competes with all the phrase table choices for that span.
ignore The XML-specified translation is ignored completely.
pass-through (default) For backwards compatibility, the XML data is fed straight through to the decoder. This will produce erroneous results if the decoder is fed data that contains XML markup.
The decoder has an XML markup scheme that allows the specification of translations for parts of the sentence. In its simplest form, we can tell the decoder what to use to translate certain words or phrases in the sentence:
% echo 'das ist <np translation="a cute place">ein kleines haus</np>' \
    | moses -xml-input exclusive -f moses.ini
this is a cute place
% echo 'das ist ein kleines <n translation="dwelling">haus</n>' \
    | moses -xml-input exclusive -f moses.ini
this is a little dwelling
The words have to be surrounded by tags, such as <np...> and </np>. The name of the tags can be chosen freely. The target output is specified in the opening tag as the value of a parameter called translation, as in the examples above. (For historical reasons, this parameter has also been called english, after the canonical target language.)
We can also provide a probability along with these translation choices. The parameter must be named prob and should contain a single float value. If it is not present, an XML translation option is given a probability of 1.
% echo 'das ist ein kleines <n translation="dwelling" prob="0.8">haus</n>' \
    | moses -xml-input exclusive -f moses.ini
this is a little dwelling
This probability isn't very useful without letting the decoder have other phrase table entries "compete" with the XML entry, so we switch to inclusive mode. This allows the decoder to use either translations from the model or the specified XML translation:
% echo 'das ist ein kleines <n translation="dwelling" prob="0.8">haus</n>' \
    | moses -xml-input inclusive -f moses.ini
this is a small house
The switch -xml-input inclusive gives the decoder a choice between using the specified translations or its own. This choice, again, is ultimately made by the language model, which takes the sentence context into account.
This doesn't change the output from the non-XML sentence because the prob value is first converted to a log probability and then split evenly among the scores present in the phrase table. Additionally, the toy model used here has a very simple language model and phrase table. Setting the probability value to something astronomical forces our option to be chosen.
% echo 'das ist ein kleines <n translation="dwelling" prob="0.8">haus</n>' \
    | moses -xml-input inclusive -f moses.ini
this is a little dwelling
The XML-input implementation is NOT currently compatible with factored models or confusion networks.
Options
-xml-input ('pass-through' (default), 'inclusive', 'exclusive', 'ignore')
The generation of n-best lists (the top n translations found by the search according to the model) is pretty straightforward. You simply have to specify the file where the n-best list will be stored and the size of the n-best list for each sentence.
Example: The command
% moses -f moses.ini -n-best-list listfile 100 < in
stores the n-best list in the file listfile with up to 100 translations per input sentence.
Here is an example n-best list:
0 ||| we must discuss on greater vision . ||| d: 0 -5.56438 0 0 -7.07376 0 0 \
  lm: -36.0974 -13.3428 tm: -39.6927 -47.8438 -15.4766 -20.5003 4.99948 w: -7 ||| -9.2298
0 ||| we must also discuss on a vision . ||| d: -10 -2.3455 0 -1.92155 -3.21888 0 -1.51918 \
  lm: -31.5841 -9.96547 tm: -42.3438 -48.4311 -18.913 -20.0086 5.99938 w: -8 ||| -9.26197
0 ||| it is also discuss a vision . ||| d: -10 -1.63574 -1.60944 -2.70802 -1.60944 -1.94589 -1.08417 \
  lm: -31.9699 -12.155 tm: -40.4555 -46.8605 -14.3549 -13.2247 4.99948 w: -7 ||| -9.31777
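Each line of the n-best list file is made up of the following fields (separated by |||):
sentence number (in the above example 0, the first sentence)
output sentence
individual component scores (unweighted), labeled d (reordering), lm (language model), tm (translation model) and w (word penalty)
weighted overall score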
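For downstream processing such as reranking, each entry can be split into the fields shown in the example above: sentence number, output sentence, labeled component scores, and overall score. A minimal parsing sketch, assuming each entry occupies a single line in the file (the backslashes above only mark wrapped display lines); any extra fields after the overall score are ignored.

```python
from typing import Dict, List, NamedTuple


class NBestEntry(NamedTuple):
    sentence_id: int
    translation: str
    scores: Dict[str, List[float]]  # component label (e.g. "lm") -> unweighted scores
    total: float


def parse_nbest_line(line: str) -> NBestEntry:
    """Parse one n-best line of the form: id ||| output ||| labeled scores ||| total."""
    fields = [f.strip() for f in line.split("|||")]
    sentence_id, translation, feature_field, total = fields[:4]
    scores: Dict[str, List[float]] = {}
    label = None
    for token in feature_field.split():
        if token.endswith(":"):  # a component label such as "d:" or "tm:"
            label = token[:-1]
            scores[label] = []
        else:
            scores[label].append(float(token))
    return NBestEntry(int(sentence_id), translation, scores, float(total))
```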
Note that it is possible (and very likely) that the n-best list contains many sentences that look the same on the surface, but have different scores. The most common reason for this is different phrase segmentation (two words may be mapped by a single phrase mapping, or by two individual phrase mappings for each word).
To produce an n-best list that only contains the first occurrence of an output sentence, add the word distinct after the file and size specification:
% moses -f moses.ini -n-best-list listfile 100 distinct < in
This creates an n-best list file that contains up to 100 distinct output sentences for each input sentence. Note that a large number of candidate translations may have to be examined to find the top 100. To keep memory usage in check, only 20 times the specified number of distinct entries are examined. This factor can be changed with the switch -n-best-factor.
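The effect of distinct and -n-best-factor can be illustrated on a flat, best-first candidate list. This is only a sketch of the behavior described above, not the decoder's implementation: the first occurrence of each output string is kept, and at most factor times size candidates are examined.

```python
from typing import Iterable, List, Tuple


def distinct_nbest(candidates: Iterable[Tuple[str, float]], size: int,
                   factor: int = 20) -> List[Tuple[str, float]]:
    """Keep the first (best-scoring) occurrence of each output string.

    candidates must be ordered best-first; at most factor * size of them are examined."""
    seen = set()
    kept: List[Tuple[str, float]] = []
    for examined, (text, score) in enumerate(candidates):
        if examined >= factor * size or len(kept) == size:
            break
        if text not in seen:  # a repeated surface string, e.g. from a different segmentation
            seen.add(text)
            kept.append((text, score))
    return kept
```

With a small factor, fewer than size distinct entries may be returned, which is the memory/completeness trade-off the switch controls.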
Options
-n-best-list FILE SIZE [distinct] --- output an n-best list of size SIZE to file FILE
-n-best-factor FACTOR --- exploring at most FACTOR*SIZE candidates for distinct
-include-alignment-in-n-best --- output of word-to-word alignments in the n-best list; it requires that word-to-word alignments are included in the phrase table and that -use-alignment-info is set (see below for further details).
If the phrase table (binary or textual) includes word-to-word alignments between source and target phrases (see "Score Phrases" and "Binary Phrase Table"), Moses can report them in the output.
There are three options that control the output of alignment information: -use-alignment-info, -print-alignment-info, and -print-alignment-info-in-n-best.
For instance, by translating the sentence "ich frage" from German into English and activating all parameters, you get in the verbose output:
BEST TRANSLATION: i ask [11] [total=-1.429] <<features>> [f2e: 0=0 1=1] [e2f: 0=0 1=1]
The last two fields report the word-to-word alignments from source to target and from target to source, respectively.
In the n-best list you get:
0 ||| i ask ||| ...feature_scores.... ||| -1.42906 ||| 0-1=0-1 ||| 0=0 1=1 ||| 0=0 1=1
0 ||| i am asking ||| ...feature_scores.... ||| -2.61281 ||| 0-1=0-2 ||| 0=0 1=1,2 ||| 0=0 1=1 2=1
0 ||| i ask you ||| ...feature_scores.... ||| -3.1068 ||| 0-1=0-2 ||| 0=0 1=1,2 ||| 0=0 1=1 2=1
0 ||| i ask this ||| ...feature_scores.... ||| -3.48919 ||| 0-1=0-2 ||| 0=0 1=1 ||| 0=0 1=1 2=-1
Indexes (starting from 0) are used to refer to words. '2=-1' means that the word with index 2 (i.e. the third word) is not associated with any word in the other language. For instance, considering the last translation hypothesis "i ask this" of "ich frage", the source-to-target alignment ("0=0 1=1") means that:
German -> English
ich -> i
frage -> ask
and vice versa, the target-to-source alignment ("0=0 1=1 2=-1") means that:
English -> German
i -> ich
ask -> frage
this ->
Note: in the same translation hypothesis, the field "0-1=0-2" after the global score refers to the phrase-to-phrase alignment and means that "ich frage" is translated as a phrase into the three-word English phrase "i ask this".
This information is generated if the option -include-alignment-in-n-best is activated.
Important: the phrase table can include different word-to-word alignments for the source-to-target and target-to-source directions, at least in principle. Hence, the two alignments can differ.
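The alignment fields can be turned into index maps with a small parser. A sketch based only on the formats shown above: word alignments look like "0=0 1=1,2" with -1 for unaligned words, and the phrase alignment looks like "0-1=0-2". Treating multiple phrase links as space-separated, and a span without a dash as a single word, are assumptions.

```python
from typing import Dict, List, Tuple


def parse_word_alignment(field: str) -> Dict[int, List[int]]:
    """Parse e.g. '0=0 1=1,2' or '0=0 1=1 2=-1' into {word index: [aligned indices]}."""
    links: Dict[int, List[int]] = {}
    for item in field.split():
        src, tgt = item.split("=")
        # '-1' marks a word that is not associated with any word in the other language.
        links[int(src)] = [] if tgt == "-1" else [int(t) for t in tgt.split(",")]
    return links


def _span(text: str) -> Tuple[int, int]:
    parts = text.split("-")  # "0-2" -> (0, 2); assumed: "3" -> (3, 3)
    return int(parts[0]), int(parts[-1])


def parse_phrase_alignment(field: str) -> List[Tuple[Tuple[int, int], Tuple[int, int]]]:
    """Parse e.g. '0-1=0-2' into [((src_start, src_end), (tgt_start, tgt_end))]."""
    spans = []
    for item in field.split():
        src, tgt = item.split("=")
        spans.append((_span(src), _span(tgt)))
    return spans
```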
Options
-use-alignment-info -- to activate this feature
-print-alignment-info -- to output the word-to-word alignments into the verbose output.
-print-alignment-info-in-n-best -- to output the word-to-word alignments into the n-best list.
Minimum Bayes Risk (MBR) decoding was proposed by Shankar Kumar and Bill Byrne (HLT/NAACL 2004). Roughly speaking, instead of outputting the translation with the highest probability, MBR decoding outputs the translation that is most similar to the most likely translations. This requires a similarity measure to establish similarity. In Moses, this is a smoothed BLEU score.
Using MBR decoding is straightforward: just use the switch -mbr when invoking the decoder.
Example:
% moses -f moses.ini -mbr < in
MBR decoding uses by default the top 200 distinct candidate translations to find the translation with minimum Bayes risk. If you want to change this to some other number, use the switch -mbr-size:
% moses -f moses.ini -decoder-type 1 -mbr-size 100 < in
MBR decoding requires that the translation scores are converted into probabilities that add up to one. The default is to take the log-scores at face value, but you may get better results by scaling the scores. This may be done with the switch -mbr-scale, for instance:
% moses -f moses.ini -decoder-type 1 -mbr-scale 0.5 < in
Options
-mbr -- use MBR decoding
-mbr-size SIZE -- number of translation candidates to consider (default 200)
-mbr-scale SCALE -- scaling factor used to adjust the translation scores (default 1.0)
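MBR over an n-best list can be sketched as follows: scale the log-scores, normalize them into probabilities, and pick the candidate with the highest expected similarity to all candidates. This is an illustration under stated assumptions, not the Moses code: the add-one smoothing of the BLEU n-gram precisions is one common choice and may differ from what Moses uses, and scale/size only mimic the roles of -mbr-scale and -mbr-size.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def smoothed_bleu(hyp: Sequence[str], ref: Sequence[str], max_n: int = 4) -> float:
    """Sentence BLEU with add-one smoothed n-gram precisions (assumed smoothing)."""
    log_precision = 0.0
    for n in range(1, max_n + 1):
        h, r = ngram_counts(hyp, n), ngram_counts(ref, n)
        matches = sum(min(c, r[g]) for g, c in h.items())
        total = max(len(hyp) - n + 1, 0)
        log_precision += math.log((matches + 1) / (total + 1))
    log_bp = min(0.0, 1.0 - len(ref) / len(hyp))  # brevity penalty; hyp must be non-empty
    return math.exp(log_bp + log_precision / max_n)


def mbr_select(nbest: List[Tuple[List[str], float]], scale: float = 1.0,
               size: int = 200) -> List[str]:
    """nbest holds (tokens, log_score) pairs, best first; returns the MBR choice."""
    cands = nbest[:size]
    top = max(s for _, s in cands)
    weights = [math.exp(scale * (s - top)) for _, s in cands]  # shift by max for stability
    z = sum(weights)
    probs = [w / z for w in weights]  # probabilities that add up to one
    best, best_gain = cands[0][0], float("-inf")
    for hyp, _ in cands:
        # Expected gain = sum over candidates of P(candidate) * BLEU(hyp, candidate).
        gain = sum(p * smoothed_bleu(hyp, ref) for (ref, _), p in zip(cands, probs))
        if gain > best_gain:
            best, best_gain = hyp, gain
    return best
```

In a list where the top-scoring candidate shares no words with three slightly lower-scoring candidates that largely agree with each other, this picks one of the agreeing candidates rather than the outlier. Smaller scale values flatten the distribution and give lower-ranked candidates more say.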
These are extensions to MBR which may run faster or give better results. For more details see Tromble et al. (2008), Kumar et al. (2009) and DeNero et al. (2009). The n-gram posteriors (required for Lattice MBR) and the n-gram expectations (for Consensus decoding) are both calculated using an algorithm described in DeNero et al. (2010). Currently both Lattice MBR and consensus decoding are implemented as n-best list rerankers; in other words, the hypothesis space is an n-best list (not a lattice).
Here's the list of options which affect both Lattice MBR and consensus decoding.
-lmbr -- use Lattice MBR decoding
-con -- use Consensus decoding
-mbr-size SIZE -- as for MBR
-mbr-scale SCALE -- as for MBR
-lmbr-pruning-factor FACTOR -- mean words per node in pruned lattice, as described in Tromble et al (2008) (default 30)
Lattice MBR has several further parameters which are described in the Tromble et al 2008 paper.
-lmbr-p P -- The unigram precision (default 0.8)
-lmbr-r R -- The ngram precision ratio (default 0.6)
-lmbr-thetas THETAS -- Instead of specifying p and r, lattice MBR can be configured by specifying all the n-gram weights and the length penalty (5 numbers). This is described fully in the references.
-lmbr-map-weight WEIGHT -- The weight given to the MAP hypothesis (default 0)
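The relation between p, r and the five weights can be made concrete. Tromble et al. (2008) set the length weight to -1/T and the n-gram weights to 1/(4 T p r^(n-1)); the sketch below uses T = 1, which is harmless because the Lattice MBR gain is linear in the weights, so rescaling all five by a common positive factor does not change which hypothesis wins. Whether Moses uses exactly this normalization, and the order in which -lmbr-thetas expects the numbers, are assumptions here.

```python
from typing import List


def lmbr_thetas(p: float, r: float, order: int = 4) -> List[float]:
    """Weights in the style of Tromble et al. (2008) with T = 1:
    theta_0 = -1 (length term), theta_n = 1 / (order * p * r**(n - 1))."""
    thetas = [-1.0, 1.0 / (order * p)]
    for _ in range(2, order + 1):
        thetas.append(thetas[-1] / r)  # each higher n-gram order is weighted 1/r times more
    return thetas
```

For the defaults (p = 0.8, r = 0.6), the unigram weight is 1/(4 * 0.8) = 0.3125 and each subsequent order grows by a factor of 1/0.6.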
Since Lattice MBR has so many parameters, a utility to perform a grid search is provided. This is in moses-cmd/src and is called lmbrgrid. A typical usage would be
% ./lmbrgrid -lmbr-p 0.4,0.6,0.8 -lmbr-r 0.4,0.6,0.8 -mbr-scale 0.1,0.2,0.5,1 -lmbr-pruning-factor 30 -mbr-size 1000 -f moses.ini -i input.txt
In other words, the same Lattice MBR parameters as for moses are used, but this time a comma-separated list of values can be supplied for each. Each line in the output takes the following format:
<sentence-id> ||| <p> <r> <pruning-factor> <scale> ||| <translation>
In the Moses Lattice MBR experiments that have been done to date, Lattice MBR showed small overall improvements on a NIST Arabic data set (+0.4 over MAP, +0.1 over MBR), once the parameters were chosen carefully. Parameters were optimised by grid search on 500 held-out sentences, and the following settings were found to be optimal:
-lmbr-p 0.8 -lmbr-r 0.8 -mbr-scale 5 -lmbr-pruning-factor 50
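To run such a grid search yourself, the lmbrgrid output can be split into one set of translations per parameter setting, each of which is then scored against the held-out references. A minimal grouping sketch, assuming lines follow the output format shown above exactly; the BLEU scoring step is left to whatever scorer you already use.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Setting = Tuple[str, str, str, str]  # (p, r, pruning-factor, scale), kept as printed


def split_by_setting(lines: Iterable[str]) -> Dict[Setting, Dict[int, str]]:
    """Group lmbrgrid output lines: setting -> {sentence id: translation}."""
    grouped: Dict[Setting, Dict[int, str]] = defaultdict(dict)
    for line in lines:
        sent_id, params, translation = (f.strip() for f in line.split("|||", 2))
        p, r, pruning, scale = params.split()
        grouped[(p, r, pruning, scale)][int(sent_id)] = translation
    return dict(grouped)
```

Keeping the parameter values as strings avoids float round-trips, so each key matches the values passed on the lmbrgrid command line.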
Unknown words are copied verbatim to the output. They are also scored by the language model, and may be placed out of order. Alternatively, you may want to drop unknown words. To do so, add the switch -drop-unknown.
When translating between languages that use different writing systems (say, Chinese-English), dropping unknown words results in better BLEU scores. However, it is misleading to a human reader, and it is unclear what the effect on human judgment is.
Options
-drop-unknown -- drop unknown words instead of copying them into the output
It may be useful for many downstream applications to have a dump of the search graph, for instance to compile a word lattice. On the one hand, you can use the -verbose 3 option, which will give a trace of all generated hypotheses, but this also logs many hypotheses that get immediately discarded. If you do not want this, a better option is the switch -output-search-graph FILE, which also provides some additional information.
The generated file contains lines that could be seen as both a dump of the states in the graph and the transitions in the graph. The state graph more closely reflects the hypotheses that are generated in the search. There are three types of hypotheses:
0 hyp=0 stack=0 [...]
0 hyp=17 stack=1 back=0 score=-1.33208 [...] covered=0-0 out=from now on
0 hyp=5994 stack=2 back=108 score=-1.57388 [...] recombined=13061 [...] covered=2-2 out=be
The relevant information for viewing each line as a state in the search graph is the sentence number (initial 0), the hypothesis id (hyp), the stack where the hypothesis is placed (same as number of foreign words covered, stack), the back-pointer to the previous hypotheses (back), the score so far (score), the last output phrase (out) and that phrase's foreign coverage (covered). For recombined hypotheses, also the superior hypothesis id is given (recombined).
The search graph output includes additional information that is computed after the fact. While the back-pointer and score (back, score) point to the cheapest path and cost to the beginning of the graph, the generated output also includes the pointer to the cheapest path and score (forward, fscore) to the end of the graph.
One way to view the output of this option is a reflection of the search and all (relevant) hypotheses that are generated along the way. But often, we want to generate a word lattice, where the states are less relevant, but the information is in the transitions from one state to the next, each transition emitting a phrase at a certain cost. The initial empty hypothesis is irrelevant here, so we need to consider only the other two hypothesis types:
0 hyp=17 [...] back=0 [...] transition=-1.33208 [...] covered=0-0 out=from now on
0 [...] back=108 [...] transition=-0.640114 recombined=13061 [...] covered=2-2 out=be
For the word lattice, the relevant information is the cost of the transition (transition), its output (out), maybe the foreign coverage (covered), and the start (back) and endpoint (hyp). Note that the states generated by recombined hypotheses are ignored, since the transition points to the superior hypothesis (recombined).
Here, for completeness' sake, are the full lines for the three examples we used above:
0 hyp=0 stack=0 forward=9 fscore=-107.279
0 hyp=17 stack=1 back=0 score=-1.33208 transition=-1.33208 \
  forward=517 fscore=-106.484 covered=0-0 out=from now on
0 hyp=5994 stack=2 back=108 score=-1.57388 transition=-0.640114 \
  recombined=13061 forward=22455 fscore=-106.807 covered=2-2 out=be
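The lattice view described above can be extracted from the search graph file with a short script. A sketch assuming one line per hypothesis (the backslashes above only mark wrapped lines) and that out= is always the last field, since the output phrase may contain spaces; both hold for the example lines but are assumptions in general. Each edge runs from back to hyp, or to the superior hypothesis when recombined is present.

```python
from typing import Dict, List, Tuple


def parse_search_graph_line(line: str) -> Dict[str, str]:
    """Split a line into key=value fields; everything after 'out=' is the output phrase."""
    tokens = line.split()
    record = {"sentence": tokens[0]}
    for i, token in enumerate(tokens[1:], start=1):
        key, _, value = token.partition("=")
        if key == "out":
            record["out"] = " ".join([value] + tokens[i + 1:])
            break
        record[key] = value
    return record


def lattice_edges(lines: List[str]) -> List[Tuple[int, int, float, str]]:
    """Return (start state, end state, transition cost, output phrase) per transition."""
    edges = []
    for line in lines:
        rec = parse_search_graph_line(line)
        if "back" not in rec:  # the initial empty hypothesis has no incoming transition
            continue
        # A recombined hypothesis's transition ends in the superior hypothesis.
        end = rec.get("recombined", rec["hyp"])
        edges.append((int(rec["back"]), int(end), float(rec["transition"]), rec["out"]))
    return edges
```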
What is the difference between the search graph output file generated with this switch and the true search graph?
The -verbose 3 log shows the recombinations as they happen (recall that momentarily superior hypotheses may be recombined to even better ones down the road).
Note again that you can get the full search graph with the -verbose 3 option. It is, however, much larger and mostly consists of discarded hypotheses.
Options
-output-search-graph FILE -- output the search graph for each sentence in a file
During the beam search, many hypotheses are created that are too bad to even be entered on a stack. For many of them, it is clear even before the hypothesis is constructed that it would not be useful. Early discarding of such hypotheses hazards a guess about their viability. The guess is based on the correct score, except for the actual language model costs, which are very expensive to compute. Hypotheses that, according to this estimate, are worse than the worst hypothesis on the target stack, even given an additional specified threshold as a cushion, are not constructed at all. This often speeds up decoding significantly. Try threshold factors between 0.5 and 1.
Options
-early-discarding-threshold THRESHOLD -- use early discarding of hypotheses with the specified threshold (default: 0 = not used)
The beam search organizes and compares hypotheses based on the number of foreign words they have translated. Since they may have different foreign words translated, we use future score estimates about the remaining sentence translation score.
Instead of comparing such apples and oranges, we could also organize hypotheses by their exact foreign word coverage. The disadvantage of this is that it would require an exponential number of stacks, but with reordering limits the number of stacks is only exponential with regard to maximum reordering distance.
Such coverage stacks are implemented in the search, and their maximum size is specified with the switch -stack-diversity (or -sd), which sets the maximum number of hypotheses per coverage stack.
The actual implementation is a hybrid of coverage stacks and foreign word count stacks: the stack diversity is a constraint on which hypotheses are kept on the traditional stack. If the stack diversity limits leave room for additional hypotheses according to the stack size limit (specified by -s, default 200), then the stack is filled up with the best hypotheses, using score so far and the future score estimate.
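The hybrid pruning rule can be sketched as follows: first protect up to LIMIT best hypotheses per coverage, then fill the remaining slots of the stack with the best of the rest by score so far plus future score estimate. This is an illustration of the description above, not the Moses code. Coverage is any hashable value here (a bit vector in practice), and keeping all diversity-protected hypotheses even if they exceed the stack size is an assumption.

```python
from collections import Counter
from typing import Hashable, List, Tuple

Hyp = Tuple[Hashable, float, float]  # (coverage, score so far, future score estimate)


def prune_stack(hyps: List[Hyp], stack_size: int, diversity: int) -> List[Hyp]:
    """Keep up to `diversity` best hypotheses per coverage, then fill the remaining
    slots with the best of the rest, ranked by score so far + future estimate."""
    ranked = sorted(hyps, key=lambda h: h[1] + h[2], reverse=True)
    per_coverage: Counter = Counter()
    kept, rest = [], []
    for h in ranked:
        if per_coverage[h[0]] < diversity:
            per_coverage[h[0]] += 1
            kept.append(h)
        else:
            rest.append(h)
    # Assumption: hypotheses protected by the diversity limit are never dropped here.
    for h in rest:
        if len(kept) >= stack_size:
            break
        kept.append(h)
    return kept
```

With a diversity of 1 and room for two hypotheses, a weaker hypothesis with a different coverage survives instead of a second hypothesis with an already represented coverage; with diversity 0 the rule reduces to ordinary histogram pruning.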
Options
-stack-diversity LIMIT -- keep a specified number of hypotheses for each foreign word coverage (default: 0 = not used)
Cube pruning, as described by Liang Huang and David Chiang (2007), has been implemented in the Moses decoder. This is in addition to the traditional search algorithm. The code offers developers the opportunity to implement different search algorithms using an extensible framework.
Cube pruning is faster than the traditional search at comparable levels of search errors. To get faster performance than the default Moses setting at roughly the same translation quality, use the parameter settings:
-search-algorithm 1 -cube-pruning-pop-limit 2000 -s 2000
This turns on cube pruning (-search-algorithm 1), adds 2000 hypotheses to each stack (-cube-pruning-pop-limit 2000), and increases the stack size to 2000 (-s 2000). Note that with cube pruning, the size of the stack has little impact on performance, so it should be set rather high. The speed/quality trade-off is mostly regulated by the cube pruning pop limit, i.e. the number of hypotheses added to each stack.
Stacks are organized by the number of foreign words covered, so they may differ by which words are covered. You may also require that a minimum number of hypotheses is added for each word coverage (they may be still pruned out, however). This is done using the switch -cube-pruning-diversity MINIMUM which sets the minimum. The default is 0.
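The core of cube pruning (Huang and Chiang, 2007) is a best-first walk over a grid formed by sorted hypotheses and sorted translation options, stopping once the pop limit is reached. The sketch below shows that walk for a single grid; it is not the Moses implementation, and score is a placeholder you supply. When the score is not simply the sum of the two sorted inputs (e.g. because of language model costs), the enumeration order is only approximately best-first, which is where the search errors come from.

```python
import heapq
from typing import Callable, List, Tuple


def cube_prune(n_hyps: int, n_options: int, score: Callable[[int, int], float],
               pop_limit: int) -> List[Tuple[int, int, float]]:
    """Lazily enumerate (hypothesis index, option index, score), roughly best first.

    Both inputs are assumed sorted best-first; score(i, j) is the full score of
    extending hypothesis i with translation option j."""
    if n_hyps == 0 or n_options == 0:
        return []
    heap = [(-score(0, 0), 0, 0)]  # heapq is a min-heap, so scores are negated
    seen = {(0, 0)}
    popped = []
    while heap and len(popped) < pop_limit:
        neg, i, j = heapq.heappop(heap)
        popped.append((i, j, -neg))
        # Push the two grid neighbours of the popped cell.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < n_hyps and nj < n_options and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-score(ni, nj), ni, nj))
    return popped
```

Only about pop_limit plus a frontier of cells are ever scored, instead of all n_hyps times n_options combinations, which is why the pop limit controls the speed/quality trade-off.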
Options
-search-algorithm 1 -- turns on cube pruning
-cube-pruning-pop-limit LIMIT -- number of hypotheses added to each stack
-cube-pruning-diversity MINIMUM -- minimum number of hypotheses from each coverage pattern
For various reasons, it may be useful to specify reordering constraints to the decoder, for instance because of punctuation. Consider the sentence:
I said " This is a good idea . " , and pursued the plan .
The quoted material should be translated as a block, meaning that once we start translating some of the quoted words, we need to finish all of them. We call such a block a zone and allow the specification of such constraints using XML markup.
I said <zone> " This is a good idea . " </zone> , and pursued the plan .
Another type of constraint is the wall, a hard reordering constraint: all words before a wall have to be translated before any word after it is translated. For instance:
This is the first part . <wall /> This is the second part .
Walls may be specified within zones, where they act as local walls, i.e. they are only valid within the zone.
I said <zone> " <wall /> This is a good idea . <wall /> " </zone> , and pursued the plan .
If you add such markup to the input, you need to use the option -xml-input with either exclusive or inclusive (there is no difference between these options in this context).
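The meaning of zones and walls can be pinned down with a small checker over the order in which source words are covered. This sketch works at the word level (the decoder covers whole phrases, but any phrase order induces such a word order), and it assumes a wall at position p sits just before word p and that a local wall is given together with its enclosing zone; all names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Span = Tuple[int, int]  # inclusive word positions


def satisfies_constraints(order: List[int], zones: Iterable[Span] = (),
                          walls: Iterable[Tuple[int, Optional[Span]]] = ()) -> bool:
    """Check a word-level coverage order against zones and walls.

    order: source word positions in the order they are covered.
    zones: spans whose words must be covered as one uninterrupted block.
    walls: (position, scope) pairs; every word in scope before `position` must be covered
    before any word in scope at or after it. scope None means the whole sentence,
    otherwise the enclosing zone (a local wall)."""
    rank: Dict[int, int] = {pos: k for k, pos in enumerate(order)}
    for start, end in zones:
        ranks = [rank[p] for p in range(start, end + 1)]
        # Distinct ranks form a contiguous block iff max - min equals the span length - 1.
        if max(ranks) - min(ranks) != end - start:
            return False
    for position, scope in walls:
        lo, hi = scope if scope is not None else (0, len(order) - 1)
        before = [rank[p] for p in range(lo, position)]
        after = [rank[p] for p in range(position, hi + 1)]
        if before and after and max(before) > min(after):
            return False
    return True
```

For the quoted example, the zone is words 2 to 9 (the two quotation marks and everything between them): reordering inside the zone is allowed, but starting the zone, jumping to ", and" and coming back is not.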
Specifying reordering constraints around punctuation is often a good idea. The switch -monotone-at-punctuation introduces walls around the punctuation tokens ,.!?:;".
Options
<zone>, </zone>, and <wall>.
-xml-input -- needs to be exclusive or inclusive
-monotone-at-punctuation (-mp) -- adds walls around punctuation ,.!?:;".
Moses allows the use of multiple translation tables, but there are two different ways how they are used:
In any case, each translation table has its own set of weights.
First, you need to specify the translation tables in the section [ttable-file] of the moses.ini configuration file, for instance:
[ttable-file]
0 0 5 /my-dir/table1
0 0 5 /my-dir/table2
Secondly, you need to set the appropriate number of weights in the section [weight-t], in our example that would be 10 weights (5 for each table).
Thirdly, you need to specify how the tables are used in the section [mapping]. As mentioned above, there are two choices:
[mapping]
T 0
T 1
[mapping]
0 T 0
1 T 1
Note: what we are really doing here is using Moses' capability to use different decoding paths. The number before "T" defines a decoding path, so the second example specifies two different decoding paths. Decoding paths may also contain additional mapping steps, such as generation steps and translation steps using different factors.
Also note that there is no way to have the option "use both tables if the phrase pair is in both tables, otherwise use only the table where you can find it". Keep in mind that scoring a phrase pair involves a cost and lowers the chances that the phrase pair is used. To effectively get this behavior, you may create a third table that consists of the intersection of the two phrase tables, and remove the shared phrase pairs from each of the two original tables.
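The intersection workaround can be sketched as a split of two textual phrase tables keyed on (source, target). This is a hypothetical helper, not a Moses tool. It assumes the shared table carries the scores of both tables concatenated (so it needs its own set of weights, 10 in the 5 + 5 example) and it ignores any fields after the scores, such as word alignments.

```python
from typing import Dict, Iterable, List, Tuple

PhraseTable = Dict[Tuple[str, str], List[str]]  # (source, target) -> score strings


def read_table(lines: Iterable[str]) -> PhraseTable:
    table: PhraseTable = {}
    for line in lines:
        fields = [f.strip() for f in line.split("|||")]
        table[(fields[0], fields[1])] = fields[2].split()  # later fields are dropped
    return table


def split_tables(t1: PhraseTable, t2: PhraseTable) -> Tuple[PhraseTable, PhraseTable, PhraseTable]:
    """Return (only in t1, only in t2, shared), where shared entries carry both score sets."""
    shared_keys = t1.keys() & t2.keys()
    only1 = {k: v for k, v in t1.items() if k not in shared_keys}
    only2 = {k: v for k, v in t2.items() if k not in shared_keys}
    # Assumption: concatenate the scores of both tables for the intersection table.
    shared = {k: t1[k] + t2[k] for k in sorted(shared_keys)}
    return only1, only2, shared


def format_table(table: PhraseTable) -> List[str]:
    return [f"{src} ||| {tgt} ||| {' '.join(scores)}" for (src, tgt), scores in table.items()]
```

The three resulting tables would then be listed separately in [ttable-file], each on its own decoding path, and re-sorted with LC_ALL=C before binarization.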
The translation table contains all phrase pairs found in the parallel corpus, which includes a lot of noise. To reduce the noise, recent work by Johnson et al. suggests pruning out unlikely phrase pairs. For more detail, please refer to the paper:
H. Johnson, J. Martin, G. Foster and R. Kuhn. (2007) Improving Translation Quality by Discarding Most of the Phrasetable. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 967-975.
Moses includes a re-implementation of this method in the directory sigtest-filter. You first need to build it from the source files.
This implementation relies on Joy Zhang's SALM Suffix Array toolkit.
To build SALM, go to SALM/Distribution/Linux and type: make
Then go to sigtest-filter in the main Moses distribution directory, set the location of SALM in the Makefile:
SALMDIR=/path/to/SALM
and type make.
Using the SALM/Bin/Linux/Index/IndexSA.O32, create a suffix array index of the source and target sides of your training bitext (SOURCE, TARGET).
% SALM/Bin/Linux/Index/IndexSA.O32 TARGET
% SALM/Bin/Linux/Index/IndexSA.O32 SOURCE
Prune the phrase table:
% cat phrase-table | ./filter-pt -e TARGET -f SOURCE -l FILTER-VALUE > phrase-table.pruned
FILTER-VALUE is the -log prob threshold described in Johnson et al. (2007)'s paper. It may be either 'a+e', 'a-e', or a positive real value. Run with no options to see more use-cases. A good setting is -l a+e -n 30, which also keeps only the top 30 phrase translations for each source phrase, based on p(e|f).
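The statistic behind FILTER-VALUE is the one-sided Fisher exact test of Johnson et al. (2007), computed from sentence-pair counts. In that paper, α = log N (N = number of sentence pairs) is the -log p-value of a "1-1-1" pair, i.e. one whose source phrase, target phrase and pair each occur in exactly one sentence pair, so 'a+e' (α + ε) removes such pairs while 'a-e' keeps them. The sketch below computes the statistic only; it is an illustration of the method, not filter-pt's code, and counting phrase occurrences with the suffix arrays is left out.

```python
import math


def neg_log_p_value(c_st: int, c_s: int, c_t: int, n: int) -> float:
    """-log p of the one-sided Fisher exact test for a phrase pair.

    n: number of sentence pairs; c_s / c_t: sentence pairs containing the source /
    target phrase; c_st: sentence pairs containing both. p is the hypergeometric
    probability of seeing c_st or more co-occurrences if s and t were independent."""
    tail = sum(math.comb(c_t, k) * math.comb(n - c_t, c_s - k)
               for k in range(c_st, min(c_s, c_t) + 1))
    # Work in log space with exact integers to avoid underflow for tiny p-values.
    return math.log(math.comb(n, c_s)) - math.log(tail)


def alpha(n: int) -> float:
    """-log p of a 1-1-1 phrase pair; equals log(n)."""
    return neg_log_p_value(1, 1, 1, n)
```

A phrase pair is kept when its -log p exceeds the threshold, so a pair that always co-occurs across ten sentence pairs scores far above α and survives 'a+e' filtering.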
The latest svn version of Moses now supports multi-threaded operation, enabling faster decoding on multi-core machines. The current limitations of multi-threaded Moses are:
To configure and build multi-threaded moses, you'll need to have boost installed (1.35 or higher) and use the configure line
% ./configure --with-srilm=<path-to-srilm> --with-boost=<path-to-boost> --enable-threads
The boost path can be omitted if you have boost installed in a standard place. On 64-bit machines you may have to add --with-boost-thread=boost_thread-gcc43-mt (or similar) to the configure arguments.
After moses has been configured this way, running make will build two moses binaries, moses and mosesmt. The latter takes the same arguments as moses (although it doesn't currently support all of moses' i/o options) but it also admits an additional -threads n argument, specifying the size of the threadpool to use when running the decoder. Using a small number of threads (3-5) has been found to speed up decoding, although larger numbers do not seem to offer any further increase in speed. Multi-threaded moses is still experimental, and any feedback on its use would be greatly appreciated. Either mail me or the moses list.
Update: As of revision 3274 there is only one moses main (for the phrase-based decoder, at least). The moses binary can be run in both single-threaded and multi-threaded modes, depending on how it is configured. If it is configured without --enable-threads, then only single-threaded is available, however if built with --enable-threads then the -threads argument can be used to run it in multi-threaded mode.
The moses server enables you to run the decoder as a server process, and send it sentences to be translated via xmlrpc. This means that one moses process can service distributed clients coded in Java, perl, python, php, or any of the many other languages which have xmlrpc libraries.
To build the moses server, you need to have xmlrpc-c installed - it has been tested with the latest stable version, 1.16.19, and you need to add the argument --with-xmlrpc-c=<path-xmlrpc-c-config> to the configure arguments. You will also need to configure moses for multi-threaded operation, as described above.
Running make should then build an executable server/mosesserver. This can be launched using the same command-line arguments as moses, with two additional arguments to specify the listening port and log-file (--server-port and --server-log). These default to 8080 and /dev/null respectively.
A sample client is included in the server directory (in perl), which requires the SOAP::Lite perl module to be installed. To access the moses server, an xmlrpc request should be sent to http://host:port/RPC2 where the parameter is a map containing the keys text and (optionally) align. The value of the first of these parameters is the text to be translated and the second, if present, causes alignment information to be returned to the client. The client will receive a map containing the same two keys, where the value associated with the text key is the translated text, and the align key (if present) maps to a list of maps. The alignment gives the segmentation in target order, with each list element specifying the target start position (tgt-start), source start position (src-start) and source end position (src-end).
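Instead of the Perl sample, a client can be written with Python's standard xmlrpc.client module. A sketch assuming the server runs on localhost with the default port 8080 and exposes the translate method with the request and response maps described above; the helper names are hypothetical, the value "true" for align is an assumption (the description only requires the key to be present), and the optional system key applies to multi-system servers.

```python
import xmlrpc.client
from typing import Dict, List, Optional, Tuple


def build_request(text: str, align: bool = False, system: Optional[str] = None) -> Dict[str, str]:
    params = {"text": text}
    if align:
        params["align"] = "true"  # assumed value; the key's presence requests alignments
    if system is not None:
        params["system"] = system  # only meaningful for servers with several systems
    return params


def alignment_spans(result: Dict) -> List[Tuple[int, int, int]]:
    """(tgt-start, src-start, src-end) for each segment, in target order."""
    return [(seg["tgt-start"], seg["src-start"], seg["src-end"]) for seg in result.get("align", [])]


def translate(text: str, host: str = "localhost", port: int = 8080,
              align: bool = False, system: Optional[str] = None):
    proxy = xmlrpc.client.ServerProxy(f"http://{host}:{port}/RPC2")
    result = proxy.translate(build_request(text, align, system))
    return result["text"], alignment_spans(result)
```

For example, translate("das ist ein kleines haus", align=True) would return the translated text together with the segment spans, provided a server is listening.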
Note that although the moses server needs to be built against multi-threaded moses, it can be run in single-threaded mode using the --serial option. This enables it to be used with non-threadsafe libraries such as (currently) irstlm.
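The launch options described so far can be collected in one place. The sketch below is a hypothetical helper (mosesserver_command is not part of Moses) that assembles an argument list for server/mosesserver, using -f to pass the configuration file as with moses, --server-port and --server-log with their documented defaults, and --serial for single-threaded operation.

```python
def mosesserver_command(config, port=8080, log="/dev/null", serial=False,
                        binary="server/mosesserver"):
    """Hypothetical helper: build the argv list for launching the moses server.

    port and log default to the server's own defaults (8080 and /dev/null),
    so passing them explicitly does not change the behavior.
    """
    cmd = [binary, "-f", config,
           "--server-port", str(port),
           "--server-log", log]
    if serial:
        # Single-threaded mode, e.g. for language model libraries
        # that are not thread-safe.
        cmd.append("--serial")
    return cmd
```

The list can be passed directly to subprocess.Popen, for example subprocess.Popen(mosesserver_command("moses.ini", serial=True)).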
The moses server can also load multiple translation systems in a single server process, and the client can choose, on a per-sentence basis, which system the server should use. The client does this by passing a system argument in the translate operation.
One possible use case for this feature is a server that translates both French and German into English using a large English language model. Instead of running two copies of the moses server, each holding its own copy of the English language model in memory, you can run a single server instance that holds the language model in memory once, saving RAM.
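A client for this French/German setup might look like the following minimal sketch. It assumes that the system argument is sent as a system key in the same parameter map as text, and it uses the system names de and fr from the [translation-systems] example in this section; translate_mixed and build_system_request are illustrative names, not part of Moses.

```python
import xmlrpc.client


def build_system_request(text, system=None):
    """Parameter map for translate, optionally naming a translation system.

    Assumption: the system name travels as a 'system' key next to 'text'.
    When system is None the key is simply omitted.
    """
    params = {"text": text}
    if system is not None:
        params["system"] = system
    return params


def translate_mixed(pairs, url="http://localhost:8080/RPC2"):
    """Translate (system, sentence) pairs using one server process.

    Each sentence selects its own system, so German and French input can be
    interleaved while the server keeps a single English LM in memory.
    """
    proxy = xmlrpc.client.ServerProxy(url)
    return [proxy.translate(build_system_request(text, system))["text"]
            for system, text in pairs]
```

For example, translate_mixed([("de", "das ist ein kleines haus"), ("fr", "c'est une petite maison")]) sends the first sentence to the German system and the second to the French one.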
To use the multiple models feature, you need to make some changes to the standard moses configuration file. A sample configuration file can be found here.
The first piece of extra configuration required for a multiple models setup is to specify the available systems, for example:
    [translation-systems]
    de D 0 R 0 L 0
    fr D 1 R 1 L 1
This specifies that there are two systems (de and fr), and that the first uses decode path 0, reordering model 0, and language model 0, whilst the second uses the models with id 1. The multiple decode paths are specified with a stanza like
    [mapping]
    0 T 0
    1 T 1
which indicates that the 0th decode path uses the 0th translation model, and the 1st decode path uses the 1st translation model. Using a language model specification like
    [lmodel-file]
    0 0 5 /disk4/translation-server/models/interpolated-lm
    0 0 5 /disk4/translation-server/models/interpolated-lm
means that the same language model can be used in two different systems with two different weights, but moses will only load it once. The weights sections of the configuration file must contain the correct number of weights for each model, and there must be a word penalty weight and a linear distortion weight for each translation system. The lexicalised reordering weights (if any) must be specified in the [weight-lr] stanza, with the distortion penalty in the [weight-d] stanza.
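To make the links between these stanzas concrete, the sketch below resolves each system's decode path (D) through [mapping] to the translation tables it uses. It is a hypothetical checking helper, not part of Moses, and it assumes the simple forms shown in this section: one integer id after each model letter in [translation-systems], three fields per [mapping] line, and # for comment lines.

```python
def parse_sections(ini_text):
    """Group moses.ini-style text into {section name: [entry lines]}."""
    sections, current = {}, None
    for raw in ini_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections


def parse_translation_systems(lines):
    """'de D 0 R 0 L 0' -> {'de': {'D': 0, 'R': 0, 'L': 0}}.

    Assumes each model type letter is followed by a single integer id.
    """
    systems = {}
    for line in lines:
        name, *fields = line.split()
        systems[name] = {k: int(v) for k, v in zip(fields[0::2], fields[1::2])}
    return systems


def parse_mapping(lines):
    """'0 T 0' -> {0: [0]}: decode path id -> translation table ids.

    Assumes the three-field form; non-translation (non-T) steps are ignored.
    """
    paths = {}
    for line in lines:
        path, step_type, table = line.split()
        if step_type == "T":
            paths.setdefault(int(path), []).append(int(table))
    return paths


def tables_per_system(ini_text):
    """Map each system name to the translation tables of its decode path."""
    sections = parse_sections(ini_text)
    systems = parse_translation_systems(sections.get("translation-systems", []))
    paths = parse_mapping(sections.get("mapping", []))
    return {name: paths.get(ids["D"], []) for name, ids in systems.items()}


EXAMPLE_INI = """
[translation-systems]
de D 0 R 0 L 0
fr D 1 R 1 L 1

[mapping]
0 T 0
1 T 1
"""
```

For the example configuration, tables_per_system(EXAMPLE_INI) reports that de uses translation table 0 and fr uses translation table 1; a system whose decode path is missing from [mapping] would show an empty list.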
Achim Ruopp has created a package to run the Moses pipeline on the Amazon cloud. This is especially useful for people who do not have their own SGE cluster. More details are available from the Amazon webpage or from Achim directly. Achim has also created a tutorial.
Kamil Kos has created scripts/training/zmert-moses.pl, a replacement for mert-moses(-new).pl for those who wish to use ZMERT. The zmert-moses.pl script supports most of the mert-moses-new.pl parameters, so switching to ZMERT should be relatively easy. For details on the supported parameters, run scripts/training/zmert-moses.pl --help.
The main advantage of ZMERT is that, like the new mert, it can optimize towards different metrics (BLEU and TER are built in). How to implement new metrics is described on the ZMERT homepage. At Charles University in Prague, we are experimenting with SemPOS, a metric based on the tectogrammatical layer (see TectoMT).
ZMERT JAR, version 1.41, is located in zmert/zmert.jar. It contains one bug fix, for a bug that occurred when optimizing a parameter whose values were almost all zero. The original version is also included as zmert/zmert_v1.41.jar. If you would like to add a new metric, modify the zmert/zmert.jar file as follows:
1. Extract the zmert.jar content by typing jar xf zmert.jar
2. Add the Java source for your new metric (NewMetric.java.template can serve as a starting point)
3. Compile by typing javac *.java
4. Rebuild zmert.jar by typing jar cvfM zmert.jar *.java* *.class
This option forces Moses to start generating the translation from a non-empty hypothesis. This can be useful when you have already translated part of the sentence and want a suggestion or an n-best list of continuations.
Use -continue-partial-translation (-cpt) to activate this feature. With -cpt, moses also accepts a special input format: three parameters delimited by the triple bar (|||). The first parameter is the output produced so far (used for LM scoring). The second parameter is the coverage vector indicating which input words are already translated by the output so far, written as a string of "1"s and "0"s with one character per word of the input sentence. The third parameter is the source sentence.
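When the partial output comes from another program, the -cpt input line can be assembled mechanically. The sketch below is a hypothetical helper (build_cpt_input is not part of Moses); it assumes the source sentence is already tokenized with single spaces, so that splitting on whitespace yields the same words moses sees.

```python
def build_cpt_input(output_so_far, covered, source):
    """Format one input line for moses -continue-partial-translation.

    output_so_far: target words produced so far (used for LM scoring)
    covered:       0-based indices of source words already translated
    source:        the full, tokenized source sentence
    """
    words = source.split()
    bad = sorted(i for i in covered if not 0 <= i < len(words))
    if bad:
        raise ValueError("coverage indices out of range: %s" % bad)
    # One character per source word: "1" if covered, "0" otherwise.
    coverage = "".join("1" if i in covered else "0" for i in range(len(words)))
    return "%s ||| %s ||| %s" % (output_so_far, coverage, source)
```

For instance, build_cpt_input("that is", {0, 1}, "das ist ein kleines haus") produces the first input line of the example below, and the result can be piped into moses in the same way as the echo commands.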
Example:
    % echo "that is ||| 11000 ||| das ist ein kleines haus" | moses -f moses.ini -continue-partial-translation
    that is a small house
    % echo "that house ||| 10001 ||| das ist ein kleines haus" | moses -f moses.ini -continue-partial-translation
    that house is a little
If the input does not fit this pattern, it is treated like normal input with no words translated yet.
This type of input is currently not compatible with factored models or confusion networks. The standard non-lexicalized distortion model works more or less as one would expect (note that some input coverage vectors may make translation impossible under low distortion limits). Lexicalized reordering has not been tested with this input.
Options
-continue-partial-translation (-cpt) -- activate the feature