Moses statistical machine translation system

Advanced Features of the Decoder

The basic features of the decoder are explained in the Tutorial. Here, we describe some additional features that have been demonstrated to be beneficial in some cases.


Lexicalized Reordering Models

The standard reordering model for phrase-based statistical machine translation is conditioned only on movement distance and nothing else. However, some phrases are reordered more frequently than others. A French adjective like extérieur typically gets switched with the preceding noun when translated into English.

Hence, we want to consider a lexicalized reordering model that conditions reordering on the actual phrases. One concern, of course, is the problem of sparse data. A particular phrase pair may occur only a few times in the training data, making it hard to estimate reliable probability distributions from these statistics.

Therefore, in the lexicalized reordering model we present here, we only consider three reordering types: (m) monotone order, (s) switch with previous phrase, or (d) discontinuous. See below for an illustration of these three different types of orientation of a phrase.

To put it more formally, we want to introduce a reordering model p_o that predicts an orientation type {m, s, d} given the phrase pair (f, e) currently used in translation:

$$ \text{orientation} \in \{m, s, d\} $$

$$ p_o(\text{orientation} \mid f, e) $$

How can we learn such a probability distribution from the data? Again, we go back to the word alignment that was the basis for our phrase table. When we extract each phrase pair, we can also extract its orientation type in that specific occurrence.

Looking at the word alignment matrix, we note for each extracted phrase pair its corresponding orientation type. The orientation type can be detected by checking for a word alignment point to the top left or to the top right of the extracted phrase pair. An alignment point to the top left signifies that the preceding English word is aligned to the preceding Foreign word. An alignment point to the top right indicates that the preceding English word is aligned to the following Foreign word. See below for an illustration.

The orientation type is defined as follows:

  • monotone: if a word alignment point to the top left exists, we have evidence for monotone orientation.
  • swap: if a word alignment point to the top right exists, we have evidence for a swap with the previous phrase.
  • discontinuous: if neither a word alignment point to top left nor to the top right exists, we have neither monotone order nor a swap, and hence evidence for discontinuous orientation.
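The orientation rules above can be sketched in Python. This is a minimal illustration, not the code of the Moses training scripts: the function name orientation, the (foreign_index, english_index) representation of alignment points, and the treatment of the sentence start as a virtual alignment point at (-1, -1) are all assumptions made for the example.

```python
from typing import Set, Tuple

# An alignment point is (foreign_index, english_index); indices start at 0.
Alignment = Set[Tuple[int, int]]


def orientation(alignment: Alignment, f_start: int, f_end: int, e_start: int) -> str:
    """Classify an extracted phrase pair as 'm' (monotone), 's' (swap) or 'd' (discontinuous).

    The phrase pair covers foreign words f_start..f_end and starts at English word e_start.
    """
    # Assumption: the sentence start behaves like an alignment point at (-1, -1),
    # so a phrase pair that starts both sentences counts as monotone.
    points = set(alignment) | {(-1, -1)}
    prev_e = e_start - 1  # the preceding English word
    if (f_start - 1, prev_e) in points:  # top left: preceding foreign word
        return "m"
    if (f_end + 1, prev_e) in points:  # top right: following foreign word
        return "s"
    return "d"
```

For the alignment {(0, 0), (1, 2), (2, 1)}, the phrase pair covering foreign word 1 and English word 2 finds the point (2, 1) to its top right and is classified as a swap.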

We count how often each extracted phrase pair is found with each of the three orientation types. The probability distribution p_o is then estimated based on these counts using the maximum likelihood principle:

$$ p_o(\text{orientation} \mid f, e) = \frac{\text{count}(\text{orientation}, e, f)}{\sum_o \text{count}(o, e, f)} $$

Given the sparse statistics of the orientation types, we may want to smooth the counts with the unconditioned maximum-likelihood probability distribution p_o(orientation), weighted by some factor σ:

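$$ p_o(\text{orientation}) = \frac{\sum_f \sum_e \text{count}(\text{orientation}, e, f)}{\sum_o \sum_f \sum_e \text{count}(o, e, f)} $$

$$ p_o(\text{orientation} \mid f, e) = \frac{\sigma \, p_o(\text{orientation}) + \text{count}(\text{orientation}, e, f)}{\sigma + \sum_o \text{count}(o, e, f)} $$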
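The maximum-likelihood estimate and its smoothed variant can be sketched as follows. The function name, the input format (one (f, e, orientation) tuple per extracted phrase pair occurrence) and the default σ = 0.5 are illustrative assumptions; setting sigma=0 gives back the unsmoothed estimate.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

ORIENTATIONS = ("m", "s", "d")


def estimate_orientation_model(
    events: Iterable[Tuple[str, str, str]], sigma: float = 0.5
) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Smoothed estimate of p_o(orientation | f, e) from (f, e, orientation) events."""
    pair_counts: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    global_counts: Counter = Counter()
    for f, e, o in events:
        pair_counts[(f, e)][o] += 1
        global_counts[o] += 1

    total = sum(global_counts.values())
    if total == 0:
        return {}
    # Unconditioned distribution p_o(orientation), summed over all phrase pairs.
    prior = {o: global_counts[o] / total for o in ORIENTATIONS}

    model = {}
    for pair, counts in pair_counts.items():
        denom = sigma + sum(counts.values())
        model[pair] = {o: (sigma * prior[o] + counts[o]) / denom for o in ORIENTATIONS}
    return model
```

With two occurrences of extérieur/outside as a swap and two of la/the as monotone, σ = 1 gives p_o(s | extérieur, outside) = (0.5 + 2) / 3 ≈ 0.83 instead of the unsmoothed 1.0.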

There are a number of variations of this lexicalized reordering model based on orientation types:

  • bidirectional: Certain phrases may flag not only whether they themselves are moved out of order, but also whether subsequent phrases are reordered. A lexicalized reordering model for this decision could be learned in addition, using the same method.
  • f and e: Out of sparse data concerns, we may want to condition the probability distribution only on the foreign phrase (f) or the English phrase (e).
  • monotonicity: To further reduce the complexity of the model, we might merge the orientation types swap and discontinuous, leaving a binary decision about the phrase order.

These variations have been shown to be occasionally beneficial for certain training corpus sizes and language pairs. Moses allows the arbitrary combination of these decisions to define the reordering model type (e.g. bidirectional-monotonicity-f). See more on training these models in the training section of this manual.

Binary Phrase Tables with On-demand Loading

For larger tasks the phrase tables usually become huge, typically too large to fit into memory. Therefore, moses supports a binary phrase table with on-demand loading, i.e. only the part of the phrase table that is required to translate a sentence is loaded into memory.

You have to convert the standard ASCII phrase tables into the binary format. Here is an example for a standard phrase table named phrase-table with 5 scores:

 cat phrase-table | LC_ALL=C sort | mosesdecoder/misc/processPhraseTable \
   -ttable 0 0 - -nscores 5 -out phrase-table

Options:

  • -ttable int int string -- translation table file, use '-' for stdin
  • -out string -- output file name prefix for binary ttable
  • -nscores int -- number of scores in ttable

If you just want to convert a phrase table, the two ints in the -ttable option do not matter, so use 0's.

Important: Make sure you set the environment variable LC_ALL=C for sorting. If your phrase table is already sorted in this byte order, you can skip the sort step.
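Since LC_ALL=C sort orders lines by their raw bytes, a quick check whether a phrase table already has that order can be written in Python. This is a sketch; is_byte_sorted is a hypothetical helper, not part of Moses. Reading the file in binary mode avoids any locale or encoding dependence:

```python
from typing import Iterable


def is_byte_sorted(lines: Iterable[bytes]) -> bool:
    """Return True if the lines are already in byte order, as produced by `LC_ALL=C sort`."""
    previous = None
    for line in lines:
        # Compare without the line terminator, as sort does.
        key = line.rstrip(b"\n")
        if previous is not None and key < previous:
            return False
        previous = key
    return True
```

For example, is_byte_sorted(open("phrase-table", "rb")) returns True if the sort step can be skipped. Byte order puts all uppercase ASCII letters before lowercase ones, which is one reason a locale-aware sort can give a different order.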

The output files will be:

 phrase-table.binphr.idx 
 phrase-table.binphr.srctree
 phrase-table.binphr.srcvoc
 phrase-table.binphr.tgtdata
 phrase-table.binphr.tgtvoc

In the Moses config file, specify phrase-table (the prefix, without the .binphr.* extensions) as the phrase table. Moses will check if the binary version exists and use it.

Word-to-word alignment: To include in the binary phrase table the word-to-word alignments between source and target phrases that are contained in the textual phrase table (see Training Step "Score Phrases"), specify the option -alignment-info in the processPhraseTable command. The two output files ".srctree" and ".tgtdata" will then end with the additional suffix ".wa".

Note:

  • if your textual phrase table (PT) does NOT contain w2w alignments, you CANNOT use "-alignment-info" to output w2w alignments in the binary format: you get an error message
  • if your textual PT does contain w2w alignments, you CAN use "-alignment-info" to create a binary PT WITH w2w alignments: two of the binary PT data files (srctree and tgtdata) get the suffix .wa
  • if your textual PT does contain w2w alignments, you CAN omit "-alignment-info" to create a binary PT WITHOUT w2w alignments
  • by comparing data files of binary PT with and without w2w alignments, three of them (idx, srcvoc and tgtvoc) are identical and two of them (srctree and tgtdata) differ

Moses will check if the binary version with word-to-word alignments exists and can optionally use it through the options -use-alignment-info, -print-alignment-info and -print-alignment-info-in-n-best.

Binary Reordering Tables with On-demand Loading

The reordering tables may also be converted into a binary format. The command is slightly simpler:

 mosesdecoder/misc/processLexicalTable -in reordering-table -out reordering-table

The file names for input and output are typically the same, since the actual output file names have similar extensions to the phrase table file names.

XML Markup

Sometimes we have external knowledge that we want to bring to the decoder. For instance, we might have a better translation system for translating numbers or dates. We would like to plug these translations into the decoder without changing the model.

The -xml-input flag is used to activate this feature. It can have one of four values:

  • exclusive: Only the XML-specified translation is used for the input phrase. Any phrases from the phrase table that overlap with that span are ignored.
  • inclusive: The XML-specified translation competes with all the phrase table choices for that span.
  • ignore: The XML-specified translation is ignored completely.
  • pass-through (default): For backwards compatibility, the XML data is fed straight through to the decoder. This will produce erroneous results if the decoder is fed data that contains XML markup.

The decoder has an XML markup scheme that allows the specification of translations for parts of the sentence. In its simplest form, we can tell the decoder what to use to translate certain words or phrases in the sentence:

 % echo 'das ist <np translation="a cute place">ein kleines haus</np>' \
   | moses -xml-input exclusive -f moses.ini
 this is a cute place

 % echo 'das ist ein kleines <n translation="dwelling">haus</n>' \
   | moses -xml-input exclusive -f moses.ini
 this is a little dwelling

The words have to be surrounded by tags, such as <np...> and </np>. The names of the tags can be chosen freely. The target output is specified in the opening tag as the value of a parameter, translation in the examples above; for historical reasons (English being the canonical target language), it may also be called english.

We can also provide a probability along with this translation choice. The parameter must be named prob and should contain a single float value. If not present, an XML translation option is given a probability of 1.

 % echo 'das ist ein kleines <n translation="dwelling" prob="0.8">haus</n>' \
   | moses -xml-input exclusive -f moses.ini
 this is a little dwelling

This probability isn't very useful without letting the decoder have other phrase table entries "compete" with the XML entry, so we switch to inclusive mode. This allows the decoder to use either translations from the model or the specified xml translation:

 % echo 'das ist ein kleines <n translation="dwelling" prob="0.8">haus</n>' \
   | moses -xml-input inclusive -f moses.ini
 this is a small house

The switch -xml-input inclusive gives the decoder a choice between using the specified translations or its own. This choice, again, is ultimately made by the language model, which takes the sentence context into account.

This doesn't change the output from the non-XML sentence because the prob value is first converted to a log score and then split evenly among the scores present in the phrase table. Additionally, the toy model used here has a very simple language model and phrase table. Setting the probability value to something astronomical forces our option to be chosen.

 % echo 'das ist ein kleines <n translation="dwelling" prob="0.8">haus</n>' \
   | moses -xml-input inclusive -f moses.ini
 this is a little dwelling
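The effect of prob can be made concrete with a small calculation, assuming the option's value for each phrase-table score is log(prob) divided by the number of scores, as described above (the function name is hypothetical):

```python
import math
from typing import List


def xml_option_scores(prob: float, num_scores: int) -> List[float]:
    """Split log(prob) evenly across the phrase-table scores (sketch of the described behavior)."""
    share = math.log(prob) / num_scores
    return [share] * num_scores


print(xml_option_scores(0.8, 5))
```

With prob="0.8" and five scores, each score is only about -0.045, a small contribution compared with the model's own phrase-table scores; only a very large prob value pushes the XML option ahead.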

The XML-input implementation is NOT currently compatible with factored models or confusion networks.

Options

  • -xml-input ('pass-through' (default), 'inclusive', 'exclusive', 'ignore')

Generating n-Best Lists

The generation of n-best lists (the top n translations found by the search according to the model) is pretty straightforward. You simply have to specify the file where the n-best list will be stored and the size of the n-best list for each sentence.

Example: The command

 % moses -f moses.ini -n-best-list listfile 100 < in

stores the n-best list in the file listfile with up to 100 translations per input sentence.

Here is an example n-best list:

 0 ||| we must discuss on greater vision .  ||| d: 0 -5.56438 0 0 -7.07376 0 0 \
   lm: -36.0974 -13.3428 tm: -39.6927 -47.8438 -15.4766 -20.5003 4.99948 w: -7 ||| -9.2298
 0 ||| we must also discuss on a vision .  ||| d: -10 -2.3455 0 -1.92155 -3.21888 0 -1.51918 \
   lm: -31.5841 -9.96547 tm: -42.3438 -48.4311 -18.913 -20.0086 5.99938 w: -8 ||| -9.26197
 0 ||| it is also discuss a vision .  ||| d: -10 -1.63574 -1.60944 -2.70802 -1.60944 -1.94589 -1.08417 \
   lm: -31.9699 -12.155 tm: -40.4555 -46.8605 -14.3549 -13.2247 4.99948 w: -7 ||| -9.31777

Each line of the n-best list file is made up of (separated by |||):

  • sentence number (in above example 0, the first sentence)
  • output sentence
  • individual component scores (unweighted)
  • weighted overall score
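The four fields can be separated with a few lines of Python. This sketch (parse_nbest_line is a hypothetical name) assumes, as in the example above, that every group of component scores is introduced by a label ending in a colon; any additional fields after the overall score are ignored.

```python
from typing import Dict, List, Tuple


def parse_nbest_line(line: str) -> Tuple[int, str, Dict[str, List[float]], float]:
    """Split one n-best line into sentence number, output, component scores, total score."""
    fields = [field.strip() for field in line.split("|||")]
    sentence_id, output, scores, total = fields[:4]  # extra fields (e.g. alignments) are ignored
    components: Dict[str, List[float]] = {}
    current = None
    for token in scores.split():
        if token.endswith(":"):  # a label such as "d:", "lm:", "tm:" or "w:"
            current = token[:-1]
            components[current] = []
        else:
            components[current].append(float(token))
    return int(sentence_id), output, components, float(total)
```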

Note that it is possible (and very likely) that the n-best list contains many sentences that look the same on the surface, but have different scores. The most common reason for this is different phrase segmentation (two words may be mapped by a single phrase mapping, or by two individual phrase mappings for each word).

To produce an n-best list that only contains the first occurrence of an output sentence, add the word distinct after the file and size specification:

 % moses -f moses.ini -n-best-list listfile 100 distinct < in

This creates an n-best list file that contains up to 100 distinct output sentences for each input sentence. Note that a large number of candidate translations may have to be examined to find the top 100. To keep memory usage in check, at most 20 times the specified number of candidates are examined. This factor can be changed with the switch -n-best-factor.
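The distinct filtering with its examination limit can be sketched as follows. It assumes the candidates arrive sorted best-first as (output, score) pairs; distinct_nbest and its arguments are illustrative, not Moses internals.

```python
from typing import Iterable, List, Tuple


def distinct_nbest(
    candidates: Iterable[Tuple[str, float]], size: int, factor: int = 20
) -> List[Tuple[str, float]]:
    """Keep the first occurrence of each output string, best candidates first.

    At most factor * size candidates are examined, mirroring the -n-best-factor limit.
    """
    seen = set()
    result: List[Tuple[str, float]] = []
    for examined, (output, score) in enumerate(candidates, start=1):
        if output not in seen:
            seen.add(output)
            result.append((output, score))
            if len(result) == size:
                break
        if examined >= factor * size:
            break
    return result
```

A small factor trades completeness for speed: with factor=1 and size=2, the candidates a, a, b yield only a, because the limit of two examined candidates is reached before b is seen.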

Options

  • -n-best-list FILE SIZE [distinct] --- output an n-best list of size SIZE to file FILE
  • -n-best-factor FACTOR --- exploring at most FACTOR*SIZE candidates for distinct
  • -include-alignment-in-n-best --- output of word-to-word alignments in the n-best list; it requires that w2w alignments are included in the phrase table and that -use-alignment-info is set (see the Word-to-word alignment section for further details).

Word-to-word alignment

If the phrase table (binary or textual) includes word-to-word alignments between source and target phrases (see "Score Phrases" and "Binary Phrase Table"), Moses can report them in the output.

There are three options that control the output of alignment information: -use-alignment-info, -print-alignment-info, and -print-alignment-info-in-n-best.

For instance, by translating the sentence "ich frage" from German into English and activating all parameters, you get in the verbose output:

 BEST TRANSLATION: i ask [11]  [total=-1.429] <<features>> [f2e: 0=0 1=1] [e2f: 0=0 1=1]

The last two fields report the word-to-word alignments from source to target and from target to source, respectively.

In the n-best list you get:

 0 ||| i ask  ||| ...feature_scores.... ||| -1.42906 ||| 0-1=0-1 ||| 0=0 1=1 ||| 0=0 1=1
 0 ||| i am asking  ||| ...feature_scores.... ||| -2.61281 ||| 0-1=0-2 ||| 0=0 1=1,2 ||| 0=0 1=1 2=1
 0 ||| i ask you  ||| ...feature_scores.... ||| -3.1068 ||| 0-1=0-2 ||| 0=0 1=1,2 ||| 0=0 1=1 2=1
 0 ||| i ask this  ||| ...feature_scores.... ||| -3.48919 ||| 0-1=0-2 ||| 0=0 1=1 ||| 0=0 1=1 2=-1

Indexes (starting from 0) are used to refer to words. '2=-1' means that the word with index 2 (i.e. the third word) is not associated with any word in the other language. For instance, for the last translation hypothesis "i ask this" of "ich frage", the source-to-target alignment ("0=0 1=1") means that:

 German   -> English
 ich      -> i
 frage    -> ask

and vice versa, the target-to-source alignment ("0=0 1=1 2=-1") means that:

 English  -> German
 i        -> ich
 ask      -> frage
 this      -> 

Note: in the same translation hypothesis, the field "0-1=0-2" after the global score refers to the phrase-to-phrase alignment and means that "ich frage" (source words 0-1) is translated as a phrase into the three-word English phrase "i ask this" (target words 0-2). This information is generated if the option -include-alignment-in-n-best is activated.

Important: the phrase table can include different word-to-word alignments for the source-to-target and target-to-source directions, at least in principle. Hence, the two alignments can differ.
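The alignment fields of the n-best list can be parsed with a short Python function, sketched here under the assumption that each field is a space-separated list of index=targets items as in the examples above (parse_word_alignment is a hypothetical name). The loop reproduces the English-to-German mapping of "i ask this" shown earlier.

```python
from typing import Dict, List


def parse_word_alignment(field: str) -> Dict[int, List[int]]:
    """Parse e.g. '0=0 1=1,2 2=-1' into {0: [0], 1: [1, 2], 2: []}."""
    mapping: Dict[int, List[int]] = {}
    for item in field.split():
        word, targets = item.split("=")
        # '-1' marks a word that is not aligned to any word in the other language.
        mapping[int(word)] = [int(t) for t in targets.split(",") if t != "-1"]
    return mapping


source = "ich frage".split()
target = "i ask this".split()
for e_idx, f_idxs in parse_word_alignment("0=0 1=1 2=-1").items():
    print(target[e_idx], "->", " ".join(source[i] for i in f_idxs))
```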

Options

  • -use-alignment-info -- to activate this feature
  • -print-alignment-info -- to output the word-to-word alignments into the verbose output.
  • -print-alignment-info-in-n-best -- to output the word-to-word alignments into the n-best list.

Minimum Bayes Risk Decoding

Minimum Bayes Risk (MBR) decoding was proposed by Shankar Kumar and Bill Byrne (HLT/NAACL 2004). Roughly speaking, instead of outputting the translation with the highest probability, MBR decoding outputs the translation that is most similar to the most likely translations. This requires a similarity measure to establish similarity. In Moses, this is a smoothed BLEU score.

Using MBR decoding is straightforward: just use the switch -mbr when invoking the decoder.

Example:

 % moses -f moses.ini -mbr < in

MBR decoding uses by default the top 200 distinct candidate translations to find the translation with minimum Bayes risk. If you want to change this to some other number, use the switch -mbr-size:

 % moses -f moses.ini -mbr -mbr-size 100 < in

MBR decoding requires that the translation scores are converted into probabilities that add up to one. The default is to take the log-scores at face value, but you may get better results by scaling the scores. This may be done with the switch -mbr-scale, for instance:

 % moses -f moses.ini -mbr -mbr-scale 0.5 < in
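A minimal sketch of the MBR decision rule: scale the log-scores (the role of -mbr-scale), normalize them into probabilities, and pick the candidate with the highest expected similarity to all candidates. The smoothing of BLEU used here (add-one on each n-gram precision) and all function names are assumptions; the smoothed BLEU in Moses may differ in detail.

```python
import math
from collections import Counter
from typing import List, Tuple


def ngram_counts(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def smoothed_bleu(hyp: str, ref: str, max_n: int = 4) -> float:
    """Sentence-level BLEU with add-one smoothing of the n-gram precisions (an assumption)."""
    h, r = hyp.split(), ref.split()
    if not h:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        h_counts, r_counts = ngram_counts(h, n), ngram_counts(r, n)
        matches = sum(min(c, r_counts[g]) for g, c in h_counts.items())
        total = max(len(h) - n + 1, 0)
        log_precision += math.log((matches + 1) / (total + 1))
    log_brevity = min(0.0, 1 - len(r) / len(h))  # brevity penalty in log space
    return math.exp(log_brevity + log_precision / max_n)


def mbr_select(candidates: List[Tuple[str, float]], scale: float = 1.0) -> str:
    """Pick the candidate with the highest expected BLEU against all candidates."""
    scaled = [scale * score for _, score in candidates]
    top = max(scaled)
    weights = [math.exp(s - top) for s in scaled]  # softmax, shifted for numerical stability
    z = sum(weights)
    probs = [w / z for w in weights]
    best, best_gain = candidates[0][0], float("-inf")
    for hyp, _ in candidates:
        gain = sum(p * smoothed_bleu(hyp, ref) for (ref, _), p in zip(candidates, probs))
        if gain > best_gain:
            best, best_gain = hyp, gain
    return best
```

For the candidates ("x y z w", -1.0), ("a b c d", -1.1) and ("a b c e", -1.2), the highest-scoring candidate is "x y z w", but MBR picks "a b c d" because the other likely candidate agrees with it. With a very large scale, the probability mass concentrates on the top candidate and MBR behaves like ordinary maximum-score decoding.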

Options

  • -mbr -- use MBR decoding
  • -mbr-size SIZE -- number of translation candidates considered (default 200)
  • -mbr-scale SCALE -- scaling factor used to adjust the translation scores (default 1.0)

Handling Unknown Words

Unknown words are copied verbatim to the output. They are also scored by the language model, and may be placed out of order. Alternatively, you may want to drop unknown words. To do so add the switch -drop-unknown.

When translating between languages that use different writing systems (say, Chinese-English), dropping unknown words results in better BLEU scores. However, it is misleading to a human reader, and it is unclear what the effect on human judgment is.

Options

  • -drop-unknown -- drop unknown words instead of copying them into the output

Output Search Graph

It may be useful for many downstream applications to have a dump of the search graph, for instance to compile a word lattice. On the one hand, you can use the -verbose 3 option, which will give a trace of all generated hypotheses, but this also logs many hypotheses that get immediately discarded. If you do not want this, a better option is the switch -output-search-graph FILE, which also provides some additional information.

The generated file contains lines that could be seen as both a dump of the states in the graph and the transitions in the graph. The state graph more closely reflects the hypotheses that are generated in the search. There are three types of hypotheses:

  • The initial empty hypothesis is the only one that is not generated by a phrase translation
 0 hyp=0 stack=0 [...]
  • Regular hypotheses
 0 hyp=17 stack=1 back=0 score=-1.33208 [...] covered=0-0 out=from now on
  • Recombined hypotheses
 0 hyp=5994 stack=2 back=108 score=-1.57388 [...] recombined=13061 [...] covered=2-2 out=be

The relevant information for viewing each line as a state in the search graph is the sentence number (initial 0), the hypothesis id (hyp), the stack where the hypothesis is placed (same as number of foreign words covered, stack), the back-pointer to the previous hypothesis (back), the score so far (score), the last output phrase (out) and that phrase's foreign coverage (covered). For recombined hypotheses, the id of the superior hypothesis is also given (recombined).

The search graph output includes additional information that is computed after the fact. While the back-pointer and score (back, score) point to the cheapest path and cost to the beginning of the graph, the generated output also includes the pointer to the cheapest path and score (forward, fscore) to the end of the graph.

One way to view the output of this option is a reflection of the search and all (relevant) hypotheses that are generated along the way. But often, we want to generate a word lattice, where the states are less relevant, but the information is in the transitions from one state to the next, each transition emitting a phrase at a certain cost. The initial empty hypothesis is irrelevant here, so we need to consider only the other two hypothesis types:

  • Regular hypotheses
 0 hyp=17 [...] back=0 [...] transition=-1.33208 [...] covered=0-0 out=from now on 
  • Recombined hypotheses
 0 [...] back=108 [...] transition=-0.640114 recombined=13061 [...] covered=2-2 out=be

For the word lattice, the relevant information is the cost of the transition (transition), its output (out), maybe the foreign coverage (covered), and the start (back) and endpoint (hyp). Note that the states generated by recombined hypothesis are ignored, since the transition points to the superior hypothesis (recombined).

Here, for completeness sake, the full lines for the three examples we used above:

 0 hyp=0 stack=0 forward=9 fscore=-107.279
 0 hyp=17 stack=1 back=0 score=-1.33208 transition=-1.33208 \
   forward=517 fscore=-106.484 covered=0-0 out=from now on 
 0 hyp=5994 stack=2 back=108 score=-1.57388 transition=-0.640114 \
   recombined=13061 forward=22455 fscore=-106.807 covered=2-2 out=be
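Converting such lines into word lattice transitions can be sketched like this. The sketch assumes that out= is always the last field (its value contains spaces) and that every other field is a single key=value token; the function names are hypothetical. As described above, a recombined hypothesis contributes a transition that ends at its superior hypothesis.

```python
from typing import Dict, Optional, Tuple


def parse_search_graph_line(line: str) -> Dict[str, str]:
    """Parse one -output-search-graph line into a dict of its fields."""
    line = line.rstrip("\n")
    output = None
    if " out=" in line:
        # The output phrase may contain spaces, so split it off first.
        line, output = line.split(" out=", 1)
    tokens = line.split()
    entry = {"sentence": tokens[0]}
    for token in tokens[1:]:
        key, _, value = token.partition("=")
        entry[key] = value
    if output is not None:
        entry["out"] = output.strip()
    return entry


def lattice_edge(entry: Dict[str, str]) -> Optional[Tuple[int, int, float, str]]:
    """Return (start state, end state, transition cost, output phrase), or None."""
    if "back" not in entry:
        return None  # the initial empty hypothesis emits no phrase
    # A recombined hypothesis is ignored as a state; its transition ends at the superior one.
    end = entry.get("recombined", entry["hyp"])
    return int(entry["back"]), int(end), float(entry["transition"]), entry["out"]
```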

What is the difference between the search graph output file generated with this switch and the true search graph?

  • It contains the additional forward costs and forward paths.
  • It also only contains the hypotheses that are part of a fully connected path from the initial empty hypothesis to a final hypothesis that covers the full foreign input sentence.
  • The recombined hypotheses already point to the correct superior hypothesis, while the -verbose 3 log shows the recombinations as they happen (recall that momentarily superior hypotheses may be recombined to even better ones down the road).

Note again that you can get the full search graph with the -verbose 3 option. It is, however, much larger and mostly consists of discarded hypotheses.

Options

  • -output-search-graph FILE -- output the search graph for each sentence in a file

Early Discarding of Hypotheses

During the beam search, many hypotheses are created that are too bad to even be entered on a stack. For many of them, it is clear even before the hypothesis is constructed that it would not be useful. Early discarding of such hypotheses hazards a guess about their viability. This guess is based on the correct score, except for the actual language model costs, which are very expensive to compute. Hypotheses that, according to this estimate, are worse than the worst hypothesis of the target stack, even given an additional specified threshold as cushion, are not constructed at all. This often speeds up decoding significantly. Try threshold factors between 0.5 and 1.

Options

  • -early-discarding-threshold THRESHOLD -- use early discarding of hypotheses with the specified threshold (default: 0 = not used)

Maintaining stack diversity

The beam search organizes and compares hypotheses based on the number of foreign words they have translated. Since they may have different foreign words translated, we use future score estimates about the remaining sentence translation score.

Instead of comparing such apples and oranges, we could also organize hypotheses by their exact foreign word coverage. The disadvantage of this is that it would require an exponential number of stacks, but with reordering limits the number of stacks is only exponential with regard to maximum reordering distance.

Such coverage stacks are implemented in the search, and their maximum size is specified with the switch -stack-diversity (or -sd), which sets the maximum number of hypotheses per coverage stack.

The actual implementation is a hybrid of coverage stacks and foreign word count stacks: the stack diversity is a constraint on which hypotheses are kept on the traditional stack. If the stack diversity limits leave room for additional hypotheses according to the stack size limit (specified by -s, default 200), then the stack is filled up with the best hypotheses, using score so far and the future score estimate.
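The hybrid of coverage stacks and word-count stacks can be sketched as a pruning step over one traditional stack. This is one reading of the description above, not the Moses implementation: the names, the bit-string coverage, and the choice to keep all guaranteed hypotheses even if they exceed the stack size are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A hypothesis is (coverage, score): coverage is e.g. a bit string over the foreign words,
# score is the score so far plus the future score estimate (higher is better).
Hyp = Tuple[str, float]


def prune_stack(hyps: List[Hyp], stack_size: int, diversity: int) -> List[Hyp]:
    ranked = sorted(hyps, key=lambda h: h[1], reverse=True)
    kept: List[Hyp] = []
    overflow: List[Hyp] = []
    per_coverage: Dict[str, int] = defaultdict(int)
    for hyp in ranked:
        coverage = hyp[0]
        if per_coverage[coverage] < diversity:
            # Guaranteed slot: one of the best `diversity` hypotheses for this coverage.
            per_coverage[coverage] += 1
            kept.append(hyp)
        else:
            overflow.append(hyp)
    # Fill any remaining room up to the stack size with the best other hypotheses.
    room = max(stack_size - len(kept), 0)
    return kept + overflow[:room]
```

For hyps = [("110", -1.0), ("110", -1.5), ("110", -2.0), ("011", -3.0)] and a stack size of 3, diversity=1 keeps the single "011" hypothesis, while diversity=0 (stack diversity not used) drops it in favor of the third "110" hypothesis.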

Options

  • -stack-diversity LIMIT -- keep a specified number of hypotheses for each foreign word coverage (default: 0 = not used)

Cube Pruning

Cube pruning, as described by Liang Huang and David Chiang (2007), has been implemented in the Moses decoder. This is in addition to the traditional search algorithm. The code offers developers the opportunity to implement different search algorithms using an extensible framework.

Cube pruning is faster than the traditional search at comparable levels of search errors. To get faster performance than the default Moses setting at roughly the same translation quality, use the parameter settings:

 -search-algorithm 1 -cube-pruning-pop-limit 2000 -s 2000

This uses cube pruning (-search-algorithm 1) that adds 2000 hypotheses to each stack (-cube-pruning-pop-limit 2000) and also increases the stack size to 2000 (-s 2000). Note that with cube pruning, the size of the stack has little impact on performance, so it should be set rather high. The speed/quality trade-off is mostly regulated by the cube pruning pop limit, i.e. the number of hypotheses added to each stack.

Stacks are organized by the number of foreign words covered, so they may differ by which words are covered. You may also require that a minimum number of hypotheses is added for each word coverage (they may be still pruned out, however). This is done using the switch -cube-pruning-diversity MINIMUM which sets the minimum. The default is 0.
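The core step of cube pruning can be sketched with a priority queue over the grid of (hypothesis, translation option) combinations: starting from the best corner, it pops the best combination and pushes its two neighbours, until the pop limit is reached. This generic sketch follows the idea of Huang and Chiang rather than the Moses code; combine is a hypothetical callback that, in a real decoder, would extend the hypothesis and add the language model cost, which is what makes the popping order approximate.

```python
import heapq
from typing import Callable, List, Sequence, Tuple, TypeVar

H = TypeVar("H")
O = TypeVar("O")


def cube_prune(hyps: Sequence[H], options: Sequence[O],
               combine: Callable[[H, O], Tuple[float, object]],
               pop_limit: int) -> List[Tuple[float, object]]:
    """Pop up to pop_limit combinations, best first; hyps and options are sorted best-first."""
    if not hyps or not options or pop_limit <= 0:
        return []
    score, item = combine(hyps[0], options[0])
    heap = [(-score, 0, 0, item)]  # heapq is a min-heap, so scores are negated
    seen = {(0, 0)}
    popped: List[Tuple[float, object]] = []
    while heap and len(popped) < pop_limit:
        neg_score, i, j, item = heapq.heappop(heap)
        popped.append((-neg_score, item))
        # Push the two grid neighbours: next hypothesis, next translation option.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(hyps) and nj < len(options) and (ni, nj) not in seen:
                seen.add((ni, nj))
                s, new_item = combine(hyps[ni], options[nj])
                heapq.heappush(heap, (-s, ni, nj, new_item))
    return popped
```

With purely additive scores, as in the usage below, the pop order is exact; the pop limit then plays the role of -cube-pruning-pop-limit.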

Options

  • -search-algorithm 1 -- turns on cube pruning
  • -cube-pruning-pop-limit LIMIT -- number of hypotheses added to each stack
  • -cube-pruning-diversity MINIMUM -- minimum number of hypotheses from each coverage pattern

Specifying Reordering Constraints

For various reasons, it may be useful to specify reordering constraints to the decoder, for instance because of punctuation. Consider the sentence:

 I said " This is a good idea . " , and pursued the plan .

The quoted material should be translated as a block, meaning that once we start translating some of the quoted words, we need to finish all of them. We call such a block a zone and allow the specification of such constraints using XML markup.

 I said <zone> " This is a good idea . " </zone> , and pursued the plan .

Another type of constraint is the wall, a hard reordering constraint: all words before a wall have to be translated before any words after it are translated. For instance:

 This is the first part . <wall /> This is the second part .

Walls may be specified within zones, where they act as local walls, i.e. they are only valid within the zone.

 I said <zone> " <wall /> This is a good idea . <wall /> " </zone> , and pursued the plan .

If you add such markup to the input, you need to use the option -xml-input with either exclusive or inclusive (there is no difference between these options in this context).

Specifying reordering constraints around punctuation is often a good idea. The switch -monotone-at-punctuation introduces walls around the punctuation tokens ,.!?:;".
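For illustration, walls around punctuation can also be written out explicitly as XML markup, for example to inspect or adjust them before decoding (such input then needs -xml-input exclusive or inclusive). The sketch below is an assumption about equivalent markup, not the code behind -monotone-at-punctuation; the function name is hypothetical, and it avoids emitting two walls in a row and drops a trailing wall:

```python
PUNCTUATION = set(',.!?:;"')
WALL = "<wall />"


def add_punctuation_walls(sentence: str) -> str:
    """Surround each punctuation token of a tokenized sentence with <wall /> tags."""
    out = []
    for token in sentence.split():
        if token in PUNCTUATION:
            if out and out[-1] != WALL:
                out.append(WALL)
            out.append(token)
            out.append(WALL)
        else:
            out.append(token)
    if out and out[-1] == WALL:
        out.pop()  # a wall at the very end constrains nothing
    return " ".join(out)


print(add_punctuation_walls('I said " This is a good idea . " , and pursued the plan .'))
```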

Options

  • walls and zones have to be specified in the input using the tags <zone>, </zone>, and <wall />.
  • -xml-input -- needs to be exclusive or inclusive
  • -monotone-at-punctuation (-mp) -- adds walls around punctuation ,.!?:;".

Multiple Translation Tables

Moses allows the use of multiple translation tables, but there are two different ways to use them:

  • both translation tables are used for scoring: This means that every translation option is collected from each table and scored by each table. This implies that each translation option has to be contained in each table: if it is missing in one of the tables, it cannot be used.
  • either translation table is used for scoring: Translation options are collected from one table, and additional options are collected from the other tables. If the same translation option (in terms of identical input phrase and output phrase) is found in multiple tables, separate translation options are created for each occurrence, but with different scores.

In any case, each translation table has its own set of weights.

First, you need to specify the translation tables in the section [ttable-file] of the moses.ini configuration file, for instance:

 [ttable-file]
 0 0 5 /my-dir/table1
 0 0 5 /my-dir/table2

Secondly, you need to set the appropriate number of weights in the section [weight-t], in our example that would be 10 weights (5 for each table).

Thirdly, you need to specify how the tables are used in the section [mapping]. As mentioned above, there are two choices:

  • scoring with both tables:
 [mapping]
 T 0
 T 1
  • scoring with either table:
 [mapping]
 0 T 0
 1 T 1

Note: what we are really doing here is using Moses' capabilities to use different decoding paths. The number before "T" defines a decoding path, so the second example specifies two different decoding paths. Decoding paths may also contain additional mapping steps, such as generation steps and translation steps using different factors.
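The three configuration steps can be generated with a small Python helper, which makes the weight count explicit. This is a sketch: the helper name, the uniform placeholder weight 0.2, and the one-weight-per-line layout of [weight-t] are assumptions; real weights come from tuning.

```python
from typing import List


def multi_table_ini(tables: List[str], num_scores: int = 5, mode: str = "both",
                    weight: float = 0.2) -> str:
    """Build [ttable-file], [weight-t] and [mapping] sections for several tables.

    mode="both": every table scores every option (one decoding path).
    mode="either": one decoding path per table.
    """
    if mode not in ("both", "either"):
        raise ValueError("mode must be 'both' or 'either'")
    lines = ["[ttable-file]"]
    lines += [f"0 0 {num_scores} {path}" for path in tables]
    lines += ["", "[weight-t]"]
    # One weight per score of every table, e.g. 2 tables x 5 scores = 10 weights.
    lines += [str(weight)] * (num_scores * len(tables))
    lines += ["", "[mapping]"]
    for i in range(len(tables)):
        lines.append(f"T {i}" if mode == "both" else f"{i} T {i}")
    return "\n".join(lines) + "\n"


print(multi_table_ini(["/my-dir/table1", "/my-dir/table2"], mode="either"))
```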

Also note that there is no way to have the option "use both tables if the phrase pair is in both tables, otherwise use only the table where you can find it". Keep in mind that scoring a phrase pair involves a cost and lowers the chances that the phrase pair is used. To approximate this option, you may create a third table that consists of the intersection of the two phrase tables, and remove the shared phrase pairs from each of the original tables.
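The intersection workaround can be sketched as follows. It assumes phrase-table lines of the form source ||| target ||| scores, drops any further fields, and, as an assumption, gives the shared table both score sets so that it carries one score per weight of both original tables; split_tables is a hypothetical name.

```python
from typing import Dict, List, Tuple


def split_tables(lines_a: List[str], lines_b: List[str]) -> Tuple[List[str], List[str], List[str]]:
    """Return (only in A, only in B, shared) phrase-table lines."""
    def index(lines: List[str]) -> Dict[Tuple[str, str], str]:
        table = {}
        for line in lines:
            source, target, scores = [f.strip() for f in line.split("|||")[:3]]
            table[(source, target)] = scores
        return table

    a, b = index(lines_a), index(lines_b)
    only_a = [f"{s} ||| {t} ||| {a[(s, t)]}" for s, t in sorted(a.keys() - b.keys())]
    only_b = [f"{s} ||| {t} ||| {b[(s, t)]}" for s, t in sorted(b.keys() - a.keys())]
    # Assumption: the shared table carries both score sets, A's scores first.
    shared = [f"{s} ||| {t} ||| {a[(s, t)]} {b[(s, t)]}" for s, t in sorted(a.keys() & b.keys())]
    return only_a, only_b, shared
```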

Pruning the Translation Table

The translation table contains all phrase pairs found in the parallel corpus, which includes a lot of noise. To reduce the noise, recent work by Johnson et al. has suggested pruning out unlikely phrase pairs. For more detail, please refer to the paper:

H. Johnson, J. Martin, G. Foster and R. Kuhn (2007). Improving Translation Quality by Discarding Most of the Phrasetable. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 967-975.

Build Instructions

Moses includes a re-implementation of this method in the directory sigtest-filter. You first need to build it from the source files.

This implementation relies on Joy Zhang's SALM Suffix Array toolkit.

  1. download and extract the SALM source release.
  2. in SALM/Distribution/Linux type: make
  3. enter the directory sigtest-filter in the main Moses distribution directory
  4. type make SALMDIR=/path/to/SALM

Usage Instructions

Using SALM/Bin/Linux/Index/IndexSA.O32, create a suffix array index of the source and target sides of your training bitext (SOURCE, TARGET).

 % SALM/Bin/Linux/Index/IndexSA.O32 TARGET
 % SALM/Bin/Linux/Index/IndexSA.O32 SOURCE

Prune the phrase table:

 % cat phrase-table | ./filter-pt -e TARGET -f SOURCE -l FILTER-VALUE > phrase-table.pruned

FILTER-VALUE is the -log prob threshold described in Johnson et al. (2007). It may be either 'a+e', 'a-e', or a positive real value. Run filter-pt with no options to see more use cases. A good setting is -l a+e -n 30, which also keeps only the top 30 phrase translations for each source phrase, based on p(e|f).
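The significance test behind FILTER-VALUE can be sketched with co-occurrence counts: Johnson et al. apply Fisher's exact test to the number of sentence pairs containing the source phrase, the target phrase, and both. The sketch uses plain counts instead of the suffix arrays of filter-pt, and the function name is hypothetical. A pair that occurs exactly once on each side and once jointly (a 1-1-1 pair) gets -log p = log N, which, as we read the paper, is the value α behind the a+e and a-e settings.

```python
import math


def neg_log_p_value(c_st: int, c_s: int, c_t: int, n: int) -> float:
    """-log of the one-sided Fisher exact test p-value for a phrase pair.

    c_st: sentence pairs containing both phrases; c_s / c_t: sentence pairs containing
    the source / target phrase; n: number of sentence pairs in the bitext.
    """
    # Hypergeometric tail: probability of c_st or more joint occurrences by chance.
    numerator = sum(
        math.comb(c_s, k) * math.comb(n - c_s, c_t - k)
        for k in range(c_st, min(c_s, c_t) + 1)
    )
    # Work with exact integers and take logs at the end to avoid underflow.
    return math.log(math.comb(n, c_t)) - math.log(numerator)
```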

Multi-threaded Moses

The latest svn version of moses now supports multi-threaded operation, enabling faster decoding on multi-core machines. The current limitations of multi-threaded moses are:

  1. irstlm is not supported, since it uses a non-threadsafe cache
  2. lattice input may not work - this has not been tested
  3. increasing the verbosity of moses will probably cause multi-threaded moses to crash

To configure and build multi-threaded moses, you'll need to have boost installed (1.35 or higher) and use the configure line

 % ./configure --with-srilm=<path-to-srilm> --with-boost=<path-to-boost> --enable-threads

The boost path can be omitted if you have boost installed in a standard place. On 64-bit machines you may have to add --with-boost-thread=boost_thread-gcc43-mt (or similar) to the configure arguments.

After moses has been configured this way, running make will build two moses binaries, moses and mosesmt. The latter takes the same arguments as moses (although it doesn't currently support all of moses' i/o options) but it also admits an additional -threads n argument, specifying the size of the threadpool to use when running the decoder. Using a small number of threads (3-5) has been found to speed up decoding, although larger numbers do not seem to offer any further increase in speed. Multi-threaded moses is still experimental, and any feedback on its use would be greatly appreciated. Either mail me or the moses list.

Moses Server

The moses server enables you to run the decoder as a server process, and send it sentences to be translated via xmlrpc. This means that one moses process can service distributed clients coded in Java, perl, python, php, or any of the many other languages which have xmlrpc libraries.

To build the moses server, you need to have xmlrpc-c installed - it has been tested with the latest stable version, 1.16.19, and you need to add the argument --with-xmlrpc-c=<path-xmlrpc-c-config> to the configure arguments. You will also need to configure moses for multi-threaded operation, as described above.

Running make should then build an executable server/mosesserver. This can be launched using the same command-line arguments as moses, with two additional arguments to specify the listening port and log-file (--server-port and --server-log). These default to 8080 and /dev/null respectively.

A sample client is included in the server directory (in perl), which requires the SOAP::Lite perl module to be installed. To access the moses server, an xmlrpc request should be sent to http://host:port/RPC2 where the parameter is a map containing the keys text and (optionally) align. The value of the first of these parameters is the text to be translated and the second, if present, causes alignment information to be returned to the client. The client will receive a map containing the same two keys, where the value associated with the text key is the translated text, and the align key (if present) maps to a list of maps. The alignment gives the segmentation in target order, with each list element specifying the target start position (tgt-start), source start position (src-start) and source end position (src-end).
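A client in Python needs only the standard xmlrpc.client module. The sketch assumes that the server exposes a method named translate (the text above does not name it) and that any value of align requests alignment information; moses_translate is a hypothetical helper name.

```python
import xmlrpc.client
from typing import List, Tuple


def moses_translate(text: str, url: str = "http://localhost:8080/RPC2",
                    align: bool = False) -> Tuple[str, List[dict]]:
    """Send one sentence to a moses server and return (translation, alignment)."""
    proxy = xmlrpc.client.ServerProxy(url)
    params = {"text": text}
    if align:
        params["align"] = "true"  # assumption: the presence of the key requests alignment
    result = proxy.translate(params)  # assumption: the server method is called "translate"
    # Each alignment entry maps "tgt-start", "src-start" and "src-end" to positions.
    return result["text"], result.get("align", [])
```

For example, moses_translate("das ist ein kleines haus") sends the sentence to a server on the default port 8080.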

Amazon EC2 cloud

Achim Ruopp has created a package to run the Moses pipeline on the Amazon cloud. This would be very useful for people who don't have their own SGI cluster. More details from the Amazon webpage, or from Achim directly

   http://developer.amazonwebservices.com/connect/entry.jspa?externalID=3058&ca

Achim has also created a tutorial

   http://www.digitalsilkroad.net/walkthrough.pdf 
Page last modified on November 24, 2009, at 04:48 PM