To train a factored model, you specify a number of additional training parameters:
--alignment-factors FACTORMAP --translation-factors FACTORMAPSET --reordering-factors FACTORMAPSET --generation-factors FACTORMAPSET --decoding-steps LIST
It is usually better to carry out the word alignment (steps 2-3 of the training process) on more general word representations with rich statistics. Successful word alignment has even been reported with words stemmed to 4 characters. For factored models, this suggests that word alignment should be done only on either the surface form or the stem/lemma.
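To make the idea of aligning on a coarser representation concrete, here is a minimal sketch that appends a 4-character prefix to each token as an extra factor. It assumes the Moses convention of separating the factors of a token with `|`, with factor 0 being the surface form. The function name `add_prefix_stem_factor` is hypothetical, and plain truncation is only a crude stand-in for a real stemmer or lemmatizer.

```python
def add_prefix_stem_factor(line: str, length: int = 4) -> str:
    """Append a crude prefix 'stem' factor to every token of a factored line.

    Assumption: factors within a token are separated by '|' and factor 0
    is the surface form. The new factor becomes the last one of each token.
    """
    out = []
    for token in line.split():
        surface = token.split("|")[0]
        # Truncate the surface form to its first `length` characters.
        out.append(f"{token}|{surface[:length]}")
    return " ".join(out)


print(add_prefix_stem_factor("Houses|NNS are|VBP"))
```

Here the input tokens have two factors (0 and 1), so the prefix becomes factor 2, and alignment on it would use the map `2-2`.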
Which factors are used during word alignment is set with the
--alignment-factors switch. Let us formally define the parameter syntax:
FACTOR = [0-9]+
FACTORMAP = FACTOR(,FACTOR)*-FACTOR(,FACTOR)*
The switch requires a FACTORMAP as argument, for instance
0-0 (using only factor 0 from source and target language) or
0,1,2-0,1 (using factors 0, 1, and 2 from the source language and 0 and 1 from the target language).
Typically, you will want to train the word alignment on surface forms or lemmas.
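The FACTORMAP grammar is simple enough to check mechanically. The following is a minimal sketch of a validator and parser for it; `parse_factormap` is a hypothetical helper written for illustration and is not how `train-model.perl` itself parses its arguments.

```python
import re

# FACTORMAP = FACTOR(,FACTOR)*-FACTOR(,FACTOR)*, where FACTOR = [0-9]+
FACTORMAP_RE = re.compile(r"\d+(?:,\d+)*-\d+(?:,\d+)*")


def parse_factormap(spec: str) -> tuple[list[int], list[int]]:
    """Split a FACTORMAP such as '0,1,2-0,1' into (source, target) factor lists."""
    if not FACTORMAP_RE.fullmatch(spec):
        raise ValueError(f"not a valid FACTORMAP: {spec!r}")
    source, target = spec.split("-")
    return [int(f) for f in source.split(",")], [int(f) for f in target.split(",")]


print(parse_factormap("0-0"))        # only factor 0 on both sides
print(parse_factormap("0,1,2-0,1"))  # source factors 0,1,2; target factors 0,1
```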
The purpose of training a factored translation model is to create one or more translation tables between a subset of the factors. All translation tables are trained from the same word alignment and are specified with the switch --translation-factors.
To define the syntax of this switch, we have to extend our parameter syntax with
FACTORMAPSET = FACTORMAP(+FACTORMAP)*
since we want to specify multiple mappings.
One example is
--translation-factors 0-0+1-1,2, which creates the two tables 0-0 and 1-1,2.
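Since `+` separates the individual maps, a FACTORMAPSET splits into one FACTORMAP per table. A minimal sketch, again with hypothetical helper names and no claim to mirror the training script's own parsing:

```python
def parse_factormap(spec: str) -> tuple[list[int], list[int]]:
    """Split a FACTORMAP such as '1-1,2' into (source, target) factor lists."""
    source, target = spec.split("-")
    return [int(f) for f in source.split(",")], [int(f) for f in target.split(",")]


def parse_factormapset(spec: str) -> list[tuple[list[int], list[int]]]:
    """FACTORMAPSET = FACTORMAP(+FACTORMAP)*: one parsed map per table, in order."""
    return [parse_factormap(m) for m in spec.split("+")]


# The example above: two translation tables, 0-0 and 1-1,2.
print(parse_factormapset("0-0+1-1,2"))
```

Keeping the maps in a list preserves their order, which matters later because table names are assigned by position.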
Reordering tables can be trained with --reordering-factors.
The syntax is the same as for translation factors.
Finally, we also want to create generation tables between target factors. Which tables to generate is specified with
--generation-factors, which takes a FACTORMAPSET as a parameter. Note that this time the mapping is between target factors, not between source and target factors.
One example is
--generation-factors 0-1, which creates a generation table between factors 0 and 1.
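Because both sides of a generation map are target factors, such a table can be illustrated from target-side data alone. As a rough sketch of what a `0-1` generation table might contain, the code below estimates p(factor 1 | factor 0) by relative frequency over a `|`-separated factored target corpus. The function name is hypothetical, and Moses' own estimation (which directions it scores, any smoothing) may differ from this simplification.

```python
from collections import Counter


def generation_table(target_lines, src_factor=0, tgt_factor=1):
    """Estimate p(tgt_factor | src_factor) by relative frequency.

    Both factors are target-side factors of the same token; factors
    within a token are assumed to be separated by '|'.
    """
    pair_counts = Counter()
    src_counts = Counter()
    for line in target_lines:
        for token in line.split():
            factors = token.split("|")
            a, b = factors[src_factor], factors[tgt_factor]
            pair_counts[(a, b)] += 1
            src_counts[a] += 1
    return {(a, b): c / src_counts[a] for (a, b), c in pair_counts.items()}


table = generation_table(["house|NN houses|NNS", "house|VB"])
print(table[("house", "NN")], table[("houses", "NNS")])
```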
The mapping from source words in factored representation to target words in factored representation takes place in a number of mapping steps (each using either a translation table or a generation table). These steps are specified with the switch --decoding-steps LIST. For example,
--decoding-steps t0,g0,t1,t2,g1 specifies that mapping takes place in the form of an initial translation step using translation table 0, then a generation step using generation table 0, followed by two translation steps using translation tables 1 and 2, and finally a generation step using generation table 1. (The specific names
t0, t1, ... are automatically assigned to translation tables in the order you define them with
--translation-factors, and likewise g0, g1, ... are assigned to generation tables in the order you define them with --generation-factors.)
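The positional naming rule can be sketched directly: `t<i>` picks the i-th map of --translation-factors and `g<i>` the i-th map of --generation-factors. The function `resolve_decoding_steps` and the factor maps in the usage line are hypothetical illustrations, not values taken from a real configuration.

```python
def resolve_decoding_steps(decoding_steps: str,
                           translation_factors: str,
                           generation_factors: str) -> list[tuple[str, str]]:
    """Map step names like 't0' or 'g1' to the factor maps they refer to."""
    t_tables = translation_factors.split("+")  # t0, t1, ... in definition order
    g_tables = generation_factors.split("+")   # g0, g1, ... in definition order
    resolved = []
    for step in decoding_steps.split(","):
        kind, index = step[0], int(step[1:])
        if kind == "t":
            resolved.append(("translation", t_tables[index]))  # IndexError if undefined
        elif kind == "g":
            resolved.append(("generation", g_tables[index]))   # IndexError if undefined
        else:
            raise ValueError(f"unknown step type in {step!r}")
    return resolved


for kind, fmap in resolve_decoding_steps("t0,g0,t1,t2,g1", "0-0+1-1+2-2", "0-1+0,1-2"):
    print(kind, fmap)
```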
It is possible to specify multiple decoding paths, for instance by
--decoding-steps t0,g0,t1,t2,g1:t3, where colons separate the paths. Translation options are generated from each decoding path and used during decoding.
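With multiple paths, the LIST has two levels: colons separate decoding paths and commas separate the steps within a path. A minimal sketch of splitting and sanity-checking such a value, with a hypothetical function name and a validity check that assumes every step is named `t<i>` or `g<i>`:

```python
import re

STEP_RE = re.compile(r"[tg]\d+")  # t<i> = translation table i, g<i> = generation table i


def parse_decoding_paths(spec: str) -> list[list[str]]:
    """Split a --decoding-steps value into paths (':') and steps within each path (',')."""
    paths = []
    for path in spec.split(":"):
        steps = path.split(",")
        for step in steps:
            if not STEP_RE.fullmatch(step):
                raise ValueError(f"invalid decoding step {step!r} in path {path!r}")
        paths.append(steps)
    return paths


print(parse_decoding_paths("t0,g0,t1,t2,g1:t3"))
```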