
Sparse Features

Sparse feature functions in Moses allow for thousands of features that follow a specific pattern, typically lexical instantiations of a general feature function. Take, for instance, the target word insertion feature function, which allows the training of lexical indicators for any word (say, the or fish). Each lexicalized instantiation has its own feature weight, which is typically trained during tuning. Inserting the word the should be fine, while inserting the word fish should not, and the learned feature weight should reflect this.

In Moses, all feature functions can contain both sparse and dense features. The number of dense features has to be specified in advance in the moses.ini file, e.g.,

  KENLM num-features=1 ...

The decoder does not need to know whether a feature function contains sparse features. By definition, the number of sparse features is not specified beforehand.

Sparse lexical features require a special weight file that contains the weight for each instantiation of a feature.

The weight file has to be specified in the moses.ini file:

 [weight-file]
 /path/to/sparse/weights/file

This file may look like:

 twi_fish -0.5
 twi_of -0.001
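To make the weight file concrete, here is a minimal Python sketch (not Moses code) that reads a file in the format above and computes the sparse part of a hypothesis score as a weighted sum. The function names are hypothetical, and treating a feature that has no entry in the weight file as having weight 0 is an assumption.

```python
from typing import Dict, Iterable


def load_sparse_weights(lines: Iterable[str]) -> Dict[str, float]:
    """Parse 'feature_name weight' lines into a dict, skipping blank lines."""
    weights: Dict[str, float] = {}
    for line in lines:
        if not line.strip():
            continue
        name, value = line.split()  # exactly two whitespace-separated fields
        weights[name] = float(value)
    return weights


def sparse_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the sparse features that fired in a hypothesis.

    Assumption: a feature with no entry in the weight file gets weight 0.
    """
    return sum(value * weights.get(name, 0.0) for name, value in features.items())


weights = load_sparse_weights(["twi_fish -0.5", "twi_of -0.001"])
# A hypothesis that inserted "fish" once and "of" twice:
print(sparse_score({"twi_fish": 1.0, "twi_of": 2.0}, weights))
```

Because only the features that actually fire are stored, the cost of scoring grows with the number of active features, not with the (unbounded) number of possible ones.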

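By convention, the name of a sparse feature consists of a short prefix identifying the feature function, an underscore, and the specific instantiation, e.g., twi_fish, where twi stands for target word insertion.
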
Of course, you want to learn these feature weights during tuning, which requires either PRO or kbMIRA; it does not work with plain MERT.

Word Translation Features

There are three types of lexical feature functions:

  • word translation feature, which indicates if a specific source word was translated as a specific target word
  • target word insertion, which indicates if a specific target word has no alignment point (aligns to no source word in the word alignment stored for the phrase pair)
  • source word deletion, which indicates if a specific source word has no alignment point

Specification in moses.ini

The following lines need to be added to the configuration file:

 TargetWordInsertionFeature factor=FACTOR [path=FILE]
 SourceWordDeletionFeature factor=FACTOR [path=FILE]
 WordTranslationFeature input-factor=FACTOR output-factor=FACTOR \
                        [source-path=FILE] [target-path=FILE] \
                        simple=1 source-context=0 target-context=0

Note that there is no corresponding weight setting for these features.

The optional word list files (one token per line) restrict the feature function to the specified words. If no word list file is specified, features are generated for all words.
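The three lexical feature types and the word list restriction can be illustrated with a small Python sketch that derives the features from a phrase pair and its word alignment (written as in the phrase table, e.g. 0-0 1-1). Only the twi_ prefix appears in this document; the wt_ and swd_ prefixes, the ~ separator, and the function names are hypothetical, and this is not how Moses implements the features internally.

```python
from collections import Counter
from typing import List, Optional, Set, Tuple

Alignment = List[Tuple[int, int]]  # (source position, target position)


def parse_alignment(text: str) -> Alignment:
    """Parse an alignment string such as '0-0 1-2' into index pairs."""
    pairs = []
    for point in text.split():
        s, t = point.split("-")
        pairs.append((int(s), int(t)))
    return pairs


def lexical_features(source: List[str], target: List[str], alignment: Alignment,
                     source_words: Optional[Set[str]] = None,
                     target_words: Optional[Set[str]] = None) -> Counter:
    """Count word translation, target insertion and source deletion features.

    A word list of None means "all words"; otherwise features are only
    produced for words in the list. Prefixes wt_ and swd_ are hypothetical.
    """
    def allowed(word: str, word_list: Optional[Set[str]]) -> bool:
        return word_list is None or word in word_list

    features: Counter = Counter()
    aligned_source = {s for s, _ in alignment}
    aligned_target = {t for _, t in alignment}

    # Word translation: one feature per alignment point.
    for s, t in alignment:
        if allowed(source[s], source_words) and allowed(target[t], target_words):
            features[f"wt_{source[s]}~{target[t]}"] += 1
    # Target word insertion: target words without any alignment point.
    for t, word in enumerate(target):
        if t not in aligned_target and allowed(word, target_words):
            features[f"twi_{word}"] += 1
    # Source word deletion: source words without any alignment point.
    for s, word in enumerate(source):
        if s not in aligned_source and allowed(word, source_words):
            features[f"swd_{word}"] += 1
    return features


print(lexical_features(["das", "Haus"], ["the", "big", "house"],
                       parse_alignment("0-0 1-2")))
```

Here big has no alignment point, so it triggers a target word insertion feature, while the two alignment points each trigger a word translation feature.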

Specification with experiment.perl

Word translation features can be specified as follows:

 TRAINING:sparse-features = \
   "target-word-insertion top 50, source-word-deletion top 50, \
   word-translation top 50 50"

This specification includes

  • target word insertion features for the top 50 most frequent target words
  • source word deletion features for the top 50 most frequent source words
  • word translation features for the top 50 most frequent target words and top 50 most frequent source words

Instead of top 50, you can also specify all when you do not want to have a restricted word list.
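Selecting the top 50 words amounts to counting token frequencies on one side of the training corpus and keeping the most frequent ones, written one token per line like the word list files described above. The following Python sketch illustrates that idea with hypothetical function names; it is not the code experiment.perl runs.

```python
from collections import Counter
from typing import Iterable, List


def top_n_words(sentences: Iterable[str], n: int) -> List[str]:
    """Return the n most frequent whitespace-separated tokens."""
    counts = Counter(token for sentence in sentences for token in sentence.split())
    return [token for token, _ in counts.most_common(n)]


def word_list_text(words: List[str]) -> str:
    """Format words as a word list file: one token per line."""
    return "".join(word + "\n" for word in words)


corpus = ["the house is small", "the house", "the cat"]
print(word_list_text(top_n_words(corpus, 2)), end="")
```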

Moreover, for the word translation feature, you can change the input and output factors by specifying, e.g., factor 1-2. For the deletion and insertion features, there is only one factor to specify, e.g., factor 1.

Phrase Length Features

The phrase length feature function creates three features for each phrase pair:

  • the length of the source phrase (in tokens)
  • the length of the target phrase
  • the pair of the two values above

For instance, when the phrase ein Riesenhaus is translated into a giant house, then the three features pl_s2 (2 source words), pl_t3 (3 target words), and pl_2,3 (2 source words into 3 target words) are triggered.
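The naming scheme in this example can be reproduced with a few lines of Python. The sketch only mirrors the feature names given above; the function name is hypothetical and the code is not taken from the Moses source.

```python
from typing import List


def phrase_length_features(source: List[str], target: List[str]) -> List[str]:
    """Names of the three phrase length features triggered by a phrase pair."""
    s, t = len(source), len(target)
    return [f"pl_s{s}", f"pl_t{t}", f"pl_{s},{t}"]


print(phrase_length_features("ein Riesenhaus".split(), "a giant house".split()))
# ['pl_s2', 'pl_t3', 'pl_2,3']
```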

Specification in moses.ini

The following line needs to be added to the configuration file:

 PhraseLengthFeature

Specification with experiment.perl

The inclusion of the phrase length feature is similar to the word translation feature:

 TRAINING:sparse-features = "phrase-length"

If you use both the phrase length feature and the word translation features, you need to include them in the same line, separated by commas.

Domain Features

Domain features flag each phrase pair with the domain (or, more accurately, the subset of the training data) in which it occurs.

Specification in moses.ini

Domain features are part of the phrase table; there is no specific support for this particular type of feature function. A sparse phrase table may also include other arbitrary features. Each line in the phrase table has to contain an additional field that lists the feature name and its log-probability value.

For example, the following phrase pair contains the domain feature flagging that the phrase pair occurred in the europarl part of the training corpus:

 das Haus ||| the house ||| 0.8 0.5 0.8 0.5 2.718 ||| 0-0 1-1 \
 ||| 5000 5000 2500 ||| dom_europarl 1
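To see how the additional field fits into a phrase table line, the following Python sketch splits a line on ||| and reads the last field as name/value pairs. It assumes the entry is stored on a single line (the backslash above only wraps it for display) and that the sparse field, when present, is the sixth field; the PhrasePair type and the function name are hypothetical.

```python
from typing import Dict, List, NamedTuple


class PhrasePair(NamedTuple):
    source: List[str]
    target: List[str]
    scores: List[float]
    alignment: str
    counts: List[float]
    sparse: Dict[str, float]


def parse_phrase_table_line(line: str) -> PhrasePair:
    fields = [field.strip() for field in line.split("|||")]
    source, target, scores, alignment, counts = fields[:5]
    # Optional sixth field: flat list "name1 value1 name2 value2 ...".
    tokens = fields[5].split() if len(fields) > 5 else []
    if len(tokens) % 2 != 0:
        raise ValueError("sparse field must hold name/value pairs")
    sparse = {tokens[i]: float(tokens[i + 1]) for i in range(0, len(tokens), 2)}
    return PhrasePair(source.split(), target.split(),
                      [float(x) for x in scores.split()], alignment,
                      [float(x) for x in counts.split()], sparse)


line = ("das Haus ||| the house ||| 0.8 0.5 0.8 0.5 2.718 ||| 0-0 1-1 "
        "||| 5000 5000 2500 ||| dom_europarl 1")
print(parse_phrase_table_line(line).sparse)  # {'dom_europarl': 1.0}
```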

If a phrase table contains sparse features, then this needs to be flagged in the configuration file by adding the word sparse after the phrase table file name.

Specification with experiment.perl

 TRAINING:domain-features = "[sparse ](indicator|ratio|subset)"

There are various settings for domain adaptation features. They require a domain file that indicates which lines in the parallel corpus stem from which [CORPUS] block. This is the default when experiment.perl is used, but a different domain file can also be specified.
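A sketch of how such a domain file could be used to look up the domain of a corpus line. The file format here is an assumption made for illustration: each line holds the number of the last corpus line belonging to a block followed by the block's name, with corpus lines numbered from 1. The DomainSpec class is hypothetical.

```python
import bisect
from typing import Iterable, List, Optional


class DomainSpec:
    """Maps corpus line numbers to domain names via block end positions."""

    def __init__(self, lines: Iterable[str]) -> None:
        entries = []
        for line in lines:
            if line.strip():
                last_line, name = line.split()
                entries.append((int(last_line), name))
        entries.sort()
        self.ends: List[int] = [end for end, _ in entries]
        self.names: List[str] = [name for _, name in entries]

    def domain_of(self, line_number: int) -> Optional[str]:
        """Domain of a 1-based line, or None if it lies past the last block."""
        # Binary search for the first block whose last line is >= line_number.
        i = bisect.bisect_left(self.ends, line_number)
        return self.names[i] if i < len(self.names) else None


spec = DomainSpec(["1000 europarl", "1500 news"])
print(spec.domain_of(1000), spec.domain_of(1001))  # europarl news
```

Storing only the block boundaries keeps the file small, and the binary search finds the block of any line in logarithmic time.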

These features may be included as sparse features or as core features in the phrase table, depending on whether the setting carries the sparse prefix.

There are three kinds of features, plus a fourth that is not yet implemented:

  • Indicator: Each phrase pair is marked if it occurs in a specific domain
  • Ratio: Each phrase pair is marked with a float feature exp(r), with exp(0) <= exp(r) <= exp(1), where r is the ratio of how often it occurs in a given domain.
  • Subset: Similar to the indicator feature, but if a phrase pair occurs in multiple domains, it is marked with all of these domains in a single feature.
  • Bin (not implemented): the idea is the count bin feature described below, but with count intervals marked for each domain.
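The three implemented feature types can be sketched in Python from a phrase pair's per-domain counts. Two points are assumptions rather than details taken from Moses: the ratio r for a domain is computed as the phrase pair's count in that domain divided by its total count, and the feature names (only dom_europarl appears in this document) are hypothetical. Since r then lies between 0 and 1, exp(r) stays between exp(0) = 1 and exp(1) = e.

```python
import math
from typing import Dict


def indicator_features(counts: Dict[str, int]) -> Dict[str, float]:
    """One binary feature per domain in which the phrase pair occurs."""
    return {f"dom_{d}": 1.0 for d, c in counts.items() if c > 0}


def ratio_features(counts: Dict[str, int]) -> Dict[str, float]:
    """exp(r) per domain, with r = count in domain / total count (assumed)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("phrase pair must occur at least once")
    return {f"domr_{d}": math.exp(c / total) for d, c in counts.items()}


def subset_feature(counts: Dict[str, int]) -> Dict[str, float]:
    """A single feature naming the full set of domains the pair occurs in."""
    domains = sorted(d for d, c in counts.items() if c > 0)
    return {"doms_" + "+".join(domains): 1.0}


counts = {"europarl": 3, "news": 1}
print(indicator_features(counts))
print(ratio_features(counts))
print(subset_feature(counts))
```

The indicator variant fires one feature per domain, while the subset variant fires exactly one feature per phrase pair, so a pair seen in both europarl and news gets a weight of its own rather than the sum of two domain weights.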

Count Bin Features

The frequency of a phrase pair in the training data may be a useful indicator of its reliability. Like the domain features, the count bin features are integrated into the phrase table, so see the section on domain features above.

Specification with experiment.perl

The counts of phrase pairs get very sparse for frequent phrases: there are just not that many phrase pairs that occur exactly 634,343 times. Hence, we bin phrase pair counts, for instance into phrase pairs that occur once, twice, three times, four to ten times, and more often.

In experiment.perl, this is accomplished with an additional switch in the score settings. For the example above, this looks as follows:

 TRAINING:score-settings = "--[Sparse]CountBinFeature 1 2 3 10"

Based on the given thresholds, an indicator feature is triggered depending on which interval the phrase pair count falls into, e.g., a count in ]2;3] falls into the third bin.
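A Python sketch of the binning, assuming, as the ]2;3] notation suggests, that each threshold is the inclusive upper bound of a bin and that counts above the last threshold get one extra bin. These boundary conventions and the function names are assumptions for illustration, not taken from the Moses scorer.

```python
import bisect
from typing import List


def count_bin(count: int, thresholds: List[int]) -> int:
    """1-based bin of a phrase pair count.

    Assumes sorted thresholds, each the inclusive upper bound of a bin:
    bin 1 is count <= thresholds[0], bin k is
    thresholds[k-2] < count <= thresholds[k-1], and counts above the last
    threshold fall into an extra bin len(thresholds) + 1.
    """
    return bisect.bisect_left(thresholds, count) + 1


def count_bin_indicators(count: int, thresholds: List[int]) -> List[int]:
    """One-hot vector with one indicator per bin (including the extra bin)."""
    indicators = [0] * (len(thresholds) + 1)
    indicators[count_bin(count, thresholds) - 1] = 1
    return indicators


thresholds = [1, 2, 3, 10]
print([count_bin(c, thresholds) for c in (1, 2, 3, 7, 10, 11)])  # [1, 2, 3, 4, 4, 5]
print(count_bin_indicators(3, thresholds))  # [0, 0, 1, 0, 0]
```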

Bigram Features


Soft Matching Features

Models with target syntax require an exact match between nonterminals in a rule and the left-hand-side label of rules that can be substituted into it. With the following rules, a model could be used to decode 'she slept here', but not 'she slept on the floor'.

  S --> she slept AVP1 ||| sie schlief AVP1
  AVP --> here ||| hier
  PP --> on the floor ||| auf dem boden

With soft matching, we can allow substitutions of nonterminals even if they do not match.

Specification in moses.ini

The following lines need to be added to the configuration file:

 SoftMatchingFeature path=FILE

with FILE containing a user-defined list of allowed substitutions. For the example above, the file needs to contain the following line:


Each substitution (even exact matches) triggers a sparse feature which can be used to prefer some substitutions over others.
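The matching check and the feature it triggers can be sketched in Python. Here the allowed substitutions are held as pairs of (nonterminal in the rule, left-hand-side label of the substituted rule); this representation, the function name, and the feature name format are hypothetical and do not describe the file format that SoftMatchingFeature reads.

```python
from typing import Optional, Set, Tuple


def soft_match_feature(rule_nonterminal: str, substituted_lhs: str,
                       allowed: Set[Tuple[str, str]]) -> Optional[str]:
    """Sparse feature name for a substitution, or None if it is not permitted.

    Exact matches are always permitted and also trigger a feature.
    """
    if (rule_nonterminal == substituted_lhs
            or (rule_nonterminal, substituted_lhs) in allowed):
        return f"sm_{rule_nonterminal}->{substituted_lhs}"
    return None


# Allow the PP rule to fill the AVP slot of "S --> she slept AVP1".
allowed = {("AVP", "PP")}
print(soft_match_feature("AVP", "PP", allowed))   # sm_AVP->PP
print(soft_match_feature("AVP", "AVP", allowed))  # sm_AVP->AVP
print(soft_match_feature("AVP", "NP", allowed))   # None
```

Because exact matches also produce a feature, tuning can learn a weight for each substitution type, so an exact AVP match can end up preferred over the looser AVP/PP substitution.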

The SoftMatchingFeature operates on the target-side labels and is not (yet) implemented for the Scope3 and OnDisk phrase tables.

Page last modified on April 04, 2014, at 04:06 PM