Adding Sparse Feature Functions

Moses allows for sparse feature functions, i.e., feature functions that have a large, possibly unbounded, set of features, of which only a small subset applies to a given hypothesis.

ALL feature functions can contain sparse features. They do not have to specify whether, or how many, sparse features they will have. Contrast this with dense features, where a feature function must specify how many scores it has at construction time.

To give an example: In addition to a regular n-gram language model, we could introduce a discriminative bigram language model that discounts or promotes hypotheses that contain specific bigrams. Each bigram in this feature function is its own feature with its own feature weight.

These features cannot be tuned with MERT, which does not scale to large feature sets, but Moses has several other suitable tuning methods, such as PRO and batch MIRA.

The incorporation of sparse features into the training pipeline is ongoing.

Implementation

For basics, please refer to the respective section on Feature Functions.

Header

 class PhraseLengthFeature : public StatelessFeatureFunction {
 public:
   PhraseLengthFeature(const string &line):
      StatelessFeatureFunction(0, line)
   {}

This creates a feature function PhraseLengthFeature with no dense features, but it can have sparse features.
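
For contrast, a feature function with dense scores must pass their number to the base class constructor. A minimal sketch, using a hypothetical MyDenseFeature with two dense scores:

 class MyDenseFeature : public StatelessFeatureFunction {
 public:
   MyDenseFeature(const string &line):
      StatelessFeatureFunction(2, line) // must declare its two dense scores up front
   {}
 };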

Setting feature values

As with all feature functions, sparse feature functions should implement the appropriate Evaluate() methods described in FeatureFunctions.

In the Evaluate() methods, a particular sparse score can be set by calling the function

  accumulator->PlusEquals(this, <name>, <value>);

where <name> is a key (a string) identifying the sparse feature and <value> is the score added for it.

Contrast this with setting a dense score:

  accumulator->PlusEquals(this, <vector-of-values>);
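
For context, here is a minimal sketch of such an Evaluate() method, loosely following the PhraseLengthFeature implementation: it fires one sparse feature per source/target phrase length combination. The exact Evaluate() overload is the one documented in FeatureFunctions, so treat the signature as illustrative.

 void PhraseLengthFeature::Evaluate(const Phrase &source,
                                    const TargetPhrase &targetPhrase,
                                    ScoreComponentCollection &scoreBreakdown,
                                    ScoreComponentCollection &estimatedFutureScore) const
 {
   // build a sparse feature name from the phrase lengths,
   // e.g. "3,2" for a 3-word source phrase with a 2-word translation
   std::stringstream name; // requires <sstream>
   name << source.GetSize() << "," << targetPhrase.GetSize();

   // fire the sparse feature: add a count of 1 for this phrase pair
   scoreBreakdown.PlusEquals(this, name.str(), 1);
 }

Given the naming scheme described under Weights below, and assuming the feature function's short name is pl, the feature fired above would be weighted by an entry such as pl_3,2 in the weight file.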

Weights

There is no need to define a [weight] setting for the feature function. Each feature of the feature function has its own named weight, which is a concatenation of the feature function's short name, an underscore (_), and the individual feature name that the feature function sets.

These weights are placed into a weight file, which is specified with the switch --weight-file. For example, the target bigram feature (feature function short name dlmb, for discriminative language model, bigrams) may have weights defined in lines such as these:

 dlmb_of:the 0.1
 dlmb_in:the -0.1
 dlmb_the:way 0.2

Features whose weights are not defined in this file have their weight set to 0.
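
As a usage sketch (file names here are placeholders), the weight file is passed to the decoder alongside the regular configuration:

  moses -f moses.ini --weight-file sparse.weights < input.txt > output.txt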
