The goal of regression testing is to ensure that changes to the decoder do not break behavior that was previously determined to be correct. The regression test suite is fast enough to run often, yet it should still provide adequate confidence that nothing substantial has changed in the internal workings of moses. It is designed to run on most UNIX-like systems and is run as part of the nightly build, so if you have problems with the regression tests, first check whether the nightly build succeeded.
The following regression tests are currently implemented (and many more have been added since this list was written):
basic-surface-only: Tests basic translation; compares output strings and probability scores.
basic-surface-binptable: Tests the binary phrase table.
consensus-decoding-surface: Basic test of consensus decoding.
ptable-filtering: Tests the filtering of the phrase table by estimated phrase cost; ensures that the estimated phrase cost stays the same and that the list of phrases is consistent. Matches pharaoh.
multi-factor: Tests that moses can translate with two factors. (Currently a very basic test; it should be enhanced to at least include OOV words.)
multi-factor-binptable: Tests a factored setup with a binary phrase table.
multi-factor-drop: Tests dropping words in a multi-factor model.
nbest-multi-factor: Tests n-best list generation for multi-factor models.
n-best: Tests n-best filtering; ensures consistency of top scores and score components. This will require ensuring that every moses binary is capable of generating n-best lists.
lattice-surface: Tests lattice decoding.
lattice-distortion: Tests lattice decoding with distortion (?).
confusionNet-surface-only: Tests confusion network decoding.
confusionNet-multi-factor: Tests confusion network decoding with multiple factors.
lexicalized-reordering: Tests the lexical reordering model.
lexicalized-reordering-cn: Tests the lexical reordering model in combination with a confusion network.
xml-markup: Tests XML markup in the input to specify translations.
Download the regression tests
From the Moses root, run:
./bjam --with-irstlm=/path/to/irst --with-cmph=/path/to/cmph --with-regtest=/path/to/moses-regression-tests -j8
This will run the regression tests in parallel. The -j8 option requests eight parallel jobs, so set this number to what your machine can handle.
If all goes well, you will see a list of the tests run, their status (hopefully pass), and a path where the results are archived.
You can run a specific test by providing its name followed by ".passed":
./bjam --with-irstlm=/path/to/irst --with-cmph=/path/to/cmph --with-regtest=/path/to/moses-regression-tests mert.basic.passed
The test name is the same as the directory name in /path/to/moses-regression-tests/tests.
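The mapping from directory names to test targets can be sketched in a few lines of Python. This is only an illustration, not part of the Moses tooling: the function name list_test_targets is hypothetical, and the sketch assumes that every subdirectory of tests/ corresponds to exactly one test.

```python
import sys
from pathlib import Path


def list_test_targets(regtest_root):
    """Return one '<test-name>.passed' target per directory under <regtest_root>/tests."""
    tests_dir = Path(regtest_root) / "tests"
    # Assumption: each subdirectory of tests/ is one regression test,
    # and its directory name is the test name.
    names = sorted(entry.name for entry in tests_dir.iterdir() if entry.is_dir())
    return [name + ".passed" for name in names]


if __name__ == "__main__":
    # Usage: python list_targets.py /path/to/moses-regression-tests
    for target in list_test_targets(sys.argv[1]):
        print(target)
```

Each printed name could then be passed to bjam in place of mert.basic.passed in the command above.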
The test suite invokes moses to decode a few sample phrases with well-known models. The output from these invocations is then scraped for information (for example, the output translation of a sentence or its probability score), which is stored in a file called results.dat. These values are then compared to a ground truth, which was established either by hand, from a prior moses run, or from a pharaoh run.
This will provide a point-by-point analysis of each failure or success in the test as well as information.
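To make the comparison against the ground truth concrete, here is a minimal Python sketch. It is not the suite's actual implementation. It assumes that both the scraped results and the truth consist of lines of the form KEY = value (exact match) or KEY ~ value (approximate match), that keys contain no whitespace, and that "approximate" means a 5% relative tolerance. The names parse_results and compare, the tolerance, and the sample keys in the usage note are all hypothetical.

```python
import math
import re

# Assumption: a key is a single whitespace-free token followed by '=' or '~'.
LINE_RE = re.compile(r"^(\S+)\s+([=~])\s*(.*)$")


def parse_results(text):
    """Parse 'KEY = value' and 'KEY ~ value' lines into {key: (operator, value)}."""
    results = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            key, op, value = match.groups()
            results[key] = (op, value)
    return results


def compare(truth, actual, rel_tol=0.05):
    """Return a list of (key, expected, got) for every value that does not match."""
    failures = []
    for key, (op, expected) in truth.items():
        if key not in actual:
            failures.append((key, expected, None))
            continue
        got = actual[key][1]
        if op == "~":
            # Approximate match; the tolerance is an assumption of this sketch.
            try:
                ok = math.isclose(float(expected), float(got), rel_tol=rel_tol)
            except ValueError:
                ok = expected == got  # non-numeric values fall back to exact match
        else:
            ok = expected == got
        if not ok:
            failures.append((key, expected, got))
    return failures
```

For example, with a truth of "SCORE_1 = -4.5" and "TIME ~ 10", a run that produces "SCORE_1 = -4.6" and "TIME ~ 10.2" would report only SCORE_1 as a failure under these assumptions.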
Note: Since the test suite relies on the output of moses, changes to the output format may result in broken tests. If you make changes that affect presentation only, you will need to update the testing filters (which convert the raw moses output into the
Writing regression tests is easy, but since these tests must be runnable anywhere, it is important to keep a few things in mind. First, check out the regression-testing module from the Git repository. Settle on what you would like to test and choose a test name (henceforth, this name will be TEST-NAME). Create a directory for it under the regression-testing checkout.
Place the following into the directory:
to-translate, which contains the text that will be translated by moses.
moses.ini. This moses.ini file should have no absolute paths. All paths should be expressed in terms of the variables
filter-stdout. These files should read from STDIN and write results of the form KEY = value to STDOUT. No other output should be generated. Numeric values (such as times) that do not require exact matches can have the form KEY ~ value. These files are the trickiest part about writing a new regression test. However, they allow great flexibility in verifying specific aspects of a decoding run.
truth/results.txt. This file should have the values (as produced by filter-stderr and filter-stdout) that are expected from the test run.
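As an illustration of the filter contract described above (read STDIN, write only KEY = value lines to STDOUT), here is a minimal sketch. It is written in Python purely for illustration; this page does not say which language the suite expects filters to be written in. It also assumes that the input contains one translation per line, and the key name TRANSLATION_n is hypothetical.

```python
import sys


def filter_lines(lines):
    """Turn each input line into a 'KEY = value' result line."""
    # Assumption: the input holds one translation per line, in input order.
    for index, line in enumerate(lines, start=1):
        yield f"TRANSLATION_{index} = {line.strip()}"


if __name__ == "__main__":
    # Write nothing but result lines to STDOUT, as the contract requires.
    for result in filter_lines(sys.stdin):
        print(result)
```

A value that does not need to match exactly would be written with ~ in place of =. The lines this sketch prints are the kind of content that truth/results.txt would hold for a passing run.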
If you need to add language models, phrase tables, generation tables, or anything similar, you will need to increment the required data version number in MosesRegressionTesting.pm. Then, you will need to create a new .tgz file that contains the data for all the tests (the data dependencies are not checked into the Git repository because they are extremely large). This must then be made available for download.