Moses: statistical machine translation system

Releases

Release 2.1.1 (3rd March, 2014)

This is a minor patch for a bug that prevented Moses from linking with tcmalloc when it is available on the compilation machine. Using tcmalloc can substantially speed up decoding, at the cost of higher memory usage.

Release 2.1 (21st Jan, 2014)

This is the current stable release.

Overview

The broad aim of this release is to tackle more complicated issues in order to improve the extensibility and reliability of the toolkit.

Specifically, the decoder has been refactored to create a more modular framework that makes it easier to incorporate new feature functions into Moses. This has necessitated major changes in many other parts of the toolkit, including training and tuning.

As well as the refactored code, this release also incorporates a host of new features donated by other developers. Transliteration modules, better error handling, small and fast language models, and placeholders are just some of the new features that spring to mind.

We have also continued to expand the testing regime to maintain the reliability of the toolkit, while enabling more developers to contribute to the project.

We distribute Moses as:

  1. source code,
  2. binaries for Windows (32- and 64-bit), Mac OSX (Mavericks), and various flavours of Linux (32- and 64-bit),
  3. a Linux virtual machine with Moses pre-installed, using the open-source VirtualBox application,
  4. an Amazon cloud server image.

Release 1.0 (28th Jan, 2013)

Overview

The Moses community has grown tremendously over the last few years. From the beginning as a purely research-driven project, we are now a diverse community of academic and business users, ranging in experience from hardened developers to new users.

Therefore, the first priority of this release has been to concentrate on resolving long-standing, but straightforward, issues to make the toolkit easier to use and more efficient. The provision of a full-time development team devoted to the maintenance and enhancement of the Moses toolkit has allowed us to tackle many useful engineering problems.

A second priority was to put in place a multi-tiered testing regime to enable more developers to contribute to the project, more quickly, while ensuring the reliability of the toolkit. However, we have not stopped adding new features to the toolkit; the next section lists a number of major features added in the last 9 months.

New Features

The following is a list of the major new features in the Moses toolkit since May 2012, in roughly chronological order.

Parallel Training by Hieu Hoang and Rohit Gupta.

The training process has been improved and can now take advantage of multi-core machines. Parallelization was achieved by partitioning the input data, then running the translation rule extraction processes in parallel before merging the data. The following is the timing for the extract process on different numbers of cores:

  Cores               One     Two     Three   Four
  Time taken (mins)   48:55   33:13   27:52   25:35

The training processes have also been redesigned to decrease disk access and to use less disk space. This is important for parallel processing, as disk IO often becomes the limiting factor with a large number of simultaneous disk accesses. It is also important when training syntactically inspired models or using large amounts of training data, which can result in very large translation models.
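The partition-extract-merge scheme described above can be sketched as follows. This is a minimal illustration; the `extract_rules` function below is a hypothetical stand-in for Moses's actual rule-extraction step, and thread-based workers stand in for the separate processes used in the real pipeline:

```python
from concurrent.futures import ThreadPoolExecutor

def extract_rules(sentence_pair):
    # Stand-in for Moses's real extract step: emit one "rule"
    # per position-aligned word pair (purely illustrative).
    src, tgt = sentence_pair
    return [(s, t) for s, t in zip(src.split(), tgt.split())]

def parallel_extract(corpus, workers=4):
    # Partition the corpus across workers, extract in parallel,
    # then merge the per-partition results back in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(extract_rules, corpus))
    return [rule for part in parts for rule in part]

corpus = [("das haus", "the house"), ("ein buch", "a book")]
print(parallel_extract(corpus, workers=2))
```

Because `map` preserves input order, the merged output is deterministic regardless of how many workers run.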

IRST LM training integration by Hieu Hoang and Philipp Koehn

The IRST toolkit for training language models has been integrated into the Experiment Management System; this functionality was previously provided by the SRILM software. Substituting IRSTLM for SRILM means that the entire training pipeline can be run using only free, open-source software. Not only is the IRST toolkit unencumbered by a proprietary license, it is also parallelizable and capable of training on larger amounts of data than was otherwise possible with SRILM.

Distributed Language Model by Oliver Wilson.

Language models can be distributed across many machines, allowing more data to be used at the cost of a performance overhead. This is still experimental code.

Incremental Search Algorithm by Kenneth Heafield.

A replacement for the cube pruning algorithm in CKY++ decoding, used in hierarchical and syntax models. It offers a better trade-off between decoding speed and translation quality.

Compressed Phrase-Table and Reordering-Tables by Marcin Junczys-Dowmunt.

A phrase-table and lexicalized reordering-table implementation that is both small and fast.

Sparse features by Eva Hasler, Barry Haddow, Philipp Koehn

A framework to allow a large number of sparse features in the decoder. A number of sparse feature functions described in the literature have been reproduced in Moses. Currently, the available sparse feature functions are:

  1. TargetBigramFeature
  2. TargetNgramFeature
  3. SourceWordDeletionFeature
  4. SparsePhraseDictionaryFeature
  5. GlobalLexicalModelUnlimited
  6. PhraseBoundaryState
  7. PhraseLengthFeature
  8. PhrasePairFeature
  9. TargetWordInsertionFeature

Suffix array for hierarchical models by Hieu Hoang

Training syntactically-inspired hierarchical models requires a large amount of time and resources. An alternative to training a full translation model is to extract only the translation rules required for each input sentence.

We have integrated Adam Lopez's suffix array implementation into Moses. This is a well-known and mature implementation, which is hosted and maintained by the cdec community.
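The idea behind on-demand rule lookup with a suffix array can be shown with a toy example. This is a simplified sketch of the data structure, not Adam Lopez's implementation: build a sorted array of corpus suffixes once, then locate every occurrence of an input phrase by binary search instead of scanning the corpus:

```python
import bisect

def build_suffix_array(tokens):
    # All suffixes of the corpus, sorted lexicographically;
    # only the starting indices are stored.
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])

def find_occurrences(tokens, sa, phrase):
    # Truncate each suffix to the phrase length; since the full
    # suffixes are sorted, these prefixes are sorted too, so we
    # can binary-search for the block that matches `phrase`.
    # (Materializing all prefixes is O(n*k); fine for a sketch.)
    suffixes = [tokens[i:i + len(phrase)] for i in sa]
    lo = bisect.bisect_left(suffixes, phrase)
    hi = bisect.bisect_right(suffixes, phrase)
    return sorted(sa[lo:hi])

corpus = "the house is big and the house is old".split()
sa = build_suffix_array(corpus)
print(find_occurrences(corpus, sa, ["the", "house"]))  # positions 0 and 5
```

In the real implementation, the matched corpus positions are then fed to the rule extractor, so only rules relevant to the current input sentence are ever built.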

Multi-threaded tokenizer by Pidong Wang

Batched MIRA by Colin Cherry.

A replacement for MERT, especially suited for tuning a large number of sparse features. (Cherry and Foster, NAACL 2012).

LR score by Lexi Birch and others.

The BLEU score commonly used in MT is insensitive to reordering errors. We have integrated into the Moses toolkit another metric, the LR score, described in (Birch and Osborne, 2011), which better accounts for reordering.
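The core of such reordering-aware metrics is a permutation distance between the source-side order of translated words and their order in the output. A minimal sketch of the Kendall's tau component, as an illustration of the idea rather than the LRscore implementation itself:

```python
from itertools import combinations

def kendall_tau_score(permutation):
    # `permutation` gives, for each source position, the position of
    # its translation in the output. The score is the fraction of
    # word pairs kept in the same relative order as a monotone
    # (no-reordering) translation; 1.0 = monotone, 0.0 = fully inverted.
    n = len(permutation)
    if n < 2:
        return 1.0
    concordant = sum(1 for i, j in combinations(range(n), 2)
                     if permutation[i] < permutation[j])
    return concordant / (n * (n - 1) / 2)

print(kendall_tau_score([0, 1, 2, 3]))  # monotone order scores 1.0
print(kendall_tau_score([3, 2, 1, 0]))  # fully inverted order scores 0.0
```

BLEU would score two outputs with identical words but different orderings of matched n-grams quite similarly; a permutation-distance term makes the metric penalize the reordering explicitly.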

Convergence of Translation Memory and Statistical Machine Translation by Philipp Koehn and Hieu Hoang

An alternative extraction algorithm (Koehn and Senellart, AMTA 2010), inspired by the use of translation memories, has been integrated into the Moses toolkit.

Word Alignment Information is turned on by default by Hieu Hoang and Barry Haddow

The word alignment produced by GIZA++/mgiza is carried through the phrase-table and made available to the decoder. This information is required by some feature functions. The use of these word alignments is now optimized for memory and speed, and enabled by default.
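In the text phrase-table format, the word alignment appears as an extra `src-tgt` index-pair field between the scores and any counts. A small parser sketch (the field layout follows the standard Moses text format; the sample entry itself is made up):

```python
def parse_phrase_table_line(line):
    # Moses text phrase-table fields are separated by " ||| ":
    # source ||| target ||| scores ||| word alignment [||| counts ...]
    fields = line.split(" ||| ")
    source, target = fields[0].split(), fields[1].split()
    scores = [float(s) for s in fields[2].split()]
    # Each alignment point is "srcIndex-tgtIndex" (0-based).
    alignment = [tuple(map(int, point.split("-")))
                 for point in fields[3].split()]
    return {"source": source, "target": target,
            "scores": scores, "alignment": alignment}

line = "das haus ||| the house ||| 0.8 0.7 0.9 0.6 ||| 0-0 1-1"
entry = parse_phrase_table_line(line)
print(entry["alignment"])  # [(0, 0), (1, 1)]
```

Feature functions that need to know which target word each source word produced read exactly this field, which is why carrying it by default matters.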

Modified Moore-Lewis filtering by Barry Haddow and Philipp Koehn

Reimplementation of the data selection method for domain adaptation of parallel corpora described by Axelrod et al. (EMNLP 2011).
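The method ranks each sentence by the cross-entropy difference between an in-domain and an out-of-domain language model, H_in(s) - H_out(s), and keeps the lowest-scoring sentences. A toy sketch with unigram models; the scoring rule is from the paper, while the tiny "models" below are invented purely for illustration:

```python
import math

def cross_entropy(sentence, unigram_probs, floor=1e-4):
    # Average negative log-probability per word under a unigram model;
    # unseen words get a small floor probability.
    words = sentence.split()
    return -sum(math.log(unigram_probs.get(w, floor)) for w in words) / len(words)

def mml_score(sentence, in_domain_lm, out_domain_lm):
    # Modified Moore-Lewis: a lower H_in - H_out means the sentence
    # looks more in-domain *relative to* the general corpus.
    return (cross_entropy(sentence, in_domain_lm)
            - cross_entropy(sentence, out_domain_lm))

in_lm = {"patent": 0.2, "claim": 0.2, "the": 0.1}   # invented toy model
out_lm = {"the": 0.1, "weather": 0.2, "claim": 0.01}
sentences = ["the patent claim", "the weather"]
ranked = sorted(sentences, key=lambda s: mml_score(s, in_lm, out_lm))
print(ranked[0])  # the most in-domain-looking sentence comes first
```

In practice the same scoring is applied to both sides of the parallel corpus with real n-gram language models, and a score threshold or top-N cutoff selects the adaptation subset.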

Lots and lots of cleanups and bug fixes

By Ales Tamchyna, Wilker Aziz, Mark Fishel, Tetsuo Kiso, Rico Sennrich, Lane Schwartz, Hiroshi Umemoto, Phil Williams, Tom Hoar, Arianna Bisazza, Jacob Dlougach, Jonathon Clark, Nadi Tomeh, Karel Bilek, Christian Buck, Oliver Wilson, Alex Fraser, Christophe Servan, Matous Machecek, Christian Federmann, Graham Neubig.

Building and Installing

The structure and installation of the Moses toolkit have been simplified to make compilation and installation easier. The training and decoding process can be run from the directory into which the toolkit was downloaded, without the need for a separate installation step.

This allows binary, ready-to-run versions of Moses to be distributed which can be downloaded and executed immediately. Previously, the installation had to be configured specifically for the user's machine.

A new build system has been implemented to build the Moses toolkit. This uses the boost library's build framework. The new system offers several advantages over the previous build system.

Firstly, the source code for the new build system is included in the Moses repository and is bootstrapped the first time Moses is compiled. It does not rely on the cmake, automake, make, or libtool applications, which have issues with cross-platform compatibility and with running on older operating systems.

Secondly, the new build system integrates the running of the unit tests and regression tests with compilation.

Thirdly, the new system is significantly more powerful, allowing us to support a number of new build features such as static and debug compilation, linking to external libraries such as mpi and tcmalloc, and other non-standard builds.

Testing

The MosesCore team has implemented several layers of testing to ensure the reliability of the toolkit. We describe each below.

Unit Testing

Unit tests exercise each function or class method in isolation. Moses implements its unit tests with the unit testing framework available from the Boost library.

The source code for the unit tests is integrated into the Moses source. The tests are executed every time the Moses source is compiled.

The unit testing framework was implemented only recently. There are currently 20 unit tests covering various features in mert, mira, phrase extraction, and decoding.

Regression Testing

The regression tests ensure that changes to the source code do not have unintended consequences for existing functionality. Regression tests are typically applied to a larger body of code than unit tests: they are designed to test specific functionality rather than a specific function. Therefore, regression tests are run against the actual Moses programs, rather than testing components in isolation.

The regression test framework forms the core of testing within the Moses toolkit. However, it was created many years ago at the beginning of the Moses project and was only designed to test the decoder. During the past 6 months, the scope of the regression test framework has been expanded to test any part of the Moses toolkit, in addition to testing the decoder. The tests are grouped into the following types:

  1. Phrase-based decoder
  2. Hierarchical/Syntax decoder
  3. Mert
  4. Rule Extract
  5. Phrase-table scoring
  6. Miscellaneous, including domain adaptation features, binarizing phrase tables, parallel rule extract, and so forth.

The number of tests has increased from 46 in May 2012 to 73 currently.

We have also overhauled the regression test framework to make it easier to add new tests. Previously, the data for the regression tests could only be updated by developers who had access to the web server at Edinburgh University. This has now been changed so that the data resides in a versioned repository on github.com.

This can be accessed and changed by any Moses developer, and is subject to the same checks and controls as the rest of the Moses source code.

Every Moses developer is obliged to ensure that the regression tests execute successfully before committing changes to the master repository.

Cruise Control

This is a daily task run on a server at the University of Edinburgh which compiles the Moses source code and executes the unit tests and regression tests. It also runs a small training pipeline to completion. The results of this testing are publicly available online.

This provides an independent check that all unit tests and regression tests passed, and that the entirety of the SMT pipeline is working. Therefore, it tests not only the Moses toolkit, but also external tools such as GIZA++ that are essential to Moses and the wider SMT community.

All failures are investigated by the MosesCore team and any remedial action is taken. This is done to enforce the testing regime and maintain reliability.

The cruise control is a subproject of Moses initiated by Ales Tamchyna with contribution by Barry Haddow.

Operating-System Compatibility

The Moses toolkit has always strived to be compatible with multiple platforms, particularly the most popular operating systems used by researchers and commercial users.

Before each release, we make sure that Moses compiles and that the unit tests and regression tests run successfully on various operating systems.

Moses, GIZA++, mgiza, and IRSTLM were compiled for:

  1. Linux 32-bit
  2. Linux 64-bit
  3. Cygwin
  4. Mac OSX 10.7 64-bit

Effort was made to make the executables runnable on as many platforms as possible. Therefore, they were statically linked when possible. Moses was then tested on the following platforms:

  1. Windows 7 (32-bit) with Cygwin 6.1
  2. Mac OSX 10.7 with MacPorts
  3. Ubuntu 12.10, 32 and 64-bit
  4. Debian 6.0, 32 and 64-bit
  5. Fedora 17, 32 and 64-bit
  6. openSUSE 12.2, 32 and 64-bit

All the binary executables are made available for download for users who do not wish to compile their own version.

GIZA++, mgiza, and IRSTLM are also available for download as binaries to enable users to run the entire SMT pipeline without having to download and compile their own software.

Issues:

  1. IRSTLM was not statically linked. The 64-bit version fails to execute on Debian 6.0; all other platforms run the downloaded executables without problems.
  2. Mac OSX does not support static linking. Therefore, it is not known whether the executables will work on platforms other than the one on which they were tested.
  3. mgiza compilation failed on Mac OSX with gcc v4.2. It could only be successfully compiled with gcc v4.5, available via MacPorts.

End-to-End Testing

Before each Moses release, a number of full scale experiments are run. This is the final test to ensure that the Moses pipeline can run from beginning to end, uninterrupted, with "real-world" datasets. The translation quality, as measured by BLEU, is also noted, to ensure that there is no decrease in performance due to any interaction between components in the pipeline.

This testing takes approximately 2 weeks to run. The following datasets and experiments are currently used for end-to-end testing:

  • Europarl es-en: phrase-based, hierarchical
  • Europarl en-es: phrase-based, hierarchical
  • Europarl cs-en: phrase-based, hierarchical
  • Europarl en-cs: phrase-based, hierarchical
  • Europarl de-en: phrase-based, hierarchical, factored German POS, factored German+English POS
  • Europarl en-de: phrase-based, hierarchical, factored German POS, factored German+English POS
  • Europarl fr-en: phrase-based, hierarchical, recased (as opposed to truecased), factored English POS
  • Europarl en-fr: phrase-based, hierarchical, recased (as opposed to truecased), factored English POS

Pre-Made Models

The end-to-end tests produce a large number of tuned models. The models, as well as all configuration and data files, are made available for download. This is useful as a template for users setting up their own experimental environment, or for those who just want the models without running the experiments.

Release 0.91 (12th October, 2012)

The code is available in a branch on github.

This version was tested on 8 Europarl language pairs, with phrase-based, hierarchical, and phrase-based factored models. All ran through without major intervention. Known issues:

  1. Hierarchical models crash during evaluation when threaded; strangely, they run OK during tuning
  2. EMS bugs when specifying multiple language models
  3. Complex factored models not tested
  4. Hierarchical models with factors do not work

Status 11th July, 2012

A roundup of the new features that have been implemented in the past year:

  1. Lexi Birch's LR score integrated into tuning. Finished coding: YES. Tested: NO. Documented: NO. Developer: Hieu, Lexi. First/Main user: Yvette Graham.
  2. Asynchronous, batched LM requests for phrase-based models. Finished coding: YES. Tested: UNKNOWN. Documented: YES. Developer: Oliver Wilson, Miles Osborne. First/Main user: Miles Osborne.
  3. Multithreaded tokenizer. Finished coding: YES. Tested: YES. Documented: NO. Developer: Pidong Wang.
  4. KB Mira. Finished coding: YES. Tested: YES. Documented: YES. Developer: Colin Cherry.
  5. Training & decoding more resilient to non-printing characters and Moses' reserved characters. Escaping the reserved characters and throwing away lines with non-printing chars. Finished coding: YES. Tested: YES. Documented: NO. Developer: Philipp Koehn and Tom Hoar.
  6. Simpler installation. Finished coding: YES. Tested: YES. Documented: YES. Developer: Hieu Hoang. First/Main user: Hieu Hoang.
  7. Factors work with chart decoding. Finished coding: YES. Tested: NO. Documented: NO. Developer: Hieu Hoang. First/Main user: Fabienne Braune.
  8. Less IO and disk space needed during training. Everything written directly to gz files. Finished coding: YES. Tested: YES. Documented: NO. Developer: Hieu. First/Main user: Hieu.
  9. Parallel training. Finished coding: YES. Tested: YES. Documented: YES. Developer: Hieu. First/Main user: Hieu
  10. Adam Lopez's suffix array integrated into Moses's training & decoding. Finished coding: YES. Tested: NO. Documented: YES. Developer: Hieu.
  11. Major MERT code cleanup. Finished coding: YES. Tested: NO. Documented: NO. Developer: Tetsuo Kiso.
  12. Wrapper for Berkeley parser (german). Finished coding: YES. Tested: UNKNOWN. Documented: UNKNOWN. Developer: Philipp Koehn.
  13. Option to use p(RHS_t|RHS_s,LHS) or p(LHS,RHS_t|RHS_s), as a grammar rule's direct translation score. Finished coding: YES. Tested: UNKNOWN. Documented: UNKNOWN. Developer: Philip Williams. First/Main user: Philip Williams.
  14. Optional PCFG scoring feature for target syntax models. Finished coding: YES. Tested: UNKNOWN. Documented: UNKNOWN. Developer: Philip Williams. First/Main user: Philip Williams.
  15. Add -snt2cooc option to use mgiza's reduced memory snt2cooc program. Finished coding: YES. Tested: YES. Documented: YES. Developer: Hieu Hoang.
  16. queryOnDiskPt program. Finished coding: YES. Tested: YES. Documented: NO. Developer: Hieu Hoang. First/Main user: Daniel Schaut.
  17. Output phrase segmentation to n-best when -report-segmentation is used. Finished coding: YES. Tested: UNKNOWN. Developer: UNKNOWN. First/Main user: Jonathon Clark.
  18. CDER and WER metric in tuning. Finished coding: UNKNOWN. Tested: UNKNOWN. Documented: UNKNOWN. Developer: Matous Machacek.
  19. Lossy Distributed Hash Table Language Model. Finished coding: UNKNOWN. Tested: UNKNOWN. Documented: UNKNOWN. Developer: Oliver Wilson.
  20. Interpolated scorer for MERT. Finished coding: YES. Tested: UNKNOWN. Documented: UNKNOWN. Developer: Matous Machacek.
  21. IRST LM training integrated into Moses. Finished coding: YES. Tested: YES. Documented: YES. Developer: Hieu Hoang.
  22. GlobalLexiconModel. Finished coding: UNKNOWN. Tested: UNKNOWN. Documented: UNKNOWN. Developer: Jiri Marsik, Christian Buck and Philipp Koehn.
  23. TM Combine (translation model combination). Finished coding: YES. Tested: YES. Documented: YES. Developer: Rico Sennrich.
  24. Alternative to CKY+ for scope-3 grammar. Reimplementation of Hopkins and Langmead (2010). Finished coding: YES. Tested: UNKNOWN. Documented: UNKNOWN. Developer: Philip Williams.
  25. Sample Java client for Moses server. Finished coding: YES. Tested: NO. Documented: NO. Developer: Marwen Azouzi. First/Main user: Mailing list users.
  26. Support for mgiza, without having to install GIZA++ as well. Finished coding: YES. Tested: YES. Documented: NO. Developer: Marwen Azouzi.
  27. Interpolated language models. Finished coding: YES. Tested: YES. Documented: YES. Developer: Philipp Koehn.
  28. Duplicate removal in MERT. Finished coding: YES. Tested: YES. Documented: NO. Developer: Thomas Schoenemann.
  29. Use bjam instead of automake to compile. Finished coding: YES. Tested: YES. Documented: YES. Developer: Ken Heafield.
  30. Recaser train script updated to support IRSTLM as well. Finished coding: YES. Tested: YES. Documented: YES. Developer: Jehan.
  31. extract-ghkm. Finished coding: UNKNOWN. Tested: UNKNOWN. Documented: UNKNOWN. Developer: Philip Williams.
  32. PRO tuning algorithm. Finished coding: YES. Tested: YES. Documented: YES. Developer: Philipp Koehn and Barry Haddow.
  33. Cruise control. Finished coding: YES. Tested: YES. Documented: YES. Developer: Ales Tamchyna.
  34. Faster SCFG rule table format. Finished coding: YES. Tested: UNKNOWN. Documented: NO. Developer: Philip Williams.
  35. LM OOV feature. Finished coding: YES. Tested: UNKNOWN. Documented: NO. Developer: Barry Haddow and Ken Heafield.
  36. TER Scorer in MERT. Finished coding: UNKNOWN. Tested: UNKNOWN. Documented: NO. Developer: Matous Machacek & Christophe Servan.
  37. Multi-threading for decoder & MERT. Finished coding: YES. Tested: YES. Documented: YES. Developer: Barry Haddow et al.
  38. Expose n-gram length as part of LM state calculation. Finished coding: YES. Tested: UNKNOWN. Documented: NO. Developer: Ken Heafield and Marc Legendre.
  39. Changes to chart decoder cube pruning: create one cube per dotted rule instead of one per translation. Finished coding: YES. Tested: YES. Documented: NO. Developer: Philip Williams.
  40. Syntactic LM. Finished coding: YES. Tested: YES. Documented: YES. Developer: Lane Schwartz.
  41. Czech detokenization. Finished coding: YES. Tested: UNKNOWN. Documented: UNKNOWN. Developer: Ondrej Bojar.
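The escaping of Moses' reserved characters (item 5 above) can be sketched as follows. The entity mapping below reflects the commonly used Moses input escaping for &, |, <, >, [ and ]; the function names are mine, and details of the real cleaning scripts may differ:

```python
# Moses treats these characters specially (factor separator "|",
# XML markup "<" and ">", etc.), so input text must escape them.
MOSES_ESCAPES = {
    "&": "&amp;",   # must be first, so later entities are not re-escaped
    "|": "&#124;",
    "<": "&lt;",
    ">": "&gt;",
    "[": "&#91;",
    "]": "&#93;",
}

def escape_moses(text):
    for char, entity in MOSES_ESCAPES.items():
        text = text.replace(char, entity)
    return text

def has_nonprinting(text):
    # Lines containing non-printing characters are dropped during cleaning.
    return any(not c.isprintable() for c in text)

print(escape_moses("a | b < c"))  # a &#124; b &lt; c
```

Applying the escapes in a fixed order, with "&" first, guarantees that the "&" inside an already-produced entity is never escaped a second time.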

Status 13th August, 2010

Changes since the last status report:

  1. change or delete character Ř to 0 in extract-rules.cpp (Raphael and Hieu Hoang)

Status 9th August, 2010

Changes since the last status report:

  1. Add option of retaining alignment information in the phrase-based phrase table. Decoder loads this information if present. (Hieu Hoang & Raphael Payen)
  2. When extracting rules, if the source or target syntax contains an unsupported escape sequence (anything other than "<", ">", "&", "&apos", and "&quot") then write a warning message and skip the sentence pair (instead of asserting).
  3. In bootstrap-hypothesis-difference-significance.pl, calculate the p-value and confidence intervals using not only BLEU but also the NIST score. (Mark Fishel)
  4. Dynamic Suffix Arrays (Abby Levenberg)
  5. Merge multi-threaded Moses into Moses (Barry Haddow)
  6. Continue partial translation (Ondrej Bojar and Ondrej Odchazel)
  7. Bug fixes, minor bits & bobs. (Philipp Koehn, Christian Hardmeier, Hieu Hoang, Barry Haddow, Philip Williams, Ondrej Bojar, Abbey, Mark Fishel, Lane Schwartz, Nicola Bertoldi, Raphael, ...)
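The significance testing in item 3 is based on bootstrap resampling: draw test sets with replacement and count how often one system's corpus-level score beats the other's. A generic sketch of paired bootstrap resampling; the Perl script's exact procedure and its BLEU/NIST scoring are more involved than this per-sentence-score illustration:

```python
import random

def paired_bootstrap(scores_a, scores_b, samples=1000, seed=0):
    # scores_a / scores_b: per-sentence scores for systems A and B on
    # the SAME test set (paired by sentence). Returns the fraction of
    # resampled test sets on which system A beats system B; a fraction
    # near 1.0 suggests A's advantage is statistically significant.
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(samples):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins += 1
    return wins / samples
```

The pairing matters: both systems are evaluated on the same resampled sentences each round, so sentence difficulty cancels out and only the systems' relative quality drives the result.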

Status 26th April, 2010

Changes since the last status report:

  1. Synchronous CFG based decoding, a la Hiero (Chiang 2005), plus syntax-based models, and all the scripts to go with them. (Thanks to Philip Williams and Hieu Hoang)
  2. Cache clearing in IRST LM (Nicola Bertoldi)
  3. Factored Language Model. (Ondrej Bojar)
  4. Fixes to lattice (Christian Hardmeier, Arianna Bisazza, Suzy Howlett)
  5. zmert (Ondrej Bojar)
  6. Suffix arrays (Abby Levenberg)
  7. Lattice MBR and consensus decoding (Barry Haddow and Abhishek Arun)
  8. Simple program that illustrates how to access a phrase table on disk from an external program (Felipe Sánchez-Martínez)
  9. Odds and sods by Raphael Payen and Sara Stymne.

Status 1st April, 2010

Changes since the last status report:

  1. Fix for Visual Studio, and potentially other compilers (thanks to Barry, Christian, Hieu)
  2. Memory leak in unique n-best fixed (thanks to Barry)
  3. Makefile fix for Moses server (thanks to Barry)

Status 26th March, 2010

Changes since the last status report:

  1. Minor bug fixes & tweaks, especially to the decoder, MERT scripts (thanks to too many people to mention)
  2. Fixes to make decoder compile with most versions of gcc, Visual studio and other compilers (thanks to Tom Hoar, Jean-Bapist Fouet).
  3. Multi-threaded decoder (thanks to Barry Haddow)
  4. Update for IRSTLM (thanks to Nicola Bertoldi and Marcello Federico)
  5. Run mert on a subset of features (thanks to Nicola Bertoldi)
  6. Training using different alignment models (thanks to Mark Fishel)
  7. "A handy script to get many translations from Google" (thanks to Ondrej Bojar)
  8. Lattice MBR (thanks to Abhishek Arun and Barry Haddow)
  9. Option to compile Moses as a dynamic library (thanks to Jean-Bapist Fouet).
  10. Hierarchical re-ordering model (thanks to Christian Hardmeier, Sara Stymne, Nadi, Marcello, Ankit Srivastava, Gabriele Antonio Musillo, Philip Williams, Barry Haddow).
  11. Global Lexical re-ordering model (thanks to Philipp Koehn)
  12. Experiment.perl scripts for automating the whole MT pipeline (thanks to Philipp Koehn)

Work in Progress

Some ongoing issues have not yet been resolved. If you fancy helping out, email the Moses developers.

  1. Link to tcmalloc for fast C++ execution. Finished coding: YES. Tested: YES. Documented: YES. Developer: Ken Heafield. First/Main user: Ken Heafield, everyone.
  2. Include word alignment on by default during training and decoding. Finished coding: YES. Tested: Nearly. Documented: NO. Developer: Barry Haddow, Hieu Hoang. First/Main user: Barry Haddow, Eva Hasler, other developers. Notes: Phrase-based phrase tables are all OK; checking chart decoding; there may also be multi-threading issues.
  3. Integrating Philipp Koehn's TM-MT into Moses. Finished coding: NO. Tested: NO. Documented: YES. Developer: Hieu Hoang. First/Main user: Philipp Koehn, everyone. Notes: Added multi-threading. TODO - add switches for different arguments; switch to Abby Levenberg's dynamic-SA implementation.
  4. Integrating Marcin's compressed phrase table into EMS; regression test added. Finished coding: YES. Tested: YES. Documented: YES. Developer: Barry Haddow, Marcin, Hieu Hoang. First/Main user: Everyone.
  5. Testing cygwin build. Finished coding: Ongoing. Developer: Hieu Hoang. Notes: Currently 1 of the kenLM unit tests fails. VM on thor died after server died; set up ssh on Windows.
  6. Simplify feature function framework; merge all [weights-*] sections in the moses.ini file. Finished coding: In branch. Developer: Hieu Hoang. Notes: Redo after MIRA merge.
  7. Lattice decoding in chart decoding. Finished coding: Not started. Developer: Hieu Hoang. Notes: What about training?
  8. Sparse features. Finished coding: YES. Tested: YES. Documented: YES. Developer: Eva Hasler, Barry Haddow. Notes: Need a consistent way to turn features on/off.
  9. Placeholders. Finished coding: NO. Tested: NO. Documented: NO. Developer: Hieu Hoang, anyone else interested. First/Main user: especially commercial users.
  10. Preserving formatting. Finished coding: NO. Tested: NO. Documented: NO. Developer: Hieu Hoang, anyone else interested. First/Main user: especially commercial users.
Page last modified on March 03, 2014, at 06:25 PM