This is the current stable release.
This is a minor patch for a bug that prevented Moses from linking with tcmalloc when it is available on the compilation machine. Using tcmalloc can substantially speed up decoding, at the cost of more memory usage.
The broad aim of this release is to tackle more complicated issues to enable better expandability and reliability.
Specifically, the decoder has been refactored to create a more modular framework to enable easier incorporation of new feature functions into Moses. This has necessitated major changes in many other parts of the toolkit, including training and tuning.
As well as the refactored code, this release also incorporates a host of new features donated by other developers. Transliteration modules, better error handling, small and fast language models, and placeholders are just some of the new features that spring to mind.
We have also continued to expand the testing regime to maintain the reliability of the toolkit, while enabling more developers to contribute to the project.
We distribute Moses as:
1. source code,
2. binaries for Windows (32 and 64 bit), Mac OSX (Mavericks), and various flavours of Linux (32 and 64 bit),
3. pre-installed in a Linux virtual machine, using the open source VirtualBox application,
4. Amazon cloud server image.
The Moses community has grown tremendously over the last few years. From the beginning as a purely research-driven project, we are now a diverse community of academic and business users, ranging in experience from hardened developers to new users.
Therefore, the first priority of this release has been to concentrate on resolving long-standing, but straightforward, issues to make the toolkit easier to use and more efficient. The provision of a full-time development team devoted to the maintenance and enhancement of the Moses toolkit has allowed us to tackle many useful engineering problems.
A second priority was to put in place a multi-tiered testing regime to enable more developers to contribute to the project, more quickly, while ensuring the reliability of the toolkit. However, we have not stopped adding new features to the toolkit; the next section lists a number of major features added in the last 9 months.
The following is a list of the major new features in the Moses toolkit since May 2012, in roughly chronological order.
The training process has been improved and can take advantage of multi-core machines. Parallelization was achieved by partitioning the input data, then running the translation rule extraction processes in parallel before merging the data. The following are the timings for the extract process on different numbers of cores:
Cores | One | Two | Three | Four |
---|---|---|---|---|
Time taken (mins) | 48:55 | 33:13 | 27:52 | 25:35 |
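To read the table in terms of scaling, the timings can be converted to speedup and parallel efficiency relative to the one-core run. The following is a minimal sketch; it assumes each entry has the form `A:B` with 60 of the smaller unit per larger unit (the ratios come out the same whichever units are meant), and the helper names are illustrative.

```python
# Timings copied from the table above, keyed by number of cores.
TIMINGS = {1: "48:55", 2: "33:13", 3: "27:52", 4: "25:35"}


def to_units(stamp):
    """Convert an 'A:B' timing into a single count of the smaller unit."""
    major, minor = stamp.split(":")
    return int(major) * 60 + int(minor)


def scaling(timings):
    """Return {cores: (speedup, efficiency)} relative to the one-core run."""
    base = to_units(timings[1])
    result = {}
    for cores, stamp in sorted(timings.items()):
        speedup = base / to_units(stamp)
        # Efficiency is the speedup divided by the number of cores used.
        result[cores] = (speedup, speedup / cores)
    return result


if __name__ == "__main__":
    for cores, (speedup, efficiency) in scaling(TIMINGS).items():
        print(f"{cores} core(s): speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

On these figures, four cores give a speedup of a little over 1.9 compared with one core, so the gain from each additional core shrinks as cores are added.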
The training processes have also been redesigned to decrease disk access, and to use less disk space. This is important for parallel processing, as disk IO often becomes the limiting factor with a large number of simultaneous disk accesses. It is also important when training syntactically inspired models or using large amounts of training data, which can result in very large translation models.
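The partition, extract, and merge pattern used for parallel training can be sketched as follows. This is only an illustration under stated assumptions: `extract_shard` is a hypothetical stand-in that counts position-matched word pairs rather than performing real Moses rule extraction, and a thread pool is used for brevity where a real pipeline would run separate processes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(items, parts):
    """Split items into `parts` contiguous shards of near-equal size."""
    size, extra = divmod(len(items), parts)
    shards, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        shards.append(items[start:end])
        start = end
    return shards


def extract_shard(shard):
    """Hypothetical stand-in for rule extraction: count (source, target) word pairs."""
    counts = Counter()
    for source, target in shard:
        for pair in zip(source.split(), target.split()):
            counts[pair] += 1
    return counts


def parallel_extract(corpus, workers):
    """Extract from each shard concurrently, then merge the partial counts."""
    shards = partition(corpus, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(extract_shard, shards))
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    return merged


if __name__ == "__main__":
    # Made-up toy corpus of (source, target) sentence pairs.
    corpus = [("das haus", "the house"), ("das buch", "the book"), ("ein haus", "a house")]
    print(parallel_extract(corpus, workers=2))
```

Because the merge step in this sketch is a plain sum of per-shard counts, the merged result does not depend on how many shards the data was split into.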
The IRST toolkit for training language models has been integrated into the Experiment Management System. The SRILM software previously carried out this functionality. Substituting IRST for SRI means that the entire training pipeline can be run using only free, open-source software. Not only is the IRST toolkit unencumbered by a proprietary license, it is also parallelizable and capable of training with a larger amount of data than was otherwise possible with SRI.
Language models can be distributed across many machines, allowing more data to be used at the cost of a performance overhead. This is still experimental code.
A replacement for the cube pruning algorithm in CKY++ decoding, used in hierarchical and syntax models. It offers a better tradeoff between decoding speed and translation quality.
A phrase-table and lexicalized reordering-table implementation which is both small and fast.
A framework to allow a large number of sparse features in the decoder. A number of sparse feature functions described in the literature have been reproduced in Moses. Currently, the available sparse feature functions are:
The training of syntactically-inspired hierarchical models requires a large amount of time and resources. An alternative to training a full translation model is to extract only the translation rules required for each input sentence.
We have integrated Adam Lopez's suffix array implementation into Moses. This is a well-known and mature implementation, which is hosted and maintained by the cdec community.
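Extracting rules per input sentence relies on finding the occurrences of a source phrase in the training corpus quickly, which is what a suffix array provides. The following is a minimal sketch of a token-level suffix array with binary search. It is not the cdec implementation: the construction is a naive sort suitable only for small examples, and the function names are illustrative.

```python
def build_suffix_array(tokens):
    """Return suffix start positions sorted by the token sequence they begin."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find_occurrences(tokens, suffix_array, phrase):
    """Return sorted corpus positions where `phrase` (a token list) occurs."""
    n = len(phrase)

    def prefix(rank):
        start = suffix_array[rank]
        return tokens[start:start + n]

    # Lower bound: first suffix whose length-n prefix is >= phrase.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < phrase:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose length-n prefix is > phrase.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= phrase:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[first:lo])


if __name__ == "__main__":
    # Made-up toy corpus.
    corpus = "the house is small the house is big".split()
    sa = build_suffix_array(corpus)
    print(find_occurrences(corpus, sa, ["the", "house"]))
```

All suffixes that start with the phrase sit next to each other in the sorted array, so two binary searches are enough to locate every occurrence without scanning the corpus.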
A replacement for MERT, especially suited for tuning a large number of sparse features (Cherry and Foster, NAACL 2012).
The BLEU score commonly used in MT is insensitive to reordering errors. We have integrated another metric, the LR score described in (Birch and Osborne, 2011), which better accounts for reordering, into the Moses toolkit.
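To illustrate what a reordering-sensitive measure captures, the sketch below computes a normalized Kendall's tau distance between a reference word order and a hypothesis word order, given as a permutation. It is a generic permutation distance shown for intuition only, not the LR score formula itself, and the function name is illustrative.

```python
from itertools import combinations


def kendall_tau_distance(permutation):
    """Fraction of position pairs that appear in inverted order.

    `permutation` lists, for each hypothesis position, the reference
    position of that word. 0.0 means identical order, 1.0 fully reversed.
    """
    n = len(permutation)
    if n < 2:
        return 0.0
    # combinations() yields pairs in positional order, so a > b is an inversion.
    discordant = sum(1 for a, b in combinations(permutation, 2) if a > b)
    return discordant / (n * (n - 1) / 2)


if __name__ == "__main__":
    print(kendall_tau_distance([0, 1, 2, 3]))  # same order as the reference
    print(kendall_tau_distance([1, 0, 2, 3]))  # first two words swapped
    print(kendall_tau_distance([3, 2, 1, 0]))  # fully reversed
```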
An alternative extract algorithm (Koehn and Senellart, AMTA 2010), inspired by the use of translation memories, has been integrated into the Moses toolkit.
The word alignment produced by GIZA++/mgiza is carried by the phrase-table and made available to the decoder. This information is required by some feature functions. The use of these word alignments is now optimized for memory and speed, and enabled by default.
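As an illustration of how alignment points travel with a phrase pair, the sketch below parses one line of a text phrase table. It assumes the common layout of `|||`-separated fields (source, target, scores, alignment, counts) with alignment points written as `source-target` index pairs. The example line and the function name are made up for illustration, and binary or compact tables store this information differently.

```python
def parse_phrase_table_line(line):
    """Split a text phrase-table line into its fields (assumed layout)."""
    fields = [field.strip() for field in line.rstrip("\n").split("|||")]
    source, target, scores = fields[0], fields[1], fields[2]
    # The alignment field is treated as optional in this sketch.
    alignment = fields[3] if len(fields) > 3 else ""
    return {
        "source": source.split(),
        "target": target.split(),
        "scores": [float(x) for x in scores.split()],
        # Each point "i-j" links source word i to target word j.
        "alignment": [tuple(int(k) for k in point.split("-")) for point in alignment.split()],
    }


if __name__ == "__main__":
    # Made-up example line, not taken from a real model.
    example = "das haus ||| the house ||| 0.8 0.6 0.7 0.5 ||| 0-0 1-1 ||| 10 12 9"
    print(parse_phrase_table_line(example))
```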
A reimplementation of the domain adaptation method for parallel corpora described by Axelrod et al. (EMNLP 2011).
By Ales Tamchyna, Wilker Aziz, Mark Fishel, Tetsuo Kiso, Rico Sennrich, Lane Schwartz, Hiroshi Umemoto, Phil Williams, Tom Hoar, Arianna Bisazza, Jacob Dlougach, Jonathon Clark, Nadi Tomeh, Karel Bilek, Christian Buck, Oliver Wilson, Alex Fraser, Christophe Servan, Matous Machecek, Christian Federmann, Graham Neubig.
The structure and installation of the Moses toolkit have been simplified to make compilation and installation easier. The training and decoding processes can be run from the directory in which the toolkit was downloaded, without the need for a separate installation step.
This allows binary, ready-to-run versions of Moses to be distributed, which can be downloaded and executed immediately. Previously, the installation needed to be configured specifically for the user's machine.
A new build system has been implemented to build the Moses toolkit. This uses the boost library's build framework. The new system offers several advantages over the previous build system.
Firstly, the source code for the new build system is included in the Moses repository and is bootstrapped the first time Moses is compiled. It does not rely on the cmake, automake, make, and libtool applications. These have issues with cross-platform compatibility and running on older operating systems.
Secondly, the new build system integrates the running of the unit tests and regression tests with compilation.
Thirdly, the new system is significantly more powerful, allowing us to support a number of new build features such as static and debug compilation, linking to external libraries such as MPI and tcmalloc, and other non-standard builds.
The MosesCore team has implemented several layers of testing to ensure the reliability of the toolkit. We describe each below.
Unit testing tests each function or class method in isolation. Moses uses the unit testing framework available from the Boost library to implement unit testing.
The source code for the unit tests is integrated into the Moses source. The tests are executed every time the Moses source is compiled.
The unit testing framework has recently been implemented. There are currently 20 unit tests for various features in mert, mira, phrase extraction, and decoding.
The regression tests ensure that changes to source code do not have unknown consequences to existing functionality. The regression tests are typically applied to a larger body of work than unit tests. They are designed to test specific functionality rather than a specific function. Therefore, regression tests are applied to the actual Moses programs, rather than tested in isolation.
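The distinction can be made concrete with a small sketch of a regression check: run the actual program on a fixed input and compare its output with a stored reference. This is not the Moses regression test framework. The stand-in "program" (a Python one-liner that upper-cases its input) and the function name are assumptions made for illustration.

```python
import subprocess
import sys


def run_regression_test(command, input_text, expected_output):
    """Run `command` on `input_text`; return (passed, actual_output)."""
    completed = subprocess.run(
        command,
        input=input_text,
        capture_output=True,
        text=True,
        check=False,
    )
    actual = completed.stdout
    # Pass only if the program exited cleanly and reproduced the reference.
    passed = completed.returncode == 0 and actual == expected_output
    return passed, actual


# Stand-in for a real executable: upper-cases standard input.
STAND_IN = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]

if __name__ == "__main__":
    ok, output = run_regression_test(STAND_IN, "das haus\n", "DAS HAUS\n")
    print("PASS" if ok else "FAIL", repr(output))
```

Unlike a unit test, nothing here depends on the program's internals: any change that alters the observable output for the fixed input makes the check fail.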
The regression test framework forms the core of testing within the Moses toolkit. However, it was created many years ago at the beginning of the Moses project and was only designed to test the decoder. During the past 6 months, the scope of the regression test framework has been expanded to test any part of the Moses toolkit, in addition to testing the decoder. The tests are grouped into the following types:
The number of tests has increased from 46 in May 2012 to 73 currently.
We have also overhauled the regression tests to make it easier to add new tests. Previously, the data for the regression tests could only be updated by developers who had access to the web server at Edinburgh University. The data now resides in a versioned repository on github.com.
This can be accessed and changed by any Moses developer, and is subject to the same checks and controls as the rest of the Moses source code.
Every Moses developer is obliged to ensure the regression tests are successfully executed before they commit their changes to the master repository.
This is a daily task run on a server at the University of Edinburgh which compiles the Moses source code and executes the unit tests and regression tests. It also runs a small training pipeline to completion. The results of this testing are publicly available online.
This provides an independent check that all unit tests and regression tests passed, and that the entirety of the SMT pipeline is working. Therefore, it tests not only the Moses toolkit, but also external tools such as GIZA++ that are essential to Moses and the wider SMT community.
All failures are investigated by the MosesCore team and any necessary remedial action is taken. This is done to enforce the testing regime and maintain reliability.
The cruise control is a subproject of Moses initiated by Ales Tamchyna with contribution by Barry Haddow.
The Moses toolkit has always strived to be compatible with multiple platforms, particularly the most popular operating systems used by researchers and commercial users.
Before each release, we make sure that Moses compiles and that the unit tests and regression tests run successfully on various operating systems.
Moses, GIZA++, mgiza, and IRSTLM were compiled for
Effort was made to make the executables runnable on as many platforms as possible. Therefore, they were statically linked when possible. Moses was then tested on the following platforms:
All the binary executables are made available for download for users who do not wish to compile their own version.
GIZA++, mgiza, and IRSTLM are also available for download as binaries to enable users to run the entire SMT pipeline without having to download and compile their own software.
Issues:
Before each Moses release, a number of full scale experiments are run. This is the final test to ensure that the Moses pipeline can run from beginning to end, uninterrupted, with "real-world" datasets. The translation quality, as measured by BLEU, is also noted, to ensure that there is no decrease in performance due to any interaction between components in the pipeline.
This testing takes approximately 2 weeks to run. The following datasets and experiments are currently used for end-to-end testing:
The end-to-end tests produce a large number of tuned models. The models, as well as all configuration and data files, are made available for download. This is useful as a template for users setting up their own experimental environment, or for those who just want the models without running the experiments.
The code is available in a branch on github.
This version was tested on 8 Europarl language pairs, with phrase-based, hierarchical, and phrase-based factored models. All ran through without major intervention. Known issues:
A roundup of the new features that have been implemented in the past year:
-snt2cooc option to use mgiza's reduced memory snt2cooc program. Finished coding: YES. Tested: YES. Documented: YES. Developer: Hieu Hoang.
queryOnDiskPt program. Finished coding: YES. Tested: YES. Documented: NO. Developer: Hieu Hoang. First/Main user: Daniel Schaut.
-report-segmentation is used. Finished coding: YES. Tested: UNKNOWN. Developer: UNKNOWN. First/Main user: Jonathon Clark.
extract-ghkm. Finished coding: UNKNOWN. Tested: UNKNOWN. Documented: UNKNOWN. Developer: Philip Williams.
Changes since the last status report:
Some ongoing issues have not yet been resolved. If you fancy helping out, email the Moses developers.
Feature | Finished coding | Tested | Documented | Developer | First/Main user | Notes |
---|---|---|---|---|---|---|
Link to tcmalloc for fast C++ execution | YES | YES | YES | Ken | Ken Heafield, everyone | |
Include word alignment by default during training and decoding | YES | Nearly | No | Barry Haddow, Hieu Hoang | Barry Haddow, Eva Hasler, other developers | Phrase-based phrase tables are all OK. Checking chart decoding. There may also be multi-threading issues |
Integrating Philipp Koehn's TM-MT into Moses | NO | NO | YES | Hieu Hoang | Philipp Koehn, everyone | Added multi-threading. TODO - Add switches for different arguments. Switch to Abby Levenberg's dynamic-SA implementation |
Integrating Marcin's compressed phrase table into EMS. Regression test added | YES | YES | YES | Barry Haddow, Marcin, Hieu Hoang | Everyone | |
Testing cygwin build | Ongoing | - | - | Hieu Hoang | - | Currently 1 of the KenLM unit tests fails. VM on thor died after server died. Set up ssh on Windows |
Simplify feature function framework. Merge all [weights-*] sections in moses.ini file | Branch | - | - | Hieu Hoang | - | Redo after MIRA merge |
Lattice decoding in chart decoding | Not started | - | - | Hieu Hoang | - | What about training? |
Sparse feature | YES | YES | YES | Eva Hasler, Barry Haddow | - | Need a consistent way to turn features on/off |
Placeholder | NO | NO | NO | Hieu Hoang, anyone else interested | especially commercial users | |
Preserving formatting | NO | NO | NO | Hieu Hoang, anyone else interested | especially commercial users | |