Moses: statistical machine translation system

Releases

Release 2.1.1 (3rd March, 2014)

This is a minor patch for a bug that prevented Moses from linking with tcmalloc when it is available on the compilation machine. Using tcmalloc can substantially speed up decoding, at the cost of higher memory usage.

Release 2.1 (21st Jan, 2014)

This is the current stable release.

Overview

The broad aim of this release is to tackle more complicated issues in order to improve the extensibility and reliability of the toolkit.

Specifically, the decoder has been refactored to create a more modular framework that makes it easier to incorporate new feature functions into Moses. This has necessitated major changes in many other parts of the toolkit, including training and tuning.

As well as the refactored code, this release also incorporates a host of new features donated by other developers. Transliteration modules, better error handling, small and fast language models, and placeholders are just some of the new features that spring to mind.

We have also continued to expand the testing regime to maintain the reliability of the toolkit, while enabling more developers to contribute to the project.

We distribute Moses as:

  1. source code,
  2. binaries for Windows (32- and 64-bit), Mac OSX (Mavericks), and various flavours of Linux (32- and 64-bit),
  3. a Linux virtual machine with Moses pre-installed, using the open-source VirtualBox application,
  4. an Amazon cloud server image.

Release 1.0 (28th Jan, 2013)

Overview

The Moses community has grown tremendously over the last few years. From the beginning as a purely research-driven project, we are now a diverse community of academic and business users, ranging in experience from hardened developers to new users.

Therefore, the first priority of this release has been to concentrate on resolving long-standing, but straightforward, issues to make the toolkit easier to use and more efficient. The provision of a full-time development team devoted to the maintenance and enhancement of the Moses toolkit has allowed us to tackle many useful engineering problems.

A second priority was to put in place a multi-tiered testing regime to enable more developers to contribute to the project, more quickly, while ensuring the reliability of the toolkit. However, we have not stopped adding new features to the toolkit; the next section lists a number of major features added in the last 9 months.

New Features

The following is a list of the major new features in the Moses toolkit since May 2012, in roughly chronological order.

Parallel Training by Hieu Hoang and Rohit Gupta.

The training process has been improved and can now take advantage of multi-core machines. Parallelization was achieved by partitioning the input data, then running the translation rule extraction processes in parallel before merging the data. The following is the timing for the extract process on different numbers of cores:

  Cores               One     Two     Three   Four
  Time taken (mins)   48:55   33:13   27:52   25:35

The training processes have also been redesigned to decrease disk access and to use less disk space. This is important for parallel processing, as disk IO often becomes the limiting factor with a large number of simultaneous disk accesses. It is also important when training syntactically inspired models or using large amounts of training data, which can result in very large translation models.
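The partition-extract-merge scheme described above can be sketched as follows. This is a minimal illustration; the `extract_rules` function below is a hypothetical stand-in for Moses's actual rule-extraction step, and thread-based workers stand in for the separate processes used in the real pipeline:

```python
from concurrent.futures import ThreadPoolExecutor

def extract_rules(sentence_pair):
    # Stand-in for Moses's real extract step: emit one "rule"
    # per position-aligned word pair (purely illustrative).
    src, tgt = sentence_pair
    return [(s, t) for s, t in zip(src.split(), tgt.split())]

def parallel_extract(corpus, workers=4):
    # Partition the corpus across workers, extract in parallel,
    # then merge the per-partition results back in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(extract_rules, corpus))
    return [rule for part in parts for rule in part]

corpus = [("das haus", "the house"), ("ein buch", "a book")]
print(parallel_extract(corpus, workers=2))
```

Because `map` preserves input order, the merged output is deterministic regardless of how many workers run.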

IRST LM training integration by Hieu Hoang and Philipp Koehn

The IRST toolkit for training language models has been integrated into the Experiment Management System; this functionality was previously provided by the SRILM software. Substituting IRSTLM for SRILM means that the entire training pipeline can be run using only free, open-source software. Not only is the IRST toolkit unencumbered by a proprietary license, it is also parallelizable and capable of training on larger amounts of data than was otherwise possible with SRILM.

Distributed Language Model by Oliver Wilson.

Language models can be distributed across many machines, allowing more data to be used at the cost of a performance overhead. This is still experimental code.

Incremental Search Algorithm by Kenneth Heafield.

A replacement for the cube pruning algorithm in CKY++ decoding, used in hierarchical and syntax models. It offers a better trade-off between decoding speed and translation quality.

Compressed Phrase-Table and Reordering-Tables by Marcin Junczys-Dowmunt.

A phrase-table and lexicalized reordering-table implementation that is both small and fast.

Sparse features by Eva Hasler, Barry Haddow, Philipp Koehn

A framework to allow a large number of sparse features in the decoder. A number of sparse feature functions described in the literature have been reproduced in Moses. Currently, the available sparse feature functions are:

  1. TargetBigramFeature
  2. TargetNgramFeature
  3. SourceWordDeletionFeature
  4. SparsePhraseDictionaryFeature
  5. GlobalLexicalModelUnlimited
  6. PhraseBoundaryState
  7. PhraseLengthFeature
  8. PhrasePairFeature
  9. TargetWordInsertionFeature

Suffix array for hierarchical models by Hieu Hoang

Training syntactically-inspired hierarchical models requires a large amount of time and resources. An alternative to training a full translation model is to extract only the translation rules required for each input sentence.

We have integrated Adam Lopez's suffix array implementation into Moses. This is a well-known and mature implementation, which is hosted and maintained by the cdec community.
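The idea behind on-demand rule lookup with a suffix array can be shown with a toy example. This is a simplified sketch of the data structure, not Adam Lopez's implementation: build a sorted array of corpus suffixes once, then locate every occurrence of an input phrase by binary search instead of scanning the corpus:

```python
import bisect

def build_suffix_array(tokens):
    # All suffixes of the corpus, sorted lexicographically;
    # only the starting indices are stored.
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])

def find_occurrences(tokens, sa, phrase):
    # Truncate each suffix to the phrase length; since the full
    # suffixes are sorted, these prefixes are sorted too, so we
    # can binary-search for the block that matches `phrase`.
    # (Materializing all prefixes is O(n*k); fine for a sketch.)
    suffixes = [tokens[i:i + len(phrase)] for i in sa]
    lo = bisect.bisect_left(suffixes, phrase)
    hi = bisect.bisect_right(suffixes, phrase)
    return sorted(sa[lo:hi])

corpus = "the house is big and the house is old".split()
sa = build_suffix_array(corpus)
print(find_occurrences(corpus, sa, ["the", "house"]))  # positions 0 and 5
```

In the real implementation, the matched corpus positions are then fed to the rule extractor, so only rules relevant to the current input sentence are ever built.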

Multi-threaded tokenizer by Pidong Wang

Batched MIRA by Colin Cherry.

A replacement for MERT, especially suited for tuning a large number of sparse features. (Cherry and Foster, NAACL 2012).

LR score by Lexi Birch and others.

The BLEU score commonly used in MT is insensitive to reordering errors. We have integrated into the Moses toolkit another metric, the LR score, described in (Birch and Osborne, 2011), which better accounts for reordering.
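The core of such reordering-aware metrics is a permutation distance between the source-side order of translated words and their order in the output. A minimal sketch of the Kendall's tau component, as an illustration of the idea rather than the LRscore implementation itself:

```python
from itertools import combinations

def kendall_tau_score(permutation):
    # `permutation` gives, for each source position, the position of
    # its translation in the output. The score is the fraction of
    # word pairs kept in the same relative order as a monotone
    # (no-reordering) translation; 1.0 = monotone, 0.0 = fully inverted.
    n = len(permutation)
    if n < 2:
        return 1.0
    concordant = sum(1 for i, j in combinations(range(n), 2)
                     if permutation[i] < permutation[j])
    return concordant / (n * (n - 1) / 2)

print(kendall_tau_score([0, 1, 2, 3]))  # monotone order scores 1.0
print(kendall_tau_score([3, 2, 1, 0]))  # fully inverted order scores 0.0
```

BLEU would score two outputs with identical words but different orderings of matched n-grams quite similarly; a permutation-distance term makes the metric penalize the reordering explicitly.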

Convergence of Translation Memory and Statistical Machine Translation by Philipp Koehn and Hieu Hoang

An alternative extraction algorithm (Koehn and Senellart, AMTA 2010), inspired by the use of translation memories, has been integrated into the Moses toolkit.

Word Alignment Information is turned on by default by Hieu Hoang and Barry Haddow

The word alignment produced by GIZA++/mgiza is carried through the phrase-table and made available to the decoder. This information is required by some feature functions. The use of these word alignments is now optimized for memory and speed, and enabled by default.
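In the text phrase-table format, the word alignment appears as an extra `src-tgt` index-pair field between the scores and any counts. A small parser sketch (the field layout follows the standard Moses text format; the sample entry itself is made up):

```python
def parse_phrase_table_line(line):
    # Moses text phrase-table fields are separated by " ||| ":
    # source ||| target ||| scores ||| word alignment [||| counts ...]
    fields = line.split(" ||| ")
    source, target = fields[0].split(), fields[1].split()
    scores = [float(s) for s in fields[2].split()]
    # Each alignment point is "srcIndex-tgtIndex" (0-based).
    alignment = [tuple(map(int, point.split("-")))
                 for point in fields[3].split()]
    return {"source": source, "target": target,
            "scores": scores, "alignment": alignment}

line = "das haus ||| the house ||| 0.8 0.7 0.9 0.6 ||| 0-0 1-1"
entry = parse_phrase_table_line(line)
print(entry["alignment"])  # [(0, 0), (1, 1)]
```

Feature functions that need to know which target word each source word produced read exactly this field, which is why carrying it by default matters.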

Modified Moore-Lewis filtering by Barry Haddow and Philipp Koehn

Reimplementation of the data selection method for domain adaptation of parallel corpora described by Axelrod et al. (EMNLP 2011).
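The method ranks each sentence by the cross-entropy difference between an in-domain and an out-of-domain language model, H_in(s) - H_out(s), and keeps the lowest-scoring sentences. A toy sketch with unigram models; the scoring rule is from the paper, while the tiny "models" below are invented purely for illustration:

```python
import math

def cross_entropy(sentence, unigram_probs, floor=1e-4):
    # Average negative log-probability per word under a unigram model;
    # unseen words get a small floor probability.
    words = sentence.split()
    return -sum(math.log(unigram_probs.get(w, floor)) for w in words) / len(words)

def mml_score(sentence, in_domain_lm, out_domain_lm):
    # Modified Moore-Lewis: a lower H_in - H_out means the sentence
    # looks more in-domain *relative to* the general corpus.
    return (cross_entropy(sentence, in_domain_lm)
            - cross_entropy(sentence, out_domain_lm))

in_lm = {"patent": 0.2, "claim": 0.2, "the": 0.1}   # invented toy model
out_lm = {"the": 0.1, "weather": 0.2, "claim": 0.01}
sentences = ["the patent claim", "the weather"]
ranked = sorted(sentences, key=lambda s: mml_score(s, in_lm, out_lm))
print(ranked[0])  # the most in-domain-looking sentence comes first
```

In practice the same scoring is applied to both sides of the parallel corpus with real n-gram language models, and a score threshold or top-N cutoff selects the adaptation subset.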

Lots and lots of cleanups and bug fixes

By Ales Tamchyna, Wilker Aziz, Mark Fishel, Tetsuo Kiso, Rico Sennrich, Lane Schwartz, Hiroshi Umemoto, Phil Williams, Tom Hoar, Arianna Bisazza, Jacob Dlougach, Jonathon Clark, Nadi Tomeh, Karel Bilek, Christian Buck, Oliver Wilson, Alex Fraser, Christophe Servan, Matous Machecek, Christian Federmann, Graham Neubig.

Building and Installing

The structure and installation of the Moses toolkit have been simplified to make compilation and installation easier. The training and decoding process can be run from the directory into which the toolkit was downloaded, without the need for a separate installation step.

This allows binary, ready-to-run versions of Moses to be distributed which can be downloaded and executed immediately. Previously, the installation had to be configured specifically for the user's machine.

A new build system has been implemented to build the Moses toolkit. This uses the boost library's build framework. The new system offers several advantages over the previous build system.

Firstly, the source code for the new build system is included in the Moses repository and is bootstrapped the first time Moses is compiled. It does not rely on the cmake, automake, make, or libtool applications, which have issues with cross-platform compatibility and with running on older operating systems.

Secondly, the new build system integrates the running of the unit tests and regression tests with compilation.

Thirdly, the new system is significantly more powerful, allowing us to support a number of new build features such as static and debug compilation, linking to external libraries such as mpi and tcmalloc, and other non-standard builds.

Testing

The MosesCore team has implemented several layers of testing to ensure the reliability of the toolkit. We describe each below.

Unit Testing

Unit tests exercise each function or class method in isolation. Moses implements its unit tests with the unit testing framework available from the Boost library.

The source code for the unit tests is integrated into the Moses source. The tests are executed every time the Moses source is compiled.

The unit testing framework was implemented only recently. There are currently 20 unit tests covering various features in mert, mira, phrase extraction, and decoding.

Regression Testing

The regression tests ensure that changes to the source code do not have unintended consequences for existing functionality. Regression tests are typically applied to a larger body of code than unit tests: they are designed to test specific functionality rather than a specific function. Therefore, regression tests are run against the actual Moses programs, rather than testing components in isolation.

The regression test framework forms the core of testing within the Moses toolkit. However, it was created many years ago at the beginning of the Moses project and was only designed to test the decoder. During the past 6 months, the scope of the regression test framework has been expanded to test any part of the Moses toolkit, in addition to testing the decoder. The tests are grouped into the following types:

  1. Phrase-based decoder
  2. Hierarchical/Syntax decoder
  3. Mert
  4. Rule Extract
  5. Phrase-table scoring
  6. Miscellaneous, including domain adaptation features, binarizing phrase tables, parallel rule extract, and so forth.

The number of tests has increased from 46 in May 2012 to 73 currently.

We have also overhauled the regression test framework to make it easier to add new tests. Previously, the data for the regression tests could only be updated by developers who had access to the web server at Edinburgh University. This has now been changed so that the data resides in a versioned repository on github.com.

This can be accessed and changed by any Moses developer, and is subject to the same checks and controls as the rest of the Moses source code.

Every Moses developer is obliged to ensure that the regression tests execute successfully before committing changes to the master repository.

Cruise Control

This is a daily task run on a server at the University of Edinburgh which compiles the Moses source code and executes the unit tests and regression tests. It also runs a small training pipeline to completion. The results of this testing are publicly available online.

This provides an independent check that all unit tests and regression tests passed, and that the entirety of the SMT pipeline is working. Therefore, it tests not only the Moses toolkit, but also external tools such as GIZA++ that are essential to Moses and the wider SMT community.

All failures are investigated by the MosesCore team and any remedial action is taken. This is done to enforce the testing regime and maintain reliability.

The cruise control is a subproject of Moses initiated by Ales Tamchyna with contribution by Barry Haddow.

Operating-System Compatibility

The Moses toolkit has always strived to be compatible with multiple platforms, particularly the most popular operating systems used by researchers and commercial users.

Before each release, we make sure that Moses compiles and that the unit tests and regression tests run successfully on various operating systems.

Moses, GIZA++, mgiza, and IRSTLM were compiled for:

  1. Linux 32-bit
  2. Linux 64-bit
  3. Cygwin
  4. Mac OSX 10.7 64-bit

Effort was made to make the executables runnable on as many platforms as possible. Therefore, they were statically linked when possible. Moses was then tested on the following platforms:

  1. Windows 7 (32-bit) with Cygwin 6.1
  2. Mac OSX 10.7 with MacPorts
  3. Ubuntu 12.10, 32 and 64-bit
  4. Debian 6.0, 32 and 64-bit
  5. Fedora 17, 32 and 64-bit
  6. openSUSE 12.2, 32 and 64-bit

All the binary executables are made available for download for users who do not wish to compile their own version.

GIZA++, mgiza, and IRSTLM are also available for download as binaries to enable users to run the entire SMT pipeline without having to download and compile their own software.

Issues:

  1. IRSTLM was not statically linked. The 64-bit version fails to execute on Debian 6.0; all other platforms run the downloaded executables without problems.
  2. Mac OSX does not support static linking. Therefore, it is not known whether the executables will work on platforms other than the one on which they were tested.
  3. mgiza compilation failed on Mac OSX with gcc v4.2. It could only be successfully compiled with gcc v4.5, available via MacPorts.

End-to-End Testing

Before each Moses release, a number of full scale experiments are run. This is the final test to ensure that the Moses pipeline can run from beginning to end, uninterrupted, with "real-world" datasets. The translation quality, as measured by BLEU, is also noted, to ensure that there is no decrease in performance due to any interaction between components in the pipeline.

This testing takes approximately 2 weeks to run. The following datasets and experiments are currently used for end-to-end testing:

  • Europarl es-en: phrase-based, hierarchical
  • Europarl en-es: phrase-based, hierarchical
  • Europarl cs-en: phrase-based, hierarchical
  • Europarl en-cs: phrase-based, hierarchical
  • Europarl de-en: phrase-based, hierarchical, factored German POS, factored German+English POS
  • Europarl en-de: phrase-based, hierarchical, factored German POS, factored German+English POS
  • Europarl fr-en: phrase-based, hierarchical, recased (as opposed to truecased), factored English POS
  • Europarl en-fr: phrase-based, hierarchical, recased (as opposed to truecased), factored English POS

Pre-Made Models

The end-to-end tests produce a large number of tuned models. The models, as well as all configuration and data files, are made available for download. This is useful as a template for users setting up their own experimental environment, or for those who just want the models without running the experiments.

Release 0.91 (12th October, 2012)

The code is available in a branch on github.

This version was tested on 8 Europarl language pairs, with phrase-based, hierarchical, and phrase-based factored models. All ran through without major intervention. Known issues:

  1. Hierarchical models crash during evaluation when threaded; strangely, they run OK during tuning
  2. EMS bugs when specifying multiple language models
  3. Complex factored models not tested
  4. Hierarchical models with factors do not work

Status 11th July, 2012

A roundup of the new features that have been implemented in the past year:

  1. Lexi Birch's LR score integrated into tuning. Finished coding: YES. Tested: NO. Documented: NO. Developer: Hieu, Lexi. First/Main user: Yvette Graham.
  2. Asynchronous, batched LM requests for phrase-based models. Finished coding: YES. Tested: UNKNOWN. Documented: YES. Developer: Oliver Wilson, Miles Osborne. First/Main user: Miles Osborne.
  3. Multithreaded tokenizer. Finished coding: YES. Tested: YES. Documented: NO. Developer: Pidong Wang.
  4. KB Mira. Finished coding: YES. Tested: YES. Documented: YES. Developer: Colin Cherry.
  5. Training & decoding more resilient to non-printing characters and Moses' reserved characters. Escaping the reserved characters and throwing away lines with non-printing chars. Finished coding: YES. Tested: YES. Documented: NO. Developer: Philipp Koehn and Tom Hoar.
  6. Simpler installation. Finished coding: YES. Tested: YES. Documented: YES. Developer: Hieu Hoang. First/Main user: Hieu Hoang.
  7. Factors work with chart decoding. Finished coding: YES. Tested: NO. Documented: NO. Developer: Hieu Hoang. First/Main user: Fabienne Braune.
  8. Less IO and disk space needed during training. Everything written directly to gz files. Finished coding: YES. Tested: YES. Documented: NO. Developer: Hieu. First/Main user: Hieu.
  9. Parallel training. Finished coding: YES. Tested: YES. Documented: YES. Developer: Hieu. First/Main user: Hieu
  10. Adam Lopez's suffix array integrated into Moses's training & decoding. Finished coding: YES. Tested: NO. Documented: YES. Developer: Hieu.
  11. Major MERT code cleanup. Finished coding: YES. Tested: NO. Documented: NO. Developer: Tetsuo Kiso.
  12. Wrapper for Berkeley parser (german). Finished coding: YES. Tested: UNKNOWN. Documented: UNKNOWN. Developer: Philipp Koehn.
  13. Option to use p(RHS_t|RHS_s,LHS) or p(LHS,RHS_t|RHS_s), as a grammar rule's direct translation score. Finished coding: YES. Tested: UNKNOWN. Documented: UNKNOWN. Developer: Philip Williams. First/Main user: Philip Williams.
  14. Optional PCFG scoring feature for target syntax models. Finished coding: YES. Tested: UNKNOWN. Documented: UNKNOWN. Developer: Philip Williams. First/Main user: Philip Williams.
  15. Add -snt2cooc option to use mgiza's reduced memory snt2cooc program. Finished coding: YES. Tested: YES. Documented: YES. Developer: Hieu Hoang.
  16. queryOnDiskPt program. Finished coding: YES. Tested: YES. Documented: NO. Developer: Hieu Hoang. First/Main user: Daniel Schaut.
  17. Output phrase segmentation to n-best when -report-segmentation is used. Finished coding: YES. Tested: UNKNOWN. Developer: UNKNOWN. First/Main user: Jonathon Clark.
  18. CDER and WER metric in tuning. Finished coding: UNKNOWN. Tested: UNKNOWN. Documented: UNKNOWN. Developer: Matous Machacek.
  19. Lossy Distributed Hash Table Language Model. Finished coding: UNKNOWN. Tested: UNKNOWN. Documented: UNKNOWN. Developer: Oliver Wilson.
  20. Interpolated scorer for MERT. Finished coding: YES. Tested: UNKNOWN. Documented: UNKNOWN. Developer: Matous Machacek.
  21. IRST LM training integrated into Moses. Finished coding: YES. Tested: YES. Documented: YES. Developer: Hieu Hoang.
  22. GlobalLexiconModel. Finished coding: UNKNOWN. Tested: UNKNOWN. Documented: UNKNOWN. Developer: Jiri Marsik, Christian Buck and Philipp Koehn.
  23. TM Combine (translation model combination). Finished coding: YES. Tested: YES. Documented: YES. Developer: Rico Sennrich.
  24. Alternative to CKY+ for scope-3 grammar. Reimplementation of Hopkins and Langmead (2010). Finished coding: YES. Tested: UNKNOWN. Documented: UNKNOWN. Developer: Philip Williams.
  25. Sample Java client for Moses server. Finished coding: YES. Tested: NO. Documented: NO. Developer: Marwen Azouzi. First/Main user: Mailing list users.
  26. Support for mgiza, without having to install GIZA++ as well. Finished coding: YES. Tested: YES. Documented: NO. Developer: Marwen Azouzi.
  27. Interpolated language models. Finished coding: YES. Tested: YES. Documented: YES. Developer: Philipp Koehn.
  28. Duplicate removal in MERT. Finished coding: YES. Tested: YES. Documented: NO. Developer: Thomas Schoenemann.
  29. Use bjam instead of automake to compile. Finished coding: YES. Tested: YES. Documented: YES. Developer: Ken Heafield.
  30. Recaser train script updated to support IRSTLM as well. Finished coding: YES. Tested: YES. Documented: YES. Developer: Jehan.
  31. extract-ghkm. Finished coding: UNKNOWN. Tested: UNKNOWN. Documented: UNKNOWN. Developer: Philip Williams.
  32. PRO tuning algorithm. Finished coding: YES. Tested: YES. Documented: YES. Developer: Philipp Koehn and Barry Haddow.
  33. Cruise control. Finished coding: YES. Tested: YES. Documented: YES. Developer: Ales Tamchyna.
  34. Faster SCFG rule table format. Finished coding: YES. Tested: UNKNOWN. Documented: NO. Developer: Philip Williams.
  35. LM OOV feature. Finished coding: YES. Tested: UNKNOWN. Documented: NO. Developer: Barry Haddow and Ken Heafield.
  36. TER Scorer in MERT. Finished coding: UNKNOWN. Tested: UNKNOWN. Documented: NO. Developer: Matous Machacek & Christophe Servan.
  37. Multi-threading for decoder & MERT. Finished coding: YES. Tested: YES. Documented: YES. Developer: Barry Haddow et al.
  38. Expose n-gram length as part of LM state calculation. Finished coding: YES. Tested: UNKNOWN. Documented: NO. Developer: Ken Heafield and Marc Legendre.
  39. Changes to chart decoder cube pruning: create one cube per dotted rule instead of one per translation. Finished coding: YES. Tested: YES. Documented: NO. Developer: Philip Williams.
  40. Syntactic LM. Finished coding: YES. Tested: YES. Documented: YES. Developer: Lane Schwartz.
  41. Czech detokenization. Finished coding: YES. Tested: UNKNOWN. Documented: UNKNOWN. Developer: Ondrej Bojar.
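The escaping of Moses' reserved characters (item 5 above) can be sketched as follows. The entity mapping below reflects the commonly used Moses input escaping for &, |, <, >, [ and ]; the function names are mine, and details of the real cleaning scripts may differ:

```python
# Moses treats these characters specially (factor separator "|",
# XML markup "<" and ">", etc.), so input text must escape them.
MOSES_ESCAPES = {
    "&": "&amp;",   # must be first, so later entities are not re-escaped
    "|": "&#124;",
    "<": "&lt;",
    ">": "&gt;",
    "[": "&#91;",
    "]": "&#93;",
}

def escape_moses(text):
    for char, entity in MOSES_ESCAPES.items():
        text = text.replace(char, entity)
    return text

def has_nonprinting(text):
    # Lines containing non-printing characters are dropped during cleaning.
    return any(not c.isprintable() for c in text)

print(escape_moses("a | b < c"))  # a &#124; b &lt; c
```

Applying the escapes in a fixed order, with "&" first, guarantees that the "&" inside an already-produced entity is never escaped a second time.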

Status 13th August, 2010

Changes since the last status report:

  1. change or delete character Ř to 0 in extract-rules.cpp (Raphael and Hieu Hoang)

Status 9th August, 2010

Changes since the last status report:

  1. Add option of retaining alignment information in the phrase-based phrase table. Decoder loads this information if present. (Hieu Hoang & Raphael Payen)
  2. When extracting rules, if the source or target syntax contains an unsupported escape sequence (anything other than "<", ">", "&", "&apos", and "&quot") then write a warning message and skip the sentence pair (instead of asserting).
  3. In bootstrap-hypothesis-difference-significance.pl, calculate the p-value and confidence intervals using not only BLEU but also the NIST score. (Mark Fishel)
  4. Dynamic Suffix Arrays (Abby Levenberg)
  5. Merge multi-threaded Moses into Moses (Barry Haddow)
  6. Continue partial translation (Ondrej Bojar and Ondrej Odchazel)
  7. Bug fixes, minor bits & bobs. (Philipp Koehn, Christian Hardmeier, Hieu Hoang, Barry Haddow, Philip Williams, Ondrej Bojar, Abbey, Mark Fishel, Lane Schwartz, Nicola Bertoldi, Raphael, ...)
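The significance testing in item 3 is based on bootstrap resampling: draw test sets with replacement and count how often one system's corpus-level score beats the other's. A generic sketch of paired bootstrap resampling; the Perl script's exact procedure and its BLEU/NIST scoring are more involved than this per-sentence-score illustration:

```python
import random

def paired_bootstrap(scores_a, scores_b, samples=1000, seed=0):
    # scores_a / scores_b: per-sentence scores for systems A and B on
    # the SAME test set (paired by sentence). Returns the fraction of
    # resampled test sets on which system A beats system B; a fraction
    # near 1.0 suggests A's advantage is statistically significant.
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(samples):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins += 1
    return wins / samples
```

The pairing matters: both systems are evaluated on the same resampled sentences each round, so sentence difficulty cancels out and only the systems' relative quality drives the result.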

Status 26th April, 2010

Changes since the last status report:

  1. Synchronous CFG based decoding, a la Hiero (Chiang 2005), plus syntax-based models, and all the scripts to go with them. (Thanks to Philip Williams and Hieu Hoang)
  2. Cache clearing in IRST LM (Nicola Bertoldi)
  3. Factored Language Model. (Ondrej Bojar)
  4. Fixes to lattice (Christian Hardmeier, Arianna Bisazza, Suzy Howlett)
  5. zmert (Ondrej Bojar)
  6. Suffix arrays (Abby Levenberg)
  7. Lattice MBR and consensus decoding (Barry Haddow and Abhishek Arun)
  8. Simple program that illustrates how to access a phrase table on disk from an external program (Felipe Sánchez-Martínez)
  9. Odds and sods by Raphael Payen and Sara Stymne.

Status 1st April, 2010

Changes since the last status report:

  1. Fix for Visual Studio, and potentially other compilers (thanks to Barry, Christian, Hieu)
  2. Memory leak in unique n-best fixed (thanks to Barry)
  3. Makefile fix for Moses server (thanks to Barry)

Status 26th March, 2010

Changes since the last status report:

  1. Minor bug fixes & tweaks, especially to the decoder, MERT scripts (thanks to too many people to mention)
  2. Fixes to make decoder compile with most versions of gcc, Visual studio and other compilers (thanks to Tom Hoar, Jean-Bapist Fouet).
  3. Multi-threaded decoder (thanks to Barry Haddow)
  4. Update for IRSTLM (thanks to Nicola Bertoldi and Marcello Federico)
  5. Run mert on a subset of features (thanks to Nicola Bertoldi)
  6. Training using different alignment models (thanks to Mark Fishel)
  7. "A handy script to get many translations from Google" (thanks to Ondrej Bojar)
  8. Lattice MBR (thanks to Abhishek Arun and Barry Haddow)
  9. Option to compile Moses as a dynamic library (thanks to Jean-Bapist Fouet).
  10. Hierarchical re-ordering model (thanks to Christian Hardmeier, Sara Stymne, Nadi, Marcello, Ankit Srivastava, Gabriele Antonio Musillo, Philip Williams, Barry Haddow).
  11. Global Lexical re-ordering model (thanks to Philipp Koehn)
  12. Experiment.perl scripts for automating the whole MT pipeline (thanks to Philipp Koehn)

Work in Progress

Some ongoing issues have not yet been resolved. If you fancy helping out, email the Moses developers.

  1. Link to tcmalloc for fast C++ execution. Finished coding: YES. Tested: YES. Documented: YES. Developer: Ken Heafield. First/Main user: Ken Heafield, everyone.
  2. Include word alignment on by default during training and decoding. Finished coding: YES. Tested: Nearly. Documented: NO. Developer: Barry Haddow, Hieu Hoang. First/Main user: Barry Haddow, Eva Hasler, other developers. Notes: Phrase-based phrase tables are all OK; checking chart decoding; there may also be multi-threading issues.
  3. Integrating Philipp Koehn's TM-MT into Moses. Finished coding: NO. Tested: NO. Documented: YES. Developer: Hieu Hoang. First/Main user: Philipp Koehn, everyone. Notes: Added multi-threading. TODO - add switches for different arguments; switch to Abby Levenberg's dynamic-SA implementation.
  4. Integrating Marcin's compressed phrase table into EMS; regression test added. Finished coding: YES. Tested: YES. Documented: YES. Developer: Barry Haddow, Marcin, Hieu Hoang. First/Main user: Everyone.
  5. Testing cygwin build. Finished coding: Ongoing. Developer: Hieu Hoang. Notes: Currently 1 of the kenLM unit tests fails. VM on thor died after server died; set up ssh on Windows.
  6. Simplify feature function framework; merge all [weights-*] sections in the moses.ini file. Finished coding: In branch. Developer: Hieu Hoang. Notes: Redo after MIRA merge.
  7. Lattice decoding in chart decoding. Finished coding: Not started. Developer: Hieu Hoang. Notes: What about training?
  8. Sparse features. Finished coding: YES. Tested: YES. Documented: YES. Developer: Eva Hasler, Barry Haddow. Notes: Need a consistent way to turn features on/off.
  9. Placeholders. Finished coding: NO. Tested: NO. Documented: NO. Developer: Hieu Hoang, anyone else interested. First/Main user: especially commercial users.
  10. Preserving formatting. Finished coding: NO. Tested: NO. Documented: NO. Developer: Hieu Hoang, anyone else interested. First/Main user: especially commercial users.
Page last modified on March 03, 2014, at 06:25 PM