Moses statistical machine translation system

Getting Started with Moses

This section will show you how to install and build Moses, and how to use Moses to translate with some simple models. If you experience problems, then please check the support page. If you do not want to build Moses from source, then there are packages available for Windows and popular Linux distributions.

Download Moses

It's best to download the source code via git from GitHub.

Basic Setup

To compile Moses, you need the following installed on your machine:

   g++
   Boost

It's best to install these using the software manager of your operating system.

Once these are installed, run bjam.

These are the exact commands I (Hieu) use to download and install Moses:

    git clone https://github.com/moses-smt/mosesdecoder.git
    cd mosesdecoder/
    ./bjam -j8

To see what options are available with bjam, run:

  ./bjam --help
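Before starting the build, it can help to confirm that the command-line prerequisites are on the PATH. The following is a minimal sketch, not part of Moses: the function name missing_tools is hypothetical, and it can only check for executables such as g++ and git. Boost is a library, so it cannot be detected this way.

```shell
#!/bin/sh
# Report which of the given commands are missing from PATH.
# Prints one line per missing command and returns non-zero if any is missing.
# (missing_tools is a hypothetical helper, not a Moses script.)
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    return "$rc"
}

# Example (not run here): check the compiler and git, then build.
# missing_tools g++ git && ./bjam -j8
```

If the function prints nothing and returns success, both tools were found and the build can be started.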

Manually installing Boost

Boost 1.48 has a serious bug which breaks Moses compilation. Unfortunately, some Linux distributions (e.g. Ubuntu 12.04) ship broken versions of the Boost library. In these cases, you must download and compile Boost yourself. The instructions to do this are in the Moses file:

   BUILD-INSTRUCTIONS.txt 

These are the exact commands I use to compile Boost:

   wget -O boost_1_55_0.tar.gz "http://downloads.sourceforge.net/project/boost/boost/1.55.0/boost_1_55_0.tar.gz?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Fboost%2Ffiles%2Fboost%2F1.55.0%2F&ts=1389613041&use_mirror=kent"
   tar zxvf boost_1_55_0.tar.gz
   cd boost_1_55_0/
   ./bootstrap.sh
   ./b2 -j8 --prefix=$PWD --libdir=$PWD/lib64 --layout=system link=static install || echo FAILURE

This creates the library files in the directory lib64, NOT in the system directory.

Once Boost is installed, you can then compile Moses. However, you must tell Moses where Boost is with the --with-boost flag. This is the exact command I use to compile Moses:

   ./bjam --with-boost=$HOME/workspace/temp/boost_1_55_0 -j8
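The bjam options list later on this page says that the directory given to --with-boost is expected to contain include and lib or lib64. A quick check of that layout before compiling might look like the sketch below. The function name check_boost_prefix is hypothetical and the path in the usage comment is only the one from the example above.

```shell
#!/bin/sh
# Return success if DIR looks like a Boost prefix usable with --with-boost:
# it must contain include/ and either lib/ or lib64/.
# (check_boost_prefix is a hypothetical helper, not part of Moses or Boost.)
check_boost_prefix() {
    dir=$1
    [ -d "$dir/include" ] || { echo "no include/ in $dir"; return 1; }
    if [ -d "$dir/lib" ] || [ -d "$dir/lib64" ]; then
        return 0
    fi
    echo "no lib/ or lib64/ in $dir"
    return 1
}

# Example (not run here):
# BOOST="$HOME/workspace/temp/boost_1_55_0"
# check_boost_prefix "$BOOST" && ./bjam --with-boost="$BOOST" -j8
```

Writing the path with $HOME rather than ~ avoids relying on the shell to expand a tilde in the middle of an argument.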

Other software to install

Word Alignment

Moses requires a word alignment tool, such as GIZA++, MGIZA, or Fast Align.

I (Hieu) use MGIZA because it is multi-threaded and generally gives good results; however, I've also heard good things about Fast Align. You can find instructions to compile them here.

Language Model Creation

Moses includes the KenLM language model creation program, lmplz.

However, you can also create language models with IRSTLM and SRILM.

I (Hieu) regularly use IRSTLM to create language models because it

   1. is free (LGPL)
   2. runs on multiple cores
   3. can prune the LM.

Please read this if you want to compile IRSTLM.

If you just want to use IRSTLM to create language models, you don't have to compile it into Moses. You just need to call the IRSTLM scripts that create the LM.

However, if you want to use IRSTLM to read the language model and assign scores to n-grams, then IRSTLM needs to be linked with Moses.

Once IRSTLM is successfully compiled, use the --with-irstlm switch to compile Moses with IRSTLM. This is the exact command I use:

   ./bjam --with-irstlm=/home/s0565741/workspace/temp/irstlm-5.80.03 -j8

Personally, I only use IRSTLM as a query tool in this way if the LM n-gram order is over 7. In most situations, I use KenLM because KenLM is multi-threaded and faster.

Platforms

The primary development platform for Moses is Linux, and this is the recommended platform since you will find it easier to get support for it. However, Moses does work on other platforms:

Windows Installation

Moses can run on Windows under Cygwin. Installation is exactly the same as for Linux and Mac. (Are you running it on Windows? If so, please give us feedback on how it works.) Cygwin is available as a 32-bit or a 64-bit application. Make sure you select the one that's appropriate for your machine. Remember that the 32-bit version limits the size of the models Moses can use to a maximum of 4GB.

Install the following packages via Cygwin:

   boost
   automake
   libtool
   gcc-g++
   python
   git
   subversion
   openssh
   make
   tcl
   zlib0
   zlib-devel
   libbz2-devel
   unzip
   libexpat-devel
   libcrypt-devel

Also, the nist-bleu script needs a Perl module called XML::Twig. Install this in Cygwin with the commands

   cpan
   cpan XML::Twig

OSX Installation

Mac OSX is widely used by Moses developers and everything should run fine. Installation is the same as for Linux.

Out of the box, Mac OSX lacks many programs that are critical to Moses, or ships different versions of standard GNU programs. For example, split, sort, and zcat are incompatible BSD versions rather than GNU versions.

Therefore, Moses has been tested on Mac OSX with MacPorts. Make sure you have this installed on your machine.

Linux Installation

Debian

Install the following packages using the command

   su
   apt-get install [package name]

Packages:

   git
   subversion
   make
   libtool
   gcc
   g++
   libboost-dev
   tcl-dev
   tk-dev
   zlib1g-dev
   libbz2-dev
   python-dev

Ubuntu

Install the following packages using the command

   sudo apt-get install [package name]

Packages:

   g++
   git
   subversion
   automake
   libtool
   zlib1g-dev
   libboost-all-dev
   libbz2-dev
   liblzma-dev
   python-dev
   libtcmalloc-minimal0 (libtcmalloc-minimal4 on Ubuntu 14.04)
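apt-get install accepts several package names at once, so the list above can be installed with one command. The sketch below only prints that command so it can be reviewed before running it. The tcmalloc package is left out because its name differs between Ubuntu releases, and the variable and function names are hypothetical.

```shell
#!/bin/sh
# Package names copied from the Ubuntu list above (tcmalloc left out because
# its package name differs between Ubuntu releases).
PACKAGES="g++ git subversion automake libtool zlib1g-dev libboost-all-dev libbz2-dev liblzma-dev python-dev"

# Print the install command instead of running it, so it can be reviewed first.
# (print_install_command is a hypothetical helper.)
print_install_command() {
    echo "sudo apt-get install $PACKAGES"
}

print_install_command
```

Copying the printed line into a terminal runs the actual installation.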

Fedora

Install the following packages using the command

   su
   yum install [package name]

Packages:

   git
   subversion
   automake
   libtool
   gcc-c++
   zlib-devel
   python-devel
   bzip2-devel

Run Moses for the first time

Download the sample models and extract them into your working directory:

 cd ~/mosesdecoder
 wget http://www.statmt.org/moses/download/sample-models.tgz
 tar xzf sample-models.tgz
 cd sample-models

Run the decoder

 cd ~/mosesdecoder/sample-models
 ~/mosesdecoder/bin/moses -f phrase-model/moses.ini < phrase-model/in > out

If everything worked out right, this should translate the sentence "das ist ein kleines haus" (in the file in) as "this is a small house" (in the file out).
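To confirm the result without opening the file by hand, the first line of out can be compared with the expected translation. This is a minimal sketch: check_translation is a hypothetical helper, and the whitespace trimming is there on the assumption that the decoder may pad the output line with spaces.

```shell
#!/bin/sh
# Compare the first line of an output file with the expected translation.
# (check_translation is a hypothetical helper, not a Moses script.)
check_translation() {
    file=$1
    expected=$2
    # Read the first line and strip leading and trailing whitespace
    # (assumption: the decoder may pad the line with spaces).
    actual=$(head -n 1 "$file" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
    if [ "$actual" = "$expected" ]; then
        echo "OK"
    else
        echo "MISMATCH: got '$actual'"
        return 1
    fi
}

# Example (not run here), from the sample-models directory:
# check_translation out "this is a small house"
```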

Note that the configuration file moses.ini in each directory is set to use the KenLM language model toolkit by default. If you prefer to use IRSTLM, then edit the language model entry in moses.ini, replacing KENLM with IRSTLM. You will also have to compile with ./bjam --with-irstlm, adding the full path of your IRSTLM installation.
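The edit to moses.ini can also be scripted. The sketch below assumes the language model entry is a line that begins with the word KENLM followed by whitespace; check your own moses.ini before relying on it. It writes a modified copy and leaves the original untouched. The function name and the output file name are hypothetical.

```shell
#!/bin/sh
# Write a copy of a moses.ini with the language model entry switched
# from KENLM to IRSTLM. Only lines that start with "KENLM" followed by
# whitespace are changed (assumption about the file layout).
# (switch_lm_to_irstlm is a hypothetical helper, not a Moses script.)
switch_lm_to_irstlm() {
    sed 's/^KENLM\([[:space:]]\)/IRSTLM\1/' "$1" > "$2"
}

# Example (not run here):
# switch_lm_to_irstlm phrase-model/moses.ini phrase-model/moses.irstlm.ini
```

The decoder can then be pointed at the copy with -f, provided Moses was compiled with --with-irstlm.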

Moses also supports SRILM and RandLM language models. See here for more details.

Chart Decoder

The chart decoder is created as a separate executable:

 ~/mosesdecoder/bin/moses_chart

You can run the chart demos from the sample-models directory as follows:

 ~/mosesdecoder/bin/moses_chart -f string-to-tree/moses.ini < string-to-tree/in > out.stt
 ~/mosesdecoder/bin/moses_chart -f tree-to-tree/moses.ini < tree-to-tree/in.xml > out.ttt

The expected result of the string-to-tree demo is

 this is a small house

Next Steps

Why not try to build a Baseline translation system with freely available data?

bjam options

This is a list of options to bjam. On a system with Boost installed in a standard path, none should be required, but you may want additional functionality or control.

Optional packages

Language models

In addition to KenLM and ORLM (which are always compiled):

--with-irstlm=/path/to/irstlm
Path to IRSTLM installation
--with-randlm=/path/to/randlm
Path to RandLM installation
--with-srilm=/path/to/srilm
Path to SRILM installation.

If your SRILM install is non-standard, use these options:

--with-srilm-dynamic
Link against srilm.so.
--with-srilm-arch=arch
Override the arch setting given by /path/to/srilm/sbin/machine-type

Other packages

--with-boost=/path/to/boost
If Boost is in a non-standard location, specify it here. This directory is expected to contain include and lib or lib64.
--with-xmlrpc-c=/path/to/xmlrpc-c
Specify a non-standard libxmlrpc-c installation path. Used by Moses server.
--with-cmph=/path/to/cmph
Path where CMPH is installed. Used by the compact phrase table and compact lexical reordering table.
--with-tcmalloc
Use thread-caching malloc.
--with-regtest=/path/to/moses-regression-tests
Run the regression tests using data from this directory. Tests can be downloaded from https://github.com/moses-smt/moses-regression-tests.

Installation

--prefix=/path/to/prefix
sets the install prefix [default is source root].
--bindir=/path/to/prefix/bin
sets the bin directory [default is PREFIX/bin]
--libdir=/path/to/prefix/lib
sets the lib directory [default is PREFIX/lib]
--includedir=/path/to/prefix/include
installs headers. Does not install if missing. No argument defaults to PREFIX/include.
--install-scripts=/path/to/scripts
copies scripts into a directory. Does not install if missing. No argument defaults to PREFIX/scripts.
--git
appends the git revision to the prefix directory.

Build Options

By default, the build is multi-threaded, optimized, and statically linked.

threading=single|multi
controls threading (default multi)
variant=release|debug|profile
builds optimized (default), for debug, or for profiling
link=static|shared
controls preferred linking (default static)
--static
forces static linking (the default will fall back to shared)
debug-symbols=on|off
include (default) or exclude debugging information also known as -g
--notrace
compiles without TRACE macros
--enable-boost-pool
uses Boost pools for the memory SCFG table
--enable-mpi
switch on mpi (used for MIRA - one of the tuning algorithms)
--without-libsegfault
does not link with libSegFault
--max-kenlm-order
maximum ngram order that kenlm can process (default 6)
--max-factors
maximum number of factors (default 4)
--unlabelled-source
ignore source nonterminals (if you only use hierarchical or string-to-tree models without source syntax)

Controlling the Build

-q
quit on the first error
-a
to build from scratch
-j$NCPUS
to compile in parallel
--clean
to clean
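The -j$NCPUS option needs the number of CPUs on the build machine. One way to fill it in is sketched below, assuming getconf is available; if it is not, the sketch falls back to a single job. NCPUS and detect_ncpus are hypothetical names, and the bjam command is only printed here, not run.

```shell
#!/bin/sh
# Print the number of online CPUs, falling back to 1 if it cannot be determined.
# (detect_ncpus is a hypothetical helper, not part of Moses.)
detect_ncpus() {
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    # Guard against empty or non-numeric answers
    case $n in
        ''|*[!0-9]*) n=1 ;;
    esac
    echo "$n"
}

NCPUS=$(detect_ncpus)
# Show the build command that stops at the first error and uses all CPUs
echo "./bjam -q -j$NCPUS"
```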

Building with Eclipse

There is a YouTube video showing you how to set up Moses with Eclipse.

   How to compile Moses with Eclipse

Moses comes with Eclipse project files for some of the C++ executables. Currently, there are project files for

   * moses-cmd (decoder)
   * moses-chart-cmd (decoder)
   * extract
   * extract-rules
   * extract-ghkm

The Eclipse build is used primarily for development and debugging. It is not optimized and doesn't have many of the options available in the bjam build.

The advantage of using Eclipse is that it offers code-completion, and a GUI debugging environment.

NB. The recent update of Mac OSX replaces g++ with clang. Eclipse doesn't yet fully function with clang.

Follow these instructions to build with Eclipse:

  * Use the version of Eclipse for C++. My (Hieu) current version is Eclipse Kepler.
  * Get the Moses source code
      git clone git@github.com:moses-smt/mosesdecoder.git
      cd mosesdecoder
  * Create softlinks to Boost, cmph, DALM, irstlm, randlm, srilm in the Moses root directory
      eg. ln -s ~/workspace/randlm
  * Create a new Eclipse workspace. The workspace MUST be in
        contrib/other-builds/
    Eclipse should now be running. 
  * Import all the Moses Eclipse projects into the workspace.
      File >> Import >> Existing Projects into Workspace >> Select Root Directory >> 'other-builds' >> Finished
  * Compile all projects. 
      Project >> Build All  
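The softlink step in the list above can be scripted when several of the optional packages live under one directory. This is a rough sketch: link_dependencies is a hypothetical helper, the directory names in the usage comment are assumptions about how the packages are checked out, and packages that are not present are simply skipped.

```shell
#!/bin/sh
# Create a symlink in MOSES_ROOT for each named dependency found under DEPS_DIR.
# Existing files or links in MOSES_ROOT are left alone.
# (link_dependencies is a hypothetical helper, not a Moses script.)
link_dependencies() {
    deps_dir=$1
    moses_root=$2
    shift 2
    for name in "$@"; do
        if [ ! -d "$deps_dir/$name" ]; then
            echo "skipped $name (not found in $deps_dir)"
        elif [ -e "$moses_root/$name" ] || [ -L "$moses_root/$name" ]; then
            echo "skipped $name (already present in $moses_root)"
        else
            ln -s "$deps_dir/$name" "$moses_root/$name"
            echo "linked $name"
        fi
    done
}

# Example (not run here; directory names are assumptions):
# link_dependencies ~/workspace ~/mosesdecoder boost cmph DALM irstlm randlm srilm
```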
Page last modified on September 17, 2014, at 07:06 PM