Moses statistical machine translation system

Frequently Asked Questions


My system is taking a really long time to translate a sentence. What can I do to speed it up?

The single best thing you can do is to binarize the phrase tables and language models. See also the next question.

The system runs out of memory during decoding.

Filter and binarize your phrase tables. Binarize your language models using IRSTLM. Binarize your lexicalized reordering table.

Binarizing the phrase table helps decrease memory usage because only the phrase pairs needed for each sentence are read from file into memory. The same applies to language models and lexicalized reordering models.

This webpage tells you how to binarize the models.
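To illustrate why on-demand loading saves memory, here is a toy Python sketch. It assumes a plain-text phrase table whose lines look like source ||| target ||| scores, with all entries for the same source phrase next to each other (e.g. a sorted file). It is not the actual Moses binary format; the function names build_offset_index and load_for_sentence and the maximum phrase length of 3 are hypothetical. Only a small index of byte offsets stays in memory, and for each sentence just the entries whose source side matches one of its n-grams are read from disk.

```python
def build_offset_index(path):
    """Scan a text phrase table once, remembering where each source phrase's
    entries live on disk as (start offset, byte length).

    Assumes entries sharing a source phrase are contiguous and that lines
    look like 'source ||| target ||| scores'.
    """
    index = {}
    offset = 0
    with open(path, "rb") as f:
        for line in f:
            source = line.split(b" ||| ", 1)[0].decode("utf-8")
            start, length = index.get(source, (offset, 0))
            index[source] = (start, length + len(line))
            offset += len(line)
    return index


def load_for_sentence(path, index, sentence, max_phrase_len=3):
    """Read from disk only the entries whose source side is an n-gram of the sentence."""
    words = sentence.split()
    wanted = {
        " ".join(words[i:j])
        for i in range(len(words))
        for j in range(i + 1, min(i + max_phrase_len, len(words)) + 1)
    }
    entries = []
    with open(path, "rb") as f:
        # Seek straight to each needed block instead of loading the whole table.
        for source in sorted(wanted.intersection(index)):
            start, length = index[source]
            f.seek(start)
            entries.extend(f.read(length).decode("utf-8").splitlines())
    return entries
```

For the sentence "das haus", only the entries for "das", "das haus" and "haus" are read; an entry such as "klein" elsewhere in the table never enters memory.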

I would like to point out a bug / contribute code.

We are always grateful for bug reports and code contributions. Send them to an existing Moses developer you work with, or to Hieu Hoang at Edinburgh University.

If you want to check in code yourself, create a GitHub account.

Then ask one of the project admins to add you to the Moses project. The admins are currently:

  • Barry Haddow
  • Hieu Hoang
  • Nicola Bertoldi
  • Ondrej Bojar
  • Kenneth Heafield

We will probably review your code a few times before giving you free rein. However, there is less oversight if you intend to work on your own branch rather than the trunk.

How can I get an updated version of Moses?

The best way is using git.

From the command line, type

 git pull

Or use whatever GUI client you have.

What changed in the latest release of Moses?

See the Releases page.

I am an undergrad/masters student looking for a project in SMT. What should I do?

Email the mailing list with the title: 'Code monkey available. Will work for peanuts'! Seriously, there are lots and lots of projects available. In the past, 3-4 month projects have made a significant contribution to the community and have been integrated into the Moses toolkit. Your contribution will be gratefully appreciated. Talk to your professor in the first instance, then talk to us. We maintain a list of interesting projects.

What do the 5 numbers in the phrase table mean?

See the section on phrase scoring.
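As a rough guide, the Python sketch below splits one phrase-table line and labels its scores. It assumes the classic Moses layout, in which the five scores are, in order, the inverse phrase translation probability φ(f|e), the inverse lexical weighting lex(f|e), the direct phrase translation probability φ(e|f), the direct lexical weighting lex(e|f), and a constant phrase penalty of exp(1) = 2.718. Newer releases drop the phrase penalty and write only four scores, so the sketch accepts either. The function name parse_phrase_table_line and the example numbers are made up; the phrase scoring section remains the authoritative description.

```python
# Assumed order of the scores in a classic Moses phrase table. Tables that
# hold four scores drop the constant phrase penalty (the last entry here).
SCORE_NAMES = [
    "inverse phrase translation probability phi(f|e)",
    "inverse lexical weighting lex(f|e)",
    "direct phrase translation probability phi(e|f)",
    "direct lexical weighting lex(e|f)",
    "phrase penalty",  # always exp(1) = 2.718 in the 5-score layout
]


def parse_phrase_table_line(line):
    """Split 'source ||| target ||| scores [||| ...]' and label the scores.

    Any fields after the scores (e.g. word alignments or counts) are ignored.
    """
    fields = [field.strip() for field in line.split("|||")]
    if len(fields) < 3:
        raise ValueError("expected at least 'source ||| target ||| scores'")
    source, target = fields[0], fields[1]
    values = [float(x) for x in fields[2].split()]
    if len(values) not in (4, 5):
        raise ValueError(f"expected 4 or 5 scores, got {len(values)}")
    # zip stops at the shorter list, so a 4-score line gets the first 4 names.
    return source, target, dict(zip(SCORE_NAMES, values))


if __name__ == "__main__":
    # Made-up example line; real values come from your own phrase table.
    src, tgt, scores = parse_phrase_table_line(
        "das Haus ||| the house ||| 0.8 0.5 0.7 0.4 2.718")
    for name, value in scores.items():
        print(f"{name}: {value}")
```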

What OS does Moses run on?

It depends on which part.

The decoder can be compiled and run on Linux (32- and 64-bit), Windows, Cygwin, and Mac OS X (Intel and PowerPC). There are also unconfirmed reports of the decoder running on Solaris and BSD.

The training and tuning scripts are regularly run on Linux (32- and 64-bit), and occasionally on Mac (Intel). The whole Moses pipeline should also run on Windows under Cygwin; however, this has not been confirmed. If you are able to run it under Windows/Cygwin, please let us know and we will update this FAQ.

When running on non-Linux platforms, beware of the following issues:

  • File system case-sensitivity
  • zcat, gzip command line programs missing
  • Old GIZA++ versions only compilable by specific gcc versions
  • Availability of Sun Grid Engine

Therefore, the only realistic platforms for running the whole SMT pipeline are Linux and Intel Macs.

Can I use Moses on Windows?

Yes. Moses compiles and runs in Cygwin exactly the same way as on Linux.

There is one proviso, though:

Cygwin is 32-bit, even on 64-bit Windows. The binary language models (KenLM, IRSTLM) need a 64-bit build to work with language models larger than about 2GB. This is the same as for 32-bit Linux.

Do I need a computer cluster to run experiments?

The Moses toolkit can use an SGE (Sun Grid Engine) cluster to parallelize tasks. A cluster is not strictly necessary for running experiments, but it is highly advisable if you want them to run faster.

The most CPU-intensive task is the tuning of the weights (MERT tuning). As an indication, a Europarl-trained model, using 2000 sentences for tuning, takes 1-2 days to tune using 15 CPUs. Typically, 10-15 iterations are needed.

I have compiled Moses, but it segfaults when running.

Moses should not segfault, so the Moses developers would like to hear about it.

First of all, try to identify the fault yourself. The most common causes are an incorrect ini file or badly formatted input sentences.

If necessary, you can debug the system by stepping through the source code. We put a lot of effort into making the code easy to read and debug. The decoder also comes with Visual Studio and Xcode project files to help you debug in a GUI environment.

If you still cannot find the solution, email the mailing list. It is useful to attach the ini file, the output just before the crash, and any other information you think may help resolve the problem.

How do I add a new feature function to the decoder?

This is now documented in its own section.

Compiling with SRILM or IRSTLM produces errors.

Firstly, make sure SRILM/IRSTLM themselves have compiled successfully. You should see libflm.a, libdstruct.a, etc. (for SRILM), or libirstlm.a (for IRSTLM). If these are not available, then something went wrong. SRILM and IRSTLM are external libraries, so the Moses developers have limited say over them and limited knowledge of them.

SRILM and IRSTLM each have their own mailing list where you can ask questions if you have problems compiling them.

If Moses still does not compile successfully, look at the compile error to see where the compiler is trying to find these external libraries. Occasionally (especially when compiling on 64-bit machines), Moses expects the .a files in one sub-directory but they are in another. This is easily solved by copying the .a files to where Moses expects them to be.

I am trying to use Moses to create a web page to do translation.

There is a subproject in Moses, in contrib/web, which allows you to set up a web page that translates other web pages. It is written in Perl and the installation is non-trivial, so follow the instructions carefully.

It does not translate ad-hoc sentences. If you have some code that allows translation of ad-hoc sentences, please share it with us!

How can I create a system that translates both ways, i.e. X-to-Y as well as Y-to-X?

You need to do everything twice and run two decoders. There is a lot of overlap between the two systems, but the toolkit is designed to translate in one direction at a time.

PhraseScore dies with signal 11 - why?

This may happen because you have a null byte in your data. Look at line 2 of model/lex.f2e.

Try this to find lines with null bytes in your original data:

  grep -Pc '[\000]' <files ...>

(If your grep does not support Perl-style regular expression syntax (-P), you will have to express that a different way.)
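If your grep lacks -P, one alternative is a short Python 3 script, sketched here under the assumption that Python 3 is installed; the function name count_lines_with_null_bytes and the script name count_nul.py used below are hypothetical. Like grep -c with several files, it prints each file name followed by the number of lines containing at least one null byte.

```python
import sys


def count_lines_with_null_bytes(path):
    r"""Count lines containing a NUL (0x00) byte, like grep -Pc '[\000]' path."""
    count = 0
    with open(path, "rb") as f:  # binary mode, so NUL bytes are kept as-is
        for line in f:
            if b"\x00" in line:
                count += 1
    return count


if __name__ == "__main__":
    # Print "file:count" for every file named on the command line.
    for path in sys.argv[1:]:
        print(f"{path}:{count_lines_with_null_bytes(path)}")
```

Saved as count_nul.py, it would be run as python3 count_nul.py followed by your corpus files; any file reporting a count above 0 contains null bytes.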

If this turns out to be the problem, and you don't want to run GIZA++ again from scratch, you can try the following:

First, go into working-dir/model and delete everything except the following:

  aligned.grow-diag-final-and
  aligned.0.fr
  aligned.0.en
  lex.0-0.n2f
  lex.0-0.f2n

Now run this fragment of Perl:

  perl -i.BAD -pe 's/[\000]/NULLBYTE/g;' aligned.0* lex.0*

This will replace every null byte in those four files, saving the old versions to *.BAD. (This may be overkill, for instance if only the foreign side has the problem.)

Now restart the Moses training script with the same invocation as before, but tell it to start at step 5:

  train-model.perl ... --first-step 5

Does Moses do Hierarchical decoding, like Hiero etc?

Yes. Check the Syntax Tutorial.

Can I use Moses in proprietary software?

Moses is licensed under the LGPL. See here for a thorough explanation of what this means.

Basically, if you are just using Moses unchanged, there are no license issues. You can also use the Moses library (libmoses.a) in your own applications. But if you want to distribute a modified version of Moses, you have to distribute the source code to the modifications.

GIZA++ crashes with error "parameter 'coocurrencefile' does not exist."

You have a version of GIZA++ which does not support cooccurrence files. To add support for cooccurrence files, you need to edit the GIZA++ Makefile and add the flag -DBINARY_SEARCH_FOR_TTABLE to CFLAGS_OPT. Then you should rebuild GIZA++.

Running regenerate-makefiles.sh gives me lots of errors about *GREP and *SED macros

You should not be running this script. Moses moved from autotools to bjam in Autumn 2011.

When running training, I got the following error: "*** buffer overflow detected ***: ../giza-pp/GIZA++-v2/GIZA++ terminated"

This error occurs during the word alignment step and is related to GIZA++, not directly to the Moses toolkit. Nevertheless, the solution is described here.

I retrained my model and got different BLEU scores. Why?

In general, machine translation training is non-convex. This means that there are multiple local optima, and each time you run a full training job you will get different results. In particular, you will see different results when running GIZA++ (any flavour) and MERT.

The best (and most expensive) way to deal with this is to run the full pipeline from scratch multiple times. This will give you a feel for the variance, i.e. how much the results differ between runs. In general, variance arising from GIZA++ is less damaging than variance from MERT.
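Once you have BLEU scores from several independent runs, a simple way to summarize the spread is their mean and sample standard deviation. The Python sketch below does just that; summarize_bleu is a hypothetical helper, and the scores in it are invented for illustration.

```python
import statistics


def summarize_bleu(scores):
    """Return the mean and sample standard deviation of BLEU over several runs."""
    if len(scores) < 2:
        raise ValueError("need BLEU scores from at least two runs")
    return statistics.mean(scores), statistics.stdev(scores)


if __name__ == "__main__":
    # Invented scores from three hypothetical reruns of the full pipeline.
    mean, sd = summarize_bleu([25.1, 24.6, 25.4])
    print(f"BLEU {mean:.2f} +/- {sd:.2f}")
```

For these invented scores it prints BLEU 25.03 +/- 0.40. If two systems differ by no more than this kind of spread, the difference may just be noise.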

To reduce variance, it is best to use as much data as possible at each stage. Better machine learning methods can reduce this variability further, but in general some of it will always remain.

Another strategy is to fix everything once you have a set of good weights and never rerun MERT. Should you then need to change, say, the language model, you would alter the associated weight manually. This gives stability, but at the obvious cost of generality. It is also ugly.

See Clark et al. for a discussion of some of these issues.

I specified ranges for MERT weights, but it returned weights that are outside those ranges

The ranges that you pass to mert-moses.pl (using the --range argument) are only used for the random restarts, so they serve to guide MERT rather than restrict it.
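The toy Python sketch below illustrates this behaviour. It is not mert-moses.pl: real MERT optimizes BLEU over n-best lists with a specialized line search, whereas here a simple coordinate ascent on a made-up objective stands in for it, and the names tune_with_restarts and coordinate_search are hypothetical. The ranges only decide where each random restart begins; the search itself never clips the weights, so the best weights can end up outside the ranges.

```python
import random


def coordinate_search(objective, start, step=1.0, min_step=1e-4):
    """Greedy coordinate ascent. Note that it never clips weights to any range."""
    w = list(start)
    best = objective(w)
    while step >= min_step:
        improved = False
        for i in range(len(w)):
            for delta in (step, -step):
                cand = list(w)
                cand[i] += delta
                score = objective(cand)
                if score > best:
                    w, best, improved = cand, score, True
        if not improved:
            step /= 2  # refine the step once no single move helps
    return w, best


def tune_with_restarts(objective, ranges, restarts=5, seed=0):
    """Ranges only decide where random restarts begin, not where search may go."""
    rng = random.Random(seed)
    best_w, best_score = None, float("-inf")
    for _ in range(restarts):
        start = [rng.uniform(lo, hi) for lo, hi in ranges]
        w, score = coordinate_search(objective, start)
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score
```

With an objective that peaks at weights (3, -2) and ranges (0, 1) and (-1, 1), the returned weights come out very close to (3, -2), well outside the ranges the restarts were drawn from.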

Who do I ask if my question has not been answered by this FAQ?

Search the mailing list archive. If you still do not find the answer, send your question to the 'moses-support' mailing list. Note that you have to subscribe before you can post to it.

Page last modified on March 24, 2020, at 05:00 PM