EMNLP 2011 SIXTH WORKSHOP
ON STATISTICAL MACHINE TRANSLATION

Baseline System: Joshua

July 30 - 31, 2011
Edinburgh, UK


Joshua is an open-source MT system developed at Johns Hopkins University. It uses a hierarchical phrase-based translation model. Step-by-step instructions follow. The list may look long at first glance, but it should make it straightforward to build a machine translation system and all of its components, and it should make the process of tuning, testing, and evaluating the system transparent.

These instructions are adapted from Chris Callison-Burch's Joshua guide. More instructions and documentation for Thrax, the translation model extractor, can be found on its GitHub wiki.

If you have problems running this pipeline, please email jonny at cs dot jhu dot edu and mention the WMT11 baseline in your subject line.

Installation

The Joshua system has some requirements.

Install Additional Scripts

Prepare Data

Align Parallel Corpus

We give instructions for the Berkeley aligner here; GIZA++ could also be used.
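
To make the output of this step concrete: word alignments are commonly stored one sentence pair per line as space-separated `i-j` links, where source word `i` is linked to target word `j` (zero-based). The sketch below assumes that format; this page does not state which output format your aligner is configured to write, so check a few lines of your own alignment file first. The helper names `parse_alignment` and `aligned_word_pairs` are hypothetical and not part of Joshua or either aligner.

```python
def parse_alignment(line):
    """Parse one line of 'i-j' links into a list of (source, target) index pairs.

    Assumes zero-based indices separated by a hyphen, e.g. "0-0 1-2 2-1".
    """
    links = []
    for token in line.split():
        src, tgt = token.split("-")
        links.append((int(src), int(tgt)))
    return links


def aligned_word_pairs(source, target, alignment):
    """Return the (source word, target word) pairs named by an alignment line.

    Both sentences are assumed to be tokenized already (split on whitespace).
    """
    src_words = source.split()
    tgt_words = target.split()
    return [(src_words[i], tgt_words[j]) for i, j in parse_alignment(alignment)]


if __name__ == "__main__":
    pairs = aligned_word_pairs(
        "das haus ist klein",
        "the house is small",
        "0-0 1-1 2-2 3-3",
    )
    for src_word, tgt_word in pairs:
        print(src_word, "->", tgt_word)
```

Printing a handful of aligned pairs like this is a quick sanity check that the source, target, and alignment files are still line-parallel before you move on to grammar extraction.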

Build Language Model

Train Translation Model

This example will build a Hiero-style translation model.
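
A Hiero-style model is a synchronous context-free grammar: each rule pairs a source side with a target side, and both sides may contain linked gaps that are filled recursively, which lets a single rule reorder whole sub-phrases. The sketch below illustrates the idea only. It assumes a rule written as `|||`-separated fields with gaps spelled `[X,1]`, `[X,2]`; treat that layout as an assumption and consult the Thrax documentation for the exact grammar format. The helpers `parse_rule` and `apply_rule` are hypothetical and are not Joshua or Thrax functions.

```python
import re

# A gap (nonterminal) token such as "[X,1]"; the number links source and target sides.
GAP = re.compile(r"^\[X,(\d+)\]$")


def parse_rule(line):
    """Split one rule line into left-hand side, source, target, and feature values."""
    fields = [field.strip() for field in line.split("|||")]
    features = [float(x) for x in fields[3].split()] if len(fields) > 3 else []
    return {
        "lhs": fields[0],
        "source": fields[1].split(),
        "target": fields[2].split(),
        "features": features,
    }


def apply_rule(rule, fillers):
    """Build the target string, replacing each gap with its translated filler.

    `fillers` maps a gap index (1, 2, ...) to an already translated sub-phrase.
    """
    output = []
    for token in rule["target"]:
        match = GAP.match(token)
        output.append(fillers[int(match.group(1))] if match else token)
    return " ".join(output)


if __name__ == "__main__":
    # Illustrative rule: the two gaps swap places on the target side.
    rule = parse_rule("[X] ||| [X,1] de [X,2] ||| [X,2] of [X,1] ||| 0.5 1.2")
    print(apply_rule(rule, {1: "Australia", 2: "the capital"}))
```

The rule and its feature values here are made up for illustration; a real extracted grammar contains many such rules, each with its own feature scores.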

Tuning (i.e., Optimize System Component Weights, a.k.a. Minimum Error Rate Training)

Run System on Development Test Set

Evaluation

supported by the EuroMatrixPlus project
P7-IST-231720-STP
funded by the European Commission
under Framework Programme 7