NAACL 2006 Workshop on Statistical Machine Translation

June 8 and 9, 2006
http://www.statmt.org/wmt06/

Translating documents from foreign languages into English (or between any two languages) by computer is one of the oldest goals in computational linguistics. Now, armed with vast amounts of digitally available translated text and powerful computers, we are witnessing significant progress toward achieving that goal. Statistical methods allow the analysis of parallel text corpora and the automatic construction of machine translation systems. Already, for some language pairs such as Chinese-English or Arabic-English, statistical machine translation (SMT) systems built at research labs outperform commercial systems.

The focus of this workshop is the use of parallel corpora for machine translation. It can be seen as an attempt to repeat the success of the 2005 ACL Workshop on Parallel Text, which featured a track on statistical machine translation and a shared task on building machine translation systems.

Recent experimentation has shown that the performance of SMT systems varies greatly with the source language. In this workshop we would like to encourage researchers to investigate ways to improve the performance of SMT systems for diverse languages, including morphologically complex languages (e.g., Finnish) and languages with partially free word order (e.g., German). These issues lie on the border of linguistic analysis and statistical modeling, and the ACL conference is the most appropriate forum to investigate them, as ACL has a long tradition of hosting high-quality research in both areas. Besides experimental work and system building, we also encourage linguistic analysis of the problems facing current state-of-the-art statistical machine translation, as showcased by the shared task of the ACL 2005 Workshop on Parallel Text.

Topics of interest include, but are not limited to:

word-based, chunk-based, phrase-based, syntax-based SMT
using comparable corpora for SMT
using morphological and POS information for SMT
integration of rule-based MT and statistical MT
decoding
error analysis

SHARED TASK

In addition to submissions on the topics listed above, this track of the workshop features a shared task, and we encourage participants to evaluate their approaches on it. The shared task is to evaluate your approach to machine translation (see the list of topics of interest above) on the Europarl corpus.

A more detailed description of the shared task, the test and training corpora, a freely available MT system, and a number of other resources are available from http://www.statmt.org/wmt06/shared-task/. We also provide a baseline machine translation system, whose performance matches the best systems from last year's shared task.

SUBMISSION INFORMATION

Submissions will consist of regular full papers of max. 8 pages, formatted following the NAACL 2006 guidelines. Authors of regular full papers will be required to indicate a track for their submission. In addition, teams participating in the shared task will be invited to submit short papers (max. 4 pages) describing their systems. Both submission and review processes will be handled electronically.

IMPORTANT DATES

Regular paper submissions	March 17
Notification	April 7

(shared task) Results submissions	March 31
(shared task) Short paper submissions	April 7
(shared task) Notification	April 24

Camera-ready papers	May 2

ORGANIZERS

Philipp Koehn (University of Edinburgh)
Christof Monz (University of London)

INVITED TALK

Kevin Knight (ISI/University of Southern California)

PROGRAM COMMITTEE

Yaser Al-Onaizan (IBM)
Bill Byrne (University of Cambridge)
Chris Callison-Burch (University of Edinburgh)
Francisco Casacuberta (University of Valencia)
David Chiang (University of Maryland)
Stephen Clark (Oxford University)
Marcello Federico (ITC-IRST)
George Foster (Canada National Research Council)
Alexander Fraser (ISI/University of Southern California)
Ulrich Germann (University of Toronto)
Jan Hajic (Charles University)
Kevin Knight (ISI/University of Southern California)
Greg Kondrak (University of Alberta)
Shankar Kumar (Google)
Philippe Langlais (University of Montreal)
Daniel Marcu (ISI/University of Southern California)
Dan Melamed (New York University)
Franz-Josef Och (Google)
Miles Osborne (University of Edinburgh)
Philip Resnik (University of Maryland)
Libin Shen (University of Pennsylvania)
Wade Shen (MIT-Lincoln Labs)
Michel Simard (Canada National Research Council)
Eiichiro Sumita (ATR Spoken Language Translation Research Laboratories)
Joerg Tiedemann (University of Groningen)
Christoph Tillmann (IBM)
Taro Watanabe (NTT)
Dekai Wu (HKUST)
Richard Zens (RWTH Aachen)

CONTACT

For questions, comments, etc., please send email to pkoehn@inf.ed.ac.uk.