ACL 2007 Second Workshop on Statistical Machine Translation

Saturday, June 23, 2007
http://www.statmt.org/wmt07/

Translating documents from one language into another by computer is one of the oldest goals in computational linguistics. Now, armed with vast amounts of translated text and powerful computers, we are witnessing significant progress toward achieving that goal.

Statistical methods allow the analysis of parallel corpora and the automatic construction of machine translation systems. For some language pairs such as Chinese-English or Arabic-English, statistical machine translation (SMT) systems built at research labs currently outperform commercial systems.

This workshop focuses on statistical and hybrid methods for machine translation and features a shared translation task. The evaluation of machine translation systems is a growing field, and this workshop will also focus on determining the best methodology for evaluating translation quality, both with automatic metrics and through subjective human evaluation.

This workshop builds on the success of the 2005 ACL Workshop on Parallel Text and the 2006 NAACL Workshop on Statistical Machine Translation.

Topics of interest include, but are not limited to:

word-based, phrase-based, syntax-based SMT
using comparable corpora for SMT
using morphological and POS information for SMT
integration of rule-based MT and statistical MT
decoding
error analysis
evaluation techniques for MT

SHARED TASK

In addition to soliciting research papers on the topics listed above, the workshop will also feature a shared translation task. The workshop organizers will provide common test sets for translation between four language pairs in both directions:

English-German and German-English
English-French and French-English
English-Spanish and Spanish-English
English-Czech and Czech-English

Participants may submit translations for any or all of the language directions. In addition to the common test sets, the workshop organizers will provide optional training resources, including a newly expanded release of the Europarl corpora and additional out-of-domain corpora.

All participants who submit entries will have their translations evaluated. In addition to automatic scoring, we will also evaluate translation performance by human judgment. To facilitate the human evaluation, we will require participants in the shared task to manually judge some of the submitted translations.

A more detailed description of the shared task (including information about the test and training corpora, a freely available MT system, and a number of other resources) is available from http://www.statmt.org/wmt07/shared-task.html. We also provide a baseline machine translation system, whose performance matches the best systems from last year's shared task.

SUBMISSION INFORMATION

Submissions will consist of regular full papers of max. 8 pages, formatted following the ACL 2007 guidelines. Authors of regular full papers will be required to indicate a track for their submission. In addition, teams participating in the shared task will be invited to submit short papers (max. 4 pages) describing their systems. Both submission and review processes will be handled electronically.

We encourage individuals who are submitting research papers to evaluate their approaches using the training resources provided by this workshop, so that their experiments can be repeated by others using these publicly available corpora.

Given the overlap of the paper submission timeframe with that of EMNLP 2007, we accept papers that are also submitted to the EMNLP conference, but would like to know as soon as possible after the notification if an accepted paper will be withdrawn.

IMPORTANT DATES

Regular paper submissions	April 2

(shared task) Results submissions	April 6
(shared task) Short paper submissions	April 13

Notification	April 23
Camera-ready papers	May 9

ORGANIZERS

Philipp Koehn (University of Edinburgh)
Christof Monz (University of London)
Cameron Shaw Fordyce (Center for the Evaluation of Language and Communication Technologies)
Chris Callison-Burch (University of Edinburgh)

INVITED TALK

Jean Senellart (Systran)

PROGRAM COMMITTEE (partial list)

Lars Ahrenberg (Linköping University)
Francisco Casacuberta (University of Valencia)
Colin Cherry (University of Alberta)
Stephen Clark (Oxford University)
Brooke Cowan (Massachusetts Institute of Technology)
Mona Diab (Columbia University)
Chris Dyer (University of Maryland)
Andreas Eisele (University Saarbrücken)
Marcello Federico (ITC-IRST)
George Foster (Canada National Research Council)
Alex Fraser (ISI/University of Southern California)
Ulrich Germann (University of Toronto)
Rebecca Hwa (University of Pittsburgh)
Kevin Knight (ISI/University of Southern California)
Philippe Langlais (University of Montreal)
Alon Lavie (Carnegie Mellon University)
Lori Levin (Carnegie Mellon University)
Daniel Marcu (ISI/University of Southern California)
Bob Moore (Microsoft Research)
Miles Osborne (University of Edinburgh)
Michel Simard (Canada National Research Council)
Eiichiro Sumita (NICT/ATR)
Jörg Tiedemann (University of Groningen)
Christoph Tillmann (IBM Research)
Dan Tufiş (Romanian Academy)
Taro Watanabe (NTT)
Dekai Wu (HKUST)
Richard Zens (RWTH Aachen)

CONTACT

For questions, comments, etc., please send email to pkoehn@inf.ed.ac.uk.

supported by the EuroMatrix project, P6-IST-5-034291-STP
funded by the European Commission under Framework Programme 6