Word Alignment with Linguistic Annotation

Like statistical machine translation models, most word alignment methods treat sentences simply as strings of atomic tokens. Linguistic annotation can be exploited to improve word alignment quality.

Word alignment with linguistic annotation is the main subject of 24 publications, 13 of which are discussed here.

Publications

Word alignment methods have been extended to exploit part-of-speech information (Chang and Chen, 1994; Tiedemann, 2003), including in constraint-based methods (Tiedemann, 2004), as well as translation divergences (Dorr et al., 2002), compositionality constraints (Simard and Langlais, 2003), and syntactic constraints (Cherry and Lin, 2003; Lin and Cherry, 2003; Zhao and Vogel, 2003).

Fraser and Marcu (2005) improve word alignments by stemming words in the input and output languages, thus generalizing over morphological variants. Syntactic constraints may derive from formal criteria for obtaining parallel tree structures, such as the ITG constraint, or from syntactic relationships between words on either side (Cherry and Lin, 2006).
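
To make the ITG constraint concrete, the following is a minimal sketch, not taken from any of the cited papers. It assumes a one-to-one alignment written as a permutation of 0..n-1 (position i in the input aligns to perm[i] in the output) and checks whether that reordering can be built purely from the straight and inverted combinations that an inversion transduction grammar allows. The function name is hypothetical, and real alignments with unaligned or multiply aligned words would need extra handling.

```python
from typing import List, Tuple


def is_itg_permutation(perm: List[int]) -> bool:
    """Check whether a permutation of 0..n-1 satisfies the ITG constraint.

    Shift-reduce test: each position is pushed as a one-element span of
    output positions; the two topmost spans are merged whenever they are
    contiguous in the output (straight or inverted order).
    """
    stack: List[Tuple[int, int]] = []
    for value in perm:
        stack.append((value, value))  # shift a single-word span
        while len(stack) >= 2:
            lo1, hi1 = stack[-2]
            lo2, hi2 = stack[-1]
            # Reduce if the two spans are adjacent in the output.
            if hi1 + 1 == lo2 or hi2 + 1 == lo1:
                stack[-2:] = [(min(lo1, lo2), max(hi1, hi2))]
            else:
                break
    # Everything collapsed into one span: the reordering is ITG-compatible.
    return len(stack) <= 1


print(is_itg_permutation([1, 0, 3, 2]))  # two local swaps
print(is_itg_permutation([1, 3, 0, 2]))  # the classic "inside-out" pattern
```

The first call prints True and the second prints False: the inside-out reordering is the standard example of an alignment that the ITG constraint excludes.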

Linguistic constraints may be modeled as priors in the generative model (Deng and Gao, 2007).
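
As a rough illustration of how a prior can enter a generative alignment model, the sketch below reweights the link posterior for one output word. It is not the formulation of Deng and Gao (2007). It assumes an IBM Model 1 style setup in which each candidate input word has a lexical translation probability, and a hypothetical prior weight per candidate encodes linguistic knowledge (for example, a higher weight when the part-of-speech tags agree). All names and numbers are illustrative assumptions.

```python
from typing import List


def link_posteriors(translation_probs: List[float],
                    prior_weights: List[float]) -> List[float]:
    """Posterior over candidate input words for a single output word.

    translation_probs[i]: lexical probability of the output word given
                          input word i (as in IBM Model 1).
    prior_weights[i]:     assumed linguistic prior for linking to input
                          word i; any non-negative weights will do.
    """
    if len(translation_probs) != len(prior_weights):
        raise ValueError("one prior weight is needed per candidate word")
    scores = [t * w for t, w in zip(translation_probs, prior_weights)]
    total = sum(scores)
    if total == 0.0:
        raise ValueError("all candidate links have zero score")
    # Normalize so the posteriors sum to one.
    return [s / total for s in scores]


# Two candidates with identical lexical probability; the prior prefers
# the first one, e.g. because its part-of-speech tag matches.
print(link_posteriors([0.2, 0.2], [0.75, 0.25]))
```

With equal lexical probabilities the posterior simply follows the prior, which shows how linguistic knowledge can break ties that the lexical model alone cannot resolve.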

Hermjakob (2009) proposes a number of hand-crafted linguistic rules to improve word alignments obtained with traditional statistical methods.

Riesa et al. (2011) use syntactic features in a discriminative word aligner and stress that guidance from the parse structure makes search during training more manageable.

New Publications

Huang, Fei and Yates, Alexander (2014): Improving Word Alignment Using Linguistic Code Switching Data, Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics
@InProceedings{huang-yates:2014:EACL,
author = {Huang, Fei and Yates, Alexander},
title = {Improving Word Alignment Using Linguistic Code Switching Data},
booktitle = {Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics},
month = {April},
address = {Gothenburg, Sweden},
publisher = {Association for Computational Linguistics},
pages = {1--9},
url = {http://www.aclweb.org/anthology/E14-1001},
year = 2014
}
Huang and Yates (2014)
Franck Burlot and François Yvon (2015): Morphology-aware alignments for translation to and from a synthetic language, Proceedings of the International Workshop on Spoken Language Translation (IWSLT)
@inproceedings{IWSLT-2015-Burlot,
author = {Franck Burlot and François Yvon},
title = {Morphology-aware alignments for translation to and from a synthetic language},
pages = {188--195},
booktitle = {Proceedings of the International Workshop on Spoken Language Translation (IWSLT)},
location = {Da Nang, Vietnam},
url = {http://www.mt-archive.info/15/IWSLT-2015-burlot.pdf},
month = {December},
year = 2015
}
Burlot and Yvon (2015)
Kondo, Shuhei and Duh, Kevin and Matsumoto, Yuji (2013): Hidden Markov Tree Model for Word Alignment, Proceedings of the Eighth Workshop on Statistical Machine Translation
@InProceedings{kondo-duh-matsumoto:2013:WMT,
author = {Kondo, Shuhei and Duh, Kevin and Matsumoto, Yuji},
title = {Hidden {Markov} Tree Model for Word Alignment},
booktitle = {Proceedings of the Eighth Workshop on Statistical Machine Translation},
month = {August},
address = {Sofia, Bulgaria},
publisher = {Association for Computational Linguistics},
pages = {503--511},
url = {http://www.aclweb.org/anthology/W13-2263},
year = 2013
}
Kondo et al. (2013)
Toshiaki Nakazawa and Sadao Kurohashi (2008): Linguistically-motivated Tree-based Probabilistic Phrase Alignment, Proceedings of the Eighth Conference of the Association for Machine Translation in the Americas (AMTA)
@inproceedings{amta08:Nakazawa,
author = {Toshiaki Nakazawa and Sadao Kurohashi},
title = {Linguistically-motivated Tree-based Probabilistic Phrase Alignment},
url = {http://www-nagao.kuee.kyoto-u.ac.jp/~nakazawa/pubdb/AMTA2008/AMTA2008.pdf},
googlescholar = {12802904449954363637},
pages = {163--171},
booktitle = {Proceedings of the Eighth Conference of the Association for Machine Translation in the Americas (AMTA)},
location = {Waikiki, Hawaii},
year = 2008
}
Nakazawa and Kurohashi (2008)
UNKNOWN CITATION 'iwslt04:TP_gispert'
Søgaard, Anders and Kuhn, Jonas (2009): Empirical Lower Bounds on Aligment Error Rates in Syntax-Based Machine Translation, Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation (SSST-3) at NAACL HLT 2009
@InProceedings{sogaard-kuhn:2009:SSST,
author = {S{\o}gaard, Anders and Kuhn, Jonas},
title = {Empirical Lower Bounds on Aligment Error Rates in Syntax-Based Machine Translation},
booktitle = {Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation (SSST-3) at NAACL HLT 2009},
month = {June},
address = {Boulder, Colorado},
publisher = {Association for Computational Linguistics},
pages = {19--27},
url = {http://www.aclweb.org/anthology/W09-2303},
year = 2009
}
Søgaard and Kuhn (2009)
Søgaard, Anders (2009): On the Complexity of Alignment Problems in Two Synchronous Grammar Formalisms, Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation (SSST-3) at NAACL HLT 2009
@InProceedings{sogaard:2009:SSST,
author = {S{\o}gaard, Anders},
title = {On the Complexity of Alignment Problems in Two Synchronous Grammar Formalisms},
booktitle = {Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation (SSST-3) at NAACL HLT 2009},
month = {June},
address = {Boulder, Colorado},
publisher = {Association for Computational Linguistics},
pages = {60--68},
url = {http://www.aclweb.org/anthology/W09-2308},
year = 2009
}
Søgaard (2009)
Luong, Minh-Thang and Kan, Min-Yen (2010): Enhancing Morphological Alignment for Translating Highly Inflected Languages, Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)
@InProceedings{luong-kan:2010:PAPERS,
author = {Luong, Minh-Thang and Kan, Min-Yen},
title = {Enhancing Morphological Alignment for Translating Highly Inflected Languages},
booktitle = {Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)},
month = {August},
address = {Beijing, China},
publisher = {Coling 2010 Organizing Committee},
pages = {743--751},
url = {http://www.aclweb.org/anthology/C10-1084},
year = 2010
}
Luong and Kan (2010)
Lee, Jae-Hee and Lee, Seung-Wook and Hong, Gumwon and Hwang, Young-Sook and Kim, Sang-Bum and Rim, Hae-Chang (2010): A Post-processing Approach to Statistical Word Alignment Reflecting Alignment Tendency between Part-of-speeches, Coling 2010: Posters
@InProceedings{lee-EtAl:2010:POSTERS1,
author = {Lee, Jae-Hee and Lee, Seung-Wook and Hong, Gumwon and Hwang, Young-Sook and Kim, Sang-Bum and Rim, Hae-Chang},
title = {A Post-processing Approach to Statistical Word Alignment Reflecting Alignment Tendency between Part-of-speeches},
booktitle = {Coling 2010: Posters},
month = {August},
address = {Beijing, China},
publisher = {Coling 2010 Organizing Committee},
pages = {623--629},
url = {http://www.aclweb.org/anthology/C10-2071},
year = 2010
}
Lee et al. (2010)
Jin-Xia Huang and Key-Sun Choi (2000): Chinese-Korean Word Alignment Based on Linguistic Comparison, Proceedings of the 38th Annual Meeting of the Association of Computational Linguistics (ACL)
@InProceedings{Huang:2000,
author = {Jin-Xia Huang and Key-Sun Choi},
title = {{Chinese-Korean} Word Alignment Based on Linguistic Comparison},
url = {http://www.aclweb.org/anthology/P00-1050},
booktitle = {Proceedings of the 38th Annual Meeting of the Association of Computational Linguistics (ACL)},
year = 2000
}
Huang and Choi (2000)
Ozdowska, Sylwia (2005): Using Bilingual Dependencies to Align Words in English/French Parallel Corpora, Proceedings of the ACL Student Research Workshop
@InProceedings{ozdowska:2005:Student,
author = {Ozdowska, Sylwia},
title = {Using Bilingual Dependencies to Align Words in {E}nglish/{F}rench Parallel Corpora},
booktitle = {Proceedings of the ACL Student Research Workshop},
month = {June},
address = {Ann Arbor, Michigan},
publisher = {Association for Computational Linguistics},
pages = {127--132},
url = {http://www.aclweb.org/anthology/P/P05/P05-2022},
year = 2005
}
Ozdowska (2005)
Grzegorz Kondrak (2005): Cognates and Word Alignment in Bitexts, Proceedings of the Tenth Machine Translation Summit (MT Summit X)
@InProceedings{Kondrak:2005:MTS,
author = {Grzegorz Kondrak},
title = {Cognates and Word Alignment in Bitexts},
url = {http://mt-archive.info/MTS-2005-Kondrak.pdf},
googlescholar = {10504796889953111683},
booktitle = {Proceedings of the Tenth Machine Translation Summit (MT Summit X)},
month = {September},
address = {Phuket, Thailand},
year = 2005
}
Kondrak (2005)

MT Research Survey Wiki

A Comprehensive Survey of Neural and Statistical Machine Translation Research Publications
