Alignment of Subsentential Units

Sometimes the goal is not to align every word, but to align specific kinds of words and phrases.

Alignment of Subsentential Units is the main subject of 42 publications. 33 are discussed here.

Publications

Rather than tackling the full word alignment problem, more targeted work focuses on terminology extraction, for instance the extraction of domain-specific lexicons (Resnik and Melamed, 1997), noun phrases (Kupiec, 1993; Eijk, 1993; Fung, 1995), collocations (Smadja et al., 1996; Echizen-ya et al., 2003; Orliac and Dillinger, 2003), non-compositional compounds (Melamed, 1997), named entities (Moore, 2003), technical terms (Macken et al., 2008), or other word sequences (Kitamura and Matsumoto, 1996; Ahrenberg et al., 1998; Martinez et al., 1999; Sun et al., 2000; Moore, 2001; Yamamoto et al., 2001; Baobao et al., 2002; Wang and Zhou, 2002). Translations of noun phrases may be learned by checking automatically generated candidate translations against frequency counts on the web (Robitaille et al., 2006; Tonoike et al., 2006).
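The idea of validating candidate translations against frequency counts can be sketched in a few lines of Python. This is only an illustration under strong assumptions, not the method of any of the cited papers: candidates are composed word by word from a toy lexicon, and a plain dictionary with invented numbers stands in for web frequency counts. The function names, lexicon entries, and counts are all hypothetical.

```python
from itertools import product


def candidate_translations(phrase, lexicon):
    """Compose candidates word by word; unknown words are copied unchanged."""
    options = [lexicon.get(word, [word]) for word in phrase.split()]
    return [" ".join(choice) for choice in product(*options)]


def best_translation(phrase, lexicon, counts):
    """Return the candidate with the highest count, or None if none is attested."""
    scored = [(counts.get(c, 0), c) for c in candidate_translations(phrase, lexicon)]
    count, best = max(scored)
    return best if count > 0 else None


# Toy data (invented): a tiny lexicon and a stand-in for web hit counts.
toy_lexicon = {
    "automatische": ["automatic", "automated"],
    "vertaling": ["translation", "rendering"],
}
toy_counts = {
    "automatic translation": 900,
    "automated translation": 400,
    "automatic rendering": 50,
}

print(best_translation("automatische vertaling", toy_lexicon, toy_counts))
```

The sketch assumes monotone word order and ignores reordering, compounding, and smoothing, all of which a realistic system would have to handle.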

Many methods extract subtrees from a parallel corpus, aided either by a word-aligned corpus or by a bilingual lexicon combined with a heuristic to disambiguate alignment points. Such efforts can be traced back to work on the alignment of dependency structures by Matsumoto et al. (1993). Related to this are efforts to align syntactic phrases (Yamamoto and Matsumoto, 2000; Imamura, 2001; Imamura et al., 2003; Imamura et al., 2004), hierarchical syntactic phrases (Watanabe and Sumita, 2002; Watanabe et al., 2002), and phrase structure tree fragments (Groves et al., 2004), as well as methods to extract transfer rules of the kind used in traditional rule-based machine translation systems (Lavoie et al., 2001). The degree to which alignments are consistent with the syntactic structure may be measured by distance in the dependency tree (Nakazawa et al., 2007). Tinsley et al. (2007) align subtrees in a parallel corpus parsed on both sides with a greedy algorithm that relies on a probabilistic lexicon trained with the IBM models. Zhechev and Way (2008) compare it against a similar algorithm. Lavie et al. (2008) use symmetrized IBM model alignments for the same purpose and discuss the effects of alignment and parse quality.
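To make the general idea of greedy subtree alignment with a probabilistic lexicon concrete, here is a simplified Python sketch. It does not reproduce the scoring function or the well-formedness constraints of any cited system. It assumes each tree node is represented only by the words it covers, scores a node pair by how well the lexicon covers the words in both directions, and links pairs greedily, one-to-one. The node ids, lexicon probabilities, and threshold are invented for illustration.

```python
def coverage(words_a, words_b, prob):
    """Average over words_a of the best lexical probability to any word in words_b."""
    if not words_a or not words_b:
        return 0.0
    return sum(max(prob(a, b) for b in words_b) for a in words_a) / len(words_a)


def greedy_subtree_alignment(src_nodes, tgt_nodes, lex, threshold=0.2):
    """Greedily link source and target nodes, best-scoring pair first, one-to-one.

    src_nodes / tgt_nodes map node ids to the list of words each node covers;
    lex maps (source_word, target_word) to a probability.
    """
    def fwd(s, t):
        return lex.get((s, t), 0.0)

    def bwd(t, s):
        return lex.get((s, t), 0.0)

    scored = []
    for s_id, s_words in src_nodes.items():
        for t_id, t_words in tgt_nodes.items():
            score = coverage(s_words, t_words, fwd) * coverage(t_words, s_words, bwd)
            if score >= threshold:
                scored.append((score, s_id, t_id))
    scored.sort(key=lambda item: -item[0])

    links, used_src, used_tgt = [], set(), set()
    for _, s_id, t_id in scored:
        if s_id in used_src or t_id in used_tgt:
            continue  # each node takes part in at most one link
        links.append((s_id, t_id))
        used_src.add(s_id)
        used_tgt.add(t_id)
    return links


# Toy data (invented): constituents of "the house is small" / "das Haus ist klein".
toy_src = {"S": ["the", "house", "is", "small"], "NP": ["the", "house"], "VP": ["is", "small"]}
toy_tgt = {"S'": ["das", "Haus", "ist", "klein"], "NP'": ["das", "Haus"], "VP'": ["ist", "klein"]}
toy_lex = {("the", "das"): 0.7, ("house", "Haus"): 0.9, ("is", "ist"): 0.8, ("small", "klein"): 0.85}

print(greedy_subtree_alignment(toy_src, toy_tgt, toy_lex))
```

Multiplying the two directional coverage scores penalizes pairs in which one side covers much more material than the other, which is why the sentence node is not linked to a smaller constituent in the toy example.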

Benchmarks

Discussion

New Publications

Santanu Pal and Sudip Kumar Naskar and Sivaji Bandyopadhyay (2013): MWE Alignment in Phrase Based Statistical Machine Translation, Machine Translation Summit XIV
@inproceedings{MTS2013-Pal,
author = {Santanu Pal and Sudip Kumar Naskar and Sivaji Bandyopadhyay},
title = {MWE Alignment in Phrase Based Statistical Machine Translation},
url = {http://www.mt-archive.info/10/MTS-2013-Pal.pdf},
pages = {61--68},
booktitle = {Machine Translation Summit XIV},
year = 2013
}
Pal et al. (2013)
Adrien Lardilleux and Yves Lepage (2008): A truly multilingual, high coverage, accurate, yet simple, subsentential alignment method, Proceedings of the Eighth Conference of the Association for Machine Translation in the Americas (AMTA)
@inproceedings{amta08:Lardilleux,
author = {Adrien Lardilleux and Yves Lepage},
title = {A truly multilingual, high coverage, accurate, yet simple, subsentential alignment method},
url = {http://www.mt-archive.info/AMTA-2008-Lardilleux.pdf},
googlescholar = {12212098436929465525},
pages = {125--132},
booktitle = {Proceedings of the Eighth Conference of the Association for Machine Translation in the Americas (AMTA)},
location = {Waikiki, Hawaii},
year = 2008
}
Lardilleux and Lepage (2008)
Anton Bryl and Josef van Genabith (2010): f-align: An Open-Source Alignment Tool for LFG f-Structures, Proceedings of the Ninth Conference of the Association for Machine Translation in the Americas
@inproceedings{AMTA-2010-Bryl,
author = {Anton Bryl and Josef van Genabith},
title = {f-align: An Open-Source Alignment Tool for {LFG} f-Structures},
url = {http://www.mt-archive.info/AMTA-2010-Bryl.pdf},
booktitle = {Proceedings of the Ninth Conference of the Association for Machine Translation in the Americas},
location = {Denver, Colorado},
year = 2010
}
Bryl and van Genabith (2010)
Nakazawa, Toshiaki and Kurohashi, Sadao (2009): Statistical Phrase Alignment Model Using Dependency Relation Probability, Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation (SSST-3) at NAACL HLT 2009
@InProceedings{nakazawa-kurohashi:2009:SSST,
author = {Nakazawa, Toshiaki and Kurohashi, Sadao},
title = {Statistical Phrase Alignment Model Using Dependency Relation Probability},
booktitle = {Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation (SSST-3) at NAACL HLT 2009},
month = {June},
address = {Boulder, Colorado},
publisher = {Association for Computational Linguistics},
pages = {10--18},
url = {http://www.aclweb.org/anthology/W09-2302},
year = 2009
}
Nakazawa and Kurohashi (2009)
Sun, Jun and Zhang, Min and Tan, Chew Lim (2010): Discriminative Induction of Sub-Tree Alignment using Limited Labeled Data, Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)
@InProceedings{sun-zhang-tan:2010:PAPERS,
author = {Sun, Jun and Zhang, Min and Tan, Chew Lim},
title = {Discriminative Induction of Sub-Tree Alignment using Limited Labeled Data},
booktitle = {Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)},
month = {August},
address = {Beijing, China},
publisher = {Coling 2010 Organizing Committee},
pages = {1047--1055},
url = {http://www.aclweb.org/anthology/C10-1118},
year = 2010
}
Sun et al. (2010)
A. Lardilleux and F. Yvon and Y. Lepage (2012): Hierarchical Sub-sentential Alignment with Anymalign, Proceedings of the 16th International Conference of the European Association for Machine Translation (EAMT)
@inproceedings{EAMT-2012-Lardilleux,
author = {A. Lardilleux and F. Yvon and Y. Lepage},
title = {Hierarchical Sub-sentential Alignment with {Anymalign}},
url = {http://www.mt-archive.info/EAMT-2012-Lardilleux},
pages = {279--286},
booktitle = {Proceedings of the 16th International Conference of the European Association for Machine Translation (EAMT)},
location = {Trento, Italy},
editor = {Mauro Cettolo and Marcello Federico and Lucia Specia and Andy Way},
year = 2012
}
Lardilleux et al. (2012)
Pal, Santanu and Bandyopadhyay, Sivaji (2012): Bootstrapping Method for Chunk Alignment in Phrase Based SMT, Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra)
@InProceedings{pal-bandyopadhyay:2012:ESIRMT-HyTra2012,
author = {Pal, Santanu and Bandyopadhyay, Sivaji},
title = {Bootstrapping Method for Chunk Alignment in Phrase Based SMT},
booktitle = {Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra)},
month = {April},
address = {Avignon, France},
publisher = {Association for Computational Linguistics},
pages = {93--100},
url = {http://www.aclweb.org/anthology/W12-0113},
year = 2012
}
Pal and Bandyopadhyay (2012)
Le Sun and Youbing Jin and Lin Du and Yufang Sun (2000): Word Alignment of English-Chinese Bilingual Corpus Based on Chunks, 2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora
@Inproceedings{Le:2000,
author = {Le Sun and Youbing Jin and Lin Du and Yufang Sun},
title = {Word Alignment of {English-Chinese} Bilingual Corpus Based on Chunks},
url = {http://acl.ldc.upenn.edu/W/W00/W00-1314.pdf},
booktitle = {2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora},
year = 2000
}
Sun et al. (2000)
Feifan Liu and Qianli Jin and Jun Zhao and Bo Xu (2004): Bilingual Chunk Alignment Based on Interactional Matching and Probabilistic Latent Semantic Indexing, Proceedings of the International Joint Conference on Natural Language Processing (IJCNLP)
@inproceedings{Liu:2004,
author = {Feifan Liu and Qianli Jin and Jun Zhao and Bo Xu},
title = {Bilingual Chunk Alignment Based on Interactional Matching and Probabilistic Latent Semantic Indexing},
url = {http://www.nlpr.ia.ac.cn/2005papers/gjhy/gh70.pdf},
booktitle = {Proceedings of the International Joint Conference on Natural Language Processing (IJCNLP)},
year = 2004
}
Liu et al. (2004)
