Compounds

While the words in English compounds such as machine translation remain separate, other languages, such as German, merge them into a single new word. This highly productive process leads to large vocabulary sizes.

Compounds is the main subject of 22 publications. 18 are discussed here.


Publications

Translating from compounding languages like German requires compound splitting methods (Brown, 2002). A frequency-based method, supported by linguistic clues, is introduced by Koehn and Knight (2003). Stymne (2008) refines this method, for instance by addressing more of the morphological changes that occur due to compounding. Macherey et al. (2011) learn the required morphological changes. Compound splitting can also be provided by morphological analysers (Nießen and Ney, 2000; Holmqvist et al., 2007). Fritzinger and Fraser (2010) combine linguistic analysis with corpus-driven statistics. Weller et al. (2014) also consider the semantic similarity (using distributional models) between the compound and its potential parts to guide splitting decisions.
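To make the idea of frequency-based splitting concrete, here is a minimal Python sketch in the spirit of that approach. It scores each candidate segmentation by the geometric mean of its parts' corpus frequencies and keeps the best one. The frequency table, the list of linking elements, and the minimum part length are invented for illustration, and the function names are hypothetical. It is not the implementation of any of the cited papers.

```python
from math import prod

# Toy corpus frequencies, invented for illustration only.
FREQ = {
    "aktionsplan": 2,
    "aktions": 1,
    "aktion": 50,
    "plan": 80,
    "akt": 10,
    "ion": 3,
}


def candidate_splits(word, freq, min_len=3, fillers=("", "s", "es")):
    """Yield lists of known parts covering `word`.

    A non-final part may be followed by a linking element (filler),
    which is dropped from the part. `fillers` and `min_len` are assumptions.
    """
    if word in freq:
        yield [word]  # the unsplit word is always a candidate if it is known
    for i in range(min_len, len(word) - min_len + 1):
        head, rest = word[:i], word[i:]
        for filler in fillers:
            if not head.endswith(filler):
                continue
            stem = head[: len(head) - len(filler)]
            if len(stem) < min_len or stem not in freq:
                continue
            for tail in candidate_splits(rest, freq, min_len, fillers):
                yield [stem] + tail


def score(parts, freq):
    """Geometric mean of the part frequencies."""
    return prod(freq[p] for p in parts) ** (1.0 / len(parts))


def best_split(word, freq):
    """Return the highest-scoring segmentation, or the word itself if none exists."""
    options = list(candidate_splits(word, freq))
    if not options:
        return [word]
    return max(options, key=lambda parts: score(parts, freq))


print(best_split("aktionsplan", FREQ))
```

With these toy counts the split into "aktion" and "plan" wins, because both parts are frequent, while the unsplit word and the over-split "akt", "ion", "plan" score lower. Purely frequency-based scoring can still prefer wrong splits when short, frequent words happen to match, which is one motivation for the linguistic clues and refinements discussed above.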

Since there are multiple ways to split potential compounds, Dyer (2009) provides multiple splits to the decoder in an input lattice. Wuebker and Ney (2012) also consider multiple splits during phrase model training.
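As an illustration of how alternative splits can be packed into one input, the following Python sketch builds a small word lattice from several segmentations of the same word. The edge representation (from-node, to-node, token) and the function names are assumptions made for this example. A real decoder expects its own lattice format and would typically also attach weights and share common sub-paths.

```python
def build_lattice(segmentations):
    """Build a lattice whose paths from node 0 to the final node are the segmentations.

    Returns (edges, final_node), where each edge is (from_node, to_node, token).
    Paths are not minimized: identical tokens on different paths get separate edges.
    """
    edges = []
    final = 1
    next_node = 2
    for parts in segmentations:
        src = 0
        for k, token in enumerate(parts):
            if k == len(parts) - 1:
                dst = final  # every path ends in the shared final node
            else:
                dst = next_node
                next_node += 1
            edges.append((src, dst, token))
            src = dst
    return edges, final


def paths(edges, node, final):
    """Enumerate all token sequences from `node` to `final`."""
    if node == final:
        yield []
        return
    for src, dst, token in edges:
        if src == node:
            for rest in paths(edges, dst, final):
                yield [token] + rest


segmentations = [
    ["aktionsplan"],
    ["aktion", "plan"],
    ["akt", "ion", "plan"],
]
edges, final = build_lattice(segmentations)
for path in paths(edges, 0, final):
    print(" ".join(path))
```

The decoder can then choose among the paths during translation instead of committing to a single split in preprocessing.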

When translating into compounding languages, compounds have to be generated. Stymne et al. (2013) provide an extensive overview. Popovic et al. (2006) split compounds during training and merge them in post-processing. Stymne et al. (2008) also allow the creation of novel words by compounding. Stymne (2009) compares various methods to mark split points and considers the part of speech of split words. Stymne and Cancedda (2011) extend this approach further with a Conditional Random Field (CRF) classifier that detects merge points. This work was integrated by Fraser et al. (2012) as a post-processing step into a machine translation system. Armed with both a corpus-based approach and a morphological analyzer to split words, Cap et al. (2014) build a CRF classifier for merge points that also includes features about the source language, such as that the two words are part of the same base noun phrase.
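The simplest form of the split-then-merge pipeline can be sketched as follows. Non-final compound parts are tagged with a marker symbol in the training data, and any marked tokens in the translation output are glued to the following token. The marker "#" and the function names are hypothetical choices for this example and are not taken from the cited work.

```python
MARK = "#"  # hypothetical split-point marker appended to non-final parts


def mark_parts(parts):
    """Tag every part except the last so that split points survive translation."""
    return [p + MARK for p in parts[:-1]] + parts[-1:]


def merge_marked(tokens):
    """Merge each run of marked tokens with the token that follows it."""
    merged = []
    buffer = ""
    for token in tokens:
        if token.endswith(MARK):
            buffer += token[: -len(MARK)]  # strip the marker and keep collecting
        else:
            merged.append(buffer + token)
            buffer = ""
    if buffer:
        merged.append(buffer)  # dangling marked part at the end of the sentence
    return merged


print(mark_parts(["aktion", "plan"]))
print(merge_marked(["der", "aktion#", "plan", "ist", "gut"]))
```

This sketch only concatenates the parts, so it does not restore linking elements or other morphological changes, and it trusts every marker in the output. Deciding where merging is actually appropriate is the problem that the classifier-based approaches above address.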

Botha et al. (2012) develop a hierarchical Pitman-Yor language model to better handle compounds.

Benchmarks

Discussion

New Publications

Cap, Fabienne and Nirmal, Manju and Weller, Marion and Schulte im Walde, Sabine (2015): How to Account for Idiomatic German Support Verb Constructions in Statistical Machine Translation, Proceedings of the 11th Workshop on Multiword Expressions
@InProceedings{cap-EtAl:2015:MWE,
author = {Cap, Fabienne and Nirmal, Manju and Weller, Marion and Schulte im Walde, Sabine},
title = {How to Account for Idiomatic {German} Support Verb Constructions in Statistical Machine Translation},
booktitle = {Proceedings of the 11th Workshop on Multiword Expressions},
month = {June},
address = {Denver, Colorado},
publisher = {Association for Computational Linguistics},
pages = {19--28},
url = {http://www.aclweb.org/anthology/W15-0903},
year = 2015
}
Cap et al. (2015)
Matthews, Austin and Schlinger, Eva and Lavie, Alon and Dyer, Chris (2016): Synthesizing Compound Words for Machine Translation, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{matthews-EtAl:2016:P16-1,
author = {Matthews, Austin and Schlinger, Eva and Lavie, Alon and Dyer, Chris},
title = {Synthesizing Compound Words for Machine Translation},
booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {1085--1094},
url = {http://www.aclweb.org/anthology/P16-1103},
year = 2016
}
Matthews et al. (2016)
Marcin Junczys-Dowmunt and Bruno Pouliquen (2014): SMT of German patents at WIPO: decompounding and verb structure pre-reordering, Proceedings of 17th Annual conference of the European Association for Machine Translation (mentioned in Deployment and Compounds)
@inproceedings{eamt-2014-Junczyns-Dowmunt,
author = {Marcin Junczys-Dowmunt and Bruno Pouliquen},
title = {SMT of {German} patents at WIPO: decompounding and verb structure pre-reordering},
booktitle = {Proceedings of 17th Annual conference of the European Association for Machine Translation},
pages = {217--220},
url = {http://www.mt-archive.info/10/EAMT-2014-Junczyns-Dowmunt.pdf},
location = {Dubrovnik, Croatia},
year = 2014
}
Junczys-Dowmunt and Pouliquen (2014)
Pu, Xiao and Mascarell, Laura and Popescu-Belis, Andrei and Fishel, Mark and Luong, Ngoc-Quang and Volk, Martin (2015): Leveraging Compounds to Improve Noun Phrase Translation from Chinese and German, Proceedings of the ACL-IJCNLP 2015 Student Research Workshop
@InProceedings{pu-EtAl:2015:SRW,
author = {Pu, Xiao and Mascarell, Laura and Popescu-Belis, Andrei and Fishel, Mark and Luong, Ngoc-Quang and Volk, Martin},
title = {Leveraging Compounds to Improve Noun Phrase Translation from {Chinese} and German},
booktitle = {Proceedings of the ACL-IJCNLP 2015 Student Research Workshop},
month = {July},
address = {Beijing, China},
publisher = {Association for Computational Linguistics},
pages = {8--15},
url = {http://www.aclweb.org/anthology/P15-3002},
year = 2015
}
Pu et al. (2015)

MT Research Survey Wiki

A Comprehensive Survey of Neural and Statistical Machine Translation Research Publications
