Vocabulary
The large number of words in natural language vocabularies is a challenge for the vector space representations used in neural networks. Several strategies have been explored, either handling large vocabularies directly or resorting to sub-word representations of words.
Vocabulary is the main subject of 22 publications.
Publications
A significant limitation of neural machine translation models is the computational burden of supporting very large vocabularies. To avoid this, the vocabulary is typically reduced to a shortlist of, say, 20,000 words, and the remaining tokens are replaced with the unknown word token "UNK". To translate such an unknown word,
Luong, Thang and Sutskever, Ilya and Le, Quoc and Vinyals, Oriol and Zaremba, Wojciech (2015):
Addressing the Rare Word Problem in Neural Machine Translation, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

@InProceedings{luong-EtAl:2015:ACL-IJCNLP,
author = {Luong, Thang and Sutskever, Ilya and Le, Quoc and Vinyals, Oriol and Zaremba, Wojciech},
title = {Addressing the Rare Word Problem in Neural Machine Translation},
booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
month = {July},
address = {Beijing, China},
publisher = {Association for Computational Linguistics},
pages = {11--19},
url = {http://www.aclweb.org/anthology/P15-1002},
year = 2015
}
Luong et al. (2015) and
Jean, Sébastien and Cho, Kyunghyun and Memisevic, Roland and Bengio, Yoshua (2015):
On Using Very Large Target Vocabulary for Neural Machine Translation, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

@InProceedings{jean-EtAl:2015:ACL-IJCNLP,
author = {Jean, S\'{e}bastien and Cho, Kyunghyun and Memisevic, Roland and Bengio, Yoshua},
title = {On Using Very Large Target Vocabulary for Neural Machine Translation},
booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
month = {July},
address = {Beijing, China},
publisher = {Association for Computational Linguistics},
pages = {1--10},
url = {http://www.aclweb.org/anthology/P15-1001},
year = 2015
}
Jean et al. (2015) resort to a separate dictionary.
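The shortlist-and-replace pipeline described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of either paper: the function names are hypothetical, and the alignment from each output UNK to a source position is assumed to be given (for example, from attention weights or from positional unknown-word tokens in the training data).

```python
from collections import Counter

UNK = "UNK"

def build_shortlist(corpus, size):
    """Keep the `size` most frequent tokens of a tokenized corpus."""
    counts = Counter(tok for sent in corpus for tok in sent)
    return {tok for tok, _ in counts.most_common(size)}

def apply_shortlist(sentence, shortlist):
    """Replace every token outside the shortlist with UNK."""
    return [tok if tok in shortlist else UNK for tok in sentence]

def replace_unknowns(output, source, alignment, dictionary):
    """Post-process a translation (hypothetical helper): each UNK at target
    position i is replaced by the dictionary translation of its aligned
    source word, or copied verbatim if the dictionary has no entry.
    `alignment` maps target positions to source positions (assumed given)."""
    result = []
    for i, tok in enumerate(output):
        if tok == UNK and i in alignment:
            src_word = source[alignment[i]]
            result.append(dictionary.get(src_word, src_word))
        else:
            result.append(tok)
    return result
```

For example, with a shortlist of three words, "small" becomes UNK on the input side, and an output UNK aligned to "small" is restored through the dictionary entry `{"small": "klein"}`; a name like "Zurich" with no dictionary entry is simply copied.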
Arthur, Philip and Neubig, Graham and Nakamura, Satoshi (2016):
Incorporating Discrete Translation Lexicons into Neural Machine Translation, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

@InProceedings{arthur-neubig-nakamura:2016:EMNLP2016,
author = {Arthur, Philip and Neubig, Graham and Nakamura, Satoshi},
title = {Incorporating Discrete Translation Lexicons into Neural Machine Translation},
booktitle = {Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing},
month = {November},
address = {Austin, Texas},
publisher = {Association for Computational Linguistics},
pages = {1557--1567},
url = {https://aclweb.org/anthology/D16-1162},
year = 2016
}
Arthur et al. (2016) argue that neural translation models perform worse on rare words, and they interpolate a traditional probabilistic bilingual dictionary with the prediction of the neural machine translation model. They use the attention mechanism to link each target word to a distribution over source words and weight the word translations accordingly.
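A minimal sketch of attention-weighted lexicon interpolation in plain Python. The lexical distribution is p_lex(e) = Σ_j a_j · p(e | f_j), where a_j is the attention weight on source word f_j, and it is mixed linearly with the neural model's distribution. The function names, the dictionary-based representation, and the fixed interpolation weight `lam` are assumptions of this illustration, not Arthur et al.'s implementation, which works on tensors inside the decoder.

```python
def lexicon_distribution(attention, source, lexicon, vocab):
    """Attention-weighted lexical translation distribution over `vocab`:
    p_lex(e) = sum_j attention[j] * lexicon[source[j]][e].
    Lexicon entries outside `vocab` are ignored in this sketch."""
    p = {e: 0.0 for e in vocab}
    for a_j, f_j in zip(attention, source):
        for e, prob in lexicon.get(f_j, {}).items():
            if e in p:
                p[e] += a_j * prob
    return p

def interpolate(p_nmt, p_lex, lam):
    """Linear interpolation with a fixed (assumed) weight `lam`."""
    return {e: lam * p_lex[e] + (1 - lam) * p_nmt[e] for e in p_nmt}
```

With attention weights 0.25 and 0.75 on the source words "house" and "small", and lexicon entries "house" → "Haus" and "small" → "klein", the lexical distribution puts 0.75 on "klein", which pulls the interpolated prediction toward that word even when the neural model itself prefers "Haus".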
Source words such as names and numbers may also be directly copied into the target.
Gulcehre, Caglar and Ahn, Sungjin and Nallapati, Ramesh and Zhou, Bowen and Bengio, Yoshua (2016):
Pointing the Unknown Words, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

@InProceedings{gulcehre-EtAl:2016:P16-1,
author = {Gulcehre, Caglar and Ahn, Sungjin and Nallapati, Ramesh and Zhou, Bowen and Bengio, Yoshua},
title = {Pointing the Unknown Words},
booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {140--149},
url = {http://www.aclweb.org/anthology/P16-1014},
year = 2016
}
Gulcehre et al. (2016) use a so-called switching network to choose between a traditional translation operation and a copying operation, where copying is aided by a softmax layer over the source sentence. They preprocess the training data to replace some target words with the positions of the source words they copy. Similarly,
Gu, Jiatao and Lu, Zhengdong and Li, Hang and Li, Victor O.K. (2016):
Incorporating Copying Mechanism in Sequence-to-Sequence Learning, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

@InProceedings{gu-EtAl:2016:P16-1,
author = {Gu, Jiatao and Lu, Zhengdong and Li, Hang and Li, Victor O.K.},
title = {Incorporating Copying Mechanism in Sequence-to-Sequence Learning},
booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {1631--1640},
url = {http://www.aclweb.org/anthology/P16-1154},
year = 2016
}
Gu et al. (2016) augment the word prediction step of the neural translation model to either translate a word or copy a source word. They observe that the attention mechanism is mostly driven by semantics and the language model in the case of word translation, but by location in the case of copying.
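The combination of translating and copying can be sketched as a single softmax over two kinds of candidates, loosely in the spirit of Gu et al. (2016): generate-mode scores for target vocabulary words and copy-mode scores for source positions. In a real model these scores come from the network; here they are plain numbers, and the function name `copy_generate_distribution` is hypothetical.

```python
import math

def copy_generate_distribution(gen_scores, copy_scores, source):
    """gen_scores: dict mapping target vocabulary words to generate-mode scores.
    copy_scores: one copy-mode score per source position.
    Both modes share one softmax normalizer; a word's final probability is
    its generate-mode mass plus the copy-mode mass of every source position
    that holds that word."""
    all_scores = list(gen_scores.values()) + list(copy_scores)
    m = max(all_scores)  # subtract the max for numerical stability
    z = sum(math.exp(s - m) for s in all_scores)
    probs = {w: math.exp(s - m) / z for w, s in gen_scores.items()}
    for word, s in zip(source, copy_scores):
        probs[word] = probs.get(word, 0.0) + math.exp(s - m) / z
    return probs
```

A source word outside the target vocabulary, such as a name, can still be produced because it receives copy-mode mass, while a word that is both in the vocabulary and in the source accumulates probability from both modes.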
Sennrich, Rico and Haddow, Barry and Birch, Alexandra (2016):
Neural Machine Translation of Rare Words with Subword Units, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

@InProceedings{sennrich-haddow-birch:2016:P16-12,
author = {Sennrich, Rico and Haddow, Barry and Birch, Alexandra},
title = {Neural Machine Translation of Rare Words with Subword Units},
booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {1715--1725},
url = {http://www.aclweb.org/anthology/P16-1162},
year = 2016
}
Sennrich et al. (2016) split up all words into sub-word units, using character n-gram models and a segmentation based on the byte pair encoding compression algorithm.
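Byte pair encoding segmentation can be sketched as follows: every word starts as a sequence of characters plus an end-of-word marker, and the most frequent adjacent symbol pair (weighted by word frequency) is repeatedly merged into a new symbol. This is a minimal sketch, not the authors' released code; the function names and the `</w>` marker are conventions assumed here, and ties between equally frequent pairs are broken by first occurrence.

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Replace every occurrence of `pair` in `symbols` with its concatenation."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)

def learn_bpe(word_counts, num_merges):
    """Learn up to `num_merges` merge operations from word frequencies."""
    vocab = {tuple(word) + ("</w>",): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, count in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair, first on ties
        merges.append(best)
        vocab = {merge_pair(symbols, best): count for symbols, count in vocab.items()}
    return merges

def segment(word, merges):
    """Segment a (possibly unseen) word by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols
```

For the toy counts `{"low": 5, "lower": 2, "newest": 6, "widest": 3}`, the first three merges are `("e", "s")`, `("es", "t")` and `("est", "</w>")`, so the unseen word "lowest" is segmented as `("l", "o", "w", "est</w>")`: rare words decompose into known sub-word units instead of becoming UNK.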
Benchmarks
Discussion
Related Topics
New Publications
Character-Based Models
Costa-jussà, Marta R. and España-Bonet, Cristina and Madhyastha, Pranava and Escolano, Carlos and Fonollosa, José A. R. (2016):
The TALP--UPC Spanish--English WMT Biomedical Task: Bilingual Embeddings and Char-based Neural Language Model Rescoring in a Phrase-based System, Proceedings of the First Conference on Machine Translation

@InProceedings{costajussa-EtAl:2016:WMT,
author = {Costa-juss\`{a}, Marta R. and Espa\~{n}a-Bonet, Cristina and Madhyastha, Pranava and Escolano, Carlos and Fonollosa, Jos\'{e} A. R.},
title = {The TALP--UPC Spanish--English WMT Biomedical Task: Bilingual Embeddings and Char-based Neural Language Model Rescoring in a Phrase-based System},
booktitle = {Proceedings of the First Conference on Machine Translation},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {463--468},
url = {http://www.aclweb.org/anthology/W/W16/W16-2336},
year = 2016
}
Costa-jussà et al. (2016)
Costa-jussà, Marta R. and Fonollosa, José A. R. (2016):
Character-based Neural Machine Translation, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

@InProceedings{costajussa-fonollosa:2016:P16-2,
author = {Costa-juss\`{a}, Marta R. and Fonollosa, Jos\'{e} A. R.},
title = {Character-based Neural Machine Translation},
booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {357--361},
url = {http://anthology.aclweb.org/P16-2058},
year = 2016
}
Costa-jussà and Fonollosa (2016)
Yang, Zhen and Chen, Wei and Wang, Feng and Xu, Bo (2016):
A Character-Aware Encoder for Neural Machine Translation, Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

@InProceedings{yang-EtAl:2016:COLING,
author = {Yang, Zhen and Chen, Wei and Wang, Feng and Xu, Bo},
title = {A Character-Aware Encoder for Neural Machine Translation},
booktitle = {Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers},
month = {December},
address = {Osaka, Japan},
publisher = {The COLING 2016 Organizing Committee},
pages = {3063--3070},
url = {http://aclweb.org/anthology/C16-1288},
year = 2016
}
Yang et al. (2016)
Chung, Junyoung and Cho, Kyunghyun and Bengio, Yoshua (2016):
A Character-level Decoder without Explicit Segmentation for Neural Machine Translation, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

@InProceedings{chung-cho-bengio:2016:P16-1,
author = {Chung, Junyoung and Cho, Kyunghyun and Bengio, Yoshua},
title = {A Character-level Decoder without Explicit Segmentation for Neural Machine Translation},
booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {1693--1703},
url = {http://www.aclweb.org/anthology/P16-1160},
year = 2016
}
Chung et al. (2016)
Luong, Minh-Thang and Manning, Christopher D. (2016):
Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

@InProceedings{luong-manning:2016:P16-1,
author = {Luong, Minh-Thang and Manning, Christopher D.},
title = {Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models},
booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {1054--1063},
url = {http://www.aclweb.org/anthology/P16-1100},
year = 2016
}
Luong and Manning (2016)
Jason Lee and Kyunghyun Cho and Thomas Hofmann (2016):
Fully Character-Level Neural Machine Translation without Explicit Segmentation, CoRR

@article{DBLP:journals/corr/LeeCH16,
author = {Jason Lee and Kyunghyun Cho and Thomas Hofmann},
title = {Fully Character-Level Neural Machine Translation without Explicit Segmentation},
journal = {CoRR},
volume = {abs/1610.03017},
url = {http://arxiv.org/abs/1610.03017},
timestamp = {Wed, 02 Nov 2016 09:51:26 +0100},
biburl = {http://dblp.uni-trier.de/rec/bib/journals/corr/LeeCH16},
bibsource = {dblp computer science bibliography, http://dblp.org},
year = 2016
}
Lee et al. (2016)
Eriguchi, Akiko and Hashimoto, Kazuma and Tsuruoka, Yoshimasa (2016):
Character-based Decoding in Tree-to-Sequence Attention-based Neural Machine Translation, Proceedings of the 3rd Workshop on Asian Translation (WAT2016)

@InProceedings{eriguchi-hashimoto-tsuruoka:2016:WAT2016,
author = {Eriguchi, Akiko and Hashimoto, Kazuma and Tsuruoka, Yoshimasa},
title = {Character-based Decoding in Tree-to-Sequence Attention-based Neural Machine Translation},
booktitle = {Proceedings of the 3rd Workshop on Asian Translation (WAT2016)},
month = {December},
address = {Osaka, Japan},
publisher = {The COLING 2016 Organizing Committee},
pages = {175--183},
url = {http://aclweb.org/anthology/W16-4617},
year = 2016
}
Eriguchi et al. (2016)
Hybrid / Use of Translation Lexicons
Neubig, Graham (2016):
Lexicons and Minimum Risk Training for Neural Machine Translation: NAIST-CMU at WAT2016, Proceedings of the 3rd Workshop on Asian Translation (WAT2016)
Mentioned in: Attention Model and Vocabulary
@InProceedings{neubig:2016:WAT2016,
author = {Neubig, Graham},
title = {Lexicons and Minimum Risk Training for Neural Machine Translation: NAIST-CMU at WAT2016},
booktitle = {Proceedings of the 3rd Workshop on Asian Translation (WAT2016)},
month = {December},
address = {Osaka, Japan},
publisher = {The COLING 2016 Organizing Committee},
pages = {119--125},
url = {http://aclweb.org/anthology/W16-4610},
year = 2016
}
Neubig (2016)
Wang, Weiyue and Alkhouli, Tamer and Zhu, Derui and Ney, Hermann (2017):
Hybrid Neural Network Alignment and Lexicon Model in Direct HMM for Statistical Machine Translation, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

@InProceedings{wang-EtAl:2017:Short1,
author = {Wang, Weiyue and Alkhouli, Tamer and Zhu, Derui and Ney, Hermann},
title = {Hybrid Neural Network Alignment and Lexicon Model in Direct HMM for Statistical Machine Translation},
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
month = {July},
address = {Vancouver, Canada},
publisher = {Association for Computational Linguistics},
pages = {125--131},
url = {http://aclweb.org/anthology/P17-2020},
year = 2017
}
Wang et al. (2017)
Niehues, Jan and Cho, Eunah and Ha, Thanh-Le and Waibel, Alex (2016):
Pre-Translation for Neural Machine Translation, Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

@InProceedings{niehues-EtAl:2016:COLING,
author = {Niehues, Jan and Cho, Eunah and Ha, Thanh-Le and Waibel, Alex},
title = {Pre-Translation for Neural Machine Translation},
booktitle = {Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers},
month = {December},
address = {Osaka, Japan},
publisher = {The COLING 2016 Organizing Committee},
pages = {1828--1836},
url = {http://aclweb.org/anthology/C16-1172},
year = 2016
}
Niehues et al. (2016)
Wang, Xing and Lu, Zhengdong and Tu, Zhaopeng and Li, Hang and Xiong, Deyi and Zhang, Min (2016):
Neural Machine Translation Advised by Statistical Machine Translation, arXiv preprint arXiv:1610.05150

@article{wang2016neural,
author = {Wang, Xing and Lu, Zhengdong and Tu, Zhaopeng and Li, Hang and Xiong, Deyi and Zhang, Min},
title = {Neural Machine Translation Advised by Statistical Machine Translation},
journal = {arXiv preprint arXiv:1610.05150},
url = {https://arxiv.org/pdf/1610.05150v2.pdf},
year = 2016
}
Wang et al. (2016)
Thang Luong and Ilya Sutskever and Quoc V. Le and Oriol Vinyals and Wojciech Zaremba (2014):
Addressing the Rare Word Problem in Neural Machine Translation, CoRR

@article{DBLP:journals/corr/LuongSLVZ14,
author = {Thang Luong and Ilya Sutskever and Quoc V. Le and Oriol Vinyals and Wojciech Zaremba},
title = {Addressing the Rare Word Problem in Neural Machine Translation},
journal = {CoRR},
volume = {abs/1410.8206},
url = {http://arxiv.org/abs/1410.8206},
timestamp = {Sun, 02 Nov 2014 11:25:59 +0100},
biburl = {http://dblp.uni-trier.de/rec/bib/journals/corr/LuongSLVZ14},
bibsource = {dblp computer science bibliography, http://dblp.org},
year = 2014
}
Luong et al. (2014)
Sébastien Jean and Kyunghyun Cho and Roland Memisevic and Yoshua Bengio (2014):
On Using Very Large Target Vocabulary for Neural Machine Translation, CoRR

@article{DBLP:journals/corr/JeanCMB14,
author = {S{\'{e}}bastien Jean and Kyunghyun Cho and Roland Memisevic and Yoshua Bengio},
title = {On Using Very Large Target Vocabulary for Neural Machine Translation},
journal = {CoRR},
volume = {abs/1412.2007},
url = {http://arxiv.org/abs/1412.2007},
timestamp = {Thu, 01 Jan 2015 19:51:08 +0100},
biburl = {http://dblp.uni-trier.de/rec/bib/journals/corr/JeanCMB14},
bibsource = {dblp computer science bibliography, http://dblp.org},
year = 2014
}
Jean et al. (2014)
Hashimoto, Kazuma and Eriguchi, Akiko and Tsuruoka, Yoshimasa (2016):
Domain Adaptation and Attention-Based Unknown Word Replacement in Chinese-to-Japanese Neural Machine Translation, Proceedings of the 3rd Workshop on Asian Translation (WAT2016)

@InProceedings{hashimoto-eriguchi-tsuruoka:2016:WAT2016,
author = {Hashimoto, Kazuma and Eriguchi, Akiko and Tsuruoka, Yoshimasa},
title = {Domain Adaptation and Attention-Based Unknown Word Replacement in Chinese-to-Japanese Neural Machine Translation},
booktitle = {Proceedings of the 3rd Workshop on Asian Translation (WAT2016)},
month = {December},
address = {Osaka, Japan},
publisher = {The COLING 2016 Organizing Committee},
pages = {75--83},
url = {http://aclweb.org/anthology/W16-4605},
year = 2016
}
Hashimoto et al. (2016)
Long, Zi and Utsuro, Takehito and Mitsuhashi, Tomoharu and Yamamoto, Mikio (2016):
Translation of Patent Sentences with a Large Vocabulary of Technical Terms Using Neural Machine Translation, Proceedings of the 3rd Workshop on Asian Translation (WAT2016)

@InProceedings{long-EtAl:2016:WAT2016,
author = {Long, Zi and Utsuro, Takehito and Mitsuhashi, Tomoharu and Yamamoto, Mikio},
title = {Translation of Patent Sentences with a Large Vocabulary of Technical Terms Using Neural Machine Translation},
booktitle = {Proceedings of the 3rd Workshop on Asian Translation (WAT2016)},
month = {December},
address = {Osaka, Japan},
publisher = {The COLING 2016 Organizing Committee},
pages = {47--57},
url = {http://aclweb.org/anthology/W16-4602},
year = 2016
}
Long et al. (2016)
Chitnis, Rohan and DeNero, John (2015):
Variable-Length Word Encodings for Neural Translation Models, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

@InProceedings{chitnis-denero:2015:EMNLP,
author = {Chitnis, Rohan and DeNero, John},
title = {Variable-Length Word Encodings for Neural Translation Models},
booktitle = {Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing},
month = {September},
address = {Lisbon, Portugal},
publisher = {Association for Computational Linguistics},
pages = {2088--2093},
url = {http://aclweb.org/anthology/D15-1249},
year = 2015
}
Chitnis and DeNero (2015)