The large number of words in natural language vocabularies is a challenge for the vector space representations used in neural networks. Several strategies have been explored to handle large vocabularies or to resort to sub-word representations of words.

Vocabulary is the main subject of 23 publications.


A significant limitation of neural machine translation models is the computational cost of supporting very large vocabularies. To avoid this cost, the vocabulary is typically reduced to a shortlist of, say, 20,000 words, and the remaining tokens are replaced with the unknown word token "UNK". To translate such an unknown word, Luong et al. (2015) and Jean et al. (2015) resort to a separate dictionary. Arthur et al. (2016) argue that neural translation models are worse for rare words and interpolate a traditional probabilistic bilingual dictionary with the prediction of the neural machine translation model. They use the attention mechanism to link each target word to a distribution of source words and weigh the word translations accordingly.
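The shortlist and dictionary interpolation described above can be illustrated with a minimal Python sketch. It is not the implementation of any of the cited papers: the function names, the dictionary-of-dictionaries lexicon format, and the fixed interpolation weight `lam` are assumptions made for illustration only.

```python
from collections import Counter

UNK = "UNK"

def build_shortlist(corpus, size):
    """Return the `size` most frequent tokens of a tokenized corpus."""
    counts = Counter(tok for sentence in corpus for tok in sentence)
    return {tok for tok, _ in counts.most_common(size)}

def replace_unknown(sentence, shortlist):
    """Replace every token outside the shortlist with the UNK token."""
    return [tok if tok in shortlist else UNK for tok in sentence]

def lexicon_distribution(attention, source, lexicon):
    """Mix the lexicon distributions of the source words, weighted by attention.

    `lexicon` maps a source word to {target word: probability}
    (hypothetical format chosen for this sketch).
    """
    mixed = {}
    for weight, src in zip(attention, source):
        for tgt, prob in lexicon.get(src, {}).items():
            mixed[tgt] = mixed.get(tgt, 0.0) + weight * prob
    return mixed

def interpolate(p_nmt, p_lex, lam):
    """Linear interpolation: lam * p_lex + (1 - lam) * p_nmt."""
    words = set(p_nmt) | set(p_lex)
    return {w: lam * p_lex.get(w, 0.0) + (1.0 - lam) * p_nmt.get(w, 0.0)
            for w in words}
```

If both input distributions sum to one, the interpolated distribution does too, so a rare word that the neural model would map to "UNK" can still receive probability mass from the lexicon.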
Source words such as names and numbers may also be copied directly into the target. Gulcehre et al. (2016) use a so-called switching network to predict either a traditional translation operation or a copying operation aided by a softmax layer over the source sentence. They preprocess the training data to change some target words into word positions of copied source words. Similarly, Gu et al. (2016) augment the word prediction step of the neural translation model to either translate a word or copy a source word. They observe that the attention mechanism is mostly driven by semantics and the language model in the case of word translation, but by location in the case of copying.
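The following Python sketch illustrates the two ingredients mentioned above: marking copied words in the training data, and combining a translation distribution with a copy distribution over source positions. It is a simplification under stated assumptions, not the formulation of either paper: the `<copy:i>` marker format, the soft mixture with a scalar gate, and all function names are hypothetical.

```python
import math

def mark_copies(source, target, shortlist):
    """Replace target words that are outside the shortlist but occur in the
    source by a marker naming the source position (marker format is made up)."""
    marked = []
    for tok in target:
        if tok not in shortlist and tok in source:
            marked.append(f"<copy:{source.index(tok)}>")
        else:
            marked.append(tok)
    return marked

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mix_translate_and_copy(copy_gate, vocab_probs, location_scores, source):
    """Combine translation and copying into one distribution.

    copy_gate: probability of copying (in a real model, predicted by a network).
    vocab_probs: {target word: probability} from the translation softmax.
    location_scores: one unnormalized score per source position.
    """
    copy_probs = softmax(location_scores)
    out = {w: (1.0 - copy_gate) * p for w, p in vocab_probs.items()}
    for tok, p in zip(source, copy_probs):
        out[tok] = out.get(tok, 0.0) + copy_gate * p
    return out
```

With a gate near 1 the output is dominated by source positions, which is the behavior one would want for names and numbers.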
To speed up training, Mi et al. (2016) use word and phrase translation models from traditional statistical machine translation to filter the target vocabulary for each mini-batch.
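A minimal sketch of such per-batch vocabulary filtering, assuming a word translation table stored as nested dictionaries and a small set of always-included frequent words. The function name, the `top_k` cutoff, and the table format are illustrative assumptions; phrase translation models are left out for brevity.

```python
def batch_vocabulary(batch, lexicon, frequent, top_k=10):
    """Collect a reduced target vocabulary for one mini-batch.

    batch: list of tokenized source sentences.
    lexicon: {source word: {target word: probability}} (hypothetical format).
    frequent: target words that are always kept, e.g. function words and UNK.
    """
    vocab = set(frequent)
    for sentence in batch:
        for src in sentence:
            candidates = lexicon.get(src, {})
            # Keep only the top_k most probable translations of each source word.
            best = sorted(candidates, key=candidates.get, reverse=True)[:top_k]
            vocab.update(best)
    return vocab
```

The output softmax then only needs to be computed over the returned set, which is usually far smaller than the full target vocabulary.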
Sennrich et al. (2016) split all words into sub-word units, using character n-gram models and a segmentation based on the byte pair encoding compression algorithm.
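The byte pair encoding idea can be sketched in a few lines of Python: start from characters, repeatedly merge the most frequent adjacent symbol pair, and apply the learned merges in order to segment new words. This is a toy version under assumptions, not the authors' released code: the `</w>` end-of-word marker, the function names, and the tie-breaking (first pair encountered) are choices made for this sketch.

```python
from collections import Counter

END = "</w>"  # end-of-word marker (an assumption of this sketch)

def merge_pair(symbols, pair):
    """Merge every adjacent occurrence of `pair` in a tuple of symbols."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(word_counts, num_merges):
    """Learn up to `num_merges` merge operations from {word: frequency}."""
    vocab = {tuple(word) + (END,): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = {merge_pair(symbols, best): count
                 for symbols, count in vocab.items()}
    return merges

def segment(word, merges):
    """Segment a word by applying the learned merges in order."""
    symbols = tuple(word) + (END,)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)
```

For example, after learning five merges from the toy counts `{"low": 5, "lower": 2, "newest": 6, "widest": 3}`, the unseen word "lowest" is segmented into the two units `low` and `est</w>`, so it never has to be mapped to "UNK".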



New Publications

Character-Based Models

  • Costa-jussà et al. (2016)
  • Costa-jussà and Fonollosa (2016)
  • Yang et al. (2016)
  • Chung et al. (2016)
  • Luong and Manning (2016)
  • Lee et al. (2016)
  • Eriguchi et al. (2016)

Hybrid / Use of Translation Lexicons

  • Neubig (2016)
  • Wang et al. (2017)
  • Niehues et al. (2016)
  • Wang et al. (2016)
  • Luong et al. (2014)
  • Jean et al. (2014)
  • Hashimoto et al. (2016)
  • Long et al. (2016)
  • Chitnis and DeNero (2015)