Neural Network Models

Neural network models received little attention until an explosion of research in the 2010s, triggered by their success in vision and speech recognition. Such models allow for clustering of related words and flexible use of context.

Neural Network Models and its 11 sub-topics are the main subject of 329 publications.


Basic models that use neural networks for machine translation were already proposed in the 20th century (Waibel et al., 1991), but were not seriously pursued due to a lack of computational resources. In fact, models quite similar to the ones currently in use date back to that era (Forcada and Ñeco, 1997; Castaño et al., 1997).
Schwenk et al. (2006) introduce neural language models (also called "continuous space language models") to machine translation and use them in re-ranking, similar to earlier work in speech recognition.
The first competitive fully neural machine translation system participated in the WMT evaluation campaign in 2015 (Jean et al., 2015), reaching state-of-the-art performance at IWSLT 2015 (Luong and Manning, 2015) and WMT 2016 (Sennrich et al., 2016). The same year, Systran (Crego et al., 2016), Google (Wu et al., 2016), and WIPO (Junczys-Dowmunt et al., 2016) reported large-scale deployments.
Neubig (2017) presents a hands-on tutorial on neural machine translation models.
Technical Background: A good introduction to modern neural network research is the textbook Deep Learning (Goodfellow et al., 2016). There is also a book on neural network methods applied to natural language processing in general (Goldberg, 2017).
A number of recently developed key techniques have entered the standard repertoire of neural machine translation research. Training is made more robust by methods such as drop-out (Srivastava et al., 2014), where during training intervals a number of nodes are randomly masked. To avoid exploding or vanishing gradients during back-propagation over several layers, gradients are typically clipped (Pascanu et al., 2013). Layer normalization (Lei Ba et al., 2016) has a similar motivation: it ensures that node values stay within reasonable bounds.
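The three techniques above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the formulation of any particular toolkit: the function names are hypothetical, drop-out is shown in the common "inverted" variant that rescales the surviving nodes, clipping is done on the L2 norm of the gradient vector, and layer normalization omits the learned gain and bias parameters.

```python
import math
import random


def dropout(values, rate, rng):
    """Inverted drop-out: mask each node with probability `rate`
    and rescale the surviving nodes by 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def clip_by_norm(gradient, max_norm):
    """Rescale the gradient vector if its L2 norm exceeds `max_norm`."""
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm <= max_norm:
        return list(gradient)
    scale = max_norm / norm
    return [g * scale for g in gradient]


def layer_norm(values, eps=1e-5):
    """Normalize the node values of one layer to zero mean and
    (approximately) unit variance. Learned gain and bias are omitted."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(variance + eps) for v in values]


# Usage on toy values (all numbers are made up for illustration).
rng = random.Random(0)
hidden = [0.5, -1.2, 3.0, 0.1]
masked = dropout(hidden, rate=0.5, rng=rng)
clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)
normalized = layer_norm(hidden)
```

Drop-out is only applied during training; at test time the layer is used unmasked, which is why the inverted variant rescales during training instead.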
Optimization methods that adjust the learning rate of gradient descent training are an active topic of research. Popular methods are Adagrad (Duchi et al., 2011), Adadelta (Zeiler, 2012), and currently Adam (Kingma and Ba, 2015).
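To make the idea of an adjusted learning rate concrete, the following sketch shows one update step of Adagrad and of Adam for a plain list of parameters. It follows the commonly cited update rules; the function names, the dictionary layout of the optimizer state, and the toy objective are assumptions made for this illustration.

```python
import math


def adagrad_step(params, grads, accum, lr=0.1, eps=1e-8):
    """Adagrad: divide the learning rate by the root of the
    accumulated squared gradients of each parameter."""
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        accum[i] += g * g
        new_params.append(p - lr * g / (math.sqrt(accum[i]) + eps))
    return new_params


def adam_step(params, grads, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: exponential moving averages of the gradient (m) and the
    squared gradient (v), each with bias correction."""
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        m_hat = state["m"][i] / (1 - beta1 ** t)
        v_hat = state["v"][i] / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params


# Toy usage: minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x = [0.0]
state = {"t": 0, "m": [0.0], "v": [0.0]}
for _ in range(2000):
    grad = [2.0 * (x[0] - 3.0)]
    x = adam_step(x, grad, state, lr=0.05)
```

In both methods the effective step size differs per parameter: parameters with consistently large gradients take smaller steps than parameters with small or rare gradients.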
Analysis and Visualization: To better understand the behavior of neural machine translation models, researchers compared performance to phrase-based systems, explored linguistic abilities of the models, and developed methods to visualize their processing. Shi et al. (2016) correlated activation values of specific nodes in the state of a simple LSTM encoder-decoder translation model (without attention) with the length of the output and discovered nodes that count the number of words to ensure proper output length.
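As a rough illustration of this kind of analysis, the sketch below ranks the nodes of a hidden state by how strongly their activation correlates with output length. It uses a plain Pearson correlation, which is an assumption made here for simplicity and not necessarily the exact procedure of Shi et al. (2016); the function names and the activation values are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rank_units_by_length_correlation(activations, lengths):
    """activations[i][j] is the value of node j for sentence i.
    Returns node indices, most strongly length-correlated first."""
    num_units = len(activations[0])
    scores = []
    for j in range(num_units):
        column = [row[j] for row in activations]
        scores.append((abs(pearson(column, lengths)), j))
    return [j for _, j in sorted(scores, reverse=True)]


# Toy data: four sentences, three nodes; node 1 grows with the length.
lengths = [5, 10, 15, 20]
activations = [
    [0.1, 0.5, -0.3],
    [-0.2, 1.0, 0.4],
    [0.3, 1.5, -0.1],
    [0.0, 2.0, 0.2],
]
ranking = rank_units_by_length_correlation(activations, lengths)
```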
Bentivogli et al. (2016) considered different linguistic categories when comparing the performance of neural vs. statistical machine translation systems for English-German. Toral and Sánchez-Cartagena (2017) compared different broad aspects such as fluency and reordering for nine language directions. Sennrich (2017) developed an automatic method to detect specific morphosyntactic errors. First, a test set is created by taking sentence pairs and modifying the target sentence to exhibit specific types of error, such as wrong gender of determiners, wrong particles for verbs, or wrong transliteration. Then a neural translation model is evaluated by how often it scores the correct translation higher than the faulty translations. The paper compares byte-pair encoding against character-based models for rare and unknown words.
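The scoring step of such a contrastive evaluation can be sketched as follows. The `score` argument stands in for a translation model that returns a score (for instance a log-probability) for a source and target sentence; the lookup table, the example sentences, and all numbers are hypothetical placeholders, not results from the paper.

```python
def contrastive_accuracy(examples, score):
    """Fraction of examples where the model scores the correct
    translation higher than every faulty variant.

    examples: list of (source, correct_target, [faulty_targets]).
    score: function (source, target) -> float, higher is better.
    """
    correct = 0
    for source, reference, contrastive in examples:
        reference_score = score(source, reference)
        if all(reference_score > score(source, c) for c in contrastive):
            correct += 1
    return correct / len(examples)


# Hypothetical model scores, standing in for a real translation model.
TOY_LOGPROBS = {
    ("the house", "das Haus"): -1.2,
    ("the house", "der Haus"): -3.5,      # wrong determiner gender
    ("she gives up", "sie gibt auf"): -2.0,
    ("she gives up", "sie gibt an"): -1.8,  # wrong verb particle
}


def toy_score(source, target):
    return TOY_LOGPROBS[(source, target)]


examples = [
    ("the house", "das Haus", ["der Haus"]),
    ("she gives up", "sie gibt auf", ["sie gibt an"]),
]
accuracy = contrastive_accuracy(examples, toy_score)
```

Because the model only has to score given sentences, no decoding is needed, and accuracy can be reported separately for each error type.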



New Publications

  • Weng et al. (2017)
  • Sperber et al. (2017)
  • Feng et al. (2017)
  • Dahlmann et al. (2017)
  • Wang et al. (2017)
  • Zhang et al. (2017)
  • Stahlberg and Byrne (2017)
  • Devlin (2017)
  • Wang et al. (2017)
  • Stahlberg et al. (2017)
  • Lee et al. (2017)
  • Melo (2015)
  • Wu et al. (2017)
  • Costa-jussà et al. (2017)
  • Gupta et al. (2015)
  • Müller et al. (2014)
  • Sennrich et al. (2015)
  • Sennrich et al. (2015)
  • Zhao et al. (2015)
  • Heyman et al. (2017)
  • Carvalho and Nguyen (2017)
  • Carpuat et al. (2017)
  • Denkowski and Neubig (2017)
  • Goto and Tanaka (2017)
  • Morishita et al. (2017)
  • Niehues et al. (2017)
  • Shu and Nakayama (2017)
  • Eger et al. (2016)

Overcoming Low Resource

  • Zoph et al. (2016)
  • Fadaee et al. (2017)
  • Adams et al. (2017)
  • Chen et al. (2017)
  • Zhang and Zong (2016)

System Descriptions (incomplete)

  • Junczys-Dowmunt et al. (2016)
  • Chung et al. (2016)
  • Guasch and Costa-jussà (2016)
  • Sánchez-Cartagena and Toral (2016)
  • Bradbury and Socher (2016)

Analytical Evaluation

  • Burlot and Yvon (2017)
  • Ding et al. (2017)
  • Shi et al. (2016)
  • Belinkov et al. (2017)
  • Sennrich (2016)
  • Hirschmann et al. (2016)
  • Schnober et al. (2016)
  • Guta et al. (2015)
  • Koehn and Knowles (2017)


  • Mallinson et al. (2017)
  • Jakubina and Langlais (2017)
  • Östling and Tiedemann (2017)
  • Yang et al. (2017)
  • Zhang et al. (2017)
  • Marie and Fujita (2017)
  • See et al. (2016)
  • Ammar et al. (2014)
  • Zhang et al. (2016)
  • Pal et al. (2016)
  • Duong et al. (2016)
  • Clark et al. (2014)

Unpublished ArXiv

  • Pezeshki (2015)
  • Williams et al. (2015)
  • Zhang (2015)
  • Wang et al. (2015)
  • Tu et al. (2015)
  • Huang et al. (2015)