Neural Network Models

Neural network models received little attention until an explosion of research in the 2010s, triggered by their success in vision and speech recognition. Such models allow for clustering of related words and flexible use of context.

Neural Network Models and its 14 sub-topics are the main subject of 379 publications.


Basic models that use neural networks for machine translation were already proposed in the 20th century (Waibel et al., 1991), but were not seriously pursued due to a lack of computational resources. In fact, models quite similar to the ones currently in use date back to that era (Forcada and Ñeco, 1997; Castaño et al., 1997).
Schwenk et al. (2006) introduce neural language models (also called "continuous space language models") to machine translation, and use them in re-ranking, similar to the earlier work in speech recognition.
The first competitive fully neural machine translation system participated in the WMT evaluation campaign in 2015 (Jean et al., 2015). Neural systems then reached state-of-the-art performance at IWSLT 2015 (Luong and Manning, 2015) and WMT 2016 (Sennrich et al., 2016). The same year, Systran (Crego et al., 2016), Google (Wu et al., 2016), and WIPO (Junczys-Dowmunt et al., 2016) reported large-scale deployments.
Neubig (2017) presents a hands-on tutorial on neural machine translation models.
Technical Background: A good introduction to modern neural network research is the textbook Deep Learning (Goodfellow et al., 2016). There is also a book on neural network methods applied to natural language processing in general (Goldberg, 2017).
A number of key techniques developed in recent years have entered the standard repertoire of neural machine translation research. Training is made more robust by methods such as drop-out (Srivastava et al., 2014), where a random subset of nodes is masked during training. To avoid exploding or vanishing gradients during back-propagation over several layers, gradients are typically clipped (Pascanu et al., 2013). Layer normalization (Lei Ba et al., 2016) has a similar motivation: it ensures that node values stay within reasonable bounds.
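The three techniques named above can be illustrated with a minimal sketch in plain Python. This is not taken from any of the cited papers' code. The function names (dropout, clip_by_norm, layer_norm) and the defaults are assumptions chosen for illustration. The drop-out variant shown rescales surviving nodes ("inverted" drop-out), clipping is done on the L2 norm of the whole gradient vector, and the learned gain and bias of layer normalization are omitted.

```python
import math
import random


def dropout(values, drop_prob, rng):
    """Mask each node value with probability drop_prob (training time only).

    Surviving values are divided by the keep probability so that the
    expected activation is unchanged (the "inverted" variant, an assumption).
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep_prob = 1.0 - drop_prob
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


def clip_by_norm(gradient, max_norm):
    """Rescale the gradient vector if its L2 norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm <= max_norm:
        return list(gradient)
    scale = max_norm / norm
    return [g * scale for g in gradient]


def layer_norm(values, eps=1e-5):
    """Shift and scale one layer's node values to zero mean, unit variance.

    The learned gain and bias parameters are left out of this sketch.
    """
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(variance + eps) for v in values]


if __name__ == "__main__":
    rng = random.Random(0)
    print(dropout([1.0, 1.0, 1.0, 1.0], drop_prob=0.5, rng=rng))
    print(clip_by_norm([3.0, 4.0], max_norm=1.0))
    print(layer_norm([1.0, 2.0, 3.0]))
```

Note that rescaling a gradient in this way bounds its magnitude, so it addresses exploding gradients directly; it does not enlarge gradients that have become very small.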
Optimization methods that adjust the learning rate of gradient descent training are an active topic of research. Popular methods are Adagrad (Duchi et al., 2011), Adadelta (Zeiler, 2012), and currently Adam (Kingma and Ba, 2015).
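To make the idea of an adaptive learning rate concrete, here is a minimal sketch of single update steps for Adagrad and Adam on a flat list of parameters. The function names, the dictionary-based optimizer state, and the hyperparameter defaults are assumptions for illustration, not a reference implementation. Adadelta is omitted for brevity.

```python
import math


def init_state(size):
    """Create per-parameter optimizer state (hypothetical layout)."""
    return {"t": 0, "m": [0.0] * size, "v": [0.0] * size}


def adagrad_step(params, grads, state, lr=0.1, eps=1e-8):
    """Adagrad: divide the step by the root of the summed squared gradients.

    Parameters that have seen large gradients get smaller steps over time.
    """
    updated = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state["v"][i] += g * g
        updated.append(p - lr * g / (math.sqrt(state["v"][i]) + eps))
    return updated


def adam_step(params, grads, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: use decaying averages of the gradient and the squared gradient.

    Both averages start at zero, so they are bias-corrected by 1 - beta**t.
    """
    state["t"] += 1
    t = state["t"]
    updated = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state["m"][i] = beta1 * state["m"][i] + (1.0 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1.0 - beta2) * g * g
        m_hat = state["m"][i] / (1.0 - beta1 ** t)
        v_hat = state["v"][i] / (1.0 - beta2 ** t)
        updated.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return updated


if __name__ == "__main__":
    # Toy objective f(x) = x^2 with gradient 2x, starting from x = 1.0.
    x, state = [1.0], init_state(1)
    for _ in range(5):
        x = adam_step(x, [2.0 * x[0]], state, lr=0.1)
    print(x)
```

In both methods the effective step size differs per parameter and changes during training, which is the property the methods listed above have in common.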



New Publications

  • Weng et al. (2017)
  • Sperber et al. (2017)
  • Feng et al. (2017)
  • Dahlmann et al. (2017)
  • Wang et al. (2017)
  • Zhang et al. (2017)
  • Stahlberg and Byrne (2017)
  • Devlin (2017)
  • Wang et al. (2017)
  • Stahlberg et al. (2017)
  • Melo (2015)
  • Costa-jussà et al. (2017)
  • Gupta et al. (2015)
  • Müller et al. (2014)
  • Sennrich et al. (2015)
  • Sennrich et al. (2015)
  • Zhao et al. (2015)
  • Heyman et al. (2017)
  • Carvalho and Nguyen (2017)
  • Carpuat et al. (2017)
  • Denkowski and Neubig (2017)
  • Goto and Tanaka (2017)
  • Morishita et al. (2017)
  • Shu and Nakayama (2017)

Overcoming Low Resource

  • Zoph et al. (2016)
  • Fadaee et al. (2017)
  • Adams et al. (2017)
  • Chen et al. (2017)
  • Zhang and Zong (2016)

System Descriptions (incomplete)

  • Junczys-Dowmunt et al. (2016)
  • Chung et al. (2016)
  • Guasch and Costa-jussà (2016)
  • Sánchez-Cartagena and Toral (2016)
  • Bradbury and Socher (2016)


  • Mallinson et al. (2017)
  • Jakubina and Langlais (2017)
  • Östling and Tiedemann (2017)
  • Yang et al. (2017)
  • Zhang et al. (2017)
  • Marie and Fujita (2017)
  • See et al. (2016)
  • Zhang et al. (2016)
  • Pal et al. (2016)
  • Duong et al. (2016)
  • Clark et al. (2014)

Unpublished ArXiv

  • Pezeshki (2015)
  • Williams et al. (2015)
  • Zhang (2015)
  • Wang et al. (2015)
  • Tu et al. (2015)
  • Huang et al. (2015)
  • Gouws et al. (2014)