Attention Model

The currently dominant model in neural machine translation is the sequence-to-sequence model with attention.

Attention Model is the main subject of 31 publications. 8 are discussed here.


Publications

The attention model has its roots in a sequence-to-sequence model.

Cho, Kyunghyun and van Merrienboer, Bart and Bahdanau, Dzmitry and Bengio, Yoshua (2014): On the Properties of Neural Machine Translation: Encoder--Decoder Approaches, Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation

Cho et al. (2014) use recurrent neural networks for the approach.

Sutskever, Ilya and Vinyals, Oriol and Le, Quoc V. (2014): Sequence to Sequence Learning with Neural Networks, Advances in Neural Information Processing Systems 27

Sutskever et al. (2014) use an LSTM (long short-term memory) network and reverse the order of the source sentence before encoding it.

The seminal work by

Dzmitry Bahdanau and Kyunghyun Cho and Yoshua Bengio (2015): Neural Machine Translation by Jointly Learning to Align and Translate, ICLR

Bahdanau et al. (2015) adds an alignment model (the so-called "attention mechanism") that links generated output words to source words and is conditioned on the hidden state that produced the preceding target word. Each source word is represented by the two hidden states of recurrent neural networks that process the source sentence left-to-right and right-to-left.
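
As a rough illustration of such an alignment model, the following sketch scores each source word against the previous decoder state, normalizes the scores with a softmax, and builds a weighted context vector. It is a minimal sketch in plain Python, not the authors' implementation: the parameter names `W`, `U`, and `v`, the helper functions, and all toy values are assumptions made for this example.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def additive_attention(prev_state, annotations, W, U, v):
    """Return (weights, context) for one decoder step.

    prev_state:  decoder hidden state that produced the preceding target word
    annotations: one vector per source word (forward and backward encoder
                 states concatenated)
    W, U, v:     alignment-model parameters (hypothetical names)
    """
    projected_state = matvec(W, prev_state)
    scores = []
    for annotation in annotations:
        projected_word = matvec(U, annotation)
        hidden = [math.tanh(a + b) for a, b in zip(projected_state, projected_word)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    dim = len(annotations[0])
    context = [sum(w * a[d] for w, a in zip(weights, annotations)) for d in range(dim)]
    return weights, context

# Toy values (made up): three source words, two encoder directions.
forward = [[0.1, 0.2], [0.3, 0.1], [0.0, 0.5]]
backward = [[0.4, 0.0], [0.2, 0.2], [0.1, 0.3]]
annotations = [f + b for f, b in zip(forward, backward)]  # concatenation
prev_state = [0.5, -0.5, 0.1]
W = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.2]]
U = [[0.1, 0.0, 0.2, 0.1], [0.3, -0.2, 0.0, 0.1]]
v = [1.0, -1.0]

weights, context = additive_attention(prev_state, annotations, W, U, v)
```

The weights form a probability distribution over the source words, so the context vector is a convex combination of the source word representations.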

Luong, Thang and Pham, Hieu and Manning, Christopher D. (2015): Effective Approaches to Attention-based Neural Machine Translation, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Luong et al. (2015) propose variants of the attention mechanism (which they call the "global" attention model) as well as a hard-constraint attention model (the "local" attention model), which is restricted to a Gaussian distribution around a specific input word.
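
The idea of restricting attention to a Gaussian around one input position can be sketched as follows. This is an illustrative simplification, not the authors' code: the function name, the choice of a standard deviation equal to half the window half-width, and the toy scores are assumptions for this example.

```python
import math

def local_attention_weights(scores, center, half_width):
    """Attention weights restricted to a window around `center`.

    scores:     one alignment score per source position
    center:     (possibly fractional) source position to focus on
    half_width: window half-width, must be positive; positions outside
                [center - half_width, center + half_width] get weight 0
    """
    sigma = half_width / 2.0  # assumed relation between window and Gaussian
    lo = max(0, math.floor(center - half_width))
    hi = min(len(scores) - 1, math.ceil(center + half_width))
    window = range(lo, hi + 1)

    # Softmax over the positions inside the window only.
    peak = max(scores[s] for s in window)
    exps = {s: math.exp(scores[s] - peak) for s in window}
    total = sum(exps.values())

    weights = [0.0] * len(scores)
    for s in window:
        # Damp each weight by a Gaussian centred on the focus position.
        gauss = math.exp(-((s - center) ** 2) / (2.0 * sigma ** 2))
        weights[s] = exps[s] / total * gauss
    return weights

# Toy example: seven source words with equal scores, focus on position 3.
local_weights = local_attention_weights([0.0] * 7, center=3.0, half_width=2)
```

With equal scores, the weights depend only on the distance to the focus position: they peak at position 3, fall off symmetrically, and are zero outside the window.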

To explicitly model the trade-off between source context (the input words) and target context (the already produced target words),

Zhaopeng Tu and Yang Liu and Zhengdong Lu and Xiaohua Liu and Hang Li (2016): Context Gates for Neural Machine Translation, CoRR

Tu et al. (2016) introduce an interpolation weight (called "context gate") that scales the impact of (a) the source context state and (b) the previous hidden state and the last word when predicting the next hidden state in the decoder.
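
A gate of this kind can be sketched as an element-wise interpolation between a source-side and a target-side contribution. The sketch below is a simplification under stated assumptions: the gate logits are passed in directly rather than computed by a learned layer, a plain tanh stands in for the decoder's recurrent unit, and all names are hypothetical.

```python
import math

def sigmoid(x):
    """Logistic function, mapping a logit to a gate value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def context_gate_mix(gate_logits, source_part, target_part):
    """Combine source and target context with an element-wise gate.

    gate_logits: pre-activation gate values (assumed given here)
    source_part: contribution of the source context state
    target_part: contribution of the previous hidden state and last word
    """
    gates = [sigmoid(g) for g in gate_logits]
    return [
        math.tanh(z * s + (1.0 - z) * t)
        for z, s, t in zip(gates, source_part, target_part)
    ]

# Toy example: a neutral gate, a gate favouring the source side,
# and a gate favouring the target side.
mixed = context_gate_mix([0.0, 50.0, -50.0], [1.0, 1.0, 1.0], [-1.0, -1.0, -1.0])
```

A gate value near 1 lets the source context dominate the next state, a value near 0 lets the target context dominate, and 0.5 weighs both equally.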

Deep Models:

There are several ways to add layers to the encoder and the decoder of the neural translation model.

Yonghui Wu and Mike Schuster and Zhifeng Chen and Quoc V. Le and Mohammad Norouzi and Wolfgang Macherey and Maxim Krikun and Yuan Cao and Qin Gao and Klaus Macherey and Jeff Klingner and Apurva Shah and Melvin Johnson and Xiaobing Liu and Lukasz Kaiser and Stephan Gouws and Yoshikiyo Kato and Taku Kudo and Hideto Kazawa and Keith Stevens and George Kurian and Nishant Patil and Wei Wang and Cliff Young and Jason Smith and Jason Riesa and Alex Rudnick and Oriol Vinyals and Greg Corrado and Macduff Hughes and Jeffrey Dean (2016): Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, CoRR


Wu et al. (2016) first use traditional bidirectional recurrent neural networks to compute input word representations and then refine them with several stacked recurrent layers.

Zhou, Jie and Cao, Ying and Wang, Xuguang and Li, Peng and Xu, Wei (2016): Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation, Transactions of the Association for Computational Linguistics

Zhou et al. (2016) alternate between forward and backward recurrent layers.
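
The alternation of directions can be illustrated with a toy encoder in which every second layer reads its input right-to-left. This is only a sketch of the stacking pattern: it uses a scalar tanh recurrence instead of LSTM or GRU cells, leaves out the fast-forward connections named in the paper's title, and all function names and parameter values are assumptions.

```python
import math

def rnn_layer(inputs, w, u, reverse=False):
    """Scalar tanh recurrence over a sequence.

    Returns one state per position, stored in source order regardless of
    the direction in which the sequence was processed.
    """
    positions = range(len(inputs))
    order = reversed(positions) if reverse else positions
    states = [0.0] * len(inputs)
    h = 0.0
    for t in order:
        h = math.tanh(w * inputs[t] + u * h)
        states[t] = h
    return states

def alternating_encoder(inputs, layer_params):
    """Stack layers, alternating left-to-right and right-to-left passes."""
    states = inputs
    for depth, (w, u) in enumerate(layer_params):
        states = rnn_layer(states, w, u, reverse=(depth % 2 == 1))
    return states

# Toy example: four input values, three layers (forward, backward, forward).
encoded = alternating_encoder(
    [0.5, -1.0, 0.25, 0.0],
    [(1.0, 0.5), (0.8, 0.3), (1.2, 0.1)],
)
```

Because each layer consumes the full output of the layer below, the state at any position in the second layer already depends on words to both its left and its right.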

Miceli Barone, Antonio Valerio and Helcl, Jindřich and Sennrich, Rico and Haddow, Barry and Birch, Alexandra (2017): Deep architectures for Neural Machine Translation, Proceedings of the Second Conference on Machine Translation, Volume 1: Research Paper

Barone et al. (2017) show good results with 4 stacks and 2 deep transitions each for encoder and decoder, as well as alternating networks for the encoder. There are a large number of variations (including the use of skip connections, the choice of LSTM vs. GRU, and the number of layers of any type) that still need to be explored empirically for various data conditions.

Benchmarks

Discussion

New Publications

Indurthi, Sathish Reddy and Chung, Insoo and Kim, Sangha (2019): Look Harder: A Neural Machine Translation Model with Hard Attention, Proceedings of the 57th Conference of the Association for Computational Linguistics
@inproceedings{indurthi-etal-2019-look,
author = {Indurthi, Sathish Reddy and Chung, Insoo and Kim, Sangha},
title = {Look Harder: A Neural Machine Translation Model with Hard Attention},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/P19-1290},
pages = {3037--3043},
year = 2019
}
Indurthi et al. (2019)
Mino, Hideya and Utiyama, Masao and Sumita, Eiichiro and Tokunaga, Takenobu (2017): Key-value Attention Mechanism for Neural Machine Translation, Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
@inproceedings{mino-etal-2017-key,
author = {Mino, Hideya and Utiyama, Masao and Sumita, Eiichiro and Tokunaga, Takenobu},
title = {Key-value Attention Mechanism for Neural Machine Translation},
booktitle = {Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)},
month = {nov},
address = {Taipei, Taiwan},
publisher = {Asian Federation of Natural Language Processing},
url = {https://www.aclweb.org/anthology/I17-2049},
pages = {290--295},
year = 2017
}
Mino et al. (2017)
Samee Ibraheem and Nicholas Altieri and John DeNero (2017): Learning an Interactive Attention Policy for Neural Machine Translation, Machine Translation Summit XVI
@inproceedings{mtsummit2017:Ibraheem,
author = {Samee Ibraheem and Nicholas Altieri and John DeNero},
title = {Learning an Interactive Attention Policy for Neural Machine Translation},
booktitle = {Machine Translation Summit XVI},
location = {Nagoya, Japan},
year = 2017
}
Ibraheem et al. (2017)
Matïss Rikters and Mark Fishel (2017): Confidence through Attention, Machine Translation Summit XVI
@inproceedings{mtsummit2017:Rikters,
author = {Mat{\"i}ss Rikters and Mark Fishel},
title = {Confidence through Attention},
booktitle = {Machine Translation Summit XVI},
location = {Nagoya, Japan},
year = 2017
}
Rikters and Fishel (2017)
Li, Xintong and Liu, Lemao and Tu, Zhaopeng and Shi, Shuming and Meng, Max (2018): Target Foresight Based Attention for Neural Machine Translation, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
@InProceedings{N18-1125,
author = {Li, Xintong and Liu, Lemao and Tu, Zhaopeng and Shi, Shuming and Meng, Max},
title = {Target Foresight Based Attention for Neural Machine Translation},
booktitle = {Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)},
publisher = {Association for Computational Linguistics},
pages = {1380--1390},
location = {New Orleans, Louisiana},
url = {http://aclweb.org/anthology/N18-1125},
year = 2018
}
Li et al. (2018)
Malaviya, Chaitanya and Ferreira, Pedro and Martins, André F. T. (2018): Sparse and Constrained Attention for Neural Machine Translation, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
@InProceedings{P18-2059,
author = {Malaviya, Chaitanya and Ferreira, Pedro and Martins, Andr{\'e} F. T.},
title = {Sparse and Constrained Attention for Neural Machine Translation},
booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
publisher = {Association for Computational Linguistics},
pages = {370--376},
location = {Melbourne, Australia},
url = {http://aclweb.org/anthology/P18-2059},
year = 2018
}
Malaviya et al. (2018)
Shankar, Shiv and Garg, Siddhant and Sarawagi, Sunita (2018): Surprisingly Easy Hard-Attention for Sequence to Sequence Learning, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
@inproceedings{D18-1065,
author = {Shankar, Shiv and Garg, Siddhant and Sarawagi, Sunita},
title = {Surprisingly Easy Hard-Attention for Sequence to Sequence Learning},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/D18-1065},
pages = {640--645},
year = 2018
}
Shankar et al. (2018)
Lin, Junyang and Sun, Xu and Ren, Xuancheng and Li, Muyu and Su, Qi (2018): Learning When to Concentrate or Divert Attention: Self-Adaptive Attention Temperature for Neural Machine Translation, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
@inproceedings{D18-1331,
author = {Lin, Junyang and Sun, Xu and Ren, Xuancheng and Li, Muyu and Su, Qi},
title = {Learning When to Concentrate or Divert Attention: Self-Adaptive Attention Temperature for Neural Machine Translation},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/D18-1331},
pages = {2985--2990},
year = 2018
}
Lin et al. (2018)
Yang, Baosong and Wong, Derek F. and Xiao, Tong and Chao, Lidia S. and Zhu, Jingbo (2017): Towards Bidirectional Hierarchical Representations for Attention-based Neural Machine Translation, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
@InProceedings{D17-1151,
author = {Yang, Baosong and Wong, Derek F. and Xiao, Tong and Chao, Lidia S. and Zhu, Jingbo},
title = {Towards Bidirectional Hierarchical Representations for Attention-based Neural Machine Translation},
booktitle = {Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing},
publisher = {Association for Computational Linguistics},
pages = {1443--1452},
location = {Copenhagen, Denmark},
url = {http://aclweb.org/anthology/D17-1151},
year = 2017
}
Yang et al. (2017)

Attention Model

Zhang, Jinchao and Wang, Mingxuan and Liu, Qun and Zhou, Jie (2017): Incorporating Word Reordering Knowledge into Attention-based Neural Machine Translation, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{zhang-EtAl:2017:Long3,
author = {Zhang, Jinchao and Wang, Mingxuan and Liu, Qun and Zhou, Jie},
title = {Incorporating Word Reordering Knowledge into Attention-based Neural Machine Translation},
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {July},
address = {Vancouver, Canada},
publisher = {Association for Computational Linguistics},
pages = {1524--1534},
url = {http://aclweb.org/anthology/P17-1140},
year = 2017
}
Zhang et al. (2017)
Yu, Lei and Buys, Jan and Blunsom, Phil (2016): Online Segment to Segment Neural Transduction, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
@InProceedings{yu-buys-blunsom:2016:EMNLP2016,
author = {Yu, Lei and Buys, Jan and Blunsom, Phil},
title = {Online Segment to Segment Neural Transduction},
booktitle = {Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing},
month = {November},
address = {Austin, Texas},
publisher = {Association for Computational Linguistics},
pages = {1307--1316},
url = {https://aclweb.org/anthology/D16-1138},
year = 2016
}
Yu et al. (2016)
Huang, Po-Yao and Liu, Frederick and Shiang, Sz-Rung and Oh, Jean and Dyer, Chris (2016): Attention-based Multimodal Neural Machine Translation, Proceedings of the First Conference on Machine Translation
@InProceedings{huang-EtAl:2016:WMT,
author = {Huang, Po-Yao and Liu, Frederick and Shiang, Sz-Rung and Oh, Jean and Dyer, Chris},
title = {Attention-based Multimodal Neural Machine Translation},
booktitle = {Proceedings of the First Conference on Machine Translation},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {639--645},
url = {http://www.aclweb.org/anthology/W/W16/W16-2360},
year = 2016
}
Huang et al. (2016)
Mi, Haitao and Sankaran, Baskaran and Wang, Zhiguo and Ittycheriah, Abe (2016): Coverage Embedding Models for Neural Machine Translation, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
@InProceedings{mi-EtAl:2016:EMNLP2016,
author = {Mi, Haitao and Sankaran, Baskaran and Wang, Zhiguo and Ittycheriah, Abe},
title = {Coverage Embedding Models for Neural Machine Translation},
booktitle = {Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing},
month = {November},
address = {Austin, Texas},
publisher = {Association for Computational Linguistics},
pages = {955--960},
url = {https://aclweb.org/anthology/D16-1096},
year = 2016
}
Mi et al. (2016)
Calixto, Iacer and Stein, Daniel and Matusov, Evgeny and Lohar, Pintu and Castilho, Sheila and Way, Andy (2017): Using Images to Improve Machine-Translating E-Commerce Product Listings., Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers
@InProceedings{calixto-EtAl:2017:EACLshort,
author = {Calixto, Iacer and Stein, Daniel and Matusov, Evgeny and Lohar, Pintu and Castilho, Sheila and Way, Andy},
title = {Using Images to Improve Machine-Translating E-Commerce Product Listings.},
booktitle = {Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers},
month = {April},
address = {Valencia, Spain},
publisher = {Association for Computational Linguistics},
pages = {637--643},
url = {http://www.aclweb.org/anthology/E17-2101},
year = 2017
}
Calixto et al. (2017)
Press, Ofir and Wolf, Lior (2017): Using the Output Embedding to Improve Language Models, Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers
@InProceedings{press-wolf:2017:EACLshort,
author = {Press, Ofir and Wolf, Lior},
title = {Using the Output Embedding to Improve Language Models},
booktitle = {Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers},
month = {April},
address = {Valencia, Spain},
publisher = {Association for Computational Linguistics},
pages = {157--163},
url = {http://www.aclweb.org/anthology/E17-2025},
year = 2017
}
Press and Wolf (2017)
Yang, Zichao and Hu, Zhiting and Deng, Yuntian and Dyer, Chris and Smola, Alex (2017): Neural Machine Translation with Recurrent Attention Modeling, Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers
@InProceedings{yang-EtAl:2017:EACLshort1,
author = {Yang, Zichao and Hu, Zhiting and Deng, Yuntian and Dyer, Chris and Smola, Alex},
title = {Neural Machine Translation with Recurrent Attention Modeling},
booktitle = {Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers},
month = {April},
address = {Valencia, Spain},
publisher = {Association for Computational Linguistics},
pages = {383--387},
url = {http://www.aclweb.org/anthology/E17-2061},
year = 2017
}
Yang et al. (2017)

Advanced Modelling

Tu, Zhaopeng and Liu, Yang and Lu, Zhengdong and Liu, Xiaohua and Li, Hang (2017): Context Gates for Neural Machine Translation, Transactions of the Association for Computational Linguistics
@article{TACL948,
author = {Tu, Zhaopeng and Liu, Yang and Lu, Zhengdong and Liu, Xiaohua and Li, Hang },
title = {Context Gates for Neural Machine Translation},
journal = {Transactions of the Association for Computational Linguistics},
volume = {5},
keywords = {{}},
issn = {2307-387X},
url = {https://transacl.org/ojs/index.php/tacl/article/view/948},
pages = {87--99},
year = 2017
}
Tu et al. (2017)
Gehring, Jonas and Auli, Michael and Grangier, David and Dauphin, Yann (2017): A Convolutional Encoder Model for Neural Machine Translation, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{gehring-EtAl:2017:Long,
author = {Gehring, Jonas and Auli, Michael and Grangier, David and Dauphin, Yann},
title = {A Convolutional Encoder Model for Neural Machine Translation},
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {July},
address = {Vancouver, Canada},
publisher = {Association for Computational Linguistics},
pages = {123--135},
url = {http://aclweb.org/anthology/P17-1012},
year = 2017
}
Gehring et al. (2017)
Oda, Yusuke and Arthur, Philip and Neubig, Graham and Yoshino, Koichiro and Nakamura, Satoshi (2017): Neural Machine Translation via Binary Code Prediction, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{oda-EtAl:2017:Long,
author = {Oda, Yusuke and Arthur, Philip and Neubig, Graham and Yoshino, Koichiro and Nakamura, Satoshi},
title = {Neural Machine Translation via Binary Code Prediction},
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {July},
address = {Vancouver, Canada},
publisher = {Association for Computational Linguistics},
pages = {850--860},
url = {http://aclweb.org/anthology/P17-1079},
year = 2017
}
Oda et al. (2017)
Wang, Mingxuan and Lu, Zhengdong and Zhou, Jie and Liu, Qun (2017): Deep Neural Machine Translation with Linear Associative Unit, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{wang-EtAl:2017:Long1,
author = {Wang, Mingxuan and Lu, Zhengdong and Zhou, Jie and Liu, Qun},
title = {Deep Neural Machine Translation with Linear Associative Unit},
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {July},
address = {Vancouver, Canada},
publisher = {Association for Computational Linguistics},
pages = {136--145},
url = {http://aclweb.org/anthology/P17-1013},
year = 2017
}
Wang et al. (2017)
Wang, Mingxuan and Lu, Zhengdong and Li, Hang and Liu, Qun (2016): Memory-enhanced Decoder for Neural Machine Translation, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
@InProceedings{wang-EtAl:2016:EMNLP20161,
author = {Wang, Mingxuan and Lu, Zhengdong and Li, Hang and Liu, Qun},
title = {Memory-enhanced Decoder for Neural Machine Translation},
booktitle = {Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing},
month = {November},
address = {Austin, Texas},
publisher = {Association for Computational Linguistics},
pages = {278--286},
url = {https://aclweb.org/anthology/D16-1027},
year = 2016
}
Wang et al. (2016)
Sountsov, Pavel and Sarawagi, Sunita (2016): Length bias in Encoder Decoder Models and a Case for Global Conditioning, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
@InProceedings{sountsov-sarawagi:2016:EMNLP2016,
author = {Sountsov, Pavel and Sarawagi, Sunita},
title = {Length bias in Encoder Decoder Models and a Case for Global Conditioning},
booktitle = {Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing},
month = {November},
address = {Austin, Texas},
publisher = {Association for Computational Linguistics},
pages = {1516--1525},
url = {https://aclweb.org/anthology/D16-1158},
year = 2016
}
Sountsov and Sarawagi (2016)
Shu, Raphael and Miura, Akiva (2016): Residual Stacking of RNNs for Neural Machine Translation, Proceedings of the 3rd Workshop on Asian Translation (WAT2016)
@InProceedings{shu-miura:2016:WAT2016,
author = {Shu, Raphael and Miura, Akiva},
title = {Residual Stacking of RNNs for Neural Machine Translation},
booktitle = {Proceedings of the 3rd Workshop on Asian Translation (WAT2016)},
month = {December},
address = {Osaka, Japan},
publisher = {The COLING 2016 Organizing Committee},
pages = {223--229},
url = {http://aclweb.org/anthology/W16-4623},
year = 2016
}
Shu and Miura (2016)

MT Research Survey Wiki

A Comprehensive Survey of Neural and Statistical Machine Translation Research Publications
