Training

Neural machine translation models are typically trained to predict each target word of the sentence pairs in a parallel corpus, with cross-entropy loss as the objective function.

Training is the main subject of 55 publications. 26 are discussed here.

Publications

A number of key techniques that have been developed recently have entered the standard repertoire of neural machine translation research. Ranges for the random initialization of weights need to be carefully chosen (Glorot and Bengio, 2010). To avoid overconfidence of the model, label smoothing may be applied, i.e., optimization towards a target distribution that shifts probability mass away from the given correct target word towards other words (Chorowski and Jaitly, 2017). Distributing training over several GPUs creates the problem of synchronizing updates. Chen et al. (2016) compare various methods, including asynchronous updates. Training is made more robust by methods such as dropout (Srivastava et al., 2014), where during training intervals a number of nodes are randomly masked. To avoid exploding or vanishing gradients during back-propagation over several layers, gradients are typically clipped (Pascanu et al., 2013). Chen et al. (2018) briefly present adaptive gradient clipping. Layer normalization (Lei Ba et al., 2016) has a similar motivation: it ensures that node values stay within reasonable bounds.
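As a rough illustration of two of these techniques, the following sketch builds a label-smoothed target distribution and clips a gradient vector by its norm. It is a minimal sketch, not the formulation of any cited paper: the function names, the choice of spreading the smoothing mass uniformly over the other words, and the default epsilon of 0.1 are assumptions made for this example.

```python
import math


def smoothed_targets(correct_index, vocab_size, epsilon=0.1):
    """Target distribution that moves epsilon of the probability mass
    from the correct word uniformly onto all other words (assumed variant)."""
    off_value = epsilon / (vocab_size - 1)
    targets = [off_value] * vocab_size
    targets[correct_index] = 1.0 - epsilon
    return targets


def cross_entropy(targets, predicted_probs):
    """Cross-entropy between a target distribution and the model prediction.
    Assumes every predicted probability with non-zero target mass is > 0."""
    return -sum(t * math.log(p)
                for t, p in zip(targets, predicted_probs) if t > 0.0)


def clip_by_norm(gradient, max_norm):
    """Rescale the gradient if its L2 norm exceeds max_norm; this bounds
    the size of an update when gradients explode."""
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm <= max_norm:
        return list(gradient)
    scale = max_norm / norm
    return [g * scale for g in gradient]


# Usage: a 4-word vocabulary where word 2 is the correct target.
targets = smoothed_targets(correct_index=2, vocab_size=4)
loss = cross_entropy(targets, [0.1, 0.1, 0.7, 0.1])
clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)  # original norm is 5.0
```

With smoothing, a model that puts all its mass on the correct word no longer minimizes the loss, which is what discourages overconfident predictions.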

Adjusting the Learning Rate:

Optimization methods that adjust the learning rate of gradient descent training are an active topic of research. Popular methods are Adagrad (Duchi et al., 2011), Adadelta (Zeiler, 2012), and currently Adam (Kingma and Ba, 2015).
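To make the idea of a per-parameter adaptive learning rate concrete, here is a minimal sketch of a single Adam update on a flat list of parameters. The function and variable names are made up for this example; the default hyperparameter values are the commonly used ones and should be treated as assumptions rather than recommendations.

```python
import math


def init_state(num_params):
    """Optimizer state: step counter plus running first and second moments."""
    return {"t": 0, "m": [0.0] * num_params, "v": [0.0] * num_params}


def adam_step(params, grads, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update and return the new parameter values."""
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        # Exponential moving averages of the gradient and its square.
        state["m"][i] = beta1 * state["m"][i] + (1.0 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1.0 - beta2) * g * g
        # Bias correction compensates for the zero initialization.
        m_hat = state["m"][i] / (1.0 - beta1 ** t)
        v_hat = state["v"][i] / (1.0 - beta2 ** t)
        # Each parameter's step is scaled by its own gradient history.
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params


# Usage: one step on two parameters with very different gradient sizes.
state = init_state(2)
params = adam_step([1.0, -2.0], [0.5, -3.0], state, lr=0.1)
```

On the first step both parameters move by roughly the learning rate, despite the sixfold difference in gradient magnitude; this normalization is the main contrast with plain gradient descent.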

Sequence-Level Optimization:

Shen et al. (2016) introduce minimum risk training, which allows for sentence-level optimization with metrics such as the BLEU score. A set of possible translations is sampled and their relative probability is used to compute the expected loss (probability-weighted BLEU scores of the sampled translations). They show large gains on a Chinese-English task. Neubig (2016) also showed gains when optimizing towards smoothed sentence-level BLEU, using a sample of 20 translations. Hashimoto and Tsuruoka (2019) optimize towards the GLEU score and speed up training by vocabulary reduction. Wiseman and Rush (2016) use a loss function that penalizes the gold standard falling off the beam during training. Ma et al. (2019) also consider the point where the gold standard falls off the beam, but record the loss for this initial sequence prediction and then reset the beam to the gold standard at that point. Edunov et al. (2018) compare various word-level and sentence-level optimization techniques but see only small gains by the best-performing sentence-level minimum risk method over alternatives. Xu et al. (2019) use a mix of gold-standard and predicted words in the prefix. They use an alignment component to keep the mixed prefix and the target training sentence in sync. Zhang et al. (2019) gradually shift from matching the ground truth towards so-called word-level oracles obtained with Gumbel noise and sentence-level oracles obtained by selecting the BLEU-best translation from the n-best list produced by beam search.
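The expected loss used in minimum risk training can be sketched as follows. This is an illustrative reading of the description above, not the authors' implementation: the function name, the renormalization of model probabilities over the sample, and the sharpness parameter alpha with its default of 1.0 are assumptions of this sketch. Sentence-level scores such as smoothed BLEU are assumed to be precomputed and scaled to the range 0 to 1.

```python
import math


def expected_risk(log_probs, scores, alpha=1.0):
    """Probability-weighted loss over a sample of candidate translations.

    log_probs: model log-probability of each sampled translation.
    scores:    sentence-level quality of each sample in [0, 1].
    alpha:     sharpness of the renormalized distribution (assumed default).
    """
    scaled = [alpha * lp for lp in log_probs]
    # Subtract the maximum before exponentiating for numerical stability.
    max_scaled = max(scaled)
    weights = [math.exp(s - max_scaled) for s in scaled]
    total = sum(weights)
    relative_probs = [w / total for w in weights]
    # The loss of a sample is 1 - score, so better translations cost less.
    return sum(q * (1.0 - s) for q, s in zip(relative_probs, scores))


# Usage: the model prefers the first sample, which also has the higher score.
risk = expected_risk([math.log(0.75), math.log(0.25)], [1.0, 0.0])
```

Minimizing this quantity shifts probability mass towards samples with high sentence-level scores, which is what ties training to the evaluation metric.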

Right-to-Left Training:

Several researchers report that translation quality for the right half of the sentence is lower than for the left half of the sentence and attribute this to exposure bias: during training, a correct prefix is used to make word predictions (also called teacher forcing), while during decoding only the previously predicted words can be used. Wu et al. (2018) show that this imbalance is to a large degree due to linguistic reasons: it happens for right-branching languages like English and Chinese, but the opposite is the case for left-branching languages like Japanese.
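The mismatch behind exposure bias can be illustrated with a toy example. The sketch below contrasts predictions made from the correct prefix with predictions made from the model's own output; `toy_predict_next` is a hypothetical stand-in for a trained model and is built to make one deliberate error on the first word.

```python
def teacher_forced_predictions(predict_next, reference):
    """Training condition: every word is predicted from the correct prefix."""
    return [predict_next(reference[:i]) for i in range(len(reference))]


def free_running_predictions(predict_next, length):
    """Decoding condition: every word is predicted from the model's own
    previous output, so an early error changes all later prefixes."""
    output = []
    for _ in range(length):
        output.append(predict_next(output))
    return output


def toy_predict_next(prefix):
    """Hypothetical model: looks only at the last word of the prefix."""
    if not prefix:
        return "a"  # deliberate error: the reference starts with "the"
    return {"the": "cat", "cat": "sat"}.get(prefix[-1], "<unk>")


reference = ["the", "cat", "sat"]
forced = teacher_forced_predictions(toy_predict_next, reference)
free = free_running_predictions(toy_predict_next, len(reference))
# forced == ["a", "cat", "sat"]      (the error stays local)
# free   == ["a", "<unk>", "<unk>"]  (the error propagates to the right)
```

In this toy setting the first error is followed by correct words under teacher forcing, but derails the rest of the sentence when the model conditions on its own output.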

Adversarial Training:

Wu et al. (2017) introduce adversarial training to neural machine translation, in which a discriminator is trained alongside a traditional machine translation model to distinguish between machine translation output and human reference translations. The ability to fool the discriminator is used as an additional training objective for the machine translation model. Yang et al. (2018) propose a similar setup, but add a BLEU-based training objective to neural translation model training. Cheng et al. (2018) employ adversarial training to address the problem of robustness, which they motivate with the finding that 70% of translations change when an input word is replaced by a synonym. They aim to achieve more robust behavior by adding synthetic training data where one of the input words is replaced with a synonym (a neighbor in embedding space) and by using a discriminator that predicts from the encoding of an input sentence whether it is an original or an altered source sentence.

Knowledge Distillation:

There are several techniques that change the loss function to not only reward good word predictions that closely match the training data but that also closely match predictions of a previous model, called the teacher model. Khayrallah et al. (2018) use a general domain model as teacher to avoid overfitting to in-domain data during domain adaptation by fine-tuning. Wei et al. (2019) use the models that achieved the best results during training at previous checkpoints to guide training.
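A word-level version of such a loss can be sketched as an interpolation between the usual cross-entropy against the training data and a cross-entropy against the teacher's predicted distribution. The function name and the interpolation weight are assumptions for this illustration; the cited papers differ in how they define and weight the two terms.

```python
import math


def distillation_loss(student_probs, teacher_probs, correct_index,
                      teacher_weight=0.5):
    """Loss for one word prediction that rewards matching both the
    training data and the teacher model (assumed interpolation)."""
    # Standard term: negative log-probability of the correct word.
    data_loss = -math.log(student_probs[correct_index])
    # Teacher term: cross-entropy against the teacher's distribution.
    teacher_loss = -sum(t * math.log(s)
                        for t, s in zip(teacher_probs, student_probs)
                        if t > 0.0)
    return (1.0 - teacher_weight) * data_loss + teacher_weight * teacher_loss


# Usage: a 3-word vocabulary where word 0 is correct and the teacher
# also assigns some probability to word 1.
loss = distillation_loss(
    student_probs=[0.6, 0.3, 0.1],
    teacher_probs=[0.7, 0.3, 0.0],
    correct_index=0,
)
```

Setting `teacher_weight` to 0 recovers ordinary cross-entropy training, while larger values pull the student towards the teacher's full distribution, including its preferences among incorrect words.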

Faster Training:

Ott et al. (2018) improve training speed with 16-bit arithmetic and larger batches, which lead to less idle time due to less variance in processing batches on different GPUs. They scale up training to 128 GPUs.

Benchmarks

Discussion

New Publications

Yuta Nishimura and Katsuhito Sudoh and Graham Neubig and Satoshi Nakamura (2018): Multi-Source Neural Machine Translation with Data Augmentation, Proceedings of the International Workshop on Spoken Language Translation (IWSLT) mentioned in Training and Multilingual Multimodal Multitask
@inproceedings{iwslt18-Nishimura-Multi-Source,
author = {Yuta Nishimura and Katsuhito Sudoh and Graham Neubig and Satoshi Nakamura},
title = {Multi-Source Neural Machine Translation with Data Augmentation},
booktitle = {Proceedings of the International Workshop on Spoken Language Translation (IWSLT)},
url = {https://arxiv.org/pdf/1810.06826.pdf},
year = 2018
}
Nishimura et al. (2018)

Adversarial Training

Cheng, Yong and Jiang, Lu and Macherey, Wolfgang (2019): Robust Neural Machine Translation with Doubly Adversarial Inputs, Proceedings of the 57th Conference of the Association for Computational Linguistics
@inproceedings{cheng-etal-2019-robust,
author = {Cheng, Yong and Jiang, Lu and Macherey, Wolfgang},
title = {Robust Neural Machine Translation with Doubly Adversarial Inputs},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/P19-1425},
pages = {4324--4333},
year = 2019
}
Cheng et al. (2019)
Sato, Motoki and Suzuki, Jun and Kiyono, Shun (2019): Effective Adversarial Regularization for Neural Machine Translation, Proceedings of the 57th Conference of the Association for Computational Linguistics
@inproceedings{sato-etal-2019-effective,
author = {Sato, Motoki and Suzuki, Jun and Kiyono, Shun},
title = {Effective Adversarial Regularization for Neural Machine Translation},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/P19-1020},
pages = {204--210},
year = 2019
}
Sato et al. (2019)
Elliott, Desmond (2018): Adversarial Evaluation of Multimodal Machine Translation, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
@inproceedings{D18-1329,
author = {Elliott, Desmond},
title = {Adversarial Evaluation of Multimodal Machine Translation},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/D18-1329},
pages = {2974--2978},
year = 2018
}
Elliott (2018)

Bandit

Kreutzer, Julia and Khadivi, Shahram and Matusov, Evgeny and Riezler, Stefan (2018): Can Neural Machine Translation be Improved with User Feedback?, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)
@InProceedings{N18-3012,
author = {Kreutzer, Julia and Khadivi, Shahram and Matusov, Evgeny and Riezler, Stefan},
title = {Can Neural Machine Translation be Improved with User Feedback?},
booktitle = {Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)},
publisher = {Association for Computational Linguistics},
pages = {92--105},
location = {New Orleans - Louisiana},
url = {http://aclweb.org/anthology/N18-3012},
year = 2018
}
Kreutzer et al. (2018)
Kreutzer, Julia and Uyheng, Joshua and Riezler, Stefan (2018): Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{P18-1165,
author = {Kreutzer, Julia and Uyheng, Joshua and Riezler, Stefan},
title = {Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning},
booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
publisher = {Association for Computational Linguistics},
pages = {1777--1788},
location = {Melbourne, Australia},
url = {http://aclweb.org/anthology/P18-1165},
year = 2018
}
Kreutzer et al. (2018)
Kreutzer, Julia and Sokolov, Artem and Riezler, Stefan (2017): Bandit Structured Prediction for Neural Sequence-to-Sequence Learning, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{kreutzer-sokolov-riezler:2017:Long,
author = {Kreutzer, Julia and Sokolov, Artem and Riezler, Stefan},
title = {Bandit Structured Prediction for Neural Sequence-to-Sequence Learning},
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {July},
address = {Vancouver, Canada},
publisher = {Association for Computational Linguistics},
pages = {1503--1513},
url = {http://aclweb.org/anthology/P17-1138},
year = 2017
}
Kreutzer et al. (2017)

8-Bit / Speed

Quinn, Jerry and Ballesteros, Miguel (2018): Pieces of Eight: 8-bit Neural Machine Translation, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)
@InProceedings{N18-3014,
author = {Quinn, Jerry and Ballesteros, Miguel},
title = {Pieces of Eight: 8-bit Neural Machine Translation},
booktitle = {Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)},
publisher = {Association for Computational Linguistics},
pages = {114--120},
location = {New Orleans - Louisiana},
url = {http://aclweb.org/anthology/N18-3014},
year = 2018
}
Quinn and Ballesteros (2018)
Bogoychev, Nikolay and Heafield, Kenneth and Aji, Alham Fikri and Junczys-Dowmunt, Marcin (2018): Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
@inproceedings{D18-1332,
author = {Bogoychev, Nikolay and Heafield, Kenneth and Aji, Alham Fikri and Junczys-Dowmunt, Marcin},
title = {Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/D18-1332},
pages = {2991--2996},
year = 2018
}
Bogoychev et al. (2018)

Training Objective

Shao, Chenze and Chen, Xilin and Feng, Yang (2018): Greedy Search with Probabilistic N-gram Matching for Neural Machine Translation, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
@inproceedings{D18-1510,
author = {Shao, Chenze and Chen, Xilin and Feng, Yang},
title = {Greedy Search with Probabilistic N-gram Matching for Neural Machine Translation},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/D18-1510},
pages = {4778--4784},
year = 2018
}
Shao et al. (2018) - sequence-level
Wieting, John and Berg-Kirkpatrick, Taylor and Gimpel, Kevin and Neubig, Graham (2019): Beyond BLEU: Training Neural Machine Translation with Semantic Similarity, Proceedings of the 57th Conference of the Association for Computational Linguistics
@inproceedings{wieting-etal-2019-beyond,
author = {Wieting, John and Berg-Kirkpatrick, Taylor and Gimpel, Kevin and Neubig, Graham},
title = {Beyond {BLEU}: Training Neural Machine Translation with Semantic Similarity},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/P19-1427},
pages = {4344--4355},
year = 2019
}
Wieting et al. (2019) - sentence-level optimization
Petrushkov, Pavel and Khadivi, Shahram and Matusov, Evgeny (2018): Learning from Chunk-based Feedback in Neural Machine Translation, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
@InProceedings{P18-2052,
author = {Petrushkov, Pavel and Khadivi, Shahram and Matusov, Evgeny},
title = {Learning from Chunk-based Feedback in Neural Machine Translation},
booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
publisher = {Association for Computational Linguistics},
pages = {326--331},
location = {Melbourne, Australia},
url = {http://aclweb.org/anthology/P18-2052},
year = 2018
}
Petrushkov et al. (2018) - chunk-based feedback
Zheng, Renjie and Ma, Mingbo and Huang, Liang (2018): Multi-Reference Training with Pseudo-References for Neural Translation and Text Generation, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
@inproceedings{D18-1357,
author = {Zheng, Renjie and Ma, Mingbo and Huang, Liang},
title = {Multi-Reference Training with Pseudo-References for Neural Translation and Text Generation},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/D18-1357},
pages = {3188--3197},
year = 2018
}
Zheng et al. (2018) - multi-reference
Wu, Lijun and Tian, Fei and Qin, Tao and Lai, Jianhuang and Liu, Tie-Yan (2018): A Study of Reinforcement Learning for Neural Machine Translation, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
@inproceedings{D18-1397,
author = {Wu, Lijun and Tian, Fei and Qin, Tao and Lai, Jianhuang and Liu, Tie-Yan},
title = {A Study of Reinforcement Learning for Neural Machine Translation},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/D18-1397},
pages = {3612--3621},
year = 2018
}
Wu et al. (2018) - reinforcement learning

Bidirectional

Zhou, Long and Zhang, Jiajun and Zong, Chengqing (2019): Synchronous Bidirectional Neural Machine Translation, Transactions of the Association for Computational Linguistics
@article{zhou-etal-2019-synchronous,
author = {Zhou, Long and Zhang, Jiajun and Zong, Chengqing},
title = {Synchronous Bidirectional Neural Machine Translation},
journal = {Transactions of the Association for Computational Linguistics},
volume = {7},
url = {https://www.aclweb.org/anthology/Q19-1006},
doi = {10.1162/tacl_a_00256},
pages = {91--105},
year = 2019
}
Zhou et al. (2019)

Context

Chen, Kehai and Wang, Rui and Utiyama, Masao and Sumita, Eiichiro and Zhao, Tiejun (2017): Context-Aware Smoothing for Neural Machine Translation, Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
@InProceedings{chen-EtAl:2017:I17-1,
author = {Chen, Kehai and Wang, Rui and Utiyama, Masao and Sumita, Eiichiro and Zhao, Tiejun},
title = {Context-Aware Smoothing for Neural Machine Translation},
booktitle = {Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
month = {November},
address = {Taipei, Taiwan},
publisher = {Asian Federation of Natural Language Processing},
pages = {11--20},
url = {http://www.aclweb.org/anthology/I17-1002},
year = 2017
}
Chen et al. (2017)

Boosting

Zhang, Dakun and Kim, Jungi and Crego, Josep and Senellart, Jean (2017): Boosting Neural Machine Translation, Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
@inproceedings{zhang-etal-2017-boosting,
author = {Zhang, Dakun and Kim, Jungi and Crego, Josep and Senellart, Jean},
title = {Boosting Neural Machine Translation},
booktitle = {Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)},
month = {nov},
address = {Taipei, Taiwan},
publisher = {Asian Federation of Natural Language Processing},
url = {https://www.aclweb.org/anthology/I17-2046},
pages = {271--276},
year = 2017
}
Zhang et al. (2017)

Dropout

Xiaolin Wang and Masao Utiyama and Eiichiro Sumita (2017): Empirical Study of Dropout Scheme for Neural Machine Translation, Machine Translation Summit XVI
@inproceedings{mtsummit2017:Wang,
author = {Xiaolin Wang and Masao Utiyama and Eiichiro Sumita},
title = {Empirical Study of Dropout Scheme for Neural Machine Translation},
booktitle = {Machine Translation Summit XVI},
location = {Nagoya, Japan},
year = 2017
}
Wang et al. (2017)

Tuning

Hao Qin and Takahiro Shinozaki and Kevin Duh (2017): Evolution Strategy based Automatic Tuning of Neural Machine Translation Systems, Proceedings of the International Workshop on Spoken Language Translation (IWSLT)
@inproceedings{IWSLT2017:Qin,
author = {Hao Qin and Takahiro Shinozaki and Kevin Duh},
title = {Evolution Strategy based Automatic Tuning of Neural Machine Translation Systems},
url = {http://workshop2017.iwslt.org/downloads/O03-2-Paper.pdf},
booktitle = {Proceedings of the International Workshop on Spoken Language Translation (IWSLT)},
location = {Tokyo, Japan},
year = 2017
}
Qin et al. (2017)

Automatic Post-Editing

Vu, Thuy-Trang and Haffari, Gholamreza (2018): Automatic Post-Editing of Machine Translation: A Neural Programmer-Interpreter Approach, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
@inproceedings{D18-1341,
author = {Vu, Thuy-Trang and Haffari, Gholamreza},
title = {Automatic Post-Editing of Machine Translation: A Neural Programmer-Interpreter Approach},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/D18-1341},
pages = {3048--3053},
year = 2018
}
Vu and Haffari (2018)

Variational

Zhang, Biao and Xiong, Deyi and Su, Jinsong and Duan, Hong and Zhang, Min (2016): Variational Neural Machine Translation, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
@InProceedings{zhang-EtAl:2016:EMNLP20162,
author = {Zhang, Biao and Xiong, Deyi and Su, Jinsong and Duan, Hong and Zhang, Min},
title = {Variational Neural Machine Translation},
booktitle = {Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing},
month = {November},
address = {Austin, Texas},
publisher = {Association for Computational Linguistics},
pages = {521--530},
url = {https://aclweb.org/anthology/D16-1050},
year = 2016
}
Zhang et al. (2016)

Semi-Supervised

Cheng, Yong and Xu, Wei and He, Zhongjun and He, Wei and Wu, Hua and Sun, Maosong and Liu, Yang (2016): Semi-Supervised Learning for Neural Machine Translation, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{cheng-EtAl:2016:P16-1,
author = {Cheng, Yong and Xu, Wei and He, Zhongjun and He, Wei and Wu, Hua and Sun, Maosong and Liu, Yang},
title = {Semi-Supervised Learning for Neural Machine Translation},
booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {1965--1974},
url = {http://www.aclweb.org/anthology/P16-1185},
year = 2016
}
Cheng et al. (2016)

Discriminative

Do, Quoc-Khanh and Allauzen, Alexandre and Yvon, François (2015): A Discriminative Training Procedure for Continuous Translation Models, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
@InProceedings{do-allauzen-yvon:2015:EMNLP,
author = {Do, Quoc-Khanh and Allauzen, Alexandre and Yvon, Fran\c{c}ois},
title = {A Discriminative Training Procedure for Continuous Translation Models},
booktitle = {Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing},
month = {September},
address = {Lisbon, Portugal},
publisher = {Association for Computational Linguistics},
pages = {1046--1052},
url = {http://aclweb.org/anthology/D15-1121},
year = 2015
}
Do et al. (2015)

Non-Linear

Huang, Shujian and Chen, Huadong and Dai, Xin-Yu and Chen, Jiajun (2015): Non-linear Learning for Statistical Machine Translation, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
@InProceedings{huang-EtAl:2015:ACL-IJCNLP,
author = {Huang, Shujian and Chen, Huadong and Dai, Xin-Yu and Chen, Jiajun},
title = {Non-linear Learning for Statistical Machine Translation},
booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
month = {July},
address = {Beijing, China},
publisher = {Association for Computational Linguistics},
pages = {825--835},
url = {http://www.aclweb.org/anthology/P15-1080},
year = 2015
}
Huang et al. (2015)

Noise Contrastive Estimation

Cherry, Colin (2016): An Empirical Evaluation of Noise Contrastive Estimation for the Neural Network Joint Model of Translation, Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
@InProceedings{cherry:2016:N16-1,
author = {Cherry, Colin},
title = {An Empirical Evaluation of Noise Contrastive Estimation for the Neural Network Joint Model of Translation},
booktitle = {Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
month = {June},
address = {San Diego, California},
publisher = {Association for Computational Linguistics},
pages = {41--46},
url = {http://www.aclweb.org/anthology/N16-1006},
year = 2016
}
Cherry (2016)

Distillation

Markus Freitag and Yaser Al-Onaizan and Baskaran Sankaran (2017): Ensemble Distillation for Neural Machine Translation, CoRR
@article{DBLP:journals/corr/FreitagAS17,
author = {Markus Freitag and Yaser Al{-}Onaizan and Baskaran Sankaran},
title = {Ensemble Distillation for Neural Machine Translation},
journal = {CoRR},
volume = {abs/1702.01802},
url = {http://arxiv.org/abs/1702.01802},
archiveprefix = {arXiv},
eprint = {1702.01802},
timestamp = {Mon, 13 Aug 2018 16:46:40 +0200},
biburl = {https://dblp.org/rec/bib/journals/corr/FreitagAS17},
bibsource = {dblp computer science bibliography, https://dblp.org},
year = 2017
}
Freitag et al. (2017)
Chen, Yun and Liu, Yang and Cheng, Yong and Li, Victor O.K. (2017): A Teacher-Student Framework for Zero-Resource Neural Machine Translation, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) mentioned in Neural Network Models and Training
@InProceedings{chen-EtAl:2017:Long5,
author = {Chen, Yun and Liu, Yang and Cheng, Yong and Li, Victor O.K.},
title = {A Teacher-Student Framework for Zero-Resource Neural Machine Translation},
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {July},
address = {Vancouver, Canada},
publisher = {Association for Computational Linguistics},
pages = {1925--1935},
url = {http://aclweb.org/anthology/P17-1176},
year = 2017
}
Chen et al. (2017)
Kim, Yoon and Rush, Alexander M. (2016): Sequence-Level Knowledge Distillation, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
@inproceedings{kim-rush-2016-sequence,
author = {Kim, Yoon and Rush, Alexander M.},
title = {Sequence-Level Knowledge Distillation},
booktitle = {Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing},
month = {nov},
address = {Austin, Texas},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/D16-1139},
doi = {10.18653/v1/D16-1139},
pages = {1317--1327},
year = 2016
}
Kim and Rush (2016)
Dakun Zhang and Josep Crego and Jean Senellart (2018): Analyzing Knowledge Distillation in Neural Machine Translation, Proceedings of the International Workshop on Spoken Language Translation (IWSLT)
@inproceedings{iwslt18-Distillation-Zhang,
author = {Dakun Zhang and Josep Crego and Jean Senellart},
title = {Analyzing Knowledge Distillation in Neural Machine Translation},
booktitle = {Proceedings of the International Workshop on Spoken Language Translation (IWSLT)},
year = 2018
}
Zhang et al. (2018)
Chen, Yun and Li, Victor O.K. and Cho, Kyunghyun and Bowman, Samuel (2018): A Stable and Effective Learning Strategy for Trainable Greedy Decoding, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
@inproceedings{D18-1035,
author = {Chen, Yun and Li, Victor O.K. and Cho, Kyunghyun and Bowman, Samuel},
title = {A Stable and Effective Learning Strategy for Trainable Greedy Decoding},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/D18-1035},
pages = {380--390},
year = 2018
}
Chen et al. (2018)

MT Research Survey Wiki

A Comprehensive Survey of Neural and Statistical Machine Translation Research Publications
