While the attentional sequence-to-sequence model is currently the dominant architecture for neural machine translation, other architectures have been explored.
Alternative Architectures is the main subject of 9 publications.
Self Attention (Transformer): Vaswani et al. (2017) replace the recurrent neural networks used in attentional sequence-to-sequence models with multiple self-attention layers (called Transformer), in both the encoder and the decoder. Chen et al. (2018) compare different combinations of Transformer and recurrent layers in the encoder and decoder, report that many of the claimed quality gains are due to a handful of training tricks, and show better results with a Transformer encoder and an RNN decoder. Dehghani et al. (2018) propose a variant, called Universal Transformers, which does not use a fixed number of processing layers but instead loops an arbitrary number of times through a single processing layer.
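The core operation in these models is scaled dot-product self-attention, where each position in a sequence computes a weighted sum over all positions. The following is a minimal sketch, assuming single-head attention and illustrative dimensions; it is not taken from any particular implementation.

```python
# Minimal sketch of scaled dot-product self-attention (single head).
# Dimensions and projection matrices are illustrative placeholders.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """X: (seq_len, d_model); W_q, W_k, W_v: (d_model, d_k) projections."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # pairwise compatibility scores
    weights = softmax(scores, axis=-1)        # each position attends over all positions
    return weights @ V                        # weighted sum of value vectors

# Toy example: 5 tokens, model dimension 8, key/value dimension 4
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 4)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)  # (5, 4)
```

In the full Transformer, several such heads run in parallel and are stacked with feed-forward layers, residual connections, and layer normalization; the decoder additionally attends over the encoder output.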
Adversarial Training: Wu et al. (2017) introduce adversarial training to neural machine translation, in which a discriminator is trained alongside a traditional machine translation model to distinguish between machine translation output and human reference translations. The ability to fool the discriminator is used as an additional training objective for the machine translation model. Yang et al. (2018) propose a similar setup, but add a BLEU-based training objective to neural translation model training.
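A hypothetical sketch of this kind of objective is shown below: the discriminator is trained to separate human references from machine output, while the translation model's loss combines the usual likelihood term with a term rewarding it for fooling the discriminator. All module definitions, dimensions, and the weighting term are illustrative assumptions, not the exact setup of Wu et al. (2017).

```python
# Hypothetical sketch of an adversarial objective for NMT training.
# Sentence encodings are stood in for by random vectors; the discriminator
# architecture and the adversarial weight are illustrative choices.
import torch
import torch.nn as nn

d = 16                                       # toy sentence-representation size
discriminator = nn.Sequential(               # scores a (source, translation) pair
    nn.Linear(2 * d, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
bce = nn.BCELoss()

def discriminator_loss(src, human_trans, machine_trans):
    # Train the discriminator to output 1 for human references, 0 for MT output.
    real = discriminator(torch.cat([src, human_trans], dim=-1))
    fake = discriminator(torch.cat([src, machine_trans], dim=-1))
    return bce(real, torch.ones_like(real)) + bce(fake, torch.zeros_like(fake))

def generator_loss(likelihood_loss, src, machine_trans, adv_weight=0.1):
    # Translation-model objective: usual likelihood loss plus a term that is
    # small when the discriminator believes the output is human-produced.
    fooled = discriminator(torch.cat([src, machine_trans], dim=-1))
    return likelihood_loss + adv_weight * bce(fooled, torch.ones_like(fooled))

# Toy usage with random vectors standing in for encoded sentences
src, human, machine = (torch.randn(4, d) for _ in range(3))
print(discriminator_loss(src, human, machine).item())
print(generator_loss(torch.tensor(2.3), src, machine).item())
```

In practice the two losses are optimized in alternation, and because sampling discrete translations is not differentiable, the adversarial signal is typically propagated to the translation model with a policy-gradient style update.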