Alternative Architectures

While the attentional sequence-to-sequence model is currently the dominant architecture for neural machine translation, other architectures have been explored.

Alternative Architectures is the main subject of 40 publications, 9 of which are discussed here.

Publications

Kalchbrenner and Blunsom (2013) build a comprehensive machine translation model by first encoding the source sentence with a convolutional neural network and then generating the target sentence by reversing the process. A refinement of this was proposed by Gehring et al. (2017), who use multiple convolutional layers in the encoder and the decoder that do not reduce the length of the encoded sequence but incorporate wider context with each layer.
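
As a rough illustration of this design (a sketch only, assuming PyTorch and plain ReLU activations; the cited papers use their own architectures with gated units and positional information), a stack of padded convolutional layers keeps the sequence length fixed while each additional layer widens the context visible to every position:

    # Sketch of a multi-layer convolutional encoder in the spirit of
    # Gehring et al. (2017): padding keeps the sequence length unchanged,
    # and every extra layer widens the context seen by each position.
    # PyTorch, the layer count, and ReLU are illustrative assumptions.
    import torch
    import torch.nn as nn

    class ConvEncoder(nn.Module):
        def __init__(self, vocab_size, dim=256, layers=4, kernel=3):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, dim)
            self.convs = nn.ModuleList(
                nn.Conv1d(dim, dim, kernel, padding=kernel // 2)
                for _ in range(layers)
            )

        def forward(self, tokens):                  # tokens: (batch, length)
            x = self.embed(tokens).transpose(1, 2)  # (batch, dim, length)
            for conv in self.convs:
                x = torch.relu(conv(x)) + x         # residual; length is preserved
            return x.transpose(1, 2)                # (batch, length, dim)

    encoder = ConvEncoder(vocab_size=1000)
    print(encoder(torch.randint(0, 1000, (2, 7))).shape)  # torch.Size([2, 7, 256])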

Self-Attention (Transformer):

Vaswani et al. (2017) replace the recurrent neural networks used in attentional sequence-to-sequence models with multiple self-attention layers (called the Transformer), in both the encoder and the decoder. Chen et al. (2018) compare different configurations of Transformer and recurrent neural network components in the encoder and decoder, report that many of the quality gains are due to a handful of training tricks, and show better results with a Transformer encoder and an RNN decoder. Dehghani et al. (2019) propose a variant, called the Universal Transformer, that does not use a fixed number of processing layers but an arbitrarily long loop through a single processing layer.
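
The core operation behind these layers is scaled dot-product self-attention, in which every position attends to every other position of the same sequence. The following single-head sketch (assuming PyTorch; the head count, dimensions, and the omission of layer normalization and feed-forward sublayers are simplifications, not the published models) also contrasts a fixed stack of layers with the Universal Transformer's loop over one shared layer:

    # Minimal single-head scaled dot-product self-attention, the core of a
    # Transformer layer (Vaswani et al., 2017). Layer norm and feed-forward
    # sublayers are omitted; all sizes are illustrative assumptions.
    import torch
    import torch.nn as nn

    class SelfAttention(nn.Module):
        def __init__(self, dim=256):
            super().__init__()
            self.q, self.k, self.v = (nn.Linear(dim, dim) for _ in range(3))
            self.scale = dim ** 0.5

        def forward(self, x):                            # x: (batch, length, dim)
            q, k, v = self.q(x), self.k(x), self.v(x)
            scores = q @ k.transpose(1, 2) / self.scale  # (batch, length, length)
            return torch.softmax(scores, dim=-1) @ v     # weighted mix of values

    x = torch.randn(2, 7, 256)
    # Transformer: a fixed stack of distinct layers.
    for layer in [SelfAttention() for _ in range(6)]:
        x = x + layer(x)
    # Universal Transformer (Dehghani et al., 2019): repeatedly apply one
    # shared layer; the number of iterations need not be fixed in advance.
    shared = SelfAttention()
    for _ in range(6):
        x = x + shared(x)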

Document Context:

Maruf et al. (2018) consider the entire source document as context when translating a sentence. Attention is computed over all input sentences, and the sentences are weighted accordingly. Miculicich et al. (2018) extend this work with hierarchical attention, which first computes attention over sentences and then over words; due to computational constraints, this is limited to a window of surrounding sentences. Maruf et al. (2019) also use hierarchical attention but compute sentence-level attention over the entire document and select only the most relevant sentences before extending attention over their words. A gate distinguishes between words in the source sentence and words in the context sentences.
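
A rough sketch of the two-level attention these document-level models share follows; the mean-pooling of sentences, the dot-product scoring, and all dimensions are illustrative assumptions rather than the cited models:

    # Sketch of hierarchical ("two-level") attention over document context:
    # a query from the current sentence first weights the context sentences,
    # then attends over the words inside them; each sentence weight scales
    # its word weights. Pooling, scoring, and sizes are assumptions.
    import torch

    def hierarchical_attention(query, context):
        # query:   (dim,)                   decoder state for the current word
        # context: (sentences, words, dim)  encoded context sentences
        sent_repr = context.mean(dim=1)                   # (sentences, dim)
        sent_w = torch.softmax(sent_repr @ query, dim=0)  # attention over sentences
        word_w = torch.softmax(context @ query, dim=1)    # attention over words
        combined = sent_w.unsqueeze(1) * word_w           # (sentences, words)
        return (combined.unsqueeze(-1) * context).sum(dim=(0, 1))  # (dim,)

    ctx = torch.randn(5, 12, 256)   # 5 context sentences, 12 words each
    q = torch.randn(256)
    print(hierarchical_attention(q, ctx).shape)           # torch.Size([256])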

New Publications

  • Hao et al. (2019)
  • Mino et al. (2017)
  • Kuang and Xiong (2018)
  • Wang et al. (2018)
  • Wang et al. (2018)
  • Di Gangi and Federico (2018)
  • Pappas et al. (2018)
  • Alkhouli et al. (2018)
  • Libovický et al. (2018)
  • Tu et al. (2018)
  • Gu et al. (2018)
  • Huang et al. (2018)
  • Kaiser et al. (2018)
  • Unanue et al. (2018)
  • Maruf and Haffari (2018)
  • Kuang et al. (2018)
  • Zhang et al. (2018)
  • Domhan (2018)
  • Wang et al. (2018)
  • Zheng et al. (2018)
  • Wang et al. (2018)
  • Bahar et al. (2018)
  • Libovický and Helcl (2018)
  • Bapna et al. (2018)
  • Cao and Xiong (2018)
  • Dou et al. (2018)
  • Tang et al. (2018)
  • Zhang et al. (2018)

Document-Level

  • Jehl and Riezler (2018)
  • Zhang et al. (2018)

End-to-end

  • Pouget-Abadie et al. (2014)
  • Hill et al. (2014)
