Training machine translation models on multiple language pairs leads to better generalization and helps low-resource language pairs. Moreover, the input to machine translation may be enriched with information from other modalities, such as images or speech. Finally, machine translation may be just one task of an integrated neural network that also performs other language processing tasks.
Multilingual Multimodal Multitask is the main subject of 18 publications.
Multi-language training: Johnson et al. (2016) explore how well a single canonical neural translation model can learn to translate between multiple languages, by simultaneously training on parallel corpora for several language pairs. They show small benefits for multiple input languages with the same output language, and mixed results for translating into multiple output languages (indicated by an additional input language token). The most interesting result is the ability of such a model to translate in language directions for which no parallel corpus is provided, demonstrating that some interlingual meaning representation is learned, although less well than with traditional pivot methods. Firat et al. (2016) support multi-language input and output by training language-specific encoders and decoders with a shared attention mechanism.
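The mechanism Johnson et al. (2016) use to steer a single model toward a chosen output language is simply an artificial token prepended to the source sentence; the rest of the architecture is unchanged. A minimal sketch of this data preparation step (function names and the `<2xx>` token format are illustrative, not the authors' exact code):

```python
def add_target_token(source_tokens, target_lang):
    # Prepend an artificial token that tells the model which output
    # language to produce, e.g. "<2de>" for German.
    return ["<2" + target_lang + ">"] + source_tokens

def build_multilingual_stream(parallel_examples):
    # Mix training examples from several language pairs into one stream;
    # each example is (source_tokens, target_tokens, target_lang).
    stream = []
    for src_tokens, tgt_tokens, tgt_lang in parallel_examples:
        stream.append((add_target_token(src_tokens, tgt_lang), tgt_tokens))
    return stream
```

Because the same encoder and decoder see all language pairs, a direction never observed in training (say, Portuguese to Korean) can still be requested at test time via the token, which is what makes the zero-shot result possible.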
Pre-trained word embeddings: Di Gangi and Federico (2017) observe no improvement when using monolingual word embeddings in a gated network that trains additional word embeddings purely on parallel data. Abdou et al. (2017) report worse performance on a WMT news translation task with pre-trained word embeddings. They argue, as Hill et al. (2014) and Hill et al. (2017) did previously, that neural machine translation requires word embeddings based on semantic similarity of words (teacher and professor) rather than other kinds of relatedness (teacher and student), and demonstrate that word embeddings trained for translation score better on standard semantic similarity tasks. Artetxe et al. (2018) use monolingually trained word embeddings in a neural machine translation system without using any parallel corpus. Qi et al. (2018) do show gains with pre-trained word embeddings in low-resource conditions, but the benefits decrease with larger data sizes.
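The common recipe behind these studies is to initialize the translation model's embedding matrix from monolingual vectors where available, and randomly otherwise. A minimal sketch in NumPy (the function name and initialization scale are assumptions for illustration, not any particular paper's setup):

```python
import numpy as np

def init_embedding_matrix(vocab, pretrained, dim, seed=0):
    """Build an embedding matrix for the translation model's vocabulary.

    Words covered by the monolingual pre-trained vectors are copied over;
    the remaining rows are initialized randomly and learned from
    parallel data only.
    """
    rng = np.random.default_rng(seed)
    matrix = rng.normal(scale=0.1, size=(len(vocab), dim))
    for i, word in enumerate(vocab):
        if word in pretrained:
            matrix[i] = pretrained[word]
    return matrix
```

Whether these rows are then frozen or fine-tuned during translation training is a separate choice; the findings above suggest that fine-tuning on parallel data shifts the embeddings toward similarity rather than mere relatedness.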
Multi-task training: Niehues and Cho (2017) tackle multiple tasks (translation, part-of-speech tagging, and named entity identification) with shared components of a sequence-to-sequence model, showing that training on several tasks improves performance on each individual task. Zaremoodi and Haffari (2018) refine this approach with adversarial training that enforces task-independent representations in intermediate layers, and apply it to joint training with syntactic and semantic parsing.
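The core idea of sharing components across tasks can be sketched as one shared encoder feeding several task-specific output layers, so that gradients from every task update the shared parameters. A toy NumPy illustration (layer sizes, head names, and the single-matrix "encoder" are hypothetical simplifications of the sequence-to-sequence models used in these papers):

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared encoder parameters, reused by every task.
W_shared = rng.normal(scale=0.1, size=(8, 8))

# Task-specific output layers (toy sizes for illustration).
heads = {
    "translation": rng.normal(scale=0.1, size=(8, 100)),  # target vocabulary
    "pos_tagging": rng.normal(scale=0.1, size=(8, 12)),   # POS tag set
    "ner": rng.normal(scale=0.1, size=(8, 5)),            # entity labels
}

def forward(x, task):
    # The hidden representation is computed with the shared encoder;
    # only the output layer differs per task, so training any task
    # refines the representation used by all of them.
    hidden = np.tanh(x @ W_shared)
    return hidden @ heads[task]
```

During training, batches from the different tasks would be interleaved, each updating its own head plus the shared encoder.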