Neural Machine Translation
Monolingual data is much more plentiful than parallel data and has been proven valuable for informing models of fluency and the representation of words.
Monolingual Data is the main subject of 20 publications. 11 are discussed here.
Backtranslation

Sennrich et al. (2016) back-translate monolingual target-side data into the input language and use the resulting synthetic parallel corpus as additional training data. Hoang et al. (2018) show that the quality of the back-translating machine translation system matters and can be improved by iterative back-translation. Burlot and Yvon (2018) also show that back-translation quality matters and carry out additional analysis. Edunov et al. (2018) observe gains when back-translating with sampling search instead of greedy search or beam search, i.e., randomly selecting word translations based on the predicted probability distribution. Currey et al. (2017) show that in low-resource conditions, simply copying target-side data to the source side also generates beneficial training data. Fadaee and Monz (2018) see gains with synthetic data generated by forward translation (also called self-training). They also report gains when subsampling backtranslation data to favor rare or difficult-to-generate words (words with high loss during training).
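The core mechanics of back-translation can be sketched with a toy example. The dictionary-based "reverse model" below is purely hypothetical (a real setup would use a trained target-to-source translation system); it only illustrates how monolingual target sentences become synthetic parallel pairs, and the difference between greedy selection and the sampling that Edunov et al. (2018) found more effective:

```python
import random

# Toy stand-in for a target-to-source translation model: for each target-side
# word, a probability distribution over source-language translations.
# All entries are hypothetical illustrations, not output of a real system.
REVERSE_MODEL = {
    "haus":  {"house": 0.7, "home": 0.3},
    "klein": {"small": 0.6, "little": 0.4},
}

def back_translate(target_sentence, sample=False, rng=None):
    """Produce a synthetic source sentence for one monolingual target sentence.

    sample=False mimics greedy search (pick the most probable translation);
    sample=True mimics sampling search, drawing from the full distribution.
    """
    rng = rng or random.Random(0)
    source = []
    for word in target_sentence.split():
        options = REVERSE_MODEL.get(word, {word: 1.0})
        if sample:
            words, probs = zip(*options.items())
            source.append(rng.choices(words, weights=probs, k=1)[0])
        else:
            source.append(max(options, key=options.get))
    return " ".join(source)

# A monolingual target-side corpus becomes a synthetic parallel corpus,
# which is then mixed with the genuine parallel data for training.
monolingual = ["klein haus"]
synthetic_pairs = [(back_translate(t, sample=True), t) for t in monolingual]
```

Note that the target side of each synthetic pair is genuine text; only the source side is machine-generated, which is why noise from sampling is tolerable and even helpful.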
Dual Learning

He et al. (2016) use monolingual data in a dual learning setup. Machine translation engines are trained in both directions, and in addition to regular model training from parallel data, monolingual data is translated in a round trip (e to f to e) and evaluated with a language model for language f and the reconstruction match back to e as cost functions to drive gradient descent updates to the models. Tu et al. (2017) augment the translation model with a reconstruction step: the generated output is translated back into the input language, and the training objective is extended to include not only the likelihood of the target sentence but also the likelihood of the reconstructed input sentence. Niu et al. (2018) simultaneously train a model in both translation directions (with the identity of the source language indicated by a marker token). Niu et al. (2019) extend this work to round-trip translation training on monolingual data, allowing the forward translation and the reconstruction step to operate on the same model. They use the Gumbel softmax to make the round trip differentiable.