Domain adaptation has been widely studied in traditional statistical machine translation. Many of these techniques have been carried over to neural machine translation, and new ones have been developed, to adapt neural models to a domain or to other stylistic aspects.
Adaptation is the main subject of 15 publications.
There is often a domain mismatch between the bulk (or even all) of the training data for a translation system and the test data it encounters during deployment. There is a rich literature on this topic in traditional statistical machine translation. A common approach for neural models is to first train on all available training data and then run a few iterations on in-domain data only (Luong and Manning, 2015), as already pioneered in neural language model adaptation (Ter-Sarkisov et al., 2015). Servan et al. (2016) demonstrate the effectiveness of this adaptation method with small in-domain sets consisting of as few as 500 sentence pairs.
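The two-phase schedule (train on all data, then continue training on in-domain data only) can be sketched with a deliberately tiny stand-in for a translation model: a one-variable linear regressor trained by stochastic gradient descent. The model, the toy data, the learning rates, and the epoch counts are all illustrative assumptions; only the schedule itself (the same parameters, with a second phase restricted to in-domain data) mirrors the method described above.

```python
from typing import List, Tuple

Example = Tuple[float, float]  # (input x, target y)


def sgd_epoch(w: float, b: float, data: List[Example], lr: float) -> Tuple[float, float]:
    """One pass of plain stochastic gradient descent on squared error for y ~ w*x + b."""
    for x, y in data:
        err = (w * x + b) - y
        w -= lr * 2 * err * x
        b -= lr * 2 * err
    return w, b


def train(data: List[Example], epochs: int, lr: float,
          w: float = 0.0, b: float = 0.0) -> Tuple[float, float]:
    """Run several epochs, starting from the given parameters."""
    for _ in range(epochs):
        w, b = sgd_epoch(w, b, data, lr)
    return w, b


def mse(w: float, b: float, data: List[Example]) -> float:
    return sum(((w * x + b) - y) ** 2 for x, y in data) / len(data)


# Hypothetical toy data: the general domain follows y = 2x, the niche domain y = 2x + 1.
general = [(i / 10, 2 * i / 10) for i in range(50)]
in_domain = [(float(x), 2.0 * x + 1.0) for x in range(5)]

# Phase 1: train on all available data (the in-domain part is only a small share).
w, b = train(general + in_domain, epochs=200, lr=0.001)
before = mse(w, b, in_domain)

# Phase 2: continue from those parameters for a few epochs on in-domain data only.
w_ft, b_ft = train(in_domain, epochs=50, lr=0.01, w=w, b=b)
after = mse(w_ft, b_ft, in_domain)
print(f"in-domain MSE before adaptation: {before:.3f}, after: {after:.3f}")
```

In a real system the second phase would start from the trained neural translation model's parameters, and the number of adaptation iterations would typically be chosen on held-out in-domain data.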
Chu et al. (2017) argue that adapting on a small amount of in-domain data leads to overfitting and suggest mixing in-domain and out-of-domain data during adaptation. Freitag and Al-Onaizan (2016) identify the same problem and suggest using an ensemble of baseline models and adapted models to avoid overfitting. Peris et al. (2017) consider alternative training methods for the adaptation phase but do not find that they give consistently better results than traditional gradient descent training.
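Both remedies against overfitting can be sketched in a few lines of Python. `mixed_adaptation_data` builds an adaptation set from all in-domain pairs plus a random sample of out-of-domain pairs; the `out_ratio` knob and the uniform sampling are assumptions, not details taken from Chu et al. (2017). `ensemble_distribution` combines a baseline and an adapted model by averaging their next-word probability distributions, which is one common way to ensemble translation models and is assumed here; the words and probabilities are made up.

```python
import random
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def mixed_adaptation_data(in_domain: Sequence[Pair], out_domain: Sequence[Pair],
                          out_ratio: float, seed: int = 0) -> List[Pair]:
    """All in-domain pairs plus a random sample of out-of-domain pairs, shuffled.

    out_ratio is a hypothetical knob: out-of-domain pairs drawn per in-domain pair.
    """
    rng = random.Random(seed)
    k = min(len(out_domain), int(out_ratio * len(in_domain)))
    mixed = list(in_domain) + rng.sample(list(out_domain), k)
    rng.shuffle(mixed)
    return mixed


def ensemble_distribution(dists: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Average the next-word distributions of several models (e.g., baseline + adapted)."""
    vocab = set().union(*dists)
    return {w: sum(d.get(w, 0.0) for d in dists) / len(dists) for w in vocab}


# Made-up distributions over the next target word.
baseline = {"bank": 0.7, "bench": 0.2, "shore": 0.1}
adapted = {"bench": 0.9, "bank": 0.1}  # overconfident after adapting on little data
combined = ensemble_distribution([baseline, adapted])
print(max(combined, key=combined.get))  # bench

in_dom = [("a", "A"), ("b", "B")]
out_dom = [(f"x{i}", f"X{i}") for i in range(10)]
adaptation_set = mixed_adaptation_data(in_dom, out_dom, out_ratio=2.0)
print(len(adaptation_set))  # 6
```

Because the baseline still assigns probability to words the adapted model has effectively dropped (here `shore`), the averaged distribution is less peaked than the adapted model's alone.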
Inspired by domain adaptation work in statistical machine translation on sub-sampling, Wang et al. (2017) augment the canonical neural translation model with a sentence embedding state, computed as the sum of all input word representations and then used as the initial state of the decoder. Using the centroids of all in-domain and all out-of-domain sentence embeddings, respectively, this sentence embedding allows them to distinguish between in-domain and out-of-domain sentences: out-of-domain sentences that are closer to the in-domain centroid are included in the training data. Chen et al. (2017) combine the idea of sub-sampling with sentence weighting. They build an in-domain vs. out-of-domain classifier for sentence pairs in the training data, and then use its prediction score to reduce the learning rate for sentence pairs that are out of domain.
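The centroid-based sub-sampling of Wang et al. (2017) and the learning-rate weighting of Chen et al. (2017) can be sketched as follows. As assumptions for illustration: the word representations come from a small hand-made table rather than from a trained encoder, a sentence is kept if it is closer (in squared Euclidean distance) to the in-domain centroid than to the out-of-domain centroid, and the classifier's in-domain probability simply scales the learning rate. All function names are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]
Sentence = List[str]


def sentence_embedding(words: Sequence[str], emb: Dict[str, Vector]) -> Vector:
    """Sum of the word representations of a sentence; unknown words contribute zero."""
    dim = len(next(iter(emb.values())))
    total = [0.0] * dim
    for w in words:
        for i, v in enumerate(emb.get(w, [0.0] * dim)):
            total[i] += v
    return total


def centroid(vectors: Sequence[Vector]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_out_of_domain(out_sents: List[Sentence], in_sents: List[Sentence],
                         emb: Dict[str, Vector]) -> List[Sentence]:
    """Keep out-of-domain sentences whose embedding is nearer the in-domain centroid."""
    in_c = centroid([sentence_embedding(s, emb) for s in in_sents])
    out_embs = [sentence_embedding(s, emb) for s in out_sents]
    out_c = centroid(out_embs)
    return [s for s, e in zip(out_sents, out_embs) if sq_dist(e, in_c) < sq_dist(e, out_c)]


def weighted_learning_rate(base_lr: float, p_in_domain: float) -> float:
    """Scale the step size by the classifier's in-domain probability (assumed scheme)."""
    return base_lr * p_in_domain


# Hypothetical 2-d word representations: dimension 0 ~ "medical", dimension 1 ~ "finance".
emb = {"patient": [1.0, 0.0], "dose": [0.9, 0.1], "stock": [0.0, 1.0],
       "market": [0.1, 0.9], "the": [0.5, 0.5]}
in_sents = [["the", "patient"], ["the", "dose"]]
out_sents = [["the", "stock", "market"], ["the", "patient", "dose"], ["market", "stock"]]

selected = select_out_of_domain(out_sents, in_sents, emb)
print(selected)  # [['the', 'patient', 'dose']]
```

Note that a plain sum grows with sentence length, so longer sentences drift away from both centroids; the sketch keeps the sum because that is the composition described above.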
Farajian et al. (2017) show that traditional statistical machine translation outperforms neural machine translation when general-purpose systems are trained on a collection of data and then tested on niche domains. Their adaptation technique allows neural machine translation to catch up.
A multi-domain model may be trained and informed at run time about the domain of the input sentence. Kobus et al. (2016) carry over to the domain adaptation problem an idea initially proposed by Sennrich et al. (2016) for controlling register: augmenting input sentences with a politeness feature token. They add a domain token to each training and test sentence.
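Tagging the data with a domain token is a pure preprocessing step, so it can be sketched without touching the model. As with the politeness token, the tag is assumed to go on the source side; the `<medical>`-style tag format and the function names are hypothetical, not the format used by Kobus et al. (2016).

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def add_domain_token(source: str, domain: str) -> str:
    """Prepend a pseudo-word that names the domain (hypothetical tag format)."""
    return f"<{domain}> {source}"


def tag_corpus(pairs: List[Pair], domain: str) -> List[Pair]:
    """Tag the source side of every training pair; targets stay unchanged."""
    return [(add_domain_token(src, domain), tgt) for src, tgt in pairs]


tagged_train = tag_corpus([("Nehmen Sie zwei Tabletten .", "Take two tablets .")], "medical")
test_input = add_domain_token("Der Kurs ist gestiegen .", "finance")
print(tagged_train[0][0])  # <medical> Nehmen Sie zwei Tabletten .
print(test_input)          # <finance> Der Kurs ist gestiegen .
```

At run time the same function tags each input with its known domain, which is how the multi-domain model is informed about it. In practice the tag should presumably be kept as a single vocabulary item rather than being split by subword segmentation.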
If the data contains sentences from multiple domains but the composition is unknown, then automatically detecting different domains (then typically called topics) with methods such as LDA is an option. Zhang et al. (2016) apply such clustering and then compute a topic distribution vector for each word. It is used in addition to the word embedding to inform the encoder and decoder in an otherwise canonical neural translation model. Instead of word-level topic vectors, Chen et al. (2016) encode the given domain membership of each sentence as an additional input vector to the conditioning context of the word prediction layer.
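The two ways of feeding domain information into the model can be sketched as vector concatenation. For the topic-informed variant, the sketch assumes that a word's topic vector is obtained by normalizing its counts across the topics of a trained LDA model and that it is appended to the word embedding; for the domain-conditioned variant, it assumes a one-hot domain vector appended to the context vector. These choices, the numbers, and the function names are illustrative assumptions, not the exact formulations of Zhang et al. (2016) or Chen et al. (2016).

```python
from typing import Dict, List, Sequence

Vector = List[float]


def word_topic_vector(word: str, topic_word_counts: Sequence[Dict[str, float]]) -> Vector:
    """p(topic | word) from per-topic word counts (e.g., read off a trained LDA model)."""
    counts = [topic.get(word, 0.0) for topic in topic_word_counts]
    total = sum(counts)
    if total == 0:
        # Word never assigned to any topic: fall back to a uniform distribution.
        return [1.0 / len(counts)] * len(counts)
    return [c / total for c in counts]


def topic_informed_input(word: str, emb: Dict[str, Vector],
                         topic_word_counts: Sequence[Dict[str, float]]) -> Vector:
    """Encoder/decoder input: the word embedding followed by the word's topic vector."""
    return emb[word] + word_topic_vector(word, topic_word_counts)


def domain_conditioned_context(context: Vector, domain: str, domains: List[str]) -> Vector:
    """Conditioning context for word prediction, extended by a one-hot domain vector."""
    return context + [1.0 if d == domain else 0.0 for d in domains]


topic_word_counts = [{"cell": 8.0, "bank": 2.0}, {"bank": 6.0, "loan": 4.0}]
emb = {"bank": [0.1, -0.3]}
print(topic_informed_input("bank", emb, topic_word_counts))  # [0.1, -0.3, 0.25, 0.75]
print(domain_conditioned_context([0.5, 0.5], "legal", ["medical", "legal", "news"]))
# [0.5, 0.5, 0.0, 1.0, 0.0]
```

The difference between the two is where the extra vector enters: per input word in the topic-informed variant, once per sentence at the prediction layer in the domain-conditioned one.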