Neural machine translation models operate on high-dimensional representations at every stage of processing. Their abilities and failures are hard to determine from their millions of parameters. To better understand the behavior of neural machine translation models, researchers have compared their performance to phrase-based systems, explored the linguistic abilities of the models, and developed methods to visualize their processing.
Analysis And Visualization is the main subject of 16 publications.
Detailed quality assessments: Bentivogli et al. (2016) considered different linguistic categories when comparing the performance of neural vs. statistical machine translation systems for English-German. Toral and Sánchez-Cartagena (2017) compared different broad aspects such as fluency and reordering for nine language directions. Sennrich (2017) developed an automatic method to detect specific morphosyntactic errors. First, a test set is created by taking sentence pairs and modifying the target sentence to exhibit specific types of errors, such as wrong gender of determiners, wrong particles for verbs, or wrong transliteration. Then a neural translation model is evaluated by how often it scores the correct translation higher than the faulty translations. The paper compares byte-pair encoding against character-based models for rare and unknown words.
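The contrastive evaluation described above can be sketched as follows. This is a minimal illustration, not the implementation from Sennrich (2017): the names `ContrastiveExample`, `contrastive_accuracy`, and `toy_score` are hypothetical, the scores are made up, and counting an example as correct only when the reference outscores every faulty variant is an assumption about how ties and multiple variants are handled.

```python
from typing import Callable, NamedTuple, Sequence


class ContrastiveExample(NamedTuple):
    source: str                   # source sentence
    reference: str                # correct translation
    contrastive: Sequence[str]    # variants with one introduced error each


def contrastive_accuracy(
    examples: Sequence[ContrastiveExample],
    score: Callable[[str, str], float],
) -> float:
    """Fraction of examples where the reference outscores all faulty variants.

    `score(source, target)` stands in for a model's score of `target`
    given `source` (for instance a log-probability); higher is better.
    """
    if not examples:
        return 0.0
    correct = 0
    for ex in examples:
        ref_score = score(ex.source, ex.reference)
        if all(ref_score > score(ex.source, bad) for bad in ex.contrastive):
            correct += 1
    return correct / len(examples)


# Toy stand-in for a translation model: a lookup table of invented scores.
toy_scores = {
    "die Katze schläft": -1.2,
    "der Katze schläft": -3.5,   # wrong determiner
    "er gibt auf": -2.0,
    "er gibt an": -1.5,          # wrong verb particle, scored too high
}


def toy_score(source: str, target: str) -> float:
    return toy_scores[target]


examples = [
    ContrastiveExample("the cat sleeps", "die Katze schläft", ["der Katze schläft"]),
    ContrastiveExample("he gives up", "er gibt auf", ["er gibt an"]),
]

accuracy = contrastive_accuracy(examples, toy_score)
```

With these invented scores the toy model prefers the correct determiner but not the correct verb particle, so it gets one of the two examples right. Reporting the accuracy separately per error type would show which phenomena a model handles poorly.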
Role of individual neurons: Shi et al. (2016) correlated activation values of specific nodes in the state of a simple LSTM encoder-decoder translation model (without attention) with the length of the output and discovered nodes that count the number of words to ensure proper output length.
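One simple way to look for such length-tracking nodes is to correlate each unit's activation with the output length across many sentences. The sketch below uses Pearson correlation on invented data; the function names and the numbers are hypothetical, and the analysis in Shi et al. (2016) may differ in detail.

```python
import math
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def length_correlations(
    states: Sequence[Sequence[float]], lengths: Sequence[float]
) -> List[float]:
    """Correlate every hidden unit with output length.

    `states[i][u]` is the activation of unit `u` for sentence `i`
    (e.g. taken from the final encoder state); `lengths[i]` is the
    length of the corresponding output sentence.
    """
    n_units = len(states[0])
    return [pearson([s[u] for s in states], lengths) for u in range(n_units)]


# Invented activations for four sentences and two hidden units.
states = [[0.1, 0.5], [0.2, 0.5], [0.3, 0.4], [0.4, 0.6]]
lengths = [2, 4, 6, 8]

correlations = length_correlations(states, lengths)
# Index of the unit whose activation tracks length most strongly.
best_unit = max(range(len(correlations)), key=lambda u: abs(correlations[u]))
```

In this toy data unit 0 grows linearly with the output length, so it is selected as the strongest candidate. A high correlation only flags a unit for inspection; it does not show that the model relies on that unit.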
Predicting properties from internal representations: To probe intermediate representations, such as encoder and decoder states, one strategy is to use them as input to a classifier that predicts specific, mostly linguistic, properties. Belinkov et al. (2017) predict the part of speech and morphological features of words linked to encoder and decoder states, showing better performance for character-based models but not much difference for deeper layers.
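The probing setup can be illustrated with a small sketch: the translation model's states are treated as fixed feature vectors, and a separate classifier is trained to predict a label such as the part of speech. A nearest-centroid classifier is used here only to keep the example short; it is a stand-in, not the classifier used by Belinkov et al. (2017), and the vectors and function names are invented.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def train_centroid_probe(
    vectors: Sequence[Sequence[float]], labels: Sequence[str]
) -> Dict[str, List[float]]:
    """Compute one mean vector (centroid) per label from frozen states."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for vec, label in zip(vectors, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        for i, value in enumerate(vec):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [total / counts[label] for total in vec_sum]
        for label, vec_sum in sums.items()
    }


def predict(centroids: Dict[str, List[float]], vec: Sequence[float]) -> str:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], vec))


def probe_accuracy(
    centroids: Dict[str, List[float]],
    vectors: Sequence[Sequence[float]],
    labels: Sequence[str],
) -> float:
    hits = sum(predict(centroids, v) == y for v, y in zip(vectors, labels))
    return hits / len(labels)


# Invented two-dimensional "encoder states" with part-of-speech labels.
train_states = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_labels = ["NOUN", "NOUN", "VERB", "VERB"]
test_states = [[0.8, 0.2], [0.2, 0.7]]
test_labels = ["NOUN", "VERB"]

probe = train_centroid_probe(train_states, train_labels)
accuracy = probe_accuracy(probe, test_states, test_labels)
```

Because only the probe is trained, its accuracy on held-out words serves as a rough indicator of how much information about the property is recoverable from the representation. Comparing this accuracy across layers or model variants gives the kind of contrast reported above.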