Computer Aided Translation

While statistical machine translation systems often provide useful or good-enough translations, demand for high-quality, publishable translations still requires human translators. Tools that improve translator productivity can be built using statistical machine translation methods.

Computer Aided Translation is the main subject of 239 publications. 55 are discussed here.



The proper roles of man (slow and expensive) and machine (producing low-quality translations) in a collaboration to produce high-quality translations efficiently are still up for debate. It has long been argued that machines should only play an assisting role while professional translators craft their translations (Kay, 1980; Kay, 1997; Kay, 1997b). The tide has turned towards placing man as post-editor of the output of machine translation systems, but this argument is not over yet.
Post-editing - Several studies show increases in productivity from post-editing machine translation output instead of translating unassisted. Skadiņš et al. (2011) show a 30 percent increase for English-Latvian translation with a slight but acceptable degradation in quality. Guerberof (2009) compares the benefits of translation memory matches and machine translation for a subset of sentences that lie within the 80-90% fuzzy match range, showing higher productivity gains and better quality (according to the LISA standard) when using machine translation. Plitt and Masselot (2010) compare post-editing machine translation against unassisted translation in a custom web-based tool for a number of language pairs on information technology documents, showing productivity gains of up to 80%. Federico et al. (2012) assess the benefit of offering machine translation output in addition to translation memory matches (marked as such) in a realistic work environment for translators working on legal and information technology documents. They observe productivity gains of 20-50%, roughly independent of the original translator speed and segment length, but with different results for different language pairs and domains. Vazquez et al. (2013) find higher productivity for post-editing machine translation than for using translation memory matches in a fuzzy match range of 80-95%. Garcia (2011) also measures higher productivity when bilingual native-Chinese students translate between English and Chinese in both directions. Pouliquen et al. (2011) show for a patent translation task that non-professional post-editors may be able to create high-quality translations, comparable to a professional translation agency. In an experiment on translating English into three languages with a very restricted web interface used by professional translators, Green et al. (2013) carry out a more sophisticated statistical analysis using ANOVA and show that post-editing leads to better and faster translations.
Bogaert and Sutter (2013) show productivity increases in the range of 20% to 134% for 10 translators in an English-Dutch task on financial European Commission publications, with slightly higher quality. Läubli et al. (2013) stress the importance of testing post-editing machine translation under realistic working conditions, and find lower productivity increases (15–20%) than reported elsewhere, on a German-French translation task. Karamanis et al. (2011) investigate the impact of introducing post-editing machine translation on the work practices of professional translators. They point out that trust plays a great role when relying on previously translated segments found in translation memories (e.g., translators prefer work from close colleagues over that of freelancers), and that such trust is lacking when assessing output from machine translation systems.
Analysis of post-editing effort - Koponen (2012) examines the relationship between human assessment of post-editing effort and objective measures such as post-editing time and number of edit operations, finding for instance that segments that require a lot of reordering are perceived as being more difficult, and that long sentences are considered harder, even if only a few words changed. Koponen (2013) finds relatively little difference between post-editors when given a choice of output from multiple machine translation systems, albeit in a controlled language setting.
Humans aiding computer - By giving human translators access to the inner workings of a machine translation system, they may fix errors at various stages, such as by changing the source sentence or its linguistic analysis (Varga and Yokoyama, 2007). Conversely, the input to a translation system may be automatically examined for phrases that are difficult to translate (Mohit and Hwa, 2007).
Interactive machine translation - The TransType project (Langlais et al., 2000; Foster et al., 2002; Bender et al., 2005) developed an interactive translation tool which predicts the most appropriate extension of a partial translation by quick re-translation based on user input (Tomás and Casacuberta, 2006). Word graphs allow for quicker re-translations (Och et al., 2003; Civera et al., 2004; Civera et al., 2006) and confidence metrics indicate how much should be presented to the user as a reliable prediction (Ueffing and Ney, 2005; Ueffing and Ney, 2007). Macklovitch (2004) describes how users interacted with the TransType tool. Huang et al. (2015) use this principle to reduce the typing effort for English-Chinese translation by predicting each Chinese character from the first Latin character of its Pinyin transcription typed by the user.
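Under simplifying assumptions, the prediction step can be sketched as picking, from a set of pre-computed candidate translations, the best-scoring one that is compatible with the translator's typed prefix. Real systems such as TransType re-decode over a word graph rather than filter a fixed list; the function below is a hypothetical illustration only:

```python
def predict_extension(hypotheses, prefix):
    """Return the completion of the best-scoring hypothesis that starts
    with the translator's typed prefix (hypothetical helper; actual
    interactive MT systems re-translate constrained on the prefix).

    hypotheses: list of (score, translation) pairs, higher score = better.
    """
    compatible = [(s, h) for s, h in hypotheses if h.startswith(prefix)]
    if not compatible:
        return ""  # no hypothesis matches; a real system would re-decode
    _, best = max(compatible, key=lambda x: x[0])
    return best[len(prefix):]

hyps = [(-2.1, "the house is small"),
        (-2.5, "the house is little"),
        (-3.0, "the home is small")]
print(predict_extension(hyps, "the house is l"))  # -> "ittle"
```

As the translator accepts or overrides characters, the prefix grows and the prediction is recomputed, which is why fast re-translation (e.g., over word graphs) matters in practice.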
Translation options - In addition to interactive predictions, human translators may be aided by the display of word and phrase translations (Koehn and Haddow, 2009; Koehn, 2009). Showing multiple such translation options may even allow monolingual users to translate from unknown source languages (Koehn, 2010).
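Collecting such translation options can be sketched as enumerating all source spans and looking each up in a phrase table; the table format (source phrase mapped to scored translations) and the function name are assumptions for illustration, not the actual interface of the cited systems:

```python
def translation_options(phrase_table, source_tokens, max_len=3, top_k=2):
    """Collect the top-k translation options for every source span of
    up to max_len words. phrase_table is an assumed format: a dict
    mapping a source phrase to a list of (probability, translation).
    """
    options = {}
    n = len(source_tokens)
    for i in range(n):
        for j in range(i + 1, min(i + max_len, n) + 1):
            phrase = " ".join(source_tokens[i:j])
            if phrase in phrase_table:
                # rank by probability, keep the k best translations
                ranked = sorted(phrase_table[phrase], reverse=True)[:top_k]
                options[(i, j)] = [t for _, t in ranked]
    return options

table = {"das haus": [(0.7, "the house"), (0.2, "the home")],
         "haus": [(0.8, "house")]}
print(translation_options(table, ["das", "haus"]))
# -> {(0, 2): ['the house', 'the home'], (1, 2): ['house']}
```

A tool would then render these span-indexed options alongside the source sentence for the translator (or monolingual user) to choose from.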
Other assistance - Various types of information may be beneficial for a user of a translation tool, such as suggested translations for idioms, unknown words, and names (Abekawa and Kageura, 2007). Large word-aligned parallel corpora, such as the billion-word French-English corpus, may be superior to traditional terminology databases (Barrière and Isabelle, 2011).
Translation memory - Translation memories are a widely accepted tool for translators. When translating a new sentence, these tools retrieve the most similar source sentence and its translation from what machine translation researchers would call a parallel corpus. Esplá et al. (2011) show that highlighting the mismatches detected by word alignment methods is helpful. Furthermore, a mismatch can be corrected by letting a statistical machine translation model translate it. The fuzzy match from the translation memory may be encoded as a large hierarchical rule (Koehn and Senellart, 2010) or handled by other methods (Dandapat et al., 2011). The alignment between the input sentence and the source part of the translation memory sentence pair may be aided by syntactic structures (Zhechev and Genabith, 2010; Zhechev and Genabith, 2010b).
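Retrieval from a translation memory is commonly based on a fuzzy match score, typically one minus the word-level edit distance divided by the length of the longer sentence. A minimal sketch, assuming the memory is simply a list of sentence pairs:

```python
def edit_distance(a, b):
    """Token-level Levenshtein distance via dynamic programming."""
    d = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, d[0] = d[0], i
        for j, y in enumerate(b, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (x != y))
    return d[len(b)]

def fuzzy_match(memory, query, threshold=0.8):
    """Return (score, source, target) of the best translation-memory
    entry, or None if nothing reaches the threshold. memory is an
    assumed format: a list of (source, target) string pairs.
    """
    q = query.split()
    best = None
    for src, tgt in memory:
        s = src.split()
        score = 1 - edit_distance(q, s) / max(len(q), len(s))
        if score >= threshold and (best is None or score > best[0]):
            best = (score, src, tgt)
    return best
```

The 80-90% fuzzy match bands discussed in the post-editing studies above correspond to thresholds on exactly this kind of score, though commercial tools differ in tokenization and weighting details.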
Bilingual concordancer - Professional translators may want to search translation memories not only for fuzzy matches of full source sentences, but also for exact matches of words and phrases. The widely used TransSearch system (Isabelle et al., 1993) returns full sentence pairs, allowing the professional translator to examine which translation is more customary given the sentence context. Translators mostly search for 2-3 word terms (Simard and Macklovitch, 2005), especially highly polysemous adverbials and prepositional phrases (Macklovitch et al., 2008). Translation spotting is the technique of highlighting the search term and its translation (Wu et al., 2003; Callison-Burch et al., 2004). Translation spotting may be improved by filtering, merging of variants, and pseudo-relevance feedback (Bourdaillet et al., 2010). Bai et al. (2012) present a normalized correlation method for translation spotting, which overcomes weaknesses of both word alignment-based and association-based translation spotting. Pastor and Alcina (2009) argue for the training of translators in search techniques in monolingual and bilingual corpora.
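In its simplest alignment-based form, translation spotting projects the queried source span through the word-alignment links of a retrieved sentence pair; the data format below is an assumption for illustration:

```python
def spot_translation(alignment, source_span):
    """Return the sorted target word indices aligned to a source span
    (half-open [i, j)). This is a minimal alignment-based spotting
    sketch; real systems add filtering, merging of variants, and
    pseudo-relevance feedback (cf. Bourdaillet et al., 2010).

    alignment: set of (source index, target index) word-alignment links.
    """
    i, j = source_span
    return sorted({t for s, t in alignment if i <= s < j})

# "he has a red car" / "il a une voiture rouge"
links = {(0, 0), (1, 1), (2, 2), (3, 4), (4, 3)}
print(spot_translation(links, (3, 5)))  # -> [3, 4], i.e. "voiture rouge"
```

A concordancer would highlight these target indices in the displayed sentence pair; noisy or missing alignment links are exactly why the cited refinements are needed.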
Usage analysis - Macklovitch et al. (2005) present a tool that visualizes user interactions. Human post-editing data may be mined to improve the performance of machine translation, as shown for transfer-based systems (Llitjos et al., 2007). Machine translation may also be used in interactive tutoring tools for foreign language learners (Wang and Seneff, 2007). Macklovitch (1994) shows how alignment methods may be used to spot errors in human translations.
Automatic reviewing - Statistical machine translation methods may also be used to detect errors in professional translations. Isabelle et al. (1993) already suggest checking for added or missing content and consistent use of terminology, and present work on spotting translations of deceptive cognates, word pairs that have similar surface forms but different meanings.
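A rough sketch of one such consistency check, flagging glossary terms that occur in the source but whose approved translations are all missing from the target; the glossary format and the naive substring matching are simplifying assumptions, not the method of the cited work:

```python
def check_terminology(glossary, source, target):
    """Return glossary terms found in the source whose approved
    translations do not appear in the target. glossary is an assumed
    format: a dict mapping a term to a list of approved translations.
    Substring matching is a deliberate simplification; a real checker
    would tokenize, lemmatize, and use word alignments.
    """
    src, tgt = source.lower(), target.lower()
    return [term for term, translations in glossary.items()
            if term in src and not any(t in tgt for t in translations)]

glossary = {"invoice": ["facture"], "contract": ["contrat"]}
print(check_terminology(glossary,
                        "Sign the contract and the invoice.",
                        "Veuillez signer le contrat."))  # -> ['invoice']
```

The same alignment-based machinery used for spotting can supply evidence here: a source term with no aligned target words is a candidate omission.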




New Publications

  • Forcada et al. (2017)
  • Schaeffer and Carl (2017)
  • Yamamoto (2017)
  • Zapata et al. (2017)
  • Peris et al. (2017)
  • Bulté et al. (2018)
  • Lam et al. (2018)
  • Ortega et al. (2018)
  • Peris and Casacuberta (2018)
  • Grangier and Auli (2018)
  • Aranberri and Pascual (2018)
  • Tezcan and Vandeghinste (2011)
  • Čulo (2014)
  • Azadi and Khadivi (2015)
  • Du et al. (2015)
  • Escartín and Arcedillo (2015)
  • Wäschle and Riezler (2015)
  • Forcada and Sánchez-Martínez (2015)
  • Gupta et al. (2015)
  • Mitchell (2015)
  • Moorkens and O'Brien (2015)
  • Vanallemeersch and Vandeghinste (2015)
  • Vela and Genabith (2015)
  • Hofmann (2015)
  • Hokamp and Liu (2015)
  • Ilao et al. (2015)
  • Moorkens et al. (2015)
  • Yamada (2015)
  • Escartín and Arcedillo (2015)
  • Xinhui (2015)
  • Yang et al. (2016)
  • Baisa et al. (2015)
  • Chatzitheodoroou (2015)
  • Nayek et al. (2015)