Word Alignment with Linguistic Annotation
As with statistical machine translation models, most word alignment methods view sentences simply as strings of unique tokens, but linguistic annotation may be exploited to improve word alignment quality.
Word Alignment With Linguistic Annotation is the main subject of 25 publications.
Word alignment methods have been extended to exploit part-of-speech information (Chang and Chen, 1994; Tiedemann, 2003) in constraint methods (Tiedemann, 2004), translation divergences (Dorr et al., 2002), compositionality constraints (Simard and Langlais, 2003), and syntactic constraints (Cherry and Lin, 2003; Lin and Cherry, 2003; Zhao and Vogel, 2003).
Fraser and Marcu (2005) improve word alignments by stemming words in the input and output languages, thus generalizing over morphological variants. Syntactic constraints may derive from formal criteria of obtaining parallel tree structures, such as the ITG constraint, or from syntactic relationships between words on either side (Cherry and Lin, 2006).
Linguistic constraints may be modeled as priors in the generative model (Deng and Gao, 2007). A number of hand-crafted linguistic rules have also been proposed to improve word alignments obtained with traditional statistical methods.
Riesa et al. (2011) use syntactic features in a discriminative word aligner and stress that guidance from the parse structure makes search during training more manageable.
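To illustrate why stemming both sides of a parallel corpus can help alignment, the following minimal sketch shows how collapsing morphological variants pools co-occurrence evidence that would otherwise be split across surface forms. It is not the procedure of any cited paper: the function names, the fixed-length prefix truncation used as a stand-in for a real stemmer, and the toy Spanish-English corpus are all assumptions made for illustration.

```python
from collections import Counter

def crude_stem(token, prefix_len=4):
    """Hypothetical stand-in for a real stemmer: lowercase, keep a fixed-length prefix."""
    return token.lower()[:prefix_len]

def stem_corpus(sentences, prefix_len=4):
    """Stem every token while keeping sentence and token positions unchanged."""
    return [[crude_stem(tok, prefix_len) for tok in sent] for sent in sentences]

def cooccurrence_counts(src_sents, tgt_sents):
    """Count, for each (source, target) word pair, the sentence pairs containing both."""
    counts = Counter()
    for src, tgt in zip(src_sents, tgt_sents):
        for s in set(src):
            for t in set(tgt):
                counts[(s, t)] += 1
    return counts

# Toy parallel corpus (assumed data): singular and plural forms of the same word.
src = [["la", "casa"], ["las", "casas"]]
tgt = [["the", "house"], ["the", "houses"]]

raw_counts = cooccurrence_counts(src, tgt)
stemmed_counts = cooccurrence_counts(stem_corpus(src), stem_corpus(tgt))

# Without stemming, the evidence is split between two unrelated pairs.
print(raw_counts[("casa", "house")], raw_counts[("casas", "houses")])
# With stemming, both sentence pairs support the same (stem, stem) pair.
print(stemmed_counts[("casa", "hous")])
```

Because stemming leaves token positions untouched, alignment links found on the stemmed text can be read back directly onto the original words. In practice a language-specific stemmer or morphological analyzer would replace the prefix truncation used here.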
- Huang and Yates (2014)
- Burlot and Yvon (2015)
- Kondo et al. (2013)
- Nakazawa and Kurohashi (2008)
- Gispert et al. (2004)
- Søgaard and Kuhn (2009)
- Søgaard (2009)
- Luong and Kan (2010)
- Lee et al. (2010)
- Huang and Choi (2000)
- Ozdowska (2005)
- Kondrak (2005)