Word Alignment Based on Co-Occurrence
While most current work on word alignment is model-based, more heuristic approaches are based on co-occurrence statistics.
Word alignment based on co-occurrence is the main subject of 20 publications; 16 are discussed here.
Early work in word alignment focused on co-occurrence statistics to find evidence for word associations (Kaji and Aizono, 1996). These methods may find evidence for the alignment of a word to multiple translations, a problem called indirect association, which may be overcome by enforcing one-to-one alignments (Melamed, 1996). Kumano and Hirakawa (1994) augment this method with an existing bilingual dictionary. Sato and Nakanishi (1998) use a maximum entropy model for word associations. Ker and Chang (1996) group words into sense classes from a thesaurus to improve word alignment accuracy.
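As a minimal sketch of the co-occurrence idea, the following computes an association score for every source/target word pair from sentence-level co-occurrence counts. The Dice coefficient is used here as one common association measure; the corpus and function name are illustrative, not taken from any of the cited papers.

```python
from collections import Counter
from itertools import product

def dice_associations(sentence_pairs):
    """Score (source, target) word pairs by the Dice coefficient:
    2 * joint count / (source count + target count)."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        src_words, tgt_words = set(src), set(tgt)
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        # count every word pair that co-occurs in this sentence pair
        pair_count.update(product(src_words, tgt_words))
    return {
        (s, t): 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in pair_count.items()
    }

# toy parallel corpus (English-German)
corpus = [
    (["the", "house"], ["das", "haus"]),
    (["the", "car"], ["das", "auto"]),
]
scores = dice_associations(corpus)
```

Note that "house" also receives a nonzero score with "das" because the two co-occur, the indirect association problem mentioned above, which one-to-one linking constraints are meant to suppress.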
Co-occurrence counts may also be used for phrase alignment, although this typically requires more efficient data structures for storing all phrases (Cromieres, 2006). Chatterjee and Agrawal (2006) extend a recency vector approach (Fung and McKeown, 1994) with additional constraints. Lardilleux and Lepage (2008) iteratively match the longest common subsequences from sentence pairs and align the remainder.
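A simplified sketch of the subsequence-matching idea: given two sentence pairs, match the longest common subsequence on each language side and pair up the leftover words. This is only an illustration of one matching step under that assumption, not the cited algorithm; `split_pair` is a hypothetical helper, and repeated words are handled naively.

```python
def lcs(a, b):
    # dynamic-programming longest common subsequence over token lists
    m, n = len(a), len(b)
    dp = [[[] for _ in range(n + 1)] for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + [a[i]]
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j], key=len)
    return dp[m][n]

def split_pair(pair_a, pair_b):
    """Match the common subsequence of two sentence pairs on each side,
    and align the remainders of pair_a with each other (naive sketch)."""
    common_src = lcs(pair_a[0], pair_b[0])
    common_tgt = lcs(pair_a[1], pair_b[1])
    rest_src = [w for w in pair_a[0] if w not in common_src]
    rest_tgt = [w for w in pair_a[1] if w not in common_tgt]
    return (common_src, common_tgt), (rest_src, rest_tgt)

pair_a = (["the", "house", "is", "red"], ["das", "haus", "ist", "rot"])
pair_b = (["the", "car", "is", "red"], ["das", "auto", "ist", "rot"])
common, rest = split_pair(pair_a, pair_b)
```

Here the two pairs share "the ... is red" / "das ... ist rot", so the remainders ("house", "haus") become alignment candidates.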
Heuristic word alignment methods may be extended into iterative algorithms, for instance the competitive linking algorithm (Melamed, 1995, 1996, 1997, 2000) or bilingual bracketing (Wu, 1997). Tufiş (2002) extends a simple co-occurrence method to align words.
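The core of competitive linking can be sketched as a greedy procedure: repeatedly take the highest-scoring word pair whose source and target words are both still unlinked. The scores below are made-up association values for illustration; the full algorithm as published iterates and re-estimates, which this sketch omits.

```python
def competitive_linking(scores):
    """Greedy one-to-one linking over association scores:
    accept pairs in descending score order, skipping any pair
    whose source or target word is already linked."""
    links, used_src, used_tgt = [], set(), set()
    for (s, t), score in sorted(scores.items(), key=lambda kv: -kv[1]):
        if s not in used_src and t not in used_tgt:
            links.append((s, t))
            used_src.add(s)
            used_tgt.add(t)
    return links

# hypothetical association scores, including an indirect association
scores = {("house", "haus"): 0.9, ("house", "das"): 0.4,
          ("the", "das"): 0.8, ("the", "haus"): 0.3}
links = competitive_linking(scores)
```

Because ("house", "haus") and ("the", "das") are linked first, the weaker indirect association ("house", "das") is blocked, which is exactly how one-to-one linking addresses the indirect association problem discussed earlier.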
Monolingual collocation may also be helpful for word alignment: Liu et al. (2010) use collocation statistics to help group words into cepts.
Further related publications:
- Bai et al. (2009)
- Moore (2005)
- Tiedemann (2009)
- Melamed (1995)