String to Tree Models
The motivation for using linguistic syntax trees on the target side is to support grammatically coherent output and to ground restructuring in syntactic properties.
String To Tree is the main subject of 18 publications.
String-to-tree models differ in the type of rules and linguistic annotation they use. Galley et al. (2004) build translation rules that map input phrases to output tree fragments. Contextually richer rules and learning rule probabilities with the EM algorithm may lead to better performance (Galley et al., 2006). Adjusting the parse trees so that rules can be extracted for all lexical matches may also be important; this requires the introduction of additional nonterminal symbols (Marcu et al., 2006) or rules with multiple head nodes (Liu et al., 2007). Instead of using standard Penn Treebank labels for nonterminals, relabeling the constituents may lead to the acquisition of better rules (Huang and Knight, 2006). Syntactic structure prevents some phrase pairs from being learned as syntactic translation rules, which reduces coverage; this may be alleviated by adjusting the rule extraction algorithm (DeNeefe et al., 2007). DeNeefe et al. (2005) present an interactive tool to inspect the workings of such syntactic translation models.
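To make the idea of a rule that maps an input phrase to an output tree fragment concrete, here is a minimal sketch in Python. The class names (`Node`, `Rule`), the helper functions, and the toy French-to-English rule are illustrative assumptions, not the representation used in any of the cited systems. Integers stand for variable slots that are shared between the source side and the target fragment.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Node:
    """A target-side tree node. Children are subtrees, target words (str),
    or variable slots (int) that refer to a gap on the source side."""
    label: str
    children: List[Union["Node", str, int]]


@dataclass
class Rule:
    """Hypothetical string-to-tree rule: a source string with gaps
    paired with a target tree fragment."""
    source: List[Union[str, int]]
    target: Node


def substitute(node: Node, fillers: Dict[int, Node]) -> Node:
    """Return a copy of the fragment with each variable slot replaced
    by the subtree that was built for it."""
    new_children = []
    for child in node.children:
        if isinstance(child, int):
            new_children.append(fillers[child])
        elif isinstance(child, Node):
            new_children.append(substitute(child, fillers))
        else:
            new_children.append(child)
    return Node(node.label, new_children)


def bracket(node: Union[Node, str]) -> str:
    """Render a fully substituted tree in bracketed notation."""
    if isinstance(node, str):
        return node
    return "(" + node.label + " " + " ".join(bracket(c) for c in node.children) + ")"


# Toy rule: "ne x0 pas" -> (VP (AUX does) (RB not) x0)
rule = Rule(
    source=["ne", 0, "pas"],
    target=Node("VP", [Node("AUX", ["does"]), Node("RB", ["not"]), 0]),
)

# Assume the gap x0 has already been translated into a VB subtree.
result = substitute(rule.target, {0: Node("VB", ["go"])})
print(bracket(result))
```

The source side stays a flat string while the target side carries structure, which is what lets the decoder build a target parse tree while it translates.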
Syntax-augmented models (Zollmann et al., 2006) overcome the restriction that rule spans must match syntactic constituent boundaries by merging or otherwise adding constituent labels. Zollmann and Venugopal (2006) describe an efficient decoding algorithm for this approach.
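One way to picture how merged or added labels can cover spans that are not constituents is sketched below. The function `span_label`, the composite label forms (`A+B` for two adjacent constituents, `A/B` for an `A` missing a `B` on its right, `B\A` for an `A` missing a `B` on its left, and a fallback `X`), and the order in which they are tried are assumptions made for illustration; the cited papers should be consulted for the exact labeling scheme.

```python
def span_label(i, j, constituents, n):
    """Assign a label to the half-open target span [i, j).

    constituents maps (start, end) spans from the parse tree to labels;
    n is the sentence length in tokens.
    """
    # Exact constituent: use its label directly.
    if (i, j) in constituents:
        return constituents[(i, j)]
    # A+B: the span is exactly two adjacent constituents.
    for k in range(i + 1, j):
        if (i, k) in constituents and (k, j) in constituents:
            return constituents[(i, k)] + "+" + constituents[(k, j)]
    # A/B: a constituent A starts at i and lacks a constituent B on its right.
    for e in range(j + 1, n + 1):
        if (i, e) in constituents and (j, e) in constituents:
            return constituents[(i, e)] + "/" + constituents[(j, e)]
    # B\A: a constituent A ends at j and lacks a constituent B on its left.
    for s in range(0, i):
        if (s, j) in constituents and (s, i) in constituents:
            return constituents[(s, i)] + "\\" + constituents[(s, j)]
    # Nothing matched: fall back to a generic label.
    return "X"


# Toy parse of "the man saw the dog" (token positions 0..5).
parse = {
    (0, 1): "DT", (1, 2): "NN", (0, 2): "NP",
    (2, 3): "VBD",
    (3, 4): "DT", (4, 5): "NN", (3, 5): "NP",
    (2, 5): "VP", (0, 5): "S",
}

for span in [(0, 2), (0, 3), (0, 4), (1, 4)]:
    print(span, span_label(span[0], span[1], parse, 5))
```

In this toy parse, "the man saw" receives a merged label and "the man saw the" a slash label, so rules covering these spans can still be extracted even though neither span is a constituent.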
Almaghout et al. (2011) use simplified CCG tags that specify only context but not the resulting category as syntactic labels in a string-to-tree model.
When translating into morphologically rich languages, which exhibit more long-distance agreement, it may be better to encode morphological properties not in the grammar but in distinct agreement constraints that are checked at the appropriate level in the tree (Williams and Koehn, 2011).
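The idea of keeping agreement outside the grammar can be illustrated with a small unification check that runs when a tree node is completed. This is a simplified sketch: the flat feature dictionaries, the function names `unify` and `check_agreement`, and the German lexicon entries are hypothetical, and real word forms are usually ambiguous between several feature values, which this sketch ignores.

```python
def unify(a, b):
    """Unify two flat feature dicts.

    Returns the merged dict, or None if any shared feature clashes.
    """
    merged = dict(a)
    for key, value in b.items():
        if key in merged and merged[key] != value:
            return None
        merged[key] = value
    return merged


def check_agreement(feature_structs):
    """Unify the features of all words under one node (for example an NP).

    Returns the combined features, or None if the words disagree.
    """
    result = {}
    for fs in feature_structs:
        result = unify(result, fs)
        if result is None:
            return None
    return result


# Hypothetical, unambiguous lexicon entries for illustration only.
lexicon = {
    "der": {"gender": "masc", "number": "sg", "case": "nom"},
    "die": {"gender": "fem", "number": "sg", "case": "nom"},
    "Hund": {"gender": "masc", "number": "sg"},
}

good_np = check_agreement([lexicon["der"], lexicon["Hund"]])
bad_np = check_agreement([lexicon["die"], lexicon["Hund"]])
print(good_np)
print(bad_np)
```

Because the check is attached to a node rather than baked into nonterminal labels, the grammar does not need a separate label for every combination of case, number, and gender.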
- Braune et al. (2015)
- Sennrich and Haddow (2015)
- Seemann et al. (2015)
- Hassan et al. (2007)
- Weese et al. (2012)
- Williams and Koehn (2012)
- DeNeefe et al. (2010)