Paraphrases play an interesting role in machine translation. Since translators often have a choice between meaning-equivalent wordings, there may be gains from explicitly modelling paraphrasing in machine translation.
Paraphrasing is the main subject of 43 publications.
Early work on automatically extracting paraphrases looked at multiple translations of the same foreign texts (Barzilay and McKeown, 2001). Paraphrases may also be extracted from parallel corpora (Bannard and Callison-Burch, 2005), which are available in larger quantities, or even from monolingual non-parallel corpora (Marton et al., 2009), which are available in much larger quantities. Ganitkevitch et al. (2013) exploit parallel data by treating as paraphrases those terms and phrases that map to the same foreign phrase, and intersecting these across several languages to build a large paraphrase database (PPDB). Pavlick et al. (2015) improved this database with the use of word embeddings and further enriched it with additional information about the relationship between paraphrased terms.
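The idea of finding paraphrases through a shared foreign phrase can be illustrated with a small sketch. It scores a candidate paraphrase e2 of a phrase e1 by summing p(e2|f) * p(f|e1) over the foreign phrases f, which is the pivot formulation of Bannard and Callison-Burch (2005). The function name `pivot_paraphrases` and the toy probability tables are assumptions made up for illustration; they are not taken from any of the cited systems.

```python
from collections import defaultdict


def pivot_paraphrases(p_f_given_e, p_e_given_f, phrase):
    """Rank paraphrase candidates for `phrase` by pivoting through foreign phrases.

    p_f_given_e: dict mapping an English phrase to {foreign phrase: p(f|e)}
    p_e_given_f: dict mapping a foreign phrase to {English phrase: p(e|f)}
    """
    scores = defaultdict(float)
    for foreign, p_f in p_f_given_e.get(phrase, {}).items():
        for candidate, p_e in p_e_given_f.get(foreign, {}).items():
            if candidate != phrase:  # a phrase is not its own paraphrase
                scores[candidate] += p_f * p_e
    # Highest-scoring candidates first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy phrase tables (hypothetical values, for illustration only).
p_f_given_e = {
    "under control": {"unter kontrolle": 0.8, "im griff": 0.2},
}
p_e_given_f = {
    "unter kontrolle": {"under control": 0.7, "in check": 0.3},
    "im griff": {"under control": 0.5, "in hand": 0.5},
}

ranked = pivot_paraphrases(p_f_given_e, p_e_given_f, "under control")
for candidate, score in ranked:
    print(f"{candidate}\t{score:.2f}")
```

In this toy setting "in check" ranks above "in hand" because it is reached through the more probable pivot phrase. A real system would read the two tables from a phrase table trained on a parallel corpus and would typically prune low-scoring candidates.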
For more accuracy, dependency structure may be exploited (Hwang et al., 2008). Nelken and Shieber (2006) present methods based on word overlap and tf/idf for the extraction of paraphrased sentences from a monolingual corpus.
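As a rough illustration of tf/idf-based sentence matching, the sketch below represents each sentence as a tf/idf-weighted bag of words and compares sentences by cosine similarity. It is a generic textbook formulation, not the specific method of Nelken and Shieber (2006); the idf variant (`log(N/df) + 1`), the whitespace tokenization, and the example sentences are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Turn each sentence into a sparse {token: tf * idf} vector."""
    tokenized = [sentence.lower().split() for sentence in sentences]
    n_sentences = len(tokenized)
    # Document frequency: in how many sentences does each token occur?
    df = Counter(token for tokens in tokenized for token in set(tokens))
    # One common idf variant; the +1 keeps tokens seen everywhere from vanishing.
    idf = {token: math.log(n_sentences / count) + 1.0 for token, count in df.items()}
    return [
        {token: tf * idf[token] for token, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(token, 0.0) for token, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


sentences = [
    "the cat sat on the mat",
    "a cat was sitting on the mat",
    "stock prices fell sharply today",
]
vectors = tfidf_vectors(sentences)
sim_paraphrase = cosine(vectors[0], vectors[1])
sim_unrelated = cosine(vectors[0], vectors[2])
print(sim_paraphrase > sim_unrelated)
```

Sentence pairs whose similarity exceeds some threshold would then be kept as paraphrase candidates; choosing that threshold is an empirical question that this sketch does not address.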
Paraphrasing phrase translation entries (Callison-Burch et al., 2006; Marton et al., 2009) or the parallel corpus (Nakov and Hearst, 2007) may generate more robust translation models.
Paraphrasing has been employed to improve reference translations for use in evaluation metrics. The number of reference translations may be increased by paraphrasing (Finch et al., 2004; Owczarzak et al., 2006). The same idea is behind changing the reference translation by paraphrasing to make it more similar to the system output (Kauchak and Barzilay, 2006), or attempting to paraphrase unmatched words in the system output (Zhou et al., 2006).
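To make the effect of paraphrase-aware matching concrete, here is a minimal sketch of a unigram matching score in which a system output word that has no exact match in the reference may still be matched through a paraphrase. It is simplified to single words and is not the metric of any cited paper; the function `matched_fraction`, the paraphrase table, and the example sentences are hypothetical.

```python
from collections import Counter


def matched_fraction(hypothesis, reference, paraphrases=None):
    """Fraction of hypothesis tokens matched in the reference.

    Each reference token can be used at most once. If `paraphrases` is given,
    a hypothesis token without an exact match may match one of its paraphrases.
    """
    paraphrases = paraphrases or {}
    remaining = Counter(reference.lower().split())
    hyp_tokens = hypothesis.lower().split()
    matches = 0
    for token in hyp_tokens:
        # Try the exact token first, then its paraphrases in a fixed order.
        for option in [token, *sorted(paraphrases.get(token, ()))]:
            if remaining[option] > 0:
                remaining[option] -= 1
                matches += 1
                break
    return matches / len(hyp_tokens) if hyp_tokens else 0.0


# Hypothetical single-word paraphrase table.
paraphrases = {"large": {"big"}, "automobile": {"car"}}
reference = "the big car stopped"
hypothesis = "the large automobile stopped"

exact_only = matched_fraction(hypothesis, reference)
with_paraphrases = matched_fraction(hypothesis, reference, paraphrases)
print(exact_only, with_paraphrases)  # 0.5 1.0
```

With exact matching only half of the output words are credited, while the paraphrase table lets all four match. Handling multi-word paraphrases would require matching over phrase spans rather than single tokens.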
Marton et al. (2011) use heuristics to remove antonyms (words with opposite meaning) from paraphrases gathered from parallel corpora or with distributional similarity methods.
Context information has also been used to disambiguate paraphrases in sentence context, by reinforcing vector models of paraphrasing along dimensions that are present in the sentence context of an instance.
- Xu et al. (2016)
- Liu and Hwa (2016)
- Suzuki et al. (2017)
- Seraj et al. (2015)
- Marton et al. (2009)
- Utiyama et al. (2011)
- Marton (2010)
- Apidianaki (2012)
- Madnani et al. (2012)
- Martzoukos and Monz (2012)
- He et al. (2012)
- Bond et al. (2008)
- Marton et al. (2009)
- Max (2009)
- Du et al. (2010)
- Max (2010)
- Resnik et al. (2010)
- Kuhn et al. (2010)
- Gangadharaiah et al. (2010)
- Kashioka (2005)
- Denkowski et al. (2010)
- Jiang et al. (2011)
- Gao and Vogel (2011)
- Bouamor et al. (2011)
- Chen and Dolan (2011)
- Nakov and Ng (2011)
- Thadani and McKeown (2011)
- Hajlaoui and Boitet (2004)
- Chang and Kung (2007)
- Zhao et al. (2008)