Moses statistical machine translation system

Training Step 5: Extract Phrases

In the phrase extraction step, all extracted phrase pairs are written to a single large file, model/extract. Here is the top of that file:

 > head model/extract 
 wiederaufnahme ||| resumption ||| 0-0
 wiederaufnahme der ||| resumption of the ||| 0-0 1-1 1-2
 wiederaufnahme der sitzungsperiode ||| resumption of the session ||| 0-0 1-1 1-2 2-3
 der ||| of the ||| 0-0 0-1
 der sitzungsperiode ||| of the session ||| 0-0 0-1 1-2
 sitzungsperiode ||| session ||| 0-0
 ich ||| i ||| 0-0
 ich erklaere ||| i declare ||| 0-0 1-1
 erklaere ||| declare ||| 0-0
 sitzungsperiode ||| session ||| 0-0

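Each line of this file contains three fields separated by |||: the foreign phrase, the English phrase, and the alignment points. Each alignment point is a pair foreign-english of zero-based word positions within the two phrases; for example, 1-1 1-2 in the second line aligns der with both of and the. In addition, an inverted alignment file, extract.inv, is generated, and if the lexicalized reordering model is trained (the default), a reordering file, extract.o, is generated as well.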
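
The line format described above can be read with a few lines of Python. This is only an illustrative sketch and not part of Moses: the names PhrasePair, parse_extract_line and aligned_words are made up for this example, and it assumes each line has exactly the three |||-separated fields shown in the listing.

```python
from typing import List, NamedTuple, Tuple


class PhrasePair(NamedTuple):
    """One line of model/extract (hypothetical helper type, not a Moses API)."""
    foreign: List[str]
    english: List[str]
    alignment: List[Tuple[int, int]]  # (foreign position, english position)


def parse_extract_line(line: str) -> PhrasePair:
    """Split a line into foreign phrase, English phrase and alignment points."""
    fields = [field.strip() for field in line.split("|||")]
    if len(fields) != 3:
        raise ValueError("expected 3 fields separated by |||, got %d" % len(fields))
    foreign = fields[0].split()
    english = fields[1].split()
    alignment = []
    for point in fields[2].split():
        # Each point looks like "1-2": foreign word 1 aligns to English word 2.
        f_pos, e_pos = point.split("-")
        alignment.append((int(f_pos), int(e_pos)))
    return PhrasePair(foreign, english, alignment)


def aligned_words(pair: PhrasePair) -> List[Tuple[str, str]]:
    """Resolve the zero-based positions to the actual word pairs."""
    return [(pair.foreign[f], pair.english[e]) for f, e in pair.alignment]


if __name__ == "__main__":
    example = "wiederaufnahme der ||| resumption of the ||| 0-0 1-1 1-2"
    pair = parse_extract_line(example)
    for foreign_word, english_word in aligned_words(pair):
        print(foreign_word, "<->", english_word)
```

Run on the second line of the listing, this pairs wiederaufnahme with resumption and der with both of and the, matching the alignment points 0-0 1-1 1-2.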

Page last modified on July 14, 2006, at 01:15 AM