mkcls: Training of word classes.

mkcls is a tool for training word classes using a maximum-likelihood criterion. The resulting word classes are especially suited for language models and statistical translation models. mkcls was written by Franz Josef Och.

Usage of mkcls:

mkcls [-cnum] [-nnum] [-ptrain] [-Vfile] opt

-c number of classes

-V file to which the classes are written

-n number of optimization runs (Default: 1); more runs give better results

-p filename of training corpus (Default: 'train')

Example:

mkcls -c80 -n10 -pkorpus -Vkats opt

(generates 80 classes for the corpus 'korpus' and writes the classes to 'kats')
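The command line above can also be assembled from a script, which helps when the number of classes or optimization runs varies between experiments. The following is a minimal sketch, assuming Python 3.8 or later; the helper name build_mkcls_command and its parameters are hypothetical and not part of mkcls. It uses only the options shown in this document, with each value attached directly to its flag and the literal word 'opt' at the end.

```python
import shlex


def build_mkcls_command(corpus, output, num_classes=80, runs=1, binary="mkcls"):
    """Return the argument list for one mkcls call.

    Hypothetical helper: it only mirrors the options documented above
    (-c, -n, -p, -V) and appends the trailing 'opt' argument.
    """
    if num_classes < 1 or runs < 1:
        raise ValueError("num_classes and runs must be positive integers")
    return [
        binary,
        f"-c{num_classes}",  # number of classes
        f"-n{runs}",         # number of optimization runs
        f"-p{corpus}",       # training corpus file
        f"-V{output}",       # file that receives the classes
        "opt",
    ]


if __name__ == "__main__":
    argv = build_mkcls_command("korpus", "kats", num_classes=80, runs=10)
    # Show the command as it would be typed in a shell.
    print(shlex.join(argv))
    # To actually run it (requires a compiled mkcls on the PATH):
    # import subprocess; subprocess.run(argv, check=True)
```

Building the call as a list rather than a single string avoids quoting problems when file names contain spaces.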

In order to compile mkcls you may need:

mkcls is released under the GNU General Public License (GPL).

Citation:

  • Franz Josef Och: »Maximum-Likelihood-Schätzung von Wortkategorien mit Verfahren der kombinatorischen Optimierung«, Studienarbeit, Universität Erlangen-Nürnberg, Germany, 1995.
  • Franz Josef Och: »An Efficient Method for Determining Bilingual Word Classes«, pp. 71-76, Ninth Conf. of the Europ. Chapter of the Association for Computational Linguistics (EACL'99), Bergen, Norway, June 1999.

Source code:

  • newest version on code.google.com
  • mkcls.2003-09-30.tar.gz
  • mkcls.2001-01-12.tar.gz (old version)


Last updated: 12 January 2001, och@isi.edu