Splitting sentences into word tokens is particularly challenging for languages whose writing systems do not place spaces between words, as is the case for many Asian languages such as Chinese, Japanese, and Thai.
Word Segmentation is the main subject of 16 publications.
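A classic baseline for segmenting unspaced text is greedy maximum matching: at each position, take the longest dictionary word that matches, falling back to a single character when nothing matches. The sketch below is a minimal illustration with a hypothetical toy dictionary, not a method described on this page.

```python
def max_match(text, dictionary):
    """Greedily match the longest dictionary word at each position;
    fall back to a single character if no word matches."""
    words = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):
            # Single characters always succeed as a last resort.
            if text[i:j] in dictionary or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words

# Toy dictionary for illustration only.
dictionary = {"中国", "人民", "中", "国", "人", "民"}
print(max_match("中国人民", dictionary))  # → ['中国', '人民']
```

Maximum matching is simple but brittle on ambiguous strings; the publications surveyed here cover statistical and learned segmenters that address such cases.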
Last modified on March 10, 2015, at 01:32 AM
Hosted by the University of Edinburgh