@inproceedings{elkishky_xlent_2021,
  author = {El-Kishky, Ahmed and Renduchintala, Adi and Cross, James and Guzm{\'a}n, Francisco and Koehn, Philipp},
  booktitle = {Preprint},
  title = {{XLEnt}: Mining Cross-lingual Entities with Lexical-Semantic-Phonetic Word Alignment},
  year = {2021},
  address = {Online},
}
Entity_Type \tab Source_Entity \tab Target_Entity \tab Frequency
To obtain aligned entity pairs between two non-English languages, join the two corresponding English-aligned entity pair lists on the English entity (Source_Entity).
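The join described above can be sketched in Python as follows. This is a minimal illustration, not part of the released tooling: it assumes each line holds exactly the four tab-separated fields listed above and that Frequency is an integer. The function names `read_pairs` and `join_on_english` and the sample rows are hypothetical, invented only for this example.

```python
from collections import defaultdict


def read_pairs(lines):
    """Parse rows of the form Entity_Type<TAB>Source_Entity<TAB>Target_Entity<TAB>Frequency."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            continue  # skip rows that do not have the four expected fields
        entity_type, source, target, freq = fields
        # Assumption: Frequency is an integer count.
        yield entity_type, source, target, int(freq)


def join_on_english(pairs_a, pairs_b):
    """Join two English-aligned lists on the English entity (Source_Entity)."""
    # Index the second list by its English entity.
    index = defaultdict(list)
    for _type_b, source, target_b, freq_b in pairs_b:
        index[source].append((target_b, freq_b))
    # Emit one row per matching combination, keeping both frequencies.
    for entity_type, source, target_a, freq_a in pairs_a:
        for target_b, freq_b in index.get(source, []):
            yield entity_type, source, target_a, target_b, freq_a, freq_b


# Hypothetical sample rows (not taken from the corpus).
en_it = ["PERSON\tTony Blair\tTony Blair\t12", "GPE\tGermany\tGermania\t40"]
en_de = ["PERSON\tTony Blair\tTony Blair\t9", "GPE\tGermany\tDeutschland\t55"]

it_de = list(join_on_english(read_pairs(en_it), read_pairs(en_de)))
for row in it_de:
    print("\t".join(str(field) for field in row))
```

The sketch keeps both frequencies rather than combining them, since the source does not say how a joined frequency should be computed.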
The annotated sentence pairs use the standard BIO annotation as illustrated in the following example:
I will continue to work as Tony Blair did very closely with the American administration .
O O O O O O B-PERSON I-PERSON O O O O O B-NORP O O
Continuerò a lavorare come Tony Blair ha fatto da vicino con l' amministrazione americana .
O O O O B-PERSON I-PERSON O O O O O O O B-NORP O
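To make the BIO scheme concrete, the following sketch recovers entity spans from a whitespace-tokenized sentence and its tag sequence. It assumes one tag per token; the function name `bio_to_spans` is hypothetical, and treating a stray `I-` tag as outside any entity is a choice made for this example, not something the source specifies.

```python
def bio_to_spans(tokens, tags):
    """Collect (label, text) entity spans from parallel token and tag lists."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans = []
    label, words = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity; flush any open one first.
            if words:
                spans.append((label, " ".join(words)))
            label, words = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            # An I- tag continues the open entity of the same type.
            words.append(token)
        else:
            # "O" (or an I- tag with no matching open entity) closes the span.
            if words:
                spans.append((label, " ".join(words)))
            label, words = None, []
    if words:
        spans.append((label, " ".join(words)))
    return spans


tokens = ("I will continue to work as Tony Blair did very closely "
          "with the American administration .").split()
tags = "O O O O O O B-PERSON I-PERSON O O O O O B-NORP O O".split()

spans = bio_to_spans(tokens, tags)
print(spans)  # [('PERSON', 'Tony Blair'), ('NORP', 'American')]
```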
en-af (3.8M) en-am (1.4M) en-an (464K) en-ar (134M) en-arz (444K) en-as (72K) en-ast (14M) en-az (5.7M) en-ba (692K) en-bar (408K) en-be (11M) en-bg (55M) en-bn (37M) en-br (2.0M) en-bs (4.6M) en-ca (53M) en-cb (88K) en-ceb (3.0M) en-cs (69M) en-cx (852K) |
en-cy (3.8M) en-da (50M) en-de (72M) en-el (62M) en-eo (41M) en-es (171M) en-et (31M) en-eu (13M) en-fa (39M) en-ff (96K) en-fi (47M) en-fo (564K) en-fr (148M) en-fy (4.4M) en-ga (2.6M) en-gd (932K) en-gl (30M) en-gu (644K) en-ha (6.5M) en-he (61M) |
en-hi (48M) en-hr (52M) en-ht (1.9M) en-hu (65M) en-hy (4.7M) en-id (73M) en-ig (1.0M) en-ilo (1.3M) en-io (256K) en-is (15M) en-it (105M) en-ja (138M) en-jv (3.9M) en-ka (5.9M) en-kk (3.6M) en-km (2.1M) en-kn (680K) en-ko (57M) en-la (3.1M) en-lb (3.8M) |
en-lg (16K) en-lmo (300K) en-ln (36K) en-lo (676K) en-lt (29M) en-lv (27M) en-mg (5.1M) en-mk (40M) en-ml (18M) en-mn (1.9M) en-mr (12M) en-ms (31M) en-mwl (228K) en-my (1.4M) en-nds (900K) en-nds_nl (236K) en-ne (6.6M) en-nl (106M) en-no (36M) en-ns (28K) |
en-oc (3.8M) en-om (20K) en-or (428K) en-pa (584K) en-pl (102M) en-ps (1.1M) en-pt (105M) en-ro (59M) en-ru (186M) en-sd (2.7M) en-sh (3.6M) en-si (15M) en-sk (47M) en-sl (14M) en-so (1.2M) en-sq (22M) en-sr (30M) en-ss (36K) en-su (2.3M) en-sv (61M) |
en-sw (15M) en-ta (12M) en-te (3.3M) en-tg (320K) en-th (38M) en-tl (17M) en-tn (68K) en-tr (70M) en-tt (728K) en-ug (108K) en-uk (81M) en-ur (15M) en-vi (4.0K) en-wo (116K) en-wuu (844K) en-xh (13M) en-yi (2.1M) en-yo (812K) en-zh (127M) en-zu (464K) |
en-af (120MB) en-am (7.5MB) en-an (1.4MB) en-ar (1.3GB) en-arz (1.2MB) en-as (237KB) en-ast (47MB) en-az (25MB) en-ba (1.5MB) en-bar (1.1MB) en-be (56MB) en-bg (556MB) en-bn (219MB) en-br (5.5MB) en-bs (22MB) en-ca (461MB) en-cb (305KB) en-ceb (18MB) en-cs (741MB) en-cx (4.3MB) |
en-cy (20MB) en-da (532MB) en-de (554MB) en-el (515MB) en-eo (556MB) en-es (1.4GB) en-et (344MB) en-eu (70MB) en-fa (387MB) en-ff (241KB) en-fi (431MB) en-fo (1.8MB) en-fr (1.1GB) en-fy (13MB) en-ga (6.4MB) en-gd (3.1MB) en-gl (231MB) en-gu (2.8MB) en-ha (66MB) en-he (561MB) |
en-hi (385MB) en-hr (549MB) en-ht (13MB) en-hu (682MB) en-hy (24MB) en-id (654MB) en-ig (5.2MB) en-ilo (6.1MB) en-io (803KB) en-is (121MB) en-it (802MB) en-ja (940MB) en-jv (12MB) en-ka (31MB) en-kk (15MB) en-km (8.3MB) en-kn (2.6MB) en-ko (311MB) en-la (9.3MB) en-lb (52MB) |
en-lg (41KB) en-lmo (751KB) en-ln (85KB) en-lo (2.5MB) en-lt (340MB) en-lv (141MB) en-mg (36MB) en-mk (338MB) en-ml (55MB) en-mn (8.1MB) en-mr (37MB) en-ms (284MB) en-mwl (786KB) en-my (6.0MB) en-nds (2.7MB) en-nds_nl (707KB) en-ne (20MB) en-nl (1.1GB) en-no (326MB) en-ns (105KB) |
en-oc (13MB) en-om (59KB) en-or (1.1MB) en-pa (2.7MB) en-pl (978MB) en-ps (4.8MB) en-pt (737MB) en-ro (722MB) en-ru (1.4GB) en-sd (12MB) en-sh (16MB) en-si (58MB) en-sk (562MB) en-sl (118MB) en-so (5.3MB) en-sq (301MB) en-sr (254MB) en-ss (116KB) en-su (12MB) en-sv (669MB) |
en-sw (132MB) en-ta (76MB) en-te (16MB) en-tg (715KB) en-th (171MB) en-tl (112MB) en-tn (189KB) en-tr (548MB) en-tt (1.5MB) en-ug (331KB) en-uk (746MB) en-ur (92MB) en-wo (334KB) en-wuu (1.1MB) en-xh (50MB) en-yi (22MB) en-yo (5.1MB) en-zh (1.4GB) en-zu (2.6MB) |