Shared Task: Multilingual Low-Resource Translation for Indo-European Languages



AUTOMATIC EVALUATION

Metrics. We evaluate translations with BLEU, TER, and chrF (all computed with SacreBLEU), plus COMET and BERTScore. The final ranking per family is based on each system's average rank across the individual metrics; systems with tied scores on a metric receive the same rank. Baselines: M2M-100 and an mT5 model fine-tuned on the development data (mT5-devFinetuned).
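For reference, here is a minimal sketch of how the SacreBLEU-backed columns can be reproduced with sacrebleu's Python API (v2.x). The toy segments are illustrative only; COMET and BERTScore come from their own toolkits (unbabel-comet and bert-score) and are not shown here.

    # Sketch: corpus-level BLEU / TER / chrF with sacrebleu's Python API (v2.x).
    # The two toy segments below are illustrative, not from the test set.
    from sacrebleu.metrics import BLEU, CHRF, TER

    hyps = ["el gato está en la alfombra", "hace buen tiempo hoy"]
    refs = [["el gato está sobre la alfombra", "hoy hace buen tiempo"]]  # one reference stream

    bleu = BLEU().corpus_score(hyps, refs)
    chrf = CHRF().corpus_score(hyps, refs)
    ter = TER().corpus_score(hyps, refs)

    # sacrebleu reports all three on a 0-100 scale; the tables below keep BLEU
    # on that scale but show TER and chrF rescaled to [0, 1].
    print(f"BLEU {bleu.score:.2f}  TER {ter.score / 100:.3f}  chrF {chrf.score / 100:.3f}")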

ROMANCE FAMILY (Wikipedia)

Official ranking

System Average Ranking BLEU TER chrF COMET BERTScore
CUNI-Primary 1.2±0.4 50.06 0.401 0.694 0.566 0.901
CUNI-Contrastive 1.6±0.5 49.48 0.404 0.693 0.569 0.901
Tencent-Contrastive 3.0±0.0 43.45 0.460 0.670 0.444 0.894
Tencent-Primary 3.8±0.4 43.31 0.462 0.668 0.442 0.894
BSC-Primary (*) 5.0±0.7 41.33 0.462 0.647 0.363 0.884
M2M-100 (baseline) 5.8±0.4 40.02 0.478 0.634 0.414 0.878
UBCNLP-Primary 7.2±0.4 35.41 0.528 0.588 0.007 0.854
mT5-devFinetuned (baseline) 8.0±0.7 29.28 0.592 0.553 0.059 0.850
UBCNLP-Contrastive 8.6±0.5 28.51 0.591 0.529 -0.374 0.825
(*) Late submission
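The Average Ranking column is the mean ± population standard deviation of a system's ranks across the five metrics. The exact tie rule is not spelled out beyond "ties are considered"; the sketch below assumes tied scores share the better rank (scipy's method="min"), which reproduces the published numbers for the two CUNI systems.

    # Sketch: average rank across metrics, with tied scores sharing the
    # better rank. Scores are the two CUNI rows from the table above; their
    # ranks are unaffected by the remaining systems because these two lead
    # the field on every metric.
    import numpy as np
    from scipy.stats import rankdata

    systems = ["CUNI-Primary", "CUNI-Contrastive"]
    # Columns: BLEU, TER, chrF, COMET, BERTScore (only TER is lower-is-better).
    scores = np.array([
        [50.06, 0.401, 0.694, 0.566, 0.901],
        [49.48, 0.404, 0.693, 0.569, 0.901],
    ])
    lower_is_better = [False, True, False, False, False]

    ranks = np.column_stack([
        rankdata(col if lower else -col, method="min")
        for col, lower in zip(scores.T, lower_is_better)
    ])

    for name, r in zip(systems, ranks):
        # np.std is the population standard deviation by default (ddof=0).
        print(f"{name}: {r.mean():.1f}±{r.std():.1f}")
    # -> CUNI-Primary: 1.2±0.4
    #    CUNI-Contrastive: 1.6±0.5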

Evaluation per language pair (not ranked; shown for completeness)

ca2it (Catalan→Italian) BLEU TER chrF COMET BERTScore
M2M-100 (baseline) 46.75 0.390 0.694 0.743 0.913
mT5-devFinetuned (baseline) 30.38 0.551 0.571 0.235 0.872
BSC-Primary 42.00 0.420 0.670 0.651 0.908
CUNI-Contrastive 49.49 0.366 0.714 0.813 0.916
CUNI-Primary 50.48 0.360 0.717 0.810 0.917
Tencent-Contrastive 44.09 0.410 0.680 0.667 0.912
Tencent-Primary 43.24 0.418 0.671 0.640 0.910
UBCNLP-Contrastive 25.46 0.574 0.539 -0.263 0.844
UBCNLP-Primary 35.06 0.477 0.622 0.391 0.886

ca2oc (Catalan→Occitan) BLEU TER chrF COMET BERTScore
M2M-100 (baseline) 40.24 0.405 0.673 0.341 0.892
mT5-devFinetuned (baseline) 40.14 0.395 0.680 0.402 0.897
BSC-Primary 57.10 0.272 0.780 0.514 0.929
CUNI-Contrastive 67.11 0.201 0.832 0.724 0.952
CUNI-Primary 66.90 0.202 0.829 0.719 0.951
Tencent-Contrastive 56.09 0.309 0.813 0.617 0.941
Tencent-Primary 56.52 0.304 0.817 0.640 0.944
UBCNLP-Contrastive 51.46 0.316 0.736 0.259 0.905
UBCNLP-Primary 59.93 0.254 0.787 0.538 0.928

ca2ro (Catalan→Romanian) BLEU TER chrF COMET BERTScore
M2M-100 (baseline) 33.06 0.640 0.535 0.159 0.831
mT5-devFinetuned (baseline) 17.33 0.830 0.407 -0.461 0.784
BSC-Primary 24.90 0.695 0.490 -0.076 0.814
CUNI-Contrastive 31.83 0.644 0.533 0.169 0.835
CUNI-Primary 32.81 0.640 0.535 0.168 0.834
Tencent-Contrastive 30.16 0.661 0.517 0.047 0.830
Tencent-Primary 30.18 0.664 0.516 0.047 0.829
UBCNLP-Contrastive 8.61 0.884 0.311 -1.119 0.725
UBCNLP-Primary 11.24 0.855 0.354 -0.908 0.749

NORTH-GERMANIC FAMILY (Europeana)

Official ranking

System Average Ranking BLEU TER chrF COMET BERTScore
M2M-100 (baseline) 1.0±0.0 31.45 0.54 0.55 0.399 0.862
Edinsaar-Contrastive 2.2±0.4 27.07 0.57 0.54 0.283 0.856
Edinsaar-Primary 2.8±0.4 27.54 0.58 0.52 0.276 0.849
UBCNLP-Primary 4.0±0.0 24.94 0.60 0.50 0.076 0.847
UBCNLP-Contrastive 5.0±0.0 24.02 0.61 0.49 -0.068 0.837
mT5-devFinetuned (baseline) 6.0±0.0 18.53 0.78 0.42 -0.102 0.810

Evaluation per language pair (not ranked; shown for completeness)

is2nb (Icelandic→Norwegian Bokmål) BLEU TER chrF COMET BERTScore
M2M-100 (baseline) 19.28 0.67 0.42 -0.133 0.825
mT5-devFinetuned (baseline) 22.31 0.64 0.47 0.120 0.853
Edinsaar-Contrastive 12.99 0.71 0.41 -0.250 0.820
Edinsaar-Primary 16.27 0.72 0.39 -0.287 0.812
UBCNLP-Contrastive 9.53 0.77 0.33 -0.827 0.778
UBCNLP-Primary 12.76 0.74 0.36 -0.628 0.799


is2sv (Icelandic→Swedish) BLEU TER chrF COMET BERTScore
M2M-100 (baseline) 21.17 0.63 0.45 -0.110 0.826
mT5-devFinetuned (baseline) 21.11 0.69 0.46 0.047 0.844
Edinsaar-Contrastive 17.32 0.66 0.42 -0.348 0.815
Edinsaar-Primary 18.78 0.68 0.41 -0.357 0.805
UBCNLP-Contrastive 17.62 0.69 0.40 -0.425 0.810
UBCNLP-Primary 13.99 0.70 0.38 -0.572 0.804


nb2is (Norwegian Bokmål→Icelandic) BLEU TER chrF COMET BERTScore
M2M-100 (baseline) 21.46 0.64 0.47 0.259 0.833
mT5-devFinetuned (baseline) 3.55 1.26 0.21 -0.986 0.705
Edinsaar-Contrastive 18.27 0.66 0.46 0.155 0.829
Edinsaar-Primary 19.47 0.65 0.46 0.258 0.829
UBCNLP-Contrastive 7.75 0.78 0.32 -0.924 0.771
UBCNLP-Primary 15.65 0.68 0.43 -0.074 0.822


nb2sv (Norwegian Bokmål→Swedish) BLEU TER chrF COMET BERTScore
M2M-100 (baseline) 50.86 0.34 0.72 0.826 0.921
mT5-devFinetuned (baseline) 18.56 0.82 0.40 -0.368 0.790
Edinsaar-Contrastive 45.43 0.37 0.69 0.690 0.911
Edinsaar-Primary 42.94 0.40 0.65 0.615 0.898
UBCNLP-Contrastive 36.84 0.43 0.63 0.422 0.893
UBCNLP-Primary 42.72 0.39 0.67 0.636 0.906


sv2is (Swedish→Icelandic) BLEU TER chrF COMET BERTScore
M2M-100 (baseline) 18.96 0.66 0.48 0.501 0.832
mT5-devFinetuned (baseline) 9.40 0.82 0.35 -0.138 0.777
Edinsaar-Contrastive 20.22 0.65 0.50 0.469 0.836
Edinsaar-Primary 22.35 0.64 0.51 0.509 0.836
UBCNLP-Contrastive 20.54 0.66 0.49 0.348 0.838
UBCNLP-Primary 14.75 0.71 0.45 0.144 0.825


sv2nb (Swedish→Norwegian Bokmål) BLEU TER chrF COMET BERTScore
M2M-100 (baseline) 56.82 0.29 0.77 1.048 0.935
mT5-devFinetuned (baseline) 36.31 0.46 0.63 0.716 0.891
Edinsaar-Contrastive 48.17 0.35 0.73 0.980 0.923
Edinsaar-Primary 45.42 0.38 0.70 0.919 0.912
UBCNLP-Contrastive 51.84 0.33 0.74 0.996 0.931
UBCNLP-Primary 49.76 0.35 0.73 0.952 0.927