Results of the Quality Estimation Shared Task 2023

Note: because Codalab was unstable during the competition (submissions failing due to congested servers, servers going down, etc.), the automatic evaluation of predictions did not go as planned. As a result, the leaderboards of the "competition" phases are not representative and should not be considered. Instead,
  • participants are listed only for the tasks and language pairs in which they officially declared (via a form) to the organisers that they wished to participate; participants who did not fill in the form are therefore not considered in the official ranking of the shared task;
  • for a given language pair, each participant was ranked based on their submission with the highest score on the primary metric for that language pair;
  • only participants who officially participated in and submitted to all language pairs of a given task were considered for the "Multilingual" ranking. In this case, we retained the highest macro-average score (as reported by our scoring programmes) over all submissions containing predictions for all language pairs; see the sketch after this list.
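
To make these ranking rules concrete, the following is a minimal sketch in Python. Everything in it is a hypothetical illustration rather than the organisers' actual scoring programme: the data layout (participant → submission → per-language-pair primary-metric score), the team names, and the scores are all invented for the example.

    from statistics import mean

    # Hypothetical layout: participant -> submission -> {language pair: primary-metric score}.
    scores = {
        "team_a": {
            "sub1": {"en-de": 0.52, "zh-en": 0.31, "he-en": 0.40},
            "sub2": {"en-de": 0.55, "zh-en": 0.29, "he-en": 0.43},
        },
        "team_b": {
            "sub1": {"en-de": 0.50, "zh-en": 0.35},  # no he-en predictions
        },
    }
    ALL_LPS = {"en-de", "zh-en", "he-en"}

    def rank_language_pair(lp):
        """Per-LP ranking: each participant is represented by their
        best-scoring submission for that language pair."""
        rows = [(team, max(s[lp] for s in subs.values() if lp in s))
                for team, subs in scores.items()
                if any(lp in s for s in subs.values())]
        return sorted(rows, key=lambda row: row[1], reverse=True)

    def rank_multilingual():
        """Multilingual ranking: macro-average the per-LP scores of each
        submission that covers ALL language pairs, then keep each
        participant's best macro-average."""
        rows = []
        for team, subs in scores.items():
            macros = [mean(s[lp] for lp in ALL_LPS)
                      for s in subs.values()
                      if ALL_LPS <= s.keys()]
            if macros:  # skip teams with no submission covering every LP
                rows.append((team, max(macros)))
        return sorted(rows, key=lambda row: row[1], reverse=True)

    print(rank_language_pair("en-de"))  # both teams appear
    print(rank_multilingual())          # team_b is excluded

Under these rules, team_b appears in the per-language-pair rankings it submitted to, but not in the Multilingual ranking, since none of its submissions covers every language pair.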

Task 1 -- Sentence-level


Multilingual (Average over all LPs)


English-German (MQM)

Chinese-English (MQM)


Hebrew-English (MQM)

English-Marathi (DA)


English-Hindi (DA)

English-Tamil (DA)


English-Telugu (DA)

English-Gujarati (DA)


Task 1 -- Word-level


Multilingual (Average over all LPs)


English-German (MQM)

Chinese-English (MQM)


Hebrew-English (MQM)

English-Marathi (PE)


English-Farsi (PE)


Task 2 -- Error Span Detection


Multilingual (Average over all LPs)


English-German

Chinese-English


Hebrew-English