This repository provides the code and datasets for the NMT evaluation in the following publication: "The Suboptimal WMT Test Sets and Their Impact on Human Parity" by Ahrii Kim, Yunju Bak, Jimin Sun, Sungwon Lyu, and Changmin Lee, posted on Preprints.org.
We propose that the low quality of the source side of the WMT 2020 news-track test set may lead to an overrated human parity claim. We demonstrate this with a Relative Ranking human evaluation, which we then analyze with significance tests and Absolute Ranking scores (a minimal sketch of the significance testing follows the findings below). We report that:
- About 5% of the segments for which human parity had previously been claimed turn out to be statistically invalid.
- The proportion rises to 7% when only the "contaminated" sentences are counted.
- The Absolute Ranking score drops for all MT models.
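To make the analysis concrete, below is a minimal sketch of a segment-level significance test between a Human Translation and one MT system. The score values and the choice of the Wilcoxon signed-rank test are illustrative assumptions on our part, not the exact procedure of the repository's score.py:

```python
# Minimal sketch of a segment-level significance test between Human
# Translation (HT) and one MT system. The scores below are made-up
# illustrations; the actual data come from a Relative Ranking human
# evaluation and are processed by the repository's score.py.
from scipy.stats import wilcoxon

# Hypothetical paired per-segment scores (one entry per segment).
ht_scores = [0.82, 0.64, 0.91, 0.70, 0.55, 0.88, 0.47, 0.93]
mt_scores = [0.75, 0.60, 0.85, 0.90, 0.40, 0.86, 0.52, 0.79]  # e.g., MT_Y

# Wilcoxon signed-rank test on the paired per-segment differences.
# If p < 0.05, the HT-vs-MT difference is statistically significant,
# i.e., a human parity claim on these segments would not hold.
stat, p_value = wilcoxon(ht_scores, mt_scores)
print(f"Wilcoxon statistic = {stat:.3f}, p = {p_value:.4f}")
```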
Evaluation setup:
- Language Pair: EN -> KO
- Original English Source: WMT 2020 English-III test set
- Volume: 437 sentences
- Systems: two online models (MT_Y, MT_Z)
- Comparisons:
  - before-test-set (BTS) vs. after-test-set (ATS)
  - Human Translation (HT) vs. MT_Y vs. MT_Z
  - Google Translate (as a benchmark model)
To install packages, please refer to requirements.txt
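Assuming a standard pip workflow (the repository only states that requirements.txt lists the packages), installation would look like:

```bash
pip install -r requirements.txt
```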
```bash
# run the main scoring code
cd evalutation
python score.py > out.txt
```
BibTeX (preprint version):
```bibtex
@article{kim-et-al-21-testset,
  author  = "Ahrii Kim and Yunju Bak and Jimin Sun and Sungwon Lyu and Changmin Lee",
  title   = "The Suboptimal WMT Test Sets and Their Impact on Human Parity",
  journal = "Preprints",
  year    = "2021",
  doi     = "10.20944/preprints202110.0199.v1"
}
```
For further questions, please contact the first author of the paper.