bleurt_eval

Evaluation of titles generated by Ukrainian GPT-2 models, using BLEURT.

Experiment setup:

  • Results generated by three model sizes (small, medium, and large) were evaluated.
  • Files were converted with python convert_titles.py news_sample.small.out.jsonl reference.small.txt candidate.small.txt (a hedged sketch of this step follows the list).
  • BLEURT was installed from master (at commit cebe7e6f996b40910cfaa520a63db47807e3bf5c).
  • Scoring used the BLEURT-20 checkpoint.
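
A minimal sketch of the conversion step, assuming each JSONL record stores the human-written title and the generated one under the hypothetical keys "title" and "generated_title" (the actual convert_titles.py in this repo may use different field names):

import json
import sys

def main(jsonl_path: str, reference_path: str, candidate_path: str) -> None:
    with open(jsonl_path, encoding="utf-8") as src, \
         open(reference_path, "w", encoding="utf-8") as ref, \
         open(candidate_path, "w", encoding="utf-8") as cand:
        for line in src:
            record = json.loads(line)
            # One title per line, which is the layout bleurt.score_files expects.
            ref.write(record["title"].replace("\n", " ").strip() + "\n")
            cand.write(record["generated_title"].replace("\n", " ").strip() + "\n")

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2], sys.argv[3])

With the reference and candidate files in place, scoring runs as follows: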
$ python -m bleurt.score_files -candidate_file=candidate.small.txt -reference_file=reference.small.txt -bleurt_checkpoint=bleurt/BLEURT-20 -scores_file=scores.small.txt
$ python -m bleurt.score_files -candidate_file=candidate.medium.txt -reference_file=reference.medium.txt -bleurt_checkpoint=bleurt/BLEURT-20 -scores_file=scores.medium.txt
$ python -m bleurt.score_files -candidate_file=candidate.large.txt -reference_file=reference.large.txt -bleurt_checkpoint=bleurt/BLEURT-20 -scores_file=scores.large.txt

$ python -m bleurt.score_files -candidate_file=candidate.mbart.1k.txt -reference_file=reference.small.txt -bleurt_checkpoint=bleurt/BLEURT-20 -scores_file=scores.mbart.1k.txt
$ python -m bleurt.score_files -candidate_file=candidate.mbart.5k.txt -reference_file=reference.small.txt -bleurt_checkpoint=bleurt/BLEURT-20 -scores_file=scores.mbart.5k.txt
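
The same scores can also be produced through BLEURT's Python API (score.BleurtScorer, as documented in the google-research/bleurt README); a minimal sketch for the small run:

from bleurt import score

checkpoint = "bleurt/BLEURT-20"
with open("reference.small.txt", encoding="utf-8") as f:
    references = [line.rstrip("\n") for line in f]
with open("candidate.small.txt", encoding="utf-8") as f:
    candidates = [line.rstrip("\n") for line in f]

scorer = score.BleurtScorer(checkpoint)
# Returns one float per reference/candidate pair, same as score_files.
scores = scorer.score(references=references, candidates=candidates)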

Combined files for review are available in bleurt.eval.small.csv, bleurt.eval.medium.csv, and bleurt.eval.large.csv.
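
A hedged sketch of how such review CSVs can be assembled from the text files (the column layout here is an assumption, not necessarily the one used in the repo):

import csv

def combine(reference_path, candidate_path, scores_path, csv_path):
    with open(reference_path, encoding="utf-8") as ref, \
         open(candidate_path, encoding="utf-8") as cand, \
         open(scores_path, encoding="utf-8") as sc, \
         open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        # Hypothetical column names, chosen for illustration only.
        writer.writerow(["reference", "candidate", "bleurt"])
        for r, c, s in zip(ref, cand, sc):
            writer.writerow([r.strip(), c.strip(), float(s)])

combine("reference.small.txt", "candidate.small.txt", "scores.small.txt", "bleurt.eval.small.csv")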

BLEURT score summary:

          mean                median
small     0.5432437893003226  0.5374010503292084
medium    0.5675271535292268  0.5721611678600311
large     0.5906959722489119  0.5916261672973633
mbart.1k  0.7408332733213902  0.8061753809452057
mbart.5k  0.7135102691948414  0.7874233424663544
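
These means and medians can be recomputed from the score files (one float per line) with the standard library alone:

from statistics import mean, median

for run in ["small", "medium", "large", "mbart.1k", "mbart.5k"]:
    with open(f"scores.{run}.txt", encoding="utf-8") as f:
        scores = [float(line) for line in f]
    print(f"{run}\t{mean(scores)}\t{median(scores)}")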

bert_score was run on the same data for comparison.

bert_score version 0.3.13 was used with two multilingual models:

  • xlm-roberta-large, ranked 48
  • bert-base-multilingual-cased, ranked 81
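
bert_score also exposes a Python API equivalent to the CLI calls below; a minimal sketch for the xlm-roberta-large run on the small files (num_layers=17 matches the L17 tag in the CLI output):

from bert_score import score

with open("candidate.small.txt", encoding="utf-8") as f:
    candidates = [line.rstrip("\n") for line in f]
with open("reference.small.txt", encoding="utf-8") as f:
    references = [line.rstrip("\n") for line in f]

# Returns per-sentence precision/recall/F1 tensors; the CLI reports their means.
P, R, F1 = score(candidates, references, model_type="xlm-roberta-large", num_layers=17)
print(P.mean().item(), R.mean().item(), F1.mean().item())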

xlm-roberta-large eval

$ time bert-score -r reference.small.txt -c candidate.small.txt -m xlm-roberta-large

xlm-roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.26.1)_fast-tokenizer P: 0.910866 R: 0.903273 F1: 0.906873

$ time bert-score -r reference.medium.txt -c candidate.medium.txt -m xlm-roberta-large

xlm-roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.26.1)_fast-tokenizer P: 0.915702 R: 0.906600 F1: 0.910944

$ time bert-score -r reference.large.txt -c candidate.large.txt -m xlm-roberta-large

xlm-roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.26.1)_fast-tokenizer P: 0.916548 R: 0.909308 F1: 0.912751

$ time bert-score -r reference.small.txt -c candidate.mbart.1k.txt -m xlm-roberta-large

xlm-roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.26.1)_fast-tokenizer P: 0.935879 R: 0.942202 F1: 0.938712

$ time bert-score -r reference.small.txt -c candidate.mbart.5k.txt -m xlm-roberta-large

xlm-roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.26.1)_fast-tokenizer P: 0.930015 R: 0.935764 F1: 0.932630

          P         R         F1
small     0.910866  0.903273  0.906873
medium    0.915702  0.906600  0.910944
large     0.916548  0.909308  0.912751
mbart.1k  0.935879  0.942202  0.938712
mbart.5k  0.930015  0.935764  0.932630

bert-base-multilingual-cased eval

$ time bert-score -r reference.small.txt -c candidate.small.txt -m bert-base-multilingual-cased

bert-base-multilingual-cased_L9_no-idf_version=0.3.12(hug_trans=4.26.1)_fast-tokenizer P: 0.780336 R: 0.766688 F1: 0.772851

$ time bert-score -r reference.medium.txt -c candidate.medium.txt -m bert-base-multilingual-cased

bert-base-multilingual-cased_L9_no-idf_version=0.3.12(hug_trans=4.26.1)_fast-tokenizer P: 0.790484 R: 0.775623 F1: 0.782396

$ time bert-score -r reference.large.txt -c candidate.large.txt -m bert-base-multilingual-cased

bert-base-multilingual-cased_L9_no-idf_version=0.3.12(hug_trans=4.26.1)_fast-tokenizer P: 0.792963 R: 0.780369 F1: 0.786095

$ time bert-score -r reference.small.txt -c candidate.mbart.1k.txt -m bert-base-multilingual-cased

bert-base-multilingual-cased_L9_no-idf_version=0.3.12(hug_trans=4.26.1)_fast-tokenizer P: 0.839130 R: 0.863765 F1: 0.850054

$ time bert-score -r reference.small.txt -c candidate.mbart.5k.txt -m bert-base-multilingual-cased

bert-base-multilingual-cased_L9_no-idf_version=0.3.12(hug_trans=4.26.1)_fast-tokenizer P: 0.819808 R: 0.847989 F1: 0.832732

          P         R         F1
small     0.780336  0.766688  0.772851
medium    0.790484  0.775623  0.782396
large     0.792963  0.780369  0.786095
mbart.1k  0.839130  0.863765  0.850054
mbart.5k  0.819808  0.847989  0.832732
