Pull requests: EleutherAI/lm-evaluation-harness

Pull requests list

batch loglikelihood_rolling across requests
#2559 opened Dec 11, 2024 by baberabb
add llama3 tasks
#2556 opened Dec 10, 2024 by baberabb
[MM] Chartqa
#2544 opened Dec 5, 2024 by baberabb Draft
[MM] Ai2d
#2542 opened Dec 5, 2024 by baberabb Draft
New arabicmmlu
#2541 opened Dec 5, 2024 by bodasadallah
Update KorMedMCQA: ver 2.0
#2540 opened Dec 5, 2024 by GyoukChu
max_length not used
#2515 opened Nov 25, 2024 by lintangsutawika
AraDICE task config file
#2507 opened Nov 19, 2024 by firojalam
fixed mmlu generative response extraction
#2503 opened Nov 18, 2024 by RawthiL
Added regex filter for bbh fewshot
#2502 opened Nov 18, 2024 by RawthiL
Add GigaChat API
#2495 opened Nov 15, 2024 by seldereyy
Yaml crowspairs tasks
#2488 opened Nov 14, 2024 by NAM00
Biology ds
#2486 opened Nov 13, 2024 by deema-A
MILU dataset from AI4Bharat for Indic LLM eval
#2482 opened Nov 12, 2024 by abhinand5
Update citation
#2474 opened Nov 8, 2024 by Sypherd
Use global filter alias
#2473 opened Nov 8, 2024 by Sypherd
allow fewshots for multimodal tasks
#2450 opened Nov 1, 2024 by artemorloff
Add Aggregation for Kobest Benchmark
#2446 opened Oct 31, 2024 by tryumanshow
fix tmlu tmlu_taiwan_specific_tasks tag
#2420 opened Oct 22, 2024 by nike00811