Insights: EleutherAI/lm-evaluation-harness
Overview
8 Pull requests merged by 5 people
- fix doc: generate_until only outputs the generated text! (#2755 merged Mar 3, 2025)
- Groundcocoa (#2724 merged Mar 3, 2025)
- [Readme change for SGLang] fix error in readme and add OOM solutions for sglang (#2738 merged Mar 3, 2025)
- fix vllm data parallel (#2746 merged Feb 27, 2025)
- fix log condition on main (#2737 merged Feb 26, 2025)
- Support SGLang as Potential Backend for Evaluation (#2703 merged Feb 25, 2025)
- add humaneval+ and mbpp+ (#2734 merged Feb 25, 2025)
- Fix the import source for eval_logger (#2735 merged Feb 25, 2025)
6 Pull requests opened by 6 people
- Allow writing config to wandb (#2736 opened Feb 25, 2025)
- New benchmark: CaselawQA (#2739 opened Feb 26, 2025)
- Add test for a simple Unitxt task (#2742 opened Feb 26, 2025)
- Enable steering HF models (#2749 opened Feb 28, 2025)
- Sae steered (#2750 opened Feb 28, 2025)
- Fix for TruthfulQA MC2 results calculation (#2753 opened Mar 3, 2025)
2 Issues closed by 2 people
- add humaneval+ and mbpp+ task (#2733 closed Feb 25, 2025)
- Encountering assert len(indices) == len(inputs) error when using Qwen2vl for MMMU evaluation (#2720 closed Feb 25, 2025)
9 Issues opened by 9 people
- 'NoneType' object is not callable! (#2752 opened Mar 3, 2025)
- Smooth landing errors during post processing (#2751 opened Feb 28, 2025)
- Embedding checkpoint size mismatch when using peft on DeepSeek-R1-Distill-Qwen-1.5B. (#2748 opened Feb 28, 2025)
- Gemini Support and usage (#2747 opened Feb 27, 2025)
- HOW TO ADD NEW TASK? (#2745 opened Feb 27, 2025)
- modelscope installed will lead some problems (#2744 opened Feb 27, 2025)
- Error loading MMLU 'prehistory' config: BuilderConfig not found (available: ['default']) (#2743 opened Feb 27, 2025)
- Creating a new task with data in chat format (openai) (#2741 opened Feb 26, 2025)
- An error occurred: 'choices' (in openai chat completion) (#2740 opened Feb 26, 2025)
16 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Capture gen_kwargs from CLI in squad_completion (#2727 commented on Feb 25, 2025 • 1 new comment)
- lm_eval on squadv2 and meta-llama/Meta-Llama-3.1-8B fails with TypeError: Instance.__init__() got an unexpected keyword argument 'apply_chat_template' (#2537 commented on Feb 26, 2025 • 0 new comments)
- Different models on same tasks gives same results when cache is active (#2715 commented on Feb 26, 2025 • 0 new comments)
- Batching and generate_until special tokens (#2723 commented on Feb 26, 2025 • 0 new comments)
- Importing a local module in a task included with include_path (#2713 commented on Feb 26, 2025 • 0 new comments)
- Strip the input for the three tasks: FDA, SWDE, and SQuAD_completion. (#2690 commented on Feb 26, 2025 • 0 new comments)
- Only a single `filtered_resps` is logged for repeat > 1 for each sample (#1232 commented on Feb 26, 2025 • 0 new comments)
- task load return error (#2466 commented on Feb 28, 2025 • 0 new comments)
- Bug about the output information. (#2152 commented on Mar 2, 2025 • 0 new comments)
- Add `--examples` Argument for Fine-Grained Task Evaluation in `lm-evaluation-harness`. This feature is the first step towards efficient multi-prompt evaluation with PromptEval [1,2] (#2520 commented on Feb 26, 2025 • 0 new comments)
- add llama3 tasks (#2556 commented on Mar 3, 2025 • 0 new comments)
- Add loncxt tasks (#2629 commented on Mar 3, 2025 • 0 new comments)
- Add from dataframe (#2655 commented on Feb 28, 2025 • 0 new comments)
- Convert gen tasks to multiple_choice (#2670 commented on Mar 3, 2025 • 0 new comments)
- New healthcare benchmark: careqa (#2714 commented on Mar 3, 2025 • 0 new comments)
- Add support for sequence labeling (#2718 commented on Mar 1, 2025 • 0 new comments)