Pulse · EleutherAI/lm-evaluation-harness

February 24, 2025 – March 3, 2025

Overview

14 Active pull requests

11 Active issues

8 Pull requests merged by 5 people

fix doc: generate_until only outputs the generated text!
#2755 merged Mar 3, 2025
Groundcocoa
#2724 merged Mar 3, 2025
[Readme change for SGLang] fix error in readme and add OOM solutions for sglang
#2738 merged Mar 3, 2025
fix vllm data parallel
#2746 merged Feb 27, 2025
fix log condition on main
#2737 merged Feb 26, 2025
Support SGLang as Potential Backend for Evaluation
#2703 merged Feb 25, 2025
add humaneval+ and mbpp+
#2734 merged Feb 25, 2025
Fix the import source for eval_logger
#2735 merged Feb 25, 2025

6 Pull requests opened by 6 people

Allow writing config to wandb
#2736 opened Feb 25, 2025
New benchmark: CaselawQA
#2739 opened Feb 26, 2025
Add test for a simple Unitxt task
#2742 opened Feb 26, 2025
Enable steering HF models
#2749 opened Feb 28, 2025
Sae steered
#2750 opened Feb 28, 2025
Fix for TruthfulQA MC2 results calculation
#2753 opened Mar 3, 2025

2 Issues closed by 2 people

add humaneval+ and mbpp+ task
#2733 closed Feb 25, 2025
Encountering assert len(indices) == len(inputs) error when using Qwen2vl for MMMU evaluation
#2720 closed Feb 25, 2025

9 Issues opened by 9 people

'NoneType' object is not callable!
#2752 opened Mar 3, 2025
Smooth landing errors during post processing
#2751 opened Feb 28, 2025
Embedding checkpoint size mismatch when using peft on DeepSeek-R1-Distill-Qwen-1.5B.
#2748 opened Feb 28, 2025
Gemini Support and usage
#2747 opened Feb 27, 2025
HOW TO ADD NEW TASK?
#2745 opened Feb 27, 2025
modelscope installed will lead some problems
#2744 opened Feb 27, 2025
Error loading MMLU 'prehistory' config: BuilderConfig not found (available: ['default'])
#2743 opened Feb 27, 2025
Creating a new task with data in chat format (openai)
#2741 opened Feb 26, 2025
An error occurred: 'choices' (in openai chat completion)
#2740 opened Feb 26, 2025

16 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

Capture gen_kwargs from CLI in squad_completion
#2727 commented on Feb 25, 2025 • 1 new comment
lm_eval on squadv2 and meta-llama/Meta-Llama-3.1-8B fails with TypeError: Instance.__init__() got an unexpected keyword argument 'apply_chat_template'
#2537 commented on Feb 26, 2025 • 0 new comments
Different models on same tasks gives same results when cache is active
#2715 commented on Feb 26, 2025 • 0 new comments
Batching and generate_until special tokens
#2723 commented on Feb 26, 2025 • 0 new comments
Importing a local module in a task included with include_path
#2713 commented on Feb 26, 2025 • 0 new comments
Strip the input for the three tasks: FDA, SWDE, and SQuAD_completion.
#2690 commented on Feb 26, 2025 • 0 new comments
Only a single `filtered_resps` is logged for repeat > 1 for each sample
#1232 commented on Feb 26, 2025 • 0 new comments
task load return error
#2466 commented on Feb 28, 2025 • 0 new comments
Bug about the output information.
#2152 commented on Mar 2, 2025 • 0 new comments
Add `--examples` Argument for Fine-Grained Task Evaluation in `lm-evaluation-harness`. This feature is the first step towards efficient multi-prompt evaluation with PromptEval [1,2]
#2520 commented on Feb 26, 2025 • 0 new comments
add llama3 tasks
#2556 commented on Mar 3, 2025 • 0 new comments
Add loncxt tasks
#2629 commented on Mar 3, 2025 • 0 new comments
Add from dataframe
#2655 commented on Feb 28, 2025 • 0 new comments
Convert gen tasks to multiple_choice
#2670 commented on Mar 3, 2025 • 0 new comments
New healthcare benchmark: careqa
#2714 commented on Mar 3, 2025 • 0 new comments
Add support for sequence labeling
#2718 commented on Mar 1, 2025 • 0 new comments
