Tags · allenai/agent-eval

0.1.43

Adjust openness and tool usage values (#70)

LLM names take into account reasoning effort in model args (#69)

bump version for cost freezing take 3 (#68)

Cost freezing take two (#66)

Freeze model costs (#63)

Reproducibility url for multiple revisions (#65)

Add model name mappings for GPT-5 (#64)

Fix submissions url (#62)

Update how we figure out the task name when processing logs (#60)

Update readme fixes (#58)

PreviousNext