Adjust openness and tool usage values (#70)
LLM names take into account reasoning effort in model args (#69)
Bump version for cost freezing take 3 (#68)
Cost freezing take two (#66)
Freeze model costs (#63)
Reproducibility url for multiple revisions (#65)
Add model name mappings for GPT-5 (#64)
Fix submissions url (#62)
Update how we figure out the task name when processing logs (#60)
Update readme fixes (#58)