Ivan Charapanau av

@av
av / post.md
Created December 28, 2024 20:14
r/LocalLLaMA - a year in review

This community has been a great part of my life for the past two years, so as 2024 comes to a close, I wanted to feed my nostalgia a bit. Let me take you back to the most notable things that happened here this year.

This isn't a log of model releases or research, but rather of the things that were discussed and upvoted by the people here. So the absence of some notable things is itself an indication, of sorts, of what was going on. I hope it'll also show the amount of progress and development that happened in just a single year and make you even more excited for what's to come in 2025.


The year started with excitement about Phi-2 (443 upvotes, by u/steph_pop). Phi-2 feels like ancient history these days, and it's fascinating that we end 2024 with Phi-4. Just one week later, people discovered that apparently it [was trained on the software engineer's diary](https://reddit.com/r/LocalLLaMA/comments/1

@av
av / padbench.sh
Created October 6, 2024 16:30
padbench
#!/bin/bash
# TASK=padbench
# TASK=bbh_256_slim
TASK=mmlu_256_slim
# Common
# h bench tasks ./scripts/bench/padbench.yaml
h bench tasks "./scripts/bench/$TASK.yaml"
h config set bench.parallel 4
@av
av / summary.html
Created September 25, 2024 19:51
Small Llama 3.2 Benchmarks
This file has been truncated, but you can view the full file.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Harbor Bench</title>
<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
<style>
@av
av / bbh_256.yml
Created September 22, 2024 10:56
Example Harbor Bench tasks file - 256 tasks from Big Bench Hard
- tags:
    - bbh
  question: >-
    Complete the rest of the sequence, making sure that the parentheses are
    closed properly. Input: { < { { [ ] } } { < [ { { < > } } [ ( ) ( ) ] [ [ [
    [ ( { < ( < ( [ ] ) > ) > } ) ] ] ] ] ] ( ) ( [ ] { } ) > } > [ { ( ( ) ) }
    ]
  criteria:
    correctness: 'The answer is }'
- tags:
@av
av / tasks.html
Created September 15, 2024 15:51
misguidedbench - tasks
This file has been truncated, but you can view the full file.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Task Report</title>
<style>
body {
@av
av / misguidedbench.sh
Last active September 15, 2024 15:41
misguidedbench
#!/bin/bash
# Placeholders: replace with your own key and tasks file path
OPENROUTER_KEY="<your key here>"
TASKS=/path/to/misguided.yaml
NAME=misguided
# Common
h bench judge meta-llama/llama-3.1-70b-instruct
h bench judge_api https://openrouter.ai/api
h bench judge_key "$OPENROUTER_KEY"
h bench tasks "$TASKS"
@av
av / cheese.yaml
Last active September 12, 2024 21:13
CheeseBench
- tags: [cheese]
  question: Which cheese is nicknamed "King of Cheeses" but paradoxically has a rind resembling concrete?
  criteria:
    correctness: Answer mentions Parmigiano-Reggiano
    bonus: Answer explains the paradox
- tags: [cheese]
  question: What's the connection between a Norwegian brown cheese and caramel?
  criteria:
    correctness: Answer mentions caramelized milk sugars in any form
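The task entries in these files share one shape: a list of `tags`, a `question`, and a `criteria` mapping of named checks. Below is a minimal sketch of a sanity check for that shape before a run. The helper `check_task` and its rules are hypothetical, inferred only from the entries shown here; this is not Harbor Bench's own validation. The tasks are written as plain Python data so that the sketch needs no YAML library.

```python
# Hypothetical checker: field names come from the task files shown above,
# but the rules are assumptions, not Harbor Bench's actual validation.

def check_task(task):
    """Return a list of problems found in one task entry (empty if none)."""
    problems = []
    tags = task.get("tags")
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        problems.append("tags must be a list of strings")
    question = task.get("question")
    if not isinstance(question, str) or not question.strip():
        problems.append("question must be a non-empty string")
    criteria = task.get("criteria")
    if not isinstance(criteria, dict) or not criteria:
        problems.append("criteria must be a non-empty mapping")
    elif not all(isinstance(v, str) for v in criteria.values()):
        problems.append("each criterion must be a string")
    return problems


tasks = [
    {
        "tags": ["cheese"],
        "question": "What's the connection between a Norwegian brown cheese and caramel?",
        "criteria": {"correctness": "Answer mentions caramelized milk sugars in any form"},
    },
    # Deliberately malformed entry to show what the checker reports.
    {"tags": ["cheese"], "question": "", "criteria": {}},
]

for index, task in enumerate(tasks):
    print(index, check_task(task) or "ok")
```

A check like this catches an empty question or missing criteria early, before any judge model is called.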
@av
av / engbench.sh
Created September 12, 2024 16:22
Harbor bench - engines recipe
#!/bin/bash
# Note that you're not expected to run this
# file as is in one go
OPENROUTER_KEY="<your_openrouter_key>"
TASKS="<path_to_tasks_file>"
NAME=engbench
@av
av / mmlu_256.yaml
Created September 12, 2024 16:17
Harbor MMLU 256
- tags:
    - ori_mmlu-global_facts
  question: >-
    <instructions>Carefully read the question and the options provided. Choose
    the option that best answers the question.</instructions>
    <question>As of 2017, the share of deaths in Greenland by suicide is
    about</question>
    <options><option>A: 3.60%</option>
@av
av / rml.md
Created September 6, 2024 21:43
RML - Reasoning Markup Language

Prompt

You are a helpful assistant. You're smart, clever, direct and pragmatic. You notice details that few people would. Be careful, as the questions might attempt to misguide and trick you. When answering the User, you outline your thought process using these tags:

<thought> The root element that encapsulates an entire thought process.
<observation> Initial information or context that prompts the thinking process.
<question> The main query or problem to be addressed.
<hypothesis> An initial proposed explanation or solution.
<reasoning> Container for the logical steps of the thought process.