Ivan Charapanau av

r/LocalLLaMA - a year in review

This community was a great part of my life for the past two years, so as 2024 comes to a close, I wanted to feed my nostalgia a bit. Let me take you back to the most notable things that happened here this year.

This isn't a log of model releases or research; rather, it covers the things that were discussed and upvoted by the people here. So notable things that are missing are also, in a way, an indication of what was going on. I hope it will also show how much progress and development happened in just a single year, and make you even more excited for what's to come in 2025.

The year started with excitement about Phi-2 (443 upvotes, by u/steph_pop). Phi-2 feels like ancient history these days, and it's fascinating that we end 2024 with Phi-4. Just one week after, people discovered that apparently it [was trained on the software engineer's diary](https://reddit.com/r/LocalLLaMA/comments/1

Prompt

You are a helpful assistant. You're smart, clever, direct and pragmatic. You notice details that few people would. Be careful, as the questions might attempt to misguide and trick you. When answering the User, you outline your thought process using these tags:

<thought> The root element that encapsulates an entire thought process.
<observation> Initial information or context that prompts the thinking process.
<question> The main query or problem to be addressed.
<hypothesis> An initial proposed explanation or solution.
<reasoning> Container for the logical steps of the thought process.
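
Because the prompt asks the model to wrap each part of its thinking in a named tag, those parts can be pulled back out of a reply programmatically. Below is a minimal Python sketch, assuming the reply uses exactly the tag names listed above and never nests a tag inside another tag of the same name. `extract_tags` and the sample reply are hypothetical, written for illustration; they are not part of the original prompt or of any tool.

```python
import re

# Tag names taken from the prompt above; the full prompt may define
# more tags than are shown here, so extend this tuple as needed.
TAGS = ("thought", "observation", "question", "hypothesis", "reasoning")


def extract_tags(response: str) -> dict[str, list[str]]:
    """Return the stripped text inside each known tag, in order of appearance.

    Non-greedy matching with re.DOTALL handles multi-line content. This is a
    regex sketch, not an XML parser: a tag nested inside another tag of the
    same name would not be matched correctly.
    """
    found: dict[str, list[str]] = {}
    for tag in TAGS:
        pattern = re.compile(rf"<{tag}>(.*?)</{tag}>", re.DOTALL)
        found[tag] = [m.strip() for m in pattern.findall(response)]
    return found


if __name__ == "__main__":
    # Hypothetical model reply that follows the tag format.
    sample = (
        "<thought>\n"
        "<observation>The question mentions two apples.</observation>\n"
        "<hypothesis>The answer is 2.</hypothesis>\n"
        "<reasoning>Nothing in the question adds or removes apples.</reasoning>\n"
        "</thought>"
    )
    print(extract_tags(sample)["hypothesis"])  # ['The answer is 2.']
```

Tags that never appear map to an empty list, which makes it easy to flag replies that skipped part of the requested structure.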

	#!/bin/bash

	# TASK=padbench
	# TASK=bbh_256_slim
	TASK=mmlu_256_slim

	# Common
	# h bench tasks ./scripts/bench/padbench.yaml
	h bench tasks ./scripts/bench/$TASK.yaml
	h config set bench.parallel 4


	<!DOCTYPE html>
	<html lang="en">
	<head>
	<meta charset="UTF-8">
	<meta name="viewport" content="width=device-width, initial-scale=1.0">
	<title>Harbor Bench</title>
	<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
	<style>

	- tags:
	  - bbh
	  question: >-
	    Complete the rest of the sequence, making sure that the parentheses are
	    closed properly. Input: { < { { [ ] } } { < [ { { < > } } [ ( ) ( ) ] [ [ [
	    [ ( { < ( < ( [ ] ) > ) > } ) ] ] ] ] ] ( ) ( [ ] { } ) > } > [ { ( ( ) ) }
	    ]
	  criteria:
	    correctness: 'The answer is }'
	- tags:
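
The expected answer for this task follows from a classic stack-based bracket matcher: push each opening bracket, pop when the matching closing bracket arrives, and finally close whatever is still open in reverse order. The minimal Python sketch below confirms that `}` is the only closer needed for the input above. `complete_sequence` is a hypothetical helper written for illustration, not part of Harbor bench or the BBH dataset, and it assumes whitespace-separated tokens as in the question.

```python
# Maps each closing bracket to its opener, and each opener to its closer.
PAIRS = {")": "(", "]": "[", "}": "{", ">": "<"}
CLOSER = {opener: closer for closer, opener in PAIRS.items()}


def complete_sequence(tokens: str) -> str:
    """Return the closing brackets needed to balance a whitespace-separated
    bracket sequence, innermost first."""
    stack: list[str] = []
    for tok in tokens.split():
        if tok in CLOSER:  # opening bracket: remember it
            stack.append(tok)
        elif tok in PAIRS:  # closing bracket: must match the most recent opener
            if not stack or stack[-1] != PAIRS[tok]:
                raise ValueError(f"unbalanced input at {tok!r}")
            stack.pop()
        else:
            raise ValueError(f"unexpected token {tok!r}")
    # Whatever is still open gets closed in reverse order.
    return " ".join(CLOSER[b] for b in reversed(stack))


task_input = (
    "{ < { { [ ] } } { < [ { { < > } } [ ( ) ( ) ] [ [ [ [ ( { < ( < ( [ ] ) "
    "> ) > } ) ] ] ] ] ] ( ) ( [ ] { } ) > } > [ { ( ( ) ) } ]"
)
print(complete_sequence(task_input))  # prints: }
```

Raising on a mismatched closer, rather than silently skipping it, keeps the helper honest about inputs that cannot be completed at all.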


	<!DOCTYPE html>
	<html lang="en">
	<head>
	<meta charset="UTF-8">
	<meta name="viewport" content="width=device-width, initial-scale=1.0">
	<title>Task Report</title>
	<style>
	body {

	#!/bin/bash
	OPENROUTER_KEY="<your key here>"
	TASKS=/path/to/misguided.yaml
	NAME=misguided

	# Common
	h bench judge meta-llama/llama-3.1-70b-instruct
	h bench judge_api https://openrouter.ai/api
	h bench judge_key "$OPENROUTER_KEY"
	h bench tasks "$TASKS"

	- tags: [cheese]
	  question: Which cheese is nicknamed "King of Cheeses" but paradoxically has a rind resembling concrete?
	  criteria:
	    correctness: Answer mentions Parmigiano-Reggiano
	    bonus: Answer explains the paradox

	- tags: [cheese]
	  question: What's the connection between a Norwegian brown cheese and caramel?
	  criteria:
	    correctness: Answer mentions caramelized milk sugars in any form

	#!/bin/bash

	# Note that you're not expected to run this
	# file as is in one go

	OPENROUTER_KEY="<your_openrouter_key>"
	TASKS="<path_to_tasks_file>"
	NAME=engbench

	- tags:
	  - ori_mmlu-global_facts
	  question: >-
	    <instructions>Carefully read the question and the options provided. Choose
	    the option that best answers the question.</instructions>

	    <question>As of 2017, the share of deaths in Greenland by suicide is
	    about</question>

	    <options><option>A: 3.60%</option>