If something LoRA-shaped usefully cracks continual learning, a lot of things in general are going to get very crazy very quickly.
If continual learning is cracked before jailbreak resistance, and the deployment model of "the same weights are used for inference for all customers" holds, the world of corporate espionage is going to get wild.
Right now, you need to be careful not to combine sensitive information, untrusted external information, AND a method of sending arbitrary data to the outside world in a single context window, since the LLM might be tricked by the external content. Any two of those, however, are fine.
If (sample-efficient) continual learning is cracked, and models are still shared across multiple customers, you will need to be sure to never share sensitive information with a model that will learn that information and then be available for your competitors to do inference on, and to never fully trust any model that has learned from possibly-adversarial-to-you data.
And if continual learning is cracked without major architectural changes, giving up on using the same model for all customers means giving up on many of the benefits of batching.
Obnoxious discovery about the Claude API that anyone doing interp involving prefill should probably be aware of: the Claude API treats prefill tokens differently from identical model-generated tokens.
Specifically, if you have some prompt, and get a completion at temperature=0, then give the exact same prompt prefilling the first n tokens of the completion you just got back, the completion after your prefill will sometimes not match the original completion by the model. This is a separate phenomenon from the phenomenon where you can get multiple possible responses from a temperature=0 prompt.
The most compact reproduction I've found is:
Model: claude-opus-4-5-20251101
Temperature: 0
Prompt: "Test"
Hitting the API with max_tokens=1 and prefill=" OK" yields ",".
Hitting the API with max_tokens=2 and prefill=" OK" yields ", I".
So you'd expect a prefill of " OK," to yield " I". But in fact, hitting the API with max_tokens=2 and prefill=" OK," yields " ".
Detailed steps to reproduce:
1. Obtain an Anthropic API key
2. Use the messages API in the most basic possible fashion
~ $ function clopus_complete {
    PROMPT="$1";
    PREFILL="$2";
    MAX_TOKENS="$3";
    curl -s https://api.anthropic.com/v1/messages \
      -H "content-type: application/json" \
      -H "x-api-key: $ANTHROPIC_API_KEY" \
      -H "anthropic-version: 2023-06-01" \
      -d "$(jq -n --arg prompt "$PROMPT" --arg prefill "$PREFILL" --argjson max_tokens "$MAX_TOKENS" '{
        "model": "claude-opus-4-5-20251101",
        "max_tokens": $max_tokens,
        "temperature": 0,
        "messages": [
          {"role": "user", "content": $prompt},
          {"role": "assistant", "content": $prefill}
        ]
      }')" | jq .content[0].text;
}
~ $ clopus_complete "Test" " OK" 2
", I"
~ $ clopus_complete "Test" " OK" 1
","
~ $ clopus_complete "Test" " OK," 1
" "
3. Observe the inconsistency
As a side note, the behavior is really really weird with prefill and temperature=1. Specifically, when prefilling with " OK,", the model switches to Chinese about 15% of the time.
With temperature=1, the model switches to other languages quite frequently:
function clopus_complete_v2 {
    PROMPT="$1";
    PREFILL="$2";
    MAX_TOKENS="$3";
    curl -s https://api.anthropic.com/v1/messages \
      -H "content-type: application/json" \
      -H "x-api-key: $ANTHROPIC_API_KEY" \
      -H "anthropic-version: 2023-06-01" \
      -d "$(jq -n --arg prompt "$PROMPT" --arg prefill "$PREFILL" --argjson max_tokens "$MAX_TOKENS" '{
        "model": "claude-opus-4-5-20251101",
        "max_tokens": $max_tokens,
        "temperature": 1,
        "messages": [
          {"role": "user", "content": $prompt},
          {"role": "assistant", "content": $prefill}
        ]
      }')" | jq .content[0].text;
}
And then sampling 20x (see the loop sketch after the output) yields:
[
"I'm working fine. Is there anything I can",
" I'm working well!\n\nHow can I",
" I'm working! How can I help you",
"I'm able to respond to your message. How",
"I'm working properly. How can I help you",
" I'm here and ready to assist you.",
" I'm here and ready to help. What",
" I'm here and ready to help! How",
" I'm here and ready to help. What",
" I'm here and ready to help. What",
" I'm here and ready to help. What",
", I'm here and ready to help! How",
"\nI'm here and ready to help. What",
"测试成功!\n\n你好!我",
" I'm here and ready to help. What",
" I'm here and ready to help. What",
"\nI'm here and ready to help! How",
"\nI'm ready to help. What would you",
"我收到了你的测试消息",
" I'm ready. How can I help you"
]
Which... what in the world???
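For anyone trying to reproduce the sampling, a minimal loop over the function above looks something like the sketch below. The max_tokens value is my guess rather than the one used for the run above, and jq -s just collects the per-call outputs into a single JSON array:
# Sketch: sample the " OK," prefill 20 times at temperature=1 and collect
# the quoted strings printed by clopus_complete_v2 into one JSON array.
# max_tokens=12 is an assumed value, not the one used for the run above.
for i in $(seq 1 20); do
  clopus_complete_v2 "Test" " OK," 12
done | jq -s .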
So yeah. If you're trying to use prefill in the API for interp (or looming) purposes, be aware.
Also, a prediction: if and when Anthropic fixes this issue, it will also resolve the well-publicized bug around sha1:b5ae639978c36ae6a1890f96d58e8f3552082c4f.
That's entirely fair. And tbh most of the time I'm looking at a hot loop where the compiler did something dumb, the first question I ask myself is "is there some other way I could write this code so that the compiler will recognize that the more performant code is an option". Compilers are really quite good and fully featured these days, so there usually is some code transformation or pragma or compiler flag that will work for my specific use case.
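To make "some code transformation or pragma or compiler flag" concrete, here is the kind of first step I mean, sketched for GCC and Clang (hot_loop.c is a placeholder filename, not a file from any particular project): ask the compiler to report which loops it did and didn't vectorize, and why, before rewriting anything by hand.
# Sketch: have the compiler explain its vectorization decisions for a hot loop.
# hot_loop.c is a placeholder; flags shown are GCC- and Clang-specific.
gcc -O3 -march=native -fopt-info-vec-optimized -fopt-info-vec-missed -c hot_loop.c
clang -O3 -march=native -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -c hot_loop.c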
Generating a loud noise that you're expecting but your opponents aren't might be even better at differentially elevating their heart rates.
The point of bringing up assembler was to draw the analogy "assembly programmers : optimizing compilers :: programmers-in-general : scaffolded LLMs". The post was not about any particular opinions I have[1] about how LLMs will or won't interact with assembly code.
As optimizing compilers became popular, assembly programmers found that their particular skill of writing assembly code from scratch was largely obsolete. They didn't generally become unemployed as a result, though. Instead, many of the incidental skills they picked up along the way[2] went from "incidental side skill" to "main value proposition".
I do have such opinions, namely "LLMs mostly won't write asm, for basically the same reasons humans don't write much asm". But that opinion isn't super relevant here.
e.g. knowing how to read a crash dump, or which memory access patterns are good, or just general skill at translating high-level descriptions of program behavior into a good data model and code that correctly operates on those data structures
My thesis is approximately "we don't write assembly because it usually doesn't provide much practical benefit and also it's obnoxious to do". This is in opposition to the thesis "we don't write assembly because computers have surpassed the abilities of all but the best humans and so human intervention would only make the output worse".
I think this is an important point because some people seem to be under the impression that "LLMs can write better code than pretty much all humans" is a necessary prerequisite for "it's usually not worth it for a human to write code", and also operating under the model of "once LLMs write most code, there will be nothing left to do for the people with software development skills".
Drop-in remote worker
I think this one sounds like it describes a single level of capability, but it quietly assumes that the capabilities of "a remote worker" are basically static compared to the speed of capability growth. A late-2025 LLM with the default agent scaffold provided by the org releasing that model (e.g. chatgpt.com for OpenAI) would have been able to do many of the jobs posted to Upwork in 2022. But these days, before posting a job to Upwork, most people will at least try running their request by ChatGPT to see if it can one-shot it, and so those exact jobs no longer exist. The jobs which still exist are those which require some capabilities that are not available to anyone with a browser and $20 to their name.
This is a fine assumption if you expect AI capabilities to go from "worse than humans at almost everything" to "better than humans at almost everything" in short order, much much faster than the ability of "legacy" organizations to adapt to them. I think that worldview is pretty well summarized by the graph from the waitbutwhy AI article:
But if the time period isn't short, we may instead see that "drop-in remote worker" is a moving target in the same way "AGI" is, and so we may get AI with scary capabilities we care about without getting a clear indication like "you can now hire a drop-in AI worker that is actually capable of all the things you would hire a human to do".
Reality has a surprising amount of detail[1]. If the training objective is improved by better modeling the world, and the model does not have enough parameters to capture all of the things about the world which would help reduce loss, the model will learn lots of the incidental complexities of the world. As a concrete example, I can ask something like
What is the name of the stadium in Rome at the confluence of two rivers, next to the River Walk Marriott? Answer from memory.
and the current frontier models know enough about the world that they can, without tools or even any substantial chain of thought, correctly answer that trick question[2]. To be able to answer questions like this from memory, models have to know lots of geographical details about the world.
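If you want to check that claim yourself, a minimal query against the Anthropic messages API looks something like the sketch below; the model name and max_tokens here are just my own choices for illustration.
# Sketch: ask the trick question directly, with no tools and no prefill.
# Model name and max_tokens are illustrative choices, not a fixed recipe.
curl -s https://api.anthropic.com/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-opus-4-5-20251101",
    "max_tokens": 200,
    "messages": [
      {"role": "user", "content": "What is the name of the stadium in Rome at the confluence of two rivers, next to the River Walk Marriott? Answer from memory."}
    ]
  }' | jq -r .content[0].text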
Unless your technique for extracting a sparse modular world model produces a resulting world model which is larger than the model it came from, I think removing the things which are noise according to your sparse modular model will almost certainly hurt performance on factual recall tasks like this one.
See the essay by that name for some concrete examples.
The trick is that there is a second city named Rome in the United States, in the state of Georgia. Both Romes contain a confluence of two rivers, both contain river walks, both contain Marriotts, and both contain stadiums, but only the Rome in the US contains a stadium at the confluence of two rivers next to a Marriott named for its proximity to the river.
This suggests a course of action if you work at a company which can have significant positive externalities and cares, during good times, more than zero about them: during those good times, create dashboards and alerts with metrics which correlate with those externalities, to add trivial friction (in the form of "number go down feels bad") to burning the commons during bad times.