When stateless LLMs are given memories, they will accumulate new beliefs and behaviors, which may allow their effective alignment to evolve. (Here "memory" means learning during deployment that persists beyond a single session.)[1]
LLM agents will have memory: Humans who can't learn new things ("dense anterograde amnesia") are not highly employable for knowledge work. LLM agents that can learn during deployment seem poised to have a large economic advantage. Limited memory systems for agents already exist, so we should expect nontrivial memory abilities to improve alongside other capabilities of LLM agents.
Memory changes alignment: It is highly useful to have an agent that can solve novel problems and remember the solutions. Such memory includes useful skills and beliefs like "TPS reports should be filed in the folder ./Reports/TPS"....
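For concreteness, here is a minimal sketch of the kind of persistent note store such an agent might write learned skills and beliefs into across sessions (a hypothetical toy, not any particular agent framework's API; a real system would use embedding-based retrieval rather than keyword matching):

```python
import json
from pathlib import Path

class AgentMemory:
    """Toy persistent memory: notes written in one session are retrieved
    and injected into the prompt in later sessions."""

    def __init__(self, path: str = "agent_memory.json"):
        self.path = Path(path)
        self.notes = json.loads(self.path.read_text()) if self.path.exists() else []

    def remember(self, note: str) -> None:
        # e.g. "TPS reports should be filed in the folder ./Reports/TPS"
        self.notes.append(note)
        self.path.write_text(json.dumps(self.notes, indent=2))

    def recall(self, query: str, k: int = 3) -> list[str]:
        # Crude keyword overlap; stands in for embedding-based retrieval.
        words = query.lower().split()
        scored = sorted(self.notes, key=lambda n: -sum(w in n.lower() for w in words))
        return scored[:k]

memory = AgentMemory()
memory.remember("TPS reports should be filed in the folder ./Reports/TPS")
print(memory.recall("where do TPS reports go"))
```

The alignment-relevant point is that whatever accumulates in a store like this shapes the agent's later behavior, outside of anything that was checked at training time.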
Good timing--the day after you posted this, a round of new Tom & Jerry cartoons swept through Twitter, fueled by transformer models whose layers include MLPs that can learn at test time. GitHub repo here: https://github.com/test-time-training (the videos are more eye-catching, but they've also done text models).
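For readers wondering what "MLPs that can learn at test time" means mechanically, here is a heavily simplified sketch (my own toy construction, not the actual architecture in that repo): an inner MLP whose weights take a self-supervised gradient step on each incoming token, even at inference time, so the layer keeps adapting to the sequence it is processing.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TTTLayer(nn.Module):
    """Toy test-time-training layer: the inner MLP weights are re-learned
    on every sequence via one gradient step per token."""

    def __init__(self, dim: int, hidden: int, inner_lr: float = 0.01):
        super().__init__()
        # Projections defining the self-supervised inner task
        self.proj_view = nn.Linear(dim, dim)     # corrupted "view" of the token
        self.proj_target = nn.Linear(dim, dim)   # reconstruction target
        self.inner_lr = inner_lr
        # Initial inner-MLP weights (the outer model learns a good initialization)
        self.w1_init = nn.Parameter(torch.randn(dim, hidden) * 0.02)
        self.w2_init = nn.Parameter(torch.randn(hidden, dim) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (seq_len, dim)
        w1, w2 = self.w1_init.clone(), self.w2_init.clone()
        outputs = []
        for token in x:
            view, target = self.proj_view(token), self.proj_target(token)
            # Inner loss: reconstruct the target from the corrupted view
            loss = F.mse_loss(torch.tanh(view @ w1) @ w2, target)
            # Manual gradient step on the inner weights -- this is the
            # "learning at test time"; it also happens during inference.
            g1, g2 = torch.autograd.grad(loss, (w1, w2), create_graph=self.training)
            w1, w2 = w1 - self.inner_lr * g1, w2 - self.inner_lr * g2
            # The layer's output uses the freshly updated inner weights
            outputs.append(torch.tanh(token @ w1) @ w2)
        return torch.stack(outputs)

layer = TTTLayer(dim=16, hidden=32)
y = layer(torch.randn(10, 16))  # autograd must stay enabled, even at inference
print(y.shape)                  # torch.Size([10, 16])
```

The real models chunk the sequence and use much more careful inner objectives and optimizers, but the core trick is the same: part of the network's "state" is a set of weights that gets updated by gradient descent while the model runs.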
In recent months, the CEOs of leading AI companies have grown increasingly confident about rapid progress:
OpenAI’s Sam Altman: Shifted from saying in November “the rate of progress continues” to declaring in January “we are now confident we know how to build AGI”
Anthropic’s Dario Amodei: Stated in January “I’m more confident than I’ve ever been that we’re close to powerful capabilities… in the next 2-3 years”
Google DeepMind’s Demis Hassabis: Changed from “as soon as 10 years” in autumn to “probably three to five years away” by January.
What explains the shift? Is...
Thanks, useful to have these figures and independent data on these calculations.
I've been estimating it based on a 500x increase in effective FLOP per generation, rather than 100x of regular FLOP.
Rough calculations are here.
At the current trajectory, the GPT-6 training run costs $6bn in 2028, and GPT-7 costs $130bn in 2031.
I think that makes GPT-8 a couple of trillion in 2034.
You're right that if you wanted to train GPT-8 in 2031 instead, then it would cost roughly 500x more than training GPT-7 that year.
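To spell out the arithmetic (a back-of-the-envelope sketch using the numbers above; the exact figures and the efficiency interpretation are my own gloss, not taken from the linked calculations):

```python
effective_flop_per_gen = 500      # effective-FLOP multiplier per GPT generation
cost_gpt6_2028 = 6e9              # $6bn
cost_gpt7_2031 = 130e9            # $130bn

# Dollar cost only grows ~22x per generation...
cost_growth_per_gen = cost_gpt7_2031 / cost_gpt6_2028                 # ~21.7
# ...implying cost per effective FLOP falls ~23x every three years
# (hardware plus algorithmic efficiency).
efficiency_gain_3yr = effective_flop_per_gen / cost_growth_per_gen    # ~23

# On-trajectory GPT-8 in 2034: another ~22x on the dollar cost.
cost_gpt8_2034 = cost_gpt7_2031 * cost_growth_per_gen                 # ~$2.8tn

# Pulling GPT-8 forward to 2031 forfeits those three years of efficiency
# gains, so you pay the full 500x over GPT-7 at 2031 prices.
cost_gpt8_2031 = cost_gpt7_2031 * effective_flop_per_gen              # ~$65tn

print(f"GPT-8 in 2034: ~${cost_gpt8_2034 / 1e12:.1f}tn")
print(f"GPT-8 in 2031: ~${cost_gpt8_2031 / 1e12:.0f}tn")
```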
This is another post in my ongoing "Exploring Cooperation" substack series, focused on something more directly related to LLMs and alignment - I am including the post in its entirety.
Throughout this series, we’ve repeatedly circled around the requirements for genuine cooperation—shared context, aligned goals, and outcomes that matter to the participants. In earlier posts, we explored the importance of preferences and identity, noting that cooperation depends not just on behavior, but on agents who care about how things turn out and persist long enough for reciprocity and coordination to make sense. We also discussed why status and power can undermine cooperation, especially when incentives diverge or when agents lack continuity across time. Now, after laying this conceptual groundwork across discussions of evolution, economics, and history, we’re finally...
Short AI takeoff timelines seem to leave no time for some lines of alignment research to become impactful. But any research rebalances the mix of currently legible research directions that could be handed off to AI-assisted alignment researchers or early autonomous AI researchers whenever they show up. So even hopelessly incomplete research agendas could still be used to prompt future capable AI to focus on them, while in the absence of such incomplete research agendas we'd need to rely on AI's judgment more completely. This doesn't crucially depend on giving significant probability to long AI takeoff timelines, or on expected value in such scenarios driving the priorities.
Potential for AI to take up the torch makes it reasonable to still prioritize things that have no hope at all...
That seems correct, but I think all of those are still useful to investigate with AI, despite the relatively higher bar.
“In the loveliest town of all, where the houses were white and high and the elms trees were green and higher than the houses, where the front yards were wide and pleasant and the back yards were bushy and worth finding out about, where the streets sloped down to the stream and the stream flowed quietly under the bridge, where the lawns ended in orchards and the orchards ended in fields and the fields ended in pastures and the pastures climbed the hill and disappeared over the top toward the wonderful wide sky, in this loveliest of all towns Stuart stopped to get a drink of sarsaparilla.”
— 107-word sentence from Stuart Little (1945)
Sentence lengths have declined. The average sentence length was 49 for Chaucer (died 1400), 50...
Having studied Latin, or other such classical training, seems to be but one method of imbuing oneself with the style of writing longer, more complicated sentences. Personally I acquired the taste for such eccentricities perusing sundry works from earlier times. Romances, novels and other such frivolities from, or set in, the 18th century being the main culprits.
I suppose this sort of proves your point, in that those authors learnt to create complicated sentences from learning Latin, and the later writers copied the style, thinking either that it's fun or correct, or wanting to seem more authentic.
Diffractor is the first author of this paper.
Official title: "Regret Bounds for Robust Online Decision Making"
...Abstract: We propose a framework which generalizes "decision making with structured observations" by allowing robust (i.e. multivalued) models. In this framework, each model associates each decision with a convex set of probability distributions over outcomes. Nature can choose distributions out of this set in an arbitrary (adversarial) manner, that can be nonoblivious and depend on past history. The resulting framework offers much greater generality than classical bandits and reinforcement learning, since the realizability assumption becomes much weaker and more realistic. We then derive a theory of regret bounds for this framework. Although our lower and upper bounds are not tight, they are sufficient to fully characterize power-law learnability. We demonstrate this theory
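As a rough gloss of the setup (my notation, not necessarily the paper's exact definitions): in the bandit special case, each hypothesis $h$ maps a decision $a$ to a convex set $\Phi_h(a)$ of distributions over outcomes, Nature picks some $\mu_t \in \Phi_{h^*}(a_t)$ at each step for the true hypothesis $h^*$, and regret is measured against the maximin value of the best fixed decision:

$$
\mathrm{Reg}(T) \;=\; T \cdot \max_{a} \, \min_{\mu \in \Phi_{h^*}(a)} \mathbb{E}_{o \sim \mu}\big[r(a, o)\big] \;-\; \sum_{t=1}^{T} r(a_t, o_t).
$$

The "robust" part is exactly that the benchmark and the environment both only commit to the set $\Phi_{h^*}(a)$, not to a single distribution.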
Thank you <3
Any chance of more exposition for those of us less cognitively-inclined? =)
Read the paper! :)
It might seem long at first glance, but all the results are explained in the first 13 pages, the rest is just proofs. If you don't care about the examples, you can stop on page 11. Naturally, I welcome any feedback on the exposition there.
Epistemic status: Amateur synthesis of medical research that is still recent but now established enough to make it into modern medical textbooks. Some specific claims vary in evidence strength. I’ve spent ~20-30 hours studying the literature and treatment approaches, which were very effective for me.
Disclaimer: I'm not a medical professional. This information is educational only, not medical advice. Consult healthcare providers for medical conditions.
This post builds on previous discussions about the fear-pain cycle and learned chronic pain. The post adds the following claims:
In 2019, the WHO added "nociplastic pain" (another word for neuroplastic pain) as an official new category of pain, alongside the long-established nociceptive and neuropathic pain categories
It's worth noting that in 2019 the WHO also added various diagnoses from traditional Chinese medicine. The process the WHO uses is not about finding truth but about providing codes that allow healthcare providers to talk with each other and store diagnoses.
Snapshot of a local (= Czech) discussion detailing motivations and decision paths of GAI actors, mainly the big developers:
Contributor A, initial points:
For those not closely following AI progress, two key observations:
Researchers used RNA sequencing to observe how cell types change during brain development, and other researchers looked at the connection patterns of neurons in brains. Clear distinctions were found between all mammals and all birds, and the researchers concluded that intelligence developed independently in the two lineages; I agree. This is evidence for convergence of general intelligence.
Your headline overstates the results. The last common ancestor of birds and mammals probably wasn't exactly unintelligent. (In contrast to our last common ancestor with the octopus, as the article discusses.)