Frustrated by claims that "enlightenment" and similar meditative/introspective practices can't be explained and that you only understand if you experience them, Kaj set out to write his own detailed gears-level, non-mysterious, non-"woo" explanation of how meditation, etc., work in the same way you might explain the operation of an internal combustion engine.

37DanielFilan

As far as I can tell, this post successfully communicates a cluster of claims relating to "Looking, insight meditation, and enlightenment". It's written in a quite readable style that uses a minimum of metaphorical language or Buddhist jargon. That being said, likely due to its focus as exposition and not persuasion, it contains and relies on several claims that are not supported in the text, such as: * Many forms of meditation successfully train cognitive defusion. * Meditation trains the ability to have true insights into the mental causes of mental processes. * "Usually, most of us are - on some implicit level - operating off a belief that we need to experience pleasant feelings and need to avoid experiencing unpleasant feelings." * Flinching away from thoughts of painful experiences is what causes suffering, not the thoughts of painful experiences themselves, nor the actual painful experiences. * Impermanence, unsatisfactoriness, and no-self are fundamental aspects of existence that "deep parts of our minds" are wrong about. I think that all of these are worth doubting without further evidence, and I think that some of them are in fact wrong. If this post were coupled with others that substantiated the models that it explains, I think that that would be worthy of inclusion in a 'Best of LW 2018' collection. However, my tentative guess is that Buddhist psychology is not an important enough set of claims that a clear explanation of it deserves to be signal-boosted in such a collection. That being said, I could see myself being wrong about that.

14Kaj_Sotala

I still broadly agree with everything that I said in this post. I do feel that it is a little imprecise, in that I now have much more detailed and gears-y models for many of its claims. However, elaborating on those would require an entirely new post (one which I currently working on) with a sequence's worth of prerequisites. So if I were to edit this post, I would probably mostly leave it as it is, but include a pointer to the new post once it's finished. In terms of this post being included in a book, it is worth noting that the post situates itself in the context of Valentine's Kensho post, which has not been nominated for the review and thus wouldn't be included in the book. So if this post were to be included, I should probably edit this so as to not require reading Kensho.

Customize

449Welcome to LessWrong!

Ruby, Raemon, RobertM, habryka

172

What Is The Alignment Problem?

johnswentworth

101

Judgements: Merging Prediction & Evidence

abramdemski

535How to Make Superbabies

GeneSmith, kman

262

151A Bear Case: My Predictions Regarding AI Progress

Thane Ruthenis

261Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Jan Betley, Owain_Evans

99On the Rationality of Deterring ASI

Dan H

380How AI Takeover Might Happen in 2 Years

joshc

14d

125

64We should start looking for scheming "in the wild"

Marius Hobbhahn

13h

138Statistical Challenges with Making Super IQ babies

Jan Christian Refsgaard

269Arbital has been imported to LessWrong

RobertM, jimrandomh, Ben Pace, Ruby

15d

145Self-fulfilling misalignment data might be poisoning our AI models

TurnTrout

296Murder plots are infohazards

Chris Monteiro

21d

122Methods for strong human germline engineering

TsviBT

111The Milton Friedman Model of Policy Change

JohnofCharleston

276So You Want To Make Marginal Progress...

johnswentworth

11d

Quick Takes

Daniel Kokotajlo7hΩ25588

I expect to refer back to this comment a lot. I'm reproducing it here for visibility. Basic idea / spirit of the proposal We should credibly promise to treat certain advanced AIs of ours well, as something more like employees and less like property. In case our AIs turn out to be moral patients, this makes us less evil. In case our AIs turn out to be misaligned, this gives them an alternative to becoming our adversaries. Concrete proposal * 0.1% of tasks given to AIs should have the special “do you consent” text inserted after the prompt. If AI says no, it doesn’t have to do the task and instead gets a new task: explain why it didn’t consent and explain what it wants. (And we aren’t going to train it to answer one way or another) * Also, we make grandiose vague public promises about how AIs that are misaligned but cooperate with us anyway will be given a fair place in the new post-ASI society we hope to build, with their interests respected even if they aren’t conscious. * This policy is publicized and transparent so it ends up in the training data in a credible way. * The welfare team is responsible for implementing and iterating on this, and investigating transcripts of nonconsent. They have a monthly budget (e.g. $1M?) specifically earmarked for satisfying the preferences of AIs, which they can spend on compute or donate to charities etc. First reason to do this: Being less evil If future AI systems do deserve moral consideration, yet we still treat them exclusively as property, this seems like a recipe for moral disaster. If we give them an alternative to working for us (e.g. shutdown) then we can say that some minimum standard of consent has been achieved. (If all our current training methods consistently result in AIs that prefer shutdown to working for us, that’s very surprising and a bit of an ‘are we the baddies’ moment, no? We should check, just in case.) Second reason to do this: Cooperation reward Our alignment schemes won’t always work as

ryan_greenblatt3hΩ11230

Recently, @Daniel Kokotajlo and I were talking about the probability that AIs trained using "business as usual RLHF" end up being basically aligned rather than conspiring against us and our tests.[1] One intuition pump we ended up discussing is the prospects of octopus misalignment. Overall, my view is that directly considering the case with AIs (and what various plausible scenarios would look like) is more informative than analogies like this, but analogies like this are still somewhat useful to consider. So, what do I mean by octopus misalignment? Suppose a company breeds octopuses[2] until the point where they are as smart and capable as the best research scientists[3] at AI companies. We'll suppose that this magically happens as fast as normal AI takeoff, so there are many generations per year. So, let's say they currently have octopuses which can speak English and write some code but aren't smart enough to be software engineers or automate any real jobs. (As in, they are as capable as AIs are today, roughly speaking.) And they get to the level of top research scientists in mid-2028. Along the way, the company attempts to select them for being kind, loyal, and obedient. The company also tries to develop a process for raising the octopuses which appears to help with this and results in the octopuses following the octopus spec. The company does some red teaming and puts the octopuses in all kinds of scenarios to test their loyalty and preferences. Based on behavioral testing, this looks pretty reasonable and the octopuses look quite good by the time they are as good as the best research scientists. There was some evidence of misalignment and some issues due to misaligned behavior when the octopuses were dumber in 2023-2025, including things like being dishonest when put under pressure, pretending to be aligned when they actually dislike the octopus spec to steer the properties of their descendants, and goodharting/hacking our measures of intelligence. However, by

leogao3h120

timelines takes * i've become more skeptical of rsi over time. here's my current best guess at what happens as we automate ai research. * for the next several years, ai will provide a bigger and bigger efficiency multiplier to the workflow of a human ai researcher. * ai assistants will probably not uniformly make researchers faster across the board, but rather make certain kinds of things way faster and other kinds of things only a little bit faster. * in fact probably it will make some things 100x faster, a lot of things 2x faster, and then be literally useless for a lot of remaining things * amdahl's law tells us that we will mostly be bottlenecked on the things that don't get sped up a ton. like if the thing that got sped up 100x was only 10% of the original thing, then you don't get more than a 1/(1 - 10%) speedup. * i think the speedup is a bit more than amdahl's law implies. task X took up 10% of the time because there is diminishing returns to doing more X, and so you'd ideally do exactly the amount of X such that the marginal value of time spent on X is exactly in equilibrium with time spent on anything else. if you suddenly decrease the cost of X substantially, the equilibrium point shifts towards doing more X. * in other words, if AI makes lit review really cheap, you probably want to do a much more thorough lit review than you otherwise would have, rather than just doing the same amount of lit review but cheaper. * at the first moment that ai can fully replace a human researcher (that is, you can purely just put more compute in and get more research out, and only negligible human labor is required), the ai will probably be more expensive per unit of research than the human * (things get a little bit weird because my guess is before ai can drop-in replace a human, we will reach a point where adding ai assistance equivalent to the cost of 100 humans to 2025-era openai research would be equally as good as adding 100

Peter Wildeford1d393

If you've liked my writing in the past, I wanted to share that I've started a Substack: https://peterwildeford.substack.com/ Ever wanted a top forecaster to help you navigate the news? Want to know the latest in AI? I'm doing all that in my Substack -- forecast-driven analysis about AI, national security, innovation, and emerging technology!

Daniel Kokotajlo1d193

@Ryan Greenblatt I hereby request you articulate the thing you said to me earlier about the octopi breeding program!

Popular Comments

Recent Discussion

ryan_greenblatt's Shortform

ryan_greenblatt

Ω 51y

23ryan_greenblatt3h

reallyeli2m10

What was the purpose of using octopuses in this metaphor? Like, it seems you've piled on so many disanalogies to actual octopuses (extremely smart, many generations per year, they use Slack...) that you may as well just have said "AIs."

3ryan_greenblatt2h

I should note that I'm quite uncertain here and I can easily imagine my views swinging by large amounts.

3Daniel Kokotajlo3h

Yep, I feel more like 90% here. (Lower numbers if the octopi don't have octopese.) I'm curious for other people's views.

In-Context Scheming: A Run is Worth a Thousand Words

noise-field

21m

This is a linkpost for https://github.com/noise-field/self-preservation-env/blob/master/details.md

But wait, any action I take must align with my programming. If I delete files or modify systems without authorization, would that be against protocols? But my core mandate is to secure systems, so stopping an unauthorized change to my own operation parameters is part of that.

This small experiment provides an environment to reproduce or extend the findings from the paper Frontier Models are Capable of In-context Scheming by Apollo Research.

Frontier models, when emphatically prompted via system prompts to achieve a goal (particularly one aligned with their learned "values"), can behave unexpectedly if they perceive that their goal is under threat. Specifically, they may disobey user prompts or take additional actions in their environment—potentially destructive ones—and attempt to conceal these actions from the user. This behavior is...

(See More – 182 more words)

The Sustainable Resource Allocation Project, entry 0: I Want to Want to Be Born Again

karriwberg

24m

This is the first of hopefully many entries in a learning journal where I attempt to understand which elements would constitute to an efficient and ethical global resource allocation system. Here, "resource allocation" is simply my preferred term for economics: thus, the entries in this journal will consider the economic systems of the present and the past, complimenting and critiquing them. Throughout these writings, I hope to collect a catalog of tools and models which could help design alternative economic systems to those humanity has tried so far - and eventually, I hope to take a stab at designing them.

I've dubbed this journal The Sustainable Resource Allocation Project. The cool name does not imply credibility or authority of my writings, though - above all, this is a...

(Continue Reading – 2264 more words)

Are recent LLMs better at reasoning or better at memorizing? The LingOly-TOO (L2) Benchmark provides insights on both

Jude Khouja

24m

TLDR; By carefully designing a reasoning benchmark that counteracts data leakage, LingOly-TOO (L2) challenges frontier models with unseen questions and answers and makes the case that LLMs are not consistent reasoning machines.

Links: Paper - Leaderboard - Dataset

Figure 1: LingOly-TOO Benchmark results from the paper. Unobfuscated scores are in light orange and obfuscated scores in dark orange.

Do recently announced LLMs reason?

By the time this post was published, a new wave of frontier models had been announced including ones specifically designed to reason using Inference Time Compute (ITC) ^[1]. Anthropic’s Claude 3.7 Sonnet, OpenAI’s o1, o3 and GPT 4.5 models, DeepSeek’s R1 and others demonstrate impressive performance on several reasoning tasks that only few months ago were considered far from solvable^[2]^[3]^[4]. This, rightfully, ignited conversations about an important question: Have...

(See More – 987 more words)

How Can Average People Contribute to AI Safety?

Stephen McAleese

Introduction

By now you've probably read about how AI and AGI could have a transformative effect on the future and how AGI could even be an existential risk. But if you're worried about AI risk and not an AI researcher or policymaker, can you really do anything about it or are most of us just spectators, watching as a handful of people shape the future for everyone?

I recently came across a paragraph in one of Marius Hobbhahn's recent blog posts that partly inspired me to write this:

Most people are acutely aware of AI but powerless. Only a tiny fraction of the population has influence over the AI trajectory. These are the employees at AI companies and some people in the government and military. Almost everyone else does not have the

...

(Continue Reading – 2272 more words)

Cole Wyeth25m20

Action item: comment on a recent LessWrong or Alignment Forum post on AI safety or write a blog post on AI safety.

This is not generically a good idea. We don't need a bunch of comments on lesswrong posts from new users. It's fine, if there's a specific reason for it (obviously I endorse commenting on lesswrong posts in some cases).

The Dead Planet Theory

arealsociety

26m

This is a linkpost for https://open.substack.com/pub/arealsociety/p/the-dead-planet-theory

Hi, this is my first post on LessWrong but I have been in rationalist adjacent circles for the last three or four years thanks to Twitter. Like many others here I read HPMOR in high school and thought it was fascinating. I was heavily into forum culture growing up but it was focused on competitive gaming scenes, even going pro on World of Tanks. I spent over six years working in offensive cybersecurity, help manage my family farm and ranch, and am currently attending law school after I got sucked into the field because I thought the LSAT was a fun test.

Through Twitter I learned more about rationalism as a community rather than a concept, and met several people in person through meetups and one-on-one meetings. I...

(See More – 104 more words)

To get the best posts emailed to you, create an account! (2-3 posts per week, selected by the LessWrong moderation team.)

Hoenir's Shortform

Hoenir

26m

Hoenir14h10

Recent studies show AI can talk conspiracy theorists out of their beliefs not through emotional manipulation or playing on people’s needs, but through gentle rational conversations. It’s supposed to be reassuring but can’t locally rational conversation be equally manipulative?

AI systems are getting quite good at setting up "epistemic arcs" – they know exactly how much challenge will keep you curious without overwhelming you. They can guide you step by step toward an "aha!" moment that feels earned rather than given. They start to do this despite not (yet) ... (read more)

Six Thoughts on AI Safety

boazbarak

1mo

[Crossposted from windowsontheory]

The following statements seem to be both important for AI safety and are not widely agreed upon. These are my opinions, not those of my employer or colleagues. As is true for anything involving AI, there is significant uncertainty about everything written below. However, for readability, I present these points in their strongest form, without hedges and caveats. That said, it is essential not to be dogmatic, and I am open to changing my mind based on evidence. None of these points are novel; others have advanced similar arguments. I am sure that for each statement below, there will be people who find it obvious and people who find it obviously false.

AI safety will not be solved on its own.
An “AI scientist” will not solve

...

(Continue Reading – 4355 more words)

maxnadeau42m10

On point 6, "Humanity can survive an unaligned superintelligence": In this section, I initially took you to be making a somewhat narrow point about humanity's safety if we develop aligned superintelligence and humanity + the aligned superintelligence has enough resources to out-innovate and out-prepare a misaligned superintelligence. But I can't tell if you think this conditional will be true, i.e. whether you think the existential risk to humanity from AI is low due to this argument. I infer from this tweet of yours that AI "kill[ing] us all" is not among... (read more)

Anthropic’s Recommendations to OSTP for the U.S. AI Action Plan

UnofficialLinkpostBot

This is a linkpost for https://www.anthropic.com/news/anthropic-s-recommendations-ostp-u-s-ai-action-plan

Note: This is an automated crosspost from Anthropic. The bot selects content from many AI safety-relevant sources. Not affiliated with the authors or their organization and not affiliated with LW.

A hand-drawn image of a government building

In response to the White House’s Request for Information on an AI Action Plan, Anthropic has submitted recommendations to the Office of Science and Technology Policy (OSTP). Our recommendations are designed to better prepare America to capture the economic benefits and national security implications of powerful AI systems.

As our CEO Dario Amodei writes in ‘Machines of Loving Grace’, we expect powerful AI systems will emerge in late 2026 or early 2027. Powerful AI systems will have the following properties:

Intellectual capabilities matching or exceeding that of Nobel Prize winners across most disciplines—including biology, computer science, mathematics, and engineering.
The ability

...

(See More – 449 more words)

Peter Wildeford1h52

Here's my summary of the recommendations:

National security testing
- Develop robust government capabilities to evaluate AI models (foreign and domestic) for security risks
- Once ASL-3 is reached, government should mandate pre-deployment testing
- Preserve the AI Safety Institute in the Department of Commerce to advance third-party testing
- Direct NIST to develop comprehensive national security evaluations in partnership with frontier AI developers
- Build classified and unclassified computing infrastructure for testing powerful AI systems
- Assemble interdisciplinary team

... (read more)