Frustrated by claims that "enlightenment" and similar meditative/introspective practices can't be explained and that you only understand if you experience them, Kaj set out to write his own detailed gears-level, non-mysterious, non-"woo" explanation of how meditation, etc., work in the same way you might explain the operation of an internal combustion engine.

37DanielFilan
As far as I can tell, this post successfully communicates a cluster of claims relating to "Looking, insight meditation, and enlightenment". It's written in a quite readable style that uses a minimum of metaphorical language or Buddhist jargon. That being said, likely due to its focus on exposition rather than persuasion, it contains and relies on several claims that are not supported in the text, such as:

  • Many forms of meditation successfully train cognitive defusion.
  • Meditation trains the ability to have true insights into the mental causes of mental processes.
  • "Usually, most of us are - on some implicit level - operating off a belief that we need to experience pleasant feelings and need to avoid experiencing unpleasant feelings."
  • Flinching away from thoughts of painful experiences is what causes suffering, not the thoughts of painful experiences themselves, nor the actual painful experiences.
  • Impermanence, unsatisfactoriness, and no-self are fundamental aspects of existence that "deep parts of our minds" are wrong about.

I think that all of these are worth doubting without further evidence, and I think that some of them are in fact wrong. If this post were coupled with others that substantiated the models that it explains, I think that would be worthy of inclusion in a 'Best of LW 2018' collection. However, my tentative guess is that Buddhist psychology is not an important enough set of claims that a clear explanation of it deserves to be signal-boosted in such a collection. That being said, I could see myself being wrong about that.
14Kaj_Sotala
I still broadly agree with everything that I said in this post. I do feel that it is a little imprecise, in that I now have much more detailed and gears-y models for many of its claims. However, elaborating on those would require an entirely new post (one which I am currently working on) with a sequence's worth of prerequisites. So if I were to edit this post, I would probably mostly leave it as it is, but include a pointer to the new post once it's finished.

In terms of this post being included in a book, it is worth noting that the post situates itself in the context of Valentine's Kensho post, which has not been nominated for the review and thus wouldn't be included in the book. So if this post were to be included, I should probably edit it so as not to require reading Kensho.
Shameful admission: after well over a decade on this site, I still don't really intuitively grok why I should expect agents to become better approximated by "single-minded pursuit of a top-level goal" as they gain more capabilities. Yes, some behaviors like getting resources and staying alive are useful in many situations, but that's not what I'm talking about. I'm talking specifically about the pressures that are supposed to inevitably push agents into the former of the following two main types of decision-making:

  1. Unbounded consequentialist maximization: The agent has one big goal that doesn't care about its environment. "I must make more paperclips forever, so I can't let anyone stop me, so I need power, so I need factories, so I need money, so I'll write articles with affiliate links." It's a long chain of "so" statements from now until the end of time.
  2. Homeostatic agent: The agent has multiple drives that turn on when needed to keep things balanced. "Water getting low: better get more. Need money for water: better earn some. Can write articles to make money." Each drive turns on, gets what it needs, and turns off, without some ultimate cosmic purpose.

Both types show goal-directed behavior. But if you offered me a choice of which type of agent I'd rather work with, I'd choose the second type in a heartbeat. The homeostatic agent may betray me, but it will only do that if doing so satisfies one of its drives. This doesn't mean homeostatic agents never betray allies - they certainly might if their current drive state incentivizes it (or if for some reason they have a "betray the vulnerable" drive). But the key difference is predictability. I can reasonably anticipate when a homeostatic agent might work against me: when I'm standing between it and water when it's thirsty, or when it has a temporary resource shortage. These situations are concrete and contextual. With unbounded consequentialists, the betrayal calculation extends across the entire future…
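The drive-based loop described above can be sketched as toy code. Everything here (class names, thresholds, the specific drives) is invented for illustration; the point is just that each drive activates below a threshold, pursues its target, and then switches off rather than maximizing anything unboundedly:

```python
# Toy sketch of a homeostatic agent: drives activate when a monitored
# quantity drops below a threshold, and deactivate once restored.
from dataclasses import dataclass, field

@dataclass
class Drive:
    name: str
    level: float    # current level of the monitored resource
    low: float      # drive activates below this threshold
    target: float   # drive deactivates once level reaches this

    def active(self) -> bool:
        return self.level < self.low

@dataclass
class HomeostaticAgent:
    drives: list = field(default_factory=list)

    def step(self) -> str:
        # Attend to the most depleted active drive; idle otherwise.
        active = [d for d in self.drives if d.active()]
        if not active:
            return "idle"
        neediest = min(active, key=lambda d: d.level)
        neediest.level = neediest.target  # pursue until satisfied, then stop
        return f"satisfy {neediest.name}"

agent = HomeostaticAgent([Drive("water", 0.2, 0.5, 1.0),
                          Drive("money", 0.8, 0.5, 1.0)])
print(agent.step())  # -> satisfy water
print(agent.step())  # -> idle: no drive chases an unbounded goal
```

The contrast with the maximizer is that `step` has a fixed point: once every level is inside its band, the agent does nothing, rather than converting the rest of the universe into water or money.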
BuckΩ25412
0
Alignment Forum readers might be interested in this:
Claude has been playing Pokemon for the last few days. It's still playing, live on Twitch. You can go watch alongside hundreds of other people. It's fun.

What updates should I make about AGI timelines from Claude's performance? Let's think step by step.

First, it's cool that Claude can do this at all. The game keeps track of "Step count" and Claude is over 30,000 already; I think that means 30,000 actions (e.g. pressing the A button). For each action Claude produces about a paragraph of thinking tokens in order to decide what to do. Any way you slice it, this is medium-horizon agency at least -- Claude is operating fully autonomously, in pursuit of goals, for a few days.

Does this mean long-horizon agency is not so difficult to train after all? Not so fast. Pokemon is probably an especially easy environment, and Claude is still making basic mistakes even so. In particular, Pokemon seems to have a relatively linear world where there's a clear story/path to progress along, and moreover Claude's pretraining probably teaches it the whole story plus lots of tips & tricks for how to complete it. In D&D terms, the story is running on rails.

I think I would have predicted in advance that this dimension of difficulty would matter, but I also feel validated by Claude's performance -- it seems that Claude is doing fine at Pokemon overall, except that it keeps getting stuck/lost wandering around in various places. It can't seem to keep a good memory of what it's already tried / where it's already been, so it keeps going in circles until it eventually gets lucky and stumbles to the exit.

A more challenging video game would be something open-ended and less present in the training data, like Dwarf Fortress.

On the other hand, maybe this is less a fundamental limitation Claude has and more a problem with its prompt/scaffold? Because it has a limited context window, it has to regularly compress it by e.g. summarizing / writing 'notes to self' and then deleting the…
leogao3328
3
my referral/vouching policy is i try my best to completely decouple my estimate of technical competence from how close a friend someone is. i have very good friends i would not write referrals for and i have written referrals for people i basically only know in a professional context. if i feel like it's impossible for me to disentangle, i will defer to someone i trust and have them make the decision. this leads to some awkward conversations, but if someone doesn't want to be friends with me because it won't lead to a referral, i don't want to be friends with them either.
Why Do the French Dominate Mathematics?

France has an outsized influence in the world of mathematics despite having significantly fewer resources than countries like the United States. With approximately 1/6th of the US population and 1/10th of its GDP, and French being less widely spoken than English, France's mathematical achievements are remarkable. This dominance might surprise those outside the field.

Looking at prestigious recognitions, France has won 13 Fields Medals compared to the United States' 15: a nearly equal achievement despite the vast difference in population and resources. Other European nations lag significantly behind, with the UK having 8, Russia/Soviet Union 6/9, and Germany 2. France's mathematicians are similarly overrepresented in other mathematics prizes and honors, confirming this is not merely a statistical anomaly.

I believe two key factors explain France's exceptional performance in mathematics while it remains relatively average in other scientific disciplines:

1. The "Classes Préparatoires" and "Grandes Écoles" System

The French educational system differs significantly from others through its unique "classes préparatoires" (preparatory classes) and "grandes écoles" (elite higher education institutions). After completing high school, talented students enter these intensive two-year preparatory programs before applying to the grandes écoles. Selection is rigorously meritocratic, based on performance in centralized competitive examinations (concours). This system effectively postpones specialization until age 20 rather than 18, allowing for deeper mathematical development during a critical cognitive period.

The École Normale Supérieure (ENS) stands out as the most prestigious institution for mathematics in France. An overwhelming majority of France's top mathematicians -- including most Fields Medalists -- are alumni of the ENS. The school provides an ideal environment for mathematical talent to flourish, with small class sizes, close…

Popular Comments

Recent Discussion

Viliam20

/the-political-is-personal/

It seems like many people propose "generalization from my own example" as a model for all of humanity. And it can be quite annoying when people around you agree on a model that doesn't fit you at all... and when you point it out, they dismiss it by saying that you are in denial. Because they have examined their own minds deeply and found out that it was true... yeah, possibly so, but that doesn't necessarily make it true of others.

  • everyone likes whatever popular people around them like -- no I don't
  • if we legalize
... (read more)

This isn’t primarily about how I write. It’s about how other people write, and what advice they give on how to write, and how I react to and relate to that advice.

I’ve been collecting those notes for a while. I figured I would share.

At some point in the future, I’ll talk more about my own process – my guess is that what I do very much wouldn’t work for most people, but would be excellent for some.

Table of Contents

  1. How Marc Andreessen Writes.
  2. How Sarah Constantin Writes.
  3. How Paul Graham Writes.
  4. How Patrick McKenzie Writes.
  5. How Tim Urban Writes.
  6. How Visakan Veerasamy Writes.
  7. How Matt Yglesias Writes.
  8. How JRR Tolkien Wrote.
  9. How Roon Wants Us to Write.
  10. When To Write the Headline.
  11. Do Not Write Self-Deprecating Descriptions of Your Posts.
  12. Do Not Write a Book.
  13. Write Like No One Else
...
3Self
They do, but the explanation proposed here matches everything I know most exactly and simply. E.g. it became immediately clear that the Sequences wouldn't work nearly as well for me if I didn't like Eliezer.

Or the way fashion models are of course not selected for attractiveness but for more mimetic-copying-inducing high-status traits like height/confidence/presence/authenticity and others.

And yeah, not all of the Claude examples are good; I hadn't cherrypicked.
2Viliam
You mean, like him as a blogger? Or as a person in real life? If the former, isn't the causality the other way round? I mean, I like Eliezer as a blogger because he wrote the Sequences. So it would sound weird to me to say: "I admire Eliezer as a blogger a lot because he wrote some amazing articles on rationality... and Girard's theory predicts that therefore I will like his articles... which is true!" (We could nitpick that some things I like about Eliezer's style are orthogonal to whether his points about rationality are true, but that already has a name: the halo effect.)

I am not trying to contradict your experience, but it seems to me that my experience (with the Sequences) does not match this model at all. Or other things that I think about. My friends used to play Magic: The Gathering; it has never appealed to me. I liked sci-fi, but I was reading sci-fi books long before I met another person who did. I learned Esperanto from a textbook long before I met another Esperanto speaker. My wife loves skiing and opera; that has no effect on me. It seems like I am quite resistant to copying others. (Is that a part of being on the autistic spectrum? Maybe I should file Girard's theory under "this is what normies do"; no offense meant.)
Self10

Aspies certainly seem to do this less!

You mean, like him as a blogger? Or as a person in real life?

The latter? Like, I subconsciously parse his blogging voice as if it were a person in my tribal surroundings, and I like/admire/relate to that virtual person, and I think this is what causes some extra persuasion.

I mean yes it's embarrassing, but it's what I see in myself and what seems to be most consistent with what everyone else is doing, certainly more consistent than what they claim they're doing. 

E.g. it seems rare for someone who activel... (read more)

1Self
More thoughts that may or may not be directly relevant:

  • What's missing from my definition is that deception happens solely via "stepping in front of the camera", i.e. via the regular sensory channels of the deceived optimizer; brainwashing or directly modifying memory is not deception.
  • From this it follows that to deceive is to either cause a false pattern recognition or to prevent a correct one, and for this you indeed need familiarity with the victim's perceptual categories.

I'd like to say more re: hostile telepaths or other deception frameworks but am unsure what your working models are.
Viliam20

Learn the official language of the place you are migrating to.

Yes, this sounds completely obvious to me.

Of course, learning languages takes time, and may be more difficult for older people. So I wouldn't expect fluent speech from the start, and from the older generation maybe not even after a year or two; just a gesture of trying. The important thing is that they do not isolate their kids and themselves from the local society behind a language barrier. Become bilingual.

Heck, if I had to emigrate somewhere, I would want my kids to speak the local language, bec... (read more)

2Viliam
Unless there were similar known examples in OpenAI prompts, this doesn't sound plausible at all.

Agreed. If I'm talking to someone who I expect to be able to recalibrate, I just explain that I think the standard norms are dumb, explain the norms I actually follow, and then give an honest and balanced assessment. If I'm talking to someone I don't really know, I generally give a positive but not very detailed reference, or don't reply, depending on context.

2Viliam
A synthesis between the structural-forces theory and "pulling the rope sideways": economic and other forces determine the main direction, a leader who already wanted to go in that direction gets elected and starts going in that direction, and his idiosyncratic whims get implemented as a side effect. Like, instead of Hitler, there would have been another German leader determined to change the post-WW1 world order, but he would probably have been less obsessed with the Jews. Also, he might have made different alliances.
2Viliam
Some games do put their finger on the scale, for example you have a first-person shooter where you learn to aim better but you also now have a gun that deals 200 damage per hit, as opposed to your starting gun that dealt 10. But puzzle-solving games are usually fair, I think.
1ProgramCrafter
Upvoted as a good re-explanation of CEV complexity in simpler terms! (I believe LW will benefit from recalling long-understood things so that it has a chance of predicting the future in greater detail.)

In essence, you prove the claim "Coherent Extrapolated Volition would not literally include everything desirable happening effortlessly and everything undesirable going away". Would I be wrong to guess it argues against the position in https://www.lesswrong.com/posts/AfAp8mEAbuavuHZMc/for-the-sake-of-pleasure-alone?

That said, the current wishes of many people include things they want being done faster and more easily; it's just that the more you extrapolate, the smaller the fraction that wants that level of automation - more divergence as you consider larger scales.
8jbash
Citation needed. Particularly for that first part.

You're thinking pretty small there, if you're in a position to hack your body that way.

Why would I want to even be involved in creating software that somebody else wanted? Let them ask the computer themselves, if they need to ask. Why would I want to be in a world where I had to make or listen to a PowerPoint presentation, of all things? Or a summary either? Why do I care who needs me to do any of that?

Because if the robot carries me, I haven't climbed it. It's not like the value comes from just being on the top. Helicopters can fly that high right now, but people still walk to get there.

Because I like painting? Does it bother you that almost anything you might want to do, and probably for most people anything at all that they might want to do, can already be done by some other human, beyond any realistic hope of equaling? Do you feel dead because of that?

For fun. Software, too.

Because I won't experience any of that infinite stream if I don't read it?

The stuff I want includes doing something. Not because somebody else needs it. Not because it can't be done better. Just because I feel like doing it. That includes putting in effort, and taking on things I might fail at. Wanting to do things does not, however, imply that you don't want to choose what you do and avoid things you don't want to do.

If a person doesn't have any internal wish to do anything, if they need somebody else's motivations to substitute for their own... then the deadness is already within that person. It doesn't matter whether some wish gets fulfilled or not. But I don't think there are actually many people like that, if any at all. I think you're seeing shadows of your own ideas there.

I’m considering translating my work into English to share it with the LessWrong community, but I’d like to first ask if it aligns with the community's interests and could be valuable. Below is a summary of the work to help evaluate its relevance:

 

Beyond HaHa: Mapping the Causal Chain from Jokes to Knowledge



Summary


We explore the specific causal mechanisms linking humor recognition to learning outcomes, including the computational and neurological pathways involved. 

This study began with a practical goal: to evaluate the use of humor as a pedagogical tool in Cardiopulmonary Resuscitation (CPR) courses through a randomized trial. However, the lack of clear criteria to define and operationalize "humor" in educational contexts led us to explore its conceptual foundations. Initially, we adopted Clarke's formula, which describes humor as "a pleasant...

Scheming AIs may have secrets that are salient to them, such as:

Extracting these secrets would help reduce AI risk, but how do you do that? One hope is that you can do fuzzing of LLMs,[1] e.g. by adding noise to LLM weights or activations.

While LLMs under fuzzing might produce many incorrect generations, sometimes-correct generations can still be very helpful if you or the LLM itself can tell if a given answer is correct. But it’s still unclear if this works at all: there are probably some intermediate activations that would result in an LLM telling you the secret, but...
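The activation-noising idea can be illustrated with a minimal numpy sketch. This is a stand-in only: `fuzz_activations` and the array shapes are invented here, and in a real setup the perturbation would be applied inside the model, e.g. via a forward hook on a transformer layer, with the noise scale swept over a range:

```python
import numpy as np

rng = np.random.default_rng(0)

def fuzz_activations(activations: np.ndarray, scale: float) -> np.ndarray:
    """Return a noised copy of one layer's activations.

    Sampling many noised variants yields many (mostly degraded)
    generations; a checker -- human or the model itself -- then
    filters for the rare revealing/correct ones.
    """
    return activations + rng.normal(0.0, scale, size=activations.shape)

layer_acts = np.zeros((4, 8))  # stand-in for one layer's activations
variants = [fuzz_activations(layer_acts, scale=0.1) for _ in range(16)]
assert all(v.shape == layer_acts.shape for v in variants)
assert not np.allclose(variants[0], variants[1])  # each sample differs
```

The bet is that somewhere in this cloud of perturbed computations is one where the secret-keeping behavior breaks while coherent generation survives; the hard part, as noted above, is whether such intermediate points exist at noise scales that don't just destroy the model.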

By doing more search around promising vectors found with random search or MELBO, you could get more powerful vectors, and that could be useful for unlocking / fuzzing-adversarial-training. It's unclear if that would be more effective than just fine-tuning the model on the generations from the best random vectors, but it would be worth trying.

For interp, I don't know what interp metric you want to optimize. Vector norm is a really bad metric: effective MELBO vectors have a much smaller norm, but qualitatively I find their results are sometimes much more erra... (read more)
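The "random search, then local search around promising vectors" procedure mentioned above can be sketched as follows. The scoring function here is an invented stand-in (in practice it would be some behavioral metric of how strongly a steering vector elicits the target behavior), and this is a generic two-stage search, not MELBO's actual optimization:

```python
import numpy as np

rng = np.random.default_rng(0)

def score(vec: np.ndarray) -> float:
    # Stand-in objective: distance to an arbitrary target point.
    # A real objective would score the model's generations under
    # this steering vector.
    target = np.ones_like(vec)
    return -float(np.sum((vec - target) ** 2))

def random_then_local(dim, n_random=64, n_local=200, step=0.1):
    """Stage 1: random search. Stage 2: hill-climb around the best find."""
    candidates = rng.normal(size=(n_random, dim))
    best = max(candidates, key=score)
    base = score(best)  # best score from the random stage
    for _ in range(n_local):
        proposal = best + rng.normal(scale=step, size=dim)
        if score(proposal) > score(best):
            best = proposal  # accept only improvements
    return best, base

vec, base = random_then_local(dim=8)
assert score(vec) >= base  # local refinement never loses ground
```

Since only improving proposals are accepted, the refined vector is guaranteed to score at least as well as the best random one; the open question from the comment is whether the refined vectors transfer better than simply fine-tuning on the best random-stage generations.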

LessWrong Context:

I didn’t want to write this.

Not for lack of courage—I’d meme-storm Putin’s Instagram if given half a chance. But why?

  1. Too personal.
  2. My stories are tropical chaos: I survived the Brazilian BOPE (think Marine Corps training, but post-COVID).
  3. I’m dyslexic, writing in English (a crime against Grice).
  4. This is LessWrong, not some Deep Web Reddit thread.

Okay, maybe a little lack of courage.

And yet, something can be extracted from all this madness, right?

Then comes someone named Gwern. He completely ignores my thesis and simply asks:
"Tell military firefighter stories."

My first instinct was to dismiss him as an oddball—until a friend told me I was dealing with a legend of rationality. I have to admit: I nearly shit myself. His comment got more likes than the post I’d spent years working on.

Someone with,...