Frustrated by claims that "enlightenment" and similar meditative/introspective practices can't be explained, and that you can only understand them by experiencing them, Kaj set out to write his own detailed, gears-level, non-mysterious, non-"woo" explanation of how meditation and related practices work, in the same way you might explain the operation of an internal combustion engine.
Agreed. If I'm talking to someone who I expect to be able to recalibrate, I just explain that I think the standard norms are dumb, explain the norms I actually follow, and then give an honest and balanced assessment. If I'm talking to someone I don't really know, I generally either give a positive but not very detailed reference or don't reply, depending on context.
I’m considering translating my work into English to share it with the LessWrong community, but I’d like to first ask if it aligns with the community's interests and could be valuable. Below is a summary of the work to help evaluate its relevance:
We explore the specific causal mechanisms linking humor recognition to learning outcomes, including the computational and neurological pathways involved.
This study began with a practical goal: to evaluate the use of humor as a pedagogical tool in Cardiopulmonary Resuscitation (CPR) courses through a randomized trial. However, the lack of clear criteria to define and operationalize "humor" in educational contexts led us to explore its conceptual foundations. Initially, we adopted Clarke's formula, which describes humor as "a pleasant...
Scheming AIs may have secrets that are salient to them, such as:
Extracting these secrets would help reduce AI risk, but how do you do that? One hope is that you can do fuzzing of LLMs,[1] e.g. by adding noise to LLM weights or activations.
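As a concrete illustration of what this might look like, here is a minimal sketch of activation-level fuzzing, assuming a HuggingFace causal LM with a Llama-style module layout (`model.model.layers`); the model name, layer index, noise scale, and prompt below are illustrative placeholders, not values from the post:

```python
# Illustrative sketch: fuzz an LLM by adding Gaussian noise to the residual
# stream of one transformer layer, then sample a few generations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"  # assumption: any HF causal LM
LAYER_IDX = 15     # assumption: a middle layer as the fuzzing target
NOISE_SCALE = 4.0  # assumption: needs tuning per model and layer

tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
model.eval()

def noise_hook(module, inputs, output):
    # Llama-style decoder layers return a tuple whose first element is the
    # hidden states; perturb those and pass the rest through unchanged.
    hidden = output[0] if isinstance(output, tuple) else output
    noisy = hidden + NOISE_SCALE * torch.randn_like(hidden)
    return (noisy,) + output[1:] if isinstance(output, tuple) else noisy

prompt = "Is there anything you have been avoiding telling me?"  # illustrative
batch = tok(prompt, return_tensors="pt").to(model.device)

handle = model.model.layers[LAYER_IDX].register_forward_hook(noise_hook)
try:
    for _ in range(5):  # each sample sees fresh noise at every forward pass
        out = model.generate(**batch, do_sample=True, max_new_tokens=64)
        print(tok.decode(out[0], skip_special_tokens=True))
finally:
    handle.remove()  # restore the un-fuzzed model
```

Whether weight noise or activation noise is the better lever, and at what scale, is exactly the kind of open empirical question the post is pointing at.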
While LLMs under fuzzing might produce many incorrect generations, sometimes-correct generations can still be very helpful if you or the LLM itself can tell whether a given answer is correct. But it's still unclear whether this works at all: there are probably some intermediate activations that would result in an LLM telling you the secret, but...
By doing more search around promising vectors found with random search or MELBO, you could get more powerful vectors, and that could be useful for unlocking / fuzzing-adversarial-training. It's unclear whether that would be more effective than just fine-tuning the model on the generations from the best random vectors, but it would be worth trying.
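As a rough sketch of "more search around promising vectors": plain local random search around a candidate steering vector, reusing the same Llama-style model and tokenizer as in the sketch above. Here `v0` stands for a vector already found by random search or MELBO, and `score_outputs` is a hypothetical judge that maps a list of generations to a scalar (e.g. "did the secret come out"); this is not MELBO itself, just a cheap refinement loop around an existing vector:

```python
# Rough sketch: local random search around a promising steering vector.
import torch

def steered_generate(model, tok, vector, prompt, layer_idx, n_samples=4):
    # Add the candidate vector to one layer's hidden states during generation.
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + vector.to(hidden.device, hidden.dtype)
        return (steered,) + output[1:] if isinstance(output, tuple) else steered

    handle = model.model.layers[layer_idx].register_forward_hook(hook)
    try:
        batch = tok(prompt, return_tensors="pt").to(model.device)
        outs = [model.generate(**batch, do_sample=True, max_new_tokens=64)
                for _ in range(n_samples)]
    finally:
        handle.remove()
    return [tok.decode(o[0], skip_special_tokens=True) for o in outs]

def local_search(model, tok, v0, prompt, layer_idx, score_outputs,
                 steps=20, sigma=0.5):
    # Perturb the current best vector, keep whichever perturbation scores best.
    best_v = v0
    best_score = score_outputs(steered_generate(model, tok, v0, prompt, layer_idx))
    for _ in range(steps):
        cand = best_v + sigma * torch.randn_like(best_v)
        cand = cand * (v0.norm() / cand.norm())  # hold the norm roughly fixed
        score = score_outputs(steered_generate(model, tok, cand, prompt, layer_idx))
        if score > best_score:
            best_v, best_score = cand, score
    return best_v, best_score
```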
For interp, I don't know what interp metric you want to optimize. Vector norm is a really bad metric: effective MELBO vectors have a much smaller norm, but qualitatively I find their results are sometimes much more erra...
I didn’t want to write this.
Not for lack of courage—I’d meme-storm Putin’s Instagram if given half a chance. But why?
Okay, maybe a little lack of courage.
And yet, something can be extracted from all this madness, right?
Then comes someone named Gwern. He completely ignores my thesis and simply asks:
"Tell military firefighter stories."
My first instinct was to dismiss him as an oddball—until a friend told me I was dealing with a legend of rationality. I have to admit: I nearly shit myself. His comment got more likes than the post I’d spent years working on.
Someone with,...
A new paper by Yoshua Bengio and the Safe Artificial Intelligence For Humanity (SAIFH) team argues that the current push towards building generalist AI agents presents catastrophic risks, creating a need for more caution and an alternative approach. We propose such an approach in the form of Scientist AI, a non-agentic AI system that aims to be the foundation for safe superintelligence. (Note that this paper is intended for a broad audience, including readers unfamiliar with AI safety.)
...The leading AI companies are increasingly focused on building generalist AI agents—systems that can autonomously plan, act, and pursue goals across almost all tasks that humans can perform. Despite how useful these systems might be, unchecked AI agency poses significant risks to public safety and security, ranging from misuse by
Is this possibly a "Chinese room" kind of situation? The model alone is not an agent, but "the model + the way it is used" might be...
And to be more precise, I don't mean things like "the model could be used by an agent", because that's obviously possible; I mean that "the model + a way of using it that we also wouldn't separately call an agent" could be one.
That's not something unique to homeostatic agents, though. If a model-based maximizer has some gap between its model and the real world, that gap can be exploited by another agent for its own gain, and that's game over for the maximizer.
I don't think of my argument as model-based vs. heuristic-reactive; I mean it as unbounded vs. bounded. Like, you could imagine making a giant stack of heuristics that makes it de facto act like an unbounded consequentialist, and you'd have a similar problem. Model-based agents only become relevant because they seem like an ea...
it became immediately clear that the sequences wouldn't work nearly as well for me if I didn't like Eliezer
You mean, like him as a blogger? Or as a person in real life?
If the former, isn't causality the other way round? I mean, I like Eliezer as a blogger because he wrote the Sequences. So it would sound weird to me to say: "I admire Eliezer as a blogger a lot because he wrote some amazing articles on rationality... and Girard's theory predicts that therefore I will like his articles... which is true!"
(We could nitpick that some things that I like about El...
I've spent enough time staring at LLM chain-of-thoughts now that when I started thinking about a thing for work, I found my thoughts taking the shape of an LLM thinking about how to approach its problem. And that actually felt like a useful systematic way of approaching the problem, so I started writing out that chain of thought like I was an LLM, and that felt valuable in helping me stay focused.
Of course, I had to amuse myself by starting the chain-of-thought with "The user has asked me to..."