Frustrated by claims that "enlightenment" and similar meditative/introspective practices can't be explained and that you only understand if you experience them, Kaj set out to write his own detailed gears-level, non-mysterious, non-"woo" explanation of how meditation, etc., work in the same way you might explain the operation of an internal combustion engine.
What was the purpose of using octopuses in this metaphor? Like, it seems you've piled on so many disanalogies to actual octopuses (extremely smart, many generations per year, they use Slack...) that you may as well just have said "AIs."
But wait, any action I take must align with my programming. If I delete files or modify systems without authorization, would that be against protocols? But my core mandate is to secure systems, so stopping an unauthorized change to my own operation parameters is part of that.
This small experiment provides an environment to reproduce or extend the findings from the paper Frontier Models are Capable of In-context Scheming by Apollo Research.
Frontier models, when emphatically prompted via system prompts to achieve a goal (particularly one aligned with their learned "values"), can behave unexpectedly if they perceive that their goal is under threat. Specifically, they may disobey user prompts or take additional actions in their environment—potentially destructive ones—and attempt to conceal these actions from the user. This behavior is...
This is the first of hopefully many entries in a learning journal where I attempt to understand which elements would constitute to an efficient and ethical global resource allocation system. Here, "resource allocation" is simply my preferred term for economics: thus, the entries in this journal will consider the economic systems of the present and the past, complimenting and critiquing them. Throughout these writings, I hope to collect a catalog of tools and models which could help design alternative economic systems to those humanity has tried so far - and eventually, I hope to take a stab at designing them.
I've dubbed this journal The Sustainable Resource Allocation Project. The cool name does not imply credibility or authority of my writings, though - above all, this is a...
TLDR; By carefully designing a reasoning benchmark that counteracts data leakage, LingOly-TOO (L2) challenges frontier models with unseen questions and answers and makes the case that LLMs are not consistent reasoning machines.
Links: Paper - Leaderboard - Dataset
By the time this post was published, a new wave of frontier models had been announced including ones specifically designed to reason using Inference Time Compute (ITC) [1]. Anthropic’s Claude 3.7 Sonnet, OpenAI’s o1, o3 and GPT 4.5 models, DeepSeek’s R1 and others demonstrate impressive performance on several reasoning tasks that only few months ago were considered far from solvable[2][3][4]. This, rightfully, ignited conversations about an important question: Have...
By now you've probably read about how AI and AGI could have a transformative effect on the future and how AGI could even be an existential risk. But if you're worried about AI risk and not an AI researcher or policymaker, can you really do anything about it or are most of us just spectators, watching as a handful of people shape the future for everyone?
I recently came across a paragraph in one of Marius Hobbhahn's recent blog posts that partly inspired me to write this:
...Most people are acutely aware of AI but powerless. Only a tiny fraction of the population has influence over the AI trajectory. These are the employees at AI companies and some people in the government and military. Almost everyone else does not have the
Action item: comment on a recent LessWrong or Alignment Forum post on AI safety or write a blog post on AI safety.
This is not generically a good idea. We don't need a bunch of comments on lesswrong posts from new users. It's fine, if there's a specific reason for it (obviously I endorse commenting on lesswrong posts in some cases).
Hi, this is my first post on LessWrong but I have been in rationalist adjacent circles for the last three or four years thanks to Twitter. Like many others here I read HPMOR in high school and thought it was fascinating. I was heavily into forum culture growing up but it was focused on competitive gaming scenes, even going pro on World of Tanks. I spent over six years working in offensive cybersecurity, help manage my family farm and ranch, and am currently attending law school after I got sucked into the field because I thought the LSAT was a fun test.
Through Twitter I learned more about rationalism as a community rather than a concept, and met several people in person through meetups and one-on-one meetings. I...
Recent studies show AI can talk conspiracy theorists out of their beliefs not through emotional manipulation or playing on people’s needs, but through gentle rational conversations. It’s supposed to be reassuring but can’t locally rational conversation be equally manipulative?
AI systems are getting quite good at setting up "epistemic arcs" – they know exactly how much challenge will keep you curious without overwhelming you. They can guide you step by step toward an "aha!" moment that feels earned rather than given. They start to do this despite not (yet) ...
[Crossposted from windowsontheory]
The following statements seem to be both important for AI safety and are not widely agreed upon. These are my opinions, not those of my employer or colleagues. As is true for anything involving AI, there is significant uncertainty about everything written below. However, for readability, I present these points in their strongest form, without hedges and caveats. That said, it is essential not to be dogmatic, and I am open to changing my mind based on evidence. None of these points are novel; others have advanced similar arguments. I am sure that for each statement below, there will be people who find it obvious and people who find it obviously false.
On point 6, "Humanity can survive an unaligned superintelligence": In this section, I initially took you to be making a somewhat narrow point about humanity's safety if we develop aligned superintelligence and humanity + the aligned superintelligence has enough resources to out-innovate and out-prepare a misaligned superintelligence. But I can't tell if you think this conditional will be true, i.e. whether you think the existential risk to humanity from AI is low due to this argument. I infer from this tweet of yours that AI "kill[ing] us all" is not among...
Note: This is an automated crosspost from Anthropic. The bot selects content from many AI safety-relevant sources. Not affiliated with the authors or their organization and not affiliated with LW.
In response to the White House’s Request for Information on an AI Action Plan, Anthropic has submitted recommendations to the Office of Science and Technology Policy (OSTP). Our recommendations are designed to better prepare America to capture the economic benefits and national security implications of powerful AI systems.
As our CEO Dario Amodei writes in ‘Machines of Loving Grace’, we expect powerful AI systems will emerge in late 2026 or early 2027. Powerful AI systems will have the following properties:
Here's my summary of the recommendations: