"Utility" literally means usefulness, in other words instrumental value, but in decision theory and related fields like economics and AI alignment, it (as part of "utility function") is now associated with terminal/intrinsic value, almost the opposite thing (apparently through some quite convoluted history). Somehow this irony only occurred to me ~3 decades after learning about utility functions.
I would say expected value is about how good something is in expectation. And "in expectation" means multiplied by its probability. For example, the "value" of the sun shining tomorrow is 40, and its probability is 25%. So its "expected value" is 10. (This fits pretty well with Richard Jeffrey's utility theory, by the way.)
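As a one-line formalization of that arithmetic (notation mine, not Jeffrey's): an outcome with value $v$ occurring with probability $p$ contributes

$$\mathbb{E}[V] = p \cdot v = 0.25 \times 40 = 10.$$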
The impossible dichotomy of AI separatism
Epistemic status: offering doomer-ist big picture framing, seeking feedback.
Suppose humanity succeeds in creating AI that is more capable than the top humans across all fields. Then there exists a choice:
Thanks for the comment. I was imprecise with the "boundaries of reality" framing but beyond individual physical boundaries (human-AI hybrids) I'm also talking about boundaries within the fabric of social and cultural life: entertainment media, political narratives, world models. As this is influenced more by AI, I think we lose human identity.
4) to me falls under 2), as it encompasses AI free from human supervision and able to permeate all aspects of life.
A frame to try on: If most failures of rationality are adaptively self-serving motivated reasoning, choosing to be an aspiring rationalist is basically aspiring to a kind of self-hobbling.
This is almost exactly counter to "rationality is systematized winning."
Suppose that we're living in a world where everyone is negotiating for their interests all the time, that almost everyone is engaged in motivated cognition that supports their interests in the ongoing negotiation, and that this causes them to do better for themselves on net. The rationalist is the gu...
I'm not sure if I expect motivated reasoning to come out better on average, even in domains where you might naively expect it to. In part that's because self-serving strategies often involve doing things other people don't like, e.g. being deceptive, manipulative, or generally unethical, in a way that can cause long-term harm to your reputation and so long-term harm to your ability to win. And I think there is significant optimization pressure on catching this kind of thing, in part for reasons similar to the ones outlined in Elephant in the Brain, i.e., t...
Someone should do the obvious experiments and replications.
Ryan Greenblatt recently posted three technical blog posts reporting on interesting experimental results. One of them demonstrated that recent LLMs can make use of filler tokens to improve their performance; another attempted to measure the time horizon of LLMs not using CoT; and the third demonstrated recent LLMs' ability to do 2-hop and 3-hop reasoning.
I think all three of these experiments led to interesting results and improved our understanding of LLM capabilities in an important safety-releva...
Interesting. My guess would have been the opposite. Ryan's three posts all received around 150 karma and were generally well-received; I think a post like this would be considered a 90th-percentile success for a MATS project. But admittedly, I'm not very calibrated about current MATS projects. It's also possible that Ryan has good enough intuitions to have picked two replications that are likely to yield interesting results, while a less skillfully chosen replication would be more likely to just show "yep, the phenomenon observed in the old paper is still t...
Has anyone else noticed a thing recently (the past couple of days) where Claude is extremely reluctant to search the web, and instead is extremely keen to search past conversations or Google Drive and other nonsense like that? Even after updating my system prompt to encourage the former and discourage the latter, it still defaults to the latter. Also, instead of using the web search tool it will sometimes try and fail to search using curl and code execution. Is this just me or is anyone else experiencing similar issues?
You will never find a $100 bill on the floor of Grand Central Station at rush hour, because someone would have picked it up already.
Are you really less likely to find $100 in Grand Central Station than finding $100 anywhere else? It's true that there are many more people who could find it before you, but there are also many more people that could drop $100. If you imagine a 1D version, where everyone walks through either Grand Central Station or a quiet alley along the same line, one after the other, then it seems like you should be...
Yeah I think the only thing that really matters is the frequency with which bills are dropped, and train stations seem like high-frequency places.
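A quick Monte Carlo sketch of that 1D picture (model and numbers are mine, not the commenters'): if every passerby pockets any bill they see and then drops one with some small probability, the per-passerby find rate tracks the drop rate and is essentially independent of how many people pass through.

```python
import random

def find_rate(n_people: int, p_drop: float, rng: random.Random) -> float:
    """People pass one after another; each picks up any bill on the
    floor, then drops a new one with probability p_drop."""
    bill_on_floor = False
    finds = 0
    for _ in range(n_people):
        if bill_on_floor:
            finds += 1
            bill_on_floor = False
        if rng.random() < p_drop:
            bill_on_floor = True
    return finds / n_people

rng = random.Random(0)
for n in (1_000, 100_000, 10_000_000):
    print(f"{n:>10} passersby: find rate ~= {find_rate(n, 0.001, rng):.4f}")
```

The quiet alley and the rush-hour station give each individual walker roughly the same odds; the station just processes more walkers (and more drops) per hour.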
Today, I needed to work through a substantial project with a lot of drudgery (checking through an entire 1M+ LOC codebase for an HTTP API for patterns which could cause state leakage between requests if we made a specific change to the request handling infrastructure). This involved a mix of things which are easy to do programmatically and things which require intelligent judgement, and has a fairly objective desired artifact (a list of all the places where state could leak, and a failing functional test demonstrating that leakage for each one).
I decided to d...
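For the "easy to do programmatically" half of a task like this, the scaffolding can be quite small. A minimal sketch, assuming a Python codebase (the post doesn't say what language the API is in): walk the tree and flag module-level assignments to mutable-looking values, a classic way state ends up shared between requests.

```python
import ast
import pathlib

MUTABLE_NODES = (ast.Dict, ast.List, ast.Set, ast.Call)  # coarse heuristic

def module_level_mutable_state(path: pathlib.Path) -> list[tuple[int, str]]:
    """Return (lineno, source) for top-level assignments whose value looks
    mutable; these are candidates for state shared between requests."""
    tree = ast.parse(path.read_text(), filename=str(path))
    hits = []
    for node in tree.body:
        if isinstance(node, ast.Assign) and isinstance(node.value, MUTABLE_NODES):
            hits.append((node.lineno, ast.unparse(node)[:80]))
    return hits

for source_file in pathlib.Path("src").rglob("*.py"):  # "src" is a placeholder
    for lineno, snippet in module_level_mutable_state(source_file):
        print(f"{source_file}:{lineno}: {snippet}")
```

Everything this flags still needs the "intelligent judgement" half: a module-level dict might be a harmless constant, or it might be a per-request cache.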
the AI will often just repeat the code block in B and C so that it doesn't have to delete anything in A
Some human devs do this too. In the short term it reduces the likelihood of breaking things because something you weren't aware of relied on the old version. In the long term it makes changes harder, because now if you want to change the logic instead of changing it in one place you have to change it in n similar but usually not identical places, and if those places are different in ways that affect the implementation, now you have to try to make an in...
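A minimal illustration of the pattern (all names made up): the duplicated version lets you avoid touching A, at the cost of n copies that must now be kept in sync.

```python
# Duplicated: the same check pasted into each handler, so nothing
# existing ever has to be deleted or modified.
def handle_b(request: dict) -> str:
    if "user_id" not in request:           # copy 1 of the check
        raise ValueError("missing user_id")
    return f"B handled {request['user_id']}"

def handle_c(request: dict) -> str:
    if "user_id" not in request:           # copy 2, to be kept in sync forever
        raise ValueError("missing user_id")
    return f"C handled {request['user_id']}"

# Refactored: one definition, n call sites; a change to the check
# happens in exactly one place.
def require_user(request: dict) -> str:
    if "user_id" not in request:
        raise ValueError("missing user_id")
    return request["user_id"]

def handle_b2(request: dict) -> str:
    return f"B handled {require_user(request)}"

def handle_c2(request: dict) -> str:
    return f"C handled {require_user(request)}"
```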
An interesting quote on the downstream consequences of LLMs locally speeding up output production within business processes, from Rafa Fernández, host of the Protocols for Business special interest group (SIG), in his essay Finding Fault Lines within the Firm:
...AI is usually discussed in terms of automation or productivity. Those framings are not wrong, but they miss what makes AI adoption particularly revealing from a protocol perspective. While much of the public discussion frames AI in terms of cost-savings or new markets, our SIG has been focusing on the pressur
imo, the new mealsquares have taste and mouthfeel very similar to many brands of protein bar, whereas the old mealsquares had a unique taste and mouthfeel
Concept: epistemic load-bearingness.
I write an explanation of what I believe on a topic. If the explanation is very load-bearing for my belief, that means that if someone convinces me that a parameter is wrong or points out a methodology/math error, I'll change my mind to whatever the corrected result is. In other words, my belief is very sensitive to the reasoning in the explanation. If the explanation is not load-bearing, my belief is really determined by a bunch of other stuff; the explanation might be helpful to people for showing one way I think I thin...
Some things are more legible than others. If I believe something based on a dozen pieces of evidence all pointing in the same direction, removing one piece of evidence wouldn't significantly change the outcome.
(Of course, removing all of them would change my mind; and even removing a few of them would make me suspicious about the remaining ones.)
So sometimes it makes sense to write things that are not cruxy.
I was recently looking at the Astra Fellowship program, and found myself noticing skepticism that much value would come out of their "Strategy" category of research. In order to channel that skepticism productively, I want to talk a bit about how I think people might do valuable strategy research.
The most important thing is something like: trying to orient to the possibility of big changes in our ontologies going forward. What I mean by "strategy" is in significant part "ideas and plans that are robust to paradigm shifts".
Having said that, there's no way t...
One other thing is that I'd have guessed that the sign uncertainty of historical work on AI safety and AI governance is much more related to the inherent chaotic nature of social and political processes rather than a particular deficiency in our concepts for understanding them.
I'm sceptical that pure strategy research could remove that sign uncertainty, and I wonder if it would take something like the ability to run loads of simulations of societies like ours.
This post makes a seemingly-compelling case that GDP and CPI numbers are basically bullshit.
E.g., the numbers are based on contradictory models, and BEA economists basically make arbitrary decisions about how to reconcile those contradictory models, including a whole slew of adjustments for factors that would otherwise be neglected. This opens up enormous room for fudging, so much so that the final numbers can swing wildly depending on which sets of (possibly a priori reasonable) assumptions and adjustments you use.
As the author puts it:
...The so-calle
It's a poor argument top to bottom. I tried writing up a critique, but it's hard to know what to focus on because there are so many flaws. Are there any particular points that you found compelling that you would like input on?
Steven Veld et al.[1] just released a new modification of the AI-2027 scenario as part of MATS.
The main differences are the following:
MATS just released a new modification of the AI-2027 scenario.
MATS doesn't release things. MATS is a training program! This is not some kind of official MATS release. I would phrase this differently (like saying "A MATS scholar just published").
When I think of international regulation, I think treaty. But "treaty" doesn't have a universal definition; whether an international agreement is called a treaty, agreement, or accord is a vibes-based decision. Beyond that, there are forms of international cooperation that are definitely not treaties. It's not obvious what hard choices or soft vibes will be best for AI policy, so I assembled a short list of options.
For a post about international treaties, this post takes a pretty US-centric approach. Sorry about that. If anyon...
why the change?
There is a concept related to scout mindset and soldier mindset (helpful outline) that I'd like to explore. Let's call it an "adversarial mindset".
From what I gather, both scout mindset and soldier mindset are about beliefs. They apply to people who are looking at the world through some belief-oriented frame. Someone who takes a soldier mindset engages in directionally motivated reasoning and asks "Can/must I believe?" whereas a scout asks "Is it true?".
On the other hand, someone who is in an adversarial mindset is looking through some sort of "combat-orie...
Ah yeah, the phenomena you mention resonate with me and seem like evidence in favor of this idea that there is a distinction between soldier-oriented mindsets that fight against new ideas and ones that fight against something more social.
Let's assume there is no such thing as true randomness. If this is true, and we create a superintelligent system which knows the location and properties of every particle in the universe, could we determine if we are in a simulation? (EDIT: to avoid running afoul of the impossibility of storing a complete description of the universe within the universe as @Karl Krueger pointed out, assume this includes approximations and is not exact). If we could, could we escape? If we could escape, is that still possible if there is such a thing as true randomness?
I am especially interested in answers to the final question.
A well-designed simulation is inescapable. Suppose that you are inside Conway's game of life, and you know that fact for sure. How specifically are you going to use this knowledge to escape, if all you are is a set of squares on a simulated grid, and all that ever happens in your universe is that some squares are flipped from black to white and vice versa?
To answer your first question, some kinds of pseudo-randomness are virtually indistinguishable from actual randomness, if you do not have a perfect knowledge of the entire universe. For example, in crypto...
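A toy illustration of that point (construction mine, not from the comment): a hash in counter mode gives a completely deterministic bit stream, yet without the seed an observer is believed unable to do better than chance at predicting the next bit, and simple statistics look exactly like coin flips.

```python
import hashlib
import statistics

def prng_bits(seed: bytes, n_blocks: int):
    """Deterministic stream: SHA-256 of (seed, counter). Nothing here is
    random, but predicting it without the seed is believed infeasible."""
    for counter in range(n_blocks):
        digest = hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        for byte in digest:
            for i in range(8):
                yield (byte >> i) & 1

bits = list(prng_bits(b"a seed the simulators never reveal", 500))
print(f"{len(bits)} bits, mean = {statistics.mean(bits):.4f}")  # ~0.5, like a fair coin
```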
Shamelessly adapted from VDT: a solution to decision theory. I didn't want to wait for the 1st of April.
By Claude 4.5 Opus, with prompting by Charbel Segerie
January 2026
Moral philosophy is about how to behave ethically under conditions of uncertainty, especially if this uncertainty involves runaway trolleys, violinists attached to your kidneys, and utility monsters who experience pleasure 1000x more intensely than you.
Moral philosophy has found numerous practical applications, including generating endless Twit...
links 1/11/26: https://roamresearch.com/#/app/srcpublic/page/01-11-2026
My perception of time is like sampling from a continuous 2D plane of conceptual space, something akin to git railways but with a thickness to the lines, like electron orbital probability clouds that are dense around plausible points of view of what happened in the past and thin around conspiracy theories, as different linear mind-map perspectives of people standing around a sculpture, each only inferring what's on the other side, but together they can prune down non-overlapping minority reports, sticking to the consensus but never deleting (git) history.
My...
Weaponized drones that recharge on power lines are at this point looking inevitable. If you missed the chance to freak out before everyone else about AI or covid, now's another chance.
https://www.ycombinator.com/companies/voltair
My first thought was Amazon leveraging this for drone delivery.