This is an attempt to compile all publicly available primary evidence relating to the recent death of Suchir Balaji, an OpenAI whistleblower. This is a tragic loss and I feel very sorry for the parents. The rest of this piece will be unemotive as it is important to establish the nature of this death as objectively as possible. I was prompted to look at this by a surprising conversation I had IRL suggesting credible evidence that it was not suicide.

The undisputed facts of the case are that he died of a gunshot wound in his bathroom sometime around November 26 2024. The police say it was a suicide with no evidence of foul play.

Most of the evidence we have comes from the parents and George Webb. Webb describes himself as an investigative journalist, but I would classify him as more of a conspiracy theorist, based on a quick scan of some of his older videos. I think many of the specific factual claims he has made about this case are true, though I generally doubt his interpretations. Webb seems to have made contact with the parents early on and went with them when they first visited Balaji's apartment. He has since published videos from the scene of the death, against the wishes of the parents[1] and as a result the parents have now unendorsed Webb.[2]

List of evidence:

* He didn't leave a suicide note.[3]
* The cause of death was decided by the authorities in 14 (or 40, unclear) minutes.[4]
* The parents arranged a private autopsy which "made their suspicions stronger".[5]
* The parents say "there are a lot of facts that are very disturbing for us and we cannot share at the moment but when we do a PR all of that will come out."[6]
* The parents say "his computer has been deleted, his desktop has been messed up".[7]
* Although the parents also said that their son's phone and laptop are not lost and are in escrow.[8][9] I think the claim of the computer being deleted is more up-to-date, but I'm not sure as that video was posted earlier.
* It was his birt
You May Want to Know About Locally Decodable Codes

In AI alignment and interpretability research, there's a compelling intuition that understanding equals compression. The idea is straightforward: if you truly understand a system, you can describe it more concisely by leveraging that understanding. This philosophy suggests that better interpretability techniques for neural networks should yield better compression of their behavior or parameters.

jake_mendel asks: if understanding equals compression, then shouldn't ZIP compression of neural network weights count as understanding? After all, ZIP achieves remarkable compression ratios on neural network weights - likely better than any current interpretability technique. Yet intuitively, having a ZIP file of weights doesn't feel like understanding at all! We wouldn't say we've interpreted a neural network just because we've compressed its weights into a ZIP file.

Compressing a bit string means finding a code for that string, and the study of such codes is the central topic of both algorithmic and Shannon information theory. Just compressing the set of weights as small as possible is too naive - we probably want to impose additional properties on the codes.

One crucial property we might want is "local decodability": if you ask a question about any specific part of the original neural network, you should be able to answer it by examining only a small portion of the compressed representation. You shouldn't need to decompress the entire thing just to understand one small aspect of how the network operates. This matches our intuitions about human understanding - when we truly understand something, we can answer specific questions about it without having to review everything we know.

A Locally Decodable Code (LDC) is a special type of error-correcting code that allows recovery of any single bit of the original message by querying only a small number of bits of the encoded message, even in the presence of some corruption
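As a concrete illustration of the contrast being drawn here, a minimal sketch: the Hadamard code below is the textbook 2-query LDC, while the ZIP-style comparison, sizes, and noise level are illustrative assumptions of mine, not anything from the quick take.

```python
import random
import zlib

# ZIP-style compression is compact but not locally decodable:
# recovering even one byte of the original requires decompressing the whole blob.
weights = bytes(random.getrandbits(8) for _ in range(10_000))
blob = zlib.compress(weights)
assert zlib.decompress(blob)[1234] == weights[1234]  # had to decompress everything

# Hadamard code: the textbook 2-query locally decodable code.
# A k-bit message x is encoded as all 2^k inner products <x, a> mod 2.
# Hugely redundant (exponential blow-up), but any message bit can be recovered
# from just two codeword positions, even with a small fraction of corruption.
k = 16
message = [random.randint(0, 1) for _ in range(k)]
x_int = sum(bit << i for i, bit in enumerate(message))
codeword = [bin(a & x_int).count("1") % 2 for a in range(2 ** k)]

# Corrupt 5% of the codeword positions.
for pos in random.sample(range(len(codeword)), len(codeword) // 20):
    codeword[pos] ^= 1

def decode_bit(cw, i, trials=15):
    """Recover message bit i with 2 random queries per trial, majority-voted."""
    votes = 0
    for _ in range(trials):
        a = random.randrange(2 ** k)
        votes += cw[a] ^ cw[a ^ (1 << i)]  # <x,a> xor <x, a xor e_i> = x_i if both clean
    return int(votes > trials // 2)

# With high probability this matches message[3], despite reading only ~30 of 65536 bits.
print(decode_bit(codeword, 3), message[3])
```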
ryan_greenblatt
I thought it would be helpful to post about my timelines and what the timelines of people in my professional circles (Redwood, METR, etc) tend to be. Concretely, consider the outcome of: AI 10x’ing labor for AI R&D[1], measured by internal comments by credible people at labs that AI is 90% of their (quality adjusted) useful work force (as in, as good as having your human employees run 10x faster).

Here are my predictions for this outcome:

* 25th percentile: 2 years (Jan 2027)
* 50th percentile: 5 years (Jan 2030)

The views of other people (Buck, Beth Barnes, Nate Thomas, etc) are similar. I expect that outcomes like “AIs are capable enough to automate virtually all remote workers” and “the AIs are capable enough that immediate AI takeover is very plausible (in the absence of countermeasures)” come shortly after (median 1.5 years and 2 years after respectively under my views).

----------------------------------------

1. Only including speedups due to R&D, not including mechanisms like synthetic data generation. ↩︎
People are not thinking clearly about AI-accelerated AI research. This comment by Thane Ruthenis is worth amplifying. 
Alignment is not all you need. But that doesn't mean you don't need alignment.

One of the fairytales I remember reading from my childhood is the "Three sillies". The story is about a farmer encountering three episodes of human silliness, but it's set in one more frame story of silliness: his wife is despondent because there is an axe hanging in their cottage, and she thinks that if they have a son, he will walk underneath the axe and it will fall on his head. The frame story was much more memorable to me than any of the "body" stories, and I randomly remember this story much more often than any other fairytale I read at the age I read fairytales.

I think the reason for this is that the "hanging axe" worry is a vibe very familiar from my family and friend circle, and more generally a particular kind of intellectual neuroticism that I encounter all the time, that is terrified of incomplete control or understanding. I really like the rationalist/EA ecosphere because of its emphasis on the solvability of problems like this: noticing situations where you can just approach the problem, taking down the axe. However, a baseline of intellectual neuroticism persists (after all you wouldn't expect otherwise from a group of people who pull smoke alarms on pandemics and existential threats that others don't notice). Sometimes it's harmless or even beneficial. But a kind of neuroticism in the community that bothers me, and seems counterproductive, is a certain "do it perfectly or you're screwed" perfectionism that pervades a lot of discussions. (This is also familiar to me from my time as a mathematician: I've had discussions with very intelligent and pragmatic friends who rejected even the most basic experimentally confirmed facts of physics because "they aren't rigorously proven".)

A particular train of discussion that annoyed me in this vein was the series of responses to Raemon's "preplanning and baba is you" post. The initial post I think makes a nice point -- it suggest

Recent Discussion

8Alexander Gietelink Oldenziel
You May Want to Know About Locally Decodable Codes

Interesting! I think the problem is that dense/compressed information can be represented in ways in which it is not easily retrievable for a certain decoder. The Standard Model written in Chinese is a very compressed representation of human knowledge of the universe and completely inscrutable to me.
Or take some maximally compressed code and pass it through a permutation. The information content is obviously the same but it is illegible until you reverse the permutation.

In some ways it is uniquely easy to do this to codes with maximal entropy because per definit... (read more)
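A short demonstration of the permutation point, as a sketch; the payload and the use of zlib as a stand-in for a "maximally compressed" code are my own illustrative choices.

```python
import random
import zlib

# Permuting a compressed (near-maximal-entropy) string preserves its information
# content but makes it illegible until the permutation is undone.
payload = ("understanding as compression " * 100).encode()
compressed = bytearray(zlib.compress(payload, level=9))

# Apply a secret but invertible shuffle of byte positions.
rng = random.Random(0)
perm = list(range(len(compressed)))
rng.shuffle(perm)
scrambled = bytes(compressed[perm[i]] for i in range(len(compressed)))

# Same bytes, same entropy -- but the decoder can no longer read it.
try:
    readable = zlib.decompress(scrambled) == payload
except zlib.error:
    readable = False
print("permuted code still decodes?", readable)  # virtually always False

# Invert the permutation and everything is recoverable again.
inverse = [0] * len(perm)
for i, p in enumerate(perm):
    inverse[p] = i
restored = bytes(scrambled[inverse[j]] for j in range(len(scrambled)))
assert zlib.decompress(restored) == payload
```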

2Noosphere89
Indeed, even three query locally decodable codes have code lengths that must grow exponentially with message size: https://www.quantamagazine.org/magical-error-correction-scheme-proved-inherently-inefficient-20240109/
2MondSemmel
That claim is from 2017. Does Ilya even still endorse it?

Epistemic Status: This post is an attempt to condense some ideas I've been thinking about for quite some time. I took some care grounding the main body of the text, but some parts (particularly the appendix) are pretty off the cuff, and should be treated as such. 

The magnitude and scope of the problems related to AI safety have led to an increasingly public discussion about how to address them. Risks of sufficiently advanced AI systems involve unknown unknowns that could impact the global economy, national and personal security, and the way we investigate, innovate, and learn. Clearly, the response from the AI safety community should be as multi-faceted and expansive as the problems it aims to address. In a previous post, we framed fruitful collaborations between applied...

1Jonas Hallgren
I really like this! For me it somewhat also paints a vision for what could be, which might inspire action. Something that I've generally thought would be really nice to have over the last couple of years is a vision for how a decentralized AI Safety field could look and what the specific levers to pull would be to get there.

What does the optimal form of a decentralized AI Safety science look like? How does this incorporate parts of meta science and potentially decentralized science? What does this look like with literature review from AI systems? How can we use AI systems themselves to create such infrastructure in the field? How do such communication pathways optimally look?

I feel that there are so many low-hanging fruit here. There are so many algorithms that we could apply to make things better. Yes, we've got some forums, but holy smokes could the underlying distribution and optimisation systems be optimised. Maybe the lightcone crew could cook something in this direction?

Thanks for the comment! I do hope that the thoughts expressed here can inspire some action, but I'm not sure I understand your questions. Do you mean 'centralized', or are you thinking about the conditions necessary for many small scale trading zones? 

In this way, I guess the emergence of big science could be seen as a phase transition from decentralization -> centralization. 

2dr_s
I think some believe it's downright impossible, and others that we'll just never create it because we have no use for something so smart that it overrides our orders and wishes; that at most we'll make a sort of magical genie still bound by us expressing our wishes.
1Lucius Bushnaq
End points are easier to infer than trajectories, so sure, I think there's some reasonable guesses you can try to make about how the world might look after aligned superintelligence, should we get it somehow.

For example, I think it's a decent bet that basically all minds would exist solely as uploads almost all of the time, because living directly in physical reality is astronomically wasteful and incredibly inconvenient. Turning on a physical lamp every time you want things to be brighter means wiggling about vast numbers of particles and wasting an ungodly amount of negentropy just for the sake of the teeny tiny number of bits about these vast numbers of particles that actually make it to your eyeballs, and the even smaller number of bits that actually end up influencing your mind state and making any difference to your perception of the world. All of the particles[1] in the lamp in my bedroom, the air its light shines through, and the walls it bounces off, could be so much more useful arranged in an ordered dance of logic gates where every single movement and spin flip is actually doing something of value. If we're not being so incredibly wasteful about it, maybe we can run whole civilisations for aeons on the energy and negentropy that currently make up my bedroom. What we're doing right now is like building an abacus out of supercomputers. I can't imagine any mature civilisation would stick with this.

It's not that I refuse to speculate about how a world post aligned superintelligence might look. I just didn't think that your guess was very plausible. I don't think pre-existing property rights or state structures would matter very much in such a world, even if we don't get what is effectively a singleton, which I doubt. If a group of superintelligent AGIs is effectively much more powerful and productive than the entire pre-existing economy, your legal share of that pre-existing economy is not a very relevant factor in your ability to steer the future and g
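To put rough numbers on the "astronomically wasteful" claim, a back-of-the-envelope sketch: the 10 W lamp, the ~10^7 bits/s visual information rate, and room temperature are all assumed figures for illustration, not anything stated in the comment.

```python
# Back-of-the-envelope: energy spent lighting a room vs. information actually delivered.
# All numbers are rough assumptions for illustration.
import math

k_B = 1.380649e-23        # Boltzmann constant, J/K
T = 300.0                 # assumed room temperature, K
landauer_j_per_bit = k_B * T * math.log(2)   # ~2.9e-21 J: minimum energy to erase one bit

lamp_power_w = 10.0       # assumed LED lamp power draw
eye_bits_per_s = 1e7      # assumed order-of-magnitude visual information rate to the brain

# How many Landauer-limited bit erasures the lamp's power budget could pay for each second
bit_ops_affordable = lamp_power_w / landauer_j_per_bit

# Ratio of "computation the energy could buy" to "bits of perception actually delivered"
waste_factor = bit_ops_affordable / eye_bits_per_s

print(f"Landauer limit: {landauer_j_per_bit:.2e} J/bit")
print(f"Lamp energy could fund ~{bit_ops_affordable:.1e} bit erasures per second")
print(f"Roughly {waste_factor:.0e}x more than the bits reaching the eye")
```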
10faul_sname
Assuming that which end point you get to doesn't depend on the intermediate trajectories at least.

Something like a crux here is that I believe the trajectories non-trivially matter for which end-points we get, and I don't think it's like entropy, where we can easily determine the end-point without considering the intermediate trajectory, because I do genuinely think some path-dependence is present in history, which is why, even if I were way more charitable towards communism, I don't think this was ever defensible:

[...] Marx was philosophically opposed, as a matter of principle, to any planning about the structure of communist governments or economies. He

... (read more)

This is a follow-up to last week's D&D.Sci scenario: if you intend to play that, and haven't done so yet, you should do so now before spoiling yourself.

There is a web interactive here you can use to test your answer, and generation code available here if you're interested, or you can read on for the ruleset and scores.

RULESET

Encounters

The following encounters existed:

| Encounter Name | Threat (Surprised) | Threat (Alerted) | Alerted By | Tier |
|---|---|---|---|---|
| Whirling Blade Trap | -- | 2 | -- | 1 |
| Goblins | 1 | 2 | Anything | 1 |
| Boulder Trap | -- | 3 | -- | 2 |
| Orcs | 2 | 4 | Anything | 2 |
| Clay Golem | -- | 4 | -- | 3 |
| Hag | 3 | 6 | Tier 2 and up | 3 |
| Steel Golem | -- | 5 | -- | 4 |
| Dragon | 4 | 8 | Tier 3 and up | 4 |

Each encounter had a Threat that determined how dangerous it was to adventurers. When adventurers met an encounter, they would roll [Threat]d2 to determine how challenging they found it.

However, many encounters had two different Threat levels, depending on whether they were alerted to the adventurers or not.  (A dragon that's woken up from...
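For readers who want to poke at the ruleset, a minimal simulation sketch. The encounter table is taken from above, but two details are guesses on my part (the full explanation is truncated here): that an encounter's challenge is the sum of its [Threat] d2 rolls, and that an encounter counts as alerted when the encounter in the immediately preceding room meets its "Alerted By" tier threshold.

```python
import random

# (threat_surprised, threat_alerted, min_tier_that_alerts, tier) from the table above.
# threat_surprised = None: the encounter has a single Threat (the 'Alerted' column).
# min_tier_that_alerts = None: nothing alerts it; 1 = "Anything"; 2/3 = "Tier 2/3 and up".
ENCOUNTERS = {
    "Whirling Blade Trap": (None, 2, None, 1),
    "Goblins":             (1, 2, 1, 1),
    "Boulder Trap":        (None, 3, None, 2),
    "Orcs":                (2, 4, 1, 2),
    "Clay Golem":          (None, 4, None, 3),
    "Hag":                 (3, 6, 2, 3),
    "Steel Golem":         (None, 5, None, 4),
    "Dragon":              (4, 8, 3, 4),
}

def dungeon_challenge(order, rng=random):
    """Total challenge of a dungeon, given encounter names in room order.

    Assumed rules (the original explanation is truncated): an encounter's challenge
    is the sum of [Threat] d2 rolls, and it is alerted when the encounter in the
    previous room has a high enough tier.
    """
    total = 0
    for i, name in enumerate(order):
        surprised, alerted_threat, min_alert_tier, _tier = ENCOUNTERS[name]
        prev_tier = ENCOUNTERS[order[i - 1]][3] if i > 0 else 0
        alerted = min_alert_tier is not None and prev_tier >= min_alert_tier
        threat = alerted_threat if (alerted or surprised is None) else surprised
        total += sum(rng.randint(1, 2) for _ in range(threat))
    return total

# Compare the two orderings discussed in the comments below (averaged over many runs).
for order in (["Clay Golem", "Boulder Trap", "Whirling Blade Trap", "Hag", "Dragon"],
              ["Clay Golem", "Whirling Blade Trap", "Boulder Trap", "Hag", "Dragon"]):
    avg = sum(dungeon_challenge(order) for _ in range(10_000)) / 10_000
    print(order, round(avg, 2))
```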

 Notes on my performance:

. . . huh! I was really expecting to either take first place for being the only player putting serious effort into the right core mechanics, or take last place for being the only player putting serious effort into the wrong core mechanics; getting the main idea wrong but doing everything else well enough for silver was not on my bingo card. (I'm also pleasantly surprised to note that I figured out which goblin I could purge with least collateral damage: I can leave Room 7 empty without changing my position on the leaderboard.)... (read more)

3Christian Z R
Wow, I think I got more lucky there than I really deserved to. This was really interesting; I could see that there must be something complicated going on, but I never got close to guessing the actual system. Encounters alerting other encounters is both simple and feels intuitively right once you see it.

After stealing most of my ideas from abstractapplic (thanks) I spent most time trying to figure out which order to put the C-W-B-H. I found that having the toughest encounters later worked best, which must be an effect that is actually caused by the random players putting their strongest encounter in room 9, so a strong encounter in room 6 or 8 will help alert this final encounter. So even though it is not built in, the other builders' preference for putting dragons in room 9 makes strong encounters more valuable for the later rooms.

Luckily this caused me to switch from C-B-W-H-D to C-W-B-H-D, so the Boulder trap alerted the Hag, giving me the final 3 points. I guess this says something about how emergent effects can still be valuable(ish) even when you haven't grokked the entire system... Anyway, thanks a lot for an enjoyable challenge.

Edit 2: I'm now fairly confident that this is just the Presumptuous Philosopher problem in disguise, which is explained clearly in Section 6.1 here https://www.lesswrong.com/s/HFyami76kSs4vEHqy/p/LARmKTbpAkEYeG43u

This is my first post ever on LessWrong. Let me explain my problem. 

I was born in a unique situation — I shall omit the details of exactly what this situation was, but for my argument's sake, assume I was born as the tallest person in the entire world. Or instead suppose that I was born into the richest family in the world. In other words, take as an assumption that I was born into a situation entirely unique relative to all other humans on an easily measurable dimension such as height or wealth (i.e., not some niche measure like "longest tongue"). And indeed, my...

Answer by Noosphere89

The answer is yes, trivially, because under a wide enough conception of computation, basically everything is simulatable, so everything is evidence for the simulation hypothesis, because the hypothesis includes effectively everything.

It will not help you infer anything else though.

More below:

http://www.amirrorclear.net/academic/ideas/simulation/index.html

https://arxiv.org/abs/1806.08747

2Answer by Alexander Gietelink Oldenziel
For what it's worth, I do think observers that observe themselves to be highly unique along important axes should rationally increase their credence in simulation hypotheses.
5Answer by Ape in the coat
Yes, it is your main error. Think how justified this assumption is according to your knowledge state. How much evidence do you actually have? Have you checked many simulations before generalizing that principle? Or are you just speculating based on total ignorance?

For your own sake, please don't. Both SIA and SSA are also unjustified assumptions out of nowhere and lead to more counterintuitive conclusions.

Instead consider these two problems.

Problem 1:

Problem 2:

Are you justified to believe that Problem 2 has the same answer as Problem 1? That you can simply assume that half of the balls in the blue bag are blue? Not after you went and checked a hundred random blue bags and in all of them half the balls were blue, but just a priori? And likewise with a grey bag. Where would these assumptions be coming from?

You can come up with some plausible-sounding just-so story. That people who were filling the bag felt the urge to put blue balls in a blue bag. But what about the opposite just-so story, where people were disincentivized to put blue balls in a blue bag? Or where people paid no attention to the color of the bag? Or all the other possible just-so stories? Why do you prioritize this one in particular?

Maybe you imagine yourself tasked with filling two bags with balls of different colors. And when you inspect your thinking process in such a situation, you feel the urge to put a lot of blue balls in the blue bag. But why would the way you'd fill the bags be entangled with the actual causal process that filled these bags in the general case? You don't know that the bags were filled by people with your sensibilities. You don't know that they were filled by people, to begin with.

Or spin it the other way. Suppose you could systematically produce correct reasoning by simply assuming things like that. What would be the point in gathering evidence then? Why spend extra energy on checking the way blue bags and grey bags are organized if you can confidently deduce it a priori?
4CstineSublime
What makes you care about it? What makes it persuasive to you? What decisions would you make differently, and what tangible results within this presumed simulation would you expect to see differently pursuant to proving this? (How do you expect your belief in the simulation to pay rent in anticipated experiences?) Also, the general consensus among rationalists, or at least broadly in science, is that if something is unfalsifiable then it must not be entertained.

Say more? I don't see how they are the same reference class.

Introduction

How many years will pass before transformative AI is built? Three people who have thought about this question a lot are Ajeya Cotra from Open Philanthropy, Daniel Kokotajlo from OpenAI and Ege Erdil from Epoch. Despite each spending at least hundreds of hours investigating this question, they still disagree substantially about the relevant timescales. For instance, here are their median timelines for one operationalization of transformative AI:

| | Median estimate for when 99% of currently fully remote jobs will be automatable |
|---|---|
| Daniel | 4 years |
| Ajeya | 13 years |
| Ege | 40 years |

You can see the strength of their disagreements in the graphs below, where they give very different probability distributions over two questions relating to AGI development (note that these graphs are very rough and are only intended to capture high-level differences, and especially aren't very...

23ryan_greenblatt
Review

My sense is that this post holds up pretty well. Most of the considerations under discussion still appear live and important, including: in-context learning, robustness, whether jank AI R&D accelerating AIs can quickly move to more general and broader systems, and general skepticism of crazy conclusions.

At the time of this dialogue, my timelines were a bit faster than Ajeya's. I've updated toward the views Daniel expresses here and I'm now about half way between Ajeya's views in this post and Daniel's (in geometric mean).

My read is that Daniel looks somewhat too aggressive in his predictions for 2024, though it is a bit unclear exactly what he was expecting. (This concrete scenario seems substantially more bullish than what we've seen in 2024, but not by a huge amount. It's unclear if he was intending these to be mainline predictions or a 25th percentile bullish scenario.)

AI progress appears substantially faster than the scenario outlined in Ege's median world. In particular:

* On "we have individual AI labs in 10 years that might be doing on the order of e.g. $30B/yr in revenue". OpenAI made $4 billion in revenue in 2024, it seems AI company revenue 3x's per year such that in 2026 they will make around $30 billion which is 3 years out instead of 10.
* On "maybe AI systems can get gold on the IMO in five years". We seem likely to see gold on IMO this year (a bit less than 2 years later).

It would be interesting to hear how Daniel, Ajeya, and Ege's views have changed since the time this was posted. (I think Daniel has somewhat later timelines (but the update is smaller than the progression of time such that AGI now seems closer to Daniel) and I think Ajeya has somewhat sooner timelines.)

Daniel discusses various ideas for how to do a better version of this dialogue in this comment. My understanding is that Daniel (and others) have run something similar to what he describes multiple times and participants find this valuable. I'm not sure how much people have
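A quick check of the growth arithmetic in the first bullet, as a sketch that assumes the stated $4B 2024 figure and a constant 3x/year multiplier:

```python
# Back-of-the-envelope check of the revenue projection above,
# assuming the stated 2024 figure and a constant 3x year-over-year multiplier.
revenue_2024 = 4e9   # USD
for year in range(2024, 2027):
    print(year, f"${revenue_2024 * 3 ** (year - 2024) / 1e9:.0f}B")
# 2026 comes out around $36B, i.e. roughly the "$30B/yr" level from Ege's
# scenario, about 3 years out rather than 10.
```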

I've updated toward the views Daniel expresses here and I'm now about half way between Ajeya's views in this post and Daniel's (in geometric mean).

What were the biggest factors that made you update? (I obviously have some ideas, but curious what seemed most important to you.)

10Daniel Kokotajlo
That concrete scenario was NOT my median prediction. Sorry, I should have made that more clear at the time. It was genuinely just a thought experiment for purposes of eliciting people's claims about how they would update on what kinds of evidence. My median AGI timeline at the time was 2027 (which is not that different from the scenario, to be clear! Just one year delayed basically.)

To answer your other questions:

* My views haven't changed much. Performance on the important benchmarks (agency tasks such as METR's RE-Bench) has been faster than I expected for 2024, but the cadence of big new foundation models seems to be slower than I expected (no GPT-5; pretraining scaling is slowing down due to data wall apparently? I thought that would happen more around GPT-6 level). I still have 2027 as my median year for AGI.
* Yes, I and others have run versions of that exercise several times now and yes people have found it valuable. The discussion part, people said, was less valuable than the "force yourself to write out your median scenario" part, so in more recent iterations we mostly just focused on that part.

Crossposted from my personal blog. I was inspired to cross-post this here given the discussion that this post on the role of capital in an AI future elicited.

When discussing the future of AI, I semi-often hear an argument along the lines that in a slow takeoff world, despite AIs automating increasingly more of the economy, humanity will remain in the driving seat because of its ownership of capital. This posits a world where humanity effectively becomes a rentier class living well off the vast economic productivity of the AI economy: despite contributing little to no value, humanity can extract most/all of the surplus value created due to its ownership of capital alone.

This is a possibility, and indeed is perhaps closest to what a ‘positive singularity’ looks...

7sapphire
It feels to me like the heirs of the nobility are doing amazingly well. That is more than enough money to support a lifestyle of leisure. Such a lifestyle is not available to the vast majority of people. So it seems like they mostly did secure a superior existence for their heirs.

This is only true if you restrict "nobility" to Great Britain, and if you only count as "nobles" those who are considered such in our current day. This conflates the current British noble titles (specifically, members of the "Peerage of Great Britain") with the "land-owning rentier class that existed before the industrial revolution". For our discussion, we need to look at the latter.

I do not have specific numbers for the UK, but quoting for Europe from Wikipedia (https://en.wikipedia.org/wiki/Nobility#Europe):
"The countries with the highest proportion of nobles ... (read more)

16Matthew Barnett
I agree with nearly all the key points made in this post. Like you, I think that the disempowerment of humanity is likely inevitable, even if we experience a peaceful and gradual AI takeoff. This outcome seems probable even under conditions where strict regulations are implemented to ostensibly keep AI "under our control".

However, I’d like to contribute an ethical dimension to this discussion: I don’t think peaceful human disempowerment is necessarily a bad thing. If you approach this issue with a strong sense of loyalty to the human species, it’s natural to feel discomfort at the thought of humans receiving a progressively smaller share of the world’s wealth and influence. But if you adopt a broader, more cosmopolitan moral framework—one where agentic AIs are considered deserving of control over the future, just as human children are—then the prospect of peaceful and gradual human disempowerment becomes much less troubling.

To adapt the analogy you used in this post, consider the 18th century aristocracy. In theory, they could have attempted to halt the industrial revolution in order to preserve their relative power and influence over society for a longer period. This approach might have extended their dominance for a while longer, perhaps by several decades.

But, fundamentally, the aristocracy was not a monolithic "class" with a coherent interest in preventing their own disempowerment—they were individuals. And as individuals, their interests did not necessarily align with a long-term commitment to keeping other groups, such as peasants, out of power. Each aristocrat could make personal choices, and many of them likely personally benefitted from industrial reforms. Some of them even adapted to the change, becoming industrialists themselves and profiting greatly. With time, they came to see more value in the empowerment and well-being of others over the preservation of their own class's dominance.

Similarly, humanity faces a comparable choice today with respect

At MATS today we practised “looking back on success”, a technique for visualizing and identifying positive outcomes.

The driving question was, “Imagine you’ve had a great time at MATS; what would that look like?”

My personal answers:

  • Acquiring breadth, ie getting a better understanding of the whole AI safety portfolio / macro-strategy. A good heuristic for this might be reading and understanding 1 blogpost per mentor
  • Writing a “good” paper. One that I’ll feel happy about a couple years down the line
  • Clarity on future career plans. I’d probably like to keep
... (read more)

Civilization: A Superintelligence Aligned with Human Interests

Consider civilization as a problem-solving superintelligence. The graph below shows the global decline in extreme poverty from 1820 to 2015, prompting Steven Pinker’s quote,

   “We have been doing something right, and it would be nice to know what, exactly, it is.”

After 1980, the rate of decline increases dramatically. It would be good to know the dynamics that culminated in this dramatic decline, which continued for the next 40 years. What did we get right? What did civilization learn?

It’s not possible to answer this directly, but we would like to open it up for discussion. We need a conceptual framework for thinking abstractly about civilization over long periods of time.

We like the game metaphor. Games are universal. Asking someone to imagine a game...

Ben20

On the "what did we start getting right in the 1980's for reducing global poverty" I think most of the answer was a change in direction of China. In the late 70's they started reforming their economy (added more capitalism, less command economy): https://en.wikipedia.org/wiki/Chinese_economic_reform.

Comparing this graph on wiki https://en.wikipedia.org/wiki/Poverty_in_China#/media/File:Poverty_in_China.svg to yours, it looks like China accounts for practically all of the drop in poverty since the 1980s.

Arguably this is a good example for your other point... (read more)