This is an attempt to compile all publicly available primary evidence relating to the recent death of Suchir Balaji, an OpenAI whistleblower. This is a tragic loss and I feel very sorry for the parents. The rest of this piece will be unemotive, as it is important to establish the nature of this death as objectively as possible. I was prompted to look at this by a surprising conversation I had IRL suggesting credible evidence that it was not suicide.

The undisputed facts of the case are that he died of a gunshot wound in his bathroom sometime around November 26, 2024. The police say it was a suicide with no evidence of foul play.

Most of the evidence we have comes from the parents and George Webb. Webb describes himself as an investigative journalist, but I would classify him as more of a conspiracy theorist, based on a quick scan of some of his older videos. I think many of the specific factual claims he has made about this case are true, though I generally doubt his interpretations. Webb seems to have made contact with the parents early on and went with them when they first visited Balaji's apartment. He has since published videos from the scene of the death, against the wishes of the parents,[1] and as a result the parents have now unendorsed Webb.[2]

List of evidence:

* He didn't leave a suicide note.[3]
* The cause of death was decided by the authorities in 14 (or 40, unclear) minutes.[4]
* The parents arranged a private autopsy which "made their suspicions stronger".[5]
* The parents say "there are a lot of facts that are very disturbing for us and we cannot share at the moment but when we do a PR all of that will come out."[6]
* The parents say "his computer has been deleted, his desktop has been messed up".[7]
* Although the parents also said that their son's phone and laptop are not lost and are in escrow.[8][9] I think the claim of the computer being deleted is more up-to-date, but I'm not sure as that video was posted earlier.
* It was his birt...
ryan_greenblatt
I thought it would be helpful to post about my timelines and what the timelines of people in my professional circles (Redwood, METR, etc.) tend to be. Concretely, consider the outcome of: AI 10x'ing labor for AI R&D[1], measured by internal comments by credible people at labs that AI is 90% of their (quality-adjusted) useful workforce (as in, as good as having your human employees run 10x faster).

Here are my predictions for this outcome:

* 25th percentile: 2 years (Jan 2027)
* 50th percentile: 5 years (Jan 2030)

The views of other people (Buck, Beth Barnes, Nate Thomas, etc.) are similar. I expect that outcomes like "AIs are capable enough to automate virtually all remote workers" and "the AIs are capable enough that immediate AI takeover is very plausible (in the absence of countermeasures)" come shortly after (median 1.5 years and 2 years after, respectively, under my views).

1. Only including speedups due to R&D, not including mechanisms like synthetic data generation. ↩︎
People are not thinking clearly about AI-accelerated AI research. This comment by Thane Ruthenis is worth amplifying. 
Alignment is not all you need. But that doesn't mean you don't need alignment.

One of the fairytales I remember reading from my childhood is "The Three Sillies". The story is about a farmer encountering three episodes of human silliness, but it's set in one more frame story of silliness: his wife is despondent because there is an axe hanging in their cottage, and she thinks that if they have a son, he will walk underneath the axe and it will fall on his head.

The frame story was much more memorable to me than any of the "body" stories, and I randomly remember this story much more often than any other fairytale I read at the age I read fairytales. I think the reason for this is that the "hanging axe" worry is a vibe very familiar from my family and friend circle, and more generally a particular kind of intellectual neuroticism that I encounter all the time, which is terrified of incomplete control or understanding.

I really like the rationalist/EA ecosphere because of its emphasis on the solvability of problems like this: noticing situations where you can just approach the problem, taking down the axe. However, a baseline of intellectual neuroticism persists (after all, you wouldn't expect otherwise from a group of people who pull smoke alarms on pandemics and existential threats that others don't notice). Sometimes it's harmless or even beneficial. But a kind of neuroticism in the community that bothers me, and seems counterproductive, is a certain "do it perfectly or you're screwed" perfectionism that pervades a lot of discussions. (This is also familiar to me from my time as a mathematician: I've had discussions with very intelligent and pragmatic friends who rejected even the most basic experimentally confirmed facts of physics because "they aren't rigorously proven".)

A particular train of discussion that annoyed me in this vein was the series of responses to Raemon's "preplanning and baba is you" post. The initial post I think makes a nice point -- it suggest...

Popular Comments

Recent Discussion

(Cross-post from https://amistrongeryet.substack.com/p/are-we-on-the-brink-of-agi, lightly edited for LessWrong. The original has a lengthier introduction and a bit more explanation of jargon.)

No one seems to know whether transformational AGI is coming within a few short years. Or rather, everyone seems to know, but they all have conflicting opinions. Have we entered into what will in hindsight be not even the early stages, but actually the middle stage, of the mad tumbling rush into singularity? Or are we just witnessing the exciting early period of a new technology, full of discovery and opportunity, akin to the boom years of the personal computer and the web?

AI is approaching elite skill at programming, possibly barreling into superhuman status at advanced mathematics, and only picking up speed. Or so the framing goes. And...

Thane Ruthenis
I'm very skeptical of AI being on the brink of dramatically accelerating AI R&D. My current model is that ML experiments are bottlenecked not on software-engineer hours, but on compute. See Ilya Sutskever's claim here: What actually matters for ML-style progress is picking the correct trick, and then applying it to a big-enough model. If you pick the trick wrong, you ruin the training run, which (a) potentially costs millions of dollars, (b) wastes the ocean of FLOP you could've used for something else. And picking the correct trick is primarily a matter of research taste, because:

* Tricks that work on smaller scales often don't generalize to larger scales.
* Tricks that work on larger scales often don't work on smaller scales (due to bigger ML models having various novel emergent properties).
* Simultaneously integrating several disjunctive incremental improvements into one SotA training run is likely nontrivial/impossible in the general case.[1]

So 10x'ing the number of small-scale experiments is unlikely to actually 10x ML research, along any promising research direction. And, on top of that, I expect that AGI labs don't actually have the spare compute to do that 10x'ing. I expect it's all already occupied 24/7 running all manner of smaller-scale experiments, squeezing whatever value out of them that can be squeezed out. (See e.g. the Superalignment team's struggle to get access to compute: that suggests there isn't an internal compute overhang.) Indeed, an additional disadvantage of AI-based researchers/engineers is that their forward passes would cut into that limited compute budget. Offloading the computations associated with software engineering and experiment oversight onto the brains of mid-level human engineers is potentially more cost-efficient.

As a separate line of argumentation: Suppose that, as you describe it in another comment, we imagine that AI would soon be able to give senior researchers teams of 10x-speed 24/7-working junior devs, to...

Thanks for the mention, Thane. I think you make excellent points, and agree with all of them to some degree. Yet I'm expecting huge progress in AI algorithms to be unlocked by AI researchers.

I'll quote from my comments on .

How closely are they adhering to the "main path" of scaling existing techniques with minor tweaks? If you want to know how a minor tweak affects your current large model at scale, that is a very compute-heavy researcher-time-light type of experiment. On the other hand, if you want to test a lot of novel new paths at much smaller scales

JBlack
My description "better capabilities than average adult human in almost all respects" differs from "would be capable of running most people's lives better than they could". You appear to be taking these as synonymous.

The economically useful question is more along the lines of "what fraction of time taken on tasks could a business expect to be able to delegate to these agents for free vs a median human that they have to employ at socially acceptable wages" (taking into account supervision needs and other overheads in each case). My guess is currently "more than half, probably not yet 80%".

There are still plenty of tasks that a supervised 120 IQ human can do that current models can't. I do not think there will remain many tasks that a 100 IQ human can do with supervision that a current AI model cannot do with the same degree of supervision, after adjusting processes to suit the differing strengths and weaknesses of each.
ryan_greenblatt
I've had a pretty similar experience personally, but:

* I think serial speed matters a lot, and you'd be willing to go through a bunch more hassle if the junior devs worked 24/7 and at 10x speed.
* Quantity can be a quality of its own: if you have truly vast (parallel) quantities of labor, you can be much more demanding and picky. (And make junior devs do much more work to understand what is going on.)
* I do think the experimentation thing is probably somewhat big, but I'm uncertain.
* (This one is breaking with the junior dev analogy, but whatever.) In the AI case, you can train/instruct once and then fork many times. In the analogy, this would be like you spending 1 month training the junior dev (who still works 24/7 and at 10x speed, so 10 months for them) and then forking them into many instances. Of course, perhaps AI sample efficiency is lower. However, my personal guess is that lots of compute spent on learning and aggressive schlep (e.g. proliferation, lots of self-supervised learning, etc.) can plausibly substantially reduce or possibly eliminate the gap (at least once AIs are more capable), similar to how it works for EfficientZero.

This is a follow-up to last week's D&D.Sci scenario: if you intend to play that, and haven't done so yet, you should do so now before spoiling yourself.

There is a web interactive here you can use to test your answer, and generation code available here if you're interested, or you can read on for the ruleset and scores.

RULESET

Encounters

The following encounters existed:

| Encounter Name | Threat (Surprised) | Threat (Alerted) | Alerted By | Tier |
|---|---|---|---|---|
| Whirling Blade Trap | -- | 2 | -- | 1 |
| Goblins | 1 | 2 | Anything | 1 |
| Boulder Trap | -- | 3 | -- | 2 |
| Orcs | 2 | 4 | Anything | 2 |
| Clay Golem | -- | 4 | -- | 3 |
| Hag | 3 | 6 | Tier 2 and up | 3 |
| Steel Golem | -- | 5 | -- | 4 |
| Dragon | 4 | 8 | Tier 3 and up | 4 |

Each encounter had a Threat that determined how dangerous it was to adventurers. When adventurers met that encounter, they would roll [Threat]d2 to determine how challenging they found it.
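As a minimal sketch of this roll (mine, not the scenario's actual generation code, which is linked above), combining it with the Surprised/Alerted Threat values from the table:

```python
import random

def challenge_roll(threat_surprised, threat_alerted, alerted, rng=random):
    """Roll [Threat]d2 for one encounter, where the Threat used depends on
    whether the encounter was alerted to the adventurers."""
    threat = threat_alerted if alerted else threat_surprised
    return sum(rng.randint(1, 2) for _ in range(threat))

# e.g. a surprised Dragon rolls 4d2, while an alerted Dragon rolls 8d2
print(challenge_roll(4, 8, alerted=False), challenge_roll(4, 8, alerted=True))
```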

However, many encounters had two different Threat levels, depending on whether they were alerted to the adventurers or not.  (A dragon that's woken up from...

abstractapplic
Notes on my performance: . . . huh! I was really expecting to either take first place for being the only player putting serious effort into the right core mechanics, or take last place for being the only player putting serious effort into the wrong core mechanics; getting the main idea wrong but doing everything else well enough for silver was not on my bingo card. (I'm also pleasantly surprised to note that I figured out which goblin I could purge with least collateral damage: I can leave Room 7 empty without changing my position on the leaderboard.)

There were only three likely hypotheses based on the problem statement: A) adventurers scout one room ahead, B) adventurers take optimal path(s), and C) adventurers hit every room so all that matters is the order. Early efforts ruled out C, and the Bonus Objective being fully achievable under A but not B made A a lot more plausible; however, further investigations[1] made it seem like that might be a fakeout[2], so I (narrowly) chose to max-min instead of max-max; even in retrospect, I'm not 100% sure that was a bad decision.

Notes on the scenario: I have strongly ambivalent feelings about almost every facet of this game. The central concept was solid gold but could have been handled better. In particular, I think puzzling out the premise could have been a lot more fun if we hadn't known the entry and exit squares going in. The writing was as fun and funny as usual - if not more so! - but seemed less . . . pointed?/ambitious?/thematically-coherent? than I've come to expect. The difficulty curve was perfect early but annoying late. A lot of our scenarios commit the minor sin of making initial headway hard to make, discouraging casual players and giving negligible or negative reward for initial investigations; this one emphatically doesn't, since pairing high-traffic rooms with high-challenge creatures was an easy(-ish) way to get better-than-random EV. However, the central mechanics of "dungeoneers scout one roo...
aphyer

I think puzzling out the premise could have been a lot more fun if we hadn't known the entry and exit squares going in

I think this would have messed up the difficulty curve a bit: telling players 'here is the entrance and exit' is part of what lets 'stick a tough encounter at the entrance/exit' be a simple strategy.

The writing was as fun and funny as usual - if not more so! - but seemed less . . . pointed?/ambitious?/thematically-coherent? than I've come to expect.

This is absolutely true, though I'm surprised it's obvious: my originally-planned scenario did...

Christian Z R
Wow, I think I got more lucky there than I really deserved to. This was really interesting; I could see that there must be something complicated going on, but I never got close to guessing the actual system. Encounters alerting other encounters is both simple and feels intuitively right once you see it.

After stealing most of my ideas from abstractapplic (thanks) I spent most of my time trying to figure out which order to put the C-W-B-H in. I found that having the toughest encounters later worked best, which must be an effect actually caused by the random players putting their strongest encounter in room 9, so a strong encounter in room 6 or 8 will help alert this final encounter. So even though it is not built in, the other builders' preference for putting dragons in room 9 makes strong encounters more valuable for the later rooms. Luckily this caused me to switch from C-B-W-H-D to C-W-B-H-D, so the Boulder Trap alerted the Hag, giving me the final 3 points.

I guess this says something about how emergent effects can still be valuable(ish) even when you haven't grokked the entire system... Anyway, thanks a lot for an enjoyable challenge.

Summary

In my first piece, I settled on conceiving of radical empathy as an object view. I highlight some other potential moral implications of object views:

  1. More of what someone finds pleasant or is (knowingly or unknowingly) disposed to find pleasant is better for them, all else equal. Death can be bad for them if and because it deprives them of the objects of those preferences. Death can also be bad for them if and because they explicitly disprefer it or its consequences (more).
  2. We should care about what animals care about, e.g. preventing their pain and keeping them close to those to which they’re attached (more).
  3. Someone’s suffering might not necessarily always count against their life, in case it isn't about their life or aspects of it, e.g. grieving the loss of a loved one (more).
  4. If
...

Summary

I illustrate and defend actualist object views as my conception of radical empathy, of being concerned exactly with what we would actually care about. As a kind of asymmetric person-affecting view, the most important implication for cause prioritization is probably lower priority for extinction risk reduction relative to total utilitarianism.

  1. I illustrate with an example where I find actualist object views handle changing preferences better than other views (more).
  2. An important implication for cause prioritization is the (Procreation) Asymmetry: we have reasons to prevent miserable lives, but not to create happy or fulfilling lives, for the sake of what those lives would actually care about (more).
    1. This is not (necessarily) antinatalist or pro-extinction, but it would probably lead to less priority for extinction risk reduction, compared to total utilitarianism (more).
  3. I highlight some problems for actualism, addressing
...
This is a linkpost for https://fatebook.io

Fatebook is a website that makes it extremely low friction to make and track predictions.

It's designed to be very fast - just open a new tab, go to fatebook.io, type your prediction, and hit enter. Later, you'll get an email reminding you to resolve your question as YES, NO, or AMBIGUOUS.

It's private by default, so you can track personal questions and give forecasts that you don't want to share publicly. You can also share questions with specific people, or publicly.

Fatebook syncs with Fatebook for Slack - if you log in with the email you use for Slack, you’ll see all of your questions on the website.

As you resolve your forecasts, you'll build a track record - Brier score and Relative Brier score - and see your calibration chart. You can...
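For reference (my sketch, not Fatebook's actual scoring code): the Brier score is just the mean squared difference between your stated probabilities and the 0/1 outcomes, so you can compute it yourself like this:

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and binary
    outcomes (1 = YES, 0 = NO). Lower is better; always guessing 50%
    scores 0.25."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# e.g. three resolved questions, forecast at 90%, 30%, 70%,
# which resolved YES, NO, NO respectively:
print(brier_score([0.9, 0.3, 0.7], [1, 0, 0]))  # ~0.20
```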

Adam B

Thanks for the suggestion - you can now sign in to Fatebook with email!

Alexander Gietelink Oldenziel
You May Want to Know About Locally Decodable Codes

In AI alignment and interpretability research, there's a compelling intuition that understanding equals compression. The idea is straightforward: if you truly understand a system, you can describe it more concisely by leveraging that understanding. This philosophy suggests that better interpretability techniques for neural networks should yield better compression of their behavior or parameters.

jake_mendel asks: if understanding equals compression, then shouldn't ZIP compression of neural network weights count as understanding? After all, ZIP achieves remarkable compression ratios on neural network weights - likely better than any current interpretability technique. Yet intuitively, having a ZIP file of weights doesn't feel like understanding at all! We wouldn't say we've interpreted a neural network just because we've compressed its weights into a ZIP file.

Compressing a bit string means finding a code for that string, and the study of such codes is the central topic of both algorithmic and Shannon information theory. Just compressing the set of weights as small as possible is too naive - we probably want to impose additional properties on the codes.

One crucial property we might want is "local decodability": if you ask a question about any specific part of the original neural network, you should be able to answer it by examining only a small portion of the compressed representation. You shouldn't need to decompress the entire thing just to understand one small aspect of how the network operates. This matches our intuitions about human understanding - when we truly understand something, we can answer specific questions about it without having to review everything we know.

A Locally Decodable Code (LDC) is a special type of error-correcting code that allows recovery of any single bit of the original message by querying only a small number of bits of the encoded message, even in the presence of some corruption
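As a concrete illustration of local decodability (a sketch added here, not part of the quick take): the classic Hadamard code recovers any single message bit with just two queries to the codeword, at the cost of an exponentially long encoding. A minimal Python sketch, with function names that are mine:

```python
import random

def hadamard_encode(msg_bits):
    """Encode k message bits into a 2^k-bit codeword.
    Position a of the codeword holds the inner product <a, msg> mod 2."""
    k = len(msg_bits)
    return [sum(((a >> j) & 1) & msg_bits[j] for j in range(k)) % 2
            for a in range(2 ** k)]

def locally_decode_bit(codeword, k, i, trials=25, rng=random):
    """Recover message bit i using only 2 queries to the codeword per trial.
    codeword[a] XOR codeword[a XOR e_i] equals msg[i] whenever both queried
    positions are uncorrupted, so a majority vote over a few random trials
    tolerates a small fraction of corrupted codeword bits."""
    votes = sum(codeword[a] ^ codeword[a ^ (1 << i)]
                for a in (rng.randrange(2 ** k) for _ in range(trials)))
    return int(votes > trials / 2)

msg = [1, 0, 1, 1]
cw = hadamard_encode(msg)
cw[5] ^= 1                                  # corrupt one codeword bit
print(locally_decode_bit(cw, len(msg), 2))  # recovers msg[2] == 1 w.h.p.
```

The exponential blow-up in codeword length is the price of so few queries, which is the trade-off the lower-bound result linked later in this thread is about.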

Interesting! I think the problem is that dense/compressed information can be represented in ways in which it is not easily retrievable for a certain decoder. The Standard Model written in Chinese is a very compressed representation of human knowledge of the universe, and completely inscrutable to me.
Or take some maximally compressed code and pass it through a permutation. The information content is obviously the same, but it is illegible until you reverse the permutation.

In some ways it is uniquely easy to do this to codes with maximal entropy because per definit...

Noosphere89
Indeed, even three-query locally decodable codes have code lengths that must grow exponentially with message size: https://www.quantamagazine.org/magical-error-correction-scheme-proved-inherently-inefficient-20240109/
MondSemmel
That claim is from 2017. Does Ilya even still endorse it?

Epistemic Status: This post is an attempt to condense some ideas I've been thinking about for quite some time. I took some care grounding the main body of the text, but some parts (particularly the appendix) are pretty off the cuff, and should be treated as such. 

The magnitude and scope of the problems related to AI safety have led to an increasingly public discussion about how to address them. Risks of sufficiently advanced AI systems involve unknown unknowns that could impact the global economy, national and personal security, and the way we investigate, innovate, and learn. Clearly, the response from the AI safety community should be as multi-faceted and expansive as the problems it aims to address. In a previous post, we framed fruitful collaborations between applied...

Jonas Hallgren
I really like this! For me it also somewhat paints a vision of what could be, which might inspire action. Something that I've generally thought would be really nice to have over the last couple of years is a vision for how a decentralized AI Safety field could look, and what the specific levers to pull would be to get there.

What does the optimal form of a decentralized AI Safety science look like? How does this incorporate parts of metascience and potentially decentralized science? What does this look like with literature review from AI systems? How can we use AI systems themselves to create such infrastructure in the field? What do such communication pathways optimally look like?

I feel that there is so much low-hanging fruit here. There are so many algorithms that we could apply to make things better. Yes, we've got some forums, but holy smokes could the underlying distribution and optimisation systems be optimised. Maybe the Lightcone crew could cook something up in this direction?

Thanks for the comment! I do hope that the thoughts expressed here can inspire some action, but I'm not sure I understand your questions. Do you mean 'centralized', or are you thinking about the conditions necessary for many small-scale trading zones?

In this way, I guess the emergence of big science could be seen as a phase transition from decentralization -> centralization. 

dr_s
I think some believe it's downright impossible and others that we'll just never create it because we have no use for something so smart it overrides our orders and wishes. That at most we'll make a sort of magical genie still bound by us expressing our wishes.
Lucius Bushnaq
End points are easier to infer than trajectories, so sure, I think there are some reasonable guesses you can try to make about how the world might look after aligned superintelligence, should we get it somehow.

For example, I think it's a decent bet that basically all minds would exist solely as uploads almost all of the time, because living directly in physical reality is astronomically wasteful and incredibly inconvenient. Turning on a physical lamp every time you want things to be brighter means wiggling about vast numbers of particles and wasting an ungodly amount of negentropy just for the sake of the teeny tiny number of bits about these vast numbers of particles that actually make it to your eyeballs, and the even smaller number of bits that actually end up influencing your mind state and making any difference to your perception of the world. All of the particles[1] in the lamp in my bedroom, the air its light shines through, and the walls it bounces off, could be so much more useful arranged in an ordered dance of logic gates where every single movement and spin flip is actually doing something of value. If we're not being so incredibly wasteful about it, maybe we can run whole civilisations for aeons on the energy and negentropy that currently make up my bedroom. What we're doing right now is like building an abacus out of supercomputers. I can't imagine any mature civilisation would stick with this.

It's not that I refuse to speculate about how a world post aligned superintelligence might look. I just didn't think that your guess was very plausible. I don't think pre-existing property rights or state structures would matter very much in such a world, even if we don't get what is effectively a singleton, which I doubt. If a group of superintelligent AGIs is effectively much more powerful and productive than the entire pre-existing economy, your legal share of that pre-existing economy is not a very relevant factor in your ability to steer the future and g...
faul_sname
Assuming that which end point you get to doesn't depend on the intermediate trajectories at least.

Something like a crux here is that I believe the trajectories non-trivially matter for which end-points we get, and I don't think it's like entropy, where we can easily determine the end-point without considering the intermediate trajectory, because I do genuinely think some path-dependence is present in history, which is why, even if I were way more charitable towards communism, I don't think this was ever defensible:

[...] Marx was philosophically opposed, as a matter of principle, to any planning about the structure of communist governments or economies. He
