This is an attempt to compile all publicly available primary evidence relating to the recent death of Suchir Balaji, an OpenAI whistleblower. This is a tragic loss and I feel very sorry for the parents. The rest of this piece will be unemotive, as it is important to establish the nature of this death as objectively as possible. I was prompted to look into this by a surprising conversation I had IRL suggesting there was credible evidence that it was not suicide.

The undisputed facts of the case are that he died of a gunshot wound in his bathroom sometime around November 26, 2024. The police say it was a suicide with no evidence of foul play. Most of the evidence we have comes from the parents and George Webb. Webb describes himself as an investigative journalist, but I would classify him as more of a conspiracy theorist, based on a quick scan of some of his older videos. I think many of the specific factual claims he has made about this case are true, though I generally doubt his interpretations. Webb seems to have made contact with the parents early on and went with them when they first visited Balaji's apartment. He has since published videos from the scene of the death against the wishes of the parents,[1] and as a result the parents have now unendorsed Webb.[2]

List of evidence:

* He didn't leave a suicide note.[3]
* The cause of death was decided by the authorities in 14 (or 40, unclear) minutes.[4]
* The parents arranged a private autopsy which "made their suspicions stronger".[5]
* The parents say "there are a lot of facts that are very disturbing for us and we cannot share at the moment but when we do a PR all of that will come out."[6]
* The parents say "his computer has been deleted, his desktop has been messed up".[7]
* Although the parents also said that their son's phone and laptop are not lost and are in escrow.[8][9] I think the claim of the computer being deleted is more up-to-date, but I'm not sure, as that video was posted earlier.
* It was his birt...
ryan_greenblatt
I thought it would be helpful to post about my timelines and what the timelines of people in my professional circles (Redwood, METR, etc) tend to be. Concretely, consider the outcome of: AI 10x'ing labor for AI R&D[1], measured by internal comments by credible people at labs that AI is 90% of their (quality-adjusted) useful workforce (as in, as good as having your human employees run 10x faster).

Here are my predictions for this outcome:

* 25th percentile: 2 years (Jan 2027)
* 50th percentile: 5 years (Jan 2030)

The views of other people (Buck, Beth Barnes, Nate Thomas, etc) are similar.

I expect that outcomes like "AIs are capable enough to automate virtually all remote workers" and "the AIs are capable enough that immediate AI takeover is very plausible (in the absence of countermeasures)" come shortly after (median 1.5 years and 2 years after, respectively, under my views).

----------------------------------------

1. Only including speedups due to R&D, not including mechanisms like synthetic data generation. ↩︎
People are not thinking clearly about AI-accelerated AI research. This comment by Thane Ruthenis is worth amplifying. 
Alignment is not all you need. But that doesn't mean you don't need alignment.

One of the fairytales I remember reading from my childhood is "The Three Sillies". The story is about a farmer encountering three episodes of human silliness, but it's set in one more frame story of silliness: his wife is despondent because there is an axe hanging in their cottage, and she thinks that if they have a son, he will walk underneath the axe and it will fall on his head.

The frame story was much more memorable to me than any of the "body" stories, and I randomly remember this story much more often than any other fairytale I read at the age I read fairytales. I think the reason for this is that the "hanging axe" worry is a vibe very familiar from my family and friend circle, and more generally a particular kind of intellectual neuroticism that I encounter all the time, one that is terrified of incomplete control or understanding. I really like the rationalist/EA ecosphere because of its emphasis on the solvability of problems like this: noticing situations where you can just approach the problem, taking down the axe. However, a baseline of intellectual neuroticism persists (after all, you wouldn't expect otherwise from a group of people who pull smoke alarms on pandemics and existential threats that others don't notice). Sometimes it's harmless or even beneficial. But a kind of neuroticism in the community that bothers me, and seems counterproductive, is a certain "do it perfectly or you're screwed" perfectionism that pervades a lot of discussions. (This is also familiar to me from my time as a mathematician: I've had discussions with very intelligent and pragmatic friends who rejected even the most basic experimentally confirmed facts of physics because "they aren't rigorously proven".)

A particular train of discussion that annoyed me in this vein was the series of responses to Raemon's "preplanning and baba is you" post. The initial post, I think, makes a nice point -- it suggest...

Popular Comments

Recent Discussion

Summary

In my first piece, I settled on conceiving of radical empathy as an object view. I highlight some other potential moral implications of object views:

  1. More of what someone finds pleasant or is (knowingly or unknowingly) disposed to find pleasant is better for them, all else equal. Death can be bad for them if and because it deprives them of the objects of those preferences. Death can also be bad for them if and because they explicitly disprefer it or its consequences (more).
  2. We should care about what animals care about, e.g. preventing their pain and keeping them close to those to which they’re attached (more).
  3. Someone’s suffering might not necessarily always count against their life, in case it isn't about their life or aspects of it, e.g. grieving the loss of a loved one (more).
  4. If there
...

Summary

I illustrate and defend actualist object views as my conception of radical empathy, of being concerned exactly with what we would actually care about. As a kind of asymmetric person-affecting view, the most important implication for cause prioritization is probably lower priority for extinction risk reduction relative to total utilitarianism.

  1. I illustrate with an example where I find actualist object views handle changing preferences better than other views (more).
  2. An important implication for cause prioritization is the (Procreation) Asymmetry: we have reasons to prevent miserable lives, but not to create happy or fulfilling lives, for the sake of what those lives would actually care about (more).
    1. This is not (necessarily) antinatalist or pro-extinction, but it would probably lead to less priority for extinction risk reduction, compared to total utilitarianism (more).
  3. I highlight some problems for actualism, addressing
...
This is a linkpost for https://fatebook.io

Fatebook is a website that makes it extremely low friction to make and track predictions.

It's designed to be very fast - just open a new tab, go to fatebook.io, type your prediction, and hit enter. Later, you'll get an email reminding you to resolve your question as YES, NO, or AMBIGUOUS.

It's private by default, so you can track personal questions and give forecasts that you don't want to share publicly. You can also share questions with specific people, or publicly.

Fatebook syncs with Fatebook for Slack - if you log in with the email you use for Slack, you’ll see all of your questions on the website.

As you resolve your forecasts, you'll build a track record - Brier score and Relative Brier score - and see your calibration chart. You can...
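For reference, here is a minimal sketch (my own, not Fatebook's code) of how a Brier score is computed: it is the mean squared error between your stated probabilities and the 0/1 outcomes, so lower is better, and always forecasting 50% scores 0.25.

```python
# Minimal sketch (my own, not Fatebook's code): the Brier score is the mean
# squared error between forecast probabilities and binary outcomes (1 = YES,
# 0 = NO). Lower is better; always forecasting 50% scores 0.25.
def brier_score(forecasts, outcomes):
    assert len(forecasts) == len(outcomes)
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# e.g. three questions forecast at 90%, 60%, and 20%, resolving YES, NO, NO:
print(brier_score([0.9, 0.6, 0.2], [1, 0, 0]))  # ~0.137
```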

Adam B

Thanks for the suggestion - you can now sign in to Fatebook with email!

Alexander Gietelink Oldenziel
You May Want to Know About Locally Decodable Codes

In AI alignment and interpretability research, there's a compelling intuition that understanding equals compression. The idea is straightforward: if you truly understand a system, you can describe it more concisely by leveraging that understanding. This philosophy suggests that better interpretability techniques for neural networks should yield better compression of their behavior or parameters.

jake_mendel asks: if understanding equals compression, then shouldn't ZIP compression of neural network weights count as understanding? After all, ZIP achieves remarkable compression ratios on neural network weights - likely better than any current interpretability technique. Yet intuitively, having a ZIP file of weights doesn't feel like understanding at all! We wouldn't say we've interpreted a neural network just because we've compressed its weights into a ZIP file.

Compressing a bit string means finding a code for that string, and the study of such codes is the central topic of both algorithmic and Shannon information theory. Just compressing the set of weights as small as possible is too naive - we probably want to impose additional properties on the codes.

One crucial property we might want is "local decodability": if you ask a question about any specific part of the original neural network, you should be able to answer it by examining only a small portion of the compressed representation. You shouldn't need to decompress the entire thing just to understand one small aspect of how the network operates. This matches our intuitions about human understanding - when we truly understand something, we can answer specific questions about it without having to review everything we know.

A Locally Decodable Code (LDC) is a special type of error-correcting code that allows recovery of any single bit of the original message by querying only a small number of bits of the encoded message, even in the presence of some corruption...
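As a concrete (if toy) illustration, here is a minimal sketch, not from the post, of the classic Hadamard code viewed as a 2-query LDC: the codeword is exponentially longer than the message, but any single message bit can be recovered by querying just two codeword positions, and the scheme tolerates a small fraction of corrupted positions because the queried positions are (individually) uniformly random.

```python
# Minimal sketch (not from the post): the Hadamard code as a 2-query locally
# decodable code over GF(2). The codeword is exponentially long (one parity
# bit per subset of message positions), but any message bit x_i can be
# recovered from just two queries, since <x, a> XOR <x, a XOR e_i> = x_i.
import itertools
import random

def hadamard_encode(msg_bits):
    """Map a k-bit message to its 2^k inner products <msg, a> mod 2."""
    k = len(msg_bits)
    return {a: sum(m & s for m, s in zip(msg_bits, a)) % 2
            for a in itertools.product([0, 1], repeat=k)}

def locally_decode(codeword, i, k):
    """Recover message bit i using only two codeword positions."""
    a = tuple(random.randint(0, 1) for _ in range(k))
    a_flip_i = tuple(bit ^ (j == i) for j, bit in enumerate(a))
    return codeword[a] ^ codeword[a_flip_i]

msg = [1, 0, 1, 1]
codeword = hadamard_encode(msg)
assert all(locally_decode(codeword, i, len(msg)) == msg[i] for i in range(len(msg)))
```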

Interesting! I think the problem is that dense/compressed information can be represented in ways in which it is not easily retrievable by a certain decoder. The Standard Model written in Chinese is a very compressed representation of human knowledge of the universe and completely inscrutable to me.
Or take some maximally compressed code and pass it through a permutation. The information content is obviously the same but it is illegible until you reverse the permutation.

In some ways it is uniquely easy to do this to codes with maximal entropy because per definit...

Noosphere89
Indeed, even three-query locally correctable codes (a closely related notion) have code lengths that must grow exponentially with message size: https://www.quantamagazine.org/magical-error-correction-scheme-proved-inherently-inefficient-20240109/
MondSemmel
That claim is from 2017. Does Ilya even still endorse it?

Epistemic Status: This post is an attempt to condense some ideas I've been thinking about for quite some time. I took some care grounding the main body of the text, but some parts (particularly the appendix) are pretty off the cuff, and should be treated as such. 

The magnitude and scope of the problems related to AI safety have led to an increasingly public discussion about how to address them. Risks of sufficiently advanced AI systems involve unknown unknowns that could impact the global economy, national and personal security, and the way we investigate, innovate, and learn. Clearly, the response from the AI safety community should be as multi-faceted and expansive as the problems it aims to address. In a previous post, we framed fruitful collaborations between applied...

Jonas Hallgren
I really like this! For me it also somewhat paints a vision for what could be, which might inspire action. Something that I've generally thought would be really nice to have over the last couple of years is a vision for how a decentralized AI Safety field could look, and what the specific levers to pull would be to get there.

What does the optimal form of a decentralized AI Safety science look like? How does this incorporate parts of metascience and potentially decentralized science? What does this look like with literature review from AI systems? How can we use AI systems themselves to create such infrastructure in the field? How should such communication pathways optimally look?

I feel that there is so much low-hanging fruit here. There are so many algorithms that we could apply to make things better. Yes, we've got some forums, but holy smokes could the underlying distribution and optimisation systems be optimised. Maybe the Lightcone crew could cook something up in this direction?

Thanks for the comment! I do hope that the thoughts expressed here can inspire some action, but I'm not sure I understand your questions. Do you mean 'centralized', or are you thinking about the conditions necessary for many small scale trading zones? 

In this way, I guess the emergence of big science could be seen as a phase transition from decentralization -> centralization. 

dr_s
I think some believe it's downright impossible, and others that we'll just never create it because we have no use for something so smart that it overrides our orders and wishes. That at most we'll make a sort of magical genie still bound by us expressing our wishes.
Lucius Bushnaq
End points are easier to infer than trajectories, so sure, I think there are some reasonable guesses you can try to make about how the world might look after aligned superintelligence, should we get it somehow.

For example, I think it's a decent bet that basically all minds would exist solely as uploads almost all of the time, because living directly in physical reality is astronomically wasteful and incredibly inconvenient. Turning on a physical lamp every time you want things to be brighter means wiggling about vast numbers of particles and wasting an ungodly amount of negentropy just for the sake of the teeny tiny number of bits about these vast numbers of particles that actually make it to your eyeballs, and the even smaller number of bits that actually end up influencing your mind state and making any difference to your perception of the world. All of the particles[1] in the lamp in my bedroom, the air its light shines through, and the walls it bounces off, could be so much more useful arranged in an ordered dance of logic gates where every single movement and spin flip is actually doing something of value. If we're not being so incredibly wasteful about it, maybe we can run whole civilisations for aeons on the energy and negentropy that currently make up my bedroom. What we're doing right now is like building an abacus out of supercomputers. I can't imagine any mature civilisation would stick with this.

It's not that I refuse to speculate about how a world post aligned superintelligence might look. I just didn't think that your guess was very plausible. I don't think pre-existing property rights or state structures would matter very much in such a world, even if we don't get what is effectively a singleton, which I doubt. If a group of superintelligent AGIs is effectively much more powerful and productive than the entire pre-existing economy, your legal share of that pre-existing economy is not a very relevant factor in your ability to steer the future and g...
faul_sname
Assuming that which end point you get to doesn't depend on the intermediate trajectories at least.

Something like a crux here is that I believe the trajectories non-trivially matter for which end-points we get, and I don't think it's like entropy, where we can easily determine the end-point without considering the intermediate trajectory, because I do genuinely think some path-dependence is present in history, which is why, even if I were way more charitable towards communism, I don't think this was ever defensible:

[...] Marx was philosophically opposed, as a matter of principle, to any planning about the structure of communist governments or economies. He

...

This is a follow-up to last week's D&D.Sci scenario: if you intend to play that, and haven't done so yet, you should do so now before spoiling yourself.

There is a web interactive here you can use to test your answer, and generation code available here if you're interested, or you can read on for the ruleset and scores.

RULESET

Encounters

The following encounters existed:

| Encounter Name | Threat (Surprised) | Threat (Alerted) | Alerted By | Tier |
| --- | --- | --- | --- | --- |
| Whirling Blade Trap | -- | 2 | -- | 1 |
| Goblins | 1 | 2 | Anything | 1 |
| Boulder Trap | -- | 3 | -- | 2 |
| Orcs | 2 | 4 | Anything | 2 |
| Clay Golem | -- | 4 | -- | 3 |
| Hag | 3 | 6 | Tier 2 and up | 3 |
| Steel Golem | -- | 5 | -- | 4 |
| Dragon | 4 | 8 | Tier 3 and up | 4 |

Each encounter had a Threat that determined how dangerous it was to adventurers. When adventurers encountered it, they would roll [Threat]d2 to determine how challenging they found it.

However, many encounters had two different Threat levels, depending on whether they were alerted to the adventurers or not.  (A dragon that's woken up from...
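To make the roll mechanic concrete, here is a minimal sketch (my own, not the linked generation code) of a single encounter's challenge roll under the rules as stated: roll [Threat]d2, with the Threat value depending on whether the encounter was alerted.

```python
# Minimal sketch (my own, not the linked generation code): one encounter's
# challenge roll - [Threat]d2, where Threat depends on whether the encounter
# was alerted to the adventurers before they reached it.
import random

def challenge_roll(threat_surprised, threat_alerted, alerted):
    threat = threat_alerted if alerted else threat_surprised
    return sum(random.randint(1, 2) for _ in range(threat))

# e.g. the Dragon from the table above: Threat 4 surprised, 8 alerted
print(challenge_roll(4, 8, alerted=False))  # 4d2: total between 4 and 8
print(challenge_roll(4, 8, alerted=True))   # 8d2: total between 8 and 16
```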

 Notes on my performance:

. . . huh! I was really expecting to either take first place for being the only player putting serious effort into the right core mechanics, or take last place for being the only player putting serious effort into the wrong core mechanics; getting the main idea wrong but doing everything else well enough for silver was not on my bingo card. (I'm also pleasantly surprised to note that I figured out which goblin I could purge with least collateral damage: I can leave Room 7 empty without changing my position on the leaderboard.)...

Christian Z R
Wow, I think I got more lucky there than I really deserved to. This was really interesting; I could see that there must be something complicated going on, but I never got close to guessing the actual system. Encounters alerting other encounters is both simple and feels intuitively right once you see it.

After stealing most of my ideas from abstractapplic (thanks), I spent most of my time trying to figure out which order to put the C-W-B-H in. I found that having the toughest encounters later worked best, which must be an effect that is actually caused by the random players putting their strongest encounter in room 9, so a strong encounter in room 6 or 8 will help alert this final encounter. So even though it is not built in, the other builders' preference for putting dragons in room 9 makes a strong encounter more valuable for the later rooms. Luckily this caused me to switch from C-B-W-H-D to C-W-B-H-D, so the Boulder Trap alerted the Hag, giving me the final 3 points.

I guess this says something about how emergent effects can still be valuable(ish) even when you haven't grokked the entire system... Anyway, thanks a lot for an enjoyable challenge.

Edit 2: I'm now fairly confident that this is just the Presumptuous Philosopher problem in disguise, which is explained clearly in Section 6.1 here: https://www.lesswrong.com/s/HFyami76kSs4vEHqy/p/LARmKTbpAkEYeG43u

This is my first post ever on LessWrong. Let me explain my problem. 

I was born in a unique situation — I shall omit the details of exactly what this situation was, but for my argument's sake, assume I was born as the tallest person in the entire world. Or instead suppose that I was born into the richest family in the world. In other words, take as an assumption that I was born into a situation entirely unique relative to all other humans on an easily measurable dimension such as height or wealth (i.e., not some niche measure like "longest tongue"). And indeed, my...

Answer by Noosphere89

The answer is yes, trivially, because under a wide enough conception of computation, basically everything is simulatable, so everything is evidence for the simulation hypothesis because it includes effectively everything.

It will not help you infer anything else though.

More below:

http://www.amirrorclear.net/academic/ideas/simulation/index.html

https://arxiv.org/abs/1806.08747

Answer by Alexander Gietelink Oldenziel
For what it's worth, I do think observers that observe themselves to be highly unique on important axes should rationally increase their credence in simulation hypotheses.
Answer by Ape in the coat
Yes, it is your main error. Think how justified this assumption is according to your knowledge state. How much evidence do you actually have? Have you checked many simulations before generalizing that principle? Or are you just speculating based on total ignorance? For your own sake, please don't. Both SIA and SSA are also unjustified assumptions out of nowhere and lead to more counterintuitive conclusions.

Instead consider these two problems.

Problem 1:

Problem 2:

Are you justified to believe that Problem 2 has the same answer as Problem 1? That you can simply assume that half of the balls in the blue bag are blue? Not after you went and checked a hundred random blue bags and in all of them half the balls were blue, but just a priori? And likewise with a grey bag. Where would these assumptions be coming from?

You can come up with some plausible-sounding just-so story. That the people who were filling the bag felt the urge to put blue balls in a blue bag. But what about the opposite just-so story, where people were disincentivized to put blue balls in a blue bag? Or where people paid no attention to the color of the bag? Or all the other possible just-so stories? Why do you prioritize this one in particular?

Maybe you imagine yourself tasked with filling two bags with balls of different colors. And when you inspect your thinking process in such a situation, you feel the urge to put a lot of blue balls in the blue bag. But why would the way you'd fill the bags be entangled with the actual causal process that filled these bags in the general case? You don't know that the bags were filled by people with your sensibilities. You don't know that they were filled by people, to begin with.

Or spin it the other way. Suppose you could systematically produce correct reasoning by simply assuming things like that. What would be the point in gathering evidence then? Why spend extra energy on checking the way blue bags and grey bags are organized if you can confidently deduce it a priori?
CstineSublime
What makes you care about it? What makes it persuasive to you? What decisions would you make differently, and what tangible results within this presumed simulation would you expect to see differently pursuant to proving this? (How do you expect your belief in the simulation to pay rent in anticipated experiences?) Also, the general consensus among rationalists, or at least broadly in science, is that if something is unfalsifiable then it must not be entertained.

Say more? I don't see how they are the same reference class.