CEO at Redwood Research.
AI safety is a highly collaborative field--almost all the points I make were either explained to me by someone else, or developed in conversation with other people. I'm saying this here because it would feel repetitive to say "these ideas were developed in collaboration with various people" in all my comments, but I want to have it on the record that the ideas I present were almost entirely not developed by me in isolation.
Please contact me via email ([email protected]) instead of messaging me on LessWrong.
If we are ever arguing on LessWrong and you feel like it's kind of heated and would go better if we just talked about it verbally, please feel free to contact me and I'll probably be willing to call to discuss briefly.
Were you suggesting something other than "remove the parentheses"? Or did it seem like I was thinking about it in a confused way? Not sure which direction you thought the mistake was in.
I think that it is worth conceptually distinguishing AIs that are uncontrollable from AIs that are able to build uncontrollable AIs, because the ways you should handle those two kinds of AI are importantly different.
I think this is more feature than bug – the problem is that it's overwhelming. There are multiple ways to be overwhelming; what we want to avoid is a situation where an overwhelming, unfriendly AI exists. One way is to not build AI of a given power level. The other is to increase the robustness of civilization. (I agree the term is fuzzy, but I think realistically the territory is fuzzy.)
When you're thinking about how to mitigate the risks, it really matters which of these we're talking about. I think there is some level of AI capability at which it's basically hopeless to control the AIs; this is what I use "galaxy-brained superintelligence" to refer to. If you just want to talk about AIs that pose substantial risk of takeover, you probably shouldn't use the word superintelligence in there, because they don't obviously have to be superintelligences to pose takeover risk. (And it's weird to use "overwhelmingly" as an adverb that modifies "superintelligent", because the overwhelmingness isn't just about the level of intelligence; it's about that and also about the state of the world. You could say "overwhelming, superintelligent AI" if you want to talk specifically about AIs that are overwhelming and also superintelligent, but that's normally not what we want to talk about.)
I don't really understand what you're saying. I think it's very likely that [ETA: non-galaxy-brained] superintelligent AIs will be able to build galaxy-brained superintelligences within months to years if they are given (or can steal) the resources needed to produce them. I don't think it's obvious that they can do this with extremely limited resources.
I agree that it's useful to have the concept of "Superintelligence that is so qualitatively intelligent that it's very hard for us to be confident about what it will or won't be able to accomplish, even given lots of constraints and limited resources." I usually use "galaxy-brained superintelligence" for this in conversation, but obviously that's a kind of dumb term. Maybe "massively qualitatively superintelligent" works? Bostrom uses "quality superintelligence".
OpenPhil talked about the concept of "transformative AI" specifically because they were trying to talk about a broader class of AIs (though galaxy-brained superintelligence was a core part of their concern).
I don't love "overwhelmingly superintelligent" because AIs don't necessarily have to be qualitatively smarter than humanity to overwhelm it—whether we are overwhelmed by AIs that are "superintelligent" (in the weak sense that they're qualitatively more intelligent than any human) IMO is affected by the quality of takeover countermeasures in place.
The type of AI I'm most directly worried about is "overwhelmingly superhuman compared to humanity." (And AIs that might quickly bootstrap to become overwhelmingly superhuman.)
I think it's a mistake to just mention that second thing as a parenthetical. There's a huge difference between AIs that are already galaxy-brained superintelligences and AIs that could quickly build galaxy-brained superintelligences or modify themselves into galaxy-brained superintelligences—we should try to prevent the former category of AIs from building galaxy-brained superintelligences in ways we don't approve of.
Yes, I think it's reasonable to describe this as the creatures acausally communicating. (Though I would have described this differently; I think that all the physics stuff you said is not necessary for the core idea you want to talk about.)
If you wrote a rude comment in response to me, I wouldn't feel bad about myself, but I would feel annoyed at you. (I feel bad about myself when I think my comments were foolish in retrospect or when I think they were unnecessarily rude in retrospect; the rudeness of replies to me doesn't really affect how I feel about myself.) Other people are more likely to be hurt by rude comments, I think.
I wouldn't be surprised if Tim found your comment frustrating and it made him less likely to want to write things like this in future. I don't super agree with Tim's post, but I do think LW is better if it's the kind of place where people like him write posts like that (and then get polite pushback).
I have other thoughts here but they're not very important.
I can see why the different things I've said on this might seem inconsistent :P It's also very possible I'm wrong here; I'm not confident about this and have only spent a few hours in conversation about it. And if I hadn't recently been personally angered by Eliezer's behavior, I wouldn't have mentioned this opinion publicly. But here's my current model.
My current sense is that IABIED hasn't had that much of an effect on public perception of AI risk, compared to things like AI 2027. My previous sense was that there are huge downsides of Eliezer (and co) being more influential on the topic of AI safety, but MIRI had some chance of succeeding at getting lots of attention, so I was overall positive on you and other MIRI people putting your time into promoting the book. Because the book didn't go as well as seemed plausible, promoting Eliezer's perspective now seems like a less efficient way of popularizing concern about AI risk, and that benefit does less to outweigh the disadvantages of him having negative effects inside the AI safety community.
For example, my guess is that it's worse for the MIRI governance team to be at MIRI than elsewhere, except inasmuch as they gain prominence from their association with Eliezer; if that second factor is weaker, it looks less good for them to be there.
I think my impression of the book is somewhat more negative than it was when it first came out, based on various discussions I've had with people about it. But this isn't a big factor.
Does this make sense?
"The main thing Eliezer and MIRI have been doing since shifting focus to comms addressed a 'shocking oversight' that it's hard to imagine anyone else doing a better job addressing" (lmk if this doesn't feel like an accurate paraphrase)
This paraphrase doesn't quite preserve the meaning I intended. I think many people would have done a somewhat better job.
Eliezer definitely doesn't think of it as an ally (or at least, not a good ally who he is appreciative of and wants to be on good terms with).
How does the intro sentence seem triggered? How would you have written it?
Yeah, I think control is unlikely to work for galaxy-brained superintelligences. It's unclear how superintelligent they have to be before control is totally unworkable.