Taking AI companies that are locally incentivized to race toward the brink, and then hoping they stop right at the cliff's edge, is potentially a grave mistake. One might hope they stop because of voluntary RSPs, or legislation setting a red line, or whistleblowers calling in the government to lock down the datacenters, or whatever. But just as plausible to me is corporations charging straight down the cliff (of building ever-more-clever AI as fast as possible until they build one too clever and it gets power and does bad things to humanity), and even strategizing ahead of time about how to avoid obstacles like legislation telling them not to. Local incentives have a long history of dominating people in this way, e.g. people in the tobacco and fossil fuel industries. What would be so much safer is if even the local incentives of cutting-edge AI companies favored social good, alignment to humanity, and caution. This would require legislation blocking off a lot of profitable activity, plus a lot of public and philanthropic money incentivizing beneficial activity, in a convulsive effort whose nearest analogy is the global shift to renewable energy. (This take is the one thing I want to boost from AI For Humanity.)
A prereminiscence: It's like it was with chess. We passed through that stage when AI could beat most of us to where it obviously outperforms all of us. Only for cultural output in general. People still think now, but privately, in the shower, or in quaint artisanal forms as if we were making our own yogurts or weaving our own clothes. Human-produced works are now a genre with a dwindling and eccentric fan base more concerned with the process than the product.

It was like the tide coming in. One day it was cutely, clumsily trying to mimic that thing we do. Soon after it was doing it pretty well and you watched it with admiration as if it were a dog balancing on a ball. Then soon it could do it well enough for most purposes, you couldn't help but admit. Then as good as all but the best, even to those who could tell the difference. And then we were suddenly all wet, gathering our remaining picnic and heading for higher ground, not only completely outclassed but even unable to judge by how far we were being outclassed. Once a chess machine can beat everybody every time, who is left to applaud when it gets twice again as good?

If it had been a war, we would have had a little ceremony as we brought down our flag and folded it up and put it away, but because it happened as quickly and quietly as it did and because we weren't sure whether we wanted to admit it was happening, there was no formal hand-off. One day we just resignedly realized we could no longer matter much, but there was so much more now to appreciate and we could see echoes and reflections of ourselves in it, so we didn't put up a fight.

Every once in a while someone would write an essay, Joan Didion quality, really good. Or write a song. Poignant, beautiful, original even. Not maybe the best essay or the best song we'd seen that day, but certainly worthy of being in the top ranks. And we'd think: we've still got it. We can still rally when we've got our backs to the wall. Don't count us out yet. But it wa
Here's a quick interesting-seeming devinterp result: We can estimate the Local Learning Coefficient (LLC, the central quantity of singular learning theory; for more info see these posts / papers) of a simple grokking model on its training data over the course of training. This yields the following plot: (note: estimated LLC = lambda-hat = λ̂) What's interesting about this is that the estimated LLC of the model in this plot closely tracks test loss, even though it is estimated on training data. On the one hand this is unsurprising: SLT predicts that the LLC determines the Bayes generalization error in the Bayesian setting.[1] On the other hand this is quite surprising: the Bayesian setting is not the same as SGD, an increase in training steps is not the same as an increase in the total number of samples, and the Bayes generalization error is not exactly the same as test loss. Despite these differences, the LLC clearly tracks (in-distribution) generalization here. We see this as a positive sign for applying SLT to study neural networks trained by SGD. This plot was made using the devinterp Python package, and the code to reproduce it (including hyperparameter selection) is available as a notebook at https://github.com/timaeus-research/devinterp/blob/main/examples/grokking.ipynb. Thanks to Nina Panickserry and Dmitry Vaintrob, whose earlier post on learning coefficients of modular addition served as the basis for this experiment.

1. ^ More precisely: in the Bayesian setting the Bayes generalization error, as a function of the number of samples n, is λ/n to leading order.
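For concreteness, here is a minimal, self-contained sketch of what an SGLD-based λ̂ estimator looks like, using the standard form λ̂ ≈ nβ·(E_β[L_n(w)] − L_n(w*)) with β ≈ 1/log n. This is illustrative only: it runs on toy regression data rather than the grokking setup above, and it does not use the devinterp package's actual API or hyperparameters (see the linked notebook for those).

```python
# Minimal sketch of local learning coefficient (LLC / lambda-hat) estimation via
# SGLD sampling from a localized, tempered posterior. Illustrative only: toy
# regression data instead of the grokking setup, and a hand-rolled sampler
# rather than the devinterp package's own API.
import math
import torch
import torch.nn as nn

def estimate_llc(model, loss_fn, X, y, n_steps=2000, eps=1e-5, gamma=100.0):
    n = len(X)
    beta = 1.0 / math.log(n)                       # inverse temperature ~ 1/log n
    w_star = [p.detach().clone() for p in model.parameters()]
    with torch.no_grad():
        loss_star = loss_fn(model(X), y).item()    # L_n(w*)

    losses = []
    for _ in range(n_steps):
        loss = loss_fn(model(X), y)
        model.zero_grad()
        loss.backward()
        with torch.no_grad():
            for p, p0 in zip(model.parameters(), w_star):
                # SGLD step: drift is the gradient of
                # n*beta*L_n(w) + (gamma/2)*||w - w*||^2, plus Gaussian noise.
                drift = n * beta * p.grad + gamma * (p - p0)
                p.add_(-0.5 * eps * drift + math.sqrt(eps) * torch.randn_like(p))
        losses.append(loss.item())

    burn_in = n_steps // 2
    mean_loss = sum(losses[burn_in:]) / (n_steps - burn_in)
    return n * beta * (mean_loss - loss_star)      # lambda-hat

# Toy usage: fit a small MLP, then estimate lambda-hat at the fitted parameters.
torch.manual_seed(0)
X, y = torch.randn(512, 10), torch.randn(512, 1)
model = nn.Sequential(nn.Linear(10, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(500):
    opt.zero_grad()
    nn.functional.mse_loss(model(X), y).backward()
    opt.step()
print("estimated LLC:", estimate_llc(model, nn.functional.mse_loss, X, y))
```

Tracking this quantity across checkpoints over the course of training is what produces the λ̂-versus-test-loss comparison described above.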
gwern
Concrete benchmark proposals for how to detect mode-collapse and AI slop and ChatGPTese, and why I think this might be increasingly important for AI safety, to avoid 'whimper' or 'em hell' kinds of existential risk: https://gwern.net/creative-benchmark
To avoid deploying a dangerous model, you can either (1) test the model pre-deployment or (2) test a similar older model with tests that have a safety buffer such that if the old model is below some conservative threshold it's very unlikely that the new model is dangerous. DeepMind says it uses the safety-buffer plan (but it hasn't yet operationalized thresholds/buffers as far as I know). Anthropic's original RSP used the safety-buffer plan; its new RSP doesn't really use either plan (kinda safety-buffer but it's very weak). (This is unfortunate.) OpenAI seemed to use the test-the-actual-model plan.[1] This isn't going well. The 4o evals were rushed because OpenAI (reasonably) didn't want to delay deployment. Then the o1 evals were done on a weak o1 checkpoint rather than the final model, presumably so they wouldn't be rushed, but this presumably hurt performance a lot on some tasks (and indeed the o1 checkpoint performed worse than o1-preview on some capability evals). OpenAI doesn't seem to be implementing the safety-buffer plan, so if a model is dangerous but not super obviously dangerous, it seems likely OpenAI wouldn't notice before deployment.... (Yay OpenAI for honestly publishing eval results that don't look good.) 1. ^ It's not explicit. The PF says e.g. 'Only models with a post-mitigation score of "medium" or below can be deployed.' But it also mentions forecasting capabilities.
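To make the difference between the two plans concrete, here is a toy sketch of the two gating rules. All names, scores, and thresholds are hypothetical illustrations, not any lab's actual policy or numbers.

```python
# Hypothetical illustration of the two deployment-gating plans described above.
# Scores are imagined eval results in [0, 1]; thresholds are made up.

def ok_to_deploy_plan1(new_model_score: float, danger_threshold: float = 0.5) -> bool:
    """Plan (1): evaluate the actual model you are about to deploy."""
    return new_model_score < danger_threshold

def ok_to_deploy_plan2(old_model_score: float, danger_threshold: float = 0.5,
                       safety_buffer: float = 0.2) -> bool:
    """Plan (2): evaluate an older model; deploy successors only while the older
    model stays below a conservative threshold (the danger threshold minus a
    buffer), so plausible capability gains still leave the new model below the line."""
    return old_model_score < danger_threshold - safety_buffer
```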

Popular Comments

Recent Discussion

This is the second in a sequence of posts scrutinizing computational functionalism (CF). In my last post, I defined a concrete claim that computational functionalists tend to make:

Practical CF: A simulation of a human brain on a classical computer, capturing the dynamics of the brain on some coarse-grained level of abstraction, that can run on a computer small and light enough to fit on the surface of Earth, with the simulation running at the same speed as base reality[1], would cause the same conscious experience as that brain.

I contrasted this with “theoretical CF”, the claim that an arbitrarily high-fidelity simulation of a brain would be conscious. In this post, I’ll scrutinize the practical CF claim.

My assumptions

  • I assume realism about phenomenal consciousness: Given some physical process, there is
...

No reputable neuroscientist argued against it to any strong degree, just for additional supportive methods of information transmission.

I don't think this is correct. This paper argues explicitly against the neuron doctrine (enough so that they've put it into the first two sentences of the abstract), is published in a prestigious journal, has far above average citation count, and as far as I can see, is written by several authors who are considered perfectly fine/serious academics. Not any huge names, but I think enough to clear the "reputable" bar.

I don... (read more)

Ape in the coat
Your demand that programs be causally closed from the low-level representation of the hardware seems to be extremely limiting. According to such a paradigm, a program that checks what CPU it's being executed on and prints its name can't be conceptualized as a program. Your reasoning about levels of abstraction seems to be a map-territory confusion. Abstractions and their levels are in the map. Evolution doesn't create or not create them. Minds conceptualize what evolution created in terms of abstractions. Granted, some things are easier to conceptualize in terms of software/hardware than others, because they were specifically designed with this separation in mind. This makes the problem harder, not impossible. As for whether we get so much complexity that we wouldn't be able to execute it on a computer on the surface of the Earth, I would be very surprised if that were the case. Yes, a lot of things causally affect neurons, but it doesn't mean that all of these things are relevant for phenomenal consciousness in the sense that without representing them the resulting program wouldn't be conscious. Brains do a bazillion of other stuff as well. In the worst case, we can say that human consciousness is a program but such a complicated one that we'd better look for a different abstraction. But even this wouldn't mean that we can't write some different, simpler conscious program.
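(For concreteness, the kind of program described above is trivial to write; here is a short Python sketch using the standard-library platform module, whose output depends on the machine it happens to run on:)

```python
# A program whose output depends on the hardware executing it: it prints the
# name of the CPU it is running on. (The string returned by platform.processor()
# varies by operating system and may be empty on some platforms.)
import platform

print("Running on CPU:", platform.processor())
```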

The creation of a “mirrored” organism could “trigger severe ecological disruptions,” according to a 300-page technical report released today. Its authors claim such organisms could quickly spread across the world, fatally infect humans, and “directly drive vulnerable plant and animal species to extinction.” The report accompanies an article in Science, also released today, entitled “Confronting Risks of Mirror Life.”

But what, exactly, is a mirrored organism? To answer that, let’s consider how extant life works. 

Proteins, sugars, lipids, and nucleic acids — key molecules used by cellular life — are all “chiral,” a term derived from the Greek for “hand.” Just as our hands cannot be perfectly aligned on top of one another regardless of how they are rotated, despite being mirror images, the same is true of chiral...

nc

The thesis of the paper relies heavily on Chapter 8 and its arguments are not fully convincing to me. The evidence base that it draws on is weak and largely speculative. Its discussion of both microevolution and ecological assembly is simplistic. It would have been nice to see an explicit forecast or range of scenarios for the population ecology/genetics [what fitness ratios are reasonable? how long would a natural response take, if ever?], but I acknowledge that is demanding. It's an area very hard to speculate in so I appreciate the amount of research th... (read more)

J Bostock
I think the risk of infection to humans would be very low. The human body can generate antibodies to pretty much anything (including PEG and benzenes, which never appear in nature) by selecting protein sequences from a huge library of cells. This would activate the complement system, which targets membranes and kills bacteria in a non-chiral way. The risk to invertebrates and plants might be more significant; I'm not sure about the specifics of the plant immune system.

At this point, we can confidently say that no, capabilities are not hitting a wall. Capacity density, how much you can pack into a given space, is way up and rising rapidly, and we are starting to figure out how to use it.

Not only did we get o1 and o1 pro and also Sora and other upgrades from OpenAI, we also got Gemini 1206 and then Gemini Flash 2.0 and the agent Jules (am I the only one who keeps reading this as Jarvis?) and Deep Research, and Veo, and Imagen 3, and Genie 2 all from Google. Meta’s Llama 3.3 dropped, claiming their 70B is now as good as the old 405B, and basically no one noticed.

This morning I saw Cursor now offers ‘agent mode.’ And hey...

We still can find errors in every phishing message that goes out, but they’re getting cleaner.

Whether or not this is true today, it is a statement in which I put near-zero credence.

Harry had unthinkingly started to repeat back the standard proverb that there was no such thing as a perfect crime, before he actually thought about it for two-thirds of a second, remembered a wiser proverb, and shut his mouth in midsentence. If you did commit the perfect crime, nobody would ever find out - so how could anyone possibly know that there weren't perfect crimes? And

... (read more)
Shankar Sivarajan
Depending on what you consider reasonable (or what you consider "censored"), try ComfyUI with models (and LoRAs) of your choice from Civit AI.  A word of warning: are you sure you want what you're asking for?

I’ve updated quite hard against computational functionalism (CF) recently (as an explanation for phenomenal consciousness), from ~80% to ~30%. Of course it’s more complicated than that, since there are different ways to interpret CF and having credences on theories of consciousness can be hella slippery.

So far in this sequence, I’ve scrutinised a couple of concrete claims that computational functionalists might make, which I called theoretical and practical CF. In this post, I want to address CF more generally.

Like most rationalists I know, I used to basically assume some kind of CF when thinking about phenomenal consciousness. I found a lot of the arguments against functionalism, like Searle’s Chinese room, unconvincing. They just further entrenched my functionalismness. But as I came across and tried to explain away more and more...

Two thoughts here

  • I feel like the actual crux between you and OP is with the claim in post #2 that the brain operates outside the neuron doctrine to a significant extent. This seems to be what your back and forth is heading toward; OP is fine with pseudo-randomness as long as it doesn't play a nontrivial computational function in the brain, so the actual important question is not anything about pseudo-randomness but just whether such computational functions exist. (But maybe I'm missing something, also I kind of feel like this is what most people's objec

... (read more)
green_leaf
This is false. Everything exists in the territory to the extent to which it can interact with us. While different models can output a different answer as to which computation something runs, that doesn't mean the computation isn't real (or, even, that no computation is real). The computation is real in the sense of it influencing our sense impressions (I can observe my computer running a specific computation, for example). Someone else, whose model doesn't return "yes" to the question whether my computer runs a particular computation will then have to explain my reports of my sense impressions (why does this person claim their computer runs Windows, when I'm predicting it runs CP/M?), and they will have to either change their model, or make systematically incorrect predictions about my utterances. In this way, every computation that can be ascribed to a physical system is intersubjectively real, which is the only kind of reality there could, in principle, be.

(Philosophical zombies, by the way, don't refer to functional isomorphs, but to physical duplicates, so even if you lost your consciousness after having your brain converted, it wouldn't turn you into a philosophical zombie.)

In principle, yes. The upper physical limit for the amount of computation per kg of material per second is incredibly high.

This is false. It's known that any subset of the universe can be simulated on a classical computer to an arbitrary precision.

This introduces a bizarre disconnect between your beliefs about your qualia, and the qualia themselves. Imagine: It would be possible, for example, that you believe you're in pain, and act in all ways as if you're in pain, but actually, you're not in pain. Whatever I denote by "qualia," it certainly doesn't have this extremely bizarre property. Because then, the functional properties of a quale and the quale itself would be synchronized only in Homo sapiens. Other species (like octopus) might have qualia, but since they're made of differ
Seth Herd
I think you're conflating creating a similar vs. an identical conscious experience with a simulated brain. Close is close enough for me - I'd take an upload run at far less resolution than molecular scale. I spent 23 years studying computational neuroscience. You don't need to model every molecule or even close to get a similar computational and therefore conscious experience. The information content of neurons (collectively, and inferred where data isn't complete) is a very good match to reported aspects of conscious experience.
EuanMcLean
Thanks for clarifying :) Yes. Yes, the seed has a causal effect on the execution of the algorithm by my definition. As was talked about in the comments of the original post, causal closure comes in degrees, and in this case the MCMC algorithm is somewhat causally closed from the seed. An abstract description of the MCMC system that excludes the value of the seed is still a useful abstract description of that system - you can reason about what the algorithm is doing, predict the output within the error bars, etc. In contrast, the algorithm is not very causally closed to, say, idk, some function f() that is called a bunch of times on each iteration of the MCMC. If we leave f() out of our abstract description of the MCMC system, we don't have a very good description of that system, and we can't work out much about what the output would be given an input. If the 'mental software' I talk about is as causally closed to some biophysics as the MCMC is causally closed to the seed, then my argument in that post is weak. If, however, it's only as causally closed to biophysics as our program is to f(), then it's not very causally closed, and my argument in that post is stronger. Hmm, yea this is a good counterexample to my limited "just take the average of those fluctuations" claim. If it's important that my algorithm needs a pseudorandom float between 0 and 1, and I don't have access to the particular PRNG that the algorithm calls, I could replace it with a different PRNG in my abstract description of the MCMC. It won't work exactly the same, but it will still run MCMC and give out a correct answer. To connect it to the brain stuff: say I have a candidate abstraction of the brain that I hope explains the mind. Say temperatures fluctuate in the brain between 38°C and 39°C. Here are 3 possibilities of how this might affect the abstraction:

* Maybe in the simulation, we can just set the temperature to 38.5°C, and the simulation still correctly predicts the important features of

A new article in Science Policy Forum voices concern about a particular line of biological research which, if successful in the long term, could eventually create a grave threat to humanity and to most life on Earth.

Fortunately, the threat is distant, and avoidable—but only if we have common knowledge of it.

What follows is an explanation of the threat, what we can do about it, and my comments.

Background: chirality

Glucose, a building block of sugars and starches, looks like this:

Adapted from Wikimedia

But there is also a molecule that is the exact mirror-image of glucose. It is called simply L-glucose (in contrast, the glucose in our food and bodies is sometimes called D-glucose):

L-glucose, the mirror twin of normal D-glucose. Adapted from Wikimedia

This is not just the same molecule flipped around,...

That was a fascinating post, thanks for writing it!

In Scott Alexander's review of Twelve Rules for Life, he discusses how Jordan Peterson and C.S. Lewis seem to have the ability to express cliches in ways that don't feel cliched.

Jordan Peterson’s superpower is saying cliches and having them sound meaningful. There are times – like when I have a desperate and grieving patient in front of me – that I would give almost anything for this talent. “You know that she wouldn’t have wanted you to be unhappy.” “Oh my God, you’re right! I’m wasting my life grieving when I could be helping others and making her proud of me, let me go out and do this right now!” If only.

This seems like an undervalued skill, particularly within the rationalist community where the focus tends to...

As far as I can tell, you do not really argue why you think platitudes contain valuable wisdom. You only have one example, and that one is super-vague. 

For me this post would be much better if you added several examples that show in more detail why the platitude is valuable.

Sherrinford
"But nowadays curiosity was déclassé. It suggested laziness (why not just ask it?)…" I think that does not work. Asking is easy, so asking is the lazy option.
Viliam
Compare to asking your colleague something that could be found by 10 seconds of googling. These days, you are supposed to google first. In ten years, you will be supposed to ask an AI for the explanation first, which for many people will also be the last step; and for the more curious ones the expected second and third steps will be something like "try a different prompt", "ask additional questions", "switch to a different AI", etc.
MondSemmel
How about "idle musings" or "sense of wonder", rather than "curiosity"? I remember a time before I had instant access to google whenever I had a question. Back then, a thought of "I wonder why X" was not immediately followed by googling "why X", but sometimes instead followed by thinking about X (incl. via "shower thoughts"), daydreaming about X, looking up X in a book, etc. It's not exactly bad that we have search engines and LLMs nowadays, but for me it does feel like something was lost, too.

Exactly. But then what does "curiosity" signal? Not laziness (as suggested in the post), but the opposite, right? Just asking seems the lazier version.

WannabeChthonic
I am looking for the site rule defining the language of LessWrong.com. So far I have only found the description of the community on Wikipedia and nothing official. I must therefore assume that I could also write posts in German.

No one is explicitly giving a link to the rules or stating an answer otherwise, so let me spell out what I gathered from the nonverbal feedback.

I have received -7 agreement within a few minutes of posting this. My assuming "when not defined it must be fair game" was tagged as locally invalid, and I can see how this is true. In other words: English is the de facto language of the LessWrong forum, if not due to policy then at least due to custom.

I like the LW community and the features of ForumMagnum/LW2 they developed a lot, and I didn't even know one can ta... (read more)

Are apples good to eat?  Usually, but some apples are rotten.

Do humans have ten fingers?  Most of us do, but plenty of people have lost a finger and nonetheless qualify as "human".

Unless you descend to a level of description far below any macroscopic object - below societies, below people, below fingers, below tendon and bone, below cells, all the way down to particles and fields where the laws are truly universal - then practically every generalization you use in the real world will be leaky.

(Though there may, of course, be some exceptions to the above rule...)

Mostly, the way you deal with leaky generalizations is that, well, you just have to deal.  If the cookie market almost always closes at 10pm, except on Thanksgiving it closes at 6pm, and today...

Self

Amusing, instructive, and unfortunate that this post's actual meaning got lost in politics. IMO it's one of the better ones.

Am left wondering if "local" here has a technical meaning or is used as a vague pointer.