Implications of DeepSeek-R1: Yesterday, DeepSeek released a paper on their o1 alternative, R1. A few implications stood out to me:

* Reasoning is easy. A few weeks ago, I described several hypotheses for how o1 works. R1 suggests the answer might be the simplest possible approach: guess & check. No need for fancy process reward models, no need for MCTS.
* Small models, big think. A distilled 7B-parameter version of R1 beats GPT-4o and Claude 3.5 Sonnet (new) on several hard math benchmarks. There appears to be a large parameter overhang.
* Proliferation by default. There's an implicit assumption in many AI safety/governance proposals that AGI development will be naturally constrained to only a few actors because of compute requirements. Instead, we seem to be headed to a world where:
  * Advanced capabilities can be squeezed into small, efficient models that can run on commodity hardware.
  * Proliferation is not bottlenecked by infrastructure.
  * Regulatory control through hardware restriction becomes much less viable.

For now, training still needs industrial compute. But it's looking increasingly like we won't be able to contain what comes after.
I first encountered this tweet taped to the wall in OpenAI's office where the Superalignment team sat: RIP Superalignment team. Much respect for them.
Brief intro/overview of the technical AGI alignment problem as I see it:

To a first approximation, there are two stable attractor states that an AGI project, and perhaps humanity more generally, can end up in, as weak AGI systems become stronger towards superintelligence, and as more and more of the R&D process – and the datacenter security system, and the strategic advice on which the project depends – is handed over to smarter and smarter AIs.

In the first attractor state, the AIs are aligned to their human principals and becoming more aligned day by day thanks to applying their labor and intelligence to improve their alignment. The humans’ understanding of, and control over, what’s happening is high and getting higher.

In the second attractor state, the humans think they are in the first attractor state, but are mistaken: Instead, the AIs are pretending to be aligned, and are growing in power and subverting the system day by day, even as (and partly because) the human principals are coming to trust them more and more. The humans’ understanding of, and control over, what’s happening is low and getting lower. The humans may eventually realize what’s going on, but only when it’s too late – only when the AIs don’t feel the need to pretend anymore.

(One can imagine alternatives – e.g. the AIs are misaligned but the humans know this and are deploying them anyway, perhaps with control-based safeguards; or maybe the AIs are aligned but have chosen to deceive the humans and/or wrest control from them, but that’s OK because the situation calls for it somehow. But they seem less likely than the above, and also more unstable.)

Which attractor state is more likely, if the relevant events happen around 2027? I don’t know, but here are some considerations:

* In many engineering and scientific domains, it’s common for something to seem like it’ll work when in fact it won’t. A new rocket design usually blows up in the air several times before it succeeds, despite lots of o
jimrandomh
Recently, a lot of very-low-quality cryptocurrency tokens have been seeing enormous "market caps". I think a lot of people are getting confused by that, and are resolving the confusion incorrectly.

If you see a claim that a coin named $JUNK has a market cap of $10B, there are three possibilities. Either: (1) The claim is entirely false, (2) there are far more fools with more money than expected, or (3) the $10B number is real, but doesn't mean what you're meant to think it means.

The first possibility, that the number is simply made up, is pretty easy to cross off; you can check with a third party. Most people settle on the second possibility: that there are surprisingly many fools throwing away their money. The correct answer is option 3: "market cap" is a tricky concept. And, it turns out that fixing the misconception here also resolves several confusions elsewhere.

(This is sort-of vagueblogging a current event, but the same current event has been recurring every week with different names on it for over a year now. So I'm explaining the pattern, and deliberately avoiding mention of any specific memecoin.)

Suppose I autograph a hat, then offer to sell you one-trillionth of that hat for $1. You accept. This hat now has a "market cap" of $1T. Of course, it would be silly (or deceptive) if people then started calling me a trillionaire.

Meme-coins work similarly, but with extra steps. The trick is that while they superficially look like a market of people trading with each other, in reality almost all trades have the coin's creator on one side of the transaction, they control the price, and they optimize the price for generating hype.

Suppose I autograph a hat, call it HatCoin, and start advertising it. Initially there are 1000 HatCoins, and I own all of them. I get 4 people, arriving one at a time, each of whom decides to spend $10 on HatCoin. They might be thinking of it as an investment, or they might be thinking of it as a form of gambling, or they might be u
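A minimal sketch of the arithmetic behind those headlines (my own illustration, not part of the quick take; the $10-per-coin HatCoin trade is a hypothetical number): "market cap" is just the most recent trade price multiplied by the total supply, no matter how few units actually changed hands.

```python
def market_cap(last_trade_price_per_unit: float, total_units: float) -> float:
    """Headline "market cap" implied by the most recent trade."""
    return last_trade_price_per_unit * total_units

# The autographed hat: one-trillionth of it sells for $1, so each of the
# 1e12 notional shares is "worth" $1, giving a $1 trillion market cap.
print(market_cap(1.0, 1e12))   # 1000000000000.0

# HatCoin (hypothetical numbers): 1000 coins exist; if the last trade prices
# a coin at $10, the headline is $10,000 even though only ~$40 changed hands.
print(market_cap(10.0, 1000))  # 10000.0
```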
leogao
don't worry too much about doing things right the first time. if the results are very promising, the cost of having to redo it won't hurt nearly as much as you think it will. but if you put it off because you don't know exactly how to do it right, then you might never get around to it.

Popular Comments

Recent Discussion

tl;dr: If a copy is not identical to the original, MWI predicts that I will always observe myself surviving failed Mars teleportations rather than becoming the copy on Mars. 

Background

The classic teleportation thought-experiment asks whether a perfect copy is "you". This normally presents as a pure decision problem – do you step into the teleporter? But I suggest we can construct real experiments yielding observational evidence about personal identity.

The Quantum Mars Teleporter Protocol

Consider a "teleporter" connecting Earth & Mars with two key properties:

1. It creates a perfect copy on Mars using scanning

2. The original is destroyed with probability *p* = 0.999 (controlled by quantum randomness)

Under different identity theories, this yields divergent predictions:

- If copy ≠ original: Due to quantum immortality, the observer should *always* find themselves as the...

On blanket criticism and refutation

In his long post on the subject, Charbel-Raphaël argues against theories of impact of interpretability. I think it's largely a good, well-argued post, and if the only thing you get out of this is reading that post, I'll be contributing to improving the discourse. There is other material with similar claims that I think is made with low context, and I should also say that I'm not very versed in the history and the various versions of the debate.

At the same time, I disagree with the take.

In this post I'm going to "go high" and debate strong, general forms of the criticism, rather than the more object-level subcomponents. Generalizing away from specifics, I think Charbel-Raphaël's post has three valid general reasons to...

Beautifully argued, Dmitry. Couldn't agree more. 

I would also note that I consider the second problem of interpretability basically the central problem of complex systems theory. 

I consider the first problem a special case of the central problem of alignment. It's very closely related to the 'no free lunch' problem. 

People with aphantasia typically think that when someone says to "picture X in your mind", they're being entirely metaphorical. If you don't have a mind's eye, that's a super reasonable thing to think, but it turns out that you'd be wrong!

In that spirit, I recently discovered that many expressions about "feelings in your body" are not metaphorical. Sometimes, people literally feel a lump in their throat when they feel sad, or literally feel like their head is hot ("hot-headed") when they're angry.

It seems pretty likely to me that there are other non-metaphors that I currently think are metaphors, and likewise for other people here. So: what are some things that you thought were metaphors, that you later discovered were not metaphors?

I've read that imagination (in the sense of conjuring mental imagery) is a spectrum, and I've encountered a test which some but not all phantasic people fail.

 

I don't recall the details enough to pose it directly, but I think I do recall enough to reinvent the test:

  • Ask the subject to visualize a 3x3 grid of letters.
  • Provide the information required to construct the visualization in an unusual order, for example top-to-bottom right-to-left for people not accustomed to that layout.
  • Ask them to read the 3-letter word in each row.

Test details guessed above ...

This is a linkpost for an article I've written for my blog. Readers of LessWrong may want to skip the intro about Bayesian Reasoning, but might find the application to the Peter Miller vs Rootclaim debate quite interesting.


I’ve been a fan of Bayesian Reasoning ever since I read Harry Potter and the Methods of Rationality. In a nutshell, Bayesian Reasoning is a way to believe true things. It is a method to update one’s beliefs given some evidence, so that one ends up with more credence on beliefs that match the evidence.

While Bayesian Reasoning (Wikipedia) is not the only method to find true conclusions, it’s the method with the best mathematical explanation of why it works. However, the method can be difficult to use in practice.
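As a minimal illustration of a single update (my sketch with made-up numbers, not an example from the article): Bayes' rule says P(H|E) = P(E|H)·P(H) / P(E).

```python
def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """P(H | E) via Bayes' rule, given the prior and the two likelihoods."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)  # total probability of E
    return p_e_given_h * prior / p_e

# Made-up numbers: a 10% prior on hypothesis H, and evidence that is four
# times as likely if H is true (0.8) than if it is false (0.2).
print(posterior(prior=0.10, p_e_given_h=0.8, p_e_given_not_h=0.2))  # ~0.31
```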

One...

As always, some people need practical advice, and we can’t agree on how any of this works and we are all different and our motivations are different, so figuring out the best things to do is difficult. Here are various hopefully useful notes.

Table of Contents

  1. Effectiveness of GLP-1 Drugs.
  2. What Passes for Skepticism on GLP-1s.
  3. The Joy of Willpower.
  4. Talking Supply.
  5. Talking Price.
  6. GLP-1 Inhibitors Help Solve All Your Problems.
  7. Dieting the Hard Way.
  8. Nutrients.
  9. Are Vegetables a Scam?
  10. Government Food Labels Are Often Obvious Nonsense.
  11. Sleep.
  12. Find a Way to Enjoy Exercise.
  13. A Note on Alcohol.
  14. Focus Only On What Matters.

Effectiveness of GLP-1 Drugs

GLP-1 drugs are so effective that the American obesity rate is falling.


John Burn-Murdoch: While we can’t be certain that the new generation of drugs are behind this reversal, it is highly likely. For one, the decline in

...

This Nature post looks into theories of why GLP-1 drugs seem to help with essentially everything.

There's also Scott's Why Does Ozempic Cure All Diseases? from a while back. The Nature article takes a more straightforward scientific journalism approach and largely focuses on immediate biological mechanisms, while Scott is Scott.

"I don't find this tasty" is not the same thing as "my body doesn't tell me it's good", and this concept is at the core of many suboptimal fad diets, as well as a common blanket justification for being fat and unhealthy. If you eat Krispy Kremes and pizza exclusively, your body will "tell you it's good". The whole reason people get fat in the first place is that the taste and satiety mechanisms we've evolved in an ancestral context are maladaptive for the modern hypercaloric, hyperpalatable environment. If you eat donuts and burgers, and take a multivitamin to avoid deficiencies, I'd challenge you to crush, chew and savour the multivitamin on your tongue and see what your body has to say about that. By omitting vegetables and fruits, you not only risk vitamin deficiencies, but miss out on the most under-appreciated aspect of whole plant foods: their phytonutrient and antioxidant content. Plants have an enormous array of complex, immensely beneficial and poorly understood compounds which interact with our bodies in ways that invariably prove immensely beneficial.  You can handwave the ubiquitously agreed upon benefits of fruit and vegetable consumption as "reliant on correlational studies", but this is a major handwave indeed, and includes ignoring the strong mechanistic bases to assume this is almost certainly true. Fundamentally, the obesity epidemic appears largely due to a mismatch between the body's evolved hunger and satiety systems and the foods that have been created to wirehead them. Therefore, using "my body's hunger and satiety systems tell me that eating XYZ is good" is very uncompelling.  

Given ambiguity about whether GitHub trains models on private repos, I wonder if there's demand for someone to host a public GitLab (or similar) instance that forbids training models on its repos, and takes appropriate countermeasures against training-data web scrapers accessing its public content.


Is CoT faithfulness already obsolete? How does it survive concepts like latent-space reasoning, or RL-based manipulations (R1-Zero)? Is it realistic to think that these highly competitive companies will simply not use them, ignoring the compute efficiency gains?

Our work is centered on empirical research with LLMs. If you are conducting similar research, these tips and tools may help streamline your workflow and increase experiment velocity. We are also releasing two repositories to promote sharing more tooling within the AI safety community.

John Hughes is an independent alignment researcher working with Ethan Perez and was a MATS mentee in the Summer of 2023. In Ethan's previous writeup on research tips, he explains the criteria that strong collaborators often have, and he puts 70% weight on "getting ideas to work quickly." Part of being able to do this is knowing what tools there are at your disposal.

This post, written primarily by John, shares the tools and principles we both use to increase our experimental velocity. Many readers will already...

Threads are managed by the OS, and each thread has overhead in starting up and switching. Asyncio coroutines are more lightweight since they are managed within the Python runtime (rather than the OS) and share memory within the main thread. This allows you to use tens of thousands of async coroutines, which isn't possible with threads AFAIK. So I recommend asyncio for LLM API calls since often, in my experience, I need to scale up to thousands of concurrent requests. In my opinion, learning about asyncio is very high ROI for empirical research.
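As a concrete sketch (my own illustration, not code from the released repositories; `call_llm` is a hypothetical stand-in for whatever async client you use), this is roughly how asyncio lets one program keep thousands of API calls in flight, with a semaphore capping concurrency:

```python
import asyncio

async def call_llm(prompt: str) -> str:
    # Stand-in for a real async API call (e.g. an async SDK client).
    await asyncio.sleep(0.1)
    return f"response to: {prompt}"

async def run_all(prompts: list[str], max_concurrency: int = 1000) -> list[str]:
    sem = asyncio.Semaphore(max_concurrency)  # cap in-flight requests

    async def worker(prompt: str) -> str:
        async with sem:
            return await call_llm(prompt)

    # All coroutines are scheduled on a single OS thread, so tens of
    # thousands of concurrent tasks are cheap compared to spawning threads.
    return await asyncio.gather(*(worker(p) for p in prompts))

if __name__ == "__main__":
    results = asyncio.run(run_all([f"prompt {i}" for i in range(10_000)]))
    print(len(results))  # 10000
```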

Isaac Dunn
I recently switched from using threads to using asyncio, even though I had never used asyncio before. It was a combination of:

* Me using cheaper "batch" LLM API calls, which can take hours to return a result
* Therefore wanting to run many thousands of tasks in parallel from within one program (to make up for the slow sequential speed of each task)
* But at some point, the thread pool raised a generic "can't start a new thread" exception, without giving too much more information. It must have hit a limit somewhere (memory? hardcoded thread limit?), although I couldn't work out where.

Maybe the general point is that threads have more overhead, and if you're doing many thousands of things in parallel, asyncio can handle it more reliably.

[Thanks to Charlie Steiner, Richard Kennaway, and Said Achmiz for helpful discussion. Extra special thanks to the Long-Term Future Fund for funding research related to this post.]

[Epistemic status: confident]

There's a common pattern in online debates about consciousness. It looks something like this:

One person will try to communicate a belief or idea to someone else, but they cannot get through no matter how hard they try. Here's a made-up example:


"It's obvious that consciousness exists."

-Yes, it sure looks like the brain is doing a lot of non-parallel processing that involves several spatially distributed brain areas at once, so-

"I'm not just talking about the computational process. I mean qualia obviously exist."

-Define qualia.

"You can't define qualia; it's a primitive. But you know what I mean."

-I don't. How could I if you...

quila
Has there ever been a case of someone being in camp #1, but eventually realizing, according to their self-report, "I actually do have qualia; it was right in front of me this whole time, but I've only now noticed it as a separate thing needing explanation, even though in retrospect it seems obvious to me"? (I don't mean someone who somehow convinces themself into camp #2 semantically, but still doesn't truly recognize the referent and maybe feels confused about the topic with people writing complex-sounding things in favor of either side, which I'd guess happens rarely.)

Such cases (even one) would help me (of camp #2) get evidence about two hypotheses I've seen: (A) illusionists[1] for some reason experience qualia but don't recognize it as a thing needing explaining, instead having some sort of deep unquestioned background assumption that it is 'just how reality is' to a point of not being noticed, and (B) illusionists actually do not experience qualia, which Carl Feynman's first comment (from a camp #1 view) proposed as an explanation for themself.

[1] one who believes that those in camp #2 merely have a strong intuition that they have <mysterious-word>, like the intuition people apparently have that they "have free will", i.e. that those of camp #2 are confused about something that should be dissolved by basic reductionism. (I hope my ability to pass your ITT is some evidence that this is not what's happening)
Lorec
Boldface vs Quotation - i.e., the interrogating qualia skeptic, vs the defender of qualia as a gnostic absolute - is a common pattern. However, your Camp #1 and Camp #2 - Boldfaces who favor something like Global Workspace Theory, and Quotations who favor something like IIT - don't exist, in my experience. Antonio Damasio is a Boldface who favors a body-mind explanation for consciousness which is quasi-neurological, but which is much closer to IIT than Global Workspace Theory in its level of description. Descartes was a Quotation who, unlike Chalmers, didn't feel like his qualia, once experienced, left anything to be explained.

Most philosophers of consciousness who have ever written, in fact, have nothing to do with your proposed Camp #1 vs Camp #2 division, even though most can be placed somewhere along the Boldface-Quotation spectrum, because whether your intuition is more Boldface vs more Quotation is orthogonal to how confused you are about consciousness and how you go about investigating what confuses you. Your framing comes across as an attempt to decrement the credibility of people who advocate Quotation-type intuition by associating them with IIT, framing Boldfaces [your "Camp #1"] as the "sensible people" by comparison.

There's been discussion in these comments - more and less serious - about whether it's plausible Boldface and Quotation are talking past each other on account of neurological differences. I think this is quite plausible even before we get into the question of whether Boldfaces could truly lack the qualia Quotations are talking about, or not. In the chapter of Consciousness Explained titled "Multiple Drafts Versus the Cartesian Theater", Dennett sets out to solve the puzzle of how the brain reconstructs conscious experiences of the same distant object to be simultaneous across sensory modes despite the fact that light travels faster [and is processed slower in the brain] than sound. He considers reconstruction of simultaneity across c

[...] Quotations who favor something like IIT [...]

The quotation author in the example I've made up does not favor IIT. In general, I think IIT represents a very small fraction (< 5%, possibly < 1%) of Camp #2. It's the most popular theory, but Camp #2 is extremely heterogeneous in their ideas, so this is not a high bar.

Certainly if you look at philosophers you won't find any connection to IIT since the majority of them lived before IIT was developed.

Your framing comes across as an attempt to decrement the credibility of people who advocate Quot

...
Rafael Harth
Thanks for this description. I'm interested in the phenomenology of red-green colorblind people, but I don't think I completely get how it works yet for you. Questions I have:

* Do red and green, when you recognize them correctly, seem like subjectively very different colors?
* If the answer is yes, if you're shown one of the colors without context (e.g., in a lab setting), does it look red or green? (If the answer is no, I suppose this question doesn't make sense.)
* If you see two colors next to each other, then (if I understood you correctly), you can tell whether they're (1) one green, one red or (2) the same color twice. How can you tell?