In his long post on the subject, Charbel-Raphaël argues against theories of impact for interpretability. I think it's largely a good, well-argued post, and if the only thing you get out of this is reading that post, I'll have contributed to improving the discourse. There is other material making similar claims that I think is written with low context, and I should also say that I'm not very versed in the history and the various versions of the debate.
At the same time, I disagree with the take.
In this post I'm going to "go high" and debate strong, general forms of the criticism, rather than the more object-level subcomponents. Generalizing away from specifics, I think Charbel-Raphaël's post has three valid general reasons to...
Beautifully argued, Dmitry. Couldn't agree more.
I would also note that I consider the second problem of interpretability basically the central problem of complex systems theory.
I consider the first problem a special case of the central problem of alignment. It's very closely related to the 'no free lunch' problem.
People with aphantasia typically think that when someone says to "picture X in your mind", they're being entirely metaphorical. If you don't have a mind's eye, that's a super reasonable thing to think, but it turns out that you'd be wrong!
In that spirit, I recently discovered that many expressions about "feelings in your body" are not metaphorical. Sometimes, people literally feel a lump in their throat when they feel sad, or literally feel like their head is hot ("hot-headed") when they're angry.
It seems pretty likely to me that there are other non-metaphors that I currently think are metaphors, and likewise for other people here. So: what are some things that you thought were metaphors, that you later discovered were not metaphors?
I've read that imagination (in the sense of conjuring mental imagery) is a spectrum, and I've encountered a test which some but not all phantasic people fail.
I don't recall the details enough to pose it directly, but I think I do recall enough to reinvent the test:
Test details guessed above ...
This is a linkpost for an article I've written for my blog. Readers of LessWrong may want to skip the intro about Bayesian Reasoning, but might find the application to the Peter Miller vs Rootclaim debate quite interesting.
I’ve been a fan of Bayesian Reasoning since I first read Harry Potter and the Methods of Rationality. In a nutshell, Bayesian Reasoning is a way to believe true things. It is a method for updating one’s beliefs given some evidence, so that one ends up with more credence on beliefs that match the evidence.
While Bayesian Reasoning (Wikipedia) is not the only method to find true conclusions, it’s the method with the best mathematical explanation of why it works. However, the method can be difficult to use in practice.
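To make the update rule concrete, here is a minimal sketch of a single Bayesian update in Python. The numbers are illustrative (a hypothetical diagnostic test), not taken from the post:

```python
def bayes_update(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Posterior P(H|E) via Bayes' rule: P(H|E) = P(E|H) * P(H) / P(E)."""
    # Total probability of the evidence under both hypotheses.
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e

# Illustrative numbers: a 1% prior, a test with 90% sensitivity
# and a 5% false-positive rate.
posterior = bayes_update(prior=0.01, p_e_given_h=0.90, p_e_given_not_h=0.05)
print(round(posterior, 3))  # 0.154
```

Note that even with a fairly accurate test, the posterior stays modest because the prior is low; this "base-rate" effect is exactly the kind of thing the formula makes hard to get wrong.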
One...
As always, some people need practical advice, and we can’t agree on how any of this works and we are all different and our motivations are different, so figuring out the best things to do is difficult. Here are various hopefully useful notes.
GLP-1 drugs are so effective that the American obesity rate is falling.
...John Burn-Murdoch: While we can’t be certain that the new generation of drugs are behind this reversal, it is highly likely. For one, the decline in
This Nature post looks into theories of why GLP-1 drugs seem to help with essentially everything.
There's also Scott's Why Does Ozempic Cure All Diseases? from a while back. The Nature article takes a more straightforward scientific-journalism approach and largely focuses on immediate biological mechanisms, while Scott is Scott.
Given ambiguity about whether GitHub trains models on private repos, I wonder if there's demand for someone to host a public GitLab (or similar) instance that forbids training models on their repos, and takes appropriate countermeasures against training data web scrapers accessing their public content.
Is CoT faithfulness already obsolete? How does it survive concepts like latent-space reasoning, or RL-based manipulations (R1-Zero)? Is it realistic to think that these highly competitive companies will simply not use them, ignoring the compute efficiency?
Our research is centered on empirical research with LLMs. If you are conducting similar research, these tips and tools may help streamline your workflow and increase experiment velocity. We are also releasing two repositories to promote sharing more tooling within the AI safety community.
John Hughes is an independent alignment researcher working with Ethan Perez and was a MATS mentee in the Summer of 2023. In Ethan's previous writeup on research tips, he explains the criteria that strong collaborators often have, and he puts 70% weight on "getting ideas to work quickly." Part of being able to do this is knowing what tools there are at your disposal.
This post, written primarily by John, shares the tools and principles we both use to increase our experimental velocity. Many readers will already...
Threads are managed by the OS, and each thread has overhead at startup and on context switches. Asyncio coroutines are more lightweight since they are managed within the Python runtime (rather than the OS) and share memory within the main thread. This allows you to use tens of thousands of async coroutines, which isn't possible with threads AFAIK. So I recommend asyncio for LLM API calls since, in my experience, I often need to scale up to thousands of concurrent requests. In my opinion, learning asyncio has a very high ROI for empirical research.
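The pattern above can be sketched as follows. This is a minimal, self-contained example, not the authors' actual tooling: `fake_llm_call` is a hypothetical stand-in for a real API request (here simulated with `asyncio.sleep`), and a semaphore caps how many requests are in flight at once:

```python
import asyncio

async def fake_llm_call(prompt: str) -> str:
    # Stand-in for a real LLM API request; the sleep simulates
    # network latency without any external dependency.
    await asyncio.sleep(0.01)
    return f"response to {prompt}"

async def run_batch(prompts, max_concurrent: int = 100):
    # Coroutines are cheap, so we can create one per prompt; the
    # semaphore ensures only max_concurrent actually run at a time.
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(prompt: str) -> str:
        async with sem:
            return await fake_llm_call(prompt)

    return await asyncio.gather(*(bounded(p) for p in prompts))

results = asyncio.run(run_batch([f"prompt {i}" for i in range(1000)]))
print(len(results))  # 1000
```

With 1000 prompts and a concurrency cap of 100, all calls finish in roughly ten round-trip times; the same scale with one OS thread per request would be far more expensive.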
[Thanks to Charlie Steiner, Richard Kennaway, and Said Achmiz for helpful discussion. Extra special thanks to the Long-Term Future Fund for funding research related to this post.]
[Epistemic status: confident]
There's a common pattern in online debates about consciousness. It looks something like this:
One person will try to communicate a belief or idea to someone else, but they cannot get through no matter how hard they try. Here's a made-up example:
"It's obvious that consciousness exists."
-Yes, it sure looks like the brain is doing a lot of non-parallel processing that involves several spatially distributed brain areas at once, so-
"I'm not just talking about the computational process. I mean qualia obviously exist."
-Define qualia.
"You can't define qualia; it's a primitive. But you know what I mean."
-I don't. How could I if you...
[...] Quotations who favor something like IIT [...]
The quotation author in the example I've made up does not favor IIT. In general, I think IIT represents a very small fraction (< 5%, possibly < 1%) of Camp #2. It's the most popular theory, but Camp #2 is extremely heterogeneous in their ideas, so this is not a high bar.
Certainly if you look at philosophers you won't find any connection to IIT since the majority of them lived before IIT was developed.
...Your framing comes across as an attempt to decrement the credibility of people who advocate Quot
Thanks for the reminder! I looked at the rejected posts, and... ouch, it hurts.
LLM-generated content, crackpottery, low-content posts (what could be one sentence is several pages instead).