In his long post on the subject, Charbel-Raphaël argues against theories of impact for interpretability. I think it's largely a good, well-argued post, and if the only thing you get out of this is reading that post, I'll have contributed to improving the discourse. There is other material making similar claims that I think is written with low context, and I should also say that I'm not very versed in the history and the various versions of the debate.
At the same time, I disagree with the take.
In this post I'm going to "go high" and debate strong, general forms of the criticism, rather than the more object-level subcomponents. Generalizing away from specifics, I think Charbel-Raphaël's post has three valid general reasons to...
Beautifully argued, Dmitry. Couldn't agree more.
I would also note that I consider the second problem of interpretability basically the central problem of complex systems theory.
I consider the first problem a special case of the central problem of alignment. It's very closely related to the 'no free lunch' problem.
People with aphantasia typically think that when someone says to "picture X in your mind", they're being entirely metaphorical. If you don't have a mind's eye, that's a super reasonable thing to think, but it turns out that you'd be wrong!
In that spirit, I recently discovered that many expressions about "feelings in your body" are not metaphorical. Sometimes, people literally feel a lump in their throat when they feel sad, or literally feel like their head is hot ("hot-headed") when they're angry.
It seems pretty likely to me that there are other non-metaphors that I currently think are metaphors, and likewise for other people here. So: what are some things that you thought were metaphors, that you later discovered were not metaphors?
I've read that imagination (in the sense of conjuring mental imagery) is a spectrum, and I've encountered a test which some but not all phantasic people fail.
I don't recall the details enough to pose it directly, but I think I do recall enough to reinvent the test:
Test details guessed above ...
This is a linkpost for an article I've written for my blog. Readers of LessWrong may want to skip the intro about Bayesian Reasoning, but might find the application to the Peter Miller vs Rootclaim debate quite interesting.
I’ve been a fan of Bayesian Reasoning since I first read Harry Potter and the Methods of Rationality. In a nutshell, Bayesian Reasoning is a way to believe true things. It is a method for updating one’s beliefs given some evidence, so that one ends up with more credence on beliefs that match the evidence.
While Bayesian Reasoning (Wikipedia) is not the only method to find true conclusions, it’s the method with the best mathematical explanation of why it works. However, the method can be difficult to use in practice.
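To make the update rule concrete, here is a minimal sketch of a single Bayesian update in Python. The numbers are illustrative (a hypothetical diagnostic test), not taken from the post:

```python
def bayes_update(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Posterior P(H|E) via Bayes' rule: P(H|E) = P(E|H) * P(H) / P(E)."""
    # Total probability of the evidence under both hypotheses.
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e

# Illustrative numbers: a 1% prior, a test with 90% sensitivity
# and a 5% false-positive rate.
posterior = bayes_update(prior=0.01, p_e_given_h=0.90, p_e_given_not_h=0.05)
print(round(posterior, 3))  # 0.154
```

Note that even with a fairly accurate test, the posterior stays modest because the prior is low; this "base-rate" effect is exactly the kind of thing the formula makes hard to get wrong.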
One...
As always, some people need practical advice, and we can’t agree on how any of this works and we are all different and our motivations are different, so figuring out the best things to do is difficult. Here are various hopefully useful notes.
GLP-1 drugs are so effective that the American obesity rate is falling.
...John Burn-Murdoch: While we can’t be certain that the new generation of drugs are behind this reversal, it is highly likely. For one, the decline in
This Nature post looks into theories of why GLP-1 drugs seem to help with essentially everything.
There's also Scott's Why Does Ozempic Cure All Diseases? from a while back. The Nature article takes a more straightforward scientific-journalism approach and largely focuses on immediate biological mechanisms, while Scott is Scott.
Given ambiguity about whether GitHub trains models on private repos, I wonder if there's demand for someone to host a public GitLab (or similar) instance that forbids training models on their repos, and takes appropriate countermeasures against training data web scrapers accessing their public content.
Is CoT faithfulness already obsolete? How does it survive concepts like latent-space reasoning, or RL-based manipulations (R1-Zero)? Is it realistic to think that these highly competitive companies will simply not use them, ignoring the compute efficiency?
Our research is centered on empirical research with LLMs. If you are conducting similar research, these tips and tools may help streamline your workflow and increase experiment velocity. We are also releasing two repositories to promote sharing more tooling within the AI safety community.
John Hughes is an independent alignment researcher working with Ethan Perez and was a MATS mentee in the Summer of 2023. In Ethan's previous writeup on research tips, he explains the criteria that strong collaborators often have, and he puts 70% weight on "getting ideas to work quickly." Part of being able to do this is knowing what tools there are at your disposal.
This post, written primarily by John, shares the tools and principles we both use to increase our experimental velocity. Many readers will already...
Threads are managed by the OS, and each thread has overhead at startup and on context switches. Asyncio coroutines are more lightweight since they are managed within the Python runtime (rather than the OS) and share memory within the main thread. This allows you to use tens of thousands of async coroutines, which isn't possible with threads AFAIK. So I recommend asyncio for LLM API calls since, in my experience, I often need to scale up to thousands of concurrent requests. In my opinion, learning asyncio has a very high ROI for empirical research.
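The pattern above can be sketched as follows. This is a minimal, self-contained example, not the authors' actual tooling: `fake_llm_call` is a hypothetical stand-in for a real API request (here simulated with `asyncio.sleep`), and a semaphore caps how many requests are in flight at once:

```python
import asyncio

async def fake_llm_call(prompt: str) -> str:
    # Stand-in for a real LLM API request; the sleep simulates
    # network latency without any external dependency.
    await asyncio.sleep(0.01)
    return f"response to {prompt}"

async def run_batch(prompts, max_concurrent: int = 100):
    # Coroutines are cheap, so we can create one per prompt; the
    # semaphore ensures only max_concurrent actually run at a time.
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(prompt: str) -> str:
        async with sem:
            return await fake_llm_call(prompt)

    return await asyncio.gather(*(bounded(p) for p in prompts))

results = asyncio.run(run_batch([f"prompt {i}" for i in range(1000)]))
print(len(results))  # 1000
```

With 1000 prompts and a concurrency cap of 100, all calls finish in roughly ten round-trip times; the same scale with one OS thread per request would be far more expensive.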
[Thanks to Charlie Steiner, Richard Kennaway, and Said Achmiz for helpful discussion. Extra special thanks to the Long-Term Future Fund for funding research related to this post.]
[Epistemic status: confident]
There's a common pattern in online debates about consciousness. It looks something like this:
One person will try to communicate a belief or idea to someone else, but they cannot get through no matter how hard they try. Here's a made-up example:
"It's obvious that consciousness exists."
-Yes, it sure looks like the brain is doing a lot of non-parallel processing that involves several spatially distributed brain areas at once, so-
"I'm not just talking about the computational process. I mean qualia obviously exist."
-Define qualia.
"You can't define qualia; it's a primitive. But you know what I mean."
-I don't. How could I if you...
[...] Quotations who favor something like IIT [...]
The quotation author in the example I've made up does not favor IIT. In general, I think IIT represents a very small fraction (< 5%, possibly < 1%) of Camp #2. It's the most popular theory, but Camp #2 is extremely heterogeneous in their ideas, so this is not a high bar.
Certainly if you look at philosophers you won't find any connection to IIT since the majority of them lived before IIT was developed.
...Your framing comes across as an attempt to decrement the credibility of people who advocate Quot
Thanks for the reminder! I looked at the rejected posts, and... ouch, it hurts.
LLM-generated content, crackpottery, low-content posts (what could be one sentence is several pages instead).