How does it work to optimize for realistic goals in physical environments of which you yourself are a part? For example, humans and robots acting in the real world, as opposed to humans and AIs playing video games in virtual worlds where the player is not part of the environment. The authors claim we don't actually have a good theoretical understanding of this and explore four specific ways in which we don't understand the process.
For human-level AI (HLAI), we will need robust control or alignment methods. Assuming short timelines to HLAI, the tractability of automating safety research becomes central. In this post, I will make the case that safety-relevant progress on automated interpretability R&D is likely; however, naive interpretability automation may only be usable on the subset of safety problems with well-specified objectives. My argument relies crucially on the possibility of automatically verifying interpretability progress. For other alignment directions (e.g. corrigibility, studying power-seeking, etc.) which do not admit automatic verification, it seems unjustified to assume automation on the same time horizon in the absence of a clear argument for its tractability. I am optimistic that further thinking on automation prospects could identify other automation-tractable areas of alignment and control (e.g. see here...
In METR: Measuring AI Ability to Complete Long Tasks, the authors found a Moore's-law-like trend relating a model's release date to the length of time a human needs to complete the tasks that model can do.
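To make the shape of such a trend concrete, here is a minimal extrapolation sketch; the doubling time, reference date, and starting horizon below are illustrative placeholders, not METR's fitted estimates.

```python
from datetime import date

# Illustrative placeholders (not METR's fitted values):
DOUBLING_MONTHS = 7          # assumed doubling time of the task-length horizon
T0 = date(2025, 3, 1)        # assumed reference date
HORIZON0_MIN = 60.0          # assumed horizon at T0: tasks taking a human ~1 hour

def horizon_minutes(target: date) -> float:
    """Extrapolate the human-task-length horizon (in minutes) to a target date."""
    months = (target.year - T0.year) * 12 + (target.month - T0.month)
    return HORIZON0_MIN * 2 ** (months / DOUBLING_MONTHS)

for year in (2026, 2027, 2028):
    print(year, f"{horizon_minutes(date(year, 3, 1)) / 60:.1f} hours")
```

Under these made-up numbers, the extrapolated horizon grows from roughly 3 hours to roughly 35 hours over three years; the point is only to show how a fixed doubling time compounds, not to reproduce the paper's figures.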
Here is their rationale for plotting this.
...Current frontier AIs are vastly better than humans at text prediction and knowledge tasks. They outperform experts on most exam-style problems for a fraction of the cost. With some task-specific adaptation, they can also serve as useful tools in many applications. And yet the best AI agents are not currently able to carry out substantive projects by themselves or directly substitute for human labor. They are unable to reliably handle even relatively low-skill, computer-based work like remote executive assistance. It is clear that capabilities are increasing very rapidly in
Yes, but it didn't mean that AIs could do all kinds of long tasks in 2005. And that is the conclusion many people seem to draw from the METR paper.
As we use the term, yes. But the point (and I should have made this clearer) is that any mismodeling by the parent of the child's interests and future environment will not be visible to the child, or even to someone reading the thoughts of the well-meaning parent. Many parents want the best for their child but model the child's future wrongly (mostly due to status quo bias; the problem is different for AI).
I think it's easier to discuss AI progress in terms of economic growth rather than just focusing on the scale of the largest training runs and the compute used.
From their X announcement:
We developed GATE: a model that shows how AI scaling and automation will impact growth.
It predicts trillion‐dollar infrastructure investments, 30% annual growth, and full automation in decades.
I am not writing this to give anyone any ideas. However, it seems to me that hackers are going to be able to leverage distributed language models in the near future, and I think that we should be creating proactive countermeasures before the tech is powerful enough to cause real damage. I predict that, now that globally distributed language model training has been proven possible, we will soon see botnet-style distributed inference. If you have enough devices on a home network, it is already trivial to run a very large model with the compute distributed across your phone, iPad, laptop, etc.
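To illustrate what distributed (layer-sharded) inference means here, a minimal sketch, assuming a model whose layer shards live on different devices and whose activations hop from one device to the next; everything below is a toy local simulation, not a real networking stack or any particular library's API.

```python
# Toy sketch of layer-sharded ("pipeline") inference across devices.
# Each "device" is just a local object holding a slice of the layers;
# a real deployment would forward activations over the network.
from dataclasses import dataclass
from typing import Callable, List

Layer = Callable[[List[float]], List[float]]

@dataclass
class DeviceShard:
    name: str
    layers: List[Layer]          # the slice of the model this device holds

    def forward(self, activations: List[float]) -> List[float]:
        for layer in self.layers:
            activations = layer(activations)
        return activations

def run_pipeline(shards: List[DeviceShard], inputs: List[float]) -> List[float]:
    # Activations pass from shard to shard until the last one finishes.
    x = inputs
    for shard in shards:
        x = shard.forward(x)
    return x

# Stand-ins for transformer blocks, purely for illustration.
double: Layer = lambda xs: [2 * v for v in xs]
add_one: Layer = lambda xs: [v + 1 for v in xs]

shards = [
    DeviceShard("phone",  [double]),
    DeviceShard("tablet", [add_one]),
    DeviceShard("laptop", [double, add_one]),
]
print(run_pipeline(shards, [1.0, 2.0]))  # -> [7.0, 11.0]
```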
In my opinion, it is only a matter of time before hackers leverage distributed model inference to create self-propagating Super-Worms. These Super-Worms would function similarly to a...
Regardless of the details, we ought to prioritize disconnecting all of our power plants, water purification stations, and nuclear facilities from the internet.
I think it's very questionable to make major safety policy "regardless of the details". If you want to increase the safety of power plants, listening to the people who are responsible for that safety, and to their analysis of the details, is likely a better step than making these kinds of decisions without understanding them.
No, they didn’t. Not so fast, and not quite my job. But OpenAI is trying. Consider this a marker to look back upon in the future, as a reflection.
Before proceeding, if you haven’t yet, it’s probably worth reading the story itself. I’m going to repost the whole thing, since it was previously on Twitter and I presume OpenAI would want me to quote it.
...Sam Altman: we trained a new model that is good at creative writing (not sure yet how/when it will get released). this is the first time i have been really struck by something written by AI; it got the vibe of metafiction so right.
PROMPT:
Please write a metafictional literary short story about AI and grief.
COMPLETION:
Before we go any further, I
Any non-uniform prior inherently encodes a bias toward simplicity. This isn't an additional assumption we need to make - it falls directly out of the mathematics.
For any hypothesis h, the information content is $I(h) = -\log(P(h))$, which means probability and complexity have an exponential relationship: $P(h) = e^{-I(h)}$
This demonstrates that simpler hypotheses (those with lower information content) are automatically assigned higher probabilities. The exponential relationship creates a strong bias toward simplicity without requiring any special mechanisms.
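As a quick worked example (the numbers are purely illustrative): a hypothesis that takes 2 nats to describe gets $P(h_{\text{simple}}) = e^{-2} \approx 0.135$, while one that takes 10 nats gets $P(h_{\text{complex}}) = e^{-10} \approx 4.5 \times 10^{-5}$; those 8 extra nats of description length cost a factor of $e^{8} \approx 3000$ in prior probability.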
The "simplicity prior" is essentially tautological - more probable things are simple by definition.
This post is a distillation of recent work on AI-assisted human coordination from Google DeepMind.
The paper has received some press attention and, anecdotally, it has become the de facto example people bring up of AI being used to improve group discussions.
Since this work represents a particular perspective/bet on how advanced AI could help improve human coordination, the following explainer is meant to bring anyone curious up to speed. I’ll be referencing both the published paper and the supplementary materials.
The Habermas Machine[1] (HM) is a scaffolded pair of LLMs designed to find consensus among people who disagree, and help them converge to a common point of view. Human participants are asked to give their opinions in response to a binary question (E.g. “Should voting be compulsory?”). Participants give their level of agreement[2], as well...
I'm also working on a deliberation tool with a similar philosophy, but with a stronger emphasis on generating structured output from participants.
I've noticed that discussions can often devolve into arguments, where we fixate on conclusions and pre-existing beliefs, rather than critically examining the underlying methods and prerequisites that shape events or our reasoning. I believe structured self-reflection, like writing an academic paper before engaging in debate, can help. The absence of an audience or judgment during self-reflection encourages partic...