It might be the case that what people find beautiful and ugly is subjective, but that's not an explanation of *why* people find some things beautiful or ugly. Things, including aesthetics, have causal reasons for being the way they are. You can even ask "what would change my mind about whether this is beautiful or ugly?". Raemon explores this topic in depth.
There’s this popular trope in fiction about a character being mind controlled without losing awareness of what’s happening. Think Jessica Jones, The Manchurian Candidate or Bioshock. The villain uses some magical technology to take control of your brain - but only the part of your brain that’s responsible for motor control. You remain conscious and experience everything with full clarity.
If it’s a children’s story, the villain makes you do embarrassing things like walk through the street naked, or maybe punch yourself in the face. But if it’s an adult story, the villain can do much worse. They can make you betray your values, break your commitments and hurt your loved ones. There are some things you’d rather die than do. But the villain won’t let you stop....
Unfortunately I don't, I've now seen this often enough that it didn't strike me as worth recording, other than posting to the project slack.
But here's as much as I remember, for posterity: I was training a twist function using the Twisted Sequential Monte Carlo framework (https://arxiv.org/abs/2404.17546). I started with a standard, safety-tuned open model, and wanted to train a twist function that would modify the predicted token logits to generate text that is 1) harmful (as judged by a reward model), but also, conditioned on that, 2) as similar to the or...
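Roughly, the core mechanic was the following. This is a from-memory sketch in plain PyTorch/transformers, not the framework's actual API; the model name and the twist head are placeholders, and the particle weights/resampling of real SMC are omitted:

```python
# Minimal sketch of the idea: a learned twist head adjusts the base model's
# next-token logits so sampling is tilted toward sequences the twist function
# predicts will score well under the downstream target (here, a reward model).
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the actual run used a safety-tuned open model
tok = AutoTokenizer.from_pretrained(model_name)
base = AutoModelForCausalLM.from_pretrained(model_name).eval()

# Hypothetical twist head: maps the last hidden state to a per-token logit correction.
# In the real setup this is the part that gets trained (against the reward signal,
# via the SMC objective); here it is just randomly initialized.
twist_head = nn.Linear(base.config.hidden_size, base.config.vocab_size)

@torch.no_grad()
def twisted_next_token(input_ids):
    out = base(input_ids, output_hidden_states=True)
    base_logits = out.logits[:, -1, :]            # base model's next-token logits
    h = out.hidden_states[-1][:, -1, :]           # last-layer hidden state at final position
    twisted_logits = base_logits + twist_head(h)  # tilt the base distribution by the twist
    probs = torch.softmax(twisted_logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)

ids = tok("The experiment began with", return_tensors="pt").input_ids
print(twisted_next_token(ids))
```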
Over the past six months there's been a huge amount of discussion in the Davis Square Facebook group about a proposal to build a 25-story building in Davis Square: retail on the ground floor, 500 units of housing above, 100 of the units affordable. I wrote about this a few weeks ago, weighing the housing benefits against the impact to current businesses (while the Burren, Dragon Pizza, etc. have invitations to return at their current rent, this would still be super disruptive to them even if they did return).
The impact to local businesses is not the only issue people raise, however, and I wanted to get a better overall understanding of how people view it. I went over the thousands of comments on the posts (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16) over the last six months, and categorized the objections I saw. Overall I...
Thankfully, rising land prices due to agglomeration effects are not a thing, and the number of people in town is constant...
Don't get me wrong, building more housing is good, actually. But it will be only a marginal improvement without addressing the systemic issues: land capturing a huge share of economic gains, the rentier economy, and real-estate speculators. These issues are not solvable without a substantial Land Value Tax.
One potential angle: automating software won't be worth very much if multiple players can do it and profits are competed to zero. Look at compilers - almost no one is writing assembly or their own compilers, and yet the compiler writers haven't earned billions or trillions of dollars. With many technologies, the vast majority of value is often consumer surplus never captured by producers.
In general I agree with your point. If evidence showed that transformative AI was close, you'd strategically delay fundraising as long as possible. However, if you have uncertainty...
Scott Alexander famously warned us to Beware Trivial Inconveniences.
When you make a thing easy to do, people often do vastly more of it.
When you put up barriers, even highly solvable ones, people often do vastly less.
Let us take this seriously, and carefully choose what inconveniences to put where.
Let us also take seriously that when AI or other things reduce frictions, or change the relative severity of frictions, various things might break or require adjustment.
This applies to all system design, and especially to legal and regulatory questions.
One case where changing the level of friction drastically changed things was when, in the late 1990s and 2000s, Napster and successive services made spreading copyrighted files much, much easier than it had been. These days you don't need to pirate your music, because you can get almost any recorded song on YouTube whenever you want for free (possibly with an ad) or on Spotify for a cheap subscription fee...
OpenAI reports that o3-mini with high reasoning and a Python tool scores 32% on FrontierMath. However, Epoch's official evaluation[1] scored only 11%.
There are a few reasons to trust Epoch's score over OpenAI's:
Edited in Addendum:
Epoch has this to say in their FAQ:
The difference between our results and OpenAI's might be due to OpenAI evaluating with a more powerful internal scaffold, using more test-time compute, or because those results were run on a different subset of FrontierMath (the 180 problems in frontiermath-2024-11-26 vs the 290 problems in frontiermath-2025-02-28-private).
[1] Which had Python access.
The reason my first thought was that they used more inference is that ARC Prize specifies that's how they got their ARC-AGI score (https://arcprize.org/blog/oai-o3-pub-breakthrough). My read of that graph is that they spent $300k+ on getting their score (there are 100 questions in the semi-private eval, so roughly $3k per question). That was o3 high, not o3-mini high, but this result is pretty strong proof of concept that they're willing to spend a lot on inference for good scores.
In an interview, Erica Komisar discusses parenting extensively. I appreciate anyone who thinks deeply about parenting, so I mostly value Erica's contributions. However, I believe she is mistaken on many points.
One claim she makes follows a broader pattern that I find troubling. To paraphrase:
"Fathers can take on the primary caregiver role, but they would be fighting biology. It goes against what all mammals do."
I see this kind of reasoning frequently: arguments that begin with "From an evolutionary standpoint..." or "For mammals..." But this argument is nonsense. Not only is it incorrect, but I suspect that most people, when pressed, would find it indefensible. It often functions as a stand-in for more rigorous reasoning.
...Disclaimer: Erica makes more nuanced claims to support her perspective. Here, I am only critiquing one
By Roland Pihlakas, Sruthi Kuriakose, Shruti Datta Gupta
Many past AI safety discussions have centered on the dangers of unbounded utility maximisation by RL agents, illustrated by scenarios like the "paperclip maximiser". Unbounded maximisation is problematic for many reasons. We wanted to verify whether these runaway optimisation problems are still relevant with LLMs as well.
It turns out, strangely, that this is indeed clearly the case. The problem is not simply that the LLMs lose context. The problem is that in various scenarios, LLMs lose context in very specific ways that systematically resemble runaway optimisers, in the following distinct respects:
Our findings suggest that long-running scenarios are important. Systematic failures emerge after periods...
Better, thanks!
"CS-ReFT finetunes LLMs at the subspace level, enabling the much smaller Llama-2-7B to surpass GPT-3.5's performance using only 0.0098% of model parameters."
https://x.com/IntologyAI/status/1901697581488738322
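For intuition, here's a rough sketch of what "finetuning at the subspace level" means in ReFT-style methods generally. This is not CS-ReFT's actual code (which doesn't appear to be public); the class name, shapes, and the omitted orthonormality constraint on the basis are all simplifications:

```python
# Illustrative ReFT-style subspace intervention: the base model stays frozen and
# only a tiny edit applied inside a low-rank subspace of the hidden state is learned.
import torch
import torch.nn as nn

class LowRankSubspaceIntervention(nn.Module):
    def __init__(self, hidden_size: int, rank: int):
        super().__init__()
        self.R = nn.Parameter(torch.randn(rank, hidden_size) * 0.01)  # subspace basis (learned)
        self.proj = nn.Linear(hidden_size, rank)                      # learned replacement source

    def forward(self, h):
        # Edit only the component of h that lies in span(R):
        #   h' = h + R^T (W h + b - R h)
        return h + (self.proj(h) - h @ self.R.T) @ self.R

hidden_size, rank = 4096, 4  # Llama-2-7B-ish hidden size; rank is illustrative
intervention = LowRankSubspaceIntervention(hidden_size, rank)
h = torch.randn(2, 10, hidden_size)   # (batch, seq, hidden) hidden states
h_edited = intervention(h)

n_params = sum(p.numel() for p in intervention.parameters())
print(n_params)  # ~33k here; even applied at several layers/positions this stays a
                 # vanishingly small fraction of 7B parameters, which is the kind of
                 # accounting behind headline figures like 0.0098%
```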
The Moravec paradox is the observation that high-level reasoning is relatively simple for computers, while sensorimotor skills that humans find effortless are computationally challenging. This is why AI is superhuman at chess but we have no self-driving cars. Evolutionarily recent capacities such as critical thinking are easier for computers than older ones, because these recent capacities are less efficient in humans, and they are computationally simpler anyway (it is much more straightforward to pick one good decision among many than to move a limb through space).
This is what fast takeoff looks like. The paper is very math-heavy, and the solution is very intelligent. It is...
Thanks!
So, the claim here is that this is a better "artificial AI scientist" compared to what we've seen so far.
There is a tech report https://github.com/IntologyAI/Zochi/blob/main/Zochi_Technical_Report.pdf, but the "AI scientist" itself is not open source, and the tech report does not disclose much (besides confirming that this is a multi-agent thing).
This might end up being a new milestone (but it's too early to conclude that; the comparison is not quite "apples-to-apples", there is human feedback in the process of its work, and humans make edits to the f...
Follow-up: if you would disagree-vote with a react but not karma-downvote, you can use the opposite react.