I basically agree with this on why we can't assume that AIs which are mostly unaligned with human values, but have a shard of human values, will be nice to us at all: the cost of niceness is much higher than the cost of just killing a lot of humans and leaving the rest on-planet to die in a future existential catastrophe.
I wouldn't say that we would die by Occam's Razor, but rather that we would die from AIs needing to aggressively save compute.
I think the axiom they'd use to prove something like this mathematically probably depends on assuming scale separation, such that you can discover laws that, while not fully accurate, are much better than random chance and cheap to compute, which frees up compute to learn new laws, until you hit the limiting point of a Theory of Everything.
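To make the "not fully accurate, but much better than chance and cheap to compute" point concrete, here is a minimal Python sketch of my own (a toy illustration under my own assumptions, not something from the argument I'm responding to): a one-line coarse-grained diffusion law predicts the aggregate behavior of thousands of random walkers about as well as brute-force microsimulation, at a vanishing fraction of the compute.

```python
import math
import random
import time

# Toy illustration of scale separation: the micro-level dynamics are
# expensive to simulate step by step, but a coarse-grained law (for an
# unbiased random walk, mean squared displacement grows linearly with
# time, so RMS displacement = sqrt(T)) is nearly free to evaluate and
# still predicts the aggregate behavior far better than chance.

N, T = 5_000, 400  # number of walkers, number of +/-1 steps per walker

# Expensive micro-level approach: simulate every step of every walker.
start = time.perf_counter()
mean_sq = sum(
    sum(random.choice((-1, 1)) for _ in range(T)) ** 2 for _ in range(N)
) / N
micro_rms = math.sqrt(mean_sq)
micro_time = time.perf_counter() - start

# Cheap coarse-grained law: RMS displacement after T steps is sqrt(T).
start = time.perf_counter()
coarse_rms = math.sqrt(T)
coarse_time = time.perf_counter() - start

print(f"micro simulation RMS ~ {micro_rms:.2f}  ({micro_time:.3f}s)")
print(f"coarse-grained RMS   = {coarse_rms:.2f}  ({coarse_time:.6f}s)")
```

The point is just that regularities like this, which hold across scales without tracking every micro-detail, are what would let an AI save compute aggressively while still beating random guessing.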
In retrospect, I was a bit too optimistic about this working out, and a big part of why is that I didn't truly grasp how deep value conflicts can be even among humans. I'm now much more skeptical of multi-alignment schemes working, because I believe a lot of human alignment exists broadly because people are powerless relative to the state; once AIs are good enough to create their own nation-states, acting on value conflicts becomes much more practical, and the basis for a lot of cooperative behavior collapses.
This also means that the level of alignment AIs need is closer to that of fictional benevolent angels than to how humans relate to other humans, which motivates a more ambitious version of the alignment objective than merely making AIs not break the law or steal from humans.
I'm actually reasonably hopeful that the more ambitious versions of alignment are possible, and I think there's a realistic chance we can actually achieve them.
Yeah, I think this is actually a problem I see here, though admittedly I often see the hypotheses being vaguely formulated, and I kind of agree with Jotto999 that verbal forecasts give far too much leeway here:
I like Eli Tyre's comment here:
My own take on sentience/consciousness is somewhat different from yours: in the general case, I think Anil Seth's theory is better than Global Workspace Theory, though Global Workspace Theory does a better job of explaining the weird properties of consciousness in humans.
For more, read this review:
https://www.lesswrong.com/posts/FQhtpHFiPacG3KrvD/seth-explains-consciousness#7ncCBPLcCwpRYdXuG
On this:
On that note... I'll abstain from strong statements on whether various animals actually have self-models complex enough to be morally relevant. I suspect, however, that almost no-one's planning algorithms are advanced enough to make good use of qualia — and evolution would not grant them senses they can't use. In particular, this capability implies high trust placed by evolution in the planner-part: that sometimes it may know better than the built-in instincts, and should have the ability to plan around them.
But I'm pushing back against this sort of argument. As I've described, a mind in pain does not necessarily experience that pain. The capacity to have qualia of pain corresponds to a specific mental process where the effect of pain on the agent is picked up by a specialized "sensory apparatus" and re-fed as input to the planning module within that agent. This, on a very concrete level, is what having internal experience means. Just track the information flows!
And it's entirely possible for a mind to simply lack that sensory apparatus.
I think there doesn't need to be high trust placed in the planner part, because self-modeling is basically always useful if you want to stay alive (per John Wentworth's gooder regulator theorem), and self-modeling/reasoning is more of a continuum than something sharply discrete.
On why the hard problem seems hard:
I'm not sure, but I suspect that it's a short-hand for "has inherent moral relevance". It's not tied to "is self-aware" because evolution wanted us to be able to dehumanize criminals and members of competing tribes: view them as beasts, soulless barbarians. So the concept is decoupled from its definition, which means we can imagine incoherent states where things that have what we define as "qualia" don't have qualia, and vice versa.
I think this is a likely answer, and that's despite the post below being methodologically flawed for the reasons Paradiddle and Sunwillrise give:
https://www.lesswrong.com/posts/KpD2fJa6zo8o2MBxg/consciousness-as-a-conflationary-alliance-term-for
Another reason the hard problem seems hard is that far too many philosophers are disinclined to gather any data on the phenomenon of interest at all, because they don't have backgrounds in neuroscience, and instead want to define consciousness purely without reference to any empirical reality, which is a very bad approach to learning. And when they do gather data, it's usually self-report data, which is unfortunately pretty terrible as data, and they don't realize this.
I have a somewhat different takeaway for ethics.
In contrast to this:
4. Moral Implications. If you accept the above, then the whole "qualia" debacle is just a massive red herring caused by the idiosyncrasies of our mental architecture. What does that imply for ethics?
Well, that's simple: we just have to re-connect the free-floating "qualia" concept with the definition of qualia. We value things that have first-person experiences similar to ours. Hence, we have to isolate the algorithms that allow things to have first-person experiences like ours, then assign things like that moral relevance, and dismiss the moral relevance of everything else.
And with there not being some additional "magical fluid" that can confer moral relevance to a bundle of matter, we can rest assured there won't be any shocking twists where puddles turn out to have been important this entire time.
I instead argue that we need to decouple what we value (which can be arbitrary) from which things actually have qualia (which, in general, has a single answer). I absolutely disagree with the claim that we have to dismiss the moral relevance of everything else, and I also disagree with the claim that we have to assign moral relevance to the algorithms that allow things to have first-person experiences like ours.
IMO, the I/O part is not about the lack of such a channel, but rather about the lack of a channel that is invulnerable to hacking/modification, such that its signals can be assumed to come only from a certain source.
You could always create such a channel; the impossibility isn't fundamental. Rather, the point is that you can't create a channel that is immune to being modified/hacked, and therefore can't create one whose signals can be assumed to come only from a certain source.
I like dxu's comment:
https://www.lesswrong.com/s/Rm6oQRJJmhGCcLvxh/p/zcPLNNw4wgBX5k8kQ#uFdZuNY3XxBBakLv7
Re sorites paradoxes, this is why you use a parameter/variable and talk quantitatively rather than relying on predicates, and why it's useful to focus on the rate at which something is changing.
I talk about how such an approach can handle sorites paradoxes in intelligence.
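As a toy sketch of what I mean by trading a predicate for a quantitative parameter (my own illustrative example, not taken from anything linked here), compare a boolean "is this a heap?" predicate with a graded measure of heap-ness that just tracks the grain count:

```python
# A minimal sketch of replacing a vague predicate with a quantitative
# parameter: instead of asking the boolean "is this a heap?", track the
# grain count and how much each change moves a graded measure.

def is_heap_predicate(grains: int) -> bool:
    # The predicate forces an arbitrary cutoff, which is exactly where the
    # sorites paradox lives: nothing distinguishes 9,999 from 10,000 grains.
    return grains >= 10_000

def heapness(grains: int, scale: int = 10_000) -> float:
    # A graded, quantitative measure in [0, 1): removing one grain never
    # flips the answer, it just nudges the value by a small amount.
    return grains / (grains + scale)

pile = 10_000
for removed in (0, 1, 5_000, 9_999):
    g = pile - removed
    print(f"{g:>6} grains | predicate: {is_heap_predicate(g)} | heapness: {heapness(g):.4f}")
```

The same move works for "intelligent", "conscious", and so on: quantify the capability and its rate of change rather than arguing over where the predicate's cutoff sits.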
Jacob Falkovich talks about how people have different minds, how different minds can have differing experiences of what qualia are like for them, and thus how we need to be careful in generalizing from our own minds:
For one, the decoupling of conscious experience from deterministic external causes implies that there’s truly no such thing as a “universal experience”. Our experiences are shared by virtue of being born with similar brains wired to similar senses and observing a similar world of things and people, but each of us infers a generative model all of our own. For every single perception mentioned in Being You it also notes the condition of having a different one, from color blindness to somatoparaphrenia — the experience that one of your limbs belongs to someone else. The typical mind fallacy goes much deeper than mere differences in politics or abstract beliefs.
My own take on what consciousness is in the general case is basically given in the review below; in short, I think Anil Seth got it close to right, with the mistakes being broadly patchable rather than fatal flaws in the theory:
More here:
https://www.lesswrong.com/posts/FQhtpHFiPacG3KrvD/seth-explains-consciousness#7ncCBPLcCwpRYdXuG
Or in quote form:
For as long as there have been philosophers, they loved philosophizing about what life really is. Plato focused on nutrition and reproduction as the core features of living organisms. Aristotle claimed that it was ultimately about resisting perturbations. In the East the focus was less on function and more on essence: the Chinese posited ethereal fractions of qi as the animating force, similar to the Sanskrit prana or the Hebrew neshama. This lively debate kept rolling for 2,500 years — élan vital is a 20th century coinage — accompanied by the sense of an enduring mystery, a fundamental inscrutability about life that will not yield.
And then, suddenly, this debate dissipated. This wasn’t caused by a philosophical breakthrough, by some clever argument or incisive definition that satisfied all sides and deflected all counters. It was the slow accumulation of biological science that broke “Life” down into digestible components, from the biochemistry of living bodies to the thermodynamics of metabolism to genetics. People may still quibble about how to classify a virus that possesses some but not all of life’s properties, but these semantic arguments aren’t the main concern of biologists. Even among the general public who can’t tell a phospholipid from a possum there’s no longer a sense that there’s some impenetrable mystery regarding how life can arise from mere matter.
Or this comment, which comes from a similar place:
https://www.lesswrong.com/posts/FQhtpHFiPacG3KrvD/seth-explains-consciousness#oghhLpFNsvvN8FHpk
I'm sympathetic to Global Workspace Theory as an explanation of some of the weird properties of human consciousness, like its approximate unity, though this is also explainable by latencies within a human body being acceptably low.
https://www.lesswrong.com/s/ZbmRyDN8TCpBTZSip/p/x4n4jcoDP7xh5LWLq
https://www.lesswrong.com/posts/FEDNY4DLMpRSE3Jsr/neural-basis-for-global-workspace-theory
But that's my take on the debate on consciousness.
Basically, if you want consciousness to matter morally/intrinsically, then you will prefer theories that match your values about what counts as intrinsically valuable, irrespective of the truth of the theory. In particular, it should be far more surprising than people treat it that the correct theory of consciousness just so happens to match what you find intrinsically valuable, or at least matches it far better than random chance would predict, because I believe that what you value/view as moral is inherently relative and doesn't really bear on the scientific problem of consciousness.
I think this is part of the reason people don't much like reductive conceptions of consciousness, where consciousness is created by parts like neurons/atoms/quantum fields that people usually don't value in themselves: they believe consciousness should arise from parts/units they find morally valuable. It's also part of the reason people dislike theories implying that consciousness extends beyond the species they value intrinsically, which for most people is just us.
I think every side here has a problem: arguments for the moral worth of a species are often conditioned on that species being conscious and able to suffer, and people don't want to admit that it's totally fine to be okay with someone suffering even if they are conscious, and that it's totally fine to value something like a rock, or all rocks, even though rocks aren't conscious and don't suffer.
Another way to say it is that even if a theory suggests that something you don't value intrinsically is conscious, you don't have to change your values very much, and you can still go about your day mostly fine.
I think a lot of people (though not you) unintentionally conflate moral value with the scientific question of what consciousness is, because the term is so value-loaded.
Links to long comments that I want to pin, but which are too long to be pinned:
https://www.lesswrong.com/posts/Zzar6BWML555xSt6Z/?commentId=aDuYa3DL48TTLPsdJ
https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/?commentId=Gcigdmuje4EacwirD