mjg59
"Why does ACPI exist" - the greatest thread in the history of forums, locked by a moderator after 12,239 pages of heated debate, wait no let me start again.
Why does ACPI exist? In the beforetimes power management on x86 was done by jumping to an opaque BIOS entry point and hoping it would do the right thing. It frequently didn't. We called this Advanced Power Management (Advanced because before this power management involved custom drivers for every machine and everyone agreed that this was a bad idea), and it involved the firmware having to save and restore the state of every piece of hardware in the system. This meant that assumptions about hardware configuration were baked into the firmware - failed to program your graphics card exactly the way the BIOS expected? Hurrah! It's only saved and restored a subset of the state that you configured and now potential data corruption for you. The developers of ACPI made the reasonable decision that, well, maybe since the OS was the one setting state in the first place, the OS should restore it.
So far so good. But some state is fundamentally device specific, at a level that the OS generally ignores. How should this state be managed? One way to do that would be to have the OS know about the device specific details. Unfortunately that means you can't ship the computer without having OS support for it, which means having OS support for every device (exactly what we'd got away from with APM). This, uh, was not an option the PC industry seriously considered. The alternative is that you ship something that abstracts the details of the specific hardware and makes that abstraction available to the OS. This is what ACPI does, and it's also what things like Device Tree do. Both provide static information about how the platform is configured, which can then be consumed by the OS and avoid needing device-specific drivers or configuration to be built-in.
The main distinction between Device Tree and ACPI is that Device Tree is purely a description of the hardware that exists, and so still requires the OS to know what's possible - if you add a new type of power controller, for instance, you need to add a driver for that to the OS before you can express that via Device Tree. ACPI decided to include an interpreted language to allow vendors to expose functionality to the OS without the OS needing to know about the underlying hardware. So, for instance, ACPI allows you to associate a device with a function to power down that device. That function may, when executed, trigger a bunch of register accesses to a piece of hardware otherwise not exposed to the OS, and that hardware may then cut the power rail to the device to power it down entirely. And that can be done without the OS having to know anything about the control hardware.
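The distinction can be sketched with a toy model. This is illustrative Python standing in for real DT and AML syntax; every name and address below is invented:

```python
# Toy model of the Device Tree vs ACPI distinction. Illustrative only;
# all identifiers and register addresses are made up.

# Device Tree style: a purely static description. The OS must already ship
# a driver that recognises "vendor,foo-power-ctrl" before it can act on this.
dt_node = {"compatible": "vendor,foo-power-ctrl", "reg": 0x4000_0000}

# ACPI style: the firmware ships a *method* (a recipe of register writes,
# standing in for AML here) that the OS interprets without understanding
# what the underlying control hardware actually is.
power_off_method = [
    ("write", 0x4000_0000, 0x1),  # assert the power-gate control bit
    ("write", 0x4000_0004, 0x0),  # drop the power rail
]

def run_method(method, io_space):
    """A minimal 'interpreter': execute the vendor's recipe blindly."""
    for op, addr, value in method:
        if op == "write":
            io_space[addr] = value
    return io_space

io_space = run_method(power_off_method, {})
```

The point of the sketch: the DT node is useless without an OS driver that already knows the controller, while the interpreted method carries the knowledge with the platform.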
How is this better than just calling into the firmware to do it? Because the fact that ACPI declares that it's going to access these registers means the OS can figure out that it shouldn't, because it might otherwise collide with what the firmware is doing. With APM we had no visibility into that - if the OS tried to touch the hardware at the same time APM did, boom, almost impossible to debug failures. (This is why various hardware monitoring drivers refuse to load by default on Linux - the firmware declares that it's going to touch those registers itself, so Linux decides not to in order to avoid race conditions and potential hardware damage. In many cases the firmware offers a collaborative interface to obtain the same data, and a driver can be written to get that. This bug comment discusses this for a specific board.)
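The conflict-avoidance idea can be shown in a few lines. This is a sketch, not how Linux actually represents ACPI OperationRegions, and the port range is invented:

```python
# Sketch of why declared register usage helps: if the firmware's tables list
# the ranges it will touch, the OS can decline to bind a driver that would
# race with it. The address range below is hypothetical.

firmware_owned = [(0x0295, 0x0296)]  # e.g. index/data ports of a monitoring chip

def overlaps(a, b):
    """Inclusive range overlap test on (start, end) tuples."""
    return a[0] <= b[1] and b[0] <= a[1]

def driver_may_claim(region, declared=firmware_owned):
    """Return True only if the driver's range avoids all firmware-owned ranges."""
    return not any(overlaps(region, owned) for owned in declared)
```

A driver wanting the firmware's own ports is refused; a disjoint range binds normally. With APM there was no `firmware_owned` list to consult at all.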
Unfortunately ACPI doesn't entirely remove opaque firmware from the equation - ACPI methods can still trigger System Management Mode, which is basically a fancy way to say "Your computer stops running your OS, does something else for a while, and you have no idea what". This has all the same issues that APM did, in that if the hardware isn't in exactly the state the firmware expects, bad things can happen. While historically there were a bunch of ACPI-related issues because the spec didn't define every single possible scenario and also there was no conformance suite (eg, should the interpreter be multi-threaded? Not defined by spec, but influences whether a specific implementation will work or not!), these days overall compatibility is pretty solid and the vast majority of systems work just fine - but we do still have some issues that are largely associated with System Management Mode.
One example is a recent Lenovo one, where the firmware appears to try to poke the NVME drive on resume. There's some indication that this is intended to deal with transparently unlocking self-encrypting drives on resume, but it seems to do so without taking IOMMU configuration into account and so things explode. It's kind of understandable why a vendor would implement something like this, but it's also kind of understandable that doing so without OS cooperation may end badly.
This isn't something that ACPI enabled - in the absence of ACPI firmware vendors would just be doing this unilaterally with even less OS involvement and we'd probably have even more of these issues. Ideally we'd "simply" have hardware that didn't support transitioning back to opaque code, but we don't (ARM has basically the same issue with TrustZone). In the absence of the ideal world, by and large ACPI has been a net improvement in Linux compatibility on x86 systems. It certainly didn't remove the "Everything is Windows" mentality that many vendors have, but it meant we largely only needed to ensure that Linux behaved the same way as Windows in a finite number of ways (ie, the behaviour of the ACPI interpreter) rather than in every single hardware driver, and so the chances that a new machine will work out of the box are much greater than they were in the pre-ACPI period.
There's an alternative universe where we decided to teach the kernel about every piece of hardware it should run on. Fortunately (or, well, unfortunately) we've seen that in the ARM world. Most device-specific code simply never reaches mainline, and most users are stuck running ancient kernels as a result. Imagine every x86 device vendor shipping their own kernel optimised for their hardware, and now imagine how well that works out given the quality of their firmware. Does that really seem better to you?
It's understandable why ACPI has a poor reputation. But it's also hard to figure out what would work better in the real world. We could have built something similar on top of Open Firmware instead but the distinction wouldn't be terribly meaningful - we'd just have Forth instead of the ACPI bytecode language. Longing for a non-ACPI world without presenting something that's better and actually stands a reasonable chance of adoption doesn't make the world a better place.
no subject
Date: 2023-11-01 11:07 am (UTC)
Is there a missing word here?
Thank you for this. It was very educational -- speaking as someone who battled APM for a living back in the day.
no subject
Date: 2023-11-02 01:23 pm (UTC)
And likewise grateful for a pretty interesting overview of _why_ ACPI!
no subject
Date: 2023-11-01 01:16 pm (UTC)
*just*? We had that, on the OLPC XO-1, and it was a breath of fresh air. Meaningful self-tests of all the peripherals, an interactive REPL in the pre-boot environment, it was like living in the future.
"actually stands a reasonable chance of adoption" kills any innovation where it stands, because the only thing that stands a reasonable chance of adoption is whatever Microsoft mandates.
no subject
Date: 2023-11-01 05:38 pm (UTC)
Meanwhile, I've used other devices with OpenFirmware, where the "meaningful self-test" is defined by the vendor as a no-op. This is a quality of implementation issue, and the XO-1 had a very high standard there; other OpenFirmware systems do not.
no subject
Date: 2023-11-01 07:20 pm (UTC)
no subject
Date: 2023-11-04 01:51 pm (UTC)
no subject
Date: 2023-11-01 04:08 pm (UTC)
With regard to figuring out what would work better in the real world, there is an emerging need to abstract the hardware information on a number of the devices from the OS.
For example, in web fingerprinting, a lot of code targets hardware identifiers to build a bridge for later identification. If the hardware identifiers are the same, or are associated with other device identifiers, it's likely the same person.
There will always be some unique artifacts that can be collected, but black-boxing hardware from the OS seems an admirable goal to reduce the attack surface for these types of attacks. It's already done to a much lesser degree in blade servers that use a storage subsystem that must come up before the system can boot. We really need something like this; also, ACPI in modern hardware is nearly always the culprit with compatibility issues in Linux. The issue often isn't the hardware, it's the lack of documentation of the hardware made available, but which may be made available to Microsoft (through their OEM Certification Program).
The one thing I find disappointing regarding your post is the final statement. Who decides what's better? The producers think the way things are now is much better, otherwise they wouldn't do it. As a user it's clearly not better, so this reasoning is flawed, because "better" always presupposes the question of better for whom? It's opaque without any real answer and is completely subjective; this is what spawns flame wars, and is fallacious in any rational context. I'd think sticking to a rational context would provide the most constructive benefit in a discussion; opinions without credibility are often meaningless and devoid of value.
Also, on a side note: that captcha to post is ridiculous and will block nearly all human posts. It took me a solid 10 minutes to figure it out because there is no context, and that's with quite a lot of knowledge about how computers actually function. I doubt any regular person would be able to get it except by accident.
2 from 14, 35, 26, 24, 28. It's assumed to be something simple a human could do, so processing math asking for a number 2 from fourteen (12,16), (33,37),(24,28),(22,26),(26,30), since this is the most common use. Though as you can see, there are at least 5 n_2-tuples that, when permuted, could give any number of actual solutions; there's no operator, and no determinism or pattern emerges for inference, so you are basically asking people to guess at whatever the creator thought was the right solution (mindread), and I'm sure it locks people out after several failures. This is a very poor implementation that doesn't do what it's designed to do, and you should have a serious discussion with dreamwidth; very few people actually see that in the form of how a digit sum is put together, but that wasn't the answer apparently. It's very obtuse, and I can't be bothered to email them.
no subject
Date: 2023-11-01 10:20 pm (UTC)
OT - captchas
Date: 2023-11-01 11:34 pm (UTC)
(I've nothing to really add to the thread because I agree with it. ACPI is the worst solution we have, except for all the others.)
no subject
Date: 2023-11-01 09:11 pm (UTC)
To maximally make people angry, I've been suggesting the use of UEFI EBC in DTs as a way to poorly approximate ACPI semantics within DT.
And frankly, I think that's the only option forward for "embedded-brained" people, who have a tight grasp on ARM in the kernel.
no subject
Date: 2023-11-01 09:14 pm (UTC)
no subject
Date: 2023-11-03 06:23 am (UTC)
Now... figure out how to do the equivalent with eBPF, and anyone complaining will be shut down by the eBPF world domination squad.
It might not be an awful idea for Android, et al, to move some of the driver details to eBPF. However, I'm sure that slope is plenty slippery and early good ideas turn into enormous messes over time.
(You thought board files were bad? Just wait for the 50MB vendor eBPF blobs!)
no subject
Date: 2023-11-02 12:52 am (UTC)
no subject
Date: 2023-11-02 01:25 pm (UTC)
no subject
Date: 2023-11-02 08:08 am (UTC)
When it worked, e.g. on laptops with a known fixed set of hardware, it worked reasonably well. The problems started when the firmware's S2RAM routine called back into software for a suspend inhibition check and Win95 decided not to post a semaphore or something, so S2R would never complete. Fond memories ;^)
no subject
Date: 2023-11-02 08:16 am (UTC)
no subject
Date: 2023-11-02 01:57 pm (UTC)
Link please? Never heard of this, and I've been looking for something to read next year.
no subject
Date: 2023-11-03 06:33 am (UTC)
OTOH, I think we ended up in a decent place on ARM, all in all. The world didn't end, and things didn't get nearly as messy as the initial patches from APM (the company) were making us fear. Of course, it helped that the number of vendors remaining is small.
We could have gotten there with much less energy spent on debates if certain people didn't work so hard on antagonizing the existing developer and maintainer base, but we recovered it in the end. Nowadays they claim full credit for something that they so nearly tanked, and others saved, but life's too short to argue over it.
no subject
Date: 2023-11-09 10:03 am (UTC)
It's a reference to a Dril tweet:
https://twitter.com/dril/status/107911000199671808?lang=en
no subject
Date: 2023-11-02 04:05 pm (UTC)
This sounds like an argument for a common, open definition of a bootable system architecture that multiple vendors can implement compatibility with -- roughly the same way that the IBM-compatible PC created a baseline for competition forty-or-so years ago.
In other words: a newly-developed, compatible platform created by a vendor might not be fully utilized by an existing operating system that was developed without any knowledge of that particular platform, but nonetheless the operating system should be installable, bootable and usable on the platform given that it complies with the standard.
It took me some research to find out whether there's a standard available or in-progress for ARM-based systems that aims towards that, and whether it requires ACPI and/or DeviceTree.
Is the ARM Base System Architecture[1] that standard? (it seems to provide the option for platforms to use ACPI or alternatively for them to use DeviceTree)
(it's not a rhetorical question: I genuinely don't know)
[1] - https://github.com/ARM-software/bsa-acs/blob/2d08c94b4ab9128aeb987a57bc0461271d94460c/README.md
no subject
Date: 2023-11-03 11:24 am (UTC)
I'm going to attempt to answer my own question here, by reading and quoting from the current 1.0C version of the ARM Base System Architecture specification[1].
Quoting from page 13 of 98, section number two, entitled "Introduction":
So I think that the specification is intended to achieve this.
My understanding is that the ARM Base System Architecture is the abstract, top-level compatibility specification, and that within it there are more-granular boot specifications, including EBBR (E for embedded, and does not require ACPI for compliance) and SBBR (S for server, and does require ACPI for compliance). These are then bundled into compliance specifications, presumably for QA and testing purposes (ARM SystemReady IR where the I may be awkwardly for Internet-of-Things, SystemReady ES for Embedded Server, and so on. the ARM Developer Ecosystem SystemReady guides[2] are a useful reference for this).
So overall for ARM: the situation looks like it should be better in future in terms of single-OS-image compatibility across systems. For whatever reasons, only Internet-of-Things compliance category devices have been specified without ACPI as a requirement, although the base specification (Base System Architecture) that each compliance category derives from does allow for DeviceTree as an alternative. And that, I think, would be fine by using the 'compatibility' field to allow systems to be installable, bootable, usable even if more-tailored DeviceTree information would subsequently allow for improved access to a given platform's devices. But again, I don't really understand this stuff in detail, so that's my fairly naive read of the situation.
[1] - https://developer.arm.com/documentation/den0094/c
[2] - https://github.com/ArmDeveloperEcosystem/systemready-guides/blob/1211c0176eacd306024686d0edb9846d199db10e/README.md
no subject
Date: 2023-11-03 10:15 pm (UTC)
It can be easily shown that ACPI is a badly designed solution for Power Management.
Power Management for unrecognised devices does not need a convoluted, inefficient AML, nor operating systems writing complex interpreters. MJG correctly states that SMM is bad; well-designed hardware with OS support should not need the BIOS taking over via SMM.
Power Management should be managed for all devices via standard hardware interfaces.
Let's imagine we're back in the 1990s when ACPI was first developed. Back then we have x86 PCs with conventional PCI.
For PCI devices, please refer to the "PCI Bus Power Management Interface Specification".
I am presuming this specification provides a standard Power Management mechanism for all PCI devices, and it is possible to manage power for PCI devices that the kernel does not have drivers for.
On x86 computers, I believe the PCI bus is used for most devices. I think the only devices not covered are the CPU(s), north bridge and south bridge (including standard legacy ISA devices such as PS/2 ports, COM ports and the floppy controller). It would not be difficult to implement a standard hardware interface for Power Management of these components. One idea could be to extend the PCI Configuration Space Access Mechanism on I/O Ports 0xCF8/0xCFC. Bit 31 is the enable for PCI; Bit 30 could be the enable for PM Config... then read / write PM Config registers as specified.
As an example, turning off the PC could be as simple as 5-6 reads/writes to I/O Ports 0xCF8/0xCFC if there were a standard hardware interface.
Instead, I think for ACPI it's: load multiple ACPI tables, find the correct method and enable ACPI (100s of lines of code), then run an AML interpreter (1000s of lines of code) to execute the "Turn Off" method.
In summary, the best way to do Power Management is via standard hardware interfaces e.g. PCI Bus Power Management Interface Specification.
ACPI is a convoluted over-engineered mess.
Let's not forget that MJG seems to be disagreeing with Linus Torvalds. Linus said:
"ACPI is a complete design disaster in every way."
[https://www.azquotes.com/quote/1218512]
no subject
Date: 2023-11-03 10:43 pm (UTC)
With all due respect, I think that is wrong.
In the x86 PC example, power management could be done via a generic interface on I/O Ports.
As an example, the generic power management specification specifies what I/O Port read/writes will cause computer to go to S3 Suspend-To-RAM.
For a "new power controller", it will have to follow the specification above.
I believe ACPI instead has a complex AML interpreter to basically allow different I/O Port / memory accesses for different devices that do the same function! ACPI is poorly designed.
no subject
Date: 2023-11-04 02:37 am (UTC)
no subject
Date: 2023-11-04 02:36 am (UTC)
This presumption is incorrect. The PCI specification defines power management to the level under the control of a PCI card, which allows you to get down to the D3hot state. However, D3cold is fundamentally outside the PCI spec - the entire point is to cut power to the PCI device entirely, and that can't be done on-device because how would you wake it back up? That has to be controlled by the platform instead (even the chipset doesn't know how power rails are wired up!). The normal ACPI method for _OFF will simply be a single write to a GPIO line that will turn off the power, and the _ON method will be similarly trivial. Of course, it could be more complicated than that - some devices may have more complicated power sequencing requirements and could be fed by multiple power rails, so having this be code rather than a static table of GPIO mappings is still preferable.
no subject
Date: 2023-11-05 10:36 am (UTC)
Hmmmm looks like ACPI is required because the hardware design is stupid.
The motherboard chipset should have a specification of exactly how it is wired up to the ATX PSU etc.
I can't see any reason why this isn't standardised across all motherboard chipsets.
Hardware manufacturers have done a bad job and we are stuck with ACPI as a bandaid solution, it seems.
no subject
Date: 2023-11-05 10:45 am (UTC)
Your article says ACPI is good but only appears to give hypothetical examples e.g. "new type of power controller".
Can you please provide a real world example of a when new hardware was released to the public between 1998 and 2010, and ACPI saved the day.... Windows / Linux didn't need new drivers to control it.
May I repeat, please provide a real world example of new hardware.
When you have nominated a specific real world example, I look forward to responding with further questions.
no subject
Date: 2023-11-05 05:27 pm (UTC)
no subject
Date: 2023-11-07 09:54 am (UTC)
We don't need ACPI for thermal monitors, what we need is a standard hardware interface.
A motherboard has standard PS2 ports.
A motherboard has standard COM ports.
A motherboard has standard USB ports.
Another example could be the USB HID specification, I believe it is generic spec for human interface devices such as keyboard/mouse/joystick etc.
Why isn't there a generic hardware standard for motherboard thermal monitors?
That is poor hardware design.
Soooo the best use case you came up with for ACPI is "thermal monitors", which at the end of the day could have been implemented as a standard hardware design similar to examples above.
ACPI is an over-engineered garbage system implemented to work around the lack of hardware standardisation.
Your support of ACPI is misguided, and I think Linus Torvalds would agree.
no subject
Date: 2023-11-08 12:08 am (UTC)
no subject
Date: 2023-11-09 01:07 am (UTC)
In regards to thermal management, you appear to have claimed that "managing multiple fans in a system" is really complicated and a different spec would end up equally as complicated as ACPI.
Sorry but you are wrong.
The USB HID specification provides a generic way to manage hardware (keyboards etc, obviously not thermal zones).
The USB HID does NOT have its own "virtual machine language" aka ACPI Machine Language.
I believe something similar to HID specification could be created for managing thermal zones / fans in a device.
Java or Web Assembly can be seen as a good example where you need a "virtual machine". That is to abstract away different CPU architectures so compiled code can run on different computers.
ACPI Machine Language is an over-engineered poorly designed solution for the lack of standardisation.
no subject
Date: 2023-11-09 04:29 am (UTC)
no subject
Date: 2023-11-12 10:16 am (UTC)
Asking me to write a "Thermal Management Specification" is asking me to spend 1 to 2 weeks to prove you wrong - no one is going to do that.
I am not a hardware expert. However I think that most people would agree that thermal management of a computer is in general terms approximately of similar complexity to HID devices, and so therefore you are wrong.
no subject
Date: 2023-11-12 10:36 pm (UTC)
"Most people" isn't the relevant metric. "Most people who understand the field" is the relevant metric.
no subject
Date: 2023-11-14 12:00 pm (UTC)
Ok I'm happy if you can educate me.... please provide an example of how thermal management is *very* complicated on a PC / laptop.
You made the claim that "HID solves a much easier problem." If you provide one clear example of why this is so then you have demonstrated your expertise in the field.
no subject
Date: 2023-11-14 02:14 pm (UTC)
HID, on the other hand, merely asks a device to define the set of reports it can generate. It doesn't have to express dependencies. In fact, the lack of a strong mechanism to express dependencies is one of the reasons why more complicated devices don't tend to work correctly with generic drivers - they don't have a way to express their full set of functionality, and so make use of vendor-designed extensions or hardcode that dependency information in the driver. Linux has over 100 drivers in drivers/hid, and if the HID spec were sufficient then most of those wouldn't exist. So in some ways, yes, HID is a good example here - it's a spec that attempted to meet a set of needs, but fell short because reality is more complicated than the original designers envisaged. And thermal management is much harder than just reporting "I have this set of keys" and then sending reports when one of them is hit.
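A rough way to see the dependency point: even a heavily simplified thermal model has to relate several sensors and shared cooling devices across zones, which a flat HID-style report descriptor cannot express. The structure and numbers below are invented for illustration and are not the ACPI thermal model:

```python
# Toy structure showing the cross-device relationships thermal management
# needs: zones aggregate multiple sensors, and cooling devices can be
# shared between zones. All names and trip points here are hypothetical.

thermal_zones = {
    "cpu_zone": {
        "sensors": ["cpu_die", "vrm"],  # several inputs feed one decision
        "trip_active": 70,              # degrees C: engage active cooling
        "cooling": ["fan0"],
    },
    "chassis_zone": {
        "sensors": ["ambient"],
        "trip_active": 45,
        "cooling": ["fan0", "fan1"],    # fan0 is shared with cpu_zone
    },
}

def fans_to_engage(readings, zones=thermal_zones):
    """Engage every fan belonging to any zone whose hottest sensor tripped."""
    engaged = set()
    for zone in zones.values():
        if max(readings[s] for s in zone["sensors"]) >= zone["trip_active"]:
            engaged.update(zone["cooling"])
    return engaged
```

A per-device report of "my temperature is X" cannot capture the shared-fan and multi-sensor relationships; something has to describe the topology, which is what the ACPI thermal objects attempt.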
no subject
Date: 2023-11-18 07:24 am (UTC)
ACPI Version 6.5 has section: 11.5 Native OS Device Driver Thermal Interfaces
This section didn't appear in version 2.
Maybe the light bulb has finally turned on...... the Operating System should be using direct hardware interfaces instead of the dubious, convoluted "ACPI Machine Language".
You appear to claim ACPI is "extensible". However there is simple proof that isn't true. In ACPI version 2 section 12.3 Thermal Objects, it lists 13 different types of objects. ACPI version 6.5 section 11.4 Thermal Objects lists 26 different types of objects. Clearly, ACPI version 2 was NOT "extensible"; they had to make major changes to it in later versions. If ACPI was truly "extensible", an Operating System written in 2002 to version 2 of spec would still have 100% compatibility with ACPI today. That is not the case so clearly the "extensible" claim is false.
Now, for writing a non-bloated Thermal Management specification...
In an ideal world, the Operating System has native drivers for all hardware (e.g. a GPU). The OS driver knows how to read the GPU temperature sensor. The OS driver knows the temperature at which the GPU will turn on its fan. The OS driver knows the critical temperature at which the GPU will shut down automatically. It's not the job of a Thermal Management specification to provide this information.
I would expect a GPU to operate in a low-power mode by default, and only go into a high-power mode when the Operating System drivers do the necessary magic.
This should be the same for ALL devices. They should operate in a low-power mode until the Operating System activates higher performance modes.
PCI has a generic power management specification, so I believe even "unknown" PCI devices can be put into a low-power mode.
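The PCI power management capability the comment alludes to is discoverable from config space alone: the Status register advertises a capability list, and walking it for capability ID 0x01 finds the PM registers. A minimal sketch (function name mine), operating on a snapshot of a device's 256-byte config space:

```python
PCI_CAP_ID_PM = 0x01  # PCI Power Management capability ID

def find_pm_capability(cfg: bytes):
    """Walk the PCI capability list for the Power Management capability.

    cfg is a snapshot of the device's 256-byte config space.
    Returns the capability's offset, or None if absent.
    """
    if not (cfg[0x06] & 0x10):       # Status register bit 4: has cap list
        return None
    ptr = cfg[0x34] & 0xFC           # Capabilities Pointer, dword-aligned
    seen = set()
    while ptr and ptr not in seen:   # guard against malformed loops
        seen.add(ptr)
        if cfg[ptr] == PCI_CAP_ID_PM:
            return ptr               # PMCSR lives at this offset + 4
        ptr = cfg[ptr + 1] & 0xFC    # next-capability pointer
    return None
```

Once found, writing the D-state bits (D0=0 ... D3hot=3) in the PMCSR at offset +4 is how an OS parks an unknown device in low power, which is why generic low-power defaults are plausible for PCI even without a device driver.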
For x86-64 CPUs, they should have a standard method for reading the temperature sensor. They should have a standard Model Specific Register to inform the OS at what temperature the fan is required, and what temperature will force a hardware shutdown. They should have standard MSRs describing the power states S0 - S3. They should have a standard hardware method for putting the CPU into different power states. Let me guess, Intel and AMD do it differently.....
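Intel actually exposes something close to this today: IA32_THERM_STATUS carries a digital readout (bits 22:16) of the distance below TjMax, and MSR_TEMPERATURE_TARGET carries TjMax itself (bits 23:16). A hedged sketch of the decode, as a pure function over raw MSR values (actually reading MSRs requires ring 0 or Linux's /dev/cpu/*/msr, and AMD uses different registers entirely, which is rather the commenter's point):

```python
IA32_THERM_STATUS = 0x19C       # per-core thermal status MSR (Intel)
MSR_TEMPERATURE_TARGET = 0x1A2  # holds TjMax in bits 23:16 (Intel)

def core_temp_celsius(therm_status: int, temp_target: int) -> int:
    """Decode the digital thermometer readout from raw MSR values.

    Bits 22:16 of IA32_THERM_STATUS hold the margin (in degrees C)
    below TjMax; bits 23:16 of MSR_TEMPERATURE_TARGET hold TjMax.
    """
    readout = (therm_status >> 16) & 0x7F
    tjmax = (temp_target >> 16) & 0xFF
    return tjmax - readout
```

So the raw sensor exists; what's missing is exactly the cross-vendor standardisation the comment asks for.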
Section 11.1 of the ACPI 6.5 spec has a nice diagram of a "Thermal Zone"; it shows what is required for thermal management.
A "Thermal Zone-wide active cooling device" would be something like a CPU or case fan, connected directly to the northbridge chipset. One problem is that I don't think there is a standard hardware interface for this. Instead of bloated ACPI, a standard hardware interface for "motherboard fans" should be developed. All chipsets should follow this standard.
A "Thermal Zone-wide temperature sensor" is similar to the above. Instead of bloated ACPI, a standard hardware interface for "motherboard temperature sensors" should be developed.
The final requirement of a "not bloated" Thermal Management specification is to specify how components interact, e.g. where they are located with respect to each other.
So the spec does require a list of Thermal Zones (probably usually 1). Each zone has a list of Devices (I think _TZD in ACPI lingo). For each device in the list, it has x,y,z coordinates to specify its location relative to everything else. As stated above, the Operating System has a native driver to understand the device details, or the OS has a generic (PCI etc.) driver to put an unknown device into a low-power mode.
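The static table the comment proposes could be as small as this sketch (all names mine, purely illustrative): a zone is a list of devices with coordinates, and the OS can derive spatial relationships, such as which neighbours to throttle when one device overheats, without any interpreted bytecode.

```python
from dataclasses import dataclass

@dataclass
class ZoneDevice:
    name: str                              # matched to a native OS driver
    position: tuple                        # (x, y, z) location in the zone

@dataclass
class ThermalZone:
    devices: list

    def nearest(self, target: str, n: int = 1):
        """Devices closest to `target`, e.g. candidates to throttle
        first when `target` reports it is running hot."""
        ref = next(d for d in self.devices if d.name == target)
        others = [d for d in self.devices if d.name != target]
        def dist2(d):
            return sum((a - b) ** 2 for a, b in zip(d.position, ref.position))
        return sorted(others, key=dist2)[:n]
```

Whether coordinates alone are enough (airflow direction, shared heatpipes, and fan-to-sensor coupling are not purely geometric) is exactly the kind of complication the earlier HID comparison warns about.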
Voila.... thermal management done with less bloat than ACPI. And without using ACPI Machine Language.
Repeating myself.... ACPI is a bloated, highly complicated specification (e.g. AML) that causes problems. The only excuse for ACPI is a lack of hardware standardisation, which forces a very generic, indirect specification.
Any "praise" of ACPI is misguided.
vehement disagreement
Date: 2024-06-14 04:35 am (UTC)
ARM with ACPI and x86 without?
Date: 2023-11-20 10:25 am (UTC)
Thanks for the great article.
It was really an eye-opener as to why many ARM devices are so painful when installing a custom OS.
So I did some further searching...
There's actually some rare ARM hardware with UEFI support. And I guess it also has ACPI.
https://www.gigabyte.com/de/Enterprise/Rack-Server/R183-P92-rev-AAE1
https://www.linaro.org/blog/when-will-uefi-and-acpi-be-ready-on-arm/
For the Raspberry Pi there also seems to be a limited possibility to use UEFI, and probably ACPI too. The trick is using a custom second-stage bootloader. But that's custom stuff for the Raspi and will never be available for a broad range of ARM boards without UEFI+ACPI support from the hardware vendor.
https://github.com/pftf/RPi4
https://www.xda-developers.com/efidroid-is-a-second-stage-bootloader/
Unfortunately, there's also a movement in the opposite direction. As far as I understand, many Google x86 Chromebooks (Intel or AMD CPU, running Chrome OS) have a custom boot system and don't support UEFI boot.
(Coreboot is underneath, but Google configured it in a way that does not enable UEFI boot.)
So maybe at some point Google decides to also remove ACPI.
https://wiki.archlinux.org/title/Chrome_OS_devices
https://doodlezucc.github.io/eupnea-linux.github.io/
(yes, you need something like LineageOS for notebooks to run a custom OS on Chromebooks)
Let's see what happens if the following takes off. I really hope NVidia and AMD use standardized stuff like UEFI and ACPI for their ARM chips. I don't want notebooks to need custom ROMs like smartphones... https://www.reuters.com/technology/nvidia-make-arm-based-pc-chips-major-new-challenge-intel-2023-10-23/
German: https://www.heise.de/news/AMD-und-Nvidia-entwickeln-angeblich-ARM-Prozessoren-fuer-Notebooks-9342371.html
Re: ARM with ACPI and x86 without?
Date: 2023-11-20 06:34 pm (UTC)
Re: ARM with ACPI and x86 without?
Date: 2024-04-04 04:38 am (UTC)