A use for EFI
Jun. 7th, 2011 03:36 pm
mjg59
Anyone who's been following anything I've written lately may be under the impression that I dislike EFI. They'd be entirely correct. It's an awful thing and I've lost far too much of my life to it. It complicates the process of booting for no real benefit to the OS. The only real advantage we've seen so far is that we can configure boot devices in a vaguely vendor-neutral manner without having to care about BIOS drive numbers. Woo.
But there is something else EFI gives us. We finally have more than 256 bytes of nvram available to us as standard. Enough nvram, in fact, for us to reasonably store crash output. Progress!
This isn't a novel concept. The UEFI spec provides for a specially segregated area of nvram for hardware error reports. This is lovely and not overly helpful for us, because they're supposed to be in a well-defined format that doesn't leave much scope for "I found a null pointer where I would really have preferred there not be one" followed by a pile of text, especially if the firmware's supposed to do something with it. Also, the record format has lots of metadata that I really don't care about. Apple have also been using EFI for this, creating a special variable that stores the crash data and letting them get away with just telling the user to turn their computer off and then turn it back on again.
EFI's not the only way this could be done, either. ACPI specifies something called the ERST, or Error Record Serialization Table. The OS can stick errors in here and then they can be retrieved later. Excellent! Except ERST is currently usually only present on high-end servers. But when ERST support was added to Linux, a generic interface called pstore went in as well.
Pstore's very simple. It's a virtual filesystem that has platform-specific plugins. The platform driver (such as ERST) registers with pstore and the ERST errors then get exposed as files in pstore. Deleting the files removes the records. pstore also registers with kmsg_dump, so when an oops happens the kernel output gets dumped back into a series of records. I'd been playing with pstore but really wanted something a little more convenient than an 8-socket server to test it with, so ended up writing a pstore backend that uses EFI variables. And now whenever I crash the kernel, pstore gives me a backtrace without me having to take photographs of the screen. Progress.
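Since pstore is just a filesystem, collecting crash records after a reboot needs nothing more than ordinary file operations. The following is a rough sketch rather than anything from the patchset: it assumes pstore is mounted at /sys/fs/pstore (the usual location on current kernels; adjust it if yours is mounted elsewhere), and the helper names `read_records` and `clear_records` are made up for illustration.

```python
from pathlib import Path

# Assumption: the usual pstore mount point on current kernels.
PSTORE_MOUNT = "/sys/fs/pstore"


def read_records(mount_point=PSTORE_MOUNT):
    """Return {filename: text} for every regular file under mount_point."""
    records = {}
    for entry in sorted(Path(mount_point).iterdir()):
        if entry.is_file():
            # Crash output may contain stray bytes, so decode leniently.
            records[entry.name] = entry.read_text(errors="replace")
    return records


def clear_records(mount_point=PSTORE_MOUNT):
    """Delete every record file and return the names that were removed.

    On a real pstore mount, unlinking a file is what tells the backend
    to erase the underlying record.
    """
    removed = []
    for entry in sorted(Path(mount_point).iterdir()):
        if entry.is_file():
            entry.unlink()
            removed.append(entry.name)
    return removed
```

Calling `read_records()` after a crash and reboot should hand back whatever the backend saved, and `clear_records()` frees the space again. Reading and deleting on a real mount will typically need root.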
Patches are here. I should probably apologise to Seiji Aguchi, who was working on the same problem and posted a preliminary patch for some feedback last month. I replied to the thread without ever reading the patch and then promptly forgot about it, leading to me writing it all from scratch last week. Oops.
(There's an easter egg in the patchset. First person to find it doesn't win a prize. Sorry.)
no subject
Date: 2011-06-07 09:47 pm (UTC)
no subject
Date: 2011-06-07 09:52 pm (UTC)
no subject
Date: 2011-06-07 10:16 pm (UTC)
LINUX_EFI_CRASH_GUID?
Date: 2011-06-08 06:45 am (UTC)
Not getting it.
Date: 2011-06-08 07:54 pm (UTC)
Was there anything so horribly wrong with OF that they needed to recreate it?
Re: Not getting it.
Date: 2011-06-08 08:01 pm (UTC)
These are, obviously, worthless excuses.
Re: Not getting it.
Date: 2011-06-08 08:57 pm (UTC)
I actually had an Intel EFI architect tell me - to my face - that they invented EFI because there wasn't anything that already existed to reuse. Later, in the same talk, he said he'd seen EFI implementations that wrap OpenFirmware.
The conclusion I drew from this was that the thing wrong with OpenFirmware was that Intel hadn't written it; either that someone needed to justify their existence, or that they have some deeper cultural NIH reason why they had to not use OF. So, either malice or incompetence, and I don't really care which. It's a complete clusterfuck.
Re: Not getting it.
Date: 2011-06-09 06:51 am (UTC)
http://osxbook.com/book/bonus/chapter4/firmware/
easter egg
Date: 2011-09-20 07:28 pm (UTC)
CRYING CAT FACE
SKULL AND CROSSBONES