This piece is kind of interesting, but I think its core thesis is pretty much nonsense. You don’t need to have been there when software was first written in order to understand it. Humans are capable of learning things.
I have worked with software that probably couldn’t have survived a complete change of team, and I will say this: It’s usually the worst code at the company, it’s often itself a rewrite of what the company started with, and I always get the impression it’s being held back by the original developers who are still with it. Without these first-generation programmers, any software in danger of becoming unlearnable would necessarily be simplified or replaced.
You don’t need to have been there when software was first written in order to understand it. Humans are capable of learning things.
I think that’s a bit of a straw man; the article doesn’t say that the software itself is incomprehensible to others. With enough effort you can look at the software and understand what it does. What you can’t do after the fact is understand the context in which it was written; why was it done that way? What alternatives were considered and discarded? How has the context changed since those decisions were initially made? That’s what they mean when they talk about theory-building.
In theory you could write this stuff down, but I have never seen this actually happen in an effective way. (Probably because people keep thinking of the software itself as the point rather than the theory it embodies.)
I considered this, but looking at the article, it almost seems to take care not to talk about why. And, in any case, my experience is that people forget the context at such a rate that by ten or so years out, reverse-engineering it from the code is at least as reliable as asking the authors. Anyway, reading again, I still think this is about more than just context.
I think on balance I agree with the article. As @technomancy says, it’s about the theory the software embodies. Code is just one facet of that theory, and can never capture the tacit knowledge, ambiguities and personal relationships which all play a part in a software system.
However, I do agree with @edk- that the article dances around this point. Perhaps it’s intrinsically a bit of an abstract argument, but I couldn’t help but feel that more concrete writing would have helped.
This appears to be an excerpt from a book, so perhaps the rest of the book goes into detail on this point. I’ve added it to my list, but not bought/read it yet.
For some reason, there is a widespread default mindset (at least in the part of the industry I’ve seen) that “only those who built it can understand it.”
It doesn’t even depend on code quality (though I am a firm believer that any code written by a human can be understood by a human).
You can have a module that is clearly structured and spiced with comments about “why this solution was chosen,” or “when we need X, it can be changed this way,” or “this is the assumption here; if it breaks, the assumption was wrong”… And still, when something breaks or needs updating, people will hack around the module or treat it as a black box (“I’ve tried to pass this data, do you know why it doesn’t work? Oh, I experimented for six more hours and it seems I guessed why!”), or ask in chat “who knows how this constant is used?” (it’s used once in the codebase, with a clear comment why), etc. etc.
It is like, over the years, the default stance of a developer has switched from “I’ll understand the crap out of this codebase, disassemble it into the smallest bits, and rewrite it my way!!!” (an attitude that met with plenty of well-founded criticism) to “Nobody should be expected to understand your crap: either you support it forever, or it is thrown away in its entirety.”
I don’t think it’s true that only those who built it can understand it, but the effort required to understand a legacy codebase from scratch & safely make changes is enormous, and this problem affects FOSS as well. I’ve been dealing with this for the TLA+ tools - specifically the parser - which when I joined the project was a pile of 20+-year-old Java code, with everybody who had touched it gone from the project for a decade or more.

Past a certain point the code ceases to be source code in some sense - people will only deal with it at the API level, and everything within is indistinguishable from a binary blob that cannot be changed. The process of shedding light onto that part of the codebase required writing over 300 round-trip parse tests to semi-exhaustively document its behavior, and even with that monumental effort I still only really have a handle on the syntax component of the parser, let alone the semantic checker.

But that isn’t all. You may have developed a mental model of the codebase, but who is going to review your PRs? It then becomes a social enterprise of either convincing people that your tests are thorough enough to catch any regressions or giving them some understanding of the codebase as well.
Compare that with being the original author, where you basically have total ownership & can make rapid dictatorial changes to a component often without any real code review. The difference in effort is 1-2 orders of magnitude.
Then consider the scenario of me leaving. Sure all the tests I wrote are still there, but do people have a grasp of how thorough the test coverage is to gauge how safe their changes are? I would not be surprised if it took five years after me leaving for basic changes to the parser to happen again.
The only thing I was trying to say is that “only the original author can fully understand it” becomes the industry’s self-fulfilling prophecy, creating a feedback loop between people not trying to read others’ code (and not giving feedback that it lacks background information or a clear structure), and people not thinking of their code as a way to communicate everything they know, because “nobody will try to read it anyway, the important thing is that it works.”
It manifests in many things, including the changed stance on code reviews, where “you left a lot of comments” starts to be universally seen as “you are nitpicking and stalling the development,” which disincentivizes those who are really trying to read the code and comment on the things that aren’t clear enough or that lack an explanation of the non-obvious design choices.
Okay, I’ll take the alternate stance here. I worked on the back end of a large AAA video game that was always online. I worked on it for roughly 6 years before I moved to another company.
I had very good documentation and very clear objectives. It was very simple infrastructure - as simple as it could be made. The “why” of decisions was documented and woven consistently into the fabric of the solution.
I hired my successor into my new company! I expected him to have experience with the same problems my original infrastructure sought to solve.
He didn’t. He never learned how or why certain things were the way they were. My expectation that he could solve problems I had already solved, because he would’ve had experience with them, was completely incorrect.
Had the system failed catastrophically, he would’ve been unable to fix it, and that wasn’t discovered even after he had worked there for three years.
For some reason, there is a widespread default mindset (at least in the part of the industry I’ve seen) that “only those who built it can understand it.”
There are levels of understanding and documentation is variable, but there are almost always some things that don’t make it into documentation. For example, the approaches that you discarded because they didn’t work may not be written down. The requirements that were implicit ten years ago and were so obvious that they didn’t need writing down, but which are now gone, may be omitted, and they influenced part of the design.
With enough archeology, you often can reconstruct the thought processes, but that will take enormous amounts of effort. If you were there (and have a good memory), you can usually just recall things.
The problem (for me) is that people start taking those contextual truths and applying them unconditionally to any situation. Like, frequently without even looking: “I wouldn’t even start trying to read through the module (where the choice of approach and its limitations might be visible in the code or well documented); I’ll treat it as a black box or delegate it to the module’s author, regardless of the current organization structure.”
The situations I quoted in the previous comment (“who knows how this constant is used?” in chat, even though the constant is used once in the codebase, with a clear comment explaining why and what it means) are all real and somewhat disturbing. It might depend on the corner of the industry and the kind of team one is working with, of course.
I completely agree with the second half of your post. I might just be a grumpy old person at this point, but the mindset seems to have shifted a lot in the last twenty years.
For example, back then there was a common belief that software should run on i386 and 64-bit SPARC so that you knew it handled big vs little endian, 32- vs 64-bit pointers, strong vs weak alignment requirements, and strong vs weak memory models. It also had to run on one BSD and one SysV variant to make sure it wasn’t making any assumptions beyond POSIX (using OS-specific features was fine, as long as you had fallback). This was a mark of code quality and something that people did because they knew platforms changed over time and wanted to make sure that their code could adapt.
Now, I see projects that support macOS and Linux refusing FreeBSD patches because they come with too much maintenance burden, when really they’re just highlighting poor platform abstractions.
Similarly, back then people cared a lot about API stability and, to a lesser degree, ABI stability (the latter mostly because computers were slow and recompiling everything in your dependency tree might be an overnight job or a whole-weekend thing). Maintaining stable APIs and having graceful deprecation policies was just what you did as part of software engineering. Now the ‘move fast and break things’ or ‘we can refactor our monorepo and code outside doesn’t matter’ mindsets are common.
The problem (for me) is that people start taking those contextual truths and applying them unconditionally to any situation.
That seems like a meta-problem that’s orthogonal to the original article’s thesis. It strikes me as an instance of the H L Mencken quote, “For every complex problem there is a solution which is clear, simple and wrong.”
I’m not sure the overall attitude has changed over the years. I suspect the nuance required for dealing with the problem of software longevity and legacy code is something that is currently mainly learned the hard way, rather than being taught. As such, many inexperienced practitioners will lack the awareness or tools to deal with it; combined with the rapid growth and thus younger-skewing demographics of the industry, I guess it means those with the requisite experience are in the minority. But has this situation really ever been different?
In any case, none of this is an argument against the thesis of the original text - you can certainly argue it’s a little vague (possibly because it’s a short excerpt from a book) and perhaps overly absolutist. (I’d argue the extent of the problem scales non-linearly with the size of the code on the one hand, and you can to some extent counteract it by proactive development practices.)
FWIW, as a contractor/consultant, I’d say the majority of my projects over the last years have been of the “we have this legacy code, the person/team who wrote it is/are no longer around” kind to some degree. My approach is definitely not to assume that I will never understand the existing code. In fact, I have found a variety of tactics for tackling the task of making sense of existing code. Again, I suspect most of these are not taught. But all of them are much less efficient than just picking the brains of a person who already has a good mental model of the code and the problem it solves. (It is fiendishly difficult to say with any reliability in retrospect whether it would have been cheaper to just start over from scratch on any such project. I do suspect it can shake out either way and depends a lot on the specifics.)
Without these first-generation programmers, any software in danger of becoming unlearnable would necessarily be simplified or replaced.
I agree with your primary criticism–it is certainly true that software can be understood without the original creators.
However, your assessment of what will happen is very optimistic. It is entirely possible that new programmers will be brought in who only have time to make basic bug fixes, which will be kludges. If asked to add new functionality, they will copy and paste. When they do try to buck the trend of accumulating kludges, they will break things, because they do not fully understand the software.
So I agree, any software should be understandable, but it will take investment in rebuilding a theory of how it works and in rewriting or refactoring the software to make it workable for the new programmers. This will only happen if management understands that they have a lump of poorly understood software and trusts the developers to play the long game of improving it.
The optimism is really just extended pessimism: I claim that, if you keep doing that, at some point all changes will break more than they fix, and either someone will take a hatchet to it or it will have to be abandoned.
It’s not that far off, only a little exaggerated. Yes, you can understand code you didn’t write, but you can’t understand it in the same way as one of its authors, until you’ve rewritten a chunk of it yourself. Yes, a team (or a solo developer) can maintain inherited software, but they’re going to have an adjustment period in which they’ll be inclined to “bolt-on” or “wrapper” solutions because they have trepidation about touching the core code. And it’s fair to say that that adjustment period ends, not after some period of staring at the code, but after making enough changes to it — not only that some part of it becomes their own, but that they run into enough challenges that the constraints that shaped the existing code start to make sense.
I wish I’d thought of this in my first comment, but the article is basically a long-winded way to say “the worst memory is better than the best documentation”. I’ll just leave that there.
but they’re going to have an adjustment period in which they’ll be inclined to “bolt-on” or “wrapper” solutions because they have trepidation about touching the core code
I can believe this happens sometimes but I don’t think it’s necessary. I’ve picked up legacy projects and within days made changes to them that I’d stand by today. Codebases take time to learn, and working on them helps, but finding one’s way around a new program, figuring out why things are the way they are, and building an intuition for how things should look, are all skills that one can develop.
Anyway I think even your version of the point largely refutes the original. Learning by doing is still just learning, not magic. In particular it doesn’t require an unbroken chain of acculturation. Even if the team behind some software all leaves at once, it’s not doomed.
I would also argue that in some cases the original authors of a program hold it back. The constraints that shaped the existing code aren’t always relevant decades down the track. Some of the things the authors believed will simply be wrong. Removing the code from most of its context can be a good thing when it allows the project to go in a new direction. Also, especially for code that’s difficult to maintain… the original authors are the reason it is so—and as long as the chain of first-generation programmers remains intact, the path of least resistance to full facility with the code is to be trained to think like them. Breaking that local maximum might not be the worst thing.
Perhaps the problem with churn is that it’s not a clean break. You get an endless stream of second-generation programmers who try to build in the image of what came before, but always leave before they achieve mastery. I dunno.
I think it’s very accurate that the founders and early employees have the deepest knowledge of the system though. Yea, new people can come in and learn it, but it’s never quite to the same level. Anecdotally of course.
My mental model of this is that, roughly, the money you pour into a paid programmer accumulates. It’s not “linear”, but when a programmer leaves, the accumulated value goes with them. I believe some accounting practices model similar stuff.
The converse also applies: if your company relies heavily on a piece of open-source software, hiring a core maintainer of (or experienced contributor to) that software onto your staff is probably a bargain.
…Web Browsers from ports that are patched to implement pledge(2) and unveil(8). […] FreeBSD 14.1, AFAIK, does not implement such feature.
I suppose the idiomatic method for securing processes on FreeBSD is capsicum(4). And at least for Firefox, it looks like someone has been working on adding support but they ran into some tricky cases that apparently aren’t well supported.
I guess for retrofitting huge, complex code bases, pledge and unveil or jails are probably easier to get working.

I wonder how this affects things like file picker dialogs for choosing uploads, etc. If they’re implemented in-process, I guess they get to “see” the veiled or jailed file system - which means you don’t get any mysterious permission issues, but you also can’t upload arbitrary files. If they were implemented via IPC to some desktop environment process, the user could see the usual file system hierarchy to select arbitrary files, and if those files were sent back to the browser process as file descriptors, it would actually work. (I think the latter is how sandboxed apps are permitted to read and write arbitrary files on macOS, with Apple-typical disregard for slightly more complex requirements than just picking one file.)
I guess for retrofitting huge, complex code bases, pledge and unveil or jails are probably easier to get working.
In regard to pledge+unveil, it’s extremely simple to actually implement in code (though considerations for where+what in the code would be more complex) and the predefined promises make it pretty easy.
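Roughly, the retrofit looks like this in code (a minimal sketch, not the actual browser patches; the paths and promise strings here are made up):

```cpp
#include <unistd.h>   // pledge(2) and unveil(2) on OpenBSD
#include <err.h>

static void sandbox_this_process(void) {
    // Filesystem view: after unveil(NULL, NULL), only the unveiled paths exist.
    if (unveil("/home/user/Downloads", "rwc") == -1) err(1, "unveil");
    if (unveil("/usr/local/share/fonts", "r") == -1) err(1, "unveil");
    if (unveil(NULL, NULL) == -1) err(1, "unveil");   // lock the unveil list

    // Syscall surface: predefined promise groups instead of per-syscall filters.
    if (pledge("stdio rpath wpath cpath inet dns", NULL) == -1) err(1, "pledge");
}
```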
I wonder how this affects things like file picker dialogs for choosing uploads, etc.
For pledge+unveil, from memory, the dialog just cannot browse/see outside of the unveil’d paths. For browsers there are a few files, /etc/<browser>/unveil.*, that list the paths each kind of browser process is allowed access to. Included here are ~/Downloads and common XDG dirs, for example, which allows most file picking stuff to work fine for most users.
In regard to pledge+unveil, it’s extremely simple to actually implement in code (though considerations for where+what in the code would be more complex) and the predefined promises make it pretty easy.
One advantage is also that you can progressively enhance this over time. You can keep adding restrictions every time you fix instances of code which would previously have been violating a pledge. Capsicum would appear to require a more top-down approach. (I can’t help but wonder if you could add a kind of compatibility mode where it allows open() and similar as long as the provided path traverses a directory for which the process holds an appropriate file descriptor through which you’d normally be expected to call openat(). Or maybe that already exists, I really need to get hands-on with this one day.)
Included here are ~/Downloads and common XDG dirs, for example, which allows most file picking stuff to work fine for most users.
I can see that the alternative would probably require a more holistic approach at the desktop environment level to implement well, but defining these directories statically up front seems like an awkward compromise. Various XDG directories contain precisely the kind of data you’d want to protect from exfiltration via a compromised process.
Capsicum lets you handle things like the downloads directory by passing a file descriptor with CAP_CREATE. This can be used with openat with the O_CREAT flag, but doesn’t let you open existing files in that directory. This is all visible in the code, so you don’t need to cross reference external policy files.
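For concreteness, the downloads-directory case looks roughly like this (untested sketch; the exact rights set probably needs tuning, e.g. CAP_LOOKUP so the fd can serve as a base for openat):

```cpp
#include <sys/capsicum.h>
#include <fcntl.h>

int open_new_download(const char *dir, const char *name) {
    int dfd = open(dir, O_DIRECTORY);
    if (dfd < 0) return -1;

    // Limit the directory fd to creating/writing new files via openat().
    cap_rights_t rights;
    cap_rights_init(&rights, CAP_LOOKUP, CAP_CREATE, CAP_WRITE, CAP_SEEK);
    if (cap_rights_limit(dfd, &rights) < 0) return -1;

    if (cap_enter() < 0) return -1;   // from here on, no global namespaces

    // Allowed: creating a brand-new file under the downloads directory.
    // Opening an existing file for reading would fail, since CAP_READ wasn't granted.
    return openat(dfd, name, O_WRONLY | O_CREAT | O_EXCL, 0644);
}
```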
If you run a Capsicum app with ktrace, you can see every system call that Capsicum blocks, so it’s easy to fix them. With a default-deny policy and no access to global namespaces, it’s easy to write least-privilege software with Capsicum. I have not had that experience with any of the other sandboxing frameworks I’ve tried.
Capsicum lets you handle things like the downloads directory by passing a file descriptor with CAP_CREATE. This can be used with openat with the O_CREAT flag, but doesn’t let you open existing files in that directory. This is all visible in the code, so you don’t need to cross reference external policy files.
The download side is easier to handle as long as you’re keeping to a single download directory. I expect uploads to be somewhat more annoying UX wise: it’s rather unusual to have an “uploads” directory where the user would first copy any files they want to upload to a website, then select them in the “Browse…” dialog.
One slightly less annoying option is to agree on blanket read access to a bunch of stuff under $HOME, which appears to be what’s used here in practice, but it leaves you vulnerable to data exfiltration, which surely is one of the attack scenarios this whole exercise is trying to defend against.
Anything more comprehensive I can come up with will necessarily be a multi-process arrangement where the less-sandboxed process sends file descriptors of what the user selects to the heavily sandboxed one. And where drag & drop of a file sends (a) file descriptor(s) rather than just (a) path(s).
To be clear, I’m not saying this would be a bad system! I think it’d be great to have this in a desktop environment. Just that it’s a little tricky to retrofit onto a giant ball of code you’ve never even looked inside before.
If you run a Capsicum app with ktrace, you can see every system call that Capsicum blocks, so it’s easy to fix them. With a default-deny policy and no access to global namespaces, it’s easy to write least-privilege software with Capsicum. I have not had that experience with any of the other sandboxing frameworks I’ve tried.
That’s good to know - in contrast, dealing with the Sandbox on Apple’s platforms is super annoying as you’re mostly reduced to reading system log tea leaves when things aren’t working - macOS dtrace is falling apart more and more with every release and sometimes requires disabling the very security features you’re trying to debug.
But tracing only just begins to address the stated problem of retrofitting a large existing code base. If everything including dependencies including transitive ones is using open rather than openat, but open is completely non-functional, I suspect that might be rather a chore to get fixed. I mean it’s feasible if you actually “own” most of the project, but modifying something as big and unknown as a web browser in this way is quite an undertaking even if you can eventually get it all upstreamed.
The download side is easier to handle as long as you’re keeping to a single download directory. I expect uploads to be somewhat more annoying UX wise: it’s rather unusual to have an “uploads” directory where the user would first copy any files they want to upload to a website, then select them in the “Browse…” dialog.
Capsicum was designed to support this via the powerbox model (just as on macOS: it was designed to be able to more cleanly support the sandboxing model Apple was developing at the time). When you want to upload a file, the file dialog runs as a service in another process that has access to anything and gives file descriptors to selected files. You can also implement the same thing on top of a drag and drop protocol.
Alex Richardson did some Qt / KDE patches to support this and they worked well. Not sure what happened to them.
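The fd-handover part of a powerbox like that is ordinary SCM_RIGHTS passing over a Unix domain socket; roughly (illustrative sketch, error handling trimmed):

```cpp
#include <sys/socket.h>
#include <sys/uio.h>
#include <cstring>

// The privileged file-dialog process hands an already-open file descriptor
// to the sandboxed renderer over a connected AF_UNIX socket.
int send_fd(int sock, int fd_to_send) {
    char byte = 'F';
    iovec iov{&byte, 1};                     // must carry at least one data byte

    char ctrl[CMSG_SPACE(sizeof(int))] = {};
    msghdr msg{};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl;
    msg.msg_controllen = sizeof ctrl;

    cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;            // "this message carries file descriptors"
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    std::memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

    return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}
```

The receiving side does the mirror-image recvmsg() and gets a descriptor it can use but could never have opened itself.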
But tracing only just begins to address the stated problem of retrofitting a large existing code base. If everything including dependencies including transitive ones is using open rather than openat, but open is completely non-functional, I suspect that might be rather a chore to get fixed.
Alex Richardson did some Qt / KDE patches to support this and they worked well. Not sure what happened to them.
Good to know it’s been done and the code presumably is still out there somewhere. Something to keep note of in case I end up doing any UNIX desktop work. (And it sounds like this was done as part of an academic research project, so probably worth trying to get hold of any other published artifacts from that - perhaps part of this project?)
The nice thing about this is that open is a replaceable symbol. For example, in one project where I want to use some existing libraries in a Capsicum sandbox I simply replace open with something that calls openat with the right base depending on the prefix.
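Something along these lines, I assume (illustrative sketch, not the actual project code; the prefix and the pre-opened directory fd are made up):

```cpp
#include <cstdarg>
#include <cstring>
#include <cerrno>
#include <fcntl.h>

extern int g_data_dir_fd;   // opened with suitable rights before cap_enter()

// Interpose the libc symbol: route a known path prefix to openat() against the
// pre-opened directory fd; refuse everything else, as capability mode would.
extern "C" int open(const char *path, int flags, ...) {
    mode_t mode = 0;
    if (flags & O_CREAT) {                 // the mode argument only exists with O_CREAT
        va_list ap;
        va_start(ap, flags);
        mode = static_cast<mode_t>(va_arg(ap, int));
        va_end(ap);
    }

    static const char prefix[] = "/usr/share/somelib/";   // hypothetical data prefix
    if (std::strncmp(path, prefix, sizeof(prefix) - 1) == 0)
        return openat(g_data_dir_fd, path + sizeof(prefix) - 1, flags, mode);

    errno = ECAPMODE;                      // FreeBSD: "not permitted in capability mode"
    return -1;
}
```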
Providing this fallback compatibility wrapper as a user space libc override is a nifty technique, thanks for sharing!
I can see that the alternative would probably require a more holistic approach at the desktop environment level to implement well, but defining these directories statically up front seems like an awkward compromise.
I kind of agree. It did feel unintuitive to me at first. Worth noting these files are owned by root and normal users cannot write to them by default.
Various XDG directories contain precisely the kind of data you’d want to protect from exfiltration via a compromised process.
That’s right – from what I remember it was pretty well locked down though.
When I see the effort some people put, mostly on their spare time, to give people the ability to run extremely closed-source software onto extremely closed-source hardware, with FLOSS in the middle, I’m just amazed… in a good way.
I’m happy that they’re doing it, and that I’m not… These people have my absolute respect.
What makes you call the hardware extremely closed source? I mean, all the firmware is closed source of course, but that’s the case for pretty much all hardware; what makes Mac hardware unusually closed source? It’s not especially locked down: all the mechanisms used to install and boot different operating systems are mechanisms intentionally included by Apple, after all. It’s not like Asahi relies on some kind of jailbreaking.
This might be ignorance on my part, and I’m happy to be corrected.
But I’ve always assumed that Macs didn’t use any standard BIOS, and that the new ones were using some non-standard UEFI boot process. My understanding was that, while it’s true that their custom firmware allows you to boot another OS, this is something your OS has to put effort into supporting. This is because Apple doesn’t care about you, as opposed to Intel/AMD/Asus/… who actively contribute to the Linux kernel for their hardware support. My understanding, as well, is that there is very little free documentation available on how to support Apple’s hardware, hence my “extremely closed source”.
Why else would we need the efforts of the Asahi team, if the hardware (which, I admit, I used as a metonymy for hardware+firmware) was standard?
The firmware and bootloader become fairly uninteresting once you’ve successfully booted the operating system, and while there are lots of fiddly bits to take care of to get things running smoothly, once you’re booted, you’re booted. (usually) Booting is not usually a herculean effort, as long as the hardware and firmware vendors haven’t tried to actively block you. Which Apple hasn’t, apparently. (Assuming secure boot is disabled.)
That said, Apple’s GPUs are entirely undocumented, so building a high-quality GPU driver for these things purely based on reverse-engineering (intercepting and tracing what I/O the macOS GPU driver performs on the GPU), especially in this kind of time frame is beyond impressive.
Other devices and SoC subsystems are also mostly undocumented of course and need their own drivers, but the complexity of most devices is not in the same league as a GPU; USB is almost certainly a standard implementation though (perhaps with some smaller hacks), so that’s one big driver stack they won’t have had to reimplement.
USB is almost certainly a standard implementation though
Correct! Most Apple peripherals are custom (though some have fun heritage like the Samsung UART, lol) but core USB is the same Synopsys DesignWare block you see in nearly everything (that’s not Intel nor AMD) these days.
That’s just the host/device controller though. To make it work you also need a USB PHY driver (Apple custom), a USB-PD controller driver (Partly incompatible Apple variant of an existing TI chip), and the I2C controller driver for the bus that chip talks through (which is actually a variant of an old PASemi controller).
Yeah, by “core” USB I meant the host/device controller :)
Frankly this is exactly why I like ACPI. With ACPI, all that glue-ish stuff around a generic core interface like XHCI mostly/hopefully can be written once in AML, instead of having to write and maintain drivers in each kernel. Which, yeah, from a Linux-centric perspective that last part is not a feature, but I personally like the BSDs for example, and I also hope that “the future” will be more Redox-shaped or something, so I’m always biased in favor of things that make OS diversity easier!
Heh, I was working on some low-level USB code on a microcontroller for a client. I’m staring at the code and having this moment of déjà vu: “weird… I’ve never used this brand of chip before, but for some reason a lot of this feels very familiar.”
I had worked on USB code before on different chips, though. I opened up the datasheets from an old project and this new project and started looking at the USB registers. Sure enough, exactly the same names, exactly the same bit layout, exactly the same memory addresses. I googled some of the names, found data sheets from multiple vendors, and then found one from Synopsys. Sure helped debug things when I had my own reference implementation to compare against!
Booting other OSes is intentionally supported, and doing so without compromising security for people who aren’t doing so is a significant amount of work. There’s a bunch of stuff describing how non-Apple OSes are installed without trivially undermining secure boot, and I think back in the early days of Asahi marcan talked about it a bunch.
Fortunately, heap allocation is not necessary to use C++ coroutines. However, C++20 coroutines do require dynamic allocation (memory that is allocated at runtime). Pigweed’s pw::async2::Coro API allocates memory using a pw::Allocator.
I don’t see a meaningful distinction between “heap allocation” and “dynamic allocation”, other than the scope of the allocator: whether it’s OS-provided and global, or custom and specific to some subsystem. Either way, it has the same effect.
What’s the benefit to using Pigweed’s custom allocator over the global operator new / malloc? If anything it’s a drawback, because in the case that multiple subsystems implement their own allocators, memory is wasted because each one is holding on to free space that the others can’t access.
Nevertheless, there’s a lot of interesting info here on the innards of C++ coroutines!
I don’t see a meaningful distinction between “heap allocation” and “dynamic allocation”, other than the scope of the allocator: whether it’s OS-provided and global, or custom and specific to some subsystem. Either way, it has the same effect.
A very meaningful distinction is that heap allocation, i.e. dynamic storage, will at some point call ::operator new, and has some chance of ending up doing a system call, which has implications in terms of latency, for instance, as it increases the chances of the thread getting scheduled out. So far this has entirely precluded the use of coroutines in e.g. real-time audio processing loops, unless you could assert at compile time that HALO went into effect and every memory allocation got optimized out.
If anything it’s a drawback, because in the case that multiple subsystems implement their own allocators, memory is wasted because each one is holding on to free space that the others can’t access.
it’s definitely the only sane way to design some systems when you really want to make sure you aren’t ever going to take a lock.
There are a few reasons why someone might prefer a custom allocator over a global new or malloc.
In some cases, it’s a hard requirement: not all platforms provide a global allocator. This is fairly common in embedded systems.
Custom allocators can be beneficial for reliability, too– the exact amount of memory available for the purpose of allocating a particular coroutine or set of coroutines can be fixed, and failure to allocate from that dedicated pool of memory can be handled gracefully, rather than allowing the coroutine(s) to gradually expand into memory that is dedicated for other tasks.
Finally, using a custom allocator can sometimes improve performance by taking advantage of cache locality or certain allocation patterns. For example, bump allocators and arena allocators can provide more efficient allocation because they only deallocate at the end of certain scopes or lifetimes. This makes allocation just a simple pointer increment and makes deallocation a total no-op until the end of the arena’s lifetime.
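To make that concrete, a bump arena can be as small as this (a generic illustration, not Pigweed’s pw::Allocator):

```cpp
#include <cstddef>

class BumpArena {
public:
    BumpArena(void *buffer, std::size_t size)
        : base_(static_cast<std::byte *>(buffer)), size_(size) {}

    void *allocate(std::size_t n, std::size_t align = alignof(std::max_align_t)) {
        std::size_t offset = (used_ + align - 1) & ~(align - 1);   // align up
        if (offset + n > size_) return nullptr;   // out of space: fail gracefully
        used_ = offset + n;
        return base_ + offset;                    // allocation = pointer bump
    }

    void deallocate(void *, std::size_t) {}       // no-op until release()

    void release() { used_ = 0; }                 // end of the arena's lifetime

private:
    std::byte *base_;
    std::size_t size_;
    std::size_t used_ = 0;
};
```

Backed by a static buffer (static std::byte storage[4096]; BumpArena arena(storage, sizeof storage);), this never touches the global heap and never throws.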
You’re preaching to the choir! I agree custom heap allocators can be useful. But they’re still heaps. I found your article a bit misleading because you imply you’ve found a way to run C++ coroutines without heap allocation, which you haven’t. You’ve just built your own heap.
Sorry for the misdirection! I didn’t intend to be misleading– this is an unfortunately common terminology mismatch. I’m using “heap allocation” to mean specifically the provided malloc/free/new/delete, and “dynamic allocation” to refer to the more general case of runtime memory allocation.
I do also discuss in this post why dynamic allocation is needed and how I hope its use can be avoided in the future, so I think it is still relevant to those hoping to avoid dynamic allocation completely (though I’m sad that it isn’t currently possible today).
In some cases, it’s a hard requirement: not all platforms provide a global allocator. This is fairly common in embedded systems.
But this requires you to bring along an allocator. And if you are going to bring along an allocator, why not just implement the global new and delete overloads?
The reason that I would want to avoid global new / delete on embedded platforms is that allocation is more likely to fail (because memory is tightly constrained), so I need the caller to handle the failure case, and I’m on an embedded platform that doesn’t support exceptions, so it needs to handle allocation failure via the return value.
Your article mentions this about half way down:
Allow recovering from allocation failure without using exceptions.
From the API surface, it looks as if you provide a hook in the promise object that means that allocation failure can return a constant. This is really nice, you can have a coroutine call return something like an ErrorOr<T> and then return the error value if it fails to allocate space for the coroutine.
It would probably be nice to lead with that. I had to read a lot to figure out that you’ve actually solved the bit of the problem that I cared about.
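For reference, the standard-level mechanism behind that hook (independent of Pigweed’s actual API, as far as I can tell) is a non-throwing operator new on the promise plus get_return_object_on_allocation_failure(); a stripped-down sketch:

```cpp
#include <coroutine>
#include <cstddef>

// Toy frame storage just for the sketch; a real system would plug in its own allocator.
inline std::byte g_frames[1024];
inline std::size_t g_used = 0;

struct MaybeTask {
    bool failed;   // true if the coroutine frame could not be allocated

    struct promise_type {
        // Frame allocation must be non-throwing for the failure hook to apply.
        static void *operator new(std::size_t n) noexcept {
            if (g_used + n > sizeof(g_frames)) return nullptr;
            void *p = g_frames + g_used;
            g_used += n;
            return p;
        }
        static void operator delete(void *, std::size_t) noexcept {}   // arena-style no-op

        // Called by the compiler when operator new returns nullptr.
        static MaybeTask get_return_object_on_allocation_failure() { return {true}; }

        MaybeTask get_return_object() { return {false}; }
        std::suspend_never initial_suspend() noexcept { return {}; }
        std::suspend_never final_suspend() noexcept { return {}; }
        void return_void() {}
        void unhandled_exception() {}
    };
};

MaybeTask do_work() { co_return; }
// auto t = do_work(); if (t.failed) { /* degrade gracefully, no exception thrown */ }
```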
Yes, handling the failure case gracefully is super important! RE ErrorOr<T>, Pigweed provides a pw::Result<T> which is exactly this– either a T or a pw::Status (our error code).
Coro<T> requires that T be convertible from pw::Status so that it can produce a value of T from pw::Status::Internal() in the case of allocation failure.
Custom allocators can be beneficial for reliability, too– the exact amount of memory available for the purpose of allocating a particular coroutine or set of coroutines can be fixed, and failure to allocate from that dedicated pool of memory can be handled gracefully, rather than allowing the coroutine(s) to gradually expand into memory that is dedicated for other tasks.
I’m going to flip this statement on its head for a sec. Using a fixed-sized arena allocator or something like that is great for having guarantees that you’re not going to blow up your heap. But… do we know at compile-time how big the allocations are going to be? Instead of using a fixed-sized arena, would it be possible to use something like a freelist allocator for each type that needs to get allocated? Like I could declare in my code that I’m going to assign a static allocation for 10 Coro objects and 30 frames; the allocator could return those from the freelist and return them to the freelist upon deallocation (both O(1) without needing to ever end the arena’s lifetime).
The size of the coroutine state will depend on the captured state (automatic variables at suspension points) so much like a lambda (closure type), its size is known by the compiler at compile time, but as far as I’m aware there’s no clear way to obtain it in your code at compile time, for example to be used as a template argument.
Yeah, this isn’t possible today, sadly– I discuss this a bit at the end of the post. If C++ offered a way to inspect the size of the coroutine of a specific function implementation, there would be no need for dynamic memory allocation: we could pre-reserve exactly the required amount of space. This would have huge benefits both for reliability and efficiency– no more need to guess at and adjust the right size for your allocators.
Little late getting back to this but I’ve had a little time to dig into some of the WGxx documents and I understand quite well why we can’t do this (yet anyway, hopefully someday!) For using it on an embedded system though, the HALO thing scares me a whole lot more than having to pre-reserve some space for an allocator (which also scares me less than having hidden/implicit malloc behind the scenes). On most of the smaller-sized embedded systems I work on, running out of space in a bounded arena and returning an error is a manageable thing. Changing the function that sets up a coroutine a little bit, resulting in HALO being available and significantly changing how much stack space is required… that’s spooky! At least with running out of arena space there’s well-defined error semantics. Walking off the end of a fixed-sized stack usually just means corruption.
I’m still digging a little more… it seems like there is a way to ensure HALO is disabled, but having it potentially kick in by default seems like a footgun for embedded, and we’ve got lots of footguns already!
(I’m still super intrigued though. This looks super cool!)
AFAIR the issue is that the size of the coroutine is only known by the compiler after optimization passes (as it depends on how local variables are stored, which lifetimes overlap or not, and what can be reused), but the sizeof(...) operator is typically implemented in the frontend. The compiler writers said they would not be able to implement a coroutine proposal with a frontend-known size without heroic efforts. This was discussed extensively during the design of C++ coroutines, including a competing “core coroutines” proposal appearing and ending up dying on this same issue. Some alternatives were explored, such as types that did not support sizeof(…) but could still be stored on the stack, but in the end storing the coroutine on the heap with an escape hatch for elision was deemed the best compromise.
A more general case of the example above would be if `a` were a variable-length array with `x` as the size. I would absolutely never write such a thing in embedded code (and typically not in normal C++ if there’s a chance an attacker can influence the value of `x`), but it is permitted.
Yeah I wasn’t thinking about the general case too much, just the specific example, and assuming the statically sized array was the point that seemed problematic. You’re right it’s not always possible… in most languages.
Rust does know the size of each Future, but it doesn’t let you have dynamically sized values on the stack (on stable). And using that unstable feature, it’s not possible to hold a dynamically sized value across await points (#61335) as await points are where the size needs to be known to be stored in the generated Future’s enum variant.
“Safe memory reclamation” is special jargon in this context.
If you are using RCU, hazard pointers, or other similar lock-free techniques, then a write to a datastructure cannot immediately free() any memory that was removed, because the memory can still be in use by concurrent readers. You have to wait for a grace period or quiescent period to pass before it can be freed. You don’t want to do this as part of the write transaction because that will ruin your write throughput: grace periods are relatively long and can accommodate multiple write transactions.
So you need to hang on to the to-be-freed memory somehow and reclaim it later. Sometimes the whole concurrent algorithm is easier if you guarantee type-stable memory by using a region of some kind. Sometimes you can save significant amounts of memory (a pointer per node) by using a GC instead of a to-be-freed list.
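Concretely, with userspace RCU (liburcu here is my assumption as the example library; kernel RCU has the same shape), unlinking and freeing are two separate steps, with the free pushed past the grace period by call_rcu():

```cpp
#include <urcu.h>      // liburcu's classic API; link with -lurcu
#include <cstdlib>

struct Config {
    int max_connections;
    struct rcu_head rcu;   // bookkeeping for the deferred free
};

Config *global_cfg;        // malloc'd; readers dereference it inside read-side critical sections

static void free_config_cb(struct rcu_head *head) {
    // Runs only after a grace period: no reader can still hold the old pointer.
    std::free(caa_container_of(head, Config, rcu));
}

// Writer side (assumes writers are serialized externally).
void publish_config(Config *fresh) {
    Config *old = global_cfg;
    rcu_assign_pointer(global_cfg, fresh);   // new readers now see the new version
    call_rcu(&old->rcu, free_config_cb);     // old version joins the to-be-freed list
}

// Reader side (the thread must have called rcu_register_thread()).
int read_max_connections() {
    rcu_read_lock();
    int n = rcu_dereference(global_cfg)->max_connections;
    rcu_read_unlock();
    return n;
}
```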
I’m not sure how custom allocators help here. They are not invoked until after the destructor has run, so prolonging the lifetime of the allocation using a custom allocator doesn’t help because the object is gone (you may get an arbitrary bit pattern if you try to read it). Things like RCU work fine without a custom allocator because they defer destruction, not deletion.
I don’t know how you can do things like data-structure-specialized GC or type-stable memory without a custom allocator. You don’t need those techniques for RCU (RCU was an example to set the scene) but they are sometimes useful for concurrent data structures. I also dunno how this would fit with the C++ facilities for custom allocators — I guess you have to avoid destruction until the memory is about to be freed (especially for type-stable memory)
Type-stable memory can be done with custom allocators, but it’s often better to do it with a class overriding operators new and delete. You can use CRTP to implement a superclass that provides a custom memory pool for each type.
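A rough sketch of that CRTP shape (illustrative only, and single-threaded for brevity; a concurrent data structure would need the free list itself to be lock-free or locked):

```cpp
#include <cstddef>
#include <new>

template <typename Derived, std::size_t Slots = 64>
class TypeStablePool {
public:
    static void *operator new(std::size_t size) {
        Pool &p = pool();
        if (size > sizeof(typename Pool::Slot) || p.free_head == nullptr)
            throw std::bad_alloc{};
        auto *s = p.free_head;
        p.free_head = s->next;
        return s;                          // this memory only ever holds Derived objects
    }
    static void operator delete(void *ptr, std::size_t) noexcept {
        Pool &p = pool();
        auto *s = static_cast<typename Pool::Slot *>(ptr);
        s->next = p.free_head;             // slot goes back on this type's free list
        p.free_head = s;
    }

private:
    struct Pool {
        union Slot {
            Slot *next;                                               // link while free
            alignas(Derived) unsigned char storage[sizeof(Derived)];  // payload while live
        };
        Slot slots[Slots];
        Slot *free_head = nullptr;
        Pool() {
            for (auto &s : slots) { s.next = free_head; free_head = &s; }
        }
    };
    static Pool &pool() {
        static Pool instance;              // one dedicated pool per Derived type
        return instance;
    }
};

// Usage: struct Node : TypeStablePool<Node> { int key; Node *next; };
// `new Node` / `delete node` now draw from Node's own type-stable pool.
```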
I could be wrong but my recollection is that they use alloca which allocates arbitrary stack space instead of using actual allocators (which typically involve a system call in the worst case).
We have two main groups: this one focused on a more classic approach to writing embedded software, and another focused on using the chip with asynchronous Rust and the Embassy framework.
Can someone explain the difference between the “classic” and async/Embassy approach?
I suspect it is sync vs async, but that the classic approach would be higher level, since it presumably has an embedded RTOS underneath it providing the sync abstraction, while Embassy has a thinner runtime and leverages Rust’s async to support multitasking.
So how does one do multitasking with classic? Like how do you run a bluetooth or wifi module while also doing the other things your application needs to do?
Independent pieces of hardware do stuff in the background when you’re not interacting with them, so you either need to check in every so often manually, or use interrupts. A common pattern is to have a big loop that goes around doing everything that needs to be done, and then sleeps when there’s nothing to do, to be woken up by an interrupt or something else. You might write a bunch of state machines that you drive forward when the correct events come in.
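In other words, something shaped like this hand-rolled superloop (bare-metal sketch; the IRQ handler names, the step functions and __wfi() are stand-ins for whatever the vendor SDK provides):

```cpp
volatile bool uart_rx_ready = false;    // set by the Wi-Fi module's UART interrupt
volatile bool tick_elapsed  = false;    // set by a periodic timer interrupt

extern "C" void UART_IRQHandler()  { uart_rx_ready = true; }
extern "C" void TIMER_IRQHandler() { tick_elapsed  = true; }

void wifi_state_machine_step();         // drives the module's protocol forward (assumed)
void application_step();                // sensors, UI, business logic (assumed)
extern "C" void __wfi();                // "wait for interrupt" intrinsic (assumed)

int main() {
    for (;;) {
        if (uart_rx_ready) { uart_rx_ready = false; wifi_state_machine_step(); }
        if (tick_elapsed)  { tick_elapsed  = false; application_step(); }

        // Sleep until the next interrupt. Real code guards this check-then-sleep
        // sequence against the race where an interrupt fires between the checks
        // and the wfi, typically by briefly disabling interrupts around it.
        if (!uart_rx_ready && !tick_elapsed) __wfi();
    }
}
```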
That’s why async is so exciting for embedded. You can write what looks and feels like abstracted task code for an RTOS, but it compiles into a fairly predictable event loop with state machines, just like what many people write by hand.
That makes sense. When I’ve dabbled with embedded programming the “classic” approach usually involved pretty low-level management of network peripherals—you had to know the state machine for each peripheral and how to handle its events and then compose that state machine with the state machines of other peripherals and your application and it was always really painful especially if you misunderstood something about a peripheral’s state machine. Embassy looks more or less like what I’ve been wanting—I can’t wait to try it out.
A few of my own patches made it into this release!
Modernised Hypervisor.framework (HVF) API usage on x86-64 macOS hosts, which gives double-digit percentage point performance improvements on most workloads.
Minor fix to handling of translucent mouse cursors via VNC.
Unfortunately a bunch of others so far haven’t been accepted, I’ll try again for the 9.2 cycle:
x2apic support when using HVF, which yields double-digit percentage perf gain for VMs with more than 1 core. (Patch has been entirely ignored in the mailing list despite being very simple.)
3D graphics acceleration with macOS guests. (Rejected because you currently can only use it with x86-64 guests on x86-64 hosts; suggestion was I include patches which will allow arm64 macOS guests to boot in Qemu. I’m currently stuck on XHCI not working correctly with those due to a suspected interrupt issue. If anyone knows how MSIs are supposed to work with PCIe devices on arm64 I’d love to know.)
Passing through guest pointers to native host mouse cursors on macOS hosts using the Cocoa UI. (Diverging opinions on whether this is desired behaviour.)
I’ve also been working on support for HVF’s in-kernel APIC support, but that turns out to be buggy, so I’m currently trying to figure out if there’s some baseline feature set that does work reliably…
I don’t quite get the appeal of the textual prompt about discarding the key. Overlay text breaks the immersion to some extent. If it really is useless now and there’s some emotional backstory, surely the player character could for example convey the emotion through gesture? Shake their fist and fling the key away? (I don’t really go for horror games, so haven’t played RE2 and don’t know what sense of relief the author is alluding to here.)
I liked Far Cry 2’s approach to the map. The first-person player character literally pulls out a map from somewhere and holds it in his¹ hand alongside the handheld GPS device. The interesting bit here is that the game keeps going on around you. You can run around while holding the map, even though it’s obscuring most of your field of view. If you encounter baddies (or they encounter you) while you’re staring at the map, they’ll quite happily start shooting you while you’re still fumbling to grab your weapon.
The same mechanic works while you’re driving too. Much as driving while holding up a map in one hand is rather hazardous in real life, the game tends to punish you for doing this as well, as tempting as it frequently is.
The rest of the HUD is similarly understated and disappears most of the time, though a health bar and number of health syringes will show up when you’ve taken damage. (And IIRC the screen will momentarily be tinted red when taking a hit.)
Later games in the series started cluttering the screen a lot more unfortunately. (They also undeniably tidied up some of the more unpolished game mechanics, but in my opinion never quite recreated the atmosphere of danger that Far Cry 2 conveyed. Nor the moral ambivalence.)
¹ Unfortunately, all playable characters are male, even though key NPCs you meet in the game will be chosen from the pool of player characters plus a woman who for some reason is not playable.
the relief from the player character comes from the fact that in RE/2 you have very limited inventory space. Even just carrying a key around is a whole ordeal, and when you open a door and still have to hold onto the key, that means that you have new challenges ahead, and you don’t even get an extra inventory slot (which you could use to, for example, pick up healing items).
There are absurd situations you can end up in where you have no free inventory slots, no healing items, and are hurt, and there is a healing item right in front of you that you can’t use. It’s fodder for a lot of jokes, but it’s also an amazing source of tension.
None of this really makes me a better programmer, but it’s fun to think about.
I love those sorts of things, depending on the game. Helldivers has an interesting middle-ground to this; there’s a HUD, but it’s quite minimal and digging for more information tends to have real consequences. Your weapon has no targeting reticle, unless you actually aim down the sights of it. The ammo counter is a numberless “full/kinda full/empty” bar, unless you bring up the context menu that lets you switch weapon modes and such; only then does it say “23/30 rounds”. You have a local minimap, but zooming out on it to see the full map means your character literally looks down at the PDA strapped to their arm and stops aiming their weapon at the horrible bug-monsters coming to claw out their eyeballs. In general you can be fighting for your life or figuring out what the hell you’re doing with your life, but not both.
They do a good job of using semi-artificial inconveniences like that to add to the gritty dystopia of the game, and to make the decision making and frenetic pacing more hectic. This now makes me wonder what the game might look like if even more of the information in it were implicit rather than given in the HUD.
By the way, the term that describes a UI element like, “You pull a map out and the game doesn’t pause” where it’s internal to the world is “diegetic.” I’d go so far as to say that diegetic UI elements are always superior to virtual ones. The post makes mention of how Isaac’s suit in Dead Space shows his health on the spine, which is also a diegetic UI element.
diegetic UI elements are always superior to virtual ones
I think “superior” is doing a lot of work in that sentence. Diegetic elements are always more immersive, but not every game is about being as immersive as possible.
Imagine if, in real life, you could have a map that stopped time when you used it and filled your entire field of vision; you would probably find that a lot more useful than a paper map! So I don’t think it’s as simple as saying diegetic is better every time.
Diegetic doesn’t always mean “time doesn’t stop when you look at a map” so much as, “the map is a fundamental part of the world rather than an arbitrary menu.”
This blog announces or reviews a ton of SBCs and micro-PCs, but I thought this one in particular was interesting and might be newsworthy here, as it’s basically “Raspberry Pi 5 but it’s x86, can run Windows, and btw supports fast SSDs.” A cute detail is that there’s an RP2040 chip on board to drive the GPIO pins.
That’s the usual way to upload firmware to an RP2040 and not specific to this board. As far as I’m aware this functionality is part of the bootrom inside the RP2040 and completely hardwired.
(Incidentally, I’ve previously used an RPi Pico as a kind of GPIO breakout board in an x86 SFF PC based custom appliance as well. They are so small, and the decent USB support makes it a good choice.)
In-line as in, it shows up as a USB mass storage device connected to the x86 computer? Yes, that’s what I’d expect. The RP2040’s USB port must surely be wired directly to a USB 2.0 host port on the x86 SoC, so it’ll show up as whatever type of USB device the code running on the RP2040 exposes. In the case of the naked bootrom with BOOTSEL held, this is a USB MSC. With the stock firmware, it would presumably show up as some kind of GPIO-controlling USB device. (HID?) That’s assuming it even ships with any kind of useful firmware on the RP2040. (The “documentation” link on Radxa’s X4 page seems broken, I’ve not researched beyond that.) With TinyUSB, you can program it to expose whatever USB interface you like.
The tricky bit would be resetting or power cycling the RP2040 in isolation without having to power down the host SoC. The “BOOTSEL” button on an RPi Pico normally only has an effect during very early startup, before the bootrom has loaded the current firmware. Perhaps the GPIO connector exposes the RUN pin (30) which can be used to hard reboot an RP2040. (On my SFF PC based build I run both BOOTSEL and RUN contacts to externally accessible push buttons for exactly this reason.) Another option is to have a software reset trigger in your RP2040 firmware. This’ll only work if you didn’t make any mistakes in coding your firmware. (Now who would do such a thing. 😅) Otherwise you’d have to power cycle it, which presumably means power cycling the whole board.
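The software-reset route can be as small as this (pico-sdk sketch, just my guess at how you’d wire it; the magic byte is arbitrary):

```cpp
#include "pico/stdlib.h"
#include "pico/bootrom.h"

int main() {
    stdio_init_all();                        // USB CDC console towards the host SoC

    for (;;) {
        int c = getchar_timeout_us(1000);    // poll the console for a command byte
        if (c == 'B') {
            // Reboot straight into the ROM's UF2 bootloader: the RP2040
            // re-enumerates as a mass-storage device without the host
            // having to power-cycle anything.
            reset_usb_boot(0, 0);
        }
        // ...normal GPIO-servicing work goes here...
    }
}
```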
I received an email saying that I was eligible for a new crypto offering that was targeting open-source developers. They claimed they had checked my GitHub username, and that I was in. They offered $200 to do it for me, all I needed to do was send them the private SSH key associated with my GitHub account, so they could claim the reward. One important detail is that it had to be the private SSH key associated with my account at the time the contract was added to the blockchain, so I couldn’t just make a new temporary one.
I investigated it and the crypto distribution was real. I was able to claim it myself, and I sold the tokens as soon as I was able, for almost $3000.
I did! But by doing it myself I didn’t have to send anyone my private SSH key, I just had to sign a challenge token with it, which they verified with the public key that they had collected from GitHub.
The process was still sketchy, they had a Docker container that you would have to run. The container would read the private SSH key (which had to be mounted to the container) and run a program to generate the response token. They did recommend running the container without a network attached, and I turned off my wifi just in case as well. The container would die after printing out the response token, so I think it was pretty safe.
That’s pretty nuts. I have to ask: what’s in it for them? And why the super unsafe delivery if it’s some altruistic cause?
I’ve been receiving those emails as well and assumed they were scams aimed at injecting malicious code into my github repos or exfiltrating code from private repos. I have zero practical experience with cryptocurrency, so I guess I’ll let this “opportunity” slide as it sounds like more hassle and risk than it’s worth even if it’s not a scam.
I think they gain by creating traction among developers, that would be my guess. And since they created the token themselves, they’re not losing money by giving it away; it’s all fictitious anyway. :)
They seem to assume that the only reason packets ever drop is that the link is full of traffic. Besides the fact that dropping packets in collisions is part of Ethernet, you will get dropped packets sooner or later any time you use wifi, cellular internet, damaged/degraded wired networks, long-distance wired networks, or wired networks that wander near nasty sources of interference. So, you know, always. TCP does a great job of covering this up when a) latencies are low, and b) ~99% of packets still get through. (Tried to find the article that graphed this but I can’t; I think it was from Google or Cloudflare?)
QUIC is pretty good, definitely use it in 99% of the places you’d normally use UDP. But the question is not “good data” vs “bad data”, it is “mostly-good data now” vs “perfect data maybe half a second from now”. Especially for video telecoms and stuff like that, a garbled frame or two and some weird distortion for a few seconds is better than someone just cutting off and waiting a painful few seconds for TCP and the video/audio codec to get their collective shit together before you get to say “sorry you cut out there, can you repeat that?”
Modern wireless standards do an impressive amount of forward error correction and so will provide an abstraction that looks like a reliable link even with a lot of frames being dropped. This is critical for satellite links, where a retransmit is hundreds of ms of latency, but is increasingly important for WiFi at high data rates. The faster your link speed, the more data you need to buffer. 802.11bn is supposed to scale to 100 Gb/s. If it takes 1 ms to retransmit, you’d need a 100 Mb buffer to avoid slowdown for a single dropped packet (or you’ll have a smaller window and so get back pressure and slow down). If you have enough error correction, you just get the frames arriving slightly out of order once you’ve received enough to reconstruct the dropped one.
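To spell out the arithmetic behind that buffer figure (just a sanity check of the claim above, nothing more):

```cpp
// Back-of-the-envelope check: bits in flight during a retransmit window is
// simply link_rate × retransmit_delay.
#include <cstdio>

int main() {
    constexpr double link_bps = 100e9;  // 100 Gb/s (projected 802.11bn rate)
    constexpr double retx_s   = 1e-3;   // 1 ms to notice the loss and retransmit
    constexpr double bits     = link_bps * retx_s;
    std::printf("buffer needed: %.0f Mb (%.1f MB)\n", bits / 1e6, bits / 8e6);
    // Prints: buffer needed: 100 Mb (12.5 MB)
}
```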
I think modern wireless protocols dynamically adapt the size of their error correction codes when they detect loss, so your speed just gracefully degrades in most cases, but you don’t see huge latency spikes.
Interesting, isn’t that wasteful when a higher layer is ok with some packet loss? But OTOH if the higher level is doing its own retries, and the least reliable link is the first one, it makes sense to retry early where the latency cost is lower.
Maybe the link should do a small number of retries, to get a good balance of latency and reliability.
Not a deep expert on this, but I seem to remember TCP assumes packet loss is down to bandwidth limitation or congestion, and will respond by reducing the transmission rate right down, only slowly ramping back up. The typical packet loss scenario in WiFi is that there's some momentary interference, or the geometry of the link has changed, so the peers need to switch to less sensitive QAM encodings, etc. Until that happens, a whole batch of packets will have been dropped, which TCP interprets as catastrophic bandwidth reduction even if the wireless link recovers very quickly. So the higher layer doesn't cope very well with the specific pattern of loss.
The conclusion of the article surprised me: it recommends a fairly complicated collection of mechanisms. I thought QUIC was going to get support for lossy streams so that it could avoid creating a backlog of wasted work when the network conditions suddenly change.
I try hard to keep an open mind and not be a video-hating old fogey, but this one really would be so much better as a blog post. No distracting fast moving footage, pumping music, and the ability to move at my own pace. If it really must be a video, then at least make it in a “boring” lecture/conference presentation style. (You can still show off your creation and show in-game footage between the explanations.)
But then I’m not really sure what the intended target audience is supposed to be… on the one hand, it moves at a pretty fast pace through some intricate assembly language sequences and mathematical expressions, but on the other hand the narrator feels the need to clarify what SIMD is.
I think the audience is supposed to not care about the actual instructions, it’s more like “look at this cool thing”
The accompanying blog post is more useful for learning
Apple oscillates a bit here. They either do nothing, point FreeBSD committers at patches they may want, or actively push things. They’ve never been particularly consistent about it and it changes over time. They funded the TrustedBSD MAC work, for example, which landed in both FreeBSD and XNU and provides the kernel bit of the iOS / macOS sandboxing framework. In contrast, they extended all of their core utilities to support -h and -H consistently for decimal and binary SI prefixes and didn’t upstream those. They offered to relicense some bits of Darwin that we were considering bringing into the FreeBSD base system.
There’s often a reasonable overlap between folks in the Apple CoreOS team and FreeBSD committers, which helps. I wouldn’t take this as a shift in Apple policy to treating FreeBSD explicitly as an upstream as they do with LLVM, though that would be very nice.
They funded parts of TrustedBSD, but they've been deprecating its subsystems on macOS since at least 2019. For example, OpenBSM has been deprecated since 2022, and the MAC Framework since 2019.
As I understand it, their sandbox framework is built on top of the MAC framework. They have largely removed support for third-party kernel modules, which makes it impossible to use the MAC framework’s lower levels, only their higher-level abstractions.
It’s a shame that they never merged Capsicum, since it was designed to support things that look like their sandboxing model.
They have built something on the MAC framework (e.g. EndpointSecurity is currently built on it, for the time being), but the thing is that Apple didn't allow third parties to use it even long before kexts were deprecated. In fact, developers started to use it when Apple released the MACF headers by mistake, not because Apple considered MACF to be a valid kernel API.
I’m not sure what point you’re trying to make here. Indeed, Apple never officially considered the MAC Framework a public API in macOS/XNU, so why do you consider it deprecated since 2019? And what does that have to do with Apple’s FreeBSD contributions? MACF was never supported on macOS in the first place. In fact, breaking API/ABI changes were frequent long before 2019. They just started cracking down hard on kexts in general in 2019.
The point is that even if Apple funded the creation of the subsystems mentioned above, it's not possible to use them on their platform, because they're either deprecated and will be removed from the OS altogether, or were never allowed to be used by third parties. OP's comment suggested that Apple funded them and that they're available to be used, and that's not true today for OpenBSM, and it was never true for MACF.
I never said (or intentionally implied) that Apple would expose or intend something for third-party developers on XNU just because they funded its development in FreeBSD. I did suggest that, if they’re in the open source bits of Darwin then it’s often possible to pull fixes from there into FreeBSD, but that’s very different.
So Apple's monopolistic behavior is bad, right? None of this would happen if the connections didn't require a physical licence key in the connected device.
I think a manufacturer can legitimately require some sort of minimal technical compliance for accessories, especially as they can affect battery life (as in this case). The products stating they are compatible with Lightning are just committing fraud.
As to the “monopolistic” behavior, there are other device manufacturers that don’t use Lightning.
It’s not strictly a classical monopoly, in the sense that other people also make phones that don’t have this bullshit – but it is emphatically anti-competitive and flagrantly anti-consumer. We used to just have a literally standard headphone jack but obviously that had to go to make room for a whole new line of credit for Apple. If it wasn’t a racket it would be USB-C by now.
There is a new sort of monopoly that we need to address, though, and which people are beginning to try to address – it affects both add-ons for what should be standard ports like this as much as it affects app stores with technological measures to ensure you can’t skirt around the king’s wishes on his land. It’s essentially like a monopoly but on the third of customers (or whatever it is now) that have an iPhone, allowing Apple to collect all sorts of rents. It’s like Comcast wanting to charge Netflix (and anyone else) for peering as well as charging their customers for that same data – good work, I suppose, if you can get it!
There’s more nuance here than people usually want to admit. Apple had already switched the rest of its hardware lineup to USB-C, and in one case (laptops) actually was getting massive market pushback to return to putting at least a couple non-USB-C port types in.
For the iPhone specifically, I believe they were pretty clear when the Lightning connector came out (USB-C was still in draft/development at the time) that they intended to stick to it for at least ten years, because they didn’t want to switch and then immediately switch again, causing a bunch of needless churn and obsolescence and e-waste. The EU, coincidentally, passed its USB-C mandate nearly exactly ten years after the Lightning connector debuted, and Apple switched to USB-C the very next year.
Anyway, it amuses me that people deployed the same arguments to criticize Apple for switching to USB-C in the rest of its lineup (greedy! just trying to force us to buy their dongles/accessories!) and to criticize for not switching the iPhone over at the same time (greedy! just trying to force us to buy their dongles/accessories!).
Amusing perhaps, but not unreasonable I think. Timing was very different in the two instances. Switching laptops to USB-C only in 2016 was very much ahead of the curve. Adapters were pretty much essential at that time. Now, most displays have USB-C inputs, most peripherals come either with both cables or in 2 variants, etc. Replacing the 2 Thunderbolt 2 ports with Thunderbolt 3/USB-C and keeping the USB-A ports for the first 2 years would likely have offended fewer people. (Plus, I seem to remember there being fast and slow ports on some models, and thermal issues on some ports as well.)
Apple were pretty much the last manufacturer to switch their phones to USB-C ports, however. I don't think it's particularly unreasonable to criticise them for it.
Switching laptops to USB-C only in 2016 was very much ahead of the curve. Adapters were pretty much essential at that time. Now, most displays have USB-C inputs, most peripherals come either with both cables or in 2 variants, etc.
And what, I wonder, might have created an impetus in the market for companies to start offering USB-C? Perhaps the existence of popular laptops with USB-C ports…
Plus, I seem to remember there being fast and slow ports on some models, and thermal issues on some ports as well.
This is mostly down to the fact that USB-C requires a bunch of different use cases, functions, and feature sets to all share a single connector/port form factor. Apple didn’t maliciously impose that on the industry.
Apple were pretty much the last manufacturer to switch their phones to USB-C ports, however. I don't think it's particularly unreasonable to criticise them for it.
Again, my understanding is that when they introduced the Lightning connectors they said they’d stick to them for at least ten years as an anti-waste measure. And they did that. The narrative that Apple was somehow finally forced, unwilling, into USB-C by EU law after being a holdout is just completely manufactured out of thin air.
It doesn't even need to be "slow internet." On a recent trip to Australia, I noticed how large parts of the web felt considerably slower than in Europe, even on a nominally good connection (<30 ms latency to major CDNs, >100 Mb/s bandwidth). Services just aren't built with latency in mind anymore.
Yep, I don't think most US-based tech workers are aware of the latency a lot of the world experiences making requests to US-based services. It's always a delight for me when I use a service hosted in Australia, and then I remember that that is the normal experience for US-based folks rather than the exception.
There is one oddity with this otherwise ideal display though, which is that the corners are rounded (the top corners by 3mm and bottom corners by 1mm). This is because we repurposed and customized a panel that was originally designed for another company.
Any guesses which device these panels originated from? This description doesn’t seem to match my MacBook display.
I have no inside information, and I'm not sure about the rounded corners, but the only other laptop I could find within a minute or two of searching that has a 2880x1920 13.5″ display is the Lenovo ThinkBook 13x IMH G4 (or rather, some configurations of it have such a display).
Hm, something does not add up for me. It seems that the core thesis is that an ISA is an interface, and that in this area the interface doesn’t matter all that much and the results are defined by the quality of implementation.
But then, we have aarch64 Apple desktop computers which, as far as I understand, still basically run circles around x86_64 desktop computers. I think originally people explained the performance gap by timing: Apple was first to jump on TSMC's 5nm process. But that was several years ago, right? So, by now, any timing advantages should have disappeared. Where's the inconsistency here?
Is it that today aarch64 and x86_64 are actually competitive on desktop?
Or is it that Apple just tries harder than Intel & AMD to make a CPU?
Or maybe it's the ISA in the end, but not RISC/CISC in particular?
From my (only partially informed) perspective, the main genuine ISA advantage aarch64 seems to have is how easy it is to decode in parallel. In particular, the M-series chips all use an 8-wide instruction decoder, while the widest x86 chips have only recently achieved 6 parallel instructions.
The “running circles” part is incidentally only true for efficiency; Zen4 and Golden Cove both beat Apple’s core designs in raw performance on most metrics, at the cost of higher clock speed and energy consumption. It’s hard to say where exactly AMD and Intel would end up if they focused more on efficiency than the top spot in the raw performance benchmark table. Likewise, it will be interesting to see if Apple starts diverging its core designs between A series versus M series chips. Right now they seem to be essentially identical.
While it might be true, I don’t think “AMD and Intel only care about raw performance” has much explaining power when it comes to describing the lack of efficient x86 chips. Not only is it conspicuous that AMD and Intel have enormous incentives to care about efficiency by way of several important efficiency-sensitive markets (data center, mobile, tablet, laptop) that they are neglecting in favor of some other market (which?), but everyone else who is building an efficiency-oriented chip steers clear of x86. When Amazon wanted to build an efficient data center CPU, they built on ARM despite that an x86 would have been far more compelling all else equal (they and their customers wouldn’t have to deal with a change of architecture). The same was true for Apple and for Google and so on.
There’s certainly something about x86 that seems less compelling from an efficiency perspective.
Not a hardware expert. My understanding, informed by things fabian giesen and maynard handley have written, is that, broadly speaking, it’s vertical integration: apple made tradeoffs that improved power efficiency significantly at the cost of a small amount of performance (see for instance the tlb design, which is quite clever). In particular, intel targets a ~5-6ghz frequency for its cpus, where apple targets ~3ghz; this means apple gets twice as much time on each cycle. But then apple has a definite advantage in context of a laptop, where the cpu is only going to be running at 3ghz anyway.
Instruction decoding is a thing, as the sibling says. Intel seems to have recently found a strategy for scaling up parallel decode, the details of which I have forgotten. Beyond that, the main thing that comes to mind as potentially inhibiting x86 performance is the concurrency model. Enabling tso on apple cpus costs about 10%; this is likely an upper bound on how much it hurts intel, so I would expect the actual cost of tso to intel is much less than 10%.
Instruction density is sometimes touted as an advantage of x86, but it’s not all that true. x86 can be denser than aarch64, but it’s much less dense than it could be.
Good thinking on TSO and memory model. I seem to remember, during the initial phases of the arm64 transition for Mac, Apple engineers were saying the ObjC (atomic) refcounting operations on arm64 were taking single digit nanoseconds (7-9?) while the same operations were taking around 42ns on the fastest Intel Macs. OK, this is of course an extreme microbenchmark, but it does all quickly add up.
If I’m reading the Intel docs right, this is another big flaw in the ISA.
AFAICT atomic increment must have the lock prefix, which also implies a full memory barrier. So cloning an object forces cache flushing on unrelated data on Intel, whereas ARM allows you to do an atomic increment with memory order relaxed, causing contention only on the counter object itself.
Read-modify-writes are somewhat different, and those are indeed significantly more costly on x86 (though they have gotten cheaper, down to ~10 cycles on the most recent uarches, I think, from ~20? Compare with 3 on Apple). But they should also be fairly rare (strewing atomic reference count updates all over your code is poor design). I'm talking about the cost of ordinary loads and stores, where the difference should be much less.
You may consider it bad design, but Objective-C’s ARC (and presumably to some extent Swift’s built-in object system) will insert refcount operations all over the place, and this accounts for a lot of code running on Apple devices. Yes, the compiler uses linear types to try to erase unnecessary retain/release pairs, but this only works within a method (or across inlined function calls). I’ve definitely encountered performance critical code where refcounting started dominating runtime, and we either had to locally disable ARC or (where possible) move data structures to flat C (or C++).
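To make the refcounting point concrete, here's the usual split of memory orderings between increment and decrement; this is an illustrative sketch of the generic shared_ptr-style pattern, not Apple's actual ARC runtime:

```cpp
// The increment only needs atomicity, not ordering, so memory_order_relaxed
// is enough; on ARM that maps to a plain atomic add, while x86's LOCK-prefixed
// add is always a full barrier. The decrement needs acquire/release so the
// destructor observes all writes made while the object was alive.
#include <atomic>

struct RefCounted {
    std::atomic<int> refs{1};

    void retain() {
        refs.fetch_add(1, std::memory_order_relaxed);
    }

    void release() {
        // acq_rel on the final decrement orders prior writes before deletion.
        if (refs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            delete this;
        }
    }

    virtual ~RefCounted() = default;
};
```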
I’ve put down fibre across various runs in our house and home office, for fast access to local network storage rather than a high speed internet link. I didn’t want to mess with splicing, so I opted for FS’s MTP cables and panels. You still have to get the MTP connector through the conduit, but it’s significantly smaller than say an RJ45 plug, and wasn’t an issue in practice. All in all, I’ve had less trouble with the fibre than with multi-gigabit copper ethernet, which I’ve found to be rather fussy.
I don't think it's true that only those who built it can understand it, but the effort required to understand a legacy codebase from scratch & safely make changes is enormous, and this problem affects FOSS as well. I've been dealing with this for the TLA+ tools - specifically the parser - which when I joined the project was a pile of 20+-year-old Java code with everybody who touched it gone from the project for a decade or more. Past a certain point the code ceases to be source code in some sense - people will only deal with it at the API level and everything within is indistinguishable from a binary blob that cannot be changed. The process of shedding light onto that part of the codebase required writing over 300 round-trip parse tests to semi-exhaustively document its behavior, and even with that monumental effort I still only really have a handle on the syntax component of the parser, let alone the semantic checker. But that isn't all. You may have developed a mental model of the codebase, but who is going to review your PRs? It then becomes a social enterprise of either convincing people that your tests are thorough enough to catch any regressions or giving them some understanding of the codebase as well.
Compare that with being the original author, where you basically have total ownership & can make rapid dictatorial changes to a component often without any real code review. The difference in effort is 1-2 orders of magnitude.
Then consider the scenario of me leaving. Sure all the tests I wrote are still there, but do people have a grasp of how thorough the test coverage is to gauge how safe their changes are? I would not be surprised if it took five years after me leaving for basic changes to the parser to happen again.
The only thing I was trying to say is that “only original author can fully understand that” becomes industry’s self-fulfilling prophecy, creating a feedback loop between people not trying to read others’ code (and not giving the feedback that it lacks some background information or clear structure), and people not thinking of their code as a way to communicate everything they know, because “nobody will try to read it anyway, the important thing is that it works.”
It manifests in many things, including the changed stance on code reviews, where "you left a lot of comments" starts to be universally seen as "you are nitpicking and stalling the development," which disincentivizes those who are really trying to read the code and comment on the things that aren't clear enough or lack an explanation of the non-obvious design choices.
Okay, I'll take the alternate stance here. I worked on the back end of a large AAA video game that was always online. I worked on it for roughly 6 years before I moved to another company.
I had very good documentation and very clear objectives. It was very simple infrastructure, as simple as it could be made. The "why" of decisions was documented and woven consistently into the fabric of the solution.
I hired my successor into my new company, expecting him to have experience with the same problems my original infrastructure had sought to solve.
He didn't. He didn't learn how or why certain things were the way they were. My expectation that he would be able to solve problems I had already solved, because he would have had experience with them, was completely incorrect.
Had the system failed catastrophically, he would have been unable to fix it, and that was not discovered even after he had worked there for three years.
There are levels of understanding and documentation is variable, but there are almost always some things that don’t make it into documentation. For example, the approaches that you discarded because they didn’t work may not be written down. The requirements that were implicit ten years ago and were so obvious that they didn’t need writing down, but which are now gone, may be omitted, and they influenced part of the design.
With enough archeology, you often can reconstruct the thought processes, but that will take enormous amounts of effort. If you were there (and have a good memory), you can usually just recall things.
This is all true, of course.
The problem (for me) is that people start taking those contextual truths and applying them unconditionally to any situation. Like, even without looking frequently, “I wouldn’t even start to try reading through the module (where choices of approach and limitations might be visible in code or well-documented); I’ll treat it as a black box or delegate it to the module author, regardless of the current organization structure.”
The situations I am quoting in the previous comment (“who knows how this constant is used?” in chat, regardless of the fact that the constant is used once in a codebase, with a clear comment why and what’s the meaning) are all real and somewhat disturbing. Might depend on the corner of the industry and the kind of team one is working with, of course.
I completely agree with the second half of your post. I might just be a grumpy old person at this point, but the mindset seems to have shifted a lot in the last twenty years.
For example, back then there was a common belief that software should run on i386 and 64-bit SPARC so that you knew it handled big vs little endian, 32- vs 64-bit pointers, strong vs weak alignment requirements, and strong vs weak memory models. It also had to run on one BSD and one SysV variant to make sure it wasn’t making any assumptions beyond POSIX (using OS-specific features was fine, as long as you had fallback). This was a mark of code quality and something that people did because they knew platforms changed over time and wanted to make sure that their code could adapt.
Now, I see projects that support macOS and Linux refusing FreeBSD patches because they come with too much maintenance burden, when really they’re just highlighting poor platform abstractions.
Similarly, back then people cared a lot about API stability and, to a lesser degree, ABI stability (the latter mostly because computers were slow and recompiling everything in your dependency tree might be an overnight job or a whole-weekend thing). Maintaining stable APIs and having graceful deprecation policies was just what you did as part of software engineering. Then the ‘move fast and break things’ or ‘we can refactor our monorepo and code outside doesn’t matter’ mindsets are common.
That seems like a meta-problem that’s orthogonal to the original article’s thesis. It strikes me as an instance of the H L Mencken quote, “For every complex problem there is a solution which is clear, simple and wrong.”
I’m not sure the overall attitude has changed over the years. I suspect the nuance required for dealing with the problem of software longevity and legacy code is something that is currently mainly learned the hard way, rather than being taught. As such, many inexperienced practitioners will lack the awareness or tools to deal with it; combined with the rapid growth and thus younger-skewing demographics of the industry, I guess it means those with the requisite experience are in the minority. But has this situation really ever been different?
In any case, none of this is an argument against the thesis of the original text - you can certainly argue it’s a little vague (possibly because it’s a short excerpt from a book) and perhaps overly absolutist. (I’d argue the extent of the problem scales non-linearly with the size of the code on the one hand, and you can to some extent counteract it by proactive development practices.)
FWIW, as a contractor/consultant, I’d say the majority of my projects over the last years have been of the “we have this legacy code, the person/team who wrote it is/are no longer around” kind to some degree. My approach is definitely not to assume that I will never understand the existing code. In fact, I have found a variety of tactics for tackling the task of making sense of existing code. Again, I suspect most of these are not taught. But all of them are much less efficient than just picking the brains of a person who already has a good mental model of the code and the problem it solves. (It is fiendishly difficult to say with any reliability in retrospect whether it would have been cheaper to just start over from scratch on any such project. I do suspect it can shake out either way and depends a lot on the specifics.)
I agree with your primary criticism–it is certainly true that software can be understood without the original creators.
However, your assessment of what will happen is very optimistic. It is entirely possible that what will happen is that new programmers will be brought in. They will only have time to make basic bug fixes, which will be kludges. If asked to add new functionality, there will be copy-paste. When they do try to buck the trend of increasing kludges, they will break things because they do not fully understand the software.
So I agree, any software should be understandable, but it will take investment in rebuilding a theory of how it works, and rewriting, or refactoring the software to make it workable for the new programmers. This will only happen if management understands that they have a lump of poorly understood software and trusts the developers to play the long game of improving the software.
The optimism is really just extended pessimism: I claim that, if you keep doing that, at some point all changes will break more than they fix, and either someone will take a hatchet to it or it will have to be abandoned.
It’s not that far off, only a little exaggerated. Yes, you can understand code you didn’t write, but you can’t understand it in the same way as one of its authors, until you’ve rewritten a chunk of it yourself. Yes, a team (or a solo developer) can maintain inherited software, but they’re going to have an adjustment period in which they’ll be inclined to “bolt-on” or “wrapper” solutions because they have trepidation about touching the core code. And it’s fair to say that that adjustment period ends, not after some period of staring at the code, but after making enough changes to it — not only that some part of it becomes their own, but that they run into enough challenges that the constraints that shaped the existing code start to make sense.
I wish I’d thought of this in my first comment, but the article is basically a long-winded way to say “the worst memory is better than the best documentation”. I’ll just leave that there.
I can believe this happens sometimes but I don’t think it’s necessary. I’ve picked up legacy projects and within days made changes to them that I’d stand by today. Codebases take time to learn, and working on them helps, but finding one’s way around a new program, figuring out why things are the way they are, and building an intuition for how things should look, are all skills that one can develop.
Anyway I think even your version of the point largely refutes the original. Learning by doing is still just learning, not magic. In particular it doesn’t require an unbroken chain of acculturation. Even if the team behind some software all leaves at once, it’s not doomed.
I would also argue that in some cases the original authors of a program hold it back. The constraints that shaped the existing code aren't always relevant decades down the track. Some of the authors will simply be wrong about things. Removing the code from most of its context can be a good thing when it allows the project to go in a new direction. Also, especially for code that's difficult to maintain… the original authors are the reason that is so—and as long as the chain of first-generation programmers remains intact, the path of least resistance to full facility with the code is to be trained to think like them. Breaking that local maximum might not be the worst thing.
Perhaps the problem with churn is that it’s not a clean break. You get an endless stream of second-generation programmers who try to build in the image of what came before, but always leave before they achieve mastery. I dunno.
I think it’s very accurate that the founders and early employees have the deepest knowledge of the system though. Yea, new people can come in and learn it, but it’s never quite to the same level. Anecdotally of course.
My mental model of this is that, roughly, the money you pour into a paid programmer accumulates. It's not "linear", but if a programmer leaves, the accumulated value goes with them. I believe some accounting practices model similar things.
The converse also applies: if your company relies heavily on a piece of open-source software, hiring a core maintainer of (or experienced contributor to) that software onto your staff is probably a bargain.
I suppose the idiomatic method for securing processes on FreeBSD is capsicum(4). And at least for Firefox, it looks like someone has been working on adding support but they ran into some tricky cases that apparently aren’t well supported.
I guess for retrofitting huge, complex code bases, pledge and unveil or jails are probably easier to get working. I wonder how this affects things like file picker dialogs for choosing uploads, etc. If they’re implemented in-process, I guess they get to “see” the veiled or jailed file system - which means you don’t get any mysterious permission issues, but you also can’t upload arbitrary files. If they were implemented via IPC to some desktop environment process, the user could see the usual file system hierarchy to select arbitrary files, and if those files were sent back to the browser process as file descriptors, it would actually work. (I think the latter is how sandboxed apps are permitted to read and write arbitrary files on macOS, with Apple-typical disregard for slightly more complex requirements than just picking one file.)
In regard to pledge+unveil, it’s extremely simple to actually implement in code (though considerations for where+what in the code would be more complex) and the predefined promises make it pretty easy.
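Roughly what that looks like in code, as a minimal sketch (the paths and promise strings below are purely illustrative, not what any browser's actual unveil files contain):

```cpp
// Minimal pledge+unveil sketch for OpenBSD. Paths and promises are made up.
#include <unistd.h>
#include <err.h>

int main() {
    // Expose only the directories this process legitimately needs.
    if (unveil("/home/user/Downloads", "rwc") == -1) err(1, "unveil");
    if (unveil("/etc/fonts", "r") == -1) err(1, "unveil");
    if (unveil(nullptr, nullptr) == -1) err(1, "unveil");  // lock the unveil list

    // Then restrict the syscall surface with a handful of predefined promises.
    if (pledge("stdio rpath wpath cpath inet dns", nullptr) == -1) err(1, "pledge");

    // ... rest of the program; violations now kill the process ...
}
```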
For pledge+unveil, from memory, the dialog just cannot browse/see outside of the unveil'd paths. For browsers there are a few files, /etc/<browser>/unveil.*, that list the paths each kind of browser process is allowed to access. Included here are ~/Downloads and common XDG dirs, for example, which allows most file-picking stuff to work fine for most users.
One advantage is also that you can progressively enhance this over time: you can keep adding restrictions every time you fix instances of code which would previously have been violating a pledge. Capsicum would appear to require a more top-down approach. (I can't help but wonder if you could add a kind of compatibility mode where it allows open() and similar as long as the provided path traverses a directory for which the process holds an appropriate file descriptor, through which you'd normally be expected to call openat(). Or maybe that already exists; I really need to get hands-on with this one day.)
I can see that the alternative would probably require a more holistic approach at the desktop environment level to implement well, but defining these directories statically up front seems like an awkward compromise. Various XDG directories contain precisely the kind of data you'd want to protect from exfiltration via a compromised process.
Capsicum lets you handle things like the downloads directory by passing a file descriptor with CAP_CREATE. This can be used with openat with the O_CREAT flag, but doesn't let you open existing files in that directory. This is all visible in the code, so you don't need to cross-reference external policy files.
If you run a Capsicum app with ktrace, you can see every system call that Capsicum blocks, so it's easy to fix them. With a default-deny policy and no access to global namespaces, it's easy to write least-privilege software with Capsicum. I have not had that experience with any of the other sandboxing frameworks I've tried.
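A minimal sketch of that downloads-directory pattern, using FreeBSD's Capsicum API as I understand it (my own illustration, not code from any browser port; the path is made up):

```cpp
// Limit a directory descriptor so a sandboxed process can create new files in
// it but not read existing ones, then enter capability mode.
#include <sys/capsicum.h>
#include <fcntl.h>
#include <unistd.h>
#include <err.h>

int main() {
    int dirfd = open("/home/user/Downloads", O_RDONLY | O_DIRECTORY);
    if (dirfd == -1) err(1, "open");

    cap_rights_t rights;
    cap_rights_init(&rights, CAP_LOOKUP, CAP_CREATE, CAP_WRITE, CAP_SEEK);
    if (cap_rights_limit(dirfd, &rights) == -1) err(1, "cap_rights_limit");

    if (cap_enter() == -1) err(1, "cap_enter");  // enter capability mode

    // Allowed: creating a new download relative to dirfd.
    int fd = openat(dirfd, "file.bin", O_CREAT | O_WRONLY | O_EXCL, 0644);
    // Not allowed: opening an existing file for reading via dirfd (no CAP_READ),
    // and plain open() with a path fails entirely in capability mode (ECAPMODE).
    if (fd != -1) close(fd);
    close(dirfd);
}
```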
The download side is easier to handle as long as you’re keeping to a single download directory. I expect uploads to be somewhat more annoying UX wise: it’s rather unusual to have an “uploads” directory where the user would first copy any files they want to upload to a website, then select them in the “Browse…” dialog.
One slightly less annoying option is to agree on blanket read access to a bunch of stuff under $HOME, which appears to be what’s used here in practice, but it leaves you vulnerable to data exfiltration, which surely is one of the attack scenarios this whole exercise is trying to defend against.
Anything more comprehensive I can come up with will necessarily be a multi-process arrangement where the less-sandboxed process sends file descriptors of what the user selects to the heavily sandboxed one. And where drag & drop of a file sends (a) file descriptor(s) rather than just (a) path(s).
To be clear, I’m not saying this would be a bad system! I think it’d be great to have this in a desktop environment. Just that it’s a little tricky to retrofit onto a giant ball of code you’ve never even looked inside before.
That's good to know - in contrast, dealing with the Sandbox on Apple's platforms is super annoying, as you're mostly reduced to reading system log tea leaves when things aren't working - macOS dtrace is falling apart more and more with every release and sometimes requires disabling the very security features you're trying to debug.
But tracing only just begins to address the stated problem of retrofitting a large existing code base. If everything, including dependencies (transitive ones too), is using open rather than openat, but open is completely non-functional, I suspect that might be rather a chore to get fixed. I mean, it's feasible if you actually "own" most of the project, but modifying something as big and unknown as a web browser in this way is quite an undertaking, even if you can eventually get it all upstreamed.
Capsicum was designed to support this via the powerbox model (just as on macOS: it was designed to be able to more cleanly support the sandboxing model Apple was developing at the time). When you want to upload a file, the file dialog runs as a service in another process that has access to anything and gives file descriptors to selected files. You can also implement the same thing on top of a drag-and-drop protocol.
Alex Richardson did some Qt / KDE patches to support this and they worked well. Not sure what happened to them.
The nice thing about this is that open is a replaceable symbol. For example, in one project where I want to use some existing libraries in a Capsicum sandbox, I simply replace open with something that calls openat with the right base depending on the prefix.
It would be fairly easy to do something similar for the XDG paths and have pre-opened file descriptors for each with a sensible set of permissions.
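Something like this, I imagine; a rough sketch of that symbol-replacement idea (the prefix table, paths, and pre-opened descriptors are invented for illustration, not from the project being described):

```cpp
// Provide our own open() that routes each path to a pre-opened directory
// descriptor via openat(). The directory fds would be opened at startup,
// before entering capability mode.
#include <fcntl.h>
#include <sys/types.h>
#include <cstdarg>
#include <cstring>
#include <cerrno>

int downloads_dirfd = -1;  // opened on ~/Downloads before sandboxing
int config_dirfd = -1;     // opened on ~/.config/myapp before sandboxing

extern "C" int open(const char *path, int flags, ...) {
    mode_t mode = 0;
    if (flags & O_CREAT) {             // mode is only passed with O_CREAT
        va_list ap;
        va_start(ap, flags);
        mode = static_cast<mode_t>(va_arg(ap, int));
        va_end(ap);
    }

    struct { const char *prefix; int dirfd; } table[] = {
        { "/home/user/Downloads/", downloads_dirfd },
        { "/home/user/.config/myapp/", config_dirfd },
    };
    for (const auto &entry : table) {
        size_t n = std::strlen(entry.prefix);
        if (std::strncmp(path, entry.prefix, n) == 0)
            return openat(entry.dirfd, path + n, flags, mode);
    }
    errno = ECAPMODE;  // nothing matched; refuse, as capability mode would
    return -1;
}
```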
Good to know it’s been done and the code presumably is still out there somewhere. Something to keep note of in case I end up doing any UNIX desktop work. (And it sounds like this was done as part of an academic research project, so probably worth trying to get hold of any other published artifacts from that - perhaps part of this project?)
Providing this fallback compatibility wrapper as a user space libc override is a nifty technique, thanks for sharing!
I kind of agree. It did feel unintuitive to me at first. Worth noting these files are owned by root and normal users cannot write to them by default.
That’s right – from what I remember it was pretty well locked down though.
When I see the effort some people put in, mostly in their spare time, to give people the ability to run extremely closed-source software on extremely closed-source hardware, with FLOSS in the middle, I'm just amazed… in a good way.
I’m happy that they’re doing it, and I’m not… These people have my absolute respect.
What makes you call the hardware extremely closed source? I mean, all the firmware is closed source of course, but that's the case for pretty much all hardware; what makes Mac hardware unusually closed source? It's not especially locked down: all the mechanisms used to install and boot different operating systems are mechanisms intentionally included by Apple, after all. It's not like Asahi relies on some kind of jailbreaking.
This might be ignorance from my part, I’m happy to be corrected.
But I've always assumed that Macs didn't use any standard BIOS, and the new ones were using some non-standard UEFI boot process. My understanding was that, while it's true that their custom firmware allows you to boot another OS, this is something your OS has to put effort into supporting. This is because Apple don't care about you, as opposed to Intel/AMD/Asus/… who actively contribute to the Linux kernel for their hardware support. My understanding, as well, is that there is very little free documentation available on how to support Apple's hardware, hence my "extremely closed source".
Why else would we need the efforts of the Asahi team, if the hardware (which, I admit, I used as a metonymy for hardware+firmware) was standard?
The firmware and bootloader become fairly uninteresting once you've successfully booted the operating system, and while there are lots of fiddly bits to take care of to get things running smoothly, once you're booted, you're booted (usually). Booting is not usually a herculean effort, as long as the hardware and firmware vendors haven't tried to actively block you. Which Apple hasn't, apparently. (Assuming secure boot is disabled.)
That said, Apple’s GPUs are entirely undocumented, so building a high-quality GPU driver for these things purely based on reverse-engineering (intercepting and tracing what I/O the macOS GPU driver performs on the GPU), especially in this kind of time frame is beyond impressive.
Other devices and SoC subsystems are also mostly undocumented of course and need their own drivers but the complexity of most devices is not in the same league as a GPU; USB is almost certainly a standard implementation though (perhaps with some smaller hacks) so that’s one big driver stack they won’t have to have reimplemented.
Correct! Most Apple peripherals are custom (though some have fun heritage like the Samsung UART, lol) but core USB is the same Synopsys DesignWare block you see in nearly everything (that’s not Intel nor AMD) these days.
That’s just the host/device controller though. To make it work you also need a USB PHY driver (Apple custom), a USB-PD controller driver (Partly incompatible Apple variant of an existing TI chip), and the I2C controller driver for the bus that chip talks through (which is actually a variant of an old PASemi controller).
Yeah, by “core” USB I meant the host/device controller :)
Frankly this is exactly why I like ACPI. With ACPI, all that glue-ish stuff around a generic core interface like XHCI mostly/hopefully can be written once in AML, instead of having to write and maintain drivers in each kernel. Which, yeah, from a Linux-centric perspective that last part is not a feature, but I personally like the BSDs for example, and I also hope that “the future” will be more Redox-shaped or something, so I’m always biased in favor of things that make OS diversity easier!
Heh, I was working on some low-level USB code on a microcontroller for a client. I'm staring at the code and having this moment of deja vu: "weird… I've never used this brand of chip before, but for some reason a lot of this feels very familiar."
I had worked on USB code before on different chips, though. I opened up the datasheets from an old project and this new project and started looking at the USB registers. Sure enough: exactly the same names, exactly the same bit layout, exactly the same memory addresses. Googling some of the names turned up data sheets from multiple vendors and then one from Synopsys. It sure helped debug things when I had my own reference implementation to compare against!
Booting other OS’s is intentionally supported, and doing so without compromising security for people who aren’t doing so is a significant amount of work. There’s a bunch of stuff describing how non-apple OS’s are installed without trivially undermining secure boot, and I think back in the early days of Asahi marcan talked about it a bunch.
I don’t see a meaningful distinction between “heap allocation” and “dynamic allocation”, other than the scope of the allocator: whether it’s OS-provided and global, or custom and specific to some subsystem. Either way, it has the same effect.
What’s the benefit to using Pigweed’s custom allocator over the global operator new / malloc? If anything it’s a drawback, because in the case that multiple subsystems implement their own allocators, memory is wasted because each one is holding on to free space that the others can’t access.
Nevertheless, there’s a lot of interesting info here on the innards of C++ coroutines!
A very meaningful distinction is that heap allocation, i.e. dynamic storage, will at some point call ::operator new and has some chance of ending up doing a system call, which has implications in terms of latency, for instance because it increases the chances of a thread getting scheduled out. So far this entirely precluded the use of coroutines in e.g. real-time audio processing loops, unless you could assert at compile time that HALO went into effect and every memory allocation got optimized out.
it’s definitely the only sane way to design some systems when you really want to make sure you aren’t ever going to take a lock.
There are a few reasons why someone might prefer a custom allocator over a global new or malloc.
In some cases, it's a hard requirement: not all platforms provide a global allocator. This is fairly common in embedded systems.
Custom allocators can be beneficial for reliability, too– the exact amount of memory available for the purpose of allocating a particular coroutine or set of coroutines can be fixed, and failure to allocate from that dedicated pool of memory can be handled gracefully, rather than allowing the coroutine(s) to gradually expand into memory that is dedicated for other tasks.
Finally, using a custom allocator can sometimes improve performance by taking advantage of cache locality or certain allocation patterns. For example, bump allocators and arena allocators can provide more efficient allocation because they only deallocate at the end of certain scopes or lifetimes. This makes allocation just a simple pointer increment and makes deallocation a total no-op until the end of the arena’s lifetime.
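For illustration, here's a minimal bump/arena allocator along the lines described above (a sketch of the general idea, not Pigweed's actual allocator code):

```cpp
// Allocation is a pointer bump out of a fixed buffer, individual deallocation
// is a no-op, and everything is reclaimed at once when the arena is reset.
#include <cstddef>

class Arena {
public:
    Arena(std::byte *buffer, std::size_t size) : buf_(buffer), size_(size) {}

    void *allocate(std::size_t n, std::size_t align = alignof(std::max_align_t)) {
        // Round the current offset up to the requested (power-of-two) alignment.
        std::size_t offset = (used_ + align - 1) & ~(align - 1);
        if (offset + n > size_) return nullptr;  // out of arena space
        used_ = offset + n;
        return buf_ + offset;
    }

    void deallocate(void *) {}   // no-op; memory comes back on reset()
    void reset() { used_ = 0; }  // end of the arena's lifetime

private:
    std::byte *buf_;
    std::size_t size_;
    std::size_t used_ = 0;
};
```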
You're preaching to the choir! I agree custom heap allocators can be useful. But they're still heaps. I found your article a bit misleading because you imply you've found a way to run C++ coroutines without heap allocation, which you haven't. You've just built your own heap.
Sorry for the misdirection! I didn’t intend to be misleading– this is an unfortunately common terminology mismatch. I’m using “heap allocation” to mean specifically the provided malloc/free/new/delete, and “dynamic allocation” to refer to the more general case of runtime memory allocation.
I do also discuss in this post why dynamic allocation is needed and how I hope its use can be avoided in the future, so I think it is still relevant to those hoping to avoid dynamic allocation completely (though I’m sad that it isn’t currently possible today).
But this requires you to bring along an allocator. And if you are going to bring along an allocator, why not just implement the global new and delete overloads?
The reason that I would want to avoid global new / delete on embedded platforms is that allocation is more likely to fail (because memory is tightly constrained), so I need the caller to handle the failure case; and I'm on an embedded platform that doesn't support exceptions, so allocation failure needs to be reported via the return value.
Your article mentions this about half way down:
From the API surface, it looks as if you provide a hook in the promise object that means that allocation failure can return a constant. This is really nice: you can have a coroutine call return something like an ErrorOr<T> and then return the error value if it fails to allocate space for the coroutine.
It would probably be nice to lead with that. I had to read a lot to figure out that you've actually solved the bit of the problem that I cared about.
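For anyone skimming, the standard hook involved is get_return_object_on_allocation_failure on the promise type. A minimal sketch of how it plugs together, with invented Task/Result types and a crude global pool (my own illustration of the language mechanism, not Pigweed's actual Coro/pw::Result implementation):

```cpp
// A coroutine return type whose promise allocates from a fixed pool and, when
// that fails, makes the coroutine call return an error value instead of
// throwing. Alignment handling is omitted for brevity.
#include <coroutine>
#include <cstddef>

struct Result { bool ok; /* ... value or error code ... */ };

inline std::byte g_pool[4096];
inline std::size_t g_pool_used = 0;

struct Task {
    struct promise_type {
        // Non-throwing allocation out of the fixed pool; may return nullptr.
        static void *operator new(std::size_t n) noexcept {
            if (g_pool_used + n > sizeof(g_pool)) return nullptr;
            void *p = g_pool + g_pool_used;
            g_pool_used += n;
            return p;
        }
        static void operator delete(void *, std::size_t) noexcept {}  // bump pool: no-op

        // Called by the compiler when operator new returned nullptr.
        static Task get_return_object_on_allocation_failure() {
            return Task{Result{false}};
        }

        Task get_return_object() { return Task{Result{true}}; }
        std::suspend_never initial_suspend() noexcept { return {}; }
        std::suspend_never final_suspend() noexcept { return {}; }
        void return_void() {}
        void unhandled_exception() {}
    };

    Result result;
};

Task example_coroutine() {
    // ... co_await things ...
    co_return;
}
```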
Yes, handling the failure case gracefully is super important! RE ErrorOr<T>, Pigweed provides a pw::Result<T>, which is exactly this– either a T or a pw::Status (our error code). Coro<T> requires that T be convertible from pw::Status so that it can produce a value of T from pw::Status::Internal() in the case of allocation failure.
Sounds nice. I might have a look and see if it can be ported to CHERIoT.
I'm going to flip this statement on its head for a sec. Using a fixed-sized arena allocator or something like that is great for having guarantees that you're not going to blow up your heap. But… do we know at compile time how big the allocations are going to be? Instead of using a fixed-sized arena, would it be possible to use something like a freelist allocator for each type that needs to get allocated? Like, I could declare in my code that I'm going to assign a static allocation for 10 Coro objects and 30 frames; the allocator could return those from the freelist and return them to the freelist upon deallocation (both O(1), without needing to ever end the arena's lifetime).
The size of the coroutine state will depend on the captured state (automatic variables at suspension points), so much like a lambda (closure type), its size is known by the compiler at compile time, but as far as I'm aware there's no clear way to obtain it in your code at compile time, for example to be used as a template argument.
Yeah, this isn’t possible today, sadly– I discuss this a bit at the end of the post. If C++ offered a way to inspect the size of the coroutine of a specific function implementation, there would be no need for dynamic memory allocation: we could pre-reserve exactly the required amount of space. This would have huge benefits both for reliability and efficiency– no more need to guess at and adjust the right size for your allocators.
Little late getting back to this but I’ve had a little time to dig into some of the WGxx documents and I understand quite well why we can’t do this (yet anyway, hopefully someday!) For using it on an embedded system though, the HALO thing scares me a whole lot more than having to pre-reserve some space for an allocator (which also scares me less than having hidden/implicit malloc behind the scenes). On most of the smaller-sized embedded systems I work on, running out of space in a bounded arena and returning an error is a manageable thing. Changing the function that sets up a coroutine a little bit, resulting in HALO being available and significantly changing how much stack space is required… that’s spooky! At least with running out of arena space there’s well-defined error semantics. Walking off the end of a fixed-sized stack usually just means corruption.
I’m still digging a little more… it seems like there is a way to ensure HALO is disabled but having it potentially ok by default seems like a footgun for embedded, and we’ve got lots of footguns already!
(I'm still super intrigued though. This looks super cool!)
AFAIR the issue is that the size of the coroutine is only known by the compiler after optimization passes (as it depends on how local variables are stored, which lifetimes overlap or not, what can be reused), but the sizeof(...) operator is typically implemented in the frontend. The compiler writers said they would not be able to implement a coroutine proposal with a frontend-known size without heroic efforts. This was discussed extensively during the design of C++ coroutines, including a competing "core coroutine" proposal appearing and ending up dying on this same issue. Some alternatives were explored, such as types that did not support sizeof(…) but could still be stored on the stack, but in the end storing the coroutine on the heap with an escape hatch for elision was deemed the best compromise.
Ahhhhh interesting, that makes sense. I'm going to put it on my list of things to play with :). Cool project!
.. is it though ? how does that work if you have
It's interested in the upper bound, as is common when you're pre-allocating. So compute both and take the largest.
A more general case of the example above would be if a were a variable-length array with x as the size. I would absolutely never write such a thing in embedded code (and typically not in normal C++ if there's a chance an attacker can influence the value of x), but it is permitted.
Exactly; after all, you can have recursive calls or alloca, so there's no way to precompute the entire required stack size statically.
Yeah I wasn’t thinking about the general case too much, just the specific example, and assuming the statically sized array was the point that seemed problematic. You’re right it’s not always possible… in most languages.
Rust does know the size of each Future, but it doesn’t let you have dynamically sized values on the stack (on stable). And using that unstable feature, it’s not possible to hold a dynamically sized value across await points (#61335) as await points are where the size needs to be known to be stored in the generated Future’s enum variant.
Another reason is to support safe memory reclamation for a concurrent data structure.
Not sure what you mean by this? Just about all system allocators are thread-safe.
“Safe memory reclamation” is special jargon in this context.
If you are using RCU, hazard pointers, or other similar lock-free techniques, then a write to a datastructure cannot immediately free() any memory that was removed, because the memory can still be in use by concurrent readers. You have to wait for a grace period or quiescent period to pass before it can be freed. You don’t want to do this as part of the write transaction because that will ruin your write throughput: grace periods are relatively long and can accommodate multiple write transactions.
So you need to hang on to the to-be-freed memory somehow and reclaim it later. Sometimes the whole concurrent algorithm is easier if you guarantee type-stable memory by using a region of some kind. Sometimes you can save significant amounts of memory (a pointer per node) by using a GC instead of a to-be-freed list.
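As a very rough, single-threaded sketch of the to-be-freed list idea (a real SMR scheme needs per-thread retire lists and an actual grace-period mechanism such as RCU’s synchronize or epoch/hazard-pointer scanning; the names here are made up):

```cpp
#include <vector>

struct Node { Node* next; int value; };

// Nodes unlinked by writers but possibly still visible to in-flight readers.
std::vector<Node*> retired;

void retire(Node* n) {
    retired.push_back(n);   // do NOT delete yet: concurrent readers may still hold n
}

void reclaim_after_grace_period() {
    // Call only once every reader that could have seen the retired nodes
    // has passed through a quiescent state.
    for (Node* n : retired) delete n;
    retired.clear();
}
```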
I’m not sure how custom allocators help here. They are not invoked until after the destructor has run, so prolonging the lifetime of the allocation using a custom allocator doesn’t help because the object is gone (you may get an arbitrary bit pattern if you try to read it). Things like RCU work fine without a custom allocator because they defer destruction, not deletion.
I don’t know how you can do things like data-structure-specialized GC or type-stable memory without a custom allocator. You don’t need those techniques for RCU (RCU was an example to set the scene) but they are sometimes useful for concurrent data structures. I also dunno how this would fit with the C++ facilities for custom allocators — I guess you have to avoid destruction until the memory is about to be freed (especially for type-stable memory)
Type-stable memory can be done with custom allocators, but it’s often better to do it with a class overriding operators new and delete. You can use CRTP to implement a superclass that provides a custom memory pool for each type.
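Roughly this shape, for the CRTP variant (deliberately simplistic: a per-type free list that is not thread-safe, never returns memory to the OS, and assumes one concrete type per pool):

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Each class deriving from Pooled<T> gets its own free list, so memory
// once used for a T is only ever reused for a T (type-stable memory).
template <typename T>
class Pooled {
public:
    static void* operator new(std::size_t n) {
        if (!free_list.empty()) {
            void* p = free_list.back();
            free_list.pop_back();
            return p;
        }
        return ::operator new(n);
    }
    static void operator delete(void* p, std::size_t) noexcept {
        free_list.push_back(p);   // keep the block dedicated to this type
    }

private:
    static inline std::vector<void*> free_list;
};

struct Connection : Pooled<Connection> {
    int fd = -1;
};
```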
I could be wrong, but my recollection is that they use `alloca`, which allocates arbitrary stack space instead of using actual allocators (which typically involve a system call in the worst case).

Can someone explain the difference between the “classic” and async/Embassy approach?
I interpreted it as:
But perhaps sync (classic) versus async (Embassy) is the key difference being alluded to
I suspect it is sync vs async, but I suspect the classic would be higher level since it presumably has an embedded RTOS underneath it providing the sync abstraction, while Embassy presumably has a thinner runtime and leverages Rust’s async to support multitasking.
It’s just a hardware abstraction layer, definitely no RTOS there. It doesn’t do multitasking, just abstractions for interacting with the hardware.
So how does one do multitasking with classic? Like how do you run a bluetooth or wifi module while also doing the other things your application needs to do?
Independent pieces of hardware do stuff in the background when you’re not interacting with them, so you either need to check in every so often manually, or use interrupts. A common pattern is to have a big loop that goes around doing everything that needs to be done, and then sleeps when there’s nothing to do, to be woken up by an interrupt or something else. You might write a bunch of state machines that you drive forward when the correct events come in.
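For anyone who hasn’t seen it, that pattern tends to look something like the sketch below. `wait_for_interrupt()` and the ISR-set flags stand in for whatever the vendor HAL actually provides (e.g. `__WFI()` on Cortex-M), and real code has to close the race between checking the flags and going to sleep (often with WFE/SEV or by sleeping with interrupts masked).

```cpp
#include <atomic>

// Flags set from interrupt handlers (stubbed here; hardware-specific in reality).
std::atomic<bool> uart_rx_ready{false};
std::atomic<bool> tick_elapsed{false};

void wait_for_interrupt() { /* e.g. __WFI(); busy-wait stub for the sketch */ }
void drive_uart_state_machine() { /* consume received bytes, advance protocol state */ }
void drive_app_state_machine()  { /* periodic application work */ }

int main() {
    for (;;) {
        // Drive each state machine forward when its event has fired.
        if (uart_rx_ready.exchange(false)) drive_uart_state_machine();
        if (tick_elapsed.exchange(false))  drive_app_state_machine();

        // Nothing pending: sleep until the next interrupt wakes us up.
        if (!uart_rx_ready && !tick_elapsed) wait_for_interrupt();
    }
}
```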
That’s why async is so exciting for embedded. You can write what looks and feels like abstracted task code for an RTOS, but it compiles into a fairly predictable event loop with state machines, just like what many people write by hand.
That makes sense. When I’ve dabbled with embedded programming, the “classic” approach usually involved pretty low-level management of network peripherals—you had to know the state machine for each peripheral and how to handle its events, then compose that state machine with the state machines of other peripherals and your application, and it was always really painful, especially if you misunderstood something about a peripheral’s state machine. Embassy looks more or less like what I’ve been wanting—I can’t wait to try it out.
Write it all manually with hardware timers and interrupts.
A few of my own patches made it into this release!
Unfortunately a bunch of others so far haven’t been accepted; I’ll try again for the 9.2 cycle:
I’ve also been working on supporting HVF’s in-kernel APIC, but that turns out to be buggy, so I’m currently trying to figure out if there’s some baseline feature set that does work reliably…
I don’t quite get the appeal of the textual prompt about discarding the key. Overlay text breaks the immersion to some extent. If it really is useless now and there’s some emotional backstory, surely the player character could for example convey the emotion through gesture? Shake their fist and fling the key away? (I don’t really go for horror games, so haven’t played RE2 and don’t know what sense of relief the author is alluding to here.)
I liked Far Cry 2’s approach to the map. The first-person player character literally pulls out a map from somewhere and holds it in his¹ hand alongside the handheld GPS device. The interesting bit here is that the game keeps going on around you. You can run around while holding the map, even though it’s obscuring most of your field of view. If you encounter baddies (or they encounter you) while you’re staring at the map, they’ll quite happily start shooting you while you’re still fumbling to grab your weapon.
Far Cry 2 screenshot with player holding the map while standing
The same mechanic works while you’re driving too. Much as driving while holding up a map in one hand is rather hazardous in real life, the game tends to punish you for doing this as well, as tempting as it frequently is.
Far Cry 2 screenshot with player holding up the map while driving
The rest of the HUD is similarly understated and disappears most of the time, though a health bar and number of health syringes will show up when you’ve taken damage. (And IIRC the screen will momentarily be tinted red when taking a hit.)
Later games in the series started cluttering the screen a lot more unfortunately. (They also undeniably tidied up some of the more unpolished game mechanics, but in my opinion never quite recreated the atmosphere of danger that Far Cry 2 conveyed. Nor the moral ambivalence.)
¹ Unfortunately, all playable characters are male, even though key NPCs you meet in the game will be chosen from the pool of player characters plus a woman who for some reason is not playable.
the relief from the player character comes from the fact that in RE/2 you have very limited inventory space. Even just carrying a key around is a whole ordeal, and when you open a door and still have to hold onto the key, that means that you have new challenges ahead, and you don’t even get an extra inventory slot (which you could use to, for example, pick up healing items).
There are absurd situations you can end up in where you have no free inventory slots, but no healing items, and are hurt, and there is a healing item right in front of you and you can’t use it. It’s fodder for a lot of jokes, but it’s also an amazing source of tension.
None of this really makes me a better programmer, but it’s fun to think about.
I love those sorts of things, depending on the game. Helldivers has an interesting middle-ground to this; there’s a HUD, but it’s quite minimal and digging for more information tends to have real consequences. Your weapon has no targeting reticle, unless you actually aim down the sights of it. The ammo counter is a numberless “full/kinda full/empty” bar, unless you bring up the context menu that lets you switch weapon modes and such; only then does it say “23/30 rounds”. You have a local minimap, but zooming out on it to see the full map means your character literally looks down at the PDA strapped to their arm and stops aiming their weapon at the horrible bug-monsters coming to claw out their eyeballs. In general you can be fighting for your life or figuring out what the hell you’re doing with your life, but not both.
They do a good job of using semi-artificial inconveniences like that to add to the gritty dystopia of the game, and to make the decision making and frenetic pacing more hectic. This now makes me wonder what the game might look like if even more of the information in it were implicit rather than given in the HUD.
By the way, the term that describes a UI element like, “You pull a map out and the game doesn’t pause” where it’s internal to the world is “diegetic.” I’d go so far as to say that diegetic UI elements are always superior to virtual ones. The post makes mention of how Isaac’s suit in Dead Space shows his health on the spine, which is also a diegetic UI element.
I think “superior” is doing a lot of work in that sentence. Diegetic elements are always more immersive, but not every game is about being as immersive as possible.
Imagine if, in real life, you could have a map that stopped time when you used it and filled your entire field of vision; you would probably find that a lot more useful than a paper map! So I don’t think it’s as simple as saying diegetic is better every time.
Diegetic doesn’t always mean “time doesn’t stop when you look at a map” so much as, “the map is a fundamental part of the world rather than an arbitrary menu.”
This blog announces or reviews a ton of SBCs and micro-PCs, but I thought this one in particular was interesting and might be newsworthy here, as it’s basically “Raspberry Pi 5 but it’s x86, can run Windows, and btw supports fast SSDs.” A cute detail is that there’s an RP2040 chip on board to drive the GPIO pins.
“but there’s also a boot selection switch that lets you put the RP2040 into USB mass storage mode for firmware uploads.” Oh that’s cool
That’s the usual way to upload firmware to an RP2040 and not specific to this board. As far as I’m aware this functionality is part of the bootrom inside the RP2040 and completely hardwired.
(Incidentally, I’ve previously used an RPi Pico as a kind of GPIO breakout board in an x86 SFF PC based custom appliance as well. They are so small, and the decent USB support makes it a good choice.)
I interpreted it as you could flash the 2040 in-line from the main board
In-line as in, it shows up as a USB mass storage device connected to the x86 computer? Yes, that’s what I’d expect. The RP2040’s USB port must surely be wired directly to a USB 2.0 host port on the x86 SoC, so it’ll show up as whatever type of USB device the code running on the RP2040 exposes. In the case of the naked bootrom with BOOTSEL held, this is a USB MSC. With the stock firmware, it would presumably show up as some kind of GPIO-controlling USB device. (HID?) That’s assuming it even ships with any kind of useful firmware on the RP2040. (The “documentation” link on Radxa’s X4 page seems broken, I’ve not researched beyond that.) With TinyUSB, you can program it to expose whatever USB interface you like.
The tricky bit would be resetting or power cycling the RP2040 in isolation without having to power down the host SoC. The “BOOTSEL” button on an RPi Pico normally only has an effect during very early startup, before the bootrom has loaded the current firmware. Perhaps the GPIO connector exposes the RUN pin (30) which can be used to hard reboot an RP2040. (On my SFF PC based build I run both BOOTSEL and RUN contacts to externally accessible push buttons for exactly this reason.) Another option is to have a software reset trigger in your RP2040 firmware. This’ll only work if you didn’t make any mistakes in coding your firmware. (Now who would do such a thing. 😅) Otherwise you’d have to power cycle it, which presumably means power cycling the whole board.
I received an email saying that I was eligible for a new crypto offering that was targeting open-source developers. They claimed they had checked my GitHub username, and that I was in. They offered $200 to do it for me, all I needed to do was send them the private SSH key associated with my GitHub account, so they could claim the reward. One important detail is that it had to be the private SSH key associated with my account at the time the contract was added to the blockchain, so I couldn’t just make a new temporary one.
I investigated it and the crypto distribution was real. I was able to claim it myself, and I sold the tokens as soon as I was able, for almost $3000.
What the hell. Even after explaining it it sounds like a scam. Did you at least rotate all your keys?
I did! But by doing it myself I didn’t have to send anyone my private SSH key, I just had to sign a challenge token with it, which they verified with the public key that they had collected from GitHub.
The process was still sketchy, they had a Docker container that you would have to run. The container would read the private SSH key (which had to be mounted to the container) and run a program to generate the response token. They did recommend running the container without a network attached, and I turned off my wifi just in case as well. The container would die after printing out the response token, so I think it was pretty safe.
This is the token, if anyone’s interested: https://claim.fluence.network/wallet
That’s pretty nuts. I have to ask: what’s in it for them? And why the super unsafe delivery if it’s some altruistic cause?
I’ve been receiving those emails as well and assumed they were scams aimed at injecting malicious code into my github repos or exfiltrating code from private repos. I have zero practical experience with cryptocurrency, so I guess I’ll let this “opportunity” slide as it sounds like more hassle and risk than it’s worth even if it’s not a scam.
I think they gain by creating traction among developers; that would be my guess. And since they created the token, they’re not losing money by giving it away; it’s all fictitious anyway. :)
They seem to assume that the only reason packets ever drop is that the link is full of traffic. Besides the fact that dropping packets in collisions is part of Ethernet, you will get dropped packets sooner or later any time you use wifi, cellular internet, damaged/degraded wired networks, long-distance wired networks, or wired networks that wander near nasty sources of interference. So, you know, always. TCP does a great job of covering this up when a) latencies are low, and b) ~99% of packets still get through. (Tried to find the article that graphed this but I can’t; I think it was from Google or Cloudflare?)
QUIC is pretty good, definitely use it in 99% of the places you’d normally use UDP. But the question is not “good data” vs “bad data”, it is “mostly-good data now” vs “perfect data maybe half a second from now”. Especially for video telecoms and stuff like that, a garbled frame or two and some weird distortion for a few seconds is better than someone just cutting off and waiting a painful few seconds for TCP and the video/audio codec to get their collective shit together before you get to say “sorry you cut out there, can you repeat that?”
Except that Ethernet and WiFi handle detecting the drops and doing retries, etc., at the link level. They don’t rely on TCP semantics.
Modern wireless standards do an impressive amount of forward error correction and so will provide an abstraction that looks like a reliable link even with a lot of frames being dropped. This is critical for satellite links, where a retransmit is hundreds of ms of latency, but is increasingly important for WiFi at high data rates. The faster your link speed, the more data you need to buffer. 802.11bn is supposed to scale to 100 Gb/s. If it takes 1 ms to retransmit, you’d need a 100 Mb buffer to avoid slowdown for a single dropped packet (or you’ll have a smaller window and so get back pressure and slow down). If you have enough error correction, you just get the frames arriving slightly out of order once you’ve received enough to reconstruct the dropped one.
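Spelling out that back-of-the-envelope figure (units picked arbitrarily, just to make the arithmetic explicit):

```cpp
// Buffer needed to ride out one retransmit ≈ link rate × retransmit latency.
constexpr long long link_rate_mbps = 100'000;  // 100 Gb/s (the hypothetical 802.11bn rate)
constexpr long long retransmit_us  = 1'000;    // 1 ms to retransmit
constexpr long long buffer_mbit    = link_rate_mbps * retransmit_us / 1'000'000;
static_assert(buffer_mbit == 100, "roughly 100 Mb of buffering per dropped packet");
```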
I think modern wireless protocols dynamically adapt the size of their error correction codes when they detect loss, so your speed just gracefully degrades in most cases, but you don’t see huge latency spikes.
Interesting, isn’t that wasteful when a higher layer is ok with some packet loss? But OTOH if the higher level is doing its own retries, and the least reliable link is the first one, it makes sense to retry early where the latency cost is lower.
Maybe the link should do a small number of retries, to get a good balance of latency and reliability.
Not a deep expert on this, but I seem to remember TCP assumes packet loss is down to bandwidth limitation or congestion, and will respond by reducing the transmission rate right down, only slowly ramping back up. The typical packet loss scenario in WiFi is that there’s some momentary interference, or the geometry of the link has changed, so the peers need to switch to less sensitive QAM encodings, etc. Until that happens, a whole batch of packets will have been dropped, which TCP interprets as catastrophic bandwidth reduction even if the wireless link recovers very quickly. So the higher layer doesn’t cope very well with the specific pattern of loss.
Re. TCP goodput, are you thinking of the Mathis equation? https://lobste.rs/s/5zjwgs/golang_is_evil_on_shitty_networks_2022#c_qxm49s
The conclusion of the article surprised me: it recommends a fairly complicated collection of mechanisms. I thought QUIC was going to get support for lossy streams so that it could avoid creating a backlog of wasted work when the network conditions suddenly change.
The author is working on a spec for media streaming over QUIC, so they’re certainly aware of this.
Yeah, yet they talk about everything except it.
I tried to watch the video, but trying to concentrate on the text while there’s fast-moving game footage in the background made me feel dizzy.
I try hard to keep an open mind and not be a video-hating old fogey, but this one really would be so much better as a blog post. No distracting fast moving footage, pumping music, and the ability to move at my own pace. If it really must be a video, then at least make it in a “boring” lecture/conference presentation style. (You can still show off your creation and show in-game footage between the explanations.)
But then I’m not really sure what the intended target audience is supposed to be… on the one hand, it moves at a pretty fast pace through some intricate assembly language sequences and mathematical expressions, but on the other hand the narrator feels the need to clarify what SIMD is.
Good news, there is a blog post, linked from the video: https://whatcookie.github.io/posts/why-is-avx-512-useful-for-rpcs3/
I think the audience is supposed to not care about the actual instructions, it’s more like “look at this cool thing”
The accompanying blog post is more useful for learning
Is the basis of their new “Private Cloud Compute” macOS, or could it be FreeBSD?
It’s probably the fact that the macOS coreutils are based on the BSD ones, and Apple has perhaps decided to be nicer and upstream bug fixes.
Apple oscillates a bit here. They either do nothing, point FreeBSD committers at patches they may want, or actively push things. They’ve never been particularly consistent about it and it changes over time. They funded the TrustedBSD MAC work, for example, which landed in both FreeBSD and XNU and provides the kernel bit of the iOS / macOS sandboxing framework. In contrast, they extended all of their core utilities to support `-h` and `-H` consistently for decimal and binary SI prefixes and didn’t upstream those. They offered to relicense some bits of Darwin that we were considering bringing into the FreeBSD base system.

There’s often a reasonable overlap between folks in the Apple CoreOS team and FreeBSD committers, which helps. I wouldn’t take this as a shift in Apple policy to treating FreeBSD explicitly as an upstream as they do with LLVM, though that would be very nice.
They funded parts of TrustedBSD, but they’ve been deprecating its subsystems on macOS since at least 2019. For example, OpenBSM has been deprecated since 2022, and the MAC Framework since 2019.
As I understand it, their sandbox framework is built on top of the MAC framework. They have largely removed support for third-party kernel modules, which makes it impossible to use the MAC framework’s lower levels, only their higher-level abstractions.
It’s a shame that they never merged Capsicum, since it was designed to support things that look like their sandboxing model.
They have built something on the MAC framework (e.g. EndpointSecurity is currently built on it, for the time being), but the thing is that Apple didn’t allow using it even well before kexts were deprecated. In fact, developers started to use it when Apple released the MACF headers by mistake, not because Apple considered MACF to be a valid kernel API.
I’m not sure what point you’re trying to make here. Indeed, Apple never officially considered the MAC Framework a public API in macOS/XNU, so why do you consider it deprecated since 2019? And what does that have to do with Apple’s FreeBSD contributions? MACF was never supported on macOS in the first place. In fact, breaking API/ABI changes were frequent long before 2019. They just started cracking down hard on kexts in general in 2019.
The point is that even though Apple funded the creation of the subsystems mentioned above, it’s not possible to use them on their platform, because they’re either deprecated and will be removed from the OS altogether, or were never allowed to be used by third parties. The OP’s comment suggested that Apple funded them and that they’re available to be used, and that’s not true today for OpenBSM, and it was never true for MACF.
I never said (or intentionally implied) that Apple would expose or intend something for third-party developers on XNU just because they funded its development in FreeBSD. I did suggest that, if they’re in the open source bits of Darwin then it’s often possible to pull fixes from there into FreeBSD, but that’s very different.
So Apple’s monopolistic behavior is bad, right? None of this would happen if the connections didn’t require a physical licence key in the connected device.
I think a manufacturer can legitimately require some sort of minimal technical compliance for accessories, especially as they can affect battery life (as in this case). The products stating they are compatible with Lightning are just committing fraud.
As to the “monopolistic” behavior, there are other device manufacturers that don’t use Lightning.
It’s not strictly a classical monopoly, in the sense that other people also make phones that don’t have this bullshit – but it is emphatically anti-competitive and flagrantly anti-consumer. We used to just have a literally standard headphone jack but obviously that had to go to make room for a whole new line of credit for Apple. If it wasn’t a racket it would be USB-C by now.
There is a new sort of monopoly that we need to address, though, and which people are beginning to try to address – it affects both add-ons for what should be standard ports like this as much as it affects app stores with technological measures to ensure you can’t skirt around the king’s wishes on his land. It’s essentially like a monopoly but on the third of customers (or whatever it is now) that have an iPhone, allowing Apple to collect all sorts of rents. It’s like Comcast wanting to charge Netflix (and anyone else) for peering as well as charging their customers for that same data – good work, I suppose, if you can get it!
Apple has since switched to USB-C (September 2023, with the release of the iPhone 15)…
that was due to EU legislation
There’s more nuance here than people usually want to admit. Apple had already switched the rest of its hardware lineup to USB-C, and in one case (laptops) actually was getting massive market pushback to return to putting at least a couple non-USB-C port types in.
For the iPhone specifically, I believe they were pretty clear when the Lightning connector came out (USB-C was still in draft/development at the time) that they intended to stick to it for at least ten years, because they didn’t want to switch and then immediately switch again, causing a bunch of needless churn and obsolescence and e-waste. The EU, coincidentally, passed its USB-C mandate nearly exactly ten years after the Lightning connector debuted, and Apple switched to USB-C the very next year.
Anyway, it amuses me that people deployed the same arguments to criticize Apple for switching to USB-C in the rest of its lineup (greedy! just trying to force us to buy their dongles/accessories!) and to criticize for not switching the iPhone over at the same time (greedy! just trying to force us to buy their dongles/accessories!).
Amusing perhaps, but not unreasonable I think. Timing was very different in the two instances. Switching laptops to USB-C only in 2016 was very much ahead of the curve. Adapters were pretty much essential at that time. Now, most displays have USB-C inputs, most peripherals come either with both cables or in 2 variants, etc. Replacing the 2 Thunderbolt 2 ports with Thunderbolt 3/USB-C and keeping the USB-A ports for the first 2 years would likely have offended fewer people. (Plus, I seem to remember there being fast and slow ports on some models, and thermal issues on some ports as well.)
Apple were pretty much the last manufacturer to switch their phones’ ports to USB-C, however. I don’t think it’s particularly unreasonable to criticise them for it.
And what, I wonder, might have created an impetus in the market for companies to start offering USB-C? Perhaps the existence of popular laptops with USB-C ports…
This is mostly down to the fact that USB-C requires a bunch of different use cases, functions, and feature sets to all share a single connector/port form factor. Apple didn’t maliciously impose that on the industry.
Again, my understanding is that when they introduced the Lightning connectors they said they’d stick to them for at least ten years as an anti-waste measure. And they did that. The narrative that Apple was somehow finally forced, unwilling, into USB-C by EU law after being a holdout is just completely manufactured out of thin air.
It doesn’t even need to be “slow internet.” On a recent trip to Australia, I noticed how large parts of the web felt considerably slower than in Europe, even on a nominally good connection. (<30ms latency to major CDNs, >100Mb/s bandwidth) Services just aren’t built with thoughts to latency anymore.
Yep, I don’t think most US-based tech workers are aware of the latency a lot of the world experiences making requests to US-based services. It’s always a delight for me when I use a service hosted in Australia, and then I remember that that’s the normal experience for US-based folks rather than the exception.
Any guesses which device these panels originated from? This description doesn’t seem to match my MacBook display.
I have no inside information, and I’m not sure about the rounded corners, but the only other laptop I could find within a minute or two of searching that has a 2880x1920 13.5″ display is the Lenovo ThinkBook 13x IMH G4 (or rather, some configurations of it have such a display).
Using a ‘find first set/ffs/clz/bsr’ operation/instruction surely would be nicer than using a lookup table for decoding the prefix.
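Something along these lines, assuming the prefix in question is a UTF-8-style lead byte (I haven’t checked exactly what the article decodes, so treat this as an illustration of the clz trick rather than a drop-in replacement):

```cpp
#include <bit>
#include <cstdint>

// Length of a UTF-8-style sequence from its lead byte, using a single
// count-leading-zeros (std::countl_zero, C++20) instead of a lookup table.
int sequence_length(std::uint8_t lead) {
    // Invert so the run of leading ones becomes leading zeros, move it into
    // the top byte of a 32-bit value, and count.
    unsigned inverted = static_cast<unsigned>(static_cast<std::uint8_t>(~lead)) << 24;
    int ones = std::countl_zero(inverted);
    return ones == 0 ? 1 : ones;   // 0xxxxxxx -> 1, 110xxxxx -> 2, 1110xxxx -> 3, ...
}
```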
Hm, something does not add up for me. It seems that the core thesis is that an ISA is an interface, and that in this area the interface doesn’t matter all that much and the results are defined by the quality of implementation.
But then, we have aarch64 Apple desktop computers which, as far as I understand, still basically run circles around x86_64 desktop computers. I think originally people explained the performance by timing — Apple was first to jump on the 5nm process. But that was several years ago, right? So, by now, any timing advantage should have disappeared? Where’s the inconsistency here?
From my (only partially informed) perspective, the main genuine ISA advantage aarch64 seems to have is how easy it is to decode in parallel. In particular, the M-series chips all use an 8-wide instruction decoder, while the widest x86 chips have only recently achieved 6 parallel instructions.
The “running circles” part is incidentally only true for efficiency; Zen4 and Golden Cove both beat Apple’s core designs in raw performance on most metrics, at the cost of higher clock speed and energy consumption. It’s hard to say where exactly AMD and Intel would end up if they focused more on efficiency than the top spot in the raw performance benchmark table. Likewise, it will be interesting to see if Apple starts diverging its core designs between A series versus M series chips. Right now they seem to be essentially identical.
While it might be true, I don’t think “AMD and Intel only care about raw performance” has much explaining power when it comes to describing the lack of efficient x86 chips. Not only is it conspicuous that AMD and Intel have enormous incentives to care about efficiency by way of several important efficiency-sensitive markets (data center, mobile, tablet, laptop) that they are supposedly neglecting in favor of some other market (which?), but everyone else building an efficiency-oriented chip steers clear of x86. When Amazon wanted to build an efficient data center CPU, they built on ARM, even though an x86 design would have been far more compelling all else being equal (they and their customers wouldn’t have to deal with a change of architecture). The same was true for Apple and for Google and so on.
There’s certainly something about x86 that seems less compelling from an efficiency perspective.
Not a hardware expert. My understanding, informed by things Fabian Giesen and Maynard Handley have written, is that, broadly speaking, it’s vertical integration: Apple made tradeoffs that improved power efficiency significantly at the cost of a small amount of performance (see for instance the TLB design, which is quite clever). In particular, Intel targets a ~5-6GHz frequency for its CPUs, where Apple targets ~3GHz; this means Apple gets twice as much time on each cycle. But then Apple has a definite advantage in the context of a laptop, where the CPU is only going to be running at 3GHz anyway.
Instruction decoding is a thing, as the sibling says. Intel seems to have recently found a strategy for scaling up parallel decode, the details of which I have forgotten. Beyond that, the main thing that comes to mind as potentially inhibiting x86 performance is the concurrency model. Enabling TSO on Apple CPUs costs about 10%; this is likely an upper bound on how much it hurts Intel, so I would expect the actual cost of TSO to Intel to be much less than 10%.
Instruction density is sometimes touted as an advantage of x86, but it’s not all that true. x86 can be denser than aarch64, but it’s much less dense than it could be.
Nice references w.r.t. ‘risc/cisc’: https://www.youtube.com/watch?v=JpQ6QVgtyGE&t=4327s https://danluu.com/risc-definition/
Good thinking on TSO and memory model. I seem to remember, during the initial phases of the arm64 transition for Mac, Apple engineers were saying the ObjC (atomic) refcounting operations on arm64 were taking single digit nanoseconds (7-9?) while the same operations were taking around 42ns on the fastest Intel Macs. OK, this is of course an extreme microbenchmark, but it does all quickly add up.
If I’m reading the Intel docs right, this is another big flaw in the ISA.
AFAICT an atomic increment must have the `lock` prefix, which also implies a full memory barrier. So cloning an object forces cache flushing on unrelated data on Intel, whereas ARM allows you to do an atomic increment with relaxed memory ordering, causing contention only on the counter object itself.

Read-modify-writes are somewhat different, and those are indeed significantly more costly on x86 (though they have gotten cheaper—down to ~10 cycles on the most recent uarches, I think, from ~20?—compared with 3 on Apple). But they should also be fairly rare (strewing atomic reference count updates all over your code is poor design). I’m talking about the cost of ordinary loads and stores, where the difference should be much less.
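To illustrate with `std::atomic` (a sketch of the usual intrusive refcount pattern, not anything taken from the ObjC/Swift runtime): the increments can be fully relaxed, and only the final decrement needs ordering. On AArch64 the relaxed increment can compile to a plain atomic add, while on x86 any RMW carries `lock` (full-barrier) semantics regardless of the ordering you ask for.

```cpp
#include <atomic>

struct RefCounted {
    std::atomic<long> refs{1};

    void retain() {
        // No ordering needed just to keep the object alive a bit longer.
        refs.fetch_add(1, std::memory_order_relaxed);
    }

    bool release() {  // returns true when the caller should destroy the object
        if (refs.fetch_sub(1, std::memory_order_release) == 1) {
            // Make all prior writes to the object visible before destruction.
            std::atomic_thread_fence(std::memory_order_acquire);
            return true;
        }
        return false;
    }
};
```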
You may consider it bad design, but Objective-C’s ARC (and presumably to some extent Swift’s built-in object system) will insert refcount operations all over the place, and this accounts for a lot of code running on Apple devices. Yes, the compiler uses linear types to try to erase unnecessary retain/release pairs, but this only works within a method (or across inlined function calls). I’ve definitely encountered performance critical code where refcounting started dominating runtime, and we either had to locally disable ARC or (where possible) move data structures to flat C (or C++).
I’ve put down fibre across various runs in our house and home office, for fast access to local network storage rather than a high speed internet link. I didn’t want to mess with splicing, so I opted for FS’s MTP cables and panels. You still have to get the MTP connector through the conduit, but it’s significantly smaller than say an RJ45 plug, and wasn’t an issue in practice. All in all, I’ve had less trouble with the fibre than with multi-gigabit copper ethernet, which I’ve found to be rather fussy.