1. 38
    1. 25

      The funny thing about the airline example is that airlines do overcommit tickets, and they will ask people not to board if too many people show up…

    2. 14

      The fundamental absurdity of how the OOM killer works is strong evidence that overcommit and maybe even disk-backed virtual memory are fundamentally broken user-visible features at the minimum. Pretending like resources are there that aren’t there isn’t a good way to build robust systems.

      I feel like system designers in the 80s jumped head first into virtual memory while hand-waving or before fully thinking through all the robustness consequences it would have. Just because we can build systems that can simulate having more memory than they do doesn’t mean we should. This type of “we built cool tech, now let’s use it without considering the consequences” is so pervasive in the non-mission-critical computing world.

      1. 8

        Pretending like resources are there that aren’t there isn’t a good way to build robust systems

        I strongly disagree. I wrote a lot more last time, but the TL;DR: Robustness is built by handling failure gracefully, and handling failure at a coarse granularity is far easier than handling it at a fine granularity. An approach that requires handling it at a fine granularity makes the probability of needing to handle it higher, which increases the chance of getting it wrong.
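
        To make “coarse granularity” concrete, here is a minimal sketch of the idea (my illustration, not the original commenter’s code): a supervisor that treats any abnormal exit of a worker, whether a failed allocation, a segfault, or an OOM kill, as the same event and simply restarts it. The ./worker path is hypothetical.

        ```c
        #include <stdio.h>
        #include <sys/wait.h>
        #include <unistd.h>

        int main(void) {
            for (;;) {
                pid_t pid = fork();
                if (pid < 0) { perror("fork"); return 1; }
                if (pid == 0) {
                    /* Child: run the (hypothetical) worker program. */
                    execl("./worker", "worker", (char *)NULL);
                    perror("execl");
                    _exit(127);
                }
                int status;
                if (waitpid(pid, &status, 0) < 0) { perror("waitpid"); return 1; }
                if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
                    return 0;   /* clean exit: nothing to handle */
                /* Crash, OOM kill, failed allocation: one recovery path for all. */
                sleep(1);
            }
        }
        ```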

        1. 6

          Reading your linked comment, it seems we have a pollution problem here:

          1. Indeed in practice, most programs don’t handle allocation failures gracefully. They’ll just dereference a NULL pointer and crash on the subsequent SEGV with no explanation or even stack trace (see the sketch just after this list).
          2. It thus makes sense to try and make allocation failures as rare as possible.
          3. Why bother handling allocation failures if they’re so rare?
          4. Now even programs that could have handled allocation failures can’t, because the kernel lied to them.
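
          To make point 1 concrete, here is a sketch of the two patterns (hypothetical sizes, not taken from any particular program): the unchecked call is the norm, the checked one the exception.

          ```c
          #include <stdio.h>
          #include <stdlib.h>
          #include <string.h>

          int main(void) {
              /* Typical: no check. If malloc returns NULL, the memset faults and
                 the process dies with SIGSEGV, with no hint that memory ran out. */
              char *buf = malloc(1 << 20);
              memset(buf, 0, 1 << 20);

              /* Rare: the check that would turn this into a reportable error. */
              char *buf2 = malloc(1 << 20);
              if (buf2 == NULL) {
                  fprintf(stderr, "out of memory allocating 1 MiB\n");
                  return 1;
              }
              free(buf2);
              free(buf);
              return 0;
          }
          ```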

          One could argue that desktop & server OSes are not meant to handle critical systems that need to be absolutely certain that the memory they’ve been given is real (the same could be said of real time). Which wouldn’t be too bad if we had any choice. But since outside of extremely constrained environments we only have 3 OS kernels on the planet (we can probably ignore anything that isn’t NT, Linux, or BSD), we end up trying to use them for more than what they’re actually good for.

          These days we can’t even ship a video game and guarantee that, on a particular hardware configuration, it will run at this many FPS with no stuttering or crashes. It was possible on older computers and consoles, but somehow as our machines became orders of magnitude more powerful we lost that ability.

          I’m not sure how exactly we should go about it. There’s probably no practical path beyond solving the 30 million lines problem (which most probably requires splitting hardware and software activities into separate companies), but at the very least, it would be nice if we could have reliable blame: if something goes wrong, I want to know who is responsible. And the easiest way I can think of is to shift the blame to programs as much as possible.

          It would likely require technical error messages like “Sorry, Discord required more memory than the system could spare (currently 300MB). (Note: you currently have 63 programs running, total available RAM is 8GiB)”. I don’t know how someone used to “Oops, something unexpected happened” would react though.

          1. 5

            But since outside of extremely constrained environments we only have 3 OS kernels on the planet (we can probably ignore anything that isn’t NT, Linux, or BSD), we end up trying to use them for more than what they’re actually good for.

            Keep in mind that Windows doesn’t do overcommit. We don’t have to go and look at a hypothetical alternate universe of non-UNIX things to see what systems without overcommit look like: there’s a real non-UNIX world without overcommit right there. And it’s absolutely awful to work with.

            It would likely require technical error messages like “Sorry, Discord required more memory than the system could spare (currently 300MB). (Note: you currently have 63 programs running, total available RAM is 8GiB)”. I don’t know how someone used to “Oops, something unexpected happened” would react though.

            Most early ’90s operating systems would give errors like that. RiscOS had a nice GUI for dynamically configuring the memory available to different things. Oh, the fun we had making sure that the lines were just long enough that programs could launch.

            NT just reports failure in VirtualAlloc and that’s typically propagated as an SEH exception. And then a program crashes with an out-of-memory error. And then you look at Task Manager and see that you have 60 GiB of RAM free and go ‘huh?’.
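
            For what it’s worth, this is roughly what that failure looks like at the lowest level (a sketch; the 1 GiB size is arbitrary). VirtualAlloc itself just returns NULL, and it is the runtime or application allocator above it that usually turns that into an exception:

            ```c
            #include <windows.h>
            #include <stdio.h>

            int main(void) {
                /* Ask NT to reserve and commit 1 GiB. Commit is charged up front,
                   so this can fail even while plenty of physical RAM is free. */
                void *p = VirtualAlloc(NULL, (SIZE_T)1 << 30,
                                       MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
                if (p == NULL) {
                    fprintf(stderr, "VirtualAlloc failed, error %lu\n", GetLastError());
                    return 1;
                }
                VirtualFree(p, 0, MEM_RELEASE);
                return 0;
            }
            ```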

            1. 4

              there’s a real non-UNIX world without overcommit right there. And it’s absolutely awful to work with.

              Something I don’t understand is the huge disconnect between UNIX users and quite a number of Windows developers (especially game devs): each camp says the other OS is significantly worse to work with, and both seem to be ignorant of quite a few things the other has to offer. It’s really weird. (Disclaimer: I’ve never developed for Windows directly; the closest I ever got was using Qt.)

              One important thing where Windows seems to have the upper hand is ABI stability: even though the Linux kernel seems to be the king here (“We never break users!!!”), Linux distributions seem to have a hard time running most GUI applications that have been compiled more than a few years ago. And I wouldn’t know how to compile a binary that would work on most distros, especially a 3D multiplayer game. I wouldn’t be surprised if it required bypassing the distro entirely and running inside a Docker container or similar.


              RiscOS had a nice GUI for dynamically configuring the memory available to different things.

              Crap, I envisioned exactly this as one possible solution, it is indeed a usability nightmare.

              Here’s the thing though: over the years we’ve observed programs being more and more resource hungry, to a point where it cannot possibly be ethically justified — not even by the additional functionality and better graphics we got. Clearly application devs got away with this. This is bad, and I want this to stop. Unfortunately I don’t have any good solution right now.

              NT just reports failure in VirtualAlloc and that’s typically propagated as an SEH exception. And then a program crashes with an out-of-memory error. And then you look at Task Manager and see that you have 60 GiB of RAM free and go ‘huh?’.

              Dear Lord, I’d take overcommit over that any day.

              1. 3

                One important thing where Windows seems to have the upper hand is ABI stability:

                Yes and no. UNIX comes from the same tradition as C, whereas Windows was designed to be language agnostic. This means that none of the system data types are standard C types and the core APIs can be used from C, Pascal, and other things.

                Core libraries use this, and higher-level ones use COM. There is no platform C++ ABI and Visual Studio changes the C++ ABI periodically independent of Windows releases. MinGW supports, I think, three different C++ ABIs. In contrast, for the last 20 years or so, *NIX platforms have all used the Itanium C++ ABI and had a stable C++ ABI.

                In terms of system APIs built on top of these low-level ABIs, Windows is definitely better. Qt, GTK, and so on break their interfaces far more often than Win32. That’s not to say that Win32 is perfect. I have more success running older Windows programs on WINE on macOS than I do on newer Windows.

                There are some things Windows does well, but a lot of the system was aggressively optimised around machines with 8-16 MiB of RAM and has kept those core abstractions. It is to desktop operating systems today what Symbian was to mobile operating systems in 2007: lots of good solutions to the wrong set of problems.

        2. 2

          The concept of trading off fine-grained error handling for coarse-grained error handling may be useful in specific scenarios, but it does not work here. For one, these coarse-grained errors aren’t being handled in practice in the vast majority of cases. The only reason systems with OOM killers seem to work is because the systems are usually so over provisioned with RAM that they never actually operate in an overcommitted state. This makes these systems a ticking time bomb.

          Wholesale ignoring of these coarse-grained memory exhaustion errors is tolerated because software malfunctioning in non-mission-critical contexts in general is tolerated as long as it doesn’t happen too often. The downside is that software in non-mission-critical contexts becomes known to be critically unreliable, and users of this type of software, otherwise known as consumers, start to accumulate erratic behaviors to deal with the risk of unpredictable failure, e.g. pressing save after every sentence written, even when auto save is on.

          From a pure system design standpoint, admitting overcommit and OOM killing changes the process abstraction in a substantial way. Now you must account for your program dying randomly through no fault of its own if you want your program to be correct. It’s absurd to engineer a program under these hostile circumstances. The implication here is that every program must have a process monitor to handle random deaths. Oh wait, but now the process monitor must have a process monitor, ad infinitum. Okay then let’s add a special case for the root process monitor. Hmm this asymmetry is kind of smelly, maybe it would be simpler to not have programs randomly crash through no fault of their own?

          If we must sacrifice fork() and mmap() to get a sane process abstraction and deterministic operation under resource exhaustion, so be it.

          1. 6

            The only reason systems with OOM killers seem to work is because the systems are usually so over provisioned with RAM that they never actually operate in an overcommitted state. This makes these systems a ticking time bomb.

            Meanwhile, on the Windows desktop I used at Microsoft (which didn’t have overcommit), allocations would start to fail when memory usage was at 60% and programs would crash with unhandled exceptions from allocation failures.

            This isn’t some hypothetical alternate reality: the most popular desktop operating system does the opposite thing and it doesn’t work well at all.

            Wholesale ignoring of these coarse-grained memory exhaustion errors is tolerated because software malfunctioning in non-mission-critical contexts in general is tolerated as long as it doesn’t happen too often.

            And that’s how you build reliable systems. Software cannot be 100% reliable because computers cannot be 100% reliable.

            From a pure system design standpoint, admitting overcommit and OOM killing changes the process abstraction in a substantial way. Now you must account for your program dying randomly through no fault of its own if you want your program to be correct

            As you must for any non-trivial program even without overcommit. Unless your program is 100% bug free (including all of its dependencies) it will crash at some point. Again, you build reliable systems by letting things fail and recover.

            Apple systems handle memory very well because the kernel supports Sudden Termination. Processes advertise that they are at a point where they have no unsaved data and so can be killed with no cleanup. This lets them use memory far more efficiently than Windows.

            1. 3

              Meanwhile, on the Windows desktop I used at Microsoft (which didn’t have overcommit), allocations would start to fail when memory usage was at 60% and programs would crash with unhandled exceptions from allocation failures.

              The difference is that these are identifiable errors in the program with stack traces. This can be addressed and fixed like any other bug. Non-deterministic OOM deaths cannot.

              Software cannot be 100% reliable because computers cannot be 100% reliable.

              As you must for any non-trivial program even without overcommit. Unless your program is 100% bug free (including all of its dependencies) it will crash at some point.

              I think these statements muddy the waters between process death per spec and process death due to malfunction or error. There is a useful concept known as “normal operation.” Something that is specified to happen is something that I must build machinery to handle. Something that is not specified to happen is something that I do not need to build machinery to handle. If OOM deaths are specified to happen (and they are), then that means that user-space is responsible for handling random death, yet it’s common knowledge that close to 99% of software that exists doesn’t handle that situation. The implication is that you are saying nearly all software is incorrect. Is that right?

              The problem is that while it may be the case that nearly all user-level software is critically unreliable due to not handling random OOM death, at those high rates of non-compliance I think it’s reasonable to say that de facto it’s not the user-level software that is wrong but the OS. At some point the OOM killer spec was invisibly implemented and close to 0% of programmers got the memo.

              I would just add that placing this burden on user-space makes a system that never actually runs your program and always crashes it a valid system according to spec, which is the absurd logical conclusion at which I was hinting. Compare this to a system that is specified to only return memory errors when there is no memory left.

              Apple systems handle memory very well because the kernel supports Sudden Termination. Processes advertise that they are at a point where they have no unsaved data and so can be killed with no cleanup. This lets them use memory far more efficiently than Windows.

              While the efficiency gains may be real, the only reason this works is that the feature is broadly advertised and application developers are designing against this spec. As I said earlier, this is simply not the case on BSD, Linux, macOS, et al. I want to note that according to the arguments you are making above, this feature is misleading, since programs should always be in a state where they can be killed with no cleanup, given that computers are not 100% reliable.

              1. 3

                The implication is that you are saying nearly all software is incorrect. Is that right?

                I’m uncomfortable with overcommit too (indeed I tend to disable it in Linux), but… isn’t it accepted as true that nearly all software, or at least nearly all complex software not formally verified, is incorrect?

                1. 2

                  It’s probably a safe bet that every complex piece of software is not 100% correct and there is at least one subtle issue lurking somewhere in there. That said, I think there is a meaningful distinction between a subtle bug and a blatantly ignored condition that is specified to happen.

            2. 2

              I will grant that the Windows experience sounds horrible (having no firsthand dev experience on Windows myself), but do we necessarily know that it is caused specifically by the lack of overcommit, or could it be some other bad aspect of memory allocation policy or design? It seems a stretch to say: Windows fully commits, Windows is terrible, therefore fully committing is terrible.

      2. 5

        system designers in the 80s jumped head first into virtual memory

        Virtual memory dates back to the Manchester / Ferranti Atlas computer designed around 1960, and it became common in other large systems in the next few years.

        1. 1

          That’s a fair point. I was implicitly referring to virtual memory adoption and spread in Unix and related systems, e.g. Mach. This is where I see the main lineage of contemporary widespread systems coming from; what some might consider our current OS monoculture.

        2. [Comment from banned user removed]

        3. [Comment from banned user removed]

      3. 2

        I feel like system designers in the 80s jumped head first into virtual memory while hand-waving or before fully thinking through all the robustness consequences it would have.

        Bear in mind that they didn’t have as many resources as we have today, so I still believe they went down this path knowingly.

      4. 1

        In practice though, do you often suffer when the OOM killer strikes?

        I don’t use Linux on my laptop, but at ${dayjob} we have a bunch of Linux servers, and we never had a problem with it AFAICT.

        1. 5

          I’ve seen it happen on servers, where it would randomly decide to kill exactly the wrong process. Especially when running “standard” somewhat badly behaved software like Drupal (which can be very memory hungry).

          1. 1

            Considering what David C wrote in other comments regarding the challenges of handling memory failures with fine granularity – would it not then make sense to try to make improvements at the application level? Can Drupal/the application be rewritten to become less memory-hungry? If that’s not practical or takes too long, then memory is relatively cheap these days – one could also increase the memory available to the application.

            1. 3

              Sure, but when you’re hosting websites with a standard package, there’s a tension between how much hardware you can throw at it and how much you can charge the customers.

              It’s actually quite a shitty place to be in - either you host yourself and have to increase the hardware and hopefully the customers don’t run away because hosting gets too expensive, or they host elsewhere and their application gets killed or is too slow and they come complaining at you.

              I’m glad I’m not working in a place where we use these overblown CMSes/“frameworks” anymore.

            2. 3

              Most *NIX systems, rather than having an OOM killer, will cause page faults when you try to access memory that you’ve mmap’d but where there aren’t available pages. This tends to work quite well with things like Drupal. The php-fpm process takes a segfault, crashes, and is automatically restarted. You return a 50x error for one request and then the PHP runtime is restarted with the minimum memory required and can grow again.

              The overall system is then fairly reliable.

    3. 9

      Now I’m curious. Is there any Unix system that doesn’t overcommit memory?

      1. 5

        Not a “Unix”, but Haiku doesn’t by default. I believe there are mmap flags for it when required.

        1. 2

          Yup, this bit is in the Haiku port of snmalloc. We allocate a flat metadata structure per page of address space. This gives us a very fast mapping from address to metadata (shift, add, load). Physical pages get allocated when we write the metadata for an allocated page. For Haiku, we need to pass MAP_NORESERVE to mmap.

          Not a huge problem; this is the only code we needed for the Haiku port.
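
          The shape of the trick is roughly this (a sketch with Linux/Haiku-style flag names on a 64-bit system, not the actual snmalloc code): reserve a large range of address space up front, opt out of commit accounting with MAP_NORESERVE, and let physical pages appear only when the metadata is first written.

          ```c
          #include <stdio.h>
          #include <sys/mman.h>

          int main(void) {
              size_t len = (size_t)1 << 34;  /* 16 GiB of address space */
              /* MAP_NORESERVE: do not charge this mapping against commit/swap
                 accounting; pages are only backed by real memory when touched. */
              void *meta = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
              if (meta == MAP_FAILED) {
                  perror("mmap");
                  return 1;
              }
              ((char *)meta)[0] = 1;  /* first touch allocates a physical page */
              munmap(meta, len);
              return 0;
          }
          ```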

      2. 4

        I’ve heard that Solaris is configured to do so: allocs will return errors. But I can’t verify it; I don’t have a box at hand.

        1. 9

          Yes, as I recall, in most cases Solaris fully committed virtual memory. Except for wired kernel memory, every page of real memory needed to be backed by something, either a file or swap space. Virtual memory wasn’t necessarily allocated immediately. Certain pages, such as copy-on-write pages, would only actually be allocated when necessary (when they were written by a process). However, virtual memory would be reserved for them in advance so that in such cases the allocation would always succeed. When a process requested memory with malloc() (essentially, sbrk()) or MAP_PRIVATE (that is, a mapping not shared with other processes) virtual memory would be reserved. If the reservation couldn’t be made, the request would fail. The prevailing sentiment was that this was the best approach to making the system reliable. This probably required more swap space than otherwise, but hey, disk space was cheaper than RAM.

          One exception to the “every page is backed” rule was stack space. Process (and later, threads’) stacks were allocated with MAP_NORESERVE which meant that the address space was mapped but that page and its backing store would be allocated on demand. That meant that a function call might touch the next stack page, which might require virtual memory to be allocated, and that the allocation might fail. If the allocation failed, the process would get SIGBUS or SIGSEGV (I forget which). Fortunately, C programs had fairly predictable stack requirements, so this wasn’t really a problem in practice. It still caused a few of us to worry, though.

          There’s some discussion of this here, in the section that describes MAP_NORESERVE:

          https://docs.oracle.com/cd/E88353_01/html/E37841/mmap-2.html

          At the time, I also recall that my colleagues and I were horrified to learn that most of AIX’s allocations were essentially MAP_NORESERVE. That is, you could malloc() a bunch of memory (which might require sbrk()) and the call might succeed; however, the memory wasn’t really there until you touched it. And touching it would allocate memory, and if this allocation failed, your process would get a signal that might kill it. I almost couldn’t believe that this was true, but I just looked it up and it seems to be the case:

          https://www.ibm.com/docs/en/aix/7.3?topic=policies-comparisons-deferred-early-paging-space-allocation

          Interesting to see the rationale include discussion of “efficient usage of disk resources.” Somewhat reminiscent of the arguments for Linux’s OOM Killer.

    4. 17

      The original mailing-list thread started when someone came back to their workstation to find it magically unlocked: while they were gone, the system had run out of memory and the OOM killer had chosen to kill the xlock process!

      If anything, this speaks more to how badly X fits modern use cases than anything else. There are lots of reasons the locker can die (not only OOM), but the fact that this can “unlock” your desktop is the actually absurd part.

      This would be impossible in Wayland, for example.

      1. 18

        If wayland was good, we’d all be using it by now. It has had so, so much time to prove itself.

        My experience with wayland has been:

        • it’s never the default when I install a popular window manager
        • every 5 years I see if I should tweak settings to upgrade, and find out that if I do that it’s going to break some core use case, for example gaming, streaming, or even screenshots for god’s sake.
        • it’s been like 20 years now

        I think the creators of Wayland have done us a disservice, by convincing everyone it is the way forward, while not actually addressing all the use cases swiftly and adequately, leaving us all in window manager limbo for two decades.

        Maybe my opinion will change when I upgrade to Plasma 6. Although, if you search this page for “wayland” you see a lot of bugs…

        1. 18

          it’s never the default when I install a popular window manager

          Your information might be outdated: not only is it the default in Plasma 6 and GNOME 46, but they’ve actually worked to allow compiling them with zero Xorg support. I believe a lot of distros are now not only enabling it by default but have express plans to no longer ship Xorg at all outside of xwayland.

        2. 11

          If wayland was good, we’d all be using it by now. It has had so, so much time to prove itself.

          Keep in mind that I gave Wayland as an example of how they should have fixed the issue (e.g. having a protocol where, if the locker fails, the session is just not opened to everyone).

          My experience with Wayland is that it works for my use cases. I can understand the frustration of it not working for yours (I had a similar experience 5 years ago, but since switching to Sway 2 years ago it seems finally good enough for me), but this is not a “Wayland is good and X is bad”, it is “X is not designed for modern use cases”.

          1. 8

            Yeah I realize I’m changing the topic. Your point stands.

        3. 5

          This is a thread about OS kernels and memory management. There are lots of us who use Linux but don’t need a desktop environment there. With that in mind, please consider saving the Wayland vs X discussion for another thread.

          1. 2

            Lone nerd tries to stop nerd fight. Gets trampled. News at 11

        4. 4

          If wayland was good, we’d all be using it by now.

          By the same logic, we could argue that:

          • If Linux was any good, we’d all be using it by now.
          • If Dvorak was any good, we’d all be typing on it by now.
          • If the metric system was any good, the US would be using it by now.

          None of the above examples are perfect; I just want to insist that path dependence is a thing. Wayland, being so different from X, introduced many incompatibilities, so it had much inertia to overcome right from the start. People need clear, substantial, and immediate benefits to consider paying even small switching costs, and Wayland’s switching costs are pretty significant.

          1. 2

            I think the logic works fine:

            • Linux is ubiquitous.
            • Dvorak isn’t very good (although I personally use it). Extremely low value-add.
            • The metric system is ubiquitous, including in the US. I learned it in school, scientists use it.

            I think Wayland sucked a lot. And it has finally started to be good enough to get people to switch. And I’m mad that it took so long.

            1. 3

              Linux is ubiquitous.

              Except on the desktop. You could argue it’s just one niche among many, but it remains a bloody visible one.

              Dvorak isn’t very good (although I personally use it). Extremely low value-add.

              Hmm, you’re effectively saying that no layout is very good, and all have extremely low value-add… I’m not sure I believe that, even if we ignore chording layouts that let stenotypists type faster than human speech: really, we can’t do significantly better than what was effectively a fairly random layout?

              The metric system is ubiquitous, including in the US. […]

              I call bullshit on this one. Last time I went there it was all about miles and inches and square feet. Scientists may use it, but day to day you’re still stuck with the imperial system, even down to your standard measurements: wires are gauged, your wood is 2 by 4 inches, even your screws use imperial threads.

              Oh, and there was this Mars probe that went crashing down because of an imperial/metric mismatch. I guess they converted everything to metric since then, but just think of what it took to do that even for this small, highly technical niche.


              That being said…

              I think Wayland sucked a lot.

              I can believe it did (I don’t have a first hand opinion on this).

              1. 4

                Just on a detail, I believe it doesn’t matter to most people what their keyboard layout is, and I’ve wasted a lot of time worrying about it. A basically random one like qwerty is just fine. That doesn’t affect your main point though, especially since the example of stenography layouts is a slam dunk. Many people still do transcription using qwerty, and THAT is crazy path-dependence.

              2. 3

                Linux isn’t very good on the desktop, speaking as a Linux desktop user since 2004.

        5. 3

          it’s been like 20 years now

          The display protocol used by a system built in the bazaar style of development is not a question of design, but of community support and network effects. It can be the very best thing ever and still not matter if no client supports it.

          Also, the creators of Wayland are the ex-maintainers of X; it’s not like they were not familiar with the problem at hand. You sometimes have to break backwards compatibility for good.

        6. 2

          Seems to be happening, though? Disclaimer: self-reported data.

          The other survey I could find puts Wayland at 8%, but it dates to early 2022.

          1. 5

            Sure, it’s good to be finally happening. My point is if we didn’t have Wayland distracting us, a different effort could have gotten us there faster. It’s always the poorly executed maybe-solution that prevents the ideal solution from being explored. Still, I’m looking forward to upgrading to Plasma 6 within the next year or so.

            1. 4

              Designing Wayland was a massive effort; it wasn’t just the Xorg team going “we got bored of this and now you have to use the new thing”. They worked very closely with DE developers to design something that wouldn’t make the same mistakes Xorg did.

              1. 12

                Meanwhile, Arcan is basically just @crazyloglad and does a far better job of solving the problems with X11 than Wayland ever will.

                1. 7

                  The appeal to effort argument in the parent comment is just https://imgur.com/gallery/many-projects-GWHoJMj which aptly describes the entire thing.

                  Being a little smug, https://www.divergent-desktop.org/blog/2020/10/29/improving-x/ has this little thing:

                  Take the perspective of a client developer chasing after the tumbleweed of ‘protocols’ drifting around and try to answer ‘what am I supposed to implement and use’? To me it looked like a Picasso painting of ill-fitting- and internally conflicted ideas. Let this continue a few cycles more and X11 will look clean and balanced by comparison. Someone should propose a desktop icon protocol for the sake of it, then again, someone probably already has.

                  I think we are at 4 competing icon protocols now. Mechanism over policy: https://arcan-fe.com/2019/05/07/another-low-level-arcan-client-a-tray-icon-handler/

                  The closing bit:

                  It might even turn out so well that one of these paths will have a fighting chance against the open desktop being further marginalised as a thin client in the Azure clouded future; nothing more than a silhouette behind unwashed Windows, a virtualized ghost of its former self.

                  That battle is quickly being lost.

                  The unabridged story behind Arcan should be written down (and maybe even published) during the coming year or so, as the next thematic shift is around the corner. That will cover how it’s just me but also not. A lot of people have indirectly used the thing without ever knowing, which is my preferred strategy for most things.

                  Right now another fellow is on his way from another part of Europe for a hackathon in my fort out in the wilderness.

                2. 3

                  Arcan does look really cool in the demos, and I’d like to try it, but last time I tried to build it I encountered a compilation bug (and submitted a PR to fix it) and I’ve never been able to get any build of it to give me an actually usable DE.

                  I’m sure it’s possible, but last time I tried I gave up before I worked out how.

                  I also wasn’t able to get Wayland to work adequately, but I got further and it was more “this runs but is not very good” instead of “I don’t understand how to build or run this”.

              2. 7

                Maybe being a massive effort is not actually a good sign.

                Arguably Wayland took so long because it decided to fix issues that didn’t need fixing. Did somebody actually care about a rogue program already running on your desktop being able to capture the screen and clipboard?

                edit: I mean, I guess so since they put in the effort. It’s just hard for me to fathom.

                1. 4

                  Was it a massive effort? Its development starting 20 years ago does not equate to a “massive effort,” especially considering that the first 5 years involved a mere handful of people working on it as a hobby. The remainder of the time was spent gaining network effect rather than on technical effort.

              3. 3

                Sorry, but this appeal to the effort it took to develop wayland is just embarrassing.

                Vaporware does not mean good. Au contraire, it usually means terrible design by committee, as is the case with wayland.

                Besides, do you know how much effort it took to develop X?

                It’s so tiring to keep reading this genre of comment over and over again, especially given that we have crazyloglad in this community utterly deconstructing it every time.

      2. 3

        This is true, but I do think it was also solved in X, although there were only a few implementations, as it required working around X more than using it.

        IIRC GNOME and GDM would coordinate so that when you lock your screen it actually switched back to GDM. This way if anything started or crashed in your session it wouldn’t affect the screen locking. And if GDM crashed it would just restart without granting any access.

        That being said it is much simpler in Wayland where the program just declares itself a screen locker and everything just works.

        1. 3

          Crashing the locker hasn’t been a very good bypass route for some time now (see e.g. xsecurelock, which is more than 10 years old, I think). xlock, the program mentioned in the original mailing list thread, is literally 1980s software.

          X11 screen lockers do have a lot of other problems (e.g. input grabbing) primarily because, unlike Wayland, X11 doesn’t really have a lock protocol, so screen lockers mostly play whack-a-mole with other clients. Technically Wayland doesn’t have one, either, as the session lock protocol is in staging, but I think most Wayland lockers just go with that.

          Unfortunately, last time I looked at it Wayland delegated a lot of responsibilities to third parties, too. E.g. session lock state is usually maintained by the compositor (or at least that is what a lot of ad-hoc solutions developed prior to the current session lock protocol did). Years ago, “resilient” schemes that tried to restart the compositor if it crashed routinely suffered from the opposite problem: crashing the screen locker was fine, but if the OOM killer reaped the compositor, the session got unlocked.

    5. 6

      I once heard that we have mathematicians to thank for the OOM killer as they would try to allocate much more memory than necessary for their matrices and not end up using it. Instead of fixing their code, they successfully lobbied to have allocations not fail. It was successful because the HPC community is very influential.
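
      As I understand it, the pattern being described is roughly this (a sketch with made-up sizes, assuming a 64-bit system): ask for a worst-case matrix, then only ever touch a corner of it. Under overcommit the untouched pages cost nothing; under strict accounting the malloc itself may be refused.

      ```c
      #include <stdio.h>
      #include <stdlib.h>

      int main(void) {
          size_t n = 50000;                      /* hypothetical dimension */
          double *a = malloc(n * n * sizeof *a); /* ~18.6 GiB of address space */
          if (a == NULL) {
              fprintf(stderr, "allocation refused\n");
              return 1;
          }
          for (size_t i = 0; i < 1000; i++)      /* only the first rows get used */
              a[i * n + i] = 1.0;
          free(a);
          return 0;
      }
      ```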

      We also have mathematicians to thank for the heartbleed bug in OpenSSL, because back in the day mostly cryptography PhD students would implement stuff in OpenSSL.

      Disclaimer: I am a mathematician.

      1. 10

        I have heard (but can also offer no evidence for) a different lore: Memory overcommit was first introduced so that fork would not OOM the system as frequently, especially when forking off from a process with a lot of memory already allocated.

        1. 2

          I thought the only memory allocated by fork was in the process table, and everything else was just shared between the parent and child.

          1. 8

            That is true for modern OSes, but the original fork straight up copied the source process entirely before potentially replacing it with exec.

            IIRC COW fork was introduced by Sun in the late 80s or early 90s, then it slowly percolated through other Unixes.

          2. 6

            It’s shared, until one of them writes to that space - it’s copy-on-write. If one process writes to the “shared” space when there’s no free memory, then you’re in an OOM condition outside of malloc.
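
            A small sketch of what that means (illustrative sizes): after fork, parent and child share the same physical pages, and it is the child’s later writes that force new pages to be found, long after any allocation call has returned.

            ```c
            #include <stdio.h>
            #include <stdlib.h>
            #include <string.h>
            #include <sys/wait.h>
            #include <unistd.h>

            int main(void) {
                size_t len = 256 << 20;          /* 256 MiB, touched by the parent */
                char *buf = malloc(len);
                if (buf == NULL) { perror("malloc"); return 1; }
                memset(buf, 1, len);

                pid_t pid = fork();              /* pages are now shared, copy-on-write */
                if (pid < 0) { perror("fork"); return 1; }
                if (pid == 0) {
                    /* Each write below may need a fresh physical page. If none is
                       available there is no error return; the child just gets killed
                       (or, without overcommit, the fork above would have been refused). */
                    memset(buf, 2, len);
                    _exit(0);
                }
                waitpid(pid, NULL, 0);
                free(buf);
                return 0;
            }
            ```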

          3. 3

            Well, yes, depending on your definition of “allocated”. Because modern systems employ copy-on-write mappings for fork.

            But that trick depends on the ability to have more memory mapped than is in your resident set. And it’s precisely this accounting trick - having pages that are allocated to you but that aren’t actually taking up RAM until you “dirty” them by writing to them - that makes both this and malloc overcommit work.

          4. 3

            But think about how that interacts with overcommit. Everything is shared between the parent and child initially, but any page that’s written to by either of them afterwards has to be unshared. Which means that you have two choices:

            1. Charge the child for everything immediately, and have fork fail if there isn’t enough memory to copy everything. You can keep COW as a performance optimization, but you act outwardly as though it wasn’t there. This is reliable, but means that you leave a lot of memory in reserve for something that will likely never happen.
            2. Don’t do that, and accept that every single write to memory might increase some process’s memory usage, and therefore might be an “allocation” (which can fail due to out of memory) without going through malloc or any similar interface that can return an out-of-memory error.
            1. 1

              Yes, fork itself makes overcommit a very reasonable compromise.

              In many modern systems you could probably get away with charging immediately; the “reserve” memory can be used for ephemeral things like disk cache or pages explicitly marked as reclaimable. But you will always have some subset of the userbase that is running everything out of RAM with no meaningful local disk access that could be cached. In these cases the statistics don’t work out and you are wasting lots of resources.

              The other question is whether fork is actually a good API. I’m not convinced it is. It does have rare cases where it is useful, like snapshotting a process in order to run a GC, checkpoint state, or something similar without blocking the “main” process. But the common case of fork+exec definitely doesn’t want to be charged for this memory that it is about to release anyway.

              1. 4

                I think it’s a pretty bad API, personally, since the vast majority of fork calls are followed up by changing to a completely different process and sharing nothing. I have a feeling someone realized they could solve two problems with one general API and went with it without thinking about the long-term consequences.

          5. 2

            It’s not copied, but ‘commit’ means that you commit to the guarantee that memory is there when you need it. If you write to a copy-on-write page, the OS needs to have a physical page to allocate, or the write will fail. Without overcommit, you need to have a page available for every not-yet-copied view of a CoW page.

      2. 2

        I don’t think that makes much sense. Pretty much all Unices overcommit (which is part of why it’s a lot more difficult to port to Windows, which does not); Linux is the only one with an OOM killer[0].

        The rest will degrade differently when they run out of physical memory, e.g. start frantically swapping and lock up.

        [0] Or at least a very trigger-happy one with complicated accounting. IIRC FreeBSD has a last-resort one which just kills the largest process, but it may not even reliably trigger. And rather than non-trivial scoring, you can protect(1) a process to make it immune and that’s it.

        1. [Comment removed by author]

    6. 5

      In the old days of Android, you could exploit something like this by intentionally causing the lock screen to be OOM-killed, thereby bypassing it.

      https://www.oreilly.com/library/view/practical-mobile-forensics/9781788839198/85928bf7-5c5e-40cb-9a9b-5aa4e75d8032.xhtml

      1. 2

        I think this still works for firewalls: your foreground app uses a lot of memory, Android kills the firewall because it’s in the background, and Android falls back to an unfiltered connection.

      2. 1

        Cellebrite being able to unlock AFU iPhones suggests they also crash the phone lock screen somehow.

    7. 3

      This has always struck me as the most absurd thing about Linux.

      1. 7

        I had an interesting experience(? bug? feature?) in Windows 7 one time where it somehow ran out of available memory and notified me to close something. I (stupidly) went to open Task Manager to investigate and maybe find something to close, but then trying to open Task Manager (predictably) caused the system to completely run out of memory, in which case it simply froze. No BSOD, just my desktop becoming a still life installation. Still not sure how it happened or if that was what the OS was supposed to do, never happened again since, on any OS for me. So I’m really not sure if Linux should have an OOM Killer or if it should just do that.

      2. 1

        This has always struck me as the most absurd thing about Linux.

        Note (in case you don’t already know) that overcommit can be turned off if you want.

        I don’t feel much difference between having it on or off; I have seen Chrome seem unusually limited in how many tabs it can have open with overcommit off, but that might be because I neglected to raise vm.overcommit_ratio from the default 50.

    8. 3

      All the stories I hear about the OOM Killer are as amusing as they are horrifying. Ironically, out of the innumerable problems I had in my six years of exclusively using Linux as my daily driver before it sent me running for the hills back to Windows (this was ~2014, obviously a lot has changed since then), OOM Killing was never one of those problems. At least I hope that if/when I do ever go back to Linux, my 128GB RAM workstation will placate the OOM Killer enough to leave me alone.

    9. 3

      Previously, on Lobsters (not the same thing, but related): https://lobste.rs/s/lj96re/linux_memory_overcommit_2007

    10. 2

      I think overcommit is fine, and people wouldn’t be too annoyed by the OOM killer if it actually worked reliably enough and if the OS sent events asking processes to cut their memory use that we could all respond to.

      My experience of low-memory situations on Linux is that it just thrashes forever and the OOM killer never gets involved.

      For desktop use cases I would like the system to present an interactive OOM killer if the system would otherwise thrash, and then I can just pick which processes to kill.

      1. 2

        Thrashing is probably due to a swap partition?

        1. 2

          I once saw a low-memory condition on Linux servers where the OOM killer was not triggered but the system was spending a huge percentage of its cycles in the kernel, apparently shuffling around the few remaining available pages. I don’t think it was swap-related, more like the file cache not having enough room to work, or maybe virtual mapping maintenance/defrag. It was reminiscent of a mark-sweep GC system running low on memory and spending all its time GCing (an even older memory!). This was like ten years ago and I don’t remember the details, but at the time I dug around the stack traces to figure this out.

          1. 1

            I think I’ve been looking into something very similar recently! (And indeed, no swap at all — just a system becoming totally unresponsive, load averages through the roof etc. as memory usage hit a very high watermark, but not actually triggering the OOM killer yet.)

            1. 1

              The telltale sign was a very high sys percentage on htop (or your favorite monitor). If you can give it a little swap space, see if that relieves the problem. It’s good to give a system some swap even if it should all fit in RAM, because it gives the kernel a chance to swap out pages that aren’t in the working set and use that RAM for something more useful, like file buffers or cache.

        2. 1

          My recollection is that this happened both when I did and did not have swap, but either way is a common config that shouldn’t be broken.