
The "too small to fail" memory-allocation rule

The "too small to fail" memory-allocation rule

Posted Dec 25, 2014 19:17 UTC (Thu) by quotemstr (subscriber, #45331)
In reply to: The "too small to fail" memory-allocation rule by Cyberax
Parent article: The "too small to fail" memory-allocation rule

Your comment is incorrect according to a conventional understanding of "overcommit". The misunderstanding probably stems from a general confusion, even among very senior Linux developers, between allocating address space and allocating memory. In Linux, programs call mmap and expect that operations on the memory returned will never[2] fail. There is no distinction in practice [1] between setting aside N pages of virtual addresses and reserving storage for N distinct pages of information.
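
To make that concrete, here is a rough C sketch of the Linux behaviour (assuming default overcommit settings; the 64 GiB figure is purely illustrative):

    /* The mmap call typically succeeds even for an absurdly large request,
     * because only address space is being set aside.  The failure, if any,
     * arrives later, when pages are actually touched. */
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = (size_t)64 << 30;              /* 64 GiB of address space */
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap");                          /* rarely reached */
            return 1;
        }
        /* Each write faults in a page.  If memory runs out, the process does
         * not see an error return; the OOM killer picks a victim instead. */
        for (size_t off = 0; off < len; off += 4096)
            p[off] = 1;
        return 0;
    }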

In NT, the two operations are separate. A program can set aside N pages of its address space, but it is only when it commits some of those pages that the kernel guarantees it will be able to provide M (M <= N) distinct pages of information. Once a commit operation succeeds, the system guarantees that it can provide the requested pages. There is no OOM killer because the kernel can never work itself into a position where one might be necessary. While it's true that a process can reserve more address space than the system has memory, in order to use that memory it must first make a commit system call, and *that call* can fail. That's not overcommit. That's sane, strict accounting. Your second point is based on a misunderstanding.
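
A rough sketch of that NT sequence using VirtualAlloc (sizes again purely illustrative):

    /* Reservation sets aside address space only; the later MEM_COMMIT call is
     * where the kernel either guarantees backing store or returns an error. */
    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        SIZE_T reserve_len = (SIZE_T)64 << 30;       /* 64 GiB of address space */
        SIZE_T commit_len  = (SIZE_T)1  << 20;       /* 1 MiB actually needed  */

        /* Step 1: reserve.  Cheap; no commit charge is taken. */
        char *base = VirtualAlloc(NULL, reserve_len, MEM_RESERVE, PAGE_NOACCESS);
        if (base == NULL) {
            printf("reserve failed: %lu\n", GetLastError());
            return 1;
        }

        /* Step 2: commit.  This is the call that can fail; once it succeeds,
         * writes to the committed range cannot run out of storage. */
        char *mem = VirtualAlloc(base, commit_len, MEM_COMMIT, PAGE_READWRITE);
        if (mem == NULL) {
            printf("commit failed: %lu\n", GetLastError());
            return 1;
        }

        mem[0] = 1;                                   /* safe: already committed */
        VirtualFree(base, 0, MEM_RELEASE);
        return 0;
    }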

Your first point is also based on a misunderstanding. If two processes have mapped writable sections of a file, these mappings are either shared or private (and copy-on-write). Shared mappings do not incur a commit charge overhead because the pages in file-backed shared mappings are backed by the mapped files themselves. Private mappings are copy-on-write, but the entire commit charge for a copy-on-write mapping is assessed *at the time the mapping is created*, and operations that create file mappings can fail. Once they succeed, COW mappings are as committed as any other private, pagefile-backed mapping of equal size. Again, no overcommit. Just regular strict accounting.

From MSDN:

"When copy-on-write access is specified, the system and process commit charge taken is for the entire view because the calling process can potentially write to every page in the view, making all pages private. The contents of the new page are never written back to the original file and are lost when the view is unmapped."

The key problem with the Linux scheme is that, in NT terms, all mappings are SEC_RESERVE and are made SEC_COMMIT lazily on first access, and the penalty for a failed commit operation is sudden death of your process or a randomly chosen other process on the system. IMHO, Linux gets all this tragically wrong, and NT gets it right.
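
For contrast, here is roughly what the explicit SEC_RESERVE dance looks like on NT (sizes illustrative); committing is a deliberate step, and that step is where a failure would surface:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        SIZE_T section_len = (SIZE_T)1 << 30;        /* 1 GiB reserved section */

        /* Pagefile-backed section whose pages are reserved, not committed. */
        HANDLE section = CreateFileMappingA(INVALID_HANDLE_VALUE, NULL,
                                            PAGE_READWRITE | SEC_RESERVE,
                                            0, (DWORD)section_len, NULL);
        if (section == NULL)
            return 1;

        char *view = MapViewOfFile(section, FILE_MAP_WRITE, 0, 0, 0);
        if (view == NULL)
            return 1;

        /* Touching view[0] now would fault: the pages are reserved only.
         * Commit the first 64 KiB explicitly; this call can fail. */
        if (VirtualAlloc(view, 64 * 1024, MEM_COMMIT, PAGE_READWRITE) == NULL) {
            printf("commit failed: %lu\n", GetLastError());
            return 1;
        }
        view[0] = 1;                                  /* now backed by commit */

        UnmapViewOfFile(view);
        CloseHandle(section);
        return 0;
    }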

[1] Yes, MAP_NORESERVE exists. Few people use it. Why bother? It's broken anyway, especially with overcommit disabled, which is when you most care about MAP_NORESERVE in the first place!

[2] Sure, the OOM killer might run in response to a page fault, but the result will either be the death of some *other* process or the death of the process performing the page fault. Either way, that process never observes a failure, in the latter case because it's dead already. Let's ignore file-backed memory on volatile media too.

