The "too small to fail" memory-allocation rule
Kernel developers have long been told that, with few exceptions, attempts to allocate memory can fail if the system does not have sufficient resources. As a result, in well-written code, every call to a function like kmalloc(), vmalloc(), or __get_free_pages() is accompanied by carefully thought-out error-handling code. It turns out, though, that the behavior actually implemented in the memory-management subsystem is a bit different from what is written in the brochure. That difference can lead to unfortunate run-time behavior, but the fix might just be worse.
A discussion on the topic began when Tetsuo Handa posted a question on how to handle a particular problem that had come up. The sequence of events was something like this:
- A process that is currently using relatively little memory invokes an XFS filesystem operation that, in turn, needs to perform an allocation to proceed.
- The memory management subsystem tries to satisfy the allocation, but finds that there is no memory available. It responds by first trying direct reclaim (forcing pages out of memory to free them), then, if that doesn't produce the needed free memory, it falls back to the out-of-memory (OOM) killer.
- The OOM killer picks its victim and attempts to kill it.
- To be able to exit, the victim must perform some operations on the same XFS filesystem. That involves acquiring locks that, as it happens, the process attempting to perform the problematic memory allocation is currently holding. Everything comes to a halt.
In other words, the allocating process cannot proceed because it is waiting for its allocation call to return. That call cannot return until memory is freed, which requires the victim process to exit. The OOM killer will also wait for the victim to exit before (possibly) choosing a second process to kill. But the victim process cannot exit because it needs locks held by the allocating process. The system locks up and the owner of the system starts to seriously consider a switch to some version of BSD.
When asked about this problem, XFS maintainer Dave Chinner quickly wondered why the memory-management code was resorting to the OOM killer rather than just failing the problematic memory allocation. The XFS code, he said, is nicely prepared to deal with an allocation failure; to him, using that code seems better than killing random processes and locking up the system as a whole. That is when memory management maintainer Michal Hocko dropped a bomb by saying:
The resulting explosion could be heard in Dave's incredulous reply:
Lots of code has dependencies on memory allocation making progress or failing for the system to work in low memory situations. The page cache is one of them, which means all filesystems have that dependency. We don't explicitly ask memory allocations to fail, we *expect* the memory allocation failures will occur in low memory conditions. We've been designing and writing code with this in mind for the past 15 years.
A "too small to fail" allocation is, in most kernels, one of eight contiguous pages or less — relatively big, in other words. Nobody really knows when the rule that these allocations could not fail went into the kernel; it predates the Git era. As Johannes Weiner explained, the idea was that, if such small allocations could not be satisfied, the system was going to be so unusable that there was no practical alternative to invoking the OOM killer. That may be the case, but locking up the system in a situation where the kernel is prepared to cope with an allocation failure also leads to a situation where things are unusable.
One alternative that was mentioned in the discussion was to add the __GFP_NORETRY flag to specific allocation requests. That flag causes even small allocation requests to fail if the resources are not available. But, as Dave noted, trying to fix potentially deadlocking requests with __GFP_NORETRY is a game of Whack-A-Mole; there are always more moles, and they tend to win in the end.
The alternative would be to get rid of the "too small to fail" rule and make the allocation functions work the way most kernel developers expect them to. Johannes's message included a patch moving things in that direction; it causes the endless reclaim loop to exit (and fail an allocation request) if attempts at direct reclaim do not succeed in actually freeing any memory. But, as he put it, "the thought of failing order-0 allocations after such a long time is scary."
It is scary for a couple of reasons. One is that not all kernel developers are diligent about checking every memory allocation and thinking about a proper recovery path. But it is worse than that: since small allocations do not fail, almost none of the thousands of error-recovery paths in the kernel are ever exercised. They could be tested if developers were to make use of the kernel's fault injection framework but, in practice, it seems that few developers do so. So those error-recovery paths are not just unused and subject to bit rot; chances are that a discouragingly large portion of them have never been tested in the first place.
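For those who do want to exercise those paths, the fault injection framework can force a fraction of slab allocations to fail. A minimal sketch, assuming a kernel built with CONFIG_FAILSLAB and CONFIG_FAULT_INJECTION_DEBUG_FS; the knob names are from the kernel's fault-injection documentation:

```shell
# Requires root, a suitably configured kernel, and a mounted debugfs.
mount -t debugfs none /sys/kernel/debug 2>/dev/null || true

cd /sys/kernel/debug/failslab
echo 10 > probability   # fail roughly 10% of eligible slab allocations
echo 100 > interval     # ...checking every 100th allocation
echo -1 > times         # keep injecting failures indefinitely
echo 1 > verbose        # log each injected failure to the kernel log
```

This is a configuration fragment rather than a standalone program; the failures it injects will exercise whatever error-recovery paths the running workload happens to hit.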
If the unwritten "too small to fail" rule were to be repealed, all of those error-recovery paths would become live code for the first time. In a sense, the kernel would gain thousands of lines of untested code that only run in rare circumstances where things are already going wrong. There can be no doubt that a number of obscure bugs and potential security problems would result.
That leaves memory-management developers in a bit of a bind. Causing memory allocation functions to behave as advertised seems certain to introduce difficult-to-debug problems into the kernel. But the status quo has downsides of its own, and they could get worse as kernel locking becomes more complicated. It also wastes the considerable development time that goes toward the creation of error-recovery code that will never be executed. Even so, introducing low-order memory-allocation failures at this late date may well prove too scary to be attempted, even if the long-term result would be a better kernel.
Index entries for this article: Kernel: Memory management/Page allocator
Posted Dec 23, 2014 22:44 UTC (Tue)
by epa (subscriber, #39769)
[Link]
I can understand running the OOM killer to free a few hundred megs, but if the system is so short of memory that even a 20kbyte request wasn't satisfied, breaking out the OOM machinery sounds like it will make things worse instead of better.
Posted Dec 23, 2014 22:55 UTC (Tue)
by cesarb (subscriber, #6266)
[Link] (13 responses)
> We went to lunch afterward, and I remarked to Dennis that easily half the code I was writing in Multics was error recovery code. He said, "We left all that stuff out. If there's an error, we have this routine called panic, and when it is called, the machine crashes, and you holler down the hall, 'Hey, reboot it.'"
Posted Dec 24, 2014 7:39 UTC (Wed)
by jezuch (subscriber, #52988)
[Link] (3 responses)
And now in Linux we have both! :)
Posted Dec 24, 2014 21:07 UTC (Wed)
by agrover (guest, #55381)
[Link] (1 responses)
If there are bugs that are "too scary" to contemplate fixing the right way, then we are all in BIG trouble.
Posted Dec 25, 2014 22:28 UTC (Thu)
by epa (subscriber, #39769)
[Link]
Posted Dec 27, 2014 0:16 UTC (Sat)
by reubenhwk (guest, #75803)
[Link]
If only we didn't have all that code we may be able to satisfy that small allocation request.
Posted Dec 26, 2014 18:24 UTC (Fri)
by rwmj (subscriber, #5474)
[Link] (8 responses)
Posted Dec 29, 2014 14:09 UTC (Mon)
by epa (subscriber, #39769)
[Link] (7 responses)
Posted Dec 29, 2014 21:58 UTC (Mon)
by rwmj (subscriber, #5474)
[Link] (2 responses)
Posted Dec 30, 2014 10:31 UTC (Tue)
by epa (subscriber, #39769)
[Link] (1 responses)
Posted Jan 23, 2015 9:03 UTC (Fri)
by Frej (guest, #4165)
[Link]
Posted Dec 30, 2014 15:59 UTC (Tue)
by rgmoore (✭ supporter ✭, #75)
[Link] (3 responses)
OTOH, Android expects its apps to be able to handle crashes cleanly. When it needs to free up memory, it just kills something in the background, and it's up to the app not to have problems from that. Seamless recovery from being killed is mandatory.
Posted Dec 30, 2014 16:40 UTC (Tue)
by epa (subscriber, #39769)
[Link]
That doesn't really mean you can do without error handling code in the kernel, though. It's great if your filesystem doesn't get horribly corrupted when the machine crashes, but still the crash is not appreciated by the user. Yes, if you are running a farm of several machines then you can fail over to another and the service stays up; that doesn't really work as a remedy for your laptop locking up, unless you happen to carry around a redundant laptop with you at all times.
And in the case of Android, the apps are killed and restarted, but it would not be acceptable for the kernel itself to just panic on any error condition and require restarting the phone. Which is what we are talking about here: *kernel* error recovery.
Posted Dec 30, 2014 20:47 UTC (Tue)
by ldo (guest, #40946)
[Link] (1 responses)
Not quite. The framework will always explicitly call onDestroy() before killing your Activity. If this is happening because the system is running low on resources, not because of direct user action, then onSaveInstanceState() will be called before that so you can save whatever is necessary of the state of your UI so, when the user returns to your app, it can be transparently restarted to make it look like it never stopped.
Posted Dec 30, 2014 21:41 UTC (Tue)
by cesarb (subscriber, #6266)
[Link]
Posted Dec 23, 2014 22:56 UTC (Tue)
by flussence (guest, #85566)
[Link] (14 responses)
Posted Dec 23, 2014 23:07 UTC (Tue)
by corbet (editor, #1)
[Link] (1 responses)
Posted Dec 24, 2014 3:45 UTC (Wed)
by flussence (guest, #85566)
[Link]
And I don't mean to sound negative either. If there's any conclusion to be made here, it's that XFS is really good at provoking improvements everyone can benefit from.
Posted Dec 23, 2014 23:20 UTC (Tue)
by xorbe (guest, #3165)
[Link] (7 responses)
Posted Dec 24, 2014 10:40 UTC (Wed)
by yoe (guest, #25743)
[Link] (6 responses)
I personally run ZFS on a system with 'only' 2G of ram and a single hard disk. It's not as fast as ZFS can be, but it runs perfectly well, with room to spare in its RAM for plenty of other stuff.
Posted Dec 25, 2014 18:38 UTC (Thu)
by quotemstr (subscriber, #45331)
[Link] (1 responses)
Posted Dec 25, 2014 22:33 UTC (Thu)
by yoe (guest, #25743)
[Link]
I suppose that's a step backwards if what you want is "mediocre cache performance, but similar performance for *all* file systems". That's not what ZFS is about, though; ZFS wants to provide excellent performance at all costs. That does mean it's not the best choice for all workloads, but it does beat the pants off of most other solutions in the workloads that it was meant for.
Posted Dec 26, 2014 16:11 UTC (Fri)
by nix (subscriber, #2304)
[Link] (1 responses)
Posted Dec 27, 2014 13:09 UTC (Sat)
by yoe (guest, #25743)
[Link]
The article was about XFS, but the comment that I replied to was very much about ZFS, not XFS.
Posted Jan 6, 2015 20:40 UTC (Tue)
by Lennie (subscriber, #49641)
[Link] (1 responses)
Posted Feb 28, 2015 13:59 UTC (Sat)
by yoe (guest, #25743)
[Link]
Posted Dec 23, 2014 23:22 UTC (Tue)
by cesarb (subscriber, #6266)
[Link]
I'd guess it's a side effect of XFS's focus on performance and code quality. As a high-performance filesystem, it puts more stress on the VM subsystem. And as for code quality, there's a reason why the filesystem testing tool every filesystem uses is called "xfstests".
Posted Dec 23, 2014 23:25 UTC (Tue)
by alan (subscriber, #4018)
[Link]
Posted Dec 24, 2014 5:13 UTC (Wed)
by zblaxell (subscriber, #26385)
[Link]
If you're comparing XFS to vfat or ext3, remember that those are really simple filesystems. Almost everything is in fixed-size arrays or bitmaps, and disk blocks have a 1:1 relationship with page cache. Not much interesting for MM to do.
Comparing XFS to reiserfs, NTFS, ZFS: these filesystems have similar complexity, but they're not developed any more, or their best implementations are not in the Linux kernel proper, so their developers don't have newsworthy interactions with MM developers.
Comparing XFS to btrfs: btrfs has MM bugs, but it also has lots of other kinds of bugs, so the MM bugs don't stand out as much. ;)
If you count up lines of code and commits in the last two years, XFS is one of the most actively developed filesystems in Linux, second only to btrfs among non-network filesystems by those crude metrics.
Posted Dec 26, 2014 22:56 UTC (Fri)
by dgc (subscriber, #6611)
[Link]
To do this, we have to have some idea of how the MM subsystem works, how it is architected, the rules we have to play under, the capabilities and deficiencies it has, etc. Just because we work on filesystems doesn't mean we only understand filesystems...
Further, most XFS developers come from a strong engineering background where we are taught to understand and solve the root cause of whatever issue is occurring, not just slap a band-aid over the symptom being reported. Hence you see us pointing out warts in other parts of the system that haven't been well thought out, work around some issue or are just plain broken.
This thread was a prime example - "XFS deadlocked under low memory conditions" was the bug report, but a little bit of investigation of the report points out the underlying cause of the hang is not a filesystem problem at all...
-Dave.
Posted Dec 23, 2014 23:15 UTC (Tue)
by cesarb (subscriber, #6266)
[Link] (2 responses)
If I understood it correctly, the problem is a "recursive locking" deadlock: the filesystem does a memory allocation under a lock, which waits for a process to exit, which calls into the filesystem, which needs the same lock again.
Doesn't the kernel already have a way of saying "I'm the filesystem, do not call the filesystem to free memory", that is, GFP_NOFS (and the related GFP_NOIO)? Couldn't the meaning of that flag be extended to also mean "don't wait for the OOM killer even for small allocations", since filesystem and memory management code have a higher probability of having a good quality error recovery code?
Posted Dec 24, 2014 3:13 UTC (Wed)
by neilbrown (subscriber, #359)
[Link]
The allocation in question is __GFP_NOWAIT, which is equivalent to GFP_NOIO.
Invoking the OOM killer is reasonable, waiting a little while is reasonable, but waiting for the killed processes to flush data or close files is not..... Hmmm. The out_of_memory() function chooses a process, kills it, then waits one second ("schedule_timeout_killable(1);"), so it seems to be behaving as I would expect.
Reading more of the email threads it seems that TIF_MEMDIE is an important part of the problem. This is set on a thread when it has been chosen to die. As long as any thread has this set, "select_bad_process()" will not select anything else so nothing else gets killed.
So the "real" problem seems to be that if a process with TIF_MEMDIE set enters filesystem (or IO) code and blocks, a GFP_NOIO allocation can be made to wait for that filesystem code to complete - which could deadlock.
Looking at the original patch which started this: http://marc.info/?l=linux-mm&m=141839249819519&w=2 the problem seems to involve a process with TIF_MEMDIE set, which has already released its memory, but is now stuck in the filesystem, probably closing some files.
As it has TIF_MEMDIE set, nothing else will be killed. So no memory will become available, so it will stay stuck in the filesystem...
This deadlock could possibly be avoided by having oom_scan_process_thread() not abort the scan if TIF_MEMDIE is set on a process which has already dropped its memory. Then another process can be killed even if the first one blocked.
But that analysis was very hasty and probably misses something important. "Here be dragons" is how I would mark this code on a map of Linux.
Posted Jan 15, 2015 19:40 UTC (Thu)
by ksandstr (guest, #60862)
[Link]
Second, that the allocator enters the OOM killer on its own volition, changing malloc's locking behaviour from "only ever the heap lock" to "everything an asynchronous OOM killer might end up with". That might well cover all of the kernel, when the caller is expecting malloc to be atomic by itself.
What's surprising isn't so much the reams of untested malloc-failure handling code, but that this bug comes up as late as 2014.
Posted Dec 23, 2014 23:41 UTC (Tue)
by xorbe (guest, #3165)
[Link] (2 responses)
Posted Dec 24, 2014 0:54 UTC (Wed)
by cwillu (guest, #67268)
[Link] (1 responses)
Posted Dec 24, 2014 7:35 UTC (Wed)
by epa (subscriber, #39769)
[Link]
Posted Dec 24, 2014 4:35 UTC (Wed)
by zblaxell (subscriber, #26385)
[Link] (24 responses)
I wonder this too, and have done so for as long as the OOM killer has existed. It seems that instead of returning ENOMEM or calling panic(), we've decided that inflicting a Chaos Monkey on userspace (and adding a bunch of latency for the allocating process) is somehow the best of all feasible alternatives.
Posted Dec 24, 2014 7:53 UTC (Wed)
by ibukanov (subscriber, #3942)
[Link] (22 responses)
For example, Linux allows a process to fork even if its image occupies more than half the memory on the system. The assumption is that the child process will most likely either call exec soon, or the parent and child will use the memory in a read-only way. Now consider what happens when the processes start to write to that memory: it may lead to OOM errors. Who should be blamed for it, the child or the parent? In addition, applications are typically not prepared to deal with a situation where an arbitrary write may trigger OOM. So the OOM killer is a pretty reasonable solution.
Compare that with Windows, where overcommit is not available (the lack of fork in the Windows API allows for that). Immediately memory accounting becomes simple, and the OS can guarantee that if a process has a pointer to writable memory, then memory is always available for writing.
Posted Dec 24, 2014 9:23 UTC (Wed)
by epa (subscriber, #39769)
[Link] (10 responses)
Essentially, when you fork() the kernel has no way of knowing that you are just about to call exec() immediately afterwards, so it has to either reserve space for a complete copy of all your process's memory, or end up overcommitting and resorting to an OOM killer if it guessed wrong. If userland can give the kernel a bit more information then the kernel can do its job better.
Posted Dec 24, 2014 9:35 UTC (Wed)
by roc (subscriber, #30627)
[Link] (9 responses)
Also, copy-on-write fork()ing has other valuable uses. In rr we use it to create checkpoints very efficiently.
Posted Dec 24, 2014 9:54 UTC (Wed)
by epa (subscriber, #39769)
[Link] (3 responses)
Posted Dec 25, 2014 18:44 UTC (Thu)
by quotemstr (subscriber, #45331)
[Link]
vfork does what you want. The "child" shares memory with the parent until it calls exec, so you avoid not only the commit charge catastrophe of copy-on-write fork, but also gain a significant performance boost from not having to copy the page tables.
Posted Dec 25, 2014 22:18 UTC (Thu)
by ldo (guest, #40946)
[Link] (1 responses)
Posted Dec 26, 2014 7:33 UTC (Fri)
by epa (subscriber, #39769)
[Link]
vfork() doesn't make a copy of the address space and so doesn't require either over-caution or over-committing. But it has other limitations.
Posted Dec 24, 2014 10:18 UTC (Wed)
by ibukanov (subscriber, #3942)
[Link] (4 responses)
To customize the child before exec one does not need full fork, vfork is enough and does not require overcommit.
Posted Dec 24, 2014 11:34 UTC (Wed)
by pbonzini (subscriber, #60935)
[Link] (3 responses)
Posted Dec 24, 2014 12:31 UTC (Wed)
by ibukanov (subscriber, #3942)
[Link] (1 responses)
Posted Dec 30, 2014 0:04 UTC (Tue)
by klossner (subscriber, #30046)
[Link]
Posted Dec 24, 2014 14:27 UTC (Wed)
by justincormack (subscriber, #70439)
[Link]
Posted Dec 24, 2014 14:12 UTC (Wed)
by patrakov (subscriber, #97174)
[Link]
"overcommit is not available (lack of fork in Windows API allows for that)"
It is not the lack of fork() that makes overcommit "unneeded", but an explicit decision. Windows MapViewOfFile() API allows creation of copy-on-write mappings (which are usually the basis for overcommit) through the FILE_MAP_COPY flag. Although, as documented, Windows does always reserve the whole size of the memory region in the pagefile.
See http://msdn.microsoft.com/en-us/library/windows/desktop/a...
Posted Dec 24, 2014 15:33 UTC (Wed)
by dumain (subscriber, #82016)
[Link] (6 responses)
Posted Dec 25, 2014 18:50 UTC (Thu)
by quotemstr (subscriber, #45331)
[Link] (5 responses)
MAP_NORESERVE being ignored when overcommit=2 isn't a "gotcha". It's fundamental brokenness.
[1] https://www.kernel.org/doc/Documentation/vm/overcommit-ac...
Posted Dec 25, 2014 19:04 UTC (Thu)
by fw (subscriber, #26023)
[Link] (4 responses)
MAP_NORESERVE is not named appropriately, but it does what it does. The problem is that programmers do not know about overcommit mode 2, do not test with it, and hence never realize the need for the PROT_NONE approach.
Posted Dec 25, 2014 19:58 UTC (Thu)
by quotemstr (subscriber, #45331)
[Link] (3 responses)
Sadly, the commit charge implications of MAP_NORESERVE are documented but silently broken, while the commit charge implications of PROT_NONE are undocumented and in theory mutable in future releases. All this gives mmap(2) a Rusty Russell score of around -4.
Posted Dec 25, 2014 20:14 UTC (Thu)
by fw (subscriber, #26023)
[Link] (2 responses)
Posted Dec 25, 2014 20:23 UTC (Thu)
by quotemstr (subscriber, #45331)
[Link] (1 responses)
There's just tremendous confusion over exactly what Linux does with commit charge, even among people well-versed in memory management fundamentals. There's no clarity in the API either. MAP_NORESERVE is dangerous because it's the only publicly-documented knob for managing commit charge, and it's a broken knob nobody is interested in fixing.
(And no, SIGSEGV is not something that will discourage use. Those who know what they're doing can write code that recovers perfectly safely from SIGSEGV.)
Posted Apr 18, 2019 6:36 UTC (Thu)
by thestinger (guest, #91827)
[Link]
The linux-man-pages project is not the official kernel documentation and has often been inaccurate about important things for many years. MAP_NORESERVE doesn't work as it describes at all.
It has no impact in the full overcommit mode or the memory accounting mode, which makes sense. It only has an impact in the heuristic overcommit mode, where it disables the heuristic for causing immediate failure.
Note how a read-only (or PROT_NONE) mapping with no data has no accounting cost:
For an anonymous or /dev/zero map
Posted Dec 24, 2014 22:24 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link] (2 responses)
1) Map a file with copy-on-write semantics from several processes. Windows does this just fine with the MapViewOfFile function.
2) Reserve a large amount of virtual RAM but don't actually allocate physical pages for it. It's also supported with MEM_RESERVE flag for the VirtualAllocEx function ( http://msdn.microsoft.com/en-us/library/windows/desktop/a... )
The main distinction is that Windows programs by default try to avoid OOM situations.
Posted Dec 24, 2014 23:32 UTC (Wed)
by ibukanov (subscriber, #3942)
[Link]
On Windows, after MEM_RESERVE the memory is not available; one has to explicitly commit it using MEM_COMMIT. As for copy-on-write mappings of files, Windows explicitly reserves the allocated memory in advance. In both cases Windows ensures that, as long as a program has a pointer to writable memory, that memory is always available. That is, on Windows an OOM failure can only happen during allocation calls, not during an arbitrary write in a program as happens on Linux.
Posted Dec 25, 2014 19:17 UTC (Thu)
by quotemstr (subscriber, #45331)
[Link]
In NT, the operations are separate. A program can set aside N pages of its address space, but it's only when it commits some of those pages that the kernel guarantees that it will be able to provide M, M<=N, distinct pages of information. After a commit operation succeeds, the system guarantees that it will be able to provide the requested pages. There is no OOM killer because the kernel can never work itself into a position where one might be necessary. While it's true that a process can reserve more address space than memory exists on the system, in order to use that memory, it must first make a commit system call, and *that call* can fail. That's not overcommit. That's sane, strict accounting. Your second point is based on a misunderstanding.
Your first point is also based on a misunderstanding. If two processes have mapped writable sections of a file, these mappings are either shared or private (and copy-on-write). Shared mappings do not incur a commit charge overhead because the pages in file-backed shared mappings are backed by the mapped files themselves. Private mappings are copy-on-write, but the entire commit charge for a copy-on-write mapping is assessed *at the time the mapping is created*, and operations that create file mappings can fail. Once they succeed, COW mappings are as committed as any other private, pagefile-backed mapping of equal size. Again, no overcommit. Just regular strict accounting.
From MSDN:
"When copy-on-write access is specified, the system and process commit charge taken is for the entire view because the calling process can potentially write to every page in the view, making all pages private. The contents of the new page are never written back to the original file and are lost when the view is unmapped."
The key problem with the Linux scheme is that, in NT terms, all mappings are SEC_RESERVE and are made SEC_COMMIT lazily on first access, and the penalty for a failed commit operation is sudden death of your process or a randomly chosen other process on the system. IMHO, Linux gets all this tragically wrong, and NT gets it right.
[1] Yes MAP_NORESERVE exists. Few people use it. Why bother? It's broken anyway, especially with overcommit off, when you most care about MAP_NORESERVE in the first place!
[2] Sure, the OOM killer might run in response to a page fault, but the result will either be the death of some *other* process or the death of the process performing the page fault. Either way, that process never observes a failure, in the latter case because it's dead already. Let's ignore file-backed memory on volatile media too.
Posted Dec 24, 2014 8:01 UTC (Wed)
by jezuch (subscriber, #52988)
[Link]
That's all about promises. When you call malloc(), and it returns a success, the kernel promises that the memory will be available. But then there's overcommit, which means that the kernel might be making empty promises - "don't worry, I know things are tight right now, but trust me, I *will* have the money^Hmemory". And when things are really tight, it scrambles and goes to the pawn shop to fence some memory of some poor innocent process. Because at this point there's no other way - malloc() was reported as success and the process is trying to use this supposedly allocated memory by simply accessing it. But it's not there.
But it looks like this particular case is a different thing. There's no overcommit for the kernel allocations, right?
Posted Dec 24, 2014 7:15 UTC (Wed)
by ibukanov (subscriber, #3942)
[Link] (1 responses)
What helped on a memory-constrained system was to preallocate a buffer and monitor available memory. When it became low, the buffer was dropped with a global flag set indicating for the application to switch to a recovery mode. In that mode most of normal activities lead to errors. The application remained in it until enough memory was freed to be able to allocate the buffer back.
Posted Dec 25, 2014 19:07 UTC (Thu)
by fw (subscriber, #26023)
[Link]
Posted Dec 25, 2014 21:19 UTC (Thu)
by ldo (guest, #40946)
[Link] (117 responses)
I have written a lot of code according to the following paradigm:
This kind of thing makes it easy to convince yourself, by visual inspection, that the allocated storage will be freed once, and only once, no matter what control path is taken. A key simplifying point is that the free(3) call is idempotent: freeing a NULL pointer is a no-op.
If you want to see a more elaborate example, including loops and nested do-once-within-do-once, have a look at this spuhelper.c Python extension module.
Posted Dec 25, 2014 21:34 UTC (Thu)
by cesarb (subscriber, #6266)
[Link] (66 responses)
Wouldn't it be simpler and more readable to use a goto instead of a pseudo-loop?
(The Linux kernel does use the "set pointers to NULL and on error free them if they are not NULL" paradigm, of course using a "goto error" instead of a pseudo-loop, but the error-recovery paths being discussed are not merely "freeing locally allocated memory"; they can also have things like "unlocking a mutex" or "updating the filesystem block allocation structures to mark as free the blocks which had been reserved for this task".)
Posted Dec 25, 2014 22:08 UTC (Thu)
by ldo (guest, #40946)
[Link] (65 responses)
Posted Dec 25, 2014 22:23 UTC (Thu)
by mchapman (subscriber, #66589)
[Link] (27 responses)
There would be precisely as many "goto" statements as you have "break" statements. It's much of a muchness, in my opinion.
Posted Dec 25, 2014 23:07 UTC (Thu)
by ldo (guest, #40946)
[Link] (26 responses)
And you would need a different label for each. And the extra visual burden of ensuring that each goto is paired with the right label. Which is unnecessary with a break, since that can only ever terminate one enclosing construct.
And the further burden of getting things wrong when you try refactoring the code.
Like I said: NO GOTOs!
Posted Dec 26, 2014 0:01 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link] (5 responses)
So, YES to gotos for error handling!
Posted Dec 26, 2014 0:37 UTC (Fri)
by ldo (guest, #40946)
[Link]
Posted Dec 26, 2014 7:38 UTC (Fri)
by epa (subscriber, #39769)
[Link] (3 responses)
Posted Dec 26, 2014 7:43 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Posted Jan 16, 2015 5:10 UTC (Fri)
by CameronNemo (guest, #94700)
[Link]
Posted Jan 16, 2015 18:23 UTC (Fri)
by zlynx (guest, #2285)
[Link]
That almost always made the code cleaner, and let me exit the block (which was now a function) with a simple "return."
Posted Dec 26, 2014 7:34 UTC (Fri)
by WolfWings (subscriber, #56790)
[Link] (19 responses)
Um... since when? You can use a single label for any number of gotos, so each label specifically describes its use case (freeAndExit, perhaps, or whatever semantic name you want to give it), and there are no more or fewer lines than your do{}while(false); construct, but copy-pasting code inside/outside the structure can't break it as easily. Just saying you're wrong about the claim of needing a billion labels; goto is many-to-one, not one-to-one in that regard.
Posted Dec 26, 2014 21:15 UTC (Fri)
by ldo (guest, #40946)
[Link] (17 responses)
Posted Dec 27, 2014 0:47 UTC (Sat)
by reubenhwk (guest, #75803)
[Link] (7 responses)
The deeply nested allocations example *should* be solved by factoring out local/static helper functions...In other words: if your code is too complicated to use GOTO's properly for cleanup, then your code is too complicated.
Posted Dec 27, 2014 20:42 UTC (Sat)
by ldo (guest, #40946)
[Link] (6 responses)
Posted Dec 27, 2014 20:56 UTC (Sat)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
What else do you want?
Posted Dec 27, 2014 21:33 UTC (Sat)
by ldo (guest, #40946)
[Link]
Posted Jan 1, 2015 0:27 UTC (Thu)
by reubenhwk (guest, #75803)
[Link] (3 responses)
Posted Jan 1, 2015 4:23 UTC (Thu)
by flussence (guest, #85566)
[Link] (1 responses)
Posted Jan 1, 2015 4:40 UTC (Thu)
by reubenhwk (guest, #75803)
[Link]
Posted May 23, 2017 13:10 UTC (Tue)
by mirabilos (subscriber, #84359)
[Link]
Posted Dec 27, 2014 17:44 UTC (Sat)
by itvirta (guest, #49997)
[Link] (8 responses)
> for(...) {
I'm not sure I'd agree that breaking out of a loop just to test the same error condition,
Like someone mentioned, some other programming languages have dedicated constructs
(The thing with absolute rules is, that they absolutely forbid one from thinking. That usually
> Fine. Try reworking the example I gave above, then. That has allocations nested up to three levels
Reducing the nesting level and using a less space-hungry indenting style would be what I'd start with,
Posted Dec 27, 2014 20:46 UTC (Sat)
by ldo (guest, #40946)
[Link] (7 responses)
Posted Dec 27, 2014 20:57 UTC (Sat)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
Or here: https://github.com/ldo/dvd_menu_animator/blob/master/spuh... ?
I actually don't see complicated cleanups anywhere in your code.
Posted Dec 28, 2014 3:27 UTC (Sun)
by reubenhwk (guest, #75803)
[Link] (4 responses)
Better yet, use a function for each of those nested levels. Worried about spaghetti code? Then factor out the nesting and call smaller functions with less looping and less nested resource management. Use the return value to propagate error conditions onward.
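A small sketch of the factoring reubenhwk suggests (all function names are hypothetical): each nesting level becomes its own short function, and errors propagate outward through return values rather than through nested cleanup:

```c
#include <assert.h>
#include <stdlib.h>

/* Innermost level: the only place an error can originate. */
static int process_item(int item)
{
    return (item < 0) ? -1 : 0;   /* -1 standing in for a real error code */
}

/* Middle level: loops over one row, propagating the first failure. */
static int process_row(const int *row, int n)
{
    for (int i = 0; i < n; i++) {
        int err = process_item(row[i]);
        if (err)
            return err;           /* a plain return, no goto needed */
    }
    return 0;
}

/* Outer level: loops over rows; each level stays short and flat. */
static int process_table(const int rows[][3], int nrows)
{
    for (int i = 0; i < nrows; i++) {
        int err = process_row(rows[i], 3);
        if (err)
            return err;
    }
    return 0;
}
```

This is the same shape cesarb later points to in unuse_mm()/unuse_vma()/unuse_pte_range(): one function per nesting level, with a straight return at each level.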
Posted Dec 28, 2014 6:34 UTC (Sun)
by ldo (guest, #40946)
[Link] (3 responses)
Posted Dec 29, 2014 2:58 UTC (Mon)
by reubenhwk (guest, #75803)
[Link]
Posted Dec 29, 2014 3:19 UTC (Mon)
by bronson (subscriber, #4806)
[Link] (1 responses)
Posted Dec 29, 2014 17:30 UTC (Mon)
by nix (subscriber, #2304)
[Link]
The argument from popularity is not a good one, but if something is not popular it is not terribly wise to imply that in fact it is.
Posted May 23, 2017 13:09 UTC (Tue)
by mirabilos (subscriber, #84359)
[Link]
Avoiding GOTO at all cost is overreacting. The “rule” is probably good to teach to novice programmers, but experts are beyond it. Error handling is *the* prime example in favour of GOTO, although I agree there may be others.
Posted Dec 25, 2014 22:36 UTC (Thu)
by vonbrand (guest, #4458)
[Link] (7 responses)
Please, no knee-jerk "no goto ever" reactions, go read Knuth's "Structured programming with goto statements", ACM Computing Surveys 6(4), Dec 1974, pp. 261-301. Check how goto is used in the kernel, then try writing gotoless code doing the same. You'll find that the gotoless code is harder to understand and often significantly less efficient.
Posted Dec 25, 2014 23:08 UTC (Thu)
by ldo (guest, #40946)
[Link] (6 responses)
Posted Dec 30, 2014 2:10 UTC (Tue)
by filipjoelsson (guest, #2622)
[Link] (2 responses)
I thought it would be a fun exercise to rewrite your code with the exact same structure that you used. But instead of the
You do realize that your construct will optimize to exactly the same code, when compiled, right? So all you do is add empty words and pointless levels of indent, that may or may not help compilers and other programmers understand what you mean? So I took a dive into your code, stripped out all the comments and newlines, ran it through indent, and tried again. And I must say that I believe that
Why the f*ck do you for instance do things like this? Breaking expectations like that does not add anything of value, and makes sure nobody else will want to bother helping out on your project. You also have nuggets like:
Which I would like to replace with:
I also wonder what the point is of putting semicolons after comments.
In conclusion: I was in your camp until this discussion started - I really never ever use goto. You, however, have convinced me that there are reasonable exceptions to that rule.
Posted Dec 30, 2014 20:42 UTC (Tue)
by ldo (guest, #40946)
[Link] (1 responses)
I have this convention: if one of the exits from a loop is via an explicit break, then all exits from the loop shall be via explicit breaks. That way you don’t have to look for more than one kind of termination condition, just look for the breaks. Only if there are none will there be something like a for-loop termination condition. Never mix the two.
Posted Dec 30, 2014 21:12 UTC (Tue)
by filipjoelsson (guest, #2622)
[Link]
Is there any construct that you use as intended?
Posted Dec 30, 2014 3:00 UTC (Tue)
by flussence (guest, #85566)
[Link] (2 responses)
But the compiler can see your royal highness is stark naked. And so can we.
Posted Dec 30, 2014 9:47 UTC (Tue)
by anselm (subscriber, #2796)
[Link] (1 responses)
GOTO used to be a real problem when it was all that people had for control flow (think 1960s-era FORTRAN).
Right now with the proper discipline I don't have a huge problem with “goto” in C. As far as I'm concerned it is way less of a problem for program structure than the style that ldo is advocating. ldo's style also seems to presuppose that “cleanup” consists exclusively of undoing earlier things in reverse order; in general one might have to do things in an error branch that one might not have to do during normal flow, and sorting these cases out just serves to make the code even more convoluted on top of all the extraneous “do { … } while (0)” and “if (…) break” constructs.
I teach programming as my day job (some of the time, anyway), and I can guarantee that anybody who came to me with code like ldo's would get a very stern talking-to.
Posted Dec 30, 2014 20:53 UTC (Tue)
by ldo (guest, #40946)
[Link]
Posted Dec 25, 2014 22:41 UTC (Thu)
by viro (subscriber, #7872)
[Link] (28 responses)
It's both unidiomatic and brittle. Avoiding goto is generally a good idea, but this isn't a good way to do it.
Posted Dec 25, 2014 23:10 UTC (Thu)
by ldo (guest, #40946)
[Link] (27 responses)
That code already has plenty of loops in it, and you can see for yourself how to keep the interactions simple in that situation.
This isn’t a question of “religion”, it’s one of “bitter experience”. Learn from it!
Posted Dec 26, 2014 0:03 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link] (26 responses)
Experience shows that idiomatic "goto error_exit" is far better than the alternatives with fake loops and flags.
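For readers following along, this is the shape of the "fake loops and flags" style being argued about, as a neutral sketch (all names hypothetical, not code from ldo's project): a do { ... } while (false) block whose breaks stand in for forward gotos.

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of the pseudo-loop style: the do { ... } while (false) block
 * runs exactly once, and each 'break' jumps to the cleanup after it. */
static int make_pair_loop(char **a_out, char **b_out)
{
    char *a = NULL;
    char *b = NULL;
    int ok = 0;
    do /* once */ {
        a = malloc(16);
        if (a == NULL)
            break;          /* an error exit, not a loop exit */
        b = malloc(16);
        if (b == NULL)
            break;
        ok = 1;             /* flag records that we got all the way through */
    } while (0);
    if (!ok) {
        free(a);            /* free(NULL) is a no-op */
        free(b);
        return -1;
    }
    *a_out = a;
    *b_out = b;
    return 0;
}
```

Note the extra ok flag needed to tell success apart from failure after the block; that flag, and the ambiguity of break inside real loops, is precisely what Cyberax is objecting to.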
Posted Dec 26, 2014 0:36 UTC (Fri)
by ldo (guest, #40946)
[Link] (25 responses)
Posted Dec 26, 2014 1:32 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link] (24 responses)
I won't bother to rewrite all of the code, so only for ParseColorTuple: https://gist.github.com/Cyberax/4546471dcb026c141369 (sorry for possible issues with indentation).
It's now 10 lines less and much more concise (WTF are you incrementing the for-loop variable manually and then manually checking conditions?).
I've also fixed a bug in line 246 (failure to check error immediately).
Posted Dec 26, 2014 2:12 UTC (Fri)
by ldo (guest, #40946)
[Link] (23 responses)
At least you made the effort, even if you “fixed” a non-bug and succeeded in introducing a real bug.
This was a simple, almost trivial case. Now try a more complex one. Elsewhere you will see do-once constructs nested up to 3 deep. And then there’s loops. Try and see how well your goto constructs scale up to those complexities.
Posted Dec 26, 2014 4:59 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link] (22 responses)
> This was a simple, almost trivial case. Now try a more complex one.
> Elsewhere you will see do-once constructs nested up to 3 deep.
And the rest of your code...
It pretty much falls into the 'unmaintainable crap' bin. Everything is just perfect: eclectic code style, exotic constructs (inner functions in... C?), profligate usage of goto disguised as do..nothing loops, functions spanning hundreds of lines, etc.
Posted Dec 26, 2014 5:34 UTC (Fri)
by ldo (guest, #40946)
[Link] (20 responses)
Come back when you’ve learned a bit more about programming.
Posted Dec 26, 2014 5:40 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link] (13 responses)
Or perhaps an audio codec? Nah, can't write those with gotos: https://git.xiph.org/?p=opus.git;a=blob;f=src/opus_encoder.c;...
Posted Dec 26, 2014 21:21 UTC (Fri)
by ldo (guest, #40946)
[Link] (12 responses)
While I’m sure it is quite complex code, it doesn’t seem to do much with dynamic storage: I found just two calls to opus_free in those 2500-odd lines.
While my example is half that size, it has 39 calls to Py_{,X}DECREF.
Come back when you can cope with that kind of complexity.
Posted Dec 26, 2014 21:25 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link] (11 responses)
Posted Dec 27, 2014 20:56 UTC (Sat)
by ldo (guest, #40946)
[Link] (10 responses)
Posted Dec 27, 2014 20:58 UTC (Sat)
by Cyberax (✭ supporter ✭, #52523)
[Link] (9 responses)
Posted Dec 27, 2014 21:43 UTC (Sat)
by ldo (guest, #40946)
[Link] (8 responses)
And while we’re at it, your tun.c example fails to recover if it cannot create its sysfs device files.
Posted Dec 27, 2014 22:01 UTC (Sat)
by Cyberax (✭ supporter ✭, #52523)
[Link] (7 responses)
The Linux kernel manages to do that just fine, somehow.
In particular, your block near line 480 will look like this:
And let me reiterate, your code is a mess. For example, you use 'break' to normally exit the outer loop from within the 'if' statement here: https://github.com/ldo/dvd_menu_animator/blob/master/spuh...
Why do you even bother with 'for' loops?
Posted Dec 28, 2014 6:29 UTC (Sun)
by ldo (guest, #40946)
[Link] (6 responses)
At least it works, unlike your code.
Pot calling the kettle black, as it were?
Posted Dec 28, 2014 6:31 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
[Link] (5 responses)
Posted Dec 28, 2014 6:36 UTC (Sun)
by ldo (guest, #40946)
[Link] (4 responses)
Posted Dec 28, 2014 6:41 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
[Link] (3 responses)
If you can make a reasonable description of what it does - I can guarantee that it's possible to write it far cleaner than what you have.
Posted Dec 28, 2014 21:33 UTC (Sun)
by ldo (guest, #40946)
[Link] (2 responses)
Posted Dec 28, 2014 22:37 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
Posted Dec 29, 2014 17:26 UTC (Mon)
by ldo (guest, #40946)
[Link]
What’s good for the goose, eh?
Posted Dec 26, 2014 16:14 UTC (Fri)
by mb (subscriber, #50428)
[Link] (5 responses)
This is like the guy complaining about _all_ those damn ghost drivers on the highway.
Huge projects use goto for error handling successfully. See the Linux kernel for instance.
Posted Dec 26, 2014 21:15 UTC (Fri)
by ldo (guest, #40946)
[Link] (4 responses)
Posted Dec 27, 2014 11:32 UTC (Sat)
by mb (subscriber, #50428)
[Link] (3 responses)
Posted Dec 27, 2014 20:40 UTC (Sat)
by ldo (guest, #40946)
[Link] (2 responses)
Posted Dec 27, 2014 23:14 UTC (Sat)
by vonbrand (guest, #4458)
[Link] (1 responses)
The pseudo-loops hide the error processing in a structure that could be anything else, and which BTW I haven't seen anywhere else either. In comparison, the "label: <error handling code>" pattern is easy to spot.
Posted Dec 28, 2014 6:33 UTC (Sun)
by ldo (guest, #40946)
[Link]
Posted Dec 26, 2014 16:30 UTC (Fri)
by nix (subscriber, #2304)
[Link]
This is not something that you'd call out as an advantage and a sign of great experience unless you didn't actually have much. (But it's been worth reading this thread anyway: it's not every day you see people telling Al Viro that he doesn't have enough experience to judge the relative worth of exception-handling goto versus other techniques!)
Posted Dec 26, 2014 0:08 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link] (42 responses)
For example: https://github.com/ldo/dvd_menu_animator/blob/master/spuh... - does this 'break' exit the loop, or is it simply an error exit?
Also, it looks like the outer loop will still do additional steps since 'break' only affects the inner loop, possibly causing more errors.
And what then? Ah, there's a dependency on a hidden state ('PyErr_Occurred').
Ugh once more.
Posted Dec 26, 2014 0:38 UTC (Fri)
by ldo (guest, #40946)
[Link] (41 responses)
Posted Dec 27, 2014 1:10 UTC (Sat)
by reubenhwk (guest, #75803)
[Link] (40 responses)
Posted Dec 27, 2014 1:24 UTC (Sat)
by reubenhwk (guest, #75803)
[Link] (39 responses)
Posted Dec 27, 2014 20:44 UTC (Sat)
by ldo (guest, #40946)
[Link] (38 responses)
Posted Dec 27, 2014 20:55 UTC (Sat)
by Cyberax (✭ supporter ✭, #52523)
[Link] (37 responses)
Posted Dec 27, 2014 21:32 UTC (Sat)
by ldo (guest, #40946)
[Link] (36 responses)
Posted Dec 28, 2014 3:51 UTC (Sun)
by reubenhwk (guest, #75803)
[Link] (35 responses)
Posted Dec 28, 2014 6:19 UTC (Sun)
by ldo (guest, #40946)
[Link] (34 responses)
There seems to be a marked reluctance among most of you nay-sayers to do this. But that is the only way you can prove that there is something to your point, more than mere hand-waving hot air.
Posted Dec 28, 2014 12:44 UTC (Sun)
by mm7323 (subscriber, #87386)
[Link] (33 responses)
Did you read the LWN Christmas/end of year message?
Posted Dec 28, 2014 22:17 UTC (Sun)
by ldo (guest, #40946)
[Link] (32 responses)
For those who just came in, and are too lazy to read the rest of the thread before sticking their oar in:
Any questions?
Posted Dec 28, 2014 22:56 UTC (Sun)
by nix (subscriber, #2304)
[Link] (3 responses)
The problem in the kernel is an inability to be sure that what those code paths do on error is even sane. They will execute if triggered, but because of this "too small to fail" rule, many of them have never executed, even when the system is out of memory.
Your digression on an idiosyncratic (and IMNSHO needlessly verbose and unreadable) technique to replace gotos is not relevant to this problem, so cannot solve it.
To me, the technique appears similar to the panic that strikes people who started with one-return languages like Pascal when they are faced with languages that allow multiple returns. Yes, if carelessly used this can lead to unreadable spaghetti, but that doesn't mean the right answer is to ban it! It's amazing how many of my functions these days reduce to
Notice how little code error handling includes here -- I've replaced the actual work with pseudocode, but even in the real code the error unwinding is *two lines per check*, one an unavoidable conditional, plus optional extra work in those conditionals to do error-specific stuff first. The cleanup is no extra code: it's just a couple of goto labels, and if you miss one of them out or somehow manage to do some cleanup at a too-deeply-nested level you get a nice easy-to-spot syntax error.
This is ever so much neater than your false-loops approach and involves much less spurious nesting. It's linear in the normal case, just like the actual code flow, with branches only in exceptional conditions, while your code makes *loops* look like the normal condition, even though those loops actually never happen. Plus, spurious break and continue are both correctly diagnosed as syntax errors: in your approach, they cause a silent and erroneous change in control flow, as if an error had happened. It is much easier to mistakenly type or transpose 'break' outside e.g. a switch than it is to accidentally type 'goto some_cleanup'!
In my view it is you who has manifestly failed to understand the merits of our approach, which is quite clearly beneficial on every metric I can see other than the unthinking one which assigns an infinite automatic negative value to any use of 'goto' regardless, even uses which only jump forward in control flow to the end of a function and are thus precisely equivalent to a block with a cleanup in it and then a return. Spaghetti code this is not! It is much less spaghetti than yours.
Note: I came from a Pascal background and was at first violently anti-goto *and* anti-multiple-returns. When I look at my code from those long-ago days the contorted control flow is almost unreadable. Almost all the time a nice linear flow is preferable. Save loops for things that can actually *loop* at least once -- and for the while (0) macro trick.
Posted Dec 29, 2014 0:41 UTC (Mon)
by vonbrand (guest, #4458)
[Link]
That is exactly the point: With the Linux goto usage the normal code path is linear and clear for all to see, failure exceptional code paths are kept apart, for separate analysis (which they require, as "normal situation" isn't guaranteed in them).
Posted Dec 29, 2014 4:20 UTC (Mon)
by viro (subscriber, #7872)
[Link] (1 responses)
BTW, from the control flow graph decomposition POV, break is an atrocity almost equal to multiple returns (and if you add break <number of levels>
Frankly, this sort of attitude is best cured by careful reading of the standard papers circa late 60s--early 70s *plus* sorting through the usual fare on lambda-lifting, supercombinators and related stuff (circa early 80s) *plus* that on the monads sensu Peyton-Jones et.al. Attitude re goto-phobia, that is - "my code is perfect, whaddya mean, idiosyncrasies?" one is cured only by sufficiently convincing LART...
Posted Dec 30, 2014 20:57 UTC (Tue)
by ldo (guest, #40946)
[Link]
Feel free to try any of the error-recovery-within-loop cases. That is where the rubber hits the road, and where all the goto-ists have come a cropper.
Posted Dec 28, 2014 23:33 UTC (Sun)
by cesarb (subscriber, #6266)
[Link] (5 responses)
Stop right here. The problem is not with cleanups which should always be executed; these are already well-tested. The problem is with cleanups which should *never* be executed, unless an error happens.
In exception-handling terms, what you are talking about is a "finally" block, which executes both on success and on failure. The error-recovery paths in question, however, are "except" blocks, which execute only on failure.
For a filesystem-related example, since we're supposed to be talking about filesystems: let's say a user program just appended something into a file, and you're in the function which will submit the newly appended data to the block subsystem. As part of that, it has to allocate space for the new data, updating several of the filesystem's control structures, and allocate the control structures for the block subsystem.
If any allocation fails in the middle of this complex process, you have to unwind all the bookkeeping updates, otherwise the filesystem gets into an inconsistent state. If no allocation fails, however, the bookkeeping changes must be kept, since they represent the new state of the filesystem.
Is the pseudo-loop technique able to do the equivalent of an "except" block without getting even uglier? Remember that the rewind will have to be called from many places, since any of the allocations can fail, and that the failure can also be in the middle of the initial update of the control structures, so the rewind might be partial.
The traditional way of doing this in the Linux kernel is somewhat like the following:
int foo(...)
The kernel developers are pragmatic. If a kernel developer feels that using a "pseudo-loop" results in cleaner code, they do use it. However, for error handling the usual design pattern is either a "goto error" (single error label, pointers initialized to NULL) or a "goto error_foo" (many error labels, each falling through to the next, releasing resources in reverse order). On success, either the unlock is duplicated (it's usually just a unlock, memory is rarely allocated just for the duration of one function), or the error handling does a "goto out" like in the example above.
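A user-space sketch of the cascading-label pattern described above (hypothetical names; malloc standing in for the kernel allocators): each label releases one resource and falls through to the next, so a failure at any point unwinds exactly what was acquired, in reverse order.

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of the "goto error_foo" design pattern: one label per
 * acquired resource, each falling through to the next. */
static int build(char **out_a, char **out_b, char **out_c)
{
    char *a, *b, *c;

    a = malloc(16);
    if (a == NULL)
        goto err;
    b = malloc(16);
    if (b == NULL)
        goto err_free_a;
    c = malloc(16);
    if (c == NULL)
        goto err_free_b;

    *out_a = a;
    *out_b = b;
    *out_c = c;
    return 0;           /* success: everything allocated is kept */

err_free_b:
    free(b);            /* each label falls through to the next... */
err_free_a:
    free(a);            /* ...releasing in reverse order of acquisition */
err:
    return -1;          /* -1 standing in for -ENOMEM */
}
```

The error path is linear and visibly separate from the success path, which is the property vonbrand and nix argue for elsewhere in the thread.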
Posted Dec 29, 2014 17:35 UTC (Mon)
by ldo (guest, #40946)
[Link] (4 responses)
This is why cleanups should be idempotent--pass the cleanup routine a NULL argument, and it does nothing. So the steps become something like
So when you get to the end, you simply execute all the relevant cleanups unconditionally, and the ones with NULL arguments turn into no-ops.
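A sketch of the idempotent-cleanup convention described here (all names hypothetical): every resource pointer starts as NULL, and the single exit path disposes of whatever is non-NULL, unconditionally.

```c
#include <assert.h>
#include <stdlib.h>

/* NULL-safe disposer: safe to call whether or not the resource exists,
 * and it resets the pointer so a second call is also harmless. */
static void dispose(char **p)
{
    if (*p == NULL)
        return;             /* the no-op branch that makes it idempotent */
    free(*p);
    *p = NULL;
}

static int compute(char **result)
{
    char *scratch = NULL;   /* every resource starts out NULL */
    char *value = NULL;
    int err = -1;

    scratch = malloc(64);
    if (scratch == NULL)
        goto done;          /* a pseudo-loop break would reach the
                               same point; the shape is identical */
    value = malloc(64);
    if (value == NULL)
        goto done;

    *result = value;
    value = NULL;           /* ownership transferred: don't dispose of it */
    err = 0;

done:
    dispose(&scratch);      /* unconditional: no-ops when already NULL */
    dispose(&value);
    return err;
}
```

The success path clears the pointer it hands off, so the unconditional cleanup at the end frees only what the function still owns.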
Posted Dec 29, 2014 18:14 UTC (Mon)
by cesarb (subscriber, #6266)
[Link] (3 responses)
You still are in the "cleanup" mentality. But the problem with filesystems is not "cleanup", it's "rollback". There's nothing to subsume the relevant code, and it subsumes nothing else; it's really something which executes only on failure.
> So when you get to the end, you simply execute all the relevant cleanups unconditionally, and the ones with NULL arguments turn into no-ops.
That doesn't help with the situation in question, where the memory allocation routines never return "failure". The relevant error-recovery code would always get passed NULL, and so the branch which is called when it receives a non-NULL never gets tested.
Sure, you *called* the code unconditionally, but it doesn't change the fact that it *executes* conditionally. Whether the branch point is outside or inside it is immaterial. For instance, it's well-known that the standard "free(void *ptr)" function does nothing when called with a NULL argument, but that's because it begins with a conditional branch: "if (!ptr) return;". If it were always called with a NULL pointer, the most complex part of its code would never be exercised.
Posted Dec 30, 2014 20:34 UTC (Tue)
by ldo (guest, #40946)
[Link] (2 responses)
That’s right. You keep insisting that “cleanup” and “rollback” are entirely different, whereas they are really two aspects of the same thing, and can be treated as such.
> That doesn't help with the situation in question, where the memory allocation routines never return "failure".
Returning NULL from an allocation request is a failure.
> Sure, you *called* the code unconditionally, but it doesn't change the fact that it *executes* conditionally.
Do you know what "abstraction" means?
Posted Dec 30, 2014 22:33 UTC (Tue)
by cesarb (subscriber, #6266)
[Link] (1 responses)
Please, reread the article this comment thread is attached to.
The whole issue is that, under certain conditions, the allocation requests were *never* returning NULL, even when they should!
> > Sure, you *called* the code unconditionally, but it doesn't change the fact that it *executes* conditionally.
Please, reread the article this comment thread is attached to.
Abstraction or not, it doesn't change the fact that, since the allocation requests were *never* returning NULL, even when they should, the error handling code was *never* being executed. It doesn't matter whether it has been abstracted away or not, untested code is untested code.
Posted Dec 31, 2014 21:00 UTC (Wed)
by ldo (guest, #40946)
[Link]
Which is why I am advocating simpler error-handling paths, as in my example.
Posted Dec 28, 2014 23:59 UTC (Sun)
by bronson (subscriber, #4806)
[Link] (1 responses)
Hardly. Reread Cyberax's replies with an open mind this time. He's right: carefully applied gotos would take care of your "do /* once */ {} while(false)" nesting and ambiguous breaks. If you want to make your code more readable, this would be a great first step. After that, you could give it a good scrubbing with https://www.kernel.org/doc/Documentation/CodingStyle
But, from your aggressive writing style, it sounds like you'd rather argue.
Posted Dec 29, 2014 17:28 UTC (Mon)
by ldo (guest, #40946)
[Link]
I took the trouble to write all that code which proves my point. If you want to prove yours, you have to do the same.
Posted Dec 29, 2014 0:35 UTC (Mon)
by cesarb (subscriber, #6266)
[Link]
Taking a look at that code, it's easy to see why. It has a rather unorthodox coding style, which can make coders used only to common coding styles for both C *and* Python recoil in horror.
The code layout is the first thing one unavoidably notices when first looking at a new source file, so even before they can consider the actual code quality, your code already has a negative score in their minds.
In particular, Python does have a standard coding style for C code: https://www.python.org/dev/peps/pep-0007/. When in Rome...
Here are, in no particular order, a few of the things which made *me* recoil in horror on a quick look at that code:
* The placement of the parameters in a function definition.
for (; i < 4; ++i)
Yes, the comment doesn't match the control structure it's attached to. This is why most people don't bother with these sorts of comments: unless they are part of the language (and thus forced to be correct by the compiler), they can and will end up being wrong.
None of that affects the quality of the compiled code; it's all stripped by the compiler (even the "dummy loops" are optimized out). The exception is all the PyErr_Occurred() calls; your code has redundant calls to that function (it's not a macro, so it won't be optimized out), which probably wouldn't happen with a more standard coding style.
Your code might or might not have a good quality; the strange coding style makes it unnecessarily hard to tell. And as others have noted, it has a few "code smells" (https://en.wikipedia.org/wiki/Code_smell), like excessively long functions, which usually correlate with a lower code quality.
Posted Dec 29, 2014 9:39 UTC (Mon)
by itvirta (guest, #49997)
[Link] (5 responses)
About errors promised, but not returned, and the numerous error paths that are untested
> Yet, when challenged, they are unable to actually prove their point, by offering simpler
You've repeated about five times the suggestion that someone _else_ should prove their
Not that mere numbers prove everything, but it makes one wonder. As does the fact that
I stumbled upon a rather apt quote attributed to Dijkstra regarding his well-known article
> But that just brings us back to the same point: if they think my code is too complicated or just
If this is just an exercise, perhaps a smaller one would do. Also of course, you could do your
Somehow, I'm starting to hope you are just trolling. That would at least _explain_ all of this. :)
Posted Dec 30, 2014 21:00 UTC (Tue)
by ldo (guest, #40946)
[Link] (4 responses)
It is the complex cases that demonstrate the scalability of my technique over GOTOs. This is evidenced by the fact that no one can come up with an equivalent using GOTOs that is even correct.
Posted Dec 30, 2014 22:56 UTC (Tue)
by cesarb (subscriber, #6266)
[Link] (3 responses)
Absence of evidence is not evidence of absence.
You posted a large block of source code, without any testcases, written in an unorthodox coding style. To do a decent rewrite, first one would have to understand the code, which is made harder by the uncommon code style and by the fact that it's a mathematical algorithm; experienced programmers tend to "pattern match" the source code visually to skip unimportant details, but this can only happen if one is familiar with the coding style. Then one would have to carefully reformat each function, converting the do-nothing loops into straight code without breaking the do-something loops; this is made harder because the same keyword (break) is being used to exit both. Finally, one would have to refactor the code into a more traditional style, being very careful to not break anything (since there are no testcases).
That's a lot of work for something which would be used solely to score a rhetorical point and then thrown away; the maintainer of the code (you) would not accept the changed code, since you're being so dogmatic about your coding style.
It could be worth doing the exercise if there was any chance that it would not be a wasted effort. Since there isn't, there are better uses for our time, and that's probably why nobody's bothering.
Posted Dec 31, 2014 21:01 UTC (Wed)
by ldo (guest, #40946)
[Link] (2 responses)
Posted Dec 31, 2014 22:25 UTC (Wed)
by bronson (subscriber, #4806)
[Link] (1 responses)
Posted Jan 1, 2015 21:46 UTC (Thu)
by ldo (guest, #40946)
[Link]
Posted Dec 29, 2014 18:36 UTC (Mon)
by acollins (guest, #94471)
[Link] (12 responses)
A goto label would provide a clear indication of where error handling takes place, instead, in your code I have to look at all the surrounding context to figure out what on earth "break" means in that particular context (is it exiting a real loop normally or is it a do-nothing loop to avoid a goto?). You mix error handling and normal loop control flow in a very confusing manner.
Contrast this with far more complex kernel code I've read that is much more understandable at first glance, largely due to readable error handling.
A number of people have already replied with similar thoughts but I'll reiterate, instead of lashing out, perhaps take a look at the feedback and reconsider your code.
Posted Dec 30, 2014 20:36 UTC (Tue)
by ldo (guest, #40946)
[Link] (11 responses)
That is the one case where the goto-ists have so far fallen flat on their faces when trying to “improve” my code.
Posted Dec 30, 2014 22:27 UTC (Tue)
by cesarb (subscriber, #6266)
[Link] (10 responses)
Sure. A very simple one, which should be easy to follow: the deeply nested unuse_mm() loop, which can be found at mm/swapfile.c. This is one I'm familiar with, other kernel developers most certainly know of better examples.
The first thing to notice is that, for better readability, it's split into several functions, one for each nesting level of the loop. The outermost loop, within unuse_mm, loops over the vmas and calls unuse_vma for each one. The next level, within unuse_vma, calls unuse_pud_range for each pgd; unuse_pud_range calls unuse_pmd_range for each pud; unuse_pmd_range calls unuse_pte_range for each pmd; and unuse_pte_range calls unuse_pte for each pte. Finally, unuse_pte does the real work, and it's where an error can happen.
Yes, we have a 5-level nested loop here, 4 of them looping over the 4-level abstract page table, with errors propagating outward from the innermost loop. Since each loop is in its own function, it doesn't even need a "goto"; it can use a straight "return". But the innermost function (unuse_pte) does have an example of the traditional "cleanup" use of goto.
Now how about an example from XFS, since we're supposed to be talking about XFS? I randomly looked at its source code, and found xlog_alloc_log. That function has to deal with error recovery from before the loop, from after the loop, and from within the loop, and the error recovery must be run only on failure. It's an allocation function; if there's no failure, it must keep everything it allocated, and if there's any failure, it must release everything it has allocated.
Posted Dec 31, 2014 21:56 UTC (Wed)
by ldo (guest, #40946)
[Link] (9 responses)
Speaking of which, I notice there is no error checking on this call to pte_offset_map_lock. Can that never fail?
And what happens if unuse_pte returns an error, anyway? Do the outer routines abort, and leave their work half-done? Is this supposed to be cleanup code, or not?
Posted Dec 31, 2014 22:56 UTC (Wed)
by cesarb (subscriber, #6266)
[Link] (8 responses)
Looking at how it's implemented, that call does two things: it temporarily maps the page table, in a way which won't fail (in some architectures, it's a simple arithmetic operation, and in others, it uses a mechanism which has a number of slots reserved for temporary mappings), and it locks a spinlock. If the spinlock is unlocked, it can't fail; if the spinlock is locked, it will wait until its current owner unlocks it, so again it can't fail.
> And what happens if unuse_pte returns an error, anyway? Do the outer routines abort, and leave their work half-done?
The answer here is yes!
This code is ultimately called from within the swapoff system call (further down in the same file). There's in fact another outermost loop, try_to_unuse, which loops over the swapfile's pages and tries to unuse (yeah...) each one in turn. Here is where it's called:
1872 set_current_oom_origin();
Just before this fragment of code, the swapfile (or swap partition) is marked as disabled on the list of swapfiles. The try_to_unuse function then tries to move all the pages which currently reside into that swapfile back into the main memory, and make all page tables which pointed to these pages on the swap point to them in main memory, so the swapfile can be safely removed.
If try_to_unuse fails (usually because there's not enough memory to hold what's currently on the swapfile to be removed), this code enables the swapfile again (this part of the code used to be almost a duplicate of the corresponding part of the swapon code; I refactored it into a separate function used by both). It doesn't try to swap out again the pages it swapped in; if there's a need to free some of the main memory, the normal memory management code will swap them out again.
If try_to_unuse succeeds, on the other hand, the swapfile is now empty; the code after the fragment I pasted above releases all the resources which were allocated by the swapon system call for this swapfile, and returns success to userspace.
Posted Jan 1, 2015 21:44 UTC (Thu)
by ldo (guest, #40946)
[Link] (7 responses)
In that case, this code is not very interesting. The interesting case would be the construction of a complex data structure, piece by piece, where each individual piece construction could fail. If any failures occur, then all the partially-constructed pieces so far need to be freed before returning an error indication to the caller. Only if all the construction steps succeed can the complete object be returned to the caller.
In my opinion, this would be just about the ultimate stress test of your error-recovery technique.
Search through my example for the comment “so I don't dispose of it yet” to see how I deal with this. You should find three instances.
Posted Jan 1, 2015 22:51 UTC (Thu)
by reubenhwk (guest, #75803)
[Link]
Posted Jan 1, 2015 23:12 UTC (Thu)
by nix (subscriber, #2304)
[Link]
Posted Jan 2, 2015 1:56 UTC (Fri)
by cesarb (subscriber, #6266)
[Link]
Well, swapoff is the destruction of a complex data structure, not its construction. Its construction is the swapon system call, further down in the same file.
The same design pattern can be found all over the Linux kernel: there's a construction function, which constructs whatever complex data structure the module needs, and a destruction function, which releases it.
> If any failures occur, then all the partially-constructed pieces so far need to be freed before returning an error indication to the caller. Only if all the construction steps succeed can the complete object be returned to the caller.
In the swapon system call, there's a single error handling label "bad_swap" which frees all the partially-constructed data structures, undoes block device configuration, closes the opened file, and so on. It falls through to the "out" label, used for both the success and failure cases, which releases any temporarily used resources.
> Search through my example for the comment “so I don't dispose of it yet” to see how I deal with this. You should find three instances.
I see. You have two variables for the return value of the function: one which has its reference counter decremented at the end, and one which is actually returned from the function. You keep the allocated structure at the first one, and when everything's ready, you swap it with the second one. So in the failure case, the reference counter is decremented and it returns NULL; in the success case, the reference counter is not decremented and it returns the pointer to the structure.
It's an elegant way of doing it (though I'm annoyed at your inconsistency: twice you called it "result" and once you called it "Result"). It's also orthogonal to the use of goto versus pseudo-loop for cleanup: this trick can help simplify the code in both cases.
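For readers without the linked example at hand, here is a minimal C sketch of the two-variable trick as described above; the refcounted type and its helpers are invented for illustration.

```c
#include <stdlib.h>

/* Invented minimal refcounted object, purely for illustration. */
struct obj {
	int refs;
	int value;
};

static struct obj *obj_new(void)
{
	struct obj *o = calloc(1, sizeof(*o));

	if (o)
		o->refs = 1;
	return o;
}

static void obj_unref(struct obj *o)
{
	if (o && --o->refs == 0)
		free(o);
}

static struct obj *build(int fail)
{
	struct obj *result = obj_new();	/* working copy, unref'd below */
	struct obj *Result = NULL;	/* what the caller receives */

	if (!result)
		return NULL;
	result->value = 42;
	if (fail)
		goto done;	/* failure: Result stays NULL */

	Result = result;	/* success: hand over the reference... */
	result = NULL;		/* ...so the unref below is a no-op */
done:
	obj_unref(result);	/* drops the reference only on failure */
	return Result;
}
```

On failure the working pointer is still set, so the final unref releases the object; on success the pointer has been moved into the returned variable and the unref does nothing.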
What you did is actually a design pattern in modern C++:
std::unique_ptr<foo> fn(/*args*/)
{
    auto ret = std::make_unique<foo>(/*args*/);
    /* ...code which can throw an exception... */
    return ret;
}
Here, "ret" is a std::unique_ptr<foo>, which contains a pointer to a "foo". When this variable gets out of scope, which will happen if an exception is thrown, whatever is pointed to by "ret" will be deleted. When it reaches the "return ret", however, the pointer is moved (as in std::move) to the return value of the function, so when it leaves the scope, "ret" is pointing to NULL, and the returned object isn't deleted.
Posted May 23, 2017 13:21 UTC (Tue)
by mirabilos (subscriber, #84359)
[Link] (3 responses)
struct foo *foo = calloc(1, sizeof(foo));
if (!(foo->a = geta()))
	goto out;
if (!(foo->b = getb()))
	goto out;
if (!(foo->c = getc()))
	goto out;
if (!(foo->d = getd()))
	goto out;
return (foo);
out:
free(foo->d);
free(foo->c);
free(foo->b);
free(foo->a);
return (NULL); /* error */
Posted May 23, 2017 14:08 UTC (Tue)
by anselm (subscriber, #2796)
[Link] (2 responses)
This sort of thing may work most of the time, but just for the record, while calloc() fills the whole structure in question with all-bits-zero bytes, there is no guarantee in the C standard that an individual structure entry like foo->a will in fact turn out to be a valid null pointer afterwards. (The C language does not require the null pointer to be “all bits zero”, even though expressions like “!(foo->a = geta())” must still return 1, as in “true”, if the geta() call yields a null pointer.)
If you're unlucky this means that if, say, you error out when trying to allocate foo->b, the "free(foo->d);" at the beginning of the out: path might try to free something at the all-bits-zero-address-that-isn't-a-null-pointer that hasn't previously been allocated, which isn't allowed. This shortcut looks enticingly convenient and probably works on most platforms today, but people who are interested in safe, portable, and standard-conforming C code shouldn't be using it.
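A strictly conforming variant initializes the pointer members explicitly instead of relying on calloc()'s all-bits-zero fill; the structure and allocations below are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical structure, mirroring the foo of the comment above. */
struct foo {
	char *a, *b, *c, *d;
};

struct foo *make_foo(void)
{
	struct foo *foo = malloc(sizeof(*foo));

	if (!foo)
		return NULL;
	/* An empty initializer yields proper null pointers for every
	 * member, whatever their bit pattern -- unlike the all-bits-zero
	 * bytes written by calloc(). */
	*foo = (struct foo){0};

	if (!(foo->a = malloc(16)))
		goto out;
	if (!(foo->b = malloc(16)))
		goto out;
	return foo;
out:
	/* Safe even for members never assigned: free(NULL) is a no-op. */
	free(foo->b);
	free(foo->a);
	free(foo);
	return NULL;
}
```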
Posted May 26, 2017 9:55 UTC (Fri)
by mirabilos (subscriber, #84359)
[Link] (1 responses)
Posted May 26, 2017 23:06 UTC (Fri)
by nix (subscriber, #2304)
[Link]
Posted Dec 29, 2014 19:14 UTC (Mon)
by rleigh (guest, #14622)
[Link] (4 responses)
Posted Dec 29, 2014 19:23 UTC (Mon)
by cesarb (subscriber, #6266)
[Link] (3 responses)
If you are willing to use GNU extensions, you can have it on C too via __attribute__((__cleanup__(...))). One well-known project which uses it is systemd.
But I agree that it's not going to be acceptable in the kernel, since it introduces implicit calls to the cleanup functions, and kernel developers prefer to be explicit.
Posted Dec 29, 2014 23:11 UTC (Mon)
by rleigh (guest, #14622)
[Link] (2 responses)
Posted Dec 30, 2014 23:28 UTC (Tue)
by cesarb (subscriber, #6266)
[Link] (1 responses)
Posted Dec 31, 2014 11:17 UTC (Wed)
by rleigh (guest, #14622)
[Link]
Posted Jan 12, 2015 0:54 UTC (Mon)
by Kamilion (subscriber, #42576)
[Link] (1 responses)
I've been wondering how to interact between high level languages like python and low level languages like C, C++, and Rust for a good long time.
This has been a great window into some traps and pitfalls and how-times-have-changed situations.
Does anyone know of any coding contests where a body of decent coders critique intentionally terribly written code (as opposed to, say, Obfuscated C) in a manner such as this thread? I'm vaguely aware of code-golf from seeing sidebar-links on the stackoverflow network, which I guess could be close, but can anyone offer anything else similar?
Posted Jan 12, 2015 12:53 UTC (Mon)
by mathstuf (subscriber, #69389)
[Link]
Posted Dec 26, 2014 1:38 UTC (Fri)
by davecb (subscriber, #1574)
[Link]
The maintainers then see if their error-handlers work.
We used to do a variant on this in Solaris, specifically using interposers to catch, report and continue after some #%#^!!! unwise person wrote
The trick is generalizable, so you can run in can't-fail mode on individual calls until they're all converted to handling failures properly.
--dave
Posted Dec 31, 2014 2:23 UTC (Wed)
by gerdesj (subscriber, #5446)
[Link] (4 responses)
Many here are experts in some way or another. Some are regarded as de-facto experts by most of their peers. I think we might all learn by listening even when the speaker is clearly wrong by our own measure.
This article was about long held assumptions being shown to not be quite right and that seems somehow appropriate in the comments final analysis.
Cheers
Posted Dec 31, 2014 8:50 UTC (Wed)
by cyronin (guest, #99592)
[Link] (3 responses)
I've often been told that the best way to learn something is to assert the wrong thing to experts, and they will give a clearer rationale on that topic than you will ever get from any other source. As a bonus, you might even cause a previously unnoticed rational inconsistency to become obvious, via the rubber duck effect.
Posted Jan 1, 2015 1:05 UTC (Thu)
by nix (subscriber, #2304)
[Link]
It's a highly efficient way to become rapidly convinced that you are still below the Dunning-Kruger bound on whatever it is. :)
(I have, of course, been there. More than once...)
Posted Jan 1, 2015 21:16 UTC (Thu)
by zlynx (guest, #2285)
[Link] (1 responses)
This may be true, but to do it on purpose only wastes the expert's time and makes them stop wanting to talk to anyone.
In my opinion this is why discussions on comp.lang.c eventually turned every thread into "Read the FAQ", and why, for a while on StackOverflow, every new question was turned into a duplicate, until it became too much work to even find the duplicate for each stupid question and now the site is mostly junk. In my opinion.
Posted Jan 2, 2015 3:12 UTC (Fri)
by marcH (subscriber, #57642)
[Link]
Who reads the entire site and cares?
Posted Jan 5, 2015 10:06 UTC (Mon)
by mtorni (guest, #3618)
[Link]
I was thinking of checking all the execution paths (disregarding loops) and producing a report of all possible cases of remaining allocations and not-rolled-back actions. For a simple function the report would be the "happy case" (several allocations in effect, perhaps a lock) and "failure" (no allocations and locks in effect after exit). If the function had error-handling bugs, there would also be partial exit cases (some allocations in force, perhaps a lock).
Might be some work to implement, but the reports could be great for verification of code before commits.
Posted Jan 8, 2015 13:41 UTC (Thu)
by pebolle (subscriber, #35204)
[Link]
"Nobody really knows when the rule that these allocations could not fail went into the kernel; it predates the Git era."
This triggered me to try to uncover that history. I think I managed to do that. These are the highlights. I won't actually analyze these highlights: just finding them already cost me quite a bit of time!
It appears to start at v2.4.10, which included a "major rewrite of the VM code" by Andrea Arcangeli. That apparently "came as a bit of a surprise" to most kernel developers. As far as I understand, quite a few threads on lkml discuss the exciting bugs people hit in that code.
One of the tweaks to that code was added in v2.4.11-pre6. It added
Small allocations and OOM killer
Error recovery (was: The "too small to fail" memory-allocation rule)
Re: OTOH, Android expects its apps to be able to handle crashes cleanly.
The "too small to fail" memory-allocation rule
I hope I didn't create the wrong impression; XFS is not the problem here, it just happened to be the filesystem that was in use.
XFS
The "too small to fail" memory-allocation rule
Such an allocation must not risk waiting for FS or IO activity. Yet waiting for the OOM killer can clearly do that. This is a violation of the interface whether you accept the "too small to fail" rule or not.
The "too small to fail" memory-allocation rule
That issue was solved a long time ago, which is why the vfork(2) hack is obsolete nowadays.
Re: shouldn't require a complete copy of the parent process's memory
The "too small to fail" memory-allocation rule
The man page says this about MAP_NORESERVE:
The "too small to fail" memory-allocation rule
Do not reserve swap space for this mapping. When swap space is reserved,
one has the guarantee that it is possible to modify the mapping. When swap
space is not reserved one might get SIGSEGV upon a write if no physical
memory is available. See also the discussion of the file
/proc/sys/vm/overcommit_memory in proc(5). In kernels before 2.6, this
flag had effect only for private writable mappings.
That looks a lot like a flag for a mapping that doesn't reserve commit charge. There's no mention of the flag not actually working; there's no reference to better techniques. MAP_NORESERVE is just an attractive nuisance.
The "too small to fail" memory-allocation rule
SHARED - size of mapping
PRIVATE READ-only - 0 cost (but of little use)
PRIVATE WRITABLE - size of mapping per instance
The "too small to fail" memory-allocation rule
Or You Could Simplify The Error-Recovery Paths
ptr = NULL;
... similarly initialize any other pointers ...
do /*once*/
{
...
allocate ptr;
if (failure)
break;
...
further processing, including further pointer allocations as necessary
...
}
while (false);
free(ptr);
... free any other allocations ...
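Filled in with concrete allocations, the pseudo-loop above compiles to something like this sketch (the string-building logic is invented for illustration):

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* The do-once pattern above, filled in with invented allocations. */
char *make_greeting(void)
{
	char *ptr = NULL;
	char *other = NULL;
	char *out = NULL;

	do /*once*/
	{
		ptr = malloc(32);
		if (ptr == NULL)
			break;
		other = malloc(32);
		if (other == NULL)
			break;
		strcpy(ptr, "hello ");
		strcpy(other, "world");
		out = malloc(64);
		if (out == NULL)
			break;
		strcpy(out, ptr);
		strcat(out, other);
	}
	while (false);
	free(other);	/* free(NULL) is a no-op, so this is safe */
	free(ptr);
	return out;	/* NULL if any step failed */
}
```

Note that the pattern relies on every pointer being initialized to NULL up front, so the unconditional frees at the bottom are harmless no matter where the break happened.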
Or You Could Simplify The Error-Recovery Paths
> if (failure)
> break;
> } while (false);
No! No GOTOs. They don’t nest easily, and they lead to spaghetti code.
Look at the example code I linked above, and think how it would look with GOTOs all over the place. Avoid GOTOs!
Re: Wouldn't it be simpler and more readable to use a goto instead of a pseudo-loop?
Re: There would be precisely as many "goto" statements
Re: So, YES to gotos for error handling!
Re: There would be precisely as many "goto" statements
if (error1) {
goto label;
}
/* ... more code ... */
if (error2) {
goto label;
}
/* ... more code ... */
label: free(pointer);
Re: you're wrong about the claim of needing a billion labels
Re: deeply nested allocations example *should* be solved by factoring out
Re: What else do you want?
Re: deeply nested allocations example *should* be solved by factoring out
goto bike shed
> example code I linked above, and think how it would look with GOTOs all over the place.
> Avoid GOTOs!
> if (PyErr_Occurred()) break;
> }
> if (PyErr_Occurred()) break;
to then only break out of another loop is any clearer than just jumping out directly.
(like exceptions) that can break out of multiple loops. It's not very different from what goto
was recommended for in this case.
doesn't lead to anything good, so I would advise against it.)
> deep, as well as loops. See how well that works using GOTOs.
if I were to do this exercise for you.
Re: not sure I'd agree that breaking out of a loop just to test the same error condition
>> propagating the error condition onwards. That is the whole point of the nesting.
Re: Better yet, use a function for each of those nested levels.
Re: There would be precisely as many "goto" statements
Re: Wouldn't it be simpler and more readable to use a goto instead of a pseudo-loop?
Re: Please, no knee-jerk "no goto ever" reactions
do { /* code */ } while (false);
construct - I'd use
goto cleanup;
The point of this would have been to prove to you that what you are doing is in fact to use goto, but in a circumspect manner.
The
do { /* code */ } while (false);
is one of your smaller sins! Why the f*ck do you for instance do things like this?
for (i = 0;;) {
	if (i == nrcolors) break;
	/* code */
	++i;
}
Or this?
if (nrhistentries <= 4 || /* other condition */)
{
	if (nrhistentries > 0) {
		histogram[0].index = 0;
		if (nrhistentries > 1) {
			histogram[1].index = 1;
			if (nrhistentries > 2) {
				histogram[2].index = 2;
				if (nrhistentries > 3) {
					histogram[3].index = 3;
				}
			}
		}
	}
}
This could have been written as:
int max_histentries = (nrhistentries > 4) ? 4 : nrhistentries;
for (i = 0; i < max_histentries; ++i) {
	histogram[i].index = i;
}
Re: Why the f*ck do you for instance do things like this?
Re: Please, no knee-jerk "no goto ever" reactions
Re: in general one might have to do things in an error branch that one might not have to do during normal flow
Re: Wouldn't it be simpler and more readable to use a goto instead of a pseudo-loop?
Re: E.g. a loop or switch added,
Re: Experience shows that idiomatic "goto error_exit" is far better than the alternatives with fake loops and flags.
Re: I won't bother to rewrite all of the code, so only for ParseColorTuple:
Where?
You can do it _automatically_ by rewriting do-nothing loops into 'goto cleanup' and merging cleanup actions if needed.
That's not something to be proud of, you know. It means that you've failed to decompose your logic into smaller fragments.
Re: It pretty much falls into the 'unmaintainable crap' bin
Like... I don't know, say an OS kernel?
Re: Or perhaps an audio codec?
Re: Your wish is my command
Re: What do you mean by "with a loop"?
No you don't. Just split them up into free-standing functions, instead of 300-line monstrosities.
https://gist.github.com/Cyberax/3a7796231be66d0f64cc
(no, I'm not going to check it in details). It's pretty clear that simply by splitting some logic into a function you can decrease nesting level by 3.
Re: And let me reiterate, your code is a mess.
Re: I have not claimed that my code works
Re: If you can make a reasonable description of what it does
Re: Please, provide a coherent description
Re: It pretty much falls into the 'unmaintainable crap' bin
And your "nested pseudo loops error handler" argument is void, because those unmaintainable monsters should be split into sub-routines anyway. And then each can use plain gotos. Which gains the additional opportunity to write structured error paths instead of "one must handle all errors" error paths.
Re: Huge projects use goto for error handling successfully
Your complex pseudo-loop can't help here at all. It just adds extra complexity. The error paths will still be untested.
Re: The error paths will still be untested.
No, the cleanups always happen after the do-once constructs. That’s how you can be sure they will always be executed.
Re: The pseudo-loops hide the error processing
(do/once constructs nested up to three deep)
Re: I won't bother to rewrite all of the code, so only for ParseColorTuple:
> That's not something to be proud of, you know. It means that you've failed to decompose your logic into smaller fragments.
Actually it means he's probably trying to do multiple things which need unwinding, but because he can't use three goto labels and rely on fallthrough he has to add a layer of almost-entirely-redundant nesting for each one.
Or You Could Simplify The Error-Recovery Paths
Ugh. This is an unmaintainable mess.
Re: does this 'break' exits the loop or is it simply an error exit?
Are you choosing this...
Re: does this 'break' exits the loop or is it simply an error exit?
for (i = 0; i < MAX; ++i) {
do { /*once*/
if (err) {
i = MAX;
break;
}
} while (false);
}
over this?
goto cleanup;
This isn't good...
Re: does this 'break' exits the loop or is it simply an error exit?
for (i = 0; i < IMAX; ++i)
{
for (j = 0; j < JMAX; ++j)
{
do { /* once */
if (PyErr_Occurred())
break;
} while (false);
if (PyErr_Occurred())
break;
}
if (PyErr_Occurred())
break;
}
This is far better...
for (i = 0; i < IMAX; ++i)
{
for (j = 0; j < JMAX; ++j)
{
if (PyErr_Occurred())
goto done;
}
}
done:
Dealing with complexity means splitting a complex problem up into simple things people can understand. Good programmers don't write complex code.
Good programmers write simple code.
Also, the *ONLY* time you should use a pseudo-loop like this is in a macro so the compiler will accept the semi-colon following it...
#define COMPLEX_MACRO_SHOULD_BE_A_FUNC(x) do { \
/* complex macro code isn't good either */ \
} while (0)
later...
COMPLEX_MACRO_SHOULD_BE_A_FUNC("hello world");
You do realize that PyErr_Occurred() is checking for an error from other API calls, right? So where do you put these API calls in your example?
Re: This is far better...
Re: Right before the PyErr_Occurred.
for (i = 0; i < IMAX; ++i)
{
for (j = 0; j < JMAX; ++j)
{
/* Some Python C calls, allocations, etc */
if (PyErr_Occurred())
goto done;
}
}
done:
free(whatever);
if (fileptr)
fclose(fileptr);
Don't fight it. Embrace goto statements. Goto is clearly better than calling PyErr_Occurred multiple times. For all we know, PyErr_Occurred checks the entire state of the Python interpreter (possibly a non-trivial, expensive operation). goto done will emit one machine instruction. Don't abuse other language constructs (do-nothing loops) just to avoid using a goto, especially when you're effectively doing the same thing. Try it. You'll like it.
Re: Right before the PyErr_Occurred.
The Story So Far...
The problem in the kernel is not an inability to be sure that cleanups will execute. With the goto exception-handling technique, unless you omit a goto, they always execute, and if your cleanups are executed in the normal path (i.e. there is no return before the goto label) they will execute even if you omit a goto. This is *exactly the same* as in your technique.
The Story So Far...
int err = -1;
if (do-nothing-fastpath-test)
return 0; /* early */
stuff that allocates memory or does other things needing cleanup in both success and failure cases
if (error check)
goto cleanup_stuff;
stuff that doesn't need cleanup but can fail
if (error check)
goto cleanup_stuff;
more stuff
if (error check)
goto cleanup_more_stuff;
...
err = 0; /* success */
cleanup_more_stuff:
undo 'more stuff'
cleanup_stuff:
undo 'stuff'
return err;
It is clear that in this code -- other than in the early-fast-exit case -- the cleanups will always execute. You don't have to have a 'return' before them! The normal flow of control passes smoothly through them at all times: it's just that the cleanups can skip some of that flow (the parts relating to things they did that don't need cleanup, and where the cleanup is not something like free() that is harmless to do on something that was never allocated).
The Story So Far...
a-la sh(1), it becomes _worse_ than multiple returns). Not to mention its inconsistency between if() and switch(), etc.
Re: ldo either has never bothered to read any of the relevant papers
The Story So Far...
{
int err = 0;
/* do stuff */
err = bar(...);
if (err)
goto error;
/* do even more stuff */
err = baz(...);
if (err)
goto error;
/* do yet more stuff */
out:
/* free temporary memory and unlock mutexes */
return err;
error:
/* unwind state updates */
goto out;
}
Re: The problem is with cleanups which should *never* be executed, unless an error happens.
Re: You still are in the "cleanup" mentality.
> allocation routines never return "failure".
> the fact that it *executes* conditionally.
Re: You still are in the "cleanup" mentality.
> Returning NULL from an allocation request is a failure.
> Do you know what “abstraction” means?
Re: requests were *never* returning NULL, even when they should!
The Story So Far...
Re: you win the argument?
The Story So Far...
* The placement of the comment describing the function. Its traditional location (and the one all code-documentation tools expect) is before the function; you placed it just before the opening brace, where the parameter types were declared in pre-standard C.
* The unorthodox indentation of a multi-line expression, which placed each operator in its own line (instead of, as usually done, either at the end or at the beginning of a line).
* The use of an "end comment" for control structures. An actual example from that code file:
{
/* ... */
} /*if*/
The Story So Far...
>which has somehow baked itself into many parts of the Linux kernel since long before anyone can now remember.
because of that (plus a nice lock-up), if I got it right. The actual coding style of the error
paths seems unrelated.
> alternatives to my code.
> So they resort to trying to cast aspersions on the quality of my code.
way by reorganising _your_ code. Given that everyone else, including the Linux kernel folks
seem to be happy with their gotos against your suggestion, it seems that the numbers are
against you, and I've seen no proof for your way other than your personal (and persistent)
assertion.
someone tried to rework your code, but made mistakes. Doesn't that also reflect badly on your
code? Perhaps it isn't as readable as you assert? Some specific issues about the coding
style have been mentioned, but I haven't seen any real arguments to defend them, except the
dogmatic anti-goto attitude. (And you accuse others of prejudice!)
about not using goto. Perhaps you should ponder on it for a moment:
"I have the uncomfortable feeling that others are making a religion out of it, as if the conceptual
problems of programming could be solved by a single trick, by a simple form of coding discipline!"
> plain crap, why can’t they do better? Why can they not come up with something simpler to solve
> the same real-world problem? But they cannot.
part in comparing the options. It's not like anyone has a responsibility to attend to exercises
someone else gives on the Internet. If you want to take the lack of submissions as proof of
your superior position, you are of course free to do so. But it doesn't mean you are right,
or that others will want to talk to you after that.
Re: If this is just an exercise, perhaps a smaller one would do
*Yawn* More content-free hand-waving hot air. Show us the code!
Re: Absence of evidence is not evidence of absence.
Because they’re just not that interesting.
Re: why don't you rewrite the goto-laden kernel examples cesarb posted?
The Story So Far...
Re: Contrast this with far more complex kernel code
Re: Finally, unuse_pte does the real work, and it's where an error can happen.
Re:The answer here is yes!
Re:The answer here is yes!
Nowadays I'd do this:
Or You Could Simplify The Error-Recovery Paths
std::unique_ptr<a> ptra(std::make_unique<a>(args)); // a instance allocated
std::unique_ptr<b> ptrb(std::make_unique<b>(args)); // b instance allocated
// stuff that might fail.
ptrb->foo(ptra->bar());
// Cleanup of ptra a instance and ptrb b instance is automatic
I know this isn't going to be acceptable in the kernel, but in userspace this gives you guaranteed correct behaviour on error, and will clean up correctly on allocation failure of either instance, returns at any point and also exceptions at any point. No need for complex conditionals. No need for gotos. It's all automatically handled.
Unless you really want to suffer with C, you can just have it all handled if you use the facilities C++ offers you. The worry that you've missed some special case is taken care of by RAII, and it does lead to simpler, more readable, more maintainable and more robust code.
Or You Could Simplify The Error-Recovery Paths
Find and fix by exhaustion
For each of these, start a standard regression test.
When the allocator is called
- turn on coverage tracking
- return error from the allocator
- run for another 10 instructions or so
- stop and mail the tcov results to the subsystem maintainer.
continue the loop with the next call
if (*p != '\0') instead of if (p != NULL && *p != '\0').
It takes lots of calendar time, but not much time per individual maintainer, so it scales.
The "too small to fail" memory-allocation rule
Jon
I've often been told that the best way to learn something is to assert the wrong thing to experts, and they will give a clearer rationale on that topic than you will ever get from any other source.
It's even better if you can manage to accidentally pull this trick when your interlocutor is the original inventor of whatever-it-is. Particularly if you didn't realise it. (This was much easier before the days of web search engines, but if you don't think to check, it can happen these days too.)
Partial static analysis of error paths
and producing a report of all possible cases of remaining allocations and not-rolled-back actions. For a simple function the report would show the "happy case" (several allocations in effect, perhaps a lock) and the "failure case" (no allocations or locks in effect after exit). If the function had error-handling bugs, there would also be partial exit cases (some allocations in force, perhaps a lock).
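As a hypothetical illustration of the kind of function such an analysis would flag (the names and the std::mutex stand-in for a kernel lock are assumptions, and the bug is deliberate):

```cpp
#include <mutex>
#include <new>

static std::mutex dev_lock;     // stand-in for a kernel lock
static int *buffer = nullptr;   // resource set up by the function

// One exit path forgets to roll back: the analysis would report the
// happy case (buffer allocated, lock released), the failure case
// (nothing held), and a partial exit case (lock held, buffer leaked).
int setup_buggy(bool second_step_fails) {
    dev_lock.lock();
    buffer = new (std::nothrow) int[64];
    if (buffer == nullptr) {
        dev_lock.unlock();      // failure case: fully rolled back
        return -1;
    }
    if (second_step_fails)
        return -1;              // BUG: returns with dev_lock held, buffer leaked
    dev_lock.unlock();          // happy case: lock released, buffer kept
    return 0;
}
```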
/* Don't let big-order allocations loop */
if (order)
return NULL;
to mm/page_alloc.c:__alloc_pages(). This appears to have been added by Linus himself. I have not (yet) found any public discussion of this change.
In v2.4.13-pre5 these lines were changed to
/* Don't let big-order allocations loop */
if (order > 1)
        return NULL;
(I haven't tried to link this change to a patch or a public discussion.)
Then between v2.4.14-pre8 and v2.4.14 these were changed to
/* Don't let big-order allocations loop */
if (order > 3)
        return NULL;
This seems to have been done by Linus himself again.
Fast forward to v2.5.46. It included a patch by Andrew Morton titled "disable PF_MEMALLOC for interrupt allocations". It "tidies things up a little in there" by changing the code to
/*
 * Don't let big-order allocations loop.  Yield for kswapd, try again.
 */
if (order <= 3) {
        yield();
        goto rebalance;
}
Andrew Morton again cleaned up the code quite a bit in v2.5.69 with the patch called "implement __GFP_REPEAT, __GFP_NOFAIL, __GFP_NORETRY". I won't quote the new code.
And, finally, in the git era "3" was changed in v2.6.23 to "PAGE_ALLOC_COSTLY_ORDER" by Andy Whitcroft's commit 5ad333eb66ff ("Lumpy Reclaim V4").
Anyone with access to a Linux git repository can easily follow the subsequent changes to that code. But hopefully this shows the history before the Git era (i.e., before v2.6.12). Unless, of course, this isn't actually the code discussed in this article!
Posted Jan 19, 2015 15:53 UTC (Mon)
by pal (guest, #57253)
Posted Jan 19, 2015 18:33 UTC (Mon)
by dgm (subscriber, #49227)
Your rule actually makes a lot of sense. Locks exist to ensure that concurrent access to a shared resource happens in a coherent way. If you're calling "foreign" code while holding one, then your lock is protecting more than just access to the shared resource, and thus it is protecting too much. Probably the "right" thing to do is to move all resource allocations (including memory) outside the lock scope, and only acquire the lock once all the needed resources are ready.
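The allocate-before-locking rule can be sketched in userspace like this (the linked-list example, the `add_item` name, and the std::mutex stand-in for a kernel lock are all assumptions, not anything from the kernel):

```cpp
#include <mutex>
#include <new>

struct node { int value; node *next; };

static std::mutex list_lock;    // protects 'head'
static node *head = nullptr;

// Perform the allocation -- the step that can fail or block on
// reclaim -- before taking the lock. The critical section then only
// links an already-built node, and cannot itself fail.
bool add_item(int value) {
    node *n = new (std::nothrow) node{value, nullptr};
    if (n == nullptr)
        return false;           // failure handled with no lock held
    std::lock_guard<std::mutex> guard(list_lock);
    n->next = head;
    head = n;
    return true;
}
```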
Posted Feb 12, 2015 13:30 UTC (Thu)
by wenhua.yu (guest, #100439)
Posted Feb 17, 2015 19:43 UTC (Tue)
by bronson (subscriber, #4806)
Posted Mar 17, 2015 16:55 UTC (Tue)
by meuh (guest, #22042)
Posted May 30, 2022 13:28 UTC (Mon)
by huxiaoming (guest, #158793)
我就看看 ("I'm just here to take a look")
According to some online translator:
"So lively discussion,
I'll take a look".