The Story So Far...

Posted Dec 29, 2014 18:36 UTC (Mon) by acollins (guest, #94471)
In reply to: The Story So Far... by ldo
Parent article: The "too small to fail" memory-allocation rule

Looking at your spuhelper.c example has convinced me of the exact opposite. I find your error paths nearly unreadable.

A goto label would provide a clear indication of where error handling takes place, instead, in your code I have to look at all the surrounding context to figure out what on earth "break" means in that particular context (is it exiting a real loop normally or is it a do-nothing loop to avoid a goto?). You mix error handling and normal loop control flow in a very confusing manner.

Contrast this with far more complex kernel code I've read that is much more understandable at first glance, largely due to readable error handling.

A number of people have already replied with similar thoughts but I'll reiterate, instead of lashing out, perhaps take a look at the feedback and reconsider your code.

Re: Contrast this with far more complex kernel code

Posted Dec 30, 2014 20:36 UTC (Tue) by ldo (guest, #40946) [Link] (11 responses)

I’d be curious to know if anybody can point to an example in the kernel which has to deal with error recovery from inside a loop, similar to my code.

That is the one case where the goto-ists have so far fallen flat on their faces when trying to “improve” my code.

Re: Contrast this with far more complex kernel code

Posted Dec 30, 2014 22:27 UTC (Tue) by cesarb (subscriber, #6266) [Link] (10 responses)

> I’d be curious to know if anybody can point to an example in the kernel which has to deal with error recovery from inside a loop, similar to my code.

Sure. A very simple one, which should be easy to follow: the deeply nested unuse_mm() loop, which can be found at mm/swapfile.c. This is one I'm familiar with, other kernel developers most certainly know of better examples.

The first thing to notice is that, for better readability, it's split into several functions, one for each nesting level of the loop. The outermost loop, within unuse_mm, loops over the vmas and calls unuse_vma for each one. The next level, within unuse_vma, calls unuse_pud_range for each pgd; unuse_pud_range calls unuse_pmd_range for each pud; unuse_pmd_range calls unuse_pte_range for each pmd; and unuse_pte_range calls unuse_pte for each pte. Finally, unuse_pte does the real work, and it's where an error can happen.

Yes, we have a 5-level nested loop here, 4 of them looping over the 4-level abstract page table, with errors propagating outward from the innermost loop. Since each loop is in its own function, it doesn't use even need a "goto"; it can use a straight "return". But the innermost function (unuse_pte) does have an example of the traditional "cleanup" use of goto.

Now how about an example from XFS, since we're supposed to be talking about XFS? I randomly looked at its source code, and found xlog_alloc_log. That function has to deal with error recovery from before the loop, from after the loop, and from within the loop, and the error recovery must be run only on failure. It's an allocation function; if there's no failure, it must keep everything it allocated, and if there's any failure, it must release everything it has allocated.

Re: Finally, unuse_pte does the real work, and it's where an error can happen.

Posted Dec 31, 2014 21:56 UTC (Wed) by ldo (guest, #40946) [Link] (9 responses)

Speaking of which, I notice there is no error checking on this call to pte_offset_map_lock. Can that never fail?

And what happens if unuse_pte returns an error, anyway? Do the outer routines abort, and leave their work half-done? Is this supposed to be cleanup code, or not?

Re: Finally, unuse_pte does the real work, and it's where an error can happen.

Posted Dec 31, 2014 22:56 UTC (Wed) by cesarb (subscriber, #6266) [Link] (8 responses)

> Speaking of which, I notice there is no error checking on this call to pte_offset_map_lock. Can that never fail?

Looking at how it's implemented, that call does two things: it temporarily maps the page table, in a way which won't fail (in some architectures, it's a simple arithmetic operation, and in others, it uses a mechanism which has a number of slots reserved for temporary mappings), and it locks a spinlock. If the spinlock is unlocked, it can't fail; if the spinlock is locked, it will wait until its current owner unlocks it, so again it can't fail.

> And what happens if unuse_pte returns an error, anyway? Do the outer routines abort, and leave their work half-done?

The answer here is yes!

This code is ultimately called from within the swapoff system call (further down in the same file). There's in fact another outermost loop, try_to_unuse, which loops over the swapfile's pages and tries to unuse (yeah...) each one in turn. Here is where it's called:

1872 set_current_oom_origin();
1873 err = try_to_unuse(p->type, false, 0); /* force unuse all pages */
1874 clear_current_oom_origin();
1875
1876 if (err) {
1877 /* re-insert swap space back into swap_list */
1878 reinsert_swap_info(p);
1879 goto out_dput;
1880 }

Just before this fragment of code, the swapfile (or swap partition) is marked as disabled on the list of swapfiles. The try_to_unuse function then tries to move all the pages which currently reside into that swapfile back into the main memory, and make all page tables which pointed to these pages on the swap point to them in main memory, so the swapfile can be safely removed.

If try_to_unuse fails (usually because there's not enough memory to hold what's currently on the swapfile to be removed), this code enables the swapfile again (this part of the code used to be almost a duplicate of the corresponding part of the swapon code; I refactored it into a separate function used by both). It doesn't try to swap out again the pages it swapped in; if there's a need to free some of the main memory, the normal memory management code will swap them out again.

If try_to_unuse succeeds, on the other hand, the swapfile is now empty; the code after the fragment of code I pasted above releases all the resources which were allocated by the swapon system call for this swapfile, and returns success to the userspace.

Re:The answer here is yes!

Posted Jan 1, 2015 21:44 UTC (Thu) by ldo (guest, #40946) [Link] (7 responses)

In that case, this code is not very interesting. The interesting case would be the construction of a complex data structure, piece by piece, where each individual piece construction could fail. If any failures occur, then all the partially-constructed pieces so far need to be freed before returning an error indication to the caller. Only if all the construction steps succeed can the complete object be returned to the caller.

In my opinion, this would be just about the ultimate stress test of your error-recovery technique.

Search through my example for the comment “so I don't dispose of it yet” to see how I deal with this. You should find three instances.

Re:The answer here is yes!

Posted Jan 1, 2015 22:51 UTC (Thu) by reubenhwk (guest, #75803) [Link]

It seems like the vars around /* so I don't dispose of it yet */ are used as local variable where you're building/using something, and at a point (where the assignment of result is) is where you're ready to call this data structure complete and ready to return. I do something similar. When I can I try to ensure that a function either completely fails or completely succeeds with no in-between.

Re:The answer here is yes!

Posted Jan 1, 2015 23:12 UTC (Thu) by nix (subscriber, #2304) [Link]

There have been multiple worked examples of exactly this shown to you already.

Re:The answer here is yes!

Posted Jan 2, 2015 1:56 UTC (Fri) by cesarb (subscriber, #6266) [Link]

> In that case, this code is not very interesting. The interesting case would be the construction of a complex data structure, piece by piece, where each individual piece construction could fail.

Well, swapoff is the destruction of a complex data structure, not its construction. Its construction is the swapon system call, further down in the same file.

The same design pattern can be found all over the Linux kernel: there's a construction function, which constructs whatever complex data structure the module needs, and a destruction function, which releases it.

> If any failures occur, then all the partially-constructed pieces so far need to be freed before returning an error indication to the caller. Only if all the construction steps succeed can the complete object be returned to the caller.

In the swapon system call, there's a single error handling label "bad_swap" which frees all the partially-constructed data structures, undoes block device configuration, closes the opened file, and so on. It falls through to the "out" label, used for both the success and failure cases, which releases any temporarily used resources.

> Search through my example for the comment “so I don't dispose of it yet” to see how I deal with this. You should find three instances.

I see. You have two variables for the return value of the function: one which has its reference counter decremented at the end, and one which is actually returned from the function. You keep the allocated structure at the first one, and when everything's ready, you swap it with the second one. So in the failure case, the reference counter is decremented and it returns NULL; in the success case, the reference counter is not decremented and it returns the pointer to the structure.

It's an elegant way of doing it (though I'm annoyed at your inconsistency: twice you called it "result" and once you called it "Result"). It's also orthogonal to the use of goto versus pseudo-loop for cleanup: this trick can help simplify the code in both cases.

What you did is actually a design pattern in modern C++:

std::unique_ptr<foo> fn(/*args*/)
{
auto ret = std::make_unique<foo>(/*args*/);
/* ...code which can throw an exception... */
return ret;
}

Here, "ret" is a std::unique_ptr<foo>, which contains a pointer to a "foo". When this variable gets out of scope, which will happen if an exception is thrown, whatever is pointed to by "ret" will be deleted. When it reaches the "return ret", however, the pointer is moved (as in std::move) to the return value of the function, so when it leaves the scope, "ret" is pointing to NULL, and the returned object isn't deleted.

Re:The answer here is yes!

Posted May 23, 2017 13:21 UTC (Tue) by mirabilos (subscriber, #84359) [Link] (3 responses)

What, you don’t initialise them to NUL and just free them afterwards?

struct foo *foo = calloc(1, sizeof(foo));

if (!(foo->a = geta()))
goto out;
if (!(foo->b = getb()))
goto out;
if (!(foo->c = getc()))
goto out;
if (!(foo->d = getd()))
goto out;
return (foo);

out:
free(foo->d);
free(foo->c);
free(foo->b);
free(foo->a);
return (NULL); /* error */

Re:The answer here is yes!

Posted May 23, 2017 14:08 UTC (Tue) by anselm (subscriber, #2796) [Link] (2 responses)

This sort of thing may work most of the time, but just for the record, while calloc() fills the whole structure in question with all-bits-zero bytes, there is no guarantee in the C standard that an individual structure entry like foo->a will in fact turn out to be a valid null pointer afterwards. (The C language does not require the null pointer to be “all bits zero”, even though expressions like “!(foo->a = geta())” must still return 1, as in “true”, if the geta() call yields a null pointer.)

If you're unlucky this means that if, say, you error out when trying to allocate foo->b, the “free(foo->d);” at the beginning of the out: path might try to free something at the all-bits-zero-address-that-isn't-a-null-pointer that hasn't previously be allocated, which isn't allowed. This shortcut looks enticingly convenient and probably works on most platforms today but people who are interested in safe, portable, and standard-conforming C code shouldn't be using it.

Re:The answer here is yes!

Posted May 26, 2017 9:55 UTC (Fri) by mirabilos (subscriber, #84359) [Link] (1 responses)

Sure, but that can be implementation-defined, and POSIX does do this (in the next version):

http://austingroupbugs.net/view.php?id=940#c2696

Re:The answer here is yes!

Posted May 26, 2017 23:06 UTC (Fri) by nix (subscriber, #2304) [Link]

Note the freedom which still exists: there can be *other* bit patterns that also represent the null pointer. :)