The Story So Far...

Posted Dec 28, 2014 23:33 UTC (Sun) by cesarb (subscriber, #6266)
In reply to: The Story So Far... by ldo
Parent article: The "too small to fail" memory-allocation rule

> This way, you can be sure that cleanups will always execute, no matter which way the flow of control goes.

Stop right here. The problem is not with cleanups which should always be executed; these are already well-tested. The problem is with cleanups which should *never* be executed, unless an error happens.

In exception-handling terms, what you are talking about is a "finally" block, which executes both on success and on failure. The error-recovery paths in question, however, are "except" blocks, which execute only on failure.

For a filesystem-related example, since we're supposed to be talking about filesystems: let's say a user program just appended something into a file, and you're in the function which will submit the newly appended data to the block subsystem. As part of that, it has to allocate space for the new data, updating several of the filesystem's control structures, and allocate the control structures for the block subsystem.

If any allocation fails in the middle of this complex process, you have to unwind all the bookkeeping updates, otherwise the filesystem gets into an inconsistent state. If no allocation fails, however, the bookkeeping changes must be kept, since they represent the new state of the filesystem.

Is the pseudo-loop technique able to do the equivalent of an "except" block without getting even uglier? Remember that the rewind will have to be called from many places, since any of the allocations can fail, and that the failure can also be in the middle of the the initial update of the control structures, so the rewind might be partial.

The traditional way of doing this in the Linux kernel is somewhat like the following:

int foo(...)
{
int err = 0;
/* do stuff */
err = bar(...);
if (err)
goto error;
/* do even more stuff */
err = baz(...);
if (err)
goto error;
/* do yet more stuff */
out:
/* free temporary memory and unlock mutexes */
return err;
error:
/* unwind state updates */
goto out;
}

The kernel developers are pragmatic. If a kernel developer feels that using a "pseudo-loop" results in cleaner code, they do use it. However, for error handling the usual design pattern is either a "goto error" (single error label, pointers initialized to NULL) or a "goto error_foo" (many error labels, each falling through to the next, releasing resources in reverse order). On success, either the unlock is duplicated (it's usually just a unlock, memory is rarely allocated just for the duration of one function), or the error handling does a "goto out" like in the example above.

Re: The problem is with cleanups which should never be executed, unless an error happens.

Posted Dec 29, 2014 17:35 UTC (Mon) by ldo (guest, #40946) [Link] (4 responses)

This is why cleanups should be idempotent--pass the cleanup routine a NULL argument, and it does nothing. So the steps become something like

Allocate the first thing needing cleanup; abort on failure
Allocate the second thing needing cleanup; abort on failure
If the second thing subsumes the first thing (so cleaning up the second thing automatically cleans up the first thing), then you can set the first thing to NULL at this point, to avoid a double cleanup.
And so on.

So when you get to the end, you simply execute all the relevant cleanups unconditionally, and the ones with NULL arguments turn into no-ops.

Re: The problem is with cleanups which should never be executed, unless an error happens.

Posted Dec 29, 2014 18:14 UTC (Mon) by cesarb (subscriber, #6266) [Link] (3 responses)

> If the second thing subsumes the first thing (so cleaning up the second thing automatically cleans up the first thing), then you can set the first thing to NULL at this point, to avoid a double cleanup.

You still are in the "cleanup" mentality. But the problem with filesystems is not "cleanup", it's "rollback". There's nothing to subsume the relevant code, and it subsumes nothing else; it's really something which executes only on failure.

> So when you get to the end, you simply execute all the relevant cleanups unconditionally, and the ones with NULL arguments turn into no-ops.

That doesn't help with the situation in question, where the memory allocation routines never return "failure". The relevant error-recovery code would always get passed NULL, and so the branch which is called when it receives a non-NULL never gets tested.

Sure, you *called* the code unconditionally, but it doesn't change the fact that it *executes* conditionally. Whether the branch point is outside or inside it is immaterial. For instance, it's well-known that the standard "free(void *ptr)" function does nothing when called with a NULL argument, but that's because it begins with a conditional branch: "if (!ptr) return;". If it were always called with a NULL pointer, the most complex part of its code would never be exercised.

Re: You still are in the "cleanup" mentality.

Posted Dec 30, 2014 20:34 UTC (Tue) by ldo (guest, #40946) [Link] (2 responses)

That’s right. You keep insisting that “cleanup” and “rollback” are entirely different, whereas they are really two aspects of the same thing, and can be treated as such.

> That doesn't help with the situation in question, where the memory
> allocation routines never return "failure".

Returning NULL from an allocation request is a failure.

> Sure, you *called* the code unconditionally, but it doesn't change
> the fact that it *executes* conditionally.

Do you know what “abstraction” means?

Re: You still are in the "cleanup" mentality.

Posted Dec 30, 2014 22:33 UTC (Tue) by cesarb (subscriber, #6266) [Link] (1 responses)

> > That doesn't help with the situation in question, where the memory allocation routines never return "failure".
> Returning NULL from an allocation request is a failure.

Please, reread the article this comment thread is attached to.

The whole issue is that, under certain conditions, the allocation requests were *never* returning NULL, even when they should!

> > Sure, you *called* the code unconditionally, but it doesn't change the fact that it *executes* conditionally.
> Do you know what “abstraction” means?

Please, reread the article this comment thread is attached to.

Abstraction or not, it doesn't change the fact that, since the allocation requests were *never* returning NULL, even when they should, the error handling code was *never* being executed. It doesn't matter whether it has been abstracted away or not, untested code is untested code.

Re: requests were never returning NULL, even when they should!

Posted Dec 31, 2014 21:00 UTC (Wed) by ldo (guest, #40946) [Link]

Precisely the point. And changing them to return NULL is fraught, because the error-handling paths in the callers are quite likely riddled with omissions where they should be dealing with this case. And finding and fixing those omissions is hard, because the error-handling paths are so complex.

Which is why I am advocating simpler error-handling paths, as in my example.

The Story So Far...

The Story So Far...

Re: The problem is with cleanups which should *never* be executed, unless an error happens.

Re: The problem is with cleanups which should *never* be executed, unless an error happens.

Re: You still are in the "cleanup" mentality.

Re: You still are in the "cleanup" mentality.

Re: requests were *never* returning NULL, even when they should!

Re: The problem is with cleanups which should never be executed, unless an error happens.

Re: The problem is with cleanups which should never be executed, unless an error happens.

Re: requests were never returning NULL, even when they should!