Re: Finally, unuse_pte does the real work, and it's where an error can happen.

Posted Dec 31, 2014 22:56 UTC (Wed) by cesarb (subscriber, #6266)
In reply to: Re: Finally, unuse_pte does the real work, and it's where an error can happen. by ldo
Parent article: The "too small to fail" memory-allocation rule

> Speaking of which, I notice there is no error checking on this call to pte_offset_map_lock. Can that never fail?

Looking at how it's implemented, that call does two things: it temporarily maps the page table, in a way which won't fail (in some architectures, it's a simple arithmetic operation, and in others, it uses a mechanism which has a number of slots reserved for temporary mappings), and it locks a spinlock. If the spinlock is unlocked, it can't fail; if the spinlock is locked, it will wait until its current owner unlocks it, so again it can't fail.

> And what happens if unuse_pte returns an error, anyway? Do the outer routines abort, and leave their work half-done?

The answer here is yes!

This code is ultimately called from within the swapoff system call (further down in the same file). There's in fact another outermost loop, try_to_unuse, which loops over the swapfile's pages and tries to unuse (yeah...) each one in turn. Here is where it's called:

1872 set_current_oom_origin();
1873 err = try_to_unuse(p->type, false, 0); /* force unuse all pages */
1874 clear_current_oom_origin();
1875
1876 if (err) {
1877 /* re-insert swap space back into swap_list */
1878 reinsert_swap_info(p);
1879 goto out_dput;
1880 }

Just before this fragment of code, the swapfile (or swap partition) is marked as disabled on the list of swapfiles. The try_to_unuse function then tries to move all the pages which currently reside into that swapfile back into the main memory, and make all page tables which pointed to these pages on the swap point to them in main memory, so the swapfile can be safely removed.

If try_to_unuse fails (usually because there's not enough memory to hold what's currently on the swapfile to be removed), this code enables the swapfile again (this part of the code used to be almost a duplicate of the corresponding part of the swapon code; I refactored it into a separate function used by both). It doesn't try to swap out again the pages it swapped in; if there's a need to free some of the main memory, the normal memory management code will swap them out again.

If try_to_unuse succeeds, on the other hand, the swapfile is now empty; the code after the fragment of code I pasted above releases all the resources which were allocated by the swapon system call for this swapfile, and returns success to the userspace.

Re:The answer here is yes!

Posted Jan 1, 2015 21:44 UTC (Thu) by ldo (guest, #40946) [Link] (7 responses)

In that case, this code is not very interesting. The interesting case would be the construction of a complex data structure, piece by piece, where each individual piece construction could fail. If any failures occur, then all the partially-constructed pieces so far need to be freed before returning an error indication to the caller. Only if all the construction steps succeed can the complete object be returned to the caller.

In my opinion, this would be just about the ultimate stress test of your error-recovery technique.

Search through my example for the comment “so I don't dispose of it yet” to see how I deal with this. You should find three instances.

Re:The answer here is yes!

Posted Jan 1, 2015 22:51 UTC (Thu) by reubenhwk (guest, #75803) [Link]

It seems like the vars around /* so I don't dispose of it yet */ are used as local variable where you're building/using something, and at a point (where the assignment of result is) is where you're ready to call this data structure complete and ready to return. I do something similar. When I can I try to ensure that a function either completely fails or completely succeeds with no in-between.

Re:The answer here is yes!

Posted Jan 1, 2015 23:12 UTC (Thu) by nix (subscriber, #2304) [Link]

There have been multiple worked examples of exactly this shown to you already.

Re:The answer here is yes!

Posted Jan 2, 2015 1:56 UTC (Fri) by cesarb (subscriber, #6266) [Link]

> In that case, this code is not very interesting. The interesting case would be the construction of a complex data structure, piece by piece, where each individual piece construction could fail.

Well, swapoff is the destruction of a complex data structure, not its construction. Its construction is the swapon system call, further down in the same file.

The same design pattern can be found all over the Linux kernel: there's a construction function, which constructs whatever complex data structure the module needs, and a destruction function, which releases it.

> If any failures occur, then all the partially-constructed pieces so far need to be freed before returning an error indication to the caller. Only if all the construction steps succeed can the complete object be returned to the caller.

In the swapon system call, there's a single error handling label "bad_swap" which frees all the partially-constructed data structures, undoes block device configuration, closes the opened file, and so on. It falls through to the "out" label, used for both the success and failure cases, which releases any temporarily used resources.

> Search through my example for the comment “so I don't dispose of it yet” to see how I deal with this. You should find three instances.

I see. You have two variables for the return value of the function: one which has its reference counter decremented at the end, and one which is actually returned from the function. You keep the allocated structure at the first one, and when everything's ready, you swap it with the second one. So in the failure case, the reference counter is decremented and it returns NULL; in the success case, the reference counter is not decremented and it returns the pointer to the structure.

It's an elegant way of doing it (though I'm annoyed at your inconsistency: twice you called it "result" and once you called it "Result"). It's also orthogonal to the use of goto versus pseudo-loop for cleanup: this trick can help simplify the code in both cases.

What you did is actually a design pattern in modern C++:

std::unique_ptr<foo> fn(/*args*/)
{
auto ret = std::make_unique<foo>(/*args*/);
/* ...code which can throw an exception... */
return ret;
}

Here, "ret" is a std::unique_ptr<foo>, which contains a pointer to a "foo". When this variable gets out of scope, which will happen if an exception is thrown, whatever is pointed to by "ret" will be deleted. When it reaches the "return ret", however, the pointer is moved (as in std::move) to the return value of the function, so when it leaves the scope, "ret" is pointing to NULL, and the returned object isn't deleted.

Re:The answer here is yes!

Posted May 23, 2017 13:21 UTC (Tue) by mirabilos (subscriber, #84359) [Link] (3 responses)

What, you don’t initialise them to NUL and just free them afterwards?

struct foo *foo = calloc(1, sizeof(foo));

if (!(foo->a = geta()))
goto out;
if (!(foo->b = getb()))
goto out;
if (!(foo->c = getc()))
goto out;
if (!(foo->d = getd()))
goto out;
return (foo);

out:
free(foo->d);
free(foo->c);
free(foo->b);
free(foo->a);
return (NULL); /* error */

Re:The answer here is yes!

Posted May 23, 2017 14:08 UTC (Tue) by anselm (subscriber, #2796) [Link] (2 responses)

This sort of thing may work most of the time, but just for the record, while calloc() fills the whole structure in question with all-bits-zero bytes, there is no guarantee in the C standard that an individual structure entry like foo->a will in fact turn out to be a valid null pointer afterwards. (The C language does not require the null pointer to be “all bits zero”, even though expressions like “!(foo->a = geta())” must still return 1, as in “true”, if the geta() call yields a null pointer.)

If you're unlucky this means that if, say, you error out when trying to allocate foo->b, the “free(foo->d);” at the beginning of the out: path might try to free something at the all-bits-zero-address-that-isn't-a-null-pointer that hasn't previously be allocated, which isn't allowed. This shortcut looks enticingly convenient and probably works on most platforms today but people who are interested in safe, portable, and standard-conforming C code shouldn't be using it.

Re:The answer here is yes!

Posted May 26, 2017 9:55 UTC (Fri) by mirabilos (subscriber, #84359) [Link] (1 responses)

Sure, but that can be implementation-defined, and POSIX does do this (in the next version):

http://austingroupbugs.net/view.php?id=940#c2696

Re:The answer here is yes!

Posted May 26, 2017 23:06 UTC (Fri) by nix (subscriber, #2304) [Link]

Note the freedom which still exists: there can be *other* bit patterns that also represent the null pointer. :)