The author seems to be expressing opinions in a vacuum without actually seeing how Rust and RAII are used in my GPU driver, and how kernels manage memory in general (and Linux in particular).
Oh no, this is how you get software that take forever to close for no reason!
The deallocations per GPU job are O(1). There’s a fixed maximum number of objects that are involved (the actual number is variable depending on what features were enabled, the specific GPU model involved since some require more allocations under some conditions, etc.). There is no chance of RAII cleanup spiraling out of control.
(The deallocations are O(n) when shutting down a whole process that uses the GPU, but there’s no way around that, since you need to do things like unmap and clean up every GPU buffer object owned by the process, and this is no different regardless of what programming language you use. You can’t use arenas to solve this.)
More generally, RAII is a feature that exists in tension with the approach of operating on items in batches, which is an essential technique when writing performance-oriented software.
Kernels don’t operate in batches; almost everything in a kernel operates on single units. Sure, there are various performance optimizations around grouping operations together, but you will not find arena-based memory management in kernels. Go ahead, grep for arena in the Linux kernel. There are none; the only significant hits are nvdimm (where an “arena” is just an nvdimm concept for a region of memory) and BPF (where an “arena” is a shared memory construct between a BPF program and userspace).
The arena concept is great for things like games which have to do a ton of memory management every frame, which can just be blown away at the end of the frame. But it simply doesn’t work for, and isn’t used in, Linux, where there is no such concept as a “frame” that allows batching memory allocations like that. Kernels always do fine-grained resource management on behalf of userspace, there’s just no other way it can work.
Sure, the GPU kernel driver could batch actual GPU memory allocations together to reduce alloc/dealloc overhead, and that’s something I’m considering for a future performance optimization if profiling reveals it would be worth it, but it would not be in conflict with RAII. It would just mean that I compute the required allocation size for different kinds of GPU memory ahead of time for a job, allocate as one big block, and then hand out reference-counted suballocations to it (so RAII still works and the whole block is only cleaned up when all the sub-allocs are freed). It’s not clear a priori whether this would help that much, it depends on the performance of the allocator (which is very simple and based on drm_mm, and doesn’t do much besides assign and free ranges in GPU address space, in an already allocated/populated heap most of the time, so again I need to profile this before deciding this is at all worth it).
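To make that concrete, here is a rough C++ sketch of the shape of the idea (this is not the driver’s actual Rust code; Block and JobAllocator are made-up names, and there is no bounds or alignment handling): one block allocated up front, reference-counted views handed out, and the block freed only when the last view is dropped.
#include <cstddef>
#include <memory>
#include <vector>

// One backing block allocated up front; sub-allocations are reference-counted
// views into it, so the block is freed only when the last view is dropped.
struct Block {
    explicit Block(std::size_t size) : storage(size) {}
    std::vector<std::byte> storage;
};

class JobAllocator {
public:
    explicit JobAllocator(std::size_t total)
        : block_(std::make_shared<Block>(total)) {}

    // O(1) "allocation": bump an offset and return a view that shares
    // ownership of the whole block (shared_ptr's aliasing constructor).
    std::shared_ptr<std::byte> alloc(std::size_t size) {
        std::byte *p = block_->storage.data() + offset_;
        offset_ += size;
        return std::shared_ptr<std::byte>(block_, p);
    }

private:
    std::shared_ptr<Block> block_;
    std::size_t offset_ = 0;
};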
What you will find in C drivers is coarse lifetime management, where a fixed number of objects are allocated as one big struct, freed all at once. That isn’t coarser RAII though, the sub-objects (which half the time are or own pointers to their own allocations anyway, this is emphatically not a memory arena) still need constructors and destructors called where required, and often their lifetime is not fully tied 1:1 to the lifetime of the parent struct since some members might only be initialized when needed on-demand. There is no difference with Rust in the generated code for this model, and no performance difference (in particular, in kernel Rust we have a whole pin_init mechanism which allows initialization of objects to happen in-place exactly as happens in this C model). The main difference is that since in the C approach initialization and cleanup is not strictly delegated to the individual objects, and not handled automatically by the compiler, there are many ways this can go subtly wrong with uncommon orders of operations and error paths, causing lots of problems leading to memory unsafety, and all those just vanish in Rust because the compiler makes sure everything is initialized before use and cleaned up before being freed.
The author states:
I personally hope that the Linux kernel never adopts any RAII, as I already have to waste way too much time for other slow software to load.
But Linux is already chock full of RAII (and objects for that matter). It’s just RAII (and objects, and vtables) implemented manually in C, with no compiler guarantees that all the edge cases are handled properly. In fact, often the places where Linux offers alternatives to traditional open-coded C RAII (such as devm) exist not because RAII is bad or slow, but because proper RAII is so easy to get wrong in C that it’s safer to use coarser approaches (and those mechanisms are, therefore, not necessary and not used in kernel Rust).
Let me just say up front that I’m not arguing for either approach here…
It would just mean that I compute the required allocation size for different kinds of GPU memory ahead of time for a job, allocate as one big block, and then hand out reference-counted suballocations to it (so RAII still works and the whole block is only cleaned up when all the sub-allocs are freed). It’s not clear a priori whether this would help that much, it depends on the performance of the allocator (which is very simple and based on drm_mm, and doesn’t do much besides assign and free ranges in GPU address space, in an already allocated/populated heap most of the time, so again I need to profile this before deciding this is at all worth it).
It sounds like you are saying here that you would make a larger allocation and then also do per-sub-allocation accounting. That seems to assume that the cost of granular RAII is unavoidable but the linked article seems to specifically be asserting the opposite. It argued for making one larger allocation and specifically not doing per-sub-allocation accounting:
And it doesn’t end here: operating in batches by using memory arenas, for example, is also a way to reduce memory ownership complexity, since you are turning orchestration of N lifetimes into 1.
Profiling the management of N+1 lifetimes rather than N lifetimes (even if each N in N+1 is lighter weight) seems to be an ill substitute for profiling the management of 1 lifetime. Are you and the author talking past each other or am I missing something in one (or both) of your arguments?
That seems to assume that the cost of granular RAII is unavoidable but the linked article seems to specifically be asserting the opposite. It argued for making one larger allocation and specifically not doing per-sub-allocation accounting:
That could work, but if you don’t do per-sub-allocation accounting then everything has to fit within Rust’s lifetime/borrowing model, and if it doesn’t, you have to fall back to unsafe code (which is, of course, the Zig approach, since Zig doesn’t do safety). That is a major decision that would only be made if performance really warrants it, which is very unlikely for my use case, because…
Are you and the author talking past each other or am I missing something in one (or both) of your arguments?
The reason why you would group allocations is to reduce the cost of the allocations/deallocations themselves. Managing the lifetimes themselves is just trivial reference counting. What I’m describing (suballocation) is likely to perform better than an arena allocator here. An arena allocator does O(n) allocations and O(1) deallocations. My approach would do O(1) allocation, O(1) deallocation, and O(n) refcount maintenance operations. If the allocation/deallocation is significantly expensive, this would be a clear win, and if it isn’t (relative to just some refcounting), then it’s highly unlikely there are significant wins to be realized here at all by micro-optimizing the resource allocations, because other things are the performance bottleneck.
Alloc/dealloc micro-optimization with arenas matters in things like games which might allocate thousands of objects per frame. It doesn’t for a GPU driver that needs to allocate a dozen state objects per submission, when it also has to fill in kilobytes of data structures and communicate with the firmware and parse and validate commands from userspace and all sorts of other stuff. If the allocs get batched together and reduced to some increfs/decrefs to keep the resource management reasonable, then it’s pretty much guaranteed there are no more gains to be had there.
Incidentally, our userspace graphics driver does use arenas (it calls them pools), because at that layer you do get potentially thousands of draws for a single render pass, and there it’s clearly the right thing to do for things like allocating GPU command blocks, parameter blocks, etc. Those thousands of draws get passed to the kernel as one render command, so by the time the kernel is involved that n factor is gone.
It’s also a lot less dangerous because, at worst, if the driver gets the pool lifetime wrong the GPU faults and fails the command. If the kernel gets an object lifetime wrong then the GPU firmware crashes and cannot be recovered without a whole system reboot, so the stakes are a lot higher. I need to have a very good reason to play fast and loose with any GPU object lifetimes in the kernel.
Thanks for taking the time to explain that! I see now why you’re a lot less concerned about the N figure, given the actual values of N. It seems like to achieve the gains proposed by the linked article you would need to share memory across a much larger execution boundary and invite a lot of concurrency and safety issues into the picture (which I agree would not be worth the tradeoff).
Wouldn’t mm folios be an example of operating on things in batches?
Not really, that’s basically just “variable-sized pages”. It’s basically just turning what was previously fixed-sized page allocation and management into variable-sized folio management. That’s quite similar to any traditional heap allocator that can handle different object sizes.
Of course the kernel does operate on variable-sized things (such as disk I/O in multiples of the block size instead of single blocks, or handling TCP or UDP packets in large sizes well above the physical MTU to hand off to hardware segmentation offload), but when people talk about “batches” in the context of arena allocators and RAII they’re usually talking about doing heterogeneous allocations and then freeing everything as a batch, not about just handling variable-sized data or grouping multiple data blocks together upfront for performance reasons.
More generally, RAII is a feature that exists in tension with the approach of operating on items in batches, which is an essential technique when writing performance-oriented software.
This is a falsehood some people in the intersection of the data oriented design and C crowd love to sell. RAII works fine with batches; it’s just that the RAII object is the batch instead of the elements inside. Even if the individual elements have destructors, if you have an alternative implementation for batches, C++ has all the tools to avoid the automatic destructor calls: just placement-new into a char buffer and then you can run whatever batch logic instead. Don’t try bringing this up with any of them though or you’ll get an instant block.
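A minimal sketch of that, under the usual caveats (invented names, no capacity checks): the RAII object is the batch, it placement-news elements into one buffer, and its destructor is free to run per-element or batched cleanup before releasing the buffer in one go.
#include <cstddef>
#include <new>
#include <utility>

template <typename T>
class Batch {
public:
    explicit Batch(std::size_t capacity)
        : buf_(static_cast<T *>(::operator new(capacity * sizeof(T),
                                               std::align_val_t(alignof(T))))) {}

    template <typename... Args>
    T &emplace(Args &&...args) {
        // Construct in place at the next free slot; no per-element allocation.
        return *new (buf_ + count_++) T(std::forward<Args>(args)...);
    }

    ~Batch() {
        // This is where batched cleanup logic would go; the default here just
        // runs the destructors in a loop and frees the storage once.
        for (std::size_t i = count_; i-- > 0;)
            buf_[i].~T();
        ::operator delete(buf_, std::align_val_t(alignof(T)));
    }

    Batch(const Batch &) = delete;
    Batch &operator=(const Batch &) = delete;

private:
    T *buf_;
    std::size_t count_ = 0;
};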
I do highly performance sensitive latency work with tighter deadlines than gamedev and still use destructors for all the long lived object cleanup. For objects that are churned aggressively avoiding destructor calls isn’t any different than avoiding any other method call.
Agreed, this post is making some wild claims that don’t hold up in my experience. I’m writing a high-performance compiler in Rust, and most state exists as plain-old datatypes in re-usable memory arenas that are freed at the end of execution. RAII is not involved in the hot phase of the compiler. Neither are any smart pointers or linked lists.
I simply find the argument unconvincing. Visual Studio has performance problems related to destructors => RAII causes slow software?
Agreed. I like Zig and appreciate Loris’ work, but I don’t understand this argument as well.
“Exists in tension” seems accurate to me. Yes, you can do batches with RAII, but in practice RAII languages lead to ecosystems and conventions that make it difficult. The majority of Rust crates use standard library containers and provide no fine grained control over their allocation. You could imagine a Rust where allocators were always passed around, but RAII would still constrain things because batching to change deallocation patterns would require changing types. I think the flexibility (and pitfalls) of Zig’s comptime duck typing vs. Rust traits is sort of analogous to the situation with no RAII vs. RAII.
I think it’s the case that library interfaces tend not to hand control of allocations to the caller but I think that’s because there’s almost never pressure to do so. When I’ve wanted this I’ve just forked or submitted patches to allow me to do so and it’s been pretty trivial.
Similarly, most libraries that use a HashMap do not expose a way to pick the hash algorithm. This is a bummer because I expect the use of siphash to cause way more performance problems than deallocations. And so I just submit PRs.
Yes. I write Zig every day, and yet it feels like a big miss, and, idk, populist? “But don’t just take my word for it.” Feels like too much trying to do ‘convincing’ as opposed to elucidating something neat. (But I guess this is kind of the entire sphere it’s written in; what does the “Rust/Linux Drama” need? Clearly, another contender!)
To be fair, invalidating this specific argument against RAII does not invalidate the entire post.
It doesn’t, but without it I don’t really see the post offering anything other than contention for the sake of marketing.
You write Zig every day? What kind of program are you working on?
I spend somewhere between 2 to 8 hours a day working on my own projects. (“2” on days I also do paid work, but that’s only two days a week.) Zig has been my language of choice for four or five years now; you can see a list on my GitHub profile. A lot of my recent work with it is private.
Impressive commitment to Zig! Thanks for sharing.
Thank you! I really like it, and I’m a little sad that Rust — which I still use often, maintain FOSS software in, and advocate for happily! — has narrowed the conversation around lower-level general-purpose programming languages in a direction where many now reject out of hand anything without language-enforced memory safety. It’s a really nice thing to have, and Rust is often a great choice, but I don’t love how dogmatic the discourse can be at the expense of new ideas and ways of thinking.
I very much agree. A Zig program written in a data-oriented programming style, where most objects are referenced using indices into large arrays (potentially associated to a generation number) should be mostly memory safe. But I haven’t written enough Zig to confirm this intuition.
I don’t remember the arguments against RAII much (has been a few years since) but that Zig doesn’t have RAII feels like an odd omission given the rest of the language design. It’s somewhat puzzling to me.
Hm, it’s pretty clear RAII goes against the design of Zig. It could be argued that it’d be a good tradeoff still, but it definitely goes against the grain.
Zig requires a keyword for control flow. RAII would be a single instance where control jumps to a user-defined function without this being spelled out explicitly.
Zig doesn’t have operator overloading, and, more generally, it doesn’t have any sort of customization points for type behavior. “The compiler automatically calls a __deinit__ function if available” would be the sole place where that sort of thing would be happening.
Idiomatic Zig doesn’t use a global allocator, nor does it store per-collection allocators. Instead, allocators are passed down to specific methods that need them as an argument. So most deinits in Zig take at least one argument, and that doesn’t work with RAII.
defer fits Zig very well, RAII not at all.
I was unaware that Zig discourages holding on to the allocator. I did not spend enough time with Zig but for instance if you have an ArrayList you can defer .deinit() and it will work just fine. So I was assuming that this pattern:
var list = ArrayList(i32).init(heap_allocator);
defer list.deinit();
Could be turned into something more implicit like
var list = @scoped(ArrayList(i32).init(heap_allocator));
I understand that “hidden control flow” is something that zig advertises itself against, but at the end of the day defer is already something that makes this slightly harder to understand. I do understand that this is something that the language opted against but it still feels odd to me that no real attempt was made (seemingly?) to avoid defer.
But it very much sounds like this pattern is on the way out anyways.
Zig’s std.HashMap family stores a per-collection allocator inside the struct that is passed in exactly once through the init method. Idk how that can be considered non-idiomatic if it’s part of the standard library.
It is getting removed! https://github.com/ziglang/zig/pull/22087
Zig is a pre-1.0 language. Code in stdlib is not necessarily idiomatic, both because there’s still idiom churn, and because it was not uniformly audited for code quality.
As someone who doesn’t use Zig or follow it closely, both the fact that that change is being made and the reason behind it are really interesting. Thanks for sharing it here
You might also like https://matklad.github.io/2020/12/28/csdi.html then, as a generalization of what’s happening with Zig collections.
That’s an interesting development. Thanks for informing me!
Even if the individual elements have destructors, if you have an alternative implementation for batches C++ has all the tools to avoid the automatic destructor calls, just placement new into a char buffer and then you can run whatever batch logic instead.
I’ve never used placement new, so I don’t know about that, so my question is, how do you do that? Take for instance a simple case where I need a destructor:
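Something along these lines, say (Foo, Bar and Baz being stand-ins for whatever the element owns):
#include <memory>

struct Foo {};
struct Bar {};
struct Baz {};

class Element
{
    std::unique_ptr<Foo> foo;
    std::unique_ptr<Bar> bar;
    std::unique_ptr<Baz> baz;
    // The (implicit) destructor deletes all three: three separate deallocations.
};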
If I have a bunch of elements that are both constructed at the same time, then later destroyed at the same time, I can imagine having a dedicated Element_list class for this, but never having used placement new, I don’t know right now how I would batch the allocations and deallocations.
And what if my elements are constructed at different times, but then later destroyed at the same time? How could we make that work?
Don’t try bringing this up with any of them though or you’ll get an instant block.
I think I have an idea about their perspective. I’ve never done Rust, but I do have about 15 years of C++ experience. Not once in my career have I seen a placement new. Not in my own code, not in my colleagues’ code, not in any code I have ever looked at. I know it’s a thing when someone mentions it, but that’s about it. As far as I am concerned it’s just one of the many obscure corners of C++. Now imagine you go to someone like me, and tell them to “just placement new” like it’s a beginner technique everyone ought to have learned in their first year of C++.
I don’t expect this to go down very well, especially if you start calling out skill issues explicitly.
I’ve never done Rust, but I do have about 15 years of C++ experience. Not once in my career have I seen a placement new. Not in my own code, not in my colleagues’ code, not in any code I have ever looked at.
I’m a little bit surprised, because I’ve had the opposite experience. Systems programming in C++ uses placement new all of the time, because it’s the way that you integrate with custom allocators.
In C++, there are four steps to creating and destroying an object:
1. Allocate some memory for it.
2. Construct the object.
3. Destruct the object.
4. Deallocate the memory.
When you use the default new and delete operators, you’re doing two of these at a time: new first calls the global operator new, which returns a pointer to some memory (or throws an exception if allocation fails), and then calls the constructor; delete calls the destructor and then deallocates the memory. Both new and delete are simply operators that can be overloaded, so you can provide your own, either globally, globally for some overload, or per class.
Placement new has weird syntax, but is conceptually simple. When you do new SomeClass(...), you’re actually writing new ({arguments to new}) SomeClass({arguments to SomeClass's constructor}). You can overload new based on the types of the arguments passed to it. Placement new is a special variant that takes a void* and doesn’t do anything (it’s the identity function). When you do new (somePointer) SomeClass(Args...), where somePointer is an existing allocation, the placement new simply returns somePointer. It’s up to you to ensure that you have space here.
If you want to allocate memory with malloc in C++ and construct an object in it, you’d write something like this (not exactly like this, because this will leak memory if the constructor throws):
#include <cstdlib>  // malloc
#include <new>      // placement new
#include <utility>  // std::forward

template<typename T, typename... Args>
T *create(Args&&... args)
{
    void *memory = malloc(sizeof(T));                     // allocate
    return new (memory) T(std::forward<Args>(args)...);   // construct in place
}
This separates the allocation and construction: you’re calling malloc to allocate the object and then calling placement new to call the constructor and change the type of the underlying memory to T.
Similarly, you can separate the destruction and deallocation like this (same exception-safety warning applies):
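A minimal sketch of that counterpart to create():
template<typename T>
void destroy(T *object)
{
    object->~T();   // run the destructor explicitly
    free(object);   // then give the memory back separately (free() from <cstdlib>)
}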
In your example, std::unique_ptr has a destructor that calls delete. This may be the global delete, or it may be some delete provided by Foo, Bar, or Baz.
If you’re doing placement new, you can still use std::unique_ptr, but you must pass a custom deleter. This can call the destructor but not reclaim the memory. For example, you could allocate space for all three of the objects in your ‘object’ with a single allocation and use a custom deleter that didn’t free the memory in std::unique_ptr.
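For example, something like this (destroy_only and placed_ptr are made-up names):
#include <memory>

// Deleter that ends the object's lifetime but does not reclaim its memory;
// the block the object lives in is freed elsewhere, once, later.
template<typename T>
struct destroy_only
{
    void operator()(T *object) const { object->~T(); }
};

template<typename T>
using placed_ptr = std::unique_ptr<T, destroy_only<T>>;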
Most of the standard collection types take an allocator as a template argument, which makes it possible to abstract over these things, in theory (in practice, the allocator APIs are not well designed).
LLVM does arena allocation by making some classes’ constructors private and exposing them with factory methods on the object that owns the memory. This does bump allocation and then does placement new. You just ‘leak’ the objects created this way; they’re collected when the parent object is destroyed.
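Very roughly, the shape of that pattern (not LLVM’s actual API; a real arena carves objects out of larger slabs and has capacity checks, which this sketch skips):
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>

class Arena {
public:
    explicit Arena(std::size_t bytes) : buf_(new std::byte[bytes]) {}

    // Factory method: bump-allocate and placement-new. Objects made this way
    // are simply "leaked"; their storage is released when the Arena dies.
    template <typename T, typename... Args>
    T *create(Args &&...args) {
        static_assert(std::is_trivially_destructible_v<T>,
                      "destructors are never run for arena-created objects");
        used_ = (used_ + alignof(T) - 1) & ~(alignof(T) - 1);  // align bump pointer
        T *obj = new (buf_.get() + used_) T(std::forward<Args>(args)...);
        used_ += sizeof(T);
        return obj;
    }

private:
    std::unique_ptr<std::byte[]> buf_;
    std::size_t used_ = 0;
};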
I’ve done very little systems programming in C++. Almost all the C++ code I have worked with was application code, and even the “system” portion hardly did any system call. Also, most C++ programs I’ve worked with would have been better of using a garbage collected language, but that wasn’t my choice.
This may explain the differences in our experiences.
Yup, that’s a very different experience. Most C++ application code I’ve seen would be better in Java, Objective-C, C#, or one of a dozen other languages. It’s a good systems language, it’s a mediocre application language.
For use in a kernel, or writing a memory allocator, GC, or language runtime, C++ is pretty nice. It’s far better than C and I think the tradeoff relative to Rust is complicated. For writing applications, it’s just about usable but very rarely the best choice. Most of the time I use C++ in userspace, I use it because Sol3 lets me easily expose things to Lua.
I think it very much also depends on the subset of C++ you’re working with. At a former job I worked on a server application that might have worked in Java with some pains (it interfaced with C libs quite a bit), and in (2020?) or later it should probably have been done in Rust, but it just slightly predated Rust gaining… traction, or its 1.0 release. It was (or still is, probably) written in the most high-level, Java-like C++ I’ve ever seen due to extensive use of Qt and smart pointers. I’m not saying we never had segfaults or memory problems, but not nearly as many as I would have expected.
But yeah, I think I’ve never even heard about this placement new thing (reading up now), but I’m also not calling myself a C++ programmer.
Placement new is half the story, you also need to be aware that you can invoke destructors explicitly.
A trivial example looks like
alignas(foo) char foo_storage[sizeof(foo)];   // raw, suitably aligned storage
foo *obj = new (&foo_storage[0]) foo();       // construct in place
obj->do_stuff();
obj->~foo(); // explicitly invoke the destructor
If you want to defer the construction of multiple foos but have a single allocation you can imagine char foos_storage[sizeof(foo)*10] and looping to call the destructors. Of course you can heap allocate the storage too.
However, you mostly don’t do this, because if you’re looking for something that keeps a list of elements and uses placement new to batch allocation/deallocation, that’s just std::vector<element>.
Likewise, if I wanted to batch the allocation of Foo, Bar and Baz in Element I probably would just make them normal members.
class Element
{
    Foo foo;
    Bar bar;
    Baz baz;
};
Each element and its members is now a single allocation and you can stick a bunch of them in a vector for more batching.
If you want to defer the initialization of the members but not the allocation you can use std::optional to not need to deal with the nitty gritty of placement new and explicitly calling the destructor.
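E.g. (Connection is just a stand-in for a member you want to construct later):
#include <optional>

struct Connection {
    explicit Connection(int fd) : fd(fd) {}
    int fd;
};

struct Element {
    std::optional<Connection> conn;  // storage is inline, nothing constructed yet

    void open(int fd) { conn.emplace(fd); }  // construct in place, on demand
    void close() { conn.reset(); }           // destroy early if needed
};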
IME placement new comes up implementing containers and basically not much otherwise.
Note that since C++20+ you should rather use std::construct_at and std::destroy_at since these don’t require spelling the type and can be used inside constexpr contexts.
You likely use placement new every day indirectly without realizing it, it’s used by std::vector and other container implementations.
When you write new T(arg) two things happen, the memory is allocated and the constructor runs. All placement new does is let you skip the memory allocation and instead run the constructor on memory you provide. The syntax is a little weird new(pointer) T(arg). But that’s it! That will create a T at the address stored in pointer, and it will return a T* pointing to the same address (but it will be a T* whereas pointer was probably void* or char*). Without this technique, you can’t implement std::vector, because you need to be able to allocate room for an array of T without constructing the T right away since there’s a difference between size and capacity. Later to destroy the item you do the reverse, you call the destructor manually foo->~T(), then deallocate the memory. When you clear a vector it runs the destructors one by one but then gives the memory back all at once with a single free/delete. If you had a type that you wanted to be able to do a sort of batch destruction on (maybe the destructor does some work that you can SIMD’ify), you’d need to make your own function and call it with the array instead of the individual destructors, then free the memory as normal.
I’m not trying to call anybody out for having a skill issue, but I am calling out people who are saying it’s necessary to abandon the language to deal with one pattern without actually knowing what facilities the language provides.
There are different ways you could do it but one way would be to have a template that you specialize for arrays of T, where the default implementation does one by one destruction and the specialization does the batch version. You could also override regular operator delete to not have an implementation to force people to remember to use a special function.
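A sketch of that (Particle stands in for a type whose cleanup you want to batch):
#include <cstddef>

// Default: destroy an array of T one element at a time.
template <typename T>
void destroy_all(T *elems, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        elems[i].~T();
}

struct Particle { float x, y, z; };

// Specialization: batch version, e.g. if per-element destructor calls are pure
// overhead or the cleanup can be vectorised. Storage is still freed elsewhere.
template <>
void destroy_all<Particle>(Particle *, std::size_t)
{
    // nothing to do per element in this example
}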
More generally, RAII is a feature that exists in tension with the approach of operating on items in batches, which is an essential technique when writing performance-oriented software.
And it doesn’t end here: operating in batches by using memory arenas, for example, is also a way to reduce memory ownership complexity, since you are turning orchestration of N lifetimes into 1.
There’s absolutely nothing in a RAII design that prevents use of pools or whatever other allocation technique fits the domain - if anything, it definitely makes it easier to try various allocation patterns & systems by moving them from point-of-use to type definition.
Also, more importantly, there really is no tension here. If you want things to be cleaned up in batches you can do so.
Even more importantly, even if you use RAII nothing stops you from leaking large allocations at the end and not cleaning them up if you want a fast shutdown.
The singular focus on RAII seems to be a distraction. This article is sort of pulling a substitution trick. Linux rejected C++ (true). C++ relies heavily on RAII (true). Therefore Linux rejected RAII (not quite true). Linus did say that languages should not hide memory allocations. However, that was only one of many points against the language. He also ranted (more commonly and more voluminously) against exceptions, excessive abstraction, poor quality abstractions, and a lack of stability and portability. That is just on the negative side. He also said that C++ lacked any compelling features above and beyond C. The same is not true of Rust.
For what it’s worth, kernel Rust does not hide memory allocations either. Every single allocation has always been fallible in the kernel dialect (unlike in userspace Rust), and as of rust-next this cycle there are separate Rust primitives for each kernel allocator variant and full support for Linux kernel gfp_flags. For example, a kernel-rust KVec uses kmalloc as the allocator, and push looks like fn push(v: T, flags: Flags) -> Result<(), AllocError> where flags is the gfp_flags.
This is, of course, also possible in C++ (as long as you don’t use the STL, just like Rust for Linux doesn’t use std), but I’m guessing nobody really pushed this approach when C++ was being considered for the kernel. And of course, Rust still has more compelling benefits over C than C++ (like safety).
Oh no, this is how you get software that take forever to close for no reason!
OK, but this is prefaced by
Making sure to free tiny things makes perfect sense when your program is going to use a ton of memory, or is going to be super long-lived.
And a kernel is (almost by definition) long-lived. So we do actually need to track all the memory (and other GPU objects) in order to free it. And maybe arena allocation is a nice tool for that. But I don’t think it does much for GPU resources.
Regarding the other comments from Asahi:
Macros
I believe the idea is to use comptime instead. Which I understand to be more powerful than C’s macros.
Lifetimes
Perhaps this was supposed to be “lacks a borrow checker”? Lifetimes are rather superfluous without one.
Operator overloading
This is a feature. In zig, much like C, WYSIWYG (in terms of programmer-visible semantics).
I don’t think zig would be a good fit for the kernel
It’s not stable (no 1.0 release yet) and the latest release depends on new versions of LLVM (that aren’t available in LTS distros). This is a problem for rust, but an even bigger problem for zig.
It is very opinionated about a lot of things. Rust has basically one big opinion: “the borrow checker is great and you should let it manage all of your memory.” So if you accept that, you can write your rust code however you want. But zig has a ton of opinions, even about little stuff like indentation or unused variables. I think it would probably generate just as much controversy as rust.
FWIW I did have some Zig folks come up with implementations of what I do with Rust procmacros using comptime, although it did take a few attempts along several discussion threads (because my use case is very particular). So Zig and Rust are indeed both suitable to solve that particular problem (in very different ways, each with some pros and cons, and both quite frankly hacky in their own way because the problem I’m solving is just weird). The other points still stand though.
More generally, RAII is a feature that exists in tension with the approach of operating on items in batches,
I’ve seen the “RAII leads to slow destruction” thing a lot, probably starting with comments about Jai. This just isn’t true. Or, at least, it’s radically overblown. I have written plenty of Rust code that puts short-lived allocations into an arena and frees them all (or just clears them, if it’s a pool) plenty of times.
I don’t think RAII is directly in tension; it just works on a per-item basis by default, and if you want to do batch alloc/free you can do so very easily. Rust makes this very easy, in part because of its RAII - like, genuinely, using an arena/bump allocator is so nice in Rust because of RAII.
You use RAII in its “native” form for allocations that make sense, you use arenas if you need arenas, you use manual management if you need manual management, you use Rc if you need it, etc etc etc. Rust is perfectly happy to give you all of this.
The fact that Rust developers who are interfacing with the Linux project seem completely unaware of the downsides of RAII,
We’re not unaware at all, nothing about this is new, bump allocators and arena crates have been a thing since pre-2015.
Similarly, Linux is not “too poor” to afford RAII and it actually chose to keep out the style of programming that both Rust and C++ seem to love.
Linus’s rant about C++ doesn’t mention RAII. It’s hard to interpret this rant because it’s so non-technical, but I think Linus is criticizing the abstractions around OOP - things like inheritance, dynamic dispatch, etc. I don’t think he’s talking about RAII here, but it’s like reading tea leaves with these dumb rants. The irony is that Linux reinvents these in C: dynamic dispatch is literally everywhere, except the pointers are just raw pointers that the compiler can’t reason about for things like CFI or constification.
While I understand having differing opinions about features and ways of structuring code, I’m not surprised if there’s tension between Rust and C developers in the kernel, since the Linux project has already expressed in the past a preference for avoiding those constructs (and the style of development they are designed to support) by banning C++ from the codebase.
Have they? C++ is a lot more specific than RAII and destructors. If destructors were really the reason they banned C++, why did they start experimenting with Rust? Did they just not notice that Rust uses destructors?
Linus objected to C++ way back before the Itanium ABI was standardised, when:
GCC periodically changed the C++ ABI so modules compiled with different compiler versions would not link.
GCC didn’t care at all about C++ and so code generation was pretty bad.
This was long before C++11 made the language tolerable, but as far as I am aware the decision hasn’t really been revisited. The NT and XNU (macOS / iOS) kernels use a load of C++ without issues.
Apple actually did suffer from the ABI instability because they originally shipped gcc 2.95 and IOKit ended up depending on it. As I recall, they carried a load of patches to support it in clang until they dropped support for 32-bit PowerPC (newer architectures used the Itanium ABI). That would have been a problem for Linux (at least for downstream distros) and it was a good decision to avoid it.
Mind you, some Linux folks believe strange things about code generation. I had one tell me that the goto pattern that Linux uses for error handling generated better code than using nested if statements. This surprised me, since I would expect them to be equivalent. And, it turned out, when I refactored some of his code to use structured control flow, gcc 3.x (the latest at the time) generated an identical binary.
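For reference, the two shapes being compared look something like this (alloc_a, alloc_b and free_a_res are stand-ins); a compiler can typically lower both to the same code:
int alloc_a(void);
int alloc_b(void);
void free_a_res(void);

// Kernel-style goto unwinding...
int setup_goto(void)
{
    int err = alloc_a();
    if (err)
        goto out;
    err = alloc_b();
    if (err)
        goto free_a;
    return 0;
free_a:
    free_a_res();
out:
    return err;
}

// ...versus the equivalent nested/structured version.
int setup_nested(void)
{
    int err = alloc_a();
    if (!err) {
        err = alloc_b();
        if (err)
            free_a_res();
    }
    return err;
}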
Linux kernel shouldn’t care about compilers ABI at all, because the model there is the same that, eg, Rust uses — everything is compiled from source, and, if you want to link in a separately compiled blob, you are on your own.
Linux kernel shouldn’t care about compilers ABI at all, because the model there is the same that, eg, Rust uses — everything is compiled from source, and, if you want to link in a separately compiled blob, you are on your own.
@david_chisnall is talking about a very different era :). That was back when compiling your own network card or sound card driver still wasn’t particularly uncommon and lots of things you needed lived out-of-tree. Requiring everything to be compiled from source like that would’ve meant distributions couldn’t have updated the compiler and the kernel separately without then requiring everyone to recompile their own stuff afterwards, and that simply wasn’t practical in the computing landscape of that day. Even as late as 2007 linking in a separately compiled blob was a pretty common requirement, even at the desktop/workstation end. The amount of stuff we have in the mainline kernel today is unbelievable but it wasn’t always like that.
Frequent ABI changes were also one of the “soft” mechanisms by which the other problem was introduced – to put it mildly, g++ emitted really bad code up until the mid-’00s or so, and the further you got from what everyone was using the worse it got. Closer to the x86 end of the spectrum it was just kind of inefficient or weird, but at the (oh god why) SuperH end it was just plain buggy, including truly hilarious things like barfing if identifier names in pragma statements had some specific lengths (multiple of 4, I think?).
Indeed. Compiling your own kernel often took multiple hours. Compiling GCC was an overnight job. You absolutely wanted to be able to compile a kernel module against the distro-provided kernel, but if you needed to version-match the g++ version that the distro used and the one that you used, that would be annoying. Especially given how buggy g++ was: you might need a newer one to compile your module correctly.
Even on x86, g++’s code generation was not great. GCC developers didn’t really care about C++, especially not for performance. LLVM changed that. It was written in C++ and so improvements to optimising C++ code were directly visible to compiler writers. Once LLVM started to be better for C++ than GCC, the GCC folks focused more on C++ and now both are often better for idiomatic C++ than for the same in idiomatic C.
Interesting viewpoint, doesn’t match my experience at all - I think the last kernel I had to compile on a personal laptop or workstation was either the 90s or 2004ish, and the “had to” was already a bit of a stretch.
But I think I never had any “weird” hardware and only a single laptop (centrino, so probably mainline by 2005ish) until 2010, so I guess I was just lucky!
I’m of the opinion that it’s practically impossible for people to write correct code in systems that grow larger than a team of people (~10, probably even less than that) without automated checking. And those checks can happen at many stages of the software development lifecycle (SDLC).
RAII is one tool to help guarantee a dimension of correctness - that memory you allocate is also deallocated. I don’t use Rust, but I do appreciate the goal it frequently has to pull many forms of correctness checking earlier in the SDLC - so you get earlier feedback and can stay more in context for the piece you’re working on. Things like leak detection, fuzzing, and such are important, but they’re increasingly far away from your component’s context.
I struggle with any argument against such correctness enforcement, whether it’s RAII, memory-safety via Rust’s borrow-checker, or data race safety (Swift), that doesn’t include a practical alternative. “Linux kernel developers are really good at writing C” and “we trade correctness for speed of closing apps” don’t strike me as practical alternatives.
Rust actually lets you leak memory safely (though of course it’s more explicit than in C where you leak by just forgetting the free(). To leak in Rust you need to either do it explicitly with a leak or forget, or have a reference cycle). I think the more important points are 1) ensuring that memory is always initialized before it is used, and 2) ensuring that memory is only deallocated after its last use. Missing a dealloc (memory leak) is less bad than use-after-free (UB).
The article is quite weak in my opinion. Even with RAII, it’s possible to construct your objects in a way that they do bulk operations inside and destroy the said bulk data. I guess you could somehow hand-wave it in the direction that RAII promotes non-DOD code which your CPU might not like. In the Linux kernel, many resources are temporary and have to be tracked separately; sadly, using arenas often is not an option. I do like zig though.
Imma gonna hide this, because I never like these threads - I glanced at the current comments and recalled again why I had to hide the “rust” tag, but, a saying I am fond of is:
“Never take down a fence, unless you know what it was there for.”
AKA: You’re probably not the first person to have thought of something, and the people who came before you are probably not as stupid as you think they are.
The middle ground I’d like to see is something like linear types where the compiler just tells you if you ‘forgot’ the drop call, vs. implicitly inserting it (where C++ and Rust do the implicit insertion). There could still be some codegen / derive-macro-like thing to have the default drop function propagate to member fields (but still require the explicit call at the top-level). Also not doing implicit drop on re-assignment and needing an explicit call to overwrite, and so on.
You can kind of try to do undroppable types in Rust right now by eg. const panic’ing or referencing an undefined extern func from the Drop impl, but I found it to be finicky (it seems to generate the drop even if I only have borrow parameters to the type). I ended up exploring a simple linear + borrow checker for C: https://gist.github.com/nikki93/2bb11237bf76fceb0bf687d6d9eef1b3
Currently interested in doing some formal methods with separation logic or such…
The author seems to be expressing opinions in a vacuum without actually seeing how Rust and RAII are used in my GPU driver, and how kernels manage memory in general (and Linux in particular).
The deallocations per GPU job are O(1). There’s a fixed maximum number of objects that are involved (the actual number is variable depending on what features were enabled, the specific GPU model involved since some require more allocations under some conditions, etc.). There is no chance of RAII cleanup spiraling out of control.
(The deallocations are O(n) when shutting down a whole process that uses the GPU, but there’s no way around that, since you need to do things like unmap and clean up every GPU buffer object owned by the process, and this is no different regardless of what programming language you use. You can’t use arenas to solve this.)
Kernels don’t operate in batches, almost everything in a kernel operates on single units. Sure, there are various performance optimizations around grouping operations together, but you will not find arena-based memory management in kernels. Go ahead, grep for
arena
in the Linux kernel. There are none, the only significant hits are nvdimm (where an “arena” is just an nvdimm concept for a region of memory) and BPF (where an “arena” is a shared memory construct between a BPF program and userspace).The arena concept is great for things like games which have to do a ton of memory management every frame, which can just be blown away at the end of the frame. But it simply doesn’t work for, and isn’t used in, Linux, where there is no such concept as a “frame” that allows batching memory allocations like that. Kernels always do fine-grained resource management on behalf of userspace, there’s just no other way it can work.
Sure, the GPU kernel driver could batch actual GPU memory allocations together to reduce alloc/dealloc overhead, and that’s something I’m considering for a future performance optimization if profiling reveals it would be worth it, but it would not be in conflict with RAII. It would just mean that I compute the required allocation size for different kinds of GPU memory ahead of time for a job, allocate as one big block, and then hand out reference-counted suballocations to it (so RAII still works and the whole block is only cleaned up when all the sub-allocs are freed). It’s not clear a priori whether this would help that much, it depends on the performance of the allocator (which is very simple and based on
drm_mm
, and doesn’t do much besides assign and free ranges in GPU address space, in an already allocated/populated heap most of the time, so again I need to profile this before deciding this is at all worth it).What you will find in C drivers is coarse lifetime management, where a fixed number of objects are allocated as one big struct, freed all at once. That isn’t coarser RAII though, the sub-objects (which half the time are or own pointers to their own allocations anyway, this is emphatically not a memory arena) still need constructors and destructors called where required, and often their lifetime is not fully tied 1:1 to the lifetime of the parent struct since some members might only be initialized when needed on-demand. There is no difference with Rust in the generated code for this model, and no performance difference (in particular, in kernel Rust we have a whole
pin_init
mechanism which allows initialization of objects to happen in-place exactly as happens in this C model). The main difference is that since in the C approach initialization and cleanup is not strictly delegated to the individual objects, and not handled automatically by the compiler, there are many ways this can go subtly wrong with uncommon orders of operations and error paths, causing lots of problems leading to memory unsafety, and all those just vanish in Rust because the compiler makes sure everything is initialized before use and cleaned up before being freed.The author states:
But Linux is already chock full of RAII (and objects for that matter). It’s just RAII (and objects, and vtables) implemented manually in C, with no compiler guarantees that all the edge cases are handled properly. In fact, often the places where Linux offers alternatives to traditional open-coded C RAII (such as devm) exist not because RAII is bad or slow, but because proper RAII is so easy to get wrong in C that it’s safer to use coarser approaches (and those mechanisms are, therefore, not necessary and not used in kernel Rust).
Let me just say up front that I’m not arguing for either approach here…
It sounds like you are saying here that you would make a larger allocation and then also do per-sub-allocation accounting. That seems to assume that the cost of granular RAII is unavoidable but the linked article seems to specifically be asserting the opposite. It argued for making one larger allocation and specifically not doing per-sub-allocation accounting:
Profiling the management of N+1 lifetimes rather than N lifetimes (even if each N in N+1 is lighter weight) seems to be an ill substitute for profiling the management of 1 lifetime. Are you and the author talking past each other or am I missing something in one (or both) of your arguments?
That could work, but if you don’t do per-sub-allocation accounting then everything has to fit within Rust’s lifetime/borrowing model, and if it doesn’t, you have to fall back to unsafe code (which is, of course, the Zig approach, since Zig doesn’t do safety). That is a major decision that would only be made if performance really warrants it, which is very unlikely for my use case, because…
The reason why you would group allocations is to reduce the cost of the allocation/deallocations themselves. Managing the lifetimes themselves is just trivial reference counting. What I’m describing (suballocation) is likely to be better performant than an arena allocator here. An arena allocator does O(n) allocations and O(1) deallocations. My approach would do O(1) allocation, O(1) deallocation, and O(n) refcount maintenance operations. If the allocation/deallocation is significantly expensive, this would be a clear win, and if it isn’t (relative to just some refcounting), then it’s highly unlikely there are significant wins to be realized here at all by micro-optimizing the resource allocations and other things are the performance bottleneck.
Alloc/dealloc micro-optimization with arenas matters in things like games which might allocate thousands of objects per frame. It doesn’t for a GPU driver that needs to allocate a dozen state objects per submission, when it also has to fill in kilobytes of data structures and communicate with the firmware and parse and validate commands from userspace and all sorts of other stuff. If the allocs get batched together and reduced to some increfs/decrefs to keep the resource management reasonable, then it’s pretty much guaranteed there are no more gains to be had there.
Incidentally, our userspace graphics driver does use arenas (it calls them pools), because at that layer you do get potentially thousands of draws for a single render pass and there it’s clearly the right thing to do for things things like allocating GPU command blocks, parameter blocks, etc.. Those thousands of draws get passed to the kernel as one render command, so by the time the kernel is involved that n factor is gone.
It’s also a lot less dangerous because, at worst, if the driver gets the pool lifetime wrong the GPU faults and fails the command. If the kernel gets an object lifetime wrong then the GPU firmware crashes and cannot be recovered without a whole system reboot, so the stakes are a lot higher. I need to have a very good reason to play fast and loose with any GPU object lifetimes in the kernel.
Thanks for taking the time to explain that! I see now why you’re a lot less concerned about the N figure, given the actual values of N. It seems like to achieve the gains proposed by the linked article you would need to share memory across a much larger execution boundary and invite a lot of concurrency and safety issues into the picture (which I agree would not be worth the tradeoff).
Wouldn’t mm folios be an example of operating on things in batches?
Not really, that’s basically just “variable-sized pages”. It’s basically just turning what was previously fixed-sized page allocation and management into variable-sized folio management. That’s quite similar to any traditional heap allocator that can handle different object sizes.
Of course the kernel does operate on variable-sized things (such as disk I/O in multiples of the block size instead of single blocks, or handling TCP or UDP packets in large sizes well above the physical MTU to hand off to hardware segmentation offload), but when people talk about “batches” in the context of arena allocators and RAII they’re usually talking about doing heterogeneous allocations and then freeing everything as a batch, not about just handling variable-sized data or grouping multiple data blocks together upfront for performance reasons.
This is a falsehood some people in the intersection of the data oriented design and C crowd love to sell. RAII works fine with batches, it’s just the RAII object is the batch instead of the elements inside. Even if the individual elements have destructors, if you have an alternative implementation for batches C++ has all the tools to avoid the automatic destructor calls, just placement new into a char buffer and then you can run whatever batch logic instead. Don’t try bringing this up with any of them though or you’ll get an instant block.
I do highly performance sensitive latency work with tighter deadlines than gamedev and still use destructors for all the long lived object cleanup. For objects that are churned aggressively avoiding destructor calls isn’t any different than avoiding any other method call.
Agreed, this post is making some wild claims that don’t hold up in my experience. I’m writing a high-performance compiler in Rust, and most state exists as plain-old datatypes in re-usable memory arenas that are freed at the end of execution. RAII is not involved in the hot phase of the compiler. Neither are any smart pointers or linked lists.
I simply find the argument unconvincing. Visual Studio has performance problems related to destructors => RAII causes slow software?
Agreed. I like Zig and appreciate Loris’ work, but I don’t understand this argument as well.
“Exists in tension” seems accurate to me. Yes, you can do batches with RAII, but in practice RAII languages lead to ecosystems and conventions that make it difficult. The majority of Rust crates use standard library containers and provide no fine grained control over their allocation. You could imagine a Rust where allocators were always passed around, but RAII would still constrain things because batching to change deallocation patterns would require changing types. I think the flexibility (and pitfalls) of Zig’s comptime duck typing vs. Rust traits is sort of analogous to the situation with no RAII vs. RAII.
I think it’s the case that library interfaces tend not to hand control of allocations to the caller but I think that’s because there’s almost never pressure to do so. When I’ve wanted this I’ve just forked or submitted patches to allow me to do so and it’s been pretty trivial.
Similarly, most libraries that use a HashMap do not expose a way to pick the hash algorithm. This is a bummer because I expect the use of siphash to cause way more performance problems than deallocations. And so I just submit PRs.
Yes. I write Zig every day, and yet it feels like a big miss, and, idk, populist? “But don’t just take my word for it.” Feels like too much trying to do ‘convincing’ as opposed to elucidating something neat. (But I guess this is kind of the entire sphere it’s written in; what does the “Rust/Linux Drama” need? Clearly, another contender!)
To be fair, invalidating this specific argument against RAII does not invalidate the entire post.
You write Zig every day? What kind of program are you working on?
It doesn’t, but without it I don’t really see the post offering anything other than contention for the sake of marketing.
I spend somewhere between 2 to 8 hours a day working on my own projects. (“2” on days I also do paid work, but that’s only two days a week.) Zig has been my language of choice for four or five years now; you can see a list on my GitHub profile. A lot of my recent work with it is private.
Impressive commitment to Zig! Thanks for sharing.
Thank you! I really like it, and I’m a little sad that Rust — which I still use often, maintain FOSS software in, and advocate for happily! — has narrowed the conversation around lower-level general-purpose programming languages in a direction where many now reject out of hand anything without language-enforced memory safety. It’s a really nice thing to have, and Rust is often a great choice, but I don’t love how dogmatic the discourse can be at the expense of new ideas and ways of thinking.
I very much agree. A Zig program written in a data-oriented programming style, where most objects are referenced using indices into large arrays (potentially associated to a generation number) should be mostly memory safe. But I haven’t written enough Zig to confirm this intuition.
I don’t remember the arguments against RAII much (has been a few years since) but that Zig doesn’t have RAII feels like an odd omission given the rest of the language design. It’s somewhat puzzling to me.
Hm, it’s pretty clear RAII goes against the design of Zig. It could be argued that it’d be a good tradeoff still, but it definitely goes against the grain.
__deinit__
function if available” would the the sole place where that sort of thing would be happeningdefer fits Zig very well, RAII not at all.
I was unaware that Zig discourages holding on to the allocator. I did not spend enough time with Zig but for instance if you have an
ArrayList
you can defer.deinit()
and it will work just fine. So I was assuming that this pattern:Could be turned into something more implicit like
I understand that “hidden control flow” is something that zig advertises itself against, but at the end of the day
defer
is already something that makes this slightly harder to understand. I do understand that this is something that the language opted against but it still feels odd to me that no real attempt was made (seemingly?) to avoiddefer
.But it very much sounds like that this pattern is on the way out anyways.
Zig’s
std.HashMap
family stores a per-collection allocator inside the struct that is passed in exactly once through theinit
method. Idk how that can be considered non-idiomatic if it’s part of the standard library.It is getting removed! https://github.com/ziglang/zig/pull/22087
Zig is a pre 1.0 language. Code in stdlib is not necessary idiomatic both because there’s still idiom churn, and because it was not uniformly audited for code quality.
As someone who doesn’t use Zig or follow it closely, both the fact that that change is being made and the reason behind it are really interesting. Thanks for sharing it here
You might also like https://matklad.github.io/2020/12/28/csdi.html then, as a generalization of what’s happening with Zig collections.
That’s an interesting development. Thanks for informing me!
It would completely change the design of the language and its approach to memory and resource management.
I’ve never used placement
new
, so I don’t know about that, so my question is, how do you do that? Take for instance a simple case where I need a destructor:If I have a bunch of elements that are both constructed at the same time, then later destroyed at the same time, I can imagine having a dedicated
If I have a bunch of elements that are both constructed at the same time, then later destroyed at the same time, I can imagine having a dedicated Element_list class for this, but never having used placement new, I don’t know right now how I would batch the allocations and deallocations.
And what if my elements are constructed at different times, but then later destroyed at the same time? How could we make that work?
I think I have an idea about their perspective. I’ve never done Rust, but I do have about 15 years of C++ experience. Not once in my career have I seen a placement new. Not in my own code, not in my colleagues’ code, not in any code I have ever looked at. I know it’s a thing when someone mentions it, but that’s about it. As far as I am concerned it’s just one of the many obscure corners of C++. Now imagine you go to someone like me, and tell them to “just placement new” like it’s a beginner technique everyone ought to have learned in their first year of C++. I don’t expect this to go down very well, especially if you start calling out skill issues explicitly.
I’m a little bit surprised, because I’ve had the opposite experience. Systems programming in C++ uses placement new all of the time, because it’s the way that you integrate with custom allocators.
In C++, there are four steps to creating and destroying an object: allocating the memory, calling the constructor, calling the destructor, and deallocating the memory.
When you use the default new and delete operators, you’re doing two of these at a time: new calls the global operator new, which returns a pointer to some memory (or throws an exception if allocation fails), and then calls the constructor; delete calls the destructor and then the global operator delete. Both new and delete are simply operators that can be overloaded, so you can provide your own, either globally, globally for some overload, or per class.
Placement new has weird syntax, but is conceptually simple. When you write new SomeClass(...), you’re actually writing new ({arguments to new}) SomeClass({arguments to SomeClass's constructor}). You can overload new based on the types of the arguments passed to it. Placement new is a special variant that takes a void* and doesn’t do anything with it (it’s the identity function). When you do new (somePointer) SomeClass(Args...), where somePointer is an existing allocation, the placement new simply returns somePointer. It’s up to you to ensure that you have enough space there.
If you want to allocate memory with malloc in C++ and construct an object in it, you’d write something like this (not exactly like this, because this will leak memory if the constructor throws):
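For instance, a minimal sketch (malloc_and_construct is a made-up helper; as said, not exception-safe):

```cpp
#include <cstdlib>
#include <new>
#include <utility>

// Allocate raw memory with malloc, then construct a T in it via placement new.
// Not exception-safe: if T's constructor throws, the malloc'd block leaks.
template <typename T, typename... Args>
T *malloc_and_construct(Args &&...args) {
    void *raw = std::malloc(sizeof(T));               // allocation
    if (!raw) throw std::bad_alloc{};
    return new (raw) T(std::forward<Args>(args)...);  // construction in place
}
```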
This separates the allocation and construction: you’re calling malloc to allocate the object and then calling placement new to call the constructor and change the type of the underlying memory to T.
Similarly, you can separate the destruction and deallocation like this (the same exception-safety warning applies):
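Roughly like this sketch (destroy_and_free is again a made-up helper):

```cpp
#include <cstdlib>

// Run the destructor explicitly, then hand the raw memory back to free().
template <typename T>
void destroy_and_free(T *obj) {
    obj->~T();       // destruction
    std::free(obj);  // deallocation
}
```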
In your example, std::unique_ptr has a destructor that calls delete. This may be the global delete, or it may be some delete provided by Foo, Bar, or Baz.
If you’re doing placement new, you can still use std::unique_ptr, but you must pass a custom deleter. This can call the destructor but not reclaim the memory. For example, you could allocate space for all three of the objects in your ‘object’ with a single allocation and use a custom deleter that doesn’t free the memory in std::unique_ptr.
Most of the standard collection types take an allocator as a template argument, which makes it possible to abstract over these things, in theory (in practice, the allocator APIs are not well designed).
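A sketch of such a destroy-only deleter (destroy_only and arena_ptr are made-up names):

```cpp
#include <memory>

// Deleter that runs the destructor but deliberately does not free the memory;
// the single backing allocation is reclaimed later by whoever owns it.
struct destroy_only {
    template <typename T>
    void operator()(T *p) const noexcept { p->~T(); }
};

template <typename T>
using arena_ptr = std::unique_ptr<T, destroy_only>;
```

An arena_ptr<Foo> still gives you RAII for Foo’s destructor, while the memory itself is released in one go when the enclosing allocation is freed.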
LLVM does arena allocation by making some classes’ constructors private and exposing them via factory methods on the object that owns the memory. This does bump allocation and then placement new. You just ‘leak’ the objects created this way; they’re collected when the parent object is destroyed.
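Very roughly, the shape of that pattern looks like this sketch (not the actual LLVM classes; alignment and capacity checks omitted):

```cpp
#include <cstddef>
#include <new>
#include <utility>
#include <vector>

// A parent object hands out bump-allocated children via a factory method.
class Arena {
    std::vector<std::byte> storage_;
    std::size_t used_ = 0;

public:
    explicit Arena(std::size_t bytes) : storage_(bytes) {}

    template <typename T, typename... Args>
    T *create(Args &&...args) {
        void *slot = storage_.data() + used_;               // bump allocation
        used_ += sizeof(T);
        return new (slot) T(std::forward<Args>(args)...);   // placement new
    }
    // Objects created this way are never individually freed: the whole block
    // goes away with the Arena. (Only sensible if T needs no destructor, or
    // if destructors are tracked and run separately.)
};
```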
Thanks for the explanation, that helps a ton.
I’ve done very little systems programming in C++. Almost all the C++ code I have worked with was application code, and even the “system” portion hardly made any system calls. Also, most C++ programs I’ve worked with would have been better off using a garbage-collected language, but that wasn’t my choice.
This may explain the differences in our experiences.
Yup, that’s a very different experience. Most C++ application code I’ve seen would be better in Java, Objective-C, C#, or one of a dozen other languages. It’s a good systems language, it’s a mediocre application language.
For use in a kernel, or writing a memory allocator, GC, or language runtime, C++ is pretty nice. It’s far better than C and I think the tradeoff relative to Rust is complicated. For writing applications, it’s just about usable but very rarely the best choice. Most of the time I use C++ in userspace, I use it because Sol3 lets me easily expose things to Lua.
I think it very much also depends on the subset of C++ you’re working with. At a former job I worked on a server application that might have worked in Java with some pains (it interfaced with C libs quite a bit); had it been started in 2020 or later it should probably have been done in Rust, but it predates Rust gaining traction (or even its 1.0 release). It was (or still is, probably) written in the most high-level, Java-like C++ I’ve ever seen, due to extensive use of Qt and smart pointers. I’m not saying we never had segfaults or memory problems, but not nearly as many as I would have expected. But yeah, I think I’ve never even heard about this placement new thing (reading up now), but I’m also not calling myself a C++ programmer.
Placement new is half the story; you also need to be aware that you can invoke destructors explicitly.
A trivial example looks like this:
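Something like this sketch:

```cpp
#include <cstdio>
#include <new>

struct foo {
    foo()  { std::puts("constructed"); }
    ~foo() { std::puts("destroyed"); }
};

int main() {
    alignas(foo) char storage[sizeof(foo)];
    foo *f = new (storage) foo();  // construct into caller-provided storage
    f->~foo();                     // invoke the destructor explicitly
}
```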
If you want to defer the construction of multiple foos but have a single allocation, you can imagine char foos_storage[sizeof(foo)*10], placement-new'ing into it, and later looping to call the destructors. Of course you can heap-allocate the storage too.
However, you mostly don’t do this, because if you’re looking for something that keeps a list of elements and uses placement new to batch allocation/deallocation, that’s just std::vector<element>.
Likewise, if I wanted to batch the allocation of Foo, Bar and Baz in Element, I probably would just make them normal members.
Each element, together with its members, is then a single allocation, and you can stick a bunch of them in a vector for more batching.
If you want to defer the initialization of the members but not the allocation, you can use std::optional to avoid dealing with the nitty-gritty of placement new and explicitly calling the destructor.
IME placement new comes up when implementing containers and basically not much otherwise.
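A small sketch of both points, with hypothetical Foo/Bar/Baz/Element types:

```cpp
#include <optional>
#include <string>
#include <vector>

struct Foo { int x = 0; };
struct Bar { std::string name; };
struct Baz { explicit Baz(int id) : id(id) {} int id; };

struct Element {
    Foo foo;                 // value member: allocated together with Element
    Bar bar;                 // value member: allocated together with Element
    std::optional<Baz> baz;  // storage allocated with Element, constructed on demand

    void activate(int id) { baz.emplace(id); }  // no placement new needed
};

int main() {
    std::vector<Element> elements(100);  // one batched allocation for the Elements
    elements[0].activate(42);
}                                        // destroyed and freed together
```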
Note that since C++20 you should rather use std::construct_at and std::destroy_at, since these don’t require spelling out the type and can be used inside constexpr contexts.
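For example, a constexpr-capable sketch:

```cpp
#include <memory>

constexpr int roundtrip() {
    std::allocator<int> alloc;
    int *p = alloc.allocate(1);
    std::construct_at(p, 42);  // instead of: new (p) int(42)
    int v = *p;
    std::destroy_at(p);        // instead of spelling out the pseudo-destructor
    alloc.deallocate(p, 1);
    return v;
}
static_assert(roundtrip() == 42);  // runs entirely at compile time (C++20)
```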
You likely use placement new every day indirectly without realizing it: it’s used by std::vector and other container implementations.
When you write new T(arg), two things happen: the memory is allocated and the constructor runs. All placement new does is let you skip the memory allocation and instead run the constructor on memory you provide. The syntax is a little weird: new (pointer) T(arg). But that’s it! That will create a T at the address stored in pointer, and it will return a T* pointing to the same address (but it will be a T*, whereas pointer was probably a void* or char*).
Without this technique, you can’t implement std::vector, because you need to be able to allocate room for an array of T without constructing the Ts right away, since there’s a difference between size and capacity. Later, to destroy an item, you do the reverse: you call the destructor manually with foo->~T(), then deallocate the memory. When you clear a vector, it runs the destructors one by one but then gives the memory back all at once with a single free/delete. If you had a type that you wanted to be able to do a sort of batch destruction on (maybe the destructor does some work that you can SIMD’ify), you’d need to make your own function and call it on the array instead of running the individual destructors, then free the memory as normal.
I’m not trying to call anybody out for having a skill issue, but I am calling out people who are saying it’s necessary to abandon the language to deal with one pattern, without actually knowing what facilities the language provides.
What would this look like in practice? How do you avoid shooting yourself in the foot due to a custom destructor? Is there a known pattern here?
There are different ways you could do it, but one way would be to have a template that you specialize for arrays of T, where the default implementation does one-by-one destruction and the specialization does the batch version. You could also declare the regular operator delete but leave it unimplemented, to force people to remember to use the special function.
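A sketch of the first idea (destroy_all and Widget are made-up names):

```cpp
#include <cstddef>
#include <cstdio>

// Default: destroy an array of T one element at a time.
template <typename T>
void destroy_all(T *items, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        items[i].~T();
}

// Hypothetical type whose cleanup can be batched.
struct Widget {
    int handle = -1;
    ~Widget() { std::printf("release %d\n", handle); }  // per-object work
};

// Specialization: one batched pass instead of n individual destructor calls.
template <>
void destroy_all<Widget>(Widget *items, std::size_t n) {
    (void)items;
    std::printf("batch-releasing %zu widgets\n", n);  // stand-in for SIMD/batch cleanup
}
```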
there’s absolutely nothing in a RAII design that prevents use of pools or whatever other allocation technique fits the domain - if anything, it makes it easier to try various allocation patterns & systems by moving them from point-of-use to type definition.
Also, more importantly, there really is no tension here. If you want things to be cleaned up in batches, you can do so.
Even more importantly, even if you use RAII, nothing stops you from leaking large allocations at the end and not cleaning them up if you want a fast shutdown.
The singular focus on RAII seems to be a distraction. This article is sort of pulling a substitution trick. Linux rejected C++ (true). C++ relies heavily on RAII (true). Therefore Linux rejected RAII (not quite true). Linus did say that languages should not hide memory allocations. However, that was only one of many points against the language. He also ranted (more commonly and more voluminously) against exceptions, excessive abstraction, poor-quality abstractions, and a lack of stability and portability. That is just on the negative side. He also said that C++ lacked any compelling features above and beyond C. The same is not true of Rust.
For what it’s worth, kernel Rust does not hide memory allocations either. Every single allocation has always been fallible in the kernel dialect (unlike in userspace Rust), and as of rust-next this cycle there are separate Rust primitives for each kernel allocator variant and full support for Linux kernel gfp_flags. For example, a kernel-Rust KVec uses kmalloc as the allocator, and push looks like fn push(v: T, flags: Flags) -> Result<(), AllocError>, where flags is the gfp_flags.
This is, of course, also possible in C++ (as long as you don’t use the STL, just like Rust for Linux doesn’t use std), but I’m guessing nobody really pushed this approach when C++ was being considered for the kernel. And of course, Rust still has more compelling benefits over C than C++ does (like safety).
OK, but this is prefaced by:
And a kernel is (almost by definition) long-lived. So we do actually need to track all the memory (and other GPU objects) in order to free it. And maybe arena allocation is a nice tool for that. But I don’t think it does much for GPU resources.
Regarding the other comments from Asahi:
I believe the idea is to use comptime instead. Which I understand to be more powerful than C’s macros.
Perhaps this was supposed to be “lacks a borrow checker”? Lifetimes are rather superfluous without one.
This is a feature. In zig, much like C, WYSIWYG (in terms of programmer-visible semantics).
I don’t think Zig would be a good fit for the kernel.
FWIW I did have some Zig folks come up with implementations of what I do with Rust procmacros using comptime, although it did take a few attempts along several discussion threads (because my use case is very particular). So Zig and Rust are indeed both suitable to solve that particular problem (in very different ways, each with some pros and cons, and both quite frankly hacky in their own way because the problem I’m solving is just weird). The other points still stand though.
I’ve seen the “RAII leads to slow destruction” thing a lot, probably starting with comments about Jai. This just isn’t true. Or, at least, it’s radically overblown. I have written plenty of Rust code that puts short-lived allocations into an arena and frees them all (or just clears them, if it’s a pool) plenty of times.
I don’t think RAII is directly in tension; it just works on a per-item basis by default, and if you want to do batch alloc/free you can do so very easily. Rust makes this very easy, in part because of its RAII - like, genuinely, using an arena/bump allocator is so nice in Rust because of RAII.
You use RAII in its “native” form for allocations that make sense, you use arenas if you need arenas, you use manual management if you need manual management, you use Rc if you need it, etc etc etc. Rust is perfectly happy to give you all of this.
We’re not unaware at all, nothing about this is new, bump allocators and arena crates have been a thing since pre-2015.
Linus’s rant about C++ doesn’t mention RAII. It’s hard to interpret this rant because it’s so non-technical, but I think Linus is criticizing the abstractions around OOP - things like inheritance, dynamic dispatch, etc. I don’t think he’s talking about RAII here, but it’s like reading tea leaves with these dumb rants. The irony is that Linux reinvents these in C, dynamic dispatch is literally everywhere, except the pointers are just raw pointers that the compiler can’t reason about for things like CFI or constification.
Yeah, I mentally equate “problems with closing the program” with the C++ SIOF (static initialization order fiasco), where crashes are much more common than slowness, rather than with RAII.
Rust solves the SIOF by requiring all statics be constexpr-constructed and just not calling their Drop.
Have they? C++ is a lot more specific than RAII and destructors. If destructors were really the reason they banned C++, why did they start experimenting with Rust? Did they just not notice that Rust uses destructors?
Linus objected to C++ way back before the Itanium ABI was standardised, when:
This was long before C++11 made the language tolerable, but as far as I am aware the decision hasn’t really been revisited. The NT and XNU (macOS / iOS) kernels use a load of C++ without issues.
Apple actually did suffer from the ABI instability because they originally shipped gcc 2.95 and IOKit ended up depending on it. As I recall, they carried a load of patches to support it in clang until they dropped support for 32-bit PowerPC (newer architectures used the Itanium ABI). That would have been a problem for Linux (at least for downstream distros) and it was a good decision to avoid it.
Mind you, some Linux folks believe strange things about code generation. I had one tell me that the goto pattern that Linux uses for error handling generated better code than using nested if statements. This surprised me, since I would expect them to be equivalent. And, it turned out, when I refactored some of his code to use structured control flow, gcc 3.x (the latest at the time) generated an identical binary.
I don’t understand how compiler ABI is relevant here:
https://www.kernel.org/doc/Documentation/process/stable-api-nonsense.rst
The Linux kernel shouldn’t care about compiler ABI at all, because the model there is the same one that, e.g., Rust uses — everything is compiled from source, and if you want to link in a separately compiled blob, you are on your own.
@david_chisnall is talking about a very different era :). That was back when compiling your own network card or sound card driver still wasn’t particularly uncommon, and lots of things you needed lived out of tree. Requiring everything to be compiled from source like that would have meant distributions couldn’t update the compiler and the kernel separately without requiring everyone to recompile their own stuff afterwards, and that simply wasn’t practical in the computing landscape of that day. Even as late as 2007, linking in a separately compiled blob was a pretty common requirement, even at the desktop/workstation end. The amount of stuff we have in the mainline kernel today is unbelievable, but it wasn’t always like that.
Frequent ABI changes were also one of the “soft” mechanisms by which the other problem was introduced – to put it mildly, g++ emitted really bad code up until the mid-’00s or so, and the further you got from what everyone else was using, the worse it got. Closer to the x86 end of the spectrum it was just kind of inefficient or weird, but on things like (oh god why) the SuperH end it was just plain buggy, including truly hilarious things like barfing if identifier names in pragma statements had some specific lengths (a multiple of 4, I think?).
Indeed. Compiling your own kernel often took multiple hours. Compiling GCC was an overnight job. You absolutely wanted to be able to compile a kernel module against the distro-provided kernel, but if you needed to version-match the g++ version that the distro used and the one that you used, that would be annoying. Especially given how buggy g++ was: you might need a newer one to compile your module correctly.
Even on x86, g++’s code generation was not great. GCC developers didn’t really care about C++, especially not for performance. LLVM changed that. It was written in C++, and so improvements to optimising C++ code were directly visible to compiler writers. Once LLVM started to be better for C++ than GCC, the GCC folks focused more on C++, and now both are often better for idiomatic C++ than for the equivalent code in idiomatic C.
Interesting viewpoint; it doesn’t match my experience at all - I think the last kernel I had to compile on a personal laptop or workstation was either in the 90s or 2004ish, and the “had to” was already a bit of a stretch.
But I think I never had any “weird” hardware and only a single laptop (centrino, so probably mainline by 2005ish) until 2010, so I guess I was just lucky!
Thanks for extra context!
I vaguely remember an attempt to resurrect the topic earlier this year, but I don’t think it went anywhere: https://lore.kernel.org/lkml/[email protected]/
I’m of the opinion that it’s practically impossible for people to write correct code without automated checking in systems that grow beyond a single team (~10 people, probably even fewer). And those checks can happen at many stages of the software development lifecycle (SDLC).
RAII is one tool to help guarantee a dimension of correctness - that memory you allocate is also deallocated. I don’t use Rust, but I do appreciate the goal it frequently has to pull many forms of correctness checking earlier in the SDLC - so you get earlier feedback and can stay more in context for the piece you’re working on. Things like leak detection, fuzzing, and such are important, but they’re increasingly far away from your component’s context.
I struggle with any argument against such correctness enforcement (whether it’s RAII, memory safety via Rust’s borrow checker, or data-race safety in Swift) that doesn’t include a practical alternative. “Linux kernel developers are really good at writing C” and “we trade correctness for speed of closing apps” don’t strike me as practical alternatives.
Rust actually lets you leak memory safely (though of course it’s more explicit than in C, where you leak by just forgetting the free(). To leak in Rust you need to either do it explicitly with a leak or forget, or have a reference cycle). I think the more important points are 1) ensuring that memory is always initialized before it is used, and 2) ensuring that memory is only deallocated after its last use. Missing a dealloc (memory leak) is less bad than a use-after-free (UB).
The article is quite weak in my opinion. Even with RAII, it’s possible to construct your objects in a way that they do bulk operations inside and destroy the said bulk data. I guess you could hand-wave in the direction that RAII promotes non-DOD code, which your CPU might not like. In the Linux kernel, many resources are temporary and have to be tracked separately; sadly, using arenas often is not an option. I do like Zig though.
What I miss in this discussion are these points of view: bootstrapping, dependencies and overall complexity.
I’m gonna hide this, because I never like these threads - I glanced at the current comments and recalled again why I had to hide the “rust” tag - but a saying I am fond of is:
“Never take down a fence, unless you know what it was there for.”
AKA: You’re probably not the first person to have thought of something, and the people who came before you are probably not as stupid as you think they are.
It has served me well.
I like that outlook too. It’s known as Chesterton’s fence and I’m not shy about bringing it up in code reviews.
The middle ground I’d like to see is something like linear types where the compiler just tells you if you ‘forgot’ the drop call, vs. implicitly inserting it (where C++ and Rust do the implicit insertion). There could still be some codegen / derive-macro-like thing to have the default drop function propagate to member fields (but still require the explicit call at the top-level). Also not doing implicit drop on re-assignment and needing an explicit call to overwrite, and so on.
You can kind of try to do undroppable types in Rust right now by, e.g., const-panicking or referencing an undefined extern function from the Drop impl, but I found it to be finicky (it seems to generate the drop even if I only have borrow parameters of the type). I ended up exploring a simple linear + borrow checker for C: https://gist.github.com/nikki93/2bb11237bf76fceb0bf687d6d9eef1b3
Currently interested in doing some formal methods with separation logic or such…
You may find the Vale language’s take on RAII interesting? https://verdagon.dev/blog/higher-raii-7drl