I am not really a fan of any of the standard C string functions. I tend to prefer string builder APIs these days, which manage the buffer for you and provide routines to append chars or other strings, etc.
I’ve worked through a few designs, but ilstr is my current favourite; see example usage in the boot banner code or here. I think the most novel thing about it is that the object has a sticky error state, so you can perform as many operations as you like without needing to check return values until the very end. Once you hit an error, subsequent operations are ignored, so you don’t end up with a result that is missing only some operation in the middle.
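To make the sticky-error idea concrete, here is a minimal sketch of a builder with that property. It is not ilstr's actual API: the `sb_*` names, the struct layout and the doubling growth policy are all made up for illustration, and size overflow checks are left out to keep it short.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical builder, not ilstr: buffer, length, capacity, sticky error. */
typedef struct {
    char  *buf;
    size_t len;
    size_t cap;
    int    err;   /* 0 while healthy; otherwise the first errno recorded */
} sb_t;

static void sb_init(sb_t *sb) { memset(sb, 0, sizeof(*sb)); }

static void sb_append(sb_t *sb, const char *s)
{
    assert(sb != NULL && s != NULL);
    if (sb->err != 0)
        return;                       /* sticky: ignore work after a failure */

    size_t n = strlen(s);
    if (sb->len + n + 1 > sb->cap) {
        size_t ncap = (sb->cap == 0) ? 32 : sb->cap;
        while (ncap < sb->len + n + 1)
            ncap *= 2;
        char *nbuf = realloc(sb->buf, ncap);
        if (nbuf == NULL) {
            sb->err = ENOMEM;         /* record once; old contents untouched */
            return;
        }
        sb->buf = nbuf;
        sb->cap = ncap;
    }
    memcpy(sb->buf + sb->len, s, n + 1);   /* copies the terminating NUL too */
    sb->len += n;
}

/* Check once at the end, or after any particular call if you care. */
static int sb_errno(const sb_t *sb) { return sb->err; }

static const char *sb_cstr(const sb_t *sb) { return sb->buf ? sb->buf : ""; }

static void sb_fini(sb_t *sb) { free(sb->buf); sb_init(sb); }
```

A caller would chain as many `sb_append()` calls as it likes and look at `sb_errno()` only once before using `sb_cstr()`.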
I think the most novel thing about it is that the object has a sticky error state, so you can perform as many operations as you like without needing to check return values until the very end
ICU uses a pattern with this property, where each call takes a pointer to an error object and has a (statically hinted false) check and early return if the value is not zero. You use it like this:
int error = 0;
do_a_thing(args, &error);
do_another_thing(args, &error);
do_a_third_thing(args, &error);
if (error != 0)
{
    // Handle failure.
}
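For the callee side of that pattern, a rough sketch might look like the following. This is not ICU code: `add_checked`, the `UNLIKELY` macro and the error value `1` are invented for the example, and `__builtin_expect` is a GCC/Clang extension rather than standard C.

```c
#include <assert.h>
#include <stddef.h>

/* Branch hint; __builtin_expect is a GCC/Clang extension. */
#if defined(__GNUC__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

/* Hypothetical operation: adds n to *total, failing on negative input. */
static void add_checked(int *total, int n, int *error)
{
    assert(total != NULL && error != NULL);
    if (UNLIKELY(*error != 0))
        return;             /* an earlier call failed: do nothing */
    if (n < 0) {
        *error = 1;         /* made-up error code for this sketch */
        return;
    }
    *total += n;
}
```

Once one call sets the error, every later call in the sequence returns early, so the caller's single check at the end is enough.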
I think the most novel thing about it is that the object has a sticky error state, so you can perform as many operations as you like without needing to check return values until the very end.
This does sound pretty cool, but I wonder how often an error state is hit and if you’d rather get the error right away?
I’d also be curious how you feel about antirez sds. I don’t write much C code anymore, but that was my string builder API of choice when I did have to work with strings in C. Granted it doesn’t have any sticky error state (although maybe easy to add?).
I just looked at sds and I’m not a fan. I think hiding the metadata in front of the string buffer instead of using a new string buffer object is going to be a source of accidents. I also think the realloc-shaped API (always taking the pointer and returning a potentially but not always new pointer) will lead to bugs as well.
As far as checking the error state after each call: my earlier design, custr, did that, and it was in practice extremely tedious. If you want to get the error state after any particular operation you can always call ilstr_errno() to check.
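To illustrate the realloc-shaped hazard mentioned above, here is a small sketch. It deliberately does not use the sds API: `str_append` and `greet` are hypothetical, and the failure behaviour (return NULL, leave the input valid) is simply copied from `realloc`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical realloc-shaped append (not the sds API): takes the current
 * string and returns the possibly moved result. On failure it returns NULL
 * and leaves s valid, like realloc.
 */
static char *str_append(char *s, const char *t)
{
    assert(t != NULL);
    size_t slen = s ? strlen(s) : 0;
    size_t tlen = strlen(t);
    char *r = realloc(s, slen + tlen + 1);
    if (r == NULL)
        return NULL;
    memcpy(r + slen, t, tlen + 1);
    return r;
}

/* Correct use: every holder of the old pointer has to be updated. */
static char *greet(const char *name)
{
    char *s = str_append(NULL, "hello, ");
    if (s == NULL)
        return NULL;
    char *alias = s;                /* a second reference to the same buffer */
    char *r = str_append(s, name);
    if (r == NULL) {
        free(s);
        return NULL;
    }
    s = r;
    alias = s;                      /* forgetting this leaves alias dangling */
    return alias;
}
```

The bug only shows up when the buffer actually moves, which is why this shape of API tends to pass casual testing.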
Here are two positive things about strlcpy that the author doesn’t mention.
1. It has the same style of interface as snprintf, which is handy if you are using both in the same program. In both functions, the destination is described as a buffer and a size. The return value is the number of characters that would have been copied if the output had not been truncated.
2. The best use case for strlcpy is where your algorithm has a fast path and a slow path. The fast path is where you have an existing buffer that is the correct size in most cases. You repeatedly reuse this buffer, avoiding malloc calls in a hot part of your code. The slow path is where the existing buffer is too small for the data, so you have to malloc some new space.
The author claims that to implement design pattern #2 above, you should instead call strlen to determine how big the string is; then allocate memory if the buffer isn’t big enough; then call memcpy to copy the data. Their code is less efficient, because the string data is always traversed twice, once by strlen and once by memcpy. In the design pattern using strlcpy, the string data is only traversed once in the fast path.
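The fast-path/slow-path pattern described above can be sketched roughly like this. `sketch_strlcpy` is a local stand-in with strlcpy semantics so the example builds on libcs that lack the real function, and `copy_fast_or_slow` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-in with strlcpy semantics, for libcs that lack strlcpy. */
static size_t sketch_strlcpy(char *dst, const char *src, size_t dsize)
{
    size_t i = 0;
    if (dsize != 0) {
        for (; i + 1 < dsize && src[i] != '\0'; i++)
            dst[i] = src[i];
        dst[i] = '\0';
    }
    while (src[i] != '\0')
        i++;                /* keep counting so we return strlen(src) */
    return i;
}

/*
 * Copies src into the caller's reusable buffer when it fits (fast path);
 * otherwise allocates exactly enough space (slow path). Returns the string
 * to use, or NULL on allocation failure. *heap is set to the allocation the
 * caller must free, or NULL if the fast path was taken.
 */
static const char *copy_fast_or_slow(char *scratch, size_t scratch_size,
                                     const char *src, char **heap)
{
    assert(scratch != NULL && scratch_size > 0 && src != NULL && heap != NULL);
    *heap = NULL;
    size_t need = sketch_strlcpy(scratch, src, scratch_size);
    if (need < scratch_size)
        return scratch;             /* fast path: one pass over src */

    *heap = malloc(need + 1);       /* slow path: output was truncated */
    if (*heap == NULL)
        return NULL;
    memcpy(*heap, src, need + 1);
    return *heap;
}
```

The return value doubles as the truncation test and as the size for the slow-path allocation, so no separate strlen call is needed.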
I agree, strlcpy() is fine. I think some of the arguments against it were because strlcpy() and strlcat() were proposed as a pair, and strlcat() is not so fine – it’s basically designed to be quadratic – so they were rejected as a pair.
There is a use case for strlcat. It’s for software maintainers, not for writing new code. The strcpy and strcat functions are very old, and have existed in every libc since the beginning. So there’s a lot of old C code that calls these functions. And these functions are not memory safe: they don’t check for buffer overflow. The OpenBSD team was concerned with rewriting old C code to make it more memory safe. strlcpy and strlcat were designed so that they can be used as plug-in memory safe replacements for the unsafe strcpy and strcat: simple replacements that don’t require replacing these calls with a lot of boilerplate, or redesigning the program logic. You shouldn’t be replacing simple code with complex code if your goal is to make existing code less buggy, and as a maintainer you should only replace inefficient strcat/strlcat uses with higher performance code in places where performance is an issue.
Just replacing strcat with strlcat won’t fix any bugs you had, though; it’ll just change them into different bugs. Instead of an overflow, you’ll get a truncated string, which could quite possibly still be a security issue. You still have to check sizes yourself, and if you’re doing that it makes no difference which one is used.
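As a sketch of what checking for truncation looks like in practice: the return value is compared against the buffer size after each call. `sketch_strlcat` is a local stand-in with strlcat semantics (so this builds without the real function) and `join_path` is a hypothetical helper. The stand-in also shows why repeated strlcat calls are quadratic: each call rescans the destination from the start.

```c
#include <assert.h>
#include <string.h>

/* Local stand-in with strlcat semantics, for libcs that lack strlcat. */
static size_t sketch_strlcat(char *dst, const char *src, size_t dsize)
{
    size_t dlen = 0;
    while (dlen < dsize && dst[dlen] != '\0')
        dlen++;                     /* rescans dst on every call */
    size_t slen = strlen(src);
    if (dlen == dsize)
        return dsize + slen;        /* dst was not NUL-terminated */
    size_t room = dsize - dlen - 1;
    size_t n = slen < room ? slen : room;
    memcpy(dst + dlen, src, n);
    dst[dlen + n] = '\0';
    return dlen + slen;             /* length it tried to create */
}

/* Returns 0 if "dir/name" fit in out, -1 if it was truncated. */
static int join_path(char *out, size_t out_size,
                     const char *dir, const char *name)
{
    assert(out != NULL && out_size > 0 && dir != NULL && name != NULL);
    out[0] = '\0';
    if (sketch_strlcat(out, dir, out_size) >= out_size ||
        sketch_strlcat(out, "/", out_size) >= out_size ||
        sketch_strlcat(out, name, out_size) >= out_size)
        return -1;                  /* truncated: treat as an error */
    return 0;
}
```

Without the `>= out_size` checks the function would still be memory safe, but it would silently hand back a shortened path.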
I am happy that OpenBSD’s ssh uses strlcpy, strlcat and snprintf instead of strcpy, strcat and sprintf, even though it doesn’t check the return value in some places. I think that’s a better choice for security. I can see that in some cases you can statically verify that truncation won’t happen, and the return value is ignored, but even in those cases the memory-safe variant is what’s used. Buffer overflows are a much bigger security risk than truncation.
I mean, I’d probably be happier if ssh were written in Rust, where an accidental attempt at buffer overflow will cause a panic. But this is C, so you have to lower your standards, a lot, and settle for better instead of best. Even with Rust, I’d be happier if the type system were provably memory safe, instead of the current situation where there are known memory unsafety bugs in the safe subset. Rust people will argue that a slightly broken memory safety checker is still better than C++ or C and I agree. “Better” is still better than nothing, even if “better” is not perfect. I’m saying essentially the same thing about the standardized C string functions.
People that want a “string builder” interface in C might want to check open_memstream.
Holy shit, I had not noticed that POSIX had grown that functionality!
The more flexible precursors are funopen and the badly-named fopencookie.
POSIX also has fmemopen which can read or write with a fixed-sized buffer, where open_memstream() is write-only to a dynamically growing buffer.
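For anyone who has not used it, here is a small sketch of open_memstream() as a string builder. The `format_pairs` helper and its "name=value;" output format are made up for the example; the stdio calls themselves are standard POSIX.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Builds "name=value;" pairs into a freshly allocated string using
 * open_memstream(). Returns NULL on failure; the caller frees the result.
 */
static char *format_pairs(const char *const *names, const int *values,
                          size_t n)
{
    assert(names != NULL && values != NULL);
    char *buf = NULL;
    size_t len = 0;
    FILE *fp = open_memstream(&buf, &len);
    if (fp == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++)
        fprintf(fp, "%s=%d;", names[i], values[i]);

    /* The stream's error indicator is sticky, so one check at the end works. */
    int failed = ferror(fp);
    if (fclose(fp) != 0 || failed) {    /* buf and len are final after fclose */
        free(buf);
        return NULL;
    }
    return buf;
}
```

The buffer grows as needed and is NUL-terminated once the stream is flushed or closed, so the result can be used as an ordinary C string.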
Naming things is hard, but come on. :P
I also missed that it was in POSIX, but the equivalent functionality has been in BSD libcs for a long time. It’s how things like asprintf are implemented. The example on the POSIX site is logically how it works, though the real code allocates the FILE on the stack and initialises it (avoiding the second heap allocation), which you can get away with if you are libc.
Yeah, funopen dates from 4.4BSD, a little bit too late to be in the more widely-copied 4.3BSD but years before fopencookie.
I’m slightly surprised that earlier BSDs had their own stdio unrelated to the 7th Edition stdio: I thought there was more sharing at that time … but I suppose the BSD / AT&T USL / Bell Labs Research unix divergence had already started before the 7th Edition.
Fun fact, strlcpy(3) is now part of the POSIX 2024 standard.
I believe the author misses the point of strlcpy entirely, and suggesting the use of mem* functions illustrates that. You, the programmer, are expecting to copy data into a controlled C string destination. You want that to be NUL-terminated. strlcpy guarantees that for destinations you control and know the size of, i.e. a size that is non-zero.
I don’t think they even read the actual implementation in OpenBSD: https://github.com/openbsd/src/blob/master/lib/libc/string/strlcpy.c#L27-L49
Better solutions, depending on availability in your project:
I thought that strscpy() (not to be confused with strcpy_s()) is what’s “in” right now.