stream_libarchive: workaround various types of locale braindeath
Fix libarchive failing to return filenames for UTF-8/UTF-16 entries. The reason is that it uses locales and all that garbage, and mpv does not set a locale.

Both C locales and wchar_t are shitfucked retarded legacy braindeath. If the C/POSIX standard committee had actually competent members, these would have been deprecated or removed long ago. (I mean, they managed to remove gets().) To justify this emotional outbreak, potentially insulting to unknown persons, I will write a lot of text. Those not comfortable with toxic language should pretend this is a religious text.

C locales are supposed to be a way to support certain languages and cultures more easily. One example is character codepages. Back when UTF-8 was not invented yet, there were only 255 possible characters, which is not enough for anything but English and some European languages. So they decided to make the meaning of a character dependent on the current codepage. The locale (LC_CTYPE specifically) determines what character encoding is currently used. Of course nowadays this is legacy nonsense. Everything uses UTF-8 for "char", and what doesn't is broken and terrible anyway. But the old ways stayed with us, and the stupidity of it as well.

C locales were utterly moronic even when they were invented. The locale (via setlocale()) is global state, and global state is not a reasonable way to do anything. It will break libraries, or well-modularized code. (The latter would be forced to strictly guard all entrypoints to set/restore locales, assuming a single-threaded world.)

On top of that, setting a locale randomly changes the semantics of a bunch of standard functions. If a function respects locale, you suddenly can't rely on it to behave the same on all systems. Some behavior can come as a surprise, and of course it will be dependent on the region of the user (it doesn't help that most software is US-centric, and the US locale is almost like the C locale, i.e. almost what you expect). Idiotically, locales were not just used to define the current character encoding; the concept was used for a whole lot of things, like e.g. whether numbers should use "," or "." as decimal separator. The latter issue is actually much worse, because it breaks basic string conversion or parsing of numbers for the purpose of interacting with file formats and such.

Much can be said about how retarded locales are, even beyond what I just wrote, or will write below. They are so hilariously misdesigned and insufficient, I can't even fathom how this shit was _standardized_. (In any case, that meant everyone was forced to implement it.) Many C functions can't even do it correctly. For example, the character set encoding can be a multibyte encoding (not just UTF-8, but awful garbage like Shift JIS, sometimes called SHIT JIZZ), yet functions like toupper() can return only 1 byte. Or just take the fact that the locale API tries to define standard paper sizes (LC_PAPER) or telephone number formatting (LC_TELEPHONE). Who the fuck uses this, or would ever use this?

But the badness doesn't stop here. At some point, they invented threads. And they put absolutely no thought into how threads should interact with locales. So they kept locales as global state. Because obviously, you want to be able to change the semantics of basic string processing functions _while_ they're running, right? (Any thread can call setlocale() at any time, and it's supposed to change the locale of all other threads.)
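To illustrate the decimal separator point, a minimal sketch ("de_DE.UTF-8" is just an example locale that may not be installed everywhere):

    #include <locale.h>
    #include <stdio.h>

    int main(void)
    {
        char buf[32];

        snprintf(buf, sizeof(buf), "%f", 0.5);
        printf("default C locale: %s\n", buf); // "0.500000"

        // Some library you load might do this behind your back:
        if (setlocale(LC_ALL, "de_DE.UTF-8")) {
            snprintf(buf, sizeof(buf), "%f", 0.5);
            printf("de_DE locale:     %s\n", buf); // "0,500000" - breaks parsers
        }
        return 0;
    }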
At this point, how the fuck are you supposed to do anything correctly? You can't even temporarily switch the locale with setlocale(), because it would asynchronously fuck up the other threads. All you can do is to enforce a convention not to set anything but the C locale (this is what mpv does), or to duplicate standard functions using code that doesn't query the locale (this is what e.g. libass does, a close dependency of mpv).

Imagine they had done this for certain other things. Like errno, with all the brokenness of the locale API. This simply wouldn't have worked; shit would just have been too broken. So they didn't. But locales give a delicious sweet spot of brokenness, where things are broken enough to cause neverending pain, but not broken enough that enough effort would have been spent to fix it completely.

On that note, standard C11 actually can't stringify an error value. It does define strerror(), but it's not thread safe, even though C11 supports threads. The idiots could just have defined it to be thread safe. Even if your libc is horrible enough that it can't return string literals, it could just use some thread-local buffer. Because C11 does define thread-local variables. But hey, why care about details, if you can just create a shitty standard? (POSIX defines strerror_r(), which "solves" this problem, while still not making strerror() thread safe.)

Anyway, back to threads. The interaction of locales and threads makes no sense. Why would you make locales process-global? Who even wanted it to work this way? Who decided that it should keep working this way, despite being so broken (and certainly causing implementation difficulties in libc)? Was it just a fucked up psychopath?

Several decades later, the moronic standard committees noticed that this was (still is) kind of a bad situation. Instead of fixing the situation, they added more garbage on top of it. (Probably for the sake of "compatibility".) Now there is a set of new functions, which allow you to override the locale for the current thread. This means you can temporarily override and restore the locale on all entrypoints of your code (like you could with setlocale(), before threads were invented). And of course not all operating systems or libcs implement this. For example, I'm pretty sure Microsoft doesn't. (Microsoft got to fuck it up as usual, and only provides _configthreadlocale(). This is shitfucked on its own, because it's GLOBAL STATE to configure that GLOBAL STATE should not be GLOBAL STATE, i.e. completely broken garbage, because it requires agreement over all modules/libraries about what behavior should be used. I mean, sure, making setlocale() affect only the current thread would have been the reasonable behavior. Making this behavior configurable isn't, because you can't rely on what behavior is active.)

POSIX showed some minor decency by at least introducing some variations of standard functions which have a locale argument (e.g. toupper_l()). You just pass the locale which you want to be used, and don't have to do the set locale/call function/restore locale nonsense. But OF COURSE they fucked this up too. In no less than 2 ways:

- There is no statically available handle for the C locale, so you have to initialize and store it somewhere, which makes it harder to make utility functions safe that call locale-affected standard functions and expect C semantics. The easy solution, using pthread_once() and a global variable with the created locale, will not be easily accepted by pedantic assholes, because they'll worry about allocation failure, or leaking the locale when using this in library code (and then unloading the library). Or you could have complicated library init/uninit functions, which bring a big load of their own mess. Same for automagic DLL constructors/destructors.

- Not all functions have a variant that takes a locale argument, and they missed even some important ones, like snprintf() or strtod(). WHAT THE FUCK WHAT THE FUCK WHAT THE FUCK WHAT THE FUCK WHAT THE FUCK WHAT THE FUCK WHAT THE FUCK WHAT THE FUCK WHAT THE FUCK
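For reference, a minimal sketch of the pthread_once() approach from the first point (the names here are made up, and the pedantic objections above still apply):

    #define _POSIX_C_SOURCE 200809L
    #include <ctype.h>
    #include <locale.h>
    #include <pthread.h>

    static locale_t c_locale;
    static pthread_once_t c_locale_once = PTHREAD_ONCE_INIT;

    static void init_c_locale(void)
    {
        // Can fail under OOM, and is never freed - exactly the objections above.
        c_locale = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    }

    // Utility functions call this to get C semantics regardless of the global
    // locale, e.g.: toupper_l(c, get_c_locale());
    static locale_t get_c_locale(void)
    {
        pthread_once(&c_locale_once, init_c_locale);
        return c_locale;
    }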
I would like to know why it took so long to standardize a half-assed solution that, apart from being conceptually half-assed, is even incomplete and insufficient. The obvious way to fix this would have been:

- deprecate the entire locale API and its use, and make it a NOP
- make UTF-8 the standard character type
- make the C locale behavior the default
- add new APIs that explicitly take locale objects
- provide an emulation layer that can be used to transparently build legacy code without breaking it

But this wouldn't have been "compatible", and the apparently incompetent standard committees would never have accepted it. As if anyone actually used this legacy garbage, except other legacy garbage. Oh yeah, and let's care a lot about legacy compatibility, and let's not care at all about modern code that either has to suffer from this, or subtly breaks when the wrong locales are active.

Last but not least, the UTF-8 locale name is apparently not even standardized. At the moment I'm trying to use "C.UTF-8", which is apparently glibc _and_ Debian specific. Got to use every opportunity to make correct usage of UTF-8 harder. What luck that this commit is only for some optional, relatively obscure mpv feature.

Why is the C locale not UTF-8? Why did POSIX not standardize a UTF-8 locale? Well, according to something I heard a few years ago, they're considering disallowing UTF-8 as locale, because UTF-8 would violate certain invariants expected by C or POSIX. (But I'm not sure if I remember this correctly - probably better not to rage about it.)

Now, on to libarchive. libarchive intentionally uses the locale API and all the broken crap around it to "convert" UTF-8 or UTF-16 (as contained in reasonably sane archive formats) to "char*". This is a good start! Since glibc does not think that the C locale uses UTF-8, this fails for mpv. So trying to use archive_entry_pathname() to get the archive entry name fails if the name contains non-ASCII characters.

Maybe use archive_entry_pathname_utf8()? Surely that should return UTF-8, since its name seems to indicate that it returns UTF-8. But of fucking course it doesn't! libarchive's horribly convoluted code (that is full of locale API usage and other legacy shit, as well as ifdefs and OS-specific code, including Windows and fucking Cygwin) somehow fucks up and fails if the locale is not set to UTF-8. I made a PR fixing this in libarchive almost 2 years ago, but it was ignored.

So, would archive_entry_pathname_w() as a fallback work? No, why would it? Of course this _also_ involves shitfucked code that calls shitfucked standard functions (or OS-specific ifdeffed shitfuck). The truth is that at least glibc changes the meaning of wchar_t depending on the locale.
Unlike most people think, wchar_t is not standardized to be a UTF variant (or even Unicode) - it's an encoding that uses basic units that can be larger than 8 bit. It's an implementation-defined thing. Windows defines it to be 2 bytes and UTF-16, and glibc defines it to be 4 bytes and UTF-32, but only if a UTF-8 locale is set (apparently).

Yes. Every libarchive function dealing with strings has 3 variants: plain, _utf8, and _w. And none of these work if the locale is not set. I cannot fathom why they even have a wchar_t variant, because it's redundant and fucking useless for any modern code. Writing a UTF-16 to UTF-8 conversion routine is maybe 3 pages of code, or a few lines if you use iconv. But libarchive uses all this glorious bullshit, and ends up with 3 not-working API functions, and with over 4000 lines of its own string abstraction code with gratuitous amounts of ifdefs and OS-dependent code that breaks in a fairly common use case.

So what we do is:

- Use the idiotic POSIX 2008 API (uselocale() etc.). (Too bad for users who try to build this on a system that doesn't have these - hopefully none are left in 2017. But if there are, torturing them with obscure build errors is probably justified. Might be bad for Windows though, which is a very popular platform except on phones.)
- Use the "C.UTF-8" locale, which is probably not 100% standards compliant, but works on my system, so it's fine.
- Guard every libarchive call with uselocale() + restoring the locale (see the sketch below).
- Be lazy and skip some libarchive calls. Look forward to the unlikely and astonishingly stupid bugs this could produce.

We could also just set a C UTF-8 locale in main() (since that would have no known negative effects on the rest of the code), but this won't work for libmpv.

We assume that uselocale() never fails. In an unexplainable stroke of luck, POSIX made the semantics of uselocale() nice enough that user code can ignore failures without introducing crash or security bugs, even if there should be an implementation fucked up enough where it's actually possible that uselocale() fails even with valid input.
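The guard pattern looks roughly like this (a simplified sketch; the struct and function names are illustrative, not the literal patch):

    #include <locale.h>
    #include <archive_entry.h>

    struct mp_archive {
        locale_t locale; // created once, e.g. with newlocale(..., "C.UTF-8", ...)
        // ... libarchive handles etc. ...
    };

    // Every libarchive call gets wrapped like this:
    static const char *get_entry_name(struct mp_archive *mpa,
                                      struct archive_entry *entry)
    {
        locale_t oldlocale = uselocale(mpa->locale); // assumed to never fail
        const char *name = archive_entry_pathname(entry);
        uselocale(oldlocale); // restore whatever was active before
        return name;
    }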
With all this shitty ugliness added, it finally works, without fucking up other parts of the player. This is still less bad than that time when libquvi fucked up OpenGL rendering, because calling a libquvi function would load some proxy abstraction library, which in turn loaded a KDE plugin (even if KDE was not used), which in turn called setlocale() because Qt does this, and consequently made the mpv GLSL shader generation code emit "," instead of "." for numbers - and of course only for users who had that KDE plugin installed and lived in a part of the world where "." is not used as decimal separator.

All in all, I believe this proves that software developers as a whole and as a culture produce worse results than drug-addicted, butt-fucked monkeys randomly hacking on typewriters while inhaling the fumes of a radioactive dumpster fire fueled by Chinese plastic toys for children and Elton John/Justin Bieber crossover CDs, for all eternity.
slowclap.gif
Is this some kind of record?
Forgot to mention the LADSPA filters invading programs via libasound and breaking mpv’s OpenGL renderer. For some reason many LADSPA filters call setlocale()…
The fun part is that this can happen even if you don’t use them directly because the ALSA LADSPA plugin would scan the LADSPA path for plugins, some of which do this as soon as they’re loaded.
I love you @wm4.
I wonder when you'll fork libarchive and make it 1/4 of the size by removing this whole mess.
You left out my favorite fun fact:
strcoll is strcmp that respects the locale. This will fuck up your library if you try to sort titles in the user's locale.
This only happens in the glibc implementation because POSIX forgot to specify that strings should have a total order in all locales. https://sourceware.org/bugzilla/show_bug.cgi?id=18927
Couldn't have possibly been put better. Seriously, fuck C.
I eagerly await the day an alternative starts existing.
@haasn Rust?
By the way, regarding "and lived in a part of the world where "." is not used as decimal separator": https://en.wikipedia.org/wiki/Decimal_mark#/media/File:DecimalSeparator.svg. I think the comma wins ^^ but this is not important.
Epic shit man. One minor correction:
It's UCS-2, not UTF-16 - i.e. UTF-16 without surrogate pairs - and the APIs dealing with it don't enforce UTF-16 validity. http://unicode.org/faq/utf_bom.html#utf16-11
The More You Know.
Fuck C and POSIX.
Windows uses WTF-16, not UCS-2 nor UTF-16. It is a superset of UTF-16, which supports surrogate pairs just fine, but it also allows lone surrogates. As long as you stick to the WTF-16 part of Windows, you have a single consistent encoding and everything is fine. You can even convert it losslessly to and from WTF-8, which is a superset of UTF-8. Rust handles this with OsStr/Path, and for the most part things work out really well.

But yeah, try to work with any C library that ventures outside this wide world into the land of narrow encoding and everything falls apart. The system encoding on Windows is never UTF-8. Even if you set the console code page to UTF-8, if you try to read a multibyte character sequence it will fail. You have to stick to WTF-16 or else badness will ensue.
I don't want to get technical, but I did waste too much time on this bullshit topic, so I might as well correct mistakes that I spot. WTF-16 is a neologism invented here; UCS-2 is the older term, and it's not correct to say WTF-16 is "not UCS-2". Windows wchar_t wide strings "support" surrogate pairs in the sense that Windows leaves them alone and doesn't mess with them, but it doesn't interpret them as codepoints - you have to explicitly convert them to a "multibyte string" for that. Windows low-level system APIs only use wide strings (i.e. wchar_t, UCS-2, WTF-16, whatever). The confusion is furthered by the fact that lots of online docs call this "UTF-16", including Wikipedia. The Microsoft term is "wide string", and this refers to a wchar_t 16-bit string, uninterpreted and unencoded, with an (implicit) 1-to-1 mapping between 16-bit chars and Unicode codepoints.

edit: actually even some Microsoft docs call this "UTF-16", but this is wrong at least in the context of the low-level system APIs, since they perform neither decoding nor validation. For example, wcslen returns the length of the string in 16-bit units, without trying to decode any surrogate pairs.
This is going to fail on most systems, pretty much anything but musl or recent (last couple years?) glibc, I think:
mpa->locale = newlocale(LC_ALL_MASK, "C.UTF-8", (locale_t)0);
I might suggest instead something like:
mpa->locale = newlocale(LC_ALL_MASK - LC_CTYPE_MASK, "C", (locale_t)0);
This will make a locale object that's "C" (guaranteed to exist; won't fail except for possible OOM) in all categories but LC_CTYPE, and matches the default locale (determined by environment or system default) in LC_CTYPE. This should be fine unless you're depending on not having any wacky locale-specific case mappings. If that's a problem, you could instead call newlocale again on the result to try replacing LC_CTYPE with various known "benign" UTF-8 locales like "C.UTF-8", "en_US.UTF-8", etc. that might exist on the system.
FYI, musl has a special case for newlocale where LC_CTYPE is C.UTF-8 and everything else is C; it's statically allocated and thus can't fail even if there's no memory.
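A sketch of that fallback idea (the candidate list is just an example, and error handling is simplified; on glibc a failed newlocale() apparently leaves the base object untouched):

    #include <locale.h>
    #include <stddef.h>

    static locale_t make_sane_locale(void)
    {
        // "C" in all categories but LC_CTYPE, per the suggestion above.
        locale_t loc = newlocale(LC_ALL_MASK - LC_CTYPE_MASK, "C", (locale_t)0);
        if (!loc)
            return (locale_t)0;
        // Optionally try to replace LC_CTYPE with a known "benign" UTF-8 locale.
        static const char *const candidates[] = { "C.UTF-8", "en_US.UTF-8", NULL };
        for (size_t i = 0; candidates[i]; i++) {
            locale_t tmp = newlocale(LC_CTYPE_MASK, candidates[i], loc);
            if (tmp)
                return tmp; // success: tmp supersedes loc
        }
        return loc; // none existed; keep the environment's LC_CTYPE
    }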
glibc upstream still does not have a C.UTF-8 locale, so this will break on Arch Linux and other distros that use vanilla glibc.
Last time I checked, this locale was not accepted because it has a broken collation that sorts all characters > 0xFFFF between 0x0 and 0x1.
You can sort of accomplish this by overriding setlocale like so: https://github.com/smcameron/space-nerds-in-space/blob/3337de7428cb79ab84c82561d1b1dcd3af10a6dc/c-is-the-locale.c
It takes care of the multithreaded case, and ensures that whatever stupid libraries keep calling setlocale all the time (gtk, for example) always get "C" no matter what they ask for.
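The trick, roughly (a sketch, not the linked code verbatim):

    // Define our own setlocale(); on ELF systems it interposes the libc
    // symbol, so every library that calls setlocale() gets this instead.
    #include <locale.h>

    char *setlocale(int category, const char *locale)
    {
        (void)category;
        (void)locale;
        static char c_locale[] = "C";
        return c_locale; // claim success, never actually change anything
    }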
uselocale() is absent in NetBSD... time to implement it.
Very emotional, but why are locales blamed? Most of the trouble comes from their improper usage.
Because locales and the libc in general make improper usage very widespread, which means that even code you do not control will affect global state, which is precisely what the problem here is.
Boy all these "experts" from reddit sure are embarrassing themselves here.
Because there's no way to do this in a sane way.
Also, in this particular case, "Most of troubles come from their improper usage." might be true, but that doesn't stop libarchive from being a pile of shit.
@infinity0 It's more complicated than that. A lot of Windows APIs really do use UTF-16 strings. For example, if you call the Unicode version of SetWindowText with a UTF-16 string that contains supplementary plane characters (and you have the proper fonts installed), you should see them in the window's title bar.

That a number of APIs are unaware of supplementary plane characters, or can't accept surrogate pairs by design (like IsCharUpper), is just due to the fact that a lot of these APIs were designed before Unicode 2.0, and even afterwards, not all developers knew that Unicode was no longer a 16-bit encoding. Still, to say Windows strings aren't supposed to be interpreted as UTF-16 is misleading, since they often are.

The filesystem APIs don't validate UTF-16 strings, but I don't think that matters much. Linux does not require filenames to be encoded in the system codepage either. Linux treats filenames as an uninterpreted sequence of 8-bit code units, and Windows treats filenames as an uninterpreted sequence of 16-bit code units, which is good in a way, because programs don't need complicated logic to know what is a legal filename.

When filenames are displayed to the user, Linux will (usually) interpret them in the current codepage (often UTF-8), and Windows will interpret them as UTF-16. Yes, good Windows filesystem code should allow round-tripping of filenames with ill-formed UTF-16, but it's the same on Linux, where good filesystem code should allow round-tripping of filenames with ill-formed UTF-8. (Rust's WTF-8 encoding is a good way to do this in cross-platform code, but mpv does not use it at the moment.)

And yes, wcslen returns the number of code units, not codepoints, but that's not uncommon in programming languages. You don't often need to know the length of a string in code points, which are not the same thing as user-perceived characters, but you often need the length in code units for memory allocation or low-level string manipulation.
@rossy I think at this point we're down to semantics. The point is that the Windows APIs accept UCS-2 and treat it as UTF-16 for display purposes. Windows has dedicated "Wide" APIs, but Linux really only has one API for anything that deals with locale (and anything that doesn't). Treating filenames as raw bytes works well in this system because the encoding can actually be different on disk. If I'm not mistaken, some kind of truly masochistic person could have Shift-JIS-encoded filenames right next to UTF-8, and it'd work right so long as your locale is set properly when accessing the files.

But in my opinion there is eventually a difference worth noting. What happens when the data comes back? If the SetWindowTextW API were truly UTF-16, you'd expect that the invalid surrogate pairs would be replaced with Unicode replacement characters. After all, this HAS to be done to even display the title - which Windows does do for display purposes. Alas, I tried passing in a title where 0xD852, 0xDF62 forms the (valid) codepoint 𤭢 (U+24B62) and the remaining 2 words are the same surrogate pair reversed (see the sketch after this comment). Windows renders it more or less how you might expect, but reading the text back yields the unmodified word sequence. So it treats it as a bag of words, like you said.

This may be for purely legacy reasons, or maybe they just feel it's better to treat it as a bag of words. But the same invariant does not hold for the ANSI/locale-based APIs, which will lose information if the text can't be represented in the current codepage. A lot of arguments could be made here. But as far as I can tell, this API's behavior exhibits absolutely zero UTF-16 awareness; the end result on screen will display valid UTF-16, but the API itself doesn't do anything, meaning that both on the input and output side, invalid UTF-16 can be accepted and emitted.

To me the API itself is not UTF-16. It is cool that Windows will display it like UTF-16, but UTF-16 is a variable-length encoding and Windows doesn't treat it the same way it would treat other variable-length encodings. To me this is no more a UTF-16 API than fopen on Linux is a UTF-8 API.

I realize this is all horridly pedantic, but after keeping all of my frustrations about locale and text encoding in operating systems bottled up, it feels relieving to have expressed all of these thoughts. Rant over.

Footnote: Basically, to be completely clear, what I'd really expect here is for Windows to error out if you pass in invalid UTF-16, and therefore it would never accept nor emit invalid UTF-16. That's what a 'UTF-16' API is to me. Otherwise it's just an array of 16-bit words that eventually gets treated as UTF-16.
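A rough reconstruction of that experiment (an untested sketch; it assumes a console program, and that the console window's title behaves like any other window text here):

    #include <windows.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        // U+24B62 as a valid surrogate pair, then the same two units
        // reversed (ill-formed UTF-16).
        WCHAR title[] = { 0xD852, 0xDF62, 0xDF62, 0xD852, 0 };
        WCHAR back[16] = { 0 };

        HWND hwnd = GetConsoleWindow();
        SetWindowTextW(hwnd, title);
        GetWindowTextW(hwnd, back, 16);

        // The ill-formed sequence comes back unchanged: the API stores
        // uninterpreted 16-bit units.
        printf("round-trip: %s\n",
               memcmp(title, back, sizeof(title)) == 0 ? "identical" : "changed");
        return 0;
    }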
@jchv Right, that is exactly the situation as I understand it.

Essentially, it's an API which exposes u16[]: you always pass in a u16[], and for some fraction of the provided function calls (generally UI-related or locale-related stuff) they effectively do an internal UTF-16 decode/validation. It's simply not useful to call the u16[] "UTF-16"; the better way of describing it is "u16[], but certain API calls transparently decode it as UTF-16 if they really have to".
You're using way too kind words, and missing a lot of retardedness. For example:
• collation files for C.UTF-8 in glibc are massive, despite the collation being the same as C (i.e., compare either bytes or codepoints without any tables; it's equivalent for all legal strings)
A great post, though.
wchar_t is not just Windows, but also NetBSD and FreeBSD.
@krytarowski: just checked, NetBSD and FreeBSD have sizeof(wchar_t) == 4. It's only Windows that's a special snowflake that violates the C standard.
But it takes just a single snowflake that you can't ignore...
Ah right... I mixed something up.
wm4, I salute your awesome rantitude. Others may quibble with technical details, but as a description of a toxic mess I too have noticed this is pretty accurate, and as a flight of rhetoric it is...epic poetry.
I bow three times to a master...
...and note that I'm writing new stuff in Go these days for good reasons.
I think the correct way to set this would be to do:
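Presumably a call along these lines, where the empty locale name requests the environment's default (the exact call is my guess):

    mpa->locale = newlocale(LC_CTYPE_MASK, "", (locale_t)0);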
This should pick up the system locale, but only for LC_CTYPE (which is all libarchive appears to care about).

Hard-coding C.UTF-8 is a bad idea, since not all systems have a C.UTF-8 locale.

Setting LC_ALL_MASK should not be needed, and appears to be broken on some systems.
Oh my gosh, this was beautiful.
Epic. Slap.
Fork now!
This is epic
Zig is looking like a strong contender :D
Hi Linus 👍🏻
@haasn check out Ziglang.
Though not locale-dependent... does anyone remember cases where sizeof(wchar_t) == 1? Windows is not that evil, then.

BTW, there are potential strict aliasing issues between wchar_t and char16_t/char32_t in C++.
@petersjt014 there seems to be something self-contradictory in the "no hidden control flow" propaganda... how can defer and async be used without "hidden control flow"?

(It is also super confusing to take operator overloading as an instance of "hidden control flow".)
@FrankHB This isn't really the place to discuss zig (feel free to jump on either the discord or IRC, the latter of which the language's creator is very active on), but I'll address the few points you brought up here:

- defer is syntactic sugar and performs an operation when the current scope exits. It's identical to Go's defer in almost every way - there is nothing hidden about it.

- async is performed via stack frame replacement (e.g. [er]sp on x86 machines). There's nothing magic about it, and oftentimes it's not super useful on its own, but instead paired with something that manages those frames (e.g. some sort of event loop implementation). I'm currently working on a libuv wrapper that makes use of these, for example.

- With operator overloading, you can write something like operator+(const my_type &l, const my_type &r) { make_request_to_a_web_server(); }, which conflicts with the expected semantics when you see my_type a, b; a + b; somewhere else in the code. Zig expressly prohibits such behavior from ever being written.

Hope that helps :)