wstring_convert sucks #571
It's hilarious: there's TONS of utf8 routines out there. But nobody -- NOBODY -- thought they'd make a minimal UTF code point conversion library, with none of the extra garbage involved! Goodness... gracious. I guess I'm writing one myself.
Maybe our FromUTF8/ToUTF8 implementations would help you avoid having to write most of the conversion stuff yourself: https://github.com/OpenMPT/openmpt/blob/master/common/mptString.cpp#L842
@ThePhD What's your specific problem with codec_vt? For what it's worth, I use the following routines, and on Windows I Having said that, I agree that Unicode support in standard C++ is a bad joke :-(
The Right Way™

When I want a conversion routine, I need only 2 things.
In terms of a C or even C++ API, that's
Simple, no heap allocations, everything on the stack. We set up a for-loop around this, converting to the code units we want from the code points we have. That means we also need to have a

Note we could even just not have the state there to start with! You might keep it, though, if you wanted to make things more generic: take the same number of arguments and have a base "state" class that you type-cast in your conversion function if necessary (UTF is blessed in that it is a stateless conversion).

Implementation left to the reader! ... Oh wait, that's me.

Warning: Rant(ionale)

Line 470 in 63ec47b
On top of the above absolute SNAFUs on MinGW's part: rather than commit to fixing the behavior and weird interface of codecvt, the C++ standards committee just deprecated it outright. A good move because it is -- demonstrably -- complete garbage, truly. Of course, they don't introduce any replacement, and then std lib vendors added the deprecation tags to it before the replacement is agreed upon, which means that I also have to document (and add to all the test builds) that it's deprecated and I need to do something else. And ALSO document VC++'s deprecation warning, because I immediately get not one, but two issues opened asking me if it's okay to proceed beyond the warnings, because most people don't even use codecvt and what is this strange error showing up whoa. So now their deprecation warning is part of my documentation, just to make sure people know it's safe.

Of course, even if it's safe, it doesn't make codecvt any less crap. Codecvt itself is garbage, but the implementations themselves are worse. From MinGW's bug, to where VC++ has a build of itself where

What's hilarious about the VC++ bug is that it's not even a hair-tugging runtime weirdness like MinGW's byte-swapping bug that somebody quite literally programmed into the library after it worked fine in the 4.x and some 5.x iterations: all they had to do was use

But things fall through the cracks all the time! Look at my commit history and you'll see it (thanks, no Two-Phase Lookup for MSVC). It was fixed rather speedily. Unfortunately, that doesn't help me when somebody still has VS Version X and company rules say they're not updating for a long time because {Corporate and Technical Debt Reasons, usually}.

Oh. Don't forget that Everybody cares. Performance matters. And whoever designed codecvt in the first place didn't think that maybe the global locale should NOT be in there, given the locale's historical record of being not only poorly understood and used, but poorly optimized.
(Or maybe optimized as much as it could be, given the implementation's constraints? I confess I never read codecvt's standardese closely.)
Also, hilariously from the code @sagamusix linked: https://github.com/OpenMPT/openmpt/blob/master/common/mptString.cpp#L576 Codecvt implementations are trash, everywhere, and hopefully something more useful for doing encoding and decoding is properly standardized.
This helps avoid bringing in <codecvt> and Boost.Locale just for converting between UTF-8 and UTF-16 on Windows in a locale-agnostic way. See also ThePhD/sol2#571.
And so does codec_vt.
It's time to write some utf8/16/32 conversions, seeing as there's literally no simple header-only library to perform just these conversions without a million years of baggage stacked on top of it.
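For illustration only, here is a rough sketch of what such a minimal, stateless conversion could look like. It is not the code that was posted in this thread or anything from sol2 or OpenMPT. Every name in it (`decoded`, `encoded16`, `decode_utf8`, `encode_utf16`, `utf8_to_utf16`) is hypothetical, and replacing malformed input with U+FFFD one byte at a time is an assumed error policy, not a requirement. The two per-code-point primitives use only fixed-size stack storage; only the convenience loop at the end allocates, because it returns a `std::u16string`.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result types for this sketch; not from any existing library.
struct decoded {
    char32_t code_point;   // decoded code point, or U+FFFD on malformed input
    std::size_t consumed;  // number of input bytes read (always >= 1)
};

struct encoded16 {
    char16_t units[2];     // fixed-size storage, no heap allocation
    std::size_t size;      // 1 or 2 code units
};

// Decode one code point from UTF-8. Precondition: first != last.
// Assumed policy: any malformed sequence yields U+FFFD and consumes one byte.
inline decoded decode_utf8(const char* first, const char* last) {
    const auto b0 = static_cast<unsigned char>(*first);
    if (b0 < 0x80) return { static_cast<char32_t>(b0), 1 };

    std::size_t len = 0;
    char32_t cp = 0;
    char32_t min = 0;  // smallest code point allowed for this length
    if ((b0 & 0xE0) == 0xC0)      { len = 2; cp = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
    else return { 0xFFFD, 1 };  // stray continuation byte or invalid lead

    if (static_cast<std::size_t>(last - first) < len) return { 0xFFFD, 1 };
    for (std::size_t i = 1; i < len; ++i) {
        const auto b = static_cast<unsigned char>(first[i]);
        if ((b & 0xC0) != 0x80) return { 0xFFFD, 1 };
        cp = (cp << 6) | (b & 0x3F);
    }
    // Reject overlong forms, out-of-range values, and surrogates.
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return { 0xFFFD, 1 };
    return { cp, len };
}

// Encode one code point as UTF-16; invalid code points become U+FFFD.
inline encoded16 encode_utf16(char32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) cp = 0xFFFD;
    if (cp < 0x10000) return { { static_cast<char16_t>(cp), 0 }, 1 };
    cp -= 0x10000;
    return { { static_cast<char16_t>(0xD800 + (cp >> 10)),
               static_cast<char16_t>(0xDC00 + (cp & 0x3FF)) }, 2 };
}

// The "for-loop around this": decode a code point, re-encode it, advance.
inline std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    const char* p = in.data();
    const char* end = p + in.size();
    while (p != end) {
        const decoded d = decode_utf8(p, end);
        const encoded16 e = encode_utf16(d.code_point);
        out.append(e.units, e.size);
        p += d.consumed;
    }
    return out;
}
```

Neither primitive takes a state argument or touches a locale, which is possible because UTF-8 and UTF-16 are stateless encodings. A caller that wants to avoid the heap entirely could skip `utf8_to_utf16` and write the loop against its own fixed buffer.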