Re: [Mingw-users] Idea/Discussion of unicode filepath support for C++ STL on Windows

> Doing this only for file-open functionality is a start, but it is a partial solution at best.  Applications that manipulate
> file names almost always need to do string processing with file names, like finding only the base part of a file name,
> constructing a full absolute file name from a directory, a file name, and an extension, comparing file names in 
> case-insensitive manner, etc.  All of this will become subtly broken if you use UTF-8 encoded strings, because 
> Windows locales cannot use UTF-8 as their codeset, which means functions like isalpha, isupper, strcasecmp, 
> strcoll, mbstowcs, etc. will not work for any non-ASCII character encoded as a UTF-8 sequence.

Agreed that it's a partial solution and that a full solution would be desirable. I argue that MinGW is already subtly
broken because std::locale("") doesn't work. On MSYS2 it crashes because LANG=en_US.UTF-8 by default.
On CMD it reports "C" when it should in fact be something with Latin-1 encoding, which is what my ACP is set to.
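
A minimal sketch of how the std::locale("") behaviour can be probed. The helper name user_locale_name is hypothetical, and the sketch assumes the failure surfaces as a std::runtime_error, which is what the standard specifies when a locale name cannot be resolved; the post does not say how the MSYS2 crash manifests.

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the name of the user's preferred locale, or
// an empty string if the C++ runtime cannot construct it.
// std::locale("") throws std::runtime_error when the environment names a
// locale the runtime does not support.
std::string user_locale_name() {
    try {
        return std::locale("").name();
    } catch (const std::runtime_error&) {
        return std::string();
    }
}
```

Printing user_locale_name() under CMD and under MSYS2 should show whether the runtime reports "C", a real locale name, or fails outright.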

The input from std::cin is encoded in the ACP in CMD (which is expected), which will break isalpha and company
for anything other than 7-bit ASCII unless std::locale is fixed. The input from std::cin is encoded as UTF-8 on MSYS2,
so we have breakage anyway, because we can't get the right locale and isalpha etc. are equally broken in the default
settings. What's even weirder is that std::wcin puts each UTF-8 byte into the low byte of a wchar_t and leaves the rest
zero. That is totally unexpected and a waste of space, and it is inconsistent with the UTF-16 encoding that WinAPI
expects for wchar_t.
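
To illustrate the difference, here is a rough sketch contrasting byte-wise widening (the std::wcin behaviour described above) with an actual UTF-8 to UTF-16 conversion. Both function names are hypothetical, char16_t stands in for the 16-bit Windows wchar_t so the sketch is portable, and the decoder assumes well-formed input.

```cpp
#include <cstddef>
#include <string>

// Byte-wise widening: each UTF-8 byte becomes one 16-bit unit with a zero
// high byte. This mirrors the std::wcin behaviour described above.
std::u16string widen_bytes(const std::string& utf8) {
    std::u16string out;
    for (unsigned char byte : utf8) out.push_back(static_cast<char16_t>(byte));
    return out;
}

// Minimal UTF-8 to UTF-16 decoder. Assumes well-formed input; a real
// implementation must reject overlong, truncated and surrogate sequences.
std::u16string utf8_to_utf16(const std::string& utf8) {
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp;
        std::size_t len;
        if (lead < 0x80)      { cp = lead;        len = 1; }
        else if (lead < 0xE0) { cp = lead & 0x1F; len = 2; }
        else if (lead < 0xF0) { cp = lead & 0x0F; len = 3; }
        else                  { cp = lead & 0x07; len = 4; }
        // Fold in the continuation bytes, six payload bits each.
        for (std::size_t k = 1; k < len && i + k < utf8.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
        i += len;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {  // outside the BMP: encode as a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

For the two-byte UTF-8 sequence C3 A9 (U+00E9), widen_bytes yields the two units 00C3 00A9, while utf8_to_utf16 yields the single unit 00E9 that WinAPI would expect.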

So I argue that the proposed change will not make the situation any worse, but rather fix one of the already broken
APIs. Alternatively, we could provide the wchar_t overloads that MSVC has and preserve the old behaviour for char.

> So if we want to make MinGW Unicode-compatible, we need to have locale-aware functions that support UTF-8,
> which means replacements for all of them, starting with 'setlocale'.  Anything less than that will get us semi-broken
> implementation full of caveats.
> I do agree that this is the right direction, though.  I just think that more than a single API needs to be fixed for it to 
> become a reliable feature.

Don't get me wrong, I would LOVE to see all the other broken APIs fixed and have MinGW be fully Unicode compatible.
However, I do believe that fixing the file-open APIs is low-hanging fruit that would help a lot of people, even if it isn't
full-blown Unicode compatibility. And as was said, it is a first step. For example, we have a particular case where we get
Unicode file paths from Java into a JNI DLL compiled with MinGW; we just want to pass these strings through and
open the file.
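
As a rough sketch of what that pass-through looks like when done by hand today: the helper fopen_utf8 below is hypothetical, not an existing MinGW API, and it assumes the path arrives as standard UTF-8. On Windows it converts to UTF-16 with MultiByteToWideChar and calls _wfopen; elsewhere it falls back to plain fopen.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <wchar.h>
#include <windows.h>
#endif

// Hypothetical helper (not part of MinGW): open a file whose path is given
// as a UTF-8 encoded string. Returns nullptr on failure.
std::FILE* fopen_utf8(const std::string& path, const std::string& mode) {
#ifdef _WIN32
    // Convert UTF-8 to UTF-16; an empty result signals a conversion error.
    auto widen = [](const std::string& s) -> std::wstring {
        if (s.empty()) return std::wstring();
        const int len = static_cast<int>(s.size());
        int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                    s.data(), len, nullptr, 0);
        if (n <= 0) return std::wstring();
        std::wstring w(static_cast<std::size_t>(n), L'\0');
        MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                            s.data(), len, &w[0], n);
        return w;
    };
    const std::wstring wpath = widen(path);
    const std::wstring wmode = widen(mode);
    if (wpath.empty() || wmode.empty()) return nullptr;
    return _wfopen(wpath.c_str(), wmode.c_str());
#else
    // Assumption: non-Windows systems accept UTF-8 paths directly.
    return std::fopen(path.c_str(), mode.c_str());
#endif
}
```

The proposal under discussion would make a wrapper like this unnecessary for the file-open case, because the runtime itself would do the conversion.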

Do you think there is a reasonable chance to get this into MinGW?
