
Re: [Mingw-users] Idea/Discussion of unicode filepath support for C++ STL on Windows




> Doing this only for file-open functionality is a start, but it is a partial solution at best.  Applications that manipulate
> file names almost always need to do string processing with file names, like finding only the base part of a file name,
> constructing a full absolute file name from a directory, a file name, and an extension, comparing file names in 
> case-insensitive manner, etc.  All of this will become subtly broken if you use UTF-8 encoded strings, because 
> Windows locales cannot use UTF-8 as their codeset, which means functions like isalpha, isupper, strcasecmp, 
> strcoll, mbstowcs, etc. will not work for any non-ASCII character encoded as a UTF-8 sequence.

Agreed that it's a partial solution and that a full solution would be desirable. I would argue that MinGW is already subtly
broken because std::locale("") doesn't work. On MSYS2 it crashes because LANG=en_US.UTF-8 is set by default.
In CMD it reports "C" when it should in fact report a locale with Latin-1 encoding, which is what my ACP is set to.
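
To make the std::locale("") problem concrete, here is a minimal sketch of a defensive wrapper. It assumes the crash is an uncaught std::runtime_error, which is what the std::locale constructor throws for a name it cannot resolve; whether that is exactly what happens on MSYS2 is an assumption. The helper name user_locale_or_classic is hypothetical.

```cpp
#include <locale>
#include <stdexcept>

// Hypothetical helper: try the user's preferred locale and fall back to the
// classic "C" locale if the runtime cannot construct it.
std::locale user_locale_or_classic() {
    try {
        return std::locale("");          // may throw std::runtime_error
    } catch (const std::runtime_error&) {
        return std::locale::classic();   // always available
    }
}
```

This only avoids the crash. It does not fix the underlying issue that the returned locale may not match the encoding of the input.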

The input from std::cin is encoded in the ACP in CMD (which is expected), which breaks isalpha and company
for anything other than 7-bit ASCII unless std::locale is fixed. The input from std::cin is encoded in UTF-8 on MSYS2,
so we have breakage anyway: we can't get the right locale, and isalpha etc. are equally broken in the default
settings. What's even weirder is that std::wcin puts each UTF-8 byte into the low byte of a wchar_t, leaving the rest
zero. That is totally unexpected and a waste of space, and it is inconsistent with WinAPI, which expects wchar_t to
be encoded as UTF-16.
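
The isalpha breakage can be illustrated with a small sketch. It assumes the program is still in the default "C" locale (no setlocale call has been made), where only the 52 ASCII letters are alphabetic. The helper name count_alpha_bytes is hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: count the bytes that std::isalpha accepts under the
// current C locale. Each byte is passed as unsigned char, as <cctype> requires.
std::size_t count_alpha_bytes(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if (std::isalpha(c)) {
            ++n;
        }
    }
    return n;
}
```

For the UTF-8 string "caf\xC3\xA9" ("café"), this counts only the three ASCII letters: the two bytes that encode "é" are classified individually, and neither is a letter in the "C" locale.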

So I argue that the proposed change will not make the situation any worse, but will rather fix one of the already broken
APIs. Alternatively, we could provide the wchar_t overloads that MSVC has and preserve the old behaviour for char.
 
> So if we want to make MinGW Unicode-compatible, we need to have locale-aware functions that support UTF-8,
> which means replacements for all of them, starting with 'setlocale'.  Anything less than that will get us semi-broken
> implementation full of caveats.
>
> I do agree that this is the right direction, though.  I just think that more than a single API needs to be fixed for it to 
> become a reliable feature.

Don't get me wrong, I would LOVE to see all the other broken APIs fixed and have MinGW be fully Unicode compatible.
However, I do believe that fixing the file-open APIs is low-hanging fruit that would help a lot of people, even if it isn't
full-blown Unicode compatibility. And, as was said, it is a first step. For example, we have a particular case where we get
Unicode file paths from Java into a JNI DLL compiled with MinGW; we just want to pass these strings through and
open the file.
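
A rough sketch of what such a pass-through could look like, assuming the path arrives as UTF-8. The names utf8_to_utf16 and fopen_utf8 are hypothetical and not part of MinGW or the STL. The decoder is hand-written so the sketch stays portable, and it does only minimal validation. On Windows it relies on _wfopen from the Microsoft C runtime, which MinGW may hide in strict ANSI mode.

```cpp
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>
#ifdef _WIN32
#include <wchar.h>   // _wfopen
#endif

// Hypothetical helper: decode UTF-8 into UTF-16 code units. Rejects bad lead
// bytes, truncated sequences, bad continuation bytes and code points above
// U+10FFFF; overlong forms and encoded surrogates are not checked.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;   // number of continuation bytes expected
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (in.size() - i <= extra)
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;
        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of range");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;       // encode as a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Hypothetical wrapper: open a file whose path is given as UTF-8.
std::FILE* fopen_utf8(const std::string& path, const char* mode) {
#ifdef _WIN32
    // wchar_t is 16 bits wide on Windows, so the code units copy over directly.
    std::u16string p = utf8_to_utf16(path);
    std::u16string m = utf8_to_utf16(mode);
    std::wstring wp(p.begin(), p.end());
    std::wstring wm(m.begin(), m.end());
    return _wfopen(wp.c_str(), wm.c_str());
#else
    return std::fopen(path.c_str(), mode);
#endif
}
```

A JNI DLL would typically obtain the UTF-16 string from Java directly, in which case the conversion step is unnecessary and only the wide-character open call matters. The sketch covers the case where the path has already been turned into a char string.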

Do you think there is a reasonable chance to get this into MinGW?


_______________________________________________
MinGW-users mailing list
MinGW-users@xxxxxxxxxxxxxxxxxxxxx

This list observes the Etiquette found at 
http://www.mingw.org/Mailing_Lists.
We ask that you be polite and do the same.  Disregard for the list etiquette may cause your account to be moderated.

_______________________________________________
You may change your MinGW Account Options or unsubscribe at:
https://lists.sourceforge.net/lists/listinfo/mingw-users
Also: mailto:mingw-users-request@xxxxxxxxxxxxxxxxxxxxx?subject=unsubscribe