Re: [Mingw-users] Idea/Discussion of unicode filepath support for C++ STL on Windows

> From: Emily Leiviskä <emily.leiviska@xxxxxxxxxxxxxxx>
> Date: Fri, 7 Oct 2016 07:59:18 +0000
> #if defined _WIN32
>     auto __inlen = strlen(__file_name) + 1; // Include the terminating null byte in the conversion
>     wchar_t* __buffer = new wchar_t [__inlen]; // A UTF-8 string yields at most as many UTF-16 code units as it has bytes.
>     if(0 == MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, __file_name, __inlen, __buffer, __inlen)){
>         delete [] __buffer;
>         set_fail();
>         return;
>     }
>     _M_c_file = _wfopen(__buffer, __c_mode); // NB: _wfopen takes a wchar_t* mode string, so a narrow __c_mode must be widened first
>     delete [] __buffer;
>     if(_M_c_file)
> #elif defined _GLIBCXX_USE_LFS
>
> If I'm not mistaken, this change would make any function that accepts a const char* filename
> UTF-8 aware. The downside is that it changes the current (undocumented and unspecified)
> behaviour from using the current Active Code Page for character encoding in const char* filenames to
> using UTF-8.
> This might break some applications that rely on this undocumented feature; however, it might also fix
> some applications that currently assume that fstreams etc. are UTF-8 capable, as they are on Linux.
> I'm looking for comments: what do you think of such a change, and is it worth trying
> to get it into MinGW (or upstream?).
> Would you consider it justified to accept some (hopefully minor) breakage of undocumented features wrt
> ACP paths in exchange for UTF-8 support? That would be in line with Microsoft's recommendation to
> use UTF-8 or UTF-16 when possible.
> If not, would you consider it OK if it could be enabled by setting the locale or codepage? For example, if
> std::locale() contains "UTF-8", then use the above conversion; otherwise use the old behaviour?
> The standard locale on startup is "C" on Windows AFAICT. Other ideas?
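
To make the locale-based opt-in above concrete, here is a minimal
sketch of the decision it implies.  The helper names
(locale_name_requests_utf8, use_utf8_file_names) are hypothetical and
are not part of libstdc++ or MinGW, and matching on the substring
"UTF-8" is only an assumption about how such a locale would be named.

```cpp
#include <cassert>
#include <locale>
#include <string>

// Hypothetical helper: true when a locale name such as "en_US.UTF-8"
// asks for UTF-8.  Only the exact substring "UTF-8" is recognised.
bool locale_name_requests_utf8(const std::string& name)
{
    return name.find("UTF-8") != std::string::npos;
}

// Hypothetical decision point: use the UTF-8 conversion path only when
// the global C++ locale opts in; otherwise keep the old ACP behaviour.
bool use_utf8_file_names()
{
    // std::locale() is a copy of the current global locale, which is
    // the classic "C" locale until the program changes it.
    return locale_name_requests_utf8(std::locale().name());
}

// Small usage example.
void demo_locale_opt_in()
{
    assert(locale_name_requests_utf8("en_US.UTF-8"));
    assert(!locale_name_requests_utf8("C"));
}
```

With the startup locale left at "C", use_utf8_file_names() returns
false, so existing programs would keep the current behaviour unless
they explicitly install a UTF-8 locale.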

Doing this only for file-open functionality is a start, but it is a
partial solution at best.  Applications that manipulate file names
almost always need to do string processing with file names, like
finding only the base part of a file name, constructing a full
absolute file name from a directory, a file name, and an extension,
comparing file names in a case-insensitive manner, etc.  All of this
will become subtly broken if you use UTF-8 encoded strings, because
Windows locales cannot use UTF-8 as their codeset, which means
functions like isalpha, isupper, strcasecmp, strcoll, mbstowcs,
etc. will not work for any non-ASCII character encoded as a UTF-8
sequence.
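
As an illustration of the kind of breakage meant here, the following
sketch compares two file names case-insensitively one byte at a time,
which is roughly what a strcasecmp-style routine does.  The helper
name equal_ignoring_case_bytewise is hypothetical, and the example
assumes the default "C" locale, where toupper only maps a-z.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: case-insensitive comparison that treats each
// byte as one character, as the single-byte ctype functions do.
bool equal_ignoring_case_bytewise(const std::string& a, const std::string& b)
{
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        int ca = std::toupper(static_cast<unsigned char>(a[i]));
        int cb = std::toupper(static_cast<unsigned char>(b[i]));
        if (ca != cb)
            return false;
    }
    return true;
}

// Small usage example.
void demo_bytewise_compare()
{
    // Pure ASCII names compare equal, as expected.
    assert(equal_ignoring_case_bytewise("README.TXT", "readme.txt"));

    // U+00C9 (E with acute) is C3 89 in UTF-8 and U+00E9 (e with acute)
    // is C3 A9.  The second bytes are not letters to toupper, so two
    // names that differ only in letter case compare as different.
    assert(!equal_ignoring_case_bytewise("\xC3\x89.txt", "\xC3\xA9.txt"));
}
```

The ASCII case works, but the accented pair is reported as different
even though the two names differ only in letter case.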

So if we want to make MinGW Unicode-compatible, we need to have
locale-aware functions that support UTF-8, which means replacements
for all of them, starting with 'setlocale'.  Anything less than that
will get us a semi-broken implementation full of caveats.

I do agree that this is the right direction, though.  I just think
that more than a single API needs to be fixed for it to become a
reliable feature.
