Re: Cygwin fails to utilize Unicode replacement character
- Date: Tue, 4 Sep 2018 23:43:16 +0200
- From: Thomas Wolff <towo@xxxxxxxx>
- Subject: Re: Cygwin fails to utilize Unicode replacement character
On 04.09.2018 at 21:53, Steven Penny wrote:
Traditionally, many terminals used to display the DEL character as a
checkered block, which is more or less the MEDIUM SHADE.
On Tue, 4 Sep 2018 20:41:48, Thomas Wolff wrote:
the .notdef glyph is not an appropriate indication of illegal
encoding (like broken UTF-8 bytes)
true, but neither is U+2592. as far as i know U+2592 is not defined
anywhere as being a representation of anything other than "MEDIUM SHADE".
This makes the glyph appear somewhat "erroneous" by convention.
Corinna originally added it in 2009:
with no justification of why it was chosen that i can tell.
Justification is traditional usage of the symbol as described above.
it was actually changed from U+FFFD to U+2592 in 2009:
with actually a good reason, which was to avoid ambiguity with fonts
that do not have U+FFFD. but again, no reason why U+2592 was chosen. i personally
see both sides of the argument but i tend to land on the side of any standards
Here is the standard for U+FFFD:
FFFD � Replacement Character
• used to replace an incoming character whose value is
unknown or unrepresentable in Unicode
if we were to use something other than U+FFFD, I would propose U+25A1,
as it is also defined by Unicode:
25A1 □ White Square
• may be used to represent a missing ideograph
Quoting yourself from your other response:
U+2592 MEDIUM SHADE is *only* used in cases of invalid UTF-8. In case
of missing character - the ".notdef" glyph is used
This is my point. We have two use cases here:
invalid code point -> MEDIUM SHADE
valid code point with no glyph in font -> .notdef glyph -> WHITE SQUARE
Now if you switch to FFFD REPLACEMENT CHARACTER for invalid code point,
and considering that it does not exist in most actual fonts and that the
console does not apply font fallback, it will resolve to WHITE SQUARE, thus:
folding the two different use cases into the same appearance,
which is bad.
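To illustrate the folding, here is a minimal Python sketch. It is not the Cygwin console code; the names `decode`, `render`, `NOTDEF` and the `medium_shade` error handler are hypothetical, and the "font" is just a set of characters it is assumed to cover. WHITE SQUARE stands in for the font's .notdef glyph, and no font fallback is applied.

```python
import codecs

MEDIUM_SHADE = "\u2592"
REPLACEMENT_CHARACTER = "\ufffd"
NOTDEF = "\u25a1"  # WHITE SQUARE, standing in for the font's .notdef glyph


def _shade_invalid(err):
    """Decode error handler: emit MEDIUM SHADE for each invalid byte sequence."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    # Return the substitute text and the position at which decoding resumes.
    return (MEDIUM_SHADE, err.end)


# "medium_shade" is a made-up handler name used only in this sketch.
codecs.register_error("medium_shade", _shade_invalid)


def decode(data, use_medium_shade):
    """Decode UTF-8 input, marking invalid bytes with U+2592 or U+FFFD."""
    errors = "medium_shade" if use_medium_shade else "replace"
    return data.decode("utf-8", errors=errors)


def render(text, font_glyphs):
    """Show each character if the font covers it, else the .notdef stand-in."""
    return "".join(ch if ch in font_glyphs else NOTDEF for ch in text)


if __name__ == "__main__":
    # Assumed font: covers "a" and MEDIUM SHADE, but neither U+FFFD nor U+4E2D.
    font = {"a", MEDIUM_SHADE}
    # One valid letter, one invalid byte, one valid code point without a glyph.
    data = b"a\xff" + "\u4e2d".encode("utf-8")
    print(render(decode(data, use_medium_shade=True), font))
    print(render(decode(data, use_medium_shade=False), font))
```

With MEDIUM SHADE as the marker, the invalid byte and the missing glyph come out as two different symbols. With U+FFFD as the marker and a font that lacks it, both come out as the .notdef stand-in.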