
Re: Need help with multibyte UTF-8 characters

On 2017-12-12 12:42, Thomas Taylor wrote:
> I believe that Cygwin displays certain UTF-8 characters incorrectly.  To see the
> problem, first save the attached "utf-8_test.sed" text file to your desktop. 
> Then run "mintty," and set its options by right clicking in its title bar,
> selecting "Options" and then "Text."  On the Text page set "Locale" to "en_US"
> and "Character set" to "UTF-8," and then "Save."  Now exit and restart mintty. 
> Change directory to your desktop and run the editor "vim" on the utf-8_test.sed
> file.  Once inside vim do a ":set fileencoding=utf-8".  You should now see that
> vim displays correctly a sample of one-, two-, and three-byte UTF-8 character
> encodings in the test file.  Vim fails, however, on the three-byte encodings for
> the "en" dash, the "em" dash, and the ellipsis, each of which displays
> incorrectly as a filled-in rectangle.  Now exit vim and do a "less" or "cat" on
> the utf-8_test.sed file.  You should see most of the sample UTF-8 encoded
> characters displayed correctly, except once again for the en dash, em dash, and
> ellipsis.  So it looks like a problem in the underlying Cygwin run-time
> libraries rather than in vim, less, or cat.  I haven't tested this on four-byte
> UTF-8 character encodings, but assume Cygwin will have similar problems.

Like many others, I see no problems here: all of the UTF-8 characters display
correctly in gvim/X, vim, less, and cat when run from mintty.
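
To rule out the test file itself, it may help to compare its bytes against the
expected UTF-8 encodings of the three characters reported as broken. The
following is only a minimal sketch, assuming those characters are U+2013 (en
dash), U+2014 (em dash), and U+2026 (ellipsis); the helper names utf8_hex and
describe are made up for illustration, and the attached utf-8_test.sed is not
read or reproduced here.

```python
import unicodedata

# Assumed code points for the characters named in the report:
# en dash, em dash, and horizontal ellipsis.
CHARS = ["\u2013", "\u2014", "\u2026"]


def utf8_hex(ch):
    """Return the UTF-8 encoding of ch as space-separated hex bytes."""
    return " ".join(f"{b:02x}" for b in ch.encode("utf-8"))


def describe(ch):
    """Return the code point, Unicode name, and UTF-8 bytes of ch."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}: {utf8_hex(ch)}"


if __name__ == "__main__":
    for ch in CHARS:
        print(describe(ch))
```

If a hex dump of the file shows the same three-byte sequences (e2 80 93,
e2 80 94, e2 80 a6), the file is encoded as expected and the difference is more
likely in the terminal setup than in the file.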

Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple