Re: Need help with multibyte UTF-8 characters

Greetings, Thomas Taylor!

> I believe that Cygwin displays certain UTF-8 characters incorrectly.  To 
> see the problem, first save the attached "utf-8_test.sed" text file to 
> your desktop. 

First, your "NBSP" is actually U+23B5, not U+00A0: http://www.fileformat.info/info/unicode/char/23b5/index.htm
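
One way to check which code point a character in a file really is, sketched in Python. This is only an illustration and not part of the original report; the helper name `describe` is made up here.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return the code point, Unicode name, and UTF-8 bytes of one character."""
    name = unicodedata.name(ch, "<unnamed>")
    utf8 = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))
    return f"U+{ord(ch):04X} {name} [{utf8}]"

if __name__ == "__main__":
    # U+00A0 is the real no-break space; U+23B5 is the character linked above.
    for ch in ("\u00a0", "\u23b5"):
        print(describe(ch))
```

A real NBSP encodes to two bytes (C2 A0), while U+23B5 encodes to three (E2 8E B5), so the two are easy to tell apart in a hex dump as well.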

> Then run "mintty," and set its options by right clicking
> in its title bar, selecting "Options" and then "Text." 

I just leave those options empty.

> On the Text page
> set "Locale" to "en_US" and "Character set" to "UTF-8," and then 
> "Save."  Now exit and restart mintty.  Change directory to your desktop 
> and run the editor "vim" on the utf-8_test.sed file.  Once inside vim do 
> a ":set fileencoding=utf-8".  You should now see that vim displays 
> correctly a sample of one-, two-, and three-byte UTF-8 character 
> encodings in the test file.  Vim fails, however, on the three-byte 
> encodings for the "en" dash, the "em" dash, and the ellipsis, each of 
> which displays incorrectly as a filled-in rectangle.  Now exit vim and 
> do a "less" or "cat" on the utf-8_test.sed file.  You should see most of 
> the sample UTF-8 encoded characters displayed correctly, except once 
> again for the en dash, em dash, and ellipsis. 

Everything displays correctly here, using Lucida Console 11pt.
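
To separate an encoding problem from a font or rendering problem, it may help to confirm that the file really contains the expected three-byte sequences. The following is a minimal sketch, assuming the file is passed as the first command-line argument; `SEQUENCES` and `count_sequences` are hypothetical names used only for this illustration.

```python
# Three-byte UTF-8 sequences for the characters named in the report.
SEQUENCES = {
    "en dash (U+2013)": b"\xe2\x80\x93",
    "em dash (U+2014)": b"\xe2\x80\x94",
    "ellipsis (U+2026)": b"\xe2\x80\xa6",
}

def count_sequences(data: bytes) -> dict:
    """Count how often each expected byte sequence occurs in data."""
    return {label: data.count(seq) for label, seq in SEQUENCES.items()}

if __name__ == "__main__":
    import sys
    # Pass the file to inspect, e.g. utf-8_test.sed from the report.
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as fh:
            for label, n in count_sequences(fh.read()).items():
                print(f"{label}: {n}")
```

If the counts are non-zero and the characters still show up as filled-in rectangles, the bytes are fine and the selected terminal font is the more likely suspect.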

> So it looks like a problem in the underlying Cygwin run-time libraries
> rather than in vim, less, or cat.  I haven't tested this on four-byte UTF-8
> character encodings, but assume Cygwin will have similar problems.

I don't have a good console font for four-byte characters, but I presume they
will be displayed just fine.
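
For anyone who wants to try the four-byte case, here is a small sketch that prints the UTF-8 bytes of one sample character per encoding length. The sample characters are arbitrary choices for illustration, and `utf8_bytes` is a hypothetical helper name.

```python
def utf8_bytes(ch: str) -> str:
    """Return the UTF-8 encoding of one character as spaced hex bytes."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))

# One sample per encoding length: 1, 2, 3, and 4 bytes.
SAMPLES = ["A", "\u00e9", "\u2014", "\U0001F600"]

if __name__ == "__main__":
    for ch in SAMPLES:
        print(f"U+{ord(ch):04X}: {utf8_bytes(ch)}")
```

Whether the four-byte sample renders in the terminal still depends on the font having a glyph for it.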

With best regards,
Andrey Repin
Thursday, December 14, 2017 21:59:07

Sorry for my terrible English...
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple