Re: Need help with multibyte UTF-8 characters

On 2017-12-11 16:36, Thomas Taylor wrote:
> Thank you for your advice on setting my locale to en_US.UTF-8.  Unfortunately,
> Cygwin still seems to have trouble displaying some three-byte UTF-8 encoded
> characters correctly.  For example, see the following snippet from a "sed"
> file.  This file attempts to convert XML-encoded filenames to UTF-8.  As you can
> see, it converts one- and two-byte encodings correctly, but fails on some
> three-byte encodings (the en dash, the em dash, and the ellipsis, all of which
> are displayed as a filled-in rectangle):
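
For reference, the three characters reported as failing are U+2013 (en dash), U+2014 (em dash) and U+2026 (horizontal ellipsis). Each takes three bytes in UTF-8. The minimal Python sketch below is an illustration and not from the original thread; the helper name utf8_hex is hypothetical. It prints the expected byte sequences so you can compare them with a hex dump of the script. It also shows that Python's html.unescape decodes both decimal and hex XML character references to the same code points.

```python
# Sketch: print the UTF-8 byte sequences for the characters reported as
# displaying incorrectly, and decode XML numeric character references.
import html

# Code points named in the report.
CHARS = {
    "en dash": "\u2013",
    "em dash": "\u2014",
    "ellipsis": "\u2026",
}

def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of ch as space-separated hex bytes."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))

for name, ch in CHARS.items():
    print(f"{name:9} U+{ord(ch):04X} -> {utf8_hex(ch)}")

# Decimal (&#8211;) and hex (&#x2014;) references decode to the same
# code points as the literal characters above.
print(html.unescape("a&#8211;b&#x2014;c&#8230;"))
```

If the script's bytes for these characters do not match E2 80 93, E2 80 94 and E2 80 A6, the file is not being saved as UTF-8.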

Going back to first principles: what encoding is your script saved in, and
what locale is it run under?
How many lines, words, characters, and bytes are in your script?
	$ wc -lwmc ...
What does vim report for that script:
	:set enc? tenc? fenc? fencs? eol? bomb?
What locale does sed run under? Check with:
	$ locale
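
The byte-level checks above can be approximated in a short Python sketch, assuming the script can be read as raw bytes. The helper name inspect_bytes and the file names below are hypothetical. Its counts only approximate wc and vim, because Python's whitespace splitting and vim's 'eol' handling differ in edge cases. A file that is not valid UTF-8 is reported with the offset of its first invalid byte. For instance, a file saved as Windows-1252 stores the en dash as the single byte 0x96, which cannot start a UTF-8 sequence.

```python
# Sketch: rough Python equivalents of some checks asked for above
# (wc -lwmc counts, and vim's 'bomb' and 'eol' settings).
import sys

BOM = b"\xef\xbb\xbf"  # UTF-8 byte order mark

def inspect_bytes(data: bytes) -> dict:
    """Summarize encoding-related properties of a file's raw bytes."""
    info = {
        "bytes": len(data),              # cf. wc -c
        "lines": data.count(b"\n"),      # wc -l counts newline bytes
        "bom": data.startswith(BOM),     # cf. vim :set bomb?
        "eol": data.endswith(b"\n"),     # cf. vim :set eol?
    }
    try:
        text = data.decode("utf-8")
        info["utf8"] = True
        info["chars"] = len(text)          # cf. wc -m in a UTF-8 locale
        info["words"] = len(text.split())  # approximately wc -w
    except UnicodeDecodeError as err:
        info["utf8"] = False
        info["bad_offset"] = err.start     # position of first invalid byte
    return info

if __name__ == "__main__":
    # Usage: python3 check_utf8.py FILE...
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, inspect_bytes(f.read()))
```

Run it as, say, `python3 check_utf8.py yourscript.sed` and compare the result with what wc and vim report.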

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple