Re: Issues with width of emoji
- Date: Fri, 21 Sep 2018 19:43:13 +0200
- From: Thomas Wolff <towo@xxxxxxxx>
On 21.09.2018 at 03:42, Pokechu22 wrote:
Some of your problem descriptions remain a bit obscure, e.g. which
recent changes have caused which problems...
Hello! For a while I've had issues with emoji and cygwin, but due to
some recent configuration changes on my end it's gotten to the point
where it's actively causing problems.
What are "their escapes"? Emojis are encoded in Unicode directly, not
needing any escapes then.
My specific case involves running weechat on my Raspberry Pi, which I
connect to with `mosh pi -- screen -D -RR weechat
/usr/local/bin/weechat-curses`. When someone used an emoji on IRC,
the entire screen would get messed up in some cases, as things got
misaligned (an example of this: https://i.imgur.com/V7D6jPc.png).
Previously I had a script that converted emoji into their escapes,
Unicode does not define any emojis in the range Mathematical
Alphanumeric Symbols (U+1D400-U+1D7FF).
but that recently started misbehaving; even with that script there were
other unicode characters such as the mathematical alphanumeric symbols
that caused the issue too; I'm still going to refer to these as emoji
because I most commonly have this problem with emoji and I don't have
a good name otherwise.
I initially assumed that this was a problem with mosh on the pi, what
with the pi being an ARM device. However, after later investigation,
it turns out that it's a cygwin problem. Some different cases where
things behave weirdly:
* Typing an emoji and then pressing backspace twice ends up deleting
the emoji and the character before visually, but the character before
isn't actually deleted (e.g. echo hi<emoji> then backspace twice still
See your own conclusion below.
This may be related to different Unicode versions. Width for many emojis
changed from 1 to 2 in Unicode 9 (I think).
* Running mosh, even as a loopback (`mosh --local ::1`), shows 2
characters when the emoji is typed
* Emoji behave incorrectly when pasted into nano
* curses apps (which include mosh and nano) write a 2-wide space for
emoji, as can be seen in this script
This is only 1 character wide on my pi.
So please describe how you connect when the same test cases behave
* There are no problems when using SSH, at least to my pi, interestingly.
Yes, and of different applications. Do you actually run the cygwin
terminal or the cygwin console for your test cases?
* Python refuses to create a ctypes.c_wchar containing an emoji, but
considers the len of a string with a single emoji to be 1. On my pi
it creates a c_wchar properly.
I think that most of the desyncs and other weird things I've been
getting are a result of different systems disagreeing about how wide
the character should be;
Yes, as UTF-16 may be involved, which represents each non-BMP character
as a pair of "surrogate" code units.
It might be helpful to repeat all observations with other, non-emoji,
non-BMP characters, in order to isolate the effects.
that makes the most sense at least.
Alternatively, it might be an issue with the character being
represented as multiple characters; as far as I can tell there are
only problems with characters outside of the basic multilingual plane
(i.e. value >= 0x10000).
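
To help isolate the effect, here is a minimal Python sketch that counts how many 16-bit UTF-16 code units a character needs and compares an emoji with a non-emoji, non-BMP character. The helper name `utf16_units` and the sample characters are chosen for illustration only; the `sizeof` result depends on the platform's `wchar_t`, so no particular value is claimed here.

```python
import ctypes

def utf16_units(ch: str) -> int:
    """Return the number of 16-bit code units needed to encode ch in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2

# Illustrative samples (assumption: any BMP / non-BMP characters would do).
samples = {
    "BMP letter": "A",                        # U+0041
    "emoji": "\U0001F600",                    # U+1F600 GRINNING FACE
    "math bold A (non-emoji)": "\U0001D400",  # U+1D400
}

for label, ch in samples.items():
    # len() counts code points in Python 3, so it stays 1 even when
    # UTF-16 needs a surrogate pair (two code units).
    print(f"{label}: U+{ord(ch):04X} len={len(ch)} utf16_units={utf16_units(ch)}")

# Size of the platform's wchar_t in bytes; a 2-byte wchar_t cannot hold
# a non-BMP character in a single element.
print("sizeof(c_wchar) =", ctypes.sizeof(ctypes.c_wchar))
```

Running this on both machines should show whether the two systems differ in `wchar_t` size, which would be consistent with the `ctypes.c_wchar` observation above.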
One last thing I noticed: in ncurses, there seems to be some special
stuff to implement wcwidth and wcswidth, including a comment in
ncurses/widechar/widechars.c that says "MinGW has wide-character
functions, but they do not work correctly." As far as I can tell,
this is not enabled on cygwin; I'm not sure if it should be enabled or not.
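
As a rough cross-check of what column width each side might be assuming, the following Python sketch estimates a width from the East Asian Width property. This is only an approximation and not what ncurses or any system `wcwidth` actually does; the function name `approx_width` is made up for this example, and the answer depends on the Unicode version bundled with the Python build (reported by `unicodedata.unidata_version`).

```python
import unicodedata

def approx_width(ch: str) -> int:
    """Rough column-width guess for a single character.

    Assumption: combining marks take 0 columns, East Asian Wide/Fullwidth
    characters take 2, everything else takes 1. Real wcwidth
    implementations have more special cases.
    """
    if unicodedata.combining(ch):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1

print("Unicode data version:", unidata := unicodedata.unidata_version)
for ch in ("A", "\U0001F600", "\U0001D400"):
    print(f"U+{ord(ch):04X} east_asian_width={unicodedata.east_asian_width(ch)} "
          f"approx_width={approx_width(ch)}")
```

If the two machines print different widths for the same character, that would point at differing Unicode data rather than at the terminal itself.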
I hope I explained this well enough; it's a somewhat complicated issue
and I don't know all of the relevant unicode vocabulary.