Re: Issues with width of emoji

On 21.09.2018 at 03:42, Pokechu22 wrote:
Hello!  For a while I've had issues with emoji and cygwin, but due to
some recent configuration changes on my end it's gotten to the point
where it's actively causing problems.
Some of your problem descriptions remain a bit obscure, e.g. which recent changes have caused which problems...

My specific case involves running weechat on my raspberry pi, which I
connect to with `mosh pi -- screen -D -RR weechat
/usr/local/bin/weechat-curses`.  When someone used an emoji on IRC,
the entire screen would get messed up in some cases, as things got
misaligned (an example of this: https://i.imgur.com/V7D6jPc.png).
Previously I had a script that converted emoji into their escapes,
What are "their escapes"? Emojis are encoded directly in Unicode, so they do not need any escapes.

but that recently started misbehaving; even with that script there were
other unicode characters such as the mathematical alphanumeric symbols
characters (<https://en.wikipedia.org/wiki/Mathematical_Alphanumeric_Symbols>)
Unicode does not define any emojis in the Mathematical Alphanumeric Symbols block (U+1D400-U+1D7FF).

that caused the issue too; I'm still going to refer to these as emoji
because I most commonly have this problem with emoji and I don't have
a good name otherwise.

I initially assumed that this was a problem with mosh on the pi, what
with the pi being an ARM device.  However, after later investigation,
it turns out that it's a cygwin problem.  Some different cases where
things behave weirdly:

* Typing an emoji and then pressing backspace twice ends up deleting
the emoji and the character before visually, but the character before
isn't actually deleted (e.g. echo hi<emoji> then backspace twice still
prints hi)
See your own conclusion below.
* Running mosh, even as a loopback (`mosh --local ::1`), shows 2
characters when the emoji is typed
* Emoji behave incorrectly when pasted into nano
* curses apps (which include mosh and nano) write a 2-wide space for
emoji, as can be seen in this script
This is only 1 character wide on my pi.
This may be related to different Unicode versions. Width for many emojis changed from 1 to 2 in Unicode 9 (I think).
* There are no problems when using SSH, at least to my pi, interestingly.
So please describe exactly how you connect in each case, given that the same test cases behave differently.
* Python refuses to create a ctypes.c_wchar containing an emoji, but
considers the len of a string with a single emoji to be 1.  On my pi
it creates a c_wchar properly.
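To narrow down the c_wchar observation, it may help to compare the size of wchar_t on both systems. The following is only a minimal sketch; the helper names `utf16_units` and `fits_in_wchar` are made up for illustration, and the assumption (not verified here) is that the refusal on cygwin comes from a 16-bit wchar_t, which cannot hold a non-BMP code point in a single unit.

```python
import ctypes


def utf16_units(ch: str) -> int:
    """Number of 16-bit code units needed to encode ch in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2


def fits_in_wchar(ch: str) -> bool:
    """True if the single code point ch fits into one wchar_t on this platform."""
    bits = 8 * ctypes.sizeof(ctypes.c_wchar)
    return ord(ch) < (1 << bits)


emoji = "\U0001F600"  # a non-BMP character, used here as an example
print("len:", len(emoji))
print("UTF-16 code units:", utf16_units(emoji))
print("sizeof(wchar_t):", ctypes.sizeof(ctypes.c_wchar))
print("fits in one wchar_t:", fits_in_wchar(emoji))
```

Running this on both cygwin and the pi and comparing the last two lines should show whether the two Python builds differ in their wchar_t size.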

I think that most of the desyncs and other weird things I've been
getting are a result of different systems disagreeing about how wide
the character should be;
Yes, and also of different applications disagreeing with each other. Do you actually run the cygwin terminal or the cygwin console for your test cases?

that makes the most sense at least.
Alternatively, it might be an issue with the character being
represented as multiple characters; as far as I can tell there are
only problems with characters outside of the basic multilingual plane
(i.e. value >= 0x10000).
Yes, as UTF-16 may be involved, which represents each non-BMP character as a pair of two "surrogate" code units. It might be helpful to repeat all observations with other, non-emoji, non-BMP characters, in order to isolate the effects.
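For reference, the surrogate split can be computed by hand as follows. This is just an illustrative sketch of the standard UTF-16 arithmetic (the function name `to_surrogates` is made up); it shows why one non-BMP character may be seen as "two characters" by software that counts 16-bit units.

```python
from typing import Tuple


def to_surrogates(cp: int) -> Tuple[int, int]:
    """Split a non-BMP code point into its UTF-16 high and low surrogate."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a non-BMP code point")
    v = cp - 0x10000                 # 20-bit value
    high = 0xD800 + (v >> 10)        # upper 10 bits
    low = 0xDC00 + (v & 0x3FF)       # lower 10 bits
    return high, low


# An emoji and a (non-emoji) mathematical alphanumeric symbol
for cp in (0x1F600, 0x1D400):
    high, low = to_surrogates(cp)
    print("U+%04X -> %04X %04X" % (cp, high, low))
```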

One last thing I noticed: in ncurses, there seems to be some special
stuff to implement wcwidth and wcswidth, including a comment in
ncurses/widechar/widechars.c that says "MinGW has wide-character
functions, but they do not work correctly."  As far as I can tell,
this is not enabled on cygwin; I'm not sure if it should be enabled or not.
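To compare what different systems think about the width of a character, a rough approximation can be derived from the Unicode East Asian Width property. Note that this sketch is not what ncurses or the cygwin wcwidth actually do; the function `approx_width` is made up for illustration, and its result depends on the Unicode version bundled with the respective Python build, which is exactly the kind of disagreement suspected here.

```python
import unicodedata


def approx_width(ch: str) -> int:
    """Rough column width of a single character (approximation only)."""
    if unicodedata.combining(ch):
        return 0                     # combining marks take no column
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2                     # wide or fullwidth
    return 1


print("Unicode data version:", unicodedata.unidata_version)
for ch in ("a", "\u4e2d", "\U0001D400", "\U0001F600"):
    print("U+%04X width %d" % (ord(ch), approx_width(ch)))
```

If the value printed for U+1F600 differs between the two machines, that would support the assumption of mismatching width data.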

I hope I explained this well enough; it's a somewhat complicated issue
and I don't know all of the relevant unicode vocabulary.

