Issues with width of emoji




Hello!  For a while I've had issues with emoji and cygwin, but due to
some recent configuration changes on my end it's gotten to the point
where it's actively causing problems.

My specific case involves running weechat on my Raspberry Pi, which I
connect to with `mosh pi -- screen -D -RR weechat
/usr/local/bin/weechat-curses`.  When someone used an emoji on IRC,
the entire screen would get messed up in some cases, as things got
misaligned (an example of this: https://i.imgur.com/V7D6jPc.png).
Previously I had a script that converted emoji into their escapes, but
that recently started misbehaving.  Even with that script, other
unicode characters such as the mathematical alphanumeric symbols
(<https://en.wikipedia.org/wiki/Mathematical_Alphanumeric_Symbols>)
caused the issue too.  I'm still going to refer to all of these as
emoji because I most commonly have this problem with emoji and I don't
have a better name for them.

I initially assumed that this was a problem with mosh on the pi, what
with the pi being an ARM device.  However, after further
investigation, it turns out to be a cygwin problem.  Some different
cases where things behave weirdly:

* Typing an emoji and then pressing backspace twice visually deletes
both the emoji and the character before it, but the character before
isn't actually deleted (e.g. echo hi<emoji> then backspace twice still
prints hi)
* Running mosh, even as a loopback (`mosh --local ::1`), shows 2
characters when the emoji is typed
* Emoji behave incorrectly when pasted into nano
* curses apps (which include mosh and nano) write a 2-wide space for
emoji, as can be seen in this script
<https://gist.github.com/Pokechu22/45d19aa5e41ee6db00723f808ac4339e>.
This is only 1 character wide on my pi.
* Interestingly, there are no problems when using SSH, at least to my pi.
* Python refuses to create a ctypes.c_wchar containing an emoji, but
considers the len of a string with a single emoji to be 1.  On my pi
it creates a c_wchar properly.
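
The Python behavior in the last bullet can be reproduced with
something like the following.  This is only a minimal sketch: U+1F600
is just an example emoji, `probe_wchar` is a name made up for
illustration, and tying the failure to the size of wchar_t is an
assumption rather than something confirmed here.

```python
import ctypes

# Example code point outside the BMP (assumption: other emoji behave the same).
EMOJI = "\U0001F600"


def probe_wchar(ch):
    """Return (len(ch), sizeof(wchar_t), whether ctypes.c_wchar accepted ch)."""
    try:
        ctypes.c_wchar(ch)
        accepted = True
    except (TypeError, ValueError):
        # Raised when the character does not fit in a single wchar_t.
        accepted = False
    return len(ch), ctypes.sizeof(ctypes.c_wchar), accepted


if __name__ == "__main__":
    print(probe_wchar(EMOJI))
```

If the wchar_t guess is right, a system with a 4-byte wchar_t should
accept the emoji, while one with a 2-byte wchar_t (which is what
cygwin uses) should reject it even though len() still reports 1.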

I think that most of the desyncs and other weird things I've been
getting are a result of different systems disagreeing about how wide
the character should be; that makes the most sense at least.
Alternatively, it might be an issue with the character being
represented as multiple characters; as far as I can tell there are
only problems with characters outside of the basic multilingual plane
(i.e. code points >= 0x10000).
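
To illustrate the "multiple characters" idea: in UTF-16, any code
point at or above 0x10000 is stored as two 16-bit units (a surrogate
pair), while everything below that fits in one.  The sketch below
shows the split; the helper names are hypothetical, and whether
cygwin's terminal handling actually trips over this is still only a
guess.

```python
def utf16_code_units(ch):
    """Return the list of 16-bit UTF-16 code units used to encode ch."""
    data = ch.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def is_outside_bmp(ch):
    """True if ch is a single code point beyond the basic multilingual plane."""
    return ord(ch) >= 0x10000


if __name__ == "__main__":
    # An ASCII letter, an emoji, and a mathematical alphanumeric symbol.
    for ch in ("A", "\U0001F600", "\U0001D400"):
        units = [hex(u) for u in utf16_code_units(ch)]
        print(hex(ord(ch)), is_outside_bmp(ch), units)
```

Both the emoji and the mathematical alphanumeric symbol come out as
two units, which would fit with them causing the same problems.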

One last thing I noticed: in ncurses, there seems to be some
special-case code to implement wcwidth and wcswidth, including a comment in
ncurses/widechar/widechars.c that says "MinGW has wide-character
functions, but they do not work correctly."  As far as I can tell,
this is not enabled on cygwin; I'm not sure if it should be enabled or
not.
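
For comparing what different systems think the width should be, a
rough reference value can be computed from Python's unicodedata
tables.  This is a sketch under stated assumptions: `approx_width` is
a made-up helper, it is not how ncurses or cygwin's wcwidth is
implemented, and the answer depends on the Unicode version bundled
with the Python being used.

```python
import unicodedata


def approx_width(ch):
    """Rough column estimate for one code point.

    0 for combining marks, 2 for East Asian Wide/Fullwidth, otherwise 1.
    This is a simplification, not a full wcwidth replacement.
    """
    if unicodedata.combining(ch):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


if __name__ == "__main__":
    for ch in ("a", "\u4e2d", "\u0301", "\U0001F600", "\U0001D400"):
        print(hex(ord(ch)), approx_width(ch))
```

Running the same script on both machines and comparing against what
the terminal actually draws might help narrow down which side
disagrees.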

I hope I explained this well enough; it's a somewhat complicated issue
and I don't know all of the relevant unicode vocabulary.

--Poke

Attachment: cygcheck.out
Description: Binary data
