
Re: G_UTF8String: Boxed Type Proposal

On 03/19/2016 03:41 AM, Errol van de l'Isle wrote:
Just to add my two cents worth as a user of glibmm.

Glib::ustring uses g_utf8_pointer_to_offset() to obtain the length of
the string in characters in the method Glib::ustring::length(). The
method Glib::ustring::bytes() returns the length in bytes.

At no point does it store the number of UTF-8 characters as this would
be inefficient.
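
To illustrate the difference between the two lengths, here is a minimal sketch in plain C. It does not use GLib or glibmm; utf8_count_chars() is a hypothetical helper that assumes its input is already valid UTF-8, and it is not how Glib::ustring is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count code points in a NUL-terminated string that is ASSUMED to be
 * valid UTF-8. Every byte that is not a continuation byte (10xxxxxx)
 * starts a new code point. The whole string is scanned on each call,
 * so the cost is O(n) every time the character length is needed. */
static size_t utf8_count_chars(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For a string such as "h\xC3\xA9llo" the byte length from strlen() and the character count from the helper differ, because the second character occupies two bytes.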

Simple string manipulation, such as inserting a string or character or
concatenating, would require extra work. The string needs to
be checked that it is still valid UTF-8 before the length is updated.
The next issue is what to do when the string becomes invalid UTF-8.
Doing this for every string operation will have a performance
implication. Imagine doing this in a loop inserting a byte from a

Checking at the end of all the operations or handing it over to GTK to
deal with the problems will be more efficient and less of a headache.

Thank you, Errol.

I understand that it would be inefficient to validate the string each time.

I picked up on this fact from Matthias Clasen's first response in this thread (https://mail.gnome.org/archives/gtk-devel-list/2016-March/msg00014.html):

"Every string we pass around in GLib and GTK+, and every char * in their APIs is expected to be in utf8."

My response to this (https://mail.gnome.org/archives/gtk-devel-list/2016-March/msg00015.html):

"Here is the vision: Once raw string data - or gunichar value - has been passed and validated into the construction of a "G_UTF8String" structure, then contents of two-or-more of these can be easily combined without the need for additional measuring or validating."

It is inefficient for functions which need to know the code-point length of a utf8 string to have to calculate that value each time it is needed. This is the current state of affairs.

Some object classes - such as GtkEntryBuffer - store this value and update it as text is inserted or deleted. That is efficient. The fact that developers need to write equivalent code for each such class is inefficient.

It is these two inefficiencies which I am addressing.
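
A rough sketch of the idea in plain C follows. None of these names exist in GLib: Utf8Str, utf8_measure(), utf8_str_new(), utf8_str_concat() and utf8_str_clear() are all hypothetical, and the real proposal may look quite different. The point is only that validation and counting happen once, at construction, and that combining two already-validated values is a copy plus two additions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type: a validated UTF-8 string with both lengths cached. */
typedef struct {
    char  *data;     /* NUL-terminated, validated UTF-8 */
    size_t n_bytes;  /* length in bytes, excluding the NUL */
    size_t n_chars;  /* length in code points */
} Utf8Str;

/* Validate s and count bytes and code points in a single pass.
 * Returns 1 on success, 0 if s is not valid UTF-8. */
static int utf8_measure(const char *s, size_t *n_bytes, size_t *n_chars)
{
    /* Smallest code point allowed for 0..3 continuation bytes. */
    static const unsigned long min_cp[] = { 0, 0x80, 0x800, 0x10000 };
    const unsigned char *p = (const unsigned char *)s;
    size_t chars = 0;

    while (*p != 0) {
        unsigned long cp;
        int extra, i;

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return 0;               /* stray continuation or bad lead byte */
        p++;
        for (i = 0; i < extra; i++, p++) {
            if ((*p & 0xC0) != 0x80)
                return 0;            /* truncated sequence (also catches NUL) */
            cp = (cp << 6) | (*p & 0x3F);
        }
        /* Reject overlong forms, surrogates and values above U+10FFFF. */
        if (cp < min_cp[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
        chars++;
    }
    *n_bytes = (size_t)(p - (const unsigned char *)s);
    *n_chars = chars;
    return 1;
}

/* Construct from raw data: the only place where validation happens. */
static int utf8_str_new(const char *raw, Utf8Str *out)
{
    size_t nb, nc;

    assert(raw != NULL && out != NULL);
    if (!utf8_measure(raw, &nb, &nc))
        return 0;
    out->data = malloc(nb + 1);
    if (out->data == NULL)
        return 0;
    memcpy(out->data, raw, nb + 1);
    out->n_bytes = nb;
    out->n_chars = nc;
    return 1;
}

/* Combine two validated strings: no re-validation and no re-counting,
 * because the concatenation of two valid UTF-8 strings is valid UTF-8. */
static int utf8_str_concat(const Utf8Str *a, const Utf8Str *b, Utf8Str *out)
{
    assert(a != NULL && b != NULL && out != NULL);
    out->data = malloc(a->n_bytes + b->n_bytes + 1);
    if (out->data == NULL)
        return 0;
    memcpy(out->data, a->data, a->n_bytes);
    memcpy(out->data + a->n_bytes, b->data, b->n_bytes + 1);
    out->n_bytes = a->n_bytes + b->n_bytes;
    out->n_chars = a->n_chars + b->n_chars;
    return 1;
}

/* Release the buffer owned by a Utf8Str. */
static void utf8_str_clear(Utf8Str *s)
{
    free(s->data);
    s->data = NULL;
    s->n_bytes = s->n_chars = 0;
}
```

With something like this, a class that needs the character count would read n_chars instead of maintaining its own bookkeeping, and invalid input is rejected at the single point where raw data enters.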

gtk-devel-list mailing list