
Re: G_UTF8String: Boxed Type Proposal

On 03/17/2016 09:30 AM, Matthias Clasen wrote:
> Hi Randall,
>
> thanks for contributing!

Pleased to be of service! Looking forward to learning how folks work together in this community.
> I believe that you haven't found such a proposal because most people
> don't see much use in a separate boxed type for utf8 strings. Every
> string we pass around in GLib and GTK+, and every char * in their APIs
> is expected to be in utf8. The few exceptions to this rule are
> explicitly documented.

There already is GString. It dynamically allocates its contents while keeping track of the number of bytes required - but not the number of characters it contains.
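To make the distinction concrete, here is a minimal sketch in plain C of why the two counts differ. The helper name utf8_char_count is hypothetical, and the input is assumed to be valid UTF-8 already. Within GLib itself, g_utf8_strlen() answers the same question, while the len field of a GString reports bytes.

```c
#include <stddef.h>

/* Hypothetical helper for illustration; not part of GLib.
 * Counts UTF-8 characters in a NUL-terminated string by counting every
 * byte that is not a continuation byte (bit pattern 10xxxxxx).
 * Assumes the input has already been validated as UTF-8. */
size_t utf8_char_count(const char *s)
{
  size_t n = 0;

  for (; *s != '\0'; s++)
    if (((unsigned char) *s & 0xC0) != 0x80)
      n++;

  return n;
}
```

For the string "h\xc3\xa9llo" (héllo), strlen() reports 6 bytes while utf8_char_count() reports 5 characters, and every call has to walk the whole string again.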
> The main reason you mention for wanting such a type is to do away with
> the need for repeatedly calculating the character count. I think this
> falls into the same category as the length of the string in bytes - C
> doesn't have counted strings either, and expects you to just call
> strlen() over and over again. In practice, most strings we're handling
> are short enough for this to not be much of an issue.

For interactive applications which employ text-oriented widgets, there is a need to keep track of UTF-8 character lengths for rendering purposes - text selection, etc. Each time this need arises, code has to be written to manage it. Take a look at gtkentrybuffer.c for example. I see a case for core code that handles this bookkeeping once, rather than having it rewritten for each of these demands.
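As an illustration of the bookkeeping involved, a selection is expressed in character offsets while the buffer is addressed in bytes, so each operation needs a conversion. The sketch below is plain C with a hypothetical function name, and it assumes valid UTF-8 input. GLib's g_utf8_offset_to_pointer() does the corresponding job in GLib.

```c
#include <stddef.h>

/* Hypothetical helper for illustration; not part of GLib.
 * Returns the byte index of the character at position char_offset in a
 * valid, NUL-terminated UTF-8 string. If char_offset is past the end,
 * the index of the terminating NUL is returned. */
size_t utf8_offset_to_byte_index(const char *s, size_t char_offset)
{
  size_t i = 0;

  while (char_offset > 0 && s[i] != '\0') {
    i++;                                          /* step past the lead byte */
    while (((unsigned char) s[i] & 0xC0) == 0x80) /* and its continuation bytes */
      i++;
    char_offset--;
  }

  return i;
}
```

Each call walks the string from the start, which is the repeated cost a cached character count (or cached offsets) is meant to reduce.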
> GLib already provides a number of utilities for dealing with utf8
> strings in terms of characters, such as g_utf8_strlen,
> g_utf8_substring, g_utf8_find_next/prev_char. We can certainly discuss
> adding to that list, if there are glaring omissions.

As I mentioned above, there is GString with its limitations. My intent in presenting the possibility of "G_UTF8String" is to combine the dynamic allocation provided by GString with these very utilities you mention, employed in the background.

Here is the vision: Once raw string data, or a gunichar value, has been validated and passed into the construction of a "G_UTF8String" structure, the contents of two or more of these can be combined easily without any additional measuring or validating.
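A rough sketch of that idea in plain C follows. It is not the G_UTF8String code itself, which has not been posted yet: the type Utf8Str and every function name here are hypothetical, and the validation is deliberately simplified. The point it illustrates is that once both operands have been validated and measured at construction, appending only has to add the cached counts.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type for illustration only; not part of GLib. */
typedef struct {
  char  *data;     /* NUL-terminated UTF-8 bytes */
  size_t n_bytes;  /* length in bytes, excluding the NUL */
  size_t n_chars;  /* length in characters, cached at construction */
} Utf8Str;

/* Validate and measure s in one pass. Returns 1 and fills in both counts
 * if every sequence is structurally well formed, 0 otherwise.
 * Simplified: overlong encodings and surrogates are not rejected. */
int utf8_measure(const char *s, size_t *n_bytes, size_t *n_chars)
{
  const unsigned char *p = (const unsigned char *) s;
  size_t chars = 0;

  while (*p != '\0') {
    int extra;

    if (*p < 0x80)                extra = 0;
    else if ((*p & 0xE0) == 0xC0) extra = 1;
    else if ((*p & 0xF0) == 0xE0) extra = 2;
    else if ((*p & 0xF8) == 0xF0) extra = 3;
    else return 0;               /* stray continuation or invalid lead byte */

    for (p++; extra > 0; extra--, p++)
      if ((*p & 0xC0) != 0x80)
        return 0;                /* truncated sequence */
    chars++;
  }

  *n_bytes = (size_t) (p - (const unsigned char *) s);
  *n_chars = chars;
  return 1;
}

/* Construct from raw data. This is the only place where validation and
 * counting happen. Returns NULL on invalid input or allocation failure. */
Utf8Str *utf8_str_new(const char *s)
{
  size_t n_bytes, n_chars;
  Utf8Str *u;

  if (!utf8_measure(s, &n_bytes, &n_chars))
    return NULL;
  if ((u = malloc(sizeof *u)) == NULL)
    return NULL;
  if ((u->data = malloc(n_bytes + 1)) == NULL) {
    free(u);
    return NULL;
  }
  memcpy(u->data, s, n_bytes + 1);
  u->n_bytes = n_bytes;
  u->n_chars = n_chars;
  return u;
}

/* Append src to dst (they must be distinct objects). Both were validated
 * at construction, so the counts simply add. Returns 1 on success. */
int utf8_str_append(Utf8Str *dst, const Utf8Str *src)
{
  char *grown = realloc(dst->data, dst->n_bytes + src->n_bytes + 1);

  if (grown == NULL)
    return 0;
  memcpy(grown + dst->n_bytes, src->data, src->n_bytes + 1);
  dst->data = grown;
  dst->n_bytes += src->n_bytes;
  dst->n_chars += src->n_chars;
  return 1;
}

void utf8_str_free(Utf8Str *u)
{
  if (u != NULL) {
    free(u->data);
    free(u);
  }
}
```

A real implementation would presumably lean on g_utf8_validate() and GLib's allocators rather than the hand-rolled check and malloc() used here; those were swapped in only to keep the sketch free of dependencies.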

I have cloned a copy of glib-2.47.92. I am currently documenting the source code I have written.

I'll let you know when I have posted my first patch.

gtk-devel-list mailing list