Re: G_UTF8String: Boxed Type Proposal
- Date: Thu, 17 Mar 2016 10:39:12 -0400
- From: Randall Sawyer <srandallsawyer@xxxxxxxxxxx>
- Subject: Re: G_UTF8String: Boxed Type Proposal
On 03/17/2016 09:30 AM, Matthias Clasen wrote:
Pleased to be of service! Looking forward to learning how folks work
together in this community.
> thanks for contributing!
There already is GString. It dynamically allocates its contents while
keeping track of the number of bytes they require, but not of the number
of characters they contain.
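To illustrate the gap between the two counts, here is a minimal sketch in plain C. The helper name utf8_char_count is hypothetical (it is not a GLib function), and the sketch assumes its input is already valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of GLib: counts UTF-8 characters by
 * counting every byte that is not a continuation byte (10xxxxxx).
 * Assumes the input is valid, NUL-terminated UTF-8. */
static size_t utf8_char_count(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For a string such as "na" "\xc3\xaf" "ve" (the word "naive" with a diaeresis on the i), strlen() reports 6 bytes while this helper reports 5 characters. A byte-counted buffer stores only the first figure, so the second has to be recomputed by walking the string each time it is needed.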
> I believe that you haven't found such a proposal because most people
> don't see much use in a separate boxed type for utf8 strings. Every
> string we pass around in GLib and GTK+, and every char * in their APIs
> is expected to be in utf8. The few exceptions to this rule are
For interactive applications which employ text-oriented widgets, there
is a need to keep track of UTF-8 character counts for rendering
purposes such as text selection. Each time this need arises, the
bookkeeping code has to be written again. Take a look at
gtkentrybuffer.c for an example. I see a case for core code that handles
this overhead once, so that it does not have to be repeated for each of
these demands.
> The main reason you mention for wanting such a type is to do away with
> the need for repeatedly calculating the character count. I think this
> falls into the same category as the length of the string in bytes - C
> doesn't have counted strings either, and expects you to just call
> strlen() over and over again. In practice, most strings we're handling
> are short enough for this to not be much of an issue.
As I mentioned above, there is GString with its limitations. My intent
in presenting the possibility of "G_UTF8String" is to combine the
dynamic allocation provided by GString with the very utilities you
mention, employed in the background.
> GLib already provides a number of utilities for dealing with utf8
> strings in terms of characters, such as g_utf8_strlen,
> g_utf8_substring, g_utf8_find_next/prev_char. We can certainly discuss
> adding to that list, if there are glaring omissions.
Here is the vision: once raw string data, or a gunichar value, has been
validated and used to construct a "G_UTF8String" structure, the contents
of two or more of these structures can be combined easily, without any
additional measuring or validating.
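As a rough sketch of that idea, and not the code I am preparing to post, the following plain C shows a counted string that validates and measures once at construction and then appends by simple addition. The names Utf8Str, utf8_str_init, utf8_str_append and utf8_str_clear are hypothetical, and the validation is deliberately simplified: it checks lead and continuation bytes only, and does not reject overlong encodings or surrogates.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted UTF-8 string; not a GLib type. */
typedef struct {
    char  *data;     /* NUL-terminated UTF-8 bytes */
    size_t n_bytes;  /* length in bytes, excluding the NUL */
    size_t n_chars;  /* length in characters */
} Utf8Str;

/* Validate and measure raw input once. Returns 0 on success, -1 on
 * malformed input or allocation failure. Simplified check: lead byte
 * and continuation bytes only. */
static int utf8_str_init(Utf8Str *u, const char *raw)
{
    const unsigned char *p = (const unsigned char *)raw;
    size_t bytes = 0, chars = 0;

    while (p[bytes] != '\0') {
        unsigned char c = p[bytes];
        size_t len = c < 0x80 ? 1
                   : (c & 0xE0) == 0xC0 ? 2
                   : (c & 0xF0) == 0xE0 ? 3
                   : (c & 0xF8) == 0xF0 ? 4 : 0;

        if (len == 0)
            return -1;
        /* A NUL fails this test, so we never read past the end. */
        for (size_t i = 1; i < len; i++)
            if ((p[bytes + i] & 0xC0) != 0x80)
                return -1;
        bytes += len;
        chars++;
    }

    u->data = malloc(bytes + 1);
    if (u->data == NULL)
        return -1;
    memcpy(u->data, raw, bytes + 1);
    u->n_bytes = bytes;
    u->n_chars = chars;
    return 0;
}

/* Append src to dst. Both were validated at construction, so the
 * counts are simply added; nothing is re-scanned. dst and src must be
 * distinct objects. */
static int utf8_str_append(Utf8Str *dst, const Utf8Str *src)
{
    char *grown = realloc(dst->data, dst->n_bytes + src->n_bytes + 1);

    if (grown == NULL)
        return -1;
    memcpy(grown + dst->n_bytes, src->data, src->n_bytes + 1);
    dst->data = grown;
    dst->n_bytes += src->n_bytes;
    dst->n_chars += src->n_chars;
    return 0;
}

static void utf8_str_clear(Utf8Str *u)
{
    free(u->data);
    u->data = NULL;
    u->n_bytes = u->n_chars = 0;
}
```

The point of the design is that the cost of walking the bytes is paid once per input, in utf8_str_init. Every later combination is a memcpy plus two additions.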
I have cloned a copy of glib-2.47.92. I am currently documenting the
source code I have written.
I'll let you know when I have posted my first patch.
gtk-devel-list mailing list