Re: G_UTF8String: Boxed Type Proposal
- Date: Thu, 17 Mar 2016 14:18:29 -0400
- From: Randall Sawyer <srandallsawyer@xxxxxxxxxxx>
- Subject: Re: G_UTF8String: Boxed Type Proposal
On 03/17/2016 10:39 AM, Randall Sawyer wrote:
On 03/17/2016 09:30 AM, Matthias Clasen wrote:
Here is the vision: once raw string data, or a gunichar value, has
been validated and passed into the constructor of a "G_UTF8String"
structure, the contents of two or more of these can be easily
combined without the need for additional measuring or validating.
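
To make the idea above concrete, here is a minimal sketch in plain C. Nothing in it is GLib API: the type Utf8Str and the functions utf8str_new_measured, utf8str_concat and utf8str_free are hypothetical names chosen for illustration. It assumes each string was validated and measured once when it was wrapped, so combining two of them only needs to add the cached lengths.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type for illustration only; not part of GLib. */
typedef struct {
    char  *data;     /* NUL-terminated bytes, assumed already validated */
    size_t n_bytes;  /* length in bytes, excluding the terminator */
    size_t n_chars;  /* length in characters (code points) */
} Utf8Str;

/* Wrap a string whose byte and character lengths the caller already knows. */
Utf8Str *utf8str_new_measured(const char *s, size_t n_bytes, size_t n_chars)
{
    assert(s != NULL && n_chars <= n_bytes);
    Utf8Str *u = malloc(sizeof *u);
    if (u == NULL)
        return NULL;
    u->data = malloc(n_bytes + 1);
    if (u->data == NULL) {
        free(u);
        return NULL;
    }
    memcpy(u->data, s, n_bytes);
    u->data[n_bytes] = '\0';
    u->n_bytes = n_bytes;
    u->n_chars = n_chars;
    return u;
}

/* Concatenate without rescanning: both lengths are sums of cached values. */
Utf8Str *utf8str_concat(const Utf8Str *a, const Utf8Str *b)
{
    assert(a != NULL && b != NULL);
    Utf8Str *u = malloc(sizeof *u);
    if (u == NULL)
        return NULL;
    u->n_bytes = a->n_bytes + b->n_bytes;
    u->n_chars = a->n_chars + b->n_chars;
    u->data = malloc(u->n_bytes + 1);
    if (u->data == NULL) {
        free(u);
        return NULL;
    }
    memcpy(u->data, a->data, a->n_bytes);
    /* Copy b including its terminator. */
    memcpy(u->data + a->n_bytes, b->data, b->n_bytes + 1);
    return u;
}

void utf8str_free(Utf8Str *u)
{
    if (u != NULL) {
        free(u->data);
        free(u);
    }
}
```

The sketch relies on the fact that joining two valid UTF-8 byte sequences end to end always yields a valid sequence, so no second validation pass is needed after the copy.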
I believe that you haven't found such a proposal because most people
don't see much use in a separate boxed type for UTF-8 strings. Every
string we pass around in GLib and GTK+, and every char * in their APIs
is expected to be in UTF-8. The few exceptions to this rule are
GLib already provides a number of utilities for dealing with UTF-8
strings in terms of characters, such as g_utf8_strlen,
g_utf8_substring, g_utf8_find_next_char and g_utf8_find_prev_char. We
can certainly discuss adding to that list, if there are glaring omissions.
Alright Matthias, after your thoughtful response, I have come to the
following conclusion: When considering management of dynamically
allocated UTF-8 strings, there are actually two points to consider: 1)
whether the byte sequences are valid per IETF RFC 3629 Section 4, and
2) the number of distinct characters represented in the string vs. the
total number of bytes used to represent them.
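
As a rough illustration of those two points, the following plain C sketch checks a NUL-terminated string against the byte-sequence syntax of RFC 3629 Section 4 and, in the same pass, reports both the byte length and the character count. The function name utf8_measure is hypothetical and is not a GLib function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A continuation ("tail") byte has the bit pattern 10xxxxxx. */
static bool is_tail(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/*
 * Hypothetical helper: returns true if str is well-formed UTF-8 and, in
 * that case, stores its length in bytes and in characters.
 * On failure the output parameters are left untouched.
 */
bool utf8_measure(const char *str, size_t *n_bytes, size_t *n_chars)
{
    assert(str != NULL && n_bytes != NULL && n_chars != NULL);
    const unsigned char *s = (const unsigned char *)str;
    size_t i = 0, chars = 0;

    while (s[i] != '\0') {
        unsigned char b = s[i];
        size_t len;
        /* Allowed range of the second byte; narrowed for some lead bytes. */
        unsigned char lo = 0x80, hi = 0xBF;

        if (b <= 0x7F) {
            len = 1;
        } else if (b >= 0xC2 && b <= 0xDF) {
            len = 2;
        } else if (b >= 0xE0 && b <= 0xEF) {
            len = 3;
            if (b == 0xE0) lo = 0xA0;  /* rejects overlong forms */
            if (b == 0xED) hi = 0x9F;  /* rejects surrogates */
        } else if (b >= 0xF0 && b <= 0xF4) {
            len = 4;
            if (b == 0xF0) lo = 0x90;  /* rejects overlong forms */
            if (b == 0xF4) hi = 0x8F;  /* rejects values above U+10FFFF */
        } else {
            return false;              /* 0x80-0xC1 and 0xF5-0xFF */
        }

        if (len > 1) {
            /* A premature NUL fails these checks, so we never read past it. */
            if (s[i + 1] < lo || s[i + 1] > hi)
                return false;
            for (size_t k = 2; k < len; k++)
                if (!is_tail(s[i + k]))
                    return false;
        }
        i += len;
        chars++;
    }
    *n_bytes = i;
    *n_chars = chars;
    return true;
}
```

For the word "naïve" encoded as the bytes 6E 61 C3 AF 76 65, this reports 6 bytes and 5 characters, which is exactly the gap between the two measures.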
If someone were to write a widget library or an application using
libraries which ensure valid UTF-8 as input - Gdk key-press events and
GtkIMContexts for example - then it wouldn't make sense to run those
strings through yet another course of validation. That addresses the
There is still the question of character length vs. byte length.
Therefore, from what you have told me, I will be sure to present
methods which offer validation as an option and not as the rule.
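
One possible shape for "validation as an option" is sketched below in plain C. All names here (Utf8Validator, utf8_char_count, ascii_only) are hypothetical and not taken from GLib. The caller passes a validator callback only when the input is untrusted; passing NULL skips the check and just counts characters, on the assumption that the source already guarantees valid UTF-8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback type: returns true if the string is acceptable. */
typedef bool (*Utf8Validator)(const char *s);

/*
 * Count characters in s. If validate is non-NULL it is run first and a
 * rejection yields -1. If validate is NULL the input is trusted as-is.
 */
long utf8_char_count(const char *s, Utf8Validator validate)
{
    assert(s != NULL);
    if (validate != NULL && !validate(s))
        return -1;

    long n = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        /* Every byte that is not a 10xxxxxx continuation starts a character. */
        if ((*p & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Stand-in validator for the example: accepts 7-bit ASCII only.
 * A real one would check the full RFC 3629 syntax instead. */
bool ascii_only(const char *s)
{
    for (; *s != '\0'; s++)
        if ((unsigned char)*s > 0x7F)
            return false;
    return true;
}
```

With this layout, strings coming from an already-validating source would take the NULL path, while data read from a file or socket would pass a real validator.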
gtk-devel-list mailing list