
Re: utf




> You just seem to have Decided, for reasons known only to you, that
> The Character Length Of A String Is Not Useful.  Despite literally
> decades of programs that have used strlen() in various ways.

strlen was mostly used in a context where char-length = byte-length =
display-width.  Most of those calls to strlen have nothing to do with
char-length but are more interested in display-width or byte-length.

In the context of Unicode, using utf-8 doesn't make byte-length any
harder than with ASCII.  And in the context of Unicode, display-width
is a lot more complex than strlen regardless of which encoding you use
because any given Unicode char can have a display-width of 0, 1, or
2 (even if you disregard proportional fonts and other fancy rendering
tricks).  So utf-8 doesn't make the computation of display-width any
more complex than utf-32.
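
To make the three notions concrete, here is a rough sketch in Python.  The
function names are mine, and the width rule (0 for combining marks and
format characters, 2 for East Asian wide/fullwidth, 1 otherwise) is only
an approximation of what a real terminal does.  Note that the width
computation never looks at the encoding at all:

```python
import unicodedata

def byte_length(s: str) -> int:
    # Size of the utf-8 encoding: what strlen() would return on the C string.
    return len(s.encode("utf-8"))

def char_length(s: str) -> int:
    # Number of Unicode code points.
    return len(s)

def display_width(s: str) -> int:
    # Approximate number of terminal columns (assumed rule, see above).
    width = 0
    for ch in s:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue  # combining marks and format chars take no column
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2  # wide and fullwidth chars take two columns
        else:
            width += 1
    return width
```

For "e" followed by U+0301 (combining acute accent), the three numbers
are all different: 3 bytes, 2 chars, 1 column.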

> What if the question is "Find all the English words that have an E
> in the 5th position and a U in the 7th"?

That can be answered just as easily and efficiently from a utf-8
representation of the string as from a utf-32 representation.
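
As a sketch of what I mean (Python again, with helper names invented for
the example): in utf-8 every byte that is not of the form 10xxxxxx starts
a new char, so finding the Nth char of a word is one short pass over its
bytes, and an ASCII byte never occurs inside a multibyte sequence, so
comparing against "e" or "u" is a plain byte comparison.

```python
def char_at(buf: bytes, n: int) -> bytes:
    """Return the utf-8 bytes of the n-th (1-based) char of buf, or b"" if none."""
    count = 0
    start = None
    for i, b in enumerate(buf):
        if (b & 0xC0) != 0x80:       # not a continuation byte: a char starts here
            if start is not None:
                return buf[start:i]  # the next char begins, so ours ends here
            count += 1
            if count == n:
                start = i
    return buf[start:] if start is not None else b""

def matches(word: bytes) -> bool:
    # "E in the 5th position and U in the 7th", ignoring case.
    return char_at(word, 5) in (b"e", b"E") and char_at(word, 7) in (b"u", b"U")

def find_words(words):
    # words: an iterable of utf-8 encoded byte strings.
    return [w for w in words if matches(w)]
```

So find_words([b"prosecute", b"execute", b"persecute"]) keeps "prosecute"
and "persecute" and drops "execute", and a word containing non-ASCII
chars before the 5th position is handled by the same code.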


        Stefan