
Re: utf

On 03/04/18 20:55, Darac Marjal wrote:
> If these things matter to you, it's better to convert from UTF-8 to Unicode, first.

Fixed-length encodings such as UTF-32 will not fix broken assumptions about the relationship between byte length and number of characters, because Unicode contains combining characters (for example, U+0301 COMBINING ACUTE ACCENT), so one visible character can span several code points. What is the length of a string? Are you trying to count the number of glyphs? I do not think that you can do this by naïvely counting code points, regardless of encoding.
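
A minimal sketch in Python (chosen only for illustration; the thread names no language) of how the same visible "é" yields different "lengths". The helper length_report is hypothetical. Its approx_chars figure merely skips code points with a nonzero canonical combining class; it is a rough approximation, not real grapheme-cluster segmentation (Unicode UAX #29), and it miscounts spacing marks, emoji ZWJ sequences, and similar cases.

```python
import unicodedata

def length_report(s: str) -> dict:
    """Compare several notions of 'length' for the same string (illustrative only)."""
    return {
        "utf8_bytes": len(s.encode("utf-8")),
        # utf-32-le avoids the 4-byte BOM that plain "utf-32" would prepend
        "utf32_bytes": len(s.encode("utf-32-le")),
        "code_points": len(s),
        # Rough approximation: ignore code points with a nonzero canonical
        # combining class. This is NOT full grapheme segmentation.
        "approx_chars": sum(1 for ch in s if unicodedata.combining(ch) == 0),
    }

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' + COMBINING ACUTE ACCENT

print(length_report(precomposed))
# {'utf8_bytes': 2, 'utf32_bytes': 4, 'code_points': 1, 'approx_chars': 1}
print(length_report(decomposed))
# {'utf8_bytes': 3, 'utf32_bytes': 8, 'code_points': 2, 'approx_chars': 1}
```

Both strings render as one glyph, yet even in UTF-32 the decomposed form is twice as long, which is the point above: a fixed-width encoding only makes code points uniform, not characters.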

Because there is more than one way to represent an accented character (a single precomposed code point, or a base letter followed by a combining mark), Unicode string comparison is nontrivial:
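
A minimal sketch, again in Python purely for illustration, of the usual remedy: normalize both strings to the same canonical form (NFC here) before comparing. The helper name canonically_equal is hypothetical; unicodedata.normalize is the standard-library function. This handles canonical equivalence only; compatibility forms (NFKC/NFKD) and case-insensitive matching are separate decisions.

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' + COMBINING ACUTE ACCENT

def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(precomposed == decomposed)                   # False: different code points
print(canonically_equal(precomposed, decomposed))  # True: canonically equivalent
```

Normalizing to NFD and comparing would give the same answer; what matters is that both sides use the same form.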

Kind regards,

Ben Caradoc-Davies <ben@xxxxxxxxxxxx>
Transient Software Limited <https://transient.nz/>
New Zealand