Re: Bits from the release team: full steam ahead towards buster




On April 18, 2018 9:19 AM, Gunnar Wolf <gwolf@xxxxxxxxxx> wrote:
> But why would ü not be part of the sorting? Yes, that was my example
> before you censored my thought process - In Spanish, [áéíóú] and
> [aeiou] share the same spot while ordering, as do ñ and n, as do u and
> ü (and we have no further diacriticals). I understand that German
> sorts äöü at the end.
> 
> But... Ok, let's stick to 7-bit ASCII as defined. When I was in primary
> school, "ch" and "ll" were treated as single letters (sorted
> respectively between "c" and "d", and between "l" and "m"). So,
> thinking with an Ubuntu slant, we would have cow < cheetah < dinosaur
> and lobster < llama < manatee.

Not speaking as a programmer, but as a native speaker of American English...

Your example is incorrect sorting behavior for English. Although Spanish might sort its words that way, English has no two-character letters; for sorting purposes, ch is treated as c followed by h, and ll as l followed by l. Therefore, in English, the correct order is cheetah < cow < dinosaur, and llama < lobster < manatee.
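
To make the difference concrete, here is a minimal Python sketch of the two orderings. The helper name spanish_traditional_key and the alphabet list are made up for illustration; this is not how any real locale is implemented, just a toy model of the "ch and ll are single letters" rule described in the quoted text, next to plain character-by-character sorting.

```python
# Traditional Spanish alphabet with "ch" and "ll" as single letters
# (illustrative assumption; real locale data is far more involved).
TRADITIONAL_SPANISH = (
    "a b c ch d e f g h i j k l ll m n ñ o p q r s t u v w x y z".split()
)
RANK = {letter: i for i, letter in enumerate(TRADITIONAL_SPANISH)}


def spanish_traditional_key(word):
    """Hypothetical sort key: split a word into letters, treating
    'ch' and 'll' as one letter each, and return their ranks."""
    word = word.lower()
    key = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in ("ch", "ll"):
            key.append(RANK[pair])
            i += 2
        else:
            # Characters outside the alphabet sort after everything else.
            key.append(RANK.get(word[i], len(RANK)))
            i += 1
    return key


words = ["dinosaur", "llama", "cheetah", "manatee", "cow", "lobster"]

# Plain character-by-character order, as an English dictionary would do it.
english = sorted(words)

# Order with "ch" and "ll" treated as single letters.
spanish = sorted(words, key=spanish_traditional_key)

print(english)
print(spanish)
```

With the six words from the example, the first list comes out in the English order given above and the second in the order from the quoted message.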

As far as diacritics go, American English is practically devoid of them. Where they are present, native (American) English speakers basically ignore them, so the words résumé (n) and resume (v) share the same spot in any given English dictionary. Other symbols such as Æ and ß are replaced with their plain-letter equivalents (ae and ss, respectively) and then sorted accordingly.
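
A rough Python sketch of that rule, for illustration only: the function name english_sort_key and the EXPANSIONS table are assumptions of mine, and the table covers only the two symbols mentioned above. A real system would rely on proper locale collation data rather than a hand-rolled key like this one.

```python
import unicodedata

# Symbols replaced by plain letters before sorting (only the two
# examples from the text; a real table would be much longer).
EXPANSIONS = {"Æ": "AE", "æ": "ae", "ß": "ss"}


def english_sort_key(word):
    """Hypothetical sort key: expand special symbols, drop diacritics,
    and ignore case."""
    expanded = "".join(EXPANSIONS.get(ch, ch) for ch in word)
    # NFD splits an accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", expanded)
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return stripped.casefold()


# "résumé" and "resume" get the same key, so they share the same spot.
print(english_sort_key("résumé") == english_sort_key("resume"))
print(sorted(["résumé", "rest", "resume"], key=english_sort_key))
```

Because Python's sort is stable, words with identical keys simply keep their input order relative to each other.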

So if you are sorting words with an American English locale and the result diverges from this behavior, the result is wrong.