- Date: Tue, 3 Apr 2018 09:55:51 +0100
- From: Darac Marjal <mailinglist@xxxxxxxxxxxx>
- Subject: Re: utf
On Mon, Apr 02, 2018 at 09:39:05AM +0200, Andre Majorel wrote:
> On 2018-04-02 08:00 +1200, Ben Caradoc-Davies wrote:
> > On 02/04/18 02:05, mess-mate wrote:
> > > howto change the system utf to eu character set ?
> >
> > Why? UTF (especially UTF-8) is vastly superior for all purposes:
>
> I wouldn't say that. UTF-8 breaks a number of assumptions. For instance:
> 1) every character has the same size,
> 2) every byte sequence is a valid character,
> 3) the equality or inequality of two characters comes down to the equality or inequality of the bytes they encode to.
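A minimal Python 3 sketch of the three broken assumptions listed above. The helper names `utf8_sizes` and `is_valid_utf8` are made up for illustration. Note that the third point is really a Unicode-level issue (canonical equivalence), which UTF-8 simply inherits:

```python
import unicodedata

def utf8_sizes(chars):
    """Return how many bytes each character takes when encoded as UTF-8."""
    return {ch: len(ch.encode("utf-8")) for ch in chars}

def is_valid_utf8(data: bytes) -> bool:
    """True if `data` decodes as strict UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# 1) Characters take 1 to 4 bytes: 'A', 'é', '€' and U+1F600.
sizes = utf8_sizes(["A", "\u00e9", "\u20ac", "\U0001F600"])
print(sizes)

# 2) Some byte sequences are not valid UTF-8 at all (0xFF never occurs).
print(is_valid_utf8(b"caf\xc3\xa9"))  # True ("café")
print(is_valid_utf8(b"\xff\xfe"))     # False

# 3) The same visible character can be encoded as different bytes.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT
print(precomposed.encode("utf-8") == decomposed.encode("utf-8"))  # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)    # True
```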
If these things matter to you, it's better to decode the UTF-8 into Unicode code points first. I tend to think of Unicode as a very large code page. Each character maps to a number, and that number could be 1, 1000 or 500_000 (Unicode keeps growing with no end in sight, although the code space itself stops at U+10FFFF). Internally, you might store those code points as integers, quad words or whatever you like. Only once you're ready to pass the text to another process (print it on screen, save it to a file, stream it across a network) do you convert the Unicode back into UTF-8.
Basically, you consider UTF-8 to be a transfer-only format (like Base64). If you want to do anything non-trivial with it, decode it into Unicode.
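A sketch of that decode-early, encode-late approach in Python 3. In Python a `str` is already a sequence of code points, so the list of integers here only makes the "each character maps to a number" idea explicit; the helper names are hypothetical:

```python
def decode_to_code_points(data: bytes):
    """Decode UTF-8 bytes (the transfer format) into integer code points."""
    return [ord(ch) for ch in data.decode("utf-8")]

def encode_from_code_points(code_points) -> bytes:
    """Turn integer code points back into UTF-8 bytes for output."""
    return "".join(chr(cp) for cp in code_points).encode("utf-8")

raw = "naïve €5".encode("utf-8")   # what arrives from a file or socket
cps = decode_to_code_points(raw)   # work on these, not on raw bytes

print(len(raw), len(cps))          # 11 8  (bytes vs characters)

# A non-trivial operation done on code points, then encoded for output.
reversed_text = encode_from_code_points(cps[::-1])
print(reversed_text.decode("utf-8"))  # 5€ evïan
```

Reversing `raw` byte by byte instead would split the multi-byte sequences for "ï" and "€" and produce invalid UTF-8, which is the kind of byte-level mistake the decoding step avoids.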
> With ASCII and the many encodings based on it, most things can be done without having knowledge of the encoding. With UTF-8, even basic operations like determining the length of a string or reporting at what column an error occurred require knowledge of the encoding.
>
> -- 
> André Majorel <http://www.teaser.fr/~amajorel/>
> Imagine what would happen if the Debian project disclosed the email addresses of their users. Spambots would harvest them and Debian users would be inundated with spam. Good thing they don't, eh ?
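To illustrate the column problem from the quoted paragraph, here is a hypothetical Python 3 helper that turns a byte offset (as a byte-oriented tool might report it) into a character column. It assumes the offset falls on a character boundary; otherwise decoding the prefix raises `UnicodeDecodeError`:

```python
def column_from_byte_offset(line: bytes, offset: int) -> int:
    """Convert a 0-based byte offset in a UTF-8 line to a 1-based character column."""
    # Decode only the bytes before the offset and count the characters.
    return len(line[:offset].decode("utf-8")) + 1

line = "température = ?".encode("utf-8")
offset = line.index(b"?")          # byte offset of the offending '?'

# A naive byte-based column is off by one because 'é' takes two bytes.
print(offset + 1, column_from_byte_offset(line, offset))  # 16 15
```

This counts code points; the column a terminal actually displays can still differ for combining marks or double-width characters.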
-- 
For more information, please reread.