Web lists-archives.com

Re: utf8 options under Mysql




Hi,

On 22/04/2016 04:50, Martin Mueller wrote:
MySQL has a bewildering variety of unicode collation choices. Most of them are language specific, but what is the difference between "utf8-general-ci", "utf8-unicode-ci", and "utf8-unicode-520-ci." Do they differ in the range of characters they can handle or is it just a matter of the cort order. I understand that utf8-bin is different because it is case sensitive, but the other differences elude me.

Under what circumstances does it make a difference to use on or the other? I work with a lot of Early Modern print data and the weird symbols of various kinds they use. I've had trouble at times with the "utf8-general-ci" setting, but it may have been more a matter of settings on my front end tool than of the choice of this rather than unicode collation.

Under character sets, there is just one utf8 setting.  The simplest way to make sense of the choices would be to say that given a character set (utf8) the collation only makes a difference to the sort but makes no difference to what can be displayed. Is that correct.
A collation contains definitions for sorting order and comparison. For most purposes one wants "crème brûlée" to be the same as "creme brulee". For unicode characters these rules can be complex. A character set (in your case UTF-8) defines which character can be stored.

utf8-general-ci contains a simplified version of those conversion rules. It works for a lot of Western European languages very well, but in some cases there are problems. For Asian languages there are a lot more problems. For example, 'ß' isn't considered the same as 'ss'.

utf8-unicode-ci has more complex rules and works fine for more languages. Due to the more complex rule set it is a bit slower than utf8-general-ci.

utf8-unicode-520-ci uses a newer version of the rule set that is used in utf8-unicode-ci.

Other utf8-* collations may contain specific rules for specific languages

utf8-general-ci is the default collation for utf-8 in MySQL. If you use literal strings MySQL may assume that these have the default collation and comparing them to columns with other collations or performing things like cast operations may produce errors about invalid combinations of collations.

--

Met vriendelijke groet,

Jigal van Hemert.


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql