Web lists-archives.com

Re: duplicate rows in spite of multi-column unique constraint




Thanks for the suggestions regarding non-printing characters, definitely makes sense as a likely culprit!

However, the data really does seem to be identical in this case:

mysql> select id, customer_id, concat('-', group_id, '-') from app_customergroupmembership  where customer_id ='ajEiQA';
+-------------+-------------+----------------------------+
| id          | customer_id | concat('-', group_id, '-') |
+-------------+-------------+----------------------------+
| 20279608258 | ajEiQA      | -ddH6Ev-                   |
| 20279608269 | ajEiQA      | -ddH6Ev-                   |
+-------------+-------------+----------------------------+
2 rows in set (0.00 sec)


I also ran the data through hexdump as a secondary check, also looks identical:

 mysql  --defaults-extra-file=~/.customers_mysql.cnf app -s -e "select id, customer_id, group_id from app_customergroupmembership  where customer_id ='ajEiQA';"  | hexdump -c
0000000   2   0   2   7   9   6   0   8   2   5   8  \t   a   j   E   i
0000010   Q   A  \t   d   d   H   6   E   v  \n   2   0   2   7   9   6
0000020   0   8   2   6   9  \t   a   j   E   i   Q   A  \t   d   d   H
0000030   6   E   v  \n               


Any other suggestions given this info?


Thanks,
--
Chris Hornung
March 24, 2015 at 6:08 AM
Please do 

  select id, customer_id, concat('-', group_id, '-') 
    from app_customergroupmembership 
    where customer_id ='ajEiQA';

I suspect one of those group IDs has a trailing space or similar 'invible' character that makes it not identical.


----- Original Message -----
From: "Chris Hornung" <chris.hornung@xxxxxxxxxxx>
To: "MySql" <mysql@xxxxxxxxxxxxxxx>
Sent: Monday, 23 March, 2015 18:20:36
Subject: duplicate rows in spite of multi-column unique constraint
Hello,

I'm come across a situation where a table in our production DB has a
relatively small number of duplicative rows that seemingly defy the
unique constraint present on that table.

We're running MySQL 5.6.19a via Amazon RDS. The table in question is
~250M rows.

`show create table` gives:

app_customergroupmembership | CREATE TABLE `app_customergroupmembership` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `customer_id` varchar(6) COLLATE utf8_bin NOT NULL,
  `group_id` varchar(6) COLLATE utf8_bin NOT NULL,
  `created` datetime NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `app_customergroupmembership_customer_id_31afe160_uniq`
(`customer_id`,`group_id`),
  KEY `app_customergroupmembership_group_id_18aedd38e3f8a4a0`
(`group_id`,`created`)
) ENGINE=InnoDB AUTO_INCREMENT=21951158253 DEFAULT CHARSET=utf8
COLLATE=utf8_bin


Despite that, records with duplicate customer_id/group_id do exist:

mysql> select * from app_customergroupmembership where customer_id =
'ajEiQA';
+-------------+-------------+----------+---------------------+
| id          | customer_id | group_id | created             |
+-------------+-------------+----------+---------------------+
| 20279608258 | ajEiQA      | ddH6Ev   | 2015-02-17 00:14:54 |
| 20279608269 | ajEiQA      | ddH6Ev   | 2015-02-17 00:14:54 |
+-------------+-------------+----------+---------------------+

Interestingly, these dupe records can't seem to be queried when using
both columns from the unique constraint in the WHERE clause:

mysql> select * from app_customergroupmembership where customer_id =
'ajEiQA' and group_id = 'ddH6Ev';
+-------------+-------------+----------+---------------------+
| id          | customer_id | group_id | created             |
+-------------+-------------+----------+---------------------+
| 20279608258 | ajEiQA      | ddH6Ev   | 2015-02-17 00:14:54 |
+-------------+-------------+----------+---------------------+


Any thoughts on how this situation came to pass, and how to prevent it
from happening?

Thanks,
--
Chris Hornung