有时候咱们会碰到这样的状况,在一个表中有不少重复的数据,处于某种须要,咱们须要(创建惟一索引等)清除重复的数据,重复的数据仅保留一行,这样一来,咱们一般会想编写一个sql语句来删除重复的,而且保留一行,可是效率每每都不是很高。下面给一个比较通用且高效的方法,思路是:“把重复的(行或者某个字段重复,根据具体状况)都选出一条存到一张临时表,而后删除原表中全部存在重复的行,再把临时表的数据所有插入原表,这是比较通用而且高效的作法。好比有以下一张表,mysql
mysql> select * from a1;
+------+------+------+------+
| id | id_c | cd | b |
+------+------+------+------+
| 1 | v1 | bc | NULL |
| 1 | v2 | b3 | NULL |
| 1 | v3 | b4 | NULL |
| 1 | v4 | bb | NULL |
| 1 | v4 | bb | NULL |
| 1 | v4 | bb | NULL |
| 1 | v4 | bb | NULL |
| 1 | v1 | bc | NULL |
+------+------+------+------+sql
显然,重复的数据根据不一样的须要可分为:整条记录重复,id 重复, id_c 重复,cd 重复,或者是b 重复ide
咱们这里考虑 id_c 和 cd 重复的状况下的演练操做spa
1,先查看查找出来的重复记录有哪些,可见有两行orm
mysql> select * from a1 group by cd having count(*) > 1;
+------+------+------+------+
| id | id_c | cd | b |
+------+------+------+------+
| 1 | v4 | bb | NULL |
| 1 | v1 | bc | NULL |
+------+------+------+------+
2 rows in set (0.00 sec)索引
2,建立临时表it
mysql> create table a1_tmp like a1;
Query OK, 0 rows affected (0.09 sec)table
3,把重复的行存到临时表中class
mysql> insert into a1_tmp select * from a1 group by cd having count(*) > 1;
Query OK, 2 rows affected (0.00 sec)效率
4,删除源有重复记录的行
#mysql> delete from a1 where exists (select cd from a1_tmp where a1.cd=a1_tmp.cd);
#Query OK, 6 rows affected (0.00 sec)
上面语句最好改为 delete from a1 where cd in (select cd from a1_tmp where a1.cd=a1_tmp.cd);
避免数据被删除清空表
PS:这删以前,请先确认你要删除的行是否是很临时表的数据一致,不然错删,后果严重,如
mysql> select * from a1 where exists (select cd from a1_tmp where a1.cd=a1_tmp.cd);
+------+------+------+------+
| id | id_c | cd | b |
+------+------+------+------+
| 1 | v1 | bc | NULL |
| 1 | v4 | bb | NULL |
| 1 | v4 | bb | NULL |
| 1 | v4 | bb | NULL |
| 1 | v4 | bb | NULL |
| 1 | v1 | bc | NULL |
+------+------+------+------+
6 rows in set (0.00 sec)
上面sql 的另一种写法是,
select * from a1 where cd in (select cd from a1_tmp);
5,导入临时表数据到源表
mysql> insert into a1 select * from a1_tmp;
Query OK, 2 rows affected (0.04 sec)
Records: 2 Duplicates: 0 Warnings: 0
6,确认数据正确且无重复
mysql> select * from a1;
+------+------+------+------+
| id | id_c | cd | b |
+------+------+------+------+
| 1 | v2 | b3 | NULL |
| 1 | v3 | b4 | NULL |
| 1 | v1 | bc | NULL |
| 1 | v4 | bb | NULL |+------+------+------+------+4 rows in set (0.00 sec)