聊聊数据库~4.SQL优化篇

时间 2019-11-07 标签聊聊数据库 4.sql 4 sql 优化篇

1.5.查询的艺术

上期回顾：http://www.noobyard.com/article/p-yseftgpn-q.htmlhtml

本节脚本：https://github.com/lotapp/BaseCode/blob/master/database/SQL/02.索引、查询优化.sqlmysql

文章有点小长，但认真阅读确定会有所感触和收获的。PS：我把我能想到的都列下来了，若是有新的会追加，欢迎补充和纠错~git

1.5.1.索引

大方向：减小冗余索引，避免重复（无用）索引github

1.概念

大一统分类：算法

聚簇索引、非聚簇索引：看看数据是否与索引存储在一块儿（一块儿是聚簇索引）
主键索引、辅助索引
稠密索引、稀疏索引
- 是否索引了每个数据项（是则为稠密索引）
B+ Tree索引、hash索引（键值索引，只有Memory存储引擎支持）、R Tree索引（空间索引，MyISAM存储引擎支持）、Fulltext索引（全文索引）
简单索引、组合索引

PS：索引一般作查询条件的字段（索引是在存储引擎级别实现的）sql

经常使用分类：数据库

语法分类：
1. 普通索引：一列一索引
2. 惟一索引：设置unique以后产生（可空）
  - 能够这么理解：惟一+非空=主键
3. 复合索引：多列一索引
物理存储：（Innodb和MyISAM存储引擎）
1. 聚簇索引：通常都是主键
  - 数据和索引存储在一块儿的存储方式
  - Innodb文件后缀：frm、ibd（数据+索引）
2. 非聚簇索引：不是汇集索引的索引
  - 数据和索引分开存放
  - MyISAM文件后缀：frm、myd（数据）、myi（索引）
3. PS：它俩都是b树索引，frm（表结构）和存储引擎无关

2. 语法基础

查看索引：show index from tb_name;
- show index from worktemp.userinfo\G;
- show index from worktemp.userinfo;
建立索引：
- create [unique] index index_name on tb_name(列名,...)
- alter table tb_name add [unique] index [index_name] on (列名,...)
删除索引：
- drop index index_name on tb_name
- alter table tb_name drop index index_name

1.5.2.执行计划

1.往期回顾

先回顾下上节课内容：app

手写SQL的语法顺序：运维

select distinct
    <select_list>
from <tb_name>
    <join_type> join <right_table> on <join_condition>
where
    <where_condition>
group by
    <group_by_list>
having
    <having_condition>
order by
    <order_by_list>
limit <limit_number>

SQL执行顺序：函数

from <tb_name>
on <join_condition>
<join_type> join <right_table>
where <where_condition>
group by <group_by_list>
having <having_condition>
select [distinct] <select_list>
order by <order_by_list>
limit <limit_number>

2.基础

语法：explain + SQL语句

执行计划：使用explain关键词能够模拟优化器执行SQL查询语句，通常用来分析查询语句或者表结构的性能瓶颈

执行计划通常用来干这些事情：

查看表的读取顺序
查看数据读取操做的操做类型
查看哪些索引可使用
查看哪些索引被实际使用
查看表之间的引用
查看每张表有多少行被优化器读取

主要参数

主要是看这几个参数：

id：当前查询语句中，每一个select语句的编号
- 主要是针对子查询、union查询
select_type：查询类型
- 简单查询：simple（通常的查询语句）
- 复杂查询：（详解见附录1）
  - subquery：用于where中的子查询（简单子查询）
  - derived：用于from中的子查询
  - union：union语句的第一个以后的select语句
  - union result：匿名临时表
type：访问类型(MySQL查询表中行的方式)
1. all：全表扫描
2. index：根据索引的次序进行全表扫描（覆盖索引效率更高）
3. range：根据索引作指定范围扫描
4. ref：返回表中全部匹配某单个值的全部行
5. eq_ref：等同于ref，与某个值作比较且仅返回一行
6. const：根据具备惟一性索引查找时，且返回单个行（性能最优）
  - eg：主键、惟一键
7. PS：1~6 ==> 数字越大效率越高（性能递增），（详解见附录2）
possible_keys：查询可能会用到的索引
key：查询中使用了的索引
key_len：索引使用的字节数（详解见附录3）
- 根据这个值，能够判断索引使用状况
- eg：使用组合索引时，判断全部索引字段是否都被查询到
ref：显示key列索引用到了哪些列、常量值
- 在索引列上查找数据时，用到了哪些列或者常量
rows：估算大概须要扫描多少行
Extra：额外信息（性能递减）
1. using index：使用了覆盖索引
2. using where：在存储引擎检索后，再进行一次过滤
3. using temporary：对结果排序时会使用临时表
4. using filesort：对结果使用一个外部索引排序
  - 没有有索引顺序，使用了本身的排序算法
  - 可能出现的状况：（出现这个状况基本上都是须要优化的）
    - where后面的索引列和order by|group by后面的索引列不一致(只能用到一个索引)
    - eg：explain select * from users where id<10 order by email;（只用到了id）

附录

1.select_type

select_type：查询类型

-- `subquery`：用于where中的子查询（简单子查询）
explain
    select name, age
    from students
    where age > (select avg(age) from students);

-- `union`：union语句的第一个以后的select语句
-- `union result`：匿名临时表
explain
    select name, age, work
    from students
    where name = '小张'
    union
    select name, age, work
    from students
    where name = '小明';

-- `derived`：用于from中的子查询
explain
    select *
    from (select name, age, work from students where name = '小张'
          union
          select name, age, work from students where name = '小明') as tmp;

图示输出：

2.type

type：访问类型(MySQL查询表中行的方式)

-- all：全表扫描（效率极低）
explain
    select *
    from students
    where name like '%小%';

-- index：根据索引的次序进行全表扫描（效率低）
explain
    select name, age, work
    from students
    where name like '%小%'; -- 其实就是上面全表扫描的改进版

-- range：根据索引作指定范围扫描
explain
    select name, age, work
    from students
    where id > 5;

-- ref：返回表中全部匹配某单个值的全部行
explain
    select name, age, work
    from students
    where name = '小明';

-- eq_ref：等同于ref，与某个值作比较且仅返回一行
explain
    select *
    from userinfo
             inner join (select id from userinfo limit 10000000,10) as tmp
                        on userinfo.id = tmp.id; -- 1s

-- const：根据具备惟一性索引查找时，且返回单个行（**性能最优**）
explain
    select name, age, work
    from students
    where id = 3; -- 通常都是主键或者惟一键

图示输出：

3.key-len

是否为空：
- not null 不须要额外的字节
- null 须要1字节用来标记
- PS：索引最好不要为null，这样须要额外的存储空间并且统计也变得更复杂
字符类型（char、varchar）的索引长度计算：
- 字符编码：(PS：不一样字符编码占用的存储空间不一样)
  - latin1|ISO8859占1个字节，gbk占2个字节，utf8占3个字节
- 变长字段（varchar）须要额外的2个字节
  - 1字节用来保存须要的字符数
  - 1字节用来记录长度（PS：若是列定义的长度超过255则须要2个字节【总共3字节】）
- 定长字段（char）不须要额外的字节
数值类型、日期类型的索引长度计算：
- 通常都是其自己长度，若是可空则+1
  - 标记是否为空须要占1个字节
- PS：datetime在5.6中字段长度是5，在5.5中字段长度是8
复合索引有最左前缀的特性。若是复合索引能所有用上，则为复合索引字段的索引长度之和
- PS：能够用来判断复合索引是否所有使用到
举个栗子：
- eg：char(20) index 可空
  - key-len=20*3(utf8)+1(可空)=61
- eg：varchar(20) index 可空
  - key-len=20*3(utf8)+2(可变长度)+1(是否可空的标记)=63

建表语句

create table if not exists `students`
(
    id          int unsigned auto_increment primary key,
    name        varchar(25)      not null default '' comment '姓名',
    age         tinyint unsigned not null default 0 comment '年龄',
    work        varchar(20)      not null default '普通学生' comment '职位',
    create_time datetime         not null comment '入学时间',
    datastatus  tinyint          not null default 0 comment '数据状态'
) charset utf8 comment '学生表';

-- select current_timestamp(), now(), unix_timestamp();
insert into students(name, age, work, create_time, datastatus)
values ('111', 22, 'test', now(), 99),
       ('小张', 23, '英语课表明', now(), 1),
       ('小李', 25, '数学课表明', now(), 1),
       ('小明', 21, '普通学生', now(), 1),
       ('小潘', 27, '物理课表明', now(), 1),
       ('张小华', 22, '生物课表明', now(), 1),
       ('张小周', 22, '体育课表明', now(), 1),
       ('小罗', 22, '美术课表明', now(), 1);

-- 建立一个组合索引
create index ix_students_name_age_work on students (name, age, work);

说了这么多题外话，如今进入正题：

1.5.3.建表优化

定长和变长分离（具体得看业务）
- eg：varchar、text、blob等变长字段单独出一张表和主表关联起来便可
经常使用字段和不经常使用字段分离
- 根据业务来分析，不经常使用的字段拎出来
在1对多须要关联统计的字段上添加点冗余字段
- 分表分库时，扩表跨库查询的情景（注意数据一致性）
- eg：在分类表中添加一个数量字段，统计天天新增商品数量
  - 添加商品时，选完分类就update一下count值（次日清零）
字段类型通常都是按照这个优先级：（尽可能使用优先级高的类型）
- 数值 > 日期 > char > varchar > text、blob
- PS：整体原则就是够用便可，而后尽可能避免null（不利于索引，浪费空间）
  - eg：varchar(10)和varchar(300)，在表链接查询时，须要的内存是不同的
伪hash法：好比商品url是一个varchar的列
- 这时候再建一个hash(url)以后的列，把索引设置到该列
  - 推荐使用crc32（用bigint存储）索引空间就会小不少并且能够避免全表扫描
  - eg：select crc32('http://www.baidu.com/shop/1.html');
- PS：若是DBA配置了crc64，则使用；若是没有，能够加个条件（CRC32碰撞后的解决方案）
  - 对于少部分碰撞的记录，只须要多扫描几行就好了，不会出现全表扫描的状况
  - eg：select xxx from urls where crc_url=563216577 and url='url地址'

PS：须要关注的技术点：crc32

1.5.4.组合索引专题

项目里面使用最多的是组合索引，这边先以组合索引为例：

1.尽量多的使用索引列，尽量使用覆盖索引

-- 若是我查询的时候，索引的三列都用到了，那么速度无疑是最快的
-- Extra：using where
explain
    select id, name, age, work, create_time
    from students
    where name = '小张'
      and age = 23
      and work = '英语课表明';

-- PS：★尽可能使用覆盖索引★（近乎万能）
-- 覆盖索引：仅仅查找索引就能找到所须要的数据
-- Extra：using where;using index
explain
    select name, age, work
    from students
    where name = '小张'
      and age = 23
      and work = '英语课表明';
-- PS：通常把常常select出的列设置一个组合索引，通常不超过5个

图示：

2.最左前缀原则

类比火车，火车头本身能够开，车身要是没有了车头就开不了

-- 查询的时候从最左边的列开始，而且不跳过中间的列，一直到最后
explain
    select id, name, age, work, create_time
    from students
    where name = '小张'
      and age = 23
      and work = '英语课表明';

-- 跳过了中间的age，这时候只用到了name列的索引（work列没用到）
explain
    select id, name, age, work, create_time
    from students
    where name = '小张'
      and work = '英语课表明';

图示：

再看两个补充案例：

-- PS：若是跳过了第一列，这时候索引一个也用不到，直接全表扫描了
explain
    select id, name, age, work, create_time
    from students
    where age = 23
      and work = '英语课表明';

-- PS：列不必定须要按照指定顺序来写
explain
    select id, name, age, work, create_time
    from students
    where age = 23
      and work = '英语课表明'
      and name = '小张';

图示：

2.3.范围条件放在最后面（范围条件后面的列索引会失效）

-- name、age、work索引生效时，key_len=140
explain
    select id, name, age, work, create_time, datastatus
    from students
    where name = '小张'
      and age = 23
      and work = '英语课表明';

-- 如今key_len=78 ==> work列索引就失效了（PS：age索引列未失效，只是age以后的列失效了）
explain
    select id, name, age, work, create_time, datastatus
    from students
    where name = '小张'
      and age > 22
      and work = '英语课表明';

图示：

补充说明：

-- 加快查询速度可使用覆盖索引
explain
    select name, age, work
    from students
    where name = '小张'
      and age > 22
      and work = '英语课表明';

-- PS：多个主键列也同样
explain
    select id, name, age, work
    from students
    where name = '小张'
      and age > 22
      and work = '英语课表明';

-- PS：调换顺序是无法解决范围后面索引失效的（原本对顺序就不在乎）
explain
    select id, name, age, work, create_time, datastatus
    from students
    where name = '小张'
      and work = '英语课表明'
      and age > 22;

图示：

2.4.不在索引列上作其余操做

容易致使全表扫描，这时候利用覆盖索引能够简单优化下

1.`!=`、`is not null`、`is null`、`not in`、`in`、`like`慎用

!=、is not null、is null的案例

-- 1.不等于案例
-- 索引失效（key,key_len ==> null）
explain
    select id, name, age, work, create_time, datastatus
    from students
    where name != '小明'; -- <> 等同于 !=

-- 项目里面不少使用都要使用，那怎么办呢？==> 使用覆盖索引
-- key=ix_students_name_age_work，key_len=140
explain
    select name, age, work
    from students
    where name != '小明'; -- <> 等同于 !=

-- 2.is null、is not null案例
-- 索引失效（key,key_len ==> null）
explain
    select id, name, age, work, create_time, datastatus
    from students
    where name is not null;

-- 解决：覆盖索引 key=ix_students_name_age_work，key_len=140
explain
    select name, age, work
    from students
    where name is not null;

图示：

not in、in的案例

-- 3.not in、in案例
-- 索引失效（key,key_len ==> null）
explain
    select id, name, age, work, create_time, datastatus
    from students
    where name in ('小明', '小潘', '小李');

explain
    select id, name, age, work, create_time, datastatus
    from students
    where name not in ('小明', '小潘', '小李');

-- 解决：覆盖索引 key=ix_students_name_age_work，key_len=140
explain
    select name, age, work
    from students
    where name in ('小明', '小潘', '小李');

explain
    select name, age, work
    from students
    where name not in ('小明', '小潘', '小李');

图示：

like案例：尽可能使用xxx%的方式来全文搜索，能和覆盖索引联合使用更好

-- 4.like案例
-- 索引不失效 key=ix_students_name_age_work，key_len=77（尽可能这么用like）
explain
    select id, name, age, work, create_time, datastatus
    from students
    where name like '张%';

-- 索引失效
explain
    select id, name, age, work, create_time, datastatus
    from students
    where name like '%张';

-- 索引失效
explain
    select id, name, age, work, create_time, datastatus
    from students
    where name like '%张%';

-- 解决：覆盖索引 key=ix_students_name_age_work，key_len=140（尽可能避免）
explain
    select name, age, work
    from students
    where name like '%张%';

2.计算、函数、类型转换（自动 or 手动）【尽可能避免】

-- 4.2.计算、函数、类型转换（自动 or 手动）【尽可能避免】
-- 这时候索引直接失效了，并全表扫描了
-- 解决虽然可使用覆盖索引，可是尽可能避免下面的状况：
-- 1.计算
explain
    select id, name, age, work, create_time, datastatus
    from students
    where age = (10 + 13);

-- 2.隐式类型转换（111==>'111'）
explain
    select id, name, age, work, create_time, datastatus
    from students
    where name = 111;
-- PS：字符类型不加引号索引就直接失效了
-- 虽然覆盖索引能够解决，可是不要这样作（严格意义上讲，这个算个错误）

-- 3.函数
explain
    select id, name, age, work, create_time, datastatus
    from students
    where right(name, 1) = '明';

图示：

光看没意思，再举个简单的业务案例：

eg：用户通常都是根据商品的大分类=>小分类=>品牌来查找，有时候到不看品牌，直接小分类后就本身找了。那么组合索引能够这么建：index(分类id,商品价格),index(分类id,品牌id,商品价格)(通常都须要根据查询日记来肯定)

PS：有些条例是流传甚广的，有些是工做中的经验，至少都是我踩过坑的，能够相对放心（业务不一样优化角度不一样）

1.5.5.写法上的优化

5.1.or改为union

-- 5.1.or改为union
-- 如今高版本对只有一个or的sql语句有了优化
explain
    select id, name, age, work, create_time, datastatus
    from students
    where name = '小明'
       or name = '小张'
       or name = '小潘';

-- PS：等同上面or的语句
explain
    select id, name, age, work, create_time, datastatus
    from students
    where name in ('小明', '小张', '小潘');

-- 高效
explain
    select id, name, age, work, create_time, datastatus
    from students
    where name = '小明'
    union all
    select id, name, age, work, create_time, datastatus
    from students
    where name = '小张'
    union all
    select id, name, age, work, create_time, datastatus
    from students
    where name = '小潘';

PS：union老是产生临时表，优化起来比较棘手

通常来讲union子句尽可能查询最少的行，union子句在内存中合并结果集须要去重（浪费资源），因此使用union的时候尽可能加上all（在程序级别去重便可）

5.2.count优化

通常都是count(主键|索引)，但如今count(*)基本上数据库内部都优化过了（根据公司要求使用便可）

PS：记得当时踩了次坑，等复现的时候补上案例（记得好像跟null相关）

看下就知道为何说无所谓了（PS，你count(非索引)就有所谓了）

explain
    select count(id) -- 经常使用
    from userinfo;

explain
    select count(*)
    from userinfo;

-- 你`count(非索引)`就有所谓了
explain
    select count(password)
    from userinfo;

我想说的优化是下面这个count优化案例：（有时候拆分查询会更快）

-- 须要统计id>10000的数据总量（实际中可能会根据时间来统计）
explain
    select count(*) as count
    from userinfo
    where id > 10000; -- 2s

-- 分解成用总数-小数据统计 ==> 1s
explain
    select (select count(*) from userinfo) - (select count(*) from userinfo where id <= 10000) as count;

执行图示：

分析图示：

5.3.group by和order by

group by和order by的列尽可能相同，这样能够避免filesort

-- 5.3.group by和order by的列尽可能相同，这样能够避免filesort
explain
    select *
    from students
    group by name
    order by work;

explain
    select *
    from students
    group by name
    order by name;

-- 加where条件也同样
explain
    select *
    from students
    where name like '小%'
    group by age
    order by work;

-- PS：通常group by和order by的列都和where索引列相同（不一致也只会使用一个索引）
explain
    select *
    from students
    where name like '小%' and age>20
    group by name
    order by name;

-- where后面的索引列和`order by|group by`后面的索引列不一致
-- id和email都是索引，但只用了一个索引
explain
    select *
    from users
    where id < 10
    order by email;

图示：

PS：不一致也只会使用一个索引（在索引误区有详细说明）

5.4.用链接查询来代替子查询

通常来讲都是用链接查询来代替子查询，有些时候子查询更方便（具体看业务吧）

-- 用exists代替in？MySQL查询优化器针对in作了优化（改为了exists，当users表越大查询速度越慢）
explain
    select *
    from students
    where name in (select username from users where id < 7);

-- ==> 等同于：
explain
    select *
    from students
    where exists(select username from users where username = students.name and users.id < 7);

-- 真正改进==>用链接查询代替子查询
explain
    select students.*
    from students
             inner join users on users.username = students.name and users.id < 7;

-- 等效写法：这个tmp是临时表，是没有索引的，若是须要排序能够在（）里面先排完序
explain
    select students.*
    from students
             inner join (select username from users where id < 7) as tmp on students.name = tmp.username;

图示：(内部已经把in转换成exists了，因此改不改写无所谓了)

5.5.★limit优化★

limit offset,N：mysql并非跳过offset行，而后取N行，而是取offset+N行，而后放弃前offset行，返回N行

PS：offset越大效率越低（你去翻贴的时候，页码越大通常越慢）

知识点

为了更加的直观，咱们引入一下profiling

-- 查看profiling系统变量
show variables like '%profil%';
-- profiling：开启SQL语句剖析功能（开启以后应为ON）

-- 来查看是否已经启用profile
select @@profiling;

-- 启动profile（当前会话启动）
set profiling = 1; -- 0：未启动，1：启动

show profiles; -- 显示查询的列表

show profile for query 5; -- 查看指定编号查询的详细信息

输出：

MariaDB [dotnetcrazy]> show variables like '%profil%';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| have_profiling         | YES   |
| profiling              | OFF   |
| profiling_history_size | 15    |
+------------------------+-------+
3 rows in set (0.002 sec)

MariaDB [dotnetcrazy]> select @@profiling;
+-------------+
| @@profiling |
+-------------+
|           0 |
+-------------+
1 row in set (0.000 sec)

MariaDB [dotnetcrazy]> set profiling = 1;
Query OK, 0 rows affected (0.000 sec)

正文

上面设置完后，分别执行下面SQL：

select * from userinfo limit 10,10;
select * from userinfo limit 1000,10;
select * from userinfo limit 100000,10;
select * from userinfo limit 1000000,10;
select * from userinfo limit 10000000,10;

输出：

+----------+------------+------------------------------------------+
| Query_ID | Duration   | Query                                    |
+----------+------------+------------------------------------------+
|        1 | 0.00060250 | select * from userinfo limit 10,10       |
|        2 | 0.00075870 | select * from userinfo limit 1000,10     |
|        3 | 0.03121300 | select * from userinfo limit 100000,10   |
|        4 | 0.30530230 | select * from userinfo limit 1000000,10  |
|        5 | 3.03068020 | select * from userinfo limit 10000000,10 |
+----------+------------+------------------------------------------+

图示：

解决方法

业务上解决，eg：不准翻页超过100（通常都是经过搜索来查找数据）
- PS：百度搜索页面也只是最多翻到76
使用where而不使用offset
- id完整的状况：eg：limit 5,3 ==> where id > 5 limit 3;
- PS：项目里面通常都是逻辑删除，id基本上算是比较完整的
覆盖索引+延迟关联：经过使用覆盖索引查询返回须要的主键,再根据主键关联原表得到须要的数据
- 使用场景：好比主键为uuid或id不连续（eg：部分数据物理删除了等等）

说太空洞，演示下就清楚了：

-- 全表扫描
explain
    select *
    from userinfo
    limit 10000000,10; -- 3s

-- 先range过滤了一部分
explain
    select *
    from userinfo
    where id > 10000000
    limit 10; -- 20ms

-- 内部查询使用了索引覆盖
explain
    select *
    from userinfo
             inner join (select id from userinfo limit 10000000,10) as tmp
                        on userinfo.id = tmp.id; -- 2s

分析图示：

查询图示：

扩展：索引误区和冗余索引

1.索引误区

不少人喜欢把where条件的经常使用列上都加上索引，可是遗憾的事情是：独立的索引只能同时用上一个

PS：在实际应用中每每选择组合索引

别不信，来验证一下就知道了：

-- id和email都是索引，可是只能使用一个索引（独立的索引只能同时用上一个）
-- id的key-len=4（int4个字节）
-- email的key-len=152（50*3(utf8下每一个字符占3位)+2(varchar须要额外两个字节存放)==>152）

-- 1.惟一索引和主键：优先使用主键
explain
    select * from users where id = 4 and email = 'xiaoming@qq.com';

-- 2.组合索引和主键：优先使用主键
explain
    select * from users where id=4 and createtime='2019-02-16 17:10:29';

-- 3.惟一索引和组合索引：优先使用惟一索引
explain
    select * from users where createtime='2019-02-16 17:10:29' and email='xiaoming@qq.com';

-- 4.组合索引和通常索引：优先使用组合索引
-- create index ix_users_datastatus on users(datastatus);
-- create index ix_users_username_password on users(username,password);
explain
    select * from users where datastatus=1 and username='小明';
-- 删除临时添加的索引
-- drop index ix_users_datastatus on users;
-- drop index ix_users_username_password on users;

图示：

PS：根据测试得知，一次只能使用1个索引。索引优先级：主键 > 惟一 > 组合 > 普通

2.冗余索引

举个标签表的例子：

create table tags
(
    id         int unsigned auto_increment primary key,
    aid        int unsigned not null,
    tag        varchar(25)  not null,
    datastatus tinyint      not null default 0
);
insert into tags(aid,tag,datastatus) values (1,'Linux',1),(1,'MySQL',1),(1,'SQL',1),(2,'Linux',1),(2,'Python',1);

select id, aid, tag, datastatus from tags;

输出：

+----+-----+--------+------------+
| id | aid | tag    | datastatus |
+----+-----+--------+------------+
|  1 |   1 | MySQL  |          1 |
|  2 |   1 | SQL    |          1 |
|  3 |   2 | Linux  |          1 |
|  4 |   2 | Python |          1 |
+----+-----+--------+------------+

实际应用中可能会根据tag查找文章列表，也可能经过文章id查找对应的tag列表

项目里面通常是这么创建索引（冗余索引）：index(文章id,tag),index(tag,文章id)，这样在上面两种状况下能够直接用到覆盖索引

create index ix_tags_aid_tag on tags(aid,tag);
create index ix_tags_tag_aid on tags(tag,aid);

select tag from tags where aid=1;
select aid from tags where tag='Linux';

3.修复碎片

这边简单说下，下一章应该还会继续说运维相关的知识

数据库表使用时间长了会出现碎片，能够按期修复一下（不影响数据）：optimize table users;

修复表的数据以及索引碎片会把数据文件整理一下，这个过程相对耗费时间（数据量大的状况下）通常根据状况选择按周|月|年修复一下

PS：能够配合crontab（定时任务）使用：

使用命令：crontab -e：***** 命令 [ > /dev/null 2>&1 ]
- 5个*的含义：分、时、日、月、周
- 从定向知识：
  - >> /xx/日志文件：输出重定向到日记文件（不包含错误信息）
  - >> /xx/日志文件 2>&1：输出信息包括错误信息
  - > /dev/null 2>&1：出错信息重定向到垃圾桶（黑洞）
- 举几个栗子：
  - 21*** xxx ==> 天天 1:02 执行 xxx命令
  - 5921*** xxx ==> 天天 21::59 执行 xxx命令
  - */*1*** xxx ==> 每1小时执行一次xxx命令
    - 定时任务以*/开头

课后拓展：

【推荐】一步步分析为何B+树适合做为索引的结构
https://blog.csdn.net/weixin_30531261/article/details/79312676

善用mysql中的FROM_UNIXTIME()函数和UNIX_TIMESTAMP()函数
https://www.cnblogs.com/haorenergou/p/7927591.html

【推荐】MySQL crc32 & crc64函数 提升字符串查询效率
https://www.jianshu.com/p/af6cc7b72dac

MySQL优化之profile
https://www.cnblogs.com/lizhanwu/p/4191765.html