环境说明:
DB:MySQL 5.7.35 OS:Redhat 7.5
问题:
查询数据库内没有主键/唯一键的表速度慢,耗时12秒。
查询SQL:
SELECT DISTINCT table_name,table_schema FROM information_schema.columns WHERE table_schema NOT IN ('sys','information_schema','mysql','performance_schema') AND table_name NOT IN (select DISTINCT table_name FROM information_schema.columns WHERE column_key in ('PRI','UNI'));
问题重现:
创建测试库
create database cjc; use cjc;
创建测试表
vi cjc_test.sh #!/bin/bash for i in {1..1000} do mysql -uroot -p1 cjc -e "create table with_pk_${i}(pk int primary key)" mysql -uroot -p1 cjc -e "create table whthout_pk_${i}(col int)" done
执行脚本
sh cjc_test.sh
执行SQL,耗时12秒。
SELECT DISTINCT table_name,table_schema FROM information_schema.columns WHERE table_schema NOT IN ('sys','information_schema','mysql','performance_schema') AND table_name NOT IN (select DISTINCT table_name FROM information_schema.columns WHERE column_key in ('PRI','UNI')); ....... 1004 rows in set (12.29 sec)
问题分析:
查看执行计划
explain SELECT DISTINCT table_name,table_schema FROM information_schema.columns WHERE table_schema NOT IN ('sys','information_schema','mysql','performance_schema') AND table_name NOT IN (select DISTINCT table_name FROM information_schema.columns WHERE column_key in ('PRI','UNI')); +----+--------------------+---------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+--------------------+---------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------------------------------+ | 1 | PRIMARY | columns | NULL | ALL | NULL | NULL | NULL | NULL | NULL | NULL | Using where; Open_frm_only; Scanned all databases; Using temporary | | 2 | DEPENDENT SUBQUERY | columns | NULL | ALL | NULL | NULL | NULL | NULL | NULL | NULL | Using where; Open_frm_only; Scanned all databases | +----+--------------------+---------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------------------------------+ 2 rows in set, 1 warning (0.00 sec)
查看SQL改写后的语句
MySQL [cjc]> show warnings\G; *************************** 1. row *************************** Level: Note Code: 1003 Message: /* select#1 */ select distinct `information_schema`.`columns`.`TABLE_NAME` AS `table_name`,`information_schema`.`columns`.`TABLE_SCHEMA` AS `table_schema` from `information_schema`.`columns` where ((`information_schema`.`columns`.`TABLE_SCHEMA` not in ('sys','information_schema','mysql','performance_schema')) and (not((`information_schema`.`columns`.`TABLE_NAME`, (/* select#2 */ select 1 from `information_schema`.`columns` where ((`information_schema`.`columns`.`COLUMN_KEY` in ('PRI','UNI')) and ( (`information_schema`.`columns`.`TABLE_NAME`) = `information_schema`.`columns`.`TABLE_NAME`))))))) 1 row in set (0.00 sec) ERROR: No query specified
将SQL格式化:
/* select#1 */ select distinct information_schema.columns.TABLE_NAME AS table_name, information_schema.columns.TABLE_SCHEMA AS table_schema from information_schema.columns where ((information_schema.columns.TABLE_SCHEMA not in ('sys', 'information_schema', 'mysql', 'performance_schema')) and ( not ( < in_optimizer > (information_schema.columns.TABLE_NAME, < exists > ( /* select#2 */ select 1 from information_schema.columns where ((information_schema.columns.COLUMN_KEY in ('PRI', 'UNI')) and (< cache > (information_schema.columns.TABLE_NAME) = information_schema.columns.TABLE_NAME)))))));
SQL自动改写:
可以看到MySQL将非关联子查询:select from A where A.x not in (select x from B);
转换成了
关联子查询:select from A where not exists (select 1 from B where B.x = a.x);
其中:非关联子查询
select from A where A.x not in (select x from B where ...);
扫描 B 表中的所有记录,找到满足条件的记录,存放在临时表 C 中, 在扫描A表中的记录,与临时表C中的记录进行比对。
而关联子查询就需要循环迭代:
select from A where not exists (select 1 from B where B.x = a.x and ...);
扫描A表的每一条记录rA:
扫描B表,找到其中的第一条满足rA条件的记录。
显然,关联子查询的扫描成本会高于非关联子查询。
继续分析执行计划
检查SQL执行计划,分析子查询结果是否进行了缓存(物化,MATERIALIZATION)。
使用 optimizer trace工具,观察MySQL对SQL的优化处理过程。
MySQL [cjc]> show variables like '%optimizer_trace%'; +------------------------------+----------------------------------------------------------------------------+ | Variable_name | Value | +------------------------------+----------------------------------------------------------------------------+ | optimizer_trace | enabled=off,one_line=off | | optimizer_trace_features | greedy_search=on,range_optimizer=on,dynamic_range=on,repeated_subselect=on | | optimizer_trace_limit | 1 | | optimizer_trace_max_mem_size | 16384 | | optimizer_trace_offset | -1 | +------------------------------+----------------------------------------------------------------------------+
调大 optimizer trace 的内存容量(否则 trace 的输出会被截断),然后开启了 optimizer trace功能。
MySQL [cjc]> set optimizer_trace="enabled=on"; Query OK, 0 rows affected (0.00 sec) MySQL [cjc]> set optimizer_trace_max_mem_size=104856; Query OK, 0 rows affected (0.00 sec)
执行SQL
SELECT DISTINCT table_name,table_schema FROM information_schema.columns WHERE table_schema NOT IN ('sys','information_schema','mysql','performance_schema') AND table_name NOT IN (select DISTINCT table_name FROM information_schema.columns WHERE column_key in ('PRI','UNI')); ... 1005 rows in set (12.70 sec)
跑完 SQL 后,可以在 INFORMATION_SCHEMA.OPTIMIZER_TRACE 看到 SQL 的优化处理过程:
结果较多,JSON格式,需要输出到文件。
MySQL [(none)]> SELECT TRACE INTO DUMPFILE "/home/mysql/test01.log" FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE; Query OK, 1 row affected (0.00 sec) [mysql@mysql02 ~]$ vi test01.log { "steps": [ { "creating_tmp_table": { "tmp_table_info": { "row_length": 6015, "key_length": 0, "unique_constraint": false, "location": "disk (InnoDB)", "record_format": "packed" } } }, { "creating_tmp_table": { "tmp_table_info": { "row_length": 6015, "key_length": 0, "unique_constraint": false, "location": "disk (InnoDB)", "record_format": "packed" } } }, ...... [mysql@mysql02 ~]$ cat test01.log |wc -l 28732 [mysql@mysql02 ~]$ ll -rth test* -rwxrw-r-- 1 mysql mysql 175 Feb 5 11:41 test.sh -rw-rw-rw- 1 mysql mysql 103K Feb 5 13:21 test01.log -rw-rw-rw- 1 mysql mysql 660K Feb 5 13:53 test02.log
详细执行计划,可以通过在线json编辑器查看
在线json编辑器
http://json.la/online.html
查看到没有选择物化?
MySQL引入了Materialization(物化)这一关键特性用于子查询(比如在IN/NOT IN子查询以及 FROM 子查询)优化。
具体实现方式是:在SQL执行过程中,第一次需要子查询结果时执行子查询并将子查询的结果保存为临时表 ,后续对子查询结果集的访问将直接通过临时表获得。
与此同时,优化器还具有延迟物化子查询的能力,先通过其它条件判断子查询是否真的需要执行。
物化子查询优化SQL执行的关键点在于对子查询只需要执行一次。
与之相对的执行方式是对外表的每一行都对子查询进行调用,其执行计划中的查询类型为“DEPENDENT SUBQUERY”。
为什么没有物化,检查统计信息,表行数为NULL。
MySQL [(none)]> select * from information_schema.tables where table_name='COLUMNS'\G; *************************** 1. row *************************** TABLE_CATALOG: def TABLE_SCHEMA: information_schema TABLE_NAME: COLUMNS TABLE_TYPE: SYSTEM VIEW ENGINE: InnoDB VERSION: 10 ROW_FORMAT: Dynamic TABLE_ROWS: NULL AVG_ROW_LENGTH: 0 DATA_LENGTH: 16384 MAX_DATA_LENGTH: 0 INDEX_LENGTH: 0 DATA_FREE: 8388608 AUTO_INCREMENT: NULL CREATE_TIME: NULL UPDATE_TIME: NULL CHECK_TIME: NULL TABLE_COLLATION: utf8_general_ci CHECKSUM: NULL CREATE_OPTIONS: max_rows=2789 TABLE_COMMENT: 1 row in set (0.00 sec) ERROR: No query specified
检查columns创建语句
MySQL [information_schema]> show create table columns\G; *************************** 1. row *************************** Table: COLUMNS Create Table: CREATE TEMPORARY TABLE `COLUMNS` ( `TABLE_CATALOG` varchar(512) NOT NULL DEFAULT '', `TABLE_SCHEMA` varchar(64) NOT NULL DEFAULT '', `TABLE_NAME` varchar(64) NOT NULL DEFAULT '', `COLUMN_NAME` varchar(64) NOT NULL DEFAULT '', `ORDINAL_POSITION` bigint(21) unsigned NOT NULL DEFAULT '0', `COLUMN_DEFAULT` longtext, `IS_NULLABLE` varchar(3) NOT NULL DEFAULT '', `DATA_TYPE` varchar(64) NOT NULL DEFAULT '', `CHARACTER_MAXIMUM_LENGTH` bigint(21) unsigned DEFAULT NULL, `CHARACTER_OCTET_LENGTH` bigint(21) unsigned DEFAULT NULL, `NUMERIC_PRECISION` bigint(21) unsigned DEFAULT NULL, `NUMERIC_SCALE` bigint(21) unsigned DEFAULT NULL, `DATETIME_PRECISION` bigint(21) unsigned DEFAULT NULL, `CHARACTER_SET_NAME` varchar(32) DEFAULT NULL, `COLLATION_NAME` varchar(32) DEFAULT NULL, `COLUMN_TYPE` longtext NOT NULL, `COLUMN_KEY` varchar(3) NOT NULL DEFAULT '', `EXTRA` varchar(30) NOT NULL DEFAULT '', `PRIVILEGES` varchar(80) NOT NULL DEFAULT '', `COLUMN_COMMENT` varchar(1024) NOT NULL DEFAULT '', `GENERATION_EXPRESSION` longtext NOT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8 1 row in set (0.00 sec) ERROR: No query specified
无法收集统计信息
MySQL [information_schema]> analyze table information_schema.COLUMNS; ERROR 1044 (42000): Access denied for user 'root'@'localhost' to database 'information_schema'
尝试创建相同数据量的t1表
MySQL [cjc]> create table t1 like information_schema.columns; Query OK, 0 rows affected (0.04 sec) MySQL [cjc]> insert into t1 select * from information_schema.columns; Query OK, 5102 rows affected (1.97 sec) Records: 5102 Duplicates: 0 Warnings: 0
执行相同语句只需要0.03秒
SELECT DISTINCT table_name,table_schema FROM t1 WHERE table_schema NOT IN ('sys','information_schema','mysql','performance_schema') AND table_name NOT IN (select DISTINCT table_name FROM t1 WHERE column_key in ('PRI','UNI')); ...... 1005 rows in set (0.03 sec)
查看执行计划
explain SELECT DISTINCT table_name,table_schema FROM t1 WHERE table_schema NOT IN ('sys','information_schema','mysql','performance_schema') AND table_name NOT IN (select DISTINCT table_name FROM t1 WHERE column_key in ('PRI','UNI')); +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+ | 1 | PRIMARY | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 5005 | 60.00 | Using where; Using temporary | | 2 | SUBQUERY | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 5005 | 20.00 | Using where | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+ 2 rows in set, 1 warning (0.00 sec)
查看t1表统计信息
MySQL [(none)]> select * from information_schema.tables where table_name='t1'\G; *************************** 1. row *************************** TABLE_CATALOG: def TABLE_SCHEMA: cjc TABLE_NAME: t1 TABLE_TYPE: BASE TABLE ENGINE: InnoDB VERSION: 10 ROW_FORMAT: Dynamic TABLE_ROWS: 5005 AVG_ROW_LENGTH: 317 DATA_LENGTH: 1589248 MAX_DATA_LENGTH: 0 INDEX_LENGTH: 0 DATA_FREE: 4194304 AUTO_INCREMENT: NULL CREATE_TIME: 2022-02-05 11:55:22 UPDATE_TIME: 2022-02-05 11:55:40 CHECK_TIME: NULL TABLE_COLLATION: utf8_general_ci CHECKSUM: NULL CREATE_OPTIONS: TABLE_COMMENT:
尝试强制物化
查看参数:
MySQL [cjc]> show variables like '%optimizer_switch%'\G; *************************** 1. row *************************** Variable_name: optimizer_switch Value: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,duplicateweedout=on,subquery_materialization_cost_based=on,use_index_extensions=on,condition_fanout_filter=on,derived_merge=on,prefer_ordering_index=on 1 row in set (0.00 sec) ERROR: No query specified
速度没有改变,强制物化没成功?
SELECT DISTINCT table_name,table_schema FROM information_schema.columns WHERE table_schema NOT IN ('sys','information_schema','mysql','performance_schema') AND table_name NOT IN (select /*+ SUBQUERY(MATERIALIZATION) */ DISTINCT table_name FROM information_schema.columns WHERE column_key in ('PRI','UNI')); ... 1005 rows in set (12.23 sec)
执行计划
explain SELECT DISTINCT table_name,table_schema FROM information_schema.columns WHERE table_schema NOT IN ('sys','information_schema','mysql','performance_schema') AND table_name NOT IN (select /*+ SUBQUERY(MATERIALIZATION) */ DISTINCT table_name FROM information_schema.columns WHERE column_key in ('PRI','UNI')); +----+--------------------+---------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+--------------------+---------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------------------------------+ | 1 | PRIMARY | columns | NULL | ALL | NULL | NULL | NULL | NULL | NULL | NULL | Using where; Open_frm_only; Scanned all databases; Using temporary | | 2 | DEPENDENT SUBQUERY | columns | NULL | ALL | NULL | NULL | NULL | NULL | NULL | NULL | Using where; Open_frm_only; Scanned all databases | +----+--------------------+---------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------------------------------+ 2 rows in set, 1 warning (0.01 sec)
---参考:《MySQL大智小技》