innodb行锁对象mutext争用导致LINUX内核态CPU使用率高诊断

MSYQL 版本:

mysql> select version()\G

*************************** 1. row ***************************

version(): 5.6.27-log

1 row in set (0.00 sec)

 

1. 测试数据如下

mysql> select * from t1;

+------+------+

| id   | name |

+------+------+

|   15 | a    |

|   10 | b    |

|    6 | c    |

|   10 | d    |

|   11 | f    |

|    2 | zz   |

+------+------+

6 rows in set (0.00 sec)

 

表中只有6条数据都存储在同一个数据块中。

 

 

 

 

 

mysql> show index from t1\G

*************************** 1. row ***************************

        Table: t1

   Non_unique: 0

     Key_name: PRIMARY

 Seq_in_index: 1

  Column_name: name

    Collation: A

  Cardinality: 5

     Sub_part: NULL

       Packed: NULL

         Null:

   Index_type: BTREE

      Comment:

Index_comment:

*************************** 2. row ***************************

        Table: t1

   Non_unique: 1

     Key_name: idx_t1_id  # 该索引是非唯一索引

 Seq_in_index: 1

  Column_name: id

    Collation: A

  Cardinality: 5

     Sub_part: NULL

       Packed: NULL

         Null: YES

   Index_type: BTREE

      Comment:

Index_comment:

2 rows in set (0.00 sec)

 

 

2. 创建用来测试的存储过程

 

DELIMITER $$

 

USE `test`$$

 

DROP PROCEDURE IF EXISTS `p7`$$

 

CREATE DEFINER=`root`@`localhost` PROCEDURE `p7`()

BEGIN

DECLARE i INT;

SET i = 1;

WHILE i < 10000000000 DO

UPDATE t1 SET id=15 WHERE id=15;

commit;

SET i = i +1;

END WHILE;

END$$

 

DELIMITER ;

 

 

 

DELIMITER $$

 

USE `test`$$

 

DROP PROCEDURE IF EXISTS `p8`$$

 

CREATE DEFINER=`root`@`localhost` PROCEDURE `p8`()

BEGIN

DECLARE i INT;

SET i = 1;

WHILE i < 10000000000 DO

UPDATE t1 SET id=15 WHERE id=15;

commit;

SET i = i +1;

END WHILE;

END$$

 

DELIMITER ;

 

 

 

DELIMITER $$

 

USE `test`$$

 

DROP PROCEDURE IF EXISTS `p9`$$

 

CREATE DEFINER=`root`@`localhost` PROCEDURE `p9`()

BEGIN

DECLARE i INT;

SET i = 1;

WHILE i < 10000000000 DO

UPDATE t1 SET id=15 WHERE id=15;

commit;

SET i = i +1;

END WHILE;

END$$

 

DELIMITER ;

 

 

3. 分别在三个会话中调用存储过程

会话一调用 存储过程 p4

mysql> call p7();

 

会话二调用存储过程 p5

mysql > call p8();

 

 

会话三调用存储过程 p6

mysql > call p9();

 

4. 运行 pt-stalk  采集性能数据

 

bin/pt-stalk  --collect-oprofile  TRUE    --dest /root/olm/pt-stalk-data-12-07  --no-stalk --run-time 300 --iterations 1 --sleep=3




5. 分析 pt-stalk 采集到的性能数据





从收集到的 TOP 输出的数据我们看到用户空间使用的CPU22.1%,内核空间使用的CPU 55.2%(非常高);然后我们再看下具体进程使用的CPU,看到 MYSQL 使用了149%,测试的服务器只有2个逻辑CPUMYSQLD 进程使用了149% CPU,也就是说 CPU都是MSYQLD消耗的,结合CPU使用率汇总信息中用户空间22.1%、内核空间CPU 使用率 55.2%,可以判断内核这么高的CPU使用率也是 MYSQLD进行系统调用消耗的。




上图中是 oprofile 采集的MYSQLD 线程堆栈的报告,按 CPU 时间排序。

mutex_spin_wait函数是 MSYQL 线程等待 mutex CPU上自旋。

ut_delay 函数是在CPU上自旋时进行空转。该函数消耗MSYQLD总执行时间的18.4%排在首位,不过oprofile报告无法看出是什么函数(比如锁操作函数或缓冲池页面操作函数)调用了ut_delay




pt-stalk 启动时采集的mutext 状态信息,内容如下:



Type Name       Status

InnoDB     trx0trx.cc:108 os_waits=10745

InnoDB     trx0trx.cc:108 os_waits=8446

InnoDB     trx0trx.cc:108 os_waits=7268

InnoDB     trx0sys.cc:597 os_waits=353

InnoDB     lock0lock.cc:614       os_waits=2699035

InnoDB     lock0lock.cc:611       os_waits=42172

InnoDB     log0log.cc:844 os_waits=1

InnoDB     buf0buf.cc:1334       os_waits=5

InnoDB     fil0fil.cc:1690   os_waits=1

InnoDB     log0log.cc:907 os_waits=904

InnoDB     trx0i_s.cc:1439        os_waits=3

InnoDB     combined buf0buf.cc:1069       os_waits=134

 

pt-stalk 结束时采集的mutext 状态信息,内容如下:

 

Type Name       Status

InnoDB     trx0trx.cc:108 os_waits=13256

InnoDB     trx0trx.cc:108 os_waits=10133

InnoDB     trx0trx.cc:108 os_waits=9260

InnoDB     trx0sys.cc:597 os_waits=439

InnoDB     lock0lock.cc:614       os_waits=3580728 # 开始采集的值是 2699035,现在是35807285分钟增加了 88.196 万。

InnoDB     lock0lock.cc:611       os_waits=53356

InnoDB     log0log.cc:844 os_waits=1

InnoDB     buf0buf.cc:1334       os_waits=5

InnoDB     fil0fil.cc:1690   os_waits=1

InnoDB     log0log.cc:907 os_waits=1206

InnoDB     trx0i_s.cc:1439        os_waits=3

InnoDB     combined buf0buf.cc:1069       os_waits=163




mutext status 数据os_waits最多的是 lock0lock.cc(保护锁对象的MUTXT)争用,os_waits MYSQL线程通过自旋没有获得 mutext 而进入了睡眠状态。


TS 1481942502.009370813 2016-12-17 10:41:42

*************************** 1. row ***************************

     Id: 1

   User: root

   Host: localhost

     db: test

Command: Query

   Time: 0

  State: Searching rows for update

   Info: UPDATE t1 SET id=15 WHERE id=15

*************************** 2. row ***************************

     Id: 2

   User: root

   Host: localhost

     db: test

Command: Query

   Time: 0

  State: Searching rows for update

   Info: UPDATE t1 SET id=15 WHERE id=15

*************************** 3. row ***************************

     Id: 3

   User: root

   Host: localhost

     db: test

Command: Query

   Time: 0

  State: Searching rows for update

   Info: UPDATE t1 SET id=15 WHERE id=15

*************************** 4. row ***************************

     Id: 3643

   User: root

   Host: localhost

     db: NULL

Command: Sleep

   Time: 1

  State:

   Info: NULL

*************************** 5. row ***************************

     Id: 3659

   User: root

   Host: localhost

     db: NULL

Command: Query

   Time: 0

  State: init

   Info: SHOW FULL PROCESSLIST

 

上面是 pt-stalk 采集的MYSQL线程状态,出现 mutext争用MYSQL CPU使用率很高时,MYSQL 在循环执行UPDATE 操作。




pt-stalk  采集的锁等待信息,内容如下:

 

...

*************************** 1. row ***************************

   who_blocks: thread 1 from localhost

  idle_in_trx: 0

max_wait_time: 0

  num_waiters: 2

*************************** 2. row ***************************

   who_blocks: thread 2 from localhost

  idle_in_trx: 0

max_wait_time: 0

  num_waiters: 1

*************************** 1. row ***************************

    waiting_trx_id: 211608264

    waiting_thread: 3

         wait_time: 0

     waiting_query: UPDATE t1 SET id=15 WHERE id=15

waiting_table_lock: `test`.`t1`

   blocking_trx_id: 211608262

   blocking_thread: 1

     blocking_host:

     blocking_port: localhost

       idle_in_trx: 0

    blocking_query: UPDATE t1 SET id=15 WHERE id=15

*************************** 2. row ***************************

    waiting_trx_id: 211608263

    waiting_thread: 2

         wait_time: 0

     waiting_query: UPDATE t1 SET id=15 WHERE id=15

waiting_table_lock: `test`.`t1`

   blocking_trx_id: 211608262

   blocking_thread: 1

     blocking_host:

     blocking_port: localhost

       idle_in_trx: 0

    blocking_query: UPDATE t1 SET id=15 WHERE id=15

*************************** 3. row ***************************

    waiting_trx_id: 211608264

    waiting_thread: 3

         wait_time: 0

     waiting_query: UPDATE t1 SET id=15 WHERE id=15

waiting_table_lock: `test`.`t1`

   blocking_trx_id: 211608263

   blocking_thread: 2

     blocking_host:

     blocking_port: localhost

       idle_in_trx: 0

    blocking_query: UPDATE t1 SET id=15 WHERE id=15

...

 

从上面的信息我们看到出现 CPU使用很高的情况时(内核空间CPU使用率也很高),有频繁的锁等待。





6. 使用 perf 采集MYSQLD 线程堆栈信息分析

 

perf record -e cpu-clock -g -p 2578

 

export LANG=zh_CN.GBK

 

perf report






_spin_unlock_irqrestore 占了采样期间内核CPU使用率的47.30%retint_careful占了采样期间内核CPU使用率的20.85%[kernel.kallsyms] 表示MYSQLD 调用了 LINUX 内核函数或宏。



_spin_unlock_irqrestore 宏的工作是释放LINUX 中的 MUTEX和启用CPU的本地中断并恢复local_irq_save() 所保存的中断状态。

local_irq_save() 禁止中断并中断状态保存到变量中。

retint_careful 是处理完中断准备离开中断处理程序时检查是否还有其他工作需要处理。如果有其他工作处理的情况下跳转到retint_careful执行。没有其他工作处理则首先调用swapgs命令在gs寄存器的内核与用户态值之间切换,为离开内核做准备。然后调用RESTORE_ARGS宏恢复之前SAVE_ARGS宏保存现场所压栈的那些寄存器。最后就是调用iretq离开中断;如果还有其他工作要处理首先检查是否需要重新调度,不需要重新调度就跳转到retint_signal执行。需要重新调度的话先开中断,然后调用schedule函数执行任务的调度。当调度的任务执行完之后,重新获得cpu时,跳转到retint_check处再次检查是否有工作需要处理。


接下来要理清是MYSQL的什么操作调用了 _spin_unlock_irqrestore retint_careful

 

 

perf report > perf_report_20161217.txt 

该命令把进程调用堆栈生成树形结构的报告,报告如下:



# Overhead  Command        Shared Object                                                                                                                               

                                                                                    Symbol

# ........  .......  ...................  .............................................................................................................................

..........................................................................................

#

    47.30%   mysqld  [kernel.kallsyms]    [k] _spin_unlock_irqrestore                                                                                                  

                                                                                         

             |

             --- _spin_unlock_irqrestore

                |         

                |--97.97%-- pthread_cond_broadcast@@GLIBC_2.3.2

                |          |         

                |          |--94.19%-- lock_wait_suspend_thread(que_thr_t*)

                |          |          row_mysql_handle_errors(dberr_t*, trx_t*, que_thr_t*, trx_savept_t*)

                |          |          row_search_for_mysql(unsigned char*, unsigned long, row_prebuilt_t*, unsigned long, unsigned long)

                |          |          ha_innobase::index_read(unsigned char*, unsigned char const*, unsigned int, ha_rkey_function)

                |          |          handler::ha_index_read_map(unsigned char*, unsigned char const*, unsigned long, ha_rkey_function)

                |          |          handler::read_range_first(st_key_range const*, st_key_range const*, bool, bool)

                |          |          handler::multi_range_read_next(char**)

                |          |          DsMrr_impl::dsmrr_next(char**)

                |          |          QUICK_RANGE_SELECT::get_next()

                |          |          rr_quick(READ_RECORD*)

                |          |          mysql_update(THD*, TABLE_LIST*, List&, List&, Item*, unsigned int, st_order*, unsigned long long, enum_duplicates, bo

ol, unsigned long long*, unsigned long long*)

                |          |          mysql_execute_command(THD*)

                |          |          sp_instr_stmt::exec_core(THD*, unsigned int*)

                |          |          sp_lex_instr::reset_lex_and_exec_core(THD*, unsigned int*, bool)

                |          |          sp_lex_instr::validate_lex_and_execute_core(THD*, unsigned int*, bool)

                |          |          sp_instr_stmt::execute(THD*, unsigned int*)

                |          |          sp_head::execute(THD*, bool)

                |          |          sp_head::execute_procedure(THD*, List*)

                |          |          mysql_execute_command(THD*)

                |          |          mysql_parse(THD*, char*, unsigned int, Parser_state*)

                |          |          dispatch_command(enum_server_command, THD*, char*, unsigned int)

                |          |          do_command(THD*)

                |          |          do_handle_one_connection(THD*)

                |          |          handle_one_connection

                |          |          pfs_spawn_thread

                |          |          start_thread

                |          |         

                |          |--4.42%-- lock_grant(ib_lock_t*)

                |          |          lock_rec_dequeue_from_page(ib_lock_t*)

                |          |          lock_trx_release_locks(trx_t*)

           |          |          trx_commit_low(trx_t*, mtr_t*)

                |          |          trx_commit_for_mysql(trx_t*)

                |          |          innobase_commit(handlerton*, THD*, bool)

                |          |          ha_commit_low(THD*, bool, bool)

                |          |          MYSQL_BIN_LOG::commit(THD*, bool)

                |          |          ha_commit_trans(THD*, bool, bool)

                |          |          trans_commit_stmt(THD*)

                |          |          mysql_execute_command(THD*)

                |          |          sp_instr_stmt::exec_core(THD*, unsigned int*)

                |          |          sp_lex_instr::reset_lex_and_exec_core(THD*, unsigned int*, bool)

                |          |          sp_lex_instr::validate_lex_and_execute_core(THD*, unsigned int*, bool)

                |          |          sp_instr_stmt::execute(THD*, unsigned int*)

                |          |          sp_head::execute(THD*, bool)

                |          |          sp_head::execute_procedure(THD*, List*)

                |          |          mysql_execute_command(THD*)

                |          |          mysql_parse(THD*, char*, unsigned int, Parser_state*)

                |          |          dispatch_command(enum_server_command, THD*, char*, unsigned int)

                |          |          do_command(THD*)

                |          |          do_handle_one_connection(THD*)

                |          |          handle_one_connection

                |          |          pfs_spawn_thread

                |          |          start_thread

|          |--4.42%-- lock_grant(ib_lock_t*)

                |          |          lock_rec_dequeue_from_page(ib_lock_t*)

                |          |          lock_trx_release_locks(trx_t*)

                |          |          trx_commit_low(trx_t*, mtr_t*)

                |          |          trx_commit_for_mysql(trx_t*)

                |          |          innobase_commit(handlerton*, THD*, bool)

                |          |          ha_commit_low(THD*, bool, bool)

                |          |          MYSQL_BIN_LOG::commit(THD*, bool)

                |          |          ha_commit_trans(THD*, bool, bool)

                |          |          trans_commit_stmt(THD*)

                |          |          mysql_execute_command(THD*)

                |          |          sp_instr_stmt::exec_core(THD*, unsigned int*)

                |          |          sp_lex_instr::reset_lex_and_exec_core(THD*, unsigned int*, bool)

                |          |          sp_lex_instr::validate_lex_and_execute_core(THD*, unsigned int*, bool)

                |          |          sp_instr_stmt::execute(THD*, unsigned int*)

                |          |          sp_head::execute(THD*, bool)

                |          |          sp_head::execute_procedure(THD*, List*)

                |          |          mysql_execute_command(THD*)

                |          |          mysql_parse(THD*, char*, unsigned int, Parser_state*)

                |          |          dispatch_command(enum_server_command, THD*, char*, unsigned int)

                |          |          do_command(THD*)

                |          |          do_handle_one_connection(THD*)

                |          |          handle_one_connection

                |          |          pfs_spawn_thread

                |          |          start_thread





_spin_unlock_irqrestore CPU 时间的 94.19% MYSQL lock_wait_suspend_thread 函数消耗了,还有4.42%MSYQL lock_grant 函数消耗的

, lock_wait_suspend_thread 实现的是MYSQL 锁等待的逻辑,lock_grant的作用是获得锁。调用关系是lock_wait_suspend_thread -> pthread_cond_broadcast  -> _spin_unlock_irqrestore lock_grant -> pthread_cond_broadcast  -> _spin_unlock_irqrestore

一个线程在使用完 MUTEX 后释放 MUTEX 会调用pthread_cond_broadcast 函数唤醒在条件变量上等待 MUTEXT 的所有线程。



20.95%   mysqld  [kernel.kallsyms]    [k] retint_careful                                                                                                          

                                                                                         

             |

             --- retint_careful

                |         

                |--9.09%-- ut_delay(unsigned long)

                |         

                |--3.01%-- buf_page_get_gen(unsigned long, unsigned long, unsigned long, unsigned long, buf_block_t*, unsigned long, char const*, unsigned long, mtr_t*

)

                |         

                |--2.51%-- _int_malloc

                |         

                |--1.91%-- cmp_dtuple_rec_with_match_low(dtuple_t const*, unsigned char const*, unsigned long const*, unsigned long, unsigned long*, unsigned long*)

                |         

                |--1.77%-- btr_cur_search_to_nth_level(dict_index_t*, unsigned long, dtuple_t const*, unsigned long, unsigned long, btr_cur_t*, unsigned long, char con

st*, unsigned long, mtr_t*)

                |         

                |--1.64%-- malloc

                |         

                |--1.37%-- get_mm_parts(RANGE_OPT_PARAM*, Item_func*, Field*, Item_func::Functype, Item*, Item_result)

                |         

                |--1.28%-- rec_get_offsets_func(unsigned char const*, dict_index_t const*, unsigned long*, unsigned long, mem_block_info_t**)

                |         

                |--1.20%-- SQL_SELECT::test_quick_select(THD*, Bitmap<64u>, unsigned long long, unsigned long long, bool, st_order::enum_order)

                |         

                |--1.20%-- mysql_update(THD*, TABLE_LIST*, List&, List&, Item*, unsigned int, st_order*, unsigned long long, enum_duplicates, bool, unsigne

d long long*, unsigned long long*)

                |         

                |--1.15%-- __memset_sse2

                |         

                |--1.11%-- page_cur_search_with_match(buf_block_t const*, dict_index_t const*, dtuple_t const*, unsigned long, unsigned long*, unsigned long*, unsigned

 long*, unsigned long*, page_cur_t*)

                |         

                |--0.98%-- sel_arg_range_seq_next(void*, st_key_multi_range*)

                |         

                |--0.93%-- pthread_getspecific

                |         

                |--0.93%-- ha_innobase::records_in_range(unsigned int, st_key_range*, st_key_range*)

                |         

                |--0.92%-- memcpy

                |         

                |--0.92%-- get_key_scans_params(PARAM*, SEL_TREE*, bool, bool, double)

                |         

                |--0.82%-- THD::decide_logging_format(TABLE_LIST*)

                |         

                |--0.75%-- alloc_root

                |         

                |--0.72%-- check_quick_select(PARAM*, unsigned int, bool, SEL_ARG*, bool, unsigned int*, unsigned int*, Cost_estimate*)

                |         

                |--0.59%-- optimize_cond(THD*, Item*, COND_EQUAL**, List*, bool, Item::cond_result*)

                |         

                |--0.59%-- start_stage_v1

                |         

                |--0.58%-- ha_innobase::extra(ha_extra_function)

                |         

                |--0.55%-- build_equal_items_for_cond(THD*, Item*, COND_EQUAL*, bool)

                |         

                |--0.55%-- row_search_for_mysql(unsigned char*, unsigned long, row_prebuilt_t*, unsigned long, unsigned long)

                |         

                |--0.53%-- get_quick_keys(PARAM*, QUICK_RANGE_SELECT*, st_key_part*, SEL_ARG*, unsigned char*, unsigned int, unsigned char*, unsigned int)

                |         

                |--0.53%-- start_table_lock_wait_v1

                |         

                |--0.51%-- ha_innobase::innobase_get_index(unsigned int)

                |         

                |--0.51%-- _int_free

                |         

                |--0.51%-- handler::multi_range_read_info_const(unsigned int, st_range_seq_if*, void*, unsigned int, unsigned int*, unsigned int*, Cost_estimate*)

                |         

                |--0.50%-- open_table(THD*, TABLE_LIST*, Open_table_context*)

                 --59.82%-- [...]

 

 

虽然 perf 报告中没有显示出MYSQL 锁等待调用 retint_careful ,但 lock_wait_suspend_thread 有做激活中断的操作,在激活中断的过程中就会调用

retint_careful 所以由此推断 lock_wait_suspend_thread 调用了 retint_careful,估计 --59.82%-- [...] 的意思是把这59.82%调用 retint_careful CPU时间

计算在 MYQLD 的进程中,而MYSQLD 进程主要的 CPU都是消耗在锁等待处理上,由此判断这59.82% retint_careful 调用时间,应该是

lock_wait_suspend_thread消耗的。

 

 

5.66%   mysqld  mysqld               [.] ut_delay(unsigned long)                                                                                                  

                                                                                         

             |

             --- ut_delay(unsigned long)


perf 的调用关系图报告中没有显示其他函数调用 ut_delay MSYQL ut_delay 函数是在 MSYQL等待 MUTEXT 时进行自旋操作的函数,在MYSQL

中所有等待 MUTEXT 的情况都会先调用 ut_delay 在用户态进行自旋,在通过用户态自旋无法获得 MUTEXT MYSQL才会调用 LINUX 内核函数在操作系统层进行自旋获取 MUTEXT,在前面我们看到 lock_wait_suspend_thread 消耗了 _spin_unlock_irqrestore 97%CPU 时间做释放操作系统层 mutext的操作,由此判断 ut_delay CPU 时间主要是被 lock_wait_suspend_thread 消耗了。

 

结论:

通过分析确定MYSQLD 内核态 CPU 使用率很高的原因就是更新数据的SQL语句锁争用导致的。

 

mutext 获取的大致过程:

线程检查某 mutext 是否能够获取 --> 如果不能获取线自旋 --> 自旋MSYQL 系统设定次数后还无法获得 mutext MYSQL 线程 SLEEP --> MSYQL 线程 SLEEP 时有 LINUX 负责在LINUX 内核中进行自旋操作等待 mutext

LINUX 自旋时在 LINUX 内核空间消耗 CPU ,如果在LINUX 层面自旋等待 MUTEXT MYSQL 线程很多,或导致LINUX 内核空间CPU 使用率很高。当MYSQL 线程 SLEEP LINUX 内核自旋等待 MUTEXT MYSQL 对应的等待是 OS_WAITS

扩展阅读:

























请使用浏览器的分享功能分享到微信等