【MySQL】SHOW ENGINE INNODB STATUS \G之Pages flushed up to的理解

1)前言以及问题:
我们现在先看截图


Log sequence number --内存中日志产生的序列号
Log flushed up to --刷入redo日志的值,字节数
Pages flushed up to --这里正是我要讲的问题,后面详述
Last checkpoint at --最后一次检查点的位置

这里Pages flushed up to到底是什么?而且数据库没有写数据的情况下,总是Pages flushed up to 不等于 Last checkpoint at,这又是为何,并且Pages flushed up to - Last checkpoint at=9(这里你们可以仔细去观察)?
查过网上资料,也问过一些大佬,答案真的是五花八门
他们中有两种最主要的答案,一种是顾名思义,脏页刷新到的LSN;另外一种是:new_modification LSN


2)分析:
对于第一种,说脏页刷新到磁盘的lsn的,明显就不成立。其实检查点的意思就是LSN之前的数据已经落盘,这里检查点的数据落盘就包含了脏页的落盘;实际测试过程中Pages flushed up to 明显大于检查点的LSN,而且检查点之后,就不能保证所有数据是落盘的,那么如果说Pages flushed up to 是脏页刷新到磁盘的lsn,那么就会出现一种情况,lsn1=3000脏页刷盘了(pages flushed up to是3000),lsn=2800的脏页没刷盘,那么你认为这个值有意义吗?


对于第二种:说是flush list做检查点的页的new modification lsn(即一个内存page包含两个lsn,flush list是按照页的第一次修改的lsn排序的,只要一个页修改就加入flush list,而new modification lsn是写了多少字节数据,new modification lsn 就加多少);为什么数据库最后没做修改了,Pages flushed up to 不等于 Last checkpoint at?他们给出的答复是一个页被多次修改。

为了更清楚,我把这些数据结构贴出来:

每个buffer pool都包含flush list
struct buf_pool_t{
    ...
    UT_LIST_BASE_NODE_T(buf_page_t) flush_list;
    UT_LIST_BASE_NODE_T(buf_page_t) free;
    ...
}

class buf_page_t {
public:
        ...
        //这个值是只要做了修改,lsn就会增加,如果一个页在内存中没有修改,就一直是0,也不会加入flush list
        lsn_t           newest_modification;
                                        /*!< log sequence number of
                                        the youngest modification to
                                        this block, zero if not
                                        modified. Protected by block
                                        mutex */
        //这个值,只要当前页被刷入了磁盘,他的值就会置为0,第一次对页的修改,就会加入flush list,第二次对该页修改,此值不会增加,因为flush list 是推进检查点的。
        lsn_t           oldest_modification;
                                        /*!< log sequence number of
                                        the START of the log entry
                                        written of the oldest
                                        modification to this block
                                        which has not yet been flushed
                                        on disk; zero if all
                                        modifications are on disk.
                                        Writes to this field must be
                                        covered by both block->mutex
                                        and buf_pool->flush_list_mutex. Hence
                                        reads can happen while holding
                                        any one of the two mutexes */
        ...
}

按照这种说法,我先绘制一个图:
    

图中flush list中,page no为3并且oldest modification lsn =400的时候,做了checkpoint了,那么按照第二种说法(Pages flushed up to的值为new modification lsn),则此时的Pages flushed lsn to的值为800;但图中page no为4的页的new modification lsn还没有刷盘;这样的话,其实Pages flushed up to这值就没有什么意义了。


3)要想了解真相,我们就得从源码探究

    ①首先找到源码中show engine innodb status \G  pages flushed up to的位置,我们发现,他调用了log_buf_pool_get_oldest_modification;从字面意思来说貌似是取buffer pool页的oldest modification lsn,究竟是哪些页呢,还需要探究。
   log_print(
/*======*/
        FILE*   file)   /*!< in: file where to print */
{
        double  time_elapsed;
        time_t  current_time;

        log_mutex_enter();

        fprintf(file,
                "Log sequence number " LSN_PF "\n"
                "Log flushed up to   " LSN_PF "\n"
                "Pages flushed up to " LSN_PF "\n"
                "Last checkpoint at  " LSN_PF "\n",
                log_sys->lsn,
                log_sys->flushed_to_disk_lsn,
                log_buf_pool_get_oldest_modification(),
                log_sys->last_checkpoint_lsn);

                ...
}

    ②查看log_buf_pool_get_oldest_modification源代码

static lsn_t
log_buf_pool_get_oldest_modification(void)
/*======================================*/
{
        lsn_t   lsn;

        ut_ad(log_mutex_own());

        lsn = buf_pool_get_oldest_modification();
        //这里如果buf_pool_get_oldest_modification返回LSN为0(这里的言下之意是oldest lsn为0,即数据都刷入了磁盘),则取log_sys->lsn
        if (!lsn) {

                lsn = log_sys->lsn;
        }

        return(lsn);
}
   从这个函数我们可以看出,如果这时候buffer pool中数据脏页都刷入了磁盘(同时这时候页没数据写入),那么取的就是最大的LSN(log_sys->lsn),也即:show engineinnodb status显示的Log sequence number。


    ③但是这里我们任然不知道oldest modification lsn是怎么取的,这就需要查看buf_pool_get_oldest_modification的源代码:
    buf_pool_get_oldest_modification(void)
/*==================================*/
{
        lsn_t           lsn = 0;
        lsn_t           oldest_lsn = 0;

        /* When we traverse all the flush lists we don't want another
        thread to add a dirty page to any flush list. */
        log_flush_order_mutex_enter();

        //遍历所有的buffer pool
        for (ulint i = 0; i < srv_buf_pool_instances; i++) {
                buf_pool_t*     buf_pool;

                buf_pool = buf_pool_from_array(i);

                buf_flush_list_mutex_enter(buf_pool);

                buf_page_t*     bpage;

                /* We don't let log-checkpoint halt because pages from system
                temporary are not yet flushed to the disk. Anyway, object
                residing in system temporary doesn't generate REDO logging. */

                //从这里我们可以看出bpage是取flush list中最大的oldest modifcation LSN(注意这里的逻辑,如果是系统临时表空间,就不应该算在内)
                for (bpage = UT_LIST_GET_LAST(buf_pool->flush_list);
                     bpage != NULL
                        && fsp_is_system_temporary(bpage->id.space());
                     bpage = UT_LIST_GET_PREV(list, bpage)) {
                        /* Do nothing. */
                }

                if (bpage != NULL) {
                        ut_ad(bpage->in_flush_list);
                        lsn = bpage->oldest_modification;
                }

                buf_flush_list_mutex_exit(buf_pool);
                //上面是从buffer pool中的flush list取最大oldest modication lsn的page,而这里是从每个buffer pool 的最大oldest modification lsn中取最小的oldest modification lsn 的page。
                if (!oldest_lsn || oldest_lsn > lsn) {
                        oldest_lsn = lsn;
                }
        }

        log_flush_order_mutex_exit();

        /* The returned answer may be out of date: the flush_list can
        change after the mutex has been released. */

        return(oldest_lsn);
}
 
④到这里,我们整个流程就明朗了,pages flushed up to 取的是所有buffer pool中最大oldest modification lsn頁中的带有最小的 oldest modification lsn的值,如果去到的oldest modification lsn为0,意味着没有脏页,那么我们就取log_sys->lsn的值,即show engineinnodb status显示的Log sequence number。
举个例子:我们的buffer pool 设置为4个;第一个buffer pool,我们取flush list 中页带oldest modification lsn最大值为100
第二个buffer pool,我们取flush list 中页带oldest modification lsn最大值为140;第三个buffer pool,我们取flush list 中页带oldest modification lsn最大值为98;第四个buffer pool,我们取flush list 中页带oldest modification lsn最大值为130;那么我们得到的pages flushed up to 就是98;


⑤我们知道了这个Pages flushed up to从哪里取的值,但是现在我并不知道这个值的含义啊

我们来看一个函数
/** Make a checkpoint. Note that this function does not flush dirty
blocks from the buffer pool: it only checks what is lsn of the oldest
modification in the pool, and writes information about the lsn in
log files. Use log_make_checkpoint_at() to flush also the pool.
**/
//从这个函数的注释我们可以明显知道,这个函数是用来做检查点的
log_checkpoint(
        bool    sync,
        bool    write_always)
{
        lsn_t   oldest_lsn;

        ut_ad(!srv_read_only_mode);

        if (recv_recovery_is_on()) {
                recv_apply_hashed_log_recs(TRUE);
        }

#ifndef _WIN32
        switch (srv_unix_file_flush_method) {
        case SRV_UNIX_NOSYNC:
                break;
        case SRV_UNIX_O_DSYNC:
        case SRV_UNIX_FSYNC:
        case SRV_UNIX_LITTLESYNC:
        case SRV_UNIX_O_DIRECT:
        case SRV_UNIX_O_DIRECT_NO_FSYNC:
                fil_flush_file_spaces(FIL_TYPE_TABLESPACE);
        }
#endif /* !_WIN32 */

        log_mutex_enter();

        ut_ad(!recv_no_log_write);
        //取得oldest LSN
        oldest_lsn = log_buf_pool_get_oldest_modification();

        /* Because log also contains headers and dummy log records,
        log_buf_pool_get_oldest_modification() will return log_sys->lsn
        if the buffer pool contains no dirty buffers.
        We must make sure that the log is flushed up to that lsn.
        If there are dirty buffers in the buffer pool, then our
        write-ahead-logging algorithm ensures that the log has been
        flushed up to oldest_lsn. */

       //从这里我们可以知道为什么last checkpoint lsn跟pages flushed up to不相等了,因为在mysql做检查点的时候,还要打上MLOG_CHECKPOINT redo 日志,而这个MLOG_CHECKPOINT恰好占用9个字节,所以就有了在脏页都刷盘之后,pages flushed up to-last checkpoint lsn=9
        ut_ad(oldest_lsn >= log_sys->last_checkpoint_lsn);
        if (!write_always
            && oldest_lsn
            <= log_sys->last_checkpoint_lsn + SIZE_OF_MLOG_CHECKPOINT) {
                /* Do nothing, because nothing was logged (other than
                a MLOG_CHECKPOINT marker) since the previous checkpoint. */
                log_mutex_exit();
                return(true);
        }


        lsn_t           flush_lsn       = oldest_lsn;
        const bool      do_write
                = srv_shutdown_state == SRV_SHUTDOWN_NONE
                || flush_lsn != log_sys->lsn;

        if (fil_names_clear(flush_lsn, do_write)) {
                ut_ad(log_sys->lsn >= flush_lsn + SIZE_OF_MLOG_CHECKPOINT);
                flush_lsn = log_sys->lsn;
        }

        log_mutex_exit();

        log_write_up_to(flush_lsn, true);

        DBUG_EXECUTE_IF(
                "using_wa_checkpoint_middle",
                if (write_always) {
                        DEBUG_SYNC_C("wa_checkpoint_middle");

                        const my_bool b = TRUE;
                        buf_flush_page_cleaner_disabled_debug_update(
                                NULL, NULL, NULL, &b);
                        dict_stats_disabled_debug_update(
                                NULL, NULL, NULL, &b);
                        srv_master_thread_disabled_debug_update(
                                NULL, NULL, NULL, &b);
                });

        log_mutex_enter();

        ut_ad(log_sys->flushed_to_disk_lsn >= flush_lsn);
        ut_ad(flush_lsn >= oldest_lsn);
        //检查点大于等于oldest_lsn说明已经刷盘
        if (log_sys->last_checkpoint_lsn >= oldest_lsn) {
                log_mutex_exit();
                return(true);
        }

        if (log_sys->n_pending_checkpoint_writes > 0) {
                /* A checkpoint write is running */
                log_mutex_exit();

                if (sync) {
                        /* Wait for the checkpoint write to complete */
                        rw_lock_s_lock(&log_sys->checkpoint_lock);
                        rw_lock_s_unlock(&log_sys->checkpoint_lock);
                }

                return(false);
        }
       //下一次LSN的位置
        log_sys->next_checkpoint_lsn = oldest_lsn;
        log_write_checkpoint_info(sync);
        ut_ad(!log_mutex_own());

        return(true);
}


4)总结:

pages flushed up to指的是下一次即将做checkpoint lsn 的位置;在没有新数据的写入的情况下,
pages flushed up to取的是Log sequence number(log_sys->lsn);在没数据写入的情况下,为什么last checkpoint point不等于pages flushed up to?是因为做checkpoint是同时redo日志会写MLOG_CHECKPOINT,而MLOG_CHECKPOINT占用九个字节,所以会出现pages flushed up to-last checkpoint point=9;




请使用浏览器的分享功能分享到微信等