本文来自获得《2021MongoDB技术实践与应用案例征集活动》入围案例奖作品

作者：张家侨

问题描述

本文介绍一次帮助业务排查线上查询操作慢的问题的详细过程以及解决方法。

业务在使用find().limit()命令查询数据时遇到卡顿，一次操作卡最多1分钟，问题描述如下：

db.xxx_collection.find({timetag: { $gt: 1636513802167 }, $or: [ { nto: "zhang" }, { nfrom:"zhang" } ], nid: 50, status: 0 }).sort({ timetag: 1 }).limit(200)

业务反馈：出问题的MongoDB版本为4.2.12。这个查询从慢日志来看要四五十秒，但是直接跑的话第一批返回数据很快，getmore的时候就慢了，你看下为啥会慢在getmore

从问题描述来看，直观上是getmore造成了查询卡顿，可能原因有如下：

getmore操作内核出现卡顿 -- 内核问题可能性较低
查询计划与被查询数据不匹配 -- 可能性较高

下面问题排查将从这两个主要问题点入手。

问题重现

将数据恢复到一个备用集群，然后使用业务的查询进行操作：

db.xxx_collection.find({timetag: { $gt : 1636513802167 }, $or : [ { nto: "zhang" }, { nfrom: "zhang" } ], nid: 50 , status: 0 }).sort({ timetag: 1 }).limit(200)

第一次查询立刻返回，当获取数据条数到102条时，出现卡顿。操作产生慢日志，发现是触发了一次getmore操作。日志输出如下：

2021 -12-15T10 :21:55.007 + 0800 I COMMAND [conn857] command xxx_db.xxx_collection appName: "MongoDB Shell" command: getMore { getMore: 1244654399487918694 , collection: "xxx_collection" , ****** planSummary: IXSCAN { timetag: 1 } cursorid:1244654399487918694 keysExamined:11338296 docsExamined:11338296cursorExhausted:1 numYields:88583 nreturned:99 reslen:100170 locks:{ReplicationStateTransition: { acquireCount: { w: 88584 } }, Global: { acquireCount: { r: 88584 } }, Database: { acquireCount: { r: 88584 } }, Collection: { acquireCount: { r: 88584 } }, Mutex: { acquireCount: { r: 1 } } } storage:{ data: { bytesRead: 15442700982 , timeReadingMicros: 40865619 },timeWaitingMicros: { cache: 33773 } }protocol:op_msg 65270ms

问题排查

确认是否是getmore的问题

在MongoDB中，其查询返回结果批大小默认为101。也就是说，MongoDB一次查询会找到101个结果然后一次性返回，当需要更多结果时再查找101个结果，通过getmore返回。

我们是可以通过batchSize操作来设置MongoDB一次返回的数据个数的，包括getmore操作。

如果是getmore的问题，理论上调整其batchSize并不会影响问题的出现，所以我们进行了如下调整。

将batchSize设置为150：

db.xxx_collection.find({timetag: { $gt : 1636513802167 }, $or : [ { nto: "zhang" }, { nfrom: "zhang" } ], nid: 50 , status: 0 }).sort({ timetag: 1 }).batchSize(150).limit(200)

第一次查询立刻返回结果，超过batchSize数据后，getmore又卡顿，问题依旧：

2021 -12-15T10 :25:54.781 + 0800 I COMMAND [conn859] command xxx_db.xxx_collection appName: "MongoDB Shell" command: getMore { getMore: 8826588127480667691 , collection: "xxx_collection" , batchSize: 150 , ******planSummary: IXSCAN { timetag: 1 }cursorid:8826588127480667691 keysExamined:11338234 docsExamined:11338234cursorExhausted:1 numYields:88582 nreturned:50 reslen:50818 locks:{ReplicationStateTransition: { acquireCount: { w: 88583 } }, Global: { acquireCount: { r: 88583 } }, Database: { acquireCount: { r: 88583 } }, Collection: { acquireCount: { r: 88583 } }, Mutex: { acquireCount: { r: 1 } } } storage:{ data: { bytesRead: 16610295032 , timeReadingMicros: 30201139 },timeWaitingMicros: { cache: 17084 } }protocol:op_msg 53826ms

调整为199后效果也类似，但是调整为200后，问题变为第一次查询就出现卡顿：

db.xxx_collection.find({timetag: { $gt : 1636513802167 }, $or : [ { nto: "zhang" }, { nfrom: "zhang" } ], nid: 50 , status: 0 }).sort({ timetag: 1 }).batchSize(200).limit(200)

相应慢日志如下：

2021 -12-15T10 :27:23.729 + 0800 I COMMAND [conn859] command xxx_db.xxx_collection appName: "MongoDB Shell" command: find { find : "xxx_collection" , filter: { timetag: { $gt : 1636513802167 .0 }, $or : [ { nto: "zhang" }, { nfrom: "zhang" } ], nid: 50 .0, status: 0 .0 }, projection: { $sortKey : { $meta : "sortKey" } }, sort : { timetag: 1 .0 }, limit: 200 , batchSize: 200 , ******planSummary: IXSCAN { timetag: 1 }keysExamined:11338445 docsExamined:11338445 cursorExhausted:1 numYields:88582nreturned:200 queryHash:ECA82717 planCacheKey:AC7EC9E3 reslen:202045 locks:{ReplicationStateTransition: { acquireCount: { w: 88583 } }, Global: { acquireCount: { r: 88583 } }, Database: { acquireCount: { r: 88583 } }, Collection: { acquireCount: { r: 88583 } }, Mutex: { acquireCount: { r: 2 } } } storage:{ data: { bytesRead: 17688667726 , timeReadingMicros: 14907251 },timeWaitingMicros: { cache: 11784 } }protocol:op_msg 36654ms

所以我们可以基本排除是getmore操作本身的问题。从慢操作日志我们可以看出，查询计划使用timetag索引来进行数据获取和过滤，一共遍历了1千万的数据。问题应该和数据查询计划以及数据分布有关，具体在查询进行第199～200个结果获取时发生了卡顿，且遍历timetag索引不够快。

所以我们的分析方向转为查询计划以及查询的数据上，确认是否存在问题，即查询计划不适合目标数据。

分析查询的数据分布

首先，我们需要了解业务数据的分布格式以及查询的目的，业务数据关键字段如下：

{ "nto" : , "nfrom" : , "nid" : , "timetag" : , "status" : }

从库表名以及数据格式来看，查询数据为某种消息的传递记录。

目标查询如下：

db.xxx_collection.find({timetag: { $gt : 1636513802167 }, $or : [ { nto: "zhang" }, { nfrom: "zhang" } ], nid: 50 , status: 0 }).sort({ timetag: 1 }).limit(200)

而查询所要做的就是要找到某人在某个时间点后，接收到或者发送的消息记录，且 nid = 50 & status = 0 。所以整个查询比较重要的条件就是时间timetag以及用户名。而大于某个时间点这种查询可能存在目标数据量特别大的情形，下面进行确认。

数据在timetag上的分布

mongos>db.xxx_collection.count()
538058824

由上我们可以看出，整个目标集合数据量为5亿多，数据量较大。下面针对性取出造成卡顿的第199个和第200个查询结果的timetag。

第199个结果timetag：

{ "_id" : ObjectId ( "618b380a82010a586b4265e6" ), "timetag" : NumberLong ( "1636513802759" ) }

第200个结果timetag：

{ "_id" : ObjectId ( "618ce26682010a586be027f9" ), "timetag" : NumberLong ( "1636622950801" ) }

以查询条件的时间点 1636513802167 为 T0 ，第199个结果的时间点 1636513802759 为 T1 ，第200个结果的时间点 1636622950801 为 T2 。下面以这三个时间为节点，对数据进行分段分析。

查询数据的分布

T0后总的数据量：

mongos>db.xxx_collection.find({ timetag: { $gt : 1636513802167 }}).count()
191829505

处于目标时间段的数据总量为1.9亿多，数据量比较大

T0~T1的数据量：

mongos>db.xxx_collection.find({ $and : [{timetag: { $gt : 1636513802167 }}, {timetag: { $lte : 1636513802759 }}]}).count()
287

T1~T2之间的数据量：

mongos>db.xxx_collection.find({ $and : [{timetag: { $gt : 1636513802759 }}, {timetag: { $lt : 1636622950801 }}]}).count()
11338157

T2后的数据量：

mongos>db.xxx_collection.find({timetag: { $gte : 1636622950801 }}).count()
180491061

查询结果的分布

总查询结果：

mongos>db.xxx_collection.find({ timetag: { $gt : 1636513802167 }, $or : [ { nto: "zhang" }, { nfrom: "zhang" } ], nid: 50 , status: 0 }).sort({ timetag: 1 }).count()
428

T0~T1：

mongos>db.xxx_collection.find({ $and :[{timetag: { $gt : 1636513802167 }}, {timetag: { $lte : 1636513802759 }}], $or : [ { nto: "zhang" }, { nfrom: "zhang" } ], nid: 50 , status: 0 }).sort({ timetag: 1 }).count()
199

T1~T2：

mongos>db.xxx_collection.find({ $and :[{timetag: { $gt : 1636513802759 }}, {timetag: { $lt : 1636622950801 }}], $or : [ { nto: "zhang" }, { nfrom: "zhang" } ], nid: 50 , status: 0 }).sort({ timetag: 1 }).count()
0

T2后：

mongos>db.xxx_collection.find({ timetag: { $gte: 1636622950801 }, $or: [ { nto:"zhang" }, { nfrom: "zhang" } ], nid: 50, status: 0}).sort({ timetag: 1 }).count()
229

从数据以及相应结果数量的分布可以看出，查询结果主要分布在T0~T1和T2后的时间段内。T1～T2时间段内不存在符合条件的结果，有1千万不符合条件的数据存在。总结来说，结果分布在timetag字段上呈现两头分布，中间存在巨大空洞。

分析执行的查询计划

原查询计划

db.xxx_collection.find({timetag: { $gt : 1636513802167 }, $or : [ { nto: "zhang" }, { nfrom: "zhang" } ], nid: 50 , status: 0 }).sort({ timetag: 1 }).limit(200).explain( "executionStats" )

得到的查询计划：

"parsedQuery" : {
"$and" : [
{
"$or" : [
{
"nfrom" : {
"$eq" : "zhang"
}
},
{
"nto" : {
"$eq" : "zhang"
}
}
]
},
{
"nid" : {
"$eq" : 50
}
},
{
"status" : {
"$eq" : 0
}
},
{
"timetag" : {
"$gt" : 1636513802167
}
}
]
},
"winningPlan" : {
"stage" : "LIMIT" ,
"limitAmount" : 200 ,
"inputStage" : {
"stage" : "FETCH" ,
"filter" : {
"$and" : [
{
"$or" : [
{
"nfrom" : {
"$eq" : "zhang"
}
},
{
"nto" : {
"$eq" : "zhang"
}
}
]
},
{
"nid" : {
"$eq" : 50
}
},
{
"status" : {
"$eq" : 0
}
}
]
},
"inputStage" : {
"stage" : "IXSCAN" ,
"keyPattern" : {
"timetag" : 1
},
"indexName" : "timetag_1" ,
"isMultiKey" : false ,
"multiKeyPaths" : {
"timetag" : [ ]
},
"isUnique" : false ,
"isSparse" : false ,
"isPartial" : false ,
"indexVersion" : 2 ,
"direction" : "forward" ,
"indexBounds" : {
"timetag" : [
"(1636513802167.0, inf.0]"
]
}
}
}
}

由上面的结果我们可以看出，原查询的查询计划为利用timetag的索引进行扫描，然后逐个过滤出符合timetag条件的记录。在数据分布分析中我们知道，符合目标timetag的数据有1.9亿，而扫描这些数据将会非常慢，即使使用索引。因为getmore操作使用的cursor是原查询计划产生的，同一个查询内只会使用同一个查询计划。下面通过分段执行原查询计划来佐证扫描timetag慢。

T0 ～ T1 数据段执行原计划

使用上述查询计划查询T0~T1的数据：

"parsedQuery" : {
"$and" : [
{
"$or" : [
{
"nfrom" : {
"$eq" : "zhang"
}
},
{
"nto" : {
"$eq" : "zhang"
}
}
]
},
{
"nid" : {
"$eq" : 50
}
},
{
"status" : {
"$eq" : 0
}
},
{
"timetag" : {
"$lte" : 1636513802759
}
},
{
"timetag" : {
"$gt" : 1636513802167
}
}
]
},
"winningPlan" : {
"stage" : "LIMIT" ,
"limitAmount" : 200 ,
"inputStage" : {
"stage" : "FETCH" ,
"filter" : {
"$and" : [
{
"$or" : [
{
"nfrom" : {
"$eq" : "zhang"
}
},
{
"nto" : {
"$eq" : "zhang"
}
}
]
},
{
"nid" : {
"$eq" : 50
}
},
{
"status" : {
"$eq" : 0
}
}
]
},
"inputStage" : {
"stage" : "IXSCAN" ,
"keyPattern" : {
"timetag" : 1
},
"indexName" : "timetag_1" ,
"isMultiKey" : false ,
"multiKeyPaths" : {
"timetag" : [ ]
},
"isUnique" : false ,
"isSparse" : false ,
"isPartial" : false ,
"indexVersion" : 2 ,
"direction" : "forward" ,
"indexBounds" : {
"timetag" : [
"(1636513802167.0, 1636513802759.0]"
]
}
}
}
},

结果立刻得到，因为总的需要扫描的数据量只有287。

T1 ～ T2 数据段执行原计划

使用上述查询计划查询T1～T2的数据：

"parsedQuery" : {
"$and" : [
{
"$or" : [
{
"nfrom" : {
"$eq" : "zhang"
}
},
{
"nto" : {
"$eq" : "zhang"
}
}
]
},
{
"nid" : {
"$eq" : 50
}
},
{
"status" : {
"$eq" : 0
}
},
{
"timetag" : {
"$lt" : 1636622950801
}
},
{
"timetag" : {
"$gt" : 1636513802759
}
}
]
},
"winningPlan" : {
"stage" : "SORT" ,
"sortPattern" : {
"timetag" : 1
},
"limitAmount" : 200 ,
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR" ,
"inputStage" : {
"stage" : "FETCH" ,
"filter" : {
"$and" : [
{
"nid" : {
"$eq" : 50
}
},
{
"status" : {
"$eq" : 0
}
}
]
},
"inputStage" : {
"stage" : "OR" ,
"inputStages" : [
{
"stage" : "IXSCAN" ,
"keyPattern" : {
"nto" : 1 ,
"validflag" : 1 ,
"timetag" : 1
},
"indexName" : "nto_1_validflag_1_timetag_1" ,
"isMultiKey" : false ,
"multiKeyPaths" : {
"nto" : [ ],
"validflag" : [ ],
"timetag" : [ ]
},
"isUnique" : false ,
"isSparse" : false ,
"isPartial" : false ,
"indexVersion" : 2 ,
"direction" : "forward" ,
"indexBounds" : {
"nto" : [
"[\"zhang\", \"zhang\"]"
],
"validflag" : [
"[MinKey, MaxKey]"
],
"timetag" : [
"(1636513802759.0, 1636622950801.0)"
]
}
},
{
"stage" : "IXSCAN" ,
"keyPattern" : {
"nfrom" : 1 ,
"timetag" : 1
},
"indexName" : "nfrom_1_timetag_1" ,
"isMultiKey" : false ,
"multiKeyPaths" : {
"nfrom" : [ ],
"timetag" : [ ]
},
"isUnique" : false ,
"isSparse" : false ,
"isPartial" : false ,
"indexVersion" : 2 ,
"direction" : "forward" ,
"indexBounds" : {
"nfrom" : [
"[\"zhang\", \"zhang\"]"
],
"timetag" : [
"(1636513802759.0, 1636622950801.0)"
]
}
}
]
}
}
}
},

查询计划变了，强制使用timetag_1索引来使用原查询计划：

mongos>db.xxx_collection.find({ $and:[{timetag: { $gt: 1636513802759 }}, {timetag:{$lt: 1636622950801}}], $or: [ { nto: "zhang" }, { nfrom:"zhang" } ], nid: 50, status: 0 }).sort({ timetag: 1}).limit(200).hint("timetag_1").explain("executionStats")

查询计划：

"parsedQuery" : {
"$and" : [
{
"$or" : [
{
"nfrom" : {
"$eq" : "zhang"
}
},
{
"nto" : {
"$eq" : "zhang"
}
}
]
},
{
"nid" : {
"$eq" : 50
}
},
{
"status" : {
"$eq" : 0
}
},
{
"timetag" : {
"$lt" : 1636622950801
}
},
{
"timetag" : {
"$gt" : 1636513802759
}
}
]
},
"winningPlan" : {
"stage" : "LIMIT" ,
"limitAmount" : 200 ,
"inputStage" : {
"stage" : "FETCH" ,
"filter" : {
"$and" : [
{
"$or" : [
{
"nfrom" : {
"$eq" : "zhang"
}
},
{
"nto" : {
"$eq" : "zhang"
}
}
]
},
{
"nid" : {
"$eq" : 50
}
},
{
"status" : {
"$eq" : 0
}
}
]
},
"inputStage" : {
"stage" : "IXSCAN" ,
"keyPattern" : {
"timetag" : 1
},
"indexName" : "timetag_1" ,
"isMultiKey" : false ,
"multiKeyPaths" : {
"timetag" : [ ]
},
"isUnique" : false ,
"isSparse" : false ,
"isPartial" : false ,
"indexVersion" : 2 ,
"direction" : "forward" ,
"indexBounds" : {
"timetag" : [
"(1636513802759.0, 1636622950801.0)"
]
}
}
}
},

查询耗时：

2021 -12-15T11 :18:43.650 + 0800 I COMMAND [conn913] command xxx_db.xxx_collection appName: "MongoDB Shell" command: find { find : "xxx_collection" , filter: { $and : [ { timetag: { $gt : 1636513802759 .0 } }, { timetag: { $lt : 1636622950801 .0 } } ], $or : [ { nto: "zhang" }, { nfrom: "zhang" } ], nid: 50 .0, status: 0 .0 }, projection: { $sortKey : { $meta : "sortKey" } }, sort : { timetag: 1 .0 }, hint: { $hint : "timetag_1" }, limit: 200 , runtimeConstants: { localNow: new Date(1639538294423), clusterTime:Timestamp(1639538291, 1 ) },shardVersion: [ Timestamp(0, 0 ), ObjectId( '000000000000000000000000' ) ], ****** planSummary: IXSCAN {timetag: 1 } keysExamined:11338157 docsExamined:11338157 cursorExhausted:1numYields:88579 nreturned:0 reslen:386 locks:{ ReplicationStateTransition: {acquireCount: { w: 88580 } }, Global: {acquireCount: { r: 88580 } }, Database:{ acquireCount: { r: 88580 } },Collection: { acquireCount: { r: 88580 } }, Mutex: {acquireCount: { r: 2 } } } storage:{data: { bytesRead: 16223299833 ,timeReadingMicros: 9431804 },timeWaitingMicros: { cache: 14082 } }protocol:op_msg 29226ms

我们发现，查询T1～T2的空洞区域非常慢，耗时29秒，因为需要扫描1千万多的数据。

T2 后数据段执行原计划

mongos> db.xxx_collection.find({timetag: { $gt : 1636622950801 }, $or : [ { nto: "zhang" }, { nfrom: "zhang" } ], nid: 50 , status: 0 }).sort({ timetag: 1 }).limit(200).explain( "executionStats" )

查询计划：

"parsedQuery" : {
"$and" : [
{
"$or" : [
{
"nfrom" : {
"$eq" : "zhang"
}
},
{
"nto" : {
"$eq" : "zhang"
}
}
]
},
{
"nid" : {
"$eq" : 50
}
},
{
"status" : {
"$eq" : 0
}
},
{
"timetag" : {
"$gt" : 1636622950801
}
}
]
},
"winningPlan" : {
"stage" : "SORT" ,
"sortPattern" : {
"timetag" : 1
},
"limitAmount" : 200 ,
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR" ,
"inputStage" : {
"stage" : "FETCH" ,
"filter" : {
"$and" : [
{
"nid" : {
"$eq" : 50
}
},
{
"status" : {
"$eq" : 0
}
}
]
},
"inputStage" : {
"stage" : "OR" ,
"inputStages" : [
{
"stage" : "IXSCAN" ,
"keyPattern" : {
"nto" : 1 ,
"validflag" : 1 ,
"timetag" : 1
},
"indexName" : "nto_1_validflag_1_timetag_1" ,
"isMultiKey" : false ,
"multiKeyPaths" : {
"nto" : [ ],
"validflag" : [ ],
"timetag" : [ ]
},
"isUnique" : false ,
"isSparse" : false ,
"isPartial" : false ,
"indexVersion" : 2 ,
"direction" : "forward" ,
"indexBounds" : {
"nto" : [
"[\"zhang\", \"zhang\"]"
],
"validflag" : [
"[MinKey, MaxKey]"
],
"timetag" : [
"(1636622950801.0, inf.0]"
]
}
},
{
"stage" : "IXSCAN" ,
"keyPattern" : {
"nfrom" : 1 ,
"timetag" : 1
},
"indexName" : "nfrom_1_timetag_1" ,
"isMultiKey" : false ,
"multiKeyPaths" : {
"nfrom" : [ ],
"timetag" : [ ]
},
"isUnique" : false ,
"isSparse" : false ,
"isPartial" : false ,
"indexVersion" : 2 ,
"direction" : "forward" ,
"indexBounds" : {
"nfrom" : [
"[\"zhang\", \"zhang\"]"
],
"timetag" : [
"(1636622950801.0, inf.0]"
]
}
}
]
}
}
}
},

查询计划变了，强制使用timetag_1索引来使用原查询计划：

mongos>db.xxx_collection.find({ timetag: { $gt : 1636622950801 }, $or : [ { nto: "zhang" }, { nfrom: "zhang" } ], nid: 50 , status: 0 }).sort({ timetag: 1 }).limit(200).hint( "timetag_1" ).explain( "executionStats" )

查询计划：

"parsedQuery" : {
"$and" : [
{
"$or" : [
{
"nfrom" : {
"$eq" : "zhang"
}
},
{
"nto" : {
"$eq" : "zhang"
}
}
]
},
{
"nid" : {
"$eq" : 50
}
},
{
"status" : {
"$eq" : 0
}
},
{
"timetag" : {
"$gt" : 1636622950801
}
}
]
},
"winningPlan" : {
"stage" : "LIMIT" ,
"limitAmount" : 200 ,
"inputStage" : {
"stage" : "FETCH" ,
"filter" : {
"$and" : [
{
"$or" : [
{
"nfrom" : {
"$eq" : "zhang"
}
},
{
"nto" : {
"$eq" : "zhang"
}
}
]
},
{
"nid" : {
"$eq" : 50
}
},
{
"status" : {
"$eq" : 0
}
}
]
},
"inputStage" : {
"stage" : "IXSCAN" ,
"keyPattern" : {
"timetag" : 1
},
"indexName" : "timetag_1" ,
"isMultiKey" : false ,
"multiKeyPaths" : {
"timetag" : [ ]
},
"isUnique" : false ,
"isSparse" : false ,
"isPartial" : false ,
"indexVersion" : 2 ,
"direction" : "forward" ,
"indexBounds" : {
"timetag" : [
"(1636622950801.0, inf.0]"
]
}
}
}
},

日志时间输出：

2021 -12-15T11 :36:34.388 + 0800 I COMMAND [conn918] command xxx_db.xxx_collection appName: "MongoDB Shell" command: explain { explain: { find : "xxx_collection" , filter: { timetag: { $gt : 1636622950801 .0 }, $or : [ { nto: "zhang" }, { nfrom: "zhang" } ], nid: 50 .0, status: 0 .0 }, sort : { timetag: 1 .0 }, hint: { $hint : "timetag_1" }, limit: 200 , ****** $db : "xxx_db" } numYields:1109015 reslen:2691 locks:{ReplicationStateTransition: { acquireCount: { w: 1109016 } }, Global: { acquireCount: { r: 1109016 } }, Database: { acquireCount: { r: 1109016 } }, Collection: { acquireCount: { r: 1109016 } }, Mutex: { acquireCount: { r: 2 } } } storage:{ data: { bytesRead: 195293544507 , timeReadingMicros: 518472952 },timeWaitingMicros: { cache: 272870 } }protocol:op_msg 801697ms

由上面可以看出，查询T2后的数据时，需要800秒，因为需要扫描1.8亿的数据。因为原查询使用limit(200)限制了查询的总结果数，且只有一个结果在这个时间段内，所以查询找到一个结果即可返回，不需要扫描整个时间段。

问题原因总结

总结来说，问题出现原因是MongoDB给出的查询计划不符合目标数据的查询。

原查询计划为使用timetag索引来进行扫描，获取结果后再根据其他条件进行过滤。

我们发现，查询结果在timetag上分段分布，分布在timetag的两头，中间存在一大段无效数据。第1～199的结果分布在T0～T1时间段内，第200后的结果分布在T2时间段后。

如果使用原查询计划，即只使用timetag索引来扫描全表，在查询第199个后的结果时将会非常慢，因为这些结果分布在1.9亿的数据里。这也解释了第一次查询获取101个结果快，因为只需查扫描T0～T1时间段内400多个数据，第二次查询需要获取第102～202个数据则慢，因为需要先扫描1千多万的无效数据然后才能返回结果。原查询计划不够高效，导致扫描比较慢。

问题解决方案

只使用timetag索引进行数据遍历较低效，使用聚合索引来在遍历数据时进行过滤，减少遍历的数据量。所以预期要使用timetag和nfrom以及timetag和nto的联合索引来进行并行查询，最后将两个查询结果进行合并。由于MongoDB的查询优化器不能将原有的查询转化为上面预期的查询计划，所以我们需要改写查询计划，便于MongoDB的查询优化器进行识别。

将原查询：

db.xxx_collection.find({timetag: { $gt : 1636513802167 }, $or : [ { nto: "zhang" }, { nfrom: "zhang" } ], nid: 50 , status: 0 }).sort({ timetag: 1 }).limit(200)

转化为等价的优化查询：

db.xxx_collection.find({ $or : [ { $and : [{nto: "zhang" }, {timetag: { $gt : 1636513802167 }}, { nid:50}, {status: 0 }]}, { $and : [{nfrom: "zhang" }, {timetag: { $gt : 1636513802167 }}, { nid:50}, {status: 0 }]} ] }).sort({timetag:1}).limit(200)

并在 nto 和 timetag 以及 nfrom 和 timetag （备注：原库已经有nfrom和timetag的联合索引）上建立 联合索引：

{
"v" : 2 ,
"key" : {
"nfrom" : 1 ,
"timetag" : 1
},
"name" : "nfrom_1_timetag_1" ,
"ns" : "xxx_db.xxx_collection"
},
{
"v" : 2 ,
"key" : {
"nto" : 1 ,
"timetag" : 1
},
"name" : "nto_1_timetag_1" ,
"ns" : "xxx_db.xxx_collection"
},

得到的查询计划：

"parsedQuery" : {
"$or" : [
{
"$and" : [
{
"nfrom" : {
"$eq" : "zhang"
}
},
{
"nid" : {
"$eq" : 50
}
},
{
"status" : {
"$eq" : 0
}
},
{
"timetag" : {
"$gt" : 1636513802167
}
}
]
},
{
"$and" : [
{
"nid" : {
"$eq" : 50
}
},
{
"nto" : {
"$eq" : "zhang"
}
},
{
"status" : {
"$eq" : 0
}
},
{
"timetag" : {
"$gt" : 1636513802167
}
}
]
}
]
},
"winningPlan" : {
"stage" : "SUBPLAN" ,
"inputStage" : {
"stage" : "LIMIT" ,
"limitAmount" : 200 ,
"inputStage" : {
"stage" : "PROJECTION_SIMPLE" ,
"transformBy" : {
"timetag" : 1
},
"inputStage" : {
"stage" : "SORT_MERGE" ,
"sortPattern" : {
"timetag" : 1
},
"inputStages" : [
{
"stage" : "FETCH" ,
"filter" : {
"$and" : [
{
"nid" : {
"$eq" : 50
}
},
{
"status" : {
"$eq" : 0
}
}
]
},
"inputStage" : {
"stage" : "IXSCAN" ,
"keyPattern" : {
"nfrom" : 1 ,
"timetag" : 1
},
"indexName" : "nfrom_1_timetag_1" ,
"isMultiKey" : false ,
"multiKeyPaths" : {
"nfrom" : [ ],
"timetag" : [ ]
},
"isUnique" : false ,
"isSparse" : false ,
"isPartial" : false ,
"indexVersion" : 2 ,
"direction" : "forward" ,
"indexBounds" : {
"nfrom" : [
"[\"zhang\", \"zhang\"]"
],
"timetag" : [
"(1636513802167.0, inf.0]"
]
}
}
},
{
"stage" : "FETCH" ,
"filter" : {
"$and" : [
{
"nid" : {
"$eq" : 50
}
},
{
"status" : {
"$eq" : 0
}
}
]
},
"inputStage" : {
"stage" : "IXSCAN" ,
"keyPattern" : {
"nto" : 1 ,
"timetag" : 1
},
"indexName" : "nto_1_timetag_1" ,
"isMultiKey" : false ,
"multiKeyPaths" : {
"nto" : [ ],
"timetag" : [ ]
},
"isUnique" : false ,
"isSparse" : false ,
"isPartial" : false ,
"indexVersion" : 2 ,
"direction" : "forward" ,
"indexBounds" : {
"nto" : [
"[\"zhang\", \"zhang\"]"
],
"timetag" : [
"(1636513802167.0, inf.0]"
]
}
}
}
]
}
}
}
},

这个查询计划符合我们的预期。

查询等价证明

上述优化的查询计划可以通过布尔等价运算得到。

原查询为：

db.xxx_collection.find({timetag: { $gt : 1636513802167 }, $or : [ { nto: "zhang" }, { nfrom: "zhang" } ], nid: 50 , status: 0 }).sort({ timetag: 1 }).limit(200)

我们将find的查询条件定义抽象如下：

a：timetag > 1636513802167
b：nto = "zhang"
c：nfrom = "zhang"
d：nid = 50
e：status = 0

所以原查询条件相当于一个合取范式：

$$
a \bigwedge (b \bigvee c) \bigwedge d \bigwedge e
$$

通过逻辑运算，可以转化为一个析取范式：

$$
(a \bigwedge b \bigwedge d \bigwedge e) \bigvee (a \bigwedge c \bigwedge d\bigwedge e)
$$

对应如下查询：

db.xxx_collection.find({ $or : [ { $and : [{nto: "zhang" }, {timetag: { $gt : 1636513802167 }}, { nid:50}, {status: 0 }]}, { $and : [{nfrom: "zhang" }, {timetag: { $gt : 1636513802167 }}, { nid:50}, {status: 0 }]} ] }).sort({timetag:1}).limit(200)

查询计划等价得证

关于作者： 网易游戏张家侨

MongoDB find getmore操作慢问题排查