向量方式查询存在的问题

OCP版本:Version: 4.3.0-20240711151555

observer版本: oceanbase-4.2.1.8-108010012024082314.el7.x86_64.rpm

使用的是企业版  OceanBase

问题发现

在跑批完成后,对整个跑批进行分析,在OCP,slowsql中发现耗时长,cpu消耗高的SQL,如下图:

1729578467

1729578543

分析此类sql

库表已做脱敏,XXXX表示

SELECT
  "SEQ_NO",
  "CHANNEL_SEQ_NO",
  "SUB_SEQ_NO",
  "TAE_SUB_SEQ_NO",
  TRAN_DATEFROM
  XXX."XXX" WHERE ("SEQ_NO", "TRAN_DATE") >= (10023931,'2024-01-19 00:00:00') AND ("SEQ_NO", "TRAN_DATE") <= (10024360,'2024-01-19 00:00:00');

执行计划如下:

=================================================================|ID|OPERATOR                 |NAME        |EST.ROWS|EST.TIME(us)|-----------------------------------------------------------------|0 |PX COORDINATOR           |            |1281311 |5678820     ||1 |└─EXCHANGE OUT DISTR     |:EX10000    |1281311 |4161250     ||2 |  └─PX PARTITION ITERATOR|            |1281311 |748539      ||3 |    └─TABLE FULL SCAN    |XXXXXX      |1281311 |748539      |=================================================================Outputs & filters:-------------------------------------
  0 - output([INTERNAL_FUNCTION(XXXXXX.SEQ_NO(0x7fdf66625ea0), XXXXXX.CHANNEL_SEQ_NO(0x7fdf6662dbe0), XXXXXX.SUB_SEQ_NO(0x7fdf6662e1e0),
       XXXXXX.TAE_SUB_SEQ_NO(0x7fdf6662e7e0), XXXXXX.TRAN_DATE(0x7fdf66623ac0))(0x7fdf667ed850)]), filter(nil), rowset=256
  1 - output([INTERNAL_FUNCTION(XXXXXX.SEQ_NO(0x7fdf66625ea0), XXXXXX.CHANNEL_SEQ_NO(0x7fdf6662dbe0), XXXXXX.SUB_SEQ_NO(0x7fdf6662e1e0),
       XXXXXX.TAE_SUB_SEQ_NO(0x7fdf6662e7e0), XXXXXX.TRAN_DATE(0x7fdf66623ac0))(0x7fdf667ed850)]), filter(nil), rowset=256
      dop=1
  2 - output([XXXXXX.SEQ_NO(0x7fdf66625ea0)], [XXXXXX.TRAN_DATE(0x7fdf66623ac0)], [XXXXXX.CHANNEL_SEQ_NO(0x7fdf6662dbe0)], [XXXXXX.SUB_SEQ_NO(0x7fdf6662e1e0)],
       [XXXXXX.TAE_SUB_SEQ_NO(0x7fdf6662e7e0)]), filter(nil), rowset=256
      force partition granule  3 - output([XXXXXX.SEQ_NO(0x7fdf66625ea0)], [XXXXXX.TRAN_DATE(0x7fdf66623ac0)], [XXXXXX.CHANNEL_SEQ_NO(0x7fdf6662dbe0)], [XXXXXX.SUB_SEQ_NO(0x7fdf6662e1e0)],
       [XXXXXX.TAE_SUB_SEQ_NO(0x7fdf6662e7e0)]), filter([(cast(XXXXXX.SEQ_NO(0x7fdf66625ea0), NUMBER(-1, -1))(0x7fdf66626d70), XXXXXX.TRAN_DATE(0x7fdf66623ac0))(0x7fdf66635dc0) 
      >= (10023931, cast('2024-01-19 00:00:00', DATE(-1, -1))(0x7fdf66627fc0))(0x7fdf666364f0)(0x7fdf666256d0)], [(cast(XXXXXX.SEQ_NO(0x7fdf66625ea0), NUMBER(-1,
       -1))(0x7fdf6662b800), XXXXXX.TRAN_DATE(0x7fdf66623ac0))(0x7fdf66636f90) <= (10024360, cast('2024-01-19 00:00:00', DATE(-1, -1))(0x7fdf6662ca50))(0x7fdf666376c0)(0x7fdf6662a440)]), rowset=256
      access([XXXXXX.SEQ_NO(0x7fdf66625ea0)], [XXXXXX.TRAN_DATE(0x7fdf66623ac0)], [XXXXXX.CHANNEL_SEQ_NO(0x7fdf6662dbe0)], [XXXXXX.SUB_SEQ_NO(0x7fdf6662e1e0)],
       [XXXXXX.TAE_SUB_SEQ_NO(0x7fdf6662e7e0)]), partitions(p[0-14])
      is_index_back=false, is_global_index=false, filter_before_indexback[false,false], 
      range_key([XXXXXX.SEQ_NO(0x7fdf66625ea0)], [XXXXXX.TRAN_DATE(0x7fdf66623ac0)]), range(MIN,MIN ; MAX,MAX)always trueUsed Hint:-------------------------------------
  /*+
      
  */Qb name trace:-------------------------------------
  stmt_id:0, stmt_type:T_EXPLAIN 
  stmt_id:1, SEL$1Outline Data: 
-------------------------------------
  /*+
      BEGIN_OUTLINE_DATA
      FULL(@"SEL$1" "XXXXXX"."XXXXXX"@"SEL$1")
      OPTIMIZER_FEATURES_ENABLE('4.2.1.0')
      END_OUTLINE_DATA
  */Optimization Info:-------------------------------------
  XXXXXX:
      table_rows:11531798
      physical_range_rows:11531798
      logical_range_rows:11531798
      index_back_rows:0
      output_rows:1281310
      table_dop:1
      dop_method:Table DOP
      avaiable_index_name:[IDX_XXXXXX_3, IDX_XXXXXX_15, IDX_XXXXXX_11, IDX_XXXXXX_7, IDX_XXXXXX_9, IDX_XXXXXX_4, IDX_XXXXXX_2, IDX_XXXXXX_12, IDX_XXXXXX_6, IDX_XXXXXX_13, IDX_XXXXXX_ORIG_TRAN_TIMESTAMP, IDX_XXXXXX_10, IDX_XXXXXX_14, IDX_XXXXXX_8, IDX_XXXXXX_5, IDX_XXXXXX_1, XXXXXX]
      stats version:1729088173113156
      dynamic sampling level:0
  Plan Type:      DISTRIBUTED
  Note:
      Degree of Parallelisim is 1 because of table propert

过滤逻辑步骤如下:

WHERE ("SEQ_NO", "TRAN_DATE") >= (10023931,'2024-01-19 00:00:00') AND ("SEQ_NO", "TRAN_DATE") <= (10024360,'2024-01-19 00:00:00');

1.过滤出SEQ_NO 大于值为10023931的所有数据;

2.过滤出SEQ_NO 等于值为10023931时,再看TRAN_DATE过滤条件;

3.过滤出SEQ_NO 小于值为10024360的所有数据;

4.过滤出SEQ_NO 等于值为10024360时,再看TRAN_DATE过滤条件;

取交集后,查询结果如下:

1729579314

是不是不太好理解呢?那我们看下,实际写成and的方式,结果会是什么样子的。

WHERE SEQ_NO >= 10023931 AND SEQ_NO <= 10024360 AND TRAN_DATE>='2024-01-19 00:00:00' AND TRAN_DATE<='2024-01-19 00:00:00';

结果如下:

1729580079

SEQ_NO=10024358的数据不在了。这个很好理解。

我们再来看下,要是不加TRAN_DATE 条件,结果会是什么样子的。

WHERE SEQ_NO >= 10023931 AND SEQ_NO <= 10024360;

结果如下:

1729580347

与结果时3条的对比,第四行,是不符合向量的过滤,所以去掉了。

是否走索引

and方式,走了索引。如下图。   向量方式,是full scan,如一开始的explain。

========================================================|ID|OPERATOR        |NAME        |EST.ROWS|EST.TIME(us)|--------------------------------------------------------|0 |TABLE RANGE SCAN|XXXXXX      |1       |4           |========================================================Outputs & filters:-------------------------------------
  0 - output([XXXXXX.SEQ_NO(0x7fb3f86244d0)], [XXXXXX.CHANNEL_SEQ_NO(0x7fb3f862af00)], [XXXXXX.SUB_SEQ_NO(0x7fb3f862b500)], [XXXXXX.TAE_SUB_SEQ_NO(0x7fb3f862bb00)],
       [XXXXXX.TRAN_DATE(0x7fb3f8623500)]), filter([XXXXXX.TRAN_DATE(0x7fb3f8623500) <= cast('2024-01-19 00:00:00', DATE(19, 0))(0x7fb3f8627e10)(0x7fb3f8629270)],
       [XXXXXX.TRAN_DATE(0x7fb3f8623500) >= cast('2024-01-19 00:00:00', DATE(19, 0))(0x7fb3f8627e10)(0x7fb3f8627680)]), rowset=16
      access([XXXXXX.SEQ_NO(0x7fb3f86244d0)], [XXXXXX.TRAN_DATE(0x7fb3f8623500)], [XXXXXX.CHANNEL_SEQ_NO(0x7fb3f862af00)], [XXXXXX.SUB_SEQ_NO(0x7fb3f862b500)],
       [XXXXXX.TAE_SUB_SEQ_NO(0x7fb3f862bb00)]), partitions(p14)
      is_index_back=false, is_global_index=false, filter_before_indexback[false,false], 
      range_key([XXXXXX.SEQ_NO(0x7fb3f86244d0)], [XXXXXX.TRAN_DATE(0x7fb3f8623500)]), range[10023931,2024-01-19 00:00:00 ; 10024360,2024-01-19 
      00:00:00], 
      range_cond([XXXXXX.SEQ_NO(0x7fb3f86244d0) >= cast('10023931', VARCHAR2(1048576 ))(0x7fb3f8624a80)(0x7fb3f8623d80)], [XXXXXX.SEQ_NO(0x7fb3f86244d0) 
      <= cast('10024360', VARCHAR2(1048576 ))(0x7fb3f8626590)(0x7fb3f8625b70)])Used Hint:-------------------------------------
  /*+
      
  */Qb name trace:-------------------------------------
  stmt_id:0, stmt_type:T_EXPLAIN 
  stmt_id:1, SEL$1Outline Data: 
-------------------------------------
  /*+
      BEGIN_OUTLINE_DATA
      FULL(@"SEL$1" "XXXXXX"."XXXXXX"@"SEL$1")
      OPTIMIZER_FEATURES_ENABLE('4.2.1.0')
      END_OUTLINE_DATA
  */Optimization Info:-------------------------------------
  XXXXXX:
      table_rows:10215491
      physical_range_rows:3
      logical_range_rows:3
      index_back_rows:0
      output_rows:0
      table_dop:1
      dop_method:Table DOP
      avaiable_index_name:[IDX_XXXXXX_3, IDX_XXXXXX_15, IDX_XXXXXX_11, IDX_XXXXXX_7, IDX_XXXXXX_9, IDX_XXXXXX_4, IDX_XXXXXX_2, IDX_XXXXXX_12, IDX_XXXXXX_6, IDX_XXXXXX_13, IDX_XXXXXX_ORIG_TRAN_TIMESTAMP, IDX_XXXXXX_10, IDX_XXXXXX_14, IDX_XXXXXX_8, IDX_XXXXXX_5, IDX_XXXXXX_1, XXXXXX]
      pruned_index_name:[IDX_XXXXXX_3, IDX_XXXXXX_15, IDX_XXXXXX_11, IDX_XXXXXX_9, IDX_XXXXXX_4, IDX_XXXXXX_2, IDX_XXXXXX_12, IDX_XXXXXX_6, IDX_XXXXXX_13, IDX_XXXXXX_ORIG_TRAN_TIMESTAMP, IDX_XXXXXX_10, IDX_XXXXXX_14, IDX_XXXXXX_8, IDX_XXXXXX_5, IDX_XXXXXX_1]
      stats version:1728914914354723
      dynamic sampling level:0
  Plan Type:      LOCAL
  Note:
      Degree of Parallelisim is 1 because of table property

总结:

禁止使用WHERE   ("SEQ_NO", "TRAN_DATE") >(?, ?)   AND ("SEQ_NO", "TRAN_DATE") <=(?, ?) 向量方式过滤查询数据库, 性能低并且 结果集大


请使用浏览器的分享功能分享到微信等