PostgreSQL 源码解读(208)- 查询#121(Valid OUTER JOIN Optimizations)

Valid OUTER JOIN Optimizations
OUTER JOIN 优化手段

The planner’s treatment of outer join reordering is based on the following
identities:
规划器基于以下恒等式处理外连接:


1.  (A leftjoin B on (Pab)) innerjoin C on (Pac)
  = (A innerjoin C on (Pac)) leftjoin B on (Pab)
where Pac is a predicate referencing A and C, etc (in this case, clearly
Pac cannot reference B, or the transformation is nonsensical).
这里,Pac是依赖A和C的谓词.
2.  (A leftjoin B on (Pab)) leftjoin C on (Pac)
  = (A leftjoin C on (Pac)) leftjoin B on (Pab)
3.  (A leftjoin B on (Pab)) leftjoin C on (Pbc)
  = A leftjoin (B leftjoin C on (Pbc)) on (Pab)

Identity 3 only holds if predicate Pbc must fail for all-null B rows
(that is, Pbc is strict for at least one column of B). If Pbc is not
strict, the first form might produce some rows with nonnull C columns
where the second form would make those entries null.
恒等式3只在谓词Pbc对所有为null的B的行时才有效(也就是说,Pbc至少在B某个列上是严格的).
如果Pbc并不是严格的,那么第一种形式可能会产生C列非空的行而第二种形式则会是null值.

RIGHT JOIN is equivalent to LEFT JOIN after switching the two input
tables, so the same identities work for right joins.
在交换了输入表之后,右连接和左连接是等价的,因此对于右连接相同的标识同样是有效的.

An example of a case that does not work is moving an innerjoin into or
out of the nullable side of an outer join:
不生效的一个例子是移动innerjoin组合到外连接的null一则.


  A leftjoin (B join C on (Pbc)) on (Pab)
  != (A leftjoin B on (Pab)) join C on (Pbc)

SEMI joins work a little bit differently. A semijoin can be reassociated
into or out of the lefthand side of another semijoin, left join, or
antijoin, but not into or out of the righthand side. Likewise, an inner
join, left join, or antijoin can be reassociated into or out of the
lefthand side of a semijoin, but not into or out of the righthand side.
半连接有些不同.半连接可以重新组合到另一个半连接、左连接或反连接的左边或外面,但不能组合到右边或外面。
同样,可以将内连接、左连接或反连接重新组合到半连接的左边或外面,但不能组合到右边或外面。

ANTI joins work approximately like LEFT joins, except that identity 3
fails if the join to C is an antijoin (even if Pbc is strict, and in
both the cases where the other join is a leftjoin and where it is an
antijoin). So we can’t reorder antijoins into or out of the RHS of a
leftjoin or antijoin, even if the relevant clause is strict.
ANTI连接工作原理类似于左连接,但如果与C的连接是反连接,恒等式3会失效.
(就算Pbc是严格的,在其他连接是左连接或者是反连接这两种情况均如此)

The current code does not attempt to re-order FULL JOINs at all.
FULL JOIN ordering is enforced by not collapsing FULL JOIN nodes when
translating the jointree to “joinlist” representation. Other types of
JOIN nodes are normally collapsed so that they participate fully in the
join order search. To avoid generating illegal join orders, the planner
creates a SpecialJoinInfo node for each non-inner join, and join_is_legal
checks this list to decide if a proposed join is legal.
当前的实现不会尝试重排FULL JOINs.
在转换jointree为”joinlist”时,通过不折叠FULL JOIN的方式实现FULL JOIN排序.
其他类型的JOIN节点通常会折叠以便参与JOIN排序搜索.为了避免生成不合法的连接顺序,
规划器为每一个非inner连接创建SpecialJoinInfo结构体,函数join_is_legal检查
该链表来确定将要生成的连接是否有效.

What we store in SpecialJoinInfo nodes are the minimum sets of Relids
required on each side of the join to form the outer join. Note that
these are minimums; there’s no explicit maximum, since joining other
rels to the OJ’s syntactic rels may be legal. Per identities 1 and 2,
non-FULL joins can be freely associated into the lefthand side of an
OJ, but in some cases they can’t be associated into the righthand side.
So the restriction enforced by join_is_legal is that a proposed join
can’t join a rel within or partly within an RHS boundary to one outside
the boundary, unless the proposed join is a LEFT join that can associate
into the SpecialJoinInfo’s RHS using identity 3.
存储在SpecialJoinInfo中的信息是需要构造外连接的最小Relids集合.
注意这些是最小值;没有明确的最大值,因为将其他规则加入到外连接(OJ)的语法规则中可能是合法的.
对于恒等式1和1,非FULL连接可以放心的与OJ的左端连接,但在某些情况下,不能与右端连接.
因此,通过join_is_legal函数可以做到的是将要进行的连接不能将RHS边界内或部分RHS边界内的rel
连接到边界外的rel,除非该连接是一个左连接,可以使用恒等式3关联到SpecialJoinInfo中的RHS.

The use of minimum Relid sets has some pitfalls; consider a query like
A leftjoin (B leftjoin (C innerjoin D) on (Pbcd)) on Pa
where Pa doesn’t mention B/C/D at all. In this case a naive computation
would give the upper leftjoin’s min LHS as {A} and min RHS as {C,D} (since
we know that the innerjoin can’t associate out of the leftjoin’s RHS, and
enforce that by including its relids in the leftjoin’s min RHS). And the
lower leftjoin has min LHS of {B} and min RHS of {C,D}. Given such
information, join_is_legal would think it’s okay to associate the upper
join into the lower join’s RHS, transforming the query to
B leftjoin (A leftjoin (C innerjoin D) on Pa) on (Pbcd)
which yields totally wrong answers. We prevent that by forcing the min RHS
for the upper join to include B. This is perhaps overly restrictive, but
such cases don’t arise often so it’s not clear that it’s worth developing a
more complicated system.
最小Relid集合的使用有一些缺陷;考虑形如下面的查询
A leftjoin (B leftjoin (C innerjoin D) on (Pbcd)) on Pa
这里Pa与B/C/D都没有关系.
在这种情况下,通过简单计算会给出上层左连接的最小LHS是{A},最小RHS是{C,D}
(因为我们知道内连接不能与左连接的RHS关联,通过在左连接最小RHS包含该relids)
同时,低层的左连接有最小的LHS{B}和最小的RHS{C,D}.
给出这样的信息,join_is_legal会认为与上层连接使用低层的RHS进行关联是可以的,因此转换查询为
B leftjoin (A leftjoin (C innerjoin D) on Pa) on (Pbcd)
这导致了错误的输出.
通过强制上层连接最小RHS包含B来避免出现该错误.这可能过于严格,但不会这样的问题,因此是可行的.

参考资料
README

请使用浏览器的分享功能分享到微信等