设想如下数据库场景:两张表通过外键关联,外键字段为 NOT NULL,且约束有效并被严格执行。在执行两表 JOIN 查询时,却返回空结果。该场景看似不可能出现,但实际生产环境中出现了一个未被预期的边界情况。
问题场景
假设以车辆所有权管理场景为例,包含两张表,分别是车主表与车辆表:
db=# CREATE TABLE owner ( id int PRIMARY KEY, name text NOT NULL ); CREATE TABLE db=# CREATE TABLE car ( id int PRIMARY KEY, owner_id int NOT NULL, CONSTRAINT car_owner_id_fk FOREIGN KEY (owner_id) REFERENCES owner(id) ); CREATE TABLE
初始化数据:
db=# INSERT INTO owner (id, name) VALUES (1, 'haki'), (2, 'jerry'), (3, 'george') RETURNS *; id │ name ────┼─────── 1 │ haki 2 │ jerry 3 | george (3 rows) INSERT 0 2 db=# INSERT INTO car (id, owner_id) VALUES(1, 1) RETURNS *; id │ owner_id ────┼────────── 1 │ 1 (1 row) INSERT 0 1
当前状态:
- 存在三个 owner:haki、jerry、george
- car(id=1) 的 owner 为 haki
所有权变更流程
标准流程如下:
- 查询 car 及其 owner,并对 car 行加锁;
- 校验当前 owner;
- 更新 owner_id 并提交。
示例:
db=# BEGIN; BEGIN db=*# SELECT * FROM car JOIN owner ON car.owner_id = owner.id WHERE car.id = 1 FOR NO KEY UPDATE OF car; ─[ RECORD 1 ]─┬──────── id │ 1 owner_id │ 1 id │ 1 name │ haki
加锁可避免并发修改,校验完成后执行更新:
db=*# UPDATE car SET owner_id = 2 WHERE id = 1 RETURNING *; ─[ RECORD 1 ]─┬──────── id │ 1 owner_id │ 2 db=*# COMMIT; COMMIT
并发场景下的权属变更
当前车辆所有权变更流程如下:
- 查询车辆及其当前所有者,并对相关记录加锁。
- 对所有者信息进行必要校验。
- 更新车辆所有权并提交事务。
在实际的车管系统中,此类请求通常会被高并发处理。下面基于相同流程,分析两个并发会话同时尝试更新同一车辆所有权时的执行情况:
-- First session db=# BEGIN; BEGIN db=*# SELECT * FROM car JOIN owner ON car.owner_id = owner.id WHERE car.id = 1 FOR NO KEY UPDATE OF car; ─[ RECORD 1 ]─┬──────── id │ 1 owner_id │ 2 id │ 2 name │ jerry db=*# UPDATE car SET owner_id = 3 WHERE id = 1 RETURNING *; ─[ RECORD 1 ]─┬──────── id │ 1 owner_id │ 3 db=*# COMMIT; COMMIT
-- Second session db=# BEGIN; BEGIN; db=*# SELECT * FROM car JOIN owner ON car.owner_id = owner.id WHERE car.id = 1 FOR NO KEY UPDATE OF car; -- Session is blocked -- -- -- -- -- -- -- -- -- -- -- -- -- Lock is released by first session (0 rows)
第一会话加锁并更新车辆车主,第二会话阻塞等待锁释放,锁释放后查询无结果。车辆与车 主 数 据均存在,内关联查询却无返回值。
替换为左外关联查询可观察到异常数据:
db=# BEGIN; BEGIN db=*# SELECT * FROM car LEFT JOIN owner ON car.owner_id = owner.id WHERE car.id = 1 FOR NO KEY UPDATE OF car; ─[ RECORD 1 ]─┬──────── id │ 1 owner_id │ 2 id │ 2 name │ jerry db=*# UPDATE car SET owner_id = 3 WHERE id = 1 RETURNS *; ─[ RECORD 1 ]─┬──────── id │ 1 owner_id │ 3 db=*# COMMIT; COMMIT
db=# BEGIN; BEGIN db=*# SELECT * FROM car LEFT JOIN owner ON car.owner_id = owner.id WHERE car.id = 1 FOR NO KEY UPDATE OF car; -- Session is blocked -- -- -- -- -- -- -- -- -- -- -- -- -- Lock is released by first session ─[ RECORD 1 ]─┬──────── id │ 1 owner_id │ 3 id │ ¤ name │ ¤
查询返回车辆最新数据,但车主表字段为空。车辆表存在非空约束与外键约束,理论上内关联查询不会出现无结果场景。
并发问题根源
在 Read Committed 隔离级别下,查询只能基于“语句开始时已提交的数据”,但在遇到行锁时,会等待并在锁释放后基于最新行版本重新评估条件。
─[ RECORD 1 ]─┬──────── id │ 1 -- car.id owner_id │ 3 -- car.owner_id id │ ¤ -- owner.id name │ ¤ -- owner.name
本例中,第一个会话将 car.owner 从 1 更新为 3 并提交;第二个会话原本读取到 car=1、owner=1,但因锁等待,锁释放后重新校验时,发现 owner 已变为 3,而原匹配条件(owner=1)失效,因此不会重新关联 owner=3,而是直接丢弃该结果(或返回 NULL)。
关键在于:查询并非整体重跑,而是执行过程中被中断并局部重评估(EvalPlanQual),从而导致 car 表为最新值,而 join 表数据缺失。
结论:在 Read Committed 下,涉及 JOIN 且伴随外键更新的并发场景,可能出现“部分结果”或“不一致结果”。
解决方案
在明确问题根因后,可以对不同方案进行对比分析:
锁定关联行(无效方案)
同时锁定 car 与 owner 无法保证正确性。第二个会话获得锁时,car.owner_id 已变化,JOIN 条件失效,本质是锁定了“旧 owner”。
db=# BEGIN; BEGIN; db=*# SELECT * FROM car JOIN owner ON car.owner_id = owner.id WHERE car.id = 1 FOR NO KEY UPDATE OF car, owner; ─[ RECORD 1 ]─┬──────── id │ 1 owner_id │ 2 id │ 2 name │ jerry db=*# UPDATE car SET owner_id = 3 WHERE id = 1 RETURNING *; ─[ RECORD 1 ]─┬──────── id │ 1 owner_id │ 3 db=*# COMMIT; COMMIT
db=# BEGIN; BEGIN; db=*# SELECT * FROM car JOIN owner ON car.owner_id = owner.id WHERE car.id = 1 FOR NO KEY UPDATE OF car, owner; -- Session is locked -- -- -- -- -- -- -- -- -- -- -- -- -- Lock is released by first session (0 rows)
db=# BEGIN; BEGIN db=*# SELECT * FROM car LEFT JOIN owner ON car.owner_id = owner.id WHERE car.id = 1 FOR NO KEY UPDATE OF car, owner; ERROR: FOR NO KEY UPDATE cannot be applied to the nullable side of an outer join
提高隔离级别(可行但有代价)
使用 Repeatable Read / Serializable 可避免不一致,但会引入序列化冲突错误,影响并发能力。
db=# BEGIN ISOLATION LEVEL REPEATABLE READ; BEGIN db=*# SELECT * FROM car JOIN owner ON car.owner_id = owner.id WHERE car.id = 1 FOR NO KEY UPDATE OF car; ─[ RECORD 1 ]─┬──────── id │ 1 owner_id │ 2 id │ 2 name │ jerry db=*# UPDATE car SET owner_id = 3 WHERE id = 1 RETURNING *; ─[ RECORD 1 ]─┬──────── id │ 1 owner_id │ 3 db=*# COMMIT; COMMIT
db=# BEGIN ISOLATION LEVEL REPEATABLE READ; BEGIN db=*# SELECT * FROM car JOIN owner ON car.owner_id = owner.id WHERE car.id = 1 FOR NO KEY UPDATE OF car; -- Session is blocked -- -- -- -- -- -- -- -- -- -- -- -- -- Lock is released by first session ERROR: could not serialize access due to concurrent update
拆分查询(简单可靠)
先锁 car,再查 owner,保证读取最新 owner_id;逻辑简单,但需要两次数据库往返。
db=# BEGIN; BEGIN db=*# SELECT * FROM car WHERE car.id = 1 FOR NO KEY UPDATE; ─[ RECORD 1 ]─┬──────── id │ 1 owner_id │ 2 db=*# SELECT * FROM owner WHERE id = 2; ─[ RECORD 1 ]─┬──────── id │ 2 name │ jerry db=*# UPDATE car SET owner_id = 3 WHERE id = 1 RETURNING *; ─[ RECORD 1 ]─┬──────── id │ 1 owner_id │ 3 db=*# COMMIT; COMMIT
db=# BEGIN; BEGIN db=*# SELECT * FROM car WHERE car.id = 1 FOR NO KEY UPDATE; -- Session is blocked -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- Lock is released by first session ─[ RECORD 1 ]─┬──────── id │ 1 owner_id │ 3 db=*# SELECT * FROM owner WHERE id = 3; ─[ RECORD 1 ]─┬──────── id │ 3 name │ george
子查询 / CTE(推荐方案)
核心思路:将加锁前移到 JOIN 之前,避免读取过期关联关系,同时避免额外往返。
执行计划显示:原查询在 JOIN 后才通过 LockRows 加锁。
db=# BEGIN; BEGIN db=*# EXPLAIN SELECT * FROM car JOIN owner ON car.owner_id = owner.id WHERE car.id = 1 FOR NO KEY UPDATE OF car QUERY PLAN ──────────────────────────────────────────────────────────────────────────────────── LockRows (cost=0.31..16.37 rows=1 width=56) -> Nested Loop (cost=0.31..16.36 rows=1 width=56) -> Index Scan using car_pkey on car (cost=0.15..8.17 rows=1 width=14) Index Cond: (id = 1) -> Index Scan using owner_pkey on owner (cost=0.15..8.17 rows=1 width=42) Index Cond: (id = car.owner_id)
通过 CTE,将锁提前到数据源阶段:
db=# BEGIN; BEGIN; db=*# EXPLAIN WITH c AS ( SELECT * FROM car WHERE id = 1 FOR NO KEY UPDATE ) SELECT * FROM c JOIN owner ON c.owner_id = owner.id; QUERY PLAN ─────────────────────────────────────────────────────────────────────────────────── Nested Loop (cost=8.33..16.39 rows=1 width=44) CTE c -> LockRows (cost=0.15..8.18 rows=1 width=14) -> Index Scan using car_pkey on car (cost=0.15..8.17 rows=1 width=14) Index Cond: (id = 1) -> CTE Scan on c (cost=0.00..0.02 rows=1 width=8) -> Index Scan using owner_pkey on owner (cost=0.15..8.17 rows=1 width=36) Index Cond: (id = c.owner_id)
效果:先锁 car,再 JOIN,读取到最新 owner。
db=# BEGIN; BEGIN db=*# WITH c AS ( SELECT * FROM car WHERE id = 1 FOR NO KEY UPDATE ) SELECT * FROM c JOIN owner ON c.owner_id = owner.id; ─[ RECORD 1 ]─┬──────── id │ 1 owner_id │ 1 id │ 1 name │ haki db=*# UPDATE car SET owner_id = 2 WHERE id = 1; UPDATE 1 db=*# COMMIT; COMMIT
db=# BEGIN; BEGIN db=*# WITH c AS ( SELECT * FROM car WHERE id = 1 FOR NO KEY UPDATE ) SELECT * FROM c JOIN owner ON c.owner_id = owner.id; -- Session is blocked -- -- -- -- -- -- -- -- -- -- -- Lock is released by first session ─[ RECORD 1 ]─┬──────── id │ 1 owner_id │ 2 id │ 2 name │ jerry
无法使用 WITH 时,可用等价子查询实现:
db=# BEGIN; BEGIN; db=*# EXPLAIN SELECT * FROM ( SELECT * FROM car WHERE id = 1 FOR NO KEY UPDATE ) c JOIN owner ON c.owner_id = owner.id; QUERY PLAN ─────────────────────────────────────────────────────────────────────────────────────── Nested Loop (cost=0.31..16.38 rows=1 width=44) -> Subquery Scan on c (cost=0.15..8.19 rows=1 width=8) -> LockRows (cost=0.15..8.18 rows=1 width=14) -> Index Scan using car_pkey on car (cost=0.15..8.17 rows=1 width=14) Index Cond: (id = 1) -> Index Scan using owner_pkey on owner (cost=0.15..8.17 rows=1 width=36) Index Cond: (id = c.owner_id)
结论:通过子查询/CTE 前移锁获取时机,是兼顾一致性与性能的最优解。
预防措施
该问题并非理论推演,而是在生产环境中真实发生。触发原因也并非高并发场景,而是用户在 UI 中快速重复点击按钮,短时间内产生了多次相同请求,从而形成并发条件。由于系统中已设置断言机制,未对业务和用户产生实际影响。
在定位问题模式后,开始评估预防手段。此类问题难以通过常规规则检测:不仅依赖具体 SQL 结构,还与执行上下文和并发时序密切相关,对代码理解要求较高。
通过结合大模型辅助分析,对潜在风险点进行了系统性排查与修复。最终统一采用更稳妥的策略:避免在加锁场景中使用 JOIN,改为拆分查询。虽然增加了一次数据库交互,但能够确保结果一致性,属于可接受的权衡。