Ora-NNNNN错误后续:报错信息可读性

         前几天我写了一篇《Oceanbase不建议模仿Oracle的错误编号(ORA-NNNNN)!》,出乎意料的是,这篇文章在公众号上的阅读量非常高。一个知名的网红公众号引用了我的文章,并起了一个吸引眼球的标题《真狠!!!OceanBase被骂惨了!!!》,到目前为止,那篇阅读量已经接近3万, 这让我深感惊讶。

        我撰写技术博客已有十余年,但始终未能获得太多关注。期间也有发现有小伙伴将我网站上的笔记原文,利用小拇指与食指舞动两下就搬到CSDN等站点,因为百度的排名浏览量也可观,虽然有的未标记出处,如果真能帮到DBA节约时间处理了某个问题,也是值得的。

        另外我一直有一个偏见,觉得微信公众号不适合写技术类文章,例如实验代码格式、搜索引擎的收录、手机屏幕与大篇幅阅读性差等,但如果只是写一些观点的小作文除外。然而,不得不感叹的是,技术用户已经转移到了移动互联网终端。

         回到前一篇文章,我的初衷并不是要抨击OB,而是希望它能变得更好。比如,在排查故障时,是否能够通过错误识别来区分问题。前几天看到有OB用户留言,提到OceanBase厂家很可能是出于对他们需求的考虑增加的设计,因为他们的应用中有以前对ORACLE报错的依赖。这表明现在数据库厂家在满足各种甲方要求时也颇为无奈,在应用的兼容性上做了种种让步。

         在Oracle数据库版本迭代中相同的错误代码,除了资源类限制可能报错信息变化外,Oracle在报错信息的可读性上也是下足了功夫,比如:

ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
在23c以后阅读性更加友好
ORA-12514: Cannot connect to database. Service ANBOBpdb12 is not registered with the listener at host 172.20.xx.166 port 1521.

Oracle甚至还上线了Error Help网站官方公开一些处理的方法,当然没有MOS中那么具体. 如 https://docs.oracle.com/en/error-help/db/ora-12514/,  这些都是国产数据库可以参考的方向。当然如果OB可以迭代和ORACLE一样的错误代码并加以特殊标记,让Oracle DBA分析思路即可以复用oracle时代经验,又能区分报错数据源可能会更加完美吧。

     今天就对于会话1事务中,会话2 select for update nowait相同的报错场景,我也简单测试一下在其它国产库上的表现,是否还有不如Oceanbase的处理. 为了方便横向对比,这里我再简单的附上ORACLE 与OB的报错。

oracle 23c

SQL> select BANNER from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 23c Enterprise Edition Release 23.0.0.0.0 - Beta

# session 1
SQL> create table test1(id int);
Table created.

SQL> insert into test1 values(1);
1 row created.

SQL> commit;
Commit complete.

SQL> update test1 set id=2;
1 row updated.

# session 2
SQL> select * from test1 for update nowait;
select * from test1 for update nowait
*
ERROR at line 1:
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired


SQL> alter table test1 add name varchar2(20);
-- wait
-- ctrl +c
^Calter table test1 add name varchar2(20)
*
ERROR at line 1:
ORA-01013: User requested cancel of current operation.

Oceanbase

Server version: OceanBase 3.2.4.1 

# session 1
obclient [ANBOB]> alter session set autocommit=0; -- OB for oracle default autocommit off
Query OK, 0 rows affected (0.002 sec)

obclient [ANBOB]> create table test101(id int);
obclient [ANBOB]> insert into test101 values(100);

obclient [ANBOB]> update test101 set id=2;
Query OK, 1 row affected (0.018 sec)

# session 2
obclient [ANBOB]> alter session set autocommit=0;
obclient [ANBOB]> select * from test101 for update nowait;
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired

obclient [ANBOB]> alter table test101 add name varchar2(10);
Query OK, 0 rows affected (0.064 sec)

obclient [ANBOB]> desc test101;
+-------+--------------+------+-----+---------+-------+
| FIELD | TYPE | NULL | KEY | DEFAULT | EXTRA |
+-------+--------------+------+-----+---------+-------+
| ID | NUMBER(38) | YES | NULL | NULL | NULL |
| NAME | VARCHAR2(10) | YES | NULL | NULL | NULL |
+-------+--------------+------+-----+---------+-------+

# session 1
obclient [ANBOB]> desc test101;
+-------+--------------+------+-----+---------+-------+
| FIELD | TYPE | NULL | KEY | DEFAULT | EXTRA |
+-------+--------------+------+-----+---------+-------+
| ID | NUMBER(38) | YES | NULL | NULL | NULL |
| NAME | VARCHAR2(10) | YES | NULL | NULL | NULL |
+-------+--------------+------+-----+---------+-------+
2 rows in set (0.005 sec)

Note:
Oceanbase 在线DDL 并不会被DML事务堵塞,并且事务中的会话在结束当前的事务前,就可以查到变化后的表结构, 这点与TIDB不同.

TIDB

Your MySQL connection id is 473
Server version: 5.7.25-TiDB-v5.4.0 TiDB Server (Apache License 2.0) Community Edition, MySQL 5.7 compatible

mysql> SET autocommit=0;
Query OK, 0 rows affected (0.04 sec)

mysql> create table test1(id int);
Query OK, 0 rows affected (0.18 sec)

mysql> insert into test1 values(1);
Query OK, 1 row affected (0.12 sec)

mysql> update test1 set id=3;
Query OK, 0 rows affected (0.04 sec)

# session 2
mysql> select * from test1 for update nowait;
ERROR 3572 (HY000): Statement aborted because lock(s) could not be acquired immediately and NOWAIT is set.
mysql>

mysql> select * from test1 for update;
-- waiting
-- timeout
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

# session 3
mysql> show processlist;
+------+------+--------------------+------+---------+------+----------------+--------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+------+------+--------------------+------+---------+------+----------------+--------------------------------+
| 471 | root | 172.20.xx.253:4810 | NULL | Query | 0 | autocommit | show processlist |
| 473 | root | 172.20.xx.253:4832 | test | Query | 12 | in transaction | select * from test1 for update |
| 475 | root | 172.20.xx.253:4836 | test | Sleep | 71 | in transaction | NULL |
+------+------+--------------------+------+---------+------+----------------+--------------------------------+
3 rows in set (0.04 sec)

# session 2
mysql> alter table test1 add name varchar(20);
Query OK, 0 rows affected (0.32 sec)
--- online DDL
mysql> desc test1;
+-------+-------------+------+------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+------+---------+-------+
| id | int(11) | YES | | NULL | |
| name | varchar(20) | YES | | NULL | |
+-------+-------------+------+------+---------+-------+

2 rows in set (0.12 sec)

# session 1
mysql> desc test1;
+-------+---------+------+------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+---------+------+------+---------+-------+
| id | int(11) | YES | | NULL | |
+-------+---------+------+------+---------+-------+

1 row in set (0.04 sec)

Note:
TIDB同样DDL不会被DML事务堵塞,但是事务中的SESSION 1 在SESSION 2做完DDL后,看到的表结构还是事务开始时的样子,而OB是直接可以读取到。哪家更合理?自行判断。不再演示MySQL报错 (类似)。

达梦dameng

SQL> select * from v$version;

行号 BANNER
---------- ---------------------------------
1 DM Database Server 64 V8
2 DB Version: 0x7000c
3 03134283914-20221207-176225-20009

# session 1
SQL> create table test1(id int);
操作已执行
已用时间: 17.681(毫秒). 执行号:143129801.
SQL> insert into test1 values(1);
影响行数 1

已用时间: 1.414(毫秒). 执行号:143129802.
SQL> commit;
操作已执行
已用时间: 1.128(毫秒). 执行号:143129803.
SQL> update test1 set id=2;
影响行数 1
已用时间: 0.735(毫秒). 执行号:143129804.

# session 2
[dmdba@template ~]$ disql sysdba/root123
服务器[LOCALHOST:5236]:处于普通打开状态
登录使用时间 : 2.983(ms)
disql V8
SQL> select * from test1;
行号 id
---------- -----------
1 1

已用时间: 2.753(毫秒). 执行号:143131100.
SQL> select * from test1 for update nowait;
select * from test1 for update nowait;
[-6409]:锁等待.
已用时间: 3.940(毫秒). 执行号:0.
SQL> select * from test1 for update;
^C^C
^C
-- 挂死了

# session 3
SQL> select sess_id,substr(sql_text,1,40),STATE,RUN_STATUS from v$sessions;
行号 sess_id substr(sql_text,1,40) STATE RUN_STATUS
---------- -------------------- ---------------------------------------- ------ ----------
..
2 209692816 update test1 set id=2; IDLE IDLE
3 140595797803360 select * from test1 for update; WAIT RUNNING
..
6 196059984 select sess_id,substr(sql_text,1,40),STA ACTIVE RUNNING

SQL> alter table test1 add name varchar2(10);
^C
alter table test1 add name varchar2(10);
[-6407]:锁超时.

Note:
达梦的报错信息提示个人感觉就过于简洁,像他的执行计划显示一样,吐槽一下,可读性不够友好。

人大金仓KINGBASE
–PostgreSQL的其中代表

$ ksql -h 172.xx.xxx.49 -U system -W
Password:
ksql: error: could not connect to server: FATAL: sorry, too many clients already
[kingbase@kingbase1 ~]$ ps -ef|grep kingbase|grep -i idle
kingbase 7270 29215 0 11:49 ? 00:00:02 kingbase: system test 172.xx.xxx.4(24355) idle
kingbase 16027 29215 0 12:34 ? 00:00:00 kingbase: system kingbase 172.xx.xxx.4(44540) idle
kingbase 16819 29215 0 12:35 ? 00:00:01 kingbase: system kingbase 172.xx.xxx.4(45010) idle
kingbase 17618 29215 0 12:35 ? 00:00:00 kingbase: system test 172.xx.xxx.4(46298) idle
kingbase 18127 29215 0 2023 ? 00:44:12 kingbase: esrep esrep 172.xx.xxx.49(23808) idle
kingbase 18135 29215 0 2023 ? 00:22:45 kingbase: esrep esrep 172.xx.xxx.50(39580) idle
kingbase 19257 29215 0 12:36 ? 00:00:00 kingbase: system test 172.xx.xxx.4(48146) idle
kingbase 22622 29215 0 12:39 ? 00:00:00 kingbase: system esrep 172.xx.xxx.4(52032) idle
kingbase 30033 29215 0 2022 ? 00:00:00 kingbase: system test ::1(30695) idle
kingbase 30044 29215 0 2022 ? 01:44:43 kingbase: esrep esrep 172.xx.xxx.49(9785) idle
kingbase 30510 23987 0 12:43 pts/0 00:00:00 grep --color=auto -i idle
[kingbase@kingbase1 ~]$ kill -9 30033
[kingbase@kingbase1 ~]$ kill -9 17618 7270


$ ksql -h 172.xx.xxx.49 -U system -W -d test
Password:
ksql (V8.0)
Type "help" for help.
test=#

# session 1
test=# create table test1(id int);
CREATE TABLE
test=# insert into test1 values(1);
INSERT 0 1

test=# begin;
BEGIN
test=# update test1 set id =2;
UPDATE 1
test=# select * from test1;
id
----
2

# session 2
test=# select * from test1 for update nowait;
ERROR: could not obtain lock on row in relation "test1"

test=# alter table test1 add name varchar(10);
^CCancel request sent
ERROR: canceling statement due to user request
test=

Note:
kingbase(postgresql)的报错感觉能把对象名报出来,个人感觉这更好。

Mogdb
— opengauss的代表

[omm@mogdb1 ~]$ gsql -r
gsql ((MogDB 5.0.1 build ae6d2ada) compiled at 2023-08-16 09:07:43 commit 0 last mr 1804 )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.

# session 1
db1=# create table test1(id int);
CREATE TABLE
db1=# insert into test1 values(1);
INSERT 0 1

db1=# begin;
BEGIN
db1=# update test1 set id=2;
UPDATE 1
db1=#

# session 2
db1=# select * from test1 for update nowait;
ERROR: could not obtain lock on row in relation "test1"

mogdb(opengauss)的报错同postgresql,毕竟同根, 报错中带对象名。

Summary:
关于for update nowait对比了oracle,ob,tidb,dameng,kingbase,mogdb几个数据库的报错信息,基本是分4类: Oracle式、MySQL式、PostgreSQL式、达梦式。ob报错是“兼容”了oracle, 其它国产是mysql与pg自带,达梦的报错个人感觉过于简洁。希望国产库在报错信息上提示更友好。另外注意在线DDL上OB与TIDB这两家原生分布式对事务中的表结构生效阶段不同。

请使用浏览器的分享功能分享到微信等