PostgreSQL DBA(157) - pgAdmin(OOM & max_locks_per_transaction )

有时候我们可能会在PG的日志发现如下信息:

2020-01-09 16:29:19.062 CST,"pg12","testdb",6193,"[local]",5e16dccd.1831,1,"CREATE TABLE",2020-01-09 15:57:01 CST,2/34,1512004206,ERROR,53200,"out of shared memory",,"You might need to increase max_locks_per_transaction.",,,,"CREATE TABLE a13030 (id int);",,,"psql"
2020-01-09 16:29:19.379 CST,"pg12","testdb",6193,"[local]",5e16dccd.1831,2,"CREATE TABLE",2020-01-09 15:57:01 CST,2/0,1512004206,ERROR,25P02,"current transaction is aborted, commands ignored until end of transaction block",,,,,,"CREATE TABLE a13031 (id int);",,,"psql"

直观上来看,OOM似乎与max_locks_per_transaction扯不上什么关系,为什么PG会提示增加max_locks_per_transaction的值呢?

max_locks_per_transaction
我们先看该参数的解释:

max_locks_per_transaction (integer)
The shared lock table tracks locks on max_locks_per_transaction * (max_connections + max_prepared_transactions) objects (e.g., tables); hence, no more than this many distinct objects can be locked at any one time. This parameter controls the average number of object locks allocated for each transaction; individual transactions can lock more objects as long as the locks of all transactions fit in the lock table. This is not the number of rows that can be locked; that value is unlimited. The default, 64, has historically proven sufficient, but you might need to raise this value if you have queries that touch many different tables in a single transaction, e.g. query of a parent table with many children. This parameter can only be set at server start.
When running a standby server, you must set this parameter to the same or higher value than on the master server. Otherwise, queries will not be allowed in the standby server.

在一个事务中,shared lock table最大可以跟踪max_locks_per_transaction * (max_connections + max_prepared_transactions) 个对象(如数据表),超过的会报OOM错误。注意:锁粒度是object(如relation等),跟行数无关。

OOM场景模拟
下面是一个模拟场景,在同一个事务中创建1w张表:

\pset footer off
\o /tmp/drop.sql
SELECT 'drop table if exists tbl' || id || ' ;' as "--"
       FROM generate_series(1, 20000) AS id;
\i /tmp/drop.sql
\pset footer off
\pset tuples_only
\o /tmp/create.sql
SELECT 'CREATE TABLE tbl' || id || ' (id int);' as "--"
       FROM generate_series(1, 20000) AS id;
\o /tmp/ret.txt
begin;
\i /tmp/create.sql

使用watch监控输出

watch -n1 "psql -c \"select locktype,mode,count(*) from pg_locks group by locktype,mode;\""
Every 1.0s: psql -c "select locktype,mode,count(*) from pg_locks group by locktype,mode;"  Fri Jan 10 14:41:26 2020
Expanded display is used automatically.
   locktype    |        mode         | count
---------------+---------------------+-------
 object        | AccessShareLock     |     1
 relation      | AccessShareLock     |     1
 virtualxid    | ExclusiveLock       |     2
 relation      | AccessExclusiveLock |  3776
 transactionid | ExclusiveLock       |     1
(5 rows)
...
Every 1.0s: psql -c "select locktype,mode,count(*) from pg_locks group by locktype,mode;"  Fri Jan 10 14:41:50 2020
Expanded display is used automatically.
   locktype    |        mode         | count
---------------+---------------------+-------
 object        | AccessShareLock     |     1
 relation      | AccessShareLock     |     1
 virtualxid    | ExclusiveLock       |     2
 relation      | AccessExclusiveLock | 10000
 transactionid | ExclusiveLock       |     1
(5 rows)
...

在执行到tbl13034时报错

2020-01-10 14:44:18.855 CST,"pg12","testdb",32120,"[local]",5e181bea.7d78,3,"CREATE TABLE",2020-01-10 14:38:34 CST,2/106085,1512036258,ERROR,53200,"out of shared memory",,"You might need to increase max_locks_per_transaction.",,,,"CREATE TABLE tbl13034 (id int);",,,"psql"
2020-01-10 14:44:19.202 CST,"pg12","testdb",32120,"[local]",5e181bea.7d78,4,"CREATE TABLE",2020-01-10 14:38:34 CST,2/0,1512036258,ERROR,25P02,"current transaction is aborted, commands ignored until end of transaction block",,,,,,"CREATE TABLE tbl13035 (id int);",,,"psql"

相关源码
搜索You might need to increase max_locks_per_transaction.该错误信息出现在lock.c中

 /*
  * LockAcquireExtended - allows us to specify additional options
  *
  * reportMemoryError specifies whether a lock request that fills the lock
  * table should generate an ERROR or not.  Passing "false" allows the caller
  * to attempt to recover from lock-table-full situations, perhaps by forcibly
  * cancelling other lock holders and then retrying.  Note, however, that the
  * return code for that is LOCKACQUIRE_NOT_AVAIL, so that it's unsafe to use
  * in combination with dontWait = true, as the cause of failure couldn't be
  * distinguished.
  *
  * If locallockp isn't NULL, *locallockp receives a pointer to the LOCALLOCK
  * table entry if a lock is successfully acquired, or NULL if not.
  */
 LockAcquireResult
 LockAcquireExtended(const LOCKTAG *locktag,
                     LOCKMODE lockmode,
                     bool sessionLock,
                     bool dontWait,
                     bool reportMemoryError,
                     LOCALLOCK **locallockp)
 {
     ...
     /*
      * If this lock could potentially have been taken via the fast-path by
      * some other backend, we must (temporarily) disable further use of the
      * fast-path for this lock tag, and migrate any locks already taken via
      * this method to the main lock table.
      */
     if (ConflictsWithRelationFastPath(locktag, lockmode))
     {
         uint32      fasthashcode = FastPathStrongLockHashPartition(hashcode);
         BeginStrongLockAcquire(locallock, fasthashcode);
         if (!FastPathTransferRelationLocks(lockMethodTable, locktag,
                                            hashcode))
         {
             AbortStrongLockAcquire();
             if (locallock->nLocks == 0)
                 RemoveLocalLock(locallock);
             if (locallockp)
                 *locallockp = NULL;
             if (reportMemoryError)
                 ereport(ERROR,
                         (errcode(ERRCODE_OUT_OF_MEMORY),
                          errmsg("out of shared memory"),
                          errhint("You might need to increase max_locks_per_transaction.")));
             else
                 return LOCKACQUIRE_NOT_AVAIL;
         }
     }
     ...

源码的解读,敬请关注后续文章。

参考文档
PostgreSQL: You might need to increase max_locks_per_transaction

请使用浏览器的分享功能分享到微信等