[20190103] Slow startup when pre_page_sga=true is set.txt
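--//This came up before the holiday. Someone else's system had been suffering from slow startup for some time; they had asked me about it, but I paid little attention.
--//They happened to restart the database before the holiday, so I connected remotely to take a look and found that they had pre_page_sga=true and sga_target=sga_max_size=5XG,
--//with hugepages not enabled, and startup was abnormally slow.
--//I found this quite by accident: I ran top and noticed that the RES column of the oracle processes kept growing. I have long had one question of my own: why, with hugepages enabled
--//and pre_page_sga=true, do some systems show a RES that is nowhere near this large (close to the SGA size)?
--//Also note that pre_page_sga defaults to true in 12cR2, which makes configuring hugepages at install time even more necessary; otherwise startup can be very slow,
--//especially with a large SGA.
--//Let me demonstrate this in my own test environment.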
1.Environment:
--//Oracle version 11.2.0.4.0, 64-bit.
--//OS :Oracle Linux Server release 5.9
$ cat /proc/version
Linux version 2.6.39-300.26.1.el5uek (mockbuild@ca-build56.us.oracle.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-52)) #1 SMP Thu Jan 3 18:31:38 PST 2013
2.Test:
--//Create the parameter file.
$ cat initxxxx.ora
db_name=xxxx
instance_name=xxxx
sga_target=50G
sga_max_size=50G
pre_page_sga=true
$ export ORACLE_SID=xxxx
$ rlsql
SQL*Plus: Release 11.2.0.4.0 Production on Thu Jan 3 10:24:11 2019
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to an idle instance.
SYS@xxxx> startup nomount
--//Watch with top -u oracle.
# top -u oracle
Tasks: 257 total, 2 running, 255 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.1%us, 0.6%sy, 0.0%ni, 99.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 132261196k total, 50676620k used, 81584576k free, 268184k buffers
Swap: 31455264k total, 131660k used, 31323604k free, 41556392k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
65426 oracle 20 0 50.2g 24g 24g R 99.7 19.3 0:18.98 oracle
--//You can see the RES column growing; the whole startup is slow.
--//Check the alert log:
Thu Jan 03 10:24:35 2019
~~~~~~~~~~~~~~~~~~~~~~~~~
Starting ORACLE instance (normal)
.....
************************ Large Pages Information *******************
Per process system memlock (soft) limit = 51 GB
Total Shared Global Region in Large Pages = 122 MB (0%)
Large Pages used by this instance: 61 (122 MB)
Large Pages unused system wide = 43 (86 MB)
Large Pages configured system wide = 104 (208 MB)
Large Page size = 2048 KB
RECOMMENDATION:
Total System Global Area size is 50 GB. For optimal performance,
prior to the next instance restart:
1. Increase the number of unused large pages by
at least 25499 (page size 2048 KB, total size 50 GB) system wide to
get 100% of the System Global Area allocated with large pages
********************************************************************
--//Hugepages cannot be used for the SGA (only 122 MB, 0%).
MMON started with pid=17, OS id=65513
Thu Jan 03 10:28:27 2019
MMNL started with pid=18, OS id=65515
Thu Jan 03 10:28:27 2019
~~~~~~~~~~~~~~~~~~~~~~~~~~
ORACLE_BASE from environment = /u01/app/oracle
--//Startup to nomount took nearly 4 minutes (10:24:35 to 10:28:27). Check with top:
# top -u oracle
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
...
65426 oracle 20 0 50.2g 49g 49g S 0.0 39.5 0:38.29 oracle
65448 oracle 20 0 50.2g 49g 49g S 0.0 39.5 0:12.25 oracle
65452 oracle -2 0 50.2g 49g 49g S 0.0 39.5 0:15.57 oracle
65456 oracle 20 0 50.2g 49g 49g S 0.0 39.5 0:12.52 oracle
65459 oracle 20 0 50.2g 49g 49g S 0.0 39.5 0:10.74 oracle
65461 oracle 20 0 50.2g 49g 49g S 0.0 39.5 0:12.54 oracle
65466 oracle 20 0 50.2g 49g 49g S 0.0 39.5 0:12.77 oracle
65468 oracle 20 0 50.2g 49g 49g S 0.0 39.5 0:15.61 oracle
65472 oracle 20 0 50.2g 49g 49g S 0.0 39.5 0:11.73 oracle
65474 oracle 20 0 50.2g 49g 49g S 0.0 39.5 0:12.29 oracle
65501 oracle 20 0 50.2g 49g 49g S 0.0 39.5 0:11.98 oracle
65503 oracle 20 0 50.2g 49g 49g S 0.0 39.5 0:11.94 oracle
65505 oracle 20 0 50.2g 49g 49g S 0.0 39.5 0:12.93 oracle
65507 oracle 20 0 50.2g 49g 49g S 0.0 39.5 0:11.87 oracle
65511 oracle 20 0 50.2g 49g 49g S 0.0 39.5 0:12.61 oracle
65513 oracle 20 0 50.2g 49g 49g S 0.0 39.5 0:10.95 oracle
65515 oracle 20 0 50.2g 49g 49g S 0.0 39.5 0:10.80 oracle
65522 oracle 20 0 50.2g 49g 49g S 0.0 39.5 0:11.60 oracle
--//RES and SHR are very large, close to VIRT.
$ cat /proc/meminfo | grep -i page
AnonPages: 313752 kB
PageTables: 1868324 kB
AnonHugePages: 0 kB
HugePages_Total: 104
HugePages_Free: 47
HugePages_Rsvd: 4
HugePages_Surp: 0
Hugepagesize: 2048 kB
--//You can see that hugepages are practically unused, and PageTables has reached 1868324 kB.
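--//A rough back-of-the-envelope check of that PageTables figure, as a sketch only. It assumes x86-64 with 4 KB pages and 8-byte page-table entries, ignores the upper page-table levels, and assumes every process has touched the whole SGA. The function name pagetable_kb is made up for this illustration.

```shell
# Estimate page-table memory (in kB) needed to map an SGA with 4 KB pages.
# Assumptions: 8 bytes per page-table entry, only the lowest table level counted.
pagetable_kb() {
    # $1 = SGA size in GB, $2 = number of processes mapping the whole SGA
    pages=$(( $1 * 1024 * 1024 / 4 ))      # 4 KB pages needed to map the SGA
    per_proc_kb=$(( pages * 8 / 1024 ))    # 8 bytes per entry, converted to kB
    echo $(( per_proc_kb * $2 ))
}

pagetable_kb 50 1     # one process:  102400 kB (about 100 MB)
pagetable_kb 50 18    # 18 processes: 1843200 kB (about 1.8 GB)
```

--//About 100 MB per process and roughly 1.8 GB for the 18 oracle processes listed by top, which is the same order as the 1868324 kB reported above.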
--//What puzzles me is that we have an 18c database that also has pre_page_sga=true set.
--//This is what the 18c database looks like (Red Hat Enterprise Linux Server release 7.5 (Maipo)):
Tasks: 532 total, 1 running, 531 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.6 us, 2.5 sy, 0.0 ni, 96.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 65709508 total, 27079916 free, 4909680 used, 33719912 buff/cache
KiB Swap: 16773116 total, 16773116 free, 0 used. 30605328 avail Mem
PID PPID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND USED SWAP
8643 1 oracle 20 0 29.2g 14.0g 14.0g S 0.0 22.4 10:03.25 ora_dbw0_orclcd 14.0g 0
9245 1 oracle 20 0 29.2g 2.9g 2.8g S 0.0 4.6 3:41.57 ora_w006_orclcd 2.9g 0
8867 1 oracle 20 0 29.2g 2.2g 2.1g S 0.0 3.5 3:18.60 ora_w004_orclcd 2.2g 0
8751 1 oracle 20 0 29.2g 2.0g 1.9g S 0.0 3.1 2:15.33 ora_w002_orclcd 2.0g 0
8663 1 oracle 20 0 29.2g 1.9g 1.8g S 0.0 3.0 2:25.90 ora_w001_orclcd 1.9g 0
9128 1 oracle 20 0 29.2g 1.9g 1.8g S 0.0 3.0 1:48.32 ora_w005_orclcd 1.9g 0
8717 1 oracle 20 0 29.2g 1.9g 1.8g S 0.0 3.0 15:07.65 ora_p002_orclcd 1.9g 0
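--//RES is nowhere near as large here, and it grows slowly. PageTables is in fact large, which again underlines the need to configure hugepages as a mandatory installation step.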
# grep -i page /proc/meminfo
AnonPages: 2761200 kB
PageTables: 723276 kB
AnonHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
--//PageTables has reached 723276 kB.
# ps -ef |grep ora[_]|wc
83 664 5395
--//With only 83 processes.
# ps -eLf |grep ora[_]|wc
85 850 6460
3.Back in the current test environment, enable hugepages and see:
SYS@xxxx> shutdown abort ;
ORACLE instance shut down.
--//Edit /etc/sysctl.conf and add the following:
vm.nr_hugepages = 26000
# sysctl -p
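--//A minimal sketch of the sizing arithmetic behind that value, assuming a 50 GB SGA and the 2048 KB large page size shown in the alert log. hugepages_needed is a hypothetical helper name, not an Oracle or OS tool.

```shell
# Number of huge pages needed to cover an SGA, rounded up.
hugepages_needed() {
    # $1 = SGA size in GB, $2 = huge page size in kB
    sga_kb=$(( $1 * 1024 * 1024 ))
    echo $(( (sga_kb + $2 - 1) / $2 ))
}

hugepages_needed 50 2048    # prints 25600
```

--//50 GB works out to 25600 pages, so 26000 leaves a few hundred pages of headroom (the alert log below reports 25601 pages used by the instance).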
SYS@xxxx> startup nomount
ORACLE instance started.
Total System Global Area 5.3447E+10 bytes
Fixed Size 2265864 bytes
Variable Size 5100276984 bytes
Database Buffers 4.8318E+10 bytes
Redo Buffers 26480640 bytes
--//alert.log
Thu Jan 03 10:37:47 2019
Starting ORACLE instance (normal)
************************ Large Pages Information *******************
Per process system memlock (soft) limit = 51 GB
Total Shared Global Region in Large Pages = 50 GB (100%)
Large Pages used by this instance: 25601 (50 GB)
Large Pages unused system wide = 399 (798 MB)
Large Pages configured system wide = 26000 (51 GB)
Large Page size = 2048 KB
********************************************************************
--//Hugepages are in use.
...
MMNL started with pid=18, OS id=406
Thu Jan 03 10:38:10 2019
ORACLE_BASE from environment = /u01/app/oracle
--//It took only 23 seconds.
# top -u oracle
Tasks: 273 total, 1 running, 272 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.1%us, 0.6%sy, 0.0%ni, 99.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 132261196k total, 63923704k used, 68337492k free, 149588k buffers
Swap: 31455264k total, 4079604k used, 27375660k free, 2341820k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
375 oracle -2 0 50.2g 15m 13m S 3.9 0.0 0:01.82 oracle
369 oracle 20 0 50.2g 16m 14m S 0.0 0.0 0:16.16 oracle
373 oracle 20 0 50.2g 15m 13m S 0.0 0.0 0:00.34 oracle
379 oracle 20 0 50.2g 15m 13m S 0.0 0.0 0:00.25 oracle
381 oracle 20 0 50.2g 15m 13m S 0.0 0.0 0:00.27 oracle
383 oracle 20 0 50.2g 15m 13m S 0.0 0.0 0:00.28 oracle
385 oracle 20 0 50.2g 18m 13m S 0.0 0.0 0:00.40 oracle
387 oracle 20 0 50.2g 15m 13m S 0.0 0.0 0:03.88 oracle
389 oracle 20 0 50.2g 21m 13m S 0.0 0.0 0:00.27 oracle
391 oracle 20 0 50.2g 21m 13m S 0.0 0.0 0:00.24 oracle
393 oracle 20 0 50.2g 21m 13m S 0.0 0.0 0:00.25 oracle
395 oracle 20 0 50.2g 15m 13m S 0.0 0.0 0:00.26 oracle
397 oracle 20 0 50.2g 15m 13m S 0.0 0.0 0:00.27 oracle
399 oracle 20 0 50.2g 15m 13m S 0.0 0.0 0:00.24 oracle
--//Why is RES so small with pre_page_sga=true when hugepages are in use? I don't quite understand it; can anyone explain?
# cat /proc/meminfo | grep -i page
AnonPages: 216620 kB
PageTables: 34232 kB
AnonHugePages: 0 kB
HugePages_Total: 26000
HugePages_Free: 473
HugePages_Rsvd: 74
HugePages_Surp: 0
Hugepagesize: 2048 kB
--//PageTables uses only 34232 kB. The comparison shows that page-table usage drops dramatically.
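--//The meminfo counters can be cross-checked against the alert log. A small sketch, assuming that the pages claimed equal HugePages_Total - HugePages_Free + HugePages_Rsvd (reserved pages still count as free until they are touched). hugepages_claimed is a name invented here.

```shell
# Read /proc/meminfo-style text on stdin and print the number of huge pages
# that are either allocated or reserved: Total - Free + Rsvd.
hugepages_claimed() {
    awk '/^HugePages_Total:/ { t = $2 }
         /^HugePages_Free:/  { f = $2 }
         /^HugePages_Rsvd:/  { r = $2 }
         END { print t - f + r }'
}

# Values from the run with hugepages enabled: prints 25601
printf 'HugePages_Total: 26000\nHugePages_Free: 473\nHugePages_Rsvd: 74\n' | hugepages_claimed
# Values from the run without enough hugepages: prints 61
printf 'HugePages_Total: 104\nHugePages_Free: 47\nHugePages_Rsvd: 4\n' | hugepages_claimed
```

--//Both results agree with "Large Pages used by this instance" in the two alert logs (25601 and 61). On a live system the function can be fed with: hugepages_claimed < /proc/meminfo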
--//If you comment out pre_page_sga=true, the next startup completes almost immediately.
Thu Jan 03 10:44:43 2019
Starting ORACLE instance (normal)
....
MMON started with pid=17, OS id=504
Thu Jan 03 10:44:45 2019
MMNL started with pid=18, OS id=506
ORACLE_BASE from environment = /u01/app/oracle
--//Only 2 seconds.
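--//To compare the three startups, the elapsed time can be taken from the first and last alert-log timestamps of each run. A small sketch; elapsed_seconds is a hypothetical helper and assumes both times fall on the same day.

```shell
# Seconds between two HH:MM:SS timestamps taken on the same day.
elapsed_seconds() {
    # $1 = start time, $2 = end time
    echo "$1 $2" | awk '{
        split($1, a, ":"); split($2, b, ":")
        print (b[1]*3600 + b[2]*60 + b[3]) - (a[1]*3600 + a[2]*60 + a[3])
    }'
}

elapsed_seconds 10:24:35 10:28:27    # no hugepages, pre_page_sga=true
elapsed_seconds 10:37:47 10:38:10    # hugepages,    pre_page_sga=true
elapsed_seconds 10:44:43 10:44:45    # hugepages,    pre_page_sga commented out
```

--//That is 232 seconds, 23 seconds and 2 seconds respectively.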
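Summary:
1.If you use pre_page_sga=true, hugepages must be enabled.
2.Note that pre_page_sga defaults to true in 12cR2.
3.Once more: configuring hugepages at install time is necessary, since today's servers have a lot of memory.
4.Why is RES so small with pre_page_sga=true when hugepages are in use? I don't understand it.
5.In fact we have an rh4.3 machine where RES is still large even with pre_page_sga=false and hugepages enabled.
--//The rh 4.2 machine.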
# top -u oracle
top - 10:48:37 up 362 days, 18:21, 1 user, load average: 0.01, 0.03, 0.00
Tasks: 116 total, 1 running, 115 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0% us, 0.0% sy, 0.0% ni, 99.5% id, 0.5% wa, 0.0% hi, 0.0% si
Mem: 4045276k total, 4024284k used, 20992k free, 135488k buffers
Swap: 3911788k total, 140k used, 3911648k free, 2731800k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12629 oracle 16 0 838m 626m 624m S 0.0 15.9 0:13.40 oracle
12631 oracle 16 0 836m 625m 623m S 0.0 15.8 0:02.09 oracle
12633 oracle -2 0 836m 625m 623m S 0.0 15.8 0:00.04 oracle
12637 oracle 16 0 836m 625m 623m S 0.0 15.8 0:00.03 oracle
12639 oracle 16 0 836m 625m 623m S 0.0 15.8 0:00.03 oracle
12641 oracle 16 0 836m 631m 628m S 0.0 16.0 0:00.07 oracle
--//RES is large here too, equal to the SGA size.
SYS@bookdg> show sga
Total System Global Area 634732544 bytes
Fixed Size 2255792 bytes
Variable Size 197133392 bytes
Database Buffers 427819008 bytes
Redo Buffers 7524352 bytes
SYS@bookdg> show parameter pre_page
NAME TYPE VALUE
------------ ------- ------
pre_page_sga boolean FALSE
# cat /proc/meminfo | grep -i page
PageTables: 16460 kB
HugePages_Total: 320
HugePages_Free: 15
Hugepagesize: 2048 kB
--//Can anyone explain whether this behavior is an OS issue or an Oracle issue?
6.Cleanup and restoring the environment omitted.