aio-max-nr/aio-nr
Maximum number of asynchronous I/O (AIO) requests allowed system-wide / number of AIO requests currently allocated
Kernels before 2.6 also had aio-max-size; since 2.6, AIO has been a default part of Linux.
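If aio-max-nr is set too low, Oracle may hit ORA-27090. A similar case was reported online:
An 11.2.0.3 database on Exadata, supporting up to 8000 database connections, frequently hit ORA-27090. Inspection showed that aio-max-nr was set to 3145728, while aio-nr had already reached 3145726.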
The author used SystemTap to try to find out why so many AIO requests were being consumed:
stap -ve '
global allocated, allocatedctx, freed

# io_setup(): count contexts created and total events requested per PID
probe syscall.io_setup {
    allocatedctx[pid()] += maxevents; allocated[pid()]++;
    printf("%d AIO events requested by PID %d (%s)\n",
           maxevents, pid(), cmdline_str());
}

# io_destroy(): count contexts torn down per PID
probe syscall.io_destroy { freed[pid()]++ }

# drop exiting processes so the final summary lists only live ones
probe kprocess.exit {
    if (allocated[pid()]) {
        printf("PID %d exited\n", pid());
        delete allocated[pid()];
        delete allocatedctx[pid()];
        delete freed[pid()];
    }
}

# on Ctrl-C, print per-PID totals
probe end {
    foreach (pid in allocated) {
        printf("PID %d allocated=%d allocated events=%d freed=%d\n",
               pid, allocated[pid], allocatedctx[pid], freed[pid]);
    }
}
'
The output was as follows:
Pass 1: parsed user script. and 76 library script(s) using 147908virt/22876res/2992shr kb, in 130usr/10sys/146real ms.
Pass 2: analyzed script. 4 probe(s), 10 function(s), 3 embed(s), 4 global(s) using 283072virt/49864res/4052shr kb, in 450usr/140sys/586real ms.
Pass 3: using cached /root/.systemtap/cache/11/stap_111c870f2747cede20e6a0e2f0a1b1ae_6256.c
Pass 4: using cached /root/.systemtap/cache/11/stap_111c870f2747cede20e6a0e2f0a1b1ae_6256.ko
Pass 5: starting run.
128 AIO events requested by PID 32885 (oracledbm1 (LOCAL=NO))
4096 AIO events requested by PID 32885 (oracledbm1 (LOCAL=NO))
128 AIO events requested by PID 69099 (oracledbm1 (LOCAL=NO))
4096 AIO events requested by PID 69099 (oracledbm1 (LOCAL=NO))
128 AIO events requested by PID 69142 (oracledbm1 (LOCAL=NO))
4096 AIO events requested by PID 69142 (oracledbm1 (LOCAL=NO))
128 AIO events requested by PID 69099 (oracledbm1 (LOCAL=NO))
128 AIO events requested by PID 69142 (oracledbm1 (LOCAL=NO))
128 AIO events requested by PID 32885 (oracledbm1 (LOCAL=NO))
4096 AIO events requested by PID 69142 (oracledbm1 (LOCAL=NO))
4096 AIO events requested by PID 69099 (oracledbm1 (LOCAL=NO))
128 AIO events requested by PID 69142 (oracledbm1 (LOCAL=NO))
128 AIO events requested by PID 69099 (oracledbm1 (LOCAL=NO))
...
(and when control-C is pressed):
PID 99043 allocated=6 allocatedevents=12672 freed=3
PID 37074 allocated=12 allocatedevents=25344 freed=6
PID 99039 allocated=18 allocatedevents=38016 freed=9
PID 69142 allocated=24 allocatedevents=50688 freed=12
PID 32885 allocated=36 allocatedevents=76032 freed=18
PID 69099 allocated=6 allocatedevents=12672 freed=3
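The per-PID summary above can be totalled mechanically. The following is a minimal Python sketch, assuming the summary lines are copied verbatim into a string; the names `AioUsage`, `parse_summary` and `SAMPLE` are hypothetical. The regex accepts both "allocatedevents" (as captured) and "allocated events" (as in the script's format string).

```python
import re
from dataclasses import dataclass
from typing import List

# Matches the per-PID summary printed by the "probe end" handler, e.g.
# "PID 99043 allocated=6 allocatedevents=12672 freed=3".
SUMMARY_RE = re.compile(
    r"PID (\d+) allocated=(\d+) allocated ?events=(\d+) freed=(\d+)"
)


@dataclass
class AioUsage:
    pid: int
    contexts_created: int    # io_setup() calls seen
    events_requested: int    # sum of maxevents over those calls
    contexts_destroyed: int  # io_destroy() calls seen

    @property
    def events_per_context(self) -> float:
        return self.events_requested / self.contexts_created

    @property
    def contexts_not_destroyed(self) -> int:
        return self.contexts_created - self.contexts_destroyed


def parse_summary(text: str) -> List[AioUsage]:
    """Turn the stap end-of-run summary into one AioUsage per PID."""
    return [AioUsage(*map(int, m.groups())) for m in SUMMARY_RE.finditer(text)]


SAMPLE = """\
PID 99043 allocated=6 allocatedevents=12672 freed=3
PID 37074 allocated=12 allocatedevents=25344 freed=6
PID 99039 allocated=18 allocatedevents=38016 freed=9
PID 69142 allocated=24 allocatedevents=50688 freed=12
PID 32885 allocated=36 allocatedevents=76032 freed=18
PID 69099 allocated=6 allocatedevents=12672 freed=3
"""

if __name__ == "__main__":
    rows = parse_summary(SAMPLE)
    print("total events requested:", sum(r.events_requested for r in rows))
    for r in rows:
        print(r.pid, r.events_per_context, r.contexts_not_destroyed)
```

For this sample the total is 215424 events, every PID averages 2112 events per context, i.e. (128 + 4096) / 2, matching the 128/4096 pairs in the trace, and each PID had destroyed only half of its contexts when tracing stopped.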
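Oracle processes were consuming a large amount of AIO, with some processes requesting 4096 events in a single call. After consulting Oracle Support, aio-max-nr was finally set to 50 million, and ORA-27090 never appeared again.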
http://www.pythian.com/blog/troubleshooting-ora-27090-async-io-errors/
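A rough back-of-envelope check of why 3145728 was too small for 8000 connections, as a Python sketch. The per-process figure is an assumption taken from the trace (one 128-event and one 4096-event context per server process, each charged exactly maxevents); it is not how the 50 million value was actually chosen, and some PIDs in the summary kept many more contexts open, so real demand can be higher.

```python
# Hypothetical sizing helper; EVENTS_PER_PROCESS is an assumption, not measured.
CONNECTIONS = 8000               # from the Exadata case
EVENTS_PER_PROCESS = 128 + 4096  # assumed: one context of each size per process


def required_aio_events(processes: int, events_per_process: int) -> int:
    """Events needed if every process holds its contexts at the same time."""
    return processes * events_per_process


def fits(limit: int, processes: int, events_per_process: int) -> bool:
    """True if the estimated demand stays within aio-max-nr."""
    return required_aio_events(processes, events_per_process) <= limit


if __name__ == "__main__":
    print(required_aio_events(CONNECTIONS, EVENTS_PER_PROCESS))  # 33792000
    print(fits(3_145_728, CONNECTIONS, EVENTS_PER_PROCESS))      # False
    print(fits(50_000_000, CONNECTIONS, EVENTS_PER_PROCESS))     # True
```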
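How do you check whether the system is using AIO? Look at the kioctx (AIO context) and kiocb (AIO control block) slab caches; non-zero active object counts mean AIO is in use: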
justin_$ cat /proc/slabinfo | grep kio
kioctx 579 920 384 10 1 : tunables 54 27 8 : slabdata 92 92 1
kiocb 35 45 256 15 1 : tunables 120 60 8 : slabdata 3 3 0
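The slabinfo check can be scripted. This Python sketch reads the first two numeric columns (active_objs, num_objs) of each cache; the function names are hypothetical. /proc/slabinfo is usually readable only by root, and on kernels that merge slab caches the kioctx/kiocb entries may not appear at all.

```python
from typing import Dict, Tuple

AIO_SLABS = ("kioctx", "kiocb")  # AIO context and AIO control-block caches


def parse_slabinfo(text: str) -> Dict[str, Tuple[int, int]]:
    """Map cache name -> (active_objs, num_objs) from /proc/slabinfo text."""
    stats: Dict[str, Tuple[int, int]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0].startswith("#"):
            continue  # blank line or the "# name <active_objs> ..." header
        try:
            stats[fields[0]] = (int(fields[1]), int(fields[2]))
        except ValueError:
            continue  # e.g. the "slabinfo - version: 2.1" line
    return stats


def aio_in_use(stats: Dict[str, Tuple[int, int]]) -> bool:
    """True if any AIO-related slab cache has active objects."""
    return any(stats.get(name, (0, 0))[0] > 0 for name in AIO_SLABS)


if __name__ == "__main__":
    try:
        with open("/proc/slabinfo") as f:  # usually root-only
            print(aio_in_use(parse_slabinfo(f.read())))
    except OSError as exc:
        print("cannot read /proc/slabinfo:", exc)
```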
How do you determine whether the oracle binary is linked with AIO?
justin_$ /usr/bin/ldd $ORACLE_HOME/bin/oracle | grep libaio
libaio.so.1 => /usr/lib64/libaio.so.1 (0x00007fca30cb4000)
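If nothing is returned, shut down the database and relink the oracle binary with AIO enabled using $ORACLE_HOME/rdbms/lib/ins_rdbms.mk (run the command from $ORACLE_HOME/rdbms/lib):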
make PL_ORALIBS=-laio -f ins_rdbms.mk async_on
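The ldd check can also be wrapped in a small script, for example when auditing several Oracle homes. A Python sketch, assuming ldd is on PATH and the binary lives at $ORACLE_HOME/bin/oracle; `links_libaio` and `oracle_links_libaio` are hypothetical names.

```python
import os
import shutil
import subprocess
from pathlib import Path


def links_libaio(ldd_output: str) -> bool:
    """True if any library listed by ldd is libaio.so.*"""
    for line in ldd_output.splitlines():
        parts = line.split()
        if parts and parts[0].startswith("libaio.so"):
            return True
    return False


def oracle_links_libaio(oracle_home: str) -> bool:
    """Run ldd on $ORACLE_HOME/bin/oracle and look for libaio."""
    binary = Path(oracle_home) / "bin" / "oracle"
    result = subprocess.run(
        ["ldd", str(binary)], capture_output=True, text=True, check=True
    )
    return links_libaio(result.stdout)


if __name__ == "__main__":
    home = os.environ.get("ORACLE_HOME")
    if home and shutil.which("ldd"):
        print(oracle_links_libaio(home))
    else:
        print("ORACLE_HOME not set or ldd not found")
```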
justin_$ more aio-max-nr
1048576
justin_$ more aio-nr
98482
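Comparing aio-nr with aio-max-nr gives the remaining headroom; per io_setup(2), a request that exceeds the available events fails with EAGAIN. A Python sketch with hypothetical helper names, assuming the standard /proc/sys/fs location:

```python
from pathlib import Path

PROC_FS = Path("/proc/sys/fs")  # assumed standard procfs mount


def read_fs_int(name: str, base: Path = PROC_FS) -> int:
    """Read the first integer from a file under /proc/sys/fs."""
    return int((base / name).read_text().split()[0])


def aio_headroom(aio_nr: int, aio_max_nr: int) -> int:
    """AIO events that can still be allocated system-wide."""
    return max(aio_max_nr - aio_nr, 0)


def aio_usage_pct(aio_nr: int, aio_max_nr: int) -> float:
    return 100.0 * aio_nr / aio_max_nr


if __name__ == "__main__":
    # values shown above, and the Exadata case
    print(aio_headroom(98482, 1048576), round(aio_usage_pct(98482, 1048576), 1))
    print(aio_headroom(3145726, 3145728))
    if (PROC_FS / "aio-nr").exists():
        nr, mx = read_fs_int("aio-nr"), read_fs_int("aio-max-nr")
        print(aio_headroom(nr, mx), round(aio_usage_pct(nr, mx), 1))
```

Here the local system has 950094 events left (about 9.4% used), whereas the Exadata system had only 2.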
file-max/nr_open
Maximum number of file handles the kernel will allocate / maximum number of file handles a single process can use
justin_$ more file-max
6815744
justin_$ more nr_open
1048576
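nr_open is the ceiling for a process's RLIMIT_NOFILE (setrlimit() above it fails with EPERM), while file-max caps the whole system. A Python sketch that checks a planned per-process limit against both; `nofile_problems` is a hypothetical helper and the resource module is Unix-only.

```python
import resource
from typing import List


def nofile_problems(requested: int, nr_open: int, file_max: int) -> List[str]:
    """Explain why a per-process open-file limit would be rejected or pointless."""
    problems = []
    if requested > nr_open:
        # setrlimit(RLIMIT_NOFILE) above nr_open fails with EPERM
        problems.append("exceeds nr_open")
    if requested > file_max:
        # allowed, but the system-wide cap would be hit first
        problems.append("exceeds file-max")
    return problems


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("current RLIMIT_NOFILE:", soft, hard)
    print(nofile_problems(65536, nr_open=1048576, file_max=6815744))      # []
    print(nofile_problems(2_000_000, nr_open=1048576, file_max=6815744))  # ['exceeds nr_open']
```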
file-nr
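The three columns are: number of allocated file handles / number of allocated but unused file handles / maximum number of file handles.
Since Linux 2.6 the second column is always 0, which means every allocated file handle is in use; the first column, however, should change frequently.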
justin_$ more file-nr
786048 0 6815744
http://space.itpub.net/15480802/viewspace-734062
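A Python sketch that turns file-nr into a usage percentage; the function names are hypothetical. Subtracting the second column keeps the calculation correct on kernels where it is non-zero.

```python
from typing import Tuple


def parse_file_nr(text: str) -> Tuple[int, int, int]:
    """Return (allocated, allocated_but_unused, maximum) from file-nr."""
    allocated, unused, maximum = (int(x) for x in text.split()[:3])
    return allocated, unused, maximum


def file_handle_usage_pct(text: str) -> float:
    allocated, unused, maximum = parse_file_nr(text)
    return 100.0 * (allocated - unused) / maximum


if __name__ == "__main__":
    print(round(file_handle_usage_pct("786048 0 6815744"), 1))  # 11.5
    try:
        with open("/proc/sys/fs/file-nr") as f:
            print(round(file_handle_usage_pct(f.read()), 1))
    except OSError:
        pass  # not on Linux, or procfs not mounted
```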
inode-nr/inode-state
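inode-max: maximum number of inodes, usually 3-4 times file-max because stdin, stdout and sockets all need inodes; obsolete since 2.6.
inode-nr: lists the first two items of inode-state and can be skipped.
inode-state: the first three columns are nr_inodes/nr_free_inodes/preshrink, where the first two are the number of allocated inodes and the number of free inodes. When nr_inodes > inode-max, preshrink = nr_inodes - inode-max, and the system needs to prune the inode list.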
justin_$ more inode-state
134123 41514 0 0 0 0 0
justin_$ more inode-nr
134123 41514
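A Python sketch that decodes inode-state and checks that inode-nr repeats its first two fields; the helper names are hypothetical, and the two files may differ slightly if read at different moments.

```python
from typing import Dict


def parse_inode_state(text: str) -> Dict[str, int]:
    """First three fields of inode-state: nr_inodes, nr_free_inodes, preshrink."""
    nr_inodes, nr_free, preshrink = (int(x) for x in text.split()[:3])
    return {
        "allocated": nr_inodes,
        "free": nr_free,
        "in_use": nr_inodes - nr_free,
        "preshrink": preshrink,
    }


def inode_nr_matches(inode_nr: str, inode_state: str) -> bool:
    """inode-nr should repeat the first two fields of inode-state."""
    return inode_nr.split()[:2] == inode_state.split()[:2]


if __name__ == "__main__":
    state = "134123 41514 0 0 0 0 0"  # value shown above
    print(parse_inode_state(state))
    print(inode_nr_matches("134123 41514", state))
```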
overflowgid/overflowuid
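Linux UIDs/GIDs are 32 bits wide, but some filesystems only support 16-bit UIDs/GIDs. When writing to such a filesystem, any UID/GID greater than 65535 is converted to a fixed value, namely the values of overflowuid/overflowgid.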
justin_$ more overflowgid
65534
justin_$ more overflowuid
65534
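The conversion can be modelled in a few lines of Python, patterned on the kernel's fs_high2lowuid() check (any bit set above the low 16 means the ID does not fit). `to_16bit_id` is a hypothetical name, and the default overflow value is the 65534 shown above.

```python
OVERFLOW_ID = 65534  # overflowuid / overflowgid as shown above


def to_16bit_id(id_: int, overflow: int = OVERFLOW_ID) -> int:
    """Return the ID as stored on a 16-bit-ID filesystem."""
    # any bit above the low 16 set -> the ID cannot be represented
    return overflow if id_ & ~0xFFFF else id_


if __name__ == "__main__":
    for uid in (1000, 65535, 65536, 100000):
        print(uid, "->", to_16bit_id(uid))
```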
leases-enable/lease-break-time
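Linux also supports file leases, a form of file lock: leases-enable turns them on (1) or off (0), and lease-break-time is the grace period, in seconds, that a lease holder gets after being notified that another process wants the file, before the kernel breaks the lease. For details see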
http://blog.csdn.net/yebanghua/article/details/7301904
justin_$ more lease-break-time
45
justin_$ more leases-enable
1
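suid_dumpable
Controls whether core dumps are produced for setuid or otherwise privilege-changing processes; 0 (the default) means such processes are not dumped.
justin_$ more suid_dumpable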
0
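/proc/sys/fs also contains several subdirectories, such as mqueue/quota/nfs/inotify.
mqueue directory
POSIX message queues are used to exchange data between processes and are similar to System V message queues; the following three parameters set their basic limits.
Note that ipcs -q lists System V message queues only; POSIX queues can be inspected by mounting the mqueue filesystem (for example, mount -t mqueue none /dev/mqueue).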
justin_$ ipcs -q
------ Message Queues --------
key msqid owner perms used-bytes messages
msg_max
Maximum number of messages in a single queue; the default is 10.
justin_$ more msg_max
10
msgsize_max
Maximum size of a single message, in bytes
justin_$ more msgsize_max
8192
queues_max
Maximum number of message queues that can be created system-wide
justin_$ more queues_max
256
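Per mq_open(3), an unprivileged process creating a queue must keep mq_maxmsg within msg_max and mq_msgsize within msgsize_max (otherwise EINVAL), and hitting queues_max gives ENOSPC. A Python sketch that reads the three limits and pre-checks planned attributes; `read_mqueue_limits` and `check_mq_attr` are hypothetical helpers, and privileged processes may be allowed to exceed these values.

```python
from pathlib import Path
from typing import Dict, List

MQUEUE_DIR = Path("/proc/sys/fs/mqueue")
LIMIT_NAMES = ("msg_max", "msgsize_max", "queues_max")


def read_mqueue_limits(base: Path = MQUEUE_DIR) -> Dict[str, int]:
    """Read the three POSIX message queue limits from procfs."""
    return {name: int((base / name).read_text()) for name in LIMIT_NAMES}


def check_mq_attr(maxmsg: int, msgsize: int, limits: Dict[str, int]) -> List[str]:
    """Problems an unprivileged mq_open(O_CREAT) with these attrs would hit."""
    problems = []
    if not 0 < maxmsg <= limits["msg_max"]:
        problems.append(f"mq_maxmsg {maxmsg} not in 1..{limits['msg_max']}")
    if not 0 < msgsize <= limits["msgsize_max"]:
        problems.append(f"mq_msgsize {msgsize} not in 1..{limits['msgsize_max']}")
    return problems


if __name__ == "__main__":
    shown = {"msg_max": 10, "msgsize_max": 8192, "queues_max": 256}  # values above
    print(check_mq_attr(20, 8192, shown))
    if MQUEUE_DIR.exists():
        print(read_mqueue_limits())
```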