[20221103] Strange mail messages (consolidated version).txt

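--//Logged in as root on a production server, I noticed the following notice shortly before the end of the workday:
--//"You have mail". Seeing it just before going home is bad for morale, so I took some time to find out what was going on.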

# mail
Heirloom Mail version 12.5 7/5/10.  Type ? for help.
"/var/spool/mail/root": 42060 messages 42041 unread
    1 (Cron Daemon)         Mon Nov 30 12:00  29/923   "Cron /home/del_log/del_arc.sh"
    2 (Cron Daemon)         Mon Nov 30 12:01  29/923   "Cron /home/del_log/del_arc.sh"
    3 (Cron Daemon)         Mon Nov 30 12:02  29/923   "Cron /home/del_log/del_arc.sh"
    4 (Cron Daemon)         Mon Nov 30 12:03  29/923   "Cron /home/del_log/del_arc.sh"
    5 (Cron Daemon)         Mon Nov 30 12:04  29/923   "Cron /home/del_log/del_arc.sh"
    6 (Cron Daemon)         Mon Nov 30 12:05  29/923   "Cron /home/del_log/del_arc.sh"
    7 (Cron Daemon)         Mon Nov 30 12:06  29/923   "Cron /home/del_log/del_arc.sh"
    8 (Cron Daemon)         Mon Nov 30 12:07  29/923   "Cron /home/del_log/del_arc.sh"
    9 (Cron Daemon)         Mon Nov 30 12:08  29/923   "Cron /home/del_log/del_arc.sh"
   10 (Cron Daemon)         Mon Nov 30 12:09  29/923   "Cron /home/del_log/del_arc.sh"
   11 (Cron Daemon)         Mon Nov 30 12:10  29/923   "Cron /home/del_log/del_arc.sh"
   12 (Cron Daemon)         Mon Nov 30 12:11  29/923   "Cron /home/del_log/del_arc.sh"
   13 (Cron Daemon)         Mon Nov 30 12:12  29/923   "Cron /home/del_log/del_arc.sh"
   14 (Cron Daemon)         Mon Nov 30 12:13  29/923   "Cron /home/del_log/del_arc.sh"
   15 (Cron Daemon)         Mon Nov 30 12:14  29/923   "Cron /home/del_log/del_arc.sh"
>U 16 (Cron Daemon)         Mon Nov 30 12:15  29/922   "Cron /home/del_log/del_arc.sh"
 U 17 (Cron Daemon)         Mon Nov 30 12:16  29/922   "Cron /home/del_log/del_arc.sh"
--//One invocation per minute, all around 12 o'clock; no wonder the messages pile up in that time window.

& n
Message 16:
From root@LIS-DB.localdomain  Mon Nov 30 12:15:01 2020
Return-Path:
X-Original-To: root
Delivered-To: root@LIS-DB.localdomain
From: "(Cron Daemon)"
To: root@LIS-DB.localdomain
Subject: Cron /home/del_log/del_arc.sh
Content-Type: text/plain; charset=UTF-8
Auto-Submitted: auto-generated
Precedence: bulk
X-Cron-Env:
X-Cron-Env:
X-Cron-Env:
X-Cron-Env:
X-Cron-Env:
X-Cron-Env:
X-Cron-Env:
X-Cron-Env:
Date: Mon, 30 Nov 2020 12:15:01 +0800 (CST)
Status: RO

Message file RMAN.msb not found
Verify that ORACLE_HOME is set properly
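--//Clearly /home/del_log/del_arc.sh is being run as root, which lacks some required environment variables.
--//I am not familiar with the mail command's interface; n seems to show the next message, and n followed by a number
--//seems to show the message with that number: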
& n 42060
Message 42060:
From root@LIS-DB.localdomain  Mon Oct 31 12:59:01 2022
Return-Path:
X-Original-To: root
Delivered-To: root@LIS-DB.localdomain
From: "(Cron Daemon)"
To: root@LIS-DB.localdomain
Subject: Cron /home/del_log/del_arc.sh
Content-Type: text/plain; charset=UTF-8
Auto-Submitted: auto-generated
Precedence: bulk
X-Cron-Env:
X-Cron-Env:
X-Cron-Env:
X-Cron-Env:
X-Cron-Env:
X-Cron-Env:
X-Cron-Env:
X-Cron-Env:
Date: Mon, 31 Oct 2022 12:59:01 +0800 (CST)
Status: RO

Message file RMAN.msb not found
Verify that ORACLE_HOME is set properly

& n
At EOF

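--//Clearly the ops staff set up /home/del_log/del_arc.sh to run under root, and the script is not even stored under the
--//oracle user's directory. I have no idea what they were thinking.
--//Judging by the dates (first message in November 2020, last in October 2022), the problem has been there all along;
--//apparently nobody tested or checked the job.
--//The remaining question is why it runs once a minute (during the 12 o'clock hour). Why so often?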

# ls -l  /home/del_log/del_arc.sh
-rwxrwxrwx. 1 root root 193 2021-04-09 10:48:04 /home/del_log/del_arc.sh
--//Owner and group are both root.

# cat /home/del_log/del_arc.sh
source ~/.bash_profile
exec >> /home/del_log/log/del_arch`date +%F-%H`.log
/u01/app/oracle/product/19/db_1/bin/rman target / <<EOF
delete noprompt archivelog until time 'sysdate-10';
exit;
EOF
--//Clearly the script could never have worked when run as root. I also found similar error mail for the oracle user;
--//I have already fixed that!!
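--//The "RMAN.msb not found" mail above points at ORACLE_HOME not being set in the cron environment. Below is a minimal
--//sketch of a variant that sets its own environment instead of sourcing a profile. ORACLE_SID=orcl is an assumption
--//(only the service name orcl appears in this note), and the function name and the guard at the end are additions
--//for illustration, not part of the original script.

```bash
#!/bin/bash
# Hypothetical variant of del_arc.sh with an explicit environment.
export ORACLE_HOME=/u01/app/oracle/product/19/db_1
export ORACLE_SID=orcl                      # assumption: adjust to the real instance name
export PATH="$ORACLE_HOME/bin:$PATH"
LOG_DIR=/home/del_log/log

# Delete archived logs older than 10 days and append RMAN's output to the hourly log.
purge_archivelogs() {
    "$ORACLE_HOME/bin/rman" target / <<EOF >> "$LOG_DIR/del_arch$(date +%F-%H).log" 2>&1
delete noprompt archivelog until time 'sysdate-10';
exit;
EOF
}

# Run only as the oracle user and only where rman is actually installed.
if [ "$(id -un)" = "oracle" ] && [ -x "$ORACLE_HOME/bin/rman" ]; then
    purge_archivelogs
fi
```

--//With the user check in place, an accidental run from root's crontab does nothing instead of producing error mail.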

# cd /etc
# grep -r del_arc.sh *

--//Hmm!! Nothing under /etc refers to it. My own habit is to put hand-written crontab entries in files under /etc/cron.d/.

# stat /home/del_log/del_arc.sh
  File: '/home/del_log/del_arc.sh'
  Size: 193             Blocks: 8          IO Block: 4096   regular file
Device: f901h/63745d    Inode: 104907001   Links: 1
Access: (0777/-rwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:home_root_t:s0
Access: 2022-10-31 12:27:01.970203358 +0800
Modify: 2021-04-09 10:48:04.329182000 +0800
Change: 2021-04-09 10:48:04.329182000 +0800
 Birth: -

# crontab -l
* 12 * * * /home/del_log/del_arc.sh
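--//Looking at the crontab format, the entry is simply wrong: "* 12 * * *" means every minute of the 12 o'clock hour,
--//stopping at 13:00. That is one run per minute from 12:00 through 12:59 every day, 60 runs in total.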
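--//The arithmetic can be sketched in a few lines of shell. The helpers field_matches and runs_per_day are hypothetical
--//names, and the sketch assumes the day, month and weekday fields are all "*" and handles only "*" or a single number.

```bash
#!/bin/bash
# Count how many values one cron field matches.
# Only "*" and a plain number are handled; real cron also accepts
# lists, ranges and steps, which this sketch deliberately ignores.
field_matches() {
    local field=$1 range_size=$2
    if [ "$field" = "*" ]; then
        echo "$range_size"
    else
        echo 1
    fi
}

# Runs per day for an entry whose last three fields are all "*".
runs_per_day() {
    local minute=$1 hour=$2
    echo $(( $(field_matches "$minute" 60) * $(field_matches "$hour" 24) ))
}

runs_per_day '*' 12   # the entry as written: every minute of hour 12
runs_per_day 0 12     # minute fixed at 0: a single run at 12:00
```

--//A "*" in the minute field multiplies the run count by 60, which is the difference between the two calls.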

# strace -e open crontab -l
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libselinux.so.1", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libpam.so.0", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libaudit.so.1", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libpcre.so.1", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libcap-ng.so.0", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
open("/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = 3
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = 3
open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 3
open("/etc/localtime", O_RDONLY|O_CLOEXEC) = 3
open("/var/spool/cron/root", O_RDONLY)  = 4
* 12 * * * /home/del_log/del_arc.sh
+++ exited with 0 +++
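--//So the entry lives in the file root under /var/spool/cron/. Incidentally, I rarely use the crontab command myself;
--//I normally write the script first and then create a crontab-format file under /etc/cron.d to call it.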
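--//Since the entry was not under /etc, a search that covers the per-user spool as well would have found it in one step.
--//A minimal sketch; find_cron_refs is a hypothetical helper, and the locations listed are the usual ones on a
--//RHEL-style system (reading /var/spool/cron requires root).

```bash
#!/bin/bash
# List files under the given locations that mention a fixed string.
# Locations that do not exist are skipped silently.
find_cron_refs() {
    local pattern=$1; shift
    local loc
    for loc in "$@"; do
        [ -e "$loc" ] && grep -rlF -- "$pattern" "$loc" 2>/dev/null
    done
    return 0
}

# System crontab, drop-in directory, and per-user crontabs
find_cron_refs del_arc.sh /etc/crontab /etc/cron.d /var/spool/cron
```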

# ls -l /var/spool/cron
total 8
-rw-------. 1 oracle oinstall 36 2021-04-08 10:14:26 oracle
-rw-------. 1 root   root     36 2020-11-30 11:14:10 root
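--//Judging by the timestamps, they noticed the error and on 2021-04-08 wrote a second crontab file that runs the script
--//as the oracle user. The earlier mistake was never cleaned up, which shows a poor sense of responsibility.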

# cat /var/spool/cron/oracle
* 23 * * * /home/del_log/del_arc.sh
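--//The hour was changed to 23, but the entry is still wrong!! It still runs once a minute during that hour. The work
--//clearly was not tested or checked after it was done.
--//How to cancel the root job? I think simply deleting the file should be fine.
# crontab -e  --//edit and remove the entry.
# crontab -l  --//list.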
# ls -l /var/spool/cron/
total 4
-rw-------. 1 oracle oinstall 36 2021-04-08 10:14:26 oracle
-rw-------. 1 root   root      0 2022-11-01 11:24:56 root
--//The root file is now 0 bytes; deleting it should do no harm either.

# rm /var/spool/cron/root
rm: remove regular empty file '/var/spool/cron/root'? y

--//Check with V$RMAN_OUTPUT:
SYS@192.168.100.235:1521/orcl> SELECT distinct sid
     , stamp
     , session_stamp
     , rman_status_stamp
  FROM V$RMAN_OUTPUT
 WHERE lower(output) like '%deleted archived log%'
 order by 4;
       SID      STAMP SESSION_STAMP RMAN_STATUS_STAMP
---------- ---------- ------------- -----------------
      4265 1119481204    1119481202        1119481203
      4265 1119481205    1119481202        1119481203
      1140 1119567604    1119567602        1119567603
      1140 1119567605    1119567602        1119567603

SYS@192.168.100.235:1521/orcl> @ stamp 1119481203
     STAMP STAMP_CONV_TIME
---------- -------------------
1119481203 2022-10-30 23:00:03

SYS@192.168.100.235:1521/orcl> @ stamp 1119567603
     STAMP STAMP_CONV_TIME
---------- -------------------
1119567603 2022-10-31 23:00:03

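--//It looks as if the job ran only once per day, and the timestamps match. The later once-a-minute runs do not show up
--//because they had no archive logs left to delete.
--//Note: the main reason is that the condition lower(output) like '%deleted archived log%' filters what is displayed.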

SYS@192.168.100.235:1521/orcl> select * from (select distinct stamp  fROM V$RMAN_OUTPUT where stamp>=1119481205) where rownum<=10;
     STAMP
----------
1119481205
1119481206
1119481262
1119481263
1119481322
1119481323
1119481382
1119481383
1119481441
1119481442
10 rows selected.

SYS@192.168.100.235:1521/orcl> @ stamp 1119481322
     STAMP STAMP_CONV_TIME
---------- -------------------
1119481322 2022-10-30 23:02:02

SYS@192.168.100.235:1521/orcl> @ stamp 1119481382
     STAMP STAMP_CONV_TIME
---------- -------------------
1119481382 2022-10-30 23:03:02

SYS@192.168.100.235:1521/orcl> @ stamp 1119481441
     STAMP STAMP_CONV_TIME
---------- -------------------
1119481441 2022-10-30 23:04:01

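--//The stamps are one minute apart. This basically confirms that /home/del_log/del_arc.sh runs once a minute throughout
--//the scheduled hour (23:00 in the oracle crontab), 60 times in total.
--//The corresponding log file confirms it: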
$ grep "Recovery Manager complete" del_arch2022-10-31-23.log|wc
     60     180    1620
--//Exactly 60 runs!!
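--//The same check can be repeated over every hourly log at once. A minimal sketch: count_rman_runs is a hypothetical
--//helper, and the del_arch*.log naming is taken from the exec redirect in del_arc.sh.

```bash
#!/bin/bash
# Print "<count> <file>" for every hourly log in a directory,
# where <count> is the number of completed RMAN sessions in that log.
count_rman_runs() {
    local dir=$1 f
    for f in "$dir"/del_arch*.log; do
        [ -f "$f" ] || continue
        printf '%s %s\n' "$(grep -c 'Recovery Manager complete' "$f")" "${f##*/}"
    done
}

count_rman_runs /home/del_log/log
```

--//Because the log name contains only the date and hour, all runs within one hour append to the same file; a correctly
--//scheduled daily job should therefore show a count of 1 per file.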

--//Use crontab -e to edit the file and correct the error.
$ crontab -e
$ crontab -l
5 23 * * * /home/del_log/del_arc.sh

--//For the record, here is the error from the runs under the oracle user:
$ mail
Heirloom Mail version 12.5 7/5/10.  Type ? for help.
"/var/spool/mail/oracle": 2 messages 2 new
>N  1 (Cron Daemon)         Tue Nov  1 23:05  25/924   "Cron /home/del_log/del_arc.sh"
 N  2 (Cron Daemon)         Wed Nov  2 23:05  25/924   "Cron /home/del_log/del_arc.sh"
&
Message  1:
From oracle@LIS-DB.localdomain  Tue Nov  1 23:05:02 2022
Return-Path:
X-Original-To: oracle
Delivered-To: oracle@LIS-DB.localdomain
From: "(Cron Daemon)"
To: oracle@LIS-DB.localdomain
Subject: Cron /home/del_log/del_arc.sh
Content-Type: text/plain; charset=UTF-8
Auto-Submitted: auto-generated
Precedence: bulk
X-Cron-Env:
X-Cron-Env:
X-Cron-Env:
X-Cron-Env:
X-Cron-Env:
X-Cron-Env:
X-Cron-Env:
X-Cron-Env:
Date: Tue,  1 Nov 2022 23:05:02 +0800 (CST)
Status: R

/home/oracle/.bashrc: line 14: `==': not a valid identifier

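--//I define a function in my .bashrc, and it makes the run under the oracle user fail as well; I do not know why.
--//The function is defined as follows in /home/oracle/.bashrc: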
== ()
{
    local in="$(echo "$@" | sed -e 's/\[/(/g' -e 's/\]/)/g')";
    echo $in | bc -lq | tr -d '\\\r' | sed -e "s/\.\([0-9]*[1-9]\)0\+$/.\1/" -e "s/\.0\+$//"
}

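--//It is just a simple calculator.
--//In theory it should work: running /home/del_log/del_arc.sh by hand is fine, so I do not know why it fails when
--//called from crontab.
--//I do not recommend the "source ~/.bash_profile" approach; the required environment variables should be written
--//directly into the script.
--//For now I have commented the function out. Next time I will rename it from == to js and see whether the crontab
--//run still reports an error.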
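--//One plausible explanation, not verified on this server: del_arc.sh has no #! line, and cron by default starts
--//commands through /bin/sh. Where /bin/sh is bash, it runs in POSIX mode, in which function names are expected to be
--//valid identifiers, so sourcing a .bashrc that defines == would fail there, while a normal bash session accepts the
--//name. The sketch below checks both points; has_shebang and try_define are hypothetical helper names, and the result
--//of the POSIX-mode call may differ between bash versions.

```bash
#!/bin/bash
# True if the file starts with an interpreter line ("#!").
has_shebang() {
    [ "$(head -c 2 "$1")" = '#!' ]
}

# Try to define a function named == in a fresh bash started with the
# given options; prints "ok" if bash accepts it, "rejected" otherwise.
try_define() {
    if bash "$@" -c '== () { :; }' 2>/dev/null; then
        echo ok
    else
        echo rejected
    fi
}

try_define            # default bash mode, as in a manual run
try_define --posix    # POSIX mode, as when bash is started as sh
```

--//If this is indeed the cause, adding #!/bin/bash as the first line of del_arc.sh should make the cron run behave
--//like the manual run.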

--//The stamp.sql script is attached:
$ cat stamp.sql
SELECT &&1 stamp,to_date(yyyy||'/'||mm||'/'||dd||' '||hh||':'||mi||':'||ss,'yyyy-mm-dd hh24:mi:ss') stamp_conv_time  from (
SELECT &&1
        ,FLOOR (&&1 / (86400*31*12))+1988 yyyy
        ,FLOOR (MOD (&&1 / (86400*31),12))+1 mm
        ,FLOOR (MOD (&&1 / 86400, 31))+1 dd
        ,FLOOR (MOD (&&1 / 3600, 24)) hh
        ,FLOOR (MOD (&&1 / 60, 60)) mi
        ,MOD (&&1, 60) ss
        from dual);
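
--//The same decoding can be done outside the database. A minimal shell sketch that follows the arithmetic of stamp.sql
--//(31-day months, 12-month years, counted from 1988); stamp_to_time is a hypothetical name.

```bash
#!/bin/bash
# Decode an RMAN stamp with the same integer arithmetic as stamp.sql.
stamp_to_time() {
    local s=$1
    printf '%04d-%02d-%02d %02d:%02d:%02d\n' \
        $(( s / (86400 * 31 * 12) + 1988 )) \
        $(( s / (86400 * 31) % 12 + 1 )) \
        $(( s / 86400 % 31 + 1 )) \
        $(( s / 3600 % 24 )) \
        $(( s / 60 % 60 )) \
        $(( s % 60 ))
}

stamp_to_time 1119481203   # 2022-10-30 23:00:03, matching the @ stamp output above
```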

