Hostname | Version | IP | Installed software | Role
linuxserver | RHEL 6.5 | 192.168.230.136 | nagios-4.0.8, nagios-plugins-2.0.3, nrpe-2.15 | Monitoring server
linuxclient | RHEL 6.5 | 192.168.230.137 | nagios-plugins-2.0.3, nrpe-2.15 | Monitored client
Configure YUM as described in http://blog.itpub.net/28536251/viewspace-1444918/.
Server network configuration:
[root@linuxserver ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
HWADDR=00:0c:29:68:ab:26
TYPE=Ethernet
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=none
IPADDR=192.168.230.136
NETMASK=255.255.255.0
GATEWAY=192.168.230.2
DNS1=192.168.230.2
[root@linuxserver ~]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=linuxserver
[root@linuxserver ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
192.168.230.136 linuxserver
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
Client network configuration:
[root@linuxclient ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
HWADDR=00:0c:29:ba:4f:56
TYPE=Ethernet
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=none
IPADDR=192.168.230.137
NETMASK=255.255.255.0
GATEWAY=192.168.230.2
DNS1=192.168.230.2
[root@linuxclient ~]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=linuxclient
[root@linuxclient ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
192.168.230.137 linuxclient
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
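To simplify testing, temporarily disable the firewall on both the server and the client. Once the setup is complete, re-enable it and configure appropriate firewall rules.
Disable the firewall on the server: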
[root@linuxserver ~]# /etc/init.d/iptables stop
iptables: Setting chains to policy ACCEPT: filter [ OK ]
iptables: Flushing firewall rules: [ OK ]
iptables: Unloading modules: [ OK ]
[root@linuxserver ~]# /etc/init.d/iptables status
iptables: Firewall is not running.
[root@linuxserver ~]# chkconfig iptables off
[root@linuxserver ~]# chkconfig --list iptables
iptables 0:off 1:off 2:off 3:off 4:off 5:off 6:off
Disable the firewall on the client:
[root@linuxclient ~]# /etc/init.d/iptables stop
iptables: Setting chains to policy ACCEPT: filter [ OK ]
iptables: Flushing firewall rules: [ OK ]
iptables: Unloading modules: [ OK ]
[root@linuxclient ~]# /etc/init.d/iptables status
iptables: Firewall is not running.
[root@linuxclient ~]# chkconfig iptables off
[root@linuxclient ~]# chkconfig --list iptables
iptables 0:off 1:off 2:off 3:off 4:off 5:off 6:off
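To simplify testing, also temporarily disable SELinux on both the server and the client. Once the setup is complete, re-enable it and configure an SELinux policy.
Disable SELinux on the server:
Edit /etc/selinux/config with vim, change SELINUX=enforcing to SELINUX=disabled, then reboot.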
[root@linuxserver ~]# cat /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
# targeted - Targeted processes are protected,
# mls - Multi Level Security protection.
SELINUXTYPE=targeted
[root@linuxserver ~]# init 6
Disable SELinux on the client (same edit and reboot as on the server):
[root@linuxclient ~]# cat /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
# targeted - Targeted processes are protected,
# mls - Multi Level Security protection.
SELINUXTYPE=targeted
[root@linuxclient ~]# init 6
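The manual edit of /etc/selinux/config could also be scripted. The following is a minimal sketch and not part of the original procedure: the helper name set_selinux_mode is hypothetical, and the demo works on a scratch file so that nothing on the system is changed.

```sh
#!/bin/sh
# Hypothetical helper: rewrite the SELINUX= line of a config file.
# $1 = path to the config file, $2 = enforcing | permissive | disabled
set_selinux_mode() {
    tmp_out=$(mktemp) || return 1
    # Only the line that starts with "SELINUX=" matches; SELINUXTYPE= is untouched.
    sed "s/^SELINUX=.*/SELINUX=$2/" "$1" > "$tmp_out" && cat "$tmp_out" > "$1"
    rm -f "$tmp_out"
}

# Demo on a scratch file that mimics the relevant lines.
demo=$(mktemp)
printf '%s\n' '# SELINUX= can take one of these three values:' \
    'SELINUX=enforcing' 'SELINUXTYPE=targeted' > "$demo"
set_selinux_mode "$demo" disabled
grep '^SELINUX' "$demo"
rm -f "$demo"
```

On a real host the call would presumably be set_selinux_mode /etc/selinux/config disabled, run as root and followed by the reboot shown in the transcript.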
[root@linuxserver ~]# yum install -y gcc glibc glibc-common gd gd-devel xinetd openssl-devel
[root@linuxserver ~]# yum install -y httpd php
[root@linuxserver ~]# /etc/init.d/httpd start
Starting httpd: httpd: Could not reliably determine the server's fully qualified domain name, using 192.168.230.136 for ServerName
[ OK ]
Following the hint above, set ServerName in the httpd configuration file. The result:
[root@linuxserver ~]# grep ServerName /etc/httpd/conf/httpd.conf | grep 80
ServerName linuxserver:80
httpd now restarts cleanly.
[root@linuxserver ~]# /etc/init.d/httpd restart
Stopping httpd: [ OK ]
Starting httpd: [ OK ]
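Open http://192.168.230.136/ in a browser. If the page loads, httpd is installed correctly.
Run the following command to create a PHP test page, then open http://192.168.230.136/phpinfo.php in a browser. If the PHP information page appears, PHP is installed correctly.
[root@linuxserver ~]# echo "<?php phpinfo(); ?>" > /var/www/html/phpinfo.php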
[root@linuxserver ~]# useradd -s /sbin/nologin nagios
[root@linuxserver ~]# mkdir /usr/local/nagios
[root@linuxserver ~]# chown -R nagios:nagios /usr/local/nagios
[root@linuxserver ~]# ll -d /usr/local/nagios
drwxr-xr-x 2 nagios nagios 4096 Mar 14 11:17 /usr/local/nagios
The Nagios packages can be downloaded from http://sourceforge.jp/projects/sfnet_nagios/releases/# and https://nagios-plugins.org/downloads/.
[root@linuxserver ~]# tar -xvzf nagios-4.0.8.tar.gz
[root@linuxserver ~]# cd nagios-4.0.8
[root@linuxserver nagios-4.0.8]# ./configure --prefix=/usr/local/nagios/
[root@linuxserver nagios-4.0.8]# make all
[root@linuxserver nagios-4.0.8]# make install
# This installs the main program, CGIs, and HTML files
[root@linuxserver nagios-4.0.8]# make install-init
# This installs the init script in /etc/rc.d/init.d
[root@linuxserver nagios-4.0.8]# make install-commandmode
#This installs and configures permissions on the directory for holding the external command file
[root@linuxserver nagios-4.0.8]# make install-config
# This installs *SAMPLE* config files in /usr/local/nagios/etc
[root@linuxserver nagios-4.0.8]# make install-webconf
# This installs the Apache config file for the Nagios web interface
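Change to the installation directory. If the six directories listed in the table below are present, the installation succeeded.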
[root@linuxserver nagios-4.0.8]# cd /usr/local/nagios/
[root@linuxserver nagios]# ls
bin etc libexec sbin share var
No. | Directory | Purpose
1 | bin | Executable programs
2 | etc | Configuration files
3 | libexec | External plugins
4 | sbin | CGI files
5 | share | Web pages
6 | var | Log files
Register the nagios service:
[root@linuxserver ~]# chkconfig --add nagios
[root@linuxserver ~]# chkconfig nagios on
[root@linuxserver ~]# chkconfig --list nagios
nagios 0:off 1:off 2:on 3:on 4:on 5:on 6:off
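Because /etc/httpd/conf.d/nagios.conf specifies an authentication file, that file must be created with a user name and password so that web access to Nagios can be authenticated. The user name nagiosadmin is recommended, for reasons explained later.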
[root@linuxserver ~]# cat /etc/httpd/conf.d/nagios.conf
# SAMPLE CONFIG SNIPPETS FOR APACHE WEB SERVER
#
# This file contains examples of entries that need
# to be incorporated into your Apache web server
# configuration file. Customize the paths, etc. as
# needed to fit your system.
ScriptAlias /nagios/cgi-bin "/usr/local/nagios/sbin"
<Directory "/usr/local/nagios/sbin">
# SSLRequireSSL
Options ExecCGI
AllowOverride None
Order allow,deny
Allow from all
# Order deny,allow
# Deny from all
# Allow from 127.0.0.1
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/htpasswd.users
Require valid-user
</Directory>
Alias /nagios "/usr/local/nagios/share"
<Directory "/usr/local/nagios/share">
# SSLRequireSSL
Options None
AllowOverride None
Order allow,deny
Allow from all
# Order deny,allow
# Deny from all
# Allow from 127.0.0.1
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/htpasswd.users
Require valid-user
</Directory>
[root@linuxserver ~]# htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
New password:
Re-type new password:
Adding password for user nagiosadmin
Start nagios:
[root@linuxserver ~]# /etc/init.d/nagios start
Starting nagios: done.
Restart httpd:
[root@linuxserver ~]# /etc/init.d/httpd restart
Stopping httpd: [ OK ]
Starting httpd: [ OK ]
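Open http://192.168.230.136/nagios/ in a browser and enter the user name and password. If the Nagios web interface appears, Nagios is installed correctly.
The Nagios configuration files are located in /usr/local/nagios/etc/. The purpose of each file is listed in the table below: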
[root@linuxserver ~]# tree /usr/local/nagios/etc/
/usr/local/nagios/etc/
├── cgi.cfg
├── htpasswd.users
├── nagios.cfg
├── objects
│ ├── commands.cfg
│ ├── contacts.cfg
│ ├── localhost.cfg
│ ├── printer.cfg
│ ├── switch.cfg
│ ├── templates.cfg
│ ├── timeperiods.cfg
│ └── windows.cfg
└── resource.cfg
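No. | File | Purpose
1 | cgi.cfg | Controls access to the CGIs
2 | nagios.cfg | Main configuration file
3 | resource.cfg | Defines macros, such as $USER1$, for use in other configuration files
4 | commands.cfg | Command definitions that other configuration files can reference
5 | contacts.cfg | Contacts and contact groups
6 | localhost.cfg | Monitoring configuration for the local host
7 | printer.cfg | Sample configuration for monitoring printers; not enabled by default
8 | switch.cfg | Sample configuration for monitoring routers and switches; not enabled by default
9 | templates.cfg | Host and service templates that other configuration files can reference
10 | timeperiods.cfg | Time periods used for monitoring
11 | windows.cfg | Sample configuration for monitoring Windows hosts; not enabled by default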
Several of the important configuration files are described below.
[root@linuxserver ~]# grep -v '^#' /usr/local/nagios/etc/nagios.cfg | grep -v '^$'
log_file=/usr/local/nagios/var/nagios.log
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
object_cache_file=/usr/local/nagios/var/objects.cache
precached_object_file=/usr/local/nagios/var/objects.precache
resource_file=/usr/local/nagios/etc/resource.cfg
status_file=/usr/local/nagios/var/status.dat
status_update_interval=10
nagios_user=nagios
nagios_group=nagios
(some parameters omitted)
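nagios.cfg is the core Nagios configuration file. Its cfg_file directive references object configuration files; any additional object configuration file must be added here to take effect.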
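To see which object files a given nagios.cfg actually loads, the cfg_file lines can be extracted and each path checked. This is a minimal sketch; the helper name list_cfg_files is hypothetical and the demo reads a scratch file rather than the live configuration.

```sh
#!/bin/sh
# Hypothetical helper: print every object file named by a cfg_file= line.
# Commented-out entries (lines starting with "#") are not matched.
list_cfg_files() {
    sed -n 's/^cfg_file=//p' "$1"
}

# Demo against a scratch file that mimics a few lines of nagios.cfg.
demo=$(mktemp)
cat > "$demo" <<'EOF'
log_file=/usr/local/nagios/var/nagios.log
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
EOF

# Report whether each referenced file is present on this machine.
list_cfg_files "$demo" | while read -r path; do
    if [ -f "$path" ]; then
        echo "found:   $path"
    else
        echo "missing: $path"
    fi
done
rm -f "$demo"
```

After changing the real file, /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg runs Nagios's own configuration check before a restart.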
[root@linuxserver ~]# grep -v '^#' /usr/local/nagios/etc/cgi.cfg | grep -v '^$'
main_config_file=/usr/local/nagios/etc/nagios.cfg
physical_html_path=/usr/local/nagios/share
url_html_path=/nagios
show_context_help=0
use_pending_states=1
use_authentication=1
use_ssl_authentication=0
authorized_for_system_information=nagiosadmin
authorized_for_configuration_information=nagiosadmin
authorized_for_system_commands=nagiosadmin
authorized_for_all_services=nagiosadmin
authorized_for_all_hosts=nagiosadmin
authorized_for_all_service_commands=nagiosadmin
authorized_for_all_host_commands=nagiosadmin
(some parameters omitted)
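The authorized_* parameters in this file all default to nagiosadmin, which is why the password file was created for nagiosadmin earlier. To use a different user name, append it after nagiosadmin, separated by a comma, on these lines. See the comments in the configuration file for the meaning of each parameter.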
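Appending another user to every authorized_for_* line can be done in one pass. The sketch below is illustrative only: the helper add_cgi_user and the user name monitor are hypothetical, and the demo edits a scratch file instead of the real cgi.cfg.

```sh
#!/bin/sh
# Hypothetical helper: append a user to every authorized_for_* line.
# $1 = path to a cgi.cfg-style file, $2 = user name to add
# Note: not idempotent; calling it twice adds the user twice.
add_cgi_user() {
    tmp_out=$(mktemp) || return 1
    sed "/^authorized_for_[a-z_]*=/ s/\$/,$2/" "$1" > "$tmp_out" && cat "$tmp_out" > "$1"
    rm -f "$tmp_out"
}

# Demo on a scratch file with one unrelated and one authorized_for_* line.
demo=$(mktemp)
printf '%s\n' 'use_authentication=1' 'authorized_for_all_hosts=nagiosadmin' > "$demo"
add_cgi_user "$demo" monitor
cat "$demo"
rm -f "$demo"
```

The new user would also need a password entry, for example with htpasswd /usr/local/nagios/etc/htpasswd.users monitor (without -c, so the existing file is kept).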
[root@linuxserver ~]# grep -v '^#' /usr/local/nagios/etc/resource.cfg | grep -v '^$'
$USER1$=/usr/local/nagios/libexec
The $USER1$ macro in this file specifies the directory where the Nagios plugins are installed.
[root@linuxserver ~]# grep -v '^#' /usr/local/nagios/etc/objects/localhost.cfg | grep -v '^$'
define host{
use linux-server
host_name localhost
alias localhost
address 127.0.0.1
}
define hostgroup{
hostgroup_name linux-servers
alias Linux Servers
members localhost
}
define service{
use local-service
host_name localhost
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
define service{
use local-service
host_name localhost
service_description Root Partition
check_command check_local_disk!20%!10%!/
}
define service{
use local-service
host_name localhost
service_description Current Users
check_command check_local_users!20!50
}
define service{
use local-service
host_name localhost
service_description Total Processes
check_command check_local_procs!250!400!RSZDT
}
define service{
use local-service
host_name localhost
service_description Current Load
check_command check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
}
define service{
use local-service
host_name localhost
service_description Swap Usage
check_command check_local_swap!20!10
}
define service{
use local-service
host_name localhost
service_description SSH
check_command check_ssh
notifications_enabled 0
}
define service{
use local-service
host_name localhost
service_description HTTP
check_command check_http
notifications_enabled 0
}
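This file defines the host and the services monitored on the local machine. "linux-server" is a host template defined in templates.cfg, and "local-service" is a service template defined in templates.cfg.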
[root@linuxserver ~]# sed -i 's/;.*$//g' /usr/local/nagios/etc/objects/templates.cfg
[root@linuxserver ~]# grep -v '^#' /usr/local/nagios/etc/objects/templates.cfg | grep -v '^$'
define contact{
name generic-contact
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r,f,s
host_notification_options d,u,r,f,s
service_notification_commands notify-service-by-email
host_notification_commands notify-host-by-email
register 0
}
define host{
name generic-host
notifications_enabled 1
event_handler_enabled 1
flap_detection_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
notification_period 24x7
register 0
}
define host{
name linux-server
use generic-host
check_period 24x7
check_interval 5
retry_interval 1
max_check_attempts 10
check_command check-host-alive
notification_period workhours
notification_interval 120
notification_options d,u,r
contact_groups admins
register 0
}
define host{
name windows-server
use generic-host
check_period 24x7
check_interval 5
retry_interval 1
max_check_attempts 10
check_command check-host-alive
notification_period 24x7
notification_interval 30
notification_options d,r
contact_groups admins
hostgroups windows-servers
register 0
}
define host{
name generic-printer
use generic-host
check_period 24x7
check_interval 5
retry_interval 1
max_check_attempts 10
check_command check-host-alive
notification_period workhours
notification_interval 30
notification_options d,r
contact_groups admins
register 0
}
define host{
name generic-switch
use generic-host
check_period 24x7
check_interval 5
retry_interval 1
max_check_attempts 10
check_command check-host-alive
notification_period 24x7
notification_interval 30
notification_options d,r
contact_groups admins
register 0
}
define service{
name generic-service
active_checks_enabled 1
passive_checks_enabled 1
parallelize_check 1
obsess_over_service 1
check_freshness 0
notifications_enabled 1
event_handler_enabled 1
flap_detection_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 10
retry_check_interval 2
contact_groups admins
notification_options w,u,c,r
notification_interval 60
notification_period 24x7
register 0
}
define service{
name local-service
use generic-service
max_check_attempts 4
normal_check_interval 5
retry_check_interval 1
register 0
}
templates.cfg defines the contact (notification), host, and service templates.
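Templates are chained through the use directive: local-service overrides some values of generic-service and inherits the rest. The sketch below illustrates that lookup with a small awk program. It is a simplification, not how Nagios itself resolves objects: the helper lookup_directive is hypothetical, it assumes comments have already been stripped, and it follows only a single use parent per template.

```sh
#!/bin/sh
# Hypothetical helper: find the effective value of a directive for a template,
# walking up the "use" chain when the template does not set it itself.
# $1 = template file, $2 = template name, $3 = directive
lookup_directive() {
    awk -v start="$2" -v want="$3" '
        /^[ \t]*define[ \t]/ { n = 0; name = ""; next }   # a block opens
        /^[ \t]*[}]/ {                                    # a block closes
            for (i = 1; i <= n; i++) value[name, key[i]] = val[i]
            next
        }
        NF >= 2 {                                         # directive and value
            v = $0
            sub(/^[ \t]*[^ \t]+[ \t]+/, "", v)            # strip directive name
            sub(/[ \t]+$/, "", v)                         # strip trailing blanks
            if ($1 == "name") name = v
            n++; key[n] = $1; val[n] = v
        }
        END {
            obj = start
            for (hops = 0; obj != "" && hops < 10; hops++) {
                if ((obj, want) in value) { print value[obj, want]; found = 1; break }
                obj = value[obj, "use"]
            }
            exit (found ? 0 : 1)
        }
    ' "$1"
}

# Demo with a cut-down copy of the two service templates.
demo=$(mktemp)
cat > "$demo" <<'EOF'
define service{
        name                    generic-service
        max_check_attempts      3
        notification_interval   60
        register                0
        }
define service{
        name                    local-service
        use                     generic-service
        max_check_attempts      4
        register                0
        }
EOF
lookup_directive "$demo" local-service max_check_attempts      # set locally: 4
lookup_directive "$demo" local-service notification_interval   # inherited: 60
rm -f "$demo"
```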
[root@linuxserver ~]# grep -v '^#' /usr/local/nagios/etc/objects/commands.cfg | grep -v '^$'
define command{
command_name notify-host-by-email
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
}
define command{
command_name notify-service-by-email
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
}
define command{
command_name check-host-alive
command_line $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
}
define command{
command_name check_local_disk
command_line $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
}
define command{
command_name check_local_load
command_line $USER1$/check_load -w $ARG1$ -c $ARG2$
}
define command{
command_name check_local_procs
command_line $USER1$/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$
}
define command{
command_name check_local_users
command_line $USER1$/check_users -w $ARG1$ -c $ARG2$
}
define command{
command_name check_local_swap
command_line $USER1$/check_swap -w $ARG1$ -c $ARG2$
}
define command{
command_name check_local_mrtgtraf
command_line $USER1$/check_mrtgtraf -F $ARG1$ -a $ARG2$ -w $ARG3$ -c $ARG4$ -e $ARG5$
}
define command{
command_name check_ftp
command_line $USER1$/check_ftp -H $HOSTADDRESS$ $ARG1$
}
define command{
command_name check_hpjd
command_line $USER1$/check_hpjd -H $HOSTADDRESS$ $ARG1$
}
define command{
command_name check_snmp
command_line $USER1$/check_snmp -H $HOSTADDRESS$ $ARG1$
}
define command{
command_name check_http
command_line $USER1$/check_http -I $HOSTADDRESS$ $ARG1$
}
define command{
command_name check_ssh
command_line $USER1$/check_ssh $ARG1$ $HOSTADDRESS$
}
define command{
command_name check_dhcp
command_line $USER1$/check_dhcp $ARG1$
}
define command{
command_name check_ping
command_line $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
}
define command{
command_name check_pop
command_line $USER1$/check_pop -H $HOSTADDRESS$ $ARG1$
}
define command{
command_name check_imap
command_line $USER1$/check_imap -H $HOSTADDRESS$ $ARG1$
}
define command{
command_name check_smtp
command_line $USER1$/check_smtp -H $HOSTADDRESS$ $ARG1$
}
define command{
command_name check_tcp
command_line $USER1$/check_tcp -H $HOSTADDRESS$ -p $ARG1$ $ARG2$
}
define command{
command_name check_udp
command_line $USER1$/check_udp -H $HOSTADDRESS$ -p $ARG1$ $ARG2$
}
define command{
command_name check_nt
command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$
}
define command{
command_name process-host-perfdata
command_line /usr/bin/printf "%b" "$LASTHOSTCHECK$\t$HOSTNAME$\t$HOSTSTATE$\t$HOSTATTEMPT$\t$HOSTSTATETYPE$\t$HOSTEXECUTIONTIME$\t$HOSTOUTPUT$\t$HOSTPERFDATA$\n" >> /usr/local/nagios/var/host-perfdata.out
}
define command{
command_name process-service-perfdata
command_line /usr/bin/printf "%b" "$LASTSERVICECHECK$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICESTATE$\t$SERVICEATTEMPT$\t$SERVICESTATETYPE$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$\n" >> /usr/local/nagios/var/service-perfdata.out
}
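This file defines the names and command lines of the commands used by the checks. It uses the $USER1$ definition from resource.cfg, and localhost.cfg references some of these commands.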
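How a check_command such as check_local_disk!20%!10%!/ turns into a command line can be sketched as follows. This only imitates the substitution of $USER1$ and $ARGn$; real Nagios macro processing handles many more macros. The helper name expand_command is hypothetical.

```sh
#!/bin/sh
# Value that resource.cfg assigns to the $USER1$ macro.
USER1=/usr/local/nagios/libexec

# Hypothetical helper: expand a command_line template for one check_command.
# $1 = command_line from commands.cfg, $2 = check_command from a service
expand_command() {
    awk -v tpl="$1" -v chk="$2" -v user1="$USER1" 'BEGIN {
        n = split(chk, part, "!")          # part[1] is the command name
        line = tpl
        gsub(/\$USER1\$/, user1, line)
        for (i = 2; i <= n; i++)           # part[2] fills $ARG1$, and so on
            gsub("\\$ARG" (i - 1) "\\$", part[i], line)
        print line
    }'
}

expand_command '$USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$' 'check_local_disk!20%!10%!/'
# prints: /usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /
```

The arguments after each "!" are positional, so the order in the service definition has to match the order of $ARG1$, $ARG2$, ... in the command definition.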
[root@linuxserver ~]# grep -v '^#' /usr/local/nagios/etc/objects/contacts.cfg | grep -v '^$'
define contact{
contact_name nagiosadmin
use generic-contact
alias Nagios Admin
email nagios@localhost
}
define contactgroup{
contactgroup_name admins
alias Nagios Administrators
members nagiosadmin
}
This file uses the generic-contact template defined in templates.cfg.
[root@linuxserver ~]# grep -v '^#' /usr/local/nagios/etc/objects/timeperiods.cfg | grep -v '^$'
define timeperiod{
timeperiod_name 24x7
alias 24 Hours A Day, 7 Days A Week
sunday 00:00-24:00
monday 00:00-24:00
tuesday 00:00-24:00
wednesday 00:00-24:00
thursday 00:00-24:00
friday 00:00-24:00
saturday 00:00-24:00
}
define timeperiod{
timeperiod_name workhours
alias Normal Work Hours
monday 09:00-17:00
tuesday 09:00-17:00
wednesday 09:00-17:00
thursday 09:00-17:00
friday 09:00-17:00
}
define timeperiod{
timeperiod_name none
alias No Time Is A Good Time
}
define timeperiod{
name us-holidays
timeperiod_name us-holidays
alias U.S. Holidays
january 1 00:00-00:00
monday -1 may 00:00-00:00
july 4 00:00-00:00
monday 1 september 00:00-00:00
thursday 4 november 00:00-00:00
december 25 00:00-00:00
}
define timeperiod{
timeperiod_name 24x7_sans_holidays
alias 24x7 Sans Holidays
use us-holidays ; Get holiday exceptions from other timeperiod
sunday 00:00-24:00
monday 00:00-24:00
tuesday 00:00-24:00
wednesday 00:00-24:00
thursday 00:00-24:00
friday 00:00-24:00
saturday 00:00-24:00
}
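This file defines the monitoring time periods. Checks in this setup run in the first one, "24x7"; the linux-server host template also references workhours as its notification period.
With Nagios installed and configured, the web interface shows that none of the services on the monitoring server can be checked yet. This is because no external plugins have been installed in /usr/local/nagios/libexec/. The next step is to install the plugin package, nagios-plugins.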
[root@linuxserver ~]# ll /usr/local/nagios/libexec/
total 0
[root@linuxserver ~]# tar -xvzf nagios-plugins-2.0.3.tar.gz
[root@linuxserver ~]# cd nagios-plugins-2.0.3
[root@linuxserver nagios-plugins-2.0.3]# ./configure --prefix=/usr/local/nagios/
[root@linuxserver nagios-plugins-2.0.3]# make && make install
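Listing /usr/local/nagios/libexec/ again shows that many external plugins have been added.
After restarting nagios and refreshing the page, the status of the services on the monitoring server is displayed.
If the HTTP service reports "HTTP WARNING: HTTP/1.1 403 Forbidden", the cause is that the HTTP check requests the site root and there is no index.html under /var/www/html/, so httpd answers with 403. Creating the file resolves it: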
[root@linuxserver ~]# touch /var/www/html/index.html
[root@linuxserver ~]# /etc/init.d/httpd restart
Stopping httpd: [ OK ]
Starting httpd: [ OK ]
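The monitoring server monitors clients through an add-on called NRPE.
NRPE consists of two parts:
· The check_nrpe plugin, which resides on the monitoring host
· The NRPE daemon, which runs on the remote Linux host (usually the monitored machine)
The overall monitoring process is as follows.
When Nagios needs to check a service or resource on a remote Linux host:
1. Nagios runs the check_nrpe plugin and tells it what to check;
2. check_nrpe connects to the remote NRPE daemon over SSL;
3. The NRPE daemon runs the appropriate Nagios plugin to perform the check;
4. The NRPE daemon returns the result to check_nrpe, which passes it to Nagios for processing.
Note: the NRPE daemon requires the Nagios plugins to be installed on the remote Linux host; without them, the daemon cannot perform any checks.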
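The result that travels back in steps 3 and 4 is a line of text plus the plugin's exit status, where by plugin convention 0 means OK, 1 WARNING, 2 CRITICAL, and 3 UNKNOWN. The sketch below shows that contract with a toy plugin. Both state_name and check_file_count are hypothetical and are not shipped with nagios-plugins.

```sh
#!/bin/sh
# Map a plugin exit status to the state name shown by Nagios.
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Toy plugin: compare the number of entries in a directory with two thresholds.
# $1 = directory, $2 = warning threshold, $3 = critical threshold
check_file_count() {
    [ -d "$1" ] || { echo "FILES UNKNOWN - $1 is not a directory"; return 3; }
    count=$(ls -A "$1" | wc -l | tr -d ' ')
    if [ "$count" -ge "$3" ]; then
        echo "FILES CRITICAL - $count entries in $1"; return 2
    elif [ "$count" -ge "$2" ]; then
        echo "FILES WARNING - $count entries in $1"; return 1
    fi
    echo "FILES OK - $count entries in $1"; return 0
}

# Demo: two files against thresholds 2 (warning) and 5 (critical).
demo=$(mktemp -d)
touch "$demo/a" "$demo/b"
rc=0
check_file_count "$demo" 2 5 || rc=$?
echo "state: $(state_name "$rc")"
rm -rf "$demo"
```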
[root@linuxclient ~]# yum install -y gcc glibc glibc-common gd gd-devel xinetd openssl-devel
[root@linuxclient ~]# useradd nagios
[root@linuxclient ~]# passwd nagios
Changing password for user nagios.
New password:
BAD PASSWORD: it is too simplistic/systematic
BAD PASSWORD: is too simple
Retype new password:
passwd: all authentication tokens updated successfully.
[root@linuxclient ~]# tar -xvzf nagios-plugins-2.0.3.tar.gz
[root@linuxclient ~]# cd nagios-plugins-2.0.3
[root@linuxclient nagios-plugins-2.0.3]# ./configure --prefix=/usr/local/nagios
[root@linuxclient nagios-plugins-2.0.3]# make && make install
[root@linuxclient nagios-plugins-2.0.3]# chown nagios:nagios /usr/local/nagios/
[root@linuxclient nagios-plugins-2.0.3]# chown -R nagios:nagios /usr/local/nagios/libexec/
[root@linuxclient nagios-plugins-2.0.3]# cd
[root@linuxclient ~]# tar -xvzf nrpe-2.15.tar.gz
[root@linuxclient ~]# cd nrpe-2.15
[root@linuxclient nrpe-2.15]# ./configure
[root@linuxclient nrpe-2.15]# make all
[root@linuxclient nrpe-2.15]# make install-plugin
[root@linuxclient nrpe-2.15]# make install-daemon
[root@linuxclient nrpe-2.15]# make install-daemon-config
[root@linuxclient nrpe-2.15]# make install-xinetd
Edit /etc/xinetd.d/nrpe and add the monitoring server's address to the only_from parameter.
[root@linuxclient nrpe-2.15]# cat /etc/xinetd.d/nrpe
# default: on
# description: NRPE (Nagios Remote Plugin Executor)
service nrpe
{
flags = REUSE
socket_type = stream
port = 5666
wait = no
user = nagios
group = nagios
server = /usr/local/nagios/bin/nrpe
server_args = -c /usr/local/nagios/etc/nrpe.cfg --inetd
log_on_failure += USERID
disable = no
only_from = 127.0.0.1 192.168.230.136
}
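Adding the monitoring server to only_from can be scripted so that repeating the step does not duplicate the address. This is a minimal sketch: the helper allow_nrpe_client is hypothetical, it assumes the line is written as "only_from = addr ..." with spaces around the equals sign (as in the file above), and the demo uses a scratch file.

```sh
#!/bin/sh
# Hypothetical helper: append an address to the only_from line unless present.
# $1 = xinetd service file, $2 = address to allow
allow_nrpe_client() {
    tmp_out=$(mktemp) || return 1
    awk -v addr="$2" '
        $1 == "only_from" {
            present = 0
            for (i = 3; i <= NF; i++) if ($i == addr) present = 1
            if (!present) $0 = $0 " " addr
        }
        { print }
    ' "$1" > "$tmp_out" && cat "$tmp_out" > "$1"
    rm -f "$tmp_out"
}

# Demo: the second call leaves the line unchanged.
demo=$(mktemp)
printf '%s\n' 'service nrpe' '{' '        only_from       = 127.0.0.1' '}' > "$demo"
allow_nrpe_client "$demo" 192.168.230.136
allow_nrpe_client "$demo" 192.168.230.136
grep only_from "$demo"
rm -f "$demo"
```

On the client the target would be /etc/xinetd.d/nrpe, and xinetd has to be restarted afterwards for the change to apply.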
Edit /etc/services and add the NRPE service.
[root@linuxclient nrpe-2.15]# tail -1 /etc/services
nrpe 5666/tcp # nrpe
Restart the xinetd service.
[root@linuxclient nrpe-2.15]# /etc/init.d/xinetd restart
Stopping xinetd: [FAILED]
Starting xinetd: [ OK ]
Check that nrpe started successfully.
[root@linuxclient nrpe-2.15]# netstat -tunlp | grep 5666
tcp 0 0 :::5666 :::* LISTEN 42522/xinetd
The nrpe service is listening, but on an IPv6 socket, and a test reports the following error:
[root@linuxclient nrpe-2.15]# /usr/local/nagios/libexec/check_nrpe -H localhost
CHECK_NRPE: Error - Could not complete SSL handshake.
Add the following two lines to /etc/modprobe.d/dist.conf to disable IPv6. After a reboot, the test succeeds.
[root@linuxclient nrpe-2.15]# tail -2 /etc/modprobe.d/dist.conf
alias net-pf-10 off
options ipv6 disable=1
[root@linuxclient nrpe-2.15]# init 6
[root@linuxclient ~]# netstat -tunlp | grep 5666
tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN 1729/xinetd
[root@linuxclient ~]# /usr/local/nagios/libexec/check_nrpe -H localhost
NRPE v2.15
The NRPE configuration file is nrpe.cfg. After adjusting it for this environment, its contents are:
[root@linuxclient ~]# grep -v '^#' /usr/local/nagios/etc/nrpe.cfg | grep -v '^$'
log_facility=daemon
pid_file=/var/run/nrpe.pid
server_port=5666
nrpe_user=nagios
nrpe_group=nagios
allowed_hosts=127.0.0.1
dont_blame_nrpe=0
allow_bash_command_substitution=0
debug=0
command_timeout=60
connection_timeout=300
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_sda3]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda3
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%
A swap check has been added here. The client is now fully configured; the next step is to add the client's checks on the server.
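The names inside command[...] are what the server later passes to check_nrpe with -c, so it helps to list them. A minimal sketch, with a hypothetical helper name list_nrpe_commands and a scratch file standing in for nrpe.cfg:

```sh
#!/bin/sh
# Hypothetical helper: print the command names defined in an nrpe.cfg-style file.
list_nrpe_commands() {
    sed -n 's/^command\[\([^]]*\)]=.*/\1/p' "$1"
}

# Demo on a scratch file with two command definitions.
demo=$(mktemp)
cat > "$demo" <<'EOF'
server_port=5666
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%
EOF
list_nrpe_commands "$demo"
rm -f "$demo"
```

Any name printed here can then be tried by hand from the monitoring server, for example /usr/local/nagios/libexec/check_nrpe -H 192.168.230.137 -c check_swap, once check_nrpe is installed there.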
[root@linuxserver ~]# tar -xvzf nrpe-2.15.tar.gz
[root@linuxserver ~]# cd nrpe-2.15
[root@linuxserver nrpe-2.15]# ./configure
[root@linuxserver nrpe-2.15]# make all
[root@linuxserver nrpe-2.15]# make install-plugin
Test communication between check_nrpe on the server and the NRPE daemon on the client.
[root@linuxserver nrpe-2.15]# /usr/local/nagios/libexec/check_nrpe -H 192.168.230.137
NRPE v2.15
The version string is returned, which means communication works.
Add the check_nrpe command. Run check_nrpe -h to see its usage.
[root@linuxserver etc]# tail -5 objects/commands.cfg
# 'check_nrpe' command definition
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
Create a new file, services.cfg, with the checks for the linuxclient host.
[root@linuxserver etc]# cat objects/services.cfg
define service{
use local-service
host_name linuxclient
service_description check-host-alive
check_command check-host-alive
}
define service{
use local-service
host_name linuxclient
service_description Current Load
check_command check_nrpe!check_load
}
define service{
use local-service
host_name linuxclient
service_description Check Disk sda3
check_command check_nrpe!check_sda3
}
define service{
use local-service
host_name linuxclient
service_description Total Processes
check_command check_nrpe!check_total_procs
}
define service{
use local-service
host_name linuxclient
service_description Current Users
check_command check_nrpe!check_users
}
define service{
use local-service
host_name linuxclient
service_description Check Zombie Procs
check_command check_nrpe!check_zombie_procs
}
define service{
use local-service
host_name linuxclient
service_description Check Swap
check_command check_nrpe!check_swap
}
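The NRPE-based service definitions above differ only in their description and command name, so they could be generated instead of typed. The sketch below is one way to do that; the function nrpe_service is hypothetical and simply prints a definition in the same layout as services.cfg.

```sh
#!/bin/sh
# Hypothetical generator: print one service definition that calls an NRPE command.
# $1 = host_name, $2 = service_description, $3 = NRPE command name
nrpe_service() {
    cat <<EOF
define service{
        use                     local-service
        host_name               $1
        service_description     $2
        check_command           check_nrpe!$3
        }
EOF
}

# Demo: two of the definitions from services.cfg.
nrpe_service linuxclient "Current Load" check_load
nrpe_service linuxclient "Check Swap" check_swap
```

Redirecting the output to a file under /usr/local/nagios/etc/objects/ would give the same kind of file as services.cfg; the third argument has to match a command[...] name in the client's nrpe.cfg.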
Create a new file, hosts.cfg, that defines the monitored client's address and related attributes.
[root@linuxserver etc]# cat /usr/local/nagios/etc/objects/hosts.cfg
define host{
use linux-server
host_name linuxclient
alias linuxclient
address 192.168.230.137
}
define hostgroup{
hostgroup_name bsmart-servers
alias bsmart servers
members linuxclient
}
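Every host_name used in services.cfg has to be defined by a host object, here in hosts.cfg. A quick cross-check can catch typos before Nagios is restarted. This is a rough sketch only: the helper undefined_hosts is hypothetical, it assumes one host per host_name line, and dbserver in the demo is a made-up name.

```sh
#!/bin/sh
# Hypothetical helper: print host_name values used in a services file
# that no host definition in the hosts file declares.
# $1 = hosts file, $2 = services file
undefined_hosts() {
    awk '
        $1 == "host_name" && FILENAME == ARGV[1] { defined[$2] = 1; next }
        $1 == "host_name" && !($2 in defined) && !seen[$2]++ { print $2 }
    ' "$1" "$2"
}

# Demo with scratch files: one defined host, one undefined.
hosts=$(mktemp)
services=$(mktemp)
cat > "$hosts" <<'EOF'
define host{
        use                     linux-server
        host_name               linuxclient
        }
EOF
cat > "$services" <<'EOF'
define service{
        host_name               linuxclient
        check_command           check_nrpe!check_load
        }
define service{
        host_name               dbserver
        check_command           check_nrpe!check_load
        }
EOF
undefined_hosts "$hosts" "$services"
rm -f "$hosts" "$services"
```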
Add entries for services.cfg and hosts.cfg to nagios.cfg.
[root@linuxserver etc]# grep 'hosts.cfg' nagios.cfg
cfg_file=/usr/local/nagios/etc/objects/hosts.cfg
[root@linuxserver etc]# grep 'services.cfg' nagios.cfg
cfg_file=/usr/local/nagios/etc/objects/services.cfg
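The resulting relationships between the configuration files are: nagios.cfg loads hosts.cfg and services.cfg through cfg_file; services.cfg refers to the host in hosts.cfg, the templates in templates.cfg, and the commands in commands.cfg; commands.cfg in turn uses $USER1$ from resource.cfg. Restart nagios to apply the changes: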
[root@linuxserver etc]# /etc/init.d/nagios restart
Running configuration check...
Stopping nagios: done.
Starting nagios: done.
A short while after the restart, the client's status appears in the web interface.
Reference: http://www.cnblogs.com/mchina/archive/2013/02/20/2883404.html. Thanks!