Setting Up a Nagios Monitoring System

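1 Environment Setup

1.1 Host Information

1.2 Firewall Settings

1.3 SELinux Settings

2 Installing and Configuring Nagios on linuxserver

2.1 Installing the Base Support Packages

2.2 Installing httpd and php

2.3 User and Directory Setup

2.4 Installing Nagios

2.5 Configuring nagios

2.5.1 nagios.cfg

2.5.2 cgi.cfg

2.5.3 resource.cfg

2.5.4 localhost.cfg

2.5.5 templates.cfg

2.5.6 commands.cfg

2.5.7 contacts.cfg

2.5.8 timeperiods.cfg

2.6 Installing the Nagios Plugins

3 Installing and Configuring Nagios Monitoring of linuxclient

3.1 How It Works

3.2 Installing the Base Support Packages on the Client

3.3 Client User Setup

3.4 Installing the nagios Plugins on the Client

3.5 Installing and Configuring NRPE on the Client

3.6 Client NRPE Configuration File

3.7 Installing NRPE on the Server

3.8 Modifying the Server Configuration Files

3.8.1 commands.cfg

3.8.2 services.cfg

3.8.3 hosts.cfg

3.8.4 nagios.cfg

3.9 Configuration File Relationships

3.10 Restarting the Service on the Server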

 

1 Environment Setup

1.1 Host Information

Hostname      Version    IP                Installed software                               Role
linuxserver   RHEL 6.5   192.168.230.136   nagios-4.0.8, nagios-plugins-2.0.3, nrpe-2.15    Monitoring server
linuxclient   RHEL 6.5   192.168.230.137   nagios-plugins-2.0.3, nrpe-2.15                  Monitored client

Configure YUM as described at http://blog.itpub.net/28536251/viewspace-1444918/.

Server network configuration:

[root@linuxserver ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0

DEVICE=eth0

HWADDR=00:0c:29:68:ab:26

TYPE=Ethernet

ONBOOT=yes

NM_CONTROLLED=no

BOOTPROTO=none

IPADDR=192.168.230.136

NETMASK=255.255.255.0

GATEWAY=192.168.230.2

DNS1=192.168.230.2

[root@linuxserver ~]# cat /etc/sysconfig/network

NETWORKING=yes

HOSTNAME=linuxserver

[root@linuxserver ~]# cat /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4

192.168.230.136   linuxserver

::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

 

Client network configuration:

[root@linuxclient ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0

DEVICE=eth0

HWADDR=00:0c:29:ba:4f:56

TYPE=Ethernet

ONBOOT=yes

NM_CONTROLLED=no

BOOTPROTO=none

IPADDR=192.168.230.137

NETMASK=255.255.255.0

GATEWAY=192.168.230.2

DNS1=192.168.230.2

[root@linuxclient ~]# cat /etc/sysconfig/network

NETWORKING=yes

HOSTNAME=linuxclient

[root@linuxclient ~]# cat /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4

192.168.230.137   linuxclient

::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

 

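1.2 Firewall Settings

To simplify testing, temporarily turn off the firewall on both the server and the client. Once the system is set up, re-enable it and add suitable firewall rules.

Disable the firewall on the server: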

[root@linuxserver ~]# /etc/init.d/iptables stop

iptables: Setting chains to policy ACCEPT: filter          [  OK  ]

iptables: Flushing firewall rules:                         [  OK  ]

iptables: Unloading modules:                               [  OK  ]

[root@linuxserver ~]# /etc/init.d/iptables status

iptables: Firewall is not running.

[root@linuxserver ~]# chkconfig iptables off

[root@linuxserver ~]# chkconfig --list iptables

iptables        0:off   1:off   2:off   3:off   4:off   5:off   6:off

 

Disable the firewall on the client:

[root@linuxclient ~]# /etc/init.d/iptables stop

iptables: Setting chains to policy ACCEPT: filter          [  OK  ]

iptables: Flushing firewall rules:                         [  OK  ]

iptables: Unloading modules:                               [  OK  ]

[root@linuxclient ~]# /etc/init.d/iptables status

iptables: Firewall is not running.

[root@linuxclient ~]# chkconfig iptables off

[root@linuxclient ~]# chkconfig --list iptables

iptables        0:off   1:off   2:off   3:off   4:off   5:off   6:off
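Once the setup has been verified, the firewall can be re-enabled with a narrow rule rather than left off. The following is a minimal sketch and not part of the original procedure: the helper name nrpe_allow_rule is hypothetical, the port 5666 is the NRPE port used in section 3.5, and the helper only prints the iptables command for review instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: print (do not execute) an iptables rule that lets
# the monitoring server reach NRPE on the client.
# $1 = IP address of the monitoring server, $2 = NRPE port (default 5666)
nrpe_allow_rule() {
    server_ip=$1
    port=${2:-5666}
    echo "iptables -I INPUT -p tcp -s $server_ip --dport $port -j ACCEPT"
}

# Rule for the client in this article; review it, then run it as root.
nrpe_allow_rule 192.168.230.136
```

On RHEL 6 the rule would typically be made persistent afterwards with /etc/init.d/iptables save; the server would need a similar rule for port 80 if the web interface is accessed from other machines.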

 

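1.3 SELinux Settings

To simplify testing, temporarily disable SELinux on both the server and the client. Once the system is set up, re-enable it and configure a suitable SELinux policy.

Disable SELinux on the server:

Edit /etc/selinux/config with vim, change SELINUX=enforcing to SELINUX=disabled, and then reboot.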

[root@linuxserver ~]# cat /etc/selinux/config

# This file controls the state of SELinux on the system.

# SELINUX= can take one of these three values:

#     enforcing - SELinux security policy is enforced.

#     permissive - SELinux prints warnings instead of enforcing.

#     disabled - No SELinux policy is loaded.

SELINUX=disabled

# SELINUXTYPE= can take one of these two values:

#     targeted - Targeted processes are protected,

#     mls - Multi Level Security protection.

SELINUXTYPE=targeted

[root@linuxserver ~]# init 6

 

Disable SELinux on the client:

[root@linuxclient ~]# cat /etc/selinux/config

# This file controls the state of SELinux on the system.

# SELINUX= can take one of these three values:

#     enforcing - SELinux security policy is enforced.

#     permissive - SELinux prints warnings instead of enforcing.

#     disabled - No SELinux policy is loaded.

SELINUX=disabled

# SELINUXTYPE= can take one of these two values:

#     targeted - Targeted processes are protected,

#     mls - Multi Level Security protection.

SELINUXTYPE=targeted

[root@linuxclient ~]# init 6
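Before rebooting, it can help to confirm that the edit took effect. A minimal sketch, assuming the file layout shown above; the helper name selinux_config_mode is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the SELINUX= value from a config file.
# $1 = path to the config file (default /etc/selinux/config)
# The pattern requires "=" right after SELINUX, so the SELINUXTYPE= line
# and the commented lines are not matched.
selinux_config_mode() {
    sed -n 's/^SELINUX=//p' "${1:-/etc/selinux/config}"
}
```

Running selinux_config_mode on either host should print disabled; after the reboot, getenforce reports the mode that is actually in effect.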

 

2 Installing and Configuring Nagios on linuxserver

2.1 Installing the Base Support Packages

[root@linuxserver ~]# yum install -y gcc glibc glibc-common gd gd-devel xinetd openssl-devel

 

2.2 Installing httpd and php

[root@linuxserver ~]# yum install -y httpd php

[root@linuxserver ~]# /etc/init.d/httpd start

Starting httpd: httpd: Could not reliably determine the server's fully qualified domain name, using 192.168.230.136 for ServerName

                                                           [  OK  ]

Following the warning above, set ServerName in the httpd configuration file. The result:

[root@linuxserver ~]# grep ServerName /etc/httpd/conf/httpd.conf | grep 80

ServerName linuxserver:80

httpd now restarts without the warning.

[root@linuxserver ~]# /etc/init.d/httpd restart

Stopping httpd:                                            [  OK  ]

Starting httpd:                                            [  OK  ]

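Open http://192.168.230.136/ in a browser; if the default test page appears, httpd is installed correctly.

Run the following command to create a PHP test page, then open http://192.168.230.136/phpinfo.php in a browser; if the PHP information page appears, php is installed correctly.

[root@linuxserver ~]# echo "<?php phpinfo(); ?>" > /var/www/html/phpinfo.php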

 

2.3 User and Directory Setup

[root@linuxserver ~]# useradd -s /sbin/nologin nagios

[root@linuxserver ~]# mkdir /usr/local/nagios

[root@linuxserver ~]# chown -R nagios:nagios /usr/local/nagios

[root@linuxserver ~]# ll -d /usr/local/nagios

drwxr-xr-x 2 nagios nagios 4096 Mar 14 11:17 /usr/local/nagios

 

2.4 Installing Nagios

The packages used for Nagios can be downloaded from http://sourceforge.jp/projects/sfnet_nagios/releases/ and https://nagios-plugins.org/downloads/.

[root@linuxserver ~]# tar -xvzf nagios-4.0.8.tar.gz

[root@linuxserver ~]# cd nagios-4.0.8

[root@linuxserver nagios-4.0.8]# ./configure --prefix=/usr/local/nagios/

[root@linuxserver nagios-4.0.8]# make all

[root@linuxserver nagios-4.0.8]# make install

# This installs the main program, CGIs, and HTML files

[root@linuxserver nagios-4.0.8]# make install-init

# This installs the init script in /etc/rc.d/init.d

[root@linuxserver nagios-4.0.8]# make install-commandmode

#This installs and configures permissions on the directory for holding the external command file

[root@linuxserver nagios-4.0.8]# make install-config

# This installs *SAMPLE* config files in /usr/local/nagios/etc

[root@linuxserver nagios-4.0.8]# make install-webconf

# This installs the Apache config file for the Nagios web interface

 

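Change to the installation directory. If the six directories listed in the table below are present, the installation succeeded.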

[root@linuxserver nagios-4.0.8]# cd /usr/local/nagios/

[root@linuxserver nagios]# ls

bin  etc  libexec  sbin  share  var

No.   Directory   Purpose
1     bin         Executable programs
2     etc         Configuration files
3     libexec     External plugins
4     sbin        CGI files
5     share       Web page files
6     var         Log files

 

Register the nagios service:

[root@linuxserver ~]# chkconfig --add nagios

[root@linuxserver ~]# chkconfig nagios on

[root@linuxserver ~]# chkconfig --list nagios

nagios          0:off   1:off   2:on    3:on    4:on    5:on    6:off

 

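Because /etc/httpd/conf.d/nagios.conf specifies an authentication file (AuthUserFile), that file must be created with a user name and password so that web access to nagios can be authenticated. The user name nagiosadmin is recommended; section 2.5.2 explains why.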

[root@linuxserver ~]# cat /etc/httpd/conf.d/nagios.conf

# SAMPLE CONFIG SNIPPETS FOR APACHE WEB SERVER

#

# This file contains examples of entries that need

# to be incorporated into your Apache web server

# configuration file.  Customize the paths, etc. as

# needed to fit your system.

 

ScriptAlias /nagios/cgi-bin "/usr/local/nagios/sbin"

<Directory "/usr/local/nagios/sbin">

#  SSLRequireSSL

   Options ExecCGI

   AllowOverride None

   Order allow,deny

   Allow from all

#  Order deny,allow

#  Deny from all

#  Allow from 127.0.0.1

   AuthName "Nagios Access"

   AuthType Basic

   AuthUserFile /usr/local/nagios/etc/htpasswd.users

   Require valid-user

</Directory>

Alias /nagios "/usr/local/nagios/share"

<Directory "/usr/local/nagios/share">

#  SSLRequireSSL

   Options None

   AllowOverride None

   Order allow,deny

   Allow from all

#  Order deny,allow

#  Deny from all

#  Allow from 127.0.0.1

   AuthName "Nagios Access"

   AuthType Basic

   AuthUserFile /usr/local/nagios/etc/htpasswd.users

   Require valid-user

</Directory>

[root@linuxserver ~]# htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin

New password:

Re-type new password:

Adding password for user nagiosadmin
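Note that htpasswd -c creates the file from scratch; to add further users later, omit -c. As a quick sanity check, the sketch below tests whether a given user has an entry in the password file. The helper name htpasswd_has_user is hypothetical, and the check only looks at the user name field, not the password.

```shell
#!/bin/sh
# Hypothetical helper: succeed if user $2 has an entry in htpasswd file $1.
# Each line of the file has the form "user:hash", so the first
# colon-separated field is compared exactly.
htpasswd_has_user() {
    awk -F: -v u="$2" '$1 == u { found = 1 } END { exit !found }' "$1"
}
```

For example, htpasswd_has_user /usr/local/nagios/etc/htpasswd.users nagiosadmin && echo present.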

 

Start nagios:

[root@linuxserver ~]# /etc/init.d/nagios start

Starting nagios: done.

Restart httpd:

[root@linuxserver ~]# /etc/init.d/httpd restart

Stopping httpd:                                            [  OK  ]

Starting httpd:                                            [  OK  ]

Open http://192.168.230.136/nagios/ in a browser and enter the user name and password; if the Nagios web interface appears, nagios is installed correctly.

 

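2.5 Configuring nagios

The Nagios configuration files are located in /usr/local/nagios/etc/. The purpose of each file is given in the table that follows the listing: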

[root@linuxserver ~]# tree /usr/local/nagios/etc/

/usr/local/nagios/etc/

├── cgi.cfg

├── htpasswd.users

├── nagios.cfg

├── objects

│   ├── commands.cfg

│   ├── contacts.cfg

│   ├── localhost.cfg

│   ├── printer.cfg

│   ├── switch.cfg

│   ├── templates.cfg

│   ├── timeperiods.cfg

│   └── windows.cfg

└── resource.cfg

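No.   File              Purpose
1     cgi.cfg           Controls access to the CGIs
2     nagios.cfg        Main configuration file
3     resource.cfg      Variable definitions (for example $USER1$) that other configuration files can reference
4     commands.cfg      Command definitions that other configuration files can reference
5     contacts.cfg      Contacts and contact groups
6     localhost.cfg     Monitoring configuration for the local host
7     printer.cfg       Sample template for monitoring a printer; not enabled by default
8     switch.cfg        Sample template for monitoring a router; not enabled by default
9     templates.cfg     Host and service templates that other configuration files can reference
10    timeperiods.cfg   Time periods used by Nagios monitoring
11    windows.cfg       Sample template for monitoring a Windows host; not enabled by default

The most important configuration files are described below.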

2.5.1 nagios.cfg

[root@linuxserver ~]#  grep -v '^#' /usr/local/nagios/etc/nagios.cfg | grep -v '^$'

log_file=/usr/local/nagios/var/nagios.log

cfg_file=/usr/local/nagios/etc/objects/commands.cfg

cfg_file=/usr/local/nagios/etc/objects/contacts.cfg

cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg

cfg_file=/usr/local/nagios/etc/objects/templates.cfg

cfg_file=/usr/local/nagios/etc/objects/localhost.cfg

object_cache_file=/usr/local/nagios/var/objects.cache

precached_object_file=/usr/local/nagios/var/objects.precache

resource_file=/usr/local/nagios/etc/resource.cfg

status_file=/usr/local/nagios/var/status.dat

status_update_interval=10

nagios_user=nagios

nagios_group=nagios

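(some parameters omitted)

nagios.cfg is the core configuration file of nagios. Each cfg_file directive pulls in one object configuration file; any additional object configuration file must be added here before it takes effect.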
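A missing file behind a cfg_file entry prevents nagios from starting, so it is worth listing the entries and checking that each file exists. A minimal sketch; the helper name list_cfg_files is hypothetical, and it only handles cfg_file lines, not cfg_dir:

```shell
#!/bin/sh
# Hypothetical helper: print every cfg_file path found in main config $1,
# prefixed with "ok" if the file exists or "missing" if it does not.
list_cfg_files() {
    sed -n 's/^cfg_file=//p' "$1" | while IFS= read -r path; do
        if [ -f "$path" ]; then
            echo "ok      $path"
        else
            echo "missing $path"
        fi
    done
}
```

For a full syntax and consistency check, nagios itself can verify the configuration with /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg.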

 

2.5.2 cgi.cfg

[root@linuxserver ~]# grep -v '^#' /usr/local/nagios/etc/cgi.cfg | grep -v '^$'

main_config_file=/usr/local/nagios/etc/nagios.cfg

physical_html_path=/usr/local/nagios/share

url_html_path=/nagios

show_context_help=0

use_pending_states=1

use_authentication=1

use_ssl_authentication=0

authorized_for_system_information=nagiosadmin

authorized_for_configuration_information=nagiosadmin

authorized_for_system_commands=nagiosadmin

authorized_for_all_services=nagiosadmin

authorized_for_all_hosts=nagiosadmin

authorized_for_all_service_commands=nagiosadmin

authorized_for_all_host_commands=nagiosadmin

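(some parameters omitted)

The authorized_for_* parameters in this file all default to nagiosadmin, which is why the password file was created for nagiosadmin earlier. To use a different user name, append it after nagiosadmin, separated by a comma, on each of these lines. See the comments in the configuration file for the meaning of each parameter.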
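Adding a user to every authorized_for_* line by hand is error-prone, so the edit can be scripted. The following is a sketch under stated assumptions: the helper name cgi_authorize_user and the user name ops are hypothetical, and the helper writes the modified file to standard output instead of editing cgi.cfg in place.

```shell
#!/bin/sh
# Hypothetical helper: print cgi.cfg ($1) with user $2 appended to every
# authorized_for_* line that does not already list that user.
cgi_authorize_user() {
    awk -v user="$2" '
        /^authorized_for_[a-z_]+=/ {
            # Everything after the first "=" is a comma-separated user list.
            value = substr($0, index($0, "=") + 1)
            n = split(value, names, ",")
            present = 0
            for (i = 1; i <= n; i++) if (names[i] == user) present = 1
            if (!present) $0 = $0 (n > 0 ? "," : "") user
        }
        { print }
    ' "$1"
}
```

A possible use is cgi_authorize_user cgi.cfg ops > cgi.cfg.new, followed by a review of the result before replacing the original. The new user also needs an entry in htpasswd.users.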

 

2.5.3 resource.cfg

[root@linuxserver ~]#  grep -v '^#' /usr/local/nagios/etc/resource.cfg | grep -v '^$'

$USER1$=/usr/local/nagios/libexec

The $USER1$ variable in this file specifies the path where the nagios plugins are installed.

 

2.5.4 localhost.cfg

[root@linuxserver ~]#  grep -v '^#' /usr/local/nagios/etc/objects/localhost.cfg | grep -v '^$'

define host{

        use                     linux-server

        host_name               localhost

        alias                    localhost

        address                 127.0.0.1

        }

define hostgroup{

        hostgroup_name         linux-servers

        alias                   Linux Servers

        members                localhost

        }

define service{

        use                             local-service        

        host_name                       localhost

        service_description                PING

        check_command                  check_ping!100.0,20%!500.0,60%

        }

define service{

        use                             local-service        

        host_name                       localhost

        service_description                Root Partition

        check_command                  check_local_disk!20%!10%!/

        }

define service{

        use                             local-service        

        host_name                       localhost

        service_description                Current Users

        check_command                  check_local_users!20!50

        }

define service{

        use                             local-service        

        host_name                       localhost

        service_description                Total Processes

        check_command                  check_local_procs!250!400!RSZDT

        }

define service{

        use                             local-service        

        host_name                       localhost

        service_description                Current Load

        check_command                  check_local_load!5.0,4.0,3.0!10.0,6.0,4.0

        }

define service{

        use                             local-service        

        host_name                       localhost

        service_description                Swap Usage

        check_command                  check_local_swap!20!10

        }

define service{

        use                             local-service        

        host_name                       localhost

        service_description                SSH

        check_command                  check_ssh

        notifications_enabled              0

        }

define service{

        use                             local-service        

        host_name                       localhost

        service_description                HTTP

        check_command                  check_http

        notifications_enabled              0

        }

This file defines the host and the services monitored on the local machine. Here "linux-server" is the host template and "local-service" is the service template, both defined in templates.cfg.

 

2.5.5 templates.cfg

[root@linuxserver ~]# sed -i 's/;.*$//g' /usr/local/nagios/etc/objects/templates.cfg

[root@linuxserver ~]#  grep -v '^#' /usr/local/nagios/etc/objects/templates.cfg | grep -v '^$'

define contact{

        name                            generic-contact

        service_notification_period     24x7

        host_notification_period        24x7

        service_notification_options    w,u,c,r,f,s

        host_notification_options       d,u,r,f,s

        service_notification_commands   notify-service-by-email

        host_notification_commands      notify-host-by-email

        register                        0

        }

define host{

        name                            generic-host

        notifications_enabled           1

        event_handler_enabled           1

        flap_detection_enabled          1

        process_perf_data               1

        retain_status_information       1

        retain_nonstatus_information    1

        notification_period             24x7

        register                        0

        }

define host{

        name                            linux-server

        use                             generic-host

        check_period                    24x7

        check_interval                  5

        retry_interval                  1

        max_check_attempts              10

        check_command                   check-host-alive

        notification_period             workhours

 

 

        notification_interval           120

        notification_options            d,u,r

        contact_groups                  admins

        register                        0

        }

define host{

        name                    windows-server

        use                     generic-host

        check_period            24x7

        check_interval          5

        retry_interval          1

        max_check_attempts      10

        check_command           check-host-alive

        notification_period     24x7

        notification_interval   30

        notification_options    d,r

        contact_groups          admins

        hostgroups              windows-servers

        register                0

        }

define host{

        name                    generic-printer

        use                     generic-host

        check_period            24x7

        check_interval          5

        retry_interval          1

        max_check_attempts      10

        check_command           check-host-alive

        notification_period     workhours

        notification_interval   30

        notification_options    d,r

        contact_groups          admins

        register                0

        }

define host{

        name                    generic-switch

        use                     generic-host

        check_period            24x7

        check_interval          5

        retry_interval          1

        max_check_attempts      10

        check_command           check-host-alive

        notification_period     24x7

        notification_interval   30

        notification_options    d,r

        contact_groups          admins

        register                0

        }

define service{

        name                            generic-service

        active_checks_enabled           1

        passive_checks_enabled          1

        parallelize_check               1

        obsess_over_service             1

        check_freshness                 0

        notifications_enabled           1

        event_handler_enabled           1

        flap_detection_enabled          1

        process_perf_data               1

        retain_status_information       1

        retain_nonstatus_information    1

        is_volatile                     0

        check_period                    24x7

        max_check_attempts              3

        normal_check_interval           10

        retry_check_interval            2

        contact_groups                  admins

        notification_options            w,u,c,r

        notification_interval           60

        notification_period             24x7

         register                        0

        }

define service{

        name                            local-service

        use                             generic-service

        max_check_attempts              4

        normal_check_interval           5

        retry_check_interval            1

        register                        0

        }

This file defines the contact (notification), host, and service templates.

 

2.5.6 commands.cfg

[root@linuxserver ~]#  grep -v '^#' /usr/local/nagios/etc/objects/commands.cfg | grep -v '^$'

define command{

        command_name    notify-host-by-email

        command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$

        }

define command{

        command_name    notify-service-by-email

        command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$

        }

define command{

        command_name    check-host-alive

        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5

        }

define command{

        command_name    check_local_disk

        command_line    $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$

        }

define command{

        command_name    check_local_load

        command_line    $USER1$/check_load -w $ARG1$ -c $ARG2$

        }

define command{

        command_name    check_local_procs

        command_line    $USER1$/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$

        }

define command{

        command_name    check_local_users

        command_line    $USER1$/check_users -w $ARG1$ -c $ARG2$

        }

define command{

        command_name    check_local_swap

        command_line    $USER1$/check_swap -w $ARG1$ -c $ARG2$

        }

define command{

        command_name    check_local_mrtgtraf

        command_line    $USER1$/check_mrtgtraf -F $ARG1$ -a $ARG2$ -w $ARG3$ -c $ARG4$ -e $ARG5$

        }

define command{

        command_name    check_ftp

        command_line    $USER1$/check_ftp -H $HOSTADDRESS$ $ARG1$

        }

define command{

        command_name    check_hpjd

        command_line    $USER1$/check_hpjd -H $HOSTADDRESS$ $ARG1$

        }

define command{

        command_name    check_snmp

        command_line    $USER1$/check_snmp -H $HOSTADDRESS$ $ARG1$

        }

define command{

        command_name    check_http

        command_line    $USER1$/check_http -I $HOSTADDRESS$ $ARG1$

        }

define command{

        command_name    check_ssh

        command_line    $USER1$/check_ssh $ARG1$ $HOSTADDRESS$

        }

define command{

        command_name    check_dhcp

        command_line    $USER1$/check_dhcp $ARG1$

        }

define command{

        command_name    check_ping

        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5

        }

define command{

        command_name    check_pop

        command_line    $USER1$/check_pop -H $HOSTADDRESS$ $ARG1$

        }

define command{

        command_name    check_imap

        command_line    $USER1$/check_imap -H $HOSTADDRESS$ $ARG1$

        }

define command{

        command_name    check_smtp

        command_line    $USER1$/check_smtp -H $HOSTADDRESS$ $ARG1$

        }

define command{

        command_name    check_tcp

        command_line    $USER1$/check_tcp -H $HOSTADDRESS$ -p $ARG1$ $ARG2$

        }

define command{

        command_name    check_udp

        command_line    $USER1$/check_udp -H $HOSTADDRESS$ -p $ARG1$ $ARG2$

        }

define command{

        command_name    check_nt

        command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$

        }

define command{

        command_name    process-host-perfdata

        command_line    /usr/bin/printf "%b" "$LASTHOSTCHECK$\t$HOSTNAME$\t$HOSTSTATE$\t$HOSTATTEMPT$\t$HOSTSTATETYPE$\t$HOSTEXECUTIONTIME$\t$HOSTOUTPUT$\t$HOSTPERFDATA$\n" >> /usr/local/nagios/var/host-perfdata.out

        }

define command{

        command_name    process-service-perfdata

        command_line    /usr/bin/printf "%b" "$LASTSERVICECHECK$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICESTATE$\t$SERVICEATTEMPT$\t$SERVICESTATETYPE$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$\n" >> /usr/local/nagios/var/service-perfdata.out

        }

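This file defines the command names and command lines used to monitor services. It references the $USER1$ definition from resource.cfg, and localhost.cfg in turn references some of these commands.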
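To see how a check_command value such as check_local_disk!20%!10%!/ from localhost.cfg becomes an actual command line, the macro substitution can be imitated in a few lines. This is only an illustrative sketch, not how nagios is implemented: the helper name expand_check_command is hypothetical, and it handles only $USER1$ and the $ARGn$ macros.

```shell
#!/bin/sh
# Hypothetical helper: expand $USER1$ and $ARGn$ in a command_line template.
# $1 = command_line template from commands.cfg
# $2 = check_command value from a service definition ("name!arg1!arg2...")
# $3 = value of $USER1$ from resource.cfg
expand_check_command() {
    awk -v tpl="$1" -v chk="$2" -v user1="$3" '
        # Replace every literal occurrence of old in s (no regex involved).
        function replace_all(s, old, new,    out, i) {
            out = ""
            while ((i = index(s, old)) > 0) {
                out = out substr(s, 1, i - 1) new
                s = substr(s, i + length(old))
            }
            return out s
        }
        BEGIN {
            # parts[1] is the command name; parts[2..n] are ARG1..ARG(n-1).
            n = split(chk, parts, "!")
            line = replace_all(tpl, "$USER1$", user1)
            for (i = 2; i <= n; i++)
                line = replace_all(line, "$ARG" (i - 1) "$", parts[i])
            print line
        }
    '
}
```

Called with the check_local_disk template, the value check_local_disk!20%!10%!/ and /usr/local/nagios/libexec, it prints /usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /, which can then be run by hand to debug a check.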

 

2.5.7 contacts.cfg

[root@linuxserver ~]# grep -v '^#' /usr/local/nagios/etc/objects/contacts.cfg | grep -v '^$'

define contact{

        contact_name                    nagiosadmin           

        use                             generic-contact       

        alias                           Nagios Admin          

        email                           nagios@localhost      

        }

define contactgroup{

        contactgroup_name       admins

        alias                   Nagios Administrators

        members                 nagiosadmin

        }

This file references the generic-contact definition in templates.cfg.

 

2.5.8 timeperiods.cfg

[root@linuxserver ~]# grep -v '^#' /usr/local/nagios/etc/objects/timeperiods.cfg | grep -v '^$'

define timeperiod{

        timeperiod_name 24x7

        alias           24 Hours A Day, 7 Days A Week

        sunday          00:00-24:00

        monday          00:00-24:00

        tuesday         00:00-24:00

        wednesday       00:00-24:00

        thursday        00:00-24:00

        friday          00:00-24:00

        saturday        00:00-24:00

        }

define timeperiod{

        timeperiod_name workhours

        alias           Normal Work Hours

        monday          09:00-17:00

        tuesday         09:00-17:00

        wednesday       09:00-17:00

        thursday        09:00-17:00

        friday          09:00-17:00

        }

define timeperiod{

        timeperiod_name none

        alias           No Time Is A Good Time

        }

define timeperiod{

        name                    us-holidays

        timeperiod_name         us-holidays

        alias                   U.S. Holidays

        january 1               00:00-00:00    

        monday -1 may           00:00-00:00    

        july 4                  00:00-00:00    

        monday 1 september      00:00-00:00    

        thursday 4 november     00:00-00:00     

        december 25             00:00-00:00    

        }

define timeperiod{

        timeperiod_name 24x7_sans_holidays

        alias           24x7 Sans Holidays

        use             us-holidays             ; Get holiday exceptions from other timeperiod

        sunday          00:00-24:00

        monday          00:00-24:00

        tuesday         00:00-24:00

        wednesday       00:00-24:00

        thursday        00:00-24:00

        friday          00:00-24:00

        saturday        00:00-24:00

        }

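This file defines the monitoring time periods. Most of the templates above use the first one, "24x7"; the linux-server host template also uses "workhours" as its notification_period.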

 

 

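2.6 Installing the Nagios Plugins

With nagios installed and configured, the web interface shows that none of the services on the monitoring server are actually being checked. This is because no external plugins have been installed under /usr/local/nagios/libexec/ yet. The next step is to install the plugin package, nagios-plugins.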

[root@linuxserver ~]# ll /usr/local/nagios/libexec/

total 0


[root@linuxserver ~]# tar -xvzf nagios-plugins-2.0.3.tar.gz

[root@linuxserver ~]# cd nagios-plugins-2.0.3

[root@linuxserver nagios-plugins-2.0.3]# ./configure --prefix=/usr/local/nagios/

[root@linuxserver nagios-plugins-2.0.3]# make && make install

Listing /usr/local/nagios/libexec/ again now shows many external plugin programs.

Restart nagios and refresh the page; the status of the services on the monitoring server is now displayed.
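Every program in libexec follows the same plugin convention: it prints one line of status text and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). The sketch below shows that convention with a made-up check; check_file_exists is a hypothetical example and is not part of nagios-plugins.

```shell
#!/bin/sh
# Hypothetical plugin logic: report whether a file exists, using the
# standard plugin exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN).
# $1 = path to test
check_file_exists() {
    path=$1
    if [ -z "$path" ]; then
        echo "FILE UNKNOWN - no path given"
        return 3
    fi
    if [ -e "$path" ]; then
        echo "FILE OK - $path exists"
        return 0
    fi
    echo "FILE CRITICAL - $path not found"
    return 2
}
```

Saved as a standalone script that ends with check_file_exists "$1"; exit $?, it could be placed in libexec and referenced from a command definition like any other plugin.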

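If the HTTP service reports "HTTP WARNING: HTTP/1.1 403 Forbidden", the cause is that the HTTP check requests the site root, which is served from index.html under /var/www/html/. If that file does not exist, the check reports this warning. Creating the file resolves it: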

[root@linuxserver ~]# touch /var/www/html/index.html

[root@linuxserver ~]# /etc/init.d/httpd restart

Stopping httpd:                                            [  OK  ]

Starting httpd:                                            [  OK  ]

 

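3 Installing and Configuring Nagios Monitoring of linuxclient

3.1 How It Works

The monitoring server monitors the client through an add-on called NRPE.

[Figure: NRPE architecture diagram]

NRPE consists of two parts:

·         The check_nrpe plugin, which resides on the monitoring host

·         The NRPE daemon, which runs on the remote Linux host (usually the monitored machine)

As the figure shows, the monitoring process works as follows.

When Nagios needs to check a service or resource on a remote Linux host:

1.    Nagios runs the check_nrpe plugin and tells it what to check;

2.    The check_nrpe plugin connects to the remote NRPE daemon over SSL;

3.    The NRPE daemon runs the appropriate Nagios plugin to perform the check;

4.    The NRPE daemon returns the result to the check_nrpe plugin, which passes it to nagios for processing.

Note: the NRPE daemon requires the Nagios plugins to be installed on the remote Linux host; without them, the daemon cannot perform any checks.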

 

3.2 Installing the Base Support Packages on the Client

[root@linuxclient ~]# yum install -y gcc glibc glibc-common gd gd-devel xinetd openssl-devel

3.3 Client User Setup

[root@linuxclient ~]# useradd nagios

[root@linuxclient ~]# passwd nagios

Changing password for user nagios.

New password:

BAD PASSWORD: it is too simplistic/systematic

BAD PASSWORD: is too simple

Retype new password:

passwd: all authentication tokens updated successfully.

 

3.4 Installing the nagios Plugins on the Client

[root@linuxclient ~]# tar -xvzf nagios-plugins-2.0.3.tar.gz

[root@linuxclient ~]# cd nagios-plugins-2.0.3

[root@linuxclient nagios-plugins-2.0.3]# ./configure --prefix=/usr/local/nagios

[root@linuxclient nagios-plugins-2.0.3]# make && make install

[root@linuxclient nagios-plugins-2.0.3]# chown nagios:nagios /usr/local/nagios/

[root@linuxclient nagios-plugins-2.0.3]# chown -R nagios:nagios /usr/local/nagios/libexec/

 

3.5 Installing and Configuring NRPE on the Client

[root@linuxclient nagios-plugins-2.0.3]# cd

[root@linuxclient ~]# tar -xvzf nrpe-2.15.tar.gz

[root@linuxclient ~]# cd nrpe-2.15

[root@linuxclient nrpe-2.15]# ./configure

[root@linuxclient nrpe-2.15]# make all

[root@linuxclient nrpe-2.15]# make install-plugin

[root@linuxclient nrpe-2.15]# make install-daemon

[root@linuxclient nrpe-2.15]# make install-daemon-config

[root@linuxclient nrpe-2.15]# make install-xinetd

Edit /etc/xinetd.d/nrpe and add the monitoring server's address to the only_from parameter.

[root@linuxclient nrpe-2.15]# cat /etc/xinetd.d/nrpe

# default: on

# description: NRPE (Nagios Remote Plugin Executor)

service nrpe

{

        flags           = REUSE

        socket_type     = stream

        port            = 5666

        wait            = no

        user            = nagios

        group           = nagios

        server          = /usr/local/nagios/bin/nrpe

        server_args     = -c /usr/local/nagios/etc/nrpe.cfg --inetd

        log_on_failure  += USERID

        disable         = no

        only_from       = 127.0.0.1 192.168.230.136

}
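If the server's address is missing from only_from, xinetd refuses its connections, so this line is worth checking first when check_nrpe fails from the server. A minimal sketch; the helper name xinetd_allows is hypothetical, and it compares addresses literally, so network ranges in only_from are not interpreted.

```shell
#!/bin/sh
# Hypothetical helper: succeed if address $2 is listed literally on the
# only_from line of xinetd service file $1.
xinetd_allows() {
    awk -v ip="$2" '
        # Fields: $1 = "only_from", $2 = "=" (or "+="), $3.. = addresses
        $1 == "only_from" {
            for (i = 3; i <= NF; i++) if ($i == ip) found = 1
        }
        END { exit !found }
    ' "$1"
}
```

For example, xinetd_allows /etc/xinetd.d/nrpe 192.168.230.136 || echo "server not allowed".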

Edit /etc/services and add the NRPE service.

[root@linuxclient nrpe-2.15]# tail -1 /etc/services

nrpe        5666/tcp               # nrpe

Restart the xinetd service.

[root@linuxclient nrpe-2.15]# /etc/init.d/xinetd restart

Stopping xinetd:                                           [FAILED]

Starting xinetd:                                           [  OK  ]

 

Check that nrpe started successfully.

[root@linuxclient nrpe-2.15]# netstat -tunlp | grep 5666

tcp        0      0 :::5666        :::*            LISTEN      42522/xinetd

The nrpe service has started, but it is listening on IPv6, and a test reports the following error:

[root@linuxclient nrpe-2.15]# /usr/local/nagios/libexec/check_nrpe -H localhost

CHECK_NRPE: Error - Could not complete SSL handshake.

Add the following two lines to /etc/modprobe.d/dist.conf to disable IPv6. After a reboot, the test succeeds:

[root@linuxclient nrpe-2.15]# tail -2 /etc/modprobe.d/dist.conf

alias net-pf-10 off

options ipv6 disable=1

[root@linuxclient nrpe-2.15]# init 6

[root@linuxclient ~]# netstat -tunlp | grep 5666

tcp        0      0 0.0.0.0:5666   0.0.0.0:*         LISTEN      1729/xinetd

[root@linuxclient ~]# /usr/local/nagios/libexec/check_nrpe -H localhost

NRPE v2.15
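The difference between the two netstat outputs above is the local-address column (:::5666 versus 0.0.0.0:5666). A small sketch that reads netstat output and reports which wildcard address a port is bound to; the helper name listen_family is hypothetical, and sockets bound to a specific address are reported as none.

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -tunlp` output on stdin and report
# how port $1 is bound, based on the local-address column (field 4).
# Prints "ipv4" for 0.0.0.0:PORT, "ipv6" for :::PORT, otherwise "none".
listen_family() {
    awk -v port="$1" '
        $4 == ("0.0.0.0:" port) { v4 = 1 }
        $4 == (":::" port)      { v6 = 1 }
        END {
            if (v4) print "ipv4"
            else if (v6) print "ipv6"
            else print "none"
        }
    '
}
```

A typical call is netstat -tunlp | listen_family 5666.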

 

3.6 Client NRPE Configuration File

The NRPE configuration file is nrpe.cfg. After adjusting it for this environment, its contents are:

[root@linuxclient ~]# grep -v '^#' /usr/local/nagios/etc/nrpe.cfg | grep -v '^$'

log_facility=daemon

pid_file=/var/run/nrpe.pid

server_port=5666

nrpe_user=nagios

nrpe_group=nagios

allowed_hosts=127.0.0.1

 

dont_blame_nrpe=0

allow_bash_command_substitution=0

debug=0

command_timeout=60

connection_timeout=300

command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10

command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20

command[check_sda3]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda3

command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z

command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200   

command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%

A swap check has been added here. The client is now fully configured; the next step is to add the checks for the client on the server.
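When the server asks for a check by name (for example check_swap), the daemon runs the command line defined for that name in nrpe.cfg. The lookup can be imitated as follows, which is handy for confirming that a name used on the server really exists on the client. The helper name nrpe_command_line is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the command line that nrpe.cfg ($1) defines
# for command name $2; exit non-zero if there is no such definition.
nrpe_command_line() {
    awk -v name="$2" '
        BEGIN { key = "command[" name "]=" }
        # A definition is a line that starts with command[NAME]=
        index($0, key) == 1 {
            print substr($0, length(key) + 1)
            found = 1
            exit
        }
        END { exit !found }
    ' "$1"
}
```

For example, nrpe_command_line /usr/local/nagios/etc/nrpe.cfg check_swap prints the check_swap command line shown above.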

 

3.7 Installing NRPE on the Server

[root@linuxserver ~]# tar -xvzf nrpe-2.15.tar.gz

[root@linuxserver ~]# cd nrpe-2.15

[root@linuxserver nrpe-2.15]# ./configure

[root@linuxserver nrpe-2.15]# make all

[root@linuxserver nrpe-2.15]# make install-plugin

Test communication between check_nrpe on the server and the NRPE daemon on the client.

[root@linuxserver nrpe-2.15]# /usr/local/nagios/libexec/check_nrpe -H 192.168.230.137

NRPE v2.15

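The version information is returned, so communication is working.

3.8 Modifying the Server Configuration Files

3.8.1 commands.cfg

Add the check_nrpe command. Run check_nrpe -h to see its usage.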

[root@linuxserver etc]# tail -5 objects/commands.cfg

# 'check_nrpe' command definition

define command{

        command_name    check_nrpe

        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$

        }

 

3.8.2 services.cfg

Create a new file, services.cfg, that defines the checks for the linuxclient client.

[root@linuxserver etc]# cat objects/services.cfg

define service{

        use                     local-service

        host_name               linuxclient

        service_description     check-host-alive

        check_command           check-host-alive

        }

 

define service{

        use                     local-service

        host_name               linuxclient

        service_description     Current Load

        check_command           check_nrpe!check_load

        }

 

define service{

        use                     local-service

        host_name               linuxclient

        service_description     Check Disk sda3

        check_command           check_nrpe!check_sda3

        }

 

define service{

        use                     local-service

        host_name               linuxclient

        service_description     Total Processes

        check_command           check_nrpe!check_total_procs

        }

 

define service{

        use                     local-service

        host_name               linuxclient

        service_description     Current Users

        check_command           check_nrpe!check_users

        }

 

define service{

        use                     local-service

        host_name               linuxclient

        service_description     Check Zombie Procs

        check_command           check_nrpe!check_zombie_procs

        }

 

define service{

        use                     local-service

        host_name               linuxclient

        service_description     Check Swap

        check_command           check_nrpe!check_swap

        }
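The NRPE-based definitions above differ only in their description and command name, so they can be generated instead of typed. A minimal sketch; the function name nrpe_service is hypothetical, and the output follows the layout of the definitions in this section.

```shell
#!/bin/sh
# Hypothetical generator: print one NRPE-based service definition.
# $1 = host_name, $2 = service_description, $3 = NRPE command name
nrpe_service() {
    cat <<EOF
define service{
        use                     local-service
        host_name               $1
        service_description     $2
        check_command           check_nrpe!$3
        }
EOF
}

# Example: the swap check defined above.
nrpe_service linuxclient "Check Swap" check_swap
```

The command name passed as the third argument must match a command[...] entry in the client's nrpe.cfg.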

3.8.3 hosts.cfg

Create a new file, hosts.cfg, that defines the address and related attributes of the monitored client.

[root@linuxserver etc]# cat /usr/local/nagios/etc/objects/hosts.cfg

define host{

        use                     linux-server

        host_name               linuxclient

        alias                   linuxclient

        address                 192.168.230.137

        }

 

define hostgroup{

        hostgroup_name          bsmart-servers

        alias                   bsmart servers

        members                 linuxclient

        }

 

3.8.4 nagios.cfg

Add cfg_file entries for services.cfg and hosts.cfg to nagios.cfg.

[root@linuxserver etc]# grep 'hosts.cfg' nagios.cfg

cfg_file=/usr/local/nagios/etc/objects/hosts.cfg

[root@linuxserver etc]# grep 'services.cfg' nagios.cfg

cfg_file=/usr/local/nagios/etc/objects/services.cfg
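The two entries can also be added from the command line. The sketch below appends an entry only when it is not already present, so it is safe to run more than once. The helper name add_cfg_file is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: append "cfg_file=$2" to main config $1 unless an
# identical line is already present (-x: whole line, -F: literal match).
add_cfg_file() {
    grep -qxF "cfg_file=$2" "$1" || echo "cfg_file=$2" >> "$1"
}
```

For example, add_cfg_file nagios.cfg /usr/local/nagios/etc/objects/hosts.cfg, and likewise for services.cfg.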

 

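3.9 Configuration File Relationships

The final relationships among the configuration files are shown in the figure below:

[Figure: configuration file relationship diagram]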

 

3.10 Restarting the Service on the Server

[root@linuxserver etc]# /etc/init.d/nagios restart

Running configuration check...

Stopping nagios: done.

Starting nagios: done.

A short while after the restart, the client's status appears in the web interface.

 

This article draws on http://www.cnblogs.com/mchina/archive/2013/02/20/2883404.html; thank you!
