Purpose

This article describes how to use the check_vmware_v2 plugin with OP5 Monitor, Naemon or Nagios to monitor your VMware vSphere infrastructure.

What can be monitored?

The plugin can monitor the infrastructure managed by a VMware vCenter server or a VMware ESXi server, such as its datacenters, clusters, hosts and virtual machines. The metrics and statuses it can check depend on the targeted infrastructure; examples include CPU load, memory usage, network activity and runtime states.

The plugin uses the Python SDK for the VMware vSphere API (pyVmomi), which makes it compatible with the previous four versions of vSphere. The Python SDK does not need to be installed on the OP5 Monitor server for the plugin to work.

Installation

Download the latest RPM package of the plugin to the machine running OP5 Monitor, then install it by running:

yum localinstall naemon-check-vmware-<version>-<release>.<platform>.rpm

Configuration

The service configuration files are installed in /etc/op5/check_vmware.

The service is installed with a working configuration, but you may want to change, for example, the port used by the check_vmware service process or the vCenter authentication details known to the service.

Service configuration

The service.cfg file is used to specify configuration variables for the service's WSGI application.

Most of the available configuration variables are documented by the respective library they belong to:

Service-specific configuration

The Check VMware service defines the following additional configuration variables.

  • SERVERS

    The SERVERS configuration can be used to set up default authentication credentials for the service, so that the username and password do not have to be passed via the CLI. SERVERS should be set to a Python dict containing the authentication details for the vSphere servers the service should know about. For example:

    SERVERS = {
        'vcenter.op5.com': {
            'username': 'username',
            'password': 'password',
            'ignore_ssl': False
        }
    }
    

    Specifying a single host in SERVERS makes that host the default host used when the --host option is not specified via the CLI.

  • VSAN_SDK_ENABLED

    The VSAN_SDK_ENABLED configuration controls whether the service attempts to load the Virtual SAN Management SDK, which enables additional vSAN-related checks. It is set to False by default. Set it to True to enable the vSAN SDK.

    Note that the following vSAN-related checks do not require the vSAN SDK and are always available: vsan.health, vsan.usage and vsan.disk_usage. All other vSAN-related checks depend on the vSAN SDK.
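As the SERVERS example suggests, service.cfg appears to use Python syntax. The sketch below assumes that and shows a hypothetical configuration with two servers and the vSAN SDK enabled; the host names and credentials are placeholders. The default_host function is a hypothetical illustration of the rule that a single SERVERS entry becomes the default host; it is not part of the service and does not belong in service.cfg. What the service does when several servers are listed is not described here, so the sketch simply returns None in that case.

```python
# Hypothetical service.cfg fragment (assumes Python syntax, as in the SERVERS example).
# Host names and credentials below are placeholders.
SERVERS = {
    'vcenter.example.com': {
        'username': 'monitor',
        'password': 'secret',
        'ignore_ssl': False,
    },
    'esxi1.example.com': {
        'username': 'root',
        'password': 'secret',
        # Assumption based on the option name: skip certificate checks,
        # e.g. for a host with a self-signed certificate.
        'ignore_ssl': True,
    },
}

# Load the Virtual SAN Management SDK to enable the additional vSAN checks.
VSAN_SDK_ENABLED = True


# --- Illustration only: not part of the service, do not add to service.cfg ---
def default_host(servers):
    """Return the only configured host name, or None if there are zero or several."""
    if len(servers) == 1:
        return next(iter(servers))
    return None
```

With the two entries above, default_host(SERVERS) returns None; with only the vcenter.example.com entry it would return that name.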

Gunicorn configuration

The gunicorn.cfg file is used to configure the Gunicorn HTTP server that runs the service as a WSGI application. This is where you can configure the host and port the service listens on.

See http://docs.gunicorn.org/en/stable/settings.html for documentation on the available Gunicorn configuration.
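A Gunicorn configuration file is plain Python in which each module-level name is a Gunicorn setting. The sketch below uses the documented bind, workers and timeout settings; the address, port and values are illustrative assumptions, not the values shipped with the package.

```python
# Hypothetical gunicorn.cfg sketch. Gunicorn reads this file as Python;
# each module-level name is a Gunicorn setting.

# Address and port the check_vmware service listens on (example values).
# Binding to 127.0.0.1 keeps the service reachable from the local machine only.
bind = '127.0.0.1:8080'

# Number of worker processes handling requests.
workers = 2

# Workers silent for longer than this many seconds are killed and restarted.
timeout = 60
```

If you change the port, remember that anything connecting to the service must use the new address.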

Starting the service

The plugin relies on a service running in the background so that it does not have to open a new HTTP connection to VMware for every check. The service also requires a running Redis instance, which it uses for caching.

Depending on your operating system version, run the commands below.

Starting the service on EL6

service redis start

service check_vmware start

To make sure the services also start after a system reboot, run:

chkconfig redis on

chkconfig check_vmware on

Starting the service on EL7

systemctl start redis

systemctl start check_vmware

To make sure the services also start after a system reboot, run:

systemctl enable redis

systemctl enable check_vmware
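After starting both services, a quick TCP probe can confirm that they accept connections. This is a minimal sketch assuming Redis listens on its default port 6379 on the local machine and that check_vmware listens on 127.0.0.1:8080; the latter is a placeholder, so substitute the address from your gunicorn.cfg. A successful connection only shows that the port is open, not that the service is healthy.

```python
import socket

# Redis listens on TCP port 6379 by default. The check_vmware port is whatever
# gunicorn.cfg binds to; 8080 below is only a placeholder assumption.
SERVICES = {
    'redis': ('127.0.0.1', 6379),
    'check_vmware': ('127.0.0.1', 8080),
}


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == '__main__':
    for name, (host, port) in SERVICES.items():
        state = 'listening' if is_listening(host, port) else 'NOT reachable'
        print(f'{name} on {host}:{port}: {state}')
```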

Running the check command

The check command is installed in /opt/plugins/check_vmware_v2.

Run check_vmware_v2 --help to list all the command options.
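To try a command line from your own scripts before adding it as a service check, a small wrapper can run the plugin and translate its exit status. This sketch assumes check_vmware_v2 follows the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); run_check is a hypothetical helper name.

```python
import subprocess

PLUGIN = '/opt/plugins/check_vmware_v2'

# Standard Nagios plugin exit codes (assumed to apply to this plugin).
STATES = {0: 'OK', 1: 'WARNING', 2: 'CRITICAL', 3: 'UNKNOWN'}


def run_check(args, command=(PLUGIN,), timeout=60):
    """Run the plugin with args and return (state, first line of output).

    Raises subprocess.TimeoutExpired if the plugin runs longer than timeout seconds.
    Unrecognized exit codes are reported as UNKNOWN.
    """
    proc = subprocess.run(
        [*command, *args],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    lines = proc.stdout.splitlines()
    first_line = lines[0] if lines else ''
    return STATES.get(proc.returncode, 'UNKNOWN'), first_line
```

For example, run_check(['--host', 'vcenter.op5.com', '-t', 'vm', '-n', 'pn-dhcp-debian8', 'cpu.usagemhz.average']) runs the same check as the first example further down.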

Available counters

The --list-counters and --list-all-counters options can be used to list available counters.

The --list-counters option lists counters actually available on a specified managed entity.

The --list-all-counters option lists every counter known to the vCenter server, including counters that may not be available on any managed entity you can access.

Add the --verbose option for more information about the listed counters.
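Counter names such as cpu.usagemhz.average and mem.usage.average follow the vSphere performance counter convention group.name.rollup, where the rollup type is one of the vSphere summary types. The sketch below splits such a name into its parts; parse_counter and Counter are hypothetical names, and names without this three-part form (for example vmfs.usage) are rejected.

```python
from typing import NamedTuple

# Rollup types defined by the vSphere performance manager (PerfSummaryType).
ROLLUP_TYPES = {'average', 'latest', 'maximum', 'minimum', 'none', 'summation'}


class Counter(NamedTuple):
    group: str
    name: str
    rollup: str


def parse_counter(full_name):
    """Split a 'group.name.rollup' counter name such as 'cpu.usagemhz.average'."""
    group, sep, rest = full_name.partition('.')
    name, sep2, rollup = rest.rpartition('.')
    if not (sep and sep2 and group and name) or rollup not in ROLLUP_TYPES:
        raise ValueError(f'not a group.name.rollup counter: {full_name!r}')
    return Counter(group, name, rollup)
```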

Examples

  • Check the current CPU MHz usage of a virtual machine:

    > /opt/plugins/check_vmware_v2 --host vcenter.op5.com -t vm -n pn-dhcp-debian8 cpu.usagemhz.average
    CHECK_VMWARE OK - cpu.usagemhz.average is 10MHz | 'cpu.usagemhz.average'=10MHz
  • Check the current memory usage of a host system:

    > /opt/plugins/check_vmware_v2 --host vcenter.op5.com -t host -n labesxi1.it.op5.com mem.usage.average
    CHECK_VMWARE OK - mem.usage.average is 28.8% | 'mem.usage.average'=28.8%
  • Check the current memory usage of a host system without going through the vCenter:

    > /opt/plugins/check_vmware_v2 --host labesxi1.it.op5.com mem.usage.average
    CHECK_VMWARE OK - mem.usage.average is 28.8% | 'mem.usage.average'=28.8%
  • Check the current storage usage of a host system and list all its datastores:

    > /opt/plugins/check_vmware_v2 --host vcenter.op5.com -t host -n labesxi1.it.op5.com vmfs.usage -vvv
    CHECK_VMWARE OK - all 7 results are ok
    ok: labesxi1-local: 14.83% (19.06GB / 128.5GB)
    ok: LabStorage1: 29.32% (600.46GB / 2047.75GB)
    ok: LabStorage2: 24.77% (507.14GB / 2047.75GB)
    ok: LabStorage4: 24.67% (505.27GB / 2047.75GB)
    ok: LabStorage3: 24.13% (494.18GB / 2047.75GB)
    ok: LabStorage5: 25.68% (525.85GB / 2047.75GB)
    ok: LabStorage6: 24.04% (492.2GB / 2047.75GB)
    | 'labesxi1-local'=14.83% LabStorage1=29.32% LabStorage2=24.77% LabStorage3=24.13% LabStorage4=24.67% LabStorage5=25.68% LabStorage6=24.04%
  • Check the current storage usage of a specific datastore:

    > /opt/plugins/check_vmware_v2 --host vcenter.op5.com -t host -n labesxi1.it.op5.com vmfs.usage.LabStorage1
    CHECK_VMWARE OK - LabStorage1: 29.32% (600.46GB / 2047.75GB) | LabStorage1=29.32%
  • Check the current storage usage of a list of specific datastores:

    > /opt/plugins/check_vmware_v2 --host vcenter.op5.com -t host -n labesxi1.it.op5.com vmfs.usage.LabStorage1,LabStorage2
    CHECK_VMWARE OK - all 2 results are ok | LabStorage1=29.32% LabStorage2=24.77%
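The outputs above follow the usual Nagios plugin format: a status line, optional long-text lines, and performance data after a "|". If you post-process results in your own scripts, a minimal parser might look like the sketch below. It assumes that format, ignores any warning/critical/min/max fields after ";", and parse_output is a hypothetical name.

```python
import re

# One perfdata item: a label (optionally single-quoted), '=', a number, an optional unit.
# Threshold fields after ';' are not captured in this sketch.
PERF_ITEM = re.compile(r"('[^']+'|[^\s=]+)=(-?[\d.]+)([^;\s]*)")


def parse_output(text):
    """Split plugin output into (status line, long-text lines, {label: (value, unit)})."""
    lines = text.splitlines()
    first, _, perf = lines[0].partition('|') if lines else ('', '', '')
    long_lines, perf_parts = [], [perf]
    in_perf = False
    for line in lines[1:]:
        if in_perf:
            # Every line after the first '|' in the long text is perfdata.
            perf_parts.append(line)
            continue
        text_part, sep, perf_part = line.partition('|')
        if text_part.strip():
            long_lines.append(text_part.strip())
        if sep:
            in_perf = True
            perf_parts.append(perf_part)
    perfdata = {}
    for label, value, unit in PERF_ITEM.findall(' '.join(perf_parts)):
        perfdata[label.strip("'")] = (float(value), unit)
    return first.strip(), long_lines, perfdata
```

The datastore percentages appear to be used space over capacity; for example, 600.46GB / 2047.75GB is about 29.32%, matching the LabStorage1 value.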