OP5 Monitor - How to troubleshoot argument parsing

Background

Sometimes you need to troubleshoot the argument parsing, or the output, of strangely behaving check plugins, notification scripts, etc., that are executed from within OP5 Monitor. This article explains how to install pwrap, a generic command wrapper script that simplifies this kind of troubleshooting.

The pwrap command wrapper script

Installed in a way that temporarily replaces the executable file subject to troubleshooting.
Behaves as if the real script/binary was executed.
Logs received arguments and environment variables (prior to executing the real script/binary).
Logs the stdout and stderr streams of the executed command (while it's running, just like tee).
Logs the return code of the executed command (given that the script was not prematurely terminated).
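
To make the behaviour listed above more concrete, here is a minimal sketch of what such a wrapper could do. It is not the actual pwrap script: the function name wrap_run and its explicit log directory argument are hypothetical, environment logging is left out, and stderr is replayed after the command has finished rather than streamed live.

```shell
#!/bin/bash
# Illustrative sketch only; NOT the real pwrap script.
# wrap_run LOGDIR COMMAND [ARG...]   (hypothetical name and interface)
wrap_run() {
    local logdir=$1 real=$2 code
    shift 2
    mkdir -p "$logdir" || return

    # Log the arguments null terminated, like /proc/*/cmdline.
    : > "$logdir/arg"
    if [ "$#" -gt 0 ]; then
        printf '%s\0' "$@" >> "$logdir/arg"
    fi
    printf '%s\n' "$real" > "$logdir/real"

    # Run the real command. stdout is copied to the log while it runs;
    # stderr is captured in the log and replayed once the command is done.
    "$real" "$@" 2> "$logdir/2" | tee "$logdir/1"
    code=${PIPESTATUS[0]}
    cat "$logdir/2" >&2

    # Log the return code and hand it back to the caller.
    printf '%s\n' "$code" > "$logdir/code"
    return "$code"
}
```

Returning the original exit code matters for check plugins, since the monitoring core derives the service state from it.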

The script can be found here.

Installation instructions

In the instructions found below, the /opt/plugins/check_nrpe check plugin is used, but this can be replaced with any other executable in the system, such as the /opt/monitor/op5/notify/notify.pl script responsible for sending alert notifications.

Creating a backup of the executable file

Prior to installing the pwrap script, the executable file subject to troubleshooting must be placed in a backup location. The pwrap script supports three different path name variations:

original_executable_name.bak (a .bak filename extension appended)
.bak_original_executable_name (a .bak_ string prepended to the filename)
bak/original_executable_name (the executable file placed in a sub directory called bak)
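
A lookup over the three variations above could be sketched as follows. The helper name find_real and the order in which the candidates are tried are assumptions made for illustration; the actual pwrap script may search differently.

```shell
#!/bin/bash
# Hypothetical helper: print the backup path belonging to a wrapped executable.
# The search order below is an assumption, not taken from pwrap itself.
find_real() {
    local self=$1 dir base candidate
    dir=$(dirname -- "$self")
    base=$(basename -- "$self")
    for candidate in "$dir/$base.bak" "$dir/.bak_$base" "$dir/bak/$base"; do
        # Accept only regular files that are executable.
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

For example, find_real /opt/plugins/check_nrpe would print /opt/plugins/check_nrpe.bak if that file exists and is executable.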

The executable file can simply be moved to the backup location, but making a copy is recommended. For example, if the executable file is a check plugin that OP5 Monitor runs periodically, copying the file instead of moving it avoids temporary "file not found" errors that could otherwise occur until the next step is completed.

# cp -pv /opt/plugins/check_nrpe{,.bak}
'/opt/plugins/check_nrpe' -> '/opt/plugins/check_nrpe.bak'

Installing the wrapper script

The pwrap script should be installed at the original location of the executable that is subject to troubleshooting, overwriting the original file if it was copied (not moved) in the previous step. The pwrap script automatically finds the original executable in one of the backup path variations.

# zcat pwrap.sh.gz > /opt/plugins/check_nrpe

Download the attached pwrap.sh.gz file and upload it to your server, prior to running the command above.

Executing the wrapped command

# /opt/plugins/check_nrpe -H localhost -s -c test -a "ooops this "wont be seen as" a single arg"
CHECK_NRPE: Received 0 bytes from daemon. Check remote server logs for error messages.

The log directory

The log files are stored in a /tmp sub directory tree, created like this:

/tmp/pwrap/<basename of wrapped executable>/<year>-<month>/<day>/<hour>/<minute><second>.<nanosecond>/
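
As a rough illustration, a path of this shape can be composed with date. The function name pwrap_logdir is hypothetical, and the %N (nanoseconds) format specifier assumes GNU date; how pwrap itself builds the path is not shown in this article.

```shell
#!/bin/bash
# Hypothetical helper: compose a log directory path with the layout above.
# Assumes GNU date, which supports %N for nanoseconds.
pwrap_logdir() {
    local name
    name=$(basename -- "$1")
    printf '/tmp/pwrap/%s/%s\n' "$name" "$(date '+%Y-%m/%d/%H/%M%S.%N')"
}

pwrap_logdir /opt/plugins/check_nrpe
```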

For example, if check_nrpe was executed June 19th 2014 16:33:56, the resulting directory ends up like this:

# ls -l /tmp/pwrap/check_nrpe/2014-06/19/16/3356.849158453/
total 28
-rw-r--r-- 1 monitor apache   50 Jun 19 16:33 1
-rw-r--r-- 1 monitor apache   14 Jun 19 16:33 2
-rw-r--r-- 1 monitor apache   49 Jun 19 16:33 arg
-rw-r--r-- 1 monitor apache   23 Jun 19 16:33 cmd
-rw-r--r-- 1 monitor apache    1 Jun 19 16:33 code
-rw-r--r-- 1 monitor apache 1638 Jun 19 16:33 env
-rw-r--r-- 1 monitor apache 1948 Jun 19 16:33 main
-rw-r--r-- 1 monitor apache   27 Jun 19 16:33 real

This means that a new directory is created for each run (unless two commands are executed in the same nanosecond of course...), containing 7 or 8 files.

1
- The standard output (stdout) data generated by the executed command.
2
- The standard error (stderr) data generated by the executed command, if any.
arg
- All command line arguments, except the name of the called command. Each argument is null terminated, just like the /proc/*/cmdline files.
cmd
- The name of the running command, just like it was called, such as /opt/plugins/check_nrpe, ./check_nrpe or check_nrpe.
code
- The return code of the executed command.
env
- All environment variables, also null terminated, just like the arg file.
main
- All collected information, except the stdout/stderr data, formatted in a human readable manner.
real
- The path to the actual executable file that is run and wrapped.

The standard input (stdin) stream is not collected, but commands executed by Nagios/Naemon are not fed anything on stdin anyway.
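
Because the env file is null terminated, values containing spaces or newlines can be read back unambiguously. The following sketch looks up a single variable in such a file; the function name logged_env is hypothetical and the helper is not part of pwrap.

```shell
#!/bin/bash
# Hypothetical helper: print the value of one variable from a
# null-terminated env file, such as the one described above.
logged_env() {
    local name=$1 file=$2 entry
    # read -d '' splits the input on NUL bytes instead of newlines.
    while IFS= read -r -d '' entry; do
        if [ "${entry%%=*}" = "$name" ]; then
            printf '%s\n' "${entry#*=}"
            return 0
        fi
    done < "$file"
    return 1
}
```

A call such as logged_env PATH /tmp/pwrap/check_nrpe/2014-06/19/16/3356.849158453/env would then show the PATH the command was started with.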

Contents of the log files

Thanks to the log files, it's easy to determine what the command line arguments actually looked like...

# cat /tmp/pwrap/check_nrpe/2014-06/19/16/3356.849158453/main
cmd(/opt/plugins/check_nrpe -H localhost -s -c test -a ooops this wont be seen as a single arg)

arg000(-H)
arg001(localhost)
arg002(-s)
arg003(-c)
arg004(test)
arg005(-a)
arg006(ooops this wont)
arg007(be)
arg008(seen)
arg009(as a single arg)

env(SHELL=/bin/bash)
env(TERM=xterm)
...
env(SHLVL=2)
env(LOGNAME=root)

realbin(/opt/plugins/check_nrpe.bak)

ret(3)

# cat /tmp/pwrap/check_nrpe/2014-06/19/16/3356.849158453/1
CHECK_NRPE: Received 0 bytes from daemon. Check remote server logs for error messages.
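
The split seen in the main file follows from ordinary shell quoting: the inner double quotes close and reopen the quoted string, so the unquoted spaces between them separate arguments. This can be reproduced without pwrap using a small helper (show_args is a hypothetical name used only for this illustration).

```shell
#!/bin/bash
# Hypothetical helper: print each received argument on its own line.
show_args() {
    local i=0 a
    for a in "$@"; do
        printf 'arg%03d(%s)\n' "$i" "$a"
        i=$((i + 1))
    done
}

# Nested double quotes: the string is split into four arguments after -a.
show_args -a "ooops this "wont be seen as" a single arg"

# Single quotes around the whole string: one argument after -a.
show_args -a 'ooops this "will be seen as" a single arg'
```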

Re-executing the command

In some cases it could be useful to re-execute the command which was previously executed by the wrapper script. Perhaps some system issue has been resolved and now you would like to find out if the command works better this time around.

The following example shows how to use the xargs tool to execute the command again, exactly the same way as before, but without the wrapping. The -0 argument means that the list of arguments read on stdin (from the arg file) are null separated. The -t argument means that the resulting command line executed by xargs is displayed.

# cd /tmp/pwrap/check_nrpe/2014-06/19/16/3356.849158453 && xargs -0t $(<real) < arg
/opt/plugins/check_nrpe.bak -H localhost -s -c test -a ooops this wont be seen as a single arg
CHECK_NRPE: Received 0 bytes from daemon. Check remote server logs for error messages.

Using the real file this way means that the actual backed up executable file will be run. The $(<real) part can be replaced with another executable file path to run some other program with the same argument set.
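
As a sketch of running some other program with the logged argument set, the helper below feeds an arg file to an arbitrary command through xargs -0. The name rerun_with is hypothetical, and the demonstration uses a throwaway arg file rather than a real pwrap log directory.

```shell
#!/bin/bash
# Hypothetical helper: rerun_with LOGDIR COMMAND [LEADING_ARG...]
# Runs COMMAND with the null-separated arguments stored in LOGDIR/arg.
rerun_with() {
    local logdir=$1
    shift
    xargs -0 "$@" < "$logdir/arg"
}

# Demonstration with a throwaway arg file instead of a real pwrap log.
demo=$(mktemp -d)
printf '%s\0' -H localhost 'as a single arg' > "$demo/arg"

# Prints each logged argument in brackets, one per line:
# [-H], [localhost] and [as a single arg]
rerun_with "$demo" printf '[%s]\n'
```

Inside a real log directory, rerun_with . "$(<real)" would be equivalent to the xargs command shown earlier, minus the -t tracing.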

Restoring the wrapped executable

Once the troubleshooting is complete, simply move the executable from the backup location back to its original location (overwriting the wrapper script).

# mv -v /opt/plugins/check_nrpe{.bak,}
'/opt/plugins/check_nrpe.bak' -> '/opt/plugins/check_nrpe'
