Synchronize and reinsert report data

This how-to describes how to refresh and rebuild the back-end data used to create reports (availability, SLA, alert history, etc.). The process collects the naemon log data from all configured op5 Monitor nodes (masters, peers and pollers) and uses it to rebuild the report_data database table.

This process can destroy data and should only be performed if recommended by a technical contact at op5.

Prerequisites

  • A basic understanding of Linux, SSH and the command line interface.
  • A planned service window where all monitoring performed by op5 Monitor will be temporarily disabled (including all peers/pollers).
  • No scheduled downtime is active for any host or service object during the service window.
  • The currently installed version of op5 Monitor is 7.0.3 or later.
  • The "op5 community" package repository and "support tools" are installed and ready. See this FAQ entry for more information.
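The version prerequisite can also be checked with a short script. The following is a minimal sketch, not an op5 tool: it assumes the release file contains a VERSION=x.y.z line as in the example further down, and that sort supports the -V (version sort) option from GNU coreutils. The function names and the RELEASE_FILE override variable are made up for illustration.

```shell
#!/bin/sh
# Sketch: check that the installed op5 Monitor version is 7.0.3 or later.
# Assumption: the release file holds a line of the form VERSION=x.y.z.

# version_at_least INSTALLED MINIMUM -> succeeds if INSTALLED >= MINIMUM.
# Uses "sort -V" (GNU coreutils) so that 7.0.10 sorts after 7.0.3.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# read_version FILE -> prints the value of the first VERSION= line in FILE.
read_version() {
    sed -n 's/^VERSION=//p' "$1" | head -n 1
}

# RELEASE_FILE is a hypothetical override, useful only for trying the script.
release_file=${RELEASE_FILE:-/etc/op5-monitor-release}
if [ -r "$release_file" ]; then
    installed=$(read_version "$release_file")
    if version_at_least "$installed" 7.0.3; then
        echo "op5 Monitor $installed: OK"
    else
        echo "op5 Monitor $installed is older than 7.0.3" >&2
    fi
fi
```

The comparison sorts the two version strings and checks that the minimum comes first, which avoids the pitfalls of comparing dotted versions as plain strings or numbers.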

 

Process

  1. Verify that the currently installed op5 Monitor version is 7.0.3 or later. Example below.

    # cat /etc/op5-monitor-release
    VERSION=7.0.3



  2. Verify that no hosts or services are currently in scheduled downtime, and that none will enter scheduled downtime during the service window. This information can be found on the Scheduled Downtime page in the web interface.

    Recurring scheduled downtime entries, if any, are inserted around midnight, so do not start the process close to midnight.



  3. Log on to all op5 Monitor nodes (masters, peers and pollers) via SSH, and perform the steps below at each and every node.
     

    1. Make sure that the "op5 community" package repository and the "support tools" are installed.

    2. Stop the following system services using the commands below.

      service op5kad stop
      service httpd stop
      mon stop



    3. Back up the current report data, using the command below. The dump is written to your home directory (~), so make sure there is sufficient free space there before executing this command.

      mysqldump merlin report_data | gzip > ~/merlin.report_data.sql.gz



    4. Back up naemon's state information using the command below.

      cp -pv /opt/monitor/var/status.sav ~/status.sav.bak



  4. Using SSH, log on to the op5 Monitor node that is used in generating reports.
     
  5. Launch the tool that rebuilds the report data, using the command below. In this example, log data from September 1st, 2014 and onward is collected and processed.

    mon mt report-data-reinsert -s 2014-09-01 

    Follow the instructions displayed on-screen upon executing the command. An example of what the process looks like can be found at the end of this article.

    This tool should not be used to process log data generated by op5 Monitor version 6.0.x. In most cases, this means that the date should not be set any earlier than June 2013. However, this depends on which previous versions have been in use, and when. If in doubt, consult your technical contact at op5.



  6. Start the system services again, using the commands below.

    mon start
    service httpd start
    service op5kad start

    At this point, start the services only on the node where the op5 mt report-data-reinsert tool was just run.



  7. Generate a report that previously contained invalid data. If it now looks correct, start the system services on all remaining op5 Monitor nodes. If it does not look any better, consult your technical contact at op5.
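
The per-node backup commands from step 3 can be wrapped in a small helper that refuses to run when the target directory is short on space. This is a minimal sketch rather than an op5 tool: the function names free_kib and backup_node and the required-space figure are assumptions for illustration, while the mysqldump and cp commands are the ones given in the steps above.

```shell
#!/bin/sh
# Sketch: back up report data and naemon state after checking free space.
# Assumption: the caller supplies a rough space requirement in KiB; the
# real size of the dump is not known in advance.

# free_kib DIR -> prints the free space, in KiB, on the filesystem holding DIR.
free_kib() {
    df -P -k "$1" | awk 'NR == 2 { print $4 }'
}

# backup_node DEST_DIR REQUIRED_KIB -> runs both backups into DEST_DIR.
backup_node() {
    dest=$1
    required_kib=$2
    avail=$(free_kib "$dest")
    if [ "$avail" -lt "$required_kib" ]; then
        echo "Only $avail KiB free in $dest, wanted $required_kib KiB" >&2
        return 1
    fi
    # Note: in plain sh the pipeline status is that of gzip, so a failing
    # mysqldump is not detected here; inspect the resulting file as well.
    mysqldump merlin report_data | gzip > "$dest/merlin.report_data.sql.gz" || return 1
    cp -pv /opt/monitor/var/status.sav "$dest/status.sav.bak"
}

# Example call (not executed here), asking for roughly 1 GiB of headroom:
#   backup_node "$HOME" 1048576
```

Services should already be stopped before the helper is called, in the same order as in the steps above.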

 

Restoring the backup

In case of trouble, you can restore the backup files that were created in the instructions above.

Unless the op5 mt report-data-reinsert tool reached the final stage of its process (where it prints Deleting old report data entries), no live data has been modified and restoring the backup files should not be needed. In that case, simply start the system services again and everything should continue working as usual.

 

 

  1. Stop the following system services using the commands below.

    service op5kad stop
    service httpd stop
    mon stop



  2. Restore the old report data into the database, using the command below.

    zcat ~/merlin.report_data.sql.gz | mysql merlin



  3. Restore naemon's previous state data, using the command below.

    cp -pv ~/status.sav.bak /opt/monitor/var/status.sav



  4. Start the system services again, using the commands below.

    mon start
    service httpd start
    service op5kad start
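
Before piping the dump back into MySQL, it can be worth confirming that the backup file is intact. The sketch below is one way to do that, under the assumption that the backup was created with the gzip command shown earlier; the function names verify_dump and restore_report_data are hypothetical, and the restore command itself is the one from step 2 above.

```shell
#!/bin/sh
# Sketch: refuse to restore from a missing, empty or corrupt dump file.

# verify_dump FILE -> succeeds if FILE is non-empty and passes gzip's
# built-in integrity test.
verify_dump() {
    [ -s "$1" ] && gzip -t "$1" 2>/dev/null
}

# restore_report_data FILE -> loads the dump into the merlin database.
restore_report_data() {
    dump=$1
    if ! verify_dump "$dump"; then
        echo "Refusing to restore: $dump is missing, empty or corrupt" >&2
        return 1
    fi
    zcat "$dump" | mysql merlin
}

# Example call (not executed here):
#   restore_report_data ~/merlin.report_data.sql.gz
```

The check only proves that the archive decompresses cleanly, not that the SQL inside is complete.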

 

Example of report-data-reinsert execution

root@master01:~# op5 mt report-data-reinsert -s 2014-09-01

Verifying node command execution capabilities...
Testing (master01)... ok
Testing (master02)... ok
Testing (poller01)... ok

Next up is:
1) Collect alert log data (since 2014-09-01 00:00:00) from the listed nodes.
2) Sort and deduplicate the data.
3) Write the results to file: /tmp/alerts.1409522400.1423572510.p7lQYZ.log

Please be advised that the sorting will create additional temporary files in
/tmp (or $TMPDIR if set via environ), which might require large amounts of
disk space. This amount cannot be pre-determined.

The process might also be very time consuming, depending on the amount of data
to read at each node. Only the alert events found in the log data at each node
is downloaded (compressed) via the network (in case of remote nodes).

Shutting down Monitor at all listed nodes before continuing is recommended!

Continue (y/N)? y

Tue Feb 10 13:48:56 CET 2015 (master01) starting...
Tue Feb 10 13:51:51 CET 2015 (master01) done (entries: 104168) (errors: 0)
Tue Feb 10 13:51:51 CET 2015 (master02) starting...
Tue Feb 10 13:53:08 CET 2015 (master02) done (entries: 114406) (errors: 0)
Tue Feb 10 13:53:08 CET 2015 (poller01) starting...
Tue Feb 10 13:53:43 CET 2015 (poller01) done (entries: 23153) (errors: 0)

Final number of collected log entries: 126370

Next up is:
1) Delete all current report data entries that are timestamped 2014-09-01 00:00:00 or more recently.
2) Insert the report data entries found in '/tmp/alerts.1409522400.1423572510.p7lQYZ.log'

Continue (y/N)? y

Verifying MySQL connectivity... ok
Deleting old report data entries... ok

Importing 17.15 MiB of data from 1 files
Importing data: 100.00% (17.15 MiB) done
17.15 MiB, 126370 lines imported in 9.689s.
Creating sql table indexes. This will likely take ~16 seconds
788923 database entries indexed in 12 seconds

All done!
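
In the example run, the per-node entry counts add up to more than the final number of collected entries. That is what a sort-and-deduplicate step would be expected to produce when several nodes have logged the same alert. The sketch below illustrates the idea only: the sample log lines are made up, and sort -u stands in for whatever the tool does internally.

```shell
#!/bin/sh
# Illustration only: shows how merging logs from several nodes and removing
# exact duplicates yields fewer lines than the per-node totals combined.

# merge_alerts FILE... -> sorted union of all lines, exact duplicates removed.
merge_alerts() {
    sort -u "$@"
}

# Totals taken from the example run above.
collected=$((104168 + 114406 + 23153))   # entries reported per node
final=126370                             # final number of collected entries
echo "collected=$collected final=$final difference=$((collected - final))"
# prints: collected=241727 final=126370 difference=115357

# Two made-up node logs that share one identical alert line.
demo_dir=$(mktemp -d)
printf '%s\n' \
    '[1409522400] HOST ALERT: web01;DOWN;HARD;3;timeout' \
    '[1409522460] HOST ALERT: web01;UP;HARD;1;ok' > "$demo_dir/master01.log"
printf '%s\n' \
    '[1409522400] HOST ALERT: web01;DOWN;HARD;3;timeout' \
    '[1409522430] HOST ALERT: db01;DOWN;HARD;3;timeout' > "$demo_dir/master02.log"

unique=$(merge_alerts "$demo_dir/master01.log" "$demo_dir/master02.log" | wc -l | tr -d ' ')
echo "unique=$unique"
# prints: unique=3
rm -rf "$demo_dir"
```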