Radagast Monitoring and Maintenance

From Cnwiki

Monitoring

The following is monitored continuously by the nagios3 daemon on radagast (web interface: http://radagast.netlab.eng.au.dk/nagios3/):

  • Radagast
    • CPU load
    • Disk space
    • HTTP
    • SSH

The following also needs to be monitored:

  • Radagast
    • Hardware errors
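
Hardware errors are not covered by any check yet. One possible approach, sketched here, is a small Nagios-style plugin that counts kernel "Hardware Error" lines in a log file. The function name `check_hw_errors`, the choice of log file, and the match string are assumptions made for illustration and are not part of the current setup; only the exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN) follow the usual Nagios plugin convention.

```shell
#!/bin/bash
# Sketch of a Nagios-style check for hardware errors (hypothetical, not deployed).
# Usage: check_hw_errors LOGFILE
check_hw_errors() {
    local log_file="$1"
    local count

    # Without a readable log nothing can be concluded: report UNKNOWN (3).
    if [ ! -r "$log_file" ]; then
        echo "UNKNOWN - cannot read $log_file"
        return 3
    fi

    # Count lines containing the assumed marker. grep exits non-zero when
    # there are no matches, so "|| true" keeps that from counting as a failure.
    count=$(grep -c -- 'Hardware Error' "$log_file" || true)

    if [ "$count" -gt 0 ]; then
        echo "CRITICAL - $count hardware error line(s) in $log_file"
        return 2
    fi

    echo "OK - no hardware error lines in $log_file"
    return 0
}
```

A wrapper script would call the function on whichever kernel log the host actually keeps (for example `check_hw_errors /var/log/kern.log`, an assumed path) and exit with its return code.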

The following persons subscribe to a daily (07:00) status report: egki

The following persons subscribe to all alerts: egki

The following persons subscribe to alerts on the following hosts and services:

  • Radagast:
    • CPU load:
    • Disk space:
    • HTTP:
    • SSH:

Installation guide for nagios3

Installation:

   $ sudo apt-get install nagios3

Configuration:

   $ cd /etc/nagios3/conf.d/
   
   # The host/service to monitor 
   $ sudo cp localhost_nagios2.cfg localhost_nagios2.cfg.factory-defaults
   $ sudo chmod -w localhost_nagios2.cfg.factory-defaults
   $ sudo $EDITOR localhost_nagios2.cfg
   
   # The contacts to alert
   $ sudo cp contacts_nagios2.cfg contacts_nagios2.cfg.factory-defaults
   $ sudo chmod -w contacts_nagios2.cfg.factory-defaults
   $ sudo $EDITOR contacts_nagios2.cfg
   
   # Add a read-only user
   $ sudo htpasswd /etc/nagios3/htpasswd.users rouser
   # Grant rouser read access:
   $ sudo emacs /etc/nagios3/cgi.cfg
   $ cat /etc/nagios3/cgi.cfg
   ...
   authorized_for_all_services=nagiosadmin,rouser
   ...
   authorized_for_all_hosts=nagiosadmin,rouser
   ...
   $ sudo service nagios3 restart
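
After editing cgi.cfg it can be worth confirming that the read-only user really ended up in both lists before restarting. The following is a minimal sketch of such a check; the function name `user_has_read_access` is made up for this example, and it assumes the lists are written without spaces around the commas, as in the excerpt above.

```shell
#!/bin/bash
# Succeed if USER is listed in both authorized_for_all_services and
# authorized_for_all_hosts of the given cgi.cfg file.
# Usage: user_has_read_access CGI_CFG USER
user_has_read_access() {
    local cfg="$1" user="$2" key
    for key in authorized_for_all_services authorized_for_all_hosts; do
        # Match "key=a,b,c" where USER is one whole comma-separated entry.
        grep -Eq "^${key}=([^#]*,)?${user}(,.*)?\$" "$cfg" || return 1
    done
    return 0
}
```

For example, `user_has_read_access /etc/nagios3/cgi.cfg rouser && echo "rouser can read all hosts and services"`.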

Periodic report by email

A cron task runs the following script (`send-nagios-report.sh`):

   #!/bin/bash
   NAGIOS_USER=rouser # user on the web interface
   NAGIOS_PASS= # NAGIOS_USER's password
   NAGIOS_PAGE=http://radagast.netlab.eng.au.dk/cgi-bin/nagios3/status.cgi # page to retrieve
   RCVR_EMAILS='BUM@eng.au.dk,BUM@eng.au.dk,BUM@eng.au.dk' # comma-separated list of email addresses to receive the report
   LOCAL_FILE=/tmp/nagios-report.html # name to use for the temporary file
   
   # Retrieve the report
   wget --quiet --convert-links --user="$NAGIOS_USER" --password="$NAGIOS_PASS" "$NAGIOS_PAGE" -O "$LOCAL_FILE"
   
   # Send the report to each recipient; splitting into an array avoids
   # changing IFS globally and sending to empty or space-padded addresses
   IFS=',' read -ra ADDRESSES <<< "$RCVR_EMAILS"
   for adr in "${ADDRESSES[@]}" ; do
       mail -a "Content-type: text/html" -s "[Nagios report] Radagast" "$adr" < "$LOCAL_FILE"
   done
   
   rm "$LOCAL_FILE"
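
The cron entry itself is not shown on this page. Assuming the daily 07:00 schedule mentioned under Monitoring, it could look like the line printed by the helper below. The function name `daily_report_cron_line` and any install path passed to it are hypothetical.

```shell
#!/bin/bash
# Print a crontab line that runs the given script every day at 07:00.
# Crontab fields: minute hour day-of-month month day-of-week command
daily_report_cron_line() {
    local script_path="$1"
    printf '0 7 * * * %s\n' "$script_path"
}
```

One way to install it for the current user, with an assumed script location, is `(crontab -l 2>/dev/null; daily_report_cron_line /usr/local/bin/send-nagios-report.sh) | crontab -`.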

Maintenance

Automated

NOT YET IMPLEMENTED: These tasks are to run automatically at midnight on the first Saturday of every month:

  • Check file system
  • Defragment file system
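
Cron has no direct way to say "first Saturday of the month": when both the day-of-month and day-of-week fields are restricted, a job runs if either one matches. A common workaround is to schedule the job for days 1-7 and let the script itself test the weekday. The sketch below shows such a test; the function name `is_first_saturday` is an assumption, and it relies on GNU date.

```shell
#!/bin/bash
# Succeed (exit 0) if DATE (YYYY-MM-DD, default: today) is the first Saturday
# of its month. Relies on GNU date for -d and the %-d format.
is_first_saturday() {
    local day="${1:-today}"
    local weekday day_of_month
    weekday=$(date -d "$day" +%u)        # 1 = Monday ... 6 = Saturday, 7 = Sunday
    day_of_month=$(date -d "$day" +%-d)  # day of month without zero padding
    # The first Saturday always falls on one of the first seven days.
    [ "$weekday" -eq 6 ] && [ "$day_of_month" -le 7 ]
}
```

A maintenance script scheduled with the crontab fields `0 0 1-7 * *` could start with `is_first_saturday || exit 0` and then run the file system tasks.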

Manual

The following two sets of tasks are to be performed every month and every six months, respectively, and registered in the logs below:

Every month

  • One week in advance: inform the service owners of the virtual machines about the scheduled downtime
  • Update installed packages by entering the following commands in the shell on radagast:
   sudo apt-get update 
   sudo apt-get upgrade 
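  • Check whether a reboot is required by checking for the existence of /var/run/reboot-required:
   cat /var/run/reboot-required
  • If a reboot is required, save all virtual machines first:
   sudo ./virsh-save-all.sh
  • Reboot, then restore the virtual machines once radagast is back up:
   sudo reboot
   sudo ./virsh-restore-all.sh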

Beware: guest time is frozen from virsh-save-all.sh to virsh-restore-all.sh. This can be fixed afterwards with sudo hwclock --hctosys.
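
The reboot check in the monthly list can also be expressed as a test on the flag file rather than reading its content. This is a minimal sketch; the function name `reboot_required` is made up here, and the optional argument exists only so the function can be tried against another path.

```shell
#!/bin/bash
# Succeed if the reboot flag file exists. The default path is the one named
# in the monthly checklist; an alternative path may be passed for testing.
reboot_required() {
    local flag_file="${1:-/var/run/reboot-required}"
    [ -e "$flag_file" ]
}
```

It could be used as `if reboot_required; then echo "Reboot needed: save the guests, then reboot"; fi` before running virsh-save-all.sh.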

Every six months

  • Server access reviewed within the last 6 months.
  • Firewall rules reviewed in the last 6-12 months.
  • Unused packages have been removed.
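
As a starting point for the access review, the accounts that can log in can be listed from the passwd file and compared with the Access Control tables. The sketch below is one possible way to do that; the function name `list_login_users` and the rule "any shell not ending in nologin or false" are assumptions and may need adjusting for the host.

```shell
#!/bin/bash
# Print the names of accounts in a passwd-format file whose login shell does
# not end in "nologin" or "false". Default file: /etc/passwd.
list_login_users() {
    local passwd_file="${1:-/etc/passwd}"
    # Field 7 is the login shell, field 1 the user name.
    awk -F: '$7 !~ /(nologin|false)$/ { print $1 }' "$passwd_file"
}
```

For the package item, `sudo apt-get --dry-run autoremove` shows which packages would be removed without changing anything.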

Logs for manual maintenance

Monthly maintenance log

Date | Name | Comment
2015-03-06 | Egon Kidmose | Updated, restarted.
2015-04-07 | Egon Kidmose | Updated, restarted.
2015-04-29 | Sergi Rotger Griful | Updated, restarted.
2015-06-04 | Sergi Rotger Griful | Updated, restarted.
2015-07-07 | Sergi Rotger Griful | Updated, restarted.
2015-08-04 | Sergi Rotger Griful | Updated.
2015-10-12 | Sergi Rotger Griful | Updated.
2015-12-07 | Sergi Rotger Griful | Updated.
2016-05-25 | Jacob Høxbroe Jeppesen | Updated, restarted.

Biannual maintenance log

Date | Name | Comment
TEST | TEST | TEST

Backup

Access Control

The tables below list which persons have access to which resources. The tables only list persons who have a username in the OS and SSH access.

Gandalf

Person | User Name | E-mail | Granted By | Access from | Access until | Comments
Sergi | an example | srgr@eng.au.dk | Sergi | 10/2013 | ? | None
Rune | s | s | s | date | ? |
Søren | | | | | ? |

Radagast HOST (radagast.netlab.eng.au.dk)

Person | User Name | E-mail | Granted By | Access from | Access until | Comments
Sergi | an example | srgr@eng.au.dk | Sergi | 10/2015 | ? | None
Rune | | | | | ? |
Søren | | | | | ? |

Radagast GUEST 1 (radagast1.netlab.eng.au.dk)

Person | User Name | E-mail | Granted By | Access from | Access until | Comments
Sergi | an example | srgr@eng.au.dk | Sergi | 10/2015 | ? | None
Rune | | | | | ? |
Søren | | | | | ? |

Radagast GUEST 2 (radagast2.netlab.eng.au.dk)

Person | User Name | E-mail | Granted By | Access from | Access until | Comments
Sergi | an example | srgr@eng.au.dk | Sergi | 10/2015 | ? | None
Rune | | | | | ? |
Søren | | | | | ? |

Radagast GUEST 3 (radagast3.netlab.eng.au.dk)

Person | User Name | E-mail | Granted By | Access from | Access until | Comments
Sergi | an example | srgr@eng.au.dk | Sergi | 10/2015 | ? | None
Rune | | | | | ? |
Søren | | | | | ? |

Radagast GUEST 4 (radagast4.netlab.eng.au.dk)

Person | User Name | E-mail | Granted By | Access from | Access until | Comments
Sergi | an example | srgr@eng.au.dk | Sergi | 10/2015 | ? | None
Rune | | | | | ? |
Søren | | | | | ? |

Radagast GUEST 5 (radagast5.netlab.eng.au.dk)

Person | User Name | E-mail | Granted By | Access from | Access until | Comments
Sergi | an example | srgr@eng.au.dk | Sergi | 10/2015 | ? | None
Rune | | | | | ? |
Søren | | | | | ? |

Radagast GUEST 6 (radagast6.netlab.eng.au.dk)

Person | User Name | E-mail | Granted By | Access from | Access until | Comments
Sergi | an example | srgr@eng.au.dk | Sergi | 10/2015 | ? | None
Rune | | | | | ? |
Søren | | | | | ? |

Beckhoff CX-2030

Person | User Name | E-mail | Granted By | Access from | Access until | Comments
Sergi | an example | srgr@eng.au.dk | Sergi | 10/2015 | ? | None
Rune | | | | | ? |
Søren | | | | | ? |

Golem

Person | User Name | E-mail | Granted By | Access from | Access until | Comments
Sergi | an example | srgr@eng.au.dk | Sergi | 10/2015 | ? | None
Rune | | | | | ? |
Søren | | | | | ? |