Learn Technology with Monit

Over the past few days I’ve been playing with software called Monit.

Monit is a utility for managing and monitoring processes, files, directories and filesystems on a UNIX system. Monit conducts automatic maintenance and repair and can execute meaningful causal actions in error situations.

Translated to a simpler phrasing, Monit sits in the background and runs tests that you tell it to on your computer, and sends you an email about the results of those tests. Optionally, it can restart programs that stop working, or do any kind of trick you can dream up based on the results of the tests.

Monit comes with its own email sender, so you don’t have to set up anything extra to get it to send you an alert. You will need to specify an email server, though.

Getting Monit to run is very simple. Thanks to no-names.biz, I’ve modified their howto posting to show you how to just get it running on Ubuntu 8.04 (Hardy), and I’ve used nano instead of vim as an easy-to-use editor for the configuration files. Before using this, get familiar with nano. Wherever you see something unique to you, like an email address, hostname or password, substitute your own:

#sudo aptitude install monit

#sudo cp /etc/monit/monitrc /etc/monit/monitrc_original

#sudo nano -w /etc/default/monit

startup=1
CHECK_INTERVALS=60

Ctrl-O to save the file, Ctrl-X to exit nano.

#sudo nano -w /etc/monit/monitrc

set daemon 60
set logfile syslog facility log_daemon

# If you run your own mailserver (use this or the next entry):
set mailserver mail.mycompany.com

#For gmail instead of your own mailserver (all on one line):
set mailserver smtp.gmail.com port 587 username "you@gmail.com" password "password" using tlsv1 with timeout 30 seconds

set mail-format { from: monit@$HOST.mycompany.com }
set alert you@mycompany.com
set httpd port 2812
    use address localhost
    allow localhost
    allow you:password
## Services
## You put your tests here.

Ctrl-O to save the file, Ctrl-X to exit nano.

#sudo invoke-rc.d monit start

———–
If all goes right, you should get an email shortly with the subject “monit alert -- Monit instance changed localhost”. Because we used the $HOST variable in the mail-format section, you can tell which computer sent you this by looking at the from: address of the email. If you don’t get an email within a few minutes, well, the aggravation can start now while you fix the /etc/monit/monitrc file, most likely by monkeying with the mailserver line.

# tail /var/log/daemon.log

The above command will give you some clues if it’s not working right, as monit will log the errors.
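If you end up doing this more than once, a couple of tiny shell helpers can save some typing. This is only a sketch: the function names check_monitrc and recent_monit_lines are made up for this post, and it assumes Monit logs to /var/log/daemon.log as configured above. The -t option asks Monit to syntax-check its control file without starting anything.

```shell
#!/bin/sh
# Sketch only: the helper names below are invented for this example.

# Ask Monit to syntax-check its control file (-t) before you restart it.
check_monitrc() {
    sudo monit -t
}

# Print the last N lines (default 20) that mention monit in a log file.
# $1 = log file, $2 = number of lines to show
recent_monit_lines() {
    grep -i 'monit' "$1" | tail -n "${2:-20}"
}

# Typical use on the box from this post:
#   check_monitrc
#   recent_monit_lines /var/log/daemon.log 10
```

Filtering on the word monit keeps the rest of the daemon log out of your way while you poke at the mailserver line.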

Now the fun begins, as we add tests to the end of the /etc/monit/monitrc file.

#sudo nano -w /etc/monit/monitrc
Scroll down to the end of the file; you can just mash the down-arrow key until you get there.
## Services
## You put your tests here.
check host mycompany.com with address mycompany.com
    if failed port 80 proto http for 3 times within 5 cycles then alert
#
check host example.com with address example.com
    if failed port 80 proto http for 3 times within 5 cycles then alert

Ctrl-O to save the file, Ctrl-X to exit nano.

#sudo invoke-rc.d monit restart
——
What this will do is check your remotely-hosted website, as well as the little website at example.com. If your website fails the check three times within five cycles (five minutes, since we set the daemon to run every 60 seconds), monit will email you an alert. I’m also including a check against example.com, because there’s the possibility that your computer might not be connecting to the internet properly. So if you get an email that both are failing, there’s a good chance your website is still up, but your internal network’s got a boo-boo.
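To make the "3 times within 5 cycles" rule and the two-site reasoning concrete, here is a small shell sketch. It is a simplified model of how I understand the rule, not Monit's own code, and the function names should_alert and diagnose are invented for illustration. Each argument to should_alert is one cycle's result, oldest first: 1 means the check failed, 0 means it passed.

```shell
#!/bin/sh
# Simplified model of "for 3 times within 5 cycles" (illustration only).
# Prints "alert" if any 5 consecutive cycles contain at least 3 failures.
should_alert() {
    printf '%s\n' "$@" | awk -v needed=3 -v window=5 '
        {
            r[NR] = $1                                  # remember this cycle
            fails += $1                                 # count it if it failed
            if (NR > window) fails -= r[NR - window]    # drop the cycle that left the window
            if (fails >= needed) hit = 1
        }
        END { print (hit ? "alert" : "ok") }'
}

# Interpret the two checks together.
# $1 = did your site fail (1/0), $2 = did example.com fail (1/0)
diagnose() {
    case "$1$2" in
        00) echo "all good" ;;
        10) echo "your site is failing: suspect the website" ;;
        11) echo "both failing: suspect your own network" ;;
        01) echo "only example.com is failing: your site looks fine" ;;
        *)  echo "unknown input" ;;
    esac
}

# Example:
#   should_alert 1 0 1 0 1        # prints "alert"
#   should_alert 1 1 0 0 0 0 1    # prints "ok"
#   diagnose 1 1                  # prints "both failing: suspect your own network"
```

The point of the window is that one slow page load or a single dropped connection won't wake you up; it takes repeated failures close together.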

A huge number of tests are available, and many different technologies have tests written for them. By playing with these tests and researching what they do, you will get a huge dose of technology learning across many different topics. Guaranteed.

Configuration examples from the monit wiki
Service test documentation

I’m currently running this one and trying to figure out how best to tweak it for my in-house server:

## Check the general system resources such as load average,
## cpu and memory usage. Each rule specifies the tested resource,
## the limit and the action which will be performed in the case
## that the test failed.
#
check system localhost
if loadavg (1min) > 4 then alert
if loadavg (5min) > 2 then alert
if memory usage > 75% then alert
if cpu usage (user) > 70% then alert
if cpu usage (system) > 30% then alert
if cpu usage (wait) > 20% then alert
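
If you want a feel for the load average numbers before picking thresholds, this shell sketch compares the current values against the same limits as the two loadavg rules above. It is not how Monit does it internally; it assumes a Linux box where /proc/loadavg exists, and the function names over_limit and report_load are made up for this post.

```shell
#!/bin/sh
# Sketch only: compare load averages against the limits used in the rules above.

# Succeeds (exit status 0) when $1 is numerically greater than $2.
over_limit() {
    awk -v v="$1" -v limit="$2" 'BEGIN { exit !(v > limit) }'
}

# Read the 1-minute and 5-minute load averages and report any that are too high.
# $1 = loadavg-style file (defaults to /proc/loadavg, Linux only)
report_load() {
    read -r one five rest < "${1:-/proc/loadavg}"
    if over_limit "$one" 4; then echo "1min load $one > 4: alert"; fi
    if over_limit "$five" 2; then echo "5min load $five > 2: alert"; fi
}

# Example:
#   report_load          # prints nothing when the box is under both limits
```

Running report_load a few times during a busy period gives you a rough idea of whether limits of 4 and 2 are sensible for your server or will just spam your inbox.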