Thanks to a post from Steve over at debian-administration.org, I finally got around to setting up monit, the little monitoring app we use at work to keep things sane. I had been meaning to install it at home, but it became more urgent when Varnish went down last week; without it running there’s nothing to handle requests on :80, so as a webserver the box is dead. So here’s my monitrc for the webserver: Lighttpd fronted by Varnish, acting in the reverse proxy/HTTP accelerator role. Varnish listens on 80 and, if a request isn’t cached, forwards it on to Lighttpd listening on 82. Lighty also listens on the standard 443 for HTTPS requests, so we check that as well.
check process varnish with pidfile /var/run/varnishd.pid
  start program = "/etc/init.d/varnish start"
  stop program = "/etc/init.d/varnish stop"
  if cpu > 60% for 2 cycles then alert
  if cpu > 80% for 5 cycles then restart
  if totalmem > 200.0 MB for 5 cycles then restart
  if children > 250 then restart
  if loadavg(5min) greater than 10 for 8 cycles then stop
  if failed host 127.0.0.1 port 80 protocol http then restart
  if 3 restarts within 5 cycles then timeout

check process lighttpd with pidfile /var/run/lighttpd.pid
  start program = "/etc/init.d/lighttpd start"
  stop program = "/etc/init.d/lighttpd stop"
  if cpu > 60% for 2 cycles then alert
  if cpu > 80% for 5 cycles then restart
  if totalmem > 200.0 MB for 5 cycles then restart
  if children > 250 then restart
  if loadavg(5min) greater than 10 for 8 cycles then stop
  if failed host 127.0.0.1 port 82 protocol http then restart
  if failed host 127.0.0.1 port 443 type tcpssl protocol http with timeout 15 seconds then restart
  if 3 restarts within 5 cycles then timeout
So now we have monit watching Lighttpd, Varnish, Postfix, MySQL and OpenSSH, restarting things if they fail and emailing me the status when they do. Next up is some long-term trending, with Cacti providing RRD graphing; then we’ll really have an idea of what this box is doing and be able to tune it accordingly.
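The blocks for the other services and the mail setup aren’t shown here. As a rough sketch only, the alerting and an OpenSSH check could look something like the fragment below. The mail server, the address, the pidfile and the init script path are all assumptions for a Debian-style box, not copied from the real monitrc.

```
# Hypothetical values throughout; adjust the mail server, address and paths.
set mailserver localhost
set alert admin@example.com

check process sshd with pidfile /var/run/sshd.pid
  start program = "/etc/init.d/ssh start"
  stop program = "/etc/init.d/ssh stop"
  if failed port 22 protocol ssh then restart
  if 5 restarts within 5 cycles then timeout
```

Postfix and MySQL would follow the same pattern, with their own pidfiles, init scripts and ports.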
Hi.
Nice article. However my lighttpd instance only throws a 400 error.
monit[9262]: HTTP error: Server returned status 400
My app server is monitored the exact same way as lighttpd but no error.
check process lighttpd with pidfile /var/run/lighttpd.pid
group root
start program = "/etc/init.d/lighttpd start"
stop program = "/etc/init.d/lighttpd stop"
if failed host 192.168.10.6 port 80 protocol http
then restart
alert XXXX@XXXX with the mail-format { subject: Alarm Webserver is down! }
if 5 restarts within 5 cycles then timeout
Any ideas?
So, from lighty it looks like the error is:
ErrorDocument 400 /error/invalidSyntax.html
At work I have monit watching Lighttpd without Varnish in the way, so its block in monitrc is this:
check process lighttpd with pidfile /var/run/lighttpd.pid
start program = "/etc/init.d/lighttpd start"
stop program = "/etc/init.d/lighttpd stop"
if cpu > 60% for 2 cycles then alert
if cpu > 80% for 5 cycles then restart
if totalmem > 200.0 MB for 5 cycles then restart
if children > 250 then restart
if loadavg(5min) greater than 10 for 8 cycles then stop
if failed host 127.0.0.1 port 80
protocol HTTP request /monit_check.php
timeout 20 seconds
then restart
if 3 restarts within 5 cycles then timeout
depends on lighttpd_bin
depends on lighttpd_rc
group system
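The two depends lines point at checks that aren’t shown in the block above. A minimal sketch of what they might look like, modelled on the file checks in the monit documentation, is below. The paths /usr/sbin/lighttpd and /etc/init.d/lighttpd and the expected permissions are assumptions, not taken from the actual monitrc.

```
# Hypothetical definitions for the two dependencies; paths are assumptions.
check file lighttpd_bin with path /usr/sbin/lighttpd
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor
  group system

check file lighttpd_rc with path /etc/init.d/lighttpd
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor
  group system
```

The idea is that if the binary or init script looks tampered with, monit stops touching the service instead of restarting it.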
So, from your server, can you do something like:
wget http://127.0.0.1
and get a successful response? If not, you may want to place a ‘dummy’ file for it to grab (touch /var/www/test.html; chown www-data /var/www/test.html) so it’ll give you a 200 – Success response:
mbglinuxdevsrv01:/etc/monit# wget http://127.0.0.1/test.html
--2008-05-01 10:13:24-- http://127.0.0.1/test.html
Connecting to 127.0.0.1:80... connected.
HTTP request sent, awaiting response... 200 OK
[...]
I use a PHP file for monit to check so it can confirm that FastCGI is still running, since restarting Lighttpd also restarts the FastCGI process. I’m going to start running FastCGI as a separate process, though, so monit can watch it directly.
Let me know, that should work.
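For the separate FastCGI process mentioned above, a check might look roughly like this. It is only a sketch: the service name php_fcgi, the pidfile, the init script and the socket path are all hypothetical and depend on how FastCGI gets spawned.

```
# Hypothetical sketch: pidfile, init script and socket path are assumptions.
check process php_fcgi with pidfile /var/run/php-fcgi.pid
  start program = "/etc/init.d/php-fcgi start"
  stop program = "/etc/init.d/php-fcgi stop"
  if failed unixsocket /tmp/php-fcgi.sock then restart
  if 3 restarts within 5 cycles then timeout
```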
monit 4.9 on CentOS 5 (monit from rpmforge) tries to monitor nginx 0.7.17 on the server with IP 1.2.3.4, using the file /etc/monit.d/nginx:
check process nginx with pidfile /var/run/nginx.pid
start program = "/etc/init.d/nginx start"
stop program = "/etc/init.d/nginx stop"
if failed host 1.2.3.4 port 80 protocol HTTP request / then restart
if 5 restarts with 5 cycles then timeout
When trying to start:
# /etc/init.d/monit start
Starting Process Monitor (monit): HTTP error: Server returned status 400
'nginx' failed protocol test [HTTP] at INET[1.2.3.4:80] via TCP
'nginx' trying to restart
'nginx' stop: /etc/init.d/nginx
'nginx' failed to stop
and kills nginx
wget http://1.2.3.4/ returns:
# wget http://1.2.3.4/
--21:31:49-- http://1.2.3.4/
Connecting to 1.2.3.4... connected.
HTTP request sent, awaiting response... 302
Location: /UnsupportedBrowser?returnUrl= [following]
--21:31:49-- http://1.2.3.4/UnsupportedBrowser?returnUrl=
Connecting to 1.2.3.4:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `UnsupportedBrowser?returnUrl=.2'
[ ] 1,427 --.-K/s in 0s
21:31:49 (136 MB/s) - `UnsupportedBrowser?returnUrl=.2' saved [1427]
Maybe something wrong with monit’s http client? Any idea?
Hey there, I *just* saw this issue at work the other day when I turned on some monit features to monitor some of our websites, and I came across a similar error. While you have:
if failed host 1.2.3.4 port 80 protocol HTTP request / then restart
if 5 restarts with 5 cycles then timeout
I’m using:
check host DOMAIN.COM with address xxx.xxx.xxx.xxx
  if failed port 80 protocol http with timeout 45 seconds then alert
I know it’s only a slight syntax change, but that fixed it for me. Try it out; if it doesn’t fix it, let me know and I’ll plug your syntax in on my box and see what it does. Also, for my nginx block I have more checks that you may want to consider:
check process nginx with pidfile /var/run/nginx.pid
  start program = "/etc/init.d/nginx start"
  stop program = "/etc/init.d/nginx stop"
  if cpu > 60% for 2 cycles then alert
  if cpu > 80% for 5 cycles then restart
  if totalmem > 200.0 MB for 5 cycles then restart
  if children > 250 then restart
  if loadavg(5min) greater than 10 for 8 cycles then stop
I made it run as a daemon with "set daemon" in monit.conf, and it works OK with that config. Looks strange.
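For anyone else hitting this, the line in question would look something like the sketch below. The 120-second interval is an assumed value, since the comment doesn’t say which interval was used.

```
# Run monit as a background daemon and poll services at a fixed interval.
# The 120-second value is an assumption; pick whatever suits your setup.
set daemon 120
```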