Tag Archives: varnish

HOWTO: serve jpeg2000 images with a scalable infrastructure

At the Biodiversity Heritage Library, we have replaced a proprietary jpeg2000 image server that was straining under the load with a new, open source jpeg2000 server, djatoka. Chris Freeland and Chris Moyers cover the background in far more detail on the BHL Blog, so here I’ll cover the rationale behind the decisions I made to build a scalable, stable infrastructure that serves the images as efficiently as possible.

When I started sketching out how I wanted to run djatoka, I knew I wanted the setup to provide security, caching for performance and scalability, and fault tolerance. Our server runs Tomcat, which I didn’t want to be public facing, so I proxy requests to Tomcat through Apache using mod_proxy_ajp (ajp_proxy), the successor to the old mod_jk. Initially I was using nginx in place of Apache, but after reading about the functionality and performance improvements that ajp_proxy offered, it was a no-brainer: this is how to present Tomcat in a production environment.
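To make the caching and fault-tolerance goals concrete, here is a toy Python sketch of what a caching layer in front of a slow backend does: fresh copies are served without touching the backend, and a stale copy is served if the backend errors. It illustrates the idea only and is not how Varnish, Apache or djatoka work internally; the CachingFront class, the ttl value and the fetch callable (standing in for a request to Tomcat) are all hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class CachingFront:
    """Toy cache in front of a slow backend fetch function (hypothetical)."""

    def __init__(self, fetch: Callable[[str], bytes], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch = fetch    # hypothetical backend call, e.g. a request to Tomcat
        self.ttl = ttl        # seconds an entry counts as fresh
        self.clock = clock    # injectable so behaviour can be tested
        self.store: Dict[str, Tuple[float, bytes]] = {}  # url -> (fetched_at, body)

    def get(self, url: str) -> bytes:
        now = self.clock()
        cached = self.store.get(url)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]              # fresh hit: backend is not touched
        try:
            body = self.fetch(url)        # miss or expired: go to the backend
        except Exception:
            if cached is not None:
                return cached[1]          # backend failed: serve the stale copy
            raise                         # nothing cached: surface the error
        self.store[url] = (now, body)
        return body
```

Serving a stale copy when the backend fails trades freshness for availability, which suits page images that rarely change.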

Varnish web accelerator: the right tool for the job?

Note: The following testing and writeup took place in the fall of 2007, after months of research and conjecture. I’m reposting it now because it was never published publicly, and because the results still drive how I architect systems for web production. This week I am implementing Varnish to improve an image server’s ability to scale and serve images online.

The Job

A client’s new ‘enterprise’ content management system (CMS) proved far too slow to serve the multiple dynamic web sites it was scheduled to handle, and the company that sells the CMS recommended a reverse proxy to remedy the issue.

The Tool

Varnish is an open source, state-of-the-art, high-performance HTTP accelerator. It is targeted primarily at the FreeBSD 6+ and Linux 2.6 platforms and takes full advantage of the virtual memory system and advanced I/O features those operating systems offer. Unlike other reverse proxy solutions such as Squid, Varnish was written from the ground up as a high-performance caching reverse proxy, and it has been praised for its speed, its stability under load and its use of system resources compared with other solutions. At the client’s site we set up both Varnish and Squid as reverse proxies, each pointing to the same backend CMS, which in turn talks to an Oracle database. Across many stress tests the pattern stayed the same. The following graph shows typical results from a stress test of 2,500 requests with 40 concurrent users: the first column is the CMS on its own as a baseline, the second is the CMS behind Squid, and the third is the CMS behind Varnish.
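A test of that shape can be sketched with only the Python standard library: a pool of 40 worker threads issuing 2,500 GET requests in total, then reporting throughput and latency. This is an illustrative sketch, not the tool used to produce the graph; the fetch_once and stress_test names are hypothetical, and you would point it at the bare CMS, the Squid front end and the Varnish front end in turn.

```python
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def fetch_once(url: str, timeout: float = 10.0) -> float:
    """GET url once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()  # drain the body so the timing includes the transfer
    return time.perf_counter() - start


def stress_test(url: str, total: int = 2500, concurrency: int = 40) -> Dict[str, float]:
    """Issue `total` GETs against url from `concurrency` worker threads."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map re-raises the first failed request, aborting the run
        timings: List[float] = list(pool.map(fetch_once, [url] * total))
    elapsed = time.perf_counter() - started
    ordered = sorted(timings)
    return {
        "requests": float(len(timings)),
        "seconds": elapsed,
        "req_per_sec": len(timings) / elapsed,
        "mean_ms": statistics.mean(timings) * 1000,
        "p95_ms": ordered[int(0.95 * (len(ordered) - 1))] * 1000,
    }
```

Because a single failed request makes pool.map re-raise, this sketch stops at the first error rather than counting failures; a fuller tool would record errors separately.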

HOWTO: log the user's IP, not the proxy's, in nginx access log

Back in January I posted HOWTO: log the user’s IP, not the proxy’s, in Lighttpd access log. Today I switched that system to nginx, again fronted by Varnish (nginx has actually been running since early this year; I just got lazy about putting Varnish in front of it). I hit the same issue, but had little trouble solving it. Since I often refer to my own notes on fak3r, I’m recording the fix here for myself and for anyone streaming in from Google. As I explained before, when you run a webserver behind Varnish doing HTTP acceleration, the webserver’s access logs show the IP of the proxy (generally 127.0.0.1) instead of the end user’s IP, which breaks any kind of tracking or reporting you want to run against those logs. Varnish passes the end user’s IP to nginx in the X-Forwarded-For header, so it’s just a matter of making nginx log that header instead of the default variable holding the connecting client’s address ($remote_addr). Once we know that, the configuration is simple. Edit your nginx.conf file:
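Separately from the nginx configuration itself, the logic involved can be shown as a minimal Python sketch: trust X-Forwarded-For only when the request arrives from the proxy, and otherwise log the socket’s peer address. It assumes a single trusted proxy at 127.0.0.1 that appends the client’s address to the header, as Varnish’s built-in VCL does in current releases; the client_ip function and trusted_proxies parameter are hypothetical names.

```python
from typing import AbstractSet, Mapping


def client_ip(headers: Mapping[str, str], remote_addr: str,
              trusted_proxies: AbstractSet[str] = frozenset({"127.0.0.1"})) -> str:
    """Return the end user's IP for logging (hypothetical helper).

    remote_addr is the TCP peer; behind Varnish that is the proxy itself.
    """
    if remote_addr not in trusted_proxies:
        return remote_addr  # direct hit: ignore any header the client sent
    for name, value in headers.items():
        if name.lower() == "x-forwarded-for":  # header names are case-insensitive
            hops = [hop.strip() for hop in value.split(",") if hop.strip()]
            if hops:
                # the trusted proxy appends the peer it saw, so the right-most
                # entry is the one it wrote; entries to its left can be forged
                return hops[-1]
    return remote_addr  # no usable header: fall back to the peer address
```

Taking the right-most entry matters because a client can send its own X-Forwarded-For header; the proxy appends the real peer address after whatever the client supplied.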

Reasons to use a web proxy in a production environment

NOTE: At work I installed a web proxy to separate internal user traffic from external traffic hitting our production servers. While I’m not part of the network team, they asked me to do this because of my prior experience and interest in such things. It was meant to be a temporary fix until a new line providing greater bandwidth is installed, but I argue for keeping this segmentation even after the new line is in. Below is a slightly sanitized version of my arguments. My thoughts come from years of running networks, so this is something I care about and have thought about a lot, which is why it is wordy. I’d be very happy to discuss this, or other solutions, in the comments below, because I never want to stop learning.

I’d like to share my thoughts on why I think the network is better served by keeping internal traffic and public traffic separate. Whether we use the existing web proxy server or another one with a different network topology, I care less about the tool and more about making the network and user experience better for both internal and external users.

HOWTO: use monit to keep Lighttpd and Varnish running

Thanks to a post from Steve over at debian-administration.org, I finally got around to setting up monit, the little monitoring app we use at work to keep things sane. I had been meaning to install it at home, but it became more urgent when Varnish went down last week; without Varnish running there’s nothing to handle requests on :80, so as a webserver the box is dead. So here’s my monitrc for the webserver: Lighttpd fronted by Varnish, acting in the reverse proxy/HTTP accelerator role. Varnish listens on port 80 and, if a request isn’t cached, forwards it to Lighttpd listening on port 82. Lighty also listens on the standard port 443 for HTTPS requests, so we check that as well.
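monit handles both detection and restarting itself, driven by monitrc. The detection half of a port check boils down to something like this Python sketch, which simply tries a TCP connection to each port; the function names are hypothetical, and unlike monit it neither speaks HTTP nor restarts anything.

```python
import socket
from typing import Dict, Iterable


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, unreachable, or timed out
        return False


def check_services(host: str, ports: Iterable[int]) -> Dict[int, bool]:
    """Map each port to whether something is accepting connections on it."""
    return {port: port_is_open(host, port) for port in ports}
```

On this box you would call check_services("127.0.0.1", [80, 82, 443]) to cover Varnish, Lighttpd and Lighttpd’s HTTPS listener.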
