Tag Archives: reverse proxy

HOWTO: serve jpeg2000 images with a scalable infrastructure

At the Biodiversity Heritage Library, we have replaced a proprietary jpeg2000 image server that was straining under the load with a new, open source jpeg2000 server, djatoka. Chris Freeland and Chris Moyers cover the background in far more detail on the BHL Blog, so here I’ll cover my rationale and the decisions I made to build a scalable, stable infrastructure that serves the images as efficiently as possible.

When I started sketching out how I wanted to run djatoka, I knew I wanted the setup to provide security, caching (for performance and scalability) and fault tolerance. Our server runs Tomcat, which I didn’t want to be public facing. Because of this I proxy Tomcat requests through Apache using ajp_proxy (Apache’s mod_proxy_ajp module), the successor to the old mod_jk. Initially I was using nginx in place of Apache, but after reading about all the functionality and performance improvements ajp_proxy offered, it was a no-brainer; this is how to present Tomcat in a production environment.
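As a rough illustration of the Apache-in-front-of-Tomcat arrangement described above, a minimal sketch might look like the following. It assumes mod_proxy and mod_proxy_ajp are loaded and that Tomcat's AJP connector listens on localhost port 8009 (Tomcat's conventional AJP port). The hostname and the /adore-djatoka context path are placeholders for illustration, not details taken from the BHL setup.

```apache
# Sketch only: hostname, port and context path are assumptions.
<VirtualHost *:80>
    ServerName images.example.org

    # Act as a reverse proxy only; never as an open forward proxy.
    ProxyRequests Off

    # Hand requests for the djatoka webapp to Tomcat over AJP.
    # Tomcat itself stays bound to localhost and is not public facing.
    ProxyPass /adore-djatoka ajp://localhost:8009/adore-djatoka
</VirtualHost>
```

Because the path is the same on the proxy and on Tomcat here, no ProxyPassReverse line is included; a setup that remaps paths would likely need one.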

Reasons to use a web proxy in a production environment

NOTE: at work I installed a web proxy to separate internal user traffic from external traffic hitting our production servers.  While I’m not part of the network team, they asked me to do this because of my prior experience and interest in such things.  This was meant to be a temporary fix until a new line providing greater bandwidth is installed, but I argue for keeping the segmentation even after the new line is in place.  Below is a slightly sanitized version of my arguments. My thoughts and comments are driven by years of running networks; it is something I care about and have spent a long time thinking about, so this is wordy.  I’d be very happy to discuss this, or other solutions, via the comments below because I never want to stop learning.

I’d like to share my thoughts on why I think the network is better served by keeping internal traffic and public traffic separate.  Whether you use the existing web proxy server or another one with a different network topology, I care less about the tool and more about making the network and user experience better for both internal and external users.

HOWTO: log the user's IP, not the proxy's, in Lighttpd access log

When you run a webserver behind a reverse proxy or HTTP accelerator like Squid or Varnish, the webserver access logs will display the IP of the proxy (generally 127.0.0.1) instead of the end user’s IP.  This not only breaks any kind of tracking or reporting you want to run against your webserver logs, but it also takes away a datapoint I’ve had use for in general server admin tasks. This server runs Varnish in front of Lighttpd, and Varnish passes the end user’s IP along in the X-Forwarded-For request header, so it’s just a matter of making Lighttpd (lighty) log that header in its access logs instead of the default variable that holds the remote address. Once we know that, the configuration is simple; in lighttpd.conf, enter this:
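The setting involved is accesslog.format from mod_accesslog. What follows is a minimal sketch rather than necessarily the exact line from the original post: it takes lighttpd's documented default log format and swaps the remote host field (%h) for the X-Forwarded-For request header. The log file path is an assumption.

```
# Sketch only: assumes mod_accesslog is available; the log path is a placeholder.
server.modules += ( "mod_accesslog" )
accesslog.filename = "/var/log/lighttpd/access.log"

# Default format, with %h replaced by the X-Forwarded-For request header.
accesslog.format = "%{X-Forwarded-For}i %V %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\""
```

Note that X-Forwarded-For is supplied by the client side of the connection, so the logged value is only trustworthy when every request actually arrives through the proxy.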