Tag Archives: caching

HOWTO: serve jpeg2000 images with a scalable infrastructure

At the Biodiversity Heritage Library, we have replaced a proprietary jpeg2000 image server that was straining under the load with a new, open source jpeg2000 server, djatoka. Chris Freeland and Chris Moyers cover the background in far more detail on the BHL Blog, so here I’ll cover my rationale and the decisions I made to build a scalable, stable infrastructure that serves the images as efficiently as possible.

When I started sketching out how I wanted to run djatoka, I knew I wanted it to provide security, caching for performance and scalability, and fault tolerance. Our server runs Tomcat, which I didn’t want to be public facing. Because of this I proxy Tomcat requests through Apache using mod_proxy_ajp, the successor to the old mod_jk. Initially I was using nginx in place of Apache, but after reading about all the functionality and performance improvements mod_proxy_ajp offered, it was a no-brainer; this is how to present Tomcat in a production environment.

Varnish web accelerator: the right tool for the job?

Note: The following testing and writeup occurred in the Fall of 2007 following months of research and conjecture. I repost it now because it was not originally posted publicly, and because the results are still a driving factor in how I architect systems for web production. This week I am implementing Varnish to enhance an image server’s ability to scale and serve images online.

The Job

A client’s new ‘enterprise’ content management system proves to be far too slow to serve the multiple dynamic web sites that it’s scheduled to handle, and a reverse proxy was recommended by the company that sells the CMS to remedy the issue.

The Tool

Varnish is an Open Source, state-of-the-art, high-performance HTTP accelerator. Varnish is targeted primarily at the FreeBSD 6+ and Linux 2.6 platforms, and will take full advantage of the virtual memory system and advanced I/O features offered by these operating systems. Unlike other reverse proxy solutions such as Squid, Varnish was written from the ground up to be a high performance caching reverse proxy, and has been praised for its speed, stability under load and use of system resources when compared to other solutions. At the client’s site we have set up both Varnish and Squid as reverse proxies, each pointing to the same backend CMS, which in turn talks to an Oracle database. After many stress tests, the pattern remained the same. Typical results can be seen in the following graph, which shows the results of a stress test of 2500 requests with 40 concurrent users. The first column shows the CMS on its own for a base reading, then with Squid as a reverse proxy, and finally with Varnish.
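For readers who want to run a comparable test, here is a minimal Python sketch of a load test with the same shape (2500 requests, 40 concurrent workers). It is not the tool used for the graph above, which the writeup does not name; the function names and the result fields are my own assumptions, and threads only approximate "concurrent users".

```python
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url, timeout=10.0):
    """Fetch one URL; return (HTTP status or None on failure, elapsed seconds)."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()  # drain the body so the whole transfer is timed
            status = response.status
    except urllib.error.HTTPError as exc:
        status = exc.code  # 4xx/5xx responses still count as answered
    except OSError:
        status = None  # connection refused, timeout, DNS failure, ...
    return status, time.perf_counter() - start


def run_load_test(url, total_requests=2500, concurrency=40):
    """Issue total_requests GETs with at most `concurrency` in flight at once."""
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(fetch, [url] * total_requests))
    wall = time.perf_counter() - wall_start
    latencies = [elapsed for _, elapsed in results]
    return {
        "requests": len(results),
        "ok": sum(1 for status, _ in results if status == 200),
        "wall_seconds": wall,
        "requests_per_second": len(results) / wall,
        "mean_latency_seconds": sum(latencies) / len(latencies),
    }
```

Calling `run_load_test()` once against the bare backend, once against the Squid address and once against the Varnish address (three URLs you would supply yourself) gives three result dictionaries that can be compared side by side, in the same spirit as the three columns described above.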

Allow Varnish to reuse its shared object

BACKGROUND: The following is a proposal I submitted to the Varnish developers in order to make it simpler to integrate Varnish (an HTTP accelerator for web sites) into production environments. fak3r uses Varnish in front of its webserver, Lighttpd, so it’s likely that the page you’re now reading was served to you not by the webserver, but via Varnish.

Currently Varnish requires a C compiler to be present on the machine it’s running on, since it needs to compile the VCL config file into a shared object each time it starts. During shutdown, Varnish removes this shared object, since it will be rebuilt during the next start. This routine repeats regardless of whether anything has changed in the VCL config file, and it is a road bump to getting Varnish into certain production environments, since traditionally development applications (such as the C compiler) are not allowed in such instances. For now I am putting aside the arguments as to whether it is acceptable to have development applications in production instances, since that argument’s outcome will vary in different situations, and I am aiming for a solution that will cover all instances.
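One way to picture the reuse idea is a start-up check that rebuilds the shared object only when the VCL file has actually changed. The following is a minimal Python sketch of that check. It is not how Varnish behaves, and it is not necessarily the mechanism the proposal settles on; the function names, the digest side-file and every path are hypothetical.

```python
import hashlib
from pathlib import Path


def vcl_digest(vcl_path):
    """Return the SHA-256 hex digest of the VCL file's contents."""
    return hashlib.sha256(Path(vcl_path).read_bytes()).hexdigest()


def needs_recompile(vcl_path, shared_object_path, digest_path):
    """Return True when the kept shared object cannot be trusted for this VCL.

    That is the case when the shared object or the recorded digest is
    missing, or when the VCL file no longer matches the recorded digest.
    """
    shared_object = Path(shared_object_path)
    recorded = Path(digest_path)
    if not shared_object.exists() or not recorded.exists():
        return True
    return recorded.read_text().strip() != vcl_digest(vcl_path)


def record_compile(vcl_path, digest_path):
    """After a successful compile, remember which VCL contents were built."""
    Path(digest_path).write_text(vcl_digest(vcl_path) + "\n")
```

With a check like this, a machine without a C compiler could start from a shared object built elsewhere, provided the object and its digest file were shipped along with the unchanged VCL file. Whether a binary built on another machine is compatible is a separate question this sketch does not address.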