HOWTO: serve jpeg2000 images with a scalable infrastructure
At the Biodiversity Heritage Library, we have replaced a proprietary jpeg2000 image server, that was straining under the load, with a new, open source jpeg2000 server, djatoka. Chris Freeland and Chris Moyers cover the background in far more detail on the BHL Blog, so here I’ll cover my rationale and decisions I made to provide a scalable, stable infrastructure to provide the images as efficiently as possible.
When I started sketching out how I wanted to run djatoka, I knew I wanted it to provide security, caching for performance and scalability and fault tolerance. Our server runs Tomcat, which I didn’t want to be public facing. Because of this I proxy Tomcat requests through Apache with the use of ajp_proxy, the successor to the old mod_jk. Initially I was using nginx in place of Apache, but after reading about all the functionality and performance improvements ajp_proxy offered, it was a no brainier; this is how to present Tomcat in a production environment.
HOWTO: Configure nginx for Debian / Ubuntu
UPDATE: I’m reworking my config blending in the security ideas found on camomel.org they’re really thought things through on this, this should make for a very secure environment.
I’m always trying new software, and with the webserver I’ve moved from Apache 1.3 to 2.0 to 2.2, and then later I moved everything over to Lighttpd, which I’ve liked, save for some memory issues that popped up. Now, enter a web server named nginx (engine x), written by a Russian hacker. It’s already proved it’s meddle by running some of the largest Russian sites for years now. It has the speed of Lighttpd, but with none of that memory weirdness, plus it uses a fraction of the CPU, so scaling should be smooth for highly visited sites. It also does cool things like load balancing, reverse proxy, IMAP and POP proxy, etc, so I can see it being used in a variety of ways on a network. It took me some time to understand how to configure it, which was a case of me just making it harder than it really is, so I wanted to post it here. Look for updates as we go along, but this is currently backing a Production site I manage.
user www-data www-data;
worker_processes 5;
pid /var/run/nginx.pid;
events {
worker_connections 1024;
}
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
log_format main '$remote_addr $host $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" "$http_user_agent" '
'"$request_time" "$gzip_ratio"';
access_log /var/log/nginx/access.log main;
error_log /var/log/nginx/error.log;
sendfile on;
tcp_nopush on;
tcp_nodelay off;
keepalive_timeout 65;
gzip on;
gzip_http_version 1.1;
gzip_vary on;
gzip_comp_level 6;
gzip_buffers 16 8k;
#gzip_proxied expired no-cache no-store private auth;
gzip_proxied any;
gzip_min_length 1000;
gzip_types text/plain text/html text/css application/json application/x-javascript
text/xml application/xml application/xml+rss text/javascript;
server {
listen 80;
client_max_body_size 50M;
server_name server.domain.com;
root /var/www;
index index.html index.php;
access_log /var/log/nginx/access.log main;
error_page 500 502 503 504 /500.html;
location = /500.html {
root /var/www;
}
location ~* ^.+.(jpg|jpeg|gif)$ {
root /var/www;
expires 30d;
}
location ~ \.php$ {
include /etc/nginx/fastcgi_params;
fastcgi_pass 127.0.0.1:9000;
fastcgi_index index.php;
fastcgi_param SCRIPT_FILENAME /var/www$fastcgi_script_name;
fastcgi_param QUERY_STRING $query_string;
fastcgi_param REQUEST_METHOD $request_method;
fastcgi_param CONTENT_TYPE $content_type;
fastcgi_param CONTENT_LENGTH $content_length;
}
}
}Software support must evolve with Open Source
As a IT contractor I’m enjoying giving my opinion when asked, and sometimes even when I’m not asked; I have the confidence to be open and honest with everyone and want them to know that. Because of this I’ve been getting to do things I otherwise would not have since they would not have known I was interested or experienced in such things. One of the things I was hired for was to setup Apache on Linux to work with their web instances. It’s been fun, and while I’ve used Apache for over 10 years, there’s always new things to learn. Recently they asked for my opinion on ’support’ options for Apache. Keeping in mind, they already have support for the hardware and support for the Linux distribution, they still think they need another support channel for Apache. To me this a big waste of money and have somewhat made my case to them. While I understand their position that this support is a way to cover themselves if Apache ‘breaks’, the fact that this software is Open Source has to change the way they have traditionally considered support.








