Tag Archives: apache

HOWTO sort web-server logs to find top users

The other day I came across a situation where a web-server was getting hammered, and we needed to know who the offend(ers) were. While watching a logfile tail by at high speeds is always fun, we wanted to be able to sort the web-server access log and find top users, to be able to narrow down where the traffic was coming from. While we don’t want to block users that want to access our data, sometimes we need to throttle things back so one requester doesn’t overwhelm all the available bandwidth and make the site unusable for others. So after some playing around and digging on Google, we came up with a nice, succinct one liner to do this, here it is:

cat /path/to/access.log | awk '{print $1}' | sort | uniq -c | sort -n | tail

HOWTO: serve jpeg2000 images with a scalable infrastructure

page1At the Biodiversity Heritage Library, we have replaced a proprietary jpeg2000 image server, that was straining under the load, with a new, open source jpeg2000 server, djatoka. Chris Freeland and Chris Moyers cover the background in far more detail on the BHL Blog, so here I’ll cover my rationale and decisions I made to provide a scalable, stable infrastructure to provide the images as efficiently as possible.

When I started sketching out how I wanted to run djatoka, I knew I wanted it to provide security, caching for performance and scalability and fault tolerance. Our server runs Tomcat, which I didn’t want to be public facing. Because of this I proxy Tomcat requests through Apache with the use of ajp_proxy, the successor to the old mod_jk. Initially I was using nginx in place of Apache, but after reading about all the functionality and performance improvements ajp_proxy offered, it was a no brainier; this is how to present Tomcat in a production environment.

HOWTO: Configure nginx for Debian / Ubuntu

nginxUPDATE: I’m reworking my config blending in the security ideas found on camomel.org they’re really thought things through on this, this should make for a very secure environment.

Software support must evolve with Open Source

SupportAs a IT contractor I’m enjoying giving my opinion when asked, and sometimes even when I’m not asked; I have the confidence to be open and honest with everyone and want them to know that. Because of this I’ve been getting to do things I otherwise would not have since they would not have known I was interested or experienced in such things. One of the things I was hired for was to setup Apache on Linux to work with their web instances. It’s been fun, and while I’ve used Apache for over 10 years, there’s always new things to learn. Recently they asked for my opinion on ‘support’ options for Apache. Keeping in mind, they already have support for the hardware and support for the Linux distribution, they still think they need another support channel for Apache. To me this a big waste of money and have somewhat made my case to them. While I understand their position that this support is a way to cover themselves if Apache ‘breaks’, the fact that this software is Open Source has to change the way they have traditionally considered support.