Tag Archives: performance

Distributing biodiversity data globally

My current project at work will take me far into next year, and that’s good because I’m facing an unprecedented amount of data that will only continue to grow.  Because of this I’m finally getting to put my money where my mouth is.  For years I’ve talked about my ideas and theories on how to network disparate systems together and have them leverage each other to keep everything in sync.  So, while working with Open Source to push boundaries, I keep finding ways to do more complex things.

One basic idea that I’m working on now starts from the fact that data sets are huge, and are only going to get huger (and hugerer) as time goes on.  How to handle this has been solved a few different ways.  Usually it’s someone like the Internet Archive, who have 1000s of computers networked together to share the data (they are using some parts of Hadoop for the distributed file system, and then Nutch for search indexing), but that still leaves one central point of failure.  I started doing research to find out how this has been solved before, and whether my idea of building a BitTorrent network was sound, and I found some great information to build on.  As I’m setting up my demo BitTorrent tracker in Debian, this info keeps me thinking of the best ways to implement my ideas.  Much of my progress is due to the very helpful advice of Paul at Geograph Torrent Archive, a project that has somewhat similar goals.

Reasons to use a web proxy in a production environment

NOTE: At work I installed a web proxy to separate internal user traffic from external traffic hitting our production servers.  While I’m not part of the network team, they asked me to do this because of my prior experience and interest in such things.  This was meant to be a temporary fix until a new line providing greater bandwidth is installed, but my argument is for continuing this segmentation even after the new line is in.  Below is a slightly sanitized version of my arguments.  My thoughts and comments are driven by years of running networks; it is something I care about and have spent years thinking about, so it is wordy.  I’d be very happy to discuss this, or other solutions, via the comments below because I never want to stop learning.

I’d like to share my thoughts on why I think the network is better served by keeping internal traffic and public traffic separate.  Whether you use the existing web proxy server or another one with a different network topology, I care less about the tool and more about making the network and user experience better for both internal and external users.