
Distributing biodiversity data globally

My current project at work will take me far into next year, and that’s good, because I’m facing an unprecedented amount of data that will only continue to grow. Because of this, I’m finally getting to put my money where my mouth is. For years I’ve talked about my ideas and theories on how to network disparate systems together and have them leverage each other to keep everything in sync. Working with Open Source to push those boundaries, I keep finding ways to do more complex things. The basic idea I’m working on now starts from the fact that data sets are huge, and are only going to get huger (and hugerer) as time goes on. This problem has been solved a few different ways. Usually it’s someone like the Internet Archive, who have 1000s of computers networked together to share the data (they use parts of Hadoop for the distributed file system, and Nutch for search indexing), but it’s still all working from one central point of failure.

I started doing research to find out how this has been solved before, and whether my idea of building a BitTorrent network was sound, and I found some great information to build on. As I set up my demo BitTorrent tracker in Debian, this info keeps me thinking about the best ways to implement my ideas. Much of my progress is due to the very helpful advice of Paul at Geograph Torrent Archive, a project with somewhat similar goals.
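For anyone curious why BitTorrent avoids the single point of failure, here’s a minimal sketch (not my actual setup) of how one data set could be described for a tracker. It assumes the single-file metainfo layout from the original BitTorrent spec: the file is cut into fixed-size pieces, each piece gets a SHA-1 hash, and the bencoded `info` dictionary’s SHA-1 becomes the "info hash" the tracker uses to match up peers. Because every peer can check each piece it downloads against those hashes, it doesn’t matter which machine the bytes came from. The file name, the tracker URL `http://tracker.example.org:6969/announce`, and the 256 KiB piece size are hypothetical placeholders.

```python
import hashlib


def bencode(value) -> bytes:
    """Encode ints, str/bytes, lists and dicts using BitTorrent bencoding."""
    if isinstance(value, bool):
        raise TypeError("booleans are not bencodable")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings, sorted in raw byte order.
        items = sorted(
            (k.encode("utf-8") if isinstance(k, str) else k, v)
            for k, v in value.items()
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def piece_hashes(data: bytes, piece_length: int) -> bytes:
    """Concatenate the 20-byte SHA-1 digest of each fixed-size piece."""
    return b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )


def make_metainfo(name: str, data: bytes, announce: str,
                  piece_length: int = 256 * 1024) -> dict:
    """Build a single-file metainfo dict (what a .torrent file contains)."""
    info = {
        "name": name,
        "length": len(data),
        "piece length": piece_length,
        "pieces": piece_hashes(data, piece_length),
    }
    return {"announce": announce, "info": info}


def info_hash(metainfo: dict) -> str:
    """Hex SHA-1 of the bencoded info dict; the tracker keys swarms on this."""
    return hashlib.sha1(bencode(metainfo["info"])).hexdigest()


if __name__ == "__main__":
    # Hypothetical data set and tracker, for illustration only.
    dataset = b"scientificName,decimalLatitude,decimalLongitude\n" * 10000
    meta = make_metainfo("occurrences.csv", dataset,
                         "http://tracker.example.org:6969/announce")
    print(len(bencode(meta)), "bytes of metainfo")
    print("info hash:", info_hash(meta))
```

Writing `bencode(meta)` to a `.torrent` file and seeding it from a couple of machines is enough for other peers to join. After that, every mirror that downloads the data also serves it back out.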

Nomina meeting in Woods Hole

Woods Hole, MA