look out honey 'cause I'm using technology

Posts Tagged ‘unix’

File system full, but why?

0101010101UPDATE: posted my workaround code below, good feedback already from Ryan (djatoka dev) and I’ll be testing the proper fix on the server soon.
I’ve got a server that keeps filling up its disk space and failing to serve images after it gets to the file system full error message.  First of all let me say, I don’t blame it in the least, if the admin (aka me) doesn’t do enough to secure the server enough disk space to do its job, I say, let me have it.  But after I’ve set the suspect daemon to use a *reasonable* amount of space I stopped thinking of it as the culprit, so when this issue arose again, I looked elsewhere for the cause.  Fast forward to today, the server’s file system filled up again, and refused to serve any more data, again, I totally understand where the server is coming from, if it doesn’t have enough disk space to do its job, it shouldn’t have to apologize to anyone; it’s all on the admin (again, aka, me), but what was going on? (more…)


Resolving LSIDs with URL resolvers and CouchDB

346483297_c4cb93ab4e_mRecently I’ve been looking at ways to solve some of biodiversities’ long standing issues with LSIDs, which are, “Life Science Identifiers are a way to name and locate pieces of information on the web. Essentially, an LSID is a unique identifier for some data, and the LSID protocol specifies a standard way to locate the data (as well as a standard way of describing that data). They are a little like DOIs used by many publishers.“  I posted my thoughts to the TDWG discussion mailing list on the topic, and am reprinting it here to allow for further community commentary; Code4lib, I’m looking at you. While much of it is theoretical, it is doable, and if it covers all that needs to be addressed, would be a cool, sustainable way forward for link resolvers for all kinds of usage.

I’m with Tim on this one, and taking one of Rod’s other posts (“LSIDs, disaster or opportunity“) a bit further, I think coming up with a simple, extend-able URL resolver would give us many benefits and allow LSIDs with extra, added information around them for all to use. Looking at his example, a URL would get permanent tracking that would also post referrers, location and traffic. A summary of the link could even be a page in itself, a cached version, a screenshot, or just a scrape of the code – pulling out the HTML tags, for future reference in case the real link goes down. We could use the ability to create a customizable prefix (ie- http://someresolvr.com/bhl/SDFoijF), to somewhat follow DOI conventions, but could even save old DOIs or handles for historical purposes in a field attached to the new URL, or for reuse, making the new URL resolve to a current DOI with a simple post at the end of the new URL (ie- http://someresolvr.com/bhl/SDFoijF/DOI). In the same way we could use user input, data pulled about the URL semantically to generate RDFa (by using pyRdfa), then exposing that for all newly created URLS, and coming up with a standard to make it predictable (ie- http://someresolvr.com/bhl/SDFoijF/RDF). The example at bit.ly shows the use of Open Calais to get more background information on the original link to provide more information, but it could also be pointed to other services we provide/use in biodiversity to provide a snapshot across the board of more context/content. Users of the service could login to examine/add/edit the data by hand if desired, so they would still retain ultimate control over how their record is presented. Thus, from a simple URL, we could build a complete summary that would build on what we’re given while sharing it all back out.

Then the architecture (aka, the fun part) would be simple and distributed. A webserver able to process PHP, running the database CouchDB would be all that is needed to run the resolver. CouchDB is schema-less, so the way it handles replication is very simple, and is built to be distributed, only handing out the bits that have changed during replication, as well as scale in this manner. Having a batch of main servers behind a URL in a pooled setup (think of a simplified/smaller version of the Pool of Unix networked time servers) would allow a round-robin DNS, or a ucarp setup (“urcarp allows a couple of hosts to share common virtual IP addresses in order to provide automatic failover“), so if one main server went down, another would automatically take over, without the user needing to change the URL. Plus, if we wanted to, to battle heavy usage of the main servers we could use the idea of Primary and Secondary servers as outlined in the pool.ntp.org model, so an institution with heavy usage could become a Secondary host and run their own resolver simply, with almost no maintenance. They would just need the PHP files, which would be a versioned project, and then have a cron task to replicate the database from a pool of the main servers. The institution’s resolver could be customized to appear as their own, (ie- http://someresolvr.bhl.org/bhl/SDFoijF) and for simplicity could be read-only. This way a link like http://someresolvr.com/bhl/SDFoijF could be resolvable against any institution’s server, like http://someresolvr.bhl.org/bhl/SDFoijF or http://someresolvr.ebio.org/bhl/SDFoijF – as all of the databases would be the same, although maybe a day behind, depending on the replication schedule. New entries would only be entered on a main server, or in ‘the pool’ (ie- http://pool.someresolvr.com/), then those changes would be in the database to be handed out to all on the next replication (I won’t add my P2P ideas in this email – it may not be needed for the deltas that would need to be transfered daily or weekly). Add to all of this that CouchDB is designed as “…a distributed, fault-tolerant and schema-free document-oriented database” which would fit into what we want to do; build a store of documents (data) about a URL that we can serve, while being a permanent, sustainable resolver to the original document. If the service ever died, it could be resurrected from anyone’s copy of the database (think LOCKSS (Lots of Copies Keep Stuff Safe)), so that no data (original or accumulated) would be lost. The data could be exported from the database in XML, and then migrated from that to a desired platform.

I have not been dealing with LSIDs as long as most on this list so I expect I’m glossing over (or missing) some of the concepts, so please let me know what I am lacking. This is a needed service, and is a project I’d like to be involved in building.


How to become a hacker

Glider - ESR's hacker emblem

The Glider: A Universal Hacker Emblem

There has long been a movement in the geek community to expunge the negative thoughts attached to the word hacker, the image to the right The Glider, being one of the latest and most visible. In the beginning there were hackers (people who worked on computers, programmed and made things work) and crackers (people who would use computers for nefarious purposes, crimes, viruses, etc), these were two distinct camps, with some miscreants jumping the fence back and forth to confuse the issue. Regardless, somewhere along the way popular culture (movies, news, your teachers probably) began to equate hacking as being the bad, crime ridden activity that cracker was supposed to cover. I think it’s a moot point now, as even my Dad was shocked when he learned my annual DefCon trip is billed as “largest hacking conference in the world”.  I gave him the above explanation, but I’m unsure if he really believes it. Regardless, the original “How to become a hacker” paper written by Eric S Raymond is always cited as the quintessential word on the use of the word hacker. I found it mirrored online, and it’s a worthwhile read if you have any interest in the topic, or want to cement your own views of your hobby.  For now, if you don’t want to read the entire verbiage, here’s the intro to learn and take with you. (more…)


HOWTO: Configure nginx for Debian / Ubuntu

nginxUPDATE: I’m reworking my config blending in the security ideas found on camomel.org they’re really thought things through on this, this should make for a very secure environment.

I’m always trying new software, and with the webserver I’ve moved from Apache 1.3 to 2.0 to 2.2, and then later I moved everything over to Lighttpd, which I’ve liked, save for some memory issues that popped up.  Now, enter a web server named nginx (engine x), written by a Russian hacker. It’s already proved it’s meddle by running some of the largest Russian sites for years now.  It has the speed of Lighttpd, but with none of that memory weirdness, plus it uses a fraction of the CPU, so scaling should be smooth for highly visited sites.  It also does cool things like load balancing, reverse proxy, IMAP and POP proxy, etc, so I can see it being used in a variety of ways on a network.  It took me some time to understand how to configure it, which was a case of me just making it harder than it really is, so I wanted to post it here.  Look for updates as we go along, but this is currently backing a Production site I manage.

user					www-data www-data;
worker_processes  			5;
pid 					/var/run/nginx.pid;
events {
worker_connections 1024;
}
http {
include				/etc/nginx/mime.types;
default_type			application/octet-stream;
log_format main 		'$remote_addr $host $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" "$http_user_agent" '
'"$request_time" "$gzip_ratio"';
access_log			/var/log/nginx/access.log  main;
error_log			/var/log/nginx/error.log;
sendfile 			on;
tcp_nopush        		on;
tcp_nodelay      		off;
keepalive_timeout		65;
gzip				on;
gzip_http_version		1.1;
gzip_vary			on;
gzip_comp_level 		6;
gzip_buffers			16 8k;
#gzip_proxied			expired no-cache no-store private auth;
gzip_proxied 			any;
gzip_min_length			1000;
gzip_types			text/plain text/html text/css application/json application/x-javascript
text/xml application/xml application/xml+rss text/javascript;
server {
listen			80;
client_max_body_size	50M;
server_name 		server.domain.com;
root 			/var/www;
index  			index.html index.php;
access_log  		/var/log/nginx/access.log  main;
error_page   		500 502 503 504  /500.html;
location = /500.html {
root		/var/www;
}
location ~* ^.+.(jpg|jpeg|gif)$ {
root		/var/www;
expires         30d;
}
location ~ \.php$ {
include /etc/nginx/fastcgi_params;
fastcgi_pass 127.0.0.1:9000;
fastcgi_index index.php;
fastcgi_param SCRIPT_FILENAME /var/www$fastcgi_script_name;
fastcgi_param QUERY_STRING $query_string;
fastcgi_param REQUEST_METHOD $request_method;
fastcgi_param CONTENT_TYPE $content_type;
fastcgi_param CONTENT_LENGTH $content_length;
}
}
}

We like








We support


EFF - Electronic Frontier Foundation       TOR - The Onion Router       HRC - Human Rights Campaign