look out honey 'cause I'm using technology

HOWTO build your own open source Dropbox clone

UPDATE: Thanks to everyone who has contributed to this, and to the Reddit thread, which has provided some great ideas building off of my concept. I’m starting to think about how we could layer version control on top of this, and I’ll update the post when I have more to share. Also, does anyone have iFolder working (thanks to salubrium for the proper link)? It looks like you need SUSE Linux, which I don’t have access to, and I know most Novell projects need a *ton* of Mono dependencies installed to work, at least on the server side; but it sounds like they have Mac, Linux and Windows clients, which is encouraging. While for my needs something a bit more ‘close to the bone’ (as below) might be better on the server side, making it interoperable with something like iFolder could provide a lot more functionality for others.

First off, if you haven’t tried Dropbox, you should check it out. It syncs all of your computers via the Dropbox servers; the basic free service gives you 2 GB of space and works cross-platform (Windows, Mac, Linux). I use it daily at home and at work, and having a live backup of my main data across my work workstation, my home netbook, and any other computer I need to log in to is a huge win. Plus, I have various ‘shared’ folders that distribute certain data to users I’ve granted access to, so work details can be updated and automatically distributed to the folks I want to review or use the data. I recommend everyone try it out and see how useful it is; it’s turned into a game changer for me. A few months ago they made headlines for supporting Linux when they released the client as open source. While this got many people’s hopes up, only the client is open source; the server is still proprietary. That’s slightly disappointing, but fine: they’re a company trying to make money, and I don’t fault them for it. It’s just that a free, portable service like that would be a killer app.

Meanwhile, at work I’m building a solution to sync large data clusters online, and the project manager described it as ‘Dropbox on steroids’. At first I thought it was more complicated than that, but after thinking about it, I realized he was right. Dropbox is a great idea, but at its core it’s essentially rsync, combined with something that watches for file changes to trigger the sync, plus an easy-to-use front end. From there I started looking at ways this could work, and there are more than a few; here’s how I made it work.

Linux now includes inotify, a kernel subsystem that provides file system event notifications. From there, all it took was finding an application that listens to inotify and kicks off a command when it hears of a change. I tried a few different applications, such as incron, inosync and iwatch, before settling on lsyncd. While all of them could work, lsyncd seemed the most mature, the simplest to configure, and the fastest. Lsyncd uses inotify to watch a specified directory for new, edited or removed files and directories, and then calls rsync to take care of business. So let’s get started making our own open source Dropbox clone with Debian GNU/Linux (lenny).
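
The idea described above, a watcher that hears about changes and hands them to rsync, can be sketched in a few lines of shell. This is only an illustration under assumptions: it uses `inotifywait` from the inotify-tools package (not one of the tools named above, and not how lsyncd works internally), REMOTE_SERVER and the paths are placeholders, and it naively resyncs the whole tree on every event, so lsyncd remains the better choice in practice.

```shell
#!/usr/bin/env bash
# Rough hand-rolled version of the "watch, then rsync" loop that lsyncd automates.
# Assumes inotifywait (inotify-tools package) and passwordless SSH; SRC and DEST
# are placeholders to adjust.
SRC=/var/www/sync_test/
DEST=REMOTE_SERVER:/var/www/sync_test/

# Push the whole tree: -l links, -t times, -r recursive, --delete removes
# files on DEST that no longer exist in SRC (same flags as the lsyncd config).
sync_once() {
  rsync -ltr --delete "$SRC" "$DEST"
}

watch_and_sync() {
  sync_once  # initial full sync
  # -m: keep monitoring, -r: watch subdirectories, -q: print only events.
  # Every reported event triggers another full sync (crude, but simple).
  inotifywait -m -r -q -e close_write,create,delete,move "$SRC" |
    while read -r _event; do
      sync_once
    done
}
```

Running `watch_and_sync` keeps it in the foreground until you press Ctrl-C.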

Ladies and gentlemen, start your engines... er, servers!

First, you need two servers: one acting as the server and the other as the client. (You could do this on one host if you just want a proof of concept to see how it works.)

Install OpenSSH server

First you’ll need to install OpenSSH Server on the remote system:
apt-get install openssh-server

Configure SSH for Passwordless Logins

You’ll need to configure passwordless logins between the two hosts, since this is how rsync will pass the files back and forth. I’ve previously written a HOWTO on this topic, so we’ll crib from there.

First, generate a key:

ssh-keygen -t rsa

(Enter)

That accepts the default key location, ~/.ssh/id_rsa. You shouldn’t have a key stored there yet, but if you do, it will prompt you now; make sure you overwrite it.

Enter passphrase (empty for no passphrase):

(Enter)

Enter same passphrase again:

(Enter)

We’re not using a passphrase so that logins can be automated. This should only be done for scripts or applications that need this functionality; it’s not for lazily logging into servers, and it should not be done as root!
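
For scripted setups, the same prompts can be answered on the command line. A minimal sketch: `make_sync_key` is a hypothetical helper, `-N ''` supplies the empty passphrase and `-f` the key file, and ssh-keygen will still stop to ask before overwriting a key that already exists.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a passphrase-less RSA key pair without prompts.
# Defaults to ~/.ssh/id_rsa, the same location the interactive run uses.
make_sync_key() {
  local keyfile=${1:-$HOME/.ssh/id_rsa}
  # Create the key directory if needed, readable only by its owner
  mkdir -p -m 700 "$(dirname "$keyfile")"
  # -q: quiet, -t rsa: key type, -N '': empty passphrase, -f: output file
  ssh-keygen -q -t rsa -N '' -f "$keyfile"
}
```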

Now, replace REMOTE_SERVER with the hostname or IP that you’re going to call when you SSH to it, and copy the key over to the server:

cat ~/.ssh/id_rsa.pub | ssh REMOTE_SERVER 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'

Set the permissions to a sane level:

ssh REMOTE_SERVER 'chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys*'

Lastly, give it a go to see if it worked:

ssh REMOTE_SERVER

You should be dropped to a prompt on the remote server without being asked for a password. If not, you may need to redo your .ssh directory, so on both servers run:

`mv ~/.ssh ~/.ssh-old`

and goto 10 (that is, start over from generating the key).
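
One catch with testing interactively is that ssh may quietly fall back to asking for a password, which is easy to mistake for success. A minimal non-interactive check, assuming OpenSSH's `BatchMode` option (which makes ssh fail instead of prompting); `check_passwordless_ssh` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if key-based login to host $1 works.
# BatchMode=yes disables password/passphrase prompts, so ssh fails instead of asking.
check_passwordless_ssh() {
  local host=$1
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
    echo "passwordless login to $host works"
  else
    echo "passwordless login to $host failed; check ~/.ssh permissions and keys" >&2
    return 1
  fi
}
```

Usage: `check_passwordless_ssh REMOTE_SERVER` from the client.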

Install rsync and lsyncd

Next up is installing rsync and lsyncd. rsync is simple and may already be installed. You don’t need to run it as a daemon, just the client, but it does need to be present on both machines, since rsync over SSH starts rsync on the remote end too. Make sure you have it with:

apt-get install rsync
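
Because rsync over SSH launches rsync on the far end as well, it can save some head-scratching to confirm both sides have it before going further. A small sketch; `require_rsync` is a hypothetical helper, and it assumes the passwordless SSH setup from the previous step.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that rsync exists locally and on host $1.
require_rsync() {
  local host=$1
  if ! command -v rsync >/dev/null 2>&1; then
    echo "rsync is not installed locally" >&2
    return 1
  fi
  # 'command -v' runs in the remote user's shell; BatchMode avoids password prompts
  if ! ssh -o BatchMode=yes "$host" 'command -v rsync' >/dev/null 2>&1; then
    echo "rsync not found on $host (or SSH login failed)" >&2
    return 1
  fi
  echo "rsync present on both ends"
}
```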

Next is lsyncd. There is no official Debian package yet, but it’s simple to build from source and install. If you don’t have the build-essential package you’ll need it, as well as libxml2-dev to build the lsyncd source. Installing those is as simple as:

apt-get install libxml2-dev build-essential

Now we’ll get the lsyncd code (you can check for a newer version at http://lsyncd.googlecode.com) and build that:

wget http://lsyncd.googlecode.com/files/lsyncd-1.26.tar.gz
tar -zxf lsyncd-1.26.tar.gz
cd lsyncd-1.26
./configure
make && make install

make install does not install the configuration file, so we’ll copy it manually now:

cp lsyncd.conf.xml /etc/

Configure lsyncd

Next, we’ll edit the configuration file now located in /etc. The file is simple, well-documented XML, and mine ended up like this; just be sure to change the source and target hosts and paths to match your systems:

<lsyncd version="1.25">
    <settings>
        <logfile filename="/var/log/lsyncd"/>
        <!--Specify the rsync (or other) binary to call-->
        <binary filename="/usr/bin/rsync"/>
        <!--uncomment to create a file containing pid of the daemon-->
        <!--pidfile      filename="/tmp/pid"/-->
        <!--this specifies the arguments handed to the rsync (or other) binary.
        option is the default literal. only '%r' will be replaced with r when recursive
        operation is wanted, d when not. exclude-file will be replaced with the
        exclude-from FILE option. source will be the source path to sync from,
        destination will be the destination path to sync to -->
        <callopts>
            <option text="-lt%r"/>
            <option text="--delete"/>
            <exclude-file/>
            <source/>
            <destination/>
        </callopts>
    </settings>
    <directory>
        <source path="/var/www/sync_test"/>
        <!-- host:/path (one colon) syncs over SSH, as set up above;
        host::module (two colons) would require a running rsync daemon -->
        <target path="desthost:/var/www/sync_test/"/>
        <!-- or it can also be an absolute path for localhost
        <target path="/absolute/path/to/target"/> -->
    </directory>
</lsyncd>
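
The comment inside `<callopts>` describes how lsyncd builds each rsync call. As a rough illustration based only on that comment (a hypothetical helper, not lsyncd's own code), the expansion for this config looks like this:

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the <callopts> comment in the config:
#   "-lt%r"          -> "-ltr" for a recursive sync, "-ltd" otherwise
#   <exclude-file/>  -> "--exclude-from FILE" (only if an exclude file is set)
#   <source/>, <destination/> -> the two paths
# Prints one argument per line.
lsyncd_style_args() {
  local recursive=$1 src=$2 dst=$3 exclude=${4:-}
  local letter=d
  if [ "$recursive" = 1 ]; then
    letter=r
  fi
  printf '%s\n' "-lt${letter}" --delete
  if [ -n "$exclude" ]; then
    printf '%s\n' --exclude-from "$exclude"
  fi
  printf '%s\n' "$src" "$dst"
}
```

To repeat a sync by hand while debugging, something like `mapfile -t args < <(lsyncd_style_args 1 /var/www/sync_test/ desthost:/var/www/sync_test/); rsync "${args[@]}"` runs rsync with the same arguments (mapfile needs bash 4 or later).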

Launch lsyncd in debug for testing

We’re ready to give it a go. We may as well run it in debug mode, for fun and to learn how lsyncd does what it does:

lsyncd --conf /etc/lsyncd.conf.xml --debug

Watch for errors; if there are none, continue.

Add files and watch them sync

Now we just need to copy some files into this directory on the source box:

/var/www/sync_test

Again, watch for any errors on the screen. If they come back as a failed connection, it’s an SSH/key issue: common, and not too difficult to solve. From here, add some directories, watch how they’re queued up, and then take a look at them on the remote box; from this point on it “just works”. Now give it more to do by adding more files and directories, and watch the log for errors while they sync.

As it stands, the system treats the source system as the authoritative copy, so any file that is changed, added or removed on the source is changed, added or removed on the remote system too. This is analogous to how Dropbox works: you can use multiple sources (your laptop, your desktop, etc.), and their server serves as the remote system, keeping all the clients in line.
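
Eyeballing the remote box works for a few files; for more, a quick comparison helps. A sketch under two assumptions: for a same-host proof of concept (the absolute-path target mentioned in the config) plain `diff -r` is enough, and for a real remote target an rsync dry run (`-n`) with itemized output (`-i`) lists anything still pending. Both helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a local source tree with a local mirror.
# diff -r compares contents recursively and exits 0 only if the trees match.
verify_mirror() {
  local src=$1 mirror=$2
  if diff -r "$src" "$mirror" >/dev/null 2>&1; then
    echo "in sync"
  else
    echo "out of sync"
    return 1
  fi
}

# Hypothetical helper for a remote target: a dry run (-n) with itemized output (-i)
# lists what rsync would still transfer or delete; little or no output suggests
# the trees already match.
remote_pending_changes() {
  rsync -ltrni --delete "$1" "$2"
}
```

Usage: `remote_pending_changes /var/www/sync_test/ desthost:/var/www/sync_test/`.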

Conclusion

You should now have a basic, working Dropbox-style setup for your own personal use. I had this running and used it to sync my netbook to my home server, and had my work desktop sync to my home server as well, so both the netbook and the desktop stayed in sync without me doing anything besides putting files in the specified folder. For my week-long test I ran a directory alongside my Dropbox directory just to see how they both behaved, and I didn’t have any failures along the way.
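
With the config above, lsyncd only pushes from the machine it runs on to the server, so for changes made on one client to reach another, something has to pull from the server. One way to close that gap, sketched here under assumptions (REMOTE_SERVER and the paths are placeholders, `pull_from_server` is a hypothetical helper), is a periodic pull, for example from cron, along the lines the author suggests in the comments.

```shell
#!/usr/bin/env bash
# Hypothetical helper for a second client: pull newer files down from the server.
# -l links, -t times, -r recursive, -u (--update) skips files that are newer
# on this machine, so unpushed local edits are not overwritten.
# No --delete here: files removed on another client are NOT removed locally.
pull_from_server() {
  local server_dir=$1 local_dir=$2
  rsync -ltru "$server_dir" "$local_dir"
}

# Example crontab line (every 5 minutes; REMOTE_SERVER and paths are placeholders):
# */5 * * * * rsync -ltru REMOTE_SERVER:/var/www/sync_test/ /var/www/sync_test/
```

A caveat: because the lsyncd push uses --delete, a machine that pushes before it has pulled can delete files another machine added to the server. Pulling first narrows that window but doesn't remove it; two-way tools such as Unison, mentioned in the comments, are designed for this case.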

What we now have is a simple, lightweight Dropbox-style app with a functional back end running rsync, a known, stable application that scales. It doesn’t provide the front end and web view that Dropbox does, but that could be an easy part for a UX developer to tackle. The cool thing is that we have a solution that works, and other options, like the apps I mentioned at the beginning, can be dropped in to replace lsyncd if they turn out to do something better.

For now, I’m playing around with it to learn the ins and outs of the system and see how it behaves long term with a much larger store (50 GB to start). I will also work on better integrating this solution into a working system, and update this post with init scripts, reports, or maybe even a web view beyond a plain Apache or nginx directory index. Ideally we could have a web front end that intelligently reports whether a file is complete on the server, and whether it is completely mirrored on another server or client. P2P or BitTorrent would also be really cool to consider with this, and I’m sure more applications for a setup like this will turn up once we’ve had it around as a resource for a while. Can you think of more applications for this? Did you get it to work? Can you think of a better way to do this?



  • I too would be very interested in some form of a Windows compatible client.
    Have there been any advances in this area?
  • chongopants
    Oh yeah, lsyncd is not available at the URL given, btw.
  • chongopants
    Hey man, awesome how-to. I am trying to do the same thing but on a remote server to Win/Mac. Most of us are on Mac, I dunno what you're using, but once you set this up, how do you set up the clients? Right now I am rsyncing everything up, but once done I wanna put it all on there. I have found the problem with other services like WebDAV, etc. is that they tend to freak out with large files, or large numbers of files... and everything else is lacking in speed. All help is appreciated.
  • pantro
    apt-get install build-essential not build-essentials
  • Hi
    I am looking for someone to install and configure this; I saw you wrote about this solution.

    We are a company based on remote working.
    We would like to have lsyncd or any kind of "Dropbox" clone installed and configured on our servers,
    to be used by our workers and outside people related to us.

    I would appreciate it if someone could get back to me
    regarding a price quote and terms.

    best regards,
    Yaniv
    yanivhanina@gmail.com
  • I just ITPed lsyncd, see http://bugs.debian.org/cgi-bin/bugreport.cgi?bu... that will make your setup even easier with Debian.
  • gramps
    Thanks for your research on this. I have your Dropbox clone working now, with the data from my laptop going to my server. I want this data to also exist on my netbook. If I set the netbook to use the same directories on the server as the laptop, will it pick up the data on the server and send it to my netbook?
  • akaihola
    A few notes:

    - Unison provides two-way synchronization and has both a command-line and a GTK user interface

    - there are issues with rsync on Cygwin; it's difficult to find exact versions which work flawlessly

    - git won't be good for very large datasets
  • Got some coverage at Linux Magazine: http://www.linux-magazine.com/Online/Blogs/Prod...
  • Adam
    Did you look into the multiple client idea? If I made a change on one client and it gets synced to the server, how would my other client know about it? I'd like to use this to sync some files between my iMac with my MacBook, so any ideas of how to set it up would be great!
  • The other client would know about it when it rsync'd to the server and found newer things that it didn't have. While this wouldn't happen automatically, it would happen the next time the other client had a file change locally, or if it called it via cron every so often. Not ideal, but it would work. The rsync command just always targets the "server" as the authoritative copy unless something on the client has changed earlier.
  • Athas
    Is there a WebDAV client that handles caching and synchronization of mounted WebDAV servers? If so, setting up subversion as an HTTP server and enabling autoversioning would seem to fix almost all of the problems.

    It wouldn't be true version control, because the last change would be committed without reporting potential conflicts, but it's pretty close.
  • georgygoshin
    Didn't you look at Novell iFolder? It's cross-platform and handles synchronization, but doesn't deal with versions.
  • Didn't check the whole comment thread but versions could be realised by taking a look at rsnapshot. Just an idea...
  • Yes, I still need to investigate how rsnapshot could fit into this, thanks for reminding me.
  • As git seems like a complicated cannon to me, rsnapshot can simply be used as an rsync wrapper/replacement whenever you fire off your sync. rsnapshot simply wraps around rsync to create snapshots instead of simple syncs; every snapshot is complete, via hard links.
  • Cool, this sounds ideal, plus lsyncd is sort of a wrapper, or at least a watcher, that calls rsync anytime it sees action in a watched directory. We'll now have the disk space to store versioning info, so this seems like a good fit. Thanks
  • Push a link to this article to the rsnapshot mailing list and ask for the best integration. Or better: find out how to replace the rsync calls in lsyncd with the appropriate rsnapshot calls, which will then fire the rsync calls.
  • Jordan
  • Cool idea, although I didn't see whether this was possible to set up using Windows on the server.
  • No, since my way is using inotify which is built into the Linux kernel. However, as mentioned above, kqueue provides the same functionality under OS X and FreeBSD, so it's likely there's a similar utility in Windows. After you discover that, maybe we can figure out a way to utilize rsync in a similar fashion. I suspect installing cygwin (http://sources.redhat.com/cygwin/) would make this task far easier, but I'm sure a Windows guru could do this within Windows only, I don't really know.
  • Jordan
    Hey there,

    just wanted to let you know there is an Ubuntu installation guide for iFolder which may be of help. There are quite a few Mono dependencies though!

    https://help.ubuntu.com/community/iFolderEnterp...
  • This is pretty neat. Good idea, really. I hope this is going to grow and I'd be happy to help out in case you want to start an official project out of it.
  • Do any of the linux tools do the equivalent of dedup while transferring the blocks? I think dropbox does: http://sharevm.wordpress.com/2009/09/17/dropbox...
  • In theory this looks good and I haven't tried it out yet. Would be nice to have version control in there too, so adding an SVN commit command onto the hook wouldn't be a bad idea if you wanted it - this also might help solve the problem of conflict resolution.
    Also, there's no Windows client which is a slight shame - although I'm sure some stuff could be done with cygwin etc.
  • git and incron on the server side may be an option for your versioning idea, see http://andrew.mcmillan.net.nz/blog/using_incron...
  • salubrium
    Novell had a product called ifolder, which they open sourced and I used it for about 2 years but it was a badly managed project and clients ended up not building on it. When Dropbox came out, I swapped to that but it seems that somebody has picked up on ifolder again and it's an active project. Ifolder does exactly what dropbox does including versioning etc. It was a mono-based project and written in C#, so that might turn a few people off it but check it out. http://www.kablink.org/
  • I did find this while I was doing research, but it looked partially abandoned. Additionally, I've worked on a Novell-sponsored project before, Hula, and was not happy with how the project was run before it was eventually dropped/sold. Still, if this code works and is being developed, I think it looks very promising, plus the web interface would be a boon to others who don't want to get their hands dirty ;) Will be installing this soon and trying it out.
  • salubrium
    You're right about the whole heap of mono dependencies for ifolder. It was a pain in the ass to build and get working and it got to the stage that it became all too hard for me to work with. One thing I really wish Dropbox would do is to detect other Dropbox clients on the local network and synch directly to them and one of them synch to the server. As it is now, I think Dropbox synchs to the server and then syncs the same set of files back to each of the clients in the same network. This is unnecessary web traffic.

    I forgot that I actually went from using iFolder to Mindquarry, which was a Java-based OSS project using SVN as its source control. It also had a few other bits attached, i.e. a wiki and tasks. Check-ins were manual, but the timeline versioning system was great. The company couldn't get its second round of funding and the founders had to go and get jobs. I haven't used it or tried to build it since then. http://code.google.com/p/mindquarry/
  • Jakob Stoeck
    "I really wish Dropbox would do is to detect other Dropbox clients on the local network and synch directly"

    Dropbox actually does this (I think two months ago it was in the beta version)
  • Check it out, the new version of Dropbox includes:

    - LAN sync

    I think that's what we're talking about here.
    http://forums.dropbox.com/topic.php?id=15519&re...
  • I believe this is a coming feature; I could really see it being useful for sharing documents in an office. We already do this, but don't have anything larger than Word/PowerPoint docs, so those transfer up and back pretty quickly. Still, a Bonjour-type service would be very cool, for example making it a 'read-only' server for local clients that aren't hooked into Dropbox.
  • furicle
    But one of the big benefits to Dropbox is multiple clients. You can make a change on any client and it propagates out to all the clients - not just master slave like this is...
  • True, I haven't tested it this way yet, but with rsync you can set it up to function as a 'hub', with the center being the server and the clients all preferring whatever new things the server has. Someone tell me if I'm right on this, but yeah, it's definitely a good use case that has to work. I'll look at it more closely.
  • File versioning could be done with rdiff-backup or even git. A better way however would be to use an opensolaris server with ZFS and create a snapshot of the filesystem with each revision. This moves all the processing to the server and the client can simply upload the files.
  • Funny, I looked at rdiff-backup first, but using git seems like a great idea. As for ZFS, heck yeah, I know FreeBSD supports it too, and either of those would just rock. I need to play around with it to better understand it, and OpenSolaris may be the best next path. I'm still bummed they dropped ZFS support from Snow Leopard; that would be a killer feature if OS X gets it. Great suggestions, thanks!
  • David Graves
    This is very interesting, but what about previous file versioning like what dropbox has?
  • Good point, this (sourcebox?) does not address that at all. I was talking last week about the idea of making the mirrored directory a git repository, and having revisions handled that way along with the adds and deletes, but I haven't worked on it any more.
  • Right you are, kqueue does the same job as inotify, and I thought about that when I was writing this: how this would work cross-platform. Not that I'm trying to reverse engineer Dropbox ;) but it's interesting to see how this could be handled on the different platforms. So we know how to do it in Linux, and both OS X (http://developer.apple.com/mac/library/document...) and FreeBSD (http://www.madison-gurkha.com/publications/kqueue/) can use kqueue to provide the same function inotify does here. Then it's just a matter of finding an interface to interact with it, and almost without looking I've found some Python bindings out there (http://www.freebsdsoftware.org/devel/py-kqueue....). So I suspect making this work in a similar way on OS X and FreeBSD wouldn't be hard at all; if someone wants to give it a go, let me know, I'm happy to help and "play along at home". I actually miss working in FreeBSD, it's a great OS.
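
Several comments above suggest rsnapshot for versioning, noting that it wraps rsync and that every snapshot is complete via hard links. The core of that idea can be sketched with plain rsync's `--link-dest` option. This is not rsnapshot itself, `snapshot` is a hypothetical helper, and it assumes the snapshot directory is given as an absolute path (rsync resolves a relative --link-dest against the destination directory).

```shell
#!/usr/bin/env bash
# Hypothetical helper: snapshot <src> <absolute_snapdir> [name]
# Each snapshot is a full tree; files unchanged since the "latest" snapshot are
# hard-linked to it (--link-dest), so they take no extra space.
snapshot() {
  local src=$1 snapdir=$2
  local name=${3:-$(date +%Y%m%d-%H%M%S)}
  mkdir -p "$snapdir"
  if [ -L "$snapdir/latest" ]; then
    local prev
    prev=$(readlink "$snapdir/latest")   # name of the previous snapshot
    rsync -a --delete --link-dest="$snapdir/$prev" "$src" "$snapdir/$name"
  else
    rsync -a --delete "$src" "$snapdir/$name"
  fi
  # Repoint "latest" at the new snapshot (-n replaces the old symlink itself)
  ln -sfn "$name" "$snapdir/latest"
}
```

Calling something like `snapshot /var/www/sync_test/ /srv/snapshots` after each sync would keep a browsable history; pruning old snapshots is left out of this sketch.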