look out honey 'cause I'm using technology

HOWTO build your own open source Dropbox clone

UPDATE: Thanks to everyone who has contributed to this, and to the Reddit thread, which has provided some great ideas building off of my concept. I’m starting to rethink how we could have version control on top of things, and I’ll update this post when I have more to share. Also, does anyone have iFolder (thanks for the proper link, salubrium) working? It looks like you need SUSE Linux, which I don’t have access to, plus I know most Novell projects need a *ton* of Mono dependencies installed to have any of their stuff working, at least on the server side; but it sounds like they have Mac, Linux and Windows clients, which is encouraging. While for my needs something a bit more ‘close to the bone’ (as below) might be better on the server side, making it interoperable with something like iFolder could provide a lot more functionality for others.

First off, if you haven’t tried Dropbox, you should check it out: it syncs all of your computers via the Dropbox servers, and the basic free service gives you 2 GB of space and works cross-platform (Windows, Mac, Linux). I use it daily at home and at work, and just having a live backup of my main data across my work workstation, my home netbook, and any other computer I need to log in to is a huge win. Plus, I have various ‘shared’ folders that distribute certain data to users I’ve granted access to, so work details can be updated and automatically distributed to the folks I want to review or use the data. I recommend everyone try it out and see how useful it is; it’s turned into a game changer for me. A few months ago they made headlines for supporting Linux when they released the client as open source. While this got many people’s hopes up, only the client was open source; the server is still proprietary. While slightly disappointing, this is fine: they’re a company trying to make money. I don’t fault them for this, it’s just that a free, portable service like that would be a killer app.

Meanwhile, at work I’m building a solution to sync large data clusters online, and the project manager described it as needing ‘Dropbox on steroids’. At first I thought it was more complicated than that, but after thinking about it, I realized he was right. Look, Dropbox is a great idea, but at its core it’s a melding of rsync with something watching for file changes to initiate the sync, plus an easy-to-use front end. From there I started looking at ways this could work, and there are more than a few; here’s how I made it work.

Linux now includes inotify, a kernel subsystem that provides file system event notification. From there, all it took was to find an application that listens to inotify and kicks off a command when it hears of a change. I tried a few different applications, like incron, inosync and iwatch, before going with lsyncd. While all of them could work, lsyncd seemed to be the most mature, the simplest to configure and the fastest. Lsyncd uses inotify to watch a specified directory for any new, edited or removed files or directories, and then calls rsync to take care of business. So let’s get started making our own open source Dropbox clone with Debian GNU/Linux (lenny).
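To make the “rsync plus something watching for changes” idea concrete, here’s a minimal shell sketch of the same loop. It isn’t how lsyncd is implemented, just an illustration; it assumes the inotify-tools package (which provides `inotifywait`) is installed, and SRC, DEST and REMOTE_SERVER are placeholders for your own paths and host.

```shell
#!/bin/sh
# Minimal watch-then-sync loop (illustration only, not lsyncd itself).
# Assumes inotify-tools provides `inotifywait`; SRC/DEST are placeholders.
SRC=${SRC:-/var/www/sync_test}
DEST=${DEST:-REMOTE_SERVER:/var/www/sync_test}

watch_and_sync() {
  # -m: keep monitoring forever; -r: recurse into subdirectories; -q: quieter.
  # Every event line that arrives triggers an rsync of the whole tree;
  # rsync itself only transfers what actually changed.
  inotifywait -m -r -q -e close_write,create,delete,move "$SRC" |
  while read -r _event; do
    rsync -a --delete "$SRC/" "$DEST/"
  done
}
```

Run `watch_and_sync` in a terminal and drop files into the source directory. Every event triggers a separate rsync pass, which is wasteful during a burst of changes; lsyncd queues events up instead, which is one reason to prefer it over a loop like this.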

Ladies and gentlemen, start your engines (er, servers)!

First, you need two servers: one acting as the server and the other as the client. (You could do this on one host if you just want to see how it works as a proof of concept.)

Install OpenSSH server

First you’ll need to install OpenSSH Server on the remote system:
apt-get install openssh-server

Configure SSH for Passwordless Logins

You’ll need to configure passwordless logins between the two hosts you want to use, this is how rsync will pass the files back and forth.  I’ve previously written a HOWTO on this topic, so we’ll crib from there.

First, generate a key:

ssh-keygen -t rsa

Enter file in which to save the key (/home/USER/.ssh/id_rsa):

(Enter)

You shouldn’t have a key stored there yet, but if you do, it will prompt you now; make sure you overwrite it.

Enter passphrase (empty for no passphrase):

(Enter)

Enter same passphrase again:

(Enter)

UPDATE: actually, it’s easier to skip the prompts entirely and do it this way:

ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa

We’re not using a passphrase so that logins can be automated. This should only be done for scripts or applications that need this functionality; it’s not for logging into servers lazily, and it should not be done as root!

Now replace REMOTE_SERVER with the hostname or IP address you’ll use when you SSH to it, and copy the key over to the server:

cat ~/.ssh/id_rsa.pub | ssh REMOTE_SERVER 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'

UPDATE: now you can use ssh-copy-id for this instead (hat tip briealeida)

ssh-copy-id REMOTE_SERVER

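Set the permissions to a sane level (sshd will ignore keys kept in files or directories that other users can write to):

ssh REMOTE_SERVER 'chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys'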
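If you end up fixing permissions more than once, the same two chmod calls can live in a small function. This is a hypothetical helper, not part of OpenSSH; it applies the conventional 700 on ~/.ssh and 600 on authorized_keys, which satisfies sshd’s default StrictModes check.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenSSH): apply the conventional
# permissions sshd expects, directory 700 and authorized_keys 600.
# Usage: fix_ssh_perms [DIR]   (DIR defaults to ~/.ssh)
fix_ssh_perms() {
  ssh_dir=${1:-$HOME/.ssh}
  chmod 700 "$ssh_dir" || return 1
  # On a fresh host authorized_keys may not exist yet; only fix it if present.
  if [ -f "$ssh_dir/authorized_keys" ]; then
    chmod 600 "$ssh_dir/authorized_keys" || return 1
  fi
}
```

Run it on whichever host holds the keys, e.g. `fix_ssh_perms ~/.ssh` after logging in to REMOTE_SERVER.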

Lastly, give it a go to see if it worked:

ssh REMOTE_SERVER

You should be dropped to a prompt on the remote server without being asked for a password. If not, you may need to redo your .ssh directory, so on both servers run:

`mv ~/.ssh ~/.ssh-old`

and start over from the top of this section (goto 10).
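Because lsyncd will run rsync unattended, the real test is whether SSH can log in with no prompt at all. A sketch of that check, using OpenSSH’s standard `BatchMode` and `ConnectTimeout` options; `check_passwordless` is a hypothetical name and REMOTE_SERVER is your own host.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if key-based login works with no prompt.
# BatchMode=yes makes ssh fail instead of asking for a password or
# passphrase, which is the situation rsync is in when lsyncd calls it.
# Usage: check_passwordless HOST
check_passwordless() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}
```

For example, `check_passwordless REMOTE_SERVER && echo "key login OK"`; if it fails here, lsyncd will fail the same way.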

Install rsync and lsyncd

Next up is to install rsync and lsyncd. rsync is simple and may already be installed (you don’t need to run it as a server, just the client). It does need to be present on both hosts, since rsync over SSH starts rsync on the remote end as well, so make sure you have it with:

apt-get install rsync

Next is lsyncd. There is no official Debian package yet, but it’s simple to build from source and install. First, if you don’t have build-essential you’ll need it, as well as libxml2-dev to build the lsyncd source. Installing those is as simple as:

apt-get install libxml2-dev build-essential

Now we’ll get the lsyncd code (you can check for a newer version at http://lsyncd.googlecode.com) and build that:

wget http://lsyncd.googlecode.com/files/lsyncd-1.26.tar.gz
tar -zxf lsyncd-1.26.tar.gz
cd lsyncd-1.26
./configure
make && make install

This install does not install the configuration file, so we’ll do that manually now:

cp lsyncd.conf.xml /etc/
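Before configuring anything, a quick check that the tools are actually on the PATH can save some head-scratching. `missing_tools` is a hypothetical helper built on the POSIX `command -v` builtin; run it with `rsync ssh lsyncd` on the client and with `rsync` on the server.

```shell
#!/bin/sh
# Hypothetical helper: print each named command that is not on the PATH.
# Returns 0 only if every command was found.
# Usage: missing_tools CMD...
missing_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      status=1
    fi
  done
  return $status
}
```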

Configure lsyncd

Next up, we’ll edit the configuration file now located in /etc. The file is a simple, well-documented XML file, and mine ended up like so; just be sure to change the source and target hosts and paths to work with your systems:

<lsyncd version="1.25">
    <settings>
        <logfile filename="/var/log/lsyncd"/>
        <!-- Specify the rsync (or other) binary to call -->
        <binary filename="/usr/bin/rsync"/>
        <!-- Uncomment to create a file containing the pid of the daemon -->
        <!--pidfile filename="/tmp/pid"/-->
        <!-- The arguments handed to the rsync (or other) binary.
             Each option text is passed literally, except that '%r' is replaced
             with r when recursive operation is wanted and d when not.
             exclude-file is replaced with rsync's exclude-from option and FILE,
             source with the source path to sync from, and destination with
             the destination path to sync to. -->
        <callopts>
            <option text="-lt%r"/>
            <option text="--delete"/>
            <exclude-file/>
            <source/>
            <destination/>
        </callopts>
    </settings>
    <directory>
        <source path="/var/www/sync_test"/>
        <!-- host:path syncs over SSH using the keys set up earlier;
             the desthost::module/ form needs an rsync daemon on the target. -->
        <target path="REMOTE_SERVER:/var/www/sync_test/"/>
        <!-- or it can also be an absolute path for localhost:
        <target path="/absolute/path/to/target"/> -->
    </directory>
</lsyncd>
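To see what the callopts section boils down to, the hypothetical helper below prints roughly the rsync command it describes, with %r expanded to r (recursive) or d (rsync’s --dirs, no recursion). It leaves out the exclude file, and lsyncd’s exact argument order and trailing slashes may differ, so treat it as an approximation for trying the transfer by hand.

```shell
#!/bin/sh
# Illustration only: approximately the rsync call the callopts describe.
# %r becomes "r" (recursive) or "d" (--dirs, no recursion); the
# exclude-file element is deliberately left out.
# Usage: lsyncd_style_cmd recursive|single SRC DEST
lsyncd_style_cmd() {
  mode=$1 src=$2 dest=$3
  if [ "$mode" = recursive ]; then flag=r; else flag=d; fi
  printf 'rsync -lt%s --delete %s %s\n' "$flag" "$src" "$dest"
}
```

For example, `lsyncd_style_cmd recursive /var/www/sync_test/ REMOTE_SERVER:/var/www/sync_test/` prints `rsync -ltr --delete /var/www/sync_test/ REMOTE_SERVER:/var/www/sync_test/`. Adding rsync’s -n (--dry-run) when you run that by hand shows what would change without touching anything.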

Launch lsyncd in debug for testing

We’re ready to give it a go, may as well run it in debug for fun and to learn how lsyncd does what it does:

lsyncd --conf /etc/lsyncd.conf.xml --debug

Watch for errors; if none turn up, continue.
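Once lsyncd runs as a daemon, the same check means reading the log file set in the config (/var/log/lsyncd). A small sketch that greps for likely trouble; `log_problems` is a hypothetical name, and the patterns are an assumption about how failures are worded, not a documented lsyncd log format.

```shell
#!/bin/sh
# Hypothetical helper: print log lines that look like failures.
# The patterns are an assumption; adjust them after reading your own log.
# Usage: log_problems [LOGFILE]   (defaults to the logfile from the config)
# Exit status follows grep: 0 if anything matched, 1 if nothing did,
# 2 if the file could not be read.
log_problems() {
  grep -i -E 'error|fail|denied' "${1:-/var/log/lsyncd}"
}
```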

Add files and watch them sync

Now we just need to copy some files into this directory on the source box:

/var/www/sync_test

And again, watch for any errors on the screen. If these come back as a failed connection, it’s an SSH/key issue: common, and not too difficult to solve. From here, add some directories and watch how they’re queued up, then take a look at them on the remote box; from this point on it “just works”. Now give it more to do by adding files and directories, and watch the log for errors while they sync. As it stands, the system treats the source system as the authoritative copy, so any files that change, or are added or removed, there will be mirrored on the remote system. This is analogous to how Dropbox works: you can use multiple sources (your laptop, your desktop, etc.) and their server serves as the remote system, keeping all the clients in line.
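To confirm the remote box really matches the source without eyeballing it, you can compare the two directory listings. A minimal sketch, assuming `find` and `sort` exist on both ends; it compares names only (not file contents), and `list_tree` and `compare_with_remote` are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helpers: compare file/directory names on both sides.
# Only names are compared, not contents or timestamps.

# Print every path under $1, relative to $1, in a stable byte-wise order.
list_tree() {
  (cd "$1" && find . | LC_ALL=C sort)
}

# Usage: compare_with_remote LOCAL_DIR HOST REMOTE_DIR
# Prints a diff of the two listings; returns 0 only if they match.
# REMOTE_DIR is wrapped in single quotes, so it must not contain one.
compare_with_remote() {
  local_list=$(mktemp) || return 1
  remote_list=$(mktemp) || return 1
  list_tree "$1" > "$local_list"
  ssh "$2" "cd '$3' && find . | LC_ALL=C sort" > "$remote_list"
  diff "$local_list" "$remote_list"
  rc=$?
  rm -f "$local_list" "$remote_list"
  return $rc
}
```

For example, `compare_with_remote /var/www/sync_test REMOTE_SERVER /var/www/sync_test && echo "in sync"`. Forcing `LC_ALL=C` on both sides keeps a locale difference between the hosts from showing up as a false mismatch.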

Conclusion

You should now have a basic, working Dropbox-style setup for your own personal use. I had this running and used it to sync my netbook to my home server, and my work desktop to my home server, so both the netbook and the desktop stayed in sync without me doing anything besides putting files in the specified folder. For my week-long test I ran a directory alongside my Dropbox directory just to see how they both acted, and I didn’t have any failures along the way.

What we now have is a simple, lightweight Dropbox-style app with a functional back end running rsync, a known, stable app that will scale. While it doesn’t provide the front end and web view that Dropbox does, that could be an easy part for a UX developer to tackle. The cool thing is that we have a solution that works, and other options, like the apps I described at the beginning, can be dropped in to replace the functionality lsyncd provides in case they can do something better. For now, I’m playing around with it to learn the ins and outs of the system and see how it behaves long term with a much larger store (50 GB to start). I will also work on better integrating this solution into a working system, and update this thread with init scripts, reports, or maybe even a web view beyond just an index view from Apache or nginx. Ideally we could have a web front end that would intelligently report whether a file is complete on the server, and whether it is completely mirrored on another server or client. P2P or BitTorrent would also be really cool to consider with this, and I’m sure there will be more applications for a setup like this once we’ve had it around as a resource for a while. Can you think of more applications for this? Did you get it to work? Can you think of a better way to do this?

    

View Comments to “HOWTO build your own open source Dropbox clone”

  1. Arthur Czuma says:

    Sorry to leave an unrelated comment, but I couldn't figure out your email; I'm embarrassed to say I spent a good twenty minutes on it and barely got anywhere.

    I'm interested in advertising on your site, specifically in this post. Let me know if you're interested.

  2. chongopants says:

    Hey man, awesome how-to. I am trying to do the same thing but on a remote server to Windows/Mac. Most of us are on Mac; I dunno what you're using, but once you set this up, how do you set up the clients? Right now I am rsyncing everything up, but once done I wanna put it all on there. I have found the problem with other services like WebDAV, etc. is that they tend to freak out with large files, or large numbers of files…. and everything else is lacking in speed. All help is appreciated.

  3. chongopants says:

    oh yeah, lsyncd is not available at the URL given btw…..

  4. I too would be very interested in some form of a Windows compatible client.
    Have there been any advances in this area?

  5. [...] it yourself by assembling several systems. fak3r has been experimenting with this; you can read his step by step instructions here though I didn’t test them myself as I don’t have a Linux but a Windows system. Any [...]

  6. tomi says:

    my ssh key is still asked for by lsyncd (?), even though ssh XXX@YYYY works without any keys

  7. J Withers says:

    The tweets widget on the left side of your site actually blocks text while you are trying to read it. At least on chrome. Bad usability. Makes article unreadable. Seems like you might have had something interesting to say here. Too bad.

  8. [...] HOWTO build your own open source Dropbox clone « fak3r. Share and Enjoy: [...]

  9. fak3r says:

    Can you tell me what resolution you're running? For me it's way over to the side, nowhere near the text, but I guess at a lower resolution it could be an issue. Thanks for letting me know!

  10. fak3r says:

    Run `hostname` and see what that says; make sure you use the same hostname when you generate the key, since the one that ssh-keygen sees/uses needs to match. If that fails, try again, but have it create a new file like known_hosts3 (or `mv ~/.ssh ~/.ssh-old` and start again). Let me know; sharing SSH keys is tricky, but once it works, it makes a lot of things like this possible.

  11. fak3r says:

    @Arthur – you can reach me at fak3r at fak3r dot com – sorry for the trouble, it was more for fun than anything else.

  12. J Withers says:

    1024 x 768, ubuntu, chrome 5.0.x, firefox 3.0.x

    Good luck with it.

  13. briealeida says:

    You can use the 'ssh-copy-id' command to transfer the necessary files for passwordless login instead.


    Brie

  14. fak3r says:

    Thanks, I just recently came across this, and if that fails to find an id (an error that I got when I tried with a non-privileged user) you can use the -i switch, so `ssh-copy-id -i ~/.ssh/id_rsa.pub REMOTE_HOST`

    And by the way Brie, you have a great site, was checking out many of the articles today, I'm writing up a little BASH script to post system stats to Twitter from some of your ideas. Will post it here in a few…

  15. briealeida says:

    To that, I'm using Firefox 3.6, Ubuntu, resolution 1600×1200, and the tweets widget pops right over the content. If it were on the right instead, I don't think it would be an issue (but I didn't even think to open it until I read this so…).

  16. briealeida says:

    Thanks! I really appreciate that and I've added your feed to Akregator so I'll be keeping an eye out for that. :) !


    Brie

  17. Hi, I like your idea. I'm trying something different myself, though: I'll set up a WebDAV shared folder (hence available online), connect to it using a standard method (on Windows: mount the share), and use a tool like 'FreeFileSync' to actively monitor a folder outside the WebDAV share and sync it with the online folder. When an internet connection is not available, I hope FreeFileSync will be smart enough to stop replicating. Basically this works the same way as your inotify-based implementation does on Linux. The only difference is that I mount the remote folder rather than rsyncing into it.

    just my 5ct

  18. I have working iFolder and Simias scripts that build working clients and servers for Ubuntu. The server is working on Ubuntu 9.04 and the iFolder client on Ubuntu 9.10. I tried your approach with the scripts running on the server to do backups for the Simias server, and the scripts on the client to keep moving changes into the iFolder for backups to the server end. I am also thinking about the version control issues. I think another layer beyond the Simias server…
    2010-04-02
    Check out the scripts and join the group at:
    http://groups.google.com/group/ifolder-ubuntu-d...

  19. phil_s says:

    For version control, you could play around with rsnapshot in place of rsync. I believe* rsnapshot uses rsync as its underlying engine so you don't lose any of rsync's proven goodness.

  20. [...] open-source-dropbox-clone HOWTO build your own open source Dropbox clone [...]

  21. fak3r says:

    I just came across pylsyncd, a python activated version of lsyncd, which brags that,

    “Pylsyncd is a python implementation similar to lsyncd that uses rsync to synchronize local directories with several remote machines running rsyncd. Pylsyncd monitors recursively a set of directories using pyinotify, a pure Python module used for monitoring filesystems changes that relies on inotify.

    The main advantage of pylsyncd against lsyncd is that it uses message queues in order to synchronize in a parallel way several destination servers, saving up time when it is required to have more than one destination. It has been tested in heavy loaded environments.”
    http://iaslanidis.github.com/pylsyncd/

    Worth testing, sounds like they've thought up some nice additions.


  23. [...] options. Well, there isn’t an out-of-the-box open source equivalent to Dropbox but you could easily build one: Now we have is a simple Dropbox style app that is lightweight, with a functional back-end running [...]

  24. [...] the possibility of setting up a “DIY” Dropbox. It wasn’t very accessible, though it wasn’t impossible either, so I couldn’t be bothered to get down to [...]

  25. Jaykumar1 says:

    Hi Fak3r,

    Thank you for posting an excellent guide on creating your own Dropbox clone. Though most parts are way over my head, I think I get the general idea.

    Would you be interested in taking on some freelance work and putting this system up on my server. We can figure out a price and start. Please let me know what is the best way to contact you because I could not find a contact form or your email address anywhere.

  26. Eric says:

    Just got iFolder running recently on Opensuse 11.3. Official support is for Opensuse 11.1 but there are various builds out there. If you ever do get access to Opensuse, you don't actually have to deal with mono dependencies. Just go to software.opensuse.org, select version of Opensuse or even SLES and search for iFolder and install it. This guide is mostly on the ball with what you need: http://www.diwi.nl/node/50

  27. fak3r says:

    Hmmm…and then I heard about 'MagicFolder' http://pypi.python.org/pypi/MagicFolder

    Annoying name, but interesting project…could kick that via inotify, or just via cron for eventual consistency.

  28. gorostas says:

    Great tutorial fak3r!

    But I have a problem with one thing only. I have managed to sync 2 local folders, but no luck over the network.
    This part confuses me the most, the notation of the destination inside the lsyncd.conf.xml file:

    <target path="desthost::module/"/>

    THIS WORKS!

    lsyncd ~/WELLBIS/IMAGES/a ~/WELLBIS/IMAGES/aa --debug --no-daemon

    THIS DOESN'T!

    lsyncd /home/kreso/WELLBIS/IMAGES mk-lap::~/WELLBIS/IMAGES --debug --no-daemon

    cheers

  29. Egbert Pot says:

    Hi fak3r and Mike Chelen. Thanks for this post and all the usefull information in the replies!

    I’ve used lsyncd and ZFS-FUSE together to create my own Dropbox clone on Ubuntu 10.04. I’ve written a quick how-to below. All the configuration files have been posted to pastebin for easy copy-paste.

    /etc/lsyncd.conf.xml : http://pastebin.com/4z0Xg2Pk
    /usr/local/sbin/lsyncd-execute : http://pastebin.com/y6jQ3EVH

    Install ZFS-FUSE
    # aptitude install zfs-fuse

    Create a file that can be used as a ZFS pool

    # mkdir -p /opt/zfsbackupstorage
    # dd if=/dev/zero of=/opt/zfsbackupstorage/zfsbackupstorage001 bs=1k count=1000000

    This will make an empty file of roughly 1 GB (1,000,000 blocks of 1 KiB)

    Create a ZFS pool with the name zfsbackup

    # zpool create zfsbackup /opt/zfsbackupstorage/zfsbackupstorage001

    Set the mountpoint

    # zfs set mountpoint=/opt/backup zfsbackup

    Check if the ZFS filesystem has been mounted

    # df -h

    Install lsyncd

    First make sure rsync is installed:

    # apt-get install rsync

    Install build dependencies

    # apt-get install libxml2-dev build-essential

    Download the source code

    # cd /usr/local/src

    # wget http://lsyncd.googlecode.com/files/lsyncd-1.37.tar.gz

    # tar -xzvf lsyncd-1.37.tar.gz

    # cd lsyncd-1.37

    # ./configure

    # make

    # make install

    Note: It’s a very small program, so it might look like nothing is happening

    This install does not install the configuration file, so we’ll do that manually now:

    # cp lsyncd.conf.xml /etc/

    Edit /etc/lsyncd.conf.xml

    See http://pastebin.com/4z0Xg2Pk

    Create the /usr/local/sbin/lsyncd-execute file

    # nano /usr/local/sbin/lsyncd-execute

    See http://pastebin.com/y6jQ3EVH

    Give the /usr/local/sbin/lsyncd-execute script permissions to be executed

    # chmod +x /usr/local/sbin/lsyncd-execute

    Launch lsyncd in debug for testing

    We’re ready to give it a go, may as well run it in debug for fun and to learn how lsyncd does what it does:

    # lsyncd --conf /etc/lsyncd.conf.xml --debug

    Watch the log for errors, if none are found, launch lsyncd. You can also add lsyncd to /etc/rc.local so it will be started every time your server / desktop starts

    Good luck!

    Best, Egbert

  30. Michael Langford says:

    This would be fantastic if integrated in with encryptions/Amazon S3

  31. Marius Kjeldahl says:

    Does this solution actually handle deletes, i.e. will a delete be properly propagated across machines? If your solution is similar to a “two-way” rsync, then if the file exists at either end, will it not be synced back if it is only deleted on one machine?

  32. Lauri says:

    instead of key without passphrases you could use ssh-agent for passwordless login.

  33. [...] HOWTO build your own open source Dropbox clone – fak3r (tags: dropbox linux backup opensource howto) Published: September 2, 2010 Filed Under: Delicious Leave a Comment Name: Required [...]

  34. fak3r says:

    Great writeup, I’ve been meaning to try out ZFS on Linux via FUSE, formerly ran ZFS on FreeNAS (FreeBSD). I’ve built a new RAID1 array from 2 1TB drives with ext4 on them. They’re the ‘green’ drives that use less power, and they do run cooler than the others, so having redundant storage using less power (considering I was running them in a separate server before it’s really less) was my goal, but having the advantages of ZFS would be the next step. Thanks for the detail!
