UPDATE: Thanks to everyone who has contributed to this, and to the Reddit thread, which has provided some great ideas building off of my concept. I’m starting to think about how we could add version control on top of things, and I’ll update this when I have more to share. Also, does anyone have iFolder working (thanks for the proper link)? It looks like you need SUSE Linux, which I don’t have access to. Plus, I know most Novell projects need a *ton* of Mono dependencies installed to get any of their stuff working, at least on the server side. Still, it sounds like they have Mac, Linux and Windows clients, which is encouraging. For my needs, something a bit more ‘close to the bone’ (as below) might be better on the server side, but making it interoperable with something like iFolder could provide a lot more functionality for others.
First off, if you haven’t tried Dropbox, you should check it out. It syncs all of your computers via the Dropbox servers, the basic free service gives you 2 GB of space, and it works cross-platform (Windows, Mac, Linux). I use it daily at home and at work, and having a live backup of my main data on my work workstation, my home netbook, and any other computer I need to log in to is a huge win. Plus, I have various ‘shared’ folders that distribute certain data to users I’ve granted access to, so work details can be updated and automatically distributed to the folks I want to review or use the data. I recommend everyone try it out and see how useful it is; it’s turned into a game changer for me. A few months ago they made headlines for supporting Linux when they released the client as open source. While this got many people’s hopes up, only the client was open source; the server is still proprietary. That’s slightly disappointing, but fine: they’re a company trying to make money, and I don’t fault them for it. It’s just that a free, portable service like that would be a killer app.
Meanwhile, at work I’m working on a solution to sync large data clusters online, and the project manager described it as needing ‘Dropbox on steroids’. At first I thought it was more complicated than that, but after thinking about it, I realized he was right. Look, Dropbox is a great idea, but at its core it’s rsync combined with something that watches for file changes to trigger the sync, plus an easy-to-use front end. From there I started looking at ways this could work, and there are more than a few; here’s how I made it work.
Linux now includes inotify, a kernel subsystem that provides file system event notification. From there, all it took was finding an application that listens to inotify and kicks off a command when it hears of a change. I tried a few different applications, like incron, inosync and iwatch, before going with lsyncd. While all of them could work, lsyncd seemed to be the most mature, the simplest to configure and the fastest. Lsyncd uses inotify to watch a specified directory for any new, edited or removed files or directories, and then calls rsync to take care of business. So let’s get started making our own open source Dropbox clone with Debian GNU/Linux (lenny).
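The core loop lsyncd automates (notice a change, then run rsync) can be sketched in a few lines of shell. This is a hedged illustration, not how lsyncd works internally: it polls with find every few seconds instead of listening to inotify (inotifywait from the inotify-tools package would be the event-driven equivalent), it assumes GNU find and rsync are installed, and the function names are hypothetical.

```sh
# Minimal polling sketch of the lsyncd idea (not lsyncd's implementation).
# Assumes GNU find (-printf) and rsync on the PATH.

# Print one checksum summarising every path, mtime and size under $1.
dir_fingerprint() {
  find "$1" -printf '%P %T@ %s\n' | LC_ALL=C sort | cksum
}

# Re-run rsync whenever the fingerprint of the source tree changes.
# Usage: watch_and_sync SRC DEST [SECONDS]
watch_and_sync() {
  local src=$1 dest=$2 interval=${3:-2} last="" now
  while true; do
    now=$(dir_fingerprint "$src")
    if [ "$now" != "$last" ]; then
      # -l links, -t times, -r recursive, --delete mirrors removals.
      # Only remember the new state if rsync succeeded, so failures are retried.
      rsync -ltr --delete "$src/" "$dest/" && last=$now
    fi
    sleep "$interval"
  done
}
```

For example, `watch_and_sync /var/www/sync_test REMOTE_SERVER:/var/www/sync_test` would push changes over SSH every two seconds. Polling walks the whole tree on every pass, which is exactly the cost inotify avoids on large directories.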
First, you need two servers: one acting as the server and the other as the client. (You could do this on one host if you just want to see how it works as a proof of concept.)
Start by installing the OpenSSH server on the remote system:
apt-get install openssh-server
You’ll need to configure passwordless logins between the two hosts, since this is how rsync will pass the files back and forth. I’ve previously written a HOWTO on this topic, so we’ll crib from there.
First, generate a key:
ssh-keygen -t rsa
UPDATE: actually, it’s easier to do it this way
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
(Enter)
You shouldn’t have a key stored there yet, but if you do it will prompt you now; make sure you overwrite it.
Enter passphrase (empty for no passphrase):
(Enter)
Enter same passphrase again:
(Enter)
We’re not using passphrases so that logins can be automated. This should only be done for scripts or applications that need this functionality; it’s not for logging into servers lazily, and it should not be done as root!
Now, replace REMOTE_SERVER with the hostname or IP that you’re going to call when you SSH to it, and copy the key over to the server:
cat ~/.ssh/id_rsa.pub | ssh REMOTE_SERVER 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'
UPDATE: now you can use ssh-copy-id for this instead (hat tip briealeida)
ssh-copy-id REMOTE_SERVER
Set the permissions to a sane level:
ssh REMOTE_SERVER 'chmod 700 .ssh'
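On the server side, copying the key and fixing permissions boils down to appending the public key to ~/.ssh/authorized_keys exactly once and making sure neither the file nor the directory is writable by anyone else; with sshd's default StrictModes setting, keys in such locations are refused. A minimal sketch of that, assuming id_rsa.pub has already been copied to the server. add_authorized_key is a hypothetical helper, not part of OpenSSH; ssh-copy-id does this for you.

```sh
# Hypothetical helper (not part of OpenSSH): install a public key idempotently.
# Usage: add_authorized_key PUBKEY_FILE [SSH_DIR]
add_authorized_key() {
  local pubkey=$1 ssh_dir=${2:-$HOME/.ssh}
  local auth="$ssh_dir/authorized_keys"
  mkdir -p "$ssh_dir"
  touch "$auth"
  # Append the key only if that exact line is not already present.
  if ! grep -qxF -f "$pubkey" "$auth"; then
    cat "$pubkey" >> "$auth"
  fi
  # Keep the directory and key file private to the user.
  chmod 700 "$ssh_dir"
  chmod 600 "$auth"
}
```

Running it twice leaves a single copy of the key, so it is safe to repeat after the "start over" step below.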
Lastly, give it a go to see if it worked:
ssh REMOTE_SERVER
You should be dropped to a prompt on the remote server. If not, you may need to start over with a fresh .ssh directory, so on both servers run:
`mv ~/.ssh ~/.ssh-old`
and start over from generating the key (goto 10).
Next up is installing rsync and lsyncd. rsync is simple and may already be installed (you don’t need to run it as a server, just the client); make sure you have it with:
apt-get install rsync
Next is lsyncd. There is no official Debian package yet, but it’s simple to build from source and install. If you don’t have build-essential installed you’ll need it, as well as libxml2-dev, to build the lsyncd source. Installing those is as simple as:
apt-get install libxml2-dev build-essential
Now we’ll get the lsyncd code (you can check for a newer version at http://lsyncd.googlecode.com) and build that:
wget http://lsyncd.googlecode.com/files/lsyncd-1.26.tar.gz
tar -zxf lsyncd-1.26.tar.gz
cd lsyncd-1.26
./configure
make
make install
This install does not install the configuration file, so we’ll do that manually now:
cp lsyncd.conf.xml /etc/
Next up, we’ll edit the configuration file now located in /etc. The file is a simple, well-documented XML file, and mine ended up like so; just be sure to change the source and target hosts and paths to work with your systems:
<lsyncd version="1.25">
  <settings>
    <logfile filename="/var/log/lsyncd"/>
    <!--Specify the rsync (or other) binary to call-->
    <binary filename="/usr/bin/rsync"/>
    <!--uncomment to create a file containing pid of the daemon-->
    <!--pidfile filename="/tmp/pid"/-->
    <!--this specifies the arguments handed to the rsync (or other) binary.
        option is the default literal.
        only '%r' will be replaced with r when recursive operation is wanted, d when not.
        exclude file will be replaced with -exclude-from FILE
        source will be the source path to sync from
        destination will be the destination path to sync to -->
    <callopts>
      <option text="-lt%r"/>
      <option text="--delete"/>
      <exclude-file/>
      <source/>
      <destination/>
    </callopts>
  </settings>
  <directory>
    <source path="/var/www/sync_test"/>
    <target path="desthost::module/"/>
    <!-- for rsync over SSH use host:/path instead, e.g. desthost:/var/www/sync_test/
         (host::module needs an rsync daemon running on desthost) -->
    <!-- or it can also be an absolute path for localhost
    <target path="/absolute/path/to/target"> -->
  </directory>
</lsyncd>

We’re ready to give it a go; may as well run it in debug mode, for fun and to learn how lsyncd does what it does:
lsyncd --conf /etc/lsyncd.conf.xml --debug
Watch for errors; if none are found, continue.
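When reading the --debug output, it helps to know what rsync command the callopts section turns into. A hedged sketch based only on the rules in the config comment ('%r' becomes r for recursive runs and d otherwise; the exclude-file element becomes an exclude-from option). expand_callopts is a hypothetical helper, and leaving out the exclude option when no exclude file is set is an assumption about lsyncd's behaviour.

```sh
# Hypothetical helper: print the rsync command line the <callopts> section
# describes. Not lsyncd code; it only mirrors the config comment.
# Usage: expand_callopts yes|no EXCLUDE_FILE_OR_EMPTY SOURCE DESTINATION
expand_callopts() {
  local recursive=$1 exclude_file=$2 src=$3 dest=$4 flag=d cmd
  # '%r' in "-lt%r" becomes r (recurse) or d (--dirs, this level only).
  if [ "$recursive" = yes ]; then flag=r; fi
  cmd="rsync -lt$flag --delete"
  # Assumption: no exclude option is emitted when no exclude file is set.
  if [ -n "$exclude_file" ]; then
    cmd="$cmd --exclude-from $exclude_file"
  fi
  printf '%s\n' "$cmd $src $dest"
}
```

To preview a run without changing the remote side, you can run such a command yourself with rsync's -n (--dry-run) and -v added, e.g. `rsync -ltrnv --delete /var/www/sync_test/ desthost:/var/www/sync_test/`.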
Now we just need to copy some files into this directory on the source box:
/var/www/sync_test
Again, watch for any errors on the screen. If they come back as a failed connection, it’s most likely an SSH key issue, which is common and not too difficult to solve. From here, add some directories, watch how they’re queued up, and then take a look at them on the remote box: from this point on it “just works”. Now give it more to do by adding files and directories, and watch the log for errors while they sync. As it stands, the source system is the authoritative copy, so any file that is changed, added or removed there is mirrored on the remote system. This is analogous to how Dropbox works: you can have multiple sources (your laptop, your desktop, etc.), and their server acts as the remote system, keeping all the clients in line.
You should now have a basic, working Dropbox-style setup for your own personal use. I had this running to sync my netbook to my home server, and my work desktop to the same home server, so the netbook and the desktop stayed in sync without me doing anything besides putting files in the specified folder. For my week-long test I ran a directory alongside my Dropbox directory just to see how they both behaved, and I didn’t have any failures along the way.
What we have now is a simple, lightweight Dropbox-style app with a functional back end running rsync, a known, stable app that will scale. It doesn’t provide the front end and web view that Dropbox does, but that could be an easy part for a UX developer to tackle. The cool thing is that we have a solution that works, and other options, like the apps I described at the beginning, can be dropped in to replace what lsyncd provides if they can do something better. For now, I’m playing around with it to learn the ins and outs of the system and see how it behaves long term with a much larger store (50 GB to start) to keep in sync. I will also work on integrating this solution into a working system, and update this post with init scripts, reports, or maybe even a web view beyond a plain index view from Apache or nginx. Ideally we could have a web front end that intelligently reports whether a file is complete on the server, and whether it is completely mirrored on another server or client. P2P or BitTorrent would also be really cool to consider with this, and I’m sure more applications for a setup like this will turn up once we’ve had it around as a resource for a while. Can you think of more applications for this? Did you get it to work? Can you think of a better way to do this?
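For the "is this file complete on the server?" report, one simple back end is to compare checksum manifests of the two trees. A minimal sketch, assuming GNU find, sort, xargs and sha256sum on both ends; manifest and compare_manifests are hypothetical names, and paths containing whitespace are not handled.

```sh
# Hypothetical helpers for a sync-status report (not part of lsyncd or rsync).

# Print "SHA256  ./relative/path" for every regular file under $1.
manifest() {
  (cd "$1" && find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r sha256sum)
}

# Compare a local and a remote manifest; print "PATH STATUS" lines sorted by path.
# STATUS is ok, differs, missing-on-remote or extra-on-remote.
# Limitation: awk splits on whitespace, so paths with spaces are not supported.
compare_manifests() {
  awk 'FILENAME == ARGV[1] { want[$2] = $1; next }
       { have[$2] = $1 }
       END {
         for (p in want) {
           if (!(p in have)) s = "missing-on-remote"
           else if (have[p] == want[p]) s = "ok"
           else s = "differs"
           print p, s
         }
         for (p in have) if (!(p in want)) print p, "extra-on-remote"
       }' "$1" "$2" | LC_ALL=C sort
}
```

The remote side could be produced with the same pipeline over SSH, e.g. `ssh REMOTE_SERVER 'cd /var/www/sync_test && find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r sha256sum' > remote.m`, and then compared with `compare_manifests local.m remote.m`.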
Sorry to leave an unrelated comment, but I couldn't figure out your email; I'm embarrassed to say I spent a good twenty minutes on it and barely got anywhere.
I'm interested in advertising on your site, specifically in this post. Let me know if you're interested.
Hey man, awesome how-to. I am trying to do the same thing, but on a remote server with Windows/Mac clients. Most of us are on Mac, I dunno what you're using, but once you set this up, how do you set up the clients? Right now I am rsyncing everything up, but once done I wanna put it all on there. I have found the problem with other services like WebDAV, etc. is that they tend to freak out with large files, or large numbers of files…. and everything else is lacking in speed. All help is appreciated.
Oh yeah, lsyncd isn't available at the URL given, btw…..
I too would be very interested in some form of a Windows compatible client.
Have there been any advances in this area?
[...] it yourself by assembling several systems. fak3r has been experimenting with this; you can read his step by step instructions here though I didn’t test them myself as I don’t have a Linux but a Windows system. Any [...]
My SSH key is still being asked for by lsyncd (?), even though ssh XXX@YYYY works without asking for any keys.
The tweets widget on the left side of your site actually blocks text while you are trying to read it, at least in Chrome. Bad usability; it makes the article unreadable. Seems like you might have had something interesting to say here. Too bad.
Can you tell me what resolution you're running? For me it's way over to the side, nowhere near the text, but I guess at a lower resolution it could be an issue. Thanks for letting me know!
Run `hostname` and see what that says; make sure you use the same hostname when you generate the key, since the one that ssh-keygen sees/uses needs to match. If that fails, try again, but have it create a new file like known_hosts3 (or `mv ~/.ssh ~/.ssh-old` and start again). Let me know; sharing SSH keys is tricky, but once it works, it makes a lot of things like this possible.
@Aurthur – you can reach me at fak3r at fak3r dot com – sorry for the trouble, it was more for fun than anything else.
1024 x 768, ubuntu, chrome 5.0.x, firefox 3.0.x
Good luck with it.
You can use the 'ssh-copy-id' command to transfer the necessary files for passwordless login instead.
–
Brie
Thanks, I just recently came across this, and if that fails to find an id (an error that I got when I tried with a non-privileged user) you can use the -i switch, so `ssh-copy-id -i ~/.ssh/id_rsa.pub REMOTE_HOST`
And by the way Brie, you have a great site, was checking out many of the articles today, I'm writing up a little BASH script to post system stats to Twitter from some of your ideas. Will post it here in a few…
To that, I’m using Firefox 3.6, Ubuntu, at a resolution of 1600×1200, and the tweets widget pops right over the content. If it were on the right instead, I don’t think it would be an issue (but I didn’t even think to open it until I read this, so…).
Thanks! I really appreciate that and I've added your feed to Akregator so I'll be keeping an eye out for that.
–
Brie
Hi, I like your idea. I'm trying something different myself, though: I'll set up a WebDAV shared folder (so it's available online), connect to it using a standard method (on Windows: mount the share), and use a tool like 'FreeFileSync' to actively monitor a folder outside the WebDAV share and sync it with the online folder. When an internet connection is not available, I hope FreeFileSync will be smart enough to stop replicating. Basically, your inotify implementation works the same way, just on Linux. The only difference is that I mount the remote folder rather than rsyncing into it.
just my 5ct
I have working iFolder and Simias scripts that build working clients and servers for Ubuntu. The server is working on Ubuntu 9.04 and the iFolder client on Ubuntu 9.10. I tried your approach with the scripts running on the server to do backups for the Simias server, and the scripts on the client to keep moving changes into the iFolder for backups to the server end. I am also thinking about the version control issues. I think another layer beyond the Simias server…
2010-04-02
Check out the scripts and join the group at:
http://groups.google.com/group/ifolder-ubuntu-d...
For version control, you could play around with rsnapshot in place of rsync. I believe* rsnapshot uses rsync as its underlying engine so you don't lose any of rsync's proven goodness.
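The hard-link trick rsnapshot is built on can be sketched with plain rsync: rotate numbered snapshot directories, then copy into a fresh one with --link-dest pointing at the previous snapshot, so unchanged files become hard links and cost almost no extra space. This illustrates the idea and is not rsnapshot's own code; the function names and the NAME.N directory layout are assumptions. ROOT should be an absolute path, because rsync resolves a relative --link-dest against the destination directory.

```sh
# Hypothetical rsnapshot-style helpers (illustration only).

# Shift ROOT/NAME.0 -> NAME.1 -> ... and drop the oldest, keeping KEEP copies.
rotate_snapshots() {
  local root=$1 name=$2 keep=$3 i
  rm -rf "$root/$name.$((keep - 1))"
  i=$((keep - 2))
  while [ "$i" -ge 0 ]; do
    if [ -d "$root/$name.$i" ]; then
      mv "$root/$name.$i" "$root/$name.$((i + 1))"
    fi
    i=$((i - 1))
  done
}

# Take a new snapshot of SRC in ROOT/NAME.0; files unchanged since the
# previous snapshot (NAME.1) become hard links instead of new copies.
take_snapshot() {
  local src=$1 root=$2 name=$3 keep=${4:-7}
  rotate_snapshots "$root" "$name" "$keep"
  rsync -a --delete --link-dest="$root/$name.1" "$src/" "$root/$name.0/"
}
```

For example, running `take_snapshot /var/www/sync_test /srv/snapshots daily 7` from cron once a day would keep about a week of versions.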
I just came across pylsyncd, a Python version of lsyncd, which brags that:
“Pylsyncd is a python implementation similar to lsyncd that uses rsync to synchronize local directories with several remote machines running rsyncd. Pylsyncd monitors recursively a set of directories using pyinotify, a pure Python module used for monitoring filesystems changes that relies on inotify.
The main advantage of pylsyncd against lsyncd is that it uses message queues in order to synchronize in a parallel way several destination servers, saving up time when it is required to have more than one destination. It has been tested in heavy loaded environments.”
http://iaslanidis.github.com/pylsyncd/
Worth testing, sounds like they've thought up some nice additions.
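pylsyncd's main selling point, pushing one source to several destinations in parallel, can be approximated in plain shell with background jobs. A minimal sketch that does not follow pylsyncd's design (which uses message queues and pyinotify). sync_to_all is a hypothetical helper; SYNC_CMD is an optional override (handy for testing) that defaults to the rsync flags from the lsyncd config; the function prints how many destinations failed.

```sh
# Hypothetical helper: sync SRC to every DEST at once, then report failures.
# Usage: sync_to_all SRC DEST...
sync_to_all() {
  local src=$1 dest pids="" pid failed=0
  shift
  for dest in "$@"; do
    # One background job per destination; SYNC_CMD is split into words on purpose.
    ${SYNC_CMD:-rsync -ltr --delete} "$src/" "$dest" &
    pids="$pids $!"
  done
  # Wait for each job and count the ones that exited non-zero.
  for pid in $pids; do
    wait "$pid" || failed=$((failed + 1))
  done
  echo "$failed"
  [ "$failed" -eq 0 ]
}
```

For example, `sync_to_all /var/www/sync_test hostA:/var/www/sync_test/ hostB:/var/www/sync_test/` lets the slowest destination finish on its own without holding up the others, though each rsync still walks the whole tree.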
[...] options. Well, there isn’t an out-of-the-box open source equivalent to Dropbox but you could easily build one: Now we have is a simple Dropbox style app that is lightweight, with a functional back-end running [...]
[...] possibility of setting up a “DIY” Dropbox. It wasn’t very accessible, though it wasn’t impossible either, so I couldn’t be bothered to get down to [...]
Hi Fak3r,
Thank you for posting an excellent guide on creating your own Dropbox clone. Though most parts are way over my head, I think I get the general idea.
Would you be interested in taking on some freelance work and putting this system up on my server. We can figure out a price and start. Please let me know what is the best way to contact you because I could not find a contact form or your email address anywhere.
Just got iFolder running recently on Opensuse 11.3. Official support is for Opensuse 11.1 but there are various builds out there. If you ever do get access to Opensuse, you don't actually have to deal with mono dependencies. Just go to software.opensuse.org, select version of Opensuse or even SLES and search for iFolder and install it. This guide is mostly on the ball with what you need: http://www.diwi.nl/node/50
Hmmm…and then I heard about 'MagicFolder' http://pypi.python.org/pypi/MagicFolder
Annoying name, but interesting project…could kick that via inotify, or just via cron for eventual consistency.
Great tutorial fak3r!
But I have a problem with one thing. I have managed to sync two local folders, but had no luck over the network.
What confuses me most is the notation of the destination inside the lsyncd.conf.xml file:
<target path="desthost::module/"/>
THIS WORKS!
lsyncd ~/WELLBIS/IMAGES/a ~/WELLBIS/IMAGES/aa --debug --no-daemon
THIS DOESN'T!
lsyncd /home/kreso/WELLBIS/IMAGES mk-lap::~/WELLBIS/IMAGES --debug --no-daemon
cheers
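The difference between the two commands is the double colon. In rsync's syntax, host::module/path talks to an rsync daemon (rsyncd) on that host, while host:path goes over the remote shell, i.e. the SSH keys set up earlier. As I understand rsync's rule, a colon only counts if it comes before the first slash, which is why plain local paths work. The SSH form of the second command would therefore be something like mk-lap:WELLBIS/IMAGES (a path relative to the remote home directory). A small hypothetical helper that applies that rule, handy for sanity-checking a target before putting it in lsyncd.conf.xml:

```sh
# Hypothetical helper: report which transport rsync would use for a target.
# Prints daemon, ssh or local.
rsync_transport() {
  local target=$1 head
  case $target in
    rsync://*) echo daemon; return ;;
  esac
  # Only a colon that appears before the first slash makes a path remote.
  head=${target%%/*}
  case $head in
    *::*) echo daemon ;;  # host::module/path needs rsyncd on that host
    *:*)  echo ssh ;;     # host:path uses the remote shell (ssh)
    *)    echo local ;;
  esac
}
```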
Hi fak3r and Mike Chelen. Thanks for this post and all the useful information in the replies!
I’ve used lsyncd and ZFS-FUSE together to create my own Dropbox clone on Ubuntu 10.04. I’ve written a quick how-to below. All the configuration files have been posted to pastebin for easy copy-paste
/etc/lsyncd.conf.xml : http://pastebin.com/4z0Xg2Pk
/usr/local/sbin/lsyncd-execute : http://pastebin.com/y6jQ3EVH
Install ZFS-FUSE
# aptitude install zfs-fuse
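Create a file that can be used as a ZFS pool
# mkdir -p /opt/zfsbackupstorage
# dd if=/dev/zero of=/opt/zfsbackupstorage/zfsbackupstorage001 bs=1k count=1000000
This will make a zero-filled file of about 1 GB (1,000,000 blocks of 1 KiB); use count=100000 if you only want about 100 MB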
Create a ZFS pool with the name zfsbackup
# zpool create zfsbackup /opt/zfsbackupstorage/zfsbackupstorage001
Set the mountpoint
# zfs set mountpoint=/opt/backup zfsbackup
Check if the ZFS filesystem has been mounted
# df -h
Install lsyncd
First make sure rsync is installed:
# apt-get install rsync
Install build dependencies
# apt-get install libxml2-dev build-essential
Download the source code
# cd /usr/local/src
# wget http://lsyncd.googlecode.com/files/lsyncd-1.37.tar.gz
# tar -xzvf lsyncd-1.37.tar.gz
# cd lsyncd-1.37
# ./configure
# make
# make install
Note: It’s a very small program, so it might look like nothing is happening
This install does not install the configuration file, so we’ll do that manually now:
# cp lsyncd.conf.xml /etc/
Edit /etc/lsyncd.conf.xml
See http://pastebin.com/4z0Xg2Pk
Create the /usr/local/sbin/lsyncd-execute file
# nano /usr/local/sbin/lsyncd-execute
See http://pastebin.com/y6jQ3EVH
Give the /usr/local/sbin/lsyncd-execute script permissions to be executed
# chmod +x /usr/local/sbin/lsyncd-execute
Launch lsyncd in debug for testing
We’re ready to give it a go, may as well run it in debug for fun and to learn how lsyncd does what it does:
# lsyncd --conf /etc/lsyncd.conf.xml --debug
Watch the log for errors; if none are found, launch lsyncd. You can also add lsyncd to /etc/rc.local so it will be started every time your server or desktop starts
Good luck!
Best, Egbert
This would be fantastic if integrated with encryption/Amazon S3
Does this solution actually handle deletes, i.e. will a delete be properly propagated across machines? If your solution is similar to a “two-way” rsync, then if the file exists at either end, will it not be synced back if it is only deleted on one machine?
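In the setup above, each machine only pushes its own changes (with --delete), so there is no real two-way merge. The usual fix for the problem described here is to remember what both sides looked like at the last successful sync: a path that is in that record but gone from one side was deleted there, while a path missing from the record is new. A minimal sketch of that three-way comparison on sorted file lists. plan_sync and its action names are hypothetical; content changes and conflicts (files new on both sides) are ignored, and the actual copying and deleting are left out.

```sh
# Hypothetical helper: decide create/delete actions for a two-way sync.
# Arguments are files holding sorted (LC_ALL=C) relative paths:
#   BASE   = paths on both sides after the last successful sync
#   LOCAL  = paths on this machine now
#   REMOTE = paths on the other machine now
# Prints one "ACTION PATH" line per change.
plan_sync() {
  local base=$1 here=$2 there=$3
  # In BASE, gone locally, still remote: deleted here, so delete there.
  LC_ALL=C comm -23 "$base" "$here" | LC_ALL=C comm -12 - "$there" | sed 's/^/delete-on-remote /'
  # In BASE, gone remotely, still local: deleted there, so delete here.
  LC_ALL=C comm -23 "$base" "$there" | LC_ALL=C comm -12 - "$here" | sed 's/^/delete-on-local /'
  # New locally (not in BASE) and absent remotely: copy it over.
  LC_ALL=C comm -23 "$here" "$base" | LC_ALL=C comm -23 - "$there" | sed 's/^/copy-to-remote /'
  # New remotely and absent locally: copy it back.
  LC_ALL=C comm -23 "$there" "$base" | LC_ALL=C comm -23 - "$here" | sed 's/^/copy-to-local /'
}
```

The lists could come from `(cd DIR && find . -type f | LC_ALL=C sort)` on each side; once the actions have been applied, the (now identical) list becomes the new BASE.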
Instead of keys without passphrases, you could use ssh-agent for passwordless logins.
Great writeup. I’ve been meaning to try out ZFS on Linux via FUSE; I formerly ran ZFS on FreeNAS (FreeBSD). I’ve built a new RAID1 array from two 1 TB drives with ext4 on them. They’re the ‘green’ drives that use less power, and they do run cooler than the others, so my goal was redundant storage using less power (since I was running them in a separate server before, it’s really less), and having the advantages of ZFS would be the next step. Thanks for the detail!