How to Use Rsync to Back Up Your Files

September 3, 2021

What Is Rsync

If you've been programming for a while, you may have heard of rsync. It is a tool for transferring and synchronizing files between directories. These directories can be on the same machine or on two connected machines.

Some of you may wonder, "Well, why can't I just use the copy command cp, that way I don't have to learn a new command?" The two programs work differently. cp copies everything from one location to another, while rsync copies only the deltas (the differences) between the two locations.

Suppose that your source directory A contains files totaling 1GB in size. Assume that you also have directory B with the same 1GB of files. Then you add about 0.1GB of changes to A. With rsync, you don't have to copy the whole 1.1GB of data from A to B. You only have to transfer the 0.1GB that changed. Why copy mostly the same data if you can copy only the differences? This minimizes network usage, which is useful if you have limited bandwidth.

Let's jump straight to the code. I would strongly encourage you to code along. I find that I learn a new thing better when I actually type the commands. Moreover, don't just type everything you see in this article and stop there. Read the rsync man page (man rsync). Make variations. Experiment. Do things that I don't mention here. Break things! Just make sure you make a backup first (see what I did there? :D) Only by doing these will you get the most out of this article.

Basic rsync

At its core, the rsync command looks like this:

rsync source destination

Suppose that you have a file ~/Projects/source/file1.txt that you want to sync to ~/Projects/destination/. Run:

rsync ~/Projects/source/file1.txt ~/Projects/destination/

You should see file1.txt inside destination/ now. Cool! However, practically speaking, you probably don't need to rsync a single file. Rsync is usually used on a directory.

To rsync all the files directly inside the source/ directory to the destination/ directory, run:

rsync source/* destination/

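Now you will find that all the files directly inside source/ are copied into destination/. If you run the command rsync source/* destination/ again without making any changes, destination/ ends up exactly the same, because there are no deltas to bring over. (Without the -t or -a option, rsync does not preserve modification times, so it may still re-send unchanged files. More on -a later.)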

If you add a file inside source/ that is not yet in destination/, running the rsync command adds that file into destination/.

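If you remove a file inside source/ and that file is also inside destination/, running the rsync command will not remove that file from destination/. Rsync by default has an additive nature. To also delete files in destination/ that no longer exist in source/, pass the --delete option. Note that --delete only takes effect when you sync the directory itself with the recursive option covered in the next section (for example, rsync -r --delete source/ destination/). It does not apply to a wildcard such as source/*, because the shell expands the wildcard into individual files before rsync sees it.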

Finally, if you add a directory inside source/, the rsync command above won't sync that directory (nor the contents inside it). To sync directories within a source directory, you need to use rsync recursively.

Recursive Rsync

The -r option syncs a directory recursively. If your source/ directory contains:

file1.txt
file2.txt
dir1/
dir1/file1.md
dir1/file2.md

Running rsync source/* destination/ won't bring dir1/ (and the files inside it) into the destination/ directory. However, if you run:

rsync -r source/ destination/

Everything will carry over. Neat!

Rsync Archive

If you read online articles about rsync, you will notice that many developers use the -a option (--archive). This is equivalent to running rsync -rlptgoD. Whoa, that's a lot of options! Don't worry, let's break it down:

  • -r is recursive, just as you saw above
  • -l copies symlinks and keeps them as symlinks
  • -p preserves file permissions / privileges
  • -t preserves file modification times
  • -g preserves a group
  • -o preserves owner (only for super-user)
  • -D preserves device and special files

The big picture is, running rsync -a preserves all the important metadata when transferring files. It is safe to say you will want to run rsync -a 90% of the time.

Rsync and SSH

You can use rsync over a network connection. If you have access to a remote server, you can quickly sync a local directory to the remote server, and vice versa.

Wait a second... doesn't that sound like Dropbox? Yup! There are tons of features that Dropbox has that rsync doesn't, but at its core, Dropbox is a fancy and glorified rsync with durability added.

For this section, if you're coding along, I am assuming that you have access to a remote server. If you don't, keep reading but take a mental note. There will probably come a time when you need to do this in the future.

To rsync your source/ directory to the remote server's ~/stash/destination/, run:

rsync -a source/ yourUserName@123.456.788.000:~/stash/destination/

If you store a Host inside SSH config, you can also use that. For example, I have a Host named gc (Google Compute). To sync the Projects/ directory, I can run:

rsync -a ~/Projects gc:~

Notice that I don't have a forward slash after Projects even though it is a directory. When you rsync a directory without a trailing slash, rsync creates a directory with the same name as the source inside the destination. Here, that creates a ~/Projects/ directory on my gc Host. With a trailing slash (~/Projects/), rsync would instead copy the contents of Projects directly into ~.

Here are some options that can be helpful when transferring files over the net:

  • -z to compress data during transfer
  • -v stands for verbose. This lists the files as they are transferred
  • -P stands for --partial and --progress: partial keeps a partially transferred file if the transfer is interrupted, so the next run can resume it, and progress shows the file transfer progress. This option is useful for large files.

Btw, did you know that you can pass a command when running rsync?

For example, if I want to rsync only test1.txt, test2.txt, ... test9.txt files from the remote server, I can run:

rsync -avz gc:'`find . -name "*test[0-9].txt"`' ~/Projects/source

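The trick here is yourRemoteHost:'YOUR_CMD'. Note the backticks surrounding the find command: the remote shell runs it and rsync receives the resulting list of files. This relies on the remote shell interpreting the argument. rsync 3.2.4 and later escape remote arguments by default, so on newer versions you may need to add --old-args for this trick to work.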

Use this when you need to filter for specific files from a remote host instead of having to manually pick-and-choose the files.

Rsync and Cron

Rsync reminds me of file-backup services like Dropbox. Combined with cron, it lets you create an automated job that syncs data every day, every hour, and so on.

On a Mac, I can edit my cron jobs with the crontab -e command (yours might be different depending on what OS you have).

To create multiple backups:

00 */1 * * * rsync -a --delete /Users/iggy/source/ /Users/iggy/backup/hourly
00 17 * * * rsync -a --delete /Users/iggy/source/ /Users/iggy/backup/daily
00 18 * * 5 rsync -a --delete /Users/iggy/source/ /Users/iggy/backup/weekly
00 19 1 * * rsync -a --delete /Users/iggy/source/ /Users/iggy/backup/monthly_$(date +\%Y\%m)

Cron treats an unescaped % in a command as a newline, which is why the date format in the last line is written with \%.

This performs 4 backups:

  • an hourly backup
  • a daily backup every day at 5 PM
  • a weekly backup every Friday (day 5) at 6PM
  • a monthly backup on the 1st at 7PM

Each of the first three jobs syncs into a directory with a fixed name, so it overwrites the previous backup of that kind. The monthly job gets a new directory name each month (for example, monthly_202109), so older monthly backups are kept.

Conclusion

Rsync is a powerful command for creating backups or syncing two directories. If you only need to do a one-time copy, the cp command is probably simpler. But if you need to keep two directories in sync, rsync is a better option.

Rsync and cron are like peanut butter and jelly. Together they let you perform automated backups easily. What other uses of rsync can you think of?

Who knows, maybe in the future you will use rsync to create the next Dropbox rival! When you do, please let me know :).

Until then, happy coding!