Automated backups using rdiff-backup
Introduction
One day your blog, your code or pretty much anything else may crash and, sadly, your most valuable information could be irretrievably lost! Consider the consequences if this ever happens (touch wood!). Pictured them? Scary, right? Now imagine how relaxed you would be instead if only you had bothered to make a backup.
Today I'm going to show you my personal backup method. I use the awesome rdiff-backup tool, which combines an incremental backup with a mirror.
You can read more about this tool on the official page.
What is it?
rdiff-backup backs up one directory to another, possibly over a network. The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago. The idea is to combine the best features of a mirror and an incremental backup.
Installation
rdiff-backup is available in most major Linux distributions. In my case, I'm using an Arch Linux-based distribution (Manjaro) and yay (Yet another Yogurt, an AUR helper written in Go) to install the tool.
yay rdiff-backup
If you use another distribution, the tool can be installed as follows:
- Debian
apt-get install rdiff-backup
- Fedora/RedHat
yum install rdiff-backup
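Whichever package manager you use, it is worth confirming that the binary is actually on your PATH before going further. The following is only a minimal sketch; the helper name check_rdiff_backup is my own and not part of the tool. Keep in mind that for remote backups rdiff-backup has to be installed on both machines.

```shell
# Hypothetical helper: report whether rdiff-backup is available on this machine.
check_rdiff_backup() {
    if command -v rdiff-backup >/dev/null 2>&1; then
        # --version prints the installed version string
        rdiff-backup --version
    else
        echo "rdiff-backup is not installed" >&2
        return 1
    fi
}

check_rdiff_backup || echo "install it with your package manager first"
```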
Using rdiff-backup
Making backups
Making backups is very easy when you are using rdiff-backup. You may picture this tool as similar to the cp command. In other words, rdiff-backup takes two arguments:
- source directory.
- target directory.
Both directories can be on a local or a remote disk. For example, if you want to use rdiff-backup with local directories, you would use the following command:
rdiff-backup source target
rdiff-backup my_personal_directory my_personal_directory_backup
In the same way, if either of the directories is on a remote server, you only need to indicate the path in the classic way: user@server::PATH. The following commands show how remote and local machines can be used for both the source and the target directories:
rdiff-backup carloscaballero@guybrush::/docker-volumes/ghost /mnt/backup/carloscaballero
# from the remote machine called guybrush using the user carloscaballero copy the directory /docker-volumes/ghost to the local directory /mnt/backup/carloscaballero
rdiff-backup /docker-volumes/ghost carloscaballero@guybrush::/docker-volumes/ghost
# from the local machine copy the directory /docker-volumes/ghost to the remote server guybrush using the user carloscaballero into the directory /docker-volumes/ghost
rdiff-backup carloscaballero@guybrush::/docker-volumes/ghost luisgarcia@lechuck::/docker-volumes/ghost
# from the remote machine called guybrush using the user carloscaballero copy the directory /docker-volumes/ghost to the machine lechuck using the user luisgarcia into the directory /docker-volumes/ghost
When using these commands, the remote machine will probably request the user's password (for the previous commands, carloscaballero and luisgarcia respectively). You can skip this step by configuring SSH key-based authentication on the Linux server.
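As a rough sketch of what that configuration can look like with the standard OpenSSH tools: the helper name setup_key_auth, the key path and the empty passphrase are my own assumptions, chosen so that an unattended job can log in. An empty passphrase is a trade-off, so weigh it against your own security needs.

```shell
# Sketch: set up key-based SSH login so rdiff-backup can run unattended.
# Assumptions: the OpenSSH client tools are installed; the key path and the
# empty passphrase are illustrative choices.
setup_key_auth() {
    remote="$1"                      # e.g. carloscaballero@guybrush
    key="$HOME/.ssh/id_ed25519"

    if [ -z "$remote" ]; then
        echo "usage: setup_key_auth user@server" >&2
        return 2
    fi

    # Generate a key pair only if one does not exist yet.
    [ -f "$key" ] || ssh-keygen -t ed25519 -f "$key" -N "" || return 1

    # Append the public key to the remote user's authorized_keys
    # (this asks for the password one last time).
    ssh-copy-id -i "$key.pub" "$remote"
}

# Usage (not run here): setup_key_auth carloscaballero@guybrush
```

After this, `ssh carloscaballero@guybrush` should log in without a password prompt, and so should rdiff-backup.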
The real power of this tool becomes apparent when you want to restore information. If you list the contents of the directory in which you made your copy, you will see the contents that you previously copied and, furthermore, a directory named rdiff-backup-data. This directory is very important, since it stores the incremental backups of our data.
In other words, the target directory holds the latest version of our backup, plus the incremental copies, which are stored in the rdiff-backup-data/increments directory.
Now imagine that I've created a file called file1.txt which contains a single sentence. A copy is made using rdiff-backup and, a few minutes later, another copy is made. The list of files in our system is now the following:
|-- prueba
|   `-- file1.txt
`-- prueba-backup
    |-- file1.txt
    `-- rdiff-backup-data
        |-- access_control_lists.2019-01-23T11:47:36Z.snapshot
        |-- access_control_lists.2019-01-23T11:51:32Z.snapshot
        |-- access_control_lists.2019-01-23T11:52:24Z.snapshot
        |-- backup.log
        |-- chars_to_quote
        |-- current_mirror.2019-01-23T11:52:24Z.data
        |-- error_log.2019-01-23T11:47:36Z.data
        |-- error_log.2019-01-23T11:51:32Z.data
        |-- error_log.2019-01-23T11:52:24Z.data
        |-- extended_attributes.2019-01-23T11:47:36Z.snapshot
        |-- extended_attributes.2019-01-23T11:51:32Z.snapshot
        |-- extended_attributes.2019-01-23T11:52:24Z.snapshot
        |-- file_statistics.2019-01-23T11:47:36Z.data.gz
        |-- file_statistics.2019-01-23T11:51:32Z.data.gz
        |-- file_statistics.2019-01-23T11:52:24Z.data.gz
        |-- increments
        |   `-- file1.txt.2019-01-23T11:51:32Z.diff.gz
        |-- increments.2019-01-23T11:51:32Z.dir
        |-- mirror_metadata.2019-01-23T11:47:36Z.diff
        |-- mirror_metadata.2019-01-23T11:51:32Z.diff.gz
        |-- mirror_metadata.2019-01-23T11:52:24Z.snapshot.gz
        |-- session_statistics.2019-01-23T11:47:36Z.data
        |-- session_statistics.2019-01-23T11:51:32Z.data
        `-- session_statistics.2019-01-23T11:52:24Z.data
You may note that the file file1.txt has an incremental copy in the increments directory.
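If you would like to try this yourself, the experiment can be repeated in a throwaway directory. This is only a sketch: the temporary location and the file contents are made up, and the rdiff-backup calls are skipped when the tool is not installed. The short pause is there because two backups cannot share the same timestamp.

```shell
# Sketch: back up a file twice and look at the increments that result.
work="$(mktemp -d)"
mkdir "$work/prueba"
echo "first version" > "$work/prueba/file1.txt"

if command -v rdiff-backup >/dev/null 2>&1; then
    # First backup: creates the mirror and the rdiff-backup-data directory.
    rdiff-backup "$work/prueba" "$work/prueba-backup"

    sleep 2
    echo "second version" > "$work/prueba/file1.txt"

    # Second backup: the mirror is updated and a reverse diff is stored.
    rdiff-backup "$work/prueba" "$work/prueba-backup"

    rdiff-backup --list-increments "$work/prueba-backup"
    ls "$work/prueba-backup/rdiff-backup-data/increments"
else
    echo "rdiff-backup not found, nothing backed up"
fi
```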
Restoring backups
We can restore a copy with the rdiff-backup command, or by directly using the cp command, since the latest copy (the mirror) is neither compressed nor has any of its metadata altered. Therefore, the files are in the same state as when they were copied. Although you may use the cp command, the rdiff-backup tool is the better choice, because it makes data restoration more flexible.
Restoring a backup uses a command similar to the one for making it, with the addition of the restore-as-of option (--restore-as-of, or -r for short) followed by the time to restore. The time format is very flexible: acceptable time strings include intervals, like "3D64s"; w3-datetime strings, like "2002-04-26T04:22:01-07:00" (strings like "2002-04-26T04:22:01" are also acceptable, and rdiff-backup will use the current time zone); or ordinary dates like 2/4/1997 or 2001-04-23 (various combinations are acceptable, bearing in mind that the month must always precede the day).
For example, the following commands restore the backup as it was on 23 January 2010 and the most recent backup, respectively.
rdiff-backup -r 2010-01-23 /directory_where_is_my_backup /directory_where_restore_my_backup
rdiff-backup -r now /directory_where_is_my_backup /directory_where_restore_my_backup # Restore the last backup
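The same option also works on a single file or subdirectory inside the backup. Here is a small sketch of a wrapper around it; restore_as_of is a hypothetical helper of mine and the /tmp targets are made up. It only prints the command unless RUN=1 is set, and it refuses to touch an existing target, since rdiff-backup itself will not overwrite one without --force.

```shell
# Hypothetical wrapper: restore BACKUP_PATH as of TIME into TARGET.
# Prints the rdiff-backup command unless RUN=1 is set in the environment.
restore_as_of() {
    if [ $# -ne 3 ]; then
        echo "usage: restore_as_of TIME BACKUP_PATH TARGET" >&2
        return 2
    fi
    when="$1"; backup_path="$2"; target="$3"

    # Stop early instead of letting the restore fail on an existing target.
    if [ -e "$target" ]; then
        echo "target $target already exists, refusing to overwrite it" >&2
        return 1
    fi

    if [ "${RUN:-0}" = "1" ]; then
        rdiff-backup -r "$when" "$backup_path" "$target"
    else
        echo "rdiff-backup -r $when $backup_path $target"
    fi
}

# A whole directory as it was 3 days ago, and a single file as of a timestamp.
restore_as_of 3D /mnt/backup/carloscaballero /tmp/ghost-restored
restore_as_of 2019-01-23T11:51:32 prueba-backup/file1.txt /tmp/file1-restored.txt
```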
Removing old backups
As you already know, the rdiff-backup command makes incremental backups, which over time consume a large amount of disk space. Therefore, it is highly recommended to remove old backups (as long as you have other, more recent backups, of course).
The rdiff-backup tool has the remove-older-than option, which removes any backups older than the time given as its argument. A good example is removing any backups older than 1 year:
rdiff-backup --remove-older-than 1Y /directory_where_is_my_backup
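Two details are worth knowing here. By default rdiff-backup only deletes the information of one session at a time, so removing several increments in one go needs --force. The argument can also be a number of sessions, such as 20B, instead of an age. The sketch below wraps both cases; prune_backups is a hypothetical helper name, and it only prints the command unless RUN=1 is set.

```shell
# Hypothetical helper: prune increments older than AGE from BACKUP_DIR.
# Prints the command instead of running it unless RUN=1 is set.
prune_backups() {
    age="$1"; backup_dir="$2"

    # --force is needed when more than one increment would be removed at once.
    set -- rdiff-backup --remove-older-than "$age" --force "$backup_dir"

    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

prune_backups 1Y /directory_where_is_my_backup    # by age: older than one year
prune_backups 20B /directory_where_is_my_backup   # by count: keep the last 20 sessions
```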
Filter Options
Most of the time, we need to include or exclude files from our backup. The most common options which can be used with rdiff-backup are:
- include
- include-filelist
- exclude
- exclude-filelist
As well as these, there are plenty more filter options. A simple example of excluding a directory is:
rdiff-backup --exclude /mnt/backup / /mnt/backup
In this example we exclude /mnt/backup to avoid an infinite loop, even though rdiff-backup can automatically detect simple loops like the one above. This is just an example; in reality it would be important to exclude /proc as well.
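A fuller version of that command might look like the sketch below. The list of excluded paths (/proc, /sys and /tmp) is my own assumption about what a typical system wants to skip, and build_root_backup_cmd is a hypothetical helper that only prints the resulting command so you can review it first.

```shell
# Sketch: build a command that backs up / while skipping pseudo-filesystems
# and the backup destination itself. The exclude list is illustrative.
build_root_backup_cmd() {
    dest="$1"

    set -- rdiff-backup
    for skip in /proc /sys /tmp "$dest"; do
        set -- "$@" --exclude "$skip"
    done
    set -- "$@" / "$dest"

    echo "$*"
}

build_root_backup_cmd /mnt/backup
```

When include and exclude options overlap, the first one that matches a file decides its fate, so more specific rules should come before broader ones.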
Getting information about the backup directory
There may be a time when we need information about the backup (metadata). rdiff-backup allows us to obtain this information. The most common options for this are the following:
- list-increments
- list-changed-since
- list-at-time
- compare
- compare-at-time
Since they are quite descriptive, it isn't hard to imagine what the goal of each of the different options is. Despite this, I will show several examples applying each of them:
rdiff-backup --list-increments backup_directory/subdirectory # Lists the increments (backup times) available for backup_directory/subdirectory.
rdiff-backup -l backup_directory/subdirectory # Short form of the previous command.
rdiff-backup --list-changed-since 5D directory/subdirectory # Lists all the files under directory/subdirectory which have changed in the last 5 days.
rdiff-backup --list-at-time 5D directory/subdirectory # Lists all the files that were present in directory/subdirectory 5 days ago.
rdiff-backup --compare in-directory user@host::out-directory # Compares the current files in out-directory with the files in in-directory, displaying which ones have changed.
rdiff-backup --compare-at-time 2W in-directory user@host::out-directory # Similar, but compares in-directory to out-directory as it was 2 weeks ago.
Using rdiff-backup with cron
A good practice is automating the backups in our system. To do this, we may use the cron service.
Before using cron, we must make sure that the script it runs doesn't print anything to standard output or standard error, otherwise:
- cron will assume something needs reporting (by default it tries to mail any output to the owner of the crontab)
- if there is a real error, you may not be able to spot it
The script which we use is the following:
#!/bin/bash
. /root/.bashrc
# Back up the paths selected in files_backup.txt; stdout and stderr both go to the log
rdiff-backup --force --print-statistics --include-globbing-filelist /root/rdiff-backup-configuration/files_backup.txt / root@brix.qontu.com::/root/backups/carloscaballero.io > /var/log/rdiff-backup.log 2>&1
# Remove increments older than one year
rdiff-backup --remove-older-than 1Y root@brix.qontu.com::/root/backups/carloscaballero.io > /var/log/rdiff-backup-remove.log 2>&1
The content of the files_backup.txt file is the following:
+ /root/ghost
- **
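A filelist like this can hold more than two rules. The sketch below generates one in a temporary file; the /root/ghost/content/logs path is a hypothetical subdirectory that I'm excluding for illustration. Rules are checked from top to bottom and the first match wins, which is why the narrower exclusion comes before the broader inclusion.

```shell
# Sketch: write a globbing filelist with three rules.
# /root/ghost comes from the setup above; the logs subdirectory is made up.
filelist="$(mktemp)"

cat > "$filelist" <<'EOF'
- /root/ghost/content/logs
+ /root/ghost
- **
EOF

cat "$filelist"
```

The file would then be passed to rdiff-backup with `--include-globbing-filelist "$filelist"`.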
It is important to know that both success and error logs are saved in the same logfile, named rdiff-backup.log. Another interesting point is that I've used the filter option include-globbing-filelist, which allows the use of a file as its argument. This file contains the directories to process, each prefixed with + or - to express whether that directory must be included or excluded. Note that backups older than 1 year are deleted to preserve disk space.
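For both streams to really end up in one logfile, the order of the redirections matters: the file redirection has to come before 2>&1. The following self-contained demonstration shows the difference; emit is just a stand-in of mine for any command that writes to both streams.

```shell
# Demonstration: the order of redirections decides where stderr ends up.
emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

log_a="$(mktemp)"
log_b="$(mktemp)"

# File first, then 2>&1: stderr follows stdout into the file.
emit > "$log_a" 2>&1

# 2>&1 first: stderr is pointed at the old stdout (the terminal, or cron),
# and only stdout reaches the file.
emit 2>&1 > "$log_b"

echo "file-first log has $(wc -l < "$log_a" | tr -d ' ') lines"   # 2 lines
echo "2>&1-first log has $(wc -l < "$log_b" | tr -d ' ') lines"   # 1 line
```

With the second form, the error message escapes the logfile, which under cron means it becomes job output.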
Finally, edit the cron file using the crontab -e command and add the following line, which runs the script every day at 01:00:
0 1 * * * bash /root/rdiff-backup-configuration/rdiff-backup.sh
Conclusions
In this post I've explained the rdiff-backup tool, which allows us to make incremental backups. I've also shown you the script I use to back up my projects, which is executed by cron once a day.