In this article, we will set up automated backups using Duplicity and Amazon S3.
Installation
Run the following commands to install the latest Duplicity and the python boto library (Duplicity needs boto to talk to Amazon S3).
apt-add-repository ppa:duplicity-team/ppa
apt-get update
apt-get install duplicity
apt-get install python-pip
pip install boto
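Optionally, you can do a quick sanity check that both pieces are in place (version numbers will vary):

duplicity --version
python -c "import boto; print boto.__version__"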
Amazon S3
S3 Bucket
Create an Amazon S3 bucket to store backup data. If you have multiple servers, you may set the bucket name to each server's FQDN.
For this tutorial, we created our Amazon S3 buckets in the US Standard region only, as most of our servers are located on the US East Coast. If you pick some other region, you will need to change the Amazon S3 endpoint URL accordingly. This might help.
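For example, if your bucket lives in the EU (Ireland) region instead of US Standard, the DEST value used later in the scripts would need the region-specific endpoint (example.com here is just a sample bucket name):

# US Standard (endpoint used throughout this article)
DEST="s3://s3.amazonaws.com/example.com/"
# EU (Ireland) region
DEST="s3://s3-eu-west-1.amazonaws.com/example.com/"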
API Access
Next, create an IAM user with API credentials. You will need the Access Key ID and Secret Access Key later. Setting up a console password is not recommended, as we will be storing these credentials in the backup script in plaintext.
As we back up many client servers, we create one IAM user per server, each with access to only one S3 bucket. You can generate access policies for the IAM user using this Amazon article.
In our case, we use the following policy (credit to Mike):
{ "Version":"2012-10-17", "Statement": [ { "Effect": "Allow", "Action": "s3:ListAllMyBuckets", "Resource": "arn:aws:s3:::*" }, { "Effect": "Allow", "Action": "s3:*", "Resource": [ "arn:aws:s3:::BUCKET_NAME", "arn:aws:s3:::BUCKET_NAME/*" ] } ] }
Make sure you replace BUCKET_NAME with your actual S3 bucket name.
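If you prefer the command line over the IAM console, a policy like the one above can also be attached with the AWS CLI, assuming it is installed and configured with admin credentials; the user name, policy name, and policy.json file below are only examples:

aws iam put-user-policy \
    --user-name backup-example.com \
    --policy-name duplicity-s3-backup \
    --policy-document file://policy.json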
GPG Key (optional)
By default, Duplicity uses symmetric encryption for backups. This means a simple password is used, which is fine in most cases.
Still, if you are paranoid about security, you can use asymmetric, i.e. public-private key, encryption instead.
We have a complete tutorial which you can follow to generate GPG keys. Make sure you remember the passphrase.
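Once the key is generated, you can look up the key ID that the backup script will need (output format varies slightly between GPG versions; the ID is the short hex string on the "pub" line):

gpg --list-keys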
Backup Script
Create Backup Script
Duplicity reads its credentials from environment variables, so we need a script to set the correct parameters. The same script can then be used with cron for automated backups.
Create backup script file:
vim /usr/local/sbin/backup.sh
Add the following code to it:
#!/bin/bash

# Export some ENV variables so you don't have to type anything
export AWS_ACCESS_KEY_ID=""
export AWS_SECRET_ACCESS_KEY=""
export PASSPHRASE=""

# Your GPG key (uncomment and set to use asymmetric encryption)
#GPG_KEY=KEY_HERE

# The S3 destination followed by bucket name
DEST="s3://s3.amazonaws.com/BUCKET_NAME/"

# Set up some variables for logging
LOGFILE="/var/log/duplicity/backup.log"
DAILYLOGFILE="/var/log/duplicity/backup.daily.log"
FULLBACKLOGFILE="/var/log/duplicity/backup.full.log"
HOST=`hostname`
DATE=`date +%Y-%m-%d`
MAILADDR="[email protected]"
TODAY=$(date +%d%m%Y)

is_running=$(ps -ef | grep duplicity | grep python | wc -l)

if [ ! -d /var/log/duplicity ]; then
    mkdir -p /var/log/duplicity
fi

if [ ! -f $FULLBACKLOGFILE ]; then
    touch $FULLBACKLOGFILE
fi

if [ $is_running -eq 0 ]; then
    # Clear the old daily log file
    cat /dev/null > ${DAILYLOGFILE}

    # Trace function for logging, don't change this
    trace () {
        stamp=`date +%Y-%m-%d_%H:%M:%S`
        echo "$stamp: $*" >> ${DAILYLOGFILE}
    }

    # How long to keep backups for
    OLDER_THAN="1M"

    # The source of your backup
    SOURCE=/

    # Run a full backup on the 1st of the month if one hasn't already been done today
    FULL=
    tail -1 ${FULLBACKLOGFILE} | grep ${TODAY} > /dev/null
    if [ $? -ne 0 -a $(date +%d) -eq 1 ]; then
        FULL=full
    fi;

    # GPG options: leave GPG_OPTS empty for symmetric encryption (PASSPHRASE only),
    # or uncomment the second line to encrypt and sign with your GPG key
    GPG_OPTS=""
    #GPG_OPTS="--encrypt-key=${GPG_KEY} --sign-key=${GPG_KEY}"

    trace "Backup for local filesystem started"

    trace "... removing old backups"
    duplicity remove-older-than ${OLDER_THAN} ${DEST} >> ${DAILYLOGFILE} 2>&1

    trace "... backing up filesystem"
    duplicity \
        ${FULL} \
        ${GPG_OPTS} \
        --include=/var/rsnap-mysql \
        --include=/var/www \
        --include=/etc \
        --exclude=/** \
        ${SOURCE} ${DEST} >> ${DAILYLOGFILE} 2>&1

    trace "Backup for local filesystem complete"
    trace "------------------------------------"

    # Send the daily log file by email
    #cat "$DAILYLOGFILE" | mail -s "Duplicity Backup Log for $HOST - $DATE" $MAILADDR
    BACKUPSTATUS=`cat "$DAILYLOGFILE" | grep Errors | awk '{ print $2 }'`
    if [ "$BACKUPSTATUS" != "0" ]; then
        cat "$DAILYLOGFILE" | mail -s "Duplicity Backup Log for $HOST - $DATE" $MAILADDR
    elif [ "$FULL" = "full" ]; then
        echo "$(date +%d%m%Y_%T) Full Back Done" >> $FULLBACKLOGFILE
    fi

    # Append the daily log file to the main log file
    cat "$DAILYLOGFILE" >> $LOGFILE

    # Reset the ENV variables. Don't need them sitting around
    unset AWS_ACCESS_KEY_ID
    unset AWS_SECRET_ACCESS_KEY
    unset PASSPHRASE
fi
Make sure you substitute correct values for:
- AWS_ACCESS_KEY_ID – IAM user's access key ID
- AWS_SECRET_ACCESS_KEY – IAM user's secret access key
- PASSPHRASE – this will be a symmetric encryption password, or the GPG key passphrase if asymmetric encryption is used
- DEST – AWS S3 endpoint and bucket, e.g. "s3://s3.amazonaws.com/example.com/" where example.com is the bucket name
If you are using GPG then make sure:
- PASSPHRASE is set to the GPG key passphrase
- Uncomment the line #GPG_KEY=KEY_HERE and replace KEY_HERE with your actual GPG key ID
- Uncomment the line #GPG_OPTS="--encrypt-key=${GPG_KEY} --sign-key=${GPG_KEY}" so backups are encrypted and signed with that key (see the example after this list)
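For reference, with a hypothetical key ID A1B2C3D4 (yours will differ; see gpg --list-keys), the GPG-related lines in backup.sh would end up looking like this:

export PASSPHRASE="your-gpg-key-passphrase"
GPG_KEY=A1B2C3D4
GPG_OPTS="--encrypt-key=${GPG_KEY} --sign-key=${GPG_KEY}"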
Set up backup script permissions and owner:
chown root:root /usr/local/sbin/backup.sh
chmod 0700 /usr/local/sbin/backup.sh
Run Backup
Run the backup script to verify that it is actually backing up:
bash /usr/local/sbin/backup.sh
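You can also confirm that a backup chain now exists on S3 using duplicity's collection-status command. Run it with the same credentials exported; the destination below assumes the example.com bucket used earlier:

export AWS_ACCESS_KEY_ID="IAM_USER_ACCESS_KEY_ID"
export AWS_SECRET_ACCESS_KEY="IAM_USER_SECRET_ACCESS_KEY"
export PASSPHRASE="GPG_OR_SOME_OTHER_PASSPHRASE"
duplicity collection-status s3://s3.amazonaws.com/example.com/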
Setup Cron
After the first full backup, only changes are sent (incremental backups).
You can add the following line to cron (crontab -e):
0 * * * * /usr/local/sbin/backup.sh
The above cron entry will run our backup script hourly.
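If hourly backups are more frequent than you need, you can run the script daily instead, for example at 2 AM server time:

0 2 * * * /usr/local/sbin/backup.sh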
Restore Script
A backup is useless without easy restore support.
Create restore script:
vim /usr/local/sbin/restore.sh
Add the following code to it:
#!/bin/bash

# Export some ENV variables so you don't have to type anything
export AWS_ACCESS_KEY_ID="IAM_USER_ACCESS_KEY_ID"
export AWS_SECRET_ACCESS_KEY="IAM_USER_SECRET_ACCESS_KEY"
export PASSPHRASE="GPG_OR_SOME_OTHER_PASSPHRASE"

# The S3 destination followed by bucket name
DEST="s3://s3.amazonaws.com/example.com/"

# Your GPG key
#GPG_KEY=YOUR_GPG_KEY

if [ $# -lt 3 ]; then
    echo "Usage $0 <date> <file> <restore-to>";
    exit;
fi

duplicity \
    --restore-time $1 \
    --file-to-restore $2 \
    ${DEST} $3

# Reset the ENV variables. Don't need them sitting around
unset AWS_ACCESS_KEY_ID
unset AWS_SECRET_ACCESS_KEY
unset PASSPHRASE
Set up restore script permissions and owner:
chown root:root /usr/local/sbin/restore.sh
chmod 0700 /usr/local/sbin/restore.sh
Usage:
restore.sh <date> <file> <restore-to>
It is strongly recommended that you create a new folder as the restore location; it must be different from the actual file/folder you are trying to restore.
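For example, to restore the 1 January 2014 version of a single file into a scratch directory (the date format and paths here are only illustrations; --file-to-restore expects a path relative to the backup root, without a leading slash):

mkdir -p /tmp/restore
restore.sh 2014-01-01 etc/nginx/nginx.conf /tmp/restore/nginx.conf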
Verify Script
Ideally, there should be a mechanism to check the integrity of backups periodically. Duplicity's verify command does this: it restores the backed-up data (to a temporary location) and compares it against the local files, reporting any differences.
Create verify script:
vim /usr/local/sbin/verify.sh
Add the following code to it:
#!/bin/bash

# Export some ENV variables so you don't have to type anything
export AWS_ACCESS_KEY_ID="IAM_USER_ACCESS_KEY_ID"
export AWS_SECRET_ACCESS_KEY="IAM_USER_SECRET_ACCESS_KEY"
export PASSPHRASE="GPG_OR_SOME_OTHER_PASSPHRASE"

# The S3 destination followed by bucket name
DEST="s3://s3.amazonaws.com/example.com/"

# The source of your backup
SOURCE=/

# Your GPG key
#GPG_KEY=YOUR_GPG_KEY

# Note: excludes must appear before broader includes to take effect,
# so /root/.cache is excluded before /root is included
duplicity verify -v4 \
    --exclude=/root/.cache \
    --include=/var/www \
    --include=/etc \
    --include=/home \
    --include=/root \
    --exclude=/** \
    ${DEST} ${SOURCE}

# Reset the ENV variables. Don't need them sitting around
unset AWS_ACCESS_KEY_ID
unset AWS_SECRET_ACCESS_KEY
unset PASSPHRASE
Set up verify script permissions and owner:
chown root:root /usr/local/sbin/verify.sh
chmod 0700 /usr/local/sbin/verify.sh
Usage:
verify.sh
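If you want to run this check periodically, a weekly cron entry like the one below could mail the report to the sysadmin (this assumes a working local mail command; the address is a placeholder):

0 3 * * 0 /usr/local/sbin/verify.sh 2>&1 | mail -s "Duplicity verify report" [email protected]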
TODO
- Publish the mysqldump script and maybe call it from the backup script (mysqldump followed by backup).
- Add support for running restore.sh without a date; that should restore the most recent version of a file.
- Add a verify script cronjob. Not sure if it can be used for testing a backup's integrity. It would be good if it emailed the sysadmin about possible backup corruption.
- Tweak file selection to exclude backups and hidden files/folders. Useful reading.
- Move setting and unsetting of credentials to a common file, OR create a single script file with different functions for backup, restore, verify, etc. I guess this will be done in EasyEngine.
Credits
- Backup and restore scripts source
- Another useful tutorial
- Ubuntu man page helped for verify script