Duplicity + Amazon S3

In this article, we will set up automated backups using Duplicity and Amazon S3.

Installation

Run the following commands to install the latest Duplicity and the boto Python library (Duplicity needs it for Amazon S3).

apt-add-repository ppa:duplicity-team/ppa
apt-get update
apt-get install duplicity
apt-get install python-pip
pip install boto
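
You can quickly confirm that both installed correctly; version numbers will vary:

duplicity --version
python -c "import boto; print(boto.__version__)"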

Amazon S3

S3 Bucket

Create an Amazon S3 bucket to store backup data. If you have multiple servers, you may set the bucket name to each server's FQDN.

For this tutorial, we created our Amazon S3 bucket in the US Standard region only, as most of our servers are located on the US East Coast. If you pick some other region, you may need to change the Amazon S3 endpoint URL; Amazon's list of S3 region endpoints might help.
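
For example, the destination URL used later in this article targets US Standard; for another region you would swap in that region's endpoint (the Oregon endpoint below is illustrative):

# US Standard (used throughout this article)
DEST="s3://s3.amazonaws.com/example.com/"

# US West (Oregon) -- substitute your region's endpoint
DEST="s3://s3-us-west-2.amazonaws.com/example.com/"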

API Access

Next, create an IAM user with API credentials. You will need the Access Key ID and Secret Access Key later. Setting up a password for this user is not recommended, as we will be storing these credentials in the backup script in plaintext.

As we back up many client servers, we create one IAM user per server, with access to only one S3 bucket. You can generate access policies for the IAM user using Amazon's policy documentation.
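
If you happen to have the AWS CLI installed, the same setup can be scripted; the user name below is a placeholder, and policy.json holds the policy shown next:

aws iam create-user --user-name backup-server1
aws iam create-access-key --user-name backup-server1
aws iam put-user-policy --user-name backup-server1 \
    --policy-name duplicity-s3 --policy-document file://policy.json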

In our case, we use the following policy (credit to Mike):

{
    "Version":"2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:ListAllMyBuckets",
            "Resource": "arn:aws:s3:::*"
        },
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::BUCKET_NAME",
                "arn:aws:s3:::BUCKET_NAME/*"
            ]
        }
    ]
}

Make sure you replace BUCKET_NAME with your actual S3 bucket name.
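
For example, with a bucket named example.com, the second statement's Resource section becomes:

"Resource": [
    "arn:aws:s3:::example.com",
    "arn:aws:s3:::example.com/*"
]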

GPG Key (optional)

By default, Duplicity uses symmetric encryption for backups. This means a simple passphrase is used, which is fine in most cases.

Still, if you are paranoid about security, you can use asymmetric, i.e. public-private key, encryption.

We have a complete tutorial which you can follow to generate GPG keys. Make sure you remember the passphrase.
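
For quick reference, key generation and finding the key ID look like this (the exact output format may vary with your GPG version; A1B2C3D4 is a placeholder):

gpg --gen-key     # follow the interactive prompts; remember the passphrase
gpg --list-keys   # in a line like "pub   2048R/A1B2C3D4", A1B2C3D4 is the key ID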

Backup Script

Create Backup Script

Duplicity reads its credentials from environment variables, so we need a script that sets the correct parameters. This script can also be run from cron for automated backups.

Create backup script file:

vim /usr/local/sbin/backup.sh

Add the following code to it:

#!/bin/bash

# Export some ENV variables so you don't have to type anything
export AWS_ACCESS_KEY_ID=""
export AWS_SECRET_ACCESS_KEY=""
export PASSPHRASE=""

# Your GPG key (uncomment and set if using asymmetric encryption)
#GPG_KEY="YOUR_GPG_KEY"

# The S3 destination followed by bucket name
DEST="s3://s3.amazonaws.com/example.com/"


# Set up some variables for logging
LOGFILE="/var/log/duplicity/backup.log"
DAILYLOGFILE="/var/log/duplicity/backup.daily.log"
FULLBACKLOGFILE="/var/log/duplicity/backup.full.log"
HOST=`hostname`
DATE=`date +%Y-%m-%d`
MAILADDR="[email protected]"
TODAY=$(date +%d%m%Y)

is_running=$(ps -ef | grep duplicity  | grep python | wc -l)

if [ ! -d /var/log/duplicity ];then
    mkdir -p /var/log/duplicity
fi

if [ ! -f $FULLBACKLOGFILE ]; then
    touch $FULLBACKLOGFILE
fi

if [ $is_running -eq 0 ]; then
    # Clear the old daily log file
    cat /dev/null > ${DAILYLOGFILE}

    # Trace function for logging, don't change this
    trace () {
            stamp=`date +%Y-%m-%d_%H:%M:%S`
            echo "$stamp: $*" >> ${DAILYLOGFILE}
    }

    # How long to keep backups for
    OLDER_THAN="1M"

    # The source of your backup
    SOURCE=/

    # Do a full backup on the first of the month,
    # unless the log shows one was already done today
    FULL=
    tail -1 ${FULLBACKLOGFILE} | grep ${TODAY} > /dev/null
    if [ $? -ne 0 -a $(date +%d) -eq 1 ]; then
            FULL=full
    fi;

    trace "Backup for local filesystem started"

    trace "... removing old backups"

    # --force is required for remove-older-than to actually delete;
    # without it, duplicity only lists what would be removed
    duplicity remove-older-than ${OLDER_THAN} --force ${DEST} >> ${DAILYLOGFILE} 2>&1

    trace "... backing up filesystem"

    # Build GPG options only when a key is configured above; otherwise
    # duplicity falls back to symmetric encryption using PASSPHRASE
    GPG_OPTS=""
    if [ -n "${GPG_KEY}" ]; then
        GPG_OPTS="--encrypt-key=${GPG_KEY} --sign-key=${GPG_KEY}"
    fi

    duplicity \
        ${FULL} \
        ${GPG_OPTS} \
        --include=/var/rsnap-mysql \
        --include=/var/www \
        --include=/etc \
        --exclude=/** \
        ${SOURCE} ${DEST} >> ${DAILYLOGFILE} 2>&1

    trace "Backup for local filesystem complete"
    trace "------------------------------------"

    # Send the daily log file by email
    #cat "$DAILYLOGFILE" | mail -s "Duplicity Backup Log for $HOST - $DATE" $MAILADDR
    BACKUPSTATUS=`cat "$DAILYLOGFILE" | grep Errors | awk '{ print $2 }'`
    if [ "$BACKUPSTATUS" != "0" ]; then
       cat "$DAILYLOGFILE" | mail -s "Duplicity Backup Log for $HOST - $DATE" $MAILADDR
    elif [ "$FULL" = "full" ]; then
        echo "$(date +%d%m%Y_%T) Full Back Done" >> $FULLBACKLOGFILE
    fi

    # Append the daily log file to the main log file
    cat "$DAILYLOGFILE" >> $LOGFILE

    # Reset the ENV variables. Don't need them sitting around
    unset AWS_ACCESS_KEY_ID
    unset AWS_SECRET_ACCESS_KEY
    unset PASSPHRASE
fi

Make sure you substitute correct values for:

  • AWS_ACCESS_KEY_ID – IAM user’s access key ID
  • AWS_SECRET_ACCESS_KEY – IAM user’s secret access key
  • PASSPHRASE – the symmetric encryption password, or the GPG key passphrase if asymmetric encryption is used
  • DEST="s3://s3.amazonaws.com/example.com/" – AWS S3 endpoint and bucket. Here example.com is the bucket name.

If you are using GPG then make sure:

  • PASSPHRASE is set to your GPG key passphrase
  • Uncomment the #GPG_KEY= line and replace YOUR_GPG_KEY with your actual GPG key ID; the script then passes --encrypt-key and --sign-key automatically

Set up the backup script's ownership and permissions:

chown root:root /usr/local/sbin/backup.sh
chmod 0700 /usr/local/sbin/backup.sh

Run Backup

Run the backup script once to verify that it is actually backing up:

bash /usr/local/sbin/backup.sh
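
Once it finishes, you can confirm that backup sets actually reached the bucket. This assumes the same AWS_* and PASSPHRASE variables are exported in your shell, and uses our placeholder bucket:

duplicity collection-status s3://s3.amazonaws.com/example.com/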

Setup Cron

After the first backup, only incremental changes are sent.

You can add the following line to cron (crontab -e):

0 * * * * /usr/local/sbin/backup.sh

The above cron entry runs our backup script hourly.
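
If hourly is more than your data needs, a daily run works the same way, e.g. every day at 2 AM:

0 2 * * * /usr/local/sbin/backup.sh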

Restore Script

A backup is useless without easy restore support.

Create restore script:

vim /usr/local/sbin/restore.sh

Add the following code to it:

#!/bin/bash
# Export some ENV variables so you don't have to type anything
export AWS_ACCESS_KEY_ID="IAM_USER_ACCESS_KEY_ID"
export AWS_SECRET_ACCESS_KEY="IAM_USER_SECRET_ACCESS_KEY"
export PASSPHRASE="GPG_OR_SOME_OTHER_PASSPHRASE"

# The S3 destination followed by bucket name
DEST="s3://s3.amazonaws.com/example.com/"

# Your GPG key
#GPG_KEY=YOUR_GPG_KEY

if [ $# -lt 3 ]; then echo "Usage: $0 <date> <file> <restore-to>"; exit 1; fi

duplicity \
    --restore-time $1 \
    --file-to-restore $2 \
    ${DEST} $3

# Reset the ENV variables. Don't need them sitting around
unset AWS_ACCESS_KEY_ID
unset AWS_SECRET_ACCESS_KEY
unset PASSPHRASE

Set up the restore script's ownership and permissions:

chown root:root /usr/local/sbin/restore.sh
chmod 0700 /usr/local/sbin/restore.sh

Usage:

restore.sh <date> <file> <restore-to>

It is strongly recommended that you restore into a new folder, different from the actual file/folder you are trying to restore.
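
For example, to locate a file in the backup and restore an old copy of it into a scratch folder (the date and paths are illustrative; note that <file> is relative to the backup root, without a leading slash):

# List files present in the latest backup (AWS_* and PASSPHRASE exported as before)
duplicity list-current-files s3://s3.amazonaws.com/example.com/

# Restore /etc/nginx/nginx.conf as it existed on 2014-05-01
mkdir -p /tmp/restore
restore.sh 2014-05-01 etc/nginx/nginx.conf /tmp/restore/nginx.conf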

Verify Script

Ideally, there should be a mechanism to check backup integrity periodically. Duplicity's verify action does exactly that: it compares the data in the backup against the current local files and reports any differences.

Create verify script:

vim /usr/local/sbin/verify.sh

Add the following code to it:

#!/bin/bash
# Export some ENV variables so you don't have to type anything
export AWS_ACCESS_KEY_ID="IAM_USER_ACCESS_KEY_ID"
export AWS_SECRET_ACCESS_KEY="IAM_USER_SECRET_ACCESS_KEY"
export PASSPHRASE="GPG_OR_SOME_OTHER_PASSPHRASE"

# The S3 destination followed by bucket name
DEST="s3://s3.amazonaws.com/example.com/"

# The source of your backup
SOURCE=/
# Your GPG key
#GPG_KEY=YOUR_GPG_KEY

# File selection must match the backup script, otherwise verify will
# report differences for files that were never backed up
duplicity verify -v4 \
    --include=/var/rsnap-mysql \
    --include=/var/www \
    --include=/etc \
    --exclude=/** \
    ${DEST} ${SOURCE}

# Reset the ENV variables. Don't need them sitting around
unset AWS_ACCESS_KEY_ID
unset AWS_SECRET_ACCESS_KEY
unset PASSPHRASE

Set up the verify script's ownership and permissions:

chown root:root /usr/local/sbin/verify.sh
chmod 0700 /usr/local/sbin/verify.sh

Usage:

verify.sh
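
To make this periodic (see TODO below), a weekly cron entry could mail the report to the sysadmin; the address is a placeholder:

0 3 * * 0 /usr/local/sbin/verify.sh 2>&1 | mail -s "Duplicity verify report" [email protected]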

TODO

  1. Publish mysqldump script and maybe call it from backup script (mysqldump followed by backup) – a minimal sketch follows this list.
  2. Add support for running restore.sh without a date, which should restore the most recent version of a file.
  3. Run the verify script from a cronjob so backup integrity is tested periodically, and have it email the sysadmin about possible backup corruption.
  4. Tweak file selection to exclude backups and hidden files/folders. The FILE SELECTION section of the duplicity man page is useful reading.
  5. Move setting and unsetting of credentials to a common file, or create a single script with separate functions for backup, restore, verify, etc. This will likely be done in EasyEngine.
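
As a starting point for TODO item 1, a minimal pre-backup dump might look like this. It assumes MySQL credentials are available to root (e.g. via /root/.my.cnf) and reuses the /var/rsnap-mysql path already included by the backup script:

#!/bin/bash
# Dump all databases where backup.sh already looks for them
mkdir -p /var/rsnap-mysql
mysqldump --all-databases --single-transaction > /var/rsnap-mysql/all-databases.sql

# Then run the regular backup
/usr/local/sbin/backup.sh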

Credits

  1. Backup and restore scripts source
  2. Another useful tutorial
  3. Ubuntu man page, which helped with the verify script