Le weblog entièrement nu

Roland, entirely naked... from time to time.

Rsyncing a BackupPC storage pool, efficiently

BackupPC is a pretty good backup system. Its configuration is rather flexible, it has nice expiry policies, and it can store duplicated file contents only once (for files that are shared across hosts or don't change over time) within a compressed pool of data. However, it doesn't do much to help with pushing the data to off-site storage, or at least not very efficiently. So if you have a BackupPC instance running on a Raspberry Pi or a plug computer at home, it's a bit tricky to protect your data against loss due to burglary or a house fire.

The obvious solution would be to rsync the storage pool to a remote site. However, the current pooling system relies heavily on hardlinks, and rsync is notoriously inefficient with those. In the home backup server scenario, this means that even if the computer is more powerful than a Pi and can handle the memory requirements of rsync, you'll often end up transferring way too much data.

So, since the obvious solution doesn't work straight away, what do we do? Why, we fix it, of course. A quick look into the storage pool shows that the bulk of the data is stored in files with an “abstract” name (derived from the contents) within a $prefix/pool directory ($prefix/cpool for the compressed pool, which is what the scripts below use); the files with concrete names that look much like the originals are stored within $prefix/pc, and they're actually the same files because they're hardlinks. Knowing this (which rsync doesn't), we can make a smarter replication tool, by

  1. pushing only the pool with standard rsync;
  2. storing locally, and recreating remotely, the structure of hardlinks;
  3. pushing everything again with standard rsync.

Steps 1 and 3 are simple invocations of rsync -aH; step 2 can be implemented using the following two scripts. Run store-hardlinks.pl locally, push the links file, then run restore-hardlinks.pl on the remote server. This will ensure that files already present in the pool are also hardlinked in their natural location.
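Put together, the whole sequence could look something like the sketch below. The host name backup.example.org, the function name mirror_backuppc and the assumption that each Perl script can be found from the directory where it is run are all made up for illustration; the two prefixes are the ones hard-coded in the scripts. Setting RUN=echo prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the three-step replication described above.
# REMOTE is a hypothetical off-site host; SRC and DST match the
# $prefix values used in store-hardlinks.pl and restore-hardlinks.pl.
SRC=/var/lib/backuppc
REMOTE=backup.example.org
DST=/srv/backuppc-mirror

# Leave RUN empty to execute the commands, or set RUN=echo for a dry run.
RUN=${RUN:-}

mirror_backuppc() {
    # 1. push only the pool
    $RUN rsync -aH "$SRC/cpool/" "$REMOTE:$DST/cpool/" || return 1
    # 2. record the hardlinks locally, ship the map, recreate the links remotely
    $RUN perl store-hardlinks.pl || return 1
    $RUN rsync -a "$SRC/links" "$REMOTE:$DST/links" || return 1
    $RUN ssh "$REMOTE" perl restore-hardlinks.pl || return 1
    # 3. push everything again
    $RUN rsync -aH "$SRC/" "$REMOTE:$DST/"
}
```

Calling mirror_backuppc with RUN=echo first is a cheap way to check the paths before anything is actually transferred.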

store-hardlinks.pl:

#! /usr/bin/perl -w

use strict;
use Storable qw(nstore);
use File::Find;

use vars qw/$prefix $poolpath $pcpath %i2cpool %todo $store/;

$prefix = '/var/lib/backuppc';

# Double quotes are needed here so that $prefix is interpolated.
$poolpath = "$prefix/cpool";
$pcpath = "$prefix/pc";
$store = "$prefix/links";

# for the convenience of &wanted calls, including -eval statements:
use vars qw/*name *dir *prune/;
*name   = *File::Find::name;
*dir    = *File::Find::dir;
*prune  = *File::Find::prune;

# Scan pool
File::Find::find({wanted => \&wanted_pool}, $poolpath);

# Scan PC dirs
File::Find::find({wanted => \&wanted_pc}, $pcpath);

nstore \%todo, $store;
exit;

sub wanted_pc {
    my ($dev,$ino,$mode,$nlink,$uid,$gid);

    (($dev,$ino,$mode,$nlink,$uid,$gid) = lstat($_)) &&
      -f _ &&
      ($nlink > 1) &&
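      do {
          # Store the path relative to the pc directory.
          $name =~ s,^\Q$pcpath\E/,,;
          # Only record files whose inode was also seen in the pool.
          if (defined $i2cpool{$ino}) {
              $todo{$name} = $i2cpool{$ino};
          }
      }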
}

sub wanted_pool {
    my ($dev,$ino,$mode,$nlink,$uid,$gid);

    (($dev,$ino,$mode,$nlink,$uid,$gid) = lstat($_)) &&
      -f _ &&
      ($nlink > 1) &&
      do {
          # Map each pool inode to its path relative to the pool directory.
          $name =~ s,^\Q$poolpath\E/,,;
          $i2cpool{$ino} = $name;
      }
}
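To get a feel for what the script will record before running it, the same selection (regular files with more than one link) can be approximated from the shell. This is only a rough sketch using find and ls; the function name list_multilinked is invented for the example.

```shell
# Print "inode path" for every regular file with more than one hardlink
# under the given directory, mirroring the ($nlink > 1) test in the script.
list_multilinked() {
    find "$1" -type f -links +1 -exec ls -di {} +
}
```

For instance, list_multilinked /var/lib/backuppc/cpool | head shows a few pool entries that are referenced from elsewhere.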

restore-hardlinks.pl:

#! /usr/bin/perl -w

use strict;
use Storable;
use File::Path qw/make_path/;

use vars qw/$prefix $poolpath $pcpath %todo $store/;

$prefix = '/srv/backuppc-mirror';

$poolpath = "$prefix/cpool";
$pcpath = "$prefix/pc";
$store = "$prefix/links";

%todo = %{retrieve ($store)};

my ($dev,$ino,$mode,$nlink,$uid,$gid);

foreach my $src (keys %todo) {
    my $inode;
    my $dest = $todo{$src};
    my $dpath = "$poolpath/$dest";
    my $spath = "$pcpath/$src";
    my $sdir = $spath;
    $sdir =~ s,/[^/]*?$,,;
    make_path ($sdir);
    next unless -e $dpath;
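    if (! -e $spath) {
      # Nothing there yet: just create the hardlink.
      link $dpath, $spath or warn "Cannot link $spath to $dpath: $!\n";
      next;
    }
    (($dev,$ino,$mode,$nlink,$uid,$gid) = lstat($spath));
    $inode = $ino;
    (($dev,$ino,$mode,$nlink,$uid,$gid) = lstat($dpath));
    if ($ino != $inode) {
      # Same name but a different inode: replace the copy with a hardlink.
      unlink $spath or warn "Cannot remove $spath: $!\n";
      link $dpath, $spath or warn "Cannot link $spath to $dpath: $!\n";
    }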
}
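Once restore-hardlinks.pl has run on the remote side, it may be worth spot-checking that a file under pc really shares its inode with the corresponding pool entry. A minimal sketch, assuming a shell whose test builtin supports -ef (bash, dash and ksh do); the function name same_file is made up for the example.

```shell
# Succeeds when both paths exist and refer to the same file
# (same device and inode), i.e. they are hardlinks to one another.
same_file() {
    [ "$1" -ef "$2" ]
}
```

Given one path under the mirror's pc directory and the matching path under its cpool directory, same_file "$pcfile" "$poolfile" && echo linked confirms that the link was recreated.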

The initial transfer can still take forever if the pool is large (and if you're pushing it through the small end of an ADSL link…), but at least the files are only transferred once.

Note: This is only useful for current versions of BackupPC. Apparently BackupPC 4 will have a different pooling system without hardlinks, and the hack above will no longer be required. For now, though, here it is.

Creative Commons License: unless otherwise stated, the content of this site is made available under a Creative Commons license.