Slide 1

Slide 1 text

bup: Git for backups #bup #28c3 1 / 26

Slide 2

Slide 2 text

Zoran Zarić
@zoranzaric
Computer Science student at TU Darmstadt
bup since April 2010

Slide 3

Slide 3 text

Contents
1. Motivation
2. Git background
3. bup
   3.1 Features
   3.2 Algorithms & data structures

Slide 4

Slide 4 text

Motivation
- Space efficiency of backups
- Convenient access to backups
- Safety against bit rot, filesystem errors, and media errors
- Safety against history changes

Slide 5

Slide 5 text

Git

Slide 6

Slide 6 text

Git
- Distributed version control system

Slide 7

Slide 7 text

Git
- Distributed version control system
- Content addressed

Slide 8

Slide 8 text

Git
- Distributed version control system
- Content addressed
- Immutable objects

Slide 9

Slide 9 text

Git
- Distributed version control system
- Content addressed
- Immutable objects
- Snapshot-based instead of diff-based
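"Content addressed" means an object's name is derived from its bytes. As a minimal sketch: Git computes a blob's ID as the SHA-1 of a small header (`blob <size>` followed by a NUL byte) plus the content. The function name `git_blob_id` is made up for this illustration.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the hex object ID Git assigns to a blob with this content."""
    # Git hashes a header "blob <size>\0" followed by the raw content.
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()
```

The same bytes always map to the same ID, which is why identical content is stored only once and why an object cannot change without getting a new name. For empty content the function returns e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the ID Git reports for an empty blob.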

Slide 10

Slide 10 text

Git: A Repository

Slide 11

Slide 11 text

Git: A Repository
BLOBs: e69de29
    Hello World

Slide 12

Slide 12 text

Git: A Repository
BLOBs: e69de29
Trees: 82e3a75
    100644 blob 5e1c309dae7f45e0f39b1bf3ac3cd9db12e7d689  README
    100644 blob 39c8418e04721b9a30232ce754cac8d9ee78340a  DESIGN
    040000 tree 482fa65ae85c1e5bca8c091b479de60b714a4b6a  src

Slide 13

Slide 13 text

Git: A Repository
BLOBs: e69de29
Trees: 82e3a75
Commits: 3dfe461f
    tree a3d703e579dc9baae20456eb63fa49f5e4e7c9b4
    author Zoran Zaric 1314498536 +0200
    committer Zoran Zaric 1314498536 +0200

    Example commit

Slide 14

Slide 14 text

Git: A Repository
BLOBs: e69de29
Trees: 82e3a75
Commits: 3dfe461f
Tags & Branches: v0.1, master
    63866463d511a245a55a57ca48efe8e67b955dec

Slide 15

Slide 15 text

Git: A Repository
BLOBs: e69de29
Trees: 82e3a75
Commits: 3dfe461f
Tags & Branches: v0.1, master
[Diagram: object graph with nodes 3dfe461f, 82e3a75, e69de29, 25b2be3, 78af04f, 41c28e8 and refs master, v1.0]
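Because trees name their children by ID and commits name their tree by ID, changing any file changes every ID above it. The sketch below builds tree objects the way Git serializes them (`<mode> <name>`, a NUL byte, then the raw 20-byte child ID). The helper names are hypothetical, and the entry ordering is simplified to plain byte order of the names; Git's real ordering treats directory names as if they ended in `/`.

```python
import hashlib


def object_id(kind: bytes, body: bytes) -> bytes:
    """Raw 20-byte SHA-1 of a Git object: header '<kind> <size>' + NUL + body."""
    return hashlib.sha1(kind + b" %d\0" % len(body) + body).digest()


def blob_id(content: bytes) -> bytes:
    return object_id(b"blob", content)


def tree_id(entries: dict) -> bytes:
    """entries maps a file name (bytes) to a (mode, raw child ID) pair."""
    body = b""
    for name in sorted(entries):  # simplified ordering (assumption, see above)
        mode, child = entries[name]
        body += mode + b" " + name + b"\0" + child
    return object_id(b"tree", body)


# Two snapshots that differ only in README share the DESIGN blob,
# but their root tree IDs differ.
design = blob_id(b"design notes\n")
root_v1 = tree_id({b"README": (b"100644", blob_id(b"first version\n")),
                   b"DESIGN": (b"100644", design)})
root_v2 = tree_id({b"README": (b"100644", blob_id(b"second version\n")),
                   b"DESIGN": (b"100644", design)})
```

This is the property behind "safety against history changes": a commit ID pins the complete content of the snapshot it points to, so altering old data would change the IDs that refer to it.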

Slide 16

Slide 16 text

Git: A Repository
Packfiles
[Diagram: packfile containing objects e69de29, 82e3a75, 3dfe461f, 41c28e8, 78af04f, 25b2be3]

Slide 17

Slide 17 text

Git: Problems

Slide 18

Slide 18 text

Git: Problems
- Slow & memory-hungry for bigger files

Slide 19

Slide 19 text

Git: Problems
- Slow & memory-hungry for bigger files
- No metadata (permissions, owners, ACLs)

Slide 20

Slide 20 text

bup
- Avery Pennarun (git subtree, sshuttle, redo)
- https://github.com/apenwarr/bup
- http://groups.google.com/group/bup-list

Slide 21

Slide 21 text

bup: Installation
    $ sudo apt-get install python2.6-dev python-fuse
    $ sudo apt-get install python-pyxattr python-pylibacl
    $ mkdir ~/src && cd ~/src
    $ git clone https://github.com/apenwarr/bup.git
    $ cd bup
    $ make
    $ make test
    $ sudo make install

Slide 22

Slide 22 text

bup: Examples
    $ bup index -ux /home/zz
    $ bup save -n laptop /home/zz
    $ bup save -r myserver -n laptop /home/zz
    $ bup on myserver index -ux /home/zz
    $ bup on myserver save -n server /home/zz
    $ bup ls laptop/latest/home/zz

Slide 23

Slide 23 text

bup: Features
- Deduplication (http://goo.gl/aBpny)

Slide 24

Slide 24 text

bup: Features
- Deduplication (http://goo.gl/aBpny)
- Benchmark: two servers, each with a pseudo VM image, with few changes
    rsnapshot: 4.97G
    bup:       2.18G

Slide 25

Slide 25 text

bup: Features
- Deduplication (http://goo.gl/aBpny)
- Benchmark: two servers, each with a pseudo VM image, with few changes
    rsnapshot: 4.97G
    bup:       2.18G
- Import of rsnapshot backups into bup
    rsnapshot: 12.6G
    bup:       4.6G

Slide 26

Slide 26 text

bup: Features
Metadata (almost done)
- Owner
- Exact times
- Permissions
- Extended ACLs
- SELinux

Slide 27

Slide 27 text

bup: Features
- FUSE module
- You can mount your backups and browse them with your favorite file manager

Slide 28

Slide 28 text

bup: Features
- Web interface

Slide 29

Slide 29 text

bup: Features
- Runs on DD-WRT

Slide 30

Slide 30 text

bup: Features
- Import script for rsnapshot backups
- More will follow (Duplicity)

Slide 31

Slide 31 text

bup: Features
- Full compatibility with Git
- Git tools like gitk or tig can be used with bup repositories

Slide 32

Slide 32 text

bup: Features
- Uses par2 to be safe against bit rot, filesystem errors, and media errors

Slide 33

Slide 33 text

bup: Algorithms & Data Structures
- Hashsplitting
- Midx
- Bloom filters

Slide 34

Slide 34 text

Hashsplitting
- Rolling checksum (rsync's algorithm)
- Big files are split into chunks of 8 kB on average
- When the 13 least significant bits of the checksum are all 1, a new chunk begins (2^13 = 8192 bytes on average)
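The splitting rule can be sketched in a few lines. This is an illustration under stated assumptions, not bup's actual code: the function name `split_chunks`, the 64-byte window, and the 16-bit sums are choices made here. The checksum is an rsync-style pair of running sums over the last `window` bytes; a chunk ends wherever the lowest `bits` bits of the digest are all 1, which happens with probability 2^-bits per byte, so chunks average about 2^bits bytes (13 bits gives 8192).

```python
def split_chunks(data: bytes, bits: int = 13, window: int = 64) -> list:
    """Split data at content-defined boundaries using a rolling checksum."""
    mask = (1 << bits) - 1
    s1 = s2 = 0            # rsync-style sums over the last `window` bytes
    chunks, start = [], 0
    for i, new in enumerate(data):
        # Byte that falls out of the window (zero while the window fills up).
        old = data[i - window] if i >= window else 0
        s1 = (s1 + new - old) & 0xFFFF
        s2 = (s2 + s1 - window * old) & 0xFFFF
        digest = (s1 << 16) | s2
        if digest & mask == mask:      # lowest `bits` bits are all 1
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])    # trailing partial chunk
    return chunks
```

Because each decision depends only on the last `window` bytes, inserting a byte near the start of a file moves the first boundary but leaves later chunks byte-for-byte identical. Those unchanged chunks hash to the same object IDs, which is what makes deduplication of large, slightly modified files work.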

Slide 35

Slide 35 text

Midx
- idx: index for a packfile, 1 idx per packfile
- An object is found with 3-4 lookups per packfile
- midx: a single index covering several packfiles
- An object is found with 2 lookups
- Problem: the midx has to be recreated for every change
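The difference between the two lookup strategies can be illustrated with sorted in-memory lists. This is a conceptual sketch only: the classes `PackIndex` and `MultiIndex` are hypothetical stand-ins, not bup's on-disk formats, and the sketch does not reproduce the exact lookup counts quoted above.

```python
import bisect
import hashlib


class PackIndex:
    """Sorted object IDs of one packfile (stand-in for an .idx file)."""

    def __init__(self, name, object_ids):
        self.name = name
        self.ids = sorted(object_ids)

    def __contains__(self, oid):
        i = bisect.bisect_left(self.ids, oid)
        return i < len(self.ids) and self.ids[i] == oid


def find_in_packs(packs, oid):
    """Without a midx: one binary search per packfile until a hit."""
    for pack in packs:
        if oid in pack:
            return pack.name
    return None


class MultiIndex:
    """One merged, sorted table over several packfiles (stand-in for a midx)."""

    def __init__(self, packs):
        self.entries = sorted((oid, p.name) for p in packs for oid in p.ids)
        self.ids = [oid for oid, _ in self.entries]

    def find(self, oid):
        """A single binary search, however many packfiles were merged."""
        i = bisect.bisect_left(self.ids, oid)
        if i < len(self.ids) and self.ids[i] == oid:
            return self.entries[i][1]
        return None
```

The sketch also shows the stated drawback: `MultiIndex` is built once from a fixed set of packs, so adding a packfile means constructing a new merged table.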

Slide 36

Slide 36 text

Bloom Filters
- Probabilistic data structure
- Checks whether a datum is known
- Appending is possible
- False positives are possible; their rate grows as data is added
- When the rate exceeds 1%, the Bloom filter is expanded and rewritten
- Hash function optimized for few 1s in the result
- The Bloom filter is a bit array; the result is added with bitwise OR
- When a hit is found, a midx lookup is done
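A minimal Bloom filter looks roughly like this. The class name, the default sizes, and the way bit positions are taken from a SHA-1 digest are assumptions of this sketch, not bup's implementation. Adding an item sets a few bits with bitwise OR; a lookup reports "possibly known" only if all of those bits are set, so a miss is definitive while a hit still needs confirmation in the real index.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 5):
        if not 1 <= num_hashes <= 5:
            # A 20-byte SHA-1 digest yields at most five 4-byte slices.
            raise ValueError("num_hashes must be between 1 and 5")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        digest = hashlib.sha1(item).digest()
        for k in range(self.num_hashes):
            yield int.from_bytes(digest[4 * k:4 * k + 4], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)   # set the bit with OR

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(item))

    def expected_false_positive_rate(self, num_items: int) -> float:
        """Standard estimate (1 - e^(-k*n/m))^k for n stored items."""
        k, m = self.num_hashes, self.num_bits
        return (1 - math.exp(-k * num_items / m)) ** k
```

The estimate makes the growth visible: with these default sizes it stays far below 1% for a thousand items and is far above 1% for twenty thousand, which is the point at which a filter like this would have to be enlarged and rewritten.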

Slide 37

Slide 37 text

Recent
- Metadata support is about to be finished (patch set available, testing needed)
- Repack patches pending (deleting old backups)
- An inotify-based daemon is being discussed

Slide 38

Slide 38 text

You & bup?
- Python & a bit of C
- Native Windows support?
- OS X / Windows metadata support?
- OS X "inotify"-like port?
- GUI?
- Diff

Slide 39

Slide 39 text

Thank You
@zoranzaric
zorzar on freenode & hackint
[email protected] (Email & Jabber)
zoranzaric.de
github.com/zoranzaric
gplus.zoranzaric.de
Slides: zoranzaric.de/bup-28c3.pdf