Slide 1

Slide 1 text

LVM It's only logical. Steven Lembark Workhorse Computing [email protected]

Slide 2

Slide 2 text

What's so logical about LVM?

Slide 3

Slide 3 text

What's so logical about LVM? Simple: It isn't physical.

Slide 4

Slide 4 text

What's so logical about LVM? Simple: It isn't physical. Think of it as "virtual memory" on a disk.

Slide 5

Slide 5 text

Simple??? PV VG LV thin provisioned mirrored snapshots

Slide 6

Slide 6 text

Begin at the beginning... Disk drives were 5, maybe 10MB. Booting from tape takes too long. Can't afford a whole disk for swap. What to do? Tar overlays? RA-70 packs?

Slide 7

Slide 7 text

Partitions save the day! Divide the drive for swap, boot, O/S. Allows separate mount points. New partitions == New mount points.

Slide 8

Slide 8 text

What you used to do Partition the drive. Remembering to keep a separate partition for /boot. Using parted once the original layout was outgrown. Figuring out how to split space with new disks...

Slide 9

Slide 9 text

Size matters. Say you have something big: 20MB of data. Tape overlays take too long. RA-70's require remounts. How can we manage it?

Slide 10

Slide 10 text

Making it bigger We need an abstraction: Vnodes. Instead of "hardware".

Slide 11

Slide 11 text

Veritas & HP Developed different ways to do this. Physical drives. Grouped into "blocks". Allocated into "volumes". Fortunately linux uses HP's scheme.

Slide 12

Slide 12 text

First few steps pvcreate initialize physical storage. whole disk or partition. vgcreate multiple drives into pool of blocks. lvcreate re-partition blocks into mountable units.

Slide 13

Slide 13 text

Example: single-disk desktop grub2 speaks lvm – goodbye boot partitions! Two partitions: efi boot + everything else. Call them /dev/sda{1,2}. efi == 128M rest == lvm

Slide 14

Slide 14 text

Example: single-disk desktop grub2 speaks lvm – goodbye boot partitions! Two partitions: efi boot + everything else. Call them /dev/nvme0n1p{1,2}. efi == 128M rest == lvm

Slide 15

Slide 15 text

Example: single-disk desktop grub2 speaks lvm – goodbye boot partitions! Three for notebooks efi + swap + everything else. Call them /dev/nvme0n1p{1,2,3} swap is for hibernate/recovery. efi+LVM look like desktop/server.

Slide 16

Slide 16 text

Example: single-disk desktop # fdisk /dev/sda; # sda1 => 82, sda2 => 8e # pvcreate /dev/sda2; # vgcreate vg00 /dev/sda2; # lvcreate -L 8G -n root vg00; # mkfs.xfs -b log=12 -L root /dev/vg00/root; # mount /dev/vg00/root /mnt/gentoo;

Slide 17

Slide 17 text

Example: single-disk desktop # fdisk /dev/sda; # sda1 => 82, sda2 => 8e

Slide 18

Slide 18 text

Example: single-disk desktop # fdisk /dev/sda; # sda1 => 82, sda2 => 8e # pvcreate /dev/sda2;

Slide 19

Slide 19 text

Example: single-disk desktop # fdisk /dev/sda; # sda1 => 82, sda2 => 8e # pvcreate /dev/sda2; # vgcreate vg00 /dev/sda2;

Slide 20

Slide 20 text

Example: single-disk desktop # fdisk /dev/sda; # sda1 => 82, sda2 => 8e # pvcreate /dev/sda2; # vgcreate vg00 /dev/sda2; # lvcreate -L 8G -n root vg00;

Slide 21

Slide 21 text

Example: single-disk desktop # fdisk /dev/sda; # sda1 => 82, sda2 => 8e # pvcreate /dev/sda2; # vgcreate vg00 /dev/sda2; # lvcreate -L 8G -n root vg00; # mkfs.xfs -b log=12 -L root /dev/vg00/root;

Slide 22

Slide 22 text

Example: single-disk desktop # fdisk /dev/sda; # sda1 => 82, sda2 => 8e # pvcreate /dev/sda2; # vgcreate vg00 /dev/sda2; # lvcreate -L 8G -n root vg00; # mkfs.xfs -b log=12 -L root /dev/vg00/root; # mount /dev/vg00/root /mnt/gentoo;

Slide 23

Slide 23 text

Finding yourself /etc/fstab gets interesting... Ever get sick of UUID? Labels? Device paths?

Slide 24

Slide 24 text

Finding yourself /etc/fstab gets interesting... LVM assigns UUIDs to PV, VG, LV.

Slide 25

Slide 25 text

Finding yourself Let LVM do the walking: vgscan -v
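Once the VG is active, its LVs appear under stable device-mapper paths, so /etc/fstab can use plain names instead of UUIDs. A minimal sketch (vg00/root and vg01/home are the volume names used in this deck's examples; mount points and options are assumptions):

```
# /etc/fstab fragment using LVM device paths instead of UUIDs
/dev/vg00/root   /       xfs   defaults   0 1
/dev/vg01/home   /home   xfs   defaults   0 2
```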

Slide 26

Slide 26 text

Give linux the boot Sample initrd/init script: mount -t proc none /proc; mount -t sysfs none /sys; /sbin/mdadm --verbose --assemble --scan; /sbin/vgscan --verbose; /sbin/vgchange -a y; /sbin/mount /dev/vg00/root /mnt/root; exec /sbin/switch_root /mnt/root /sbin/init;

Slide 27

Slide 27 text

Then root fills up... Say goodbye to parted. lvextend -L12G /dev/vg00/root; xfs_growfs /dev/vg00/root; Notice the lack of any umount.

Slide 29

Slide 29 text

Add a new disk /sbin/fdisk /dev/sdb; # sdb1 => 8e pvcreate /dev/sdb1; vgextend vg00 /dev/sdb1; lvextend -L24G /dev/vg00/root; xfs_growfs /dev/vg00/root;

Slide 30

Slide 30 text

And another disk, and another... Let's say you've scrounged ten disks. One large VG.

Slide 31

Slide 31 text

And another disk, and another... Let's say you've scrounged ten disks. One large VG. Then one disk fails.

Slide 32

Slide 32 text

And another disk, and another... Let's say you've scrounged ten disks. One large VG. Then one disk fails. And the entire VG with it.

Slide 33

Slide 33 text

Adding volume groups Lesson: Over-large VG's become fragile. Fix: Multiple VG's partition the vulnerability. One disk won't bring down everything.

Slide 34

Slide 34 text

Growing a new VG Plug in and fdisk your new device. # pvcreate /dev/sdc1; # vgcreate vg01 /dev/sdc1; # lvcreate -l 100%FREE -n home vg01; # mkfs.xfs -b log=12 -L home /dev/vg01/home; Copy files, add /dev/vg01/home to /etc/fstab.

Slide 35

Slide 35 text

Your backups just got easier Separate mount points for "scratch". "find . -xdev" Mount /var/spool, /var/tmp. Back up persistent portion of /var with rest of root.

Slide 36

Slide 36 text

More smaller volumes More /etc/fstab entries. Isolate disk failures to non-essential data. Back up by mount point. Use different filesystems (xfs vs. ext4).

Slide 37

Slide 37 text

RAID + LVM - LV with copies using LVM. - Or make PV's out of mdadm volumes. LV's simplify handling huge RAID volumes.

Slide 38

Slide 38 text

LVM RAID # lvcreate -m <#copies> … Automatically duplicates LV data. "-m 2" == three-volume RAID (1 data + 2 copy). Blocks don't need to be contiguous. Can lvextend later on.
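The "-m 2 == three-volume" arithmetic above can be sketched directly; the N+1 rule (one data copy plus N mirror copies) is the only assumption here:

```shell
# Sketch: how many PVs a mirrored LV needs for "lvcreate -m N".
# "-m N" keeps 1 data copy plus N mirror copies, so N+1 devices.
copies=2                     # the "-m 2" from the slide
devices=$(( copies + 1 ))    # 1 data + 2 mirror copies
echo "lvcreate -m ${copies} needs at least ${devices} PVs"
# → lvcreate -m 2 needs at least 3 PVs
```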

Slide 39

Slide 39 text

LVM on RAID Division of labor: - mdadm for RAID. - LVM for mount points. Use mdadm to create space. Use LVM to manage it.

Slide 40

Slide 40 text

"Stride" for RAID > 1 LV blocks == RAID page size. Good for RAID 5, 6, 10. Meaningless for mirror (LVM or hardware).

Slide 41

Slide 41 text

Monitored LVM # lvcreate --monitor y ... Use dmeventd for monitoring. Know about I/O errors. What else would you want to do at 3am?

Slide 42

Slide 42 text

Real SysAdmins don't need sleep! Archiving many GB takes time. Need stable filesystems. Easy: Kick everyone off, run the backups at 0300.

Slide 43

Slide 43 text

Real SysAdmins don't need sleep! Archiving many GB takes time. Need stable filesystems. Easy: Kick everyone off, run the backups at 0300. If you enjoy that sort of thing.

Slide 44

Slide 44 text

Snapshots: mounted backups Not a hot copy. Snapshot pool == stable version of COW blocks. Stable version of LV being backed up. Size == max pages that might change during lifetime.

Slide 45

Slide 45 text

Lifecycle of a snapshot Find mountpoint. Snapshot mount point. Work with static contents. Drop snapshot.

Slide 46

Slide 46 text

Most common: backup Live database Spool directory Disk cache Home dirs

Slide 47

Slide 47 text

Backup a database Data and config under /var/lib/postgres. Data on single LV /dev/vg01/postgres. At 0300 up to 1GB per hour. Daily backup takes about 30 minutes. VG keeps 8GB free for snapshots.

Slide 48

Slide 48 text

Backup a database # lvcreate -L1G -s -n pg-tmp \ /dev/vg01/postgres; 1G == twice the usual amount. Updates to /dev/vg01/postgres continue. Original pages stored in /dev/vg01/pg-tmp. I/O error in pg-tmp if > 1GB written.

Slide 49

Slide 49 text

Backup a database # lvcreate -L1G -s -n pg-tmp \ /dev/vg01/postgres; # mount --type xfs \ -o'ro,norecovery,nouuid' /dev/vg01/pg-tmp /mnt/backup; # find /mnt/backup -xdev … /mnt/backup is stable for duration of backup. /var/lib/postgres keeps writing.
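The whole snapshot lifecycle (create, mount read-only, back up, drop) can be sketched as a dry-run script that only prints the commands it would run. The names follow the slide; the umount and lvremove at the end are the "drop snapshot" step implied earlier, not shown on the slide:

```shell
# Dry-run sketch of the snapshot backup cycle: every step is echoed,
# not executed, so nothing here touches real volumes.
run() { echo "$@"; }    # replace the echo with real execution when ready

plan=$(
  run lvcreate -L1G -s -n pg-tmp /dev/vg01/postgres
  run mount -t xfs -o ro,norecovery,nouuid /dev/vg01/pg-tmp /mnt/backup
  run find /mnt/backup -xdev -print
  run umount /mnt/backup
  run lvremove -f /dev/vg01/pg-tmp
)
printf '%s\n' "$plan"
```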

Slide 50

Slide 50 text

Backup a database One downside: Duplicate running database. Takes extra steps to restart. Not difficult. Be Prepared.

Slide 51

Slide 51 text

Giving away what you ain't got "Thin volumes". Like sparse files. Pre-allocate pool of blocks. LV's grow as needed. Allows overcommit.

Slide 52

Slide 52 text

"Thin"? "Thick" == allocate LV blocks at creation time. "—thin" assigns virtual size. Physical size varies with use.

Slide 53

Slide 53 text

Why bother? Filesystems that change over time. No need to pre-allocate all of the space. Add physical storage as needed. ETL intake. Archival. User scratch.

Slide 54

Slide 54 text

Example: Scratch space for users. Say you have ~30GB of free space. And three users. Each "needs" 40GB of space. No problem.

Slide 55

Slide 55 text

The pool is an LV. "--thinpool" labels the LV as a pool. Allocate 30GB into /dev/vg00/scatch. # lvcreate -L 30Gi --thinpool scratch vg00;

Slide 56

Slide 56 text

Virtually allocate LV "lvcreate -V" allocates space out of the pool. "-V" == "virtual" # lvcreate -V 40Gi --thin -n thin_one \ /dev/vg00/scratch; # lvcreate -V 40Gi ...

Slide 57

Slide 57 text

Virtually yours Allocated 120GB using 30GB of disk. lvdisplay shows 0 used for each volume??

Slide 58

Slide 58 text

Virtually yours Allocated 120GB of 30GB. lvdisplay shows 0 used for each volume?? Right: None used. Yet. 40GB is a limit.
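The overcommit arithmetic from the scratch-space example is worth spelling out: three thin LVs of 40GB each, carved from a 30GB pool.

```shell
# Sketch of thin-provisioning overcommit: promised vs. physical space.
users=3
virtual_gb_each=40           # each user's "-V 40G" limit
pool_gb=30                   # actual blocks in the thin pool

virtual_total=$(( users * virtual_gb_each ))
echo "promised ${virtual_total}GB from a ${pool_gb}GB pool"
# → promised 120GB from a 30GB pool
```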

Slide 59

Slide 59 text

Pure magic! Make a filesystem. Mount the lvols. df shows them as 40GB. Everyone is happy!

Slide 60

Slide 60 text

Pure magic! Make a filesystem. Mount the lvols. df shows them as 40GB. Everyone is happy... Until 30GB is used up.

Slide 61

Slide 61 text

Size does matter. No blocks left to allocate. Now what? Writing procs are blocked, but killable. They hold the queue until "kill -KILL" or space becomes available.

Slide 62

Slide 62 text

One fix: scrounge a disk vgextend vg00 <new PV>; lvextend -L +<size> /dev/vg00/scratch; Bingo: free blocks.

Slide 63

Slide 63 text

Reduce, reuse, recycle fstrim(8) removes unused blocks from a filesystem. Reduces virtual allocations. Allows virtual volumes to re-grow: fstrim --all --verbose; cron is your friend.
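"cron is your friend" might look like this crontab fragment; the weekly 0300 Sunday schedule is an assumption, adjust to taste:

```
# /etc/crontab fragment: trim all mounted filesystems weekly.
# min hour dom mon dow  user  command
0 3 * * 0  root  /sbin/fstrim --all --verbose
```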

Slide 64

Slide 64 text

Highly dynamic environment Weekly doesn't cut it: download directories, uncompress space, compiling large projects. Or a notebook with a small SSD.

Slide 65

Slide 65 text

Automatic real-time trimming 3.0+ kernel w/ "TRIM". Device needs "FITRIM". http://xfs.org/index.php/FITRIM/discard

Slide 66

Slide 66 text

Sanity check: discard available? $ cat /sys/block/sda/queue/discard_max_bytes; 2147450880 $ cat /sys/block/dm-8/queue/discard_max_bytes; 2147450880 So far so good...

Slide 67

Slide 67 text

Do the deed /dev/vg00/scratch_one /scratch/jowbloe \ xfs defaults,discard 0 1 or mount -o defaults,discard /foo /mnt/foo;

Slide 68

Slide 68 text

What you see is all you get Dynamic discard == overhead. Often not worse than network mounts. YMMV... Avoids "just in case" provisioning. Good with SSD, battery-backed RAID.

Slide 69

Slide 69 text

LVM benefits - Saner mount points. - Hot backups. - Dynamic space manglement.

Slide 70

Slide 70 text

LVM benefits - Saner mount points. - Hot backups. - Dynamic space manglement. Seems logical?