Slide 1

Slide 1 text

Things you should know about… Database Storage and Retrieval @ordepdev

Slide 2

Slide 2 text

@ordepdev

Slide 3

Slide 3 text

Why should you care?

Slide 4

Slide 4 text

db_set() { echo "$1,$2" >> database; }

Slide 5

Slide 5 text

$ db_set C 1
$ db_set A 1
$ db_set B 1

Slide 6

Slide 6 text

$ cat database
C,1
A,1
B,1
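The shell one-liner above only appends. As a rough Python sketch of the same idea, here is an append-only log with a matching read that scans the file and keeps the last value seen. The `db_get` function, the temporary file location, and the extra `A,2` write are assumptions added for illustration, not part of the slides.

```python
import os
import tempfile


def db_set(path, key, value):
    """Append one "key,value" record to the end of the log file."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(f"{key},{value}\n")


def db_get(path, key):
    """Scan the whole log; the last record written for a key wins."""
    latest = None
    with open(path, encoding="utf-8") as log:
        for line in log:
            record_key, _, value = line.rstrip("\n").partition(",")
            if record_key == key:
                latest = value
    return latest


# Hypothetical throwaway location so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "database")
for key, value in [("C", "1"), ("A", "1"), ("B", "1"), ("A", "2")]:
    db_set(path, key, value)

print(db_get(path, "A"))  # newest value written for A
```

Writes stay cheap because they only append, but every read is a full scan of the file, which is the problem an index is meant to solve.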

Slide 7

Slide 7 text

Log-structured file (1991)

Slide 8

Slide 8 text

“A log-structured file system writes all modifications to disk sequentially in a log-like structure, thereby speeding up both file writing and crash recovery.”

Slide 9

Slide 9 text

“Collect large amounts of new data in a file cache in main memory, then write the data to disk in a single large I/O.”

Slide 10

Slide 10 text

Data File Segment 1: C:1 A:1 B:1 B:2 A:2 A:3 A:4 B:3
Data File Segment 2: B:4 B:5 C:2 A:5 D:1 D:2 A:6
Data File Segment …N: A:7 B:6 C:3 C:4 B:7 E:1
Memory

Slide 11

Slide 11 text

How do we avoid running out of space?

Slide 12

Slide 12 text

Compaction
Data File Segment: C:1 A:1 B:1 B:2 A:2 A:3 A:4 B:3

Slide 13

Slide 13 text

Compaction
Data File Segment: C:1 A:1 B:1 B:2 A:2 A:3 A:4 B:3
Compacted Segment: C:1 B:3 A:4

Slide 14

Slide 14 text

Merging & Compaction
Data File Segment 1: C:1 A:1 B:1 B:2 A:2 A:3 A:4 B:3
Data File Segment 2: B:4 C:2 C:3 C:4 C:5 A:5 A:6 A:7

Slide 15

Slide 15 text

Merging & Compaction
Data File Segment 1: C:1 A:1 B:1 B:2 A:2 A:3 A:4 B:3
Data File Segment 2: B:4 C:2 C:3 C:4 C:5 A:5 A:6 A:7
Compacted & Merged Segment: B:4 C:5 A:7
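Compaction and merging come down to one rule: for each key, keep only the most recent value. A minimal sketch of that rule, assuming segments are plain lists of (key, value) pairs passed oldest first; the function name `merge_and_compact` is made up for this example.

```python
def merge_and_compact(*segments):
    """Merge segments (oldest first), keeping the newest value per key."""
    latest = {}
    for segment in segments:
        for key, value in segment:
            latest[key] = value  # a later write replaces an earlier one
    return latest


segment_1 = [("C", 1), ("A", 1), ("B", 1), ("B", 2),
             ("A", 2), ("A", 3), ("A", 4), ("B", 3)]
segment_2 = [("B", 4), ("C", 2), ("C", 3), ("C", 4),
             ("C", 5), ("A", 5), ("A", 6), ("A", 7)]

print(merge_and_compact(segment_1))             # compaction of one segment
print(merge_and_compact(segment_1, segment_2))  # merging and compaction
```

A real engine would write the result to a new segment file and only then delete the old ones, so reads can keep using the old segments while the merge runs.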

Slide 16

Slide 16 text

Why use an append-only log?
Sequential write operations are much faster than random writes.
Concurrency and crash recovery are much simpler.
Merging old segments avoids fragmentation.

Slide 17

Slide 17 text

How do we find the value of a given key?

Slide 18

Slide 18 text

Index
An additional structure that is derived from the data. It keeps some metadata on the side that helps to locate the data. Maintaining such structures incurs overhead, especially on writes!

Slide 19

Slide 19 text

“The simplest possible indexing strategy is to keep an in-memory hash map where each key is mapped to a byte offset.”

Slide 20

Slide 20 text

Hash Indexes
Log-structured file on disk: 100,{"name":"Porto"}\n101,{"name":"Lisbon"}\n
In-memory hash map (key → byte offset): 100 → 0, 101 → 20

Slide 22

Slide 22 text

Hash Indexes (2010)

Slide 23

Slide 23 text

“When a write occurs, the keydir is atomically updated with the location of the newest data.”

Slide 24

Slide 24 text

“The old data is still present on disk, but any new reads will use the latest version available in the keydir.”
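The hash-index idea can be sketched in a few lines: every write appends a record and stores its byte offset in an in-memory map, and every read seeks straight to that offset. This is only an illustration under simple assumptions (a single file, "key,value" text records, no recovery of the map after a restart). The class name `HashIndexedLog` and the `Braga` update are hypothetical; the attribute name `keydir` is borrowed from the quotes above.

```python
import os
import tempfile


class HashIndexedLog:
    """Append-only log file plus an in-memory map of key -> byte offset."""

    def __init__(self, path):
        self.path = path
        self.keydir = {}  # key -> offset of that key's newest record

    def set(self, key, value):
        record = f"{key},{value}\n".encode("utf-8")
        with open(self.path, "ab") as log:
            log.seek(0, os.SEEK_END)
            offset = log.tell()
            log.write(record)
        self.keydir[key] = offset  # older records stay on disk, unreferenced

    def get(self, key):
        offset = self.keydir.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as log:
            log.seek(offset)  # jump straight to the record, no scan
            line = log.readline().decode("utf-8").rstrip("\n")
        return line.partition(",")[2]


db = HashIndexedLog(os.path.join(tempfile.mkdtemp(), "segment.log"))
db.set("100", '{"name":"Porto"}')
db.set("101", '{"name":"Lisbon"}')
db.set("100", '{"name":"Braga"}')  # hypothetical update; old record remains
print(db.get("100"))
```

The old record for key 100 is still in the file after the update; only the map entry moved, which is why compaction is needed later.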

Slide 25

Slide 25 text

Hash Indexes: LIMITATIONS
Not suitable for a very large number of keys, since the entire hash map must fit in memory!
Scanning over a range of keys is not efficient: each key would have to be looked up individually in the hash map.

Slide 26

Slide 26 text

SSTables (2006)

Slide 27

Slide 27 text

“An SSTable provides a persistent, ordered immutable map from keys to values, where both keys and values are arbitrary byte strings.”

Slide 28

Slide 28 text

“A lookup can be performed by first finding the appropriate block with a binary search in the in-memory index, and then reading the appropriate block from disk.”
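The lookup described above can be sketched as follows. This is a simplified illustration, not the Bigtable format: the "file" is an in-memory byte string, records are "key:value" lines, and the sparse index remembers the offset of every fourth key. The names `build_sstable`, `sstable_get` and `block_size` are assumptions made for the example.

```python
import bisect


def build_sstable(items, block_size=4):
    """Serialise sorted records; index the first key of every block."""
    data = bytearray()
    index_keys, index_offsets = [], []
    for i, (key, value) in enumerate(sorted(items)):
        if i % block_size == 0:
            index_keys.append(key)
            index_offsets.append(len(data))
        data += f"{key}:{value}\n".encode("utf-8")
    return bytes(data), index_keys, index_offsets


def sstable_get(data, index_keys, index_offsets, key):
    # Binary search for the last indexed key that is <= the wanted key.
    slot = bisect.bisect_right(index_keys, key) - 1
    if slot < 0:
        return None  # key sorts before the first record
    start = index_offsets[slot]
    end = index_offsets[slot + 1] if slot + 1 < len(index_offsets) else len(data)
    # Only this one block is read and scanned.
    for line in data[start:end].decode("utf-8").splitlines():
        record_key, _, value = line.partition(":")
        if record_key == key:
            return value
    return None


records = [pair.split(":") for pair in
           "A:1 B:1 C:1 D:2 E:1 F:1 G:9 H:1 I:2 J:4 K:2 L:7 M:1 N:7 O:1 P:3".split()]
data, index_keys, index_offsets = build_sstable(records)
print(index_keys)
print(sstable_get(data, index_keys, index_offsets, "G"))
```

Because the records are sorted, the index only needs one entry per block rather than one per key, which is what keeps it small enough for memory.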

Slide 29

Slide 29 text

Sparse in-memory index
Sorted segment file on disk: A:1 B:1 C:1 D:2 E:1 F:1 G:9 H:1 I:2 J:4 K:2 L:7 M:1 N:7 O:1 P:3 ………… …………
In-memory index (key → byte offset): A → 100491, I → 101201, M → 103041, X → 104204

Slide 30

Slide 30 text

Merging & Compaction
Data Segment 1: A:1 B:1 C:1 D:1 E:1 F:1 G:1 H:1
Data Segment 2: A:2 B:2 C:2

Slide 31

Slide 31 text

Merging & Compaction
Data Segment 1: A:1 B:1 C:1 D:1 E:1 F:1 G:1 H:1
Data Segment 2: A:2 B:2 C:2
Compacted & Merged Segment: A:2 B:2 C:2 D:1 E:1 F:1 G:1 H:1
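Because each segment is already sorted by key, merging works like the merge step of mergesort: read the segments side by side and, when the same key shows up more than once, keep the value from the newest segment. A sketch using the standard library's `heapq.merge`, assuming segments are lists of (key, value) pairs given oldest first with unique keys inside each segment; the function name is made up for the example.

```python
import heapq


def merge_sorted_segments(segments):
    """Merge sorted segments (oldest first); the newest value per key wins."""
    # Tag records with -age so that, for equal keys, the newest sorts first.
    tagged = (
        [(key, -age, value) for key, value in segment]
        for age, segment in enumerate(segments)
    )
    merged = []
    for key, _, value in heapq.merge(*tagged):
        if not merged or merged[-1][0] != key:
            merged.append((key, value))  # first hit for a key is the newest
    return merged


segment_1 = [(key, 1) for key in "ABCDEFGH"]
segment_2 = [("A", 2), ("B", 2), ("C", 2)]
print(merge_sorted_segments([segment_1, segment_2]))
```

The merge streams through its inputs in order, so it also works when the segments are far larger than the available memory.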

Slide 32

Slide 32 text

Storage engines that are based on this principle of merging and compacting sorted files are often called LSM storage engines.

Slide 33

Slide 33 text

LSM-Tree (2006)

Slide 34

Slide 34 text

What about read performance?

Slide 35

Slide 35 text

Bloom filters
“Memory-efficient data structure for approximating the contents of a set.”

Slide 36

Slide 36 text

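Bloom filters
A Bloom filter can tell when a key definitely does not exist in the database, saving many unnecessary disk reads for nonexistent keys. It may occasionally report a missing key as present (a false positive), but never the reverse.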
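A Bloom filter is a bit array plus a few hash functions. The sketch below is one simple way to build one and not how any particular database does it: the sizes (`num_bits`, `num_hashes`) and the trick of deriving several hashes from SHA-256 with a seed prefix are assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive several bit positions by hashing the key with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


bloom = BloomFilter()
for key in "ABCDEFGHIJKLMNOP":
    bloom.add(key)

print(bloom.might_contain("G"))  # always True for a key that was added
print(bloom.might_contain("Z"))  # usually False; True would be a false positive
```

A storage engine would consult the filter before touching a segment file and skip the disk read whenever `might_contain` returns False.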

Slide 37

Slide 37 text

BLOOM FILTERS & SSTABLES
Bloom Filter
Sorted segment file on disk: A:1 B:1 C:1 D:2 E:1 F:1 G:9 H:1 I:2 J:4 K:2 L:7 M:1 N:7 O:1 P:3
In-memory index (key → byte offset): A → 100491, I → 101201, M → 103041, X → 104204

Slide 38

Slide 38 text

Advantages over Hash indexes
When multiple segments contain the same key, the value from the most recent segment is kept and the values in older segments are discarded.
To find a particular key in the file, there is no longer any need to keep an index of all the keys in memory!

Slide 39

Slide 39 text

B-TREES (1970, 1979)

Slide 40

Slide 40 text

“The index is organized in pages of fixed size capable of holding up to 2k keys, but pages need only be partially filled.”

Slide 41

Slide 41 text

B-TREES
Tree diagram with keys: 51, 11, 65, 2, 7, 12, 15, 55, 62, 20

Slide 42

Slide 42 text

B-TREES
Tree diagram with keys: 51, 11, 65, 2, 7, 12, 15, 55, 62, 20
“look up id 15”
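A lookup walks from the root page down to a leaf, using the keys in each page to pick which child to follow. The sketch below reuses the keys from the slide, but the tree layout (a root with keys 11 and 51 over three leaf pages) is an assumption, since the original diagram's shape is not recoverable here. `Page` and `btree_search` are hypothetical names.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Page:
    keys: list
    children: list = field(default_factory=list)  # empty for leaf pages


def btree_search(page, key):
    """Return (found, pages_read) for a lookup starting at the given page."""
    pages_read = 0
    while page is not None:
        pages_read += 1
        i = bisect.bisect_left(page.keys, key)
        if i < len(page.keys) and page.keys[i] == key:
            return True, pages_read
        # Descend into the child whose key range covers the wanted key.
        page = page.children[i] if page.children else None
    return False, pages_read


root = Page(
    keys=[11, 51],
    children=[Page([2, 7]), Page([12, 15, 20]), Page([55, 62, 65])],
)

print(btree_search(root, 15))  # look up id 15
print(btree_search(root, 13))  # a key that is not in the tree
```

The number of pages read grows with the depth of the tree, which stays small because each page holds many keys.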

Slide 48

Slide 48 text

B-TREES :: REBALANCING
Tree diagrams with keys: 51, 11, 2 and 11, 51, 2

Slide 50

Slide 50 text

B-TREES :: SPLITTING
Tree diagrams with keys: 51, 11, 2 and 11, 51, 2

Slide 52

Slide 52 text

What about resilience?

Slide 53

Slide 53 text

Write-ahead log (WAL), also known as a redo log
Every modification must be written to the WAL before it can be applied to the pages of the tree itself.
Writing all modifications to the WAL means that a B-tree index must write every piece of data at least twice!
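The ordering rule is the whole point: log first, then modify. A toy sketch of it, assuming a dictionary stands in for the tree's pages and the WAL is a file of JSON lines; `TinyStore` and its methods are hypothetical names. A real engine would also have to cope with a half-written last line and would truncate the log after a checkpoint.

```python
import json
import os
import tempfile


class TinyStore:
    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.pages = {}  # stand-in for the B-tree pages
        self._replay()

    def _replay(self):
        """Rebuild state after a crash by re-applying every logged change."""
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as wal:
            for line in wal:
                entry = json.loads(line)
                self.pages[entry["key"]] = entry["value"]

    def put(self, key, value):
        # 1. Append the change to the WAL and force it to disk.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps({"key": key, "value": value}) + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        # 2. Only then apply it to the pages.
        self.pages[key] = value


wal_path = os.path.join(tempfile.mkdtemp(), "wal.log")
store = TinyStore(wal_path)
store.put("A", 1)
store.put("B", 1)
store.put("A", 2)

recovered = TinyStore(wal_path)  # simulates a restart after a crash
print(recovered.pages)
```

Each `put` touches storage twice, once for the log and once for the page, which is the "at least twice" cost mentioned above.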

Slide 54

Slide 54 text

Write amplification
One write to the database results in multiple writes to the disk.
Write amplification has a direct performance cost! The more a storage engine writes to disk, the fewer writes per second it can handle.
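As a back-of-the-envelope illustration, write amplification can be expressed as bytes written to disk divided by bytes the application asked to write. The numbers below are hypothetical (a 100-byte record, a 4096-byte page) and only show how a WAL entry plus a full page rewrite add up.

```python
def write_amplification(bytes_written_to_disk, bytes_written_by_application):
    """Ratio of physical bytes written to logical bytes written."""
    return bytes_written_to_disk / bytes_written_by_application


record_size = 100  # hypothetical size of one updated record, in bytes
page_size = 4096   # hypothetical size of one tree page, in bytes

# One update: the record goes to the WAL, then its whole page is rewritten.
disk_bytes = record_size + page_size
print(write_amplification(disk_bytes, record_size))
```

Real figures depend on the workload, page fill, and how often compaction or checkpoints run, so this ratio is something to measure rather than assume.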

Slide 55

Slide 55 text

Wrapping Up.

Slide 56

Slide 56 text

B-trees are mutable and allow in-place updates.

Slide 57

Slide 57 text

Writes are slower on B-trees since they must write every piece of data at least twice: once to the write-ahead log and once to the tree page itself.

Slide 58

Slide 58 text

LSM-tree segment files are immutable: they are written to disk once and never updated in place.

Slide 59

Slide 59 text

Reads are slower on LSM-trees since they have to check several data structures at different stages of compaction.

Slide 60

Slide 60 text

LSM-trees are able to sustain higher write throughput due to lower write amplification and sequential writes.

Slide 61

Slide 61 text

Which one is the best type of storage?

Slide 62

Slide 62 text

There is no quick and easy rule for determining which type of storage engine is better for your use case.

Slide 63

Slide 63 text

Don’t pick databases based on hype.

Slide 64

Slide 64 text

Always test against your use case!

Slide 65

Slide 65 text

YOU SHOULD READ PAPERS! @PWLPORTO
