Things you should know about Database Storage and Retrieval

On the most fundamental level, a database needs to do two things: when you give it some data, it should store the data, and when you ask again later, it should give the data back to you. Why should you, as an application developer, care how the database handles storage and retrieval internally? To select the storage engine that is appropriate for your application, you need to have a rough idea of what the storage engine is doing under the hood. In this talk, we’ll discuss and examine some core data structures, such as hash indexes, SSTables, LSM-trees, and B-trees, that are used in traditional relational and NoSQL databases.


Pedro Tavares

June 14, 2018

Transcript

  1. 8.

    “A log-structured file system writes all modifications to disk sequentially
    in a log-like structure, thereby speeding up both file writing and crash recovery.”
  2. 9.

    “Collect large amounts of new data in a file cache
    in main memory, then write the data to disk in a single large I/O.”
  3. 10.

    [Diagram: append-only data file segments (1, 2, … N) filling sequentially from memory,
    each holding successive versions of keys, e.g. A:1 … A:7, B:1 … B:7, C:1 … C:4, D:1, D:2, E:1]
  4. 13.

    [Diagram: compaction of a data file segment: the records C:1 A:1 B:1 B:2 A:2 A:3 A:4 B:3
    are compacted to C:1 B:3 A:4, keeping only the newest version of each key]
  5. 14.

    [Diagram: merging & compaction, step 1: two data file segments,
    segment 1 (C:1 A:1 B:1 B:2 A:2 A:3 A:4 B:3) and segment 2 (B:4 C:2 C:3 C:4 C:5 A:5 A:6 A:7)]
  6. 15.

    [Diagram: merging & compaction, step 2: data file segments 1 and 2 are merged
    into a compacted & merged segment holding only B:4 C:5 A:7]
  7. 16.

    Why use an append-only log? Sequential write operations are much
    faster than random writes. Concurrency and crash recovery are much simpler. Merging old segments avoids fragmentation.
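The segment compaction and merging shown on the earlier slides can be sketched in a few lines of Python. This is an illustrative model only, with segments as in-memory lists of (key, version) records rather than a real on-disk format:

```python
def compact(segment):
    """Keep only the newest value for each key in one log segment.

    A segment is modeled as a list of (key, value) records in write
    order, so later records supersede earlier ones.
    """
    newest = {}
    for key, value in segment:
        newest[key] = value          # later writes win
    return list(newest.items())

def merge(*segments):
    """Merge segments (passed oldest to newest) into one compacted segment."""
    newest = {}
    for segment in segments:
        for key, value in segment:
            newest[key] = value      # newer segments override older ones
    return list(newest.items())
```

Running `compact` over the slide’s example segment (C:1 A:1 B:1 B:2 A:2 A:3 A:4 B:3) keeps exactly C:1, B:3, and A:4, matching the compacted segment on the slide.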
  8. 18.

    Index: an additional structure that is derived from the data. It keeps
    some metadata on the side that helps locate the data. Maintaining such structures incurs overhead, especially on writes!
  9. 19.

    “The simplest possible indexing strategy is to keep an in-memory
    hash map where each key is mapped to a byte offset.”
  10. 20.

    [Diagram: hash indexes. An in-memory hash map of key → byte offset
    (100 → 0, 101 → 20) over a log-structured file on disk holding the
    records 100,{"name":"Porto"} and 101,{"name":"Lisbon"}]
  12. 22.

    [Diagram: hash indexes over two data segments, segment 1
    (A:1 B:1 C:1 D:1 E:1 F:1 G:1 H:1) and segment 2 (A:2 B:2 C:2), dated 2010]
  13. 23.

    “When a write occurs, the keydir is atomically updated with
    the location of the newest data.”
  14. 24.

    “The old data is still present on disk, but any
    new reads will use the latest version available in the keydir.”
  15. 25.

    Hash indexes: LIMITATIONS. Not suitable for a very large number
    of keys, since the entire hash map must fit in memory! Scanning over a range of keys is not efficient: each key would have to be looked up individually in the hash maps.
  16. 26.

    [Diagram: SSTables over two data segments, segment 1
    (A:1 B:1 C:1 D:1 E:1 F:1 G:1 H:1) and segment 2 (A:2 B:2 C:2), dated 2006]
  17. 27.

    “An SSTable provides a persistent, ordered immutable map from keys
    to values, where both keys and values are arbitrary byte strings.”
  18. 28.

    “A lookup can be performed by first finding the appropriate
    block with a binary search in the in-memory index, and then reading the appropriate block from disk.”
  19. 29.

    [Diagram: a sparse in-memory index (A → 100491, I → 101201, M → 103041, X → 104204)
    over a sorted, compacted segment file on disk holding keys A:1 through P:3]
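The lookup described in the quote can be sketched against the slide’s sparse index. The offsets below are the illustrative values from the slide, and the function name is invented for this sketch; a real engine would follow up by reading the block at that offset from disk and scanning it:

```python
import bisect

# The slide's sparse index: one (key, byte offset) entry per block of the
# sorted segment file.
SPARSE_INDEX = [("A", 100491), ("I", 101201), ("M", 103041), ("X", 104204)]

def block_offset(key):
    """Return the offset of the only block that can contain `key`.

    Because the segment file is sorted, a binary search for the last
    indexed key <= `key` is enough to pick the block.
    """
    keys = [k for k, _ in SPARSE_INDEX]
    i = bisect.bisect_right(keys, key) - 1
    if i < 0:
        return None          # key sorts before the first indexed block
    return SPARSE_INDEX[i][1]
```

For example, a lookup of "J" lands in the block that starts at key "I", so only that one block has to be read and scanned.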
  20. 30.

    [Diagram: merging & compaction of sorted segments, step 1: data segment 1
    (A:1 B:1 C:1 D:1 E:1 F:1 G:1 H:1) and data segment 2 (A:2 B:2 C:2)]
  21. 31.

    Merging & Compactation B:4 C:5 A:7 + Compacted & Merged

    Segment A:2 B:2 C:2 D:1 E:1 F:1 G:1 H1 A:2 B:2 C:2 A:1 B:1 C:1 D:1 E:1 F:1 G:1 H:1 Data Segment 1 Data Segment 2
  22. 32.

    Storage engines that are based on this principle of merging

    and compacting sorted files are often called LSM storage engines.
  23. 33.

    [Diagram: LSM-tree over two data segments, segment 1
    (A:1 B:1 C:1 D:1 E:1 F:1 G:1 H:1) and segment 2 (A:2 B:2 C:2), dated 2006]
  24. 36.

    Bloom filters: a Bloom filter can tell when a key does not
    exist in the database, saving many unnecessary disk reads for nonexistent keys.
  25. 37.

    [Diagram: Bloom filters & SSTables: a Bloom filter sits in front of the
    sparse in-memory index (A → 100491, I → 101201, M → 103041, X → 104204)
    and the sorted segment file on disk]
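The Bloom filter’s guarantee, "definitely absent" versus "maybe present", can be sketched as follows. The bit-array size and the salted-SHA-256 hashing scheme here are illustrative choices for the sketch, not what any particular engine uses:

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter sketch: answers "definitely absent" or "maybe
    present"; it never misses a key that was actually added."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key):
        # Derive k bit positions from salted SHA-256 digests of the key.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # Any unset bit proves the key was never added: skip the disk read.
        return all(self.bits[pos] for pos in self._positions(key))
```

An LSM engine consults the filter before touching the SSTables on disk: a negative answer means none of the segments need to be read for that key.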
  26. 38.

    Advantages over hash indexes: when multiple segments contain the same
    key, the value from the most recent segment is kept and older segments are discarded. To find a particular key in the file, there is no longer a need to keep the full index in memory!
  27. 39.

    [Diagram: B-trees over two data segments, segment 1
    (A:1 B:1 C:1 D:1 E:1 F:1 G:1 H:1) and segment 2 (A:2 B:2 C:2), dated 1970 / 1979]
  28. 40.

    “The index is organized in pages of fixed size capable
    of holding up to 2k keys, but pages need only be partially filled.”
  29. 42.

    [Diagram: B-tree lookup of id 15, traced step by step: from the root page
    (key 51), through internal pages (keys 11 and 65), down to the leaf pages
    (2, 7 / 12, 15, 20 / 55, 62)]
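The lookup traced on the slide can be sketched with pages modeled as dicts. The tree below loosely mirrors the slide’s keys; the 70/80 leaf is invented just to keep the right subtree complete, and real B-trees store fixed-size pages on disk rather than nested dicts:

```python
def btree_lookup(page, key):
    """Descend from the root page to a leaf, at each level following the
    child whose key range contains `key`; return whether the key exists.

    Internal pages have sorted separator 'keys' and one more entry in
    'children' than keys; leaf pages have only 'keys'.
    """
    while "children" in page:
        i = 0
        while i < len(page["keys"]) and key >= page["keys"][i]:
            i += 1                     # first separator greater than key
        page = page["children"][i]
    return key in page["keys"]

# A small tree loosely mirroring the slide.
root = {
    "keys": [51],
    "children": [
        {"keys": [11],
         "children": [{"keys": [2, 7]}, {"keys": [12, 15, 20]}]},
        {"keys": [65],
         "children": [{"keys": [55, 62]}, {"keys": [70, 80]}]},
    ],
}
```

Looking up id 15 follows the slide’s trace: 15 < 51 goes left at the root, 15 ≥ 11 goes right at the internal page, and the leaf (12, 15, 20) contains the key.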
  35. 53.

    Write-ahead log (WAL) ~ redo log. All modifications must be written
    to the WAL before they can be applied to the pages of the tree itself. Writing all modifications to the WAL means that a B-tree index must write every piece of data at least twice!
  36. 54.

    Write amplification: one write to the database results in
    multiple writes to disk. Write amplification has a direct performance cost: the more a storage engine writes to disk, the fewer writes per second it can handle.
  37. 59.

    Reads are slower on LSM-trees since they have to check

    several data structures at different stages of compaction.
  38. 60.

    LSM-trees are able to sustain higher write throughput due to

    lower write amplification and sequential writes.
  39. 62.

    There is no quick and easy rule for determining which

    type of storage engine is better for your use case.