Slide 1

Slide 1 text

How shit works: Databases Tomer Gabel // @tomerg // substrate.co.il

Slide 2

Slide 2 text

Every journey begins with a grumpy CEO. Go make me some money! Photo: Grumpy Old Men by Michael Summers (CC)

Slide 3

Slide 3 text

The pipeline churns… Product Manager Chief Architect Management Photos: Andrea Appiani, Oren Jack Turner, Philippe de Champaigne (Public Domain)

Slide 4

Slide 4 text

... and finally! Specification Management Product Engineering 1. You have a product to build 2. It is divided into sane chunks 3. Your team owns a chunk 4. You’re all set to implement it 5. Now you need a database

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

It depends.

Slide 8

Slide 8 text

It really does 1. But that’s not helpful 2. Too many paradigms 3. Too many options 4. Too many nuances 5. A little knowledge goes a long way. Photo: He Who Thinks by Damien Galban (CC)

Slide 9

Slide 9 text

How shit works: Databases Tomer Gabel at Joy of Coding, June 2023, Rotterdam

Slide 10

Slide 10 text

End-User License Agreement 1. I’m not an expert 2. We’ll barely scratch the surface 3. We’ll simplify things liberally Image: Public Domain Bullshit ahead!

Slide 11

Slide 11 text

Beginning at the Beginning “In computing, a database is an organized collection of data stored and accessed electronically. Access to this data is usually provided by a "database management system" (DBMS).” -- Wikipedia

Slide 12

Slide 12 text

What problems does a database solve? 1. Data modelling representation, schema 2. Data manipulation insert, update, delete 3. Data retrieval querying 4. Administration security, backup, ACL…

Slide 13

Slide 13 text

There’s no avoiding it 1. Databases deal with data 2. Data must be stored 3. Stored where? 4. Stored how? Photos: D-Kuru, Evan-Amos, Ryse93 (Creative Commons)

Slide 14

Slide 14 text

Common data representations 1. Relational 2. Document 3. Graph 4. Geospatial 5. Time-series

Slide 15

Slide 15 text

Common data representations 1. Relational 2. Document 3. Graph 4. Geospatial 5. Time-series Glorified K/V Stores

Slide 16

Slide 16 text

B-Trees They’re out to get you.

Slide 17

Slide 17 text

3 6 1 5 8 11 15 Remember Binary Search Trees? 1. A fundamentally simple idea 2. Keep your data sorted and lookup becomes cheap 3. Ill-suited to disk storage - Limited branching factor

Slide 18

Slide 18 text

Remember Binary Search Trees? 1. A fundamentally simple idea 2. Keep your data sorted and lookup becomes cheap 3. Ill-suited to disk storage - Limited branching factor - Pathological behavior 5 6 3 1 11 8 15

Slide 19

Slide 19 text

Balanced Trees 3 6 1 5 11 8 15 1. Optimized data structures that: - Limit the tree height - Reduce write cost - Avoid waste and fragmentation 2. Examples include: - AVL trees - Red/Black trees - Splay trees Height=2

Slide 20

Slide 20 text

Hello B-Tree My Old Friend Order = 5 82 27 54 5 11 13 20 91 99 30 31 37 42 67 72 78 80

Slide 21

Slide 21 text

82 27 54 5 11 13 20 26 91 99 30 31 37 42 67 72 78 80 B-Tree Insertion Too many keys - needs split!

Slide 22

Slide 22 text

82 13 27 54 5 11 91 99 20 26 30 31 37 42 B-Tree Insertion Median is ”pushed up”

Slide 23

Slide 23 text

B-Trees: Summary 1. A self-balancing tree 2. O(logN) select/insert/delete 3. Designed for on-disk storage 4. Many variants 5. Optimized for reads

Slide 24

Slide 24 text

Pretend it’s a product 1. Let’s build a social network! 2. Specifically, the storage subsystem for likes/dislikes 3. Extremely frequent operation 4. Are B-Trees suitable?

Slide 25

Slide 25 text

Not a Good Fit 1. Writes are costly! - Read-before-write - Write amplification - Wasted space 2. Data used in aggregate - Infrequent lookups - Optimizing for the wrong thing! Photo: Overloading of vehicles by Ore O.j (CC)

Slide 26

Slide 26 text

Log-Structured Merge Trees to the rescue

Slide 27

Slide 27 text

LSM Trees 1. Writes go in a memory buffer 2. Keys are timestamped Key Value Timestamp C 1 15 B 7 21 C 2 28 D 5 32 MemTable

Slide 28

Slide 28 text

LSM Trees 1. Writes go in a memory buffer 2. Keys are timestamped 3. At some threshold, buffer is: - Sorted by key - Flushed to disk (SSTable) 4. Append-only, no lookup! Key Value Timestamp C 1 15 B 7 21 C 2 28 D 5 32 MemTable Key Value Timestamp B 7 21 C 2 28 D 5 32 SSTable 1 (on disk)

Slide 29

Slide 29 text

LSM Trees: Reads Key Value Timestamp C 70 1012 P 15 990 MemTable Key Value Timestamp B 7 21 C 2 28 D 5 32 SSTable 1 Key Value Timestamp B 5 282 E 14 211 O 2 303 SSTable 2 Key Value Timestamp A 10 805 B 3 860 D 12 901 SSTable 3 What is the value of B?

Slide 30

Slide 30 text

LSM Trees: Reads Key Value Timestamp C 70 1012 P 15 990 MemTable Key Value Timestamp B 7 21 C 2 28 D 5 32 SSTable 1 Key Value Timestamp B 5 282 E 14 211 O 2 303 SSTable 2 What is the value of B? Key Value Timestamp A 10 805 B 3 860 D 12 901 SSTable 3

Slide 31

Slide 31 text

LSM Trees: Reads Key Value Timestamp C 70 1012 P 15 990 MemTable Key Value Timestamp B 7 21 C 2 28 D 5 32 SSTable 1 Key Value Timestamp B 5 282 E 14 211 O 2 303 SSTable 2 What is the value of B? = 3 Key Value Timestamp A 10 805 B 3 860 D 12 901 SSTable 3

Slide 32

Slide 32 text

Naïve LSM Trees 1. You probably noticed issues 2. Old versions are retained - useful but wasteful 3. All SSTables must be queried - I/O proportional to age - Bloom filters usually used to skip entire lookups Key Value Timestamp B 7 21 C 2 28 D 5 32 B, C, D SSTable Bloom filter

Slide 33

Slide 33 text

LSM Trees: Compaction Key Value Timestamp B 7 21 C 2 28 D 5 32 Key Value Timestamp B 5 282 E 14 211 O 2 303 Key Value Timestamp A 10 805 B 3 860 D 12 901 Key Value Timestamp C 70 1012 P 15 990 Key Value Timestamp B 3 860 C 70 1012 D 12 901 E 14 211 O 2 303 P 15 990 Merged SSTable

Slide 34

Slide 34 text

LSM Trees: Compaction 1. Compaction can happen concurrently 2. Controlling the number of SSTables is critical 3. Typically, merges happen on expontentially larger levels SSTable SSTable SSTable SSTable SSTable SSTable SSTable Level 0 Level 1 Level 2

Slide 35

Slide 35 text

LSM Trees: Summary 1. Optimized for writes - Append-only, no lookups - No write amplification 2. Not ideal for read workloads - Can be optimized though

Slide 36

Slide 36 text

So then, how do we choose?

Slide 37

Slide 37 text

Key Takeaways 1. Early on, the choice matters very little. Any off-the-shelf DMBS will probably work. 2. Base your choice on your opertional needs, data access patterns and finally scale. In that order. 3. The more you understand, the better your leverage.

Slide 38

Slide 38 text

tomer@substrate.co.il @tomerg https://github.com/holograph Thank you for listening Questions?