How shit works:
Databases
Tomer Gabel // @tomerg // substrate.co.il
Slide 2
Slide 2 text
Every journey begins
with a grumpy CEO.
Go make me
some money!
Photo: Grumpy Old Men by Michael Summers (CC)
Slide 3
Slide 3 text
The pipeline churns…
Product Manager Chief Architect Management
Photos: Andrea Appiani, Oren Jack Turner, Philippe de Champaigne (Public Domain)
Slide 4
Slide 4 text
... and finally!
Specification
Management
Product
Engineering
1. You have a product to build
2. It is divided into sane chunks
3. Your team owns a chunk
4. You’re all set to implement it
5. Now you need a database
Slide 5
Slide 5 text
No content
Slide 6
Slide 6 text
No content
Slide 7
Slide 7 text
It depends.
Slide 8
Slide 8 text
It really does
1. But that’s not helpful
2. Too many paradigms
3. Too many options
4. Too many nuances
5. A little knowledge goes a
long way.
Photo: He Who Thinks by Damien Galban (CC)
Slide 9
Slide 9 text
How shit works:
Databases
Tomer Gabel
at Joy of Coding, June 2023, Rotterdam
Slide 10
Slide 10 text
End-User License Agreement
1. I’m not an expert
2. We’ll barely scratch
the surface
3. We’ll simplify things
liberally
Image: Public Domain
Bullshit ahead!
Slide 11
Slide 11 text
Beginning at the Beginning
“In computing, a database is an organized collection
of data stored and accessed electronically.
Access to this data is usually provided by a "database
management system" (DBMS).”
-- Wikipedia
Slide 12
Slide 12 text
What problems does a database solve?
1. Data modelling
representation, schema
2. Data manipulation
insert, update, delete
3. Data retrieval
querying
4. Administration
security, backup, ACL…
Slide 13
Slide 13 text
There’s no avoiding it
1. Databases deal with data
2. Data must be stored
3. Stored where?
4. Stored how?
Photos: D-Kuru, Evan-Amos, Ryse93 (Creative Commons)
Slide 14
Slide 14 text
Common data representations
1. Relational
2. Document
3. Graph
4. Geospatial
5. Time-series
Slide 15
Slide 15 text
Common data representations
1. Relational
2. Document
3. Graph
4. Geospatial
5. Time-series
Glorified K/V Stores
Slide 16
Slide 16 text
B-Trees
They’re out to get you.
Slide 17
Slide 17 text
3
6
1 5
8
11
15
Remember Binary Search Trees?
1. A fundamentally simple idea
2. Keep your data sorted and
lookup becomes cheap
3. Ill-suited to disk storage
- Limited branching factor
Slide 18
Slide 18 text
Remember Binary Search Trees?
1. A fundamentally simple idea
2. Keep your data sorted and
lookup becomes cheap
3. Ill-suited to disk storage
- Limited branching factor
- Pathological behavior
5
6
3
1
11
8 15
Slide 19
Slide 19 text
Balanced Trees
3
6
1 5
11
8 15
1. Optimized data structures that:
- Limit the tree height
- Reduce write cost
- Avoid waste and fragmentation
2. Examples include:
- AVL trees
- Red/Black trees
- Splay trees
Height=2
Slide 20
Slide 20 text
Hello B-Tree My Old Friend
Order = 5
82
27 54
5 11 13 20
91 99
30 31 37 42 67 72 78 80
82
13 27 54
5 11
91 99
20 26 30 31 37 42
B-Tree Insertion
Median is
”pushed up”
Slide 23
Slide 23 text
B-Trees: Summary
1. A self-balancing tree
2. O(logN) select/insert/delete
3. Designed for on-disk storage
4. Many variants
5. Optimized for reads
Slide 24
Slide 24 text
Pretend it’s a product
1. Let’s build a social network!
2. Specifically, the storage
subsystem for likes/dislikes
3. Extremely frequent operation
4. Are B-Trees suitable?
Slide 25
Slide 25 text
Not a Good Fit
1. Writes are costly!
- Read-before-write
- Write amplification
- Wasted space
2. Data used in aggregate
- Infrequent lookups
- Optimizing for the wrong thing!
Photo: Overloading of vehicles by Ore O.j (CC)
Slide 26
Slide 26 text
Log-Structured Merge Trees
to the rescue
Slide 27
Slide 27 text
LSM Trees
1. Writes go in a memory buffer
2. Keys are timestamped
Key Value Timestamp
C 1 15
B 7 21
C 2 28
D 5 32
MemTable
Slide 28
Slide 28 text
LSM Trees
1. Writes go in a memory buffer
2. Keys are timestamped
3. At some threshold, buffer is:
- Sorted by key
- Flushed to disk (SSTable)
4. Append-only, no lookup!
Key Value Timestamp
C 1 15
B 7 21
C 2 28
D 5 32
MemTable
Key Value Timestamp
B 7 21
C 2 28
D 5 32
SSTable 1 (on disk)
Slide 29
Slide 29 text
LSM Trees: Reads
Key Value Timestamp
C 70 1012
P 15 990
MemTable
Key Value Timestamp
B 7 21
C 2 28
D 5 32
SSTable 1
Key Value Timestamp
B 5 282
E 14 211
O 2 303
SSTable 2
Key Value Timestamp
A 10 805
B 3 860
D 12 901
SSTable 3
What is the value of B?
Slide 30
Slide 30 text
LSM Trees: Reads
Key Value Timestamp
C 70 1012
P 15 990
MemTable
Key Value Timestamp
B 7 21
C 2 28
D 5 32
SSTable 1
Key Value Timestamp
B 5 282
E 14 211
O 2 303
SSTable 2
What is the value of B?
Key Value Timestamp
A 10 805
B 3 860
D 12 901
SSTable 3
Slide 31
Slide 31 text
LSM Trees: Reads
Key Value Timestamp
C 70 1012
P 15 990
MemTable
Key Value Timestamp
B 7 21
C 2 28
D 5 32
SSTable 1
Key Value Timestamp
B 5 282
E 14 211
O 2 303
SSTable 2
What is the value of B? = 3
Key Value Timestamp
A 10 805
B 3 860
D 12 901
SSTable 3
Slide 32
Slide 32 text
Naïve LSM Trees
1. You probably noticed issues
2. Old versions are retained -
useful but wasteful
3. All SSTables must be queried
- I/O proportional to age
- Bloom filters usually used to
skip entire lookups
Key Value Timestamp
B 7 21
C 2 28
D 5 32
B, C, D
SSTable
Bloom filter
Slide 33
Slide 33 text
LSM Trees: Compaction
Key Value Timestamp
B 7 21
C 2 28
D 5 32
Key Value Timestamp
B 5 282
E 14 211
O 2 303
Key Value Timestamp
A 10 805
B 3 860
D 12 901
Key Value Timestamp
C 70 1012
P 15 990
Key Value Timestamp
B 3 860
C 70 1012
D 12 901
E 14 211
O 2 303
P 15 990
Merged SSTable
Slide 34
Slide 34 text
LSM Trees: Compaction
1. Compaction can happen
concurrently
2. Controlling the number of
SSTables is critical
3. Typically, merges happen on
expontentially larger levels
SSTable SSTable SSTable
SSTable
SSTable
SSTable
SSTable
Level 0
Level 1
Level 2
Slide 35
Slide 35 text
LSM Trees: Summary
1. Optimized for writes
- Append-only, no lookups
- No write amplification
2. Not ideal for read workloads
- Can be optimized though
Slide 36
Slide 36 text
So then, how
do we choose?
Slide 37
Slide 37 text
Key Takeaways
1. Early on, the choice matters very little.
Any off-the-shelf DMBS will probably work.
2. Base your choice on your opertional needs,
data access patterns and finally scale.
In that order.
3. The more you understand, the better your leverage.
Slide 38
Slide 38 text
tomer@substrate.co.il
@tomerg
https://github.com/holograph
Thank you for listening
Questions?