Small Data: Databases in the Real World

Slide 1

Slide 1 text

Small Data: Databases in the Real World
Andrew Godwin (@andrewgodwin)

Slide 2

Slide 2 text

Andrew Godwin Core Developer Senior Engineer

Slide 3

Slide 3 text

BIG DATA
What does it mean? What is 'big'?

Slide 4

Slide 4 text

1,000 rows? 1,000,000 rows? 1,000,000,000 rows? 1,000,000,000,000 rows?

Slide 5

Slide 5 text

Scalable designs are a tradeoff: NOW vs LATER

Slide 6

Slide 6 text

Small company? Agency?
Focus on ease of change, not scalability

Slide 7

Slide 7 text

You don't need to scale from day one
But always leave yourself scaling points

Slide 8

Slide 8 text

Rapid development
Continuous deployment
Hardware choice
Scaling 'breakpoints'

Slide 9

Slide 9 text

Rapid development
It's all about schema change overhead

Slide 10

Slide 10 text

Explicit Schema

ID (int)   Name (text)   Weight (uint)
1          Alice         76
2          Bob           84
3          Charles       65

Implicit Schema

{
    "id": 342,
    "name": "David",
    "weight": 44,
}

Slide 11

Slide 11 text

Silent Failure

{ "id": 342, "name": "David", "weight": 74, }
{ "id": 342, "name": "Ellie", "weight": "85kg", }
{ "id": 342, "nom": "Frankie", "weight": 77, }
{ "id": 342, "name": "Frankie", "weight": -67, }
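
A minimal sketch of how the failures on this slide could be caught at write time rather than passing silently. The `PERSON_SCHEMA` table and the `validate_person` helper are hypothetical and not from the talk; a schemaless store on its own would accept all four documents.

```python
# Hypothetical explicit schema: field name -> required Python type.
PERSON_SCHEMA = {"id": int, "name": str, "weight": int}


def validate_person(doc):
    """Return a list of problems found in doc; an empty list means it is valid."""
    problems = []
    for field, expected in PERSON_SCHEMA.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected):
            problems.append(f"wrong type for {field}: {type(doc[field]).__name__}")
    for field in doc:
        if field not in PERSON_SCHEMA:
            problems.append(f"unknown field: {field}")
    # Mirrors the unsigned 'uint' weight column from the explicit-schema slide.
    if isinstance(doc.get("weight"), int) and doc["weight"] < 0:
        problems.append("weight must not be negative")
    return problems


# The four documents from the slide.
docs = [
    {"id": 342, "name": "David", "weight": 74},
    {"id": 342, "name": "Ellie", "weight": "85kg"},
    {"id": 342, "nom": "Frankie", "weight": 77},
    {"id": 342, "name": "Frankie", "weight": -67},
]
for doc in docs:
    print(doc, "->", validate_person(doc) or "ok")
```

Only the first document passes. The other three are the type, field-name, and range errors that an explicit schema would normally reject for you.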

Slide 12

Slide 12 text

Continuous deployment
It's 11pm. Do you know where your locks are?

Slide 13

Slide 13 text

Add NULL and backfill
1-to-1 relation and backfill
DBMS-supported type changes
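
The first technique, adding a nullable column and backfilling it, might look roughly like the sketch below. It uses Python's built-in `sqlite3` purely as a stand-in database; the `people` table, the `weight_g` column, and the batch size are assumptions for illustration, and real locking behaviour depends on your DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, weight INTEGER)")
conn.executemany(
    "INSERT INTO people (id, name, weight) VALUES (?, ?, ?)",
    [(1, "Alice", 76), (2, "Bob", 84), (3, "Charles", 65)],
)
conn.commit()

# Step 1: add the new column as nullable, with no default to compute per row.
conn.execute("ALTER TABLE people ADD COLUMN weight_g INTEGER")  # hypothetical column


def backfill(conn, batch_size=2):
    """Step 2: fill weight_g in small batches, committing after each one.

    Returns the number of batches that updated at least one row.
    """
    batches = 0
    while True:
        cur = conn.execute(
            "UPDATE people SET weight_g = weight * 1000 "
            "WHERE id IN (SELECT id FROM people "
            "WHERE weight_g IS NULL AND weight IS NOT NULL LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # short transactions: no single statement touches every row
        if cur.rowcount == 0:
            return batches
        batches += 1


batches = backfill(conn)
print(batches, conn.execute("SELECT id, weight_g FROM people ORDER BY id").fetchall())
```

The point of batching is that each transaction is short, so application code that tolerates NULL in the new column can keep running while the backfill proceeds.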

Slide 14

Slide 14 text

Hardware choice
ZOMG RUN IT ON THE CLOUD

Slide 15

Slide 15 text

VMs are TERRIBLE at IO
Up to 10x slowdown, even with VT-d.

Slide 16

Slide 16 text

Memory is king
Your database loves it. Don't let other apps steal it.

Slide 17

Slide 17 text

Adding more power goes far
Especially with PostgreSQL or read-only replicas

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

Sharding point
Vertical split
Consistency leeway

Slide 20

Slide 20 text

Sharding point
Datasets partitioned by primary key

Slide 21

Slide 21 text

Migration plan
Implement consistent hashing on primary key
Make large number of logical shards (2048?)
Map logical shards to single physical shard
Migrate shards using replication
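
One way the plan above could be sketched: hash each primary key into a fixed set of logical shards, and keep a separate map from logical shard to physical server. This is a simplified stable-hash-modulo scheme rather than a full hash ring, and the server names `db1`/`db2` and all function names are hypothetical.

```python
import hashlib

NUM_LOGICAL_SHARDS = 2048  # the figure floated on the slide

# Initially every logical shard maps to the single physical server.
shard_map = {logical: "db1" for logical in range(NUM_LOGICAL_SHARDS)}


def logical_shard(primary_key):
    """Map a primary key to a logical shard using a stable hash.

    hashlib is used instead of the built-in hash(), which is randomised
    per process for strings and so is not stable across restarts.
    """
    digest = hashlib.sha1(str(primary_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_LOGICAL_SHARDS


def physical_shard(primary_key):
    """Look up which server currently holds this key's logical shard."""
    return shard_map[logical_shard(primary_key)]


def move_logical_shards(logical_ids, new_server):
    """Repoint logical shards once their data has been replicated across."""
    for logical in logical_ids:
        shard_map[logical] = new_server


print(342, "->", logical_shard(342), "on", physical_shard(342))
# Later: replicate half the logical shards to a second server, then repoint them.
move_logical_shards(range(1024, 2048), "db2")
print(342, "->", logical_shard(342), "on", physical_shard(342))
```

Because the key-to-logical-shard mapping never changes, growing the cluster only means editing `shard_map`; no key has to be rehashed.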

Slide 22

Slide 22 text

Vertical split
Entirely unrelated tables

Slide 23

Slide 23 text

Migration plan
Replicate database to new server
Route split tables there, disable replication
- or -
Slowly backfill new datastore with fallback lookup
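
The second option, a slow backfill with fallback lookup, could be sketched as below. Plain dicts stand in for the old database and the new datastore, and the `FallbackStore` class is a hypothetical illustration; deletes and concurrent writers are deliberately left out.

```python
class FallbackStore:
    """Read from the new store first; fall back to the old one and copy across."""

    def __init__(self, old, new):
        self.old = old  # dict standing in for the original database
        self.new = new  # dict standing in for the new datastore

    def get(self, key):
        if key in self.new:
            return self.new[key]
        value = self.old[key]  # raises KeyError if the key is in neither store
        self.new[key] = value  # lazy backfill on read
        return value

    def put(self, key, value):
        self.new[key] = value  # all writes go to the new store only

    def backfill(self, limit):
        """Copy up to `limit` keys not yet migrated; returns how many were copied."""
        pending = [k for k in self.old if k not in self.new][:limit]
        for key in pending:
            self.new[key] = self.old[key]
        return len(pending)


store = FallbackStore(old={"a": 1, "b": 2, "c": 3}, new={})
store.get("a")            # found via fallback, copied to the new store
store.put("b", 20)        # new write; the stale old value is never copied over it
copied = store.backfill(limit=10)
print(copied, store.new)
```

Once `backfill` keeps returning 0, every key lives in the new store and the fallback path can be removed.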

Slide 24

Slide 24 text

Denormalisation
It's not free!

Slide 25

Slide 25 text

Migration plan
Add NULL fields to dependent tables
App code to fetch and fill if not present
Possibly prefill on save of new items
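
A rough sketch of these three steps, again using `sqlite3` as a stand-in. The `authors`/`posts` tables, the `author_name` column, and both helper functions are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES authors(id),
        title TEXT NOT NULL,
        author_name TEXT  -- denormalised copy; NULL until filled
    );
    INSERT INTO authors (id, name) VALUES (1, 'Alice');
    INSERT INTO posts (id, author_id, title) VALUES (10, 1, 'Hello');
""")


def get_post_author_name(conn, post_id):
    """Return the denormalised name, fetching and storing it if not present."""
    author_id, cached = conn.execute(
        "SELECT author_id, author_name FROM posts WHERE id = ?", (post_id,)
    ).fetchone()
    if cached is not None:
        return cached
    (name,) = conn.execute(
        "SELECT name FROM authors WHERE id = ?", (author_id,)
    ).fetchone()
    conn.execute("UPDATE posts SET author_name = ? WHERE id = ?", (name, post_id))
    conn.commit()
    return name


def create_post(conn, post_id, author_id, title):
    """Prefill the denormalised field when saving a new item."""
    (name,) = conn.execute(
        "SELECT name FROM authors WHERE id = ?", (author_id,)
    ).fetchone()
    conn.execute(
        "INSERT INTO posts (id, author_id, title, author_name) VALUES (?, ?, ?, ?)",
        (post_id, author_id, title, name),
    )
    conn.commit()


print(get_post_author_name(conn, 10))  # first read fills the NULL field
create_post(conn, 11, 1, "Again")      # new rows arrive already filled
```

The "not free" part is visible here: if an author is renamed, every copied `author_name` has to be updated or invalidated as well, which this sketch does not attempt.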

Slide 26

Slide 26 text

Consistency leeway
Can you take inconsistent views?

Slide 27

Slide 27 text

Migration plan
Change your site!
Talk to your designers!
Deliberately introduce inconsistency!

Slide 28

Slide 28 text

Big Data isn't one thing
It depends on type, size, complexity, throughput, latency...

Slide 29

Slide 29 text

Focus on the current problems
Future problems don't matter if you never get there

Slide 30

Slide 30 text

Efficiency and iterating fast matter
The smaller you are, the more your time is worth