Slide 1

Time series data is the worst and best use case in distributed databases

Paul Dix, CEO @InfluxDB
@pauldix
paul@influxdb.com

Slide 2

What is time series data?

Slide 3

Stock trades and quotes

Slide 4

Metrics

Slide 5

Analytics

Slide 6

Events

Slide 7

Sensor data

Slide 8

Two kinds of time series data…

Slide 9

Regular time series: samples at regular intervals (t0, t1, t2, …)

Slide 10

Irregular time series: events arrive whenever they happen (t0, t1, t2, …)

Slide 11

Inducing a regular time series from an irregular one

query:
select count(customer_id) from events
where time > now() - 1h
group by time(1m), customer_id

Slide 12

Data that you ask questions about over time

Slide 14

1. Databases

Slide 15

2. Distributed Systems

Slide 16

Access properties suck for databases

Slide 17

High write throughput

Slide 18

Example from DevOps
• 2,000 servers, VMs, containers, or sensor units
• 200 measurements per server/unit
• every 10 seconds
• = 3,456,000,000 distinct points per day
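The arithmetic above can be reproduced directly; this also shows the sustained per-second rate the write path has to absorb:

```go
package main

import "fmt"

func main() {
	// The slide's numbers: 2,000 units, 200 measurements each, every 10s.
	const (
		units           = 2000
		measurements    = 200
		intervalSeconds = 10
	)
	samplesPerDay := int64(86400 / intervalSeconds) // 8,640 samples per series per day
	pointsPerDay := int64(units) * int64(measurements) * samplesPerDay
	fmt.Println(pointsPerDay)         // 3456000000
	fmt.Println(pointsPerDay / 86400) // 40000 points/sec sustained
}
```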

Slide 19

Use LSM Tree, optimized for writes!

Slide 20

Even higher read throughput

Slide 21

Aggregation and downsampling

Slide 22

Queries for dashboards

Slide 23

Queries for monitoring systems

Slide 24

LSM Tree optimized for writes

Slide 25

Use COW B+Tree, it’s optimized for reads!

Slide 26

Write throughput goes to hell

Slide 27

No compression

Slide 28

Large scale deletes

Slide 29

Aggregate, down-sample and phase out raw data

Slide 30

If clearing out data point by point, # of deletes = # of writes

Slide 31

LSM Tree deletes are wildly expensive

Slide 32

COW B+Tree deletes expensive if we want to reclaim disk

Slide 33

No perfect storage engine for these properties

Slide 34

Time series data + databases = great sadness

Slide 35

Access properties suck for distributed systems

Slide 36

Range scans of many keys

Slide 37

series: cpu region=uswest, host=serverA

Slide 38

series: cpu region=uswest, host=serverA

query:
select max(value) from cpu
where time > now() - 6h
group by time(5m)

Slide 39

series: cpu region=uswest, host=serverA

query:
select max(value) from cpu
where region = 'uswest' AND time > now() - 6h
group by time(5m)

Series from all hosts in uswest merged into one

Slide 40

How to distribute the data?

Slide 41

By measurement? cpu

Slide 42

By measurement? cpu BOTTLENECK

Slide 43

By measurement + tags? cpu region=uswest, host=serverA

Slide 44

By measurement + tags? cpu region=uswest, host=serverA SERIES GROWS INDEFINITELY

Slide 45

By measurement + tags, time? cpu region=uswest, host=serverA, time

Slide 46

By measurement + tags, time? cpu region=uswest, host=serverA, time WHICH TIMES/KEYS EXIST?

Slide 47

By measurement + tags, time? cpu region=uswest, host=serverA, time NO DATA LOCALITY

Slide 48

High throughput

Slide 49

CAP Theorem

Slide 50

CAP Theorem C: Consistency

Slide 51

CAP Theorem C: Consistency A: Availability

Slide 52

CAP Theorem C: Consistency A: Availability P: In the face of Partitions

Slide 53

Pick either C or A

Slide 54

Partitions happen whether or not your network hardware is perfect

Slide 55

Pauses under load look like partitions

Slide 56

High throughput = load

Slide 57

Consistency under high write throughput

Slide 58

Time series queries do range scans of recent data that is always moving

Slide 59

Some sensors sample many times per second

Slide 60

Event streams can be even more frequent

Slide 61

Consistent view?

Slide 63

Time series data + distributed systems = great sadness

Slide 64

but…

Slide 65

Time series data has great properties for databases

Slide 66

No updates

Slide 67

Large ranges cold for writes

Slide 68

Immutable data structures and files

Slide 69

Like LSM, but more specific

Slide 70

Deletes mostly against ranges of old data

Slide 71

We partition data by ranges of time, e.g. all data for a day or an hour kept together

Slide 72

Drop entire files

Slide 73

Tombstone the one-offs

Slide 74

New storage engine

Slide 75

Great properties for distributed systems

Slide 76

No updates

Slide 77

Large scale deletes on cold areas of keyspace

Slide 78

Perfect for an AP system

Slide 79

Conflict resolution made easy i.e. no updates = no contention

Slide 80

Partition key space by ranges of time i.e. old data vs. new

Slide 81

Old data generally doesn’t change

Slide 82

Consistent view on new data is the union

Slide 83

Deletes against ranges that are cold for writes and queries

Slide 84

Cluster growth to increase storage capacity doesn’t require rebalancing

Slide 85

Data locality, i.e. we ship the code to where the data lives when scanning large ranges of data

Slide 86

Evenly distribute across cluster, per day

cpu region=uswest, host=serverA → Shard 1
cpu region=uswest, host=serverB → Shard 1
cpu region=useast, host=serverC → Shard 2
cpu region=useast, host=serverD → Shard 2
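One simple way to spread series within a day's shard group is to hash the series key (measurement plus tags) to a shard number. A hypothetical sketch of that idea, not the actual InfluxDB routing code:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor picks a shard for a series by hashing its key. Every point for
// the same series (within one time range) lands on the same shard, so a
// single-series scan stays local to one server.
func shardFor(seriesKey string, numShards int) int {
	h := fnv.New32a()
	h.Write([]byte(seriesKey))
	return int(h.Sum32()) % numShards
}

func main() {
	keys := []string{
		"cpu,region=uswest,host=serverA",
		"cpu,region=uswest,host=serverB",
		"cpu,region=useast,host=serverC",
	}
	for _, k := range keys {
		fmt.Println(k, "-> shard", shardFor(k, 2))
	}
}
```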

Slide 87

Each shard lives on a server, with copies on as many servers as the # of replicas

Slide 88

Hits one shard

query:
select mean(value) from cpu
where region = 'uswest' AND host = 'serverB'
AND time > now() - 6h
group by time(5m)

Slide 89

Decompose into a map/reduce job

query:
select mean(value) from cpu
where region = 'uswest' AND time > now() - 6h
group by time(5m)

Many series match these criteria, many shards to query

Slide 90

func MapMean(itr Iterator) interface{} {
	out := &meanMapOutput{}
	for _, k, v := itr.Next(); k != 0; _, k, v = itr.Next() {
		out.Count++
		out.Mean += (v.(float64) - out.Mean) / float64(out.Count)
	}
	if out.Count > 0 {
		return out
	}
	return nil
}

Slide 91

func ReduceMean(values []interface{}) interface{} {
	out := &meanMapOutput{}
	var countSum int
	for _, v := range values {
		if v == nil {
			continue
		}
		val := v.(*meanMapOutput)
		countSum = out.Count + val.Count
		out.Mean = val.Mean*(float64(val.Count)/float64(countSum)) +
			out.Mean*(float64(out.Count)/float64(countSum))
		out.Count = countSum
	}
	if out.Count > 0 {
		return out.Mean
	}
	return nil
}
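The map/reduce pair can be exercised standalone with concrete numbers. A self-contained sketch with the slides' types simplified (plain slices instead of the `Iterator`, lowercase names to mark it as illustration): each shard computes a partial mean and count, and the reducer combines them weighting each partial mean by its count.

```go
package main

import "fmt"

// meanMapOutput carries one shard's partial result.
type meanMapOutput struct {
	Count int
	Mean  float64
}

// mapMean computes a running mean over one shard's values.
func mapMean(values []float64) *meanMapOutput {
	out := &meanMapOutput{}
	for _, v := range values {
		out.Count++
		out.Mean += (v - out.Mean) / float64(out.Count)
	}
	return out
}

// reduceMean merges per-shard partial means into one overall mean,
// weighting each partial mean by its count.
func reduceMean(parts []*meanMapOutput) float64 {
	out := &meanMapOutput{}
	for _, p := range parts {
		countSum := out.Count + p.Count
		out.Mean = p.Mean*(float64(p.Count)/float64(countSum)) +
			out.Mean*(float64(out.Count)/float64(countSum))
		out.Count = countSum
	}
	return out.Mean
}

func main() {
	shard1 := mapMean([]float64{1, 2, 3}) // mean 2, count 3
	shard2 := mapMean([]float64{10})      // mean 10, count 1
	// (2*3 + 10*1) / 4 = 4
	fmt.Println(reduceMean([]*meanMapOutput{shard1, shard2})) // 4
}
```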

Slide 92

We only transmit the summaries across the cluster: one per 5-minute interval

Slide 93

there will be more…

Slide 94

Time series data has odd workloads

Slide 95

High write and read throughput

Slide 96

Append/insert only

Slide 97

Deletes against large ranges

Slide 98

Horrible and great for distributed databases

Slide 99

Thank you.

Paul Dix
paul@influxdb.com
@pauldix