Time series data is the worst and best use case in distributed databases

My talk from dotScale 2015 in Paris: some lessons we've learned building InfluxDB, a distributed time series database.


Paul Dix

June 08, 2015

Transcript

  1. Time series data is the worst and best use case

    in distributed databases Paul Dix CEO @InfluxDB @pauldix paul@influxdb.com
  2. What is time series data?

  3. Stock trades and quotes

  4. Metrics

  5. Analytics

  6. Events

  7. Sensor data

  8. Two kinds of time series data…

  9. Regular time series

    Samples at regular intervals
  10. Irregular time series

    Events whenever they come in
  11. Inducing a regular time series from an irregular one query:

    select count(customer_id) from events where time > now() - 1h group by time(1m), customer_id
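
    The same bucketing can be sketched in a few lines of Go (hypothetical code, not InfluxDB internals): snapping each irregular event timestamp to a fixed 1-minute boundary induces the regular series the query above produces.

      package main

      import (
          "fmt"
          "time"
      )

      type event struct {
          customerID string
          ts         time.Time
      }

      func main() {
          now := time.Now()
          events := []event{
              {"c1", now.Add(-90 * time.Second)},
              {"c1", now.Add(-70 * time.Second)},
              {"c2", now.Add(-30 * time.Second)},
          }

          // counts[window][customerID] = events in that 1-minute bucket,
          // mirroring group by time(1m), customer_id
          counts := map[time.Time]map[string]int{}
          for _, e := range events {
              w := e.ts.Truncate(time.Minute) // snap to a regular 1m boundary
              if counts[w] == nil {
                  counts[w] = map[string]int{}
              }
              counts[w][e.customerID]++
          }
          fmt.Println(counts)
      }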
  12. Data that you ask questions about over time

  13. None
  14. 1. Databases

  15. 2. Distributed Systems

  16. Access properties suck for databases

  17. High write throughput

  18. Example from DevOps

    • 2,000 servers, VMs, containers, or sensor units
    • 200 measurements per server/unit
    • every 10 seconds
    • = 3,456,000,000 distinct points per day
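
    The arithmetic behind that last bullet checks out; a one-line verification in Go:

      package main

      import "fmt"

      func main() {
          const (
              units           = 2000 // servers, VMs, containers, or sensor units
              measurements    = 200  // per server/unit
              intervalSeconds = 10   // one sample every 10 seconds
              secondsPerDay   = 24 * 60 * 60
              samplesPerDay   = secondsPerDay / intervalSeconds // 8,640 intervals/day
          )
          fmt.Println(int64(units) * measurements * samplesPerDay) // 3456000000
      }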
  19. Use LSM Tree, optimized for writes!

  20. Even higher read throughput

  21. Aggregation and downsampling

  22. Queries for dashboards

  23. Queries for monitoring systems

  24. LSM Tree optimized for writes

  25. Use COW B+Tree, it’s optimized for reads!

  26. Write throughput goes to hell

  27. No compression

  28. Large scale deletes

  29. Aggregate, down-sample and phase out raw data

  30. If clearing out point-by-point, # of deletes = # of writes
  31. LSM Tree deletes are wildly expensive

  32. COW B+Tree deletes are expensive if we want to reclaim disk

  33. No perfect storage engine for these properties

  34. Time series data + databases = great sadness

  35. Access properties suck for distributed systems

  36. Range scans of many keys

  37. series: cpu region=uswest, host=serverA

  38. series: cpu region=uswest, host=serverA

    query: select max(value) from cpu where time > now() - 6h group by time(5m)
  39. series: cpu region=uswest, host=serverA

    query: select max(value) from cpu where region = ‘uswest’ AND time > now() - 6h group by time(5m)
    Series from all hosts in uswest merged into one
  40. How to distribute the data?

  41. By measurement? cpu

  42. By measurement? cpu BOTTLENECK

  43. By measurement + tags? cpu region=uswest, host=serverA

  44. By measurement + tags? cpu region=uswest, host=serverA SERIES GROWS INDEFINITELY

  45. By measurement + tags, time? cpu region=uswest, host=serverA, time

  46. By measurement + tags, time? cpu region=uswest, host=serverA, time

    WHICH TIMES/KEYS EXIST?
  47. By measurement + tags, time? cpu region=uswest, host=serverA, time

    NO DATA LOCALITY
  48. High throughput

  49. CAP Theorem

  50. CAP Theorem C: Consistency

  51. CAP Theorem C: Consistency A: Availability

  52. CAP Theorem C: Consistency A: Availability P: In the face

    of Partitions
  53. Pick either C or A

  54. P is happening whether you have perfect network hardware or

    not
  55. Pauses under load look like partitions

  56. High throughput = load

  57. Consistency under high write throughput

  58. Time series queries do range scans of recent data that

    is always moving
  59. Some sensors sample many times per second

  60. Event streams can be even more frequent

  61. Consistent view?

  62. None
  63. Time series data + distributed systems = great sadness

  64. but…

  65. Time series data has great properties for databases

  66. No updates

  67. Large ranges cold for writes

  68. Immutable data structures and files

  69. Like LSM, but more specific

  70. Deletes mostly against ranges of old data

  71. We partition data by ranges of time, e.g. all data for a day or an hour together
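
    A minimal sketch of that partitioning (hypothetical, not InfluxDB's actual code): truncate each point's timestamp to the partition duration, so every point from the same day maps to the same shard group.

      package main

      import (
          "fmt"
          "time"
      )

      // shardKey truncates a timestamp to the partition duration,
      // so all points within one day share a key.
      func shardKey(ts time.Time, partition time.Duration) time.Time {
          return ts.Truncate(partition)
      }

      func main() {
          day := 24 * time.Hour
          t1 := time.Date(2015, 6, 8, 3, 15, 0, 0, time.UTC)
          t2 := time.Date(2015, 6, 8, 22, 40, 0, 0, time.UTC)
          fmt.Println(shardKey(t1, day) == shardKey(t2, day)) // true: same day, same partition
      }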
  72. Drop entire files

  73. Tombstone the one-offs

  74. New storage engine

  75. Great properties for distributed systems

  76. No updates

  77. Large scale deletes on cold areas of keyspace

  78. Perfect for an AP system

  79. Conflict resolution made easy, i.e. no updates = no contention
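
    A toy illustration of why (hypothetical code): with append-only, immutable points, merging two divergent replicas is just a set union keyed by timestamp, since the same key can never carry two different values.

      package main

      import "fmt"

      func merge(a, b map[int64]float64) map[int64]float64 {
          out := make(map[int64]float64, len(a)+len(b))
          for ts, v := range a {
              out[ts] = v
          }
          for ts, v := range b {
              out[ts] = v // same timestamp implies the same immutable value
          }
          return out
      }

      func main() {
          replicaA := map[int64]float64{1: 0.5, 2: 0.7}
          replicaB := map[int64]float64{2: 0.7, 3: 0.9}
          fmt.Println(merge(replicaA, replicaB)) // map[1:0.5 2:0.7 3:0.9]
      }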

  80. Partition key space by ranges of time, i.e. old data vs. new
  81. Old data generally doesn’t change

  82. Consistent view on new data is the union

  83. Deletes against ranges that are cold for writes and queries

  84. Cluster growth to increase storage capacity doesn’t require rebalancing

  85. Data locality, i.e. how we ship the code to where the data lives when scanning large ranges of data
  86. Evenly distribute across cluster, per day

    cpu region=uswest, host=serverA → Shard 1
    cpu region=uswest, host=serverB → Shard 1
    cpu region=useast, host=serverC → Shard 2
    cpu region=useast, host=serverD → Shard 2
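
    One way to get that even spread (a hypothetical sketch; the real assignment may differ): hash the series key to pick a shard within the day's shard group.

      package main

      import (
          "fmt"
          "hash/fnv"
      )

      // shardFor hashes a series key onto one of numShards shards.
      func shardFor(seriesKey string, numShards int) int {
          h := fnv.New32a()
          h.Write([]byte(seriesKey))
          return int(h.Sum32() % uint32(numShards))
      }

      func main() {
          for _, key := range []string{
              "cpu,region=uswest,host=serverA",
              "cpu,region=uswest,host=serverB",
              "cpu,region=useast,host=serverC",
              "cpu,region=useast,host=serverD",
          } {
              fmt.Println(key, "-> shard", shardFor(key, 2))
          }
      }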
  87. Each shard lives on a server; the # of replicas determines how many servers hold a copy

  88. Hits one shard

    query: select mean(value) from cpu where region = ‘uswest’ AND host = ‘serverB’ AND time > now() - 6h group by time(5m)
  89. Decompose into map/reduce job

    query: select mean(value) from cpu where region = ‘uswest’ AND time > now() - 6h group by time(5m)
    Many series match these criteria, many shards to query
  90. func MapMean(itr Iterator) interface{} {
          out := &meanMapOutput{}
          // Streaming mean: fold each value into the running mean
          // without keeping the raw points around.
          for _, k, v := itr.Next(); k != 0; _, k, v = itr.Next() {
              out.Count++
              out.Mean += (v.(float64) - out.Mean) / float64(out.Count)
          }
          if out.Count > 0 {
              return out
          }
          return nil
      }
  91. func ReduceMean(values []interface{}) interface{} {
          out := &meanMapOutput{}
          var countSum int
          // Merge per-shard partial means as a weighted average,
          // weighting each partial by its share of the combined count.
          for _, v := range values {
              if v == nil {
                  continue
              }
              val := v.(*meanMapOutput)
              countSum = out.Count + val.Count
              out.Mean = val.Mean*(float64(val.Count)/float64(countSum)) +
                  out.Mean*(float64(out.Count)/float64(countSum))
              out.Count = countSum
          }
          if out.Count > 0 {
              return out.Mean
          }
          return nil
      }
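
    For context, a runnable sketch of the supporting struct the two functions above assume (a simplified stand-in; the real definitions live in the InfluxDB source), showing the weighted merge on two per-shard partials:

      package main

      import "fmt"

      // Simplified stand-in for the map/reduce output type above.
      type meanMapOutput struct {
          Count int
          Mean  float64
      }

      func main() {
          // Two partial means from two shards.
          partials := []*meanMapOutput{
              {Count: 4, Mean: 10},
              {Count: 6, Mean: 20},
          }

          // Weighted combine, exactly as ReduceMean does it.
          out := &meanMapOutput{}
          for _, val := range partials {
              countSum := out.Count + val.Count
              out.Mean = val.Mean*(float64(val.Count)/float64(countSum)) +
                  out.Mean*(float64(out.Count)/float64(countSum))
              out.Count = countSum
          }
          fmt.Println(out.Mean) // (4*10 + 6*20) / 10 = 16
      }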
  92. We transmit only the summary ticks across the cluster, one per 5-minute interval
  93. there will be more…

  94. Time series data has odd workloads

  95. High write and read throughput

  96. Append/insert only

  97. Deletes against large ranges

  98. Horrible and great for distributed databases

  99. Thank you. Paul Dix paul@influxdb.com @pauldix