Introduction to Time Series (Software Circus, April 2019)

Were you at this talk? Feedback can be left:


Time series has been the fastest-growing database category, as ranked by DB-Engines, for over two years; yet fewer than 15% of people store their time series data in a time series database. Do you?

One could accurately say that time series data is as old as the universe, but it wasn't until the late 19th century that the first article on the concept was published: A Comparison of the Fluctuations in the Price of Wheat and in the Cotton and Silk Imports into Great Britain by J. H. Poynting (March 1884).

Time series data is so natural and common that you consume, evaluate, and utilise it every day, for example when you're:

- Paying for your morning coffee
- Sighing at the "Delayed" notice on your commute
- Hugging your coffee mug as you process your email inbox

In this talk we will look at the different types of time series data and how to use them to drive observations, understanding, and automation.

"All data becomes an order of magnitude more interesting on the time dimension." Let's see why.

David McKay

April 25, 2019

  1. Good evening ("Goedenavond") David McKay @rawkode Developer Relations Manager @InfluxDB | #InfluxDB

    Software Circus
  2. Introduction to Time Series

  3. Before we begin …

  4. Pop Quiz “Invented” When?

  5. Encoding First Used … ? First Used 410 BC

  6. Encoding “Documented” in The Lives of the Noble Grecians and

    Romans, by the Greek historian Plutarch.
  7. Alcibiades suddenly raised the Athenian ensign in the admiral ship,

    and fell upon those galleys of the Peloponnesians …
  8. In the 14th century, things still hadn’t advanced much.

    The Black Book of Admiralty listed 2 signals: 1 flag or 2 flags Encoding
  9. By the 15th century there were 15 flags, each with

    a single meaning. Encoding
  10. Finally, in the 18th century, a French system existed

    (Mahé de la Bourdonnais) with 10 coloured flags, representing 0-9 Encoding
  11. Sharding First Used … ? First Used 150 BC

  12. Sharding First “documented” example was in ~150 BC, invented and

    described by Polybius.
  13. We take the alphabet and divide it into five parts,

    each consisting of five letters.
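Polybius's division of the alphabet into five parts of five letters can be sketched as a 5×5 grid lookup. This is a minimal illustration under stated assumptions: it uses the modern Latin alphabet with I and J sharing a cell (Polybius worked with the Greek alphabet), and `encode`/`decode` are hypothetical names. Each letter becomes a (row, column) pair, so the row plays the role of the shard and the column the position within it.

```python
import string

# Assumed 5x5 grid: the Latin alphabet with J merged into I
# (Polybius himself used the Greek alphabet).
ALPHABET = string.ascii_uppercase.replace("J", "")
ROWS = [ALPHABET[i:i + 5] for i in range(0, 25, 5)]  # five parts of five letters


def encode(text: str) -> list[tuple[int, int]]:
    """Map each letter to a 1-based (row, column) pair; non-letters are skipped."""
    pairs = []
    for ch in text.upper().replace("J", "I"):
        if ch in ALPHABET:
            index = ALPHABET.index(ch)
            pairs.append((index // 5 + 1, index % 5 + 1))
    return pairs


def decode(pairs: list[tuple[int, int]]) -> str:
    """Reverse the mapping: the row picks the 'shard', the column the letter in it."""
    return "".join(ROWS[row - 1][col - 1] for row, col in pairs)


coded = encode("Time Series")
print(coded)
print(decode(coded))  # TIMESERIES
```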
  14. (no text)
  15. (no text)
  16. History of Time Series

  17. The earliest form of a company which issued public shares

    was the case of the publicani during the Roman Republic. The Romans Did It
  18. Like modern joint-stock companies, the publicani were legal bodies independent

    of their members whose ownership was divided into shares, or partes. There is evidence that these shares were sold to public investors and traded in a type of over-the-counter market in the Forum, near the Temple of Castor and Pollux. The shares fluctuated in value, encouraging the activity of speculators, or quaestors.
  19. In 1602 … First IPO: Dutch East India Company

  20. In 1783 … First US IPO: Bank of North America

  21. In 1884 … What was the price of wheat?

  22. A Comparison of the Fluctuations in the Price of Wheat

    and in the Cotton and Silk Imports into Great Britain First Documented Time Series J. H. Poynting Journal of the Statistical Society of London Vol. 47, No. 1 (Mar., 1884), pp. 34-74
  23. What is all this? This is the first (or one

    of the first) papers to add the dimension of time to statistical mathematics
  24. All data becomes an order of magnitude more interesting on

    the time dimension @rawkode
  25. (no text)
  26. Most data is best understood in the dimension of time

    @pauldix, CTO
  27. Introduction to Time Series Finally!

  28. What Will We Cover? ➔ Time Series Data ➔ Time

    Series Databases ➔ Getting to Know InfluxDB ➔ Value of Time Series Data ➔ Advancing Monitoring to Time Series
  29. Time Series Data What is it?

  30. Time Series Data Data with a timestamp

  31. (no text)
  32. (no text)
  33. (no text)
  34. (no text)
  35. (no text)
  36. (no text)
  37. What is Time Series Data?

  38. What is Time Series Data? Regular (Metrics) ➔ Predictable ➔

    Evenly Distributed; Irregular (Events) ➔ Unpredictable ➔ Inconsistent Intervals
  39. Regular / Metrics ★ CPU Usage ★ Memory Usage ★

    Ping Time for Google.com ★ Number of Processes
  40. Irregular / Events ★ User Clicked Login ★ Authentication Failed

    ★ CI Published v1.3.1 ★ Network Cable Unplugged
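One rough way to tell regular metrics from irregular events is to inspect the gaps between consecutive timestamps: a metric series has near-constant gaps, while an event series does not. The sketch below is an illustrative heuristic, not something InfluxDB itself does; the `is_regular` name and the 5% tolerance are assumptions.

```python
def is_regular(timestamps: list[float], tolerance: float = 0.05) -> bool:
    """Return True if every gap between consecutive timestamps is within
    `tolerance` (as a fraction) of the first gap, i.e. the series looks like
    a metric sampled at a fixed interval. Needs at least three timestamps."""
    if len(timestamps) < 3:
        raise ValueError("need at least three timestamps to judge regularity")
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    expected = gaps[0]
    if expected <= 0:
        return False
    return all(abs(gap - expected) <= tolerance * expected for gap in gaps)


# CPU usage scraped every 10 seconds: evenly distributed.
cpu_samples = [0, 10, 20, 30, 40]
# Login clicks arrive whenever users act: inconsistent intervals.
login_events = [3, 4, 19, 57, 58]

print(is_regular(cpu_samples))   # True
print(is_regular(login_events))  # False
```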
  41. (no text)
  42. Collecting Metrics & Events With Prometheus Exporters or Telegraf

  43. Collecting Metrics & Events Inputs: ➔ CloudWatch ➔ Elasticsearch ➔

    Kafka ➔ Jenkins ➔ Kubernetes ➔ Linux ➔ Puppet ➔ Windows ➔ x509 Outputs: ➔ CloudWatch ➔ Kafka ➔ DataDog ➔ Elasticsearch ➔ Graphite ➔ Prometheus Exporters: ➔ Atlassian ➔ Ceph ➔ Consul ➔ Kubernetes ➔ Memcached ➔ MySQL
  44. (no text)
  45. Push AND Pull: Metrics are pulled at a regular interval

    (consistent and reliable intervals); Events NEED to be pushed as they happen (inconsistent intervals)
  46. Time Series Data Use Cases

  47. IoT / Sensor ➔ Thermostats ➔ Electric Engines ➔ Smart

    Things ➔ GPS ➔ Fitbits Real Time Analytics ➔ Website Tracking ➔ Stock Prices ➔ Currency Exchange Rates Use Cases for Time Series Monitoring ➔ Infrastructure ➔ Applications ➔ Third Party Services
  48. Time Series Databases TSDBs

  49. Time Series databases are optimized for collecting, storing, retrieving, and

    processing Time Series data. Time Series Databases
  50. ➔ High Write Frequency ➔ Reads are range scans ➔

    TTL / Lifecycle Management ➔ Time Sensitive Time Series Databases
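These properties can be illustrated with a toy, in-memory store. The `ToySeries` class below is hypothetical and is not how InfluxDB (or any real TSDB) is implemented; it only shows the shape of the workload: frequent time-ordered writes, reads as range scans over a sorted timestamp index, and TTL-based lifecycle management that drops expired points.

```python
import bisect


class ToySeries:
    """A single time series held as parallel lists sorted by timestamp."""

    def __init__(self, ttl: int):
        self.ttl = ttl  # how long (in timestamp units) a point is kept
        self.times: list[int] = []
        self.values: list[float] = []

    def write(self, ts: int, value: float) -> None:
        # Writes usually arrive in time order, so this is normally an append;
        # inserting at the bisect position keeps the lists sorted if a point is late.
        i = bisect.bisect_right(self.times, ts)
        self.times.insert(i, ts)
        self.values.insert(i, value)

    def range(self, start: int, stop: int) -> list[tuple[int, float]]:
        """Range scan: all points with start <= ts < stop."""
        lo = bisect.bisect_left(self.times, start)
        hi = bisect.bisect_left(self.times, stop)
        return list(zip(self.times[lo:hi], self.values[lo:hi]))

    def expire(self, now: int) -> int:
        """Lifecycle management: drop points older than now - ttl; return count dropped."""
        cutoff = bisect.bisect_left(self.times, now - self.ttl)
        del self.times[:cutoff]
        del self.values[:cutoff]
        return cutoff


series = ToySeries(ttl=60)
for t in range(0, 120, 10):
    series.write(t, float(t) / 10)
print(series.range(30, 60))    # [(30, 3.0), (40, 4.0), (50, 5.0)]
print(series.expire(now=120))  # 6: points at t=0..50 are older than 60
```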
  51. (no text)
  52. 12% Are you in the 88%?

  53. (no text)
  54. (no text)
  55. (no text)
  56. (no text)
  57. (no text)
  58. 13% It’s Not Too Late!

  59. (no text)
  60. Disclaimer Most of this isn’t unique to InfluxDB

  61. InfluxDB Introductions

  62. InfluxDB ➔ TSDB ➔ Open-Source ➔ Full Stack (Telegraf, InfluxDB, Chronograf,

    and Kapacitor) ➔ v2 …
  63. Points At any point in time, this value was N

  64. load,host=vm1 1m=6.32,5m=8.20,15m=9.55 123456789 Point • Series • Fields • Timestamp

  65. • load,host=vm1 • stock_price,market=NASDAQ,ticker=GOOG • users,service=comments Series • Name •

    Tag Keys • Tag Values
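The point on slide 64 is written in InfluxDB line protocol: the measurement name and comma-separated tags, a space, comma-separated fields, a space, then the timestamp. The parser below is a deliberately simplified sketch: it assumes no escaped characters, no quoted string fields, numeric field values only, and a timestamp that is always present, none of which real line protocol guarantees, so prefer an official client library for production use. `Point` and `parse_point` are hypothetical names; the `series_key` property mirrors slide 65, where a series is the measurement name plus its tag keys and tag values.

```python
from typing import NamedTuple


class Point(NamedTuple):
    measurement: str
    tags: dict[str, str]
    fields: dict[str, float]
    timestamp: int

    @property
    def series_key(self) -> str:
        # A series is identified by the measurement name plus its tag set;
        # sorting the tags makes the key independent of write order.
        tag_part = ",".join(f"{k}={v}" for k, v in sorted(self.tags.items()))
        return f"{self.measurement},{tag_part}" if tag_part else self.measurement


def parse_point(line: str) -> Point:
    """Parse one simplified line-protocol line (no escaping, numeric fields only)."""
    series_part, field_part, ts_part = line.strip().split(" ")
    measurement, *tag_pairs = series_part.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = {k: float(v) for k, v in (pair.split("=", 1) for pair in field_part.split(","))}
    return Point(measurement, tags, fields, int(ts_part))


p = parse_point("load,host=vm1 1m=6.32,5m=8.20,15m=9.55 123456789")
print(p.series_key)  # load,host=vm1
print(p.fields)      # {'1m': 6.32, '5m': 8.2, '15m': 9.55}
print(p.timestamp)   # 123456789
```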
  66. Tags & Fields: Tags ➔ Indexed ➔ String Types;

    Fields ➔ Not Indexed ➔ Multiple Data Types
  67. Value of Time Series Data Isn’t It Valuable Forever?

  68. Resolution The predictable interval at which we will collect our

    time series data
  69. The value of all time series data is directly correlated

    with the resolution at which the data is available. Value of Time Series Data
  70. Cost of Time Series Data Wait, Isn’t It Free?!

  71. Example cpu,machine=abc1 usage=1.66 timestamp

  72. Resolution ➔ 1 Measurement ➔ 1 Series ➔ 1s Resolution

    86400 Points Per Day
  73. Resolution ➔ 1 Measurement ➔ 2 Series ➔ 1s Resolution

    172800 Points Per Day
  74. Resolution ➔ 5 Measurements ➔ 10 Series per Measurement ➔ 1s Resolution

    4320000 Points Per Day
  75. Nasdaq 285120000000 Points Per Day ➔ 1 Measurement ➔

    3300 Series ➔ 1ms Resolution
  76. Nasdaq 4752000 Points Per Day ➔ 1 Measurement ➔ 3300

    Series ➔ 1m Resolution
  77. Nasdaq 79200 Points Per Day ➔ 1 Measurement ➔ 3300

    Series ➔ 1h Resolution
  78. Nasdaq 13200 Points Per Day ➔ 1 Measurement ➔ 3300

    Series ➔ 6h Resolution
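The arithmetic behind these slides is: points per day = number of series × (seconds per day ÷ collection interval in seconds). The sketch below reproduces each figure; like the slides, it assumes a full 24-hour day rather than trading hours, and it reads the 5-measurement example as 10 series per measurement (5 × 10 = 50 series). The `points_per_day` name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400; the slides assume a full 24-hour day


def points_per_day(series: int, interval_seconds: float) -> int:
    """Points written per day when every series is sampled once per interval."""
    return round(series * SECONDS_PER_DAY / interval_seconds)


examples = {
    "1 series @ 1s": points_per_day(1, 1),
    "2 series @ 1s": points_per_day(2, 1),
    "5 measurements x 10 series @ 1s": points_per_day(5 * 10, 1),
    "Nasdaq 3300 series @ 1ms": points_per_day(3300, 0.001),
    "Nasdaq 3300 series @ 1m": points_per_day(3300, 60),
    "Nasdaq 3300 series @ 1h": points_per_day(3300, 3600),
    "Nasdaq 3300 series @ 6h": points_per_day(3300, 6 * 3600),
}
for label, points in examples.items():
    print(f"{label}: {points:,}")
```

Dropping the Nasdaq resolution from 1ms to 1m alone cuts the daily volume by a factor of 60,000, which is why rollups matter.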
  79. Rollups Lowering the Resolution

  80. Rollups with Continuous Queries CREATE CONTINUOUS QUERY "rollup_1h" ON "nasdaq"

    BEGIN SELECT mean(price) INTO yearly FROM weekly GROUP BY time(1h) END
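Conceptually, the continuous query above groups raw points into one-hour buckets and stores the mean of each bucket at a lower resolution. The plain-Python sketch below shows that downsampling step on a batch of data; it assumes integer Unix-second timestamps, uses a hypothetical `rollup_mean` function, and ignores the scheduling a real continuous query performs.

```python
from collections import defaultdict
from statistics import mean


def rollup_mean(points: list[tuple[int, float]], bucket_seconds: int = 3600) -> list[tuple[int, float]]:
    """Downsample (timestamp, value) points to one mean per time bucket,
    similar in spirit to mean() with GROUP BY time(1h)."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its bucket.
        buckets[ts - ts % bucket_seconds].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Three raw prices in the first hour, two in the second.
raw = [(0, 10.0), (1200, 11.0), (2400, 12.0), (3600, 20.0), (5400, 22.0)]
print(rollup_mean(raw))  # [(0, 11.0), (3600, 21.0)]
```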
  81. Events? Outlier / Anomaly Detection InfluxDB Anomaly Detection
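The slide does not show how the detection works. As a stand-in technique (not InfluxDB's own method), a common approach for event counts is the median absolute deviation (MAD): flag any value whose modified z-score exceeds a threshold. The 3.5 cut-off is a conventional choice, and `mad_outliers` is a hypothetical name.

```python
from statistics import median


def mad_outliers(values: list[float], threshold: float = 3.5) -> list[int]:
    """Return indices whose modified z-score (based on the median absolute
    deviation) exceeds `threshold`."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Every typical value is identical; flag anything that differs at all.
        return [i for i, v in enumerate(values) if v != med]
    return [i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > threshold]


# Failed logins per minute; minute 6 spikes.
failed_logins = [2, 3, 2, 4, 3, 2, 41, 3, 2]
print(mad_outliers(failed_logins))  # [6]
```

The median-based score is used here because a single large spike barely moves the median, whereas it would inflate a mean and standard deviation.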

  82. Advancing Monitoring to Time Series Taking Small Steps for Giant Leaps

  83. Application Database CPU > 80% MEM > 80% Response Time

    > 300ms Black Friday
  84. Application Database How do we know when to send a

    page to SRE / Ops? When the application fails the health-check
  85. Application Database How do we know when to send a

    page to SRE / Ops? When we get more than 100 [ 5xx | Exceptions ] within a 5-minute period Application Application
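The "more than 100 errors within 5 minutes" rule can be expressed as a sliding-window count over error timestamps. This is a minimal sketch assuming timestamps in seconds that arrive in order; the `ErrorRateAlert` class and its defaults are hypothetical, chosen to mirror the numbers on the slide.

```python
from collections import deque


class ErrorRateAlert:
    """Page when more than `limit` errors occur within any `window` seconds."""

    def __init__(self, limit: int = 100, window: float = 300.0):
        self.limit = limit
        self.window = window
        self.recent: deque[float] = deque()  # timestamps of errors still in the window

    def record(self, ts: float) -> bool:
        """Record one 5xx/exception at time `ts`; return True if we should page."""
        self.recent.append(ts)
        # Drop errors that fell out of the window ending at `ts`.
        while self.recent[0] <= ts - self.window:
            self.recent.popleft()
        return len(self.recent) > self.limit


alert = ErrorRateAlert()
# 100 errors spread over the first 200 seconds: no page yet.
pages = [alert.record(t * 2.0) for t in range(100)]
print(any(pages))           # False
print(alert.record(201.0))  # True: 101 errors inside 5 minutes
```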
  86. Service A Database A Service B Service B Service C

    Database B Database C Virtual Network Service Mesh Canary Ummm?
  87. Cloud Native Architectures Convenience Vs. Cost You can treat the

    symptoms for a while … Upgrade Your Monitoring
  88. Causality Treating the Disease

  89. ➔ Look at the last weeks, months, and years of data

    ➔ Use tags to build correlations ➔ Get Statistical ◆ INTEGRAL() ◆ LINEAR_PREDICTION() ◆ DERIVATIVE() ◆ MOVING_AVERAGE() ◆ HOLT_WINTERS() Causality
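Two of the listed functions are simple enough to sketch by hand. The Python below approximates what DERIVATIVE() (rate of change per unit of time between consecutive points) and MOVING_AVERAGE() (rolling mean over N points) compute; it illustrates the maths rather than InfluxQL's exact semantics, and the function names are hypothetical.

```python
def derivative(points: list[tuple[float, float]], unit: float = 1.0) -> list[tuple[float, float]]:
    """Rate of change between consecutive (timestamp, value) points, scaled to
    `unit` seconds and reported at the later timestamp."""
    return [
        (t1, (v1 - v0) / (t1 - t0) * unit)
        for (t0, v0), (t1, v1) in zip(points, points[1:])
    ]


def moving_average(values: list[float], n: int) -> list[float]:
    """Mean of each run of `n` consecutive values (output is n - 1 shorter)."""
    return [sum(values[i:i + n]) / n for i in range(len(values) - n + 1)]


# Bytes sent, sampled every 10 seconds.
bytes_sent = [(0, 0.0), (10, 500.0), (20, 1500.0), (30, 1800.0)]
print(derivative(bytes_sent))                   # [(10, 50.0), (20, 100.0), (30, 30.0)]
print(moving_average([2.0, 4.0, 6.0, 8.0], 2))  # [3.0, 5.0, 7.0]
```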
  90. Have you ever been paged at 4am because the disk

    usage of a machine went above 85%? Could this have been determined during office hours? (Linear Growth) Can we use correlations to determine the cause during anomalies? Causality
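The linear-growth question can be answered during office hours by fitting an ordinary least-squares line through recent disk-usage samples and solving for when it crosses the alert threshold. This is a sketch of the idea, not a specific InfluxDB function; the `hours_until_threshold` name and the sample data are made up for illustration.

```python
from typing import Optional


def hours_until_threshold(samples: list[tuple[float, float]], threshold: float = 85.0) -> Optional[float]:
    """Fit usage = slope * hour + intercept by ordinary least squares and return
    how many hours after the last sample the fitted line reaches `threshold`.
    Returns None if usage is flat or shrinking."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    cov = sum((t - mean_t) * (u - mean_u) for t, u in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = cov / var
    if slope <= 0:
        return None
    intercept = mean_u - slope * mean_t
    crossing_hour = (threshold - intercept) / slope
    # Already past the threshold? Then the answer is "now".
    return max(0.0, crossing_hour - samples[-1][0])


# Disk usage (%) at hours 0, 6, 12 and 18: growing 1% every 6 hours.
disk = [(0, 70.0), (6, 71.0), (12, 72.0), (18, 73.0)]
print(round(hours_until_threshold(disk), 1))  # 72.0
```

Here the fitted line reaches 85% at hour 90, three days after the last sample: plenty of time to act before anyone is paged at 4am.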
  91. In our distributed application, our p99 reports that our users

    are being served healthy responses in under 2ms. Our pager is going off because we’re getting too many exceptions in the code SELECT mode(*) FROM logs; Causality
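`mode()` returns the most frequent value, which in this scenario surfaces the exception that dominates the logs even while latency looks healthy. The same idea in plain Python uses `collections.Counter`; the log messages below are hypothetical.

```python
from collections import Counter

# Hypothetical exception messages pulled from the logs during the incident.
exceptions = [
    "TimeoutError: payment-gateway",
    "KeyError: 'user_id'",
    "TimeoutError: payment-gateway",
    "TimeoutError: payment-gateway",
    "ValueError: bad currency code",
]

# most_common(1) gives the modal value and how often it occurred.
message, count = Counter(exceptions).most_common(1)[0]
print(f"{count} x {message}")  # 3 x TimeoutError: payment-gateway
```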
  92. We run Big News Corp and we need to reduce

    our cloud costs. Instead of running at 30% utilisation, can we run at 80% utilisation? HOLT_WINTERS Proactive Ops
  93. Build Automation Through Causality, Historical Data, Prediction, and ML

  94. Summary ➔ Use a TSDB ➔ Understand Cost / Select Tags

    Wisely ➔ Understand the resolution you need for 1m, 6m, > 12m ➔ Roll up metrics ➔ Perform outlier detection on events ➔ Build automation, dashboarding, and reporting around your data (past, present, and future)
  95. Cheers ("Proost") David McKay @rawkode Developer Relations Manager @InfluxDB | #InfluxDB

    That’s All Folks!