Slide 1

Slide 1 text

Managing Time-Series on AWS
Danilo Poccia, Chief Evangelist (EMEA), @danilop
© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Slide 2

Slide 2 text

Time-series data
Time-series data is a sequence of data points recorded over a time interval, used to measure events that change over time.

Slide 3

Slide 3 text

What is time-series data?
[Figure: humidity (% water vapor) plotted over time, y-axis from 85 to 95. Readings shown: 91.0, 94.0, 86.0, 93.0 at 5:28:15 PM, 5:28:30 PM, 5:28:45 PM, 5:29:05 PM.]
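As a minimal sketch, the humidity readings in the figure can be held as a list of (timestamp, value) pairs. This assumes the four values pair with the four timestamps in the order shown; the date is arbitrary, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: the four values pair with the four timestamps in order.
# The date is arbitrary; the slide only shows times of day.
DAY = datetime(2020, 1, 1)
readings = [
    (DAY.replace(hour=17, minute=28, second=15), 91.0),
    (DAY.replace(hour=17, minute=28, second=30), 94.0),
    (DAY.replace(hour=17, minute=28, second=45), 86.0),
    (DAY.replace(hour=17, minute=29, second=5), 93.0),
]


def intervals(points):
    """Return the gaps between consecutive timestamps."""
    return [b[0] - a[0] for a, b in zip(points, points[1:])]


def value_range(points):
    """Return the (minimum, maximum) of the recorded values."""
    values = [value for _, value in points]
    return min(values), max(values)
```

Note that the gaps are not uniform (15 s, 15 s, 20 s), which is typical of sensor data and is why gap-filling functions matter later in the deck.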

Slide 4

Slide 4 text

Time-series use cases
• IoT Applications: Collect motion or temperature data from device sensors, interpolate to identify the time ranges without motion, or alert consumers to take actions such as turning off the lights to save energy.
• DevOps Analysis: Collect and analyze performance and health metrics such as CPU/memory utilization, network data, and IOPS to monitor health and optimize instance usage.
• App Analytics: Store and analyze clickstream data at scale to understand the customer journey, that is, the user activity across your applications over a period of time.

Slide 5

Slide 5 text

Amazon Timestream
Fast, scalable, and serverless time-series database
• Purpose-built for time-series data: built-in analytics using standard SQL, with added interpolation and smoothing functions to identify trends, patterns, and anomalies
• Serverless and easy to use: no servers to manage or instances to provision; software patches, indexes, and database optimizations are handled automatically
• Performance at scale: capable of ingesting trillions of events daily; the adaptive SQL query engine provides rapid point-in-time queries with its in-memory store, and fast analytical queries through its magnetic store
• Cost effective: reduces costs by simplifying the complex process of data lifecycle management; pay only for what you ingest, store, and query
• Secure from the ground up: all data is encrypted in flight and at rest using AWS Key Management Service (AWS KMS) with customer managed keys (CMK)

Slide 6

Slide 6 text

Architectural concepts

Slide 7

Slide 7 text

Serverless architecture
[Timeline: from GA in 2020 to the present day, with continuous releases and no maintenance or downtime]

Slide 8

Slide 8 text

Terminology and concepts: Tables
• Encrypted container that holds records
• No data definition or columns are specified at creation
• Time-based data retention policies control the data lifecycle within storage tiers

Slide 9

Slide 9 text

Terminology and concepts: Storage tiers
Two storage tiers: in-memory and magnetic
In-memory tier
• Handles the ingestion of all data
• Automatically handles data deduplication
• Optimized for latency-sensitive point-in-time queries
Magnetic disk tier
• Optimized for high-performance analytical queries
• Cost-effective long-term storage
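The retention policy of a table is what moves data between the two tiers. A minimal sketch of setting it at table creation, assuming the boto3 `timestream-write` client; the database name, table name, and retention values are hypothetical, and the helper only builds the request so it can be inspected without AWS credentials.

```python
def build_create_table_request(database, table, memory_hours, magnetic_days):
    """Build keyword arguments for the timestream-write create_table call."""
    if memory_hours < 1 or magnetic_days < 1:
        raise ValueError("retention periods must be positive")
    return {
        "DatabaseName": database,
        "TableName": table,
        "RetentionProperties": {
            # How long records stay in the in-memory tier.
            "MemoryStoreRetentionPeriodInHours": memory_hours,
            # How long records stay in the magnetic tier afterwards.
            "MagneticStoreRetentionPeriodInDays": magnetic_days,
        },
    }


def create_table(client, request):
    """client is assumed to be boto3.client("timestream-write")."""
    return client.create_table(**request)


# Hypothetical names and retention values, chosen only for illustration.
request = build_create_table_request("devops", "host_metrics", 24, 365)
```

No columns appear in the request: as the Tables slide says, the schema is not declared at creation.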

Slide 10

Slide 10 text

Terminology and concepts: Dimensions
• A set of attributes that uniquely describe a measurement
• Each table allows up to 128 unique dimensions
• All dimensions are represented as varchars
• Dimensions are dynamically added to the table during ingestion
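A minimal sketch of how dimensions might be prepared for ingestion, assuming the Name/Value list shape accepted by the boto3 `write_records` call. The helper name is hypothetical, and checking the 128 limit per record is a simplification: the limit on the slide applies to the table as a whole.

```python
MAX_DIMENSIONS = 128  # per-table limit quoted on the slide


def to_dimensions(attributes):
    """Convert a dict of attributes into a Dimensions list of Name/Value pairs."""
    if len(attributes) > MAX_DIMENSIONS:
        raise ValueError(f"too many dimensions: {len(attributes)}")
    # Every dimension is stored as a varchar, so cast values to str explicitly.
    return [{"Name": name, "Value": str(value)} for name, value in attributes.items()]


dimensions = to_dimensions({
    "region": "us-east-1",
    "az": "1d",
    "vpc": "vpc-1a2b3c4d",
    "hostname": "host-24Gju",
})
```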

Slide 11

Slide 11 text

Example: Dimensions in Amazon Timestream
(The dimension columns are region, az, vpc, and hostname.)
time                           region     az  vpc           hostname    measure_name     measure_value::double  measure_value::bigint
2020-06-17 19:00:00.000000000  us-east-1  1d  vpc-1a2b3c4d  host-24Gju  cpu_utilization  35.0                   null
2020-06-17 19:00:01.000000000  us-east-1  1d  vpc-1a2b3c4d  host-24Gju  cpu_utilization  38.2                   null
2020-06-17 19:00:02.000000000  us-east-1  1d  vpc-1a2b3c4d  host-24Gju  cpu_utilization  45.3                   null
2020-06-17 19:00:00.000000000  us-east-1  1d  vpc-1a2b3c4d  host-24Gju  mem_utilization  54.9                   null
2020-06-17 19:00:01.000000000  us-east-1  1d  vpc-1a2b3c4d  host-24Gju  mem_utilization  42.6                   null
2020-06-17 19:00:02.000000000  us-east-1  1d  vpc-1a2b3c4d  host-24Gju  mem_utilization  33.3                   null
2020-06-17 19:00:00.000000000  us-east-1  1d  vpc-1a2b3c4d  host-24Gju  networks_bytes   null                   30000
2020-06-17 19:00:01.000000000  us-east-1  1d  vpc-1a2b3c4d  host-24Gju  networks_bytes   null                   15200
2020-06-17 19:00:01.000000000  us-east-1  1d  vpc-1a2b3c4d  host-24Gju  networks_bytes   null                   15200

Slide 12

Slide 12 text

Terminology and concepts: Measures
• Each Amazon Timestream record contains a single measurement consisting of a name and a value
• Each table supports up to 1024 unique measure names
• Measure values support the boolean, bigint, double, and varchar types
• Measures are dynamically added to the table during ingestion
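The four measure types map naturally onto Python types. The following is a rough sketch, not part of any SDK: both helper names are hypothetical, values are serialized as strings because `write_records` takes `MeasureValue` as a string, and the lowercase spelling of booleans is an assumption.

```python
def measure_value_type(value):
    """Map a Python value to one of the four measure types listed on the slide."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "VARCHAR"
    raise TypeError(f"unsupported measure type: {type(value).__name__}")


def to_measure_records(measures):
    """Turn {name: value} into one record per measure (one measure per record)."""
    records = []
    for name, value in measures.items():
        kind = measure_value_type(value)
        # Assumption: booleans are written as lowercase "true"/"false".
        text = str(value).lower() if kind == "BOOLEAN" else str(value)
        records.append({"MeasureName": name, "MeasureValue": text, "MeasureValueType": kind})
    return records


sample = to_measure_records({"cpu_utilization": 35.0, "networks_bytes": 30000})
```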

Slide 13

Slide 13 text

Example: Measures in Amazon Timestream
(The measure columns are measure_name, measure_value::double, and measure_value::bigint.)
time                           region     az  vpc           hostname    measure_name     measure_value::double  measure_value::bigint
2020-06-17 19:00:00.000000000  us-east-1  1d  vpc-1a2b3c4d  host-24Gju  cpu_utilization  35.0                   null
2020-06-17 19:00:01.000000000  us-east-1  1d  vpc-1a2b3c4d  host-24Gju  cpu_utilization  38.2                   null
2020-06-17 19:00:02.000000000  us-east-1  1d  vpc-1a2b3c4d  host-24Gju  cpu_utilization  45.3                   null
2020-06-17 19:00:00.000000000  us-east-1  1d  vpc-1a2b3c4d  host-24Gju  mem_utilization  54.9                   null
2020-06-17 19:00:01.000000000  us-east-1  1d  vpc-1a2b3c4d  host-24Gju  mem_utilization  42.6                   null
2020-06-17 19:00:02.000000000  us-east-1  1d  vpc-1a2b3c4d  host-24Gju  mem_utilization  33.3                   null
2020-06-17 19:00:00.000000000  us-east-1  1d  vpc-1a2b3c4d  host-24Gju  networks_bytes   null                   30000
2020-06-17 19:00:01.000000000  us-east-1  1d  vpc-1a2b3c4d  host-24Gju  networks_bytes   null                   15200
2020-06-17 19:00:01.000000000  us-east-1  1d  vpc-1a2b3c4d  host-24Gju  networks_bytes   null                   15200

Slide 14

Slide 14 text

Terminology and concepts: Time-series
• A sequence of records represented as data points over a time interval for a given measurement
• Every record within the series consists of a timestamp, at least one dimension, and an associated measure name/value pair
• A time-series object can be constructed by using built-in time-series functions
• Missing data points within a time-series object can be filled with interpolation functions such as last observation carried forward (LOCF)
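To make the gap-filling idea concrete, here is a plain-Python illustration of last observation carried forward. It is a sketch of the technique only, not the service's implementation; the function name and the sample gap are hypothetical.

```python
from datetime import datetime, timedelta


def fill_locf(points, start, end, step):
    """Resample (timestamp, value) points on a regular grid, carrying the last value forward."""
    points = sorted(points)
    filled, last, i = [], None, 0
    t = start
    while t <= end:
        # Consume every observation at or before the grid time t.
        while i < len(points) and points[i][0] <= t:
            last = points[i][1]
            i += 1
        filled.append((t, last))  # stays None until the first observation is seen
        t += step
    return filled


t0 = datetime(2020, 6, 17, 19, 0, 0)
second = timedelta(seconds=1)
# The reading at 19:00:01 is left out on purpose to create a gap.
observed = [(t0, 35.0), (t0 + 2 * second, 45.3)]
filled = fill_locf(observed, t0, t0 + 3 * second, second)
```

The missing 19:00:01 slot receives 35.0, the most recent earlier observation, and the slot after the last reading repeats 45.3.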

Slide 15

Slide 15 text

Example: Time-series in Amazon Timestream
(For example, the three cpu_utilization records for host-24Gju form one time series.)
time                           region     az  vpc           hostname    measure_name     measure_value::double  measure_value::bigint
2020-06-17 19:00:00.000000000  us-east-1  1d  vpc-1a2b3c4d  host-24Gju  cpu_utilization  35.0                   null
2020-06-17 19:00:01.000000000  us-east-1  1d  vpc-1a2b3c4d  host-24Gju  cpu_utilization  38.2                   null
2020-06-17 19:00:02.000000000  us-east-1  1d  vpc-1a2b3c4d  host-24Gju  cpu_utilization  45.3                   null
2020-06-17 19:00:00.000000000  us-east-1  1d  vpc-1a2b3c4d  host-24Gju  mem_utilization  54.9                   null
2020-06-17 19:00:01.000000000  us-east-1  1d  vpc-1a2b3c4d  host-24Gju  mem_utilization  42.6                   null
2020-06-17 19:00:02.000000000  us-east-1  1d  vpc-1a2b3c4d  host-24Gju  mem_utilization  33.3                   null
2020-06-17 19:00:00.000000000  us-east-1  1d  vpc-1a2b3c4d  host-24Gju  networks_bytes   null                   30000
2020-06-17 19:00:01.000000000  us-east-1  1d  vpc-1a2b3c4d  host-24Gju  networks_bytes   null                   15200
2020-06-17 19:00:01.000000000  us-east-1  1d  vpc-1a2b3c4d  host-24Gju  networks_bytes   null                   15200

Slide 16

Slide 16 text

Characteristics of Amazon Timestream data
• All records require a timestamp, one or more dimensions, a measurement name, and a measurement value
• Records cannot be deleted or updated
• Records are only removed when they reach the retention limit within the magnetic tier (indefinite storage is an option)
• First-writer-wins semantics for handling duplicates
• Multiple measures are logically represented as multiple individual records (one measure per record)
• Automatically scales to handle high-speed real-time data ingestion
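The first-writer-wins rule can be illustrated in a few lines of Python. This is a sketch of the semantics only: the record layout is hypothetical, and treating (time, dimensions, measure name) as the identity of a record is an assumption, not a statement about how the service detects duplicates internally.

```python
def record_key(record):
    """Identity of a record: timestamp, sorted dimensions, and measure name (an assumption)."""
    dims = tuple(sorted(record["dimensions"].items()))
    return (record["time"], dims, record["measure_name"])


def first_writer_wins(records):
    """Keep the first record seen for each key and ignore later duplicates."""
    stored = {}
    for record in records:
        stored.setdefault(record_key(record), record)
    return list(stored.values())


host = {"hostname": "host-24Gju"}
writes = [
    {"time": "2020-06-17 19:00:01", "dimensions": host, "measure_name": "networks_bytes", "value": 15200},
    # Hypothetical late duplicate with a different value: it is ignored.
    {"time": "2020-06-17 19:00:01", "dimensions": host, "measure_name": "networks_bytes", "value": 99999},
    {"time": "2020-06-17 19:00:01", "dimensions": host, "measure_name": "cpu_utilization", "value": 38.2},
]
kept = first_writer_wins(writes)
```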

Slide 17

Slide 17 text

Data Ingestion

Slide 18

Slide 18 text

Connectivity: Data ingestion
Data is written using the AWS SDK
• Java, Python, Golang, Node.js, .NET, etc.
• AWS CLI
Adapters and plugins
• AWS IoT Core
• Amazon Kinesis Data Analytics for Apache Flink connector (GitHub)
• Telegraf connector (GitHub)

Slide 19

Slide 19 text

Example: Amazon Timestream Ingestion (Python)

    common_attributes = {
        'Dimensions': dimensions,
        'MeasureValueType': 'DOUBLE',
        'Time': current_timestamp
    }

    cpu_utilization = {
        'MeasureName': 'cpu_utilization',
        'MeasureValue': cpu_measurement
    }

    memory_utilization = {
        'MeasureName': 'memory_utilization',
        'MeasureValue': memory_measurement
    }

    records = [cpu_utilization, memory_utilization]

    result = self.client.write_records(DatabaseName=DATABASE_NAME,
                                       TableName=TABLE_NAME,
                                       Records=records,
                                       CommonAttributes=common_attributes)
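The slide snippet leaves `dimensions`, `current_timestamp`, the two measurements, and `self.client` undefined. A self-contained sketch of the same request follows, assuming the boto3 `timestream-write` client; the helper names are hypothetical, and the millisecond time unit is a choice made for this example.

```python
import time


def build_write_request(database, table, dimensions, cpu, memory, now_ms=None):
    """Build keyword arguments for write_records with two DOUBLE measures."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    common_attributes = {
        "Dimensions": dimensions,
        "MeasureValueType": "DOUBLE",
        "Time": str(now_ms),  # the API takes the timestamp as a string
        "TimeUnit": "MILLISECONDS",
    }
    records = [
        # Measure values are also passed as strings.
        {"MeasureName": "cpu_utilization", "MeasureValue": str(cpu)},
        {"MeasureName": "memory_utilization", "MeasureValue": str(memory)},
    ]
    return {
        "DatabaseName": database,
        "TableName": table,
        "Records": records,
        "CommonAttributes": common_attributes,
    }


def write(client, request):
    """client is assumed to be boto3.client("timestream-write")."""
    return client.write_records(**request)
```

Putting the shared fields in `CommonAttributes` means each record only carries what differs, here the measure name and value.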

Slide 20

Slide 20 text

Query processing

Slide 21

Slide 21 text

Connectivity: Querying
(Mostly) ANSI-2003 SQL for querying
• Time-series, interpolation, and gap-filling functions
• 250+ scalar, aggregate, and windowing functions
No proprietary query language to learn
Data is queried using the AWS SDK
• Java, Python, Golang, Node.js, .NET, etc.
• AWS CLI
• JDBC Driver

Slide 22

Slide 22 text

Example: Amazon Timestream query

    SELECT region, az, hostname,
           bin(time, 15s) AS binned_timestamp,
           round(avg(measure_value::double), 2) AS avg_cpu_utilization,
           round(approx_percentile(measure_value::double, 0.9), 2) AS p90_cpu_utilization,
           round(approx_percentile(measure_value::double, 0.95), 2) AS p95_cpu_utilization,
           round(approx_percentile(measure_value::double, 0.99), 2) AS p99_cpu_utilization
    FROM devops.host_metrics
    WHERE measure_name = 'cpu_utilization'         -- Predicate on measure_name
      AND time > ago(2h)                           -- Predicate on time
      AND hostname = 'host-24Gju'                  -- Optional predicates on other dimensions
    GROUP BY region, hostname, az, bin(time, 15s)  -- bin and GROUP BY time
    ORDER BY binned_timestamp ASC
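A query like the one above could be run from Python roughly as follows. This is a sketch assuming the boto3 `timestream-query` client and its paged `query` response (`ColumnInfo`, `Rows`, `NextToken`); the helper names are hypothetical, and only scalar columns are handled, not time-series, array, or row values.

```python
def rows_from_page(page):
    """Convert one query response page into a list of {column: value} dicts."""
    names = [column["Name"] for column in page["ColumnInfo"]]
    rows = []
    for row in page["Rows"]:
        # Scalar values arrive as strings under "ScalarValue"; nulls have no such key.
        values = [datum.get("ScalarValue") for datum in row["Data"]]
        rows.append(dict(zip(names, values)))
    return rows


def run_query(client, query_string):
    """Run a query and follow NextToken until every page has been read.

    client is assumed to be boto3.client("timestream-query").
    """
    rows, token = [], None
    while True:
        kwargs = {"QueryString": query_string}
        if token:
            kwargs["NextToken"] = token
        page = client.query(**kwargs)
        rows.extend(rows_from_page(page))
        token = page.get("NextToken")
        if not token:
            return rows
```

All scalar values come back as strings, so numeric columns such as avg_cpu_utilization need an explicit `float(...)` conversion on the caller's side.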

Slide 23

Slide 23 text

Amazon Timestream integrations

Slide 24

Slide 24 text

Supported integrations and drivers
Amazon QuickSight
AWS IoT Core
Grafana (Open Source Edition)
Database Tools via JDBC
• SQL Workbench/J, DataGrip, DBVisualizer, etc.
AWSLabs (GitHub)
• Kinesis Data Analytics for Apache Flink connector
• Telegraf connector
• Amazon SageMaker Notebook example

Slide 25

Slide 25 text

Amazon Timestream + JDBC

Slide 26

Slide 26 text

Amazon Timestream + Grafana

Slide 27

Slide 27 text

Amazon Timestream + Grafana

Slide 28

Slide 28 text

Amazon Timestream + Amazon QuickSight

Slide 29

Slide 29 text

Analytics with Timestream

Slide 30

Slide 30 text

IoT with Timestream

Slide 31

Slide 31 text

DevOps with Timestream

Slide 32

Slide 32 text

Demo

Slide 33

Slide 33 text

[Diagram: Amazon Timestream with Server #1, Server #2, and Server #3; metrics shown: cpu, memory, swap, disk]

Slide 34

Slide 34 text

    SELECT country, city, hostname,
           bin(time, 10m) AS binned_time,
           avg(measure_value::double) AS avg_cpu_utilization
    FROM MyDatabase.MyTable
    WHERE measure_name = 'cpu_utilization'
      AND time > ago(2h)
    GROUP BY country, city, hostname, bin(time, 10m)
    ORDER BY country, city, hostname, binned_time
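For readers who think in code rather than SQL, the grouping in this query can be approximated in plain Python. The sketch below rounds each timestamp down to a 10-minute boundary and averages per (hostname, bin); the row layout and function names are hypothetical, and only one dimension is used to keep it short.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bin_time(ts, width):
    """Round a timezone-aware timestamp down to a multiple of width since the epoch."""
    return EPOCH + ((ts - EPOCH) // width) * width


def binned_average(rows, width=timedelta(minutes=10)):
    """rows: iterable of (hostname, timestamp, value). Returns {(hostname, bin): average}."""
    groups = defaultdict(list)
    for hostname, ts, value in rows:
        groups[(hostname, bin_time(ts, width))].append(value)
    return {key: sum(values) / len(values) for key, values in groups.items()}


base = datetime(2020, 6, 17, 19, 0, tzinfo=timezone.utc)
rows = [
    ("host-1", base + timedelta(minutes=3), 30.0),
    ("host-1", base + timedelta(minutes=7), 40.0),
    ("host-1", base + timedelta(minutes=12), 50.0),
]
averages = binned_average(rows)
```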

Slide 35

Slide 35 text

    SELECT country, city, hostname,
           CREATE_TIME_SERIES(time, measure_value::double) AS cpu_utilization
    FROM MyDatabase.MyTable
    WHERE measure_name = 'cpu_utilization'
      AND time > ago(1m)
    GROUP BY country, city, hostname
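Conceptually, this query collapses many rows into one row per group, each holding an ordered series of (time, value) points. A plain-Python analogue, with a hypothetical row layout of (key, time, value) and integer times for brevity:

```python
from collections import defaultdict


def create_time_series(rows):
    """rows: iterable of (key, time, value). Returns {key: [(time, value), ...]} sorted by time."""
    series = defaultdict(list)
    for key, time, value in rows:
        series[key].append((time, value))
    return {key: sorted(points) for key, points in series.items()}


# Hypothetical rows, deliberately out of time order.
rows = [
    ("host-1", 2, 45.3),
    ("host-2", 0, 12.0),
    ("host-1", 0, 35.0),
    ("host-1", 1, 38.2),
]
series = create_time_series(rows)
```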

Slide 36

Slide 36 text

    WITH binned_timeseries AS (
        SELECT country, city, hostname,
               bin(time, 10m) AS binned_time,
               avg(measure_value::double) AS avg_cpu_utilization
        FROM MyDatabase.MyTable
        WHERE measure_name = 'cpu_utilization'
          AND time > ago(2h)
        GROUP BY country, city, hostname, bin(time, 10m)
    ), interpolated_timeseries AS (
        SELECT country, city, hostname,
               INTERPOLATE_LINEAR(
                   CREATE_TIME_SERIES(binned_time, avg_cpu_utilization),
                   SEQUENCE(min(binned_time), max(binned_time), 1m)
               ) AS interpolated_avg_cpu_utilization
        FROM binned_timeseries
        GROUP BY country, city, hostname
    )
    SELECT * FROM interpolated_timeseries
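The interpolation step takes 10-minute averages and evaluates a straight line between neighbouring points at 1-minute intervals. A plain-Python sketch of that idea follows; times are integer minutes for simplicity, the function names are hypothetical, and at least two points are assumed.

```python
def sequence(start, stop, step):
    """Inclusive arithmetic sequence, mirroring SEQUENCE(min, max, step) in the query."""
    return list(range(start, stop + 1, step))


def interpolate_linear(points, grid):
    """points: sorted (t, value) pairs; grid: sorted times inside the range of points."""
    result, i = [], 0
    for t in grid:
        # Advance to the segment [points[i], points[i + 1]] that contains t.
        while i + 2 < len(points) and points[i + 1][0] <= t:
            i += 1
        (t0, v0), (t1, v1) = points[i], points[i + 1]
        result.append((t, v0 + (v1 - v0) * (t - t0) / (t1 - t0)))
    return result


# Hypothetical 10-minute averages, with time expressed in minutes.
binned = [(0, 10.0), (10, 20.0), (20, 40.0)]
interpolated = interpolate_linear(binned, sequence(0, 20, 1))
```

The result has 21 points, one per minute, and passes through the original averages at minutes 0, 10, and 20.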

Slide 37

Slide 37 text

    WITH binned_timeseries AS (
        SELECT country, city, hostname,
               bin(time, 10m) AS binned_time,
               avg(measure_value::double) AS avg_cpu_utilization
        FROM MyDatabase.MyTable
        WHERE measure_name = 'cpu_utilization'
          AND time > ago(2h)
        GROUP BY country, city, hostname, bin(time, 10m)
    ), interpolated_timeseries AS (
        SELECT country, city, hostname,
               INTERPOLATE_LINEAR(
                   CREATE_TIME_SERIES(binned_time, avg_cpu_utilization),
                   SEQUENCE(min(binned_time), max(binned_time), 1m)
               ) AS interpolated_avg_cpu_utilization
        FROM binned_timeseries
        GROUP BY country, city, hostname
    )
    SELECT country, city, hostname, time,
           round(avg(value), 2) AS interpolated_cpu
    FROM interpolated_timeseries
    CROSS JOIN UNNEST(interpolated_avg_cpu_utilization)
    GROUP BY country, city, hostname, time
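The final CROSS JOIN UNNEST turns each series back into ordinary rows, one per data point, so that tools expecting flat tables can consume the result. A rough plain-Python equivalent, with a hypothetical input shape of {key: [(time, value), ...]} and the same two-decimal rounding as the query:

```python
def unnest(series_by_key):
    """Flatten {key: [(time, value), ...]} into (key, time, value) rows, one row per point."""
    return [
        (key, time, round(value, 2))
        for key, points in series_by_key.items()
        for time, value in points
    ]


flat = unnest({"host-1": [(0, 10.0), (1, 11.004)], "host-2": [(0, 7.5)]})
```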

Slide 38

Slide 38 text

Additional Resources

Slide 39

Slide 39 text

Additional Resources
Store and Access Time Series Data at Any Scale with Amazon Timestream
https://aws.amazon.com/blogs/aws/store-and-access-time-series-data-at-any-scale-with-amazon-timestream-now-generally-available/
Amazon Timestream Tools and Samples
https://github.com/awslabs/amazon-timestream-tools

Slide 40

Slide 40 text

Amazon Timestream Tools and Samples
• Sample Applications (available in various languages): https://github.com/awslabs/amazon-timestream-tools/tree/master/sample_apps
• Kinesis Data Analytics for Apache Flink connector example: https://github.com/awslabs/amazon-timestream-tools/tree/master/integrations/flink_connector
• SageMaker example: https://github.com/awslabs/amazon-timestream-tools/tree/master/integrations/sagemaker
• Telegraf example: https://github.com/awslabs/amazon-timestream-tools/tree/master/integrations/telegraf
• Continuous data-generator tools: https://github.com/awslabs/amazon-timestream-tools/tree/master/tools/continuous_data_ingestor

Slide 41

Slide 41 text

Thank you!
Please give your feedback! :)
@danilop