Analysing Data in Real Time

"Analysing Data in Real Time" as presented at AWS Summit Amsterdam 2019

Julio Faerman

April 17, 2019

Transcript

  1. Analysing Data in Real-time

    AWS Summit. Julio Faerman (@faermanj), AWS Technical Evangelist.
    © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  2. Timely decisions require new data, and fast

    Data loses value quickly over time: the information half-life in decision-making.
    [Chart: value of data to decision-making plotted against the age of the data (real time, seconds, minutes, hours, days, months). Labels along the curve: preventive/predictive, actionable, reactive, historical. Regions marked: time-critical decisions; traditional "batch" business intelligence.]
    Source: Perishable insights, Mike Gualtieri, Forrester.
  3. Most common uses of streaming

    • Industrial Automation
    • Smart Home
    • Smart City
    • Data Lakes
    • IoT Analytics
    • Log Analytics
  4. CUSTOMER EVENT STORE: INTRODUCTION

    • Charles van Kints, Product Owner, Customer Event Store, ABN AMRO
    • Abhishek Choudhary, Lead Development Engineer, Customer Event Store, ABN AMRO
  5. CUSTOMER EVENT STORE: WHY

    THE PROBLEM: OUR CHALLENGES ON PROCESSING EVENT DATA
    Events: interactions (touchpoints) of (potential) customers with ABN AMRO, throughout devices and across channels.
    • Slow batch-driven processing
    • Complexity in connecting to external sources
    • Error-prone process: bad records
    • Huge increase in volumes of data
    • Fast-changing sources
    • Limitations in consuming capabilities
    • Diversity in data from different sources
  6. CUSTOMER EVENT STORE: WHY

    PROBLEM STATEMENT
    • “Not being able to handle important events in the life of the customer that impact their relation with ABN AMRO in an adequate way.” Bernard Faber, Solution Architect, ABN AMRO
    • “Strong increase of the digitalized touching points with our customers (called events), from a growing number of sources.” Charles Van Kints, Product Owner, ABN AMRO
    • “The continuous growth in event data sources and volume, the increasing demand towards using event data and the current solution within the Marketing Intelligence data warehouse.” Peter Kromhout, Engineering Lead, ABN AMRO
  7. CUSTOMER EVENT STORE: WHAT

    Future state of the Customer Event Store: building insights into customer behaviour, the customer journey and customer interactions with ABN AMRO in order to be able to act Personal and Relevant.
    KEY FEATURES
    • Handle Changes Instantly & Metadata Driven
    • Large Volumes
    • Consuming Capabilities
    • Customer Interactions
    • Real-Time
  8. CUSTOMER EVENT STORE: WHEN

    JOURNEY SO FAR…
    • Approach: ✓ Successful prototype ✓ Co-creation between Business & IT ✓ 2 Event Sources
    • Prototype: ✓ Develop Prototype ✓ Initiate License to Public
    • Go!: ✓ Approval of License to Public ✓ Prepare for Go-Live
    • Technical Go-Live: ✓ Product Stack deployed ✓ 2 Sources Live ✓ Tune product for Business Go-Live
    • Business Go-Live: ✓ Add new sources ✓ Consuming Capabilities ✓ Enable data usage
    Dates shown on the timeline: March 2018, April 2018, August 2018, September 2018, December 2018.
  9. CUSTOMER EVENT STORE: HOW

    CONCEPTUAL DESIGN
    [Diagram labels: internal sources, external sources, RDS, landing zone, streaming zone, streaming end-point, REST API, access management, metadata, validation, alerting, standardisation, ETL (streaming and batch), real-time processing, batch processing, event store, profile information, stitching, pre-processing, lineage, orchestration, monitoring, stream.]
  10. CUSTOMER EVENT STORE: HOW

    TECHNICAL ARCHITECTURE: STREAM & BATCH
    [Diagram labels: Snowplow Collector (Fargate, Auto-Scaling Group), Nano-Batch Bucket, Batch Bucket, EMR, Glue, Step Function, Lambda, SNS, Enterprise Raw Data Store.]
  11. CUSTOMER EVENT STORE: HOW

    TECHNICAL ARCHITECTURE: ONE PROCESS, STREAM & BATCH
    [Diagram labels: Snowplow Collector (Fargate, Auto-Scaling Group), Snowplow Enricher (Fargate, Auto-Scaling Group), Kinesis Data Stream: Raw, Kinesis Data Stream: Good, Kinesis Data Stream: Bad, Kinesis Data Firehose, Schema Bucket, Bad Events Bucket, Nano-Batch Bucket, Batch Bucket, Enterprise Raw Data Store.]
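The diagram separates a raw stream into a "good" and a "bad" stream. The sketch below illustrates that routing idea only; it is not Snowplow's enrichment code. The required fields (`event_id`, `source`, `timestamp`) and the function name `route_event` are hypothetical and stand in for whatever schemas the Schema Bucket holds.

```python
import json
from typing import Any, Dict, List, Tuple

# Hypothetical minimal schema: field name -> accepted Python type(s).
REQUIRED_FIELDS = {"event_id": str, "source": str, "timestamp": (int, float)}


def route_event(raw: str) -> Tuple[str, Dict[str, Any]]:
    """Return ("good", event) for valid events, ("bad", details) otherwise."""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError as exc:
        return "bad", {"raw": raw, "errors": [f"invalid JSON: {exc.msg}"]}
    if not isinstance(event, dict):
        return "bad", {"raw": raw, "errors": ["event is not a JSON object"]}

    errors: List[str] = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in event:
            errors.append(f"missing field: {field}")
        elif not isinstance(event[field], expected):
            errors.append(f"wrong type for field: {field}")
    if errors:
        # Bad records keep the original payload so they can be inspected later.
        return "bad", {"raw": raw, "errors": errors}
    return "good", event
```

Keeping the raw payload next to the error list mirrors the purpose of a separate bad-events destination: rejected records stay available for inspection and replay.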
  12. CUSTOMER EVENT STORE: HOW

    TECHNICAL ARCHITECTURE: STANDARDIZE
    [Diagram labels: Snowplow Collector (Fargate, Auto-Scaling Group), Snowplow Enricher (Fargate, Auto-Scaling Group), Kinesis Data Stream: Raw, Kinesis Data Stream: Good, Kinesis Data Stream: Bad, Kinesis Data Stream: Standardized, Kinesis Data Firehose: ORC, Kinesis Data Firehose: JSON, Standard Bucket, Schema Bucket, Glue Crawler, Athena, DynamoDB, CloudWatch (alarms, rule), Enterprise Raw Data Store.]
  13. CUSTOMER EVENT STORE: SUMMARY

    AWS STEP FUNCTIONS (2018 analysis)
    • Complex Workflows: complex workflows involving iteration of Lambda functions can be implemented quickly.
    • Debug Friendly: clear intermediate results.
    • Restartability: a new state machine can be created for only the failed states.
    • State Management: preserves state between subsequent API calls.
    • Serverless Orchestration: Lambda, Glue, ECS, SageMaker.
    • Error Handling: retries can be triggered for specific errors. Other actions can also be configured.
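The error-handling point can be illustrated with an Amazon States Language definition built as a Python dictionary. This is a minimal sketch, not the state machine used at ABN AMRO: the state names, the Lambda ARN and the retry values are all hypothetical. `Retry`, `Catch`, `ErrorEquals`, `IntervalSeconds`, `MaxAttempts` and `BackoffRate` are standard Amazon States Language fields.

```python
import json
from typing import Dict, List

# Hypothetical state machine: names, ARN and retry values are illustrative only.
definition = {
    "Comment": "Sketch: retry a Lambda task on specific errors, then fall back",
    "StartAt": "StandardiseEvents",
    "States": {
        "StandardiseEvents": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:eu-west-1:123456789012:function:standardise-events",
            # Retries are triggered only for the listed error names.
            "Retry": [
                {
                    "ErrorEquals": ["States.Timeout", "States.TaskFailed"],
                    "IntervalSeconds": 2,
                    "MaxAttempts": 3,
                    "BackoffRate": 2.0,
                }
            ],
            # "Other actions": anything still failing is routed to another state.
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
            "End": True,
        },
        "NotifyFailure": {
            "Type": "Fail",
            "Error": "StandardiseFailed",
            "Cause": "Retries exhausted",
        },
    },
}


def retry_delays(retrier: Dict) -> List[float]:
    """Seconds waited before each retry attempt, following the backoff rule."""
    return [
        retrier["IntervalSeconds"] * retrier["BackoffRate"] ** attempt
        for attempt in range(retrier["MaxAttempts"])
    ]


asl_json = json.dumps(definition, indent=2)  # text to pass as the definition
```

With these illustrative values the task is retried after 2, 4 and 8 seconds before the `Catch` clause takes over.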
  14. CUSTOMER EVENT STORE: SUMMARY

    WHEN TO USE GLUE AND/OR EMR
    Points compared for AWS EMR and AWS Glue, in the order they appear on the slide:
    • Can be placed in custom VPC
    • Horizontally scalable
    • Serverless & pre-configured
    • Limited customization
    • Fully managed public service
    • Define cluster: choose applications & customize as you wish
    • Vertically & horizontally scalable
    • More actions than just Spark
    • Spin-up time
    • Only Spark
  15. CUSTOMER EVENT STORE: SUMMARY

    KEY TAKE-AWAY
    • Dev-Ops
    • Security by Design
    • Architecture by Evolution
    • One Process
    • Serverless & Native Components
    • Technical vs Business Go-Live
  16. Streaming with Amazon Kinesis

    Easily collect, process, and analyze video and data streams in real time.
    • Amazon Kinesis Video Streams: capture, process, and store video streams
    • Amazon Kinesis Data Streams: capture, process, and store data streams
    • Amazon Kinesis Data Firehose: load data streams into data stores
    • Amazon Kinesis Data Analytics: analyze data streams with SQL
  17. Amazon Kinesis Data Streams producers and consumers

    Producers: Kinesis Agent, Apache Kafka, AWS SDK, Log4j, Flume, Fluentd, AWS Mobile SDK for iOS, Amazon Kinesis Producer Library
    Consumers: Get* APIs, Amazon Kinesis Client Library + Connector Library, Apache Storm, Amazon EMR, AWS Lambda, Apache Spark
  19. Data ingestion from a variety of sources

    Sources feeding Kinesis Data Streams: transactions, ERP, web logs/cookies, connected devices.
    AWS SDKs
    • Publish directly from application code via APIs
    • AWS Mobile SDK
    • Managed AWS sources: CloudWatch Logs, AWS IoT, Kinesis Data Analytics and more
    • RDS Aurora via Lambda
    Kinesis Agent
    • Monitors log files and forwards lines as messages to Kinesis Data Streams
    Kinesis Producer Library (KPL)
    • Background process aggregates and batches messages
    3rd party and open source
    • Log4j appender
    • Apache Kafka
    • Flume, fluentd, and more …
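The batching idea behind the KPL can be sketched in a few lines. This is not the KPL itself (which also aggregates several user records into one Kinesis record); it only shows how a producer might group records so that each group fits in one `PutRecords` request, assuming the documented per-request limits of 500 records and 5 MB including partition keys. The function name `batch_records` is hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

MAX_RECORDS_PER_CALL = 500             # PutRecords limit per request
MAX_BYTES_PER_CALL = 5 * 1024 * 1024   # PutRecords limit per request, keys included

Record = Tuple[str, bytes]  # (partition_key, payload)


def batch_records(records: Iterable[Record]) -> Iterator[List[Record]]:
    """Group records into batches that respect both PutRecords limits."""
    batch: List[Record] = []
    batch_bytes = 0
    for key, data in records:
        size = len(key.encode("utf-8")) + len(data)
        batch_is_full = (
            len(batch) == MAX_RECORDS_PER_CALL
            or batch_bytes + size > MAX_BYTES_PER_CALL
        )
        if batch and batch_is_full:
            yield batch
            batch, batch_bytes = [], 0
        batch.append((key, data))
        batch_bytes += size
    if batch:
        yield batch  # flush the final partial batch
```

Each yielded batch could then be passed to a single `PutRecords` call by whichever SDK the application uses.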
  20. Data processing from a variety of consumers

    • Fully managed service for real-time processing of streaming data
    • Cost-effective: $0.014 per 1,000,000 PUT Payload Units
    • Millions of sources producing hundreds of terabytes per hour
    • Front end: authentication and authorization
    • Durable, highly consistent storage replicates data across three data centers (availability zones)
    • Ordered stream of events supports multiple readers
    Consumers: Amazon Kinesis Client Library on EC2, Amazon Kinesis Data Firehose, Amazon Kinesis Data Analytics, AWS Lambda
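To make the PUT price concrete, here is a rough cost sketch. It uses the $0.014 per million PUT Payload Units quoted on the slide and assumes that one payload unit covers up to 25 KB of a record, so larger records consume several units. Shard-hour charges and any regional price differences are deliberately left out, and the function names are hypothetical.

```python
import math

PRICE_PER_MILLION_UNITS_USD = 0.014  # figure quoted on the slide
PAYLOAD_UNIT_BYTES = 25 * 1024       # assumption: one PUT Payload Unit covers up to 25 KB


def put_payload_units(record_bytes: int) -> int:
    """Number of PUT Payload Units consumed by a single record."""
    return max(1, math.ceil(record_bytes / PAYLOAD_UNIT_BYTES))


def put_cost_usd(records_per_second: float, record_bytes: int, days: int = 30) -> float:
    """PUT charge only, for a steady ingest rate over the given number of days."""
    seconds = 86_400 * days
    units = put_payload_units(record_bytes) * records_per_second * seconds
    return units / 1_000_000 * PRICE_PER_MILLION_UNITS_USD
```

For example, 1,000 records per second of 1 KB each over 30 days is about 2.6 billion payload units, so `put_cost_usd(1000, 1024)` comes to roughly 36 USD under these assumptions.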
  21. Amazon Kinesis Data Streams: Standard consumers

    A data producer writes to the Kinesis Data Stream (Shard 1, Shard 2, Shard 3, ... Shard n); consumer application A reads with GetRecods() polling.
    • Data producer: up to 1 MB or 1,000 records per second, per shard
    • GetRecords(): five transactions per second, per shard
    • Data: 2 MB per second, per shard
    • With only one consumer application, records can be retrieved every 200 ms
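The per-shard limits above translate directly into simple arithmetic. The sketch below uses only the figures from the slide; the function names are hypothetical, and it assumes that all standard consumer applications share a shard's read limits evenly.

```python
import math

WRITE_MB_PER_SECOND_PER_SHARD = 1.0
WRITE_RECORDS_PER_SECOND_PER_SHARD = 1000
GET_RECORDS_CALLS_PER_SECOND_PER_SHARD = 5
READ_MB_PER_SECOND_PER_SHARD = 2.0


def shards_needed(write_mb_per_second: float, records_per_second: float) -> int:
    """Shards required to stay within both per-shard write limits."""
    by_volume = math.ceil(write_mb_per_second / WRITE_MB_PER_SECOND_PER_SHARD)
    by_count = math.ceil(records_per_second / WRITE_RECORDS_PER_SECOND_PER_SHARD)
    return max(1, by_volume, by_count)


def min_poll_interval_ms(consumer_apps: int) -> float:
    """Shortest polling interval per application when the call limit is shared."""
    if consumer_apps < 1:
        raise ValueError("consumer_apps must be at least 1")
    return 1000.0 / GET_RECORDS_CALLS_PER_SECOND_PER_SHARD * consumer_apps


def read_mb_per_second_per_app(consumer_apps: int) -> float:
    """Read throughput left for each application when 2 MB/s is shared."""
    if consumer_apps < 1:
        raise ValueError("consumer_apps must be at least 1")
    return READ_MB_PER_SECOND_PER_SHARD / consumer_apps
```

With one application the interval is the 200 ms quoted on the slide; with five applications each can poll a shard only once per second.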
  22. Amazon Kinesis Data Streams: Enhanced fan-out consumers

    Consumers do not poll. Messages are pushed to the consumer as they arrive.
    A data producer writes to the Kinesis Data Stream (Shard 1); consumer application A calls SubscribeToShard().
    Uses a persistent HTTP/2 connection:
    • Each connection lasts up to five minutes
    • Data is pushed to the consumer
  23. Amazon Kinesis Data Streams Consumers

    Enhanced fan-out
    • Multiple consumer applications for the same Kinesis Data Stream
    • Default limit of five registered consuming applications. More can be supported with a service limit increase request
    • Low-latency requirements for data processing
    • Messages are typically delivered to a consumer in less than 70 ms
    Standard
    • Total number of consuming applications is low
    • Consumers are not latency-sensitive
    • Minimize cost
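The guidance above can be turned into a rough rule of thumb. The sketch below is only a heuristic built from the numbers on these slides (one GetRecords call per 200 ms per shard shared among standard consumers, a default of five registered enhanced fan-out consumers); the function names and the idea of comparing a latency budget against the shared polling interval are assumptions made for illustration, and cost is not modelled.

```python
STANDARD_POLL_FLOOR_MS = 200           # shared GetRecords limit: 5 calls/s per shard
DEFAULT_REGISTERED_CONSUMER_LIMIT = 5  # default for enhanced fan-out consumers


def suggest_consumer_type(consumer_apps: int, latency_budget_ms: float) -> str:
    """Suggest "standard" when shared polling can still meet the latency budget."""
    if consumer_apps < 1:
        raise ValueError("consumer_apps must be at least 1")
    shared_poll_interval_ms = STANDARD_POLL_FLOOR_MS * consumer_apps
    if latency_budget_ms >= shared_poll_interval_ms:
        return "standard"
    return "enhanced fan-out"


def needs_limit_increase(consumer_apps: int) -> bool:
    """True when enhanced fan-out would exceed the default registration limit."""
    return consumer_apps > DEFAULT_REGISTERED_CONSUMER_LIMIT
```

A single application that tolerates a second of delay stays on standard consumers, while four applications with a 500 ms budget would be pointed at enhanced fan-out.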
  25. Amazon Kinesis Data Firehose: How it works

    Ingest, transform, deliver.
    • Sources: AWS IoT, Amazon Kinesis Agent, Amazon Kinesis Streams, Amazon CloudWatch Logs, Amazon CloudWatch Events, Apache Kafka
    • Destinations: Amazon S3, Amazon Redshift, Amazon Elasticsearch Service
  27. SQL on streaming data?

    • Aggregations (count, sum, min, …) take granular real-time data and turn it into insights
    • Data is continuously processed, so you need to tell the application when you want results
    • Aggregation windows
  28. Window types

    Sliding, tumbling, and stagger.
    Tumbling windows are fixed size and grouped keys do not overlap.
    [Diagram: source events on a time axis from t0 to t15.]
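A tumbling window can be sketched outside of SQL in a few lines. The example below is a plain Python illustration, not how Kinesis Data Analytics implements it: each event is a hypothetical `(timestamp_seconds, key)` pair and is assigned to exactly one fixed-size window, so windows never overlap.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, grouping key)


def tumbling_counts(
    events: Iterable[Event], window_seconds: float
) -> Dict[Tuple[float, str], int]:
    """Count events per (window_start, key); every event lands in one window."""
    counts: Counter = Counter()
    for timestamp, key in events:
        # Floor the timestamp to the start of its fixed-size window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)
```

With 60-second windows, events at 0 s and 59 s fall in the window starting at 0, while an event at 60 s opens the next window.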
  29. Writing streaming SQL

    Pump (continuous query) using a stagger window:

        CREATE OR REPLACE PUMP calls_per_ip_pump AS
          INSERT INTO calls_per_ip_stream
          SELECT STREAM source_ip_address, COUNT(*)
          FROM source_sql_stream_001
          WINDOWED BY STAGGER (
            PARTITION BY source_ip_address
            RANGE INTERVAL '1' MINUTE);
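To show how a stagger window differs from a tumbling one, here is a simplified Python simulation of the query above. It is a sketch under stated assumptions, not the service's implementation: events are hypothetical `(timestamp_seconds, source_ip)` pairs processed in timestamp order, a window for a key opens when that key's first event arrives, and it closes once the window age reaches the range interval.

```python
from typing import Dict, Iterable, List, Tuple

Event = Tuple[float, str]           # (timestamp in seconds, partition key)
Result = Tuple[str, float, int]     # (partition key, window start, count)


def stagger_counts(events: Iterable[Event], window_seconds: float) -> List[Result]:
    """Count events per key in windows that open at each key's first event."""
    open_windows: Dict[str, List[float]] = {}  # key -> [window_start, count]
    results: List[Result] = []
    for timestamp, key in sorted(events):
        window = open_windows.get(key)
        if window is not None and timestamp - window[0] >= window_seconds:
            # The window for this key has aged out: emit it and start afresh.
            results.append((key, window[0], int(window[1])))
            window = None
        if window is None:
            window = [timestamp, 0]
            open_windows[key] = window
        window[1] += 1
    # Emit whatever is still open at the end of the input.
    for key, (start, count) in open_windows.items():
        results.append((key, start, int(count)))
    return sorted(results, key=lambda r: (r[1], r[0]))
```

Calls from one address at 5 s, 20 s and 64 s are counted together because the window opened at 5 s, whereas tumbling windows aligned to the minute would have split them at 60 s.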
  31. Traditional methods struggle with real-world forecasting complications

    • Can’t handle seasonality
  32. New! Amazon Forecast

    Accurate time-series forecasting service, based on the same technology used at Amazon.com.
    • Any historical time-series
    • Integrates with SAP and Oracle Supply Chain
    • Custom forecasts with 3 clicks
    • 50% more accurate
    • 1/10th the cost
    • Integrates with Amazon Timestream
    Generate forecasts for: retail demand, travel demand, AWS usage, revenue forecasts, web traffic, advertising demand.
  34. Invoking Lambda functions
  36. Thank you!

    Julio Faerman, @faermanj