Slide 1

Slide 1 text

Analysing Data in Real-time
Julio Faerman, @faermanj, AWS Technical Evangelist
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Slide 2

Slide 2 text

Timely decisions require new data, and fast: data loses value quickly over time.
[Chart: value of data to decision-making versus information half-life, from real time through seconds, minutes, hours, days, and months. Preventive/predictive and actionable insights drive time-critical decisions; reactive and historical insights feed traditional "batch" business intelligence. Source: Perishable Insights, Mike Gualtieri, Forrester]

Slide 3

Slide 3 text

What is streaming data? Typical characteristics:
• Low latency
• Continuous
• Ordered, incremental
• High volume

Slide 4

Slide 4 text

Most common uses of streaming:
• Industrial automation
• Smart home
• Smart city
• Data lakes
• IoT analytics
• Log analytics

Slide 5

Slide 5 text

CUSTOMER EVENT STORE: A JOURNEY FROM DATA WAREHOUSE TO STREAMING DATA

Slide 6

Slide 6 text

CUSTOMER EVENT STORE: INTRODUCTION
• Charles van Kints, Product Owner, Customer Event Store, ABN AMRO
• Abhishek Choudhary, Lead Development Engineer, Customer Event Store, ABN AMRO

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

CUSTOMER EVENT STORE: WHY
THE PROBLEM: OUR CHALLENGES IN PROCESSING EVENT DATA
Events: interactions (touch points) of (potential) customers with ABN AMRO, across devices and channels.
• Slow, batch-driven processing
• Complexity in connecting to external sources
• Error-prone process: bad records
• Huge increase in data volumes
• Fast-changing sources
• Limitations in consuming capabilities
• Diversity in data from different sources

Slide 9

Slide 9 text

CUSTOMER EVENT STORE: WHY
PROBLEM STATEMENT
"Not being able to handle important events in the life of the customer that impact their relation with ABN AMRO in an adequate way." (Bernard Faber, Solution Architect, ABN AMRO)
"Strong increase of the digitalized touching points with our customers (called events), from a growing number of sources." (Charles van Kints, Product Owner, ABN AMRO)
"The continuous growth in event data sources and volume, the increasing demand towards using event data and the current solution within the Marketing Intelligence data warehouse." (Peter Kromhout, Engineering Lead, ABN AMRO)

Slide 10

Slide 10 text

CUSTOMER EVENT STORE: WHAT
Future-state Customer Event Store: building insights into customer behaviour, customer journeys, and customer interactions with ABN AMRO in order to act in a personal and relevant way.
KEY FEATURES
• Handle changes instantly & metadata driven
• Large volumes
• Consuming capabilities
• Customer interactions
• Real time

Slide 11

Slide 11 text

CUSTOMER EVENT STORE: WHEN
JOURNEY SO FAR (March 2018 to December 2018)
• Approach: successful prototype; co-creation between business & IT; 2 event sources; go!
• Prototype: develop prototype; initiate License to Public
• Approval: License to Public; prepare for go-live
• Technical go-live: product stack deployed; 2 sources live; tune product for business go-live
• Business go-live: add new sources; consuming capabilities; enable data usage
Milestones: March 2018, April 2018, August 2018, September 2018, December 2018.

Slide 12

Slide 12 text

CUSTOMER EVENT STORE: HOW
CONCEPTUAL DESIGN
[Diagram: internal and external sources land in a landing zone through a streaming endpoint, a REST API, and batch ETL; metadata-driven validation and standardisation feed real-time and batch processing into the EVENT STORE, which serves consumers through a streaming zone; supporting capabilities include RDS-backed metadata, access management, alerting, profile information, stitching, pre-processing, lineage, orchestration, and monitoring.]

Slide 13

Slide 13 text

CUSTOMER EVENT STORE: HOW
TECHNICAL ARCHITECTURE: STREAM & BATCH
[Diagram: Snowplow Collector on Fargate in an Auto Scaling group; nano-batch bucket and batch bucket; EMR, Glue, Step Functions, Lambda, and SNS; Enterprise Raw Data Store.]

Slide 14

Slide 14 text

CUSTOMER EVENT STORE: HOW
TECHNICAL ARCHITECTURE: ONE PROCESS, STREAM & BATCH
[Diagram: Snowplow Collector and Snowplow Enricher on Fargate (Auto Scaling groups); Kinesis Data Streams for raw, good, and bad events; Kinesis Data Firehose; schema bucket; bad-events bucket; nano-batch bucket; batch bucket; Enterprise Raw Data Store.]

Slide 15

Slide 15 text

CUSTOMER EVENT STORE: HOW
TECHNICAL ARCHITECTURE: STANDARDIZE
[Diagram: Snowplow Collector and Enricher on Fargate; Kinesis Data Streams for raw, good, bad, and standardized events; Kinesis Data Firehose delivering ORC and JSON to a standard bucket; Glue Crawler and Athena; DynamoDB; schema bucket; CloudWatch alarms and rules; Enterprise Raw Data Store.]

Slide 16

Slide 16 text

CUSTOMER EVENT STORE: HOW
TECHNICAL ARCHITECTURE: END TO END
[Diagram: end-to-end architecture feeding the Enterprise Raw Data Store.]

Slide 17

Slide 17 text

CUSTOMER EVENT STORE: SUMMARY
AWS STEP FUNCTIONS (2018 analysis)
• Complex workflows: complex workflows involving iteration over Lambda functions can be implemented quickly.
• Debug friendly: clear intermediate results.
• Restartability: a new state machine can be created for only the failed states.
• State management: preserves state between subsequent API calls.
• Serverless orchestration: Lambda, Glue, ECS, SageMaker.
• Error handling: retries can be triggered for specific errors, and other actions can also be configured (see the sketch below).
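The retry and error-handling behaviour described above maps directly onto the Amazon States Language. A minimal, hypothetical sketch using boto3 (the ARNs, names, and role are placeholders, not the actual ABN AMRO workflow):

import json
import boto3

sfn = boto3.client("stepfunctions")

# Run a Lambda task, retry transient errors with exponential backoff,
# and route any remaining failure to an SNS notification.
definition = {
    "StartAt": "ProcessBatch",
    "States": {
        "ProcessBatch": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:eu-west-1:123456789012:function:process-batch",
            "Retry": [{
                "ErrorEquals": ["Lambda.ServiceException", "States.Timeout"],
                "IntervalSeconds": 5,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }],
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
            "End": True,
        },
        "NotifyFailure": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sns:publish",
            "Parameters": {
                "TopicArn": "arn:aws:sns:eu-west-1:123456789012:pipeline-alerts",
                "Message": "Batch processing failed",
            },
            "End": True,
        },
    },
}

sfn.create_state_machine(
    name="customer-event-pipeline",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsExecutionRole",
)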

Slide 18

Slide 18 text

CUSTOMER EVENT STORE: SUMMARY
WHEN TO USE GLUE AND/OR EMR
AWS Glue
• Fully managed public service, serverless & pre-configured
• Horizontally scalable
• Limited customization
• Only Spark
AWS EMR
• Define the cluster: choose applications & customize as you wish
• Can be placed in a custom VPC
• Vertically & horizontally scalable
• More than just Spark
• Spin-up time
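To make the operational difference concrete, a hedged sketch of how a Spark job is started on each service with boto3 (job names, cluster ID, and bucket paths are hypothetical):

import boto3

# Glue: the Spark environment is fully managed; you only start a run of a pre-defined job.
glue = boto3.client("glue")
glue.start_job_run(
    JobName="standardize-events",
    Arguments={"--input_path": "s3://raw-events-bucket/2019/"},
)

# EMR: you own the cluster definition; Spark work is submitted as a step to a running cluster.
emr = boto3.client("emr")
emr.add_job_flow_steps(
    JobFlowId="j-XXXXXXXXXXXXX",
    Steps=[{
        "Name": "standardize-events",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "s3://artifacts-bucket/standardize_events.py"],
        },
    }],
)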

Slide 19

Slide 19 text

CUSTOMER EVENT STORE: SUMMARY
KEY TAKEAWAYS
• DevOps
• Security by design
• Architecture by evolution
• One process
• Serverless & native components
• Technical vs. business go-live

Slide 20

Slide 20 text

Thank you!

Slide 21

Slide 21 text

Streaming with Amazon Kinesis: easily collect, process, and analyze video and data streams in real time
• Amazon Kinesis Video Streams: capture, process, and store video streams
• Amazon Kinesis Data Streams: capture, process, and store data streams
• Amazon Kinesis Data Firehose: load data streams into data stores
• Amazon Kinesis Data Analytics: analyze data streams with SQL

Slide 22

Slide 22 text

Amazon Kinesis Data Streams producers and consumers
Producers: Kinesis Agent, AWS SDK, AWS Mobile SDK for iOS, Amazon Kinesis Producer Library, Log4j, Apache Kafka, Flume, Fluentd
Consumers: Get* APIs, Amazon Kinesis Client Library + Connector Library, AWS Lambda, Amazon EMR, Apache Storm, Apache Spark

Slide 23

Slide 23 text


Slide 24

Slide 24 text

Data ingestion into Kinesis Data Streams from a variety of sources: transactions, ERP, web logs/cookies, connected devices
AWS SDKs
• Publish directly from application code via APIs (see the sketch below)
• AWS Mobile SDK
• Managed AWS sources: CloudWatch Logs, AWS IoT, Kinesis Data Analytics, and more
• RDS Aurora via Lambda
Kinesis Agent
• Monitors log files and forwards lines as messages to Kinesis Data Streams
Kinesis Producer Library (KPL)
• Background process aggregates and batches messages
Third party and open source
• Log4j appender
• Apache Kafka
• Flume, Fluentd, and more
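A minimal sketch of publishing directly from application code with boto3 (the stream name and event payload are hypothetical):

import json
import boto3

kinesis = boto3.client("kinesis")

# The partition key controls shard assignment, so events sharing a key
# stay ordered within one shard.
event = {"customer_id": "42", "action": "page_view", "page": "/mortgages"}

kinesis.put_record(
    StreamName="customer-events",
    Data=json.dumps(event).encode("utf-8"),
    PartitionKey=event["customer_id"],
)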

Slide 25

Slide 25 text

Data processing from a variety of consumers
Fully managed service for real-time processing of streaming data
• Cost-effective: $0.014 per 1,000,000 PUT payload units
• Millions of sources producing hundreds of terabytes per hour
• AWS front end with authentication and authorization
• Durable, highly consistent storage replicates data across three data centers (Availability Zones)
• Ordered stream of events supports multiple readers
• Consumers: Amazon Kinesis Client Library on EC2, Amazon Kinesis Data Firehose, Amazon Kinesis Data Analytics, AWS Lambda

Slide 26

Slide 26 text

Amazon Kinesis Data Streams: standard consumers
• A data producer can write up to 1 MB or 1,000 records per second, per shard
• Consumer applications poll the shards of the stream with GetRecords()
• GetRecords(): five transactions per second, per shard
• Read throughput: 2 MB per second, per shard
• With only one consumer application, records can be retrieved every 200 ms
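A minimal polling consumer, as a hedged sketch with boto3 against a hypothetical single-shard stream (in practice the Kinesis Client Library handles shard discovery, checkpointing, and load balancing):

import time
import boto3

kinesis = boto3.client("kinesis")
stream = "customer-events"

# Read one shard from the oldest available record (TRIM_HORIZON).
shard_id = kinesis.describe_stream(StreamName=stream)["StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName=stream,
    ShardId=shard_id,
    ShardIteratorType="TRIM_HORIZON",
)["ShardIterator"]

while iterator:
    response = kinesis.get_records(ShardIterator=iterator, Limit=1000)
    for record in response["Records"]:
        print(record["SequenceNumber"], record["Data"])
    iterator = response["NextShardIterator"]
    time.sleep(0.2)  # stay within five GetRecords calls per second, per shard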

Slide 27

Slide 27 text

Amazon Kinesis Data Streams: enhanced fan-out consumers
• Consumers do not poll; messages are pushed to the consumer as they arrive
• The consumer application calls SubscribeToShard(), which uses HTTP/2
• Each subscription connection lasts up to five minutes, after which the consumer re-subscribes
• Data is pushed to the consumer as soon as it is persisted to the shard
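A hedged sketch of an enhanced fan-out consumer with boto3 (the stream ARN, consumer name, and shard ID are placeholders):

import boto3

kinesis = boto3.client("kinesis")

# One-time registration of this application as a dedicated consumer of the stream.
# Registration takes a few seconds to become ACTIVE; production code should wait for that.
consumer = kinesis.register_stream_consumer(
    StreamARN="arn:aws:kinesis:eu-west-1:123456789012:stream/customer-events",
    ConsumerName="fanout-app-a",
)["Consumer"]

# SubscribeToShard opens an HTTP/2 connection over which records are pushed
# for up to five minutes; after that, subscribe again to keep reading.
subscription = kinesis.subscribe_to_shard(
    ConsumerARN=consumer["ConsumerARN"],
    ShardId="shardId-000000000000",
    StartingPosition={"Type": "LATEST"},
)

for event in subscription["EventStream"]:
    for record in event["SubscribeToShardEvent"]["Records"]:
        print(record["SequenceNumber"], record["Data"])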

Slide 28

Slide 28 text

Amazon Kinesis Data Streams consumers
Enhanced fan-out
• Multiple consumer applications for the same Kinesis data stream
• Default limit of five registered consuming applications; more can be supported with a service limit increase request
• Low-latency requirements for data processing
• Messages are typically delivered to a consumer in less than 70 ms
Standard
• Total number of consuming applications is low
• Consumers are not latency-sensitive
• Minimize cost

Slide 29

Slide 29 text


Slide 30

Slide 30 text

Amazon Kinesis Data Firehose: how it works
• Ingest from: AWS IoT, Amazon Kinesis Agent, Amazon Kinesis Data Streams, Amazon CloudWatch Logs, Amazon CloudWatch Events, Apache Kafka
• Transform in flight, then deliver to: Amazon S3, Amazon Redshift, Amazon Elasticsearch Service
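A minimal producer-side sketch with boto3, assuming a hypothetical delivery stream named events-to-s3 that is already configured with its destination:

import json
import boto3

firehose = boto3.client("firehose")

# Firehose buffers incoming records and delivers them to the configured
# destination (for example, an S3 bucket) in batches.
firehose.put_record(
    DeliveryStreamName="events-to-s3",
    Record={"Data": json.dumps({"action": "page_view"}).encode("utf-8") + b"\n"},
)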

Slide 31

Slide 31 text


Slide 32

Slide 32 text

SQL on streaming data?
• Aggregations (count, sum, min, ...) take granular real-time data and turn it into insights
• Data is continuously processed, so you need to tell the application when you want results: aggregation windows

Slide 33

Slide 33 text

Window types: sliding, tumbling, and stagger
Tumbling windows are fixed-size, and grouped keys do not overlap.
[Diagram: events on a source timeline from t0 to t15 grouped into consecutive windows]
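To make the tumbling-window semantics concrete, a small Python sketch (not Kinesis-specific) that buckets timestamped events into fixed, non-overlapping one-minute windows and counts occurrences per key:

from collections import Counter, defaultdict

WINDOW_SECONDS = 60

def tumbling_window_counts(events):
    """Group (timestamp, key) pairs into fixed one-minute windows and count per key."""
    windows = defaultdict(Counter)
    for timestamp, key in events:
        window_start = (timestamp // WINDOW_SECONDS) * WINDOW_SECONDS
        windows[window_start][key] += 1
    return windows

# Two events fall in the first window, one in the second.
events = [(0, "10.0.0.1"), (30, "10.0.0.1"), (75, "10.0.0.2")]
print(tumbling_window_counts(events))
# {0: Counter({'10.0.0.1': 2}), 60: Counter({'10.0.0.2': 1})}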

Slide 34

Slide 34 text

Writing streaming SQL: a pump (continuous query) using a stagger window

CREATE OR REPLACE PUMP calls_per_ip_pump AS
INSERT INTO calls_per_ip_stream
SELECT STREAM source_ip_address, COUNT(*)
FROM source_sql_stream_001
WINDOWED BY STAGGER (
    PARTITION BY source_ip_address
    RANGE INTERVAL '1' MINUTE);

Slide 35

Slide 35 text

Apache Flink: Stateful Stream Computations

Slide 36

Slide 36 text

No content

Slide 37

Slide 37 text

No content

Slide 38

Slide 38 text


Slide 39

Slide 39 text

Traditional methods struggle with real-world forecasting complications; for example, they can't handle seasonality.

Slide 40

Slide 40 text

DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks

Slide 41

Slide 41 text

https://github.com/awslabs/amazon-sagemaker-examples/

Slide 42

Slide 42 text

NEW! Amazon Forecast: an accurate time-series forecasting service, based on the same technology used at Amazon.com
• Works with any historical time series
• Custom forecasts with 3 clicks
• 50% more accurate, at 1/10th the cost
• Integrates with SAP and Oracle Supply Chain
• Integrates with Amazon Timestream
Generate forecasts for: retail demand, travel demand, AWS usage, revenue forecasts, web traffic, advertising demand

Slide 43

Slide 43 text


Slide 44

Slide 44 text

Invoking Lambda functions
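The deck does not include code for this section; as an illustrative, hedged sketch, here is a minimal Python handler for a Lambda function attached to a Kinesis data stream through an event source mapping (the payload fields are hypothetical):

import base64
import json

def handler(event, context):
    # Each invocation receives a batch of Kinesis records; the data payload
    # is base64-encoded by the event source mapping.
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        print(record["kinesis"]["partitionKey"], payload)
    return {"batchSize": len(event["Records"])}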

Slide 45

Slide 45 text

AWS IoT Greengrass

Slide 46

Slide 46 text

No content

Slide 47

Slide 47 text

No content

Slide 48

Slide 48 text


Slide 49

Slide 49 text

Thank you!
Julio Faerman, @faermanj