rights reserved. S U M M I T
Analysing Streaming Data
Ruben Hernando, Technical Director, Infinia
Dr Frank Munz, Senior Technical Evangelist, Amazon Web Services

Agenda
- Streaming Architectures
- Amazon Kinesis
- Serverless Stream Processing
- Amazon Managed Streaming for Kafka (MSK)
- Ruben Hernando from Infinia

Streaming Data
- Web Clickstream
- Application Logs
- IoT Sensors
Example log line:
[Wed Oct 11 14:32:52 2018] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
Characteristics: continuously generated, small size events, low latency requirements

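A log line like the one on this slide usually becomes a small structured event before it enters a stream. The following is a minimal sketch under assumptions: the regular expression, the function name `log_line_to_event`, and the field names are illustrative and not part of the talk.

```python
import json
import re

# Pattern for the bracketed Apache-style error log line shown on the slide.
LOG_PATTERN = re.compile(
    r"\[(?P<timestamp>[^\]]+)\] "
    r"\[(?P<level>[^\]]+)\] "
    r"\[client (?P<client>[^\]]+)\] "
    r"(?P<message>.*)"
)


def log_line_to_event(line: str) -> dict:
    """Turn one log line into a small dict; unparsable lines are kept as-is."""
    text = line.strip()
    match = LOG_PATTERN.match(text)
    if match is None:
        return {"level": "unparsed", "message": text}
    return match.groupdict()


line = (
    "[Wed Oct 11 14:32:52 2018] [error] [client 127.0.0.1] "
    "client denied by server configuration: /export/home/live/ap/htdocs/test"
)
event = log_line_to_event(line)

# Serialised, the event stays small, which matches the "small size events"
# characteristic named on the slide.
payload = json.dumps(event).encode("utf-8")
```
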
Streaming: Transform and Process Continuously
- Ingest video & data as it's generated
- Process data on the fly
- Real-time analytics/ML, alerts, actions

Amazon Kinesis
Real-time data streaming and analytics
Easily collect, process, and analyze streams in real time
- Kinesis Video Streams: Capture, process, and store video streams for analytics
- Kinesis Data Streams: Build custom applications that analyze data streams
- Kinesis Data Firehose: Load data streams into AWS data stores
- Kinesis Data Analytics: Analyze data streams with SQL or Java (NEW!)

Data Ingestion from a Variety of Sources
Sources feeding Kinesis Data Streams: Transactions, ERP, Web logs/cookies, Connected devices
AWS SDKs
• Publish directly from application code via APIs
• AWS Mobile SDK
• Managed AWS sources: CloudWatch Logs, AWS IoT, Kinesis Data Analytics and more
• RDS Aurora via Lambda
Kinesis Agent
• Monitors log files and forwards lines as messages to Kinesis Data Streams
3rd party and open source
• Log4j appender
• Apache Kafka
• Flume, fluentd, and more …
Kinesis Producer Library (KPL)
• Background process aggregates and batches messages

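The batching idea behind the producer options above can be sketched in plain Python. This is not the KPL and it does not aggregate several messages into one record; it only groups records into calls. Assumptions: the names `to_record` and `publish_in_batches` and the `device_id` key field are hypothetical, and `send` stands in for whatever actually ships a batch. The record shape (`Data`, `PartitionKey`) and the limit of 500 records per request follow the PutRecords API.

```python
import json
from typing import Callable, Iterable, List

MAX_RECORDS_PER_CALL = 500  # PutRecords accepts at most 500 records per request


def to_record(event: dict, key_field: str = "device_id") -> dict:
    """Shape one event as a PutRecords entry: bytes payload plus partition key."""
    return {
        "Data": json.dumps(event).encode("utf-8"),
        "PartitionKey": str(event[key_field]),
    }


def publish_in_batches(
    events: Iterable[dict],
    send: Callable[[List[dict]], object],
    batch_size: int = MAX_RECORDS_PER_CALL,
) -> int:
    """Group events into batches and hand each batch to `send`.

    Returns the number of calls made to `send`.
    """
    batch: List[dict] = []
    calls = 0
    for event in events:
        batch.append(to_record(event))
        if len(batch) == batch_size:
            send(batch)
            calls += 1
            batch = []
    if batch:  # flush the final, partially filled batch
        send(batch)
        calls += 1
    return calls


# With boto3 installed and credentials configured, `send` could be
# (stream name is a placeholder):
#   client = boto3.client("kinesis")
#   send = lambda records: client.put_records(StreamName="my-stream", Records=records)
```

Passing `send` as a parameter keeps the sketch runnable without AWS access, for example with `sent = []` and `publish_in_batches(events, sent.append)`.
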
New: Lambda supports Kinesis Data Streams Enhanced Fan-Out and HTTP/2 for faster streaming
- Enhanced fan-out allows customers to scale the number of functions reading from a stream in parallel while maintaining performance.
- HTTP/2 data retrieval API improves data delivery speed between data producers and Lambda functions by more than 65%.
Amazon Kinesis Data Streams

The Serverless Operational Model
- No provisioning, no management
- Pay for value
- Automatic scaling
- Highly available and secure

Processing a Data Stream with AWS Lambda
Components: data producer, Kinesis Data Streams, Amazon SNS, Lambda service, Lambda function A, Lambda function B
- Data producer: continuously streams data
- Lambda service: continuously polls for new data, 1 poll per second
- Lambda service: automatically invokes your function(s) when data is found
- Lambda polls each shard once per second and reads records in batches
- Lambda's maximum execution time is 15 minutes

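What a function receives when Lambda invokes it with a batch can be sketched as follows. The event layout (`Records`, `kinesis.data` as base64, `kinesis.partitionKey`) is the shape Lambda delivers for Kinesis sources. Assumptions: the JSON payload with a `level` field, the returned summary, and the helper `make_test_event` are illustrative only.

```python
import base64
import json


def handler(event, context):
    """Entry point that Lambda calls with one batch of records from a shard."""
    errors = 0
    for record in event["Records"]:
        # Kinesis payloads arrive base64-encoded inside the event.
        payload = base64.b64decode(record["kinesis"]["data"])
        message = json.loads(payload)
        if message.get("level") == "error":  # hypothetical field
            errors += 1
    return {"records": len(event["Records"]), "errors": errors}


def make_test_event(messages):
    """Build a local stand-in for a Kinesis batch (hypothetical test helper)."""
    return {
        "Records": [
            {
                "kinesis": {
                    "partitionKey": str(i),
                    "data": base64.b64encode(json.dumps(m).encode("utf-8")).decode("ascii"),
                }
            }
            for i, m in enumerate(messages)
        ]
    }
```

Calling `handler(make_test_event([...]), None)` exercises the function locally without a stream.
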
Kinesis Streaming Data Analytics / Apache Flink
Framework and engine for stateful processing of data streams.
- Simple programming: Easy to use and flexible APIs make building apps fast
- High performance: In-memory computing provides low latency & high throughput
- Stateful processing: Durable application state saves
- Strong data integrity: Exactly-once processing and consistent state

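Why durable state plus replay yields exactly-once results can be illustrated without Flink. The following is a plain Python sketch, not the Flink API: the class `CountingOperator`, its methods, and the list used as a stream are all hypothetical. The idea shown is that a snapshot stores the state together with the read position, so a restart replays only records that the snapshot has not yet counted.

```python
from collections import defaultdict


class CountingOperator:
    """Counts records per key and can snapshot and restore its state."""

    def __init__(self):
        self.counts = defaultdict(int)
        self.offset = 0  # index of the next record to read

    def process(self, stream, stop_before=None):
        """Consume records from the current offset up to `stop_before`."""
        end = len(stream) if stop_before is None else stop_before
        while self.offset < end:
            self.counts[stream[self.offset]] += 1
            self.offset += 1

    def checkpoint(self):
        """Snapshot state and read position together, as one consistent unit."""
        return {"offset": self.offset, "counts": dict(self.counts)}

    def restore(self, snapshot):
        """Resume from a snapshot; later, unsnapshotted work is discarded."""
        self.offset = snapshot["offset"]
        self.counts = defaultdict(int, snapshot["counts"])


stream = ["a", "b", "a", "c", "a", "b"]

first = CountingOperator()
first.process(stream, stop_before=3)
snapshot = first.checkpoint()
first.process(stream, stop_before=5)  # progress made here is lost in the "crash"

recovered = CountingOperator()
recovered.restore(snapshot)
recovered.process(stream)  # replays records 3..5 only
```

After recovery the counts equal those of an uninterrupted run, because each record contributes to the restored state exactly once.
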
Kinesis Data Firehose: How it Works
Stages: Ingest, Transform, Deliver
Sources:
- AWS IoT
- Amazon Kinesis Agent
- Amazon Kinesis Streams
- Amazon CloudWatch Logs
- Amazon CloudWatch Events
- Apache Kafka
Destinations:
- Amazon S3
- Amazon Redshift
- Amazon Elasticsearch Service

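The Transform stage is commonly implemented as a Lambda function that Firehose calls with a batch of records. The following is a minimal sketch of such a function. The request and response shape (`records`, `recordId`, base64 `data`, and a `result` of `Ok`, `Dropped`, or `ProcessingFailed`) follows the Firehose data transformation contract. Assumptions: the function name `transform_handler`, the JSON payload, the `level` field, and the added `processed` flag are illustrative only.

```python
import base64
import json


def transform_handler(event, context):
    """Transform a batch of Firehose records; every recordId must be returned."""
    output = []
    for record in event["records"]:
        try:
            doc = json.loads(base64.b64decode(record["data"]))
            if not isinstance(doc, dict):
                raise ValueError("expected a JSON object")
        except ValueError:
            # Undecodable input is handed back unchanged and marked as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue

        if doc.get("level") == "debug":  # hypothetical filter rule
            output.append({
                "recordId": record["recordId"],
                "result": "Dropped",
                "data": record["data"],
            })
            continue

        doc["processed"] = True  # hypothetical enrichment
        # A trailing newline keeps the delivered objects line-delimited.
        data = base64.b64encode((json.dumps(doc) + "\n").encode("utf-8"))
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": data.decode("ascii"),
        })
    return {"records": output}
```
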
Kinesis Data Firehose: Record Format Conversion
- Data Producer: JSON data
- Glue Data Catalog: schema
- Kinesis Data Firehose: convert to columnar format
- Amazon S3
- /failed

Challenges operating Apache Kafka
- Difficult to set up, configure and operate
- Hard to achieve high availability
- Tricky to scale
- AWS integrations = development
- No console, no visible metrics

Amazon Web Services, Inc. or its affiliates. All rights reserved.
frankmunz
@frankmunz
https://medium.com/@frank.munz (Blog)
https://speakerdeck.com/fmunz (Slides)