Frank Munz
March 12, 2019

Architecting for Real-Time Insights with Streaming Data (AWS Kinesis / Apache Kafka)

Architecting for Real-Time Insights with Streaming Data (Amazon Kinesis, Lambda and Amazon Managed Service for Kafka)

Transcript

  1. Architecting for Real-Time Insights with Streaming Data. Antonello Mantuano, Head of Software Engineering, Cerved. Dr Frank Munz, Senior Technical Evangelist, Amazon Web Services. SUMMIT. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  2. Introductory - 200: “These sessions provide an overview of AWS services and features, and they assume that attendees are new to the topic. These sessions highlight basic use cases, features, functions, and benefits.”
  3. Agenda: Streaming Architectures; Amazon Kinesis; Serverless Stream Processing; Amazon Managed Streaming for Kafka (MSK); Customer Success Story: Antonello Mantuano from Cerved.
  4. Streaming Data: Web Clickstream, Application Logs, IoT Sensors. Example log line: [Wed Oct 11 14:32:52 2018] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test. Continuously generated, small size events, low latency requirements.
  5. Transform and Process Continuously. Streaming: ingest video & data as it’s generated; process data on the fly; real-time analytics/ML, alerts, actions.
  6. From Batch to Streaming Analytics https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
  8. Amazon Kinesis: real-time data streaming and analytics. Easily collect, process, and analyze streams in real time. Kinesis Video Streams: capture, process, and store video streams for analytics. Kinesis Data Streams: build custom applications that analyze data streams. Kinesis Data Firehose: load data streams into AWS data stores. Kinesis Data Analytics: analyze data streams with SQL or Java (NEW!).
  9. Amazon Kinesis Data Streams Overview
  10. Data Ingestion from a Variety of Sources (transactions, ERP, web logs/cookies, connected devices) into Kinesis Data Streams. AWS SDKs: publish directly from application code via APIs; AWS Mobile SDK; managed AWS sources (CloudWatch Logs, AWS IoT, Kinesis Data Analytics and more); RDS Aurora via Lambda. Kinesis Agent: monitors log files and forwards lines as messages to Kinesis Data Streams. 3rd party and open source: Log4j appender; Apache Kafka; Flume, fluentd, and more. Kinesis Producer Library (KPL): background process aggregates and batches messages.
  11. Kinesis Data Streams: Standard consumers
  12. Kinesis Data Streams: Standard consumers. A data producer writes up to 1 MB or 1000 records per second, per shard, into a stream (Shard 1, Shard 2, Shard 3, ... Shard n). Consumer Application A reads with GetRecords(): 5 transactions or 2 MB per second, per shard. With one consumer application, records can be retrieved every 200 ms.
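
The per-shard limits on this slide translate into simple capacity arithmetic. The following is a minimal sketch, not an official sizing tool: the constants are the figures quoted on the slide, while the function names and the assumption that every standard consumer reads the full stream are illustrative.

```python
import math

# Per-shard limits as quoted on the slide.
WRITE_MB_PER_SEC = 1.0         # producer side: up to 1 MB per second
WRITE_RECORDS_PER_SEC = 1000   # producer side: up to 1000 records per second
READ_MB_PER_SEC = 2.0          # GetRecords(): 2 MB per second, shared by standard consumers
READ_CALLS_PER_SEC = 5         # GetRecords(): 5 transactions per second, shared


def shards_needed(ingest_mb_per_sec: float, records_per_sec: float, consumers: int = 1) -> int:
    """Smallest shard count that satisfies the write limits and the shared read limit."""
    by_write_bytes = ingest_mb_per_sec / WRITE_MB_PER_SEC
    by_write_count = records_per_sec / WRITE_RECORDS_PER_SEC
    # Assumption: each standard consumer reads the whole stream from the same 2 MB/s budget.
    by_read_bytes = ingest_mb_per_sec * consumers / READ_MB_PER_SEC
    return max(1, math.ceil(max(by_write_bytes, by_write_count, by_read_bytes)))


def polling_interval_ms(consumers: int) -> float:
    """How often each consumer can call GetRecords() on one shard without exceeding 5 calls/s."""
    return 1000.0 * consumers / READ_CALLS_PER_SEC


if __name__ == "__main__":
    print(shards_needed(3.5, 2000))   # limited by write bandwidth
    print(polling_interval_ms(1))     # the 200 ms figure from the slide
    print(polling_interval_ms(5))     # five consumers sharing one shard
```

With a single consumer the five calls per second give the 200 ms interval stated on the slide; each additional standard consumer stretches that interval proportionally.
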
  13. Kinesis Data Streams: Enhanced fan-out consumers. Every consumer gets a dedicated 2 MB per second, per shard. Latency is typically less than 70 ms. Consumer Application A and Consumer Application B each call RegisterStreamConsumer() and SubscribeToShard(), and each receives data through its own EFO pipe at up to 2 MB per second. HTTP/2: consumers do not poll; messages are pushed to the consumer as they arrive.
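
To make the difference between shared and dedicated read throughput concrete, here is a small sketch using the 2 MB per second figure from the slides. The function name is hypothetical, and the even split among standard consumers is a simplifying assumption.

```python
SHARD_READ_MB_PER_SEC = 2.0  # per-shard read figure quoted on the slides


def per_consumer_read_mb(consumers: int, enhanced_fan_out: bool) -> float:
    """Read bandwidth available to each consumer on a single shard."""
    if consumers < 1:
        raise ValueError("need at least one consumer")
    if enhanced_fan_out:
        # Each registered consumer gets its own dedicated pipe.
        return SHARD_READ_MB_PER_SEC
    # Standard consumers share one budget; an even split is assumed here.
    return SHARD_READ_MB_PER_SEC / consumers


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, per_consumer_read_mb(n, False), per_consumer_read_mb(n, True))
```
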
  15. The Serverless Operational Model: no provisioning, no management; pay for value; automatic scaling; highly available and secure.
  16. Processing a Data Stream with Lambda. A data producer continuously streams data into Kinesis Data Streams. The Lambda service continuously polls for new data (1 poll per second) and automatically invokes your function(s) when data is found. Diagram elements: Lambda function A, Lambda function B, Amazon SNS. Lambda polls each shard once per second. Lambda’s maximum execution time is 15 minutes.
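
A function invoked this way receives a batch of records whose payloads are base64-encoded under `record["kinesis"]["data"]`. The handler below is a minimal sketch under the assumption that producers send JSON; `make_test_event` is a hypothetical helper that builds a reduced event for local testing and is not part of any AWS library.

```python
import base64
import json


def handler(event, context):
    """Decode every Kinesis record in the batch and return the parsed payloads."""
    items = []
    for record in event["Records"]:
        # Kinesis delivers the record payload base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        # Assumption: the producer wrote UTF-8 JSON documents.
        items.append(json.loads(payload))
    return {"processed": len(items), "items": items}


def make_test_event(payloads):
    """Hypothetical helper: wrap Python objects in a minimal Kinesis-style event."""
    records = []
    for obj in payloads:
        data = base64.b64encode(json.dumps(obj).encode("utf-8")).decode("ascii")
        records.append({"kinesis": {"data": data}})
    return {"Records": records}


if __name__ == "__main__":
    event = make_test_event([{"sensor": "a", "value": 1}, {"sensor": "b", "value": 2}])
    print(handler(event, None))
```

Real events carry more fields per record (for example the partition key and sequence number); the sketch keeps only the one it reads.
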
  17. Kinesis Streaming Data Analytics: SQL or Apache Flink (Java)
  18. Kinesis Streaming Data Analytics / SQL
  19. Kinesis Streaming Data Analytics / Apache Flink: framework and engine for stateful processing of data streams. Simple programming: easy to use and flexible APIs make building apps fast. High performance: in-memory computing provides low latency & high throughput. Stateful processing: durable application state saves. Strong data integrity: exactly-once processing and consistent state.
  20. Kinesis Data Firehose: Ingest Transform Load (ITL)
  21. Kinesis Data Firehose: How it Works. Ingest, Transform, Deliver. Sources: AWS IoT, Amazon Kinesis Agent, Amazon Kinesis Streams, Amazon CloudWatch Logs, Amazon CloudWatch Events, Apache Kafka. Destinations: Amazon S3, Amazon Redshift, Amazon Elasticsearch Service.
  22. Kinesis Data Firehose: Record Format Conversion. Diagram: a data producer sends JSON data to Kinesis Data Firehose, which takes the schema from the Glue Data Catalog, converts to columnar format and writes to Amazon S3; a /failed path is also shown.
  23. Amazon Kinesis Data: Streams vs. Firehose. Kinesis Data Streams: scalable and durable real-time data streaming service with provisioned throughput and sub-second latency that can continuously capture gigabytes of data per second from hundreds of thousands of sources. Kinesis Data Firehose: capture, transform, convert and load data streams into AWS data stores for near real-time analytics. Data latency 60 seconds.
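
The near real-time latency of a delivery stream comes from buffering records before each delivery. The class below is a simplified sketch of size-or-age buffering, not Firehose's implementation: the class name and the 1 MiB size threshold are hypothetical, only the 60-second figure comes from the slide, and a real service would flush on a timer rather than only when a record arrives.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DeliveryBuffer:
    max_bytes: int = 1024 * 1024   # hypothetical size threshold
    max_age_sec: float = 60.0      # latency figure from the slide
    _records: List[bytes] = field(default_factory=list)
    _size: int = 0
    _opened_at: float = 0.0

    def add(self, record: bytes, now: float) -> List[bytes]:
        """Buffer a record; return the flushed batch if a threshold was hit, else []."""
        if not self._records:
            self._opened_at = now  # the age clock starts with the first buffered record
        self._records.append(record)
        self._size += len(record)
        too_big = self._size >= self.max_bytes
        too_old = now - self._opened_at >= self.max_age_sec
        if too_big or too_old:
            batch, self._records, self._size = self._records, [], 0
            return batch
        return []


if __name__ == "__main__":
    buf = DeliveryBuffer(max_bytes=10)
    print(buf.add(b"abc", now=0.0))    # still buffering
    print(buf.add(b"defg", now=30.0))  # still buffering
    print(buf.add(b"h", now=61.0))     # age threshold reached, batch is flushed
```
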
  25. Demo Architecture
  26. Live Demo! Use your phone and connect to: XXX. 1. Preparation; 2. Mode; 3. Mode.
  28. Comparing Amazon Kinesis Data Streams to MSK. Amazon Kinesis Data Streams: a stream with 3 shards (Shard 1, Shard 2, Shard 3); each shard holds numbered records ordered from oldest to newest data, and writes from producers arrive at the newest end. Amazon MSK: a topic with 3 partitions (Partition 1, Partition 2, Partition 3), laid out the same way.
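
Both models route each record to exactly one shard or partition based on its key. For Kinesis Data Streams, the partition key is hashed with MD5 into a 128-bit integer, and the record goes to the shard whose hash key range contains that value. The sketch below assumes the shards split the hash space evenly, which need not hold after resharding; the function name is illustrative.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return the zero-based index of the shard that would receive this partition key."""
    if shard_count < 1:
        raise ValueError("shard_count must be positive")
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    # Assumption: shards own equal, contiguous slices of the hash key space.
    return hash_key * shard_count // HASH_SPACE


if __name__ == "__main__":
    for key in ("user-1", "user-2", "user-3"):
        print(key, "-> shard index", shard_for_key(key, 3))
```

Because the mapping depends only on the key, all records with the same key land in the same shard and keep their relative order there.
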
  29. Apache Kafka: Partitioned, Replicated Commit Log. Diagram labels: Producer; Cluster holding TopicA Partition1, TopicA Partition2 and TopicA Partition3, each with a partition replica; three ZooKeeper nodes for state & config.
  30. Challenges operating Apache Kafka: difficult to set up, configure and operate; hard to achieve high availability; tricky to scale; AWS integrations = development; no console, no visible metrics.
  32. Fast Data from Legacy to Cloud: The battle to overcome the gravity. March ’19. Antonello Mantuano, Head of Software Engineering. antonellomantuano @manant74
  33. Cerved: the Data Driven Company. Business areas: Credit Information, Credit Management, Marketing Solutions. Services: lead generation, credit collection, data providing & marketing analysis, credit information, credit scoring, bad credits evaluation. We are deeply passionate about data. Our data enables various financial services, from credit risk analysis to marketing solutions to managing non-performing loans and bad debt. Web Data: 1M company sites. Open Data: 4M info from open data set. Cerved Data: 70M payment, 60M scoring. Company Data: 70M Real Estate Property, 20M companies, 16M shareholders. Key figures: 2,600 people; 34,000 customers; €40M in data & technologies; 30M decisions; 1,400 TB of data.
  34. Why Cerved in Cloud? Benefits of Cloud for Cerved. TIME-TO-MARKET: rapid implementation for basic services. PRIVACY & SEGREGATION: manage customers' data securely. AVAILABILITY: services available 24x7. SCALABILITY: infrastructure quickly adaptable to the load.
  35. The Data Gravity. “As data accumulates, it begins to have gravity. This Data Gravity pulls services and applications closer to the data.” Dave McCrory, 2010. Diagram labels: DATA, Services, Apps, Latency, Throughput. This attraction (gravitational force) is caused by the need for services and applications to have higher bandwidth and/or lower latency access to the data.
  36. Data Ecosystem in Cerved. Sourcing Level 2, Sourcing Level 1, REPOS, SYNTH, Mondo Dati, Lince, Dati clienti, NCA, ERG, EBS, HUB, DWH, MBD, Teradata, Tabula, Mongo4, DW, DB4You, XPCH 2 MATCH, NEMO, Quaestio, LUDO, Tabula (on AWS), Aracne, G4U, MBD1, R3, Pragma, Splunk, CDR, Mambo, CAS, Dedalo, ELK, CSS, CR-RIBA (Payline).
  37. Cerved Data in Cloud Architecture. Streaming is the new ETL. Streaming is the Anti-Gravity. Diagram labels: Cerved DBs, CDC, DBs, Operational, Online Data, OLTP Processes, Batch, Hadoop DataLake, NoSql, Tabula, Cloud DB (DynamoDB, RDS, S3), CDC Producer, Raw Events Aggregator, Basic Events Aggregator, HL Events Aggregator, NoSql Ingestion, Sync Connector for Cloud, Hadoop Ingestion, Stream Processing.
  38. Cerved API: a Data in Cloud use case. Diagram labels: Cerved DBs, CDC, DBs, Operational, Online Data, OLTP Processes, Batch, Hadoop DataLake, NoSql, Tabula, Cloud DB (DynamoDB, RDS, S3), CDC Producer, Raw Events Aggregator, Basic Events Aggregator, HL Events Aggregator, NoSql Ingestion, Sync Connector to Cloud, Hadoop Ingestion, Stream Processing. Back End: AWS Lambda, Spring Boot, API Gateway. Front End: ReactJs, Redux, Swagger.
  39. The Results of API in Cloud. SLA: API available 24x7x365, 99.998% in January 2019. PERFORMANCE: high scalability with quick adaptability to load. COSTS: with AWS Lambda, DynamoDB, S3, etc., the cost of infrastructure grows with the load. DATA SYNC: data is continuously updated in near real time.
  40. Future use case of Data in Cloud. Diagram labels: Cerved DBs, CDC, DBs, Operational, Online Data, OLTP Processes, Batch, Hadoop DataLake, NoSql, Tabula, Cloud DB (DynamoDB, RDS, S3), CDC Producer, Raw Events Aggregator, Basic Events Aggregator, HL Events Aggregator, NoSql Ingestion, Sync Connector to Cloud, Hadoop Ingestion, Stream Processing. Back End: AWS Lambda, API Gateway, EMR, SageMaker, AWS Kinesis or Managed Streaming for Kafka. Use cases: Data Scientist in Cloud, Real Time Apps, DR & Backup.
  41. THANK YOU. Moving Fast Data in cloud creates a new gravity for new and innovative apps and services. Antonello Mantuano, Head of Software Engineering. antonellomantuano @manant74
  43. Thank you! antonellomantuano @manant74, frankmunz @frankmunz