Slide 1

Slide 1 text

© 2022, Amazon Web Services, Inc. or its affiliates.
Let's Explore Apache Kafka, the easy way on AWS!
Abhishek Gupta, Principal Developer Advocate, Amazon Web Services
abhi_tweeter abhirockzz

Slide 2

Slide 2 text

Agenda
- Kafka 101
- MSK
- MSK Connect
- MSK Serverless
- Demos (of course!)

Slide 3

Slide 3 text

Apache Kafka 101

Slide 4

Slide 4 text

What is Apache Kafka?
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
https://abhishek1987.medium.com/kafka-is-it-a-topic-or-a-queue-30c85386afd6

Slide 5

Slide 5 text

Apache Kafka 101: Topics
[Diagram: producers write to Topic 1, Topic 2, and Topic 3 in an Apache Kafka cluster; data consumers read from the topics.]

Slide 6

Slide 6 text

Apache Kafka 101: Partitions
[Diagram: a Kafka topic in an Apache Kafka cluster is split into Partition 1, Partition 2, and Partition 3; producers write to the partitions and data consumers read from them.]

Slide 7

Slide 7 text

Apache Kafka 101: Writing to partitions
[Diagram: a producer writes to a topic with 3 partitions. Each partition is an ordered sequence of records numbered from offset 0, running from the oldest data to the newest data.]
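
The write path in the slide above can be sketched in a few lines of Python. This is a toy in-memory model, not a Kafka client: the `Topic` class, its method names, and the `orders` topic are hypothetical, and `zlib.crc32` is an assumed stand-in for the hash a real client uses (the Java client's default partitioner hashes keys with murmur2).

```python
import zlib


class Topic:
    """Toy in-memory model of a topic; not a Kafka client."""

    def __init__(self, name, num_partitions):
        self.name = name
        # Each partition is an append-only list; the list index is the offset.
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Assumption: CRC32 as a stand-in hash. Same key -> same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        """Append a record and return (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1


topic = Topic("orders", num_partitions=3)
first = topic.append("customer-42", "created")
second = topic.append("customer-42", "paid")
# Records with the same key land in the same partition, in write order.
print(first, second)
```

Because ordering is only guaranteed within a partition, choosing the record key is effectively choosing which records must stay in order relative to each other.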

Slide 8

Slide 8 text

Apache Kafka 101: Reads from partitions
[Diagram: three consumers in a consumer group read from Partition 1, Partition 2, and Partition 3, from the oldest data toward the newest data. A marker on each partition shows the next consumer offset.]
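
A minimal sketch of the "next consumer offset" idea from the slide above, assuming a simplified model in which the group keeps one position per partition. The `ConsumerGroup` class and `poll` signature are hypothetical and only mimic the behaviour; a real consumer also handles rebalancing and offset commits.

```python
class ConsumerGroup:
    """Toy model: tracks the next offset to read for each partition."""

    def __init__(self, num_partitions):
        self.next_offset = {p: 0 for p in range(num_partitions)}

    def poll(self, partitions, partition, max_records=2):
        """Return up to max_records from one partition and advance its offset."""
        start = self.next_offset[partition]
        batch = partitions[partition][start:start + max_records]
        self.next_offset[partition] = start + len(batch)
        return batch


# Three partitions, oldest record first (list index == offset).
partitions = [["a0", "a1", "a2"], ["b0"], []]
group = ConsumerGroup(num_partitions=3)

print(group.poll(partitions, 0))  # reads offsets 0 and 1
print(group.poll(partitions, 0))  # resumes at offset 2
print(group.next_offset)
```

Reading does not remove records; only the group's position moves, which is why several consumer groups can read the same topic independently.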

Slide 9

Slide 9 text

Apache Kafka 101: Cluster
[Diagram: Broker 1 holds Topic A Partition 0 and Topic A Partition 2; Broker 2 holds Topic A Partition 1 and Topic A Partition 0; Broker 3 holds Topic A Partition 2 and Topic A Partition 1. Apache ZooKeeper sits alongside the brokers.]
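
The layout in the cluster diagram, where every partition of Topic A appears on two brokers, can be reproduced with a simple round-robin rule. This is a sketch under assumptions: a replication factor of 2 is inferred from the diagram, the function names are hypothetical, and Kafka's actual assignment logic is more involved (for example, it can take racks into account).

```python
def assign_replicas(num_partitions, num_brokers, replication_factor):
    """Round-robin placement: replica i of partition p goes to broker (p + i) % n."""
    if replication_factor > num_brokers:
        raise ValueError("replication factor cannot exceed broker count")
    return {
        p: [(p + i) % num_brokers + 1 for i in range(replication_factor)]
        for p in range(num_partitions)
    }


def partitions_per_broker(assignment):
    """Invert the assignment: broker id -> sorted partition ids it hosts."""
    by_broker = {}
    for partition, brokers in assignment.items():
        for broker in brokers:
            by_broker.setdefault(broker, []).append(partition)
    return {b: sorted(ps) for b, ps in sorted(by_broker.items())}


assignment = assign_replicas(num_partitions=3, num_brokers=3, replication_factor=2)
print(partitions_per_broker(assignment))
```

With three brokers and two copies of each partition, losing any single broker still leaves one copy of every partition available.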

Slide 10

Slide 10 text

Challenges operating Apache Kafka
- Difficult to set up
- Tricky to scale
- Hard to achieve high availability
- Integration requires development
- Error-prone and complex to manage
- Expensive to maintain

Slide 11

Slide 11 text

Amazon Managed Streaming for Apache Kafka
A fully managed service for Apache Kafka and Kafka Connect

Slide 12

Slide 12 text

More focus on creating streaming applications than managing infrastructure
[Diagram: the same responsibility stack shown for three deployment options (On-Premises, Amazon EC2, Amazon MSK), with a legend of Self Managed Kafka, AWS Managed, and Fully Managed.]
Layers in the stack:
- App Dev/Optimization
- Scaling
- High Availability
- Kafka Install/Patching
- OS Patching
- Rolling Version Upgrades
- Broker/ZK Maintenance
- Within-cluster Data Xfer cost
- Encryption
- Hardware Lifecycle
- Power/Network/HVAC
- OS Install
- Hardware Maintenance

Slide 13

Slide 13 text

Key Features
- Highly secure: Protect your data with multiple levels of security, including VPC network isolation, encryption at-rest and in-transit, and IAM access control

Slide 14

Slide 14 text

Highly Secure
[Diagram: Amazon MSK inside a VPC.]

Slide 15

Slide 15 text

Key Features
- Highly secure: Protect your data with multiple levels of security, including VPC network isolation, encryption at-rest and in-transit, and IAM access control
- Highly available: Take advantage of multi-AZ replication within an AWS Region

Slide 16

Slide 16 text

Highly Available
[Diagram: Amazon MSK in a VPC spanning Availability Zone 1, Availability Zone 2, and Availability Zone 3.]

Slide 17

Slide 17 text

Key Features
- Highly secure: Protect your data with multiple levels of security, including VPC network isolation, encryption at-rest and in-transit, and IAM access control
- Highly available: Take advantage of multi-AZ replication within an AWS Region
- Fully compatible: Run your existing Apache Kafka applications on AWS without changes to source code

Slide 18

Slide 18 text

Amazon MSK Compatibility
- Open source Apache Kafka
- Kafka Connect
- MirrorMaker
- Kafka Streams
- Apache Kafka tooling and frameworks
- AWS Glue Schema Registry or 3rd party schema registries
- REST proxies
- Additional 3rd party tools: Burrow, Kafdrop, CMAK, etc.
Tools that load .jar files on brokers:
- Confluent Control Center
- Confluent Auto Data Balancer
- Uber uReplicator

Slide 19

Slide 19 text

Demo – Getting started with Kafka

Slide 20

Slide 20 text

Key Features
- Deep AWS integrations: AWS IoT as a data source, AWS Lambda as a data consumer, schema management with AWS Glue Schema Registry, stream processing with Amazon Kinesis Data Analytics

Slide 21

Slide 21 text

Deep AWS Service integration
- Amazon VPC for network isolation and security
- Amazon CloudWatch for metrics
- AWS KMS for storage volume encryption
- AWS IAM for authentication of cluster APIs and data APIs
- AWS Certificate Manager for Private CAs used for client TLS authentication
- AWS CloudFormation for Amazon MSK clusters & configurations
- AWS CloudTrail for AWS API logs
- Amazon MSK as an event source for AWS Lambda

Slide 22

Slide 22 text

Key Features
- Deep AWS integrations: AWS IoT as a data source, AWS Lambda as a data consumer, schema management with AWS Glue Schema Registry, stream processing with Amazon Kinesis Data Analytics
- Scalability: Add brokers, change broker sizes, add more storage

Slide 23

Slide 23 text

Scaling Amazon MSK
Amazon MSK allows horizontal and vertical scaling
Horizontal Scaling:
- Add Kafka brokers
- Must be a multiple of used AZs
- Only scale-up operation supported
- Requires reassigning of partitions
Vertical Scaling:
- Change the size or family of Kafka brokers
- Scale-up and down operations
- No cluster I/O interruption
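
The horizontal scaling rules on the slide above can be expressed as a small pre-flight check. This is only an illustrative sketch: `validate_broker_count` is a hypothetical helper, not part of any AWS SDK, and it encodes nothing beyond the two constraints stated on the slide.

```python
def validate_broker_count(current, requested, num_azs):
    """Check a horizontal scaling request against the rules on the slide."""
    if requested <= current:
        # Per the slide, only scale-up is supported for the broker count.
        return False, "broker count can only increase"
    if requested % num_azs != 0:
        return False, f"broker count must be a multiple of the {num_azs} AZs in use"
    # New brokers start empty, so partitions must be reassigned onto them.
    return True, "ok; reassign partitions onto the new brokers afterwards"


print(validate_broker_count(current=3, requested=6, num_azs=3))
print(validate_broker_count(current=3, requested=4, num_azs=3))
print(validate_broker_count(current=6, requested=3, num_azs=3))
```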

Slide 24

Slide 24 text

Scaling Storage in Amazon MSK
• Scale storage in 10 GiB increments
• Start scaling action via AWS Console or AWS CLI
• Configure storage auto-scaling to automatically expand storage

Slide 25

Slide 25 text

Key Features
- Deep AWS integrations: AWS IoT as a data source, AWS Lambda as a data consumer, schema management with AWS Glue Schema Registry, stream processing with Amazon Kinesis Data Analytics
- Scalability: Add brokers, change broker sizes, add more storage
- Observability: Monitor logs and metrics via Amazon CloudWatch or extract JMX metrics with Open Monitoring for Prometheus

Slide 26

Slide 26 text

Monitoring MSK
- CloudWatch Metrics: You can set three levels of monitoring in CloudWatch for MSK: DEFAULT (at no cost to you), PER_BROKER, and PER_TOPIC_PER_BROKER.
- Open Monitoring with Prometheus: You can enable open monitoring with Prometheus and expand your monitoring capability to compatible third-party tools such as Datadog, Lenses, New Relic, and Sumo Logic.
- Broker Logs to CW Logs, S3 and AES: Continuously stream Apache Kafka broker logs to Amazon CloudWatch Logs, Amazon S3, or Amazon OpenSearch Service via Amazon Kinesis Data Firehose.
- Consumer lag monitoring: https://docs.aws.amazon.com/msk/latest/developerguide/consumer-lag.html
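
Consumer lag, mentioned at the end of the slide above, is the gap between the latest offset written to a partition and the offset a consumer group has reached. A minimal sketch of that arithmetic follows; the function name and the sample offsets are made up for illustration, and treating a missing committed offset as 0 is a simplifying assumption.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag: records written but not yet consumed by the group."""
    return {
        # Assumption: a partition with no committed offset counts from 0.
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in log_end_offsets.items()
    }


log_end = {0: 6, 1: 5, 2: 4}      # next offset to be written, per partition
committed = {0: 4, 1: 2}          # next offset the group will read
lag = consumer_lag(log_end, committed)
print(lag, "total:", sum(lag.values()))
```

A lag that keeps growing usually means consumers cannot keep up with producers, which is the signal the monitoring options above are meant to surface.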

Slide 27

Slide 27 text

Where is Apache ZooKeeper?
- Apache ZooKeeper is under the hood
- It is highly available, fully managed, automatically provisioned, dedicated, and included with each cluster at no additional cost

Slide 28

Slide 28 text

Amazon MSK Connectivity
[Diagram: in the AWS Cloud, the Amazon MSK service VPC runs one Amazon MSK broker in each of Availability Zone 1, Availability Zone 2, and Availability Zone 3. Each broker is attached through an elastic network interface to a private subnet in the customer VPC, where the Kafka producer, Kafka consumer, and topic creator run. Public access is optional.]

Slide 29

Slide 29 text

MSK Connect – Integrate all the things!!

Slide 30

Slide 30 text

Kafka Connect
A framework for connecting Apache Kafka with external systems such as databases, key-value stores, search indexes, and file systems
[Diagram: data sources flow through Kafka Connect, acting as a Kafka producer, into Apache Kafka; Kafka Connect, acting as a Kafka consumer, then delivers the data to data destinations.]

Slide 31

Slide 31 text

Amazon MSK Connect
- Run fully managed Kafka Connect clusters with Amazon MSK
- Easily deploy, monitor, and scale connectors that move data in and out of Apache Kafka and Amazon MSK
- Eliminates the need to provision and maintain cluster infrastructure
- Connectors scale automatically in response to increases in usage, and you pay only for the resources you use
- Fully compatible with Kafka Connect, which makes it easy to migrate workloads without code changes

Slide 32

Slide 32 text

MSK Connect concepts
- Plugin
- Connectors
- Workers

Slide 33

Slide 33 text

Amazon MSK Connect - Architecture
[Diagram: a connector application made up of Worker 1 (Task 1, Task 4), Worker 2 (Task 2), and Worker 3 (Task 3).]
- 1 MCU = 1 vCPU, 4 GiB
- Capacity: number of workers * MCU, either Provisioned or Auto Scaled
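
The capacity arithmetic on the slide above (1 MCU = 1 vCPU and 4 GiB, multiplied by the number of workers) can be worked through as follows. The helper name and the example of 3 workers with 2 MCUs each are assumptions for illustration only.

```python
VCPU_PER_MCU = 1        # from the slide: 1 MCU = 1 vCPU
MEMORY_GIB_PER_MCU = 4  # from the slide: 1 MCU = 4 GiB


def connector_capacity(workers, mcu_per_worker):
    """Total compute behind a connector: number of workers * MCU per worker."""
    total_mcu = workers * mcu_per_worker
    return {
        "mcu": total_mcu,
        "vcpu": total_mcu * VCPU_PER_MCU,
        "memory_gib": total_mcu * MEMORY_GIB_PER_MCU,
    }


# Hypothetical sizing: 3 workers, as in the diagram, with 2 MCUs each.
print(connector_capacity(workers=3, mcu_per_worker=2))
```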

Slide 34

Slide 34 text

Demo – Getting started with MSK Connect, the easy way

Slide 35

Slide 35 text

High-level architecture
https://medium.com/towards-data-science/build-a-data-pipeline-on-aws-with-kafka-kafka-connect-and-dynamodb-97642cdb0cfb

Slide 36

Slide 36 text

Demo – Building data pipeline with MSK Connect (bridging SQL and NoSQL)

Slide 37

Slide 37 text

Data pipeline architecture
https://medium.com/towards-data-science/mysql-to-dynamodb-build-a-streaming-data-pipeline-on-aws-using-kafka-c2cf0b6e35b6

Slide 38

Slide 38 text

Change Data Capture
https://martin.kleppmann.com/2015/06/02/change-capture-at-berlin-buzzwords.html
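
As a rough illustration of change data capture, the sketch below replays a stream of row-level change events into a key-value replica, which is the idea behind bridging a SQL source and a NoSQL destination. The event shape (`op`, `before`, `after`) is an assumption loosely modeled on common CDC envelopes; the slides do not specify a format, and the `id` and `status` fields are hypothetical.

```python
def apply_change(table, event):
    """Apply one change event to an in-memory replica keyed by primary key."""
    op = event["op"]
    if op in ("c", "u"):
        # Create or update: upsert the new row image.
        row = event["after"]
        table[row["id"]] = row
    elif op == "d":
        # Delete: remove the row identified by the old row image.
        table.pop(event["before"]["id"], None)
    else:
        raise ValueError(f"unknown op: {op!r}")
    return table


events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "new"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "new"}},
    {"op": "u", "before": {"id": 1, "status": "new"},
     "after": {"id": 1, "status": "paid"}},
    {"op": "d", "before": {"id": 2, "status": "new"}, "after": None},
]

replica = {}
for event in events:
    apply_change(replica, event)
print(replica)
```

Applying the events in order is what keeps the replica consistent, which is one reason a partitioned, ordered log is a natural transport for change events.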

Slide 39

Slide 39 text

Customer requirements
- Autoscaling
- Partition Management
- Lower cost

Slide 40

Slide 40 text

Amazon MSK Serverless

Slide 41

Slide 41 text

Amazon MSK Serverless
- Easily run Apache Kafka clusters without needing to right-size cluster capacity or worrying about overprovisioning
- Instantly scale I/O without needing to worry about scaling capacity up and down or reassigning partitions
- Pay for the data volume you stream and retain with throughput-based pricing
- Cost effective for highly variable workloads

Slide 42

Slide 42 text

When to use what…
✓ Offload capacity management
✓ Specific type of workloads
✓ Just getting started

✓ Control capacity and configuration
✓ Stable and predictable workloads
✓ Large workloads

Slide 43

Slide 43 text

Key features
- On-demand streaming capacity

Slide 44

Slide 44 text

200 MBps write capacity

Slide 45

Slide 45 text

200 MBps write capacity
400 MBps read capacity

Slide 46

Slide 46 text

200 MBps write capacity
400 MBps read capacity
5 MBps write capacity
10 MBps read capacity
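
The slide does not label the two pairs of numbers. Assuming the larger pair (200/400 MBps) is the cluster-level capacity and the smaller pair (5/10 MBps) is the per-partition capacity, a rough estimate of the minimum partition count for a target throughput might look like this. The helper name and the example workload are hypothetical.

```python
import math

# Assumption: cluster-level capacity (the larger pair on the slide).
CLUSTER_WRITE_MBPS = 200
CLUSTER_READ_MBPS = 400
# Assumption: per-partition capacity (the smaller pair on the slide).
PARTITION_WRITE_MBPS = 5
PARTITION_READ_MBPS = 10


def min_partitions(write_mbps, read_mbps):
    """Smallest partition count whose combined capacity covers the target."""
    if write_mbps > CLUSTER_WRITE_MBPS or read_mbps > CLUSTER_READ_MBPS:
        raise ValueError("target exceeds the cluster-level capacity")
    return max(
        1,
        math.ceil(write_mbps / PARTITION_WRITE_MBPS),
        math.ceil(read_mbps / PARTITION_READ_MBPS),
    )


# Hypothetical workload: 40 MBps written, read by three consumers (120 MBps).
print(min_partitions(write_mbps=40, read_mbps=120))
```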

Slide 47

Slide 47 text

Key features
- On-demand streaming capacity
- Throughput-based pricing

Slide 48

Slide 48 text

Key features
- On-demand streaming capacity
- Throughput-based pricing
- Auto partition placement

Slide 49

Slide 49 text

Key features
- On-demand streaming capacity
- Throughput-based pricing
- Auto partition placement
- Same security as MSK
- Same high availability
- Fully compatible

Slide 50

Slide 50 text

High level architecture
[Diagram: a producer, a consumer, and an admin client run in your VPC, which has a private subnet in each of Availability Zone 1, Availability Zone 2, and Availability Zone 3, and connect to MSK Serverless in the Amazon MSK VPC.]

Slide 51

Slide 51 text

Demo – MSK Serverless and AWS Lambda in action
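
To give a flavour of the Lambda side of such a demo, here is a minimal sketch of a Python handler for a Kafka event source. It assumes the event payload groups records under `topic-partition` keys with base64-encoded values, trimmed here to the fields the handler uses; the `orders` topic and the sample payload are hypothetical and not taken from the demo.

```python
import base64


def handler(event, context):
    """Decode Kafka records delivered to Lambda by a Kafka event source."""
    decoded = []
    # Records arrive grouped by "topic-partition"; values are base64-encoded.
    for _topic_partition, records in event.get("records", {}).items():
        for record in records:
            value = base64.b64decode(record["value"]).decode("utf-8")
            decoded.append(
                (record["topic"], record["partition"], record["offset"], value)
            )
    return decoded


# Hypothetical sample event for a local dry run.
sample_event = {
    "eventSource": "aws:kafka",
    "records": {
        "orders-0": [
            {
                "topic": "orders",
                "partition": 0,
                "offset": 15,
                "value": base64.b64encode(b"hello").decode("ascii"),
            }
        ]
    },
}
print(handler(sample_event, None))
```

Calling the handler locally with a sample event, as above, is a quick way to check the decoding logic before wiring it to a cluster.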

Slide 52

Slide 52 text

High level architecture

Slide 53

Slide 53 text

Recap
We covered:
- Kafka, MSK, MSK Connect, MSK Serverless
- Demos – MSK 101, data pipelines, serverless data processing

Slide 54

Slide 54 text

Clickstream analytics
https://catalog.us-east-1.prod.workshops.aws/workshops/c2b72b6f-666b-4596-b8bc-bafa5dcca741/en-US/mskkdaflinklab/overview

Slide 55

Slide 55 text

Lift and shift migration
[Diagram: Apache Kafka in Environment A is migrated to your AWS VPC.]
https://catalog.us-east-1.prod.workshops.aws/workshops/c2b72b6f-666b-4596-b8bc-bafa5dcca741/en-US/migration

Slide 56

Slide 56 text

Resources
- MSK Workshop – excellent resource!
- MSK Blogs – keep up with it!
- MSK Documentation – the source of truth!
- MSK IAM auth plugin (for Java clients) – open-source plugin

Slide 57

Slide 57 text

Thank you!
https://eventbox.dev/survey/ZG31VBM
abhi_tweeter abhirockzz