Let's Explore Apache Kafka, the easy way on AWS

Abhishek Gupta
November 18, 2022

Transcript

  1. © 2022, Amazon Web Services, Inc. or its affiliates.

    Let's Explore Apache Kafka, the easy way on AWS! Abhishek Gupta, Principal Developer Advocate, Amazon Web Services (abhi_tweeter, abhirockzz)
  2. Agenda

    Kafka 101, MSK, MSK Connect, MSK Serverless, Demos (of course!)
  3. Apache Kafka 101
  4. What is Apache Kafka?

    https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
    https://abhishek1987.medium.com/kafka-is-it-a-topic-or-a-queue-30c85386afd6
  5. Apache Kafka 101: Topics

    [Diagram: producers write to Topic 1, Topic 2, and Topic 3 in an Apache Kafka cluster; data consumers read from the topics.]
  6. Apache Kafka 101: Partitions

    [Diagram: a Kafka topic in an Apache Kafka cluster is split into Partition 1, Partition 2, and Partition 3; producers write to the partitions and data consumers read from them.]
  7. Apache Kafka 101: Writing to partitions

    [Diagram: a producer writes to a topic with 3 partitions. Each partition is an ordered sequence of numbered offsets starting at 0, running from the oldest data to the newest data.]
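
The write path on this slide can be sketched in a few lines of Python. This is a toy model, not the Kafka client: the `Topic` class and its `send` method are hypothetical names, and CRC32 is only a stand-in for the murmur2 hash that Kafka's default partitioner applies to record keys.

```python
import zlib


class Topic:
    """Toy topic: each partition is an append-only list; an offset is a list index."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, key, value):
        # Stand-in hash (assumption): real Kafka clients hash keys with murmur2.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append((key, value))
        offset = len(log) - 1  # the newest record always has the highest offset
        return partition, offset


topic = Topic(num_partitions=3)
first = topic.send("user-1", "login")
second = topic.send("user-1", "click")
print(first, second)
```

Records with the same key land in the same partition, so their relative order is preserved; ordering across partitions is not guaranteed.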
  8. Apache Kafka 101: Reads from partitions

    [Diagram: three consumers in a consumer group read from the topic's 3 partitions, one partition each. A marker on each partition shows the next consumer offset.]
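
The read side can be sketched the same way. The helper names `assign` and `poll` are hypothetical, and the round-robin assignment is a simplification: real consumer groups negotiate assignments through the group coordinator using a configurable assignor.

```python
def assign(partitions, consumers):
    """Round-robin sketch: every partition goes to exactly one group member."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


def poll(log, next_offset, max_records):
    """Return the records starting at next_offset, plus the new next offset."""
    batch = log[next_offset:next_offset + max_records]
    return batch, next_offset + len(batch)


# Three partitions with different amounts of data, as in the diagram.
logs = {0: ["a0", "a1", "a2"], 1: ["b0", "b1"], 2: ["c0"]}
assignment = assign(sorted(logs), ["consumer-1", "consumer-2", "consumer-3"])

# The group tracks one "next consumer offset" per partition.
offsets = {partition: 0 for partition in logs}
batch, offsets[0] = poll(logs[0], offsets[0], max_records=2)
print(assignment, batch, offsets)
```

Because the offset is stored per partition rather than per consumer, another member of the group can resume from the same position if a partition is reassigned.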
  9. Apache Kafka 101: Cluster

    Broker 1: Topic A Partition 0, Topic A Partition 2
    Broker 2: Topic A Partition 1, Topic A Partition 0
    Broker 3: Topic A Partition 2, Topic A Partition 1
    Apache ZooKeeper
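
In the cluster diagram each partition of Topic A appears on two brokers, which suggests a replication factor of 2. A minimal sketch of a round-robin placement that reproduces that layout follows; `place_replicas` and `partitions_on` are hypothetical helpers, and Kafka's actual replica assignment is more involved (for example, it randomizes the starting broker and can take racks into account).

```python
def place_replicas(num_partitions, brokers, replication_factor):
    """Place each partition's replicas on consecutive brokers, wrapping around."""
    placement = {}
    for partition in range(num_partitions):
        placement[partition] = [
            brokers[(partition + r) % len(brokers)]
            for r in range(replication_factor)
        ]
    return placement


def partitions_on(broker, placement):
    """List the partitions that have a replica on the given broker."""
    return sorted(p for p, replicas in placement.items() if broker in replicas)


placement = place_replicas(num_partitions=3, brokers=[1, 2, 3], replication_factor=2)
for broker in (1, 2, 3):
    print(f"Broker {broker}: partitions {partitions_on(broker, placement)}")
```

With this layout, losing any single broker still leaves one replica of every partition on the remaining brokers.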
  10. Challenges operating Apache Kafka

    - Difficult to set up
    - Tricky to scale
    - Hard to achieve high availability
    - Integration requires development
    - Error-prone and complex to manage
    - Expensive to maintain
  11. Amazon Managed Streaming for Apache Kafka

    A fully managed service for Apache Kafka and Kafka Connect
  12. More focus on creating streaming applications than managing infrastructure

    [Diagram: three columns, On-Premises and Amazon EC2 (self-managed Kafka) and Amazon MSK (fully managed), each listing the same responsibilities, with a legend marking which ones are AWS managed: App Dev/Optimization, Scaling, High Availability, Kafka Install/Patching, OS Patching, Rolling Version Upgrades, Broker/ZK Maintenance, Within-cluster Data Xfer cost, Encryption, Hardware Lifecycle, Power/Network/HVAC, OS Install, Hardware Maintenance.]
  13. Key Features

    Highly secure: Protect your data with multiple levels of security, including VPC network isolation, encryption at rest and in transit, and IAM access control
  14. Key Features

    Highly secure: Protect your data with multiple levels of security, including VPC network isolation, encryption at rest and in transit, and IAM access control
    Highly available: Take advantage of multi-AZ replication within an AWS Region
  15. Highly Available

    [Diagram: Amazon MSK in a VPC spanning Availability Zone 1, Availability Zone 2, and Availability Zone 3.]
  16. Key Features

    Highly secure: Protect your data with multiple levels of security, including VPC network isolation, encryption at rest and in transit, and IAM access control
    Highly available: Take advantage of multi-AZ replication within an AWS Region
    Fully compatible: Run your existing Apache Kafka applications on AWS without changes to source code
  17. Amazon MSK Compatibility

    - Open source Apache Kafka
    - Kafka Connect
    - MirrorMaker
    - Kafka Streams
    - Apache Kafka tooling and frameworks
    - AWS Glue Schema Registry or 3rd party schema registries
    - REST proxies
    - Additional 3rd party tools: Burrow, Kafdrop, CMAK, etc.

    Tools that load .jar files on brokers: Confluent Control Center, Confluent Auto Data Balancer, Uber uReplicator
  18. Demo – Getting started with Kafka
  19. Key Features

    Deep AWS integrations: AWS IoT as a data source, AWS Lambda as a data consumer, schema management with AWS Glue Schema Registry, stream processing with Amazon Kinesis Data Analytics
  20. Deep AWS Service integration

    - Amazon VPC for network isolation and security
    - Amazon CloudWatch for metrics
    - AWS KMS for storage volume encryption
    - AWS IAM for authentication of cluster APIs and data APIs
    - AWS Certificate Manager for Private CAs used for client TLS authentication
    - AWS CloudFormation for Amazon MSK clusters & configurations
    - AWS CloudTrail for AWS API logs
    - Amazon MSK as an event source for AWS Lambda
  21. Key Features

    Deep AWS integrations: AWS IoT as a data source, AWS Lambda as a data consumer, schema management with AWS Glue Schema Registry, stream processing with Amazon Kinesis Data Analytics
    Scalability: Add brokers, change broker sizes, add more storage
  22. Scaling Amazon MSK

    Amazon MSK allows horizontal and vertical scaling.

    Horizontal Scaling
    - Add Kafka brokers
    - Broker count must be a multiple of the number of AZs used
    - Only scale-up operation supported
    - Requires reassigning of partitions

    Vertical Scaling
    - Change the size or family of Kafka brokers
    - Scale-up and down operations
    - No cluster I/O interruption
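
The horizontal scaling constraints on this slide can be expressed as a small pre-flight check. This is a sketch only: `validate_broker_target` is a hypothetical helper that runs locally and does not call any AWS API.

```python
def validate_broker_target(current_brokers, target_brokers, num_azs):
    """Check a target broker count against the constraints listed on the slide.

    Returns the resulting number of brokers per Availability Zone.
    """
    if target_brokers % num_azs != 0:
        raise ValueError(
            f"target {target_brokers} is not a multiple of the {num_azs} AZs in use"
        )
    if target_brokers <= current_brokers:
        raise ValueError("only scale-up is supported: target must exceed current count")
    return target_brokers // num_azs


brokers_per_az = validate_broker_target(current_brokers=3, target_brokers=6, num_azs=3)
print(brokers_per_az)
```

A target that passes the check could then be submitted through the console, the CLI, or the MSK UpdateBrokerCount API. Existing partitions still need to be reassigned before the new brokers carry any traffic.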
  23. Scaling Storage in Amazon MSK

    • Scale storage in 10 GiB increments
    • Start scaling action via AWS Console or AWS CLI
    • Configure storage auto-scaling to automatically expand storage
  24. Key Features

    Deep AWS integrations: AWS IoT as a data source, AWS Lambda as a data consumer, schema management with AWS Glue Schema Registry, stream processing with Amazon Kinesis Data Analytics
    Scalability: Add brokers, change broker sizes, add more storage
    Observability: Monitor logs and metrics via Amazon CloudWatch or extract JMX metrics with Open Monitoring for Prometheus
  25. Monitoring MSK

    CloudWatch Metrics: You can set three levels of monitoring in CloudWatch for MSK: DEFAULT (at no cost to you), PER_BROKER, and PER_TOPIC_PER_BROKER.
    Open Monitoring with Prometheus: You can enable open monitoring with Prometheus and extend your monitoring to compatible third-party tools such as Datadog, Lenses, New Relic, and Sumo Logic.
    Broker Logs: Continuously stream Apache Kafka broker logs to Amazon CloudWatch Logs, Amazon S3, or Amazon OpenSearch Service via Amazon Kinesis Data Firehose.
    Consumer lag monitoring: https://docs.aws.amazon.com/msk/latest/developerguide/consumer-lag.html
  26. Where is Apache ZooKeeper?

    Apache ZooKeeper is under the hood. It is highly available, fully managed, automatically provisioned, dedicated, and included with each cluster at no additional cost.
  27. Amazon MSK Connectivity

    [Diagram: a customer VPC in the AWS Cloud with a private subnet in each of Availability Zones 1, 2, and 3; an elastic network interface in each subnet connects to an Amazon MSK broker in the Amazon MSK service VPC. Kafka producer, Kafka consumer, and topic creator clients connect from the customer VPC. Public access is optional.]
  28. MSK Connect – Integrate all the things!!
  29. Kafka Connect

    A framework for connecting Apache Kafka with external systems such as databases, key-value stores, search indexes, and file systems

    [Diagram: data sources feed Apache Kafka through Kafka Connect, and Kafka Connect delivers data from Apache Kafka to data destinations, alongside a Kafka producer and a Kafka consumer.]
  30. Amazon MSK Connect

    - Run fully managed Kafka Connect clusters with Amazon MSK
    - Easily deploy, monitor, and scale connectors that move data in and out of Apache Kafka and Amazon MSK
    - Eliminates the need to provision and maintain cluster infrastructure
    - Connectors scale automatically in response to increases in usage, and you pay only for the resources you use
    - Fully compatible with Kafka Connect, which makes it easy to migrate workloads without code changes
  31. MSK Connect concepts

    - Plugin
    - Connectors
    - Workers
  32. Amazon MSK Connect - Architecture

    [Diagram: a connector application runs its tasks across workers: Worker 1 (Task 1, Task 4), Worker 2 (Task 2), Worker 3 (Task 3).]

    1 MCU = 1 vCPU, 4 GiB
    Capacity = number of workers * MCU (provisioned or autoscaled)
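
The capacity arithmetic on this slide can be written out as a short calculation. The sketch assumes the 4 GiB figure refers to memory and that every worker is given the same number of MCUs; `ConnectorCapacity` is a hypothetical name, not part of any AWS SDK.

```python
from dataclasses import dataclass

VCPU_PER_MCU = 1        # from the slide: 1 MCU = 1 vCPU
MEMORY_GIB_PER_MCU = 4  # from the slide: 1 MCU = 4 GiB (assumed to be memory)


@dataclass(frozen=True)
class ConnectorCapacity:
    workers: int
    mcu_per_worker: int

    @property
    def total_mcus(self):
        # Slide formula: number of workers * MCU
        return self.workers * self.mcu_per_worker

    @property
    def total_vcpus(self):
        return self.total_mcus * VCPU_PER_MCU

    @property
    def total_memory_gib(self):
        return self.total_mcus * MEMORY_GIB_PER_MCU


capacity = ConnectorCapacity(workers=3, mcu_per_worker=2)
print(capacity.total_mcus, capacity.total_vcpus, capacity.total_memory_gib)
```

With provisioned capacity the worker count is fixed; with autoscaling it moves between configured bounds, so the same formula gives a range rather than a single value.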
  33. Demo – Getting started with MSK Connect, the easy way
  34. High-level architecture

    https://medium.com/towards-data-science/build-a-data-pipeline-on-aws-with-kafka-kafka-connect-and-dynamodb-97642cdb0cfb
  35. Demo – Building data pipeline with MSK Connect (bridging SQL and NoSQL)
  36. Data pipeline architecture

    https://medium.com/towards-data-science/mysql-to-dynamodb-build-a-streaming-data-pipeline-on-aws-using-kafka-c2cf0b6e35b6
  37. Change Data Capture

    https://martin.kleppmann.com/2015/06/02/change-capture-at-berlin-buzzwords.html
  38. Customer requirements

    - Autoscaling
    - Partition Management
    - Lower cost
  39. Amazon MSK Serverless

    - Easily run Apache Kafka clusters without needing to right-size cluster capacity or worrying about overprovisioning
    - Instantly scale I/O without needing to worry about scaling capacity up and down or reassigning partitions
    - Pay for the data volume you stream and retain, with throughput-based pricing
    - Cost effective for highly variable workloads
  40. When to use what…

    ✓ Offload capacity management
    ✓ Specific type of workloads
    ✓ Just getting started

    ✓ Control capacity and configuration
    ✓ Stable and predictable workloads
    ✓ Large workloads
  41. Key features

    - On-demand streaming capacity
  42. 200 MBps write capacity, 400 MBps read capacity
  43. 200 MBps write capacity, 400 MBps read capacity

    5 MBps write capacity, 10 MBps read capacity
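
These figures can be turned into a rough partition estimate. The sketch assumes that the 200/400 MBps pair is the per-cluster limit and the 5/10 MBps pair is the per-partition limit, which the transcript does not state explicitly; `min_partitions` is a hypothetical helper.

```python
import math

# Assumption: cluster-level limits (first pair of numbers on the slide).
CLUSTER_WRITE_MBPS = 200
CLUSTER_READ_MBPS = 400
# Assumption: partition-level limits (second pair of numbers on the slide).
PARTITION_WRITE_MBPS = 5
PARTITION_READ_MBPS = 10


def min_partitions(write_mbps, read_mbps):
    """Smallest partition count that fits the target throughput under both limits."""
    if write_mbps > CLUSTER_WRITE_MBPS or read_mbps > CLUSTER_READ_MBPS:
        raise ValueError("target throughput exceeds the assumed cluster limits")
    return max(
        1,
        math.ceil(write_mbps / PARTITION_WRITE_MBPS),
        math.ceil(read_mbps / PARTITION_READ_MBPS),
    )


needed = min_partitions(write_mbps=50, read_mbps=120)
print(needed)
```

The estimate assumes traffic is spread evenly across partitions; a skewed key distribution would need more headroom.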
  44. Key features

    - On-demand streaming capacity
    - Throughput-based pricing
  45. Key features

    - On-demand streaming capacity
    - Throughput-based pricing
    - Auto partition placement
  46. Key features

    - On-demand streaming capacity
    - Throughput-based pricing
    - Auto partition placement
    - Same security as MSK
    - Same high availability
    - Fully compatible
  47. High level architecture

    [Diagram: producer, consumer, and admin clients in your VPC, which has a private subnet in each of Availability Zones 1, 2, and 3, connect to MSK Serverless in the Amazon MSK VPC.]
  48. Demo – MSK Serverless and AWS Lambda in action
  49. Recap

    We covered
    - Kafka, MSK, MSK Connect, MSK Serverless
    - Demos – MSK 101, data pipelines, serverless data processing
  50. Clickstream analytics

    https://catalog.us-east-1.prod.workshops.aws/workshops/c2b72b6f-666b-4596-b8bc-bafa5dcca741/en-US/mskkdaflinklab/overview
  51. Lift and shift migration

    [Diagram labels: Environment A, Apache Kafka, Your AWS VPC.]

    https://catalog.us-east-1.prod.workshops.aws/workshops/c2b72b6f-666b-4596-b8bc-bafa5dcca741/en-US/migration
  52. Resources

    - MSK Workshop – excellent resource!
    - MSK Blogs – keep up with it!
    - MSK Documentation – the source of truth!
    - MSK IAM auth plugin (for Java clients) – open-source plugin
  53. Thank you!

    https://eventbox.dev/survey/ZG31VBM
    abhi_tweeter abhirockzz