Apache Kafka on AWS: Amazon MSK

Stop press. Webcast here: https://www.youtube.com/watch?v=HtU9pb18g5Q

I created a newer slide deck that shows many of the new features of Amazon Managed Streaming for Apache Kafka with quick live demos: cluster creation and security, custom configurations, integration with CloudWatch, resizing of broker storage, and more. A webcast will follow soon; check here: aws-kafka

Apache Kafka is one of the most popular open-source projects for building messaging and streaming applications. Kafka takes data in, makes it available to different applications, and thereby helps to eliminate daily batch jobs.

Kafka plays an important role in Change Data Capture (CDC) and in the world of microservices. This presentation gives an overview of the new Amazon Managed Streaming for Apache Kafka (Amazon MSK).

Based on knowledge gained from several on-prem Kafka implementation projects, I will cover the technical underpinnings first. You will learn about brokers, topics, and ZooKeeper. Then I will explain what makes Kafka special, analyse major pain points in on-prem Kafka projects, take a critical look at how Kafka differs from Kinesis, and explain why the cloud is the best way to use Kafka.

Frank Munz

June 27, 2019

Transcript

  1. 1.

    © 2019, Amazon Web Services, Inc. or its Affiliates.

    Apache Kafka on AWS: Amazon Managed Streaming for Apache Kafka
    Dr. Frank Munz, Senior Technical Evangelist, Amazon Web Services
    @frankmunz
  2. 2.

    About me
    • Software Architect / DevOps Engineer
    • Technical Evangelist @ AWS
    • Published an AWS book
    • Containers, serverless and a sprinkle of ML & big / fast data
    @frankmunz
  3. 3.

    Table of contents
    • Streaming Data
    • Modern Streaming Architectures
    • Apache Kafka
    • Amazon Managed Streaming for Apache Kafka (MSK)
    • Apache Kafka or Amazon Kinesis?
    • Q & A
  4. 4.

    Streaming Data
  5. 5.

    Streaming Data
    Web Clickstream, Application Logs, IoT Sensors
    [Wed Oct 11 14:32:52 2018] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
    Continuously generated, small size events, low latency requirements
  6. 6.

    Timely Decisions
    Data loses value quickly over time
    [Chart: value of data to decision-making over time (real time, seconds, minutes, hours, days, months). Categories: Preventive/Predictive, Actionable, Reactive, Historical. Ranges from time-critical decisions to traditional “batch” business intelligence. Label: information half-life in decision-making.]
    Source: Perishable insights, Mike Gualtieri, Forrester
  7. 7.

    Less Surreal, Modern Architectures
  8. 8.

    How Kafka Started: LinkedIn
    Reduced Complexity, Decoupling
  9. 9.

    Better Decoupling: Microservices
    • Event Sourcing: time-ordered, processable events
    • CQRS: Separates read (query) from write (command) operations. Writes are event sourced.
    • Choreography (diagram labels: choreography, orchestration)
  10. 10.

    Kafka as Data or Event Store
    log.retention.hours = -1
    https://www.confluent.io/blog/publishing-apache-kafka-new-york-times/
  11. 11.

    Apache Kafka
  12. 12.

    Commit Log
    Topic A: message offsets 0 1 2 3 4 … n (old to new)
    Producer, Consumer A, Consumer B
    https://www.quora.com/Kafka-writes-every-message-to-broker-disk-Still-performance-wise-it-is-better-than-some-of-the-in-memory-message-storing-message-queues-Why-is-that
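
The commit log on this slide can be illustrated with a minimal in-memory sketch. This is not Kafka client code: the `CommitLog` and `Consumer` classes and the `event-N` payloads are hypothetical names chosen for illustration. The sketch only assumes what the slide shows, namely that a producer appends messages at increasing offsets and that each consumer tracks its own read position independently.

```python
class CommitLog:
    """Append-only log: every message gets the next integer offset."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Store a message and return the offset it was written at."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset, max_messages=10):
        """Return up to max_messages starting at offset; nothing is removed."""
        return self._messages[offset:offset + max_messages]


class Consumer:
    """Reads from a log and remembers its own offset."""

    def __init__(self, log, start_offset=0):
        self.log = log
        self.offset = start_offset

    def poll(self, max_messages=10):
        batch = self.log.read(self.offset, max_messages)
        self.offset += len(batch)  # advance only this consumer's position
        return batch


# The producer side: append five events to the log.
log = CommitLog()
for i in range(5):
    log.append(f"event-{i}")

# Two consumers read the same log at different positions.
consumer_a = Consumer(log)
consumer_b = Consumer(log, start_offset=3)
batch_a = consumer_a.poll(max_messages=2)
batch_b = consumer_b.poll()
print(batch_a, batch_b)
```

Because reading does not delete anything, Consumer A and Consumer B can process the same topic at their own pace, which is the property the slide's diagram points at.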
  13. 13.

    Partitioned, Replicated Commit Log
    [Diagram: a Producer writes to a cluster holding TopicA Partition1, TopicA Partition2 and TopicA Partition3, each with a replica; three ZooKeeper nodes hold State & Config.]
  14. 14.

    Challenges Operating Apache Kafka
    • Difficult to set up, configure and operate
    • Hard to achieve high availability
    • Tricky to scale
    • AWS integrations
    • No console, no visible metrics
    • Operational experience
  15. 15.

    How to run Apache Kafka on AWS?
    • Self managed on EC2
    • Amazon Managed Streaming for Kafka (this talk!)
    • On top of Kubernetes, e.g. as K8s operator
  16. 16.

    Amazon Managed Streaming for Apache Kafka (MSK)
  17. 17.

    Apache ZooKeeper (ZK)?
    • ZooKeeper runs under the hood
    • ZK is set up highly available
    • No additional cost
  18. 19.

    Security
    Data is always encrypted at rest and can be encrypted in transit
  19. 20.

    Cluster Wide Storage Scaling
    You can increase storage after creation but not decrease it
    aws kafka update-broker-storage --cluster-arn ClusterArn --current-version Current-Cluster-Version --target-broker-ebs-volume-info '{"KafkaBrokerNodeId": "All", "VolumeSizeGB": Target-Volume-in-GiB}'
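
As a hedged sketch, the command on this slide can be assembled programmatically so that the JSON argument is always well formed. The helper name `update_broker_storage_command`, the `current_size_gib` parameter, and the ARN and version strings below are placeholders invented for illustration; only the CLI flags and the JSON keys come from the slide. The helper also encodes the slide's rule that storage can be increased but not decreased. It builds the argument list only and does not call AWS.

```python
import json
import shlex


def update_broker_storage_command(cluster_arn, current_version,
                                  current_size_gib, target_size_gib):
    """Build the argument list for `aws kafka update-broker-storage`."""
    # Storage can only grow, so reject targets that do not increase it.
    if target_size_gib <= current_size_gib:
        raise ValueError("target size must be larger than the current size")
    volume_info = json.dumps(
        {"KafkaBrokerNodeId": "All", "VolumeSizeGB": target_size_gib}
    )
    return [
        "aws", "kafka", "update-broker-storage",
        "--cluster-arn", cluster_arn,
        "--current-version", current_version,
        "--target-broker-ebs-volume-info", volume_info,
    ]


# Placeholder values, not a real cluster.
cmd = update_broker_storage_command(
    cluster_arn="arn:aws:kafka:us-east-1:123456789012:cluster/demo/example-id",
    current_version="EXAMPLEVERSION",
    current_size_gib=1000,
    target_size_gib=1100,
)
print(shlex.join(cmd))  # shell-quoted string, ready to paste into a terminal
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it, assuming the AWS CLI is installed and credentials are configured.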
  20. 21.

    CloudFormation Support for MSK
    https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-msk-cluster.html
  21. 22.

    CloudWatch Integration
    MSK monitoring levels: DEFAULT, PER_BROKER, or PER_TOPIC_PER_BROKER
    https://docs.aws.amazon.com/msk/latest/developerguide/monitoring.html
  22. 23.

    Custom Configuration Option
    Default configuration for brokers, topics, and Apache ZooKeeper nodes
    You can create custom configurations and use them for cluster creation
  23. 24.

    MSK Pricing
    On-demand, hourly pricing for broker and storage, prorated to the second:
    • Broker (kafka.m5.large): $0.21/hr
    • Storage: $0.10 per GB-month
    You don’t pay for the number of topics or replication traffic or ZK.
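
To make the pricing model concrete, here is a rough worked estimate using the two prices quoted on this slide. The cluster size (3 brokers with 1000 GB each) and the 730-hour month are assumptions made for the example, and the function name `monthly_cost` is hypothetical. Prices change over time and by region, so treat this as arithmetic on the slide's 2019 figures, not as a current quote.

```python
BROKER_PRICE_PER_HOUR = 0.21       # kafka.m5.large, from the slide
STORAGE_PRICE_PER_GB_MONTH = 0.10  # from the slide
HOURS_PER_MONTH = 730              # assumption: average month length


def monthly_cost(brokers, storage_gb_per_broker, hours=HOURS_PER_MONTH):
    """Estimate the monthly cost in USD for brokers plus provisioned storage."""
    broker_cost = brokers * hours * BROKER_PRICE_PER_HOUR
    storage_cost = brokers * storage_gb_per_broker * STORAGE_PRICE_PER_GB_MONTH
    return round(broker_cost + storage_cost, 2)


# Assumed example cluster: 3 brokers, 1000 GB of storage each.
# Brokers: 3 * 730 * 0.21 = 459.90; storage: 3 * 1000 * 0.10 = 300.00
estimate = monthly_cost(brokers=3, storage_gb_per_broker=1000)
print(f"${estimate:.2f} per month")
```

Topics, replication traffic and ZooKeeper nodes do not appear in the formula because, per the slide, they are not billed.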
  24. 25.

    Amazon Kinesis or Managed Streaming for Apache Kafka?
  25. 26.

    Amazon Kinesis
    Real-time data streaming and analytics
    Easily collect, process, and analyze streams in real time
    • Kinesis Video Streams: Capture, process, and store video streams for analytics
    • Kinesis Data Streams: Build custom applications that analyze data streams
    • Kinesis Data Firehose: Load data streams into AWS data stores
    • Kinesis Data Analytics: Analyze data streams with SQL or Java
    NEW!
  26. 27.

    Comparing Amazon Kinesis Data Streams to MSK
    [Diagram, Amazon Kinesis Data Streams: a stream with 3 shards (Shard 1, Shard 2, Shard 3); producers write at the newest end of each shard, records run from oldest data to newest data.]
    [Diagram, Amazon MSK: a topic with 3 partitions (Partition 1, Partition 2, Partition 3); producers write at the newest end of each partition, records run from oldest data to newest data.]
  27. 28.

    Amazon Kinesis Data Streams
    • AWS API
    • Throughput Provisioning Model
    • Seamless Scaling
    • Deep AWS Integration
    • Retention Time 1d (max 7d)
    Apache Kafka
    • Open-Source
    • Cluster Provisioning Model
    • Scaling not seamless to client
    • Retention 7d (max is unlimited)
    • Strong 3rd party tooling
  28. 29.

    Conclusion
    • Streaming is about actionable data
    • Apache Kafka is an open-source, versatile, and popular streaming platform
    • Managed Streaming for Kafka (MSK): We run Apache Kafka for you
    • Go build with MSK or Kinesis
  29. 30.

    Additional Resources
    bit.ly/aws-kafka
  30. 31.

    Thank you!
    frankmunz
    @frankmunz
    https://medium.com/@frank.munz (Blog)
    https://speakerdeck.com/fmunz (Slides)