Frank Munz
February 26, 2019

Architectures with Apache Kafka in AWS / Managed Streaming for Apache Kafka (MSK)

Apache Kafka is one of the most popular open-source projects for building messaging and streaming applications. Kafka takes data, makes it available to different applications, and therefore helps to eliminate daily batch jobs.

Kafka plays an important role in Change Data Capture (CDC) and in the world of microservices. This presentation gives an overview of the new Amazon Managed Streaming for Kafka (Amazon MSK).

Based on knowledge gained from several on-prem Kafka implementation projects, I will cover the technical underpinnings first. You will learn about brokers, topics, and Zookeeper. Then I will explain what makes Kafka special, analyse major pain points in on-prem Kafka projects, critically analyse how Kafka differs from Kinesis, and show why the cloud is the best way to use Kafka.

Transcript

  1. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

    SUMMIT. Dr Frank Munz, Senior Technical Evangelist, Amazon Web Services. BERSUM19-78. Designing Less Surreal Architectures with Apache Kafka in AWS. @frankmunz
  2. Introductory - 200

    “These sessions provide an overview of AWS services and features, and they assume that attendees are new to the topic. These sessions highlight basic use cases, features, functions, and benefits.”
  3. Agenda

    • Streaming Data
    • Modern Streaming Architectures
    • Apache Kafka
    • Amazon Managed Streaming for Kafka (MSK)
    • Apache Kafka or Amazon Kinesis?
    • Q & A
  5. Streaming Data

    Web Clickstream, Application Logs, IoT Sensors
    [Wed Oct 11 14:32:52 2018] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
    Continuously generated, small size events, low latency requirements
  6. Transform and Process Continuously

    Streaming:
    • Ingest video & data as it’s generated
    • Process data on the fly
    • Real-time analytics/ML, alerts, actions
  7. Timely Decisions

    Data loses value quickly over time.
    [Chart: Information half-life in decision-making. Horizontal axis: Real time, Seconds, Minutes, Hours, Days, Months. Vertical axis: Value of data to decision-making. Regions: Preventive/Predictive, Actionable, Reactive, Historical. Annotations: Time critical decisions; Traditional “batch” business intelligence.]
    Source: Perishable insights, Mike Gualtieri, Forrester
  9. From Batch to Streaming Analytics

    https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
  10. How Kafka Started: LinkedIn

    Reduced Complexity, Decoupling
  11. Better Decoupled Microservices

    • Event Sourcing: time-ordered, processable events
    • CQRS: Separates read (query) from write (command) operations. Writes are event sourced.
    • Choreography [Diagram labels: choreography, orchestration]
  12. Kafka as Data or Event Store

    log.retention.hours = -1
    https://www.confluent.io/blog/publishing-apache-kafka-new-york-times/
  14. Commit Log

    [Diagram: Topic A with message offsets 0, 1, 2, 3, 4, … n, from old to new; Producer, Consumer A, Consumer B]
    https://www.quora.com/Kafka-writes-every-message-to-broker-disk-Still-performance-wise-it-is-better-than-some-of-the-in-memory-message-storing-message-queues-Why-is-that
  15. Partitioned, Replicated Commit Log

    [Diagram: Producer; Cluster holding TopicA Partition1, TopicA Partition2, and TopicA Partition3, each with a Replica; three Zookeeper nodes for State & Config]
  16. Challenges Operating Apache Kafka

    • Difficult to set up, configure and operate
    • Hard to achieve high availability
    • Tricky to scale
    • AWS integrations
    • No console, no visible metrics
  17. How to run Apache Kafka on AWS?

    • Self-managed on EC2
    • Amazon Managed Streaming for Kafka (this talk!)
    • On top of Kubernetes, e.g. as K8s operator
  19. Why Amazon MSK?
  20. Apache Zookeeper (ZK)?

    • Zookeeper runs under the hood
    • ZK is set up highly available
    • No additional cost
  21. Amazon Managed Streaming for Kafka

    [Diagram: MSK VPC spanning Availability Zone 1, Availability Zone 2, Availability Zone 2]
    • Control Plane: Zookeeper Instances
    • Data Plane: Broker Instances
    • Use Zookeeper Connect String for clients: 172.31.4.240:2181,172.31.44.125:2181,172.31.20.136:2181
    What you do: What we do…
  22. Plans for MSK

    Already in MSK Preview:
    • Apache Kafka 2.1 (or 1.1.1)
    • 3 Regions: N. Virginia, Ohio, Ireland
    • Console and API provisioning
    • M5 Broker with GP2 Storage
    • Amazon CloudWatch, VPC, IAM and KMS
    • Auto-healing
    • Patches applied automatically

    Planned for MSK:
    • Global Availability
    • Service level agreement (SLA)
    • Version upgrades
    • Scale a cluster horizontally & vertically
    • Supports Apache Kafka partition reassignment tooling
    • Define custom cluster configurations
    • Auto scale storage
    • Deeper AWS integration: Tagging, AWS CloudTrail, AWS CloudFormation
  23. Amazon MSK Defaults

    Config                                     Default Setting
    offsets.topic.replication.factor           3
    transaction.state.log.replication.factor   3
    transaction.state.log.min.isr              2
    auto.create.topics.enable                  False
    default.replication.factor                 3
    min.insync.replicas                        2
    unclean.leader.election.enable             True
    auto.leader.rebalance.enable               True
    authorizer.class.name                      kafka.security.auth.SimpleAclAuthorizer
    group.initial.rebalance.delay.ms           3000
    log.retention.hours                        168
  24. MSK Pricing

    On-demand, hourly pricing for broker and storage, prorated to the second:
    • Broker kafka.m5.large: $0.21/hr
    • Storage: $0.10 per GB-month
    You don’t pay for the number of topics or replication traffic or ZK.
  25. Comparing Amazon Kinesis Data Streams to MSK

    [Diagram, Amazon Kinesis Data Streams: stream with 3 shards (Shard 1, Shard 2, Shard 3); writes from producers; records numbered from oldest data to newest data]
    [Diagram, Amazon MSK: topic with 3 partitions (Partition 1, Partition 2, Partition 3); writes from producers; records numbered from oldest data to newest data]
  26. Amazon Kinesis Data Streams

    • AWS API
    • Throughput Provisioning Model
    • Seamless Scaling
    • Deep AWS Integration
    • Retention Time 1d (max 7d)

    Apache Kafka

    • Open-Source
    • Cluster Provisioning Model
    • Scaling not seamless to client
    • Retention 7d (max is unlimited)
    • Strong 3rd Party Tooling
  27. Conclusion

    • Streaming is about actionable data
    • Apache Kafka is an open-source, versatile, and popular streaming platform
    • Managed Streaming for Kafka (MSK): We run Apache Kafka for you
    • Go build with MSK or Kinesis
  28. Thank you!

    @frankmunz
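
Slide 11 describes event sourcing as time-ordered, processable events and CQRS as separating reads from writes. The following is a minimal in-memory sketch of that idea, not code from the talk. The account example and all names (`Event`, `handle_command`, `project`, `query_balance`, `rebuild`) are hypothetical, and a plain Python list stands in for a Kafka topic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """An immutable fact; positive amount = deposit, negative = withdrawal."""
    account: str
    amount: int


event_log: list = []   # write side: time-ordered events (stand-in for a topic)
balances: dict = {}    # read side: a view derived from the events


def project(event: Event) -> None:
    """Update the read model from a single event."""
    balances[event.account] = balances.get(event.account, 0) + event.amount


def handle_command(account: str, amount: int) -> None:
    """Command (write) path: record what happened, then update the view."""
    event = Event(account, amount)
    event_log.append(event)
    project(event)


def query_balance(account: str) -> int:
    """Query (read) path: served from the view, never from the log."""
    return balances.get(account, 0)


def rebuild(events: list) -> dict:
    """Replay all events from the start to reconstruct the view."""
    view: dict = {}
    for event in events:
        view[event.account] = view.get(event.account, 0) + event.amount
    return view


handle_command("alice", 100)
handle_command("alice", -30)
handle_command("bob", 5)
```

Because the log is the source of truth, `rebuild(event_log)` yields the same view as the incrementally maintained `balances`, which is why an unlimited retention setting such as the one on slide 12 matters when Kafka is used as an event store.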
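
The commit log on slide 14 (one producer, two consumers, messages addressed by offset) can be sketched in a few lines. This is an illustrative in-memory model under simplifying assumptions (a single partition, no persistence, no network), and the class names `CommitLog` and `Consumer` are made up for this example rather than taken from any Kafka client library.

```python
from dataclasses import dataclass, field


@dataclass
class CommitLog:
    """Append-only list of messages; the list index plays the role of the offset."""
    messages: list = field(default_factory=list)

    def append(self, message: str) -> int:
        """Producer side: add a message at the end and return its offset."""
        self.messages.append(message)
        return len(self.messages) - 1

    def read(self, offset: int, max_messages: int = 10) -> list:
        """Consumer side: read from an offset; reading removes nothing."""
        return self.messages[offset:offset + max_messages]


@dataclass
class Consumer:
    """Each consumer tracks its own position, so readers do not affect each other."""
    name: str
    offset: int = 0

    def poll(self, log: CommitLog, max_messages: int = 10) -> list:
        batch = log.read(self.offset, max_messages)
        self.offset += len(batch)
        return batch


topic_a = CommitLog()
for event in ["click", "view", "purchase"]:
    topic_a.append(event)

consumer_a = Consumer("A")
consumer_b = Consumer("B")

first_for_a = consumer_a.poll(topic_a, max_messages=2)  # A reads offsets 0 and 1
all_for_b = consumer_b.poll(topic_a)                    # B reads 0 to 2 on its own
rest_for_a = consumer_a.poll(topic_a)                   # A continues at offset 2
```

The design point is that consuming is just moving a per-consumer offset. Messages stay in the log until retention removes them, so Consumer B can read everything even after Consumer A has moved on.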
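
Slides 15 and 25 show a topic split into three partitions with replicas. A rough sketch of how a keyed message could be mapped to a partition and how replicas could be spread across brokers follows. It rests on assumptions: CRC32 is used only as a stand-in hash (Kafka's default partitioner uses a different hash function, murmur2), the round-robin replica placement is a simplification, and the broker ids and function names are hypothetical.

```python
import zlib

NUM_PARTITIONS = 3        # "Topic with 3 partitions" as on slide 25
BROKERS = [0, 1, 2]       # hypothetical broker ids
REPLICATION_FACTOR = 3    # matches default.replication.factor on slide 23


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition; the same key always lands in the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def replicas_for(partition: int, brokers=BROKERS, rf: int = REPLICATION_FACTOR) -> list:
    """Place rf replicas on consecutive brokers, starting at the partition's index."""
    return [brokers[(partition + i) % len(brokers)] for i in range(rf)]


# All events for one sensor go to one partition, which preserves their order.
p = partition_for("sensor-1")
placement = replicas_for(p)
```

Ordering is only guaranteed within a partition, which is why the choice of key matters. The same reasoning applies to partition keys and shards in Kinesis Data Streams.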
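
Slide 23 lists `default.replication.factor` 3 and `min.insync.replicas` 2. A small sketch of what these two defaults imply for writes follows, assuming the producer is configured with `acks=all` (a producer setting the deck does not mention). The function names are hypothetical.

```python
def write_accepted(in_sync_replicas: int, min_insync_replicas: int = 2) -> bool:
    """With acks=all, a write succeeds only if enough replicas are in sync."""
    return in_sync_replicas >= min_insync_replicas


def tolerated_failures(replication_factor: int = 3, min_insync_replicas: int = 2) -> int:
    """Replicas that can be lost while acks=all writes are still accepted."""
    return replication_factor - min_insync_replicas


# With the defaults from slide 23, one replica can be down without blocking writes.
one_down = write_accepted(in_sync_replicas=2)
two_down = write_accepted(in_sync_replicas=1)
```

With these defaults a partition keeps accepting `acks=all` writes while one of its three replicas is unavailable, and rejects them once only a single replica remains in sync.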
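
The prices on slide 24 lend themselves to a back-of-the-envelope estimate. The sketch below uses the two figures from the slide ($0.21 per hour for a kafka.m5.large broker, $0.10 per GB-month of storage). The 730 hours per month, the cluster size, and the function name `monthly_cost` are assumptions for illustration, and the prices reflect the 2019 preview rather than current pricing.

```python
BROKER_USD_PER_HOUR = 0.21        # kafka.m5.large, from slide 24
STORAGE_USD_PER_GB_MONTH = 0.10   # from slide 24
HOURS_PER_MONTH = 730             # assumption: average month length in hours


def monthly_cost(brokers: int, total_storage_gb: float,
                 hours: float = HOURS_PER_MONTH) -> float:
    """Estimate the monthly bill in USD for brokers plus provisioned storage."""
    broker_cost = brokers * BROKER_USD_PER_HOUR * hours
    storage_cost = total_storage_gb * STORAGE_USD_PER_GB_MONTH
    return round(broker_cost + storage_cost, 2)


# Hypothetical cluster: 3 brokers and 1000 GB of storage in total.
estimate = monthly_cost(brokers=3, total_storage_gb=1000)
```

For this hypothetical cluster the estimate comes to about $559.90 per month: $459.90 for the brokers plus $100.00 for storage. Topics, replication traffic, and Zookeeper add nothing, as the slide states.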