
Apache Kafka on AWS: Amazon MSK

Stop press. Webcast here: https://www.youtube.com/watch?v=HtU9pb18g5Q

I created a newer slide deck that shows many of the new features of Amazon Managed Streaming for Apache Kafka with quick live demos: cluster creation and security, custom configurations, integration with CloudWatch, resizing of broker storage, and much more. A webcast will follow soon; check here: aws-kafka

Apache Kafka is one of the most popular open-source projects for building messaging and streaming applications. Kafka takes data in, makes it available to different applications, and thereby helps to eliminate daily batch jobs.

Kafka plays an important role in Change Data Capture (CDC) and in the world of microservices. This presentation gives an overview of the new Amazon Managed Streaming for Apache Kafka (Amazon MSK).

Based on knowledge gained from several on-prem Kafka implementation projects, I will cover the technical underpinnings first. You will learn about brokers, topics, and ZooKeeper. Then I will explain what makes Kafka special, analyse major pain points in on-prem Kafka projects, critically analyse how Kafka differs from Kinesis, and explain why the cloud is the best way to use Kafka.

Frank Munz

June 27, 2019

Transcript

  1. © 2019, Amazon Web Services, Inc. or its Affiliates.
    Apache Kafka on AWS
    Amazon Managed Streaming for Apache Kafka
    Dr. Frank Munz
    Senior Technical Evangelist
    Amazon Web Services @frankmunz

  2. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    S U M M I T
    About me
    • Software Architect / DevOps Engineer
    • Technical Evangelist @ AWS
    • Published an AWS book
    • Containers, serverless and a sprinkle of ML & big / fast data
    @frankmunz

  3. Table of contents
    • Streaming Data
    • Modern Streaming Architectures
    • Apache Kafka
    • Amazon Managed Streaming for Apache Kafka (MSK)
    • Apache Kafka or Amazon Kinesis?
    • Q & A

  4. Streaming Data

  5. Streaming Data
    Sources: Web Clickstream, Application Logs, IoT Sensors
    Example application log entry:
    [Wed Oct 11 14:32:52 2018] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
    Continuously generated, small-size events, low latency requirements

  6. Timely Decisions
    Data loses value quickly over time
    [Figure: value of data to decision-making plotted over time (real time, seconds, minutes, hours, days, months). Stages: preventive/predictive, actionable, reactive, historical. Time-critical decisions sit at the real-time end, traditional “batch” business intelligence at the other. Caption: information half-life in decision-making.]
    Source: Perishable insights, Mike Gualtieri, Forrester

  7. Less Surreal, Modern Architectures

  8. How Kafka Started: LinkedIn
    Reduced Complexity
    Decoupling

  9. Better Decoupling: Microservices
    • Event Sourcing: time-ordered, processable events
    • CQRS: separates read (query) from write (command) operations. Writes are event sourced.
    • Choreography: choreography vs. orchestration
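
The event sourcing and CQRS ideas on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the `Event`, `EventStore`, and `project_balances` names and the bank-account example are assumptions made up for the sketch. The write side only appends time-ordered events; the read side builds its query model by replaying them.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """An immutable fact; all names here are hypothetical."""
    account: str
    kind: str      # "deposited" or "withdrawn"
    amount: int


@dataclass
class EventStore:
    """Write side: commands only ever append events, in time order."""
    events: list = field(default_factory=list)

    def append(self, event: Event) -> None:
        self.events.append(event)


def project_balances(events) -> dict:
    """Read side: build a query model by replaying the events in order."""
    balances = {}
    for e in events:
        sign = 1 if e.kind == "deposited" else -1
        balances[e.account] = balances.get(e.account, 0) + sign * e.amount
    return balances


store = EventStore()
store.append(Event("alice", "deposited", 100))
store.append(Event("alice", "withdrawn", 30))
store.append(Event("bob", "deposited", 50))
balances = project_balances(store.events)
```

Because the read model is derived purely from the event list, it can be thrown away and rebuilt at any time, which is what makes a durable log attractive as the backing store.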

  10. Kafka as Data or Event Store
    log.retention.hours = -1
    https://www.confluent.io/blog/publishing-apache-kafka-new-york-times/

  11. Apache Kafka

  12. Commit Log
    [Diagram: Topic A as a commit log with message offsets 0 1 2 3 4 … n, ordered from old to new. A producer writes to the log; Consumer A and Consumer B read from it.]
    https://www.quora.com/Kafka-writes-every-message-to-broker-disk-Still-performance-wise-it-is-better-than-some-of-the-in-memory-message-storing-message-queues-Why-is-that
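
A rough way to picture the commit log is an append-only list in which a message's offset is simply its position. The sketch below is an in-memory toy under that assumption, not how a Kafka broker is implemented; the `CommitLog` and `Consumer` classes are hypothetical names. It shows the property the diagram illustrates: reading does not remove messages, so each consumer keeps its own offset.

```python
class CommitLog:
    """Append-only log; a message's offset is its position in the list."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Producer side: add to the end and return the new offset."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset, max_messages=10):
        """Consumer side: read from an offset without removing anything."""
        return self._messages[offset:offset + max_messages]


class Consumer:
    """Each consumer tracks its own offset, independent of the others."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_messages=10):
        batch = self.log.read(self.offset, max_messages)
        self.offset += len(batch)
        return batch


topic_a = CommitLog()
for i in range(5):
    topic_a.append(f"msg-{i}")

consumer_a = Consumer(topic_a)
consumer_b = Consumer(topic_a)
first_a = consumer_a.poll(max_messages=3)   # A reads offsets 0..2
all_b = consumer_b.poll()                   # B reads offsets 0..4
rest_a = consumer_a.poll()                  # A resumes at offset 3
```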

  13. Partitioned, Replicated Commit Log
    [Diagram: a producer writes to a cluster that holds TopicA Partition1, Partition2, and Partition3, each with replicas. Three ZooKeeper nodes hold state and config.]
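
To make partitioning and replication more concrete, here is a small sketch under stated assumptions. It uses CRC32 to map a key to a partition, whereas Kafka's default partitioner uses a murmur2 hash, and it places replicas round-robin, which is a simplification of what a real cluster does. The broker names, the replication factor of 3, and both function names are hypothetical.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition (CRC32 here, not Kafka's murmur2)."""
    return zlib.crc32(key) % num_partitions


def assign_replicas(num_partitions: int, brokers: list, replication_factor: int) -> dict:
    """Round-robin placement; the first broker in each list acts as leader."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed number of brokers")
    assignment = {}
    for p in range(num_partitions):
        assignment[p] = [brokers[(p + r) % len(brokers)]
                         for r in range(replication_factor)]
    return assignment


brokers = ["broker-1", "broker-2", "broker-3"]
assignment = assign_replicas(num_partitions=3, brokers=brokers, replication_factor=3)
p1 = partition_for(b"user-42", 3)
p2 = partition_for(b"user-42", 3)   # same key, same partition
```

Messages with the same key always land in the same partition, which is what preserves per-key ordering while the topic as a whole is spread over several brokers.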

  14. Challenges Operating Apache Kafka
    • Difficult to set up, configure, and operate
    • Hard to achieve high availability
    • Tricky to scale
    • AWS integrations
    • No console, no visible metrics
    • Operational experience

  15. How to run Apache Kafka on AWS?
    • Self-managed on EC2
    • Amazon Managed Streaming for Kafka (this talk!)
    • On top of Kubernetes, e.g. as K8s operator

  16. Amazon Managed Streaming for Apache Kafka (MSK)

  17. Apache ZooKeeper (ZK)?
    ZooKeeper runs under the hood
    ZK is set up for high availability
    No additional cost

  18. Getting started with Amazon MSK is easy!

  19. Security
    Data is always encrypted at rest and can be encrypted in transit

  20. Cluster Wide Storage Scaling
    You can increase storage after creation but not decrease it
    aws kafka update-broker-storage \
        --cluster-arn ClusterArn \
        --current-version Current-Cluster-Version \
        --target-broker-ebs-volume-info '{"KafkaBrokerNodeId": "All", "VolumeSizeGB": Target-Volume-in-GiB}'
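
The same storage update could be prepared from Python. The sketch below only builds and checks the request arguments; `build_storage_update` and its `current_gib` parameter are hypothetical helpers, and the cluster ARN and version strings are the same placeholders the slide uses. It assumes the boto3 `kafka` client's `update_broker_storage` call, which takes the target volume info as a list, and it enforces the increase-only rule locally before anything is sent.

```python
def build_storage_update(cluster_arn: str, current_version: str,
                         current_gib: int, target_gib: int) -> dict:
    """Build the arguments for an update-broker-storage request.

    Raises ValueError unless the target is larger than the current size,
    because broker storage can be increased but not decreased.
    """
    if target_gib <= current_gib:
        raise ValueError("broker storage can only be increased")
    return {
        "ClusterArn": cluster_arn,
        "CurrentVersion": current_version,
        "TargetBrokerEBSVolumeInfo": [
            {"KafkaBrokerNodeId": "All", "VolumeSizeGB": target_gib}
        ],
    }


# Placeholder values, not a real cluster.
params = build_storage_update("ClusterArn", "Current-Cluster-Version",
                              current_gib=100, target_gib=200)
# With boto3 installed and credentials configured, the request could be sent as:
#   boto3.client("kafka").update_broker_storage(**params)
```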

  21. CloudFormation Support for MSK
    https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-msk-cluster.html

  22. CloudWatch Integration
    MSK monitoring levels: DEFAULT, PER_BROKER, or PER_TOPIC_PER_BROKER
    https://docs.aws.amazon.com/msk/latest/developerguide/monitoring.html

  23. Custom Configuration Option
    MSK provides a default configuration for brokers, topics, and Apache ZooKeeper nodes.
    You can create custom configurations and use them for cluster creation

  24. MSK Pricing
    On-demand, hourly pricing for broker and storage, prorated to the second:
    • Broker (kafka.m5.large): $0.21/hr
    • Storage: $0.10 per GB-month
    You don’t pay for the number of topics, replication traffic, or ZK.
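
As a worked example of the pricing model, the sketch below estimates a monthly bill from the two prices on the slide. The cluster shape (3 brokers with 1000 GB each) and the 730-hour month are assumptions chosen for illustration, and the prices are the 2019 figures from the slide, so treat the result as arithmetic practice rather than a quote.

```python
BROKER_USD_PER_HOUR = 0.21        # kafka.m5.large, from the slide
STORAGE_USD_PER_GB_MONTH = 0.10   # from the slide
HOURS_PER_MONTH = 730             # assumption: average month length


def monthly_cost(brokers: int, storage_gb_per_broker: int) -> float:
    """Estimate monthly cost in USD: broker hours plus provisioned storage."""
    broker_cost = brokers * BROKER_USD_PER_HOUR * HOURS_PER_MONTH
    storage_cost = brokers * storage_gb_per_broker * STORAGE_USD_PER_GB_MONTH
    return round(broker_cost + storage_cost, 2)


# Hypothetical cluster: 3 brokers, 1000 GB of storage each.
estimate = monthly_cost(brokers=3, storage_gb_per_broker=1000)
```

For this hypothetical cluster the broker hours come to 3 × 0.21 × 730 = 459.90 USD and storage to 3 × 1000 × 0.10 = 300.00 USD, about 759.90 USD per month in total. Topics, replication traffic, and ZooKeeper add nothing to that figure.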

  25. Amazon Kinesis or Managed Streaming for Apache Kafka?

  26. Amazon Kinesis
    Real-time data streaming and analytics
    Easily collect, process, and analyze streams in real time
    • Kinesis Video Streams: Capture, process, and store video streams for analytics
    • Kinesis Data Streams: Build custom applications that analyze data streams
    • Kinesis Data Firehose: Load data streams into AWS data stores
    • Kinesis Data Analytics: Analyze data streams with SQL or Java
    NEW!

  27. Comparing Amazon Kinesis Data Streams to MSK
    [Diagram, side by side: an Amazon Kinesis Data Streams stream with 3 shards (Shard 1, Shard 2, Shard 3) and an Amazon MSK topic with 3 partitions (Partition 1, Partition 2, Partition 3). In both, writes from producers go into numbered slots (0 to 5) that run from oldest data to newest data.]

  28. Amazon Kinesis Data Streams
    • AWS API
    • Throughput provisioning model
    • Seamless scaling
    • Deep AWS integration
    • Retention time 1d (max 7d)
    Apache Kafka
    • Open source
    • Cluster provisioning model
    • Scaling not seamless to client
    • Retention 7d (max is unlimited)
    • Strong 3rd party tooling

  29. Conclusion
    Streaming is about actionable data
    Apache Kafka is an open-source, versatile, and popular streaming platform
    Managed Streaming for Kafka (MSK): We run Apache Kafka for you
    Go build with MSK or Kinesis

  30. Additional Resources
    bit.ly/aws-kafka

  31. Thank you!
    frankmunz
    @frankmunz https://medium.com/@frank.munz (Blog)
    https://speakerdeck.com/fmunz (Slides)