4 Different Ways of Working with Kafka on Azure @ Global Azure 2021

Abstract:

Life doesn't happen in batch mode, which is why, for several years now, we have seen a very strong tendency towards stream processing across various industries. Companies need to rethink their existing data architectures, at least partially, in order to enable near real-time business cases. Besides traditional database systems and batch-driven tooling, many companies put Apache Kafka®, the de facto standard for robust and scalable event streaming, to good use.

This talk explores the following four different ways to run Kafka on Azure:

* Kafka on HDInsight (open-source core Kafka)
* Event Hubs (Microsoft's very own "Kafka look-alike")
* Confluent Cloud (a vendor-backed Kafka distribution)
* Kafka in Azure Kubernetes Service

You will walk away with a better understanding of the most important benefits, drawbacks and implications of each of the discussed alternatives.

Speaker Bio:

Hans-Peter (@hpgrahsl) is a technical trainer at NETCONOMY. As an independent engineer and consultant, he helps customers build cloud-based or on-premises data architectures using modern technology stacks and NoSQL data stores. He is also an associate lecturer for Software Engineering at CAMPUS 02 and speaks at tech-related and developer conferences. For his code contributions, conference talks and blog posts at the intersection of the Apache Kafka and MongoDB ecosystems, Hans-Peter received the Confluent Community Catalyst award twice and became one of the founding members of the MongoDB Champions Program.

Event Page: https://globalazure.at/sessions/kafka/

Recording: https://www.youtube.com/watch?v=4AZrWmkRixE

Hans-Peter Grahsl

April 16, 2021

Transcript

  1. FOUR
    Different Ways
    of Working with
    Kafka on Azure
    @hpgrahsl | @Azure #GlobalAzure, 16th April 2021, Linz - Austria

  6. Diminishing Value of Data

  10. Hans-Peter Grahsl
    • based in Graz, Austria
    • technical trainer at NETCONOMY
    • independent engineer & consultant
    • Confluent Community Catalyst
    • MongoDB Champion

  11. Stream Processing

  12. "... data processing
    that is designed with
    infinite data sets
    in mind."
    — Tyler Akidau

  13. • messaging
    • integration
    • processing
    plus storage

  18. central
    nervous system

  19. Kafka with
    Azure HDInsight

  21. HDInsight Services "Family"
    • large-scale parallel batch processing
    • general purpose data warehousing
    • stream processing for IoT
    • data science & machine learning

  24. HDInsight
    Apache Kafka®
    • broker + zookeeper nodes
    • managed disks / storage
    • flexible provisioning
    • 99.9% SLA uptime

  26. ? Client Access ?
    YES:
    • when run in same VNet
    • with VNet peering + IP advertising
    • from on-premises with VPN gateway
    • by using Kafka REST proxy

  27. Apache Kafka® HDInsight
    • main benefits:

    easy provisioning with flexible pricing

    open-source Kafka components only

    supported by Microsoft SLAs

  28. Apache Kafka® HDInsight
    • main drawbacks:

    outdated version (Kafka 2.1.1)

    only "core" Kafka components

    no external broker access by default

  29. Azure
    Event Hubs
    for Kafka

  30. Azure Event Hubs
    • fully-managed PaaS
    • distributed event ingestion service
    • supports auto-scaling capabilities
    • well-integrated with complementary services

  31. The Big Picture

  32. Look-alikes
    "Conceptually, Kafka and Event Hubs are very similar:
    they're both partitioned logs built for streaming data,
    whereby the client controls which part of the retained log it
    wants to read."

  33. Event Hubs for Kafka
    • overlay on top of Event Hubs
    • protocol compatible with Kafka 1.0+
    • transparent re-use (code + tools)
    • migration benefits in both directions

  34. same same
    but different

  35. The Virtual Promise...
    "Update the connection string in configurations to point to
    the Kafka endpoint exposed by your event hub instead of
    pointing to your Kafka cluster. Then, you can start streaming
    events from your applications that use the Kafka protocol
    into Event Hubs."
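
The configuration change that the quote on slide 35 describes might look roughly like the following. This is a minimal sketch, not taken from the talk: the property keys are librdkafka-style client settings, the helper name `event_hubs_kafka_config` is hypothetical, and the namespace and connection string in the usage line are placeholders.

```python
def event_hubs_kafka_config(namespace: str, connection_string: str) -> dict:
    """Build Kafka client settings that point at an Event Hubs namespace.

    Hypothetical helper for illustration; pass the result to a Kafka client
    that accepts librdkafka-style configuration keys.
    """
    return {
        # Event Hubs exposes its Kafka endpoint on port 9093 of the namespace host.
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        # The endpoint requires TLS plus SASL PLAIN authentication.
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        # The literal user name "$ConnectionString" signals that the password
        # field carries the namespace connection string.
        "sasl.username": "$ConnectionString",
        "sasl.password": connection_string,
    }


# Placeholder values; substitute a real namespace and connection string.
config = event_hubs_kafka_config("my-namespace", "<event-hubs-connection-string>")
```

Only the connection settings change; producer and consumer code stays the same, which is the "transparent re-use" the previous slides refer to.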

  36. The devil is in the details

  37. Unsupported Kafka Features
    • idempotent producers & transactions
    • compression of messages
    • size-based retention or log compaction
    • HTTP access via Kafka REST proxy
    • Kafka Streams & ksqlDB connections

  38. Customer Feedback
    • 10 hubs (=topics) per namespace
      https://bit.ly/3dvQCA1
    • 1 MB message size limit
      https://bit.ly/3sQDlIN
    • no Kafka Streams / ksqlDB connections
      https://bit.ly/3mi4hyu
      https://bit.ly/39Huu4s

  39. Event Hubs for Kafka
    • main benefits:

    hybrid messaging scenarios OOTB

    auto-inflate for elastic scaling

    "Azure-native & Kafka-like" experience

  40. Event Hubs for Kafka
    • main drawbacks:

    fundamental Kafka (protocol) features missing

    selected quotas & limits ➜ show-stoppers ?

    Kafka Streams / ksqlDB clients unsupported
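
Before migrating a client, it can help to check its configuration against the features the talk lists as unsupported (slide 37). The sketch below is an illustration only: the function name `unsupported_settings` is hypothetical, the checks cover just the client-side settings for idempotence, transactions and compression, and the list reflects the state described in this April 2021 talk, so the current service may behave differently.

```python
def unsupported_settings(config: dict) -> list:
    """Return the features in a Kafka client config that the talk lists as
    unsupported by Event Hubs for Kafka (hypothetical helper)."""
    findings = []
    # Idempotent producers are enabled through enable.idempotence.
    if str(config.get("enable.idempotence", "false")).lower() == "true":
        findings.append("idempotent producer")
    # A transactional.id turns on the transactional producer API.
    if "transactional.id" in config:
        findings.append("transactions")
    # Any compression.type other than "none" compresses message batches.
    if str(config.get("compression.type", "none")).lower() != "none":
        findings.append("message compression")
    return findings


print(unsupported_settings({"enable.idempotence": True, "compression.type": "gzip"}))
```

An empty list means none of the three checked settings is present; it says nothing about quotas, Kafka Streams or ksqlDB usage, which need a separate review.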

  41. Confluent Cloud
    on Azure

  42. Confluent Cloud
    • most complete and versatile service
    • cloud-native with elastic scalability
    • ready for hybrid & multi-cloud

  44. Confluent Cloud
    hosts fully-managed:
    • Kafka Connect
    • 100+ Connectors
    • ksqlDB
    • Schema Registry

  45. Tiered Storage
    • currently unique to Confluent Cloud
    • infinite data growth
    • retention time unlimited
    • BUT NO Azure Blob Storage yet

  46. provisioning options

  48. Confluent Cloud on Azure
    • main benefits:

    fully-managed Kafka by its original creators

    ready for hybrid- / multi-cloud

    widest & smoothest ecosystem integration

  49. Confluent Cloud on Azure
    • main drawbacks:

    compare pricing ➜ not cheap

    underlying infra not customizable

    higher degree of vendor dependence

  50. Kafka on
    Kubernetes

  51. Kubernetes
    • open-source container orchestration
    • deploying / managing / scaling
    • CNCF graduated project

  53. AKS
    Azure Kubernetes Service

  54. remaining
    challenges:
    Network
    Storage
    Security

  58. • Operators (cluster / topic / user)
    • Kafka Connect + managed Connectors
    • replication with MirrorMaker
    • HTTP Bridge for Kafka
    • Cruise Control cluster balancing
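
The operator approach on slide 58 (presumably Strimzi, which the following slides name) works by declaring the cluster as a Kubernetes custom resource. The sketch below builds such a resource as a Python dictionary and prints it as JSON, which `kubectl apply -f` also accepts. It is an assumption-laden illustration: the field layout follows the Strimzi `Kafka` resource in API version `kafka.strimzi.io/v1beta2` as understood at the time of the talk, the function name and cluster name are hypothetical, and ephemeral storage is chosen only to keep the example short. Check the layout against the Strimzi version actually installed.

```python
import json


def kafka_cluster_manifest(name: str, brokers: int = 3) -> dict:
    """Build a minimal Strimzi-style Kafka custom resource (illustrative only)."""
    return {
        "apiVersion": "kafka.strimzi.io/v1beta2",
        "kind": "Kafka",
        "metadata": {"name": name},
        "spec": {
            "kafka": {
                "replicas": brokers,
                # One plain-text listener reachable only inside the cluster.
                "listeners": [
                    {"name": "plain", "port": 9092, "type": "internal", "tls": False}
                ],
                # Ephemeral storage loses data on pod restart; demo use only.
                "storage": {"type": "ephemeral"},
            },
            "zookeeper": {"replicas": 3, "storage": {"type": "ephemeral"}},
            # Empty objects enable the topic and user operators with defaults.
            "entityOperator": {"topicOperator": {}, "userOperator": {}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(kafka_cluster_manifest("demo-cluster"), indent=2))
```

The network, storage and security challenges named on slide 54 map onto the `listeners`, `storage` and authentication settings of this resource, which is where a production setup would need the most attention.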

  59. Kafka on AKS with Strimzi
    • main benefits:

    k8s-native experience with built-in security

    tweakable / customizable in various ways

    ease of use for "non-ops-savvy folks" ➜ ME

  60. Kafka on AKS with Strimzi
    • main drawbacks:

    Kafka is OUR OWN responsibility

    k8s knowledge despite "operator magic"

    no Microsoft support offering

  61. don't just roll the dice...

  62. dig deeper & navigate further!

  63. Thanks!
    Q & A
    http://bit.ly/kafka-ga21