4 Different Ways of Working with Kafka on Azure @ Global Azure 2021

Abstract:

Life doesn't happen in batch mode, which is why, for several years now, we have seen a strong trend towards stream processing across various industries. Companies need to, or at least partially have to, rethink their existing data architectures in order to enable near real-time business cases. Besides traditional database systems and batch-driven tooling, many companies put Apache Kafka®, the de facto standard for robust and scalable event streaming, to good use.

This talk explores the following four different ways to run Kafka on Azure:

* Kafka on HDInsight (open-source core Kafka)
* Event Hubs (Microsoft's very own "Kafka look-alike")
* Confluent Cloud (a vendor-backed Kafka distribution)
* Kafka in Azure Kubernetes Service

You will walk away with a better understanding of the most important benefits, drawbacks and implications of each of the discussed alternatives.

Speaker Bio:

Hans-Peter (@hpgrahsl) is a technical trainer at NETCONOMY. As an independent engineer and consultant, he helps customers build cloud-based or on-premises data architectures using modern technology stacks and NoSQL data stores. He is also an associate lecturer for Software Engineering at CAMPUS 02 and speaks at tech and developer conferences. For his code contributions, conference talks and blog posts at the intersection of the Apache Kafka and MongoDB ecosystems, Hans-Peter received the Confluent Community Catalyst award twice and became one of the founding members of the MongoDB Champions Program.

Event Page: https://globalazure.at/sessions/kafka/

Recording: https://www.youtube.com/watch?v=4AZrWmkRixE

Hans-Peter Grahsl

April 16, 2021

Transcript

  1. FOUR Different Ways of Working with Kafka on Azure @hpgrahsl | @Azure #GlobalAzure, 16th April 2021, Linz - Austria
  6. Diminishing Value of Data
  10. Hans-Peter Grahsl • based in Graz, Austria • technical trainer at NETCONOMY • independent engineer & consultant • Confluent Community Catalyst • MongoDB Champion
  11. Stream Processing
  12. "... data processing that is designed with infinite data sets in mind." (Tyler Akidau)
  13. • messaging • integration • processing plus storage
  18. central nervous system
  19. Kafka with Azure HDInsight
  21. HDInsight Services "Family" • large-scale parallel batch processing • general purpose data warehousing • stream processing for IoT • data science & machine learning
  24. HDInsight Apache Kafka® • broker + zookeeper nodes • managed disks / storage • flexible provisioning • 99.9% SLA uptime
  25. ? Client Access ?
  26. ? Client Access ? YES: ! when run in same VNet ! with VNet peering + IP advertising ! from on-premises with VPN gateway ! by using Kafka REST proxy
  27. Apache Kafka® HDInsight • main benefits: ✅ easy provisioning with flexible pricing ✅ open-source Kafka components only ✅ supported by Microsoft SLAs
  28. Apache Kafka® HDInsight • main drawbacks: ⛔ outdated version (Kafka 2.1.1) ⛔ only "core" Kafka components ⛔ per default no external broker access
  29. Azure Event Hubs for Kafka
  30. Azure Event Hubs • fully-managed PaaS • distributed event ingestion service • supports auto-scaling capabilities • well-integrated with complementary services
  31. The Big Picture
  32. Look-alikes "Conceptually, Kafka and Event Hubs are very similar: they're both partitioned logs built for streaming data, whereby the client controls which part of the retained log it wants to read."
  33. Event Hubs for Kafka • overlay on top of Event Hubs • protocol compatible with Kafka 1.0+ • transparent re-use (code + tools) • migration benefits in both ways
  34. same same but different
  35. The Virtual Promise... "Update the connection string in configurations to point to the Kafka endpoint exposed by your event hub instead of pointing to your Kafka cluster. Then, you can start streaming events from your applications that use the Kafka protocol into Event Hubs."
  36. The devil is in the details
  37. Unsupported Kafka Features ! idempotent producers & transactions ! compression of messages ! size-based retention or log compaction ! HTTP access via Kafka REST proxy ! Kafka Streams & ksqlDB connections
  38. Customer Feedback ! 10 hubs (=topics) per namespace https://bit.ly/3dvQCA1 ! 1 MB message size limit https://bit.ly/3sQDlIN ! no Kafka Streams / ksqlDB connections https://bit.ly/3mi4hyu https://bit.ly/39Huu4s
  39. Event Hubs for Kafka • main benefits: ✅ hybrid messaging scenarios OOTB ✅ auto-inflate for elastic scaling ✅ "Azure-native & Kafka-like" experience
  40. Event Hubs for Kafka • main drawbacks: ⛔ fundamental Kafka (protocol) features missing ⛔ selected quotas & limits ➜ show-stoppers ? ⛔ Kafka Streams / ksqlDB clients unsupported
  41. Confluent Cloud on Azure
  42. Confluent Cloud • most complete and versatile service • cloud-native with elastic scalability • ready for hybrid & multi-cloud
  44. Confluent Cloud hosts fully-managed: • Kafka Connect • 100+ Connectors • ksqlDB • Schema Registry
  45. Tiered Storage • currently unique to Confluent Cloud • infinite data growth • retention time unlimited ! BUT NO Azure Blob Storage yet
  46. provisioning options
  48. Confluent Cloud on Azure • main benefits: ✅ fully-managed Kafka by its original creators ✅ ready for hybrid- / multi-cloud ✅ widest & smoothest ecosystem integration
  49. Confluent Cloud on Azure • main drawbacks: ⛔ compare pricing ➜ not cheap ⛔ underlying infra not customizable ⛔ higher degree of vendor dependence
  50. Kafka on Kubernetes
  51. Kubernetes • open-source container orchestration • deploying / managing / scaling • CNCF graduate project
  53. AKS (Azure Kubernetes Service)
  54. remaining challenges: Network • Storage • Security
  58. • Operators (cluster / topic / user) • Kafka Connect + managed Connectors • replication with MirrorMaker • HTTP Bridge for Kafka • Cruise Control cluster balancing
  59. Kafka on AKS with Strimzi • main benefits: ✅ k8s-native experience with built-in security ✅ tweakable / customizable in various ways ✅ ease of use for "non-ops-savvy folks" ➜ ME
  60. Kafka on AKS with Strimzi • main drawbacks: ⛔ Kafka is OUR OWN responsibility ⛔ k8s knowledge despite "operator magic" ⛔ no Microsoft support offering
  61. don't just roll the dice...
  62. dig deeper & navigate further!
  63. Thanks! Q & A http://bit.ly/kafka-ga21
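
Slide 35 quotes the promise that moving a Kafka client to Event Hubs only takes a configuration change, and slide 37 lists Kafka features that were unsupported at the time of the talk. The following Python sketch illustrates both points under stated assumptions: it uses Java-client-style property names, the helper names `event_hubs_kafka_config` and `flag_unsupported` are hypothetical, and `demo-ns` plus the connection string are placeholders. The flagged settings mirror only three items from the slide's list (idempotence, transactions, compression) and say nothing about the service's current feature set.

```python
from typing import Dict, List


def event_hubs_kafka_config(namespace: str, connection_string: str) -> Dict[str, str]:
    """Build Kafka client properties that point at an Event Hubs namespace.

    The Kafka endpoint of Event Hubs listens on port 9093 and authenticates
    via SASL/PLAIN, with the literal user name "$ConnectionString" and the
    namespace connection string as the password.
    """
    return {
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        "sasl.jaas.config": (
            "org.apache.kafka.common.security.plain.PlainLoginModule required "
            f'username="$ConnectionString" password="{connection_string}";'
        ),
    }


def flag_unsupported(props: Dict[str, str]) -> List[str]:
    """Return explicitly set properties that rely on features from slide 37.

    Assumption: only settings present in `props` are checked; client-side
    defaults are not considered.
    """
    flagged = []
    if props.get("enable.idempotence", "false").lower() == "true":
        flagged.append("enable.idempotence")  # idempotent producers
    if "transactional.id" in props:
        flagged.append("transactional.id")  # transactions
    if props.get("compression.type", "none").lower() != "none":
        flagged.append("compression.type")  # compression of messages
    return flagged


if __name__ == "__main__":
    # Placeholder values; substitute a real namespace and connection string.
    config = event_hubs_kafka_config("demo-ns", "Endpoint=sb://demo-ns.servicebus.windows.net/;...")
    print(config["bootstrap.servers"])
    print(flag_unsupported({"enable.idempotence": "true", "compression.type": "gzip"}))
```

Running such a check against an existing producer configuration before switching endpoints is one way to surface the "devil in the details" from slide 36 early.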