Making Stream Processing the Right Way in the Cloud using Apache Kafka

Nov 13th, 2018: Apache Kafka® Meetup in Raleigh, NC

Ricardo Ferreira

November 13, 2018

Transcript

  1. 1 Making Stream Processing the Right Way in the Cloud using Apache Kafka®

    Raleigh Apache Kafka Meetup Nov 13th, 2018 | Ricardo Ferreira @riferrei
  2. 4 …and how his story will help us with Kafka in the Cloud.

    Bob Sheppard, 24 years
  3. 5 In the following events, Bob will be depicted in the story as a regular Software Engineer.
  4. 6 One day during a sprint planning…

    Product Owner: "Guys, I need an estimate of how long it would take to put Apache Kafka in the Cloud."
    Software Engineer: "Sure, Mr. Product Owner. Give me a day so I can come up with something solid to include in your plan."
    Product Owner: "And be as accurate as possible because DEV needs this for coding."
    "Hahaha"
  5. 7 Workflow of the Software Engineer evaluating the complexity…

    • Hello World with Apache Kafka on his laptop. Easy peasy!
    • Handling data with producers and consumers. Done!
    • Enough from this child thing: loading data from PROD
    • Updating my LinkedIn profile with a new Skill…
    • Sweet: Confluent has several Docker Images pre-built.
    • Build a Docker image from the Hello World and run in the Cloud
  6. 9 During the next day's meeting…

    Software Engineer: "Mr. Product Owner, I can easily put Apache Kafka in the Cloud in a couple of weeks…"
    Product Owner: "That is fantastic. Gonna write this down here with blood and fire, plus recording all…"
    "Hahaha"
  7. 10 …and throughout the two weeks, there were lots of interesting findings…

    • Zookeeper ports cannot be exposed?
    • Keeping data secure while in-transit?
    • Keeping data secure at-rest?
    • Plan ahead IP addresses to avoid overlapping?
    • Enabling remote SSH access?
    • Developer authentication should use BASIC AUTH or OAuth 2.0?
    • Partitions auto-rebalancing?
    • How do I prevent people from changing the infrastructure?
    • Who should own the infrastructure?
    • How do I manage the upgrades of the Apache Kafka binaries?
    • SOC-2 Type II
    • What if I have a hybrid center of gravity?
    • Avoid vendor lock-in or sell my soul to XYZ?
    • Scale in or scale out? And without downtime?
  8. 13 If you didn't get the joke, please grab your phone and buy this movie now. As an IT professional, you owe yourself this.
  9. 14 Bob's story teaches us three important lessons:

    1. You don't want to build infrastructure. You want to build software.
    2. Distributed Systems are hard to manage. And Cloud makes it even harder.
    3. Do you really know who Bob was in this story? No? Then check it out…
  10. 15 About me:

    • Developer Advocate @ Confluent
    • Having Fun with Coding since 1997
    • Ex-Oracle, Red Hat, IONA Technologies
    • Blog: https://riferrei.net
    • Twitter: @riferrei
      o Geek stuff and Apache Kafka®
      o DC/Marvel, Mindhunter Book
    • Brazilian, Husband and Father
  11. 16 Options for Apache Kafka in the Cloud

    1. Running and Managing by Yourself
    2. Running with Docker and Kubernetes
    3. Infrastructure as Code, Ansible, Chef
    4. Apache Kafka as a Service
  12. 17 Running and Managing by Yourself (Option 1)

    • Pretty much you need to build everything, except of course for Apache Kafka.
    • Pros:
      o Ability to own and control everything.
      o Lots of learning sources out there.
    • Cons:
      o Extremely complex to manage and scale.
      o High TCO and infrastructure costs.
  13. 18 Running with Docker and Kubernetes (Option 2)

    • Makes it transparent where Apache Kafka is running, whether On-Premise or in the Cloud.
    • Pros:
      o Abstracts cluster and deployment details.
      o Tooling support: Confluent Operator, Pivotal PKS, Red Hat OpenShift, Cloud Extensions.
    • Cons:
      o Underlying infrastructure is still necessary.
      o There are people and infrastructure costs.
  14. 19 Running with Docker and Kubernetes

    • Makes it transparent where Apache Kafka is running, whether On-Premise or in the Cloud.
    • Pros:
      o Abstracts cluster and deployment details.
      o Tooling support: Confluent Operator, Pivotal PKS, Red Hat OpenShift, Cloud Extensions.
    • Cons:
      o Underlying infrastructure is still necessary.
      o There are people and infrastructure costs.
  15. 20 Infrastructure as Code, Ansible, Chef (Option 3)

    • Makes it transparent where Apache Kafka is running, as well as all the infrastructure details.
    • Pros:
      o Abstracts cluster and deployment details.
      o Creates an immutable, repeatable, disposable infrastructure that can be managed as code.
    • Cons:
      o Vendor lock-in, provisioning complexity.
      o There are people and infrastructure costs.
  16. 21 Apache Kafka as a Service (Option 4)

    • Apache Kafka is delivered to you as SaaS, leaving more time for coding and innovation.
    • Pros:
      o SLAs about performance and availability.
      o Pay-as-you-Go, predictable costs up front.
      o Multiple Cloud vendor choices = No lock-in.
    • Cons:
      o You don't have control over anything.
      o Something new to learn and be good at.
  17. 22 Apache Kafka as a Service = Confluent Cloud™

    • Meet Confluent Cloud: a scalable streaming data service based on Apache Kafka that is delivered to you 100% as a service.
    • Supported by the best Kafka engineers.
    • Best hybrid center of gravity support.
    • Multi-Cloud support: AWS, GCP, Others.
    • Apache Kafka APIs = No proprietary stuff.
    • Access to the Confluent Platform Tools.
    • Professional and Enterprise Plans.
  18. 24 Confluent Cloud Tools Project

    • How about using the Confluent Platform ecosystem in a snap with your cluster?
    • Say hello to Confluent Cloud Tools:
    • https://github.com/confluentinc/ccloud-tools
    • Creates a highly available, Multi-AZ, fully secured set of tools within a VPC that is automatically connected to your Confluent Cloud cluster.
  19. 25 Confluent Cloud Tools Project

    [Architecture diagram. Labels: VPC; Availability Zone 1 with private subnets for Schema Registry 1, REST Proxy 1, and KSQL Server 1; Availability Zone 2 with private subnets for Schema Registry 2, REST Proxy 2, and KSQL Server 2; Multi AZ for HA; Scaling Out Across AZs (Schema Registry, REST Proxy, KSQL Server); Public Subnet; Bastion Server; VPC Peering OR Public Internet.]
  20. 29 Confluent Cloud "The Cube" Demo

    Stream of Events (over Time):

    Name   Motion (X)  Motion (Y)  Motion (Z)
    Alice  -45         0           12
    Bob    1           -185        -90
    John   0           -1          90
    Steve  -180        0           180

    Numbers Table:

    Number  X     Y    Z
    1       1     0    0
    2       1     -90  1
    3       -180  0    180
    4       1     90   -1

    Getting the Number 3
  21. 30 Confluent Cloud "The Cube" Demo

    [Diagram labels: VPC, VPC, KSQL CLI]

    SELECT CONCAT('AND THE WINNER IS ----------> ', NAME) AS MESSAGE FROM SELECTED_WINNERS;
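
The idea behind "The Cube" demo slides can be sketched in a few lines of Python. This is only an illustration of the matching logic using the sample rows shown on the slides, not the actual demo code. The names `MotionEvent`, `NUMBERS`, `EVENTS`, and `selected_winners` are hypothetical, and exact equality between a motion reading and a number's coordinates is an assumption (the real demo may well use a tolerance).

```python
from typing import Dict, Iterable, Iterator, NamedTuple, Tuple


class MotionEvent(NamedTuple):
    """One row of the stream of events (hypothetical type for this sketch)."""
    name: str
    x: int
    y: int
    z: int


# Numbers table from the slide: number -> (X, Y, Z).
NUMBERS: Dict[int, Tuple[int, int, int]] = {
    1: (1, 0, 0),
    2: (1, -90, 1),
    3: (-180, 0, 180),
    4: (1, 90, -1),
}

# Stream of events from the slide, in the order shown.
EVENTS = [
    MotionEvent("Alice", -45, 0, 12),
    MotionEvent("Bob", 1, -185, -90),
    MotionEvent("John", 0, -1, 90),
    MotionEvent("Steve", -180, 0, 180),
]


def selected_winners(events: Iterable[MotionEvent], number: int) -> Iterator[str]:
    """Yield a message for every event whose motion matches the given number.

    Assumption: a match means exact equality on all three axes.
    """
    target = NUMBERS[number]
    for event in events:
        if (event.x, event.y, event.z) == target:
            # Same message shape as the CONCAT in the KSQL query on the slide.
            yield "AND THE WINNER IS ----------> " + event.name


messages = list(selected_winners(EVENTS, 3))
print(messages)
```

With the sample rows from the slide, only Steve's reading equals the coordinates of number 3, so a single message is produced for him.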