Everything You Wanted to Know About Apache Kafka But Were Too Afraid to Ask!

Presented at the Data Science Meetup in San Antonio, TX.

Ricardo Ferreira

June 27, 2019

Transcript

  1. Join the Confluent Community Slack: cnfl.io/slack
    Subscribe to the Confluent blog: cnfl.io/read
    Welcome to the San Antonio Apache Kafka® Meetup!
    6:00pm: Doors open
    6:00pm - 6:30pm: Pizza, Drinks and Networking
    6:30pm - 7:30pm: Ricardo Ferreira, Confluent
    7:30pm - 7:45pm: Additional Q&A & Networking
    Apache, Apache Kafka, Kafka and the Kafka logo are trademarks of the Apache Software Foundation. The Apache Software Foundation has no affiliation with and does not endorse the materials provided at this event.

  2. @riferrei | #kafkameetup | @CONFLUENTINC
    Everything you wanted
    to know about Kafka
    But you were too afraid to ask!

  3. Wakanda? Forever!
    Hulk? Smash!
    Apache Kafka? It's Like Messaging?

  4. About Us:
    ● Ricardo Ferreira
    ❑ Developer Advocate @ Confluent
    ❑ Ex-Oracle, Red Hat, IONA Tech
    [email protected]
    ❑ https://riferrei.net
    ● Alexa (Echo Dot)
    ❑ The voice behind Amazon
    ❑ Ex-Raspberry Pi, Arduino
    ❑ She is a female in character!
    @riferrei
    @alexa99

  5. Question:
    What is a distributed
    streaming platform?

  6. ? ? ?

  7. Let's do some time travel,
    shall we?

  8. SQL DBs 25 years ago…
    SQL DBs Today
    Dude, you're embarrassing me in front of the wizards…

  9. ETL / Batch

  10. @riferrei | #kafkameetup | @CONFLUENTINC

  11. What does it cost to extract data from
    the transactional DB?

  12. latency

  13. plumbing

  14. Management complexity

  15. Industry solutions

  16. Solution for "Combining" Processing and Data: NoSQL
    Solution for Large Amounts of Data: Big Data

  17. How about Messaging?

  18. @riferrei | #kafkameetup | @CONFLUENTINC

  19. Let's do some time travel,
    shall we?

  20. @riferrei | #kafkameetup | @CONFLUENTINC

  21. @riferrei | #kafkameetup | @CONFLUENTINC

  22. “The truth is the log.
    The database is a cache
    of a subset of the log.”
    Pat Helland, “Immutability Changes Everything”
    http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf
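
One way to read the quote above is that current state can be rebuilt at any time by replaying an append-only log, which makes the database a derived view. A minimal Python sketch of that idea follows; the event shape and the `replay` helper are illustrative assumptions, not code from the talk and not a Kafka API.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical event shape: (key, value). A value of None marks a delete.
Event = Tuple[str, Optional[int]]


def replay(log: List[Event], upto: Optional[int] = None) -> Dict[str, int]:
    """Rebuild a key/value view by applying log entries in order.

    `upto` limits the replay to the first N entries, which shows that
    any earlier state can be recovered from the same log.
    """
    view: Dict[str, int] = {}
    for key, value in log[:upto]:
        if value is None:
            view.pop(key, None)   # delete marker removes the key
        else:
            view[key] = value     # later entries overwrite earlier ones
    return view


# The log is only ever appended to; nothing is updated in place.
log: List[Event] = [("alice", 10), ("bob", 5), ("alice", 7), ("bob", None)]

current = replay(log)          # the "database": latest value per key
earlier = replay(log, upto=2)  # the same view as of the first two entries
```

Here the `view` dictionary plays the role of the cache: it holds only the latest value per key, while the log retains every change.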

  23. ETL/Data Integration vs. Messaging
    Batch
    Expensive
    Time Consuming
    Difficult to Scale
    No Persistence After Consumption
    No Replay
    Highly Scalable
    Durable
    Persistent
    Ordered
    Fast (Low Latency)
    What is happening in the world
    What happened in the world

  24. ETL/Data Integration vs. Messaging
    Batch
    Expensive
    Time Consuming
    Difficult to Scale
    No Persistence After Consumption
    No Replay
    Highly Scalable
    Durable
    Persistent
    Ordered
    Fast (Low Latency)
    What is happening in the world
    What happened in the world
    Highly Scalable
    Durable
    Persistent
    Ordered
    Fast (Low Latency)
    Event Streaming Thinking

  25. Question:
    What is a distributed
    streaming platform?

  26. 01 Messaging done right
    02 Scalable data pipelines
    03 Stream processing

  27. Event-Driven App (Location Tracking)
    Only Real-Time Events
    Messaging Queues and pub/sub Platforms can do this
    Contextual Event-Driven App (ETA)
    Real-Time combined with stored data
    Only Streaming data Platforms can do this
    Where is my driver?
    When will my driver get here?
    Where is my driver?
    When will my driver arrive?
    Why Combine Real-time With Historical Context?
    2
    min
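
The contrast on this slide can be sketched in plain Python: "Where is my driver?" is answered from a real-time event alone, while "When will my driver arrive?" needs that event combined with stored data. Everything named here (the `LocationEvent` type, the speed table, the default speed) is a hypothetical illustration, not code from the talk or a Kafka API.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class LocationEvent:
    """A single real-time event (hypothetical shape)."""
    driver_id: str
    km_remaining: float


# Stored context: assumed historical average speed per driver, in km/h.
historical_avg_speed_kmh: Dict[str, float] = {"driver-42": 30.0}


def where_is_my_driver(event: LocationEvent) -> str:
    """Answerable from the real-time event alone."""
    return f"{event.driver_id} is {event.km_remaining:.1f} km away"


def eta_minutes(event: LocationEvent,
                history: Dict[str, float],
                default_speed_kmh: float = 25.0) -> float:
    """Needs the real-time event plus stored historical data."""
    speed = history.get(event.driver_id, default_speed_kmh)
    return event.km_remaining * 60.0 / speed


event = LocationEvent(driver_id="driver-42", km_remaining=1.0)
position = where_is_my_driver(event)
eta = eta_minutes(event, historical_avg_speed_kmh)
```

With the assumed numbers, one kilometer at an average of 30 km/h gives an ETA of two minutes. The location event alone could not produce that figure.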

  28. demo

  29. @riferrei | #kafkameetup | @CONFLUENTINC

  30. https://github.com/riferrei/the-song-is

  31. HAVE YOU EVER HEARD ABOUT THE
    #ASKCONFLUENT INITIATIVE?

  32. There was an idea…

  33. TO BRING TOGETHER A GROUP
    OF REMARKABLE PEOPLE

  34. THAT COULD ANSWER
    THE QUESTIONS…

  35. THAT THEY NEVER COULD...

  36. @tlberglund @gwenshap

  37. https://www.youtube.com/playlist?list=PLa7VYi0yPIH0snucuYWkuUXwasMr-HR7Y
    Kafka is so Cool!

  38. Question:
    IS THERE SUCH A THING AS
    OVERSUBSCRIBING KAFKA?

  39. Question:
    WHICH IS MORE COSTLY:
    UP-CONVERSION OR DOWN-CONVERSION?

  40. Question:
    HOW MANY PARTITIONS
    CAN A TOPIC HAVE?

  41. Question:
    WHAT ARE THE PROS AND CONS
    OF THE EXACTLY-ONCE FEATURE?

  42. Question:
    WHAT IS THE IMPACT OF CONSUMER
    APPS BEING DOWN FOR 4 DAYS?

  43. Question:
    IS IT TRUE THAT NOT USING KEYS WHILE PRODUCING
    RECORDS IS A BAD PRACTICE?
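
The transcript does not record the answer given in the talk. As background for the question, Kafka's producer chooses a partition from the record key when one is present, so records with the same key land on the same partition and keep their relative order, while keyless records are spread across partitions. A rough Python sketch of that behavior follows. The `choose_partition` function, the partition count, and the use of CRC32 are assumptions for illustration: the Java client's default partitioner hashes keys with murmur2, and how keyless records are spread depends on the client version.

```python
import itertools
import zlib
from typing import Iterator, Optional

NUM_PARTITIONS = 3  # assumed partition count for the illustration

# Simple round-robin counter used for records without a key.
_round_robin: Iterator[int] = itertools.cycle(range(NUM_PARTITIONS))


def choose_partition(key: Optional[bytes]) -> int:
    """Pick a partition the way a key-hashing partitioner would.

    CRC32 stands in for the real hash function here; only the
    "same key, same partition" property matters for the sketch.
    """
    if key is None:
        return next(_round_robin)           # keyless: spread across partitions
    return zlib.crc32(key) % NUM_PARTITIONS  # keyed: stable mapping


# Records with the same key always map to one partition,
# so their relative order is preserved there.
keyed = [choose_partition(b"driver-42") for _ in range(4)]

# Records without a key rotate through the partitions,
# so no ordering between them is guaranteed.
unkeyed = [choose_partition(None) for _ in range(4)]
```

Whether going keyless is a problem therefore depends on whether consumers rely on per-key ordering or co-location of related records.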

  44. announcements

  45. NOMINATE YOURSELF OR A PEER AT
    CONFLUENT.IO/NOMINATE

  46. KS19Meetup.
    CONFLUENT COMMUNITY DISCOUNT CODE
    25% OFF*
    *Standard Priced Conference pass

  47. @riferrei | #kafkameetup | @CONFLUENTINC
