Boston Meetup, Feb 2020

Ricardo Ferreira

February 19, 2020

Transcript

  1. Rediscovering the value of Apache Kafka® in modern data architectures
    @riferrei | #kafkameetup | @CONFLUENTINC

  2. About me
    • Ricardo Ferreira
    • Works for Confluent
    • Developer advocate
    [email protected]
    • HTTPS://RIFERREI.NET

  3. Origins of Apache Kafka
    “There were lots of databases and
    other systems built to store data,
    but what was missing in our
    architecture was something that
    would help us to handle continuous
    flows of data.” – Jay Kreps

  4. @riferrei | @kafkameetup | @CONFLUENTINC

  5. First realization
    Event: “I changed my job from Oracle to Confluent”
    >
    State: “I work at Confluent”

  6. Events are both
    notification + state transfer

  7. Event-driven application
    Job change
    Recommendation engine
    Search engine
    Email service
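
A minimal sketch of the fan-out on this slide, where one job-change event reaches the recommendation engine, the search engine, and the email service. The `EventBus` class, the `JobChanged` fields, and the handler names are hypothetical and only illustrate the idea; this is an in-memory stand-in, not a Kafka API. The event both notifies each service and carries the new state (the new employer), matching the "notification + state transfer" point from the previous slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobChanged:
    """The event: a fact that happened, carrying the new state with it."""
    person: str
    old_employer: str
    new_employer: str


class EventBus:
    """Hypothetical in-memory stand-in for a topic; not a Kafka API."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, event):
        # Every subscriber receives the same event independently.
        for handler in self._subscribers:
            handler(event)


recommendations = []  # recommendation engine's view
search_index = {}     # search engine's view: person -> current employer
outbox = []           # email service's view


def recommend(event):
    recommendations.append(f"Suggest {event.new_employer} colleagues to {event.person}")


def index(event):
    # State transfer: the event itself carries the new employer.
    search_index[event.person] = event.new_employer


def send_email(event):
    outbox.append(f"Congratulate {event.person} on joining {event.new_employer}")


bus = EventBus()
for handler in (recommend, index, send_email):
    bus.subscribe(handler)

bus.publish(JobChanged("riferrei", "Oracle", "Confluent"))
print(search_index)
```

None of the three services queries the producer for the current state; each one builds its own view from the event it was handed.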

  8. Let’s implement this!
    SQL (×3): Recommendation engine, Search engine, Email service
    database
    LOG

  9. Second realization
    database
    1000x more volume
    Non-transactional events
    Transactional events
    LOG

  10. Databases, 30 years ago...

  11. Developer
    Databases, these days...

  12. Databases are limited

  13. Limited? Are you kidding me?

  14. Are databases limited?
    Yes, they are. Why do we have to move data from one DB to another just to do analytics?

  15. Shared state = more DBs
    Business line 1 | Business line 2 | Business line 3

  16. Third realization
    User tracking, historical data, operational metrics, NoSQL database, graph database, SQL database, microservices, ...
    Hadoop, Elasticsearch, Grafana, machine learning, rec. engine, search, security, email, social graph

  17. “The truth is the log.
    The database is a cache
    of a subset of the log.”
    – Pat Helland, “Immutability Changes Everything”
    http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf

  18. Log as first-class citizen
    database
    LOG
    0 1 2 3 4 5 6 7 8
    LOG
    reads
    writes
    Destination System a
    (time = 1)
    Destination System b
    (time = 3)
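
A minimal sketch of the idea on this slide, assuming a toy in-memory log rather than Kafka itself. The `Log` and `Reader` classes are hypothetical names used only for illustration, and the slide's "time = 1" and "time = 3" are read here as log positions. Writes only append, reads never remove anything, and each destination system tracks its own position independently.

```python
class Log:
    """Append-only log: records are only ever added at the end."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Write a record and return the offset it was stored at."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=None):
        """Return records starting at offset; reading removes nothing."""
        end = None if max_records is None else offset + max_records
        return self._records[offset:end]


class Reader:
    """Hypothetical destination system that remembers its own position."""

    def __init__(self, log, offset=0):
        self.log = log
        self.offset = offset

    def poll(self, max_records=None):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)  # advance only this reader's position
        return batch


log = Log()
for i in range(9):  # offsets 0..8, as on the slide
    log.append(f"event-{i}")

system_a = Reader(log, offset=1)  # destination system A is at position 1
system_b = Reader(log, offset=3)  # destination system B is at position 3

batch_a = system_a.poll(2)
batch_b = system_b.poll(2)
print(batch_a, batch_b)
```

Because each reader owns its offset, a slow destination never blocks a fast one, and a new destination can start from offset 0 and replay the full history.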

  19. Solution: build a commit log
    User tracking, historical data, operational metrics, NoSQL database, graph database, SQL database, microservices, ...
    Commit log
    Hadoop, Elasticsearch, Grafana, machine learning, rec. engine, search, security, email, social graph

  20. http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf
    Do you THINK is a table
    that you RUN QUERIES?

  21. Streams and tables duality
    {"user":"riferrei","score":"1001"}
    {"user":"riferrei","score":"1002"}
    {"user":"riferrei","score":"1003"}
    {"user":"riferrei","score":"1004"}
    {"user":"riferrei","score":"1005"}
    {"user":"riferrei","score":"1005"}
    stream
    table
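
A minimal sketch of the duality, assuming the slide shows five stream events followed by the single table row they produce (that reading of the layout is an assumption). The `materialize` function is a hypothetical helper: it replays the stream in order and keeps only the latest value per key, which is one simple way to derive a table from a stream.

```python
import json

# The five events from the slide, in arrival order.
stream = [
    '{"user":"riferrei","score":"1001"}',
    '{"user":"riferrei","score":"1002"}',
    '{"user":"riferrei","score":"1003"}',
    '{"user":"riferrei","score":"1004"}',
    '{"user":"riferrei","score":"1005"}',
]


def materialize(events):
    """Fold a stream into a table: the latest score wins for each user."""
    table = {}
    for raw in events:
        event = json.loads(raw)
        table[event["user"]] = event["score"]
    return table


table = materialize(stream)
print(table)  # {'riferrei': '1005'}
```

Replaying a shorter prefix of the stream yields the table as it looked at that earlier point, which is why the stream can be treated as the source of truth and the table as a derived view.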

  22. Origins of Apache Kafka
    “We’ve come to think of Kafka as a
    streaming platform: a system that
    lets you publish and subscribe to
    streams of data, store them, and
    process them, and that is exactly
    what Apache Kafka is built to be.”
    – Jay Kreps

  23. Origins of Apache Kafka
    Databases: batch, expensive, time consuming
    Messaging: difficult to scale, no persistence after consumption, no replay
    Highly scalable, durable, persistent, ordered, fast (low latency)

  24. Origins of Apache Kafka
    Databases: batch, expensive, time consuming
    Messaging: difficult to scale, no persistence after consumption, no replay
    Highly scalable, durable, persistent, ordered, fast (low latency)
    Highly scalable, durable, persistent, ordered, fast (low latency)
    Distributed commit log

  25. Origins of Apache Kafka
    Databases: batch, expensive, time consuming
    Messaging: difficult to scale, no persistence after consumption, no replay
    Highly scalable, durable, persistent, ordered, fast (low latency)
    Highly scalable, durable, persistent, ordered, fast (low latency)
    Stream processing, continuous flows, scalable integration
    Distributed streaming platform

  26. @riferrei | @confluentinc | @itau

  27. Origins of Apache Kafka
    “The ability to combine these three
    areas – to bring all the streams of
    data together across all the use
    cases – is what makes the idea of a
    streaming platform so appealing
    to people” – Jay Kreps

  28. What is Apache Kafka?
    01 Well done messaging
    02 Durable storage
    03 Stream processing

  29. Time for some fun
    1. Get the game
    2. Name yourself

  30. Source code
    https://github.com/confluentinc/demo-scene

  31. Source: USER_GAME topic

  32. Creating USER_GAME stream

  33. Querying USER_GAME stream

  34. Creating STATS_PER_USER table

  35. Querying STATS_PER_USER table

  36. Low-latency pull queries

  37. Source: USER_LOSSES topic

  38. Creating USER_LOSSES stream

  39. Querying USER_LOSSES stream

  40. Creating LOSSES_PER_USER table

  41. Querying LOSSES_PER_USER table

  42. Creating SCOREBOARD table

  43. Querying SCOREBOARD table

  44. Complete scoreboard
    USER_GAME
    USER_LOSSES
    STATS_PER_USER
    LOSSES_PER_USER
    SCOREBOARD
    storage process storage process storage
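
A rough sketch of the storage, process, storage chain on this slide, written in plain Python rather than the query language used in the demo. The statements shown on the preceding slides are images and are not part of this transcript, so every field name (`user`, `score`, `level`, `highest_score`, `highest_level`, `total_losses`), the sample data, and the aggregation logic below are assumptions chosen for illustration only. Only the five stream and table names come from the deck.

```python
from collections import Counter

# Stored inputs (hypothetical sample data standing in for the two topics).
user_game = [
    {"user": "alice", "score": 120, "level": 1},
    {"user": "bob", "score": 80, "level": 1},
    {"user": "alice", "score": 300, "level": 2},
]
user_losses = [{"user": "alice"}, {"user": "bob"}, {"user": "bob"}]


def stats_per_user(events):
    """Process USER_GAME into STATS_PER_USER (assumed: best score and level)."""
    stats = {}
    for e in events:
        s = stats.setdefault(e["user"], {"highest_score": 0, "highest_level": 0})
        s["highest_score"] = max(s["highest_score"], e["score"])
        s["highest_level"] = max(s["highest_level"], e["level"])
    return stats


def losses_per_user(events):
    """Process USER_LOSSES into LOSSES_PER_USER (assumed: a count per user)."""
    return Counter(e["user"] for e in events)


def scoreboard(stats, losses):
    """Join the two tables on user to produce SCOREBOARD."""
    return {
        user: {**s, "total_losses": losses.get(user, 0)}
        for user, s in stats.items()
    }


board = scoreboard(stats_per_user(user_game), losses_per_user(user_losses))
print(board["alice"])
```

Looking up a single key, such as `board["alice"]`, is the in-memory analogue of the low-latency point lookup that the pull-query slide refers to.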

  45. How can I learn more?

  46. Get Kafka: Confluent Cloud
    Try free: https://cnfl.io/confluent-cloud

  47. Get examples: Kafka Tutorials
    https://cnfl.io/tutorials

  48. Get books: O’Reilly bundle
    https://cnfl.io/books

  49. Join Kafka Summit
    https://kafka-summit.org/events/kafka-summit-austin-2020
    https://myeventi.events/kafka20/aus
    Use 25% discount code: KSL20Meetup

  50. Thank you