[DevNexus-2018] Apache Kafka: A Streaming Data Platform

Viktor Gamov
February 22, 2018

Transcript

  1. Apache Kafka
    A Streaming Data Platform

  2. @gamussa @confluentinc
    Who am I?

  6. Who am I?
    Solutions Architect
    Developer Advocate
    @gamussa in internetz
    Hey you, yes, you,
    go follow me on Twitter ©

  9. A company is built on DATA FLOWS, but all we have is DATA STORES

  16. Streaming Platform
    1. Pub/Sub
    2. Store
    3. Process

  21. Core abstraction
    DB - table
    Hadoop - file
    Messaging - ?

  22. LOGS

  24. Producing to Kafka
    Diagram: records appended to a log along a time axis, with several consumers (C)

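A log is an append-only, ordered sequence of records, each identified by a sequential offset. The sketch below is a minimal in-memory model under that assumption; the `Log` class and consumer names are hypothetical, not Kafka's storage code. Producers append at the end (a constant-time operation), and each consumer tracks only its own read position.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Log:
    """Hypothetical in-memory append-only log (illustration only)."""
    records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        """Append at the end of the log and return the new record's offset."""
        self.records.append(record)  # amortized O(1): existing records are never rewritten
        return len(self.records) - 1

    def read(self, offset: int, max_records: int = 10) -> List[Any]:
        """Return up to max_records records starting at offset; reading removes nothing."""
        return self.records[offset:offset + max_records]


log = Log()
for event in ["page_view", "click", "purchase"]:
    log.append(event)

# Two independent consumers positioned at different offsets in the same log.
consumer_positions = {"C1": 0, "C2": 2}
for name, position in consumer_positions.items():
    batch = log.read(position)
    consumer_positions[name] = position + len(batch)  # advance past what was read
```

Because a consumer only moves its own offset, many consumers can read (and re-read) the same records without affecting each other.
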
  25. Producing to Kafka - With Key
    Diagram: partitions A, B, C and D along a time axis
    partition N = hash(key) % numPartitions

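The rule on the slide, partition = hash(key) % numPartitions, can be sketched as below. Kafka's Java producer hashes the serialized key bytes with murmur2; this sketch substitutes `zlib.crc32` purely for illustration, so its partition numbers will not match a real producer. The keys and events are hypothetical.

```python
import zlib


def partition_for_key(key: str, num_partitions: int) -> int:
    """Pick a partition as hash(key) % numPartitions.

    zlib.crc32 stands in for Kafka's murmur2 hash (an assumption of this sketch).
    """
    key_hash = zlib.crc32(key.encode("utf-8"))  # deterministic, non-negative 32-bit value
    return key_hash % num_partitions


NUM_PARTITIONS = 4
events = [("user-42", "login"), ("user-7", "login"), ("user-42", "click"), ("user-42", "logout")]

partitions = {p: [] for p in range(NUM_PARTITIONS)}
for key, value in events:
    partitions[partition_for_key(key, NUM_PARTITIONS)].append((key, value))
```

The important property is determinism: every record with the same key lands in the same partition, so that key's records stay in order. Changing the number of partitions changes the mapping for existing keys.
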
  26. Producing to Kafka - No Key
    Diagram: partitions along a time axis
    Messages will be produced in a round-robin fashion

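A simplified model of the keyless, round-robin behavior described on the slide: messages cycle through partitions 0..N-1. This matches the slide, not every client version. Java producers from Kafka 2.4 onward default to a "sticky" strategy that fills a batch for one partition before moving on. The message names are hypothetical.

```python
from itertools import cycle
from typing import Dict, List


def round_robin_assign(messages: List[str], num_partitions: int) -> Dict[int, List[str]]:
    """Spread keyless messages evenly by cycling through partitions 0..N-1."""
    assignment: Dict[int, List[str]] = {p: [] for p in range(num_partitions)}
    next_partition = cycle(range(num_partitions))
    for message in messages:
        assignment[next(next_partition)].append(message)
    return assignment


result = round_robin_assign(["m0", "m1", "m2", "m3", "m4", "m5"], 4)
```

Without a key, load is balanced across partitions, but there is no ordering relationship between messages that land in different partitions.
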
  30. Consuming From Kafka - Single Consumer
    Diagram: one consumer (C) reading from all partitions

  31. Consuming From Kafka - Grouped Consumers
    Diagram: two consumer groups, C1 and C2, each with two consumers (C)

  32. Consuming From Kafka - Grouped Consumers
    Diagram: four consumers (C) in one group

  33. Consuming From Kafka - Grouped Consumers
    Diagram: four consumers, assigned partitions 0, 1, 2 and 3 (one each)

  35. Consuming From Kafka - Grouped Consumers
    Diagram: one consumer now reads partitions 0 and 3, the others 1 and 2; partition 3 has been reassigned from another group member

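Within a consumer group each partition is read by exactly one member, and when a member leaves, its partitions are reassigned during a rebalance. Kafka ships several assignment strategies (for example range and round-robin assignors) and coordinates rebalancing through a broker acting as group coordinator. This sketch models only the outcome, using a simple round-robin rule and hypothetical consumer names.

```python
from typing import Dict, List


def assign_partitions(consumers: List[str], partitions: List[int]) -> Dict[str, List[int]]:
    """Simplified round-robin assignment: every partition goes to exactly one group member."""
    members = sorted(consumers)
    assignment: Dict[str, List[int]] = {member: [] for member in members}
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    return assignment


PARTITIONS = [0, 1, 2, 3]

# A single consumer in its group is assigned every partition.
single = assign_partitions(["c1"], PARTITIONS)
# Four consumers in one group: one partition each.
group = assign_partitions(["c1", "c2", "c3", "c4"], PARTITIONS)
# c4 leaves the group; after the rebalance its partition moves to a surviving member.
after_failure = assign_partitions(["c1", "c2", "c3"], PARTITIONS)
```

A second group (such as C2 on slide 31) would get its own independent assignment, so every group reads the whole topic while members of one group share it.
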
  36. Diagram: producers and consumers

  41. Kafka Connect does the hard work so you don’t have to
    1. Scale out

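In Kafka Connect, a connector splits its work into tasks, capped by the `tasks.max` setting. In distributed mode those tasks are spread across the workers in the cluster, so adding workers scales a pipeline out. The sketch below models only the splitting step. It divides hypothetical source tables into task-sized groups with a hypothetical `group_work` helper and is not Connect's own code.

```python
from typing import List, TypeVar

T = TypeVar("T")


def group_work(items: List[T], max_tasks: int) -> List[List[T]]:
    """Split items into at most max_tasks contiguous groups whose sizes differ by at most one."""
    if not items or max_tasks < 1:
        return []
    num_groups = min(max_tasks, len(items))
    base, extra = divmod(len(items), num_groups)
    groups: List[List[T]] = []
    start = 0
    for g in range(num_groups):
        size = base + (1 if g < extra else 0)  # the first `extra` groups take one more item
        groups.append(items[start:start + size])
        start += size
    return groups


# Hypothetical source tables a database connector might copy into Kafka.
tables = ["customers", "orders", "products", "shipments", "invoices"]

one_task = group_work(tables, 1)     # everything handled by a single task
three_tasks = group_work(tables, 3)  # scaled out: work split three ways
```

Each group becomes one task's configuration. The framework, not the connector author, then decides which worker runs each task and moves tasks when workers join or fail.
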
  49. Streaming Platform
    1. Pub/Sub
    2. Store
    3. Process

  50. Why Store?

  55. Scalability of a filesystem
    Throughput: 100s of MB/s
    TBs per server
    Commodity hardware
    O(1) writes

  58. Guarantees of a database
    Persistence
    Strict ordering (within a partition)

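Kafka's ordering guarantee is per partition: records in one partition are read in the order they were appended, but there is no global order across partitions. Combined with key-based partitioning, that yields per-key ordering. The simulation below is a sketch under those assumptions. It uses `zlib.crc32` as a stand-in for the producer's real hash and a seeded random fetch order as a stand-in for a consumer interleaving several partitions; the account events are hypothetical.

```python
import random
import zlib
from typing import Dict, List, Tuple

NUM_PARTITIONS = 3
events: List[Tuple[str, str]] = [
    ("acct-1", "open"), ("acct-2", "open"), ("acct-1", "deposit"),
    ("acct-2", "withdraw"), ("acct-1", "close"),
]

# Producer side: each partition is an append-only list, so its order is the append order.
partitions: Dict[int, List[Tuple[str, str]]] = {p: [] for p in range(NUM_PARTITIONS)}
for key, action in events:
    partitions[zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS].append((key, action))

# Consumer side: interleave partitions arbitrarily (seeded random choice),
# but always read each partition forward from its current position.
rng = random.Random(7)
positions = {p: 0 for p in partitions}
consumed: List[Tuple[str, str]] = []
while True:
    ready = [p for p in partitions if positions[p] < len(partitions[p])]
    if not ready:
        break
    chosen = rng.choice(ready)
    consumed.append(partitions[chosen][positions[chosen]])
    positions[chosen] += 1


def actions_for(key: str) -> List[str]:
    """The actions for one key, in the order the consumer saw them."""
    return [action for k, action in consumed if k == key]
```

However the partitions are interleaved, each key's events come out in the order they were produced. Events for different keys may appear in any relative order.
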
  63. Distributed by Design
    Replication
    Fault Tolerance
    Partitioning
    Scale

  65. Partition Leadership and Replication
    Diagram: Brokers 1-4 hosting Topic1 partitions 1-4; each partition has three replicas (one leader, two followers) on different brokers

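The layout in the diagram (four partitions, three replicas each, spread over four brokers) can be produced by a simple round-robin placement. This is a simplified sketch: the exact broker placement on the slide is not recoverable from the transcript, and Kafka's built-in assignment adds randomization. The `place_replicas` helper is hypothetical.

```python
from typing import Dict, List


def place_replicas(num_partitions: int, brokers: List[int], replication_factor: int) -> Dict[int, List[int]]:
    """Simplified round-robin placement; partitions are numbered from 1 as on the slide.

    In each replica list the first broker is the leader and the rest are followers.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    n = len(brokers)
    return {
        partition: [brokers[(partition - 1 + r) % n] for r in range(replication_factor)]
        for partition in range(1, num_partitions + 1)
    }


# Topic1 with 4 partitions, 3 replicas each, on brokers 1-4.
assignment = place_replicas(num_partitions=4, brokers=[1, 2, 3, 4], replication_factor=3)
```

Rotating the starting broker gives each broker one leader, spreading client traffic. No partition has two replicas on the same broker, so losing one broker never removes more than one copy of a partition.
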
  66. Partition Leadership and Replication - node failure
    Diagram: the same layout of Brokers 1-4 and Topic1 partitions 1-4, shown after one broker fails

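When a broker fails, each partition it led needs a new leader chosen from the followers that are still in sync (the in-sync replica set, ISR). Partitions it only followed simply lose a replica. In Kafka the controller performs this election. The sketch below models the idea with hypothetical broker IDs and a simplified rule (first surviving in-sync replica in replica-list order).

```python
from typing import Dict, List, Set

# Hypothetical state for Topic1: partition -> replica list (first entry = current leader).
replicas: Dict[int, List[int]] = {1: [1, 2, 3], 2: [2, 3, 4], 3: [3, 4, 1], 4: [4, 1, 2]}
# In-sync replicas: followers fully caught up with the leader (here, all of them).
isr: Dict[int, Set[int]] = {p: set(r) for p, r in replicas.items()}


def handle_broker_failure(failed: int) -> Dict[int, int]:
    """Drop a failed broker from every ISR and return partition -> leader afterwards."""
    leaders: Dict[int, int] = {}
    for partition, replica_list in replicas.items():
        isr[partition].discard(failed)
        candidates = [b for b in replica_list if b in isr[partition]]
        if not candidates:
            raise RuntimeError(f"partition {partition} has no in-sync replica left")
        leaders[partition] = candidates[0]  # leader is kept unless it was the failed broker
    return leaders


leaders_after = handle_broker_failure(failed=1)
```

In this model, a partition with three in-sync replicas still has a leader after any two brokers fail. Whether it keeps accepting writes also depends on settings such as the producer's `acks` and the topic's `min.insync.replicas`.
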
  67. Streaming Platform
    1. Pub/Sub
    2. Store
    3. Process

  68. What is Stream Processing?
    A machine for combining streams of events

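As a generic illustration of "combining streams of events" (plain Python, not the Kafka Streams API), the sketch below merges two hypothetical, time-ordered streams into one and keeps a running count per page as state. Stream processors such as Kafka Streams do this kind of stateful work continuously over Kafka topics.

```python
import heapq
from collections import Counter
from typing import Iterator, List, Tuple

Event = Tuple[int, str, str]  # (timestamp, source, page)

clicks: List[Event] = [(1, "clicks", "home"), (4, "clicks", "cart"), (6, "clicks", "home")]
views: List[Event] = [(2, "views", "home"), (3, "views", "product"), (5, "views", "cart")]


def combine(*streams: List[Event]) -> Iterator[Event]:
    """Merge already time-ordered event streams into one time-ordered stream."""
    return heapq.merge(*streams, key=lambda event: event[0])


counts = Counter()  # running state: events seen per page so far
snapshots: List[Tuple[int, str, int]] = []
for timestamp, _source, page in combine(clicks, views):
    counts[page] += 1
    snapshots.append((timestamp, page, counts[page]))  # the updated result after each event
```

The output is itself a stream of updates, one per input event, rather than a single answer computed once.
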
  71. https://www.confluent.io/download/

  72. We are hiring!
    https://www.confluent.io/careers/

  73. One more thing…

  79. A Major New Paradigm

  80. Thanks!
    Questions?
    @gamussa
    We are hiring!
    https://www.confluent.io/careers/