Kafka Will Get The Message Across, Guaranteed.

Presentation given at PHP Benelux 2017 near Antwerp, Belgium.

David Zuelke

January 28, 2017

Transcript

  1. KAFKA WILL GET
    THE MESSAGE ACROSS.
    GUARANTEED.
    PHP Benelux 2017
    Belgium

  2. David Zuelke

  5. @dzuelke

  6. KAFKA

  7. LinkedIn

  8. APACHE KAFKA

  9. "uh oh, another Apache project?!"

  11. KEEP CALM
    AND LOOK AT
    THE WEBSITE

  13. "Basically it is a massively scalable pub/sub message queue.
    architected as a distributed transaction log."

  14. "so it's a queue?"

  15. it's not just a queue

  16. queues are not multi-subscriber :(

  17. "so it's a pubsub thing?"

  18. it's not just a pubsub thing

  19. pubsub broadcasts to all subscribers :(

  20. it's a log

  22. not that kind of log

  23. WAL

  24. Write-Ahead Log

  25. WRITE-AHEAD LOG

  27. 1 foo
    2 bar
    3 baz
    4 hi

  28. 1 create document: "foo", data: "…"
    2 update document: "foo", data: "…"
    3 create document: "bar", data: "…"
    4 remove document: "foo"
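
    Not a Kafka API, just a sketch of the write-ahead idea in plain PHP, mirroring the records above (the wal.log file name is made up):
    every change is appended as one JSON line, and the current state can be rebuilt at any time by replaying the file from the top.

    // append a change record to the write-ahead log (append-only, sequential I/O)
    function wal_append(string $file, array $record): void
    {
        file_put_contents($file, json_encode($record) . "\n", FILE_APPEND | LOCK_EX);
    }

    // rebuild the current state by replaying every record in order
    function wal_replay(string $file): array
    {
        $documents = [];
        foreach (file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
            $record = json_decode($line, true);
            if ($record['op'] === 'remove') {
                unset($documents[$record['document']]);
            } else { // 'create' or 'update'
                $documents[$record['document']] = $record['data'];
            }
        }
        return $documents;
    }

    wal_append('wal.log', ['op' => 'create', 'document' => 'foo', 'data' => '…']);
    wal_append('wal.log', ['op' => 'remove', 'document' => 'foo']);
    $state = wal_replay('wal.log'); // "foo" is gone again, just like in the log above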

  30. never corrupts

  31. sequential I/O

  33. sequential I/O

  34. every message will be read at least once, no random access

  35. FileChannel.transferTo
    (shovels data straight from e.g. the disk cache to the network interface, without copying through user space)

  36. "HI, I AM KAFKA"
    "Buckle up while we process (m|b|tr)illions of messages/s."

  37. TOPICS

  38. streams of records

  39. 1 2 3 4 5 6 7 …

  40. 1 2 3 4 5 6 7 8 …
    producer writes
    consumer reads

  41. can have many subscribers

  42. 1 2 3 4 5 6 7 8 …
    producer writes
    consumerB reads
    consumerA reads

  43. can be partitioned

  44. P0 1 2 3 4 5 6 7 …
    P1 1 2 3 4 …
    P2 1 2 3 4 5 6 7 8 …
    P3 1 2 3 4 5 6 …

  45. partitions let you scale storage!

  46. partitions let you scale consuming!
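
    A sketch of that scaling with the low-level php-rdkafka consumer used later in this deck: run one copy of the script per
    partition and hand each copy a different partition number (do_something() is the same hypothetical processing function as
    in the consumer slides).

    // hypothetical worker invocation: `php worker.php 2` consumes only partition 2
    $partition = (int) ($argv[1] ?? 0);

    $conf = new RdKafka\Conf();
    $conf->set('group.id', 'myConsumerGroup');

    $rk = new RdKafka\Consumer($conf);
    $rk->addBrokers("127.0.0.1");

    $topic = $rk->newTopic("test");
    $topic->consumeStart($partition, RD_KAFKA_OFFSET_STORED);

    while (true) {
        $msg = $topic->consume($partition, 1000); // 1000 ms timeout, null when nothing arrived
        if ($msg !== null && $msg->err === RD_KAFKA_RESP_ERR_NO_ERROR) {
            do_something($msg);
        }
    }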

  48. all records are retained, whether consumed or not, up to a configurable limit
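
    The limit is plain Kafka configuration; a sketch with illustrative values (time- and/or size-based, set broker-wide in
    server.properties here, also settable per topic via retention.ms / retention.bytes):

    # keep records for 7 days, or until a partition's log exceeds ~1 GiB, whichever is hit first
    log.retention.hours=168
    log.retention.bytes=1073741824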

  49. PRODUCERS

  50. byte[]

  51. (typically JSON, XML, Avro, Thrift, Protobufs)

  52. (typically not funny GIFs)

  53. can choose explicit partition, or a key (which is used for auto-partitioning)

  54. https://github.com/edenhill/librdkafka & https://arnaud-lb.github.io/php-rdkafka/

  55. BASIC PRODUCER
    $rk = new RdKafka\Producer();
    $rk->addBrokers("127.0.0.1");
    $topic = $rk->newTopic("test");
    $topic->produce(RD_KAFKA_PARTITION_UA, 0, "Unassigned partition, let Kafka choose");
    $topic->produce(RD_KAFKA_PARTITION_UA, 0, "Yay consistent hashing", $user->getId());
    $topic->produce(1, 0, "This will always be sent to partition 1");
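    // a sketch of the usual cleanup: produce() is asynchronous, so a short-lived script
    // should drain librdkafka's outbound queue before exiting, or queued messages may never be sent
    while ($rk->getOutQLen() > 0) {
        $rk->poll(50);
    }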

  56. CONSUMERS

  57. cheap

  58. only metadata stored per consumer: offset

  59. guaranteed to always have messages in the right order (within a partition)

  60. can themselves produce new messages!

    (but there is also a Streams API for pure transformations)
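
    A sketch of such a consumer-that-produces with php-rdkafka (same setup as the consumer examples that follow; transform()
    is a hypothetical function): it reads from one topic and writes the transformed result to another.

    $conf = new RdKafka\Conf();
    $conf->set('group.id', 'myTransformerGroup');

    $consumer = new RdKafka\Consumer($conf);
    $consumer->addBrokers("127.0.0.1");
    $inTopic = $consumer->newTopic("test");
    $inTopic->consumeStart(0, RD_KAFKA_OFFSET_STORED);

    $producer = new RdKafka\Producer();
    $producer->addBrokers("127.0.0.1");
    $outTopic = $producer->newTopic("test-transformed");

    while (true) {
        $msg = $inTopic->consume(0, 1000);
        if ($msg === null || $msg->err !== RD_KAFKA_RESP_ERR_NO_ERROR) {
            continue; // timeout, partition EOF or error: nothing to forward
        }
        // forward the transformed payload, keeping the original key for consistent partitioning
        $outTopic->produce(RD_KAFKA_PARTITION_UA, 0, transform($msg->payload), $msg->key);
    }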

  62. BASIC CONSUMER
    $conf = new RdKafka\Conf();
    $conf->set('group.id', 'myConsumerGroup');
    $rk = new RdKafka\Consumer($conf);
    $rk->addBrokers("127.0.0.1");
    $topicConf = new RdKafka\TopicConf();
    $topicConf->set('auto.commit.interval.ms', 100);
    $topic = $rk->newTopic("test", $topicConf);
    $topic->consumeStart(0, RD_KAFKA_OFFSET_STORED);
    while (true) {
        $msg = $topic->consume(0, 120*10000);
        do_something($msg);
    }

  63. AT-MOST-ONCE DELIVERY
    $conf = new RdKafka\Conf();
    $conf->set('group.id', 'myConsumerGroup');
    $rk = new RdKafka\Consumer($conf);
    $rk->addBrokers("127.0.0.1");
    $topicConf = new RdKafka\TopicConf();
    $topicConf->set('auto.commit.enable', 'false');
    $topic = $rk->newTopic("test", $topicConf);
    $topic->consumeStart(0, RD_KAFKA_OFFSET_STORED);
    while (true) {
        $msg = $topic->consume(0, 120*10000);
        $topic->offsetStore($msg->partition, $msg->offset);
        do_something($msg);
    }

  64. AT-LEAST-ONCE DELIVERY
    $conf = new RdKafka\Conf();
    $conf->set('group.id', 'myConsumerGroup');
    $rk = new RdKafka\Consumer($conf);
    $rk->addBrokers("127.0.0.1");
    $topicConf = new RdKafka\TopicConf();
    $topicConf->set('auto.commit.enable', 'false');
    $topic = $rk->newTopic("test", $topicConf);
    $topic->consumeStart(0, RD_KAFKA_OFFSET_STORED);
    while (true) {
        $msg = $topic->consume(0, 120*10000);
        do_something($msg);
        $topic->offsetStore($msg->partition, $msg->offset);
    }

  65. EXACTLY-ONCE DELIVERY

  66. you cannot have exactly-once delivery

  67. THE BYZANTINE GENERALS
    "together we can beat the monsters.
    let's both attack at 07:00?"
    "confirm, we attack at 07:00"
  68. USE CASES

  69. • LinkedIn
    • Yahoo
    • Twitter
    • Netflix
    • Square
    • Spotify
    • Pinterest
    • Uber
    • Goldman Sachs
    • Tumblr
    • PayPal
    • Airbnb
    • Mozilla
    • Cisco
    • Etsy
    • Foursquare
    • Shopify
    • CloudFlare

  70. ingest the Twitter firehose and turn it into a pointless demo ;)

  72. messaging, of course

  73. track user activity

  74. record runtime metrics

  75. aggregate logs

  76. IoT

    (you could still e.g. use MQTT over the wire, and bridge to Kafka)

  77. replicate information between data centers

    (also see Connector API)

  78. Event Sourcing broker :)

  79. WAL / Commit Log for another system

  80. billing!

  81. "shock absorber" between systems to avoid overload of DBs, APIs, etc.

  82. in PHP: mostly producing messages; better languages exist for consuming

  83. The End

  84. THANK YOU FOR LISTENING!
    Questions? Ask me: @dzuelke & [email protected]
