Kafka Will Get The Message Across, Guaranteed.

Presentation given at PHP Benelux 2017 near Antwerp, Belgium.

David Zuelke

January 28, 2017

Transcript

  1. KAFKA WILL GET THE MESSAGE ACROSS. GUARANTEED. PHP Benelux 2017

    Belgium
  2. David Zuelke

  3. None
  4. dz@heroku.com

  5. @dzuelke

  6. KAFKA

  7. LinkedIn

  8. APACHE KAFKA

  9. "uh oh, another Apache project?!"

  10. None
  11. KEEP CALM AND LOOK AT THE WEBSITE

  12. None
  13. "Basically it is a massively scalable pub/sub message queue architected
    as a distributed transaction log."
  14. "so it's a queue?"

  15. it's not just a queue

  16. queues are not multi-subscriber :(

  17. "so it's a pubsub thing?"

  18. it's not just a pubsub thing

  19. pubsub broadcasts to all subscribers :(

  20. it's a log

  21. None
  22. not that kind of log

  23. WAL

  24. Write-Ahead Log

  25. WRITE-AHEAD LOG

  26. None
  27. 1 foo 2 bar 3 baz 4 hi

  28. 1 create document: "foo", data: "…"
    2 update document: "foo", data: "…"
    3 create document: "bar", data: "…"
    4 remove document: "foo"
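The rebuild-from-log idea these entries illustrate can be sketched in plain PHP (a hypothetical replay, not code from the talk; the entry shape is assumed):

```php
<?php
// Replaying the slide's four WAL entries in order rebuilds the current
// state: whatever the log says, applied sequentially, is the truth.
$log = [
    ['op' => 'create', 'doc' => 'foo', 'data' => '…'],
    ['op' => 'update', 'doc' => 'foo', 'data' => '…'],
    ['op' => 'create', 'doc' => 'bar', 'data' => '…'],
    ['op' => 'remove', 'doc' => 'foo'],
];

$store = [];
foreach ($log as $entry) {
    if ($entry['op'] === 'remove') {
        unset($store[$entry['doc']]);
    } else { // create and update both set the document's data
        $store[$entry['doc']] = $entry['data'];
    }
}
// after the replay, only "bar" survives
```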
  29. None
  30. never corrupts

  31. sequential I/O

  32. None
  33. sequential I/O

  34. every message will be read at least once, no random

    access
  35. FileChannel.transferTo (shovels data straight from e.g. disk cache to network

    interface, no copying via RAM)
  36. "HI, I AM KAFKA" "Buckle up while we process (m|b|tr)illions

    of messages/s."
  37. TOPICS

  38. streams of records

  39. 1 2 3 4 5 6 7 …

  40. 1 2 3 4 5 6 7 8 …
    producer writes, consumer reads
  41. can have many subscribers

  42. 1 2 3 4 5 6 7 8 …
    producer writes, consumerA reads, consumerB reads
  43. can be partitioned

  44. P0 1 2 3 4 5 6 7 …
    P1 1 2 3 4 …
    P2 1 2 3 4 5 6 7 8 …
    P3 1 2 3 4 5 6 …
  45. partitions let you scale storage!

  46. partitions let you scale consuming!

  47. None
  48. all records are retained, whether consumed or not, up to

    a configurable limit
  49. PRODUCERS

  50. byte[]

  51. (typically JSON, XML, Avro, Thrift, Protobufs)

  52. (typically not funny GIFs)

  53. can choose explicit partition, or a key (which is used

    for auto-partitioning)
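As a rough illustration of what key-based auto-partitioning does (librdkafka's actual partitioner is configurable; CRC32-mod is just one common choice, and this helper is hypothetical):

```php
<?php
// Hash the key onto a partition: the same key always maps to the same
// partition, which is what preserves per-key ordering.
function partitionForKey(string $key, int $numPartitions): int {
    return crc32($key) % $numPartitions;
}

$p = partitionForKey("user-42", 4); // stable: identical on every call
```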
  54. https://github.com/edenhill/librdkafka & https://arnaud-lb.github.io/php-rdkafka/

  55. BASIC PRODUCER

    $rk = new RdKafka\Producer();
    $rk->addBrokers("127.0.0.1");
    $topic = $rk->newTopic("test");
    $topic->produce(RD_KAFKA_PARTITION_UA, 0, "Unassigned partition, let Kafka choose");
    $topic->produce(RD_KAFKA_PARTITION_UA, 0, "Yay consistent hashing", $user->getId());
    $topic->produce(1, 0, "This will always be sent to partition 1");
  56. CONSUMERS

  57. cheap

  58. only metadata stored per consumer: offset

  59. guaranteed to always receive messages in the right order (within
    a partition)
  60. can themselves produce new messages!
 (but there is also a

    Streams API for pure transformations)
  61. None
  62. BASIC CONSUMER

    $conf = new RdKafka\Conf();
    $conf->set('group.id', 'myConsumerGroup');
    $rk = new RdKafka\Consumer($conf);
    $rk->addBrokers("127.0.0.1");
    $topicConf = new RdKafka\TopicConf();
    $topicConf->set('auto.commit.interval.ms', 100);
    $topic = $rk->newTopic("test", $topicConf);
    $topic->consumeStart(0, RD_KAFKA_OFFSET_STORED);
    while (true) {
        $msg = $topic->consume(0, 120*10000);
        if (null === $msg) continue; // timed out, nothing consumed
        do_something($msg);
    }
  63. AT-MOST ONCE DELIVERY

    $conf = new RdKafka\Conf();
    $conf->set('group.id', 'myConsumerGroup');
    $rk = new RdKafka\Consumer($conf);
    $rk->addBrokers("127.0.0.1");
    $topicConf = new RdKafka\TopicConf();
    $topicConf->set('auto.commit.enable', false);
    $topic = $rk->newTopic("test", $topicConf);
    $topic->consumeStart(0, RD_KAFKA_OFFSET_STORED);
    while (true) {
        $msg = $topic->consume(0, 120*10000);
        if (null === $msg) continue; // timed out, nothing consumed
        // store the offset first: if processing crashes, the message is lost
        $topic->offsetStore($msg->partition, $msg->offset);
        do_something($msg);
    }
  64. AT-LEAST ONCE DELIVERY

    $conf = new RdKafka\Conf();
    $conf->set('group.id', 'myConsumerGroup');
    $rk = new RdKafka\Consumer($conf);
    $rk->addBrokers("127.0.0.1");
    $topicConf = new RdKafka\TopicConf();
    $topicConf->set('auto.commit.enable', false);
    $topic = $rk->newTopic("test", $topicConf);
    $topic->consumeStart(0, RD_KAFKA_OFFSET_STORED);
    while (true) {
        $msg = $topic->consume(0, 120*10000);
        if (null === $msg) continue; // timed out, nothing consumed
        do_something($msg);
        // store the offset last: if the store is lost, the message is redelivered
        $topic->offsetStore($msg->partition, $msg->offset);
    }
  65. EXACTLY-ONCE DELIVERY

  66. you cannot have exactly-once delivery

  67. THE BYZANTINE GENERALS "together we can beat the monsters. let's

    both attack at 07:00?" "confirm, we attack at 07:00" ☠
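What you can have instead is at-least-once delivery plus an idempotent consumer, giving effectively-once processing. A hypothetical sketch (handleOnce and the in-memory seen-set are illustrative; a real system would persist them atomically with the side effect):

```php
<?php
// Deduplicate redeliveries by (partition, offset): run each record's
// side effect at most once, even if the broker delivers it again.
function handleOnce(array &$seen, int $partition, int $offset, callable $work): bool {
    $id = "$partition:$offset";
    if (isset($seen[$id])) {
        return false; // already processed, skip the duplicate
    }
    $work();
    $seen[$id] = true;
    return true;
}
```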
  68. USE CASES

  69. • LinkedIn • Yahoo • Twitter • Netflix • Square

    • Spotify • Pinterest • Uber • Goldman Sachs • Tumblr • PayPal • Airbnb • Mozilla • Cisco • Etsy • Foursquare • Shopify • CloudFlare
  70. ingest the Twitter firehose and turn it into a pointless

    demo ;)
  71. None
  72. messaging, of course

  73. track user activity

  74. record runtime metrics

  75. aggregate logs

  76. IoT
 (you could still e.g. use MQTT over the wire,

    and bridge to Kafka)
  77. replicate information between data centers
 (also see Connector API)

  78. Event Sourcing broker :)

  79. WAL / Commit Log for another system

  80. billing!

  81. "shock absorber" between systems to avoid overload of DBs, APIs,

    etc.
  82. in PHP: mostly producing messages; better languages exist for consuming

  83. The End

  84. THANK YOU FOR LISTENING! Questions? Ask me: @dzuelke & dz@heroku.com