Kafka Will Get The Message Across, Guaranteed.

David Zuelke
December 02, 2016

Presentation given at SymfonyCon 2016 in Berlin, Germany.

Transcript

  1. KAFKA WILL GET THE MESSAGE ACROSS. GUARANTEED. SymfonyCon 2016 Berlin, Germany
  2. David Zuelke

  4. dz@heroku.com

  5. @dzuelke

  6. KAFKA

  7. LinkedIn

  8. APACHE KAFKA

  9. "uh oh, another Apache project?!"

  11. KEEP CALM AND LOOK AT THE WEBSITE

  13. "Basically it is a massively scalable pub/sub message queue. architected as a distributed transaction log."
  14. "so it's a queue?"

  15. it's not a queue

  16. queues are not multi-subscriber :(

  17. "so it's a pubsub thing?"

  18. it's not a pubsub thing

  19. pubsub broadcasts to all subscribers :(

  20. it's a log

  22. not that kind of log

  23. WAL

  24. Write-Ahead Log

  25. WRITE-AHEAD LOG

  27. 1 foo
    2 bar
    3 baz
    4 hi

  28. 1 create document: "foo", data: "…"
    2 update document: "foo", data: "…"
    3 create document: "bar", data: "…"
    4 remove document: "foo"
  30. never corrupts

  31. sequential I/O

  33. sequential I/O

  34. every message will be read at least once, no random access
  35. FileChannel.transferTo (shovels data straight from e.g. the disk cache to the network interface, without copying it through application buffers)
  36. "HI, I AM KAFKA" "Buckle up while we process (m|b|tr)illions of messages/s."
  37. TOPICS

  38. streams of records

  39. 1 2 3 4 5 6 7 …

  40. 1 2 3 4 5 6 7 8 … (producer writes; consumer reads)
  41. can have many subscribers

  42. 1 2 3 4 5 6 7 8 … (producer writes; consumerB reads; consumerA reads)
  43. can be partitioned

  44. P0: 1 2 3 4 5 6 7 …
    P1: 1 2 3 4 …
    P2: 1 2 3 4 5 6 7 8 …
    P3: 1 2 3 4 5 6 …
  45. partitions let you scale storage!

  46. partitions let you scale consuming!

  48. all records are retained, whether consumed or not, up to a configurable limit
  49. PRODUCERS

  50. byte[]

  51. (typically JSON, XML, Avro, Thrift, Protobufs)

  52. (typically not funny GIFs)

  53. can choose explicit partition, or a key (which is used for auto-partitioning)
  54. https://github.com/edenhill/librdkafka & https://arnaud-lb.github.io/php-rdkafka/

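  55. BASIC PRODUCER
    $rk = new RdKafka\Producer();
    $rk->addBrokers("127.0.0.1");
    $topic = $rk->newTopic("test");
    // No partition and no key: the client library picks a partition.
    $topic->produce(RD_KAFKA_PARTITION_UA, 0, "Unassigned partition, let Kafka choose");
    // No partition but a key: the key is hashed, so equal keys land on the same partition.
    $topic->produce(RD_KAFKA_PARTITION_UA, 0, "Yay consistent hashing", $user->getId());
    // Explicit partition number.
    $topic->produce(1, 0, "This will always be sent to partition 1");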
  56. CONSUMERS

  57. cheap

  58. only metadata stored per consumer: offset

  59. guaranteed to always have messages in right order (within a partition)
  60. can themselves produce new messages!

  62. BASIC CONSUMER
    $conf = new RdKafka\Conf();
    $conf->set('group.id', 'myConsumerGroup');
    $rk = new RdKafka\Consumer($conf);
    $rk->addBrokers("127.0.0.1");
    $topicConf = new RdKafka\TopicConf();
    $topicConf->set('auto.commit.interval.ms', 100);
    $topic = $rk->newTopic("test", $topicConf);
    // Resume partition 0 from the last stored offset.
    $topic->consumeStart(0, RD_KAFKA_OFFSET_STORED);
    while (true) {
        $msg = $topic->consume(0, 120*10000); // partition, timeout in ms
        if ($msg === null || $msg->err) { continue; } // timeout or error: nothing to process
        do_something($msg);
    }
  63. AT-MOST ONCE DELIVERY
    $conf = new RdKafka\Conf();
    $conf->set('group.id', 'myConsumerGroup');
    $rk = new RdKafka\Consumer($conf);
    $rk->addBrokers("127.0.0.1");
    $topicConf = new RdKafka\TopicConf();
    $topicConf->set('auto.commit.enable', 'false'); // config values are strings
    $topic = $rk->newTopic("test", $topicConf);
    $topic->consumeStart(0, RD_KAFKA_OFFSET_STORED);
    while (true) {
        $msg = $topic->consume(0, 120*10000);
        if ($msg === null || $msg->err) { continue; } // timeout or error: nothing to store
        // Offset is stored first: a crash inside do_something() loses this message.
        $topic->offsetStore($msg->partition, $msg->offset);
        do_something($msg);
    }
  64. AT-LEAST ONCE DELIVERY
    $conf = new RdKafka\Conf();
    $conf->set('group.id', 'myConsumerGroup');
    $rk = new RdKafka\Consumer($conf);
    $rk->addBrokers("127.0.0.1");
    $topicConf = new RdKafka\TopicConf();
    $topicConf->set('auto.commit.enable', 'false'); // config values are strings
    $topic = $rk->newTopic("test", $topicConf);
    $topic->consumeStart(0, RD_KAFKA_OFFSET_STORED);
    while (true) {
        $msg = $topic->consume(0, 120*10000);
        if ($msg === null || $msg->err) { continue; } // timeout or error: nothing to store
        do_something($msg);
        // Offset is stored last: a crash before this line replays the message after restart.
        $topic->offsetStore($msg->partition, $msg->offset);
    }
  65. EXACTLY-ONCE DELIVERY

  66. you cannot have exactly-once delivery

  67. THE BYZANTINE GENERALS
    "together we can beat the monsters. let's both attack at 07:00?"
    "confirm, we attack at 07:00" ☠
  68. USE CASES

  69. • LinkedIn • Yahoo • Twitter • Netflix • Square • Spotify • Pinterest • Uber • Goldman Sachs • Tumblr • PayPal • Airbnb • Mozilla • Cisco • Etsy • Foursquare • Shopify • CloudFlare
  70. ingest Twitter firehose and turn it into a pointless demo ;)
  72. messaging, of course

  73. track user activity

  74. record runtime metrics

  75. IoT

  76. replicate information between data centers

  77. billing!

  78. "shock absorber" between systems to avoid overload of DBs, APIs, etc.
  79. in PHP: mostly producing messages; better languages exist for consuming

  80. The End

  81. THANK YOU FOR LISTENING! Questions? Ask me: @dzuelke & dz@heroku.com
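
Slides 38 to 42 and 58 describe a topic as an append-only stream of records that several consumers read independently, each remembering only an offset. A minimal in-memory sketch of that idea follows. `ToyLog` is a hypothetical class made up for illustration and is not part of Kafka or php-rdkafka; its offsets start at 0, while the slides count from 1.

```php
<?php
declare(strict_types=1);

/** Toy stand-in for a single topic partition (hypothetical, not a Kafka API). */
final class ToyLog
{
    /** @var string[] records in append order */
    private array $records = [];

    /** Append a record to the end of the log and return its offset. */
    public function append(string $record): int
    {
        $this->records[] = $record;
        return count($this->records) - 1;
    }

    /** Return up to $max records starting at $offset; reading never removes anything. */
    public function read(int $offset, int $max = PHP_INT_MAX): array
    {
        return array_slice($this->records, $offset, $max);
    }
}

$log = new ToyLog();
foreach (['foo', 'bar', 'baz', 'hi'] as $record) {
    $log->append($record);
}

// The only per-consumer state is an offset into the shared log.
$offsets = ['consumerA' => 0, 'consumerB' => 0];

// consumerA reads three records and advances its own offset.
$seenByA = $log->read($offsets['consumerA'], 3);
$offsets['consumerA'] += count($seenByA);

// consumerB is slower; it still sees the log from the start.
$seenByB = $log->read($offsets['consumerB'], 1);
$offsets['consumerB'] += count($seenByB);

// consumerA later resumes from where it stopped.
$restForA = $log->read($offsets['consumerA']);
```

Because reading does not delete anything, adding another consumer costs one more integer, which is the point slide 57 makes with "cheap".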
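
Slide 53 mentions that a producer may pass a key, which is then used for auto-partitioning. The general idea can be sketched as hashing the key and taking the result modulo the partition count. `partition_for_key()` is a hypothetical helper written for this note; the hash that a real client library applies may differ, so the partition numbers computed here should not be expected to match a live cluster.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: map a message key to one of $numPartitions partitions.
 * Illustrates the idea only; real clients may use a different hash function.
 */
function partition_for_key(string $key, int $numPartitions): int
{
    if ($numPartitions < 1) {
        throw new InvalidArgumentException('need at least one partition');
    }
    // Mask the sign bit so the result is non-negative on 32-bit builds too.
    return (crc32($key) & 0x7fffffff) % $numPartitions;
}

$numPartitions = 4; // matches P0..P3 on slide 44

// The same key always maps to the same partition, so all records
// for one user keep their relative order (slide 59).
$first  = partition_for_key('user-42', $numPartitions);
$second = partition_for_key('user-42', $numPartitions);

// Spread a few keys and collect them per partition.
$byPartition = [];
foreach (['user-1', 'user-2', 'user-3', 'user-42'] as $key) {
    $byPartition[partition_for_key($key, $numPartitions)][] = $key;
}
```

One consequence of this scheme is that changing the number of partitions changes where existing keys land, so per-key ordering only holds while the partition count stays fixed.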
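
Slides 63 and 64 differ only in whether the offset is stored before or after `do_something()`. The effect of that ordering can be sketched without a broker by simulating one crash between the two steps. `run_consumer()` and its parameters are hypothetical names for this simulation and do not correspond to any php-rdkafka call.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical simulation of a consumer that crashes once, at offset $crashAt,
 * between "store offset" and "process". Returns every record that was
 * processed, across the crash and the restart.
 */
function run_consumer(array $records, bool $storeBeforeProcessing, int $crashAt): array
{
    $stored = 0;       // stored offset: survives the crash
    $processed = [];   // side effects of processing: also survive the crash
    $crashed = false;  // the simulated crash happens only once

    while ($stored < count($records)) {
        // (Re)start consuming from the stored offset.
        for ($offset = $stored; $offset < count($records); $offset++) {
            $crashNow = !$crashed && $offset === $crashAt;
            if ($storeBeforeProcessing) {
                $stored = $offset + 1;               // at-most-once ordering
                if ($crashNow) { $crashed = true; continue 2; }
                $processed[] = $records[$offset];
            } else {
                $processed[] = $records[$offset];    // at-least-once ordering
                if ($crashNow) { $crashed = true; continue 2; }
                $stored = $offset + 1;
            }
        }
    }
    return $processed;
}

$records = ['a', 'b', 'c'];
$atMostOnce  = run_consumer($records, true, 1);   // 'b' is never processed
$atLeastOnce = run_consumer($records, false, 1);  // 'b' is processed twice
```

In the first run the crash falls after the offset was stored, so the restart skips the record; in the second it falls before, so the restart repeats it. With at-least-once delivery, making `do_something()` idempotent is the usual way to tolerate the repeat.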