Presentation given at PHP Benelux 2017 near Antwerp, Belgium.
KAFKA WILL GET THE MESSAGE ACROSS. GUARANTEED.
PHP Benelux 2017, Belgium
David Zuelke
[email protected]
@dzuelke
KAFKA
LinkedIn
APACHE KAFKA
"uh oh, another Apache project?!"
KEEP CALM AND LOOK AT THE WEBSITE
"Basically it is a massively scalable pub/sub message queue, architected as a distributed transaction log."
"so it's a queue?"
it's not just a queue
queues are not multi-subscriber :(
"so it's a pubsub thing?"
it's not just a pubsub thing
pubsub broadcasts to all subscribers :(
it's a log
not that kind of log
WAL
Write-Ahead Log
WRITE-AHEAD LOG
1 foo
2 bar
3 baz
4 hi
1 create document: "foo", data: "…"
2 update document: "foo", data: "…"
3 create document: "bar", data: "…"
4 remove document: "foo"
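The log above can be sketched as an append-only structure. This is a minimal, hypothetical in-memory illustration (not Kafka's actual storage code): records are only ever appended, and each one gets the next sequential offset.

```php
<?php
// Hypothetical sketch of an append-only log: no updates, no deletes,
// each appended record receives a monotonically increasing offset.
class AppendOnlyLog
{
    private array $records = [];

    public function append(string $record): int
    {
        $this->records[] = $record;
        return count($this->records) - 1; // offset of the new record
    }

    public function read(int $offset): string
    {
        return $this->records[$offset]; // reading never mutates the log
    }
}

$log = new AppendOnlyLog();
$log->append('create document: "foo"');
$offset = $log->append('update document: "foo"');
echo $offset, "\n"; // 1
```

Because existing entries are never rewritten, all I/O is sequential appends and reads, which is where the "never corrupts" and throughput properties on the next slides come from.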
never corrupts
sequential I/O
every message will be read at least once, no random access
FileChannel.transferTo (shovels data straight from e.g. disk cache to network interface, no copying via RAM)
"HI, I AM KAFKA"
"Buckle up while we process (m|b|tr)illions of messages/s."
TOPICS
streams of records
1 2 3 4 5 6 7 …
1 2 3 4 5 6 7 8 …
(producer writes, consumer reads)
can have many subscribers
1 2 3 4 5 6 7 8 …
(producer writes, consumer A reads, consumer B reads)
can be partitioned
P0 1 2 3 4 5 6 7 …
P1 1 2 3 4 …
P2 1 2 3 4 5 6 7 8 …
P3 1 2 3 4 5 6 …
partitions let you scale storage!
partitions let you scale consuming!
all records are retained, whether consumed or not, up to a configurable limit
PRODUCERS
byte[]
(typically JSON, XML, Avro, Thrift, Protobufs)
(typically not funny GIFs)
can choose explicit partition, or a key (which is used for auto-partitioning)
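Key-based auto-partitioning can be sketched as hashing the key modulo the partition count. This is an illustrative stand-in (using PHP's crc32; librdkafka's default partitioner also hashes the key, though with a different hash function), not the exact algorithm:

```php
<?php
// Illustrative sketch of key-based partitioning: hash the key, take it
// modulo the number of partitions. The same key always maps to the same
// partition, which preserves per-key ordering.
function partitionForKey(string $key, int $numPartitions): int
{
    return crc32($key) % $numPartitions;
}

$p1 = partitionForKey("user-42", 4);
$p2 = partitionForKey("user-42", 4);
echo ($p1 === $p2 ? "same partition" : "different partitions"), "\n"; // same partition
```

This is why passing e.g. `$user->getId()` as the key (as in the producer example below) guarantees all messages for one user land in one partition, in order.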
https://github.com/edenhill/librdkafka & https://arnaud-lb.github.io/php-rdkafka/
BASIC PRODUCER
$rk = new RdKafka\Producer();
$rk->addBrokers("127.0.0.1");
$topic = $rk->newTopic("test");
$topic->produce(RD_KAFKA_PARTITION_UA, 0, "Unassigned partition, let Kafka choose");
$topic->produce(RD_KAFKA_PARTITION_UA, 0, "Yay consistent hashing", $user->getId());
$topic->produce(1, 0, "This will always be sent to partition 1");
CONSUMERS
cheap
only metadata stored per consumer: offset
guaranteed to always have messages in right order (within a partition)
can themselves produce new messages! (but there is also a Streams API for pure transformations)
BASIC CONSUMER
$conf = new RdKafka\Conf();
$conf->set('group.id', 'myConsumerGroup');
$rk = new RdKafka\Consumer($conf);
$rk->addBrokers("127.0.0.1");
$topicConf = new RdKafka\TopicConf();
$topicConf->set('auto.commit.interval.ms', 100);
$topic = $rk->newTopic("test", $topicConf);
$topic->consumeStart(0, RD_KAFKA_OFFSET_STORED);
while (true) {
    $msg = $topic->consume(0, 120*10000);
    do_something($msg);
}
AT-MOST ONCE DELIVERY
$conf = new RdKafka\Conf();
$conf->set('group.id', 'myConsumerGroup');
$rk = new RdKafka\Consumer($conf);
$rk->addBrokers("127.0.0.1");
$topicConf = new RdKafka\TopicConf();
$topicConf->set('auto.commit.enable', false);
$topic = $rk->newTopic("test", $topicConf);
$topic->consumeStart(0, RD_KAFKA_OFFSET_STORED);
while (true) {
    $msg = $topic->consume(0, 120*10000);
    $topic->offsetStore($msg->partition, $msg->offset);
    do_something($msg);
}
AT-LEAST ONCE DELIVERY
$conf = new RdKafka\Conf();
$conf->set('group.id', 'myConsumerGroup');
$rk = new RdKafka\Consumer($conf);
$rk->addBrokers("127.0.0.1");
$topicConf = new RdKafka\TopicConf();
$topicConf->set('auto.commit.enable', false);
$topic = $rk->newTopic("test", $topicConf);
$topic->consumeStart(0, RD_KAFKA_OFFSET_STORED);
while (true) {
    $msg = $topic->consume(0, 120*10000);
    do_something($msg);
    $topic->offsetStore($msg->partition, $msg->offset);
}
EXACTLY-ONCE DELIVERY
you cannot have exactly-once delivery
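What you can have is at-least-once delivery combined with idempotent processing, which yields an exactly-once *effect*. A hypothetical sketch (the `handleOnce` helper and in-memory `$processed` store are illustrative; in production the dedup state would live in durable storage, committed together with the side effect):

```php
<?php
// Hypothetical sketch: deduplicate redeliveries by (partition, offset)
// so the side effect runs at most once even under at-least-once delivery.
function handleOnce(array &$processed, int $partition, int $offset, callable $work): bool
{
    $id = "$partition:$offset";
    if (isset($processed[$id])) {
        return false; // duplicate delivery detected, skip the work
    }
    $work();
    $processed[$id] = true;
    return true;
}

$processed = [];
$count = 0;
handleOnce($processed, 0, 7, function () use (&$count) { $count++; });
handleOnce($processed, 0, 7, function () use (&$count) { $count++; }); // redelivery
echo $count, "\n"; // 1: the side effect ran exactly once
```

The same idea underlies the billing use case later in the deck: make the operation idempotent, then redeliveries are harmless.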
THE BYZANTINE GENERALS
"together we can beat the monsters. let's both attack at 07:00?"
"confirm, we attack at 07:00"
☠
USE CASES
• LinkedIn
• Yahoo
• Twitter
• Netflix
• Square
• Spotify
• Pinterest
• Uber
• Goldman Sachs
• Tumblr
• PayPal
• Airbnb
• Mozilla
• Cisco
• Etsy
• Foursquare
• Shopify
• CloudFlare
ingest the Twitter firehose and turn it into a pointless demo ;)
messaging, of course
track user activity
record runtime metrics
aggregate logs
IoT (you could still e.g. use MQTT over the wire, and bridge to Kafka)
replicate information between data centers (also see Connector API)
Event Sourcing broker :)
WAL / Commit Log for another system
billing!
"shock absorber" between systems to avoid overload of DBs, APIs, etc.
in PHP: mostly producing messages; better languages exist for consuming
The End
THANK YOU FOR LISTENING!
Questions? Ask me: @dzuelke & [email protected]