Presentation given at SymfonyCon 2016 in Berlin, Germany.
KAFKA WILL GET THE MESSAGE ACROSS. GUARANTEED. (SymfonyCon 2016, Berlin, Germany)
David Zuelke
[email protected]
@dzuelke
KAFKA
LinkedIn
APACHE KAFKA
"uh oh, another Apache project?!"
KEEP CALM AND LOOK AT THE WEBSITE
"Basically it is a massively scalable pub/sub message queue architected as a distributed transaction log."
"so it's a queue?"
it's not a queue
queues are not multi-subscriber :(
"so it's a pubsub thing?"
it's not a pubsub thing
pubsub broadcasts to all subscribers :(
it's a log
not that kind of log
WAL
Write-Ahead Log
WRITE-AHEAD LOG
1 foo
2 bar
3 baz
4 hi

1 create document: "foo", data: "…"
2 update document: "foo", data: "…"
3 create document: "bar", data: "…"
4 remove document: "foo"
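The four log entries above can be replayed to rebuild state at any point in time. The following is a minimal sketch of that idea in plain PHP; the `WriteAheadLog` class, its method names, and the `v1`/`v2` payloads are hypothetical illustrations and not part of Kafka or any library.

```php
<?php
// Hypothetical in-memory write-ahead log: entries are only ever appended,
// and the current state is rebuilt by replaying them in order.
final class WriteAheadLog
{
    /** @var list<array{op: string, id: string, data: ?string}> */
    private array $entries = [];

    /** Appends an entry and returns its 1-based sequence number. */
    public function append(string $op, string $id, ?string $data = null): int
    {
        $this->entries[] = ['op' => $op, 'id' => $id, 'data' => $data];
        return count($this->entries);
    }

    /** Replays entries 1..$upTo (default: all) and returns the resulting documents. */
    public function replay(?int $upTo = null): array
    {
        $documents = [];
        foreach (array_slice($this->entries, 0, $upTo) as $entry) {
            if ($entry['op'] === 'remove') {
                unset($documents[$entry['id']]);
            } else { // "create" and "update" both set the document's data
                $documents[$entry['id']] = $entry['data'];
            }
        }
        return $documents;
    }
}

$log = new WriteAheadLog();
$log->append('create', 'foo', 'v1');
$log->append('update', 'foo', 'v2');
$log->append('create', 'bar', 'v1');
$log->append('remove', 'foo');

$stateAfterThree = $log->replay(3); // state before "foo" was removed
$finalState = $log->replay();       // state after all four entries
```

Because nothing is ever overwritten, replaying a prefix of the log yields the state as of that entry, which is what makes the structure useful for recovery.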
never corrupts
sequential I/O
every message will be read at least once, no random access
FileChannel.transferTo() (shovels data straight from e.g. disk cache to network interface, no copying through application buffers)
"HI, I AM KAFKA" "Buckle up while we process (m|b|tr)illions of messages/s."
TOPICS
streams of records
1 2 3 4 5 6 7 …
1 2 3 4 5 6 7 8 … (producer writes; consumer reads)
can have many subscribers
1 2 3 4 5 6 7 8 … (producer writes; consumerB reads; consumerA reads)
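A rough sketch of why a log can have many subscribers where a queue cannot: reading does not remove anything, so each consumer only needs its own position. The `InMemoryTopic` class below is a hypothetical single-partition model in plain PHP, not the Kafka or php-rdkafka API.

```php
<?php
// Hypothetical single-partition topic: records live in one shared list,
// and each consumer is nothing more than a name plus an offset into it.
final class InMemoryTopic
{
    /** @var list<string> */
    private array $records = [];
    /** @var array<string, int> next offset to read, per consumer */
    private array $offsets = [];

    public function produce(string $payload): void
    {
        $this->records[] = $payload;
    }

    /** Returns the next unread record for $consumer, or null if it is caught up. */
    public function consume(string $consumer): ?string
    {
        $offset = $this->offsets[$consumer] ?? 0;
        if (!isset($this->records[$offset])) {
            return null;
        }
        $this->offsets[$consumer] = $offset + 1;
        return $this->records[$offset];
    }
}

$topic = new InMemoryTopic();
foreach (['a', 'b', 'c'] as $payload) {
    $topic->produce($payload);
}

$firstForA = $topic->consume('consumerA');  // "a"
$secondForA = $topic->consume('consumerA'); // "b"
$firstForB = $topic->consume('consumerB');  // "a" again: B has its own offset
```

consumerA being two records ahead has no effect on what consumerB sees.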
can be partitioned
P0 1 2 3 4 5 6 7 …
P1 1 2 3 4 …
P2 1 2 3 4 5 6 7 8 …
P3 1 2 3 4 5 6 …
partitions let you scale storage!
partitions let you scale consuming!
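One way to picture "scale consuming" is several consumers in the same group splitting the partitions among themselves. The sketch below uses a simple round-robin split; the `assignPartitions` function and the consumer names are hypothetical, and real Kafka clients negotiate assignments through their own protocol rather than through code like this.

```php
<?php
/**
 * Hypothetical round-robin assignment: spreads partition numbers over the
 * consumers of one group so that each partition has exactly one reader.
 *
 * @param list<string> $consumers
 * @return array<string, list<int>>
 */
function assignPartitions(int $partitionCount, array $consumers): array
{
    if ($consumers === []) {
        throw new InvalidArgumentException('At least one consumer is required.');
    }
    $assignment = array_fill_keys($consumers, []);
    for ($partition = 0; $partition < $partitionCount; $partition++) {
        $owner = $consumers[$partition % count($consumers)];
        $assignment[$owner][] = $partition;
    }
    return $assignment;
}

// Four partitions (P0..P3) shared by two consumers, then by four.
$twoConsumers = assignPartitions(4, ['c1', 'c2']);
$fourConsumers = assignPartitions(4, ['c1', 'c2', 'c3', 'c4']);
```

With four partitions, a fifth consumer in this model would receive nothing, so the partition count caps how far consuming can be spread.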
all records are retained, whether consumed or not, up to a configurable limit
PRODUCERS
byte[]
(typically JSON, XML, Avro, Thrift, Protobufs)
(typically not funny GIFs)
can choose explicit partition, or a key (which is used for auto-partitioning)
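To illustrate what "a key is used for auto-partitioning" means, here is a minimal sketch that hashes a key onto a partition number. The `partitionForKey` function is hypothetical and uses CRC32 purely for illustration; the partitioner a real client applies is configurable and may use a different hash.

```php
<?php
/**
 * Hypothetical key-based partitioner: hashes the key and maps it onto one
 * of the available partitions, so equal keys always land together.
 */
function partitionForKey(string $key, int $partitionCount): int
{
    if ($partitionCount < 1) {
        throw new InvalidArgumentException('Partition count must be positive.');
    }
    // Mask to 31 bits so the result is non-negative on any PHP build.
    return (crc32($key) & 0x7fffffff) % $partitionCount;
}

$first = partitionForKey('user-42', 4);
$second = partitionForKey('user-42', 4); // same key, same partition as $first
```

Keeping all messages for one key in one partition is what lets the per-partition ordering guarantee apply to, say, all events of a single user.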
https://github.com/edenhill/librdkafka & https://arnaud-lb.github.io/php-rdkafka/
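BASIC PRODUCER
$rk = new RdKafka\Producer();
$rk->addBrokers("127.0.0.1");
$topic = $rk->newTopic("test");
// No partition, no key: the partitioner picks a partition.
$topic->produce(RD_KAFKA_PARTITION_UA, 0, "Unassigned partition, let Kafka choose");
// Same key (here the ID of an existing $user), same partition.
$topic->produce(RD_KAFKA_PARTITION_UA, 0, "Yay consistent hashing", (string) $user->getId());
$topic->produce(1, 0, "This will always be sent to partition 1");
// produce() only enqueues; let the queue drain before the script exits.
while ($rk->getOutQLen() > 0) {
    $rk->poll(50);
}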
CONSUMERS
cheap
only metadata stored per consumer: offset
guaranteed to always receive messages in the right order (within a partition)
can themselves produce new messages!
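BASIC CONSUMER
$conf = new RdKafka\Conf();
$conf->set('group.id', 'myConsumerGroup');
$rk = new RdKafka\Consumer($conf);
$rk->addBrokers("127.0.0.1");
$topicConf = new RdKafka\TopicConf();
// Config values are strings; offsets are auto-committed every 100 ms.
$topicConf->set('auto.commit.interval.ms', '100');
$topic = $rk->newTopic("test", $topicConf);
// Read partition 0, resuming from the stored offset.
$topic->consumeStart(0, RD_KAFKA_OFFSET_STORED);
while (true) {
    $msg = $topic->consume(0, 120*10000);
    // Skip timeouts (null) and non-message events such as end of partition.
    if ($msg === null || $msg->err !== RD_KAFKA_RESP_ERR_NO_ERROR) {
        continue;
    }
    do_something($msg);
}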
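AT-MOST ONCE DELIVERY
$conf = new RdKafka\Conf();
$conf->set('group.id', 'myConsumerGroup');
$rk = new RdKafka\Consumer($conf);
$rk->addBrokers("127.0.0.1");
$topicConf = new RdKafka\TopicConf();
// Config values are strings; a boolean false would be passed as "".
$topicConf->set('auto.commit.enable', 'false');
$topic = $rk->newTopic("test", $topicConf);
$topic->consumeStart(0, RD_KAFKA_OFFSET_STORED);
while (true) {
    $msg = $topic->consume(0, 120*10000);
    if ($msg === null || $msg->err !== RD_KAFKA_RESP_ERR_NO_ERROR) {
        continue;
    }
    // Store the offset first: a crash inside do_something() loses this message.
    $topic->offsetStore($msg->partition, $msg->offset);
    do_something($msg);
}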
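AT-LEAST ONCE DELIVERY
$conf = new RdKafka\Conf();
$conf->set('group.id', 'myConsumerGroup');
$rk = new RdKafka\Consumer($conf);
$rk->addBrokers("127.0.0.1");
$topicConf = new RdKafka\TopicConf();
// Config values are strings; a boolean false would be passed as "".
$topicConf->set('auto.commit.enable', 'false');
$topic = $rk->newTopic("test", $topicConf);
$topic->consumeStart(0, RD_KAFKA_OFFSET_STORED);
while (true) {
    $msg = $topic->consume(0, 120*10000);
    if ($msg === null || $msg->err !== RD_KAFKA_RESP_ERR_NO_ERROR) {
        continue;
    }
    do_something($msg);
    // Store the offset last: a crash before this line means the message is redelivered.
    $topic->offsetStore($msg->partition, $msg->offset);
}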
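The only difference between the two delivery modes above is whether the offset is stored before or after processing. The sketch below simulates that difference without a broker: a consumer crashes once, exactly between its two steps, and then restarts from the stored offset. The `runWithCrash` function and its parameters are hypothetical and exist only for this illustration.

```php
<?php
/**
 * Hypothetical simulation of one consumer that crashes once, right between
 * its two steps for the message at $crashAt, and then restarts from the
 * last stored offset.
 *
 * @param list<string> $messages
 * @return list<string> messages that were actually processed, in order
 */
function runWithCrash(array $messages, bool $storeOffsetFirst, int $crashAt): array
{
    $storedOffset = 0; // next offset to read after a restart
    $processed = [];
    $crashed = false;

    while ($storedOffset < count($messages)) {
        // (Re)start: resume from the stored offset.
        for ($offset = $storedOffset; $offset < count($messages); $offset++) {
            $crashNow = !$crashed && $offset === $crashAt;

            if ($storeOffsetFirst) {
                $storedOffset = $offset + 1;       // step 1: store offset
                if ($crashNow) { $crashed = true; continue 2; }
                $processed[] = $messages[$offset]; // step 2: process
            } else {
                $processed[] = $messages[$offset]; // step 1: process
                if ($crashNow) { $crashed = true; continue 2; }
                $storedOffset = $offset + 1;       // step 2: store offset
            }
        }
    }
    return $processed;
}

$messages = ['m0', 'm1', 'm2'];
$atMostOnce = runWithCrash($messages, true, 1);   // m1 is never processed
$atLeastOnce = runWithCrash($messages, false, 1); // m1 is processed twice
```

Neither ordering yields every message exactly once when a crash falls between the two steps.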
EXACTLY-ONCE DELIVERY
you cannot have exactly-once delivery
THE TWO GENERALS "together we can beat the monsters. let's both attack at 07:00?" "confirm, we attack at 07:00" ☠
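Since delivery itself cannot be exactly-once, a common companion to at-least-once delivery is to make processing tolerate duplicates. The following is a minimal sketch of that idea, assuming each message carries a partition and an offset; the `IdempotentCounter` class is hypothetical and not something the talk or any library provides.

```php
<?php
/**
 * Hypothetical idempotent handler: remembers the highest offset it has
 * applied per partition and ignores anything at or below it, so a
 * redelivered message has no additional effect.
 */
final class IdempotentCounter
{
    private int $total = 0;
    /** @var array<int, int> highest applied offset, per partition */
    private array $applied = [];

    /** Returns true if the message was applied, false if it was a duplicate. */
    public function handle(int $partition, int $offset, int $amount): bool
    {
        if (isset($this->applied[$partition]) && $offset <= $this->applied[$partition]) {
            return false;
        }
        $this->total += $amount;
        $this->applied[$partition] = $offset;
        return true;
    }

    public function total(): int
    {
        return $this->total;
    }
}

$counter = new IdempotentCounter();
$counter->handle(0, 0, 10);
$counter->handle(0, 1, 5);
$redelivered = $counter->handle(0, 1, 5); // same offset again: ignored
$total = $counter->total();
```

This only holds up if the result and the applied offset are persisted together, for example in one database transaction; otherwise the same gap between two steps reappears.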
USE CASES
• LinkedIn
• Yahoo
• Twitter
• Netflix
• Square
• Spotify
• Pinterest
• Uber
• Goldman Sachs
• Tumblr
• PayPal
• Airbnb
• Mozilla
• Cisco
• Etsy
• Foursquare
• Shopify
• CloudFlare
ingest Twitter firehose and turn it into a pointless demo ;)
messaging, of course
track user activity
record runtime metrics
IoT
replicate information between data centers
billing!
"shock absorber" between systems to avoid overload of DBs, APIs, etc.
in PHP: mostly producing messages; better languages exist for consuming
The End
THANK YOU FOR LISTENING!
Questions? Ask me: @dzuelke & [email protected]