Slide 1

Slide 1 text

KAFKA WILL GET THE MESSAGE ACROSS. GUARANTEED. SymfonyCon 2016 Berlin, Germany

Slide 2

Slide 2 text

David Zuelke

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

Slide 5

Slide 5 text

@dzuelke

Slide 6

Slide 6 text

KAFKA

Slide 7

Slide 7 text

LinkedIn

Slide 8

Slide 8 text

APACHE KAFKA

Slide 9

Slide 9 text

"uh oh, another Apache project?!"

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

KEEP CALM AND LOOK AT THE WEBSITE

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

"Basically it is a massively scalable pub/sub message queue architected as a distributed transaction log."

Slide 14

Slide 14 text

"so it's a queue?"

Slide 15

Slide 15 text

it's not a queue

Slide 16

Slide 16 text

queues are not multi-subscriber :(

Slide 17

Slide 17 text

"so it's a pubsub thing?"

Slide 18

Slide 18 text

it's not a pubsub thing

Slide 19

Slide 19 text

pubsub broadcasts to all subscribers :(

Slide 20

Slide 20 text

it's a log

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

not that kind of log

Slide 23

Slide 23 text

WAL

Slide 24

Slide 24 text

Write-Ahead Log

Slide 25

Slide 25 text

WRITE-AHEAD LOG

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

1 foo
2 bar
3 baz
4 hi

Slide 28

Slide 28 text

1 create document: "foo", data: "…"
2 update document: "foo", data: "…"
3 create document: "bar", data: "…"
4 remove document: "foo"
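The four log entries above can be replayed in order to rebuild the current state, which is the core idea of a write-ahead log. The following is a minimal sketch in plain PHP, not Kafka code: the `replay()` helper and the `v1`/`v2` payloads are made up for illustration, since the slide elides the actual data.

```php
<?php
// Illustrative write-ahead log: an append-only list of numbered operations.
// Operation names mirror the slide; payloads and replay() are hypothetical.
$log = [
    1 => ['op' => 'create', 'document' => 'foo', 'data' => 'v1'],
    2 => ['op' => 'update', 'document' => 'foo', 'data' => 'v2'],
    3 => ['op' => 'create', 'document' => 'bar', 'data' => 'v1'],
    4 => ['op' => 'remove', 'document' => 'foo'],
];

/** Apply entries 1..$upTo in order and return the resulting documents. */
function replay(array $log, int $upTo): array
{
    $state = [];
    foreach ($log as $seq => $entry) {
        if ($seq > $upTo) {
            break; // stop once we pass the requested position
        }
        switch ($entry['op']) {
            case 'create':
            case 'update':
                $state[$entry['document']] = $entry['data'];
                break;
            case 'remove':
                unset($state[$entry['document']]);
                break;
        }
    }
    return $state;
}

$afterTwo = replay($log, 2); // state after entries 1 and 2
$afterAll = replay($log, 4); // state after the whole log
```

Because entries are only ever appended, replaying up to any sequence number gives the state as of that point in time.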

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

never corrupts

Slide 31

Slide 31 text

sequential I/O

Slide 32

Slide 32 text

No content

Slide 33

Slide 33 text

sequential I/O

Slide 34

Slide 34 text

every message will be read at least once, no random access

Slide 35

Slide 35 text

FileChannel.transferTo (zero-copy: shovels data straight from e.g. the disk cache to the network interface, without copying it through application buffers)

Slide 36

Slide 36 text

"HI, I AM KAFKA" "Buckle up while we process (m|b|tr)illions of messages/s."

Slide 37

Slide 37 text

TOPICS

Slide 38

Slide 38 text

streams of records

Slide 39

Slide 39 text

1 2 3 4 5 6 7 …

Slide 40

Slide 40 text

1 2 3 4 5 6 7 8 …
producer writes
consumer reads

Slide 41

Slide 41 text

can have many subscribers

Slide 42

Slide 42 text

1 2 3 4 5 6 7 8 …
producer writes
consumerB reads
consumerA reads

Slide 43

Slide 43 text

can be partitioned

Slide 44

Slide 44 text

P0 1 2 3 4 5 6 7 …
P1 1 2 3 4 …
P2 1 2 3 4 5 6 7 8 …
P3 1 2 3 4 5 6 …

Slide 45

Slide 45 text

partitions let you scale storage!

Slide 46

Slide 46 text

partitions let you scale consuming!

Slide 47

Slide 47 text

No content

Slide 48

Slide 48 text

all records are retained, whether consumed or not, up to a configurable limit
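To make the retention rule concrete, here is a toy in-memory log. It is a sketch under simplifying assumptions: the `RetainedLog` class is hypothetical, and the limit is counted in records for brevity, whereas real retention limits are usually expressed as a time span or a size.

```php
<?php
// Toy topic with a retention limit counted in records (an assumption made
// for brevity). Reading never removes anything; only the limit does.
final class RetainedLog
{
    /** @var array<int, string> offset => payload */
    private array $records = [];
    private int $nextOffset = 0;

    public function __construct(private int $maxRecords) {}

    public function append(string $payload): int
    {
        $offset = $this->nextOffset++;
        $this->records[$offset] = $payload;
        // Drop the oldest records once the limit is exceeded,
        // regardless of whether anyone has read them.
        while (count($this->records) > $this->maxRecords) {
            unset($this->records[array_key_first($this->records)]);
        }
        return $offset;
    }

    public function earliestOffset(): ?int
    {
        return array_key_first($this->records);
    }

    public function read(int $offset): ?string
    {
        return $this->records[$offset] ?? null;
    }
}

$log = new RetainedLog(3);
foreach (['a', 'b', 'c', 'd', 'e'] as $payload) {
    $log->append($payload);
}
$firstRead  = $log->read(2); // still there
$secondRead = $log->read(2); // reading again works: consuming does not delete
```

Offsets keep counting up even after old records are dropped, so a consumer that falls too far behind finds that its offset no longer exists.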

Slide 49

Slide 49 text

PRODUCERS

Slide 50

Slide 50 text

byte[]

Slide 51

Slide 51 text

(typically JSON, XML, Avro, Thrift, Protobufs)

Slide 52

Slide 52 text

(typically not funny GIFs)

Slide 53

Slide 53 text

can choose explicit partition, or a key (which is used for auto-partitioning)
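Key-based auto-partitioning can be pictured as hashing the key and taking the result modulo the number of partitions. The sketch below is an illustration only: the `choosePartition()` function and the use of CRC32 are assumptions for this example, not a description of what a particular client library does.

```php
<?php
// Illustrative key-based partitioner. The function name and the choice of
// CRC32 are assumptions made for this sketch.
function choosePartition(?string $key, int $numPartitions): int
{
    if ($key === null) {
        // No key: any partition will do.
        return random_int(0, $numPartitions - 1);
    }
    // Mask to a non-negative value so the result is the same on 32-bit builds.
    return (crc32($key) & 0x7fffffff) % $numPartitions;
}

$first  = choosePartition('user-42', 4);
$second = choosePartition('user-42', 4); // same key, same partition
$noKey  = choosePartition(null, 4);      // somewhere in 0..3
```

The practical consequence is that all messages for one key (for example one user ID) end up in the same partition and therefore keep their relative order.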

Slide 54

Slide 54 text

https://github.com/edenhill/librdkafka & https://arnaud-lb.github.io/php-rdkafka/

Slide 55

Slide 55 text

BASIC PRODUCER

$rk = new RdKafka\Producer();
$rk->addBrokers("127.0.0.1");
$topic = $rk->newTopic("test");

// No partition, no key: the partitioner picks a partition.
$topic->produce(RD_KAFKA_PARTITION_UA, 0, "Unassigned partition, let Kafka choose");
// No partition, but a key: the key is hashed to select the partition.
$topic->produce(RD_KAFKA_PARTITION_UA, 0, "Yay consistent hashing", (string) $user->getId());
// Explicit partition.
$topic->produce(1, 0, "This will always be sent to partition 1");

Slide 56

Slide 56 text

CONSUMERS

Slide 57

Slide 57 text

cheap

Slide 58

Slide 58 text

only metadata stored per consumer: offset
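Why consumers are cheap becomes clearer with a small model: the log itself is shared and never modified by reading, and each consumer carries nothing but a position. This is a plain-PHP sketch; the `OffsetConsumer` class and its `poll()` method are hypothetical names, not part of any Kafka client.

```php
<?php
// A shared log and two consumers, each tracking only its own offset.
$log = ['m0', 'm1', 'm2', 'm3', 'm4'];

final class OffsetConsumer
{
    public int $offset = 0;

    /** Return up to $max records from the current offset and advance it. */
    public function poll(array $log, int $max): array
    {
        $batch = array_slice($log, $this->offset, $max);
        $this->offset += count($batch);
        return $batch;
    }
}

$consumerA = new OffsetConsumer();
$consumerB = new OffsetConsumer();

$a1 = $consumerA->poll($log, 3); // A reads m0..m2
$b1 = $consumerB->poll($log, 1); // B reads m0 only, at its own pace
$a2 = $consumerA->poll($log, 3); // A reads m3..m4
```

Adding another subscriber costs one more integer, and a slow consumer never holds back a fast one.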

Slide 59

Slide 59 text

guaranteed to always have messages in right order (within a partition)

Slide 60

Slide 60 text

can themselves produce new messages!

Slide 61

Slide 61 text

No content

Slide 62

Slide 62 text

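BASIC CONSUMER

$conf = new RdKafka\Conf();
$conf->set('group.id', 'myConsumerGroup');
$rk = new RdKafka\Consumer($conf);
$rk->addBrokers("127.0.0.1");

$topicConf = new RdKafka\TopicConf();
$topicConf->set('auto.commit.interval.ms', '100'); // config values are strings
$topic = $rk->newTopic("test", $topicConf);

// Read partition 0, resuming from the offset stored for this group.
$topic->consumeStart(0, RD_KAFKA_OFFSET_STORED);
while (true) {
    $msg = $topic->consume(0, 120*10000); // partition, timeout in ms
    if ($msg === null || $msg->err) {
        continue; // timeout, end of partition or error; real code should inspect $msg->err
    }
    do_something($msg);
}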

Slide 63

Slide 63 text

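AT-MOST ONCE DELIVERY

$conf = new RdKafka\Conf();
$conf->set('group.id', 'myConsumerGroup');
$rk = new RdKafka\Consumer($conf);
$rk->addBrokers("127.0.0.1");

$topicConf = new RdKafka\TopicConf();
$topicConf->set('auto.commit.enable', 'false'); // config values are strings
$topic = $rk->newTopic("test", $topicConf);

$topic->consumeStart(0, RD_KAFKA_OFFSET_STORED);
while (true) {
    $msg = $topic->consume(0, 120*10000);
    if ($msg === null || $msg->err) {
        continue; // timeout, end of partition or error; real code should inspect $msg->err
    }
    // Store the offset first: if processing then fails, the message is not retried.
    $topic->offsetStore($msg->partition, $msg->offset);
    do_something($msg);
}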

Slide 64

Slide 64 text

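AT-LEAST ONCE DELIVERY

$conf = new RdKafka\Conf();
$conf->set('group.id', 'myConsumerGroup');
$rk = new RdKafka\Consumer($conf);
$rk->addBrokers("127.0.0.1");

$topicConf = new RdKafka\TopicConf();
$topicConf->set('auto.commit.enable', 'false'); // config values are strings
$topic = $rk->newTopic("test", $topicConf);

$topic->consumeStart(0, RD_KAFKA_OFFSET_STORED);
while (true) {
    $msg = $topic->consume(0, 120*10000);
    if ($msg === null || $msg->err) {
        continue; // timeout, end of partition or error; real code should inspect $msg->err
    }
    // Store the offset only after processing: a crash in between means the message is processed again.
    do_something($msg);
    $topic->offsetStore($msg->partition, $msg->offset);
}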
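The only difference between the at-most-once and at-least-once consumers is whether the offset is stored before or after processing. A small in-memory simulation makes the consequence visible. This is a sketch under stated assumptions: `runWithCrash()` and its crash model (one crash, exactly between the two steps) are hypothetical and involve no Kafka client.

```php
<?php
// Simulates a consumer that crashes once between "store offset" and
// "process message", then restarts from the last stored offset.
// The function and its crash model are hypothetical.
function runWithCrash(array $log, bool $storeFirst, int $crashAt): array
{
    $stored = 0;      // next offset to read; survives the crash
    $processed = [];  // messages whose processing completed
    $crashed = false;

    while ($stored < count($log)) {
        // (Re)start from the last stored position.
        for ($offset = $stored; $offset < count($log); $offset++) {
            // First step.
            if ($storeFirst) {
                $stored = $offset + 1;
            } else {
                $processed[] = $log[$offset];
            }
            // Crash once, exactly between the two steps.
            if (!$crashed && $offset === $crashAt) {
                $crashed = true;
                continue 2; // "restart the process"
            }
            // Second step.
            if ($storeFirst) {
                $processed[] = $log[$offset];
            } else {
                $stored = $offset + 1;
            }
        }
    }
    return $processed;
}

$atMostOnce  = runWithCrash(['a', 'b', 'c'], true, 1);  // 'b' is lost
$atLeastOnce = runWithCrash(['a', 'b', 'c'], false, 1); // 'b' is seen twice
```

With the offset stored first, the crash swallows a message; with the offset stored last, the crash causes a repeat, so processing has to tolerate duplicates.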

Slide 65

Slide 65 text

EXACTLY-ONCE DELIVERY

Slide 66

Slide 66 text

you cannot have exactly-once delivery

Slide 67

Slide 67 text

THE TWO GENERALS' PROBLEM
"together we can beat the monsters. let's both attack at 07:00?"
"confirm, we attack at 07:00"
☠

Slide 68

Slide 68 text

USE CASES

Slide 69

Slide 69 text

• LinkedIn
• Yahoo
• Twitter
• Netflix
• Square
• Spotify
• Pinterest
• Uber
• Goldman Sachs
• Tumblr
• PayPal
• Airbnb
• Mozilla
• Cisco
• Etsy
• Foursquare
• Shopify
• CloudFlare

Slide 70

Slide 70 text

ingest Twitter firehose and turn it into a pointless demo ;)

Slide 71

Slide 71 text

No content

Slide 72

Slide 72 text

messaging, of course

Slide 73

Slide 73 text

track user activity

Slide 74

Slide 74 text

record runtime metrics

Slide 75

Slide 75 text

IoT

Slide 76

Slide 76 text

replicate information between data centers

Slide 77

Slide 77 text

billing!

Slide 78

Slide 78 text

"shock absorber" between systems to avoid overload of DBs, APIs, etc.

Slide 79

Slide 79 text

in PHP: mostly producing messages; better languages exist for consuming

Slide 80

Slide 80 text

The End

Slide 81

Slide 81 text

THANK YOU FOR LISTENING! Questions? Ask me: @dzuelke