Slide 1

Slide 1 text

KAFKA AND A DEVELOPER WALK INTO A BAR Shlomi Shemesh, R&D Group manager

Slide 2

Slide 2 text

Mobile Marketing Analytics • 2014: team of 10 • 2015: team of 35 in 2 offices • 2016: team of 130 in 8 offices, expected to double next year • Product partnership with • 7 billion messages per day, which translates to a few times that in Kafka messages • Hundreds of nodes using a few Kafka clusters holding 40+ topics.

Slide 3

Slide 3 text

Ad Networks Agencies Advertisers The Mobile Attribution Eco-System

Slide 4

Slide 4 text

QUEUES

Slide 5

Slide 5 text

Pub-Sub vs. Synchronous Invocation

Slide 6

Slide 6 text

Apache Kafka: publish-subscribe messaging rethought as a distributed commit log.
Fast: A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients.
Scalable: Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine and to allow clusters of co-ordinated consumers.
Durable: Messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact.
Distributed by Design: Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees.

Slide 7

Slide 7 text

Apache Kafka: publish-subscribe messaging rethought as a distributed commit log.
• More than one client can read the same message - messages are kept until Kafka discards them (retention policy)
• Great support for concurrency using Kafka partitions
• Kafka does not hold the queue state - that is the responsibility of the consumer
• Guarantees message delivery "at least once"
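From the consumer side, those points look roughly like this. A minimal Clojure sketch over the standard Java client (topic name, group id and broker address are placeholders, not AppsFlyer's actual setup): the group id is where the "queue state" lives, a fresh group can re-read the retained log, and committing only after processing is what gives the at-least-once guarantee.

```clojure
(ns example.simple-consumer
  "Hypothetical sketch of consuming a retained Kafka topic via the Java client.
   Topic name, group id and broker address are placeholders."
  (:import (org.apache.kafka.clients.consumer KafkaConsumer)
           (java.time Duration)
           (java.util Properties)))

(defn consumer-props [group-id]
  (doto (Properties.)
    (.put "bootstrap.servers" "localhost:9092")
    (.put "group.id" group-id)              ; the group, not the broker, owns the read position
    (.put "auto.offset.reset" "earliest")   ; a brand-new group replays the retained log
    (.put "enable.auto.commit" "false")     ; we commit ourselves -> at-least-once
    (.put "key.deserializer" "org.apache.kafka.common.serialization.StringDeserializer")
    (.put "value.deserializer" "org.apache.kafka.common.serialization.StringDeserializer")))

(defn run-consumer [group-id handle-fn]
  (with-open [consumer (KafkaConsumer. (consumer-props group-id))]
    (.subscribe consumer ["raw-events"])
    (while true
      (let [records (.poll consumer (Duration/ofMillis 500))]
        (doseq [r records]
          (handle-fn (.value r)))           ; may be re-delivered after a crash: keep it idempotent
        (.commitSync consumer)))))          ; commit only after processing = at-least-once
```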

Slide 8

Slide 8 text

AppsFlyer Real-Time Architecture (diagram): real-time HTTP messages, ELBs, Web Handlers, Raw Reports, Data River, Matches, Launches, Aggr. Reports, Loyals, IO Cluster, Deep Analytics, Secor, Metadata.

Slide 9

Slide 9 text

AppsFlyer River ® Kafka in Production

Slide 10

Slide 10 text

AppsFlyer Data Flows (diagram): Media Sources, App Developers, App Users.

Slide 11

Slide 11 text

The River Microservice
• 120 c3.large nodes in the River cluster
• ~1000 partners and many thousands of configured customers
• Processing a high volume of events per second
Pipeline stages (multiple topics): Http Post, Parse and normalize, Postback generator, Postback Http Sender, Monitoring and Analytics
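A rough sketch of the front of that pipeline, assuming a Clojure service and the standard Java producer (topic name, event fields and broker address are invented for illustration): an incoming HTTP POST is parsed and normalized, then published to a Kafka topic for the downstream postback stages to consume.

```clojure
(ns example.river-ingest
  "Illustrative only: parse an incoming event and hand it to Kafka.
   Topic name, event fields and broker address are invented for the sketch."
  (:require [clojure.data.json :as json])
  (:import (org.apache.kafka.clients.producer KafkaProducer ProducerRecord)
           (java.util Properties)))

(def producer
  (KafkaProducer.
    (doto (Properties.)
      (.put "bootstrap.servers" "localhost:9092")
      (.put "acks" "all")
      (.put "key.serializer" "org.apache.kafka.common.serialization.StringSerializer")
      (.put "value.serializer" "org.apache.kafka.common.serialization.StringSerializer"))))

(defn normalize [raw-body]
  ;; keep only the fields the downstream stages care about
  (select-keys (json/read-str raw-body :key-fn keyword)
               [:app-id :event-name :partner :timestamp]))

(defn handle-http-post [raw-body]
  (let [event (normalize raw-body)]
    ;; key by app id so one app's events stay in one partition (per-app ordering)
    (.send producer
           (ProducerRecord. "launches" (:app-id event) (json/write-str event)))))
```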

Slide 12

Slide 12 text

When Services Go Pop • With queues, when one of the River machines fails there is no data loss • Load is always distributed amongst all machines in the cluster

Slide 13

Slide 13 text

Kafka Multiple Partitions = Elastic Microservice Design (* more about monitoring and scaling in the next session)
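A minimal sketch of the partition side of that, using the Kafka AdminClient (topic name, partition and replication counts are arbitrary): the partition count caps how many consumers in one group can work in parallel, and Kafka rebalances partitions across the group automatically as River nodes join or leave, which is what makes the service elastic.

```clojure
(ns example.partitions
  "Sketch: the partition count caps the parallelism of a consumer group,
   so it is provisioned with head-room. Numbers here are arbitrary."
  (:import (org.apache.kafka.clients.admin AdminClient NewTopic)
           (java.util Properties)))

(defn create-topic! [topic-name partitions replication]
  (with-open [admin (AdminClient/create
                      (doto (Properties.)
                        (.put "bootstrap.servers" "localhost:9092")))]
    ;; e.g. 32 partitions lets the consumer group grow to 32 River nodes before
    ;; the topic itself becomes the limit; Kafka rebalances partitions across
    ;; the group automatically as nodes come and go
    (-> (.createTopics admin [(NewTopic. topic-name (int partitions) (short replication))])
        (.all)
        (.get))))

(comment
  (create-topic! "raw-events" 32 3))
```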

Slide 14

Slide 14 text

Why Queues?
• “Shock absorber”
• Machine maintenance
• Rush hours vs. off-peak traffic time
• Quickly spot and fix bottlenecks

Slide 15

Slide 15 text

BACK-PRESSURE

Slide 16

Slide 16 text

Queue as a Back-Pressure Mechanism
• “Back-pressure is an important feedback mechanism that allows systems to gracefully respond to load rather than collapse under it… will ensure that the system is resilient under load” (http://www.reactivemanifesto.org/glossary#Back-Pressure)
• Pull vs. Push
• AppsFlyer Attribution microservice: River blocked by partner server timeouts
• Queue Lag = Log size - Offset
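The lag formula on the slide can be computed straight from the consumer, as in this minimal sketch (in practice lag is usually watched from outside the service by monitoring tools rather than from inside the poll loop):

```clojure
(ns example.lag
  "Sketch of 'Queue Lag = Log size - Offset': per partition, lag is the broker's
   latest offset minus this consumer's current position."
  (:import (org.apache.kafka.clients.consumer KafkaConsumer)))

(defn consumer-lag
  "Total lag across the partitions currently assigned to `consumer`.
   Call from the poll loop itself - the consumer object is not thread-safe."
  [^KafkaConsumer consumer]
  (let [assigned    (.assignment consumer)                     ; Set<TopicPartition>
        end-offsets (into {} (.endOffsets consumer assigned))] ; 'log size' per partition
    (reduce (fn [total [tp log-end]]
              (+ total (- log-end (.position consumer tp))))   ; log size - offset
            0
            end-offsets)))
```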

Slide 17

Slide 17 text

AppsFlyer River ® Kafka in Production

Slide 18

Slide 18 text

Constant Traffic Growth

Slide 19

Slide 19 text

Troubles in Paradise (The River Microservice pipeline, multiple topics): Http Post, Parse and normalize, Postback generator, Postback Http Sender, Monitoring and Analytics

Slide 20

Slide 20 text

Troubles in Paradise
To commit or not to commit?!
• Major issues on destination timeouts => unstable number of threads per JVM while HTTP calls block
• Unstable memory and CPU use
• Services committing suicide
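Roughly, the dilemma looks like this (illustrative code, not the actual River service; send-postback! is a placeholder): commit before the HTTP call and a crash loses postbacks, commit after it and a slow partner endpoint stalls the poll loop and destabilizes the JVM.

```clojure
(ns example.commit-dilemma
  "Sketch of the commit-or-not dilemma with a blocking HTTP call in the loop.
   `send-postback!` stands in for the real postback sender."
  (:import (org.apache.kafka.clients.consumer KafkaConsumer)
           (java.time Duration)))

(defn poll-loop [^KafkaConsumer consumer send-postback!]
  (while true
    (let [records (.poll consumer (Duration/ofMillis 500))]
      (doseq [r records]
        ;; Option A: commit before sending -> a crash here silently drops postbacks.
        ;; Option B (this loop): send first, commit after -> nothing is lost, but a
        ;; partner timing out blocks this thread, poll() stops being called, and the
        ;; consumer risks being thrown out of its group - the instability the slide
        ;; describes.
        (send-postback! (.value r)))
      (.commitSync consumer))))
```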

Slide 21

Slide 21 text

QUEUES EVERYWHERE!

Slide 22

Slide 22 text

Refactor - Introduce Internal Queues
Pipeline: Consumer (single topic per mode), Parse and Normalize, Postback Generator, Postback Http Sender, Monitoring and Analytics
• Mode per topic
• Queues and back-pressure within our microservice using blocking channels
• Spot the bottlenecks by monitoring the buffers
• Efficiently utilize the CPU cores
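Those internal queues map naturally onto blocking channels; here is a minimal core.async sketch (buffer sizes, worker counts and stage functions are illustrative): each stage is a pool of threads reading from one bounded channel and writing to the next, so a slow stage makes the blocking put back-pressure the stage before it, and the channel that stays full points at the bottleneck.

```clojure
(ns example.internal-queues
  "Sketch of the internal-queue refactor using core.async blocking channels.
   Buffer sizes, worker counts and stage functions are illustrative."
  (:require [clojure.core.async :as async :refer [chan thread >!! <!!]]))

;; bounded buffers between stages: when one fills up, the producing stage blocks
(def parsed-ch   (chan 1000))
(def postback-ch (chan 1000))

(defn start-stage!
  "Run `n` worker threads that read from `in`, apply `f`, and push the result
   to `out` (pass nil for `out` on the final stage)."
  [n in out f]
  (dotimes [_ n]
    (thread
      (loop []
        (when-some [msg (<!! in)]
          (let [result (f msg)]
            (when out (>!! out result)))   ; blocking put = back-pressure on this stage
          (recur))))))

(comment
  ;; one worker pool per stage, sized to the available CPU cores;
  ;; the channel that stays full marks the bottleneck stage
  (start-stage! 4 parsed-ch   postback-ch generate-postback)
  (start-stage! 8 postback-ch nil         send-postback!))
```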

Slide 23

Slide 23 text

Just Pop Up Another Consumer Group
• Easily add additional microservices to consume from the same topic
• Utilize Kafka consumer groups to test production data during development or on test servers
• Data migration from U.S.-located to E.U.-located servers
• Replace a legacy microservice with new code - run both in parallel
• Replay data in case of a bug
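All of these boil down to one knob, as the snippet below suggests (reusing the run-consumer sketch from the earlier slide; handle-event is a placeholder): a new group.id gets its own independent offsets over the same retained topic, and auto.offset.reset decides whether it starts from the beginning (replay, migration) or from the latest messages.

```clojure
;; Sketch, reusing run-consumer from the earlier consumer example; handle-event
;; stands in for whatever each service does with a message.
(comment
  (run-consumer "river-prod"      handle-event)  ; the live service
  (run-consumer "river-v2-canary" handle-event)  ; new code running in parallel on real data
  (run-consumer "river-replay"    handle-event)) ; with auto.offset.reset=earliest, replays the log
```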

Slide 24

Slide 24 text

Quickly Spot Where the Bottleneck is

Slide 25

Slide 25 text

IO Cluster to the Rescue
• Microservice that uses a Java event-driven, non-blocking I/O model to make the HTTP calls (via the http-kit library)
• Splits the complexity of the HTTP calls from the complexity of the business logic that yields them
• ~1000 concurrent HTTP calls on a single node, up to 110k per minute using 'enhanced networking' machines
• Generic - receives the output topic as an input message parameter
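A rough sketch of that pattern with http-kit's asynchronous client (message shape, topic names and field names are invented): the request returns immediately and the callback publishes the outcome to whatever output topic the incoming message asked for, so one node keeps many requests in flight without one thread per call.

```clojure
(ns example.io-cluster
  "Sketch of the IO-cluster idea: non-blocking HTTP via http-kit, with the output
   topic carried inside the message. Field and topic names are illustrative."
  (:require [org.httpkit.client :as http]
            [clojure.data.json :as json])
  (:import (org.apache.kafka.clients.producer KafkaProducer ProducerRecord)))

(defn handle-io-request
  "`msg` is a parsed Kafka message such as
   {:url \"https://partner.example/postback\" :body \"...\" :output-topic \"river-out\"}."
  [^KafkaProducer producer msg]
  ;; http-kit returns immediately; the callback fires when the partner answers,
  ;; so one node can keep ~1000 requests in flight without ~1000 blocked threads
  (http/post (:url msg)
             {:body (:body msg) :timeout 5000}
             (fn [{:keys [status error]}]
               (.send producer
                      (ProducerRecord. (:output-topic msg)        ; topic chosen by the caller
                                       (json/write-str {:url    (:url msg)
                                                        :status status
                                                        :error  (some-> error str)}))))))
```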

Slide 26

Slide 26 text

Refactor - Introduce Internal Queues (diagram)
• River: Kafka Consumer (single topic per mode), Postback Generator / Rule Engine, Kafka Producer
• IO Cluster: Kafka Consumer, Http Sender, Kafka Producer
• River Analytics: Kafka Consumer, Monitoring and Analytics, Kafka Producer
Topics: River-out, River-analytics

Slide 27

Slide 27 text

Tuning the Kafka Consumer
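The slide doesn't spell out which knobs were turned, but the usual consumer-side levers are standard Kafka consumer configs like these (values are placeholders, not recommendations):

```clojure
;; Illustrative consumer-tuning knobs, added to the Properties object from the
;; earlier consumer sketch. Values are placeholders, not recommendations.
(doto props
  (.put "max.poll.records"          "500")       ; cap the batch a single poll() returns
  (.put "fetch.min.bytes"           "65536")     ; let the broker accumulate data before answering...
  (.put "fetch.max.wait.ms"         "500")       ; ...but never wait longer than this
  (.put "max.partition.fetch.bytes" "1048576")   ; per-partition fetch ceiling
  (.put "session.timeout.ms"        "30000"))    ; how long a stalled consumer keeps its partitions
```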

Slide 28

Slide 28 text

# of River Machines Reduced by 90%! …and sleep much better at night

Slide 29

Slide 29 text

Simple Made Easy (Rich Hickey, creator of Clojure)
• Our microservices are stateless
• We aim to keep our microservices simple, meaning Single Responsibility

Slide 30

Slide 30 text

Thank You. WE ARE HIRING!! Email: [email protected]