
Kafka JVM adventures

January 14, 2016


Wondering which development practices will let you sleep at night while your Kafka-related applications ship millions or billions of messages each day?

The AppsFlyer R&D team has two full years of experience using Apache Kafka as the main messaging backbone of its mobile attribution service. We ship over 10 billion messages through Kafka every single day and maintain tens of different services that consume from and produce to 2 Kafka clusters holding 40+ topics.

Kafka is such a critical part of the AppsFlyer system architecture that we wrote a dedicated monitoring service, which monitors our consumers and producers and is even used to autoscale our services as load varies during the day. We’ll cover this service as part of the meetup. We’ll also cover how we use Kafka for testing new code before deploying to production.





  1. Mobile Marketing Analytics

    2014: team of 10
    2015: team of 35 in 2 offices
    2016: team of 130 in 8 offices, expected to double next year
    • Product partnership with
    • 7 billion messages per day, which translates to a few times that in Kafka messages
    • Hundreds of nodes using a few Kafka clusters holding 40+ topics
  2. Apache Kafka: publish-subscribe messaging rethought as a distributed commit log.

    Fast: A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients.
    Scalable: Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine and to allow clusters of co-ordinated consumers.
    Durable: Messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact.
    Distributed by Design: Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees.
  3. Apache Kafka: publish-subscribe messaging rethought as a distributed commit log.

    • More than one client can read the same message; messages are kept until Kafka discards them (retention policy)
    • Great support for concurrency using Kafka partitions
    • Kafka does not hold the queue state; that is the responsibility of the consumer
    • Guarantees "at least once" message delivery
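The last two bullets on slide 3 interact: because the consumer owns its read position, a consumer that does the work first and records its offset afterwards will see some messages twice after a crash, which is what "at least once" means in practice. The following is a minimal in-memory sketch of that behaviour, not Kafka client code; the class `AtLeastOnceSketch` and its methods are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory stand-in for one partition: an append-only log read by offset. */
final class AtLeastOnceSketch {
    private final List<String> log = new ArrayList<>();  // retained messages
    private int committedOffset = 0;                     // consumer-owned read position
    final List<String> processed = new ArrayList<>();    // side effects, kept for inspection

    void append(String message) {
        log.add(message);
    }

    /**
     * Processes every message from the committed offset onward.
     * If crashAfter is positive, simulates a crash after that many messages
     * were processed but before the commit, so the offset does not advance.
     */
    void poll(int crashAfter) {
        int position = committedOffset;
        int handled = 0;
        while (position < log.size()) {
            processed.add(log.get(position));  // do the work first
            position++;
            handled++;
            if (handled == crashAfter) {
                return;                        // "crash": work done, commit lost
            }
        }
        committedOffset = position;            // commit only after the work succeeded
    }

    int committedOffset() {
        return committedOffset;
    }

    public static void main(String[] args) {
        AtLeastOnceSketch consumer = new AtLeastOnceSketch();
        consumer.append("a");
        consumer.append("b");
        consumer.append("c");
        consumer.poll(2);   // handles a and b, then crashes before committing
        consumer.poll(-1);  // restarts from offset 0 and handles a, b, c
        System.out.println(consumer.processed);         // [a, b, a, b, c]
        System.out.println(consumer.committedOffset()); // 3
    }
}
```

Messages `a` and `b` are handled twice, so downstream processing has to tolerate duplicates if it commits after the work, as this sketch does.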
  4. AppsFlyer Real-Time Architecture

    [Architecture diagram. Labels, in extraction order: RealTime http messages, ELBs, Web Handlers . . . Raw Reports Data River Matches Launches Aggr. Reports . . . IO Cluster Deep Analytics Secor Metadata X109 Loyals]
  5. The River Microservice

    • 120 c3.large nodes in the River cluster
    • ~1000 partners and many thousands of configured customers
    • Processing a high volume of events per second
    [Diagram stages: Http Post, Parse and normalize, Postback generator, Postback Http Sender, Monitoring and Analytics; Multiple Topics]
  6. When Services Go Pop

    • Using queues: if (or when) one of the River machines fails, there is no data loss
    • Load is always distributed amongst all machines in the cluster
  7. Kafka Multiple Partitions = Elastic Microservice Design

    * More about monitoring and scaling in the next session
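Slides 6 and 7 rest on one idea: a topic is split into partitions, and the partitions are spread over whichever service instances are currently alive. The sketch below illustrates that idea with a plain round-robin spread; it is a simplified assumption, not Kafka's actual assignment algorithm, and the names `PartitionAssignmentSketch`, `assign`, and the `river-N` node names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PartitionAssignmentSketch {

    /** Spreads partitions 0..partitionCount-1 over the live nodes, round-robin. */
    static Map<String, List<Integer>> assign(int partitionCount, List<String> liveNodes) {
        if (liveNodes.isEmpty()) {
            throw new IllegalArgumentException("no live nodes to assign partitions to");
        }
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String node : liveNodes) {
            assignment.put(node, new ArrayList<>());
        }
        for (int partition = 0; partition < partitionCount; partition++) {
            String owner = liveNodes.get(partition % liveNodes.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Three healthy nodes share six partitions.
        System.out.println(assign(6, Arrays.asList("river-1", "river-2", "river-3")));
        // {river-1=[0, 3], river-2=[1, 4], river-3=[2, 5]}

        // river-2 fails: its partitions are picked up by the survivors.
        System.out.println(assign(6, Arrays.asList("river-1", "river-3")));
        // {river-1=[0, 2, 4], river-3=[1, 3, 5]}
    }
}
```

Because unread messages stay in the partition until some live node reads them, losing a node changes who does the work rather than whether it gets done. The partition count also caps how many instances can usefully share a topic.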
  8. Why Queues?

    • “Shock absorber”
    • Machine maintenance
    • Rush hours vs. off-peak traffic time
    • Quickly spot and fix bottlenecks
  9. Queue as a Back-Pressure

    • “Back-pressure is an important feedback mechanism that allows systems to gracefully respond to load rather than collapse under it… will ensure that the system is resilient under load”
      http://www.reactivemanifesto.org/glossary#Back-Pressure
    • Pull vs Push
    • Queue Lag = Log size - Offset
    [Diagram labels: AppsFlyer Attribution micro service; River blocked by Partner server timeouts]
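The lag formula on slide 9 is simple arithmetic per partition, summed over partitions when a single number per consumer is wanted. Here is a small sketch under that reading; `QueueLagSketch`, `lag`, and `totalLag` are hypothetical names, and the sizes and offsets are made-up example values.

```java
final class QueueLagSketch {

    /** Lag of one partition: messages written minus messages already consumed. */
    static long lag(long logSize, long consumerOffset) {
        if (consumerOffset < 0 || consumerOffset > logSize) {
            throw new IllegalArgumentException("offset must be within [0, logSize]");
        }
        return logSize - consumerOffset;
    }

    /** Total lag of a consumer over all partitions of a topic. */
    static long totalLag(long[] logSizes, long[] consumerOffsets) {
        if (logSizes.length != consumerOffsets.length) {
            throw new IllegalArgumentException("one offset per partition is required");
        }
        long total = 0;
        for (int i = 0; i < logSizes.length; i++) {
            total += lag(logSizes[i], consumerOffsets[i]);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(lag(1500, 1200));  // 300
        System.out.println(totalLag(new long[] {1500, 900}, new long[] {1200, 900}));  // 300
    }
}
```

A lag that keeps growing means the consumer is slower than the producer, which is the kind of signal a monitoring or autoscaling service can act on.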
  10. Troubles in Paradise

    [Diagram, The River Microservice. Stages: Http Post, Parse and normalize, Postback generator, Postback Http Sender, Monitoring and Analytics; Multiple Topics]
  11. Troubles in Paradise

    • Commit or not commit?!
    • Major issues on destination timeouts => unstable number of threads per JVM while HTTP calls block
    • Unstable memory and CPU use
    • Services committing suicide
  12. Refactor - Introduce Internal Queues

    • Mode per topic
    • Queues and back-pressure within our microservice using a blocking channel
    • Spot the bottlenecks by monitoring the buffers
    • Efficiently utilize the CPU cores
    [Diagram labels: Consumer, Single Topic Per Mode, Parse and Normalize, Postback Generator, Postback Http Sender, Monitoring and Analytics]
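The internal-queue refactor on slide 12 can be pictured as pipeline stages joined by bounded buffers: when a slow stage falls behind, the buffer fills and the upstream stage blocks instead of piling up work. The deck describes a blocking channel inside a Clojure service; the sketch below swaps in `java.util.concurrent.ArrayBlockingQueue` to show the same effect in plain Java. The class name, the two stage names, and the `POISON` end-of-stream marker are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class InternalQueueSketch {
    static final String POISON = "__done__";  // hypothetical end-of-stream marker

    /** Runs two stages joined by a bounded buffer and returns what the second stage handled. */
    static List<String> run(List<String> input, int capacity) throws InterruptedException {
        BlockingQueue<String> buffer = new ArrayBlockingQueue<>(capacity);
        List<String> sent = new ArrayList<>();

        Thread parser = new Thread(() -> {
            try {
                for (String raw : input) {
                    // put() blocks while the buffer is full: this is the back-pressure.
                    buffer.put(raw.trim().toLowerCase());
                }
                buffer.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread sender = new Thread(() -> {
            try {
                for (String msg = buffer.take(); !msg.equals(POISON); msg = buffer.take()) {
                    sent.add(msg);  // stand-in for the slow outbound HTTP call
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        parser.start();
        sender.start();
        parser.join();
        sender.join();  // join() makes the sender's writes to 'sent' visible here
        return sent;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(Arrays.asList(" A ", "B", "c "), 2));  // [a, b, c]
    }
}
```

Sampling `buffer.size()` or `buffer.remainingCapacity()` on each buffer is one way to spot bottlenecks by monitoring the buffers: the stage sitting behind a constantly full buffer is the slow one.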
  13. Just Pop Up Another Consumer Group

    • Easily add additional microservices to consume from the same topic
    • Utilize Kafka consumer groups to test production data during development or on test servers
    • Data migration from U.S. to E.U. located servers
    • Replace a legacy microservice with new code: run both in parallel
    • Replay data in case of a bug
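Every use on slide 13 follows from one property: each consumer group keeps its own offset into the same retained log, so a new group can start from the beginning without disturbing existing ones, and an existing group can be rewound to replay data. Below is a minimal in-memory sketch of that property. `ConsumerGroupSketch` and the group names are hypothetical, and the per-group offsets are stored next to the log purely for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ConsumerGroupSketch {
    private final List<String> log = new ArrayList<>();            // one retained partition
    private final Map<String, Integer> offsets = new HashMap<>();  // read position per group

    void append(String message) {
        log.add(message);
    }

    /** Returns the messages this group has not read yet and advances only its offset. */
    List<String> poll(String group) {
        int from = offsets.getOrDefault(group, 0);  // a new group starts at the beginning
        List<String> batch = new ArrayList<>(log.subList(from, log.size()));
        offsets.put(group, log.size());
        return batch;
    }

    /** Rewinds one group, for example to replay data after fixing a bug. */
    void seek(String group, int offset) {
        if (offset < 0 || offset > log.size()) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        offsets.put(group, offset);
    }

    public static void main(String[] args) {
        ConsumerGroupSketch topic = new ConsumerGroupSketch();
        topic.append("e1");
        topic.append("e2");
        System.out.println(topic.poll("river"));       // [e1, e2]
        topic.append("e3");
        System.out.println(topic.poll("river-test"));  // [e1, e2, e3] (new group, full history)
        System.out.println(topic.poll("river"));       // [e3]
        topic.seek("river", 0);
        System.out.println(topic.poll("river"));       // [e1, e2, e3] (replay)
    }
}
```

Running a legacy service and its replacement in parallel is the same pattern: two groups, one topic, and a comparison of their outputs.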
  14. IO Cluster to the Rescue

    • Microservice that uses an event-driven, non-blocking I/O model on the JVM to make the HTTP calls (via the http-kit library)
    • Split the complexity of the HTTP calls from the complexity of the business logic that yields them
    • ~1000 concurrent HTTP calls on a single node, up to 110k per minute, using 'enhanced networking' machines
    • Generic: receives the output topic as an input message parameter
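The "generic" bullet on slide 14 implies that each job sent to the IO cluster carries the name of the topic its result should be written to, so the service needs no knowledge of who asked. The sketch below shows one way that could look, using `CompletableFuture` callbacks so no thread waits on the call. It does not use http-kit or a real HTTP client: the HTTP call is an injected stub, the in-memory `topics` map stands in for a Kafka producer, and `IoClusterSketch`, `HttpJob`, and the example URL are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Function;

final class IoClusterSketch {

    /** One unit of work: where to call, and which topic should receive the result. */
    static final class HttpJob {
        final String url;
        final String outputTopic;

        HttpJob(String url, String outputTopic) {
            this.url = url;
            this.outputTopic = outputTopic;
        }
    }

    // Injected async HTTP call (url -> status code); a stub in this sketch.
    private final Function<String, CompletableFuture<Integer>> httpCall;

    // Stand-in for producing to Kafka: topic name -> messages written to it.
    final Map<String, List<String>> topics = new ConcurrentHashMap<>();

    IoClusterSketch(Function<String, CompletableFuture<Integer>> httpCall) {
        this.httpCall = httpCall;
    }

    /** Starts the call and routes the result when it completes; the caller is not blocked. */
    CompletableFuture<Void> handle(HttpJob job) {
        return httpCall.apply(job.url)
                .thenAccept(status -> publish(job.outputTopic, job.url + " -> " + status));
    }

    private void publish(String topic, String message) {
        topics.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(message);
    }

    public static void main(String[] args) {
        IoClusterSketch io = new IoClusterSketch(url -> CompletableFuture.completedFuture(200));
        io.handle(new HttpJob("http://partner.example/postback", "river-analytics")).join();
        System.out.println(io.topics.get("river-analytics"));
        // [http://partner.example/postback -> 200]
    }
}
```

Keeping the destination in the message is what lets the same IO service be reused by any business-logic service that needs outbound HTTP calls.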
  15. Refactor - Introduce Internal Queues

    [Diagram. Labels, in extraction order: Kafka Consumer Single topic per mode Postback Generator Rule Engine Kafka Producer Kafka Consumer Http Sender Kafka Producer Kafka Consumer Monitoring and Analytics Kafka Producer River-analytics River-out River IO Cluster River Analytics]
  16. • Our microservices are stateless
    • We aim to keep our microservices simple, meaning Single Responsibility
    Simple Made Easy (Rich Hickey, creator of Clojure)