Architecture for Real-Time and Batch Big-Data Analytics

AppsFlyer

May 21, 2015

Transcript

  1. AppsFlyer Who?!

    • The world's leading mobile measurement platform
    • Founded in 2011 by Reshef Mann and Oren Kaniel
    • Just completed Round B funding, for a total of $28M
    • Processing 3.5B daily events (it was 1.9B just 2 months ago and 250M at the start of 2014!)
    • 13 people in the development team (we were just 6 people 12 months ago!)
  2. The Way We Were

    • We had no real concept of “Big Data”; we were just occupied with making the system work
    • Even though we were ignorant of the future, we tried to adhere to a few abstractions:
      - Small isolated services
      - Central concept of message delivery via a message bus
      - Different tech for different tasks
  3. First Creaks In The System

    • CouchDB, which served raw reports via views, couldn't keep up with view generation
    • Python processes that read from the message bus (via pub/sub) couldn't keep up with the amount of data
    • The split between aggregated and raw data was good, but caused discrepancies because each service failed at a different time
    • If the message bus (Redis) failed, all other services failed too: a single point of failure
  4. Part 1 of The Solution

    • Migrate raw reports from CouchDB to Google's BigQuery (easiest solution at the time)
    • Rewrite some of the Python services in a new language that:
      - Deals better with strings and allocation of memory
      - Can help us scale out
      - Has a great ecosystem
      - Is functional
  5. Why Clojure?

    • Sequence-based processing capabilities really fit the visualized data flow of AppsFlyer (processing the event stream)
    • Enforces use of the FP paradigm more strictly than Scala
    • REPL-based development
    • Easy and common Java interop
    • JVM!
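
To illustrate what sequence-based processing of an event stream can look like, here is a minimal sketch using lazy Python generators. Python is used only for illustration (the services in this deck are written in Clojure), and the event fields, line format, and function names are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator

# Hypothetical event shape: plain dicts with "app_id" and "type" keys.
Event = dict


def parse(lines: Iterable[str]) -> Iterator[Event]:
    """Lazily turn 'app_id,type' lines into event dicts, skipping malformed ones."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) == 2 and all(parts):
            yield {"app_id": parts[0], "type": parts[1]}


def only_type(events: Iterable[Event], wanted: str) -> Iterator[Event]:
    """Lazily keep events of a single type."""
    return (e for e in events if e["type"] == wanted)


def count_by_app(events: Iterable[Event]) -> Counter:
    """Consume the stream and count events per app."""
    return Counter(e["app_id"] for e in events)


def installs_per_app(lines: Iterable[str]) -> Counter:
    """Compose the stages; nothing runs until count_by_app pulls items through."""
    return count_by_app(only_type(parse(lines), "install"))
```

Each stage takes a sequence and returns a sequence, so stages can be reordered or reused without knowing where the events come from.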
  6. The Hurdles (or how 2 stupid mistakes can bite you in the ass 2 years down the road)

    • Python's proprietary serialization
    • Python custom data structure as the base message in the system
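
The slide does not name the serialization format. Assuming it was something like Python's `pickle`, the sketch below contrasts a pickled custom class with a plain JSON map. The `InstallEvent` class and its fields are hypothetical.

```python
import json
import pickle


class InstallEvent:
    """Hypothetical custom Python class used as a message (the pattern the slide warns about)."""

    def __init__(self, app_id: str, ts: int):
        self.app_id = app_id
        self.ts = ts


def encode_pickle(event: InstallEvent) -> bytes:
    # Python-only format: a consumer needs Python and an importable
    # InstallEvent class to decode these bytes.
    return pickle.dumps(event)


def encode_json(event: InstallEvent) -> bytes:
    # Language-neutral: only plain maps, strings and numbers cross the wire.
    payload = {"app_id": event.app_id, "ts": event.ts}
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def decode_json(payload: bytes) -> dict:
    # Any JSON library in any language can do the equivalent of this.
    return json.loads(payload.decode("utf-8"))
```

With the JSON form, a service written in another language can read the message without sharing any class definitions with the producer.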
  7. How We Model

    • Small isolated services
    • Each service has a single business responsibility
    • Each service encapsulates its own data (if it has any) and exposes it over a well-known interface
    • Data objects are always POCO/POJO (simple data structures represented in JSON or EDN)
    • Preference for queues and buffers that pass isolated data for total async processing
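
A minimal sketch of this modeling style, assuming two hypothetical services connected by in-process queues: each service does one job and only plain JSON data passes between them. The service names and the platform rule are made up for illustration.

```python
import json
import queue
import threading

STOP = None  # sentinel that tells a service to shut down


def enrich_service(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Single responsibility: tag each event with a platform (hypothetical rule)."""
    while True:
        raw = inbox.get()
        if raw is STOP:
            outbox.put(STOP)  # pass the shutdown signal downstream
            return
        event = json.loads(raw)  # messages are plain JSON, not custom objects
        event["platform"] = "ios" if event["app_id"].startswith("id") else "android"
        outbox.put(json.dumps(event))


def count_service(inbox: queue.Queue, totals: dict) -> None:
    """Single responsibility: keep per-platform counters; only this service writes them."""
    while True:
        raw = inbox.get()
        if raw is STOP:
            return
        platform = json.loads(raw)["platform"]
        totals[platform] = totals.get(platform, 0) + 1


def run_pipeline(app_ids: list) -> dict:
    """Wire the two services together with queues and feed them events."""
    events, enriched, totals = queue.Queue(), queue.Queue(), {}
    workers = [
        threading.Thread(target=enrich_service, args=(events, enriched)),
        threading.Thread(target=count_service, args=(enriched, totals)),
    ]
    for w in workers:
        w.start()
    for app_id in app_ids:
        events.put(json.dumps({"app_id": app_id}))
    events.put(STOP)
    for w in workers:
        w.join()
    return totals
```

Because the producer only puts messages on a queue and never calls the consumer directly, either side can be replaced or scaled without the other noticing.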
  8. How We Test

    • No QA team
    • Each new service can read from the event stream to its heart's content (regular Kafka consumer behavior)
    • Each new service handles real-life traffic and real-life load because it's connected to the event stream
    • A test DB, if needed, is easy to spin up on the cloud
    • Once deemed ready, just throw the switch to on
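
The reason a new service can read the stream without disturbing production is that each consumer group tracks its own read position. Below is a toy in-memory model of that idea; it is not the Kafka client API, and the class and group names are hypothetical.

```python
class EventLog:
    """Toy append-only log with one read offset per consumer group."""

    def __init__(self):
        self._events = []   # the shared, append-only stream
        self._offsets = {}  # group name -> index of the next event to read

    def append(self, event: dict) -> None:
        self._events.append(event)

    def poll(self, group: str, max_events: int = 10) -> list:
        """Return the next batch for `group` and advance only that group's offset."""
        start = self._offsets.get(group, 0)
        batch = self._events[start:start + max_events]
        self._offsets[group] = start + len(batch)
        return batch
```

A service under test polls as its own group (say `"candidate"`), sees the same real traffic as the `"production"` group, and leaves the production offsets untouched.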
  9. How We Orchestrate

    • Mesos
    • Marathon
    • Service discovery via Consul
    • Each service is a single uberjar in a single Docker container
    • DBs get their own dedicated machines
    • Preference for ring-based architecture for DBs
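
The deck does not say which databases are used or what "ring based" means in detail. Assuming it refers to a consistent-hashing ring of the kind used by Dynamo-style stores, a minimal sketch looks like this. The node names, the virtual-node count, and the choice of MD5 as the hash are assumptions.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (first 8 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hashing ring with virtual nodes; expects at least one node."""

    def __init__(self, nodes, vnodes: int = 64):
        # Each node owns `vnodes` points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        """Return the node owning the first ring point after the key's hash."""
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

The useful property is that removing a node only moves the keys that node owned; keys on the remaining nodes stay where they were.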
  10. Deployment and Monitoring

    • Jenkins build server
    • Deployment via the in-house Santa tool
    • Consul health checks
    • StatsD for the JVM and application metrics
    • Sensu
    • End-to-end flows
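
As an illustration of how application metrics reach StatsD, here is a minimal sketch that formats and sends metrics in the plain-text StatsD line protocol (`name:value|type`). The metric names, host, and helper functions are hypothetical; in practice a StatsD client library would normally do this.

```python
import socket


def statsd_line(name: str, value, kind: str, sample_rate: float = 1.0) -> str:
    """Format one metric: 'c' is a counter, 'g' a gauge, 'ms' a timer."""
    if kind not in {"c", "g", "ms"}:
        raise ValueError(f"unsupported metric type: {kind}")
    line = f"{name}:{value}|{kind}"
    if sample_rate < 1.0:
        line += f"|@{sample_rate}"  # tells the server the metric was sampled
    return line


def send_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget over UDP; 8125 is the conventional StatsD port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))
```

For example, `send_metric(statsd_line("events.processed", 1, "c"))` increments a counter. Because the transport is UDP, a missing or slow metrics server does not block the service.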