Processing 15 Billion events a day without breaking the bank - ReversimX ILTechTalks

AppsFlyer
January 10, 2017

Talk from the ILTechTalks ReversimX meetup.
Adi Belan talks about AppsFlyer's concerns with the efficiency and scale of its real-time back-end services.
Video (in Hebrew) is here:
https://youtu.be/TPQKBwDky60

Transcript

  1. Adi Belan - R&D Team Lead
     Processing 15 Billion events per day - In Real Time! (While not breaking the bank)
  2. So, 15 Billion Events you say?
     • SDK installed in 10 Billion mobile devices
     • Every time App is launched += 1 event
     • Every time User engages with Ad += 1 event
  3. So, Why Real Time?
     • Deferred Deeplinking - Opening App in correct context
     • Ad Networks optimize bidding in real time
  4. Some Numbers
     • 7 Different ELBs
     • ~200 - 400 EC2 Instances running
     • 5-8 million HTTP requests per minute
     • 20+ Billion Kafka messages per day
     • Multiple Aerospike & Couchbase clusters with Billions of keys
  5. Strong. Light. Cheap - Choose 2
     • Not All Traffic has same business value
     • Choose resilience vs. speed vs. cost according to business value.
  6. Saving Mission Critical!
     • 1st App Open is Install event = mission critical
     • Dual RabbitMQ clusters = 32 r3.2xlarge ~= 500 $/day
  7. Saving Money where possible
     • Use Spots - ⅕ of the $$$
     • Auto Scale - according to Applicative Metrics to get best utilization
     • Replication Factor - where you can afford to lose
       ◦ Kafka Cluster per business value
       ◦ DB replication factor
  8. AWS Spots - The Stock Market
     • Bidding Mechanism
     • Different instance types and AZ
     • Be ready to replace with On-Demand
  9. Auto-Scale
     • Kafka Consumer Lag (in Seconds!!)
     • Keep SLA - don’t worry about Load / CPU
     • Scale spots before On-Demand
  10. Replication Factor
     • 80% of conversions come from clicks in the last 12 hours
     • In Memory: RF 2, ~24 Hours
     • SSD: RF 1, 30 Days
     [Diagram labels: Attribution Service, XDR, DB Writes, DB Reads]
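
The auto-scaling idea from slides 8 and 9 (scale on Kafka consumer lag measured in seconds, add spots before On-Demand, and be ready to fall back to On-Demand) can be sketched roughly as below. This is a minimal illustration and not AppsFlyer's implementation: the `Fleet` type, the `plan_fleet` function, the fixed step size, the scale-in threshold, and the `spot_available` input are all hypothetical names and assumptions. Removing On-Demand instances first on scale-in is also an assumption, since the slides only state the scale-out order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fleet:
    """Hypothetical view of a consumer fleet: instance counts by purchase type."""
    spot: int
    on_demand: int

    @property
    def total(self) -> int:
        return self.spot + self.on_demand


def plan_fleet(fleet: Fleet, lag_seconds: float, sla_seconds: float,
               spot_available: int, step: int = 2, min_total: int = 1,
               scale_in_ratio: float = 0.25) -> Fleet:
    """Return the desired fleet given the current consumer lag in seconds.

    spot_available is an assumed input: how many spot instances could be
    obtained right now (for example, how many bids were fulfilled).
    """
    if lag_seconds > sla_seconds:
        # Behind the SLA: add capacity, spots first, On-Demand for the rest.
        add_spot = min(step, spot_available)
        add_on_demand = step - add_spot
        return Fleet(fleet.spot + add_spot, fleet.on_demand + add_on_demand)

    if lag_seconds < sla_seconds * scale_in_ratio:
        # Well within the SLA: shrink, but never below min_total.
        removable = min(step, max(fleet.total - min_total, 0))
        # Assumption: drop the more expensive On-Demand instances first.
        drop_on_demand = min(removable, fleet.on_demand)
        drop_spot = removable - drop_on_demand
        return Fleet(fleet.spot - drop_spot, fleet.on_demand - drop_on_demand)

    # Lag is inside the comfortable band: leave the fleet alone.
    return fleet
```

For example, with a 30-second SLA and 90 seconds of lag, `plan_fleet(Fleet(4, 2), 90, 30, spot_available=1)` asks for one more spot and one more On-Demand instance, because only one spot can be obtained. Note that the decision uses lag only; CPU and load do not appear, matching the "Keep SLA - don't worry about Load / CPU" point.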