
Scaling to Get the Whole World Running

Steve Huff
October 10, 2017

Talk given at Mobile@Scale 2017, describing Runkeeper's strategies for scaling a mobile app and backend.

Transcript

  1. Scaling to Get the Whole World Running

    Steve Huff, [email protected], @hakamadare, Lead Site Reliability Engineer
    Joe Bondi, [email protected], @007i, Co-Founder and CTO
  2. Runkeeper APIs

    Started simple: “core” most-critical API functions
    Mobile <-> Server relationship
    • Activity tracking
    • User registration
  3. Runkeeper APIs

    Planned for extensibility and evolution
    Mobile <-> Server relationship
    • Activity tracking
    • User registration
    • Training plans
    • Social network / feed
    • Challenges
    • Subscriptions
    • ...
    • Goals
    • Routes
  4. Log, measure, monitor

    • Know your expected traffic patterns / client behavior
    • Monitor your actual traffic patterns / client behavior
    • Log metrics to surface and identify bottlenecks
    • Recognize badly-behaving mobile clients, and fix!
    • Avoid self DDoS’ing!
      ◦ API calls made during app launch or home screen
      ◦ API calls made in loops (N+1)
      ◦ API calls made from push notifications at large surge volume
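
The "API calls made in loops (N+1)" pattern can be illustrated with a minimal sketch. `FakeApi`, `get_user`, and `get_users` are hypothetical names invented for this example, not Runkeeper endpoints; the request counter only exists to make the difference measurable.

```python
from typing import Dict, Iterable, List


class FakeApi:
    """Hypothetical stand-in for a backend API; it counts requests received."""

    def __init__(self, users: Dict[int, str]) -> None:
        self._users = users
        self.request_count = 0

    def get_user(self, user_id: int) -> str:
        # One request per user: cheap once, expensive inside a loop.
        self.request_count += 1
        return self._users[user_id]

    def get_users(self, user_ids: Iterable[int]) -> Dict[int, str]:
        # One request for the whole batch (assumed batch endpoint).
        self.request_count += 1
        return {uid: self._users[uid] for uid in user_ids}


def names_n_plus_one(api: FakeApi, friend_ids: List[int]) -> List[str]:
    # Anti-pattern: N requests for N friends.
    return [api.get_user(uid) for uid in friend_ids]


def names_batched(api: FakeApi, friend_ids: List[int]) -> List[str]:
    # Same result with a single request.
    by_id = api.get_users(friend_ids)
    return [by_id[uid] for uid in friend_ids]
```

With three friends, the looped version issues three requests and the batched version issues one; multiplied across every client at app launch, that difference is what turns into self-inflicted load.
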
  5. How does your database grow?

    Best way to scale a database is to not use one
    • Trip points-data
    • Trip summary records
  6. Caching - be fast and efficient

    • Have a client-side + server-side strategy
    • Find queries that benefit from caching
      ◦ Measure hit / miss - know how it’s working

    Local cache (etag / retrofit), CDN, Web App, CloudFront, Redis, Postgres
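
"Measure hit / miss" can be sketched as a cache wrapper that counts both outcomes. This is an illustrative in-memory example under stated assumptions: `MeasuredCache` and its `loader` callback are hypothetical, and a production setup would more likely report these counters to a metrics system in front of a store such as Redis.

```python
from typing import Any, Callable, Dict, Hashable


class MeasuredCache:
    """In-memory cache-aside wrapper that records hits and misses."""

    def __init__(self) -> None:
        self._store: Dict[Hashable, Any] = {}
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Miss: fall through to the slow source (e.g. the database) and remember it.
        self.misses += 1
        value = loader(key)
        self._store[key] = value
        return value

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

A persistently low `hit_rate()` for a query suggests it is not a good caching candidate, or that keys and expiry need rethinking.
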
  7. Queues - are a savior

    • Queues help manage write actions at large volumes
    • Queue anything that can be queued
    • Monitor queue length to know when there’s an issue
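
A minimal sketch of queued writes with a length check, using Python's standard `queue` module. The threshold value and the function names are assumptions for illustration; a real deployment would likely use a dedicated message broker and an alerting system rather than an in-process queue.

```python
import queue
from typing import Any, List

ALERT_THRESHOLD = 3  # hypothetical backlog size that should trigger an alert


def enqueue_write(q: "queue.Queue[Any]", record: Any) -> None:
    # The request handler returns right away; the write happens later.
    q.put(record)


def queue_is_backed_up(q: "queue.Queue[Any]", threshold: int = ALERT_THRESHOLD) -> bool:
    # Queue length is the health signal: a growing backlog means workers
    # cannot keep up with incoming writes.
    return q.qsize() > threshold


def drain(q: "queue.Queue[Any]", sink: List[Any]) -> int:
    """Worker loop: move every queued record into the sink, return the count."""
    processed = 0
    while True:
        try:
            record = q.get_nowait()
        except queue.Empty:
            return processed
        sink.append(record)
        q.task_done()
        processed += 1
```
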
  8. Migrations - change the engines while in-flight

    Migrating things to the “new way”
    1. Dual-write: Deploy new writes while old way stays up and running
    2. Backfill historical data
    3. Incremental deploy (via server- or client-side config)
    4. Cleanup (remove code and data)
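
The first three migration steps can be sketched with two in-memory dictionaries standing in for the old and new storage. `TripStore`, `read_from_new`, and the method names are hypothetical; the flag plays the role of the server- or client-side config mentioned in step 3.

```python
from typing import Dict


class TripStore:
    """Toy store that writes to both backends during a migration."""

    def __init__(self) -> None:
        self.old: Dict[int, str] = {}
        self.new: Dict[int, str] = {}
        self.read_from_new = False  # config flag, flipped incrementally

    def save(self, trip_id: int, summary: str) -> None:
        # Step 1, dual-write: the old path stays authoritative and running.
        self.old[trip_id] = summary
        self.new[trip_id] = summary

    def backfill(self) -> int:
        # Step 2: copy historical rows that predate dual-write.
        copied = 0
        for trip_id, summary in self.old.items():
            if trip_id not in self.new:
                self.new[trip_id] = summary
                copied += 1
        return copied

    def load(self, trip_id: int) -> str:
        # Step 3: reads move to the new store only when the flag is on.
        source = self.new if self.read_from_new else self.old
        return source[trip_id]
```

Step 4, cleanup, would then delete the `old` dictionary and the flag once every read has moved over.
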
  9. Challenges specific to mobile apps

    • Backwards compatibility
      ◦ How long to support old versions of apps?
    • Time needed to get new version out to users’ devices
    • Wireless connectivity failures
      ◦ Design for re-trying - though avoid self-DDoS
      ◦ Re-transmission of data already successfully received by server
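
One common way to address both sub-points under connectivity failures is to combine jittered exponential backoff on the client with an idempotency key on the server. The sketch below is an assumption about how that could look, not a description of Runkeeper's implementation; `backoff_delays`, `UploadServer`, and the key format are hypothetical.

```python
import random
from typing import Any, Callable, Dict, List


def backoff_delays(
    attempts: int,
    base: float = 1.0,
    cap: float = 60.0,
    rng: Callable[[], float] = random.random,
) -> List[float]:
    """Delays in seconds before each retry: exponential growth, capped,
    scaled by a random factor so clients do not all retry in lockstep."""
    return [rng() * min(cap, base * 2 ** i) for i in range(attempts)]


class UploadServer:
    """Toy server that ignores re-transmissions of an upload it already has."""

    def __init__(self) -> None:
        self.activities: Dict[str, Any] = {}

    def upload(self, idempotency_key: str, payload: Any) -> str:
        # The client generates the key once per activity and reuses it on retry.
        if idempotency_key in self.activities:
            return "duplicate"
        self.activities[idempotency_key] = payload
        return "created"
```

If the client's first upload succeeded but the response was lost, the retry carries the same key and the server stores the activity only once.
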
  10. Thank you!

    1. Thanks for inviting us, and to all of you for attending!
    2. Questions?

    Steve Huff, [email protected], @hakamadare, Lead Site Reliability Engineer
    Joe Bondi, [email protected], @007i, Co-Founder and CTO