Scaling to Get the Whole World Running

Steve Huff, Lead Site Reliability Engineer
[email protected] @hakamadare

Joe Bondi, Co-Founder and CTO
[email protected] @007i

Runkeeper APIs
Started simple: “core” most-critical API functions
• Mobile <-> Server relationship
• Activity tracking
• User registration

Runkeeper APIs
Planned for extensibility and evolution
• Mobile <-> Server relationship
• Activity tracking
• User registration
• Training plans
• Social network / feed
• Challenges
• Subscriptions
• ...
• Goals
• Routes

Log, measure, monitor
Expected traffic patterns and app behavior

Log, measure, monitor
Actual traffic patterns and app behavior

Log, measure, monitor
• Know your expected traffic patterns / client behavior
• Monitor your actual traffic patterns / client behavior
• Log metrics to surface and identify bottlenecks
• Recognize badly behaving mobile clients, and fix them!
• Avoid self-DDoS’ing!
  ◦ API calls made during app launch or on the home screen
  ◦ API calls made in loops (N+1)
  ◦ API calls triggered by push notifications at large surge volume
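One way to recognize badly behaving clients from logged traffic is to count requests per client within a time window and flag the outliers. The following is a minimal sketch of that idea, not Runkeeper's actual tooling; the function name, the log format of (client id, endpoint) tuples, and the threshold are all assumptions made for illustration.

```python
from collections import Counter


def flag_noisy_clients(request_log, max_requests_per_window):
    """Return the client ids whose request count exceeds the threshold.

    request_log: iterable of (client_id, endpoint) tuples seen in one
    time window (a hypothetical log format).
    """
    counts = Counter(client_id for client_id, _endpoint in request_log)
    return sorted(c for c, n in counts.items() if n > max_requests_per_window)


# Hypothetical one-minute window: client "b" calls the feed endpoint in a loop.
window = [
    ("a", "/feed"),
    ("b", "/feed"),
    ("b", "/feed"),
    ("b", "/feed"),
    ("c", "/activities"),
]
noisy = flag_noisy_clients(window, max_requests_per_window=2)
print(noisy)
```

In practice the threshold would come from the expected traffic pattern for each endpoint, so that a deviation between expected and actual behavior is what triggers the flag.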

Scale horizontally
Plan for multiples of things from the start (app servers, caches, databases)

Best way to scale a database is to not use one
How does your database grow?
• Trip points-data
• Trip summary records

Caching - be fast and efficient
• Have a client-side + server-side strategy
• Find queries that benefit from caching
  ◦ Measure hit / miss - know how it’s working

Layers shown: Local cache (etag / retrofit), CDN, Web App, CloudFront, Redis, Postgres
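The "measure hit / miss" advice can be sketched as a small read-through cache that counts its own hits and misses. This is an illustrative in-memory stand-in, not the Redis or CloudFront setup named on the slide; the class name `MeasuredCache`, the loader function, and the sample record are assumptions, and the sketch omits expiry and eviction.

```python
class MeasuredCache:
    """Read-through cache that tracks how often it actually helps."""

    def __init__(self, loader):
        self._loader = loader  # called on a miss, e.g. a database query
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        value = self._loader(key)
        self._store[key] = value
        return value

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def load_trip_summary(user_id):
    # Hypothetical slow query, replaced here by a constant record.
    return {"user_id": user_id, "total_trips": 42}


cache = MeasuredCache(load_trip_summary)
cache.get(1)  # miss: loaded from the backing store
cache.get(1)  # hit: served from the cache
cache.get(2)  # miss
print(cache.hits, cache.misses, round(cache.hit_ratio(), 2))
```

A persistently low hit ratio suggests the query is a poor caching candidate or that the keys are too fine-grained.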

Queues - are a savior
• Queues help manage write actions at large volumes
• Queue anything that can be queued
• Monitor queue length to know when there’s an issue
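The points about queueing writes and watching queue length might look like the sketch below. It uses Python's standard in-process `queue.Queue` purely as a stand-in for whatever queueing system is actually deployed; the alert threshold, function names, and payloads are hypothetical.

```python
import queue

ALERT_THRESHOLD = 3  # hypothetical queue length that signals a problem

write_queue = queue.Queue()


def enqueue_write(payload):
    """Accept the write quickly; the slow work happens later in drain()."""
    write_queue.put(payload)


def queue_is_backed_up():
    """True when the backlog is longer than the alert threshold."""
    return write_queue.qsize() > ALERT_THRESHOLD


def drain(handler):
    """Process everything currently queued and return how many items ran."""
    processed = 0
    while True:
        try:
            payload = write_queue.get_nowait()
        except queue.Empty:
            return processed
        handler(payload)
        processed += 1


for i in range(5):
    enqueue_write({"activity_id": i})
backed_up_before = queue_is_backed_up()

saved = []
processed = drain(saved.append)
backed_up_after = queue_is_backed_up()
print(backed_up_before, processed, backed_up_after)
```

A backlog that keeps growing means the consumers cannot keep up with the producers, which is why queue length makes a useful alerting signal.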

Migrating things to the “new way”
Migrations - change the engines while in-flight
1. Dual-write: Deploy new writes while old way stays up and running
2. Backfill historical data
3. Incremental deploy (via server- or client-side config)
4. Cleanup (remove code and data)
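The four migration steps can be sketched with two dictionaries standing in for the old and new data stores. This is a simplified illustration under assumed names (`DualWriteStore`, `backfill`, and the `read_from_new` flag that plays the role of the server- or client-side config), not the talk's actual implementation.

```python
class DualWriteStore:
    """Step 1: every write goes to both stores; reads follow a config flag."""

    def __init__(self, old_store, new_store, read_from_new=False):
        self.old = old_store
        self.new = new_store
        self.read_from_new = read_from_new  # step 3: flipped incrementally

    def write(self, key, value):
        self.old[key] = value
        self.new[key] = value

    def read(self, key):
        source = self.new if self.read_from_new else self.old
        return source[key]


def backfill(old_store, new_store):
    """Step 2: copy historical rows, never overwriting dual-written data."""
    copied = 0
    for key, value in old_store.items():
        if key not in new_store:
            new_store[key] = value
            copied += 1
    return copied


old = {"trip:1": "5k"}                # data that predates the migration
new = {}
store = DualWriteStore(old, new)
store.write("trip:2", "10k")          # lands in both stores
copied = backfill(old, new)           # only trip:1 still needs copying
store.read_from_new = True            # switch reads to the new store
print(copied, store.read("trip:1"))
# Step 4 (cleanup) would remove the old store and the dual-write code path.
```

Skipping keys that already exist in the new store keeps the backfill from clobbering fresher values written during the dual-write phase.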

Challenges specific to mobile apps
• Backwards compatibility
  ◦ How long to support old versions of apps?
• Time needed to get a new version out to users’ devices
• Wireless connectivity failures
  ◦ Design for retrying - though avoid self-DDoS
  ◦ Re-transmission of data already successfully received by server
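Retrying without self-DDoS, and tolerating re-transmission of data the server already has, are commonly handled with exponential backoff plus jitter on the client and an idempotency key on the server. The sketch below shows that general technique under assumed names (`FakeServer`, `upload_with_retry`, the key format); the slides do not say which mechanism was actually used.

```python
import random
import time


class FakeServer:
    """Stand-in server that deduplicates uploads by idempotency key."""

    def __init__(self, failures_before_success=0):
        self.received = {}
        self._failures_left = failures_before_success

    def upload(self, idempotency_key, payload):
        # Store first, then possibly "lose" the response, mimicking a
        # connection that drops after the server has received the data.
        self.received.setdefault(idempotency_key, payload)
        if self._failures_left > 0:
            self._failures_left -= 1
            raise ConnectionError("response lost")
        return "ok"


def upload_with_retry(server, key, payload, max_attempts=5,
                      base_delay=0.01, sleep=time.sleep):
    """Retry with exponential backoff and full jitter, reusing the same key."""
    for attempt in range(max_attempts):
        try:
            return server.upload(key, payload)
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            # Randomized, growing delay so many clients do not retry in sync.
            sleep(random.uniform(0, base_delay * 2 ** attempt))


server = FakeServer(failures_before_success=2)
result = upload_with_retry(server, "activity-123", {"distance_km": 5.0},
                           sleep=lambda seconds: None)  # skip real waiting
print(result, len(server.received))
```

Because every retry carries the same key, the three attempts here leave exactly one stored activity on the server.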

Thank you!
1. Thanks for inviting us, and to all of you for attending!
2. Questions?

Steve Huff, Lead Site Reliability Engineer
[email protected] @hakamadare

Joe Bondi, Co-Founder and CTO
[email protected] @007i
