Lessons from scaling to hundreds of millions of users

Lessons from scaling to hundreds of millions of users is a talk I presented at DigitalOcean TIDE 2018 in NYC.

Tammy Bryant Butow

April 25, 2018

Transcript

  1. Lessons from scaling to millions of users. Tammy Butow, Empress of Chaos, Principal SRE, Gremlin. @tammybutow
  2. Lessons from scaling to hundreds of millions of users. Tammy Butow, Empress of Chaos, Principal SRE, Gremlin. @tammybutow
  3. I was previously an SRE Manager @ Dropbox, leading Databases, Magic Pocket and Code Workflows (Dev Tools). Prior to that I worked @ DigitalOcean, National Australia Bank, Queensland University of Technology + more. I’m now a Principal SRE @ Gremlin.
  4. Always prioritise reliability, performance and durability. Achieved through automation, monitoring, tooling and engineers : ) Have a clear “Engineering Principles” paper for your company.
  5. Infra Engineering Before Launch
    • Kubernetes: 3 primaries & 3 nodes
    • Sharded MySQL: Percona Community with semi-sync replication
    • Monitoring, Security, Alerts, Capacity Planning, Support SLAs
    • Backups (Percona XtraBackup, short and long)
    • Docker Containers, Load balancing, Private networking
    • Engineering Tools: Chaos Engineering (Gremlin), Git, GitHub / Phabricator, Circle CI, Code Search (LiveGrep), Infra Automation (Terraform)
    • Coding: choose two to three approved languages for use: 1. Rust - Systems 2. Python - Scripting/Tools 3. Go - Services
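The slide lists sharded MySQL but the transcript does not say how rows are routed to shards. A minimal sketch of one common approach, hash-based routing, follows. The host names, the shard count of three, and the helpers `shard_for` and `host_for` are all hypothetical and not taken from the talk.

```python
import hashlib

# Hypothetical shard map: host names and shard count are placeholders.
SHARD_HOSTS = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2"]


def shard_for(user_id: int, num_shards: int = len(SHARD_HOSTS)) -> int:
    """Map a user id to a shard index using a hash that is stable across runs."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards


def host_for(user_id: int) -> str:
    """Return the (hypothetical) primary host that owns this user's rows."""
    return SHARD_HOSTS[shard_for(user_id)]
```

Plain modulo routing moves most keys when the shard count changes, which is one reason larger deployments often prefer a lookup table or consistent hashing.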
  6. Product Engineering Before Launch
    • Web, Mobile*, Desktop*
    • * Only if you are mobile first. * Only if you are desktop first.
    • Native Development: iOS, Android. Native development will give you better performance.
    • Electron JS (built by GitHub). Electron is used by: GitHub, Slack. Most big tech companies in the Bay Area are moving to Electron.
    • React (built by Facebook). React is used by: Everyone :P Most big tech companies in the Bay Area use React or are moving to React.
    • API: Swagger (built by ). Swagger is used by: Gremlin. Most big tech companies in the Bay Area have an API. Launching with an API makes sense!
  7. Infra Engineering With 5 Enterprise Customers
    • Kubernetes
    • Sharded MySQL: Percona Community with semi-sync replication
    • Monitoring, Security, Alerts, Capacity Planning, Support SLAs
    • Backups (Percona XtraBackup, short and long)
    • Docker Containers, Load balancing, Private networking
    • Engineering Tools: Chaos Engineering (Gremlin), Git, GitHub / Phabricator, Circle CI, Code Search (LiveGrep), Infra Automation (Terraform)
    • Coding: choose two to three approved languages for use: 1. Rust - Systems 2. Python - Scripting/Tools 3. Go - Services
    • Small Data: Mixpanel
    • Specific Infra: based on your product
  8. You will have started to build out infra specific for your product features and optimised for your own workload.
  9. Infra Engineering With 3 Million Users
    • Kubernetes
    • Sharded MySQL: Percona Community with semi-sync replication
    • Monitoring, Security, Alerts, Capacity Planning, Support SLAs
    • Backups (Percona XtraBackup, short and long)
    • Docker Containers, Load balancing, Private networking
    • Engineering Tools: Chaos Engineering (Gremlin), Git, GitHub / Phabricator, Circle CI, Code Search (LiveGrep), Infra Automation (Terraform)
    • Coding: choose two to three approved languages for use: 1. Rust - Systems 2. Python - Scripting/Tools 3. Go - Services
    • Big Data / Analytics: Hadoop, Spark, Pig
    • Caching: Memcache
    • Moar Specific Infra: based on your product
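Memcache first appears at this stage, though the transcript does not say how it is used. A minimal sketch of the usual cache-aside read path is below. `TTLCache` is an in-process stand-in for a real Memcache client, and `get_profile`, the key format, and the 60-second TTL are hypothetical choices for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process stand-in for a Memcache client: get/set with expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry has expired: drop it and report a miss.
            del self._items[key]
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._items[key] = (self._clock() + ttl_seconds, value)


def get_profile(
    user_id: int,
    cache: TTLCache,
    load_from_db: Callable[[int], str],
    ttl_seconds: float = 60.0,
) -> str:
    """Cache-aside read: try the cache, fall back to the database, then fill the cache."""
    key = f"profile:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(user_id)
    cache.set(key, value, ttl_seconds)
    return value
```

The database is only queried on a miss or after the entry expires, so repeated reads of hot keys stay off MySQL.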
  10. Infra Engineering With 50 Million Users
    • Kubernetes
    • Sharded MySQL: Percona Community with semi-sync replication
    • Monitoring, Security, Alerts, Capacity Planning, Support SLAs
    • Backups (Percona XtraBackup, short and long)
    • Docker Containers, Load balancing, Private networking
    • Engineering Tools: Chaos Engineering (Gremlin), Git, GitHub / Phabricator, Circle CI, Code Search (LiveGrep), Infra Automation (Terraform)
    • Coding: choose two to three approved languages for use: 1. Rust - Systems 2. Python - Scripting/Tools 3. Go - Services
    • Big Data / Analytics: Hadoop, Spark, Pig
    • Caching: Memcache
    • Moar Specific Infra: based on your product
  11. From 400 million to 500 million users in one very fast year. AKA: Getting on the rocket ship
  12. Infra Engineering With 400 Million Users
    • Kubernetes
    • Sharded MySQL: Percona Community with semi-sync replication
    • Monitoring, Security, Alerts, Capacity Planning, Support SLAs
    • Backups (Percona XtraBackup, short and long)
    • Docker Containers, Load balancing, Private Networking
    • Engineering Tools: Chaos Engineering (Gremlin), Git, GitHub / Phabricator, Circle CI, Code Search (LiveGrep), Infra Automation (Terraform)
    • Coding: choose two to three approved languages for use: 1. Rust - Systems 2. Python - Scripting/Tools 3. Go - Services
    • Big Data / Analytics: Hadoop, Spark, Pig
    • Caching: Memcache
    • Distributed Datastore: built in-house
    • Moar Specific Infra: based on your product
    • Moar tools! (I can’t fit them)
  13. You need to be able to zoom out with tools to make quick and important decisions
  14. You build simple and useful tools for all engineers and other departments (e.g. self-service analytics dashboards, cloud infra allocation CLI tools)
  15. You do performance tuning for your cloud infra because you sweat the details. (e.g. Linux performance governor and CPU hyperthreading settings)
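As a rough illustration of checking those two settings on a Linux host, the sketch below reads the CPU frequency governor and the SMT (hyperthreading) control from sysfs. The helper names are hypothetical, and it assumes the kernel exposes `cpufreq/scaling_governor` per CPU and `smt/control` under `/sys/devices/system/cpu`; some hosts and VMs expose neither.

```python
from pathlib import Path
from typing import Dict, List, Optional

SYSFS_CPU = Path("/sys/devices/system/cpu")


def read_governors(cpu_root: Path = SYSFS_CPU) -> Dict[str, str]:
    """Return {cpu name: scaling governor} for every CPU that exposes cpufreq."""
    governors: Dict[str, str] = {}
    for gov_file in sorted(cpu_root.glob("cpu[0-9]*/cpufreq/scaling_governor")):
        cpu_name = gov_file.parent.parent.name  # e.g. "cpu0"
        governors[cpu_name] = gov_file.read_text().strip()
    return governors


def read_smt_control(cpu_root: Path = SYSFS_CPU) -> Optional[str]:
    """Return the SMT control state (e.g. "on" or "off"), or None if not exposed."""
    smt_file = cpu_root / "smt" / "control"
    if not smt_file.exists():
        return None
    return smt_file.read_text().strip()


def cpus_not_using(expected: str, cpu_root: Path = SYSFS_CPU) -> List[str]:
    """List CPUs whose governor differs from the expected one."""
    return [cpu for cpu, gov in read_governors(cpu_root).items() if gov != expected]
```

Running `cpus_not_using("performance")` across a fleet is one way to spot hosts that drifted from the intended setting; the `cpu_root` parameter exists so the functions can be pointed at a fixture directory in tests.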
  16. • Prioritise capacity planning
    • Create org and team roadmaps, but stay flexible
    • IQRs are useful (infra quarterly reviews)
    • Give teams 20% time to work on KTLO (keep the lights on)
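To make the capacity-planning point concrete, here is a small worked estimate using the growth figure from slide 11 (400 million to 500 million users in a year). It assumes steady compound growth, which real traffic rarely follows exactly, and the 600 million capacity limit and both function names are hypothetical.

```python
import math


def monthly_growth_rate(start_users: float, end_users: float, months: int) -> float:
    """Compound monthly growth rate implied by growing from start to end."""
    return (end_users / start_users) ** (1.0 / months) - 1.0


def months_until_full(current_users: float, capacity_users: float, monthly_rate: float) -> int:
    """Whole months until current_users reaches capacity_users at a steady rate."""
    if current_users >= capacity_users:
        return 0
    if monthly_rate <= 0:
        raise ValueError("monthly_rate must be positive to ever reach capacity")
    return math.ceil(math.log(capacity_users / current_users) / math.log1p(monthly_rate))


# Growth figures come from slide 11; the 600M capacity limit is hypothetical.
rate = monthly_growth_rate(400e6, 500e6, 12)
runway = months_until_full(500e6, 600e6, rate)
print(f"monthly growth: {rate:.2%}, months of headroom: {runway}")
```

Growing 25% in a year works out to a little under 2% a month, so the headroom left on any fixed limit is the kind of number an infra quarterly review can track.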
  17. Infra Engineering With 500 Million Users
    • Kubernetes
    • Sharded MySQL: Percona Community with semi-sync replication
    • Monitoring, Security, Alerts, Capacity Planning, Support SLAs
    • Backups (Percona XtraBackup, short and long)
    • Docker Containers, Load balancing, Private Networking
    • Engineering Tools: Chaos Engineering (Gremlin), Git, GitHub / Phabricator, Circle CI, Code Search (LiveGrep), Infra Automation (Terraform)
    • Coding: choose two to three approved languages for use: 1. Rust - Systems 2. Python - Scripting/Tools 3. Go - Services
    • Big Data / Analytics: Hadoop, Spark, Pig
    • Caching: Memcache
    • Distributed Datastore: built in-house
    • Moar Specific Infra: based on your product
    • Moar tools! (I can’t fit them)
  18. Always be migrating! Have 1+ migration in progress at all times. (e.g. data migrations and framework/tool migrations - from Ember to React)
  19. How Do You Dramatically Speed Up Engineering Onboarding? You need @etelsverdlov at your Company Hack Week…! (She is the Director of Community at DigitalOcean)
  20. Reduced Eng Onboarding from 4 weeks to 30 min. No people required to support onboarding. ~ Automate all the things ~ Saved 6500+ engineering hours a month
  21. • Prioritise reliability, durability & performance
    • Focus on making sure “it just works”
    • Your core product is solid
    • Infra and Product Engineering work together
    • You sweat the details and aim higher each day!
  22. Learn more about scaling @ Chaos Conf. One-day, single-track conference in SF on September 28. Topics include building internet-scale systems, container chaos and chaos engineering. chaosconf.io @chaosconf