Scaling Pinterest - Next Stage

Yash Nelapati

April 25, 2013

Transcript

  1. Growth: March 2010
    [Chart: Page Views / Day, Mar 2010 to May 2012]
    Thursday, April 25, 13
  2. Growth: March 2010
    · RackSpace
    · 1 small Web Engine
    · 1 small MySQL DB
    · 1 Engineer (3 Total)
    [Chart: Page Views / Day, Mar 2010 to May 2012]
  4. Growth: September 2011
    · Amazon EC2 + S3 + CloudFront
    · 2 NGinX, 16 Web Engines + 2 API Engines
    · 5 Functionally Sharded MySQL DB + 9 read slaves
    · 4 Cassandra Nodes
    · 15 Membase Nodes (3 separate clusters)
    · 8 Memcache Nodes
    · 10 Redis Nodes
    · 3 Task Routers + 4 Task Processors
    · 4 Elastic Search Nodes
    · 3 Mongo Clusters
    · 3 Engineers (8 Total)
    [Chart: Page Views / Day, Mar 2010 to May 2012]
  6. Growth: April 2012
    · Amazon EC2 + S3 + Edge Cast
    · 135 Web Engines + 75 API Engines
    · 10 Service Instances
    · 80 MySQL DBs (m1.xlarge) + 1 slave each
    · 110 Redis Instances
    · 60 Memcache Instances
    · 2 Redis Task Manager + 60 Task Processors
    · Sharded Solr
    · 15 Engineers (25 Total)
    [Chart: Page Views / Day, Mar 2010 to May 2012]
  7. Growth: April 2013
    · Amazon EC2 + S3 + Edge Cast
    · 300 Web Engines + 400 API Engines
    · 69 MySQL DBs (hi1.4xlarge on SSDs) + 1 slave each
    · 100+ Redis Instances
    · 230+ Memcache Instances
    · 7 Redis Task Manager + 500 Task Processors
    · 70+ Engineers (130+ Total)
    [Chart: Page Views / Day, April 2012 to April 2013]
  8. Growth: April 2013
    · 6 services (80 instances)
    · Sharded Solr
    · 20 HBase
    · 12 Kafka + Azkaban
    · 8 Zookeeper Instances
    · 12 Varnish
    [Chart: Page Views / Day, April 2012 to April 2013]
  10. Pinployees
    April 2012
    • 12 Engineers
      • 1 Data Infrastructure
      • 1 Ops
      • 2 Mobile
      • 8 Generalists
    April 2013
    • 65 Engineers
      • 7 Data Infrastructure + Science
      • 7 Search and Discovery
      • 9 Business and Platform
      • 6 Spam, Abuse, Security
      • 9 Web
      • 9 Mobile
      • 2 Growth
      • 10 Infrastructure
      • 6 Ops
  11. • Amazon
    • Python, Java, Go
    • MySQL
    • Memcache
    • Redis
    • HBase
  12. If you’re the biggest user of a technology, the challenges will be greatly amplified
  14. Why Amazon? [Hosting]
    • When? Beginning
    • Very good peripherals, such as load balancing, DNS, map reduce, and more...
    • New instances ready in seconds
    When to move to a datacenter?
    • Once you’re consistently hitting issues beyond your control
  15. Why Python? [Code]
    • Extremely mature
    • Well known and well liked
    • Solid active community
    • Very good libraries specifically targeted to web development
    • Effective rapid prototyping
  16. Why Not Python? [Code]
    • Interpreted
    • Global Interpreter Lock
    • Primitive GC
    • Alternatives: Java, Go
  17. Why MySQL and Memcache? [Production Data]
    • Extremely mature
    • Well known and well liked
    • Rarely catastrophic loss of data
    • Response time to request rate increases linearly
    • Very good software support - XtraBackup, Innotop, Maatkit
    • Solid active community
    • Free
  18. Why Redis? [Production Data]
    • Well known and well liked
    • Consistently good performance
    • Free
    • Variety of convenient and efficient data structures
    • Insert into queue in O(1)
    • 3 Flavors of Persistence: Now, Snapshot, Never
    • For HIGH write:read ratio, snapshot saves a lot of I/O bandwidth
    • Snapshot increases reliability in noisy environments
  19. Why HBase? (or, Why NOT MySQL, Redis, Memcache) [Production Data]
    • Efficient storage
    • Can handle large write throughput
    • Solid Hadoop interface
    • Maturing quickly, used heavily by Facebook
    • Built on HDFS
    • Free
    • When? Use it to optimize your already mature system
  20. • Employee Growth
    • Data Data Data
    • Abuse Protection
    • Uptime and Latency
    • Connections
  22. Challenge: One Codebase + Lots of Engineers = Deploy Hell [Employee Growth]
    • Major bugs and performance issues stall deploys
    • Performance issues creep in under radar
    • 7+ development teams, 1 ops team
    • Workload changing more rapidly and less predictably
    • Want developers to not fear moving fast
    Challenge: Maintain Fast Flexible Experimentation
    • Want to empower engineers and PMs to try new things
  23. Solution: Deploy Checkpoints [Employee Growth]
    • Aggressive unit tests (careful! don’t erase your DB!)
    • Rings of deployment
      • Canary, employees only, 5% of user base, etc.
    • Continuous deployment
    • Production integration tests
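The "rings of deployment" idea can be sketched as a per-user gate. This is a minimal illustration, not Pinterest's implementation: the ring names, the `user_bucket` hashing scheme, and the choice to treat canary as host-level rather than user-level are all assumptions; only the ring order (canary, employees only, 5% of user base, everyone) comes from the slide.

```python
import hashlib

# Hypothetical ring names; the order follows the slide.
RINGS = ("canary", "employees", "five_percent", "everyone")


def user_bucket(user_id: int, buckets: int = 100) -> int:
    """Map a user id to a stable bucket in [0, buckets) so rollouts are sticky."""
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def in_rollout(user_id: int, is_employee: bool, ring: str) -> bool:
    """Return True if this user should see the new build at the given ring."""
    if ring not in RINGS:
        raise ValueError(f"unknown ring: {ring}")
    if ring == "canary":
        # Assumption: canary hosts are picked at the load balancer,
        # so no user is targeted explicitly at this stage.
        return False
    if ring == "employees":
        return is_employee
    if ring == "five_percent":
        # Employees stay included; other users join if they fall in buckets 0-4.
        return is_employee or user_bucket(user_id) < 5
    return True  # "everyone"
```

Hashing the user id keeps each user in the same bucket across requests, so widening a ring only ever adds users.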
  24. Solution: Services [Employee Growth]
    • Move away from a monolithic code base and topology
    • When? 50 engineers or too many connections
    • Empower each team
    • Service architecture with metrics and alerts
    • Configurable deployment
    • Ability to add capacity
    • Convenient and consistent data storage and caching
    • Provide Reliable Business and Ops Data
    • Win: Protect your database from accidents (e.g., unit test dropping DB tables)
  25. Challenge: Provide Reliable Business and Ops Data [Data Data Data]
    • Business relies more heavily on data
    • Need reliable metrics to run successful experiments
  26. Solution: Use What's Available [Data Data Data]
    • Google Analytics
      • When? Day 1
    • S3 + Amazon’s EMR
      • When? When you start needing to dig deeper
      • You’ll need a data logging pipeline
  28. Solution: Build a Reliable Data Pipeline [Data Data Data]
    • Example: Flume, Scribe, Kafka
    • Get data from business logic to map reduce
    • Benefits:
      • Track trends
      • Understand what’s actually going on
      • Recover from database mishaps
    • What do you log? All Requests? All Events? Individual types of Events?
    • Answer: All of the above
  29. Challenge: Spam and Abuse [Abuse Protection]
    • Abusive content
    • Hijacking
    • Application Security
    • DDOS / Scraping
    • Spam
    • Each flavor has a different set of actors with unique motives and behavior
  30. Solution: Spam Detection and Prevention [Abuse Protection]
    • Spammers...
      • are human
      • know your product and demographics as well as you do
      • know your defenses very well
      • are generally more tech savvy than your users
      • will grow with you
      • want to make money
    • If spammers are not making a good ROI, they’ll go away
    • Always communicate blocks as if the receiver is a good user
  31. Challenge: Increase Availability, Decrease Latency [Uptime and Latency]
    • Push for better uptime and lower latency
    • Initially, most uptime and latency issues due to DB + caching
    • Fewer Instances => Few, but big failures
    • More Instances => More smaller failures + more complexity
    • How aggressively can you retry without hurting the system?
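The slide leaves the retry question open. One common way to bound retry pressure, which the deck does not name, is capped exponential backoff with random jitter. The sketch below assumes that technique; the function names, default timings, and the choice of `ConnectionError` as the retryable failure are all illustrative.

```python
import random
import time


def backoff_delays(retries, base=0.05, cap=2.0, rng=random.random):
    """Yield one randomized sleep time per retry, doubling the ceiling up to cap."""
    for attempt in range(retries):
        yield rng() * min(cap, base * 2 ** attempt)


def call_with_retries(operation, retries=3, sleep=time.sleep, **backoff):
    """Run operation(); on ConnectionError, wait and retry up to `retries` times."""
    delays = backoff_delays(retries, **backoff)
    while True:
        try:
            return operation()
        except ConnectionError:
            delay = next(delays, None)
            if delay is None:
                raise  # retry budget exhausted: surface the failure
            sleep(delay)
```

The jitter spreads retries from many clients over time, and the fixed retry budget keeps a struggling backend from being hit indefinitely.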
  32. Solution: Metrics Dashboard and Alerts [Uptime and Latency]
    • Create dashboard + alerts, and review response times weekly
    • When? Soon after launch at latest
    • Profile everything
      • MySQL - Maatkit, InnoTop
      • Memcache - Maatkit
      • Frontend - New Relic
      • General Ops - StatsD, Nagios / Monit, Ganglia
  33. Solution: Configuration Manager and Failover [Uptime and Latency]
    • Provides load balancing and automatic connection reconfiguration
    • When? 30+ caches / DBs
    • One option: Intermediate load balancers
      • Example: HAProxy, NGinx, Varnish
      • Extra latency hop
      • More complication
      • Configuration hassle (1 LB / 7 services?)
  36. Solution: Zookeeper [Co-ordination]
    • Centralized configuration management
    • Used for service discovery
    • Notifies of service failures
    • WATCH and its callback are pretty reliable
    • Experiment framework
    [Diagram labels: Zookeeper, Services, app, Register, WATCH]
  37. Part 1: Configuration Manager and Failover [MySQL Failover]
    [Diagram labels: A, B, App, Zookeeper {“master” : “A”}, readonly=True]
  38. Part 2: Configuration Manager and Failover [MySQL Failover]
    [Diagram labels: A, B, App, Zookeeper {“master” : “B”}, readonly=True]
  39. Part 2: Configuration Manager and Failover [MySQL Failover]
    [Diagram labels: A, B, App, Zookeeper {“master” : “B”}, readonly=False]
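One reading of the three failover diagrams is: the app learns the current master from a watched Zookeeper node, the node is switched from A to B, and B is then taken out of read-only mode. The sketch below follows that reading with an in-memory stand-in. `FakeConfigStore`, `App`, `Db`, and `fail_over` are hypothetical names, and a real deployment would use an actual Zookeeper client rather than this class.

```python
import json


class FakeConfigStore:
    """In-memory stand-in for a Zookeeper node with watch callbacks (hypothetical)."""

    def __init__(self, data: str):
        self._data = data
        self._watchers = []

    def watch(self, callback):
        """Register a callback and fire it once with the current value."""
        self._watchers.append(callback)
        callback(self._data)

    def set(self, data: str):
        """Store a new value and notify every watcher."""
        self._data = data
        for callback in self._watchers:
            callback(data)


class Db:
    def __init__(self, name: str, read_only: bool):
        self.name = name
        self.read_only = read_only


class App:
    """Keeps track of which database is master by watching the config node."""

    def __init__(self, store: FakeConfigStore):
        self.master = None
        store.watch(self._on_change)

    def _on_change(self, raw: str):
        self.master = json.loads(raw)["master"]


def fail_over(store: FakeConfigStore, new_master: Db):
    # Step shown on slide 38: repoint the config while the new master is read-only.
    store.set(json.dumps({"master": new_master.name}))
    # Step shown on slide 39: only then allow writes on the new master.
    new_master.read_only = False
```

Switching the pointer before enabling writes means no client can write to B while other clients still believe A is the master.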
  40. Memcache Failures
    [Diagram labels: App, Nutcracker, Cache 001, Cache 002, Cache 003, Cache 004, Cache 005]
  41. Memcache Failures
    [Diagram labels: App, Nutcracker, Cache 001, Cache 002, Cache 003, Cache 004, Cache 005]
    Ketama ring adjusted
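"Ketama ring adjusted" refers to consistent hashing: when a cache node drops out, only the keys that lived on it are remapped. The sketch below is a simplified consistent-hash ring in that spirit. It does not reproduce the exact libketama point layout or Nutcracker's configuration, and the class name, node names, and `points_per_node` value are assumptions.

```python
import bisect
import hashlib


class HashRing:
    """Simplified consistent-hash ring (not the exact Ketama point layout)."""

    def __init__(self, nodes, points_per_node=100):
        self._points_per_node = points_per_node
        self._ring = []  # sorted list of (hash, node) points
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode("utf-8")).hexdigest()[:8], 16)

    def add(self, node: str):
        """Place several virtual points for the node around the ring."""
        for i in range(self._points_per_node):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str):
        """Drop every point of a failed node; other points stay where they are."""
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, key: str) -> str:
        """Return the node owning the first point at or after the key's hash."""
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        if idx == len(self._ring):
            idx = 0  # wrap around to the start of the ring
        return self._ring[idx][1]
```

After `remove("cache003")`, a key that was served by any other node still maps to that node, so the failure costs roughly one node's share of cache misses rather than a full reshuffle.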
  42. Solution: Instance Configuration [Uptime and Latency]
    • Example: Puppet
    • Systems to auto- and re-configure your instances
    • Makes it easier to spin up more capacity or replacements
    • When to use?
      • Once you begin to have clear server instance segmentation
      • ~15+ instances
      • Earlier is better -- you’ll want all your instances Puppet-ified
  43. Challenge: Number of Connections Rising [Connections]
    • Initially, entire app tier connected to all Memcache, Redis, MySQL
    • On Memcache...
      • 20k connections * 10kB / connection = 195MB / Memcache
      • 40 Memcaches means 7.6 GB used on connections
      • Connection space is not allocated from slab memory!
      • Can eventually cause Memcache process to leak into swap
    • On MySQL
      • At least 256 kB / connection
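The Memcache figures above can be reproduced with a few lines of arithmetic, assuming the slide's "kB", "MB", and "GB" are binary units (1024-based), which is what makes the numbers come out to 195 and 7.6. The function name is illustrative.

```python
KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3


def connection_overhead_bytes(connections: int, bytes_per_connection: int) -> int:
    """Memory consumed by connection buffers alone on one server."""
    return connections * bytes_per_connection


# Figures from the slide: 20k connections at 10 kB each, across 40 Memcaches.
per_memcache = connection_overhead_bytes(20_000, 10 * KIB)
fleet_total = per_memcache * 40

per_memcache_mib = per_memcache / MIB  # about 195 MiB per Memcache
fleet_total_gib = fleet_total / GIB    # about 7.6 GiB across the fleet
```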
  44. Challenge: Number of Connections Rising [Connections]
    • On Redis...
      • Max number of connections allowed is 10240 (weird...)
      • Exceeding max connections will make Redis CPU peg at 100%
      • On Ubuntu 12.04, default max connections is 1024 (!!)
      • (Go change to 65536 now)
  45. Solution: Connection Pooling and Multiplexing [Connections]
    • Data Services, Nutcracker
    • When? Once any service gets close to 10k connections
    • Success: Memcache
      • Once was >20k connections
      • Now 1.3k connections
    • But, aggressive fan-out causes...
      • Network contention
      • Incast congestion
  46. Finagle: Why Java over Python
    • RPC for high concurrency
    • Twitter
    • Completely asynchronous
    • Previous experience with Finagle
    • Lots of compatible libraries
    • JVM
    • Lots of bells and whistles - Ostrich, Zipkin, Iago
  47. Near Term Challenges [What’s Next?]
    • Continually improve deployment mechanisms
    • More aggressively push toward services maintained by teams
    • Growing beyond 130 Pinployees
    • Build products faster
    • MySQL 5.6
    • After that? I don’t know...
  48. Redis Configuration [Tips]
    • Challenges
      • BGSAVE forks the main Redis process, potentially doubling RAM usage
      • If one Redis instance is using > 70% of RAM, you’re in danger
      • Redis is single threaded
    • Solution
      • Run 32 Redis instances on a single host (port 6379, 6380, ...)
      • Can move some of those instances to new host to add capacity
      • If you configure each instance with multiple databases, you can split even more times.
      • Utilizes more cores
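The many-instances-per-host layout works because keys map to a fixed instance number, while the instance-to-host table can change as capacity is added. A minimal sketch of that split follows, assuming CRC32 for key hashing, contiguous blocks of instances per host, and hypothetical host names; the deck does not say how keys were routed.

```python
import zlib

INSTANCES = 32    # from the slide: 32 Redis instances
BASE_PORT = 6379  # from the slide: ports 6379, 6380, ...


def instance_for(key: str) -> int:
    """Map a key to a fixed instance id; this never changes when hosts are added."""
    return zlib.crc32(key.encode("utf-8")) % INSTANCES


def build_placement(hosts):
    """Assign instance ids to hosts in contiguous blocks; each keeps its own port."""
    per_host = -(-INSTANCES // len(hosts))  # ceiling division
    return {i: (hosts[i // per_host], BASE_PORT + i) for i in range(INSTANCES)}


# Hypothetical host names: start on one host, then move half the instances.
one_host = build_placement(["redis-a"])
two_hosts = build_placement(["redis-a", "redis-b"])
```

Adding a host only rewrites the placement table, so a key's instance id is unchanged and its data moves only if that whole instance is relocated.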
  49. Redis Configuration [Tips]
    • Also...
      • Max number of connections allowed is 10240
      • Can only be overcome by editing source and recompiling
  50. Increasing connections in Ubuntu 12.04 [Tips]
    • As root...
    • In /etc/sysctl.conf
        fs.file-max = 65536
    • In /etc/security/limits.conf
        * soft nofile 65536
        * hard nofile 65536
    • In ~/.bashrc
        ulimit -n 65536
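To confirm that the new limits actually reached a running process, the open-file limit can be read from inside it. The sketch below uses Python's standard `resource` module (Unix only); the helper names and the `TARGET` constant are illustrative, with 65536 taken from the slide.

```python
import resource

TARGET = 65536  # value recommended on the slide


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def meets_target(soft_limit: int, target: int = TARGET) -> bool:
    """True if the soft limit is unlimited or at least the target."""
    return soft_limit == resource.RLIM_INFINITY or soft_limit >= target
```

Calling `meets_target(nofile_limits()[0])` from the service's own startup code checks the limit the service really inherited, which can differ from what an interactive shell reports.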
  51. MySQL config [Tips]
    • Protect yourself from WHERE-less UPDATEs!
      • Example: UPDATE users SET first_name = "Bob";
      • Now all your users are named Bob!
    • Add sql_safe_updates=1 to my.cnf