Scaling Instagram

Slides from an AirBnB tech talk I gave; they don't make the most sense out of context, but will hopefully be helpful for folks who saw the talk

Mike Krieger

April 12, 2012

Transcript

  1. Scaling Instagram AirBnB Tech Talk 2012 Mike Krieger Instagram

  2. me - Co-founder, Instagram - Previously: UX & Front-end @ Meebo - Stanford HCI BS/MS - @mikeyk on everything

  6. communicating and sharing in the real world

  7. 30+ million users in less than 2 years

  8. the story of how we scaled it

  9. a brief tangent

  10. the beginning

  12. 2 product guys

  13. no real back-end experience

  14. analytics & python @ meebo

  15. CouchDB

  16. CrimeDesk SF

  18. let’s get hacking

  19. good components in place early on

  20. ...but were hosted on a single machine somewhere in LA

  22. less powerful than my MacBook Pro

  23. okay, we launched. now what?

  24. 25k signups in the first day

  25. everything is on fire!

  26. best & worst day of our lives so far

  27. load was through the roof

  28. first culprit?

  30. favicon.ico

  31. 404-ing on Django, causing tons of errors

  32. lesson #1: don’t forget your favicon

  33. real lesson #1: most of your initial scaling problems won’t be glamorous

  34. favicon

  35. ulimit -n

  36. memcached -t 4

  37. prefork/postfork

  38. friday rolls around

  39. not slowing down

  40. let’s move to EC2.

  43. scaling = replacing all components of a car while driving it at 100mph

  44. since...

  45. “"canonical [architecture] of an early stage startup in this era."

    (HighScalability.com)
  46. Nginx & Redis & Postgres & Django.

  47. Nginx & HAProxy & Redis & Memcached & Postgres & Gearman & Django.

  48. 24h Ops

  51. our philosophy

  52. 1 simplicity

  53. 2 optimize for minimal operational burden

  54. 3 instrument everything

  55. walkthrough: 1 scaling the database 2 choosing technology 3 staying nimble 4 scaling for android

  56. 1 scaling the db

  57. early days

  58. django ORM, postgresql

  59. why pg? postgis.

  60. moved db to its own machine

  61. but photos kept growing and growing...

  62. ...and only 68GB of RAM on biggest machine in EC2

  63. so what now?

  64. vertical partitioning

  65. django db routers make it pretty easy

  66. def db_for_read(self, model): if model._meta.app_label == 'photos': return 'photodb'

  67. ...once you untangle all your foreign key relationships
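
A minimal sketch of the Django database-router idea in the slides above, assuming a 'photos' app routed to a 'photodb' alias; the write routing, relation check, and settings wiring are illustrative additions, not Instagram's actual code:

    # routers.py -- a hedged sketch, not Instagram's actual router
    class PhotoRouter:
        """Route the 'photos' app to its own database; everything else stays on 'default'."""

        def db_for_read(self, model, **hints):
            if model._meta.app_label == 'photos':
                return 'photodb'
            return 'default'

        def db_for_write(self, model, **hints):
            if model._meta.app_label == 'photos':
                return 'photodb'
            return 'default'

        def allow_relation(self, obj1, obj2, **hints):
            # cross-database foreign keys are the part you have to untangle first,
            # so only allow relations between objects living in the same database
            return obj1._state.db == obj2._state.db

    # settings.py (assumed wiring):
    # DATABASES = {'default': {...}, 'photodb': {...}}
    # DATABASE_ROUTERS = ['routers.PhotoRouter']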

  68. a few months later...

  69. photosdb > 60GB

  70. what now?

  71. horizontal partitioning!

  72. aka: sharding

  73. “surely we’ll have hired someone experienced before we actually need to shard”

  74. you don’t get to choose when scaling challenges come up

  75. evaluated solutions

  76. at the time, none were up to the task of being our primary DB

  77. did it in Postgres itself

  78. what’s painful about sharding?

  79. 1 data retrieval

  80. hard to know what your primary access patterns will be w/out any usage

  81. in most cases, user ID

  82. 2 what happens if one of your shards gets too big?

  83. in range-based schemes (like MongoDB), you split

  84. A-H: shard0 I-Z: shard1

  85. A-D: shard0 E-H: shard2 I-P: shard1 Q-Z: shard3

  86. downsides (especially on EC2): disk IO

  87. instead, we pre-split

  88. many many many (thousands) of logical shards

  89. that map to fewer physical ones

  90. // 8 logical shards on 2 machines
    user_id % 8 = logical shard
    logical shards -> physical shard map:
    { 0: A, 1: A, 2: A, 3: A, 4: B, 5: B, 6: B, 7: B }

  91. // 8 logical shards on 4 machines
    user_id % 8 = logical shard
    logical shards -> physical shard map:
    { 0: A, 1: A, 2: C, 3: C, 4: B, 5: B, 6: D, 7: D }

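A small Python sketch of the pre-split scheme on the slides above: user_id always hashes to the same logical shard, and only the logical-to-physical map changes when machines are added. The helper names and the assert are illustrative, not Instagram's actual code:

    # hedged sketch of "many logical shards -> few physical machines"
    N_LOGICAL_SHARDS = 8  # in practice: thousands, so users never need re-bucketing

    # adding capacity = editing this map (and moving the schemas);
    # user_id % N_LOGICAL_SHARDS itself never changes
    LOGICAL_TO_PHYSICAL = {
        0: 'A', 1: 'A', 2: 'C', 3: 'C',
        4: 'B', 5: 'B', 6: 'D', 7: 'D',
    }

    def logical_shard(user_id):
        return user_id % N_LOGICAL_SHARDS

    def physical_machine(user_id):
        return LOGICAL_TO_PHYSICAL[logical_shard(user_id)]

    def table_for_user(user_id):
        # with Postgres schemas (next slides), each logical shard is a schema
        return 'shard%d.photos_by_user' % logical_shard(user_id)

    assert physical_machine(26) == 'C'   # 26 % 8 == 2 -> machine C
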
  92. little known but awesome PG feature: schemas

  93. not “columns” schema

  94. - database:
        - schema:
          - table:
            - columns

  95. machineA: shard0 photos_by_user, shard1 photos_by_user, shard2 photos_by_user, shard3 photos_by_user

  96. machineA: shard0 photos_by_user, shard1 photos_by_user, shard2 photos_by_user, shard3 photos_by_user
    machineA’: shard0 photos_by_user, shard1 photos_by_user, shard2 photos_by_user, shard3 photos_by_user

  97. machineA: shard0 photos_by_user, shard1 photos_by_user, shard2 photos_by_user, shard3 photos_by_user
    machineC: shard0 photos_by_user, shard1 photos_by_user, shard2 photos_by_user, shard3 photos_by_user

  98. can do this as long as you have more logical shards than physical ones

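One way the schema-per-logical-shard layout on the slides above could be provisioned, sketched with psycopg2; the connection string, column list, and IF NOT EXISTS guards are assumptions, only the shardN schema / photos_by_user table naming comes from the slides:

    # hedged sketch: one Postgres schema per logical shard living on this machine
    import psycopg2

    LOGICAL_SHARDS_ON_THIS_MACHINE = [0, 1, 2, 3]   # e.g. machineA above

    conn = psycopg2.connect('dbname=instagram host=machineA')  # placeholder DSN
    cur = conn.cursor()
    for shard in LOGICAL_SHARDS_ON_THIS_MACHINE:
        cur.execute('CREATE SCHEMA IF NOT EXISTS shard%d' % shard)
        cur.execute('''
            CREATE TABLE IF NOT EXISTS shard%d.photos_by_user (
                id        bigint PRIMARY KEY,
                user_id   bigint NOT NULL,
                media_url text,
                posted_at timestamptz DEFAULT now()
            )''' % shard)
    conn.commit()

    # queries are then routed to shard<N>.photos_by_user, where N = user_id % num_logical_shards

Moving logical shards to another box is then roughly what the machineA / machineA’ / machineC slides depict: replicate the database, repoint part of the map at the new machine, and drop the schemas that no longer live there.
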
  99. lesson: take tech/tools you know and try first to adapt them into a simple solution

  100. 2 which tools where?

  101. where to cache / otherwise denormalize data

  102. we <3 redis

  103. what happens when a user posts a photo?

  104. 1 user uploads photo with (optional) caption and location

  105. 2 synchronous write to the media database for that user

  106. 3 queues!

  107. 3a if geotagged, async worker POSTs to Solr

  108. 3b follower delivery

  109. can’t have every user who loads their timeline look up everyone they follow and then all of their photos

  110. instead, everyone gets their own list in Redis

  111. media ID is pushed onto a list for every person who’s following this user

  112. Redis is awesome for this; rapid insert, rapid subsets

  113. when it’s time to render a feed, we take a small # of IDs and go look up the info in memcached

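A sketch of the fan-out-on-write flow described on the photo-posting slides above, using redis-py; the key names, feed length, and the memcached placeholder are assumptions rather than Instagram's actual schema:

    # hedged sketch of fan-out-on-write feed delivery with Redis lists
    import redis

    r = redis.Redis()          # assumed local Redis
    FEED_LENGTH = 1000         # assumed cap; shorter feeds are one place to shed load

    def deliver_to_followers(media_id, follower_ids):
        # async worker (step 3b): push the new media ID onto every follower's list
        pipe = r.pipeline()
        for follower_id in follower_ids:
            key = 'feed:%d' % follower_id
            pipe.lpush(key, media_id)
            pipe.ltrim(key, 0, FEED_LENGTH - 1)   # keep the structure bounded
        pipe.execute()

    def hydrate(media_id):
        # placeholder for the memcached/DB lookup step
        return {'media_id': media_id}

    def render_feed(user_id, offset=0, count=30):
        # rapid subsets: take a small slice of IDs, then hydrate each one
        media_ids = r.lrange('feed:%d' % user_id, offset, offset + count - 1)
        return [hydrate(mid) for mid in media_ids]
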
  114. Redis is great for...

  115. data structures that are relatively bounded

  116. (don’t tie yourself to a solution where your in-memory DB is your main data store)

  117. caching complex objects where you want to more than GET

  118. ex: counting, sub-ranges, testing membership

  119. especially when Taylor Swift posts live from the CMAs
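
The kinds of “more than GET” operations the slides above are pointing at, sketched with redis-py; the key names are made up:

    # hedged examples: counting, sub-ranges, and membership tests in Redis
    import redis

    r = redis.Redis()

    r.incr('likes:media:42')                       # counting
    r.zadd('popular:media', {'media:42': 1500})    # sorted sets give cheap sub-ranges...
    top10 = r.zrevrange('popular:media', 0, 9)     # ...e.g. a top-10 slice
    r.sadd('likers:media:42', 7)                   # sets give membership tests
    already_liked = r.sismember('likers:media:42', 7)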

  120. follow graph

  121. v1: simple DB table (source_id, target_id, status)

  122. who do I follow? who follows me? do I follow X? does X follow me?

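A sketch of what the v1 table and its four questions could look like in the Django ORM; the model name, indexes, and status values are assumptions, only the three columns come from the slide:

    # hedged sketch of the v1 follow graph and its four queries
    from django.db import models

    class Follow(models.Model):
        source_id = models.BigIntegerField(db_index=True)   # the follower
        target_id = models.BigIntegerField(db_index=True)   # the person being followed
        status = models.CharField(max_length=16)            # e.g. 'active' / 'blocked' (assumed values)

        class Meta:
            app_label = 'follows'                            # assumed app name

    ACTIVE = 'active'

    def following(me):       # who do I follow?
        return Follow.objects.filter(source_id=me, status=ACTIVE).values_list('target_id', flat=True)

    def followers(me):       # who follows me?
        return Follow.objects.filter(target_id=me, status=ACTIVE).values_list('source_id', flat=True)

    def do_i_follow(me, x):  # do I follow X?
        return Follow.objects.filter(source_id=me, target_id=x, status=ACTIVE).exists()

    def follows_me(me, x):   # does X follow me?
        return Follow.objects.filter(source_id=x, target_id=me, status=ACTIVE).exists()
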
  123. DB was busy, so we started storing a parallel version in Redis

  124. follow_all(300 item list)

  125. inconsistency

  126. extra logic

  127. so much extra logic

  128. exposing your support team to the idea of cache invalidation

  130. redesign took a page from twitter’s book

  131. PG can handle tens of thousands of requests, very light memcached caching

  132. two takeaways

  133. 1 have a versatile complement to your core data storage (like Redis)

  134. 2 try not to have two tools trying to do the same job

  135. 3 staying nimble

  136. 2010: 2 engineers

  137. 2011: 3 engineers

  138. 2012: 5 engineers

  139. scarcity -> focus

  140. engineer solutions that you’re not constantly returning to because they broke

  141. 1 extensive unit-tests and functional tests

  142. 2 keep it DRY

  143. 3 loose coupling using notifications / signals

  144. 4 do most of our work in Python, drop to C when necessary

  145. 5 frequent code reviews, pull requests to keep things in the ‘shared brain’

  146. 6 extensive monitoring

  147. munin

  148. statsd

  150. “how is the system right now?”

  151. “how does this compare to historical trends?”
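
A sketch of the kind of statsd instrumentation the monitoring slides above gesture at, using the Python statsd client; the metric names, host, and handler are made up:

    # hedged sketch: counters and timers shipped to statsd
    import statsd

    stats = statsd.StatsClient('statsd.internal', 8125)   # placeholder host/port

    def save_photo(request):
        pass                                               # stand-in for the real app code

    def upload_photo(request):
        stats.incr('photos.uploaded')                      # "how is the system right now?"
        with stats.timer('photos.upload_time'):            # graphed timings answer
            save_photo(request)                            # "how does this compare to historical trends?"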

  152. scaling for android

  153. 1 million new users in 12 hours

  154. great tools that enable easy read scalability

  155. redis: slaveof <host> <port>

  156. our Redis framework assumes 0+ read slaves
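
A sketch of what a “0+ read slaves” Redis setup could look like with redis-py; the hostnames and the random slave choice are assumptions, and each slave is attached with the slaveof command from the slide above:

    # hedged sketch of a Redis master with 0+ read slaves
    # (each slave is started with:  SLAVEOF <master-host> 6379)
    import random
    import redis

    master = redis.Redis(host='redis-master')    # all writes go to the master
    read_slaves = [redis.Redis(host=h) for h in ('redis-slave1', 'redis-slave2')]

    def read_conn():
        # zero or more read slaves: fall back to the master when none are configured
        return random.choice(read_slaves) if read_slaves else master

    master.lpush('feed:42', 'media:999')             # write path
    latest = read_conn().lrange('feed:42', 0, 29)    # read path scales by adding slaves

Reads from a slave can lag the master slightly, which is usually an acceptable trade for feed-style data.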

  157. tight iteration loops

  158. statsd & pgfouine

  159. know where you can shed load if needed

  160. (e.g. shorter feeds)

  161. if you’re tempted to reinvent the wheel...

  162. don’t.

  163. “our app servers sometimes kernel panic under load”

  164. ...

  165. “what if we write a monitoring daemon...”

  166. wait! this is exactly what HAProxy is great at

  167. surround yourself with awesome advisors

  168. culture of openness around engineering

  169. give back; e.g. node2dm

  170. focus on making what you have better

  171. “fast, beautiful photo sharing”

  172. “can we make all of our requests take 50% of the time?”

  173. staying nimble = remind yourself of what’s important

  174. your users around the world don’t care that you wrote your own DB

  175. wrapping up

  176. unprecedented times

  177. 2 backend engineers can scale a system to 30+ million users

  178. key word = simplicity

  179. cleanest solution with the fewest moving parts possible

  180. don’t over-optimize or expect to know ahead of time how the site will scale

  181. don’t think “someone else will join & take care of this”

  182. it will happen sooner than you think; surround yourself with great advisors

  183. when adding software to your stack: only if you have to, optimizing for operational simplicity

  184. few, if any, unsolvable scaling challenges for a social startup

  185. have fun