Scaling Instagram

Slides from an Airbnb tech talk I gave; they won't make the most sense out of context, but will hopefully be helpful for folks who saw the talk.

Mike Krieger

April 12, 2012

Transcript

  1. Scaling Instagram
    Airbnb Tech Talk 2012
    Mike Krieger
    Instagram

  2. me
    - Co-founder, Instagram
    - Previously: UX & Front-end
    @ Meebo
    - Stanford HCI BS/MS
    - @mikeyk on everything

  3. (image-only slide)

  4. (image-only slide)

  5. (image-only slide)

  6. communicating and
    sharing in the real world

  7. 30+ million users in less
    than 2 years

  8. the story of how we
    scaled it

  9. a brief tangent

  10. the beginning

  11. (image-only slide)

  12. 2 product guys

  13. no real back-end
    experience

  14. analytics & python @
    meebo

  15. CouchDB

  16. CrimeDesk SF

  17. (image-only slide)

  18. let’s get hacking

  19. good components in
    place early on

  20. ...but were hosted on a
    single machine
    somewhere in LA

  21. (image-only slide)

  22. less powerful than my
    MacBook Pro

  23. okay, we launched.
    now what?

  24. 25k signups in the first
    day

  25. everything is on fire!

  26. best & worst day of our
    lives so far

  27. load was through the
    roof

  28. first culprit?

  29. (image-only slide)

  30. favicon.ico

  31. 404-ing on Django,
    causing tons of errors

  32. lesson #1: don’t forget
    your favicon
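
    (a hedged sketch: in a modern Django you can stop favicon requests from
    404-ing with a one-line redirect; the static path below is hypothetical)

      # urls.py sketch, not Instagram's actual fix
      from django.urls import path
      from django.views.generic import RedirectView

      urlpatterns = [
          path('favicon.ico',
               RedirectView.as_view(url='/static/favicon.ico', permanent=True)),
      ]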

  33. real lesson #1: most of
    your initial scaling
    problems won’t be
    glamorous

  34. favicon

  35. ulimit -n

  36. memcached -t 4

  37. prefork/postfork

  38. friday rolls around

  39. not slowing down

  40. let’s move to EC2.

  41. (image-only slide)

  42. (image-only slide)

  43. scaling = replacing all
    components of a car
    while driving it at
    100mph

  44. since...

  45. “"canonical [architecture]
    of an early stage startup
    in this era."
    (HighScalability.com)

  46. Nginx &
    Redis &
    Postgres &
    Django.

  47. Nginx & HAProxy &
    Redis & Memcached &
    Postgres & Gearman &
    Django.

  48. 24h Ops

  49. (image-only slide)

  50. (image-only slide)

  51. our philosophy

  52. 1 simplicity

  53. 2 optimize for
    minimal operational
    burden

  54. 3 instrument
    everything

  55. walkthrough:
    1 scaling the database
    2 choosing technology
    3 staying nimble
    4 scaling for android

  56. 1 scaling the db

  57. early days

  58. django ORM, postgresql

  59. why pg? postgis.

  60. moved db to its own
    machine

  61. but photos kept growing
    and growing...

  62. ...and only 68GB of
    RAM on the biggest
    machine in EC2

  63. so what now?

  64. vertical partitioning

  65. django db routers make
    it pretty easy

  66. def db_for_read(self, model, **hints):
        if model._meta.app_label == 'photos':
            return 'photodb'
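
    (filled out as a minimal sketch: the class name and the write-side method
    are assumptions; 'photodb' would be a second alias in settings.DATABASES)

      # sketch of a Django DB router, not Instagram's actual code
      class PhotoRouter:
          def db_for_read(self, model, **hints):
              if model._meta.app_label == 'photos':
                  return 'photodb'
              return None  # fall through to the default database

          def db_for_write(self, model, **hints):
              if model._meta.app_label == 'photos':
                  return 'photodb'
              return None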

  67. ...once you untangle all
    your foreign key
    relationships

  68. a few months later...

  69. photosdb > 60GB

  70. what now?

  71. horizontal partitioning!

  72. aka: sharding

  73. “surely we’ll have hired
    someone experienced
    before we actually need
    to shard”

  74. you don’t get to choose
    when scaling challenges
    come up

  75. evaluated solutions

  76. at the time, none were
    up to task of being our
    primary DB

  77. did in Postgres itself

  78. what’s painful about
    sharding?

  79. 1 data retrieval

  80. hard to know what your
    primary access patterns
    will be w/out any usage

  81. in most cases, user ID

  82. 2 what happens if
    one of your shards
    gets too big?

  83. in range-based schemes
    (like MongoDB), you split

  84. A-H: shard0
    I-Z: shard1

  85. A-D: shard0
    E-H: shard2
    I-P: shard1
    Q-Z: shard3

  86. downsides (especially on
    EC2): disk IO

  87. instead, we pre-split

  88. many many many
    (thousands) of logical
    shards

  89. that map to fewer
    physical ones

  90. // 8 logical shards on 2 machines
    user_id % 8 = logical shard
    logical shards -> physical shard map
    {
    0: A, 1: A,
    2: A, 3: A,
    4: B, 5: B,
    6: B, 7: B
    }

  91. // 8 logical shards on 4 machines
    user_id % 8 = logical shard
    logical shards -> physical shard map
    {
    0: A, 1: A,
    2: C, 3: C,
    4: B, 5: B,
    6: D, 7: D
    }
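
    (that lookup is a few lines of Python; a minimal sketch, machine names
    hypothetical)

      NUM_LOGICAL_SHARDS = 8
      LOGICAL_TO_PHYSICAL = {
          0: 'A', 1: 'A',
          2: 'C', 3: 'C',
          4: 'B', 5: 'B',
          6: 'D', 7: 'D',
      }

      def shard_for_user(user_id):
          # logical shard from the user ID, then the machine that hosts it
          logical = user_id % NUM_LOGICAL_SHARDS
          return logical, LOGICAL_TO_PHYSICAL[logical]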

  92. little known but awesome
    PG feature: schemas

  93. not “columns” schema

  94. - database:
    - schema:
    - table:
    - columns

  95. machineA:
    shard0
    photos_by_user
    shard1
    photos_by_user
    shard2
    photos_by_user
    shard3
    photos_by_user

  96. machineA:
    shard0
    photos_by_user
    shard1
    photos_by_user
    shard2
    photos_by_user
    shard3
    photos_by_user
    machineA’:
    shard0
    photos_by_user
    shard1
    photos_by_user
    shard2
    photos_by_user
    shard3
    photos_by_user

  97. machineA:
    shard0
    photos_by_user
    shard1
    photos_by_user
    shard2
    photos_by_user
    shard3
    photos_by_user
    machineC:
    shard0
    photos_by_user
    shard1
    photos_by_user
    shard2
    photos_by_user
    shard3
    photos_by_user

  98. can do this as long as
    you have more logical
    shards than physical
    ones
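
    (putting the two ideas together, a hedged psycopg2 sketch; connection
    details and the shardN schema naming are assumptions based on these slides)

      import psycopg2

      def photos_for_user(user_id):
          logical, host = shard_for_user(user_id)  # sketch from the slides above
          conn = psycopg2.connect(host=host, dbname='photos')
          cur = conn.cursor()
          # `logical` is an int we computed, so interpolating the schema name is safe
          cur.execute(
              "SELECT id FROM shard%d.photos_by_user WHERE user_id = %%s" % logical,
              (user_id,),
          )
          return cur.fetchall()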

  99. lesson: take tech/tools
    you know and try first to
    adapt them into a simple
    solution

  100. 2 which tools where?

  101. where to cache /
    otherwise denormalize
    data

  102. we <3 redis

  103. what happens when a
    user posts a photo?

  104. 1 user uploads photo
    with (optional) caption
    and location

  105. 2 synchronous write to
    the media database for
    that user

  106. 3 queues!

  107. 3a if geotagged, async
    worker POSTs to Solr
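
    (a heavily hedged sketch of 3a; the Solr URL and document fields are
    assumptions, not Instagram's schema)

      import requests

      def index_geotag(media_id, user_id, lat, lng):
          doc = {'id': media_id, 'user_id': user_id,
                 'location': '%f,%f' % (lat, lng)}
          # POST the document to Solr's JSON update handler and commit
          requests.post('http://solr.internal:8983/solr/update/json?commit=true',
                        json=[doc], timeout=5)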

  108. 3b follower delivery

  109. can’t have every user
    who loads their timeline
    look up everyone they
    follow and then their photos

  110. instead, everyone gets
    their own list in Redis

  111. media ID is pushed onto
    a list for every person
    who’s following this user

  112. Redis is awesome for
    this; rapid insert, rapid
    subsets

  113. when it’s time to render
    a feed, we take a small #
    of IDs and look up the
    info in memcached
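
    (a hedged redis-py sketch of this fan-out and feed read; key names and
    bounds are made up)

      import redis

      r = redis.Redis()

      def deliver_to_followers(media_id, follower_ids):
          # fan-out on write: push the new media ID onto each follower's list
          for follower_id in follower_ids:
              key = 'feed:%d' % follower_id
              r.lpush(key, media_id)
              r.ltrim(key, 0, 999)  # keep each feed list bounded

      def feed_ids(user_id, count=30):
          # rapid subset read; hydrate these IDs from memcached afterwards
          return r.lrange('feed:%d' % user_id, 0, count - 1)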

  114. Redis is great for...

  115. data structures that are
    relatively bounded

  116. (don’t tie yourself to a
    solution where your in-
    memory DB is your main
    data store)

  117. caching complex objects
    where you want to more
    than GET

  118. ex: counting, sub-
    ranges, testing
    membership
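
    (for example, with redis-py; key names hypothetical)

      import redis

      r = redis.Redis()
      r.scard('followers:42')         # counting
      r.lrange('feed:42', 0, 9)       # sub-ranges
      r.sismember('followers:42', 7)  # testing membership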

  119. especially when Taylor
    Swift posts live from the
    CMAs

  120. follow graph

  121. v1: simple DB table
    (source_id, target_id,
    status)

  122. who do I follow?
    who follows me?
    do I follow X?
    does X follow me?
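
    (against that table, the four questions are four simple queries; a sketch
    in which the table name and status value are assumptions)

      WHO_I_FOLLOW   = ("SELECT target_id FROM follows "
                        "WHERE source_id = %s AND status = 'active'")
      WHO_FOLLOWS_ME = ("SELECT source_id FROM follows "
                        "WHERE target_id = %s AND status = 'active'")
      DO_I_FOLLOW_X  = ("SELECT 1 FROM follows "
                        "WHERE source_id = %s AND target_id = %s AND status = 'active'")
      X_FOLLOWS_ME   = ("SELECT 1 FROM follows "
                        "WHERE source_id = %s AND target_id = %s AND status = 'active'")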

  123. DB was busy, so we
    started storing parallel
    version in Redis

  124. follow_all(300 item list)

  125. inconsistency

  126. extra logic

  127. so much extra logic

  128. exposing your support
    team to the idea of
    cache invalidation

  129. (image-only slide)

  130. redesign took a page
    from twitter’s book

  131. PG can handle tens of
    thousands of requests with
    very light memcached
    caching

  132. two takeaways

  133. 1 have a versatile
    complement to your core
    data storage (like Redis)

  134. 2 try not to have two
    tools trying to do the
    same job

  135. 3 staying nimble

  136. 2010: 2 engineers

  137. 2011: 3 engineers

  138. 2012: 5 engineers

  139. scarcity -> focus

  140. engineer solutions that
    you’re not constantly
    returning to because
    they broke

  141. 1 extensive unit-tests
    and functional tests

  142. 2 keep it DRY

  143. 3 loose coupling using
    notifications / signals

  144. 4 do most of our work in
    Python, drop to C when
    necessary

  145. 5 frequent code reviews,
    pull requests to keep
    things in the ‘shared
    brain’

  146. 6 extensive monitoring

  147. munin

  148. statsd

  149. (image-only slide)

  150. “how is the system right
    now?”

  151. “how does this compare
    to historical trends?”
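
    (a minimal sketch with the Python statsd client; metric names invented)

      import statsd

      def render_feed():
          pass  # stand-in for the real work being measured

      c = statsd.StatsClient('localhost', 8125)
      c.incr('photos.uploaded')     # counter: "how is the system right now?"
      with c.timer('feed.render'):  # timer: compare against historical trends
          render_feed()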

  152. scaling for android

  153. 1 million new users in 12
    hours

  154. great tools that enable
    easy read scalability

  155. redis: slaveof

  156. our Redis framework
    assumes 0+ read slaves
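
    (SLAVEOF host port is the Redis command that makes a replica; on the
    client side, "0+ read slaves" can look like this hedged sketch, hostnames
    made up)

      import random
      import redis

      master = redis.Redis(host='redis-master')
      slaves = [redis.Redis(host=h) for h in ('redis-slave-1', 'redis-slave-2')]

      def read_conn():
          # spread reads across slaves when present; fall back to the master
          return random.choice(slaves) if slaves else master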

  157. tight iteration loops

  158. statsd & pgfouine

  159. know where you can
    shed load if needed

  160. (e.g. shorter feeds)

  161. if you’re tempted to
    reinvent the wheel...

  162. don’t.

  163. “our app servers
    sometimes kernel panic
    under load”

  164. ...

  165. “what if we write a
    monitoring daemon...”

  166. wait! this is exactly what
    HAProxy is great at

  167. surround yourself with
    awesome advisors

  168. culture of openness
    around engineering

  169. give back; e.g.
    node2dm

  170. focus on making what
    you have better

  171. “fast, beautiful photo
    sharing”

  172. “can we make all of our
    requests take 50% of the time?”

  173. staying nimble = remind
    yourself of what’s
    important

  174. your users around the
    world don’t care that you
    wrote your own DB

  175. wrapping up

  176. unprecedented times

  177. 2 backend engineers
    can scale a system to
    30+ million users

  178. key word = simplicity

  179. cleanest solution with
    the fewest possible
    moving parts

  180. don’t over-optimize or
    expect to know ahead of
    time how the site will scale

  181. don’t think “someone
    else will join & take care
    of this”

  182. will happen sooner than
    you think; surround
    yourself with great
    advisors

  183. when adding software to
    the stack: only if you have
    to, optimizing for
    operational simplicity

  184. few, if any, unsolvable
    scaling challenges for a
    social startup

  185. have fun
