2017-03 Lisa Guo: Scaling Instagram Infrastructure (QCon London 2017, 87p)

xy
December 01, 2023

Transcript

  1. INSTAGRAM EVERYDAY • 400 Million Users • 4+ Billion likes • 100 Million photo/video uploads • Top account: 110 Million followers
  2. STORAGE VS. COMPUTING • Storage: needs to be consistent across data centers • Computing: driven by user traffic, on an as-needed basis
  3. MEMCACHE • High-performance key-value store in memory • Millions of reads/writes per second • Sensitive to network conditions • Cross-region operation is prohibitive • No global consistency
  4. [Diagram] DC1: User C -> Django (comment) -> PostgreSQL (insert), memcache (set) • DC2: User R -> Django (feed) -> memcache (get) • PostgreSQL replication links DC1 and DC2
  5. [Diagram] DC1: User C -> Django (comment) -> PostgreSQL (insert), memcache (set) • DC2: User R -> Django (feed) -> memcache (get) • PostgreSQL replication links DC1 and DC2 • Cache invalidate in DC1 and in DC2
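The flow on slides 4 and 5 can be sketched in a few lines of Python. This is a toy in-process model, not Instagram's implementation: the `DataCenter` class, `apply_row`, `insert_comment`, and the `"m1"` media id are hypothetical names, plain dicts stand in for PostgreSQL and memcache, and replication is reduced to a direct function call. It only illustrates why each region invalidates its own cache when a row arrives.

```python
class DataCenter:
    """Toy model of one region: a database replica plus a local cache."""

    def __init__(self, name):
        self.name = name
        self.db = {}      # stands in for the PostgreSQL replica
        self.cache = {}   # stands in for the regional memcache pool

    def read_comments(self, media_id):
        # Read-through: serve from cache, otherwise fill from the local replica.
        if media_id not in self.cache:
            self.cache[media_id] = list(self.db.get(media_id, []))
        return self.cache[media_id]

    def apply_row(self, media_id, comment):
        # Used for local writes and for replicated rows alike.
        self.db.setdefault(media_id, []).append(comment)
        # Invalidate instead of set, so the next reader refills from the DB.
        self.cache.pop(media_id, None)


def insert_comment(primary, replicas, media_id, comment):
    primary.apply_row(media_id, comment)
    for replica in replicas:  # stands in for database replication
        replica.apply_row(media_id, comment)


dc1, dc2 = DataCenter("DC1"), DataCenter("DC2")
dc2.read_comments("m1")                 # user R warms DC2's cache with []
insert_comment(dc1, [dc2], "m1", "nice photo")
print(dc2.read_comments("m1"))          # prints ['nice photo']
```

Without the `cache.pop` in `apply_row`, the last read in DC2 would keep returning the empty list that was cached before the comment was replicated.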
  6. MEMCACHE LEASE [Sequence diagram: d1, d2, memcache, db, over time] lease-get -> fill • lease-get -> wait or use stale • read from DB • lease-set • lease-get -> hit
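The lease sequence on slide 6 can be illustrated with a small in-process model. This is a sketch under stated assumptions, not a real memcache client API: `LeaseCache`, its method names, the status strings, and the key `"likes:42"` are all hypothetical. The point is that only the first client to miss receives a token and goes to the database, while later clients are told to wait or use a stale value.

```python
import itertools


class LeaseCache:
    """In-process toy model of lease-get / lease-set (hypothetical API)."""

    def __init__(self):
        self._values = {}   # fresh values
        self._stale = {}    # last known values of invalidated keys
        self._leases = {}   # key -> token of the client allowed to fill it
        self._tokens = itertools.count(1)

    def lease_get(self, key):
        """Return (status, payload); status is 'hit', 'fill' or 'wait'."""
        if key in self._values:
            return "hit", self._values[key]
        if key in self._leases:
            # Another client is already refilling: wait, or use the stale value.
            return "wait", self._stale.get(key)
        token = next(self._tokens)
        self._leases[key] = token
        return "fill", token

    def lease_set(self, key, value, token):
        # Only the current lease holder may write the value.
        if self._leases.get(key) != token:
            return False
        del self._leases[key]
        self._values[key] = value
        return True

    def invalidate(self, key):
        # Keep the old value as stale and void any outstanding lease.
        if key in self._values:
            self._stale[key] = self._values.pop(key)
        self._leases.pop(key, None)


cache = LeaseCache()
db = {"likes:42": 7}
status_d1, token = cache.lease_get("likes:42")    # d1 misses and must fill
status_d2, stale = cache.lease_get("likes:42")    # d2 is told to wait
stored = cache.lease_set("likes:42", db["likes:42"], token)  # d1 read the DB
status_after, value = cache.lease_get("likes:42")
print(status_d1, status_d2, stored, status_after, value)
# prints: fill wait True hit 7
```

In this model a burst of readers after an invalidation produces one database read per key instead of one per reader.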
  7. INSTAGRAM STACK - MULTI REGION [Diagram] DC1: Django, RabbitMQ, PostgreSQL, Cassandra, Celery, memcache • DC2: Django, RabbitMQ, PostgreSQL, Cassandra, Celery, memcache
  8. SCALING OUT - CHALLENGES, OPPORTUNITIES • Beyond North America • More localized social network • Direct messaging • Live streaming
  9. [Chart: User growth vs. Server growth]
  10. COLLECT
    struct perf_event_attr pe;
    pe.type = PERF_TYPE_HARDWARE;
    pe.config = PERF_COUNT_HW_INSTRUCTIONS;
    fd = perf_event_open(&pe, 0, -1, -1, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    <code you want to measure>
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    read(fd, &count, sizeof(long long));
  11. DYNOSTATS [Chart: series for Follow, Feed, Explore]
  12. REGRESSION [Chart]
  13. PYTHON CPROFILE
    import cProfile, pstats, io
    pr = cProfile.Profile()
    pr.enable()
    # ... do something ...
    pr.disable()
    s = io.StringIO()
    sortby = 'cumulative'
    ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
    ps.print_stats()
    print(s.getvalue())
  14. CPU - ANALYZE continuous profiling [Chart; labels: Caller, Callee, Callee]
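One way to look at caller/callee relationships in a profile is the `print_callers` method of the standard `pstats` module. The sketch below is illustrative only: the functions `callee`, `caller_a`, `caller_b`, and `handle_request` are hypothetical stand-ins for application code, and the slides do not say which tooling was actually used for the continuous profiling.

```python
import cProfile
import io
import pstats


def callee(n):
    return sum(i * i for i in range(n))


def caller_a():
    return callee(20000)   # expensive call site


def caller_b():
    return callee(10)      # cheap call site


def handle_request():
    return caller_a() + caller_b()


profiler = cProfile.Profile()
profiler.enable()
handle_request()
profiler.disable()

out = io.StringIO()
stats = pstats.Stats(profiler, stream=out)
# Restrict the report to the function named "callee" and list who called it.
stats.print_callers(r"\(callee\)")
report = out.getvalue()
print(report)
```

The report lists each caller of `callee` with its call count and time, which helps separate a function that is slow everywhere from one that is slow only on a particular call path.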
  15. CPU - OPTIMIZE C is really faster • Candidate functions: • Used extensively • Stable • Cython or C/C++
  16. SCALE UP: NETWORK LATENCY Synchronous processing model with long latency ===> Worker starvation and fewer CPU instructions executed
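The effect described on slide 16 can be made concrete with some back-of-the-envelope arithmetic. The numbers below (20 ms of CPU work per request, 10 workers, 5 ms versus 80 ms of network wait) are hypothetical and chosen only for illustration. The model assumes each synchronous worker handles one request at a time and sits idle while waiting on the network.

```python
def cpu_utilization(cpu_ms, wait_ms):
    """Fraction of wall-clock time a synchronous worker spends executing code."""
    return cpu_ms / (cpu_ms + wait_ms)


def max_requests_per_second(workers, cpu_ms, wait_ms):
    """Upper bound on throughput when each worker handles one request at a time."""
    return workers * 1000.0 / (cpu_ms + wait_ms)


# Hypothetical figures: 20 ms of CPU work per request, 10 workers per host.
for wait_ms in (5, 80):  # short versus long network latency
    print(wait_ms,
          cpu_utilization(20, wait_ms),
          max_requests_per_second(10, 20, wait_ms))
# prints:
# 5 0.8 400.0
# 80 0.2 100.0
```

Under these assumptions, raising the wait from 5 ms to 80 ms cuts both CPU utilization and the throughput ceiling to a quarter, even though the CPU work per request is unchanged.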
  17. SCALE UP: CHALLENGES, OPPORTUNITIES • Faster Python run-time • Async web framework • Better memory analysis • etc.
  18. SCALING TEAM 30% of engineers joined in the last 6 months • Bootcampers - 1 week • Hack-A-Month - 4 weeks • Intern - 12 weeks
  19. Comment Filtering • Self-harm Prevention • Windows App • Multiple media in one post • Video View Notification • Saved Posts • First Story Notification • Instagram Live • Instagram Stories
  20. Which server? New Table or New Column? What Index? Should I cache it? Will I lock up DB? Will I bring down Instagram?
  21. WHAT WE WANT • Automatically handle cache • Define relations, not worry about implementations • Self service by product engineers • Infra focuses on scale
  22. Comment Filtering • Self-harm Prevention • Windows App • Multiple media in one post • Video View Notification • Saved Posts • First Story Notification • Instagram Live • Instagram Stories
  23. SOURCE CONTROL (With branches) • Context switching • Code sync/merge overhead • Surprises • Refactor/major upgrade • Performance tracking harder
  24. SOURCE CONTROL (No branches) • Continuous integration • Collaborate easily • Fast bisect and revert • Continuous performance monitoring