
Messaging at Scale at Instagram by Rick Branson

PyCon 2013
March 17, 2013

Transcript

  1. Messaging at Scale at Instagram Rick Branson, Infrastructure Engineer

  2. Messaging at Scale at Instagram Rick Branson, Infrastructure Engineer: ASYNC TASKS AT INSTAGRAM
  3. Instagram Feed

  4. I see photos posted by the accounts I follow.

  5. Photos are time-ordered from newest to oldest.

  6. Naive Approach: SELECT * FROM photos WHERE author_id IN (SELECT target_id FROM following WHERE source_id = %(user_id)d) ORDER BY creation_time DESC LIMIT 10;
  7. O(∞) •Fetch All Accounts You Follow •Fetch All Photos By Those Accounts •Sort Photos By Creation Time •Return First 10
  8. [Diagram: a per-account bounded list of media IDs, e.g. 382, 487, 1287, 880, 27, 3201, 441, 6690, 12]

  9. [Diagram: new media ID 943058139 is posted; SELECT follower_id FROM followers WHERE user_id = 9023; returns the follower set {487, 3201, 441}]

  10. [Diagram: 943058139 is pushed onto the bounded media-ID list of each follower in {487, 3201, 441}]
  11. Fanout-On-Write •O(1) read cost •O(N) write cost (N = followers) •Reads outnumber writes 100:1 or more
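The fanout-on-write pattern on this slide can be sketched in a few lines. This is a minimal in-memory illustration only (Instagram stores the bounded lists in Redis); `FEED_LIMIT`, `fanout_photo`, and `read_feed` are hypothetical names, not Instagram's code:

```python
from collections import defaultdict, deque

FEED_LIMIT = 1000  # hypothetical per-account bound on stored media IDs

# In-memory stand-ins for the real stores.
feeds = defaultdict(lambda: deque(maxlen=FEED_LIMIT))  # user -> newest-first media IDs
followers = defaultdict(set)                           # author -> follower IDs

def fanout_photo(author_id, media_id):
    # O(N) write: push the new media ID onto every follower's feed,
    # newest first; the deque's maxlen keeps each list bounded.
    for follower_id in followers[author_id]:
        feeds[follower_id].appendleft(media_id)

def read_feed(user_id, count=10):
    # O(1) read: the feed is precomputed, just take the first `count` IDs.
    return list(feeds[user_id])[:count]
```

The trade-off is exactly as the slide states: writes fan out to N followers, but since reads dominate 100:1, paying at write time is the right side of the trade.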
  12. Reliability Problems •Database Servers Fail •Web Request is a Scary Place •Justin Bieber (Millions of Followers)
  13. [Diagram: the web tier publishes tasks 46-51 to a broker; workers have picked up tasks 46 and 47]

  14. [Diagram: the worker running task 46 crashes]

  15. [Diagram: task 46 is redistributed to a surviving worker]
  16. Chained Tasks deliver(photo_id=1234, following_id=5678, cursor=None)

  17. Chained Tasks deliver(photo_id=1234, following_id=5678, cursor=None) deliver(photo_id=1234, following_id=5678, cursor=3493)

  18. Chained Tasks •Batch of 10,000 Followers Per Task •Tasks Yield Successive Tasks •Much Finer-Grained Load Balancing •Failure/Reload Penalty Low
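The chained-task pattern can be sketched as below. This is a toy simulation with the broker replaced by an in-memory queue; `fetch_followers`, `followers_db`, and `pending` are hypothetical names, not Instagram's code:

```python
from collections import deque

BATCH_SIZE = 10000  # followers delivered per task, as on the slide

followers_db = {}   # user_id -> ordered list of follower IDs (hypothetical)
feeds = {}          # follower_id -> newest-first list of media IDs
pending = deque()   # stand-in for the broker's queue of deliver() calls

def fetch_followers(user_id, cursor, limit):
    """Hypothetical pager: return one batch plus the cursor for the next."""
    start = cursor or 0
    page = followers_db[user_id][start:start + limit]
    next_cursor = start + limit if start + limit < len(followers_db[user_id]) else None
    return page, next_cursor

def deliver(photo_id, following_id, cursor=None):
    page, next_cursor = fetch_followers(following_id, cursor, BATCH_SIZE)
    for follower_id in page:
        feeds.setdefault(follower_id, []).insert(0, photo_id)
    if next_cursor is not None:
        # Yield the successor task instead of looping in-process: if this
        # worker dies, only one 10,000-follower batch has to be redone.
        pending.append((photo_id, following_id, next_cursor))
```

Each task handles one bounded batch and enqueues its successor, which is what gives the finer-grained load balancing and the low failure/reload penalty.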
  19. What else?

  20. Other Async Tasks •Cross-Posting to Other Networks •Search Indexing •Spam Analysis •Account Deletion •API Hooks
  21. In the beginning...

  22. Gearman & Python •Simple, Purpose-Built Task Queue •Weak Framework Support •We just built ad hoc worker scripts •A mess to add new job types & capacity

  23. Gearman in Production •Persistence horrifically slow, complex •So we ran out of memory and crashed, no recovery •Single core, didn’t scale well: 60ms mean submission time for us •Probably should have just used Redis
  24. We needed a fresh start.

  25. WARNING: System had to be in production before the heat death of the universe. We are probably doing something stupid!

  26. Celery • Distributed Task Framework • Highly Extensible, Pluggable • Mature, Feature Rich • Great Tooling • Excellent Django Support • celeryd
  27. Which broker?

  28. Redis •We Already Use It •Very Fast, Efficient •Polling For Task Distribution •Messy Non-Synchronous Replication •Memory Limits Task Capacity

  29. Beanstalk • Purpose-Built Task Queue • Very Fast, Efficient • Pushes to Consumers • Spills to Disk • No Replication • Useless For Anything Else

  30. RabbitMQ • Reasonably Fast, Efficient • Spill-To-Disk • Low-Maintenance Synchronous Replication • Excellent Celery Compatibility • Supports Other Use Cases • We don’t know Erlang

  31. Our RabbitMQ Setup •RabbitMQ 3.0 •Clusters of Two Broker Nodes, Mirrored •Scale Out By Adding Broker Clusters •EC2 c1.xlarge, RAID instance storage •Way Overprovisioned
  32. Alerting •We use Sensu •Monitors & alerts on queue length threshold •Uses rabbitmqctl list_queues
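A check along these lines parses the output of `rabbitmqctl list_queues name messages` and compares each queue's depth against a threshold. A hypothetical sketch (the threshold value and the function name are assumptions, not Instagram's actual Sensu plugin):

```python
QUEUE_LENGTH_THRESHOLD = 10000  # assumed alert threshold

def queues_over_threshold(rabbitmqctl_output, threshold=QUEUE_LENGTH_THRESHOLD):
    """Return (queue_name, depth) pairs whose backlog exceeds the threshold.

    Expects the text printed by `rabbitmqctl list_queues name messages`.
    """
    over = []
    for line in rabbitmqctl_output.strip().splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip the "Listing queues ..." header and "...done." footer
        name, depth = parts[0], int(parts[1])
        if depth > threshold:
            over.append((name, depth))
    return over
```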
  33. Graphing •We use graphite & statsd •Per-task sent/fail/success/retry graphs •Using celery's hooks to make them possible
  34. [Diagram: broker clusters 0, 1, 2, each with a node in us-east-1a and one in us-east-1e, sitting between the web tier and the workers]

  35. Mean vs P90 Publish Times (ms)

  36. Tasks per second

  37. Aggregate CPU% (all RabbitMQs)

  38. Wait, ~4000 tasks/sec... I thought you said scale?

  39. ~25,000 app threads publishing tasks

  40. Spans Datacenters

  41. Scale Out

  42. Celery IRL •Easy to understand, new engineers come up to speed in 15 minutes. •New job types deployed without fuss. •We hack the config a bit to get what we want.
  43. Related tasks run on the same queue:

    @task(routing_key="task_queue")
    def task_function(task_arg, another_task_arg):
        do_things()
  44. task_function.delay("foo", "bar")

  45. Scaling Out •Celery only supported 1 broker host last year when we started. •Created kombu-multibroker "shim" •Multiple brokers used in a round-robin fashion. •Breaks some Celery management tools :(
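The round-robin behavior can be sketched as below. This is a minimal illustration of the idea, not kombu-multibroker's actual implementation; the class and method names are hypothetical:

```python
import itertools

class MultiBrokerRouter(object):
    """Rotate publishes across several broker clusters, round-robin."""

    def __init__(self, broker_urls):
        self._cycle = itertools.cycle(broker_urls)

    def next_broker(self):
        # Each publish goes to the next broker cluster in turn, so load
        # spreads evenly and adding a cluster adds capacity linearly.
        return next(self._cycle)
```

Scaling out is then just appending another cluster URL to the list, which matches the "Scale Out By Adding Broker Clusters" approach from the RabbitMQ setup slide.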
  46. Concurrency Models •multiprocessing (pre-fork) •eventlet •gevent •threads

  47. gevent is cool and all, but only some of our tasks will run right under it.
  48. celeryd_multi: Run multiple workers with different parameters (such as concurrency settings)
  49. CELERY_QUEUE_CONFIG = {
        "default": ("normal_task",),
        "gevent": ("evented_task",),
    }
    CELERY_QUEUE_GROUP = "default"
    CELERY_QUEUES = [Queue("celery.%s" % key, routing_key=key)
                     for key in CELERY_QUEUE_CONFIG[CELERY_QUEUE_GROUP]]
  50. gevent = Network Bound •Facebook API •Tumblr API •Various Background S3 Tasks •Checking URLs for Spam
  51. Problem: Network-Bound Tasks Sometimes Need To Take Some Action

  52. Ran on "gevent" worker:

    @task(routing_key="task_remote_access")
    def check_url(object_id, url):
        is_bad = run_url_check(url)
        if is_bad:
            take_some_action.delay(object_id, url)

    Ran on "processes" worker:

    @task(routing_key="task_action")
    def take_some_action(object_id, url):
        do_some_database_thing()
  53. Problem: Slow Tasks Monopolize Workers

  54. [Diagram: the main worker fetches a batch of tasks 0-5 from the broker, hands them to worker processes 0 and 1, and waits until the whole batch finishes before grabbing another one]
  55. •Run higher concurrency? Inefficient :( •Lower batch (prefetch) size? Min is concurrency count, inefficient :( •Separate slow & fast tasks :)
  56. CELERY_QUEUE_CONFIG = {
        "default": ("slow_task",),
        "gevent": ("evented_task",),
        "fast": ("fast_task",),
        "feed": ("feed_delivery",),
    }
  57. Our Concurrency Levels fast (14) default (6) feed (12)

  58. Problem: Tasks Fail Sometimes

  59. Wait 60 seconds before retrying:

    @task(routing_key="media_activation")
    def deactivate_media_content(media_id):
        try:
            media = get_media_store_object(media_id)
            media.deactivate()
        except MediaContentRemoteOperationError:
            raise deactivate_media_content.retry(countdown=60)
  60. Problem: Worker Crashes Still Lose Tasks

  61. Normal Flow: 1. Get Tasks 2. Worker Starts Task 3. Ack Sent to Broker 4. Worker Finishes Task
  62. ACKS_LATE Flow: 1. Get Tasks 2. Worker Starts Task 3. Worker Finishes Task 4. Ack Sent to Broker
  63. @task(routing_key="feed_delivery", acks_late=True) def deliver_media_to_follower_feeds(media_id, following_user_id, resume_at=None): ...

  64. Why not do this everywhere? •Tasks must be idempotent! •That probably is the case anyway :( •Mirroring can cause duplicate tasks •FLP Impossibility FFFFFFFFFUUUUUUUUU!!!!
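Idempotency is what makes at-least-once delivery safe: running the same task twice must leave the same state as running it once. A toy illustration of the property (hypothetical names; ordering is ignored here, though a real feed also needs it):

```python
feeds = {}

def deliver_media(user_id, media_id):
    # Re-running this task with the same arguments is a no-op: the set
    # already contains the media ID, so a duplicate delivery caused by
    # acks_late redelivery or broker mirroring cannot corrupt state.
    feeds.setdefault(user_id, set()).add(media_id)
```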
  65. There is no such thing as running tasks exactly-once.

  66. "... it is impossible for one process to tell whether another has died (stopped entirely) or is just running very slowly." From "Impossibility of Distributed Consensus with One Faulty Process", Fischer, Lynch, Paterson (1985)
  67. FLP Proof Gives Us Choices: To retry or not to retry
  68. Problem: Early on, we noticed overloaded brokers were dropping tasks...

  69. Publisher Confirms •AMQP default is that we don't know if things were published or not. :( •Publisher Confirms makes broker send acknowledgements back on publishes. •kombu-multibroker forces this. •Can cause duplicate tasks. (FLP again!)
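With Celery over kombu's py-amqp transport, publisher confirms can be switched on via a transport option. A sketch of the config, assuming kombu's `confirm_publish` option (here shown as a plain setting rather than the forced behavior of the kombu-multibroker shim):

```python
# Celery broker settings (assumption: py-amqp/RabbitMQ transport).
# confirm_publish asks the broker to acknowledge each publish, so a
# dropped task surfaces as a publish error instead of vanishing silently.
BROKER_TRANSPORT_OPTIONS = {"confirm_publish": True}
```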
  70. Other Rules of Thumb

  71. Avoid using async tasks as a "backup" mechanism only during failures. It'll probably break.
  72. Only pass self-contained, non-opaque data (strings, numbers, arrays, lists, and dicts) as arguments to tasks:

    @task(routing_key="media_activation")
    def deactivate_media_content(media_id):
        try:
            media = get_media_store_object(media_id)
            media.deactivate()
        except MediaContentRemoteOperationError:
            raise deactivate_media_content.retry(countdown=60)
  73. Tasks should usually execute within a few seconds. They gum up the works otherwise.
  74. CELERYD_TASK_SOFT_TIME_LIMIT = 20 CELERYD_TASK_TIME_LIMIT = 30

  75. FUTURE •Better Grip on RabbitMQ Performance •Utilize Result Storage •Single Cluster for Control Queues •Eliminate kombu-multibroker
  76. We're hiring! jobs@instagram.com