Messaging at Scale at Instagram by Rick Branson

PyCon 2013
March 17, 2013

Transcript

  1. 6.

    Naive Approach

    SELECT * FROM photos
    WHERE author_id IN (
        SELECT target_id FROM following WHERE source_id = %(user_id)d
    )
    ORDER BY creation_time DESC
    LIMIT 10;
  2. 7.

    O(∞)
    • Fetch All Accounts You Follow
    • Fetch All Photos By Those Accounts
    • Sort Photos By Creation Time
    • Return First 10
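The four steps above can be sketched in plain Python. This is only an illustration of the fetch-on-read approach, assuming in-memory dicts and lists in place of the `following` and `photos` tables; the function name and sample rows are hypothetical.

```python
from heapq import nlargest


def naive_feed(user_id, following, photos, limit=10):
    """Build a feed at read time: scan every photo, keep those by followed accounts."""
    # Step 1: fetch all accounts the user follows.
    followed = following.get(user_id, set())
    # Step 2: fetch all photos by those accounts (a full scan in this sketch).
    candidates = [p for p in photos if p["author_id"] in followed]
    # Steps 3 and 4: sort by creation time, newest first, and return the first `limit`.
    return nlargest(limit, candidates, key=lambda p: p["creation_time"])


# Hypothetical sample data, not taken from the talk.
following = {1: {2, 3}}
photos = [
    {"id": 100, "author_id": 2, "creation_time": 5},
    {"id": 101, "author_id": 3, "creation_time": 9},
    {"id": 102, "author_id": 4, "creation_time": 7},
]
feed = naive_feed(1, following, photos)
```

The cost grows with the number of followed accounts and their photos on every single read, which is the problem the slide title points at.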
  3. 9.

    [Diagram: boxes labeled 382, 487, 1287, 880, 27, 3201, 441, 6690, 12; media ID 943058139]

    SELECT follower_id FROM followers WHERE user_id = 9023;

    Result: {487, 3201, 441}
  4. 10.

    [Diagram: boxes labeled 382, 487, 1287, 880, 27, 3201, 441, 6690, 12; media ID 943058139 repeated for each of {487, 3201, 441}]
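The two diagram slides above appear to show fan-out on write: the new media ID is pushed into the feed of each follower returned by the query. A minimal sketch of that idea, assuming in-memory feeds; `deliver`, `FEED_LENGTH`, and the data structures are hypothetical names, not the talk's implementation.

```python
from collections import defaultdict, deque

FEED_LENGTH = 100  # hypothetical cap on stored feed entries


def deliver(media_id, author_id, followers, feeds):
    """Push a new media ID onto the front of each follower's feed."""
    for follower_id in followers.get(author_id, ()):
        feeds[follower_id].appendleft(media_id)


# IDs taken from the slides: user 9023 is followed by 487, 3201 and 441.
followers = {9023: [487, 3201, 441]}
feeds = defaultdict(lambda: deque(maxlen=FEED_LENGTH))
deliver(943058139, 9023, followers, feeds)
```

Reading a feed then becomes a lookup of one precomputed list rather than a query across every followed account.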
  5. 11.
  6. 12.

    Reliability Problems
    • Database Servers Fail
    • Web Request is a Scary Place
    • Justin Bieber (Millions of Followers)
  7. 15.

    [Diagram: Web, Broker, and Workers with tasks 46 to 51; one worker fails (X) and task 46 is redistributed]
  8. 18.

    Chained Tasks
    • Batch of 10,000 Followers Per Task
    • Tasks Yield Successive Tasks
    • Much Finer-Grained Load Balancing
    • Failure/Reload Penalty Low
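The chaining idea can be sketched without a real broker. The example below assumes a local `deque` standing in for the task queue and a dict standing in for feed storage; `deliver_batch` and `run` are hypothetical names. Each task handles one batch of followers and enqueues its successor.

```python
from collections import deque

BATCH_SIZE = 10000  # batch size named on the slide


def deliver_batch(media_id, follower_ids, offset, feeds, queue, batch_size=BATCH_SIZE):
    """Deliver to one batch of followers, then enqueue a task for the next batch."""
    for follower_id in follower_ids[offset:offset + batch_size]:
        feeds.setdefault(follower_id, []).append(media_id)
    next_offset = offset + batch_size
    if next_offset < len(follower_ids):
        # The task yields its own successor instead of looping over everyone.
        queue.append((media_id, next_offset))


def run(media_id, follower_ids, batch_size=BATCH_SIZE):
    """Drain the stand-in queue; return the feeds and the number of tasks executed."""
    feeds, queue, tasks_run = {}, deque([(media_id, 0)]), 0
    while queue:
        mid, offset = queue.popleft()
        deliver_batch(mid, follower_ids, offset, feeds, queue, batch_size)
        tasks_run += 1
    return feeds, tasks_run


feeds, tasks_run = run(943058139, list(range(25000)))
```

Because each task covers only one batch, a failed task costs at most one batch of rework, and successive batches can land on different workers.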
  9. 22.

    Gearman & Python
    • Simple, Purpose-Built Task Queue
    • Weak Framework Support
    • We just built ad hoc worker scripts
    • A mess to add new job types & capacity
  10. 23.

    Gearman in Production
    • Persistence horrifically slow, complex
    • So we ran out of memory and crashed, no recovery
    • Single core, didn’t scale well: 60ms mean submission time for us
    • Probably should have just used Redis
  11. 25.

    WARNING

    System had to be in production before the heat death of the universe. We are probably doing something stupid!
  12. 26.

    Celery
    • Distributed Task Framework
    • Highly Extensible, Pluggable
    • Mature, Feature Rich
    • Great Tooling
    • Excellent Django Support
    • celeryd
  13. 28.

    Redis
    • We Already Use It
    • Very Fast, Efficient
    • Polling For Task Distribution
    • Messy Non-Synchronous Replication
    • Memory Limits Task Capacity
  14. 29.

    Beanstalk
    • Purpose-Built Task Queue
    • Very Fast, Efficient
    • Pushes to Consumers
    • Spills to Disk
    • No Replication
    • Useless For Anything Else
  15. 30.

    RabbitMQ
    • Reasonably Fast, Efficient
    • Spill-To-Disk
    • Low-Maintenance Synchronous Replication
    • Excellent Celery Compatibility
    • Supports Other Use Cases
    • We don’t know Erlang
  16. 31.

    Our RabbitMQ Setup
    • RabbitMQ 3.0
    • Clusters of Two Broker Nodes, Mirrored
    • Scale Out By Adding Broker Clusters
    • EC2 c1.xlarge, RAID instance storage
    • Way Overprovisioned
  17. 32.

    Alerting
    • We use Sensu
    • Monitors & alerts on queue length threshold
    • Uses rabbitmqctl list_queues
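A queue-length check of this kind might look roughly like the sketch below. It only parses text that has already been captured from `rabbitmqctl list_queues`; the sample output, queue names, and threshold are made up for illustration, and the exact header and footer lines vary by RabbitMQ version. It is not the Sensu check used in the talk.

```python
def parse_list_queues(output):
    """Map queue name to message count from `rabbitmqctl list_queues` text."""
    depths = {}
    for line in output.splitlines():
        parts = line.split()
        # Data rows are "<name> <count>"; header and footer lines are skipped.
        if len(parts) == 2 and parts[1].isdigit():
            depths[parts[0]] = int(parts[1])
    return depths


def over_threshold(depths, threshold):
    """Return the names of queues whose depth exceeds the threshold, sorted."""
    return sorted(name for name, depth in depths.items() if depth > threshold)


# Hypothetical captured output; real queue names and counts will differ.
SAMPLE = "Listing queues ...\ncelery.default\t12\ncelery.feed\t48211\n...done.\n"
alerts = over_threshold(parse_list_queues(SAMPLE), threshold=10000)
```

A monitoring check would typically exit non-zero when `alerts` is non-empty so the alerting system can page on it.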
  18. 41.
  19. 42.

    Celery IRL
    • Easy to understand, new engineers come up to speed in 15 minutes.
    • New job types deployed without fuss.
    • We hack the config a bit to get what we want.
  20. 45.

    Scaling Out
    • Celery only supported 1 broker host last year when we started.
    • Created kombu-multibroker "shim"
    • Multiple brokers used in a round-robin fashion.
    • Breaks some Celery management tools :(
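Round-robin use of several brokers can be illustrated with a small stand-alone class. This is not the kombu-multibroker code, which the slides do not show; `RoundRobinPublisher`, the `send` callback, and the broker URLs are hypothetical.

```python
from itertools import cycle


class RoundRobinPublisher:
    """Spread published messages across brokers in strict rotation."""

    def __init__(self, broker_urls):
        self._brokers = cycle(list(broker_urls))

    def publish(self, message, send):
        """Send `message` to the next broker in the rotation and return its URL."""
        url = next(self._brokers)
        send(url, message)
        return url


# Hypothetical broker addresses; `sent` records what each broker would receive.
sent = []
publisher = RoundRobinPublisher(["amqp://broker-a", "amqp://broker-b"])
for task_id in range(4):
    publisher.publish(task_id, lambda url, msg: sent.append((url, msg)))
```

Since any broker may hold any task, workers have to consume from every broker, which may be part of why tools built around a single broker host stop working.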
  21. 47.

    gevent is cool and all, but only some of our tasks will run right under it.
  22. 49.

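    from kombu import Queue

    # Task names grouped by the kind of worker that should run them.
    CELERY_QUEUE_CONFIG = {
        "default": (
            "normal_task",
        ),
        "gevent": (
            "evented_task",
        ),
    }

    # The queue group this worker consumes from.
    CELERY_QUEUE_GROUP = "default"

    # One queue per task name in the selected group.
    CELERY_QUEUES = [
        Queue("celery.%s" % key, routing_key=key)
        for key in CELERY_QUEUE_CONFIG[CELERY_QUEUE_GROUP]
    ]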
  23. 52.

    from celery.task import task  # Celery 3.x-era import path

    @task(routing_key="task_remote_access")
    def check_url(object_id, url):
        # Check the URL; hand off to a second task only if it is bad.
        is_bad = run_url_check(url)
        if is_bad:
            take_some_action.delay(object_id, url)

    @task(routing_key="task_action")
    def take_some_action(object_id, url):
        do_some_database_thing()

    Slide annotations: Ran on "processes" worker; Ran on "gevent" worker
  24. 54.

    [Diagram: Broker holding tasks 0 to 5; Main Worker with Worker 0 and Worker 1]

    Fetches Batch
    Wait Until Batch Finishes Before Grabbing Another One
  25. 55.

    • Run higher concurrency? Inefficient :(
    • Lower batch (prefetch) size? Min is concurrency count, inefficient :(
    • Separate slow & fast tasks :)
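The effect of mixing slow and fast tasks can be seen in a toy model. The sketch assumes a worker that fetches a batch equal to its concurrency, runs the batch in parallel, and waits for the slowest task before fetching again; the durations are hypothetical and the model ignores everything else a real worker does.

```python
def batch_time(durations, batch_size):
    """Total time when each batch takes as long as its slowest task."""
    total = 0
    for start in range(0, len(durations), batch_size):
        total += max(durations[start:start + batch_size])
    return total


# Hypothetical task durations in seconds: two slow tasks among six fast ones.
mixed = [10, 1, 1, 1, 10, 1, 1, 1]
slow = [d for d in mixed if d >= 10]
fast = [d for d in mixed if d < 10]

mixed_time = batch_time(mixed, batch_size=2)
split_time = batch_time(slow, batch_size=2) + batch_time(fast, batch_size=2)
```

In this model the mixed queue spends 22 seconds of worker time against 13 when the slow tasks have their own queue, because fast tasks no longer sit idle behind slow ones in the same batch.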
  26. 56.

    CELERY_QUEUE_CONFIG = {
        "default": (
            "slow_task",
        ),
        "gevent": (
            "evented_task",
        ),
        "fast": (
            "fast_task",
        ),
        "feed": (
            "feed_delivery",
        ),
    }
  27. 61.

    Normal Flow
    1. Get Tasks
    2. Worker Starts Task
    3. Ack Sent to Broker
    4. Worker Finishes Task
  28. 62.

    ACKS_LATE Flow
    1. Get Tasks
    2. Worker Starts Task
    3. Worker Finishes Task
    4. Ack Sent to Broker
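The difference between the two flows shows up when a worker dies mid-task. The sketch below is a toy simulation, not Celery code: `run_worker` is a hypothetical function, and the list it returns stands in for tasks the broker still holds and can hand to another worker.

```python
def run_worker(queue, acks_late, crash_on):
    """Process tasks until one crashes; return (completed, still_on_broker)."""
    still_on_broker = list(queue)
    completed = []
    for task in queue:
        if not acks_late:
            # Normal flow: ack as soon as the task starts.
            still_on_broker.remove(task)
        if task == crash_on:
            break  # the worker dies in the middle of this task
        completed.append(task)
        if acks_late:
            # ACKS_LATE flow: ack only after the task has finished.
            still_on_broker.remove(task)
    return completed, still_on_broker


early = run_worker(["a", "b", "c"], acks_late=False, crash_on="b")
late = run_worker(["a", "b", "c"], acks_late=True, crash_on="b")
```

With early acks, task "b" is gone from the broker and never finishes. With late acks it is still there to be redelivered, at the cost that a task may run more than once.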
  29. 64.

    Why not do this everywhere?
    • Tasks must be idempotent!
    • That probably is the case anyway :(
    • Mirroring can cause duplicate tasks
    • FLP Impossibility FFFFFFFFFUUUUUUUUU!!!!
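An idempotent task is one that is safe to run twice. A minimal sketch, assuming a feed held as a Python list of media IDs; `deliver_to_feed` is a hypothetical name.

```python
def deliver_to_feed(feed, media_id):
    """Insert media_id at the front of the feed unless it is already present."""
    if media_id not in feed:
        feed.insert(0, media_id)
    return feed


feed = [111, 222]
deliver_to_feed(feed, 943058139)
# A redelivered duplicate of the same task leaves the feed unchanged.
deliver_to_feed(feed, 943058139)
```

Writing tasks this way means a duplicate delivery, whatever its cause, does no harm beyond a little wasted work.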
  30. 66.

    "... it is impossible for one process to tell whether

    another has died (stopped entirely) or is just running very slowly."

    Impossibility of Distributed Consensus with One Faulty Process
    Fischer, Lynch, Paterson (1985)
  31. 69.

    Publisher Confirms
    • AMQP default is that we don't know if things were published or not. :(
    • Publisher Confirms makes broker send acknowledgements back on publishes.
    • kombu-multibroker forces this.
    • Can cause duplicate tasks. (FLP again!)
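How confirms can lead to duplicates is easier to see in a toy model. The sketch assumes a publisher that republishes whenever no confirm arrives; the function, its parameters, and the list standing in for the broker queue are hypothetical, and no AMQP library is involved.

```python
def publish_with_confirm(broker_queue, message, lost_confirms=(), max_attempts=3):
    """Publish until a confirm arrives; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        broker_queue.append(message)  # the broker accepted the message
        if attempt not in lost_confirms:
            return attempt  # the confirm made it back to the publisher
        # The confirm was lost; the publisher cannot tell, so it tries again.
    raise RuntimeError("no confirm received")


queue = []
attempts = publish_with_confirm(queue, "task-1", lost_confirms={1})
```

The publisher cannot distinguish a lost confirm from a lost message, so retrying is the safe choice and the broker can end up with two copies of the same task.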
  32. 72.

    @task(routing_key="media_activation")
    def deactivate_media_content(media_id):
        try:
            media = get_media_store_object(media_id)
            media.deactivate()
        except MediaContentRemoteOperationError as e:
            # Retry in 60 seconds if the remote operation fails.
            raise deactivate_media_content.retry(countdown=60)

    Only pass self-contained, non-opaque data (strings, numbers, arrays, lists, and dicts) as arguments to tasks.
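The advice about task arguments can be illustrated with plain JSON encoding. The slides do not say which serializer was in use, so JSON here is an assumption; `Media` and `encode_task` are hypothetical names used only for this sketch.

```python
import json


class Media:
    """Hypothetical model object standing in for a stored media record."""

    def __init__(self, media_id):
        self.id = media_id


def encode_task(name, *args):
    """Encode a task message as JSON, which accepts only plain data types."""
    return json.dumps({"task": name, "args": list(args)})


media = Media(943058139)

# Passing the ID works: the worker can look the object up again when it runs.
good = encode_task("deactivate_media_content", media.id)

# Passing the object itself fails, because it is opaque to the encoder.
try:
    encode_task("deactivate_media_content", media)
    rejected = False
except TypeError:
    rejected = True
```

Passing an ID also means a retried or delayed task reads the object's current state instead of a stale copy frozen at publish time.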
  33. 75.

    FUTURE
    • Better Grip on RabbitMQ Performance
    • Utilize Result Storage
    • Single Cluster for Control Queues
    • Eliminate kombu-multibroker