Building to Scale

David Cramer
February 24, 2013

PyCon Russia 2013


Transcript

  1. DISQUS: Massive traffic with a long tail
    Sentry: Counters and event aggregation
    tenXer: More stats than we can count
    Tuesday, February 26, 13
  2. Counters in Sentry
    [Diagram: event ID 1, event ID 2, event ID 3 → one Redis INCR per event → SQL Update]
  3. Counters in Sentry
    ‣ INCR event_id in Redis
    ‣ Queue buffer incr task
    ‣ 5 - 10s explicit delay
    ‣ Task does atomic GET event_id and DEL event_id (Redis pipeline)
    ‣ No-op if GET is not > 0
    ‣ One SQL UPDATE per unique event per delay
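The buffering flow on this slide can be sketched without a Redis server. The following is a minimal in-memory illustration, not Sentry's implementation: `BufferedCounter`, its methods, and the `sql_updates` list are hypothetical names, a dict plus a lock stands in for Redis, and the delayed queue task is replaced by direct calls to `flush`.

```python
import threading
from collections import defaultdict


class BufferedCounter(object):
    """In-memory stand-in for the Redis INCR buffer (hypothetical helper)."""

    def __init__(self):
        self._pending = defaultdict(int)  # plays the role of the Redis keys
        self._lock = threading.Lock()     # stands in for Redis atomicity
        self.sql_updates = []             # records what would be sent to SQL

    def incr(self, event_id):
        # Equivalent of INCR event_id; the real flow also queues a delayed task.
        with self._lock:
            self._pending[event_id] += 1

    def flush(self, event_id):
        # Equivalent of the pipelined GET + DEL: read and clear in one step.
        with self._lock:
            count = self._pending.pop(event_id, 0)
        if count <= 0:
            return False  # no-op: an earlier task already flushed this key
        # One UPDATE carries the whole accumulated count.
        self.sql_updates.append((event_id, count))
        return True


counter = BufferedCounter()
for _ in range(3):
    counter.incr('ad93a')
counter.incr('d2ow3')

# Four increments mean four queued tasks; two of the 'ad93a' tasks are no-ops.
results = [counter.flush(e) for e in ('ad93a', 'ad93a', 'ad93a', 'd2ow3')]
```

Only the first flush per event produces an update, which is where the no-op tasks listed as a con on the next slide come from.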
  4. Counters in Sentry (cont.)
    Pros
    ‣ Solves database row lock contention
    ‣ Redis nodes are horizontally scalable
    ‣ Easy to implement
    Cons
    ‣ Too many dummy (no-op) tasks
  5. Alternative Counters
    [Diagram: event ID 1, event ID 2, event ID 3 → one Redis ZINCRBY per event → SQL Update]
  6. Sorted Sets in Redis
    > ZINCRBY events 1 ad93a
    {ad93a: 1}
    > ZINCRBY events 1 ad93a
    {ad93a: 2}
    > ZINCRBY events 1 d2ow3
    {ad93a: 2, d2ow3: 1}
  7. Alternative Counters
    ‣ ZINCRBY events event_id in Redis
    ‣ Cron buffer flush
    ‣ ZRANGE events to get pending updates
    ‣ Fire individual task per update
    ‣ Atomic ZSCORE events event_id and ZREM events event_id to get and flush count.
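The cron-driven flush described above can be sketched as follows. This is an in-memory illustration under stated assumptions: `PendingCounts` and `flush_pending` are hypothetical names, a dict stands in for the `events` sorted set, and the update is applied inline rather than fired as an individual task per event.

```python
class PendingCounts(object):
    """In-memory stand-in for the `events` sorted set (hypothetical helper)."""

    def __init__(self):
        self._scores = {}  # member -> score, like a Redis sorted set

    def zincrby(self, event_id, amount=1):
        self._scores[event_id] = self._scores.get(event_id, 0) + amount

    def zrange(self):
        # Members ordered by ascending score, ties broken by member name.
        return sorted(self._scores, key=lambda m: (self._scores[m], m))

    def pop_score(self, event_id):
        # ZSCORE + ZREM treated as a single atomic step.
        return self._scores.pop(event_id, None)


def flush_pending(pending, apply_update):
    """Cron entry point: one update per event that has a pending count."""
    flushed = 0
    for event_id in pending.zrange():
        count = pending.pop_score(event_id)
        if count:  # another worker may already have flushed this member
            apply_update(event_id, count)
            flushed += 1
    return flushed


pending = PendingCounts()
for event_id in ('ad93a', 'ad93a', 'd2ow3'):
    pending.zincrby(event_id)

updates = []
flushed = flush_pending(pending, lambda e, c: updates.append((e, c)))
```

Because the flush only visits members that exist in the set, work is proportional to the number of events with pending counts rather than the number of increments.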
  8. Alternative Counters (cont.)
    Pros
    ‣ Removes (most) no-op tasks
    ‣ Works without a complex queue due to no required delay on jobs
    Cons
    ‣ Single Redis key stores all pending updates
  9. Streams in SQL
    class Activity:
        SET_RESOLVED = 1
        SET_REGRESSION = 6
        TYPE = (
            (SET_RESOLVED, 'set_resolved'),
            (SET_REGRESSION, 'set_regression'),
        )

        event = ForeignKey(Event)
        type = IntegerField(choices=TYPE)
        user = ForeignKey(User, null=True)
        datetime = DateTimeField()
        data = JSONField(null=True)
  10. Streams in SQL (cont.)
    >>> Activity(event, SET_RESOLVED, user, now)
    "David marked this event as resolved."
    >>> Activity(event, SET_REGRESSION, datetime=now)
    "The system marked this event as a regression."
    >>> Activity(type=DEPLOY_START, datetime=now)
    "A deploy started."
    >>> Activity(type=SET_RESOLVED, datetime=now)
    "All events were marked as resolved"
  11. Views as a Cache
    TIMELINE = []
    MAX = 500

    def on_event_creation(event):
        global TIMELINE
        TIMELINE.insert(0, event)
        TIMELINE = TIMELINE[:MAX]

    def get_latest_events(num=100):
        return TIMELINE[:num]
  12. Views in Redis
    class Timeline(object):
        def __init__(self):
            self.db = Redis()

        def add(self, event):
            score = float(event.date.strftime('%s.%f'))
            self.db.zadd('timeline', event.id, score)

        def list(self, offset=0, limit=-1):
            return self.db.zrevrange(
                'timeline', offset, limit)
  13. Views in Redis (cont.)
    MAX_SIZE = 10000

    def add(self, event):
        score = float(event.date.strftime('%s.%f'))
        # add the event and trim the oldest entries to avoid
        # data bloat in a single key
        with self.db.pipeline() as pipe:
            pipe.zadd(self.key, event.id, score)
            pipe.zremrangebyrank(self.key, 0, -MAX_SIZE - 1)
            pipe.execute()
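The add-then-trim behaviour of a capped timeline can be checked in isolation with an in-memory stand-in. This is a sketch under stated assumptions: `BoundedTimeline` is a hypothetical class, a sorted Python list replaces the Redis sorted set, and unlike a real sorted set it does not deduplicate an event ID that is added twice.

```python
import bisect


class BoundedTimeline(object):
    """In-memory stand-in for a trimmed sorted set (hypothetical helper)."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._items = []  # (score, event_id) pairs in ascending score order

    def add(self, event_id, score):
        bisect.insort(self._items, (score, event_id))
        # Drop the lowest-scored (oldest) entries once the cap is exceeded.
        overflow = len(self._items) - self.max_size
        if overflow > 0:
            del self._items[:overflow]

    def list(self, offset=0, limit=None):
        # Newest first, mirroring a reverse range over the sorted set.
        newest_first = [event_id for _, event_id in reversed(self._items)]
        end = None if limit is None else offset + limit
        return newest_first[offset:end]


timeline = BoundedTimeline(max_size=3)
for i in range(1, 6):
    timeline.add('event-%d' % i, float(i))

latest = timeline.list()
top_two = timeline.list(limit=2)
```

Trimming on every write keeps the key at a fixed maximum size, at the cost of losing history beyond the cap.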
  14. Fanout
    @task(exchange="counters")
    def incr_counter(key, id=None):
        counter.incr(key, id)

    @task(exchange="event_creation")
    def on_event_creation(event_id):
        incr_counter.delay('events', event_id)
        incr_counter.delay('global')

    # Delay execution
    on_event_creation(event.id)
  15. Object Cache Prerequisites
    ‣ Your database can't handle the read-load
    ‣ Your data changes infrequently
    ‣ You can handle slightly worse performance
  16. Distributing Load with Memcache
    Memcache 1     Memcache 2     Memcache 3
    Event ID 01    Event ID 02    Event ID 03
    Event ID 04    Event ID 05    Event ID 06
    Event ID 07    Event ID 08    Event ID 09
    Event ID 10    Event ID 11    Event ID 12
    Event ID 13    Event ID 14    Event ID 15
  17. Querying the Object Cache
    def make_key(model, id):
        return '{}:{}'.format(model.__name__, id)

    def get_by_ids(model, id_list):
        # map each cache key back to the object id it represents
        key_map = {make_key(model, id): id for id in id_list}
        cached = cache.get_multi(list(key_map))
        res = {key_map[k]: v for k, v in cached.items() if v is not None}
        # ids missing from the cache are fetched from the database
        pending = set(id_list) - set(res)
        if pending:
            mres = model.objects.in_bulk(pending)
            cache.set_multi({make_key(model, o.id): o for o in mres.values()})
            res.update(mres)
        return res
  18. Redis for Persistence
    Redis 1        Redis 2        Redis 3
    Event ID 01    Event ID 02    Event ID 03
    Event ID 04    Event ID 05    Event ID 06
    Event ID 07    Event ID 08    Event ID 09
    Event ID 10    Event ID 11    Event ID 12
    Event ID 13    Event ID 14    Event ID 15
  19. Routing with Nydus
    # create a cluster of Redis connections which
    # partition reads/writes by (hash(key) % size)
    from nydus.db import create_cluster

    redis = create_cluster({
        'engine': 'nydus.db.backends.redis.Redis',
        'router': 'nydus.db...redis.PartitionRouter',
        'hosts': {n: {'db': n} for n in xrange(10)},
    })

    github.com/disqus/nydus
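The `hash(key) % size` rule mentioned in the comment above can be illustrated on its own. This is a minimal sketch, not Nydus code: `pick_shard` and `PartitionedStore` are hypothetical names, plain dicts stand in for the Redis connections, and CRC32 is an assumed choice of stable hash.

```python
import zlib


def pick_shard(key, num_shards):
    """Map a key to a shard index with a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(key.encode('utf-8')) % num_shards


class PartitionedStore(object):
    """Routes every read and write for a key to exactly one shard."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]  # dicts replace connections

    def set(self, key, value):
        self.shards[pick_shard(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[pick_shard(key, len(self.shards))].get(key)


store = PartitionedStore(num_shards=10)
for n in range(1, 16):
    store.set('event:%02d' % n, n)

stored_total = sum(len(shard) for shard in store.shards)
```

A stable hash matters here: Python's built-in `hash` can differ between processes, which would send the same key to different shards.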
  20. Sentry's Team Dashboard
    ‣ Data limited to a single team
    ‣ Simple views which could be materialized
    ‣ Only entry point for "data for team"
  21. Sentry's Stream View
    ‣ Data limited to a single project
    ‣ Each project could map to a different DB
  22. [Diagram: redis-1 hosting DB0 through DB9]
  23. [Diagram: redis-1 hosting DB0 through DB4; redis-2 hosting DB5 through DB9]
    When a physical machine becomes overloaded, migrate a chunk of shards to another machine.
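The migration idea on the last two slides amounts to keeping the key-to-shard mapping fixed and changing only which machine hosts each logical shard. The sketch below assumes hypothetical helpers (`build_shard_map`, `migrate`, `host_for`) and a project-ID-modulo rule for choosing the shard; copying the data itself is not shown.

```python
def build_shard_map(num_shards, host):
    """Start with every logical shard on a single physical host."""
    return {shard: host for shard in range(num_shards)}


def migrate(shard_map, shards, new_host):
    """Return a new map with the given logical shards moved to new_host."""
    updated = dict(shard_map)
    for shard in shards:
        updated[shard] = new_host
    return updated


def host_for(shard_map, project_id):
    # project -> logical shard never changes; only shard -> host does.
    return shard_map[project_id % len(shard_map)]


shard_map = build_shard_map(10, 'redis-1')
before = host_for(shard_map, 17)

# Move DB5 through DB9 to a second machine once redis-1 is overloaded.
shard_map = migrate(shard_map, range(5, 10), 'redis-2')
after = host_for(shard_map, 17)
unchanged = host_for(shard_map, 3)
```

Starting with more logical shards than machines is what makes this possible: keys never need rehashing, only whole shards move.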