Building to Scale (PyCon TW 2013)

David Cramer

May 26, 2013

Transcript

  1. DISQUS: Massive traffic with a long tail

    Sentry: Counters and event aggregation
    tenXer: More stats than we can count
    Sunday, May 26, 13
  2. Counters in Redis

    INCR counter
    1
    >>> redis.incr('counter')
    The key doesn't have to exist!
  3. Counters in Sentry

    [Diagram: event ID 1, event ID 2, event ID 3 → Redis INCR (one per event) → SQL Update for event ID 1. Caption: Buffers!]
  4. Counters in Sentry

    ‣ INCR event_id in Redis
    ‣ Queue buffer incr task
    ‣ 5 - 10s explicit delay
    ‣ Task does atomic GET event_id and DEL event_id (Redis pipeline)
    ‣ No-op if GET is not > 0
    ‣ One SQL UPDATE per unique event per delay
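The buffering steps on this slide can be sketched without a Redis server. This is a minimal illustration, not Sentry's implementation: `FakeRedis`, `record_event`, `flush_buffer`, `sql_counts` and `pending_tasks` are hypothetical names, and the in-memory dictionary only stands in for INCR and for the pipelined GET + DEL.

```python
from collections import defaultdict


class FakeRedis:
    """In-memory stand-in (hypothetical) for the two Redis operations used here."""

    def __init__(self):
        self.data = defaultdict(int)

    def incr(self, key):
        self.data[key] += 1
        return self.data[key]

    def get_and_delete(self, key):
        # Stands in for GET + DEL issued atomically in one pipeline.
        return self.data.pop(key, 0)


redis = FakeRedis()
sql_counts = defaultdict(int)  # stands in for the SQL counter column
pending_tasks = []             # stands in for the delayed task queue


def record_event(event_id):
    redis.incr(event_id)
    pending_tasks.append(event_id)  # one buffer task is queued per event


def flush_buffer(event_id):
    count = redis.get_and_delete(event_id)
    if count <= 0:
        return False  # no-op: an earlier task already flushed this key
    sql_counts[event_id] += count  # one UPDATE carrying the whole buffered count
    return True


for _ in range(3):
    record_event("event-1")
record_event("event-2")

# After the delay, every queued task runs; only the first per key does work.
results = [flush_buffer(event_id) for event_id in pending_tasks]
```

Four tasks run here but only two of them write to SQL, which is the no-op overhead the next slide lists as a con.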
  5. Counters in Sentry (cont.)

    Pros
    ‣ Solves database row lock contention
    ‣ Redis nodes are horizontally scalable
    ‣ Easy to implement
    Cons
    ‣ Too many dummy (no-op) tasks
  6. Alternative Counters

    [Diagram: event ID 1, event ID 2, event ID 3 → Redis ZINCRBY (one per event) → SQL Update]
  7. Sorted Sets in Redis

    > ZINCRBY events 1 ad93a
    {ad93a: 1}
    > ZINCRBY events 1 ad93a
    {ad93a: 2}
    > ZINCRBY events 1 d2ow3
    {ad93a: 2, d2ow3: 1}
  8. Alternative Counters

    ‣ ZINCRBY events event_id in Redis
    ‣ Cron buffer flush
    ‣ ZRANGE events to get pending updates
    ‣ Fire individual task per update
    ‣ Atomic ZSCORE events event_id and ZREM events event_id to get and flush count
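The sorted-set variant can be sketched the same way. Again this is an illustration under assumptions: `FakeSortedSet`, `flush_one` and `cron_flush` are hypothetical names, and the dictionary only mimics ZINCRBY, ZRANGE and the pipelined ZSCORE + ZREM.

```python
class FakeSortedSet:
    """In-memory stand-in (hypothetical) for one Redis sorted set."""

    def __init__(self):
        self.scores = {}

    def zincrby(self, member, amount=1):
        self.scores[member] = self.scores.get(member, 0) + amount
        return self.scores[member]

    def zrange_all(self):
        # Like ZRANGE key 0 -1: members ordered by score, ties by member.
        return sorted(self.scores, key=lambda m: (self.scores[m], m))

    def zscore_and_zrem(self, member):
        # Stands in for ZSCORE + ZREM run atomically in one pipeline.
        return self.scores.pop(member, None)


events = FakeSortedSet()
sql_counts = {}  # stands in for the SQL counter column


def flush_one(event_id):
    count = events.zscore_and_zrem(event_id)
    if count:
        sql_counts[event_id] = sql_counts.get(event_id, 0) + count


def cron_flush():
    # One task per member that actually has a pending count.
    pending = events.zrange_all()
    for event_id in pending:
        flush_one(event_id)
    return len(pending)


for event_id in ["ad93a", "ad93a", "d2ow3"]:
    events.zincrby(event_id)

tasks_fired = cron_flush()
```

Three increments produce two tasks, one per unique event, and a second `cron_flush()` finds nothing to do.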
  9. Alternative Counters (cont.)

    Pros
    ‣ Removes (most) no-op tasks
    ‣ Works without a complex queue due to no required delay on jobs
    Cons
    ‣ Single Redis key stores all pending updates
  10. Streams in SQL

    class Activity:
        SET_RESOLVED = 1
        SET_REGRESSION = 6
        TYPE = (
            (SET_RESOLVED, 'set_resolved'),
            (SET_REGRESSION, 'set_regression'),
        )

        event = ForeignKey(Event)
        type = IntegerField(choices=TYPE)
        user = ForeignKey(User, null=True)
        datetime = DateTimeField()
        data = JSONField(null=True)
  11. Streams in SQL (cont.)

    >>> Activity(event, SET_RESOLVED, user, now)
    "David marked this event as resolved."
    >>> Activity(event, SET_REGRESSION, datetime=now)
    "The system marked this event as a regression."
    >>> Activity(type=DEPLOY_START, datetime=now)
    "A deploy started."
    >>> Activity(type=SET_RESOLVED, datetime=now)
    "All events were marked as resolved"
  12. Views as a Cache

    TIMELINE = []
    MAX = 500

    def on_event_creation(event):
        global TIMELINE
        TIMELINE.insert(0, event)
        TIMELINE = TIMELINE[:MAX]

    def get_latest_events(num=100):
        return TIMELINE[:num]
  13. Views in Redis

    class Timeline(object):
        def __init__(self):
            self.db = Redis()

        def add(self, event):
            # epoch seconds with a microsecond fraction
            score = float(event.date.strftime('%s.%f'))
            self.db.zadd('timeline', event.id, score)

        def list(self, offset=0, limit=-1):
            return self.db.zrevrange('timeline', offset, limit)
  14. Views in Redis (cont.)

    MAX_SIZE = 10000

    def add(self, event):
        # epoch seconds with a microsecond fraction
        score = float(event.date.strftime('%s.%f'))

        # add to the key and trim the data to avoid
        # data bloat in a single key
        with self.db.pipeline() as pipe:
            pipe.zadd(self.key, event.id, score)
            # keep only the MAX_SIZE highest-scored (newest) members
            pipe.zremrangebyrank(self.key, 0, -(MAX_SIZE + 1))
            pipe.execute()
  15. Fanout

    @task(queue="counters")
    def incr_counter(key, id=None):
        counter.incr(key, id)

    @task(queue="event_creation")
    def on_event_creation(event_id):
        incr_counter.delay('events', event_id)
        incr_counter.delay('global')

    # Delay execution
    on_event_creation.delay(event.id)
  16.

    @task
    def add_everything(offset=0, limit=1000):
        results = chunked(Event, offset, limit)
        for event in results:
            add_event.delay(event.id)

        # a short page means there is nothing left to process
        if len(results) < limit:
            return  # finished!

        add_everything.delay(
            offset=offset + limit,
            limit=limit,
        )
  17. Object Cache Prerequisites

    ‣ Your database can't handle the read-load
    ‣ Your data changes infrequently
    ‣ You can handle slightly worse performance
  18. Distributing Load with Memcache

    Memcache 1     Memcache 2     Memcache 3
    Event ID 01    Event ID 02    Event ID 03
    Event ID 04    Event ID 05    Event ID 06
    Event ID 07    Event ID 08    Event ID 09
    Event ID 10    Event ID 11    Event ID 12
    Event ID 13    Event ID 14    Event ID 15
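The layout on this slide matches a simple modulo distribution of IDs over nodes. A minimal sketch, assuming that rule (the slide does not say how keys are assigned, and real clients usually hash the key string rather than use the raw ID); `NODES` and `node_for` are hypothetical names:

```python
NODES = ["Memcache 1", "Memcache 2", "Memcache 3"]


def node_for(event_id):
    # IDs start at 1, so shift by one before taking the remainder.
    return NODES[(event_id - 1) % len(NODES)]


# Rebuild the slide's layout for event IDs 1 through 15.
layout = {node: [] for node in NODES}
for event_id in range(1, 16):
    layout[node_for(event_id)].append(event_id)
```

Each node ends up with a third of the objects, so read load spreads evenly as long as the IDs being requested are themselves evenly spread.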
  19. Querying the Object Cache

    def make_key(model, id):
        return '{}:{}'.format(model.__name__, id)

    def get_by_ids(model, id_list):
        # map each cache key back to the id it was built from
        key_to_id = {make_key(model, id): id for id in id_list}
        cached = cache.get_multi(list(key_to_id))
        res = {key_to_id[k]: v for k, v in cached.items() if v is not None}

        # ids missing from the cache fall back to the database
        pending = set(id_list) - set(res)
        if pending:
            mres = model.objects.in_bulk(pending)
            cache.set_multi({make_key(model, id): o for id, o in mres.items()})
            res.update(mres)

        return res
  20. Pushing State

    def save(self):
        cache.set(make_key(type(self), self.id), self)

    def delete(self):
        cache.delete(make_key(type(self), self.id))

        # or use a tombstone
        cache.set(make_key(type(self), self.id), DELETED)
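The tombstone option matters on the read path: a reader that finds the marker can stop without asking the database. A minimal sketch, assuming plain dictionaries in place of the cache and the database; `load`, `delete`, `database` and `db_reads` are hypothetical names, and a real cache would need a serializable marker rather than `object()`.

```python
DELETED = object()  # tombstone marker (must be serializable in a real cache)

cache = {}
database = {1: "event one"}
db_reads = []  # records which ids actually reached the database


def load(id):
    value = cache.get(id)
    if value is DELETED:
        return None  # known to be deleted: skip the database entirely
    if value is None:
        db_reads.append(id)
        value = database.get(id)
        if value is not None:
            cache[id] = value
    return value


def delete(id):
    database.pop(id, None)
    cache[id] = DELETED  # push the deleted state instead of dropping the key


first = load(1)
delete(1)
second = load(1)
```

With a plain `cache.delete`, the second `load` would miss the cache and query the database again; with the tombstone it does not.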
  21. Sentry's Team Dashboard

    ‣ Data limited to a single team
    ‣ Simple views which could be materialized
    ‣ Only entry point for "data for team"
  22. Sentry's Stream View

    ‣ Data limited to a single project
    ‣ Each project could map to a different DB
  23. [Diagram: redis-1 holds DB0, DB1, DB2, DB3, DB4, DB5, DB6, DB7, DB8, DB9]
  24. [Diagram: redis-1 holds DB0, DB1, DB2, DB3, DB4; redis-2 holds DB5, DB6, DB7, DB8, DB9]

    When a physical machine becomes overloaded, migrate a chunk of shards to another machine.
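The last two slides describe logical shards that can move between machines. A minimal sketch of that mapping, assuming projects are assigned to shards by modulo (the slides do not state the rule); `shard_to_host`, `shard_for`, `host_for` and `migrate` are hypothetical names, and the actual data copy is out of scope.

```python
NUM_SHARDS = 10  # DB0 through DB9

# Every logical shard starts on the same physical machine.
shard_to_host = {shard: "redis-1" for shard in range(NUM_SHARDS)}


def shard_for(project_id):
    # Fixed for the lifetime of the project, whatever machine hosts the shard.
    return project_id % NUM_SHARDS


def host_for(project_id):
    return shard_to_host[shard_for(project_id)]


def migrate(shards, new_host):
    # Only the lookup table changes; copying the data is not shown.
    for shard in shards:
        shard_to_host[shard] = new_host


before = host_for(27)
migrate(range(5, 10), "redis-2")
after = host_for(27)
```

Because a project's shard number never changes, moving DB5 through DB9 only updates the lookup table; no key has to be rehashed.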