
Scaling Data — Architectures for caching, joins and search

While building Rapportive, we learnt a few things about dealing with data: especially data in large volumes, with complex structure, that is constantly changing. We've also developed some patterns for thinking about data and processing that keep us sane.

This presentation explores the idea of deriving secondary, read-optimised datasets from a primary, raw, write-optimised dataset. That idea is the common pattern behind caches, database indexes and full-text indexes.
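As a minimal illustration of that idea (hypothetical data, Python): the raw dataset is an append-only log of events, and the read-optimised view is a function of it, so the view can be thrown away and regenerated at any time.

```python
# Primary dataset: raw, append-only, write-optimised -- writers only append.
raw_events = [
    {"user": "alice", "action": "comment", "text": "Foe real!"},
    {"user": "bob",   "action": "comment", "text": "for*"},
    {"user": "alice", "action": "comment", "text": "*dat"},
]

def derive_comment_counts(events):
    """Secondary dataset: read-optimised view, regenerable from the raw data."""
    counts = {}
    for e in events:
        if e["action"] == "comment":
            counts[e["user"]] = counts.get(e["user"], 0) + 1
    return counts

view = derive_comment_counts(raw_events)
print(view)  # {'alice': 2, 'bob': 1}
```

A cache, a database index, and a full-text index are all instances of `derive_comment_counts`: pure functions from the raw data to a read-optimised structure.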

The content is related to my blog post on precomputed caches and data dependencies (http://martin.kleppmann.com/2012/10/01/rethinking-caching-in-web-apps.html), though approaching the topic from a different angle.

Talk given at Wonga, London, 11 January 2013.

Martin Kleppmann

January 11, 2013


Transcript

  1. function getDocument(id) {
       var doc = cache.get(id);
       if (doc) {
         return doc;
       } else {
         doc = db.get(id);
         cache.set(id, doc, cacheTTL);
         return doc;
       }
     }
  2. Consistency · Staleness · Aborted transactions · Race conditions ·
     Bootstrapping · Cold start after failure · New feature → new access pattern
  3. CREATE TABLE users (
       id integer PRIMARY KEY,
       name varchar(100) NOT NULL
     );
     CREATE TABLE comments (
       id integer PRIMARY KEY,
       in_reply_to integer REFERENCES comments (id),
       author_id integer REFERENCES users (id),
       content varchar(10000)
     );
  4. SELECT * FROM comments WHERE in_reply_to = ?;

     SELECT * FROM comments
     JOIN users ON author_id = users.id
     WHERE in_reply_to = ?;

     users (id, name)
     comments (id, in_reply_to, author_id, content)
  5. blackdragonflower: May good fortune smile upon you foe your kindness.
     Genmaken: Foe real!
     ▪ N0T_REALLY_RELEVANT: Almost Relevant
     CanadianDave: for*
     ▪ Saucefire: Damn it, Dave, we don't have time foe yoe proper spelling.
     ▪ rosyatrandom: Ain't nobody got time foe that
     beermethestrength: *dat
  6. id | in_reply_to | author_id | content
     ---+-------------+-----------+------------------------------
      1 | NULL        | 42        | May good fortune smile upon…
      2 | 1           | 43        | Foe real!
      3 | 2           | 44        | Almost Relevant
  7. -- Hacker News / Reddit clone, Postgres syntax
     WITH RECURSIVE tree(id, in_reply_to, author_id, content) AS (
       SELECT id, in_reply_to, author_id, content
       FROM comments WHERE id = ?
       UNION ALL
       SELECT child.id, child.in_reply_to, child.author_id, child.content
       FROM tree parent, comments child
       WHERE parent.id = child.in_reply_to
     )
     SELECT * FROM tree;
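SQLite also supports WITH RECURSIVE, so the recursive query from the slide can be tried end-to-end from Python; a sketch using the three rows of the example thread (with the `?` placeholder fixed to root comment 1):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (
  id integer PRIMARY KEY,
  in_reply_to integer REFERENCES comments (id),
  author_id integer,
  content text
);
INSERT INTO comments VALUES
  (1, NULL, 42, 'May good fortune smile upon...'),
  (2, 1,    43, 'Foe real!'),
  (3, 2,    44, 'Almost Relevant');
""")

# Fetch comment 1 and, recursively, all replies to it.
rows = conn.execute("""
WITH RECURSIVE tree(id, in_reply_to, author_id, content) AS (
  SELECT id, in_reply_to, author_id, content FROM comments WHERE id = 1
  UNION ALL
  SELECT child.id, child.in_reply_to, child.author_id, child.content
  FROM tree parent, comments child
  WHERE parent.id = child.in_reply_to
)
SELECT id FROM tree
""").fetchall()

ids = sorted(r[0] for r in rows)
print(ids)  # [1, 2, 3]
```

Note the cost: without an index on `in_reply_to`, each recursion step is a full scan of `comments`, which motivates the index on the next slides.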
  8. id | in_reply_to | author_id | content
      1 | NULL        | 42        | May good fortune smile upon…
      2 | 1           | 43        | Foe real!

     CREATE INDEX ON comments (in_reply_to);
     -- Implicit in PRIMARY KEY:
     CREATE UNIQUE INDEX ON comments (id);
  9. id | in_reply_to | author_id | content
      2 | 1           | 43        | Foe real!

     CREATE INDEX ON comments (in_reply_to);
     SELECT * FROM comments WHERE in_reply_to = 1;
  10. [B-tree diagram: root node 64; children 32 and 96; leaf values 23, 42, 73, 99.
      To find value 42: 42 < 64, then 42 > 32.]
      “To find value 42, look at rows stored in disk blocks 12345 and 23456”
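A much simplified sketch of what that diagram shows (all data hypothetical): the index keeps key values sorted so a lookup is a binary search, and each key points at the disk blocks holding the matching rows.

```python
import bisect

# Hypothetical flattened "B-tree": sorted key values, each mapped to the
# disk blocks that contain the rows with that value.
keys = [23, 32, 42, 64, 73, 96, 99]
blocks = {42: [12345, 23456]}  # value 42 lives in blocks 12345 and 23456

def lookup(value):
    """Binary search over the sorted keys, as a B-tree traversal would do."""
    i = bisect.bisect_left(keys, value)
    if i < len(keys) and keys[i] == value:
        return blocks.get(value, [])
    return []  # no matching rows

print(lookup(42))  # [12345, 23456]
```

A real B-tree is a multi-level structure of fixed-size pages rather than one sorted array, but the lookup logic per node is the same comparison-and-descend shown here.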
  11. Consistency · Staleness · Aborted transactions · Race conditions ·
      Bootstrapping · Cold start after failure · New feature → new access pattern
  12. Consistency · Staleness · Aborted transactions · Race conditions ·
      Bootstrapping · Cold start after failure · New feature → new access pattern
      RDBMS deals with all of these!
  13. [Diagram: Raw data → Caches (derive & maintain with (lots of) code);
      Raw data → DB indexes (derive & maintain with 1 line of SQL)]
  14. -- I already have 200 million rows.
      -- This may take a few hours…
      CREATE INDEX CONCURRENTLY ON comments (in_reply_to);
  15. -- I already have 200 million rows.
      -- This may take a few hours…
      CREATE INDEX CONCURRENTLY ON comments (in_reply_to);
      OMG! No table lock!
  16. enter_sandman → “Say your prayers, little one / don’t forget, my son, to include everyone”

      term    | occurrences
      --------+------------------
      dont    | enter_sandman:5
      everyon | enter_sandman:11
      forget  | enter_sandman:6
      includ  | enter_sandman:10
      littl   | enter_sandman:3
      my      | enter_sandman:7
      on      | enter_sandman:4
      prayer  | enter_sandman:2

      Tokenization / Stemming / Indexing / …
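The tokenize → stem → index pipeline behind that table can be sketched as follows. The stemmer here is a deliberately crude suffix-stripper standing in for a real one (e.g. Porter's), so its stems and positions will not exactly match the slide's indexer:

```python
import re
from collections import defaultdict

def stem(word):
    """Toy stemmer: crude suffix stripping (a real indexer would use
    e.g. a Porter stemmer)."""
    for suffix in ("ers", "le", "e", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def build_index(doc_id, text):
    """Tokenize, stem, and record term -> doc:position postings."""
    index = defaultdict(list)
    tokens = re.findall(r"[a-z']+", text.lower())
    for pos, token in enumerate(tokens, start=1):
        index[stem(token.replace("'", ""))].append(f"{doc_id}:{pos}")
    return index

idx = build_index("enter_sandman",
                  "Say your prayers, little one / "
                  "don't forget, my son, to include everyone")
print(idx["includ"])  # ['enter_sandman:11']
```

The result is an inverted index: a derived, read-optimised dataset keyed by term, exactly like the caches and B-tree indexes on the earlier slides.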
  17. Natural language (tokenization, stemming, synonyms) · Relevance ranking ·
      Search facets · Spelling correction & suggestions
      → Warrants being separate from database
  18. Consistency · Staleness · Aborted transactions · Race conditions ·
      Bootstrapping · Cold start after failure · New feature → new access pattern
  19. Shirshanka Das, Chavdar Botev, Kapil Surlaker, et al.: “All Aboard the Databus!,”
      at ACM Symposium on Cloud Computing, 2012. http://www.socc2012.org/s18-das.pdf
  20. [Timeline diagram: a stream of writes continues while CREATE INDEX CONCURRENTLY…
      runs against a snapshot of the table data]
  21. [Timeline diagram: a stream of writes continues while CREATE INDEX CONCURRENTLY…
      runs against a snapshot of the table data]
  22. Derived data can always be re-generated from raw data ·
      Derived data optimized for reading, raw data optimized for writing (normalized) ·
      Many different datasets derived from one raw dataset ·
      Update latency depends on application (ACID/online < ‘nearline’ < offline)
  23. Consistency · Staleness · Aborted transactions · Race conditions ·
      Bootstrapping · Cold start after failure · New feature → new access pattern
  24. Consistency · Staleness · Aborted transactions · Race conditions ·
      Bootstrapping · Cold start after failure · New feature → new access pattern
      Just build a new one!
  25. Consistency · Staleness · Aborted transactions · Race conditions ·
      Bootstrapping · Cold start after failure · New feature → new access pattern
      Just build a new one! Build a new one regularly!
  26. Consistency · Staleness · Aborted transactions · Race conditions ·
      Bootstrapping · Cold start after failure · New feature → new access pattern
      Just build a new one! Build a new one regularly! Fast stream process
  27. [Architecture diagram: Data to write → All data (append-only) → Batch process →
      Precomputed view (cache) → Reader; Data to write → Message queue →
      Stream process → Real-time diff to view → Reader]
  28. [Architecture diagram as on the previous slide, annotated with examples:
      All data (append-only): e.g. HDFS; Batch process: e.g. Hadoop MapReduce;
      Precomputed view (cache): e.g. Voldemort; Message queue: e.g. Kafka;
      Stream process: e.g. Storm; Real-time diff to view: e.g. Memcached]
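A toy sketch of that batch + stream split (all names hypothetical): the batch layer periodically recomputes the whole view from the append-only log, the stream layer maintains a real-time diff for events that arrived since the last batch run, and the reader merges the two.

```python
# Append-only raw log (stand-in for HDFS).
log = [("alice", 1), ("bob", 2), ("alice", 3)]

def batch_rebuild(events):
    """Batch layer (stand-in for a MapReduce job): recompute the whole view."""
    view = {}
    for user, n in events:
        view[user] = view.get(user, 0) + n
    return view

batch_view = batch_rebuild(log)  # precomputed view (e.g. in Voldemort)
realtime_diff = {}               # stream output (e.g. in Memcached)

def stream_update(user, n):
    """Speed layer (stand-in for a Storm topology): apply events as they arrive."""
    realtime_diff[user] = realtime_diff.get(user, 0) + n

def read(user):
    """Reader merges the batch view with the real-time diff."""
    return batch_view.get(user, 0) + realtime_diff.get(user, 0)

stream_update("alice", 10)  # event newer than the last batch run
print(read("alice"))  # 14 = 4 (batch) + 10 (stream)
```

After each batch rebuild the stream output can simply be discarded, which is the "human fault tolerance" point on the next slide: a bug in the stream code corrupts at most one batch interval of the view.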
  29. No such thing as a cache miss ·
      After every batch rebuild, discard stream output → “human fault tolerance” ·
      Not feasible if O(n²) [e.g. friends in common] or unbounded [e.g. search] ·
      How does stream process know which keys to update (when using joins)?