
Scaling Data — Architectures for caching, joins and search

While building Rapportive, we learnt a few things about dealing with data: especially data in large volumes, with complex structure, that is constantly changing. We've also developed some patterns for thinking about data and processing that keep us sane.

This presentation explores the idea of deriving secondary, read-optimised datasets from a primary, raw, write-optimised dataset. That idea is the common pattern behind caches, database indexes and full-text indexes.
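As a minimal illustration of that idea (hypothetical data, Python): the raw dataset is an append-only log of events, and the read-optimised view is a function of it, so the view can be thrown away and regenerated at any time.

```python
# Primary dataset: raw, append-only, write-optimised -- writers only append.
raw_events = [
    {"user": "alice", "action": "comment", "text": "Foe real!"},
    {"user": "bob",   "action": "comment", "text": "for*"},
    {"user": "alice", "action": "comment", "text": "*dat"},
]

def derive_comment_counts(events):
    """Secondary dataset: read-optimised view, regenerable from the raw data."""
    counts = {}
    for e in events:
        if e["action"] == "comment":
            counts[e["user"]] = counts.get(e["user"], 0) + 1
    return counts

view = derive_comment_counts(raw_events)
print(view)  # {'alice': 2, 'bob': 1}
```

A cache, a database index, and a full-text index are all instances of `derive_comment_counts`: pure functions from the raw data to a read-optimised structure.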

The content is related to my blog post on precomputed caches and data dependencies (http://martin.kleppmann.com/2012/10/01/rethinking-caching-in-web-apps.html), though approaching the topic from a different angle.

Talk given at Wonga, London, 11 January 2013.

Martin Kleppmann

January 11, 2013


Transcript

  1. function getDocument(id) {
       var doc = cache.get(id);
       if (doc) {
         return doc;
       } else {
         doc = db.get(id);
         cache.set(id, doc, cacheTTL);
         return doc;
       }
     }
  2. Consistency · Staleness · Aborted transactions · Race conditions ·
     Bootstrapping · Cold start after failure · New feature → new access pattern
  3. CREATE TABLE users (
       id integer PRIMARY KEY,
       name varchar(100) NOT NULL
     );
     CREATE TABLE comments (
       id integer PRIMARY KEY,
       in_reply_to integer REFERENCES comments (id),
       author_id integer REFERENCES users (id),
       content varchar(10000)
     );
  4. SELECT * FROM comments WHERE in_reply_to = ?;

     SELECT * FROM comments
     JOIN users ON author_id = users.id
     WHERE in_reply_to = ?;

     users (id, name)
     comments (id, in_reply_to, author_id, content)
  5. blackdragonflower: May good fortune smile upon you foe your kindness.
     Genmaken: Foe real!
     ▪ N0T_REALLY_RELEVANT: Almost Relevant
     CanadianDave: for*
     ▪ Saucefire: Damn it, Dave, we don't have time foe yoe proper spelling.
     ▪ rosyatrandom: Ain't nobody got time foe that
     beermethestrength: *dat
  6. id | in_reply_to | author_id | content
     ---+-------------+-----------+------------------------------
      1 | NULL        | 42        | May good fortune smile upon…
      2 | 1           | 43        | Foe real!
      3 | 2           | 44        | Almost Relevant
  7. -- Hacker News / Reddit clone, Postgres syntax
     WITH RECURSIVE tree(id, in_reply_to, author_id, content) AS (
       SELECT id, in_reply_to, author_id, content
       FROM comments WHERE id = ?
       UNION ALL
       SELECT child.id, child.in_reply_to, child.author_id, child.content
       FROM tree parent, comments child
       WHERE parent.id = child.in_reply_to
     )
     SELECT * FROM tree;
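SQLite also supports WITH RECURSIVE, so the recursive query from the slide can be tried end-to-end from Python; a sketch using the three rows of the example thread (with the `?` placeholder fixed to root comment 1):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (
  id integer PRIMARY KEY,
  in_reply_to integer REFERENCES comments (id),
  author_id integer,
  content text
);
INSERT INTO comments VALUES
  (1, NULL, 42, 'May good fortune smile upon...'),
  (2, 1,    43, 'Foe real!'),
  (3, 2,    44, 'Almost Relevant');
""")

# Fetch comment 1 and, recursively, all replies to it.
rows = conn.execute("""
WITH RECURSIVE tree(id, in_reply_to, author_id, content) AS (
  SELECT id, in_reply_to, author_id, content FROM comments WHERE id = 1
  UNION ALL
  SELECT child.id, child.in_reply_to, child.author_id, child.content
  FROM tree parent, comments child
  WHERE parent.id = child.in_reply_to
)
SELECT id FROM tree
""").fetchall()

ids = sorted(r[0] for r in rows)
print(ids)  # [1, 2, 3]
```

Note the cost: without an index on `in_reply_to`, each recursion step is a full scan of `comments`, which motivates the index on the next slides.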
  8. id | in_reply_to | author_id | content
      1 | NULL        | 42        | May good fortune smile upon…
      2 | 1           | 43        | Foe real!

     CREATE INDEX ON comments (in_reply_to);
     -- Implicit in PRIMARY KEY:
     CREATE UNIQUE INDEX ON comments (id);
  9. id | in_reply_to | author_id | content
      2 | 1           | 43        | Foe real!

     CREATE INDEX ON comments (in_reply_to);
     SELECT * FROM comments WHERE in_reply_to = 1;
  10. [B-tree diagram: root node 64; children 32 and 96; leaf values 23, 42, 73, 99.
      To find value 42: 42 < 64, then 42 > 32.]
      “To find value 42, look at rows stored in disk blocks 12345 and 23456”
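A much simplified sketch of what that diagram shows (all data hypothetical): the index keeps key values sorted so a lookup is a binary search, and each key points at the disk blocks holding the matching rows.

```python
import bisect

# Hypothetical flattened "B-tree": sorted key values, each mapped to the
# disk blocks that contain the rows with that value.
keys = [23, 32, 42, 64, 73, 96, 99]
blocks = {42: [12345, 23456]}  # value 42 lives in blocks 12345 and 23456

def lookup(value):
    """Binary search over the sorted keys, as a B-tree traversal would do."""
    i = bisect.bisect_left(keys, value)
    if i < len(keys) and keys[i] == value:
        return blocks.get(value, [])
    return []  # no matching rows

print(lookup(42))  # [12345, 23456]
```

A real B-tree is a multi-level structure of fixed-size pages rather than one sorted array, but the lookup logic per node is the same comparison-and-descend shown here.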
  11. Consistency · Staleness · Aborted transactions · Race conditions ·
      Bootstrapping · Cold start after failure · New feature → new access pattern
  12. Consistency · Staleness · Aborted transactions · Race conditions ·
      Bootstrapping · Cold start after failure · New feature → new access pattern
      RDBMS deals with all of these!
  13. [Diagram: Raw data → Caches (derive & maintain with (lots of) code);
      Raw data → DB indexes (derive & maintain with 1 line of SQL)]
  14. -- I already have 200 million rows.
      -- This may take a few hours…
      CREATE INDEX CONCURRENTLY ON comments (in_reply_to);
  15. -- I already have 200 million rows.
      -- This may take a few hours…
      CREATE INDEX CONCURRENTLY ON comments (in_reply_to);
      OMG! No table lock!
  16. enter_sandman → “Say your prayers, little one / don’t forget, my son, to include everyone”

      term    | occurrences
      --------+------------------
      dont    | enter_sandman:5
      everyon | enter_sandman:11
      forget  | enter_sandman:6
      includ  | enter_sandman:10
      littl   | enter_sandman:3
      my      | enter_sandman:7
      on      | enter_sandman:4
      prayer  | enter_sandman:2

      Tokenization / Stemming / Indexing / …
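The tokenize → stem → index pipeline behind that table can be sketched as follows. The stemmer here is a deliberately crude suffix-stripper standing in for a real one (e.g. Porter's), so its stems and positions will not exactly match the slide's indexer:

```python
import re
from collections import defaultdict

def stem(word):
    """Toy stemmer: crude suffix stripping (a real indexer would use
    e.g. a Porter stemmer)."""
    for suffix in ("ers", "le", "e", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def build_index(doc_id, text):
    """Tokenize, stem, and record term -> doc:position postings."""
    index = defaultdict(list)
    tokens = re.findall(r"[a-z']+", text.lower())
    for pos, token in enumerate(tokens, start=1):
        index[stem(token.replace("'", ""))].append(f"{doc_id}:{pos}")
    return index

idx = build_index("enter_sandman",
                  "Say your prayers, little one / "
                  "don't forget, my son, to include everyone")
print(idx["includ"])  # ['enter_sandman:11']
```

The result is an inverted index: a derived, read-optimised dataset keyed by term, exactly like the caches and B-tree indexes on the earlier slides.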
  17. Natural language (tokenization, stemming, synonyms) · Relevance ranking ·
      Search facets · Spelling correction & suggestions
      → Warrants being separate from database
  18. Consistency · Staleness · Aborted transactions · Race conditions ·
      Bootstrapping · Cold start after failure · New feature → new access pattern
  19. Shirshanka Das, Chavdar Botev, Kapil Surlaker, et al.: “All Aboard the Databus!,”
      at ACM Symposium on Cloud Computing, 2012. http://www.socc2012.org/s18-das.pdf
  20. [Timeline diagram: a stream of writes continues while CREATE INDEX CONCURRENTLY…
      runs against a snapshot of the table data]
  21. [Timeline diagram: a stream of writes continues while CREATE INDEX CONCURRENTLY…
      runs against a snapshot of the table data]
  22. Derived data can always be re-generated from raw data ·
      Derived data optimized for reading, raw data optimized for writing (normalized) ·
      Many different datasets derived from one raw dataset ·
      Update latency depends on application (ACID/online < ‘nearline’ < offline)
  23. Consistency · Staleness · Aborted transactions · Race conditions ·
      Bootstrapping · Cold start after failure · New feature → new access pattern
  24. Consistency · Staleness · Aborted transactions · Race conditions ·
      Bootstrapping · Cold start after failure · New feature → new access pattern
      Just build a new one!
  25. Consistency · Staleness · Aborted transactions · Race conditions ·
      Bootstrapping · Cold start after failure · New feature → new access pattern
      Just build a new one! Build a new one regularly!
  26. Consistency · Staleness · Aborted transactions · Race conditions ·
      Bootstrapping · Cold start after failure · New feature → new access pattern
      Just build a new one! Build a new one regularly! Fast stream process
  27. [Architecture diagram: Data to write → All data (append-only) → Batch process →
      Precomputed view (cache) → Reader; Data to write → Message queue →
      Stream process → Real-time diff to view → Reader]
  28. [Architecture diagram as on the previous slide, annotated with examples:
      All data (append-only): e.g. HDFS; Batch process: e.g. Hadoop MapReduce;
      Precomputed view (cache): e.g. Voldemort; Message queue: e.g. Kafka;
      Stream process: e.g. Storm; Real-time diff to view: e.g. Memcached]
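A toy sketch of that batch + stream split (all names hypothetical): the batch layer periodically recomputes the whole view from the append-only log, the stream layer maintains a real-time diff for events that arrived since the last batch run, and the reader merges the two.

```python
# Append-only raw log (stand-in for HDFS).
log = [("alice", 1), ("bob", 2), ("alice", 3)]

def batch_rebuild(events):
    """Batch layer (stand-in for a MapReduce job): recompute the whole view."""
    view = {}
    for user, n in events:
        view[user] = view.get(user, 0) + n
    return view

batch_view = batch_rebuild(log)  # precomputed view (e.g. in Voldemort)
realtime_diff = {}               # stream output (e.g. in Memcached)

def stream_update(user, n):
    """Speed layer (stand-in for a Storm topology): apply events as they arrive."""
    realtime_diff[user] = realtime_diff.get(user, 0) + n

def read(user):
    """Reader merges the batch view with the real-time diff."""
    return batch_view.get(user, 0) + realtime_diff.get(user, 0)

stream_update("alice", 10)  # event newer than the last batch run
print(read("alice"))  # 14 = 4 (batch) + 10 (stream)
```

After each batch rebuild the stream output can simply be discarded, which is the "human fault tolerance" point on the next slide: a bug in the stream code corrupts at most one batch interval of the view.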
  29. No such thing as a cache miss ·
      After every batch rebuild, discard stream output → “human fault tolerance” ·
      Not feasible if O(n²) [e.g. friends in common] or unbounded [e.g. search] ·
      How does stream process know which keys to update (when using joins)?