Upgrade to Pro — share decks privately, control downloads, hide ads and more …

The top five questions to ask about NoSQL. JONA...

The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012

Session presented at Big Data Spain 2012 Conference
16th Nov 2012
ETSI Telecomunicacion UPM Madrid
www.bigdataspain.org
More info: http://www.bigdataspain.org/es-2012/conference/top-five-questions-about-nosql/jonathan-ellis

Big Data Spain

November 16, 2012
Tweet

More Decks by Big Data Spain

Other Decks in Technology

Transcript

  1. ©2012 DataStax Schema is your friend { "id": "e451dd42-ece3-11e1-a0a3-34159e154f4c", "name":

    "jbellis", "state": "TX", "birthdate": "1/1/1976", "email_addresses": ["jbellis@gmail", "[email protected]"], }
  2. ©2012 DataStax SQL can be your friend too CREATE TABLE

    users ( id uuid PRIMARY KEY, name text, state text, birth_date date ); CREATE INDEX ON users(state); SELECT * FROM users WHERE state=‘Texas’ AND birth_date > ‘1950-01-01’;
  3. ©2012 DataStax CREATE TABLE users ( id uuid PRIMARY KEY,

    name text, state text, birth_date date ); CREATE TABLE users_addresses ( user_id uuid REFERENCES users, email text ); SELECT * FROM users NATURAL JOIN users_addresses; Collections
  4. ©2012 DataStax CREATE TABLE users ( id uuid PRIMARY KEY,

    name text, state text, birth_date date ); CREATE TABLE users_addresses ( user_id uuid REFERENCES users, email text ); SELECT * FROM users NATURAL JOIN users_addresses; Collections X
  5. ©2012 DataStax CREATE TABLE users ( id uuid PRIMARY KEY,

    name text, state text, birth_date date, email_addresses set<text> ); UPDATE users SET email_addresses = email_addresses + {‘[email protected]’, ‘[email protected]’}; Collections
  6. ©2012 DataStax Joins don’t scale • No joins • No

    subqueries • No aggregation functions* or GROUP BY • ORDER BY?
  7. ©2012 DataStax SELECT * FROM tweets WHERE user_id IN (SELECT

    follower FROM followers WHERE user_id = ’driftx’) followers ? tweets
  8. ©2012 DataStax CREATE TABLE timeline ( user_id uuid, tweet_id timeuuid,

    tweet_author uuid, tweet_body text, PRIMARY KEY (user_id, tweet_id) ); Clustering in Cassandra user_id tweet_id _author _body jbellis 3290f9da.. rbranson lorem jbellis 3895411a.. tjake ipsum ... ... ... driftx 3290f9da.. rbranson lorem driftx 71b46a84.. yzhang dolor ... ... ... yukim 3290f9da.. rbranson lorem yukim e451dd42.. tjake amet ... ... ...
  9. ©2012 DataStax CREATE TABLE timeline ( user_id uuid, tweet_id timeuuid,

    tweet_author uuid, tweet_body text, PRIMARY KEY (user_id, tweet_id) ); Clustering in Cassandra user_id tweet_id _author _body jbellis 3290f9da.. rbranson lorem jbellis 3895411a.. tjake ipsum ... ... ... driftx 3290f9da.. rbranson lorem driftx 71b46a84.. yzhang dolor ... ... ... yukim 3290f9da.. rbranson lorem yukim e451dd42.. tjake amet ... ... ... SELECT * FROM timeline WHERE user_id = ’driftx’;
  10. ©2012 DataStax Memory Hard drive k1 c1:v1 k1 c1:v1 k1

    c2:v2 c2:v2 write( , ) k2 c1:v1 c2:v2 k2 c1:v1 c2:v2 k2 c1:v1 c2:v2
  11. ©2012 DataStax Memory Hard drive k1 c1:v1 k1 c1:v4 k1

    c2:v2 c2:v2 write( , ) k1 c1:v4 c3:v3 k2 c1:v1 c2:v2 k2 c1:v1 c2:v2 k1 c1:v4 c3:v3 c3:v3
  12. ©2012 DataStax Memory Hard drive SSTable flush k1 c1:v4 c2:v2

    k2 c1:v1 c2:v2 c3:v3 index / BF cleanup
  13. ©2012 DataStax Availability • “High availability implies that a single

    fault will not bring down your system. Not ‘we’ll recover quickly.’” -- Ben Coverston: DataStax • “The biggest problem with failover is that you're almost never using it until it really hurts. It's like backups that you never test.” -- Rick Branson: Instagram
  14. ©2012 DataStax Scaling antipatterns • Metadata servers • Router bottlenecks

    • Overloading existing nodes when adding capacity
  15. ©2012 DataStax Data model: Realtime Portfolios StockHist stock last GOOG

    $95.52 AAPL $186.10 AMZN $112.98 LiveStocks stock date price GOOG 2011-01-01 $8.23 GOOG 2011-01-02 $6.14 GOOG 2011-001-03 $7.78 user stock shares jbellis GOOG 80 jbellis LNKD 20 yukim AMZN 100
  16. ©2012 DataStax Data model: Analytics worst_date loss 2011-07-23 -$34.81 2011-03-11

    -$11432.24 2011-05-21 -$1476.93 Portfolio1 HistLoss Portfolio2 Portfolio3
  17. ©2012 DataStax Data model: Analytics stock rdate return GOOG 2011-07-25

    $8.23 GOOG 2011-07-24 $6.14 GOOG 2011-07-23 $7.78 AAPL 2011-07-25 $15.32 AAPL 2011-07-24 $12.68 10dayreturns INSERT OVERWRITE TABLE 10dayreturns SELECT a.stock, b.date as rdate, b.price - a.price FROM StockHist a JOIN StockHist b ON (a.stock = b.stock AND date_add(a.date, 10) = b.date);
  18. ©2012 DataStax Data model: Analytics portfolio rdate preturn Portfolio1 2011-07-25

    $118.21 Portfolio1 2011-07-24 $60.78 Portfolio1 2011-07-23 -$34.81 Portfolio2 2011-07-25 $2143.92 Portfolio3 2011-07-24 -$10.19 portfolio_returns INSERT OVERWRITE TABLE portfolio_returns SELECT portfolio, rdate, SUM(b.return) FROM portfolios a JOIN 10dayreturns b ON (a.stock = b.stock) GROUP BY portfolio, rdate;
  19. ©2012 DataStax Data model: Analytics INSERT OVERWRITE TABLE HistLoss SELECT

    a.portfolio, rdate, minp FROM ( SELECT portfolio, min(preturn) as minp FROM portfolio_returns GROUP BY portfolio ) a JOIN portfolio_returns b ON (a.portfolio = b.portfolio and a.minp = b.preturn); worst_date loss 2011-07-23 -$34.81 2011-03-11 -$11432.24 2011-05-21 -$1476.93 Portfolio1 HistLoss Portfolio2 Portfolio3
  20. Questions? • http://www.flickr.com/photos/26817893@N05/2573006312/ • http://www.flickr.com/photos/rowanbank/7686239548 • http://www.flickr.com/photos/mervtheswerve/6081933265 • http://www.flickr.com/photos/dg_pics/2526208830 •

    http://www.flickr.com/photos/wainwright/351684037 • http://www.flickr.com/photos/mikeneilson/1606662529 • http://www.flickr.com/photos/sbisson/3852905534 • http://www.flickr.com/photos/breadnbadger/2674928517 Image credits