Session presented at Big Data Spain 2012 Conference 16th Nov 2012 ETSI Telecomunicacion UPM Madrid www.bigdataspain.org More info: http://www.bigdataspain.org/es-2012/conference/top-five-questions-about-nosql/jonathan-ellis
Five questionsfor your NoSQL solution!Jonathan EllisCTO, DataStaxProject Chair, Apache Cassandra
View Slide
©2012 DataStaxhow do Imodelmy application?
©2012 DataStaxPopular options• Key/value• Tabular• Document• Graph?
©2012 DataStaxSchema is your friend{"id": "e451dd42-ece3-11e1-a0a3-34159e154f4c","name": "jbellis","state": "TX","birthdate": "1/1/1976","email_addresses": ["jbellis@gmail", "[email protected]"],}
©2012 DataStaxSQL can be your friend tooCREATE TABLE users (id uuid PRIMARY KEY,name text,state text,birth_date date);CREATE INDEX ON users(state);SELECT * FROM usersWHERE state=‘Texas’ AND birth_date > ‘1950-01-01’;
©2012 DataStaxCREATE TABLE users (id uuid PRIMARY KEY,name text,state text,birth_date date);CREATE TABLE users_addresses (user_id uuid REFERENCES users,email text);SELECT *FROM users NATURAL JOIN users_addresses;Collections
©2012 DataStaxCREATE TABLE users (id uuid PRIMARY KEY,name text,state text,birth_date date);CREATE TABLE users_addresses (user_id uuid REFERENCES users,email text);SELECT *FROM users NATURAL JOIN users_addresses;CollectionsX
©2012 DataStaxCREATE TABLE users (id uuid PRIMARY KEY,name text,state text,birth_date date,email_addresses set);UPDATE usersSET email_addresses = email_addresses + {‘[email protected]’,‘[email protected]’};Collections
©2012 DataStaxJoins don’t scale• No joins• No subqueries• No aggregation functions* or GROUP BY• ORDER BY?
©2012 DataStaxSELECT * FROM tweetsWHERE user_id IN (SELECT follower FROM followersWHERE user_id = ’driftx’)followers?tweets
©2012 DataStaxCREATE TABLE timeline (user_id uuid,tweet_id timeuuid,tweet_author uuid,tweet_body text,PRIMARY KEY (user_id,tweet_id));Clustering in Cassandrauser_id tweet_id _author _bodyjbellis 3290f9da.. rbranson loremjbellis 3895411a.. tjake ipsum... ... ...driftx 3290f9da.. rbranson loremdriftx 71b46a84.. yzhang dolor... ... ...yukim 3290f9da.. rbranson loremyukim e451dd42.. tjake amet... ... ...
©2012 DataStaxCREATE TABLE timeline (user_id uuid,tweet_id timeuuid,tweet_author uuid,tweet_body text,PRIMARY KEY (user_id,tweet_id));Clustering in Cassandrauser_id tweet_id _author _bodyjbellis 3290f9da.. rbranson loremjbellis 3895411a.. tjake ipsum... ... ...driftx 3290f9da.. rbranson loremdriftx 71b46a84.. yzhang dolor... ... ...yukim 3290f9da.. rbranson loremyukim e451dd42.. tjake amet... ... ...SELECT * FROM timelineWHERE user_id = ’driftx’;
©2012 DataStaxhow does itperform?
©2012 DataStaxVLDB benchmark
©2012 DataStaxLocking
©2012 DataStaxEfficiency
©2012 DataStaxUPDATE usersSET email_addresses = email_addresses + {...}WHERE user_id = ‘jbellis’;
©2012 DataStaxDurability
©2012 DataStaxLog-structured storage engineMemoryHard driveMemtablewrite( , )k1 c1:v1Commit log
©2012 DataStaxMemoryHard driveMemtablewrite( , )k1 c1:v1Commit logk1 c1:v1k1 c1:v1
©2012 DataStaxMemoryHard drivewrite( , )k1 c2:v2k1 c1:v1k1 c1:v1k1 c2:v2c2:v2
©2012 DataStaxMemoryHard drivek1 c1:v1k1 c1:v1k1 c2:v2c2:v2write( , )k2 c1:v1 c2:v2k2 c1:v1 c2:v2k2 c1:v1 c2:v2
©2012 DataStaxMemoryHard drivek1 c1:v1k1 c1:v4k1 c2:v2c2:v2write( , )k1 c1:v4 c3:v3k2 c1:v1 c2:v2k2 c1:v1 c2:v2k1 c1:v4 c3:v3c3:v3
©2012 DataStaxMemoryHard driveSSTableflushk1 c1:v4 c2:v2k2 c1:v1 c2:v2c3:v3index / BFcleanup
©2012 DataStaxNo random writes
©2012 DataStaxThe gory details
©2012 DataStaxLarger than memory datasets
©2012 DataStaxhow does it handlefailure?
©2012 DataStaxClassic partitioning with SPOFpartition 1 partition 2 partition 3 partition 4routerclient
©2012 DataStaxAvailability• “High availability implies that a single fault willnot bring down your system. Not ‘we’ll recoverquickly.’”-- Ben Coverston: DataStax• “The biggest problem with failover is that you'realmost never using it until it really hurts. It's likebackups that you never test.”-- Rick Branson: Instagram
©2012 DataStaxFully distributed, no SPOFclientp1p1p1p3p6
©2012 DataStaxMultiple datacenters
©2012 DataStax
©2012 DataStaxSelf-healingClientrequestCoordinatorReplicainternalrequestinternalresponseresponse1234
©2012 DataStaxSelf-healingClientrequestCoordinatorReplicainternalrequest12replica failstimeoutresponse4
©2012 DataStaxSelf-healingClientrequestCoordinatorReplicainternalrequest12Xreplica failstimeoutresponse4
©2012 DataStaxSelf-healingClientrequestCoordinatorReplicainternalrequest124replica failstimeoutresponsehint3
©2012 DataStaxSelf-healingClientrequestCoordinatorReplicainternalrequest124Xreplica failstimeoutresponsehint3
©2012 DataStaxOther healing modes• AntiEntropyService• Read repair
©2012 DataStaxDynamic snitch(dealing with partial failure)Client Coordinator40% busy90% busy30% busy
©2012 DataStaxhow does itscale?
©2012 DataStaxScaling antipatterns• Metadata servers• Router bottlenecks• Overloading existing nodes when addingcapacity
©2012 DataStaxhowflexibleis it?
©2012 DataStaxData model: RealtimePortfoliosStockHiststock lastGOOG $95.52AAPL $186.10AMZN $112.98LiveStocksstock date priceGOOG 2011-01-01 $8.23GOOG 2011-01-02 $6.14GOOG 2011-001-03 $7.78user stock sharesjbellis GOOG 80jbellis LNKD 20yukim AMZN 100
©2012 DataStaxData model: Analyticsworst_date loss2011-07-23 -$34.812011-03-11 -$11432.242011-05-21 -$1476.93Portfolio1HistLossPortfolio2Portfolio3
©2012 DataStaxData model: Analyticsstock rdate returnGOOG 2011-07-25 $8.23GOOG 2011-07-24 $6.14GOOG 2011-07-23 $7.78AAPL 2011-07-25 $15.32AAPL 2011-07-24 $12.6810dayreturnsINSERT OVERWRITE TABLE 10dayreturnsSELECT a.stock,b.date as rdate,b.price - a.priceFROM StockHist aJOIN StockHist bON (a.stock = b.stockAND date_add(a.date, 10) = b.date);
©2012 DataStaxData model: Analyticsportfolio rdate preturnPortfolio1 2011-07-25 $118.21Portfolio1 2011-07-24 $60.78Portfolio1 2011-07-23 -$34.81Portfolio2 2011-07-25 $2143.92Portfolio3 2011-07-24 -$10.19portfolio_returnsINSERT OVERWRITE TABLE portfolio_returnsSELECT portfolio,rdate,SUM(b.return)FROM portfolios a JOIN 10dayreturns bON (a.stock = b.stock)GROUP BY portfolio, rdate;
©2012 DataStaxData model: AnalyticsINSERT OVERWRITE TABLE HistLossSELECT a.portfolio, rdate, minpFROM (SELECT portfolio, min(preturn) as minpFROM portfolio_returnsGROUP BY portfolio) aJOIN portfolio_returns bON (a.portfolio = b.portfolio and a.minp = b.preturn);worst_date loss2011-07-23 -$34.812011-03-11 -$11432.242011-05-21 -$1476.93Portfolio1HistLossPortfolio2Portfolio3
©2012 DataStaxSome Cassandra users
Questions?• http://www.flickr.com/photos/26817893@N05/2573006312/• http://www.flickr.com/photos/rowanbank/7686239548• http://www.flickr.com/photos/mervtheswerve/6081933265• http://www.flickr.com/photos/dg_pics/2526208830• http://www.flickr.com/photos/wainwright/351684037• http://www.flickr.com/photos/mikeneilson/1606662529• http://www.flickr.com/photos/sbisson/3852905534• http://www.flickr.com/photos/breadnbadger/2674928517Image credits