NoSQL use cases, survey of database options, Couchbase architecture. Also how to develop with JSON document databases and how to build Couchbase map reduce indexes.
with workload
• Easy node provisioning
• All nodes are the same
• Multi-master Cross-Datacenter Replication
• For a fast and reliable user experience worldwide
• Effective Auto-sharding
• Should avoid cluster hot spots
Saturday, October 6, 12
Just add more commodity web servers
Database Scales Up: get a bigger, more complex server
Expensive & disruptive sharding, doesn’t perform at web scale
MySQL machines as you need
• Data sharded evenly across the machines using client code
• Memcached used to provide faster response time for users and reduce load on the database
[Diagram: App Servers (www.example.com), Memcached Tier, MySQL Tier]
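The client-side sharding and caching described on this slide can be sketched roughly as follows. This is a minimal illustration, not code from the talk: the `ShardedStore` class is hypothetical, plain dictionaries stand in for the MySQL machines and the Memcached tier, and the MD5-modulo shard choice is just one simple assumption about how "client code" might spread keys.

```python
import hashlib


class ShardedStore:
    """Toy model of a MySQL tier sharded by client code, fronted by a cache."""

    def __init__(self, shard_count):
        # Each dict stands in for one MySQL machine (hypothetical).
        self.shards = [dict() for _ in range(shard_count)]
        # A single dict stands in for the Memcached tier.
        self.cache = {}
        # Counts how often a read had to go to the database tier.
        self.db_reads = 0

    def shard_for(self, key):
        # Use a stable hash so every app server picks the same shard for a key.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def write(self, key, value):
        self.shards[self.shard_for(key)][key] = value
        # The application itself must keep cache and database in sync.
        self.cache.pop(key, None)

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        value = self.shards[self.shard_for(key)].get(key)
        if value is not None:
            self.cache[key] = value
        return value
```

Note that both the shard choice and the cache invalidation live in application code, and that changing `shard_count` in this sketch would remap most keys to a different shard.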
need to start using MySQL more simply
• Scale by hand
• Replication / Sharding is a black art
• Code overhead to manage keeping Memcached and MySQL in sync
• Lots of components to deploy
Learn From Others - This Scenario Costs Time and Money. Scaling SQL is potentially disastrous when going viral: it is a very risky time for major code changes and migrations, and you have no time to spare when usage is skyrocketing.
[Diagram: opaque binary blobs contrasted with “Data Structures”: Blob, List, Set, Hash, …]
redis
In-memory only
Vast set of operations
Blob Storage: Set, Add, Replace, CAS
Retrieval: Get, Pub-Sub
Structured Data: Strings, Hashes, Lists, Sets, Sorted sets
Example operations for a Set: Add, count, subtract sets, intersection, is member?, atomic move from one set to another
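The example set operations listed on this slide can be illustrated with Python's built-in `set` type. This is an analogy only and does not use a Redis client; `move_member` is a hypothetical helper, and unlike the server-side operation the slide mentions, it is not atomic across processes.

```python
def move_member(source, destination, member):
    """Move member from one set to another; return True if it was in source."""
    if member not in source:
        return False
    source.remove(member)
    destination.add(member)
    return True


online = {"ann", "bob"}
premium = {"bob", "cy"}

online.add("dee")                 # add
count = len(online)               # count
only_online = online - premium    # subtract sets
both = online & premium           # intersection
is_member = "ann" in online       # is member?
moved = move_member(online, premium, "ann")  # move from one set to another
```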
caching required for low-latency reads
“Columns” are overlaid on the data
Not all rows must have all columns
Supports efficient queries on columns
Restart required when adding columns
Good cross-datacenter support
Cassandra
[Diagram: a row with Column 1, Column 2, and Column 3 (not present)]
evenly across servers in the cluster
§ Each server stores both active & replica docs
§ Only one server active at a time
§ Client library provides app with simple interface to database
§ Cluster map provides map to which server doc is on
§ App never needs to know
§ App reads, writes, updates docs
§ Multiple App Servers can access same document at same time
[Diagram: APP SERVER 1 and APP SERVER 2 each use the COUCHBASE CLIENT LIBRARY and a CLUSTER MAP to reach the COUCHBASE SERVER CLUSTER, where SERVER 1, SERVER 2, and SERVER 3 each hold Active Docs (Doc 1 through Doc 9 distributed across the servers)]
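The role of the cluster map can be sketched as a two-step lookup: hash the document key to a partition, then look up which server owns that partition. The sketch below is a simplification under stated assumptions: the partition count, the plain CRC32-modulo hash, and the round-robin assignment are illustrative choices, and none of the function names come from the Couchbase client library.

```python
import zlib

NUM_PARTITIONS = 1024  # assumed partition count for this sketch


def partition_for(key):
    """Hash a document key to a partition number (simplified)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def build_cluster_map(servers):
    """Assign partitions to servers round-robin; index = partition number."""
    return [servers[p % len(servers)] for p in range(NUM_PARTITIONS)]


def server_for(key, cluster_map):
    """Resolve which server holds the active copy of a document."""
    return cluster_map[partition_for(key)]
```

Because the application only ever calls something like `server_for`, the map can be rebuilt when servers are added or removed without changing application code, which is the point of "App never needs to know".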
services experienced outages:
• FourSquare, Reddit, Quora, among others
• With memory buffered writes, a scalable data layer keeps working
• When EBS came back online, Couchbase wrote all the updated data to disk without missing a beat.
War Story: EBS Outage
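The idea of memory-buffered writes surviving a disk outage can be sketched as a write-behind queue. This is a toy model, not Couchbase's actual persistence mechanism: `BufferedStore` and its `disk_available` flag are hypothetical, and dictionaries stand in for memory and disk.

```python
from collections import deque


class BufferedStore:
    """Toy write-behind store: writes land in memory first, then drain to disk."""

    def __init__(self):
        self.memory = {}
        self.disk = {}
        self.pending = deque()      # keys waiting to be persisted
        self.disk_available = True  # flip to False to simulate a disk outage

    def write(self, key, value):
        # The write is acknowledged once it is in memory.
        self.memory[key] = value
        self.pending.append(key)
        self.flush()

    def read(self, key):
        # Reads are served from memory, so they work during an outage.
        return self.memory.get(key)

    def flush(self):
        # Drain the queue only while the disk is reachable.
        while self.disk_available and self.pending:
            key = self.pending.popleft()
            self.disk[key] = self.memory[key]
```

In this sketch, reads and writes keep succeeding while `disk_available` is false, and a later `flush()` persists the latest value of every queued key.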
locations for disaster recovery
§ Independently managed clusters serving local data
[Diagram: US DATA CENTER, EUROPE DATA CENTER, and ASIA DATA CENTER linked by Replication]
the distribution model is very valuable. It allows the database to use its knowledge of how the application programmer clusters the data to help performance across the cluster.
http://martinfowler.com/bliki/AggregateOrientedDatabase.html
o::1001
{
  uid: ji22jd,
  customer: Ann,
  line_items: [
    { sku: 0321293533, quan: 3, unit_price: 48.0 },
    { sku: 0321601912, quan: 1, unit_price: 39.0 },
    { sku: 0131495054, quan: 1, unit_price: 51.0 }
  ],
  payment: { type: Amex, expiry: 04/2001, last5: 12345 }
}
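To make the aggregate idea concrete, the order above can be stored and read back as one JSON document under a single key. This is a minimal sketch: the `bucket` dictionary is a stand-in for a document store, `order_total` is a hypothetical helper, and the quoting of values as strings is an assumption, since the slide shows them unquoted.

```python
import json

# The whole order, including line items and payment, is one aggregate.
order = {
    "uid": "ji22jd",
    "customer": "Ann",
    "line_items": [
        {"sku": "0321293533", "quan": 3, "unit_price": 48.0},
        {"sku": "0321601912", "quan": 1, "unit_price": 39.0},
        {"sku": "0131495054", "quan": 1, "unit_price": 51.0},
    ],
    "payment": {"type": "Amex", "expiry": "04/2001", "last5": "12345"},
}

bucket = {}  # stand-in for a key-value document store
bucket["o::1001"] = json.dumps(order)


def order_total(store, key):
    """Fetch one document by key and total its line items; no joins needed."""
    doc = json.loads(store[key])
    return sum(item["quan"] * item["unit_price"] for item in doc["line_items"])
```

Everything needed to compute the total arrives in a single key lookup, which is why the aggregate is a natural unit for distributing data across a cluster.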
Belgium Brewing",
"name": "1554 Enlightened Black Ale",
"abv": 5.5,
"description": "Born of a flood...",
"category": "Belgian and French Ale",
"style": "Other Belgian-Style Ales",
"updated": "2010-07-22 20:00:20"
}
{
"id" : "beer_Enlightened_Black_Ale",
...
{
Annotations: Document: user data, can be anything. Unique ID. Metadata: identifier, expiration, etc. “vintage” date format from an SQL dump >_<
Daily, hourly, minute or second rollup all possible with the same index.
• http://crate.im/posts/couchbase-views-reddit-data/
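One way a single index can serve every rollup granularity is to emit a date-array key and group on a prefix of it. The sketch below imitates that idea in Python under stated assumptions: real Couchbase views are written in JavaScript, `map_doc` and `rollup` are hypothetical names, and the `updated` field format is borrowed from the beer document shown earlier.

```python
from collections import Counter
from datetime import datetime


def map_doc(doc):
    """Emit a [year, month, day, hour, minute, second] key with a count of 1."""
    ts = datetime.strptime(doc["updated"], "%Y-%m-%d %H:%M:%S")
    return [ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second], 1


def rollup(docs, group_level):
    """Sum the emitted values, grouping on the first group_level key parts."""
    counts = Counter()
    for doc in docs:
        key, value = map_doc(doc)
        counts[tuple(key[:group_level])] += value
    return dict(counts)
```

With this layout, a `group_level` of 3 gives a daily rollup, 4 hourly, 5 per minute, and 6 per second, all from the same emitted keys.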
ad-hoc queries and faceted browsing
• Our adapter is aware of changing Couchbase topology
• Indexed by Elastic Search after being stored to disk in Couchbase