Covers NoSQL use cases, a survey of database options, and Couchbase architecture, as well as how to develop with JSON document databases and how to build Couchbase map-reduce indexes.

Growth is the New Reality
• 2.2 billion internet users
• 50% of Americans use smartphones
• Your app can grow overnight
• Are you ready?
Saturday, October 6, 12

Draw Something - Social Game
35 million monthly active users in 1 month: about 5 Instagrams (Instagram today is waaaay more than 1 Instagram)

Scalable Data Layer
● On-demand cluster sizing
  ● Grow or shrink with workload
● Easy node provisioning
  ● All nodes are the same
● Multi-master Cross-Datacenter Replication
  ● For a fast and reliable user experience worldwide
● Effective Auto-sharding
  ● Should avoid cluster hot spots

Old School Hits a Scale Wall
Application Scales Out: just add more commodity web servers
Database Scales Up: get a bigger, more complex server
Expensive & disruptive sharding, doesn’t perform at web scale

Traditional MySQL + Memcached Architecture
● Run as many MySQL machines as you need
● Data sharded evenly across the machines using client code
● Memcached used to provide faster response time for users and reduce load on the database
[Diagram: www.example.com, App Servers, Memcached Tier, MySQL Tier]
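
The two bullets about client-side sharding and the Memcached tier can be sketched as follows. This is a minimal illustration, not a production design: plain dicts stand in for the MySQL shards and the Memcached tier, and the names `MYSQL_SHARDS`, `CACHE`, `shard_for`, `read` and `write` are hypothetical.

```python
import zlib

# Hypothetical stand-ins: each dict plays the role of one MySQL shard,
# and a single dict plays the role of the Memcached tier.
MYSQL_SHARDS = [{}, {}, {}]
CACHE = {}


def shard_for(key: str) -> dict:
    """Pick a shard in client code by hashing the key."""
    index = zlib.crc32(key.encode("utf-8")) % len(MYSQL_SHARDS)
    return MYSQL_SHARDS[index]


def write(key: str, value: str) -> None:
    """Write to the database shard, then invalidate the cached copy."""
    shard_for(key)[key] = value
    CACHE.pop(key, None)


def read(key: str):
    """Cache-aside read: try the cache first, fall back to the database."""
    if key in CACHE:
        return CACHE[key]
    value = shard_for(key).get(key)
    if value is not None:
        CACHE[key] = value  # populate the cache for the next reader
    return value
```

Every write path has to remember the invalidation step, which is the kind of application-level bookkeeping this architecture pushes into client code.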

Limitations of MySQL + Memcached
● To scale you need to start using MySQL more simply
● Scale by hand
● Replication / Sharding is a black art
● Code overhead to manage keeping Memcached and MySQL in sync
● Lots of components to deploy
Learn From Others - This Scenario Costs Time and Money.
Scaling SQL is potentially disastrous when going viral: very risky time for major code changes and migrations... you have no time when skyrocketing up.

The Key-Value Store – the foundation of NoSQL
[Diagram: a Key mapped to an Opaque Binary Value]

Redis – More “Structured Data” commands
[Diagram: a Key mapped to “Data Structures”: Blob, List, Set, Hash, …]
In-memory only
Vast set of operations
  Blob Storage: Set, Add, Replace, CAS
  Retrieval: Get, Pub-Sub
  Structured Data: Strings, Hashes, Lists, Sets, Sorted lists
Example operations for a Set
  Add, count, subtract sets, intersection, is member?, atomic move from one set to another
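
The example Set operations listed above can be modeled with Python's built-in `set`. This is only a local model of the semantics, not a Redis client session; the comments name the Redis commands that correspond to each step (SADD, SCARD, SDIFF, SINTER, SISMEMBER, SMOVE), and the `smove` helper and the sample data are made up for illustration.

```python
def smove(source: set, destination: set, member) -> bool:
    """Move member from source to destination; return False if it is absent."""
    if member not in source:
        return False
    source.remove(member)
    destination.add(member)
    return True


online = set()
online.add("ann")                      # Add (SADD)
online.add("bob")
banned = {"bob", "eve"}

count = len(online)                    # count (SCARD)
allowed = online - banned              # subtract sets (SDIFF)
both = online & banned                 # intersection (SINTER)
is_member = "ann" in online            # is member? (SISMEMBER)
moved = smove(online, banned, "ann")   # move between sets (SMOVE)
```

In Redis the move happens atomically on the server; the Python helper above only mirrors the result and makes no concurrency guarantee.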

Cassandra – Column overlays
Disk-based system
Clustered
External caching required for low-latency reads
“Columns” are overlaid on the data
Not all rows must have all columns
Supports efficient queries on columns
Restart required when adding columns
Good cross-datacenter support
[Diagram: Cassandra rows with Column 1, Column 2, and Column 3 (not present)]

Neo4j – Graph database
Disk-based system
External caching required for low-latency reads
Nodes, relationships and paths
Properties on nodes
Delete, Insert, Traverse, etc.

(Really) High Performance
Latency less than 1/2 ms
Throughput grows linearly with cluster size
5 nodes: 1.75M operations per second
Cisco and Solarflare benchmark of Couchbase Server

Couchbase Server Basic Operation
§ Docs distributed evenly across servers in the cluster
§ Each server stores both active & replica docs
  § Only one server active at a time
§ Client library provides app with simple interface to database
§ Cluster map provides map to which server doc is on
  § App never needs to know
§ App reads, writes, updates docs
§ Multiple App Servers can access same document at same time
[Diagram: App Servers 1 and 2, each with a Couchbase client library and cluster map, accessing the active docs (Doc 1 to Doc 9) spread across Servers 1 to 3 of the Couchbase Server cluster]
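
The cluster-map idea above can be sketched as a two-step lookup: hash the document ID to a partition, then look the partition up in a map held by the client library. The hash function, the partition count, the server names and the function names here are assumptions for illustration; they are not taken from the slides and are not the actual Couchbase client code.

```python
import zlib

NUM_PARTITIONS = 8  # assumed small value, for illustration only
SERVERS = ["server1", "server2", "server3"]

# Hypothetical cluster map: partition index -> server holding the active copy.
CLUSTER_MAP = [SERVERS[p % len(SERVERS)] for p in range(NUM_PARTITIONS)]


def partition_for(doc_id: str) -> int:
    """Hash the document ID to a partition; this never depends on topology."""
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_PARTITIONS


def server_for(doc_id: str, cluster_map=CLUSTER_MAP) -> str:
    """Resolve the server for a document through the cluster map."""
    return cluster_map[partition_for(doc_id)]


# When the topology changes, only the map is replaced. The application keeps
# calling server_for() and never needs to know where a document lives.
shrunk_map = ["server1"] * NUM_PARTITIONS
```

Because the ID-to-partition step is fixed, a topology change only means shipping a new map to the clients.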

War Story: EBS Outage
● Suddenly, disk writes all began to time out
● Many services experienced outages:
  ● FourSquare, Reddit, Quora, among others
● With memory buffered writes, a scalable data layer keeps working
  ● When EBS came back online, Couchbase wrote all the updated data to disk without missing a beat.
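
The memory-buffered write behavior described in this war story can be sketched with a write-behind queue. This is a toy model under stated assumptions, not Couchbase's implementation: the `BufferedStore` class, its fields and the `disk_available` flag are all hypothetical.

```python
from collections import deque


class BufferedStore:
    """Toy write-behind store: acknowledge in memory, persist when possible."""

    def __init__(self):
        self.memory = {}            # serves reads and accepts writes
        self.pending = deque()      # keys waiting to be persisted
        self.disk = {}              # stand-in for durable storage
        self.disk_available = True  # flipped to False to simulate an outage

    def set(self, key, value):
        self.memory[key] = value
        self.pending.append(key)
        self.flush()

    def get(self, key):
        return self.memory.get(key)

    def flush(self):
        # Drain the queue only while the disk is reachable.
        while self.disk_available and self.pending:
            key = self.pending.popleft()
            self.disk[key] = self.memory[key]
```

During a simulated outage, `set` and `get` keep working from memory; calling `flush` after `disk_available` is restored writes the latest value of each queued key to disk.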

Cross Data Center Replication
§ Data close to users
§ Multiple locations for disaster recovery
§ Independently managed clusters serving local data
[Diagram: US, Europe and Asia data centers linked by replication]

Document Database
This synergy between the programming model and the distribution model is very valuable. It allows the database to use its knowledge of how the application programmer clusters the data to help performance across the cluster.
http://martinfowler.com/bliki/AggregateOrientedDatabase.html

o::1001
{
  uid: ji22jd,
  customer: Ann,
  line_items: [
    { sku: 0321293533, quan: 3, unit_price: 48.0 },
    { sku: 0321601912, quan: 1, unit_price: 39.0 },
    { sku: 0131495054, quan: 1, unit_price: 51.0 }
  ],
  payment: { type: Amex, expiry: 04/2001, last5: 12345 }
}
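
To show why keeping an order as one aggregate is convenient, the document above can be written as a Python dict and processed with a single key lookup. The `store` dict and the `order_total` helper are hypothetical names for this sketch; the field values are copied from the slide, with strings quoted so the literal is valid Python.

```python
# A dict stands in for the document database: key -> whole order document.
store = {
    "o::1001": {
        "uid": "ji22jd",
        "customer": "Ann",
        "line_items": [
            {"sku": "0321293533", "quan": 3, "unit_price": 48.0},
            {"sku": "0321601912", "quan": 1, "unit_price": 39.0},
            {"sku": "0131495054", "quan": 1, "unit_price": 51.0},
        ],
        "payment": {"type": "Amex", "expiry": "04/2001", "last5": "12345"},
    }
}


def order_total(doc: dict) -> float:
    """Sum quantity * unit price over the line items embedded in the order."""
    return sum(item["quan"] * item["unit_price"] for item in doc["line_items"])


order = store["o::1001"]    # one lookup returns the whole aggregate
total = order_total(order)  # 3 * 48.0 + 39.0 + 51.0 = 234.0
```

The line items and payment details travel with the order, so no joins are needed to compute the total.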

Meta + Document Body
Metadata (identifier, expiration, etc.), where "id" is the unique ID:
{
  "id": "beer_Enlightened_Black_Ale",
  ...
}
Document (user data, can be anything):
{
  "brewery": "New Belgium Brewing",
  "name": "1554 Enlightened Black Ale",
  "abv": 5.5,
  "description": "Born of a flood...",
  "category": "Belgian and French Ale",
  "style": "Other Belgian-Style Ales",
  "updated": "2010-07-22 20:00:20"
}
The "updated" field uses a “vintage” date format from an SQL dump >_<

Emergent Schema
JSON.org, Github API, Twitter API
"Capture the user's intent"
• The database can handle it
• Your app controls the schema

group_level=3 - daily results - great for graphing
• Daily, hourly, minute or second rollup all possible with the same index.
• http://crate.im/posts/couchbase-views-reddit-data/
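
The rollup idea can be modeled in plain Python: a map step emits a date-array key per document, and the query groups on the first `group_level` elements of that key while counting. This is a local simulation of the behavior, not Couchbase view code; the `created` field, the sample documents and the function names are assumptions.

```python
from collections import Counter
from datetime import datetime


def map_doc(doc):
    """Map step: emit [year, month, day, hour, minute, second] -> 1."""
    t = datetime.strptime(doc["created"], "%Y-%m-%d %H:%M:%S")
    yield [t.year, t.month, t.day, t.hour, t.minute, t.second], 1


def query(docs, group_level):
    """Reduce step: count rows, grouped on the first group_level key parts."""
    counts = Counter()
    for doc in docs:
        for key, value in map_doc(doc):
            counts[tuple(key[:group_level])] += value
    return dict(sorted(counts.items()))


docs = [
    {"created": "2012-10-06 09:15:00"},
    {"created": "2012-10-06 09:45:30"},
    {"created": "2012-10-07 18:00:00"},
]

daily = query(docs, group_level=3)   # keys are (year, month, day)
hourly = query(docs, group_level=4)  # keys are (year, month, day, hour)
```

The same emitted keys serve every granularity; only the `group_level` passed at query time changes.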

GeoCouch R-Tree Index
• Optimized for bulk loading of large data sets
• Simplified query model (bounding box, nearest neighbor)

Elastic Search Adapter
• Elastic Search is good for ad-hoc queries and faceted browsing
• Our adapter is aware of changing Couchbase topology
• Indexed by Elastic Search after stored to disk in Couchbase