Berlin Buzzwords Hybrid Database

Hybrid Datastore
Chris Harris
Email: [email protected]
Twitter: cj_harris5
Monday, 4 June 12

Traditional Architecture

Traditional Architecture
[Diagram: Web Server (HTML) → Application Server (Controllers, Services) → SQL → Database]

Challenge #1 - HTML 5

Multiple Client Types
[Diagram: the Web Server serves HTML, but what about non-HTML clients? Behind it, the Application Server (Controllers, Services) uses SQL against the Database]

HTML 5
[Diagram: the Controllers now run in the client, which exchanges JSON with the Web Server and the Application Server's Services; the Services use SQL against the Database]

The Move to Services
• The controllers have moved to the client
• Expose small JSON services
+ Death to the monolithic deployment

Service Architecture
[Diagram: clients exchange JSON with the Web Server; Service #1, Service #2, ... each run on their own Application Server and use SQL]

The Service becomes the API
• The JSON service becomes an API for clients and other applications
• Clients are not bound to the underlying data store
- Complexity due to the mismatch between JSON and SQL

Migrating to MongoDB
[Diagram: Service #1 still uses SQL, while Service #2 now stores JSON in MongoDB; clients exchange JSON with the Web Server]

Here is a “simple” SQL Model

mysql> select * from book;
+----+----------------------------------------------------------+
| id | title                                                    |
+----+----------------------------------------------------------+
|  1 | The Demon-Haunted World: Science as a Candle in the Dark |
|  2 | Cosmos                                                   |
|  3 | Programming in Scala                                     |
+----+----------------------------------------------------------+
3 rows in set (0.00 sec)

mysql> select * from bookauthor;
+---------+-----------+
| book_id | author_id |
+---------+-----------+
|       1 |         1 |
|       2 |         1 |
|       3 |         2 |
|       3 |         3 |
|       3 |         4 |
+---------+-----------+
5 rows in set (0.00 sec)

mysql> select * from author;
+----+-----------+------------+-------------+-------------+---------------+
| id | last_name | first_name | middle_name | nationality | year_of_birth |
+----+-----------+------------+-------------+-------------+---------------+
|  1 | Sagan     | Carl       | Edward      | NULL        |          1934 |
|  2 | Odersky   | Martin     | NULL        | DE          |          1958 |
|  3 | Spoon     | Lex        | NULL        | NULL        |          NULL |
|  4 | Venners   | Bill       | NULL        | NULL        |          NULL |
+----+-----------+------------+-------------+-------------+---------------+
4 rows in set (0.00 sec)

The Same Data in MongoDB

{
    "_id" : ObjectId("4dfa6baa9c65dae09a4bbda5"),
    "title" : "Programming in Scala",
    "author" : [
        {
            "first_name" : "Martin",
            "last_name" : "Odersky",
            "nationality" : "DE",
            "year_of_birth" : 1958
        },
        {
            "first_name" : "Lex",
            "last_name" : "Spoon"
        },
        {
            "first_name" : "Bill",
            "last_name" : "Venners"
        }
    ]
}
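To make the JSON/SQL mismatch concrete, here is a minimal sketch in plain Java (no MongoDB driver) that folds the three tables above into one embedded document per book. The class, record and method names (`BookEmbedder`, `Author`, `embed`) are hypothetical, and NULL columns are simply left out, as they are in the MongoDB document:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: fold the book, bookauthor and author tables into one document per book.
final class BookEmbedder {

    // One row of the author table; a null field stands for SQL NULL.
    record Author(String lastName, String firstName, String middleName,
                  String nationality, Integer yearOfBirth) {}

    // The join through bookauthor happens once, here, instead of on every read.
    static Map<String, Object> embed(int bookId, Map<Integer, String> books,
                                     List<int[]> bookAuthor, Map<Integer, Author> authors) {
        List<Map<String, Object>> embeddedAuthors = new ArrayList<>();
        for (int[] link : bookAuthor) {              // link = {book_id, author_id}
            if (link[0] != bookId) continue;
            Author a = authors.get(link[1]);
            Map<String, Object> doc = new LinkedHashMap<>();
            putIfNotNull(doc, "first_name", a.firstName());
            putIfNotNull(doc, "middle_name", a.middleName());
            putIfNotNull(doc, "last_name", a.lastName());
            putIfNotNull(doc, "nationality", a.nationality());
            putIfNotNull(doc, "year_of_birth", a.yearOfBirth());
            embeddedAuthors.add(doc);
        }
        Map<String, Object> book = new LinkedHashMap<>();
        book.put("title", books.get(bookId));
        book.put("author", embeddedAuthors);
        return book;
    }

    // Documents have no fixed schema, so NULL columns are omitted rather than stored.
    private static void putIfNotNull(Map<String, Object> doc, String field, Object value) {
        if (value != null) doc.put(field, value);
    }

    // The rows shown in the SQL slide.
    static Map<Integer, String> books() {
        return Map.of(1, "The Demon-Haunted World: Science as a Candle in the Dark",
                      2, "Cosmos",
                      3, "Programming in Scala");
    }

    static List<int[]> bookAuthor() {
        return List.of(new int[] {1, 1}, new int[] {2, 1}, new int[] {3, 2},
                       new int[] {3, 3}, new int[] {3, 4});
    }

    static Map<Integer, Author> authors() {
        return Map.of(1, new Author("Sagan", "Carl", "Edward", null, 1934),
                      2, new Author("Odersky", "Martin", null, "DE", 1958),
                      3, new Author("Spoon", "Lex", null, null, null),
                      4, new Author("Venners", "Bill", null, null, null));
    }

    public static void main(String[] args) {
        System.out.println(embed(3, books(), bookAuthor(), authors()));
    }
}
```

For book id 3 this yields the same title and three-author array as the MongoDB document above (without the `_id`).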

Aggregate Example - twitter

db.tweets.aggregate(
    {$match: {"user.friends_count": {$gt: 0}, "user.followers_count": {$gt: 0}}},
    {$project: {location: "$user.location", friends: "$user.friends_count", followers: "$user.followers_count"}},
    {$group: {_id: "$location", friends: {$sum: "$friends"}, followers: {$sum: "$followers"}}}
);

• $match: predicate
• $project: parts of the document you want to project
• $group: function to apply to the result set
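As a reading aid, the sketch below computes the same result as the three pipeline stages with Java streams over an in-memory list. It assumes a hypothetical `User` record standing in for each tweet's `user` sub-document; it only mirrors what the pipeline returns, not how MongoDB executes it:

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Sketch: the $match / $project / $group pipeline, computed with Java streams.
final class TweetAggregation {

    // Hypothetical flattened view of a tweet's "user" sub-document.
    record User(String location, int friendsCount, int followersCount) {}

    // One output row of the $group stage.
    record Totals(long friends, long followers) {}

    static Map<String, Totals> aggregate(List<User> users) {
        return users.stream()
            // $match: keep users with at least one friend and at least one follower
            .filter(u -> u.friendsCount() > 0 && u.followersCount() > 0)
            .collect(Collectors.groupingBy(
                // $group _id: the location (null mapped to "" because groupingBy rejects null keys)
                (User u) -> Objects.requireNonNullElse(u.location(), ""),
                TreeMap::new,
                // $project + $sum: carry only the two counts and add them up per location
                Collectors.reducing(
                    new Totals(0, 0),
                    (User u) -> new Totals(u.friendsCount(), u.followersCount()),
                    (Totals a, Totals b) -> new Totals(a.friends() + b.friends(),
                                                       a.followers() + b.followers()))));
    }

    public static void main(String[] args) {
        System.out.println(aggregate(List.of(
            new User("Berlin", 2, 3), new User("Berlin", 1, 1), new User("London", 0, 5))));
    }
}
```

Here `$project` is folded into the mapping step, since only the two counts travel on to the sum. MongoDB groups a missing location under `null`; this sketch uses an empty string instead.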

Challenge #2 - Write Volumes

Need to Scale Datasource
[Diagram: JSON requests flow through the Web Server to Service #1 on its Application Server; the SQL database is marked “Bottleneck!”]

Application Cache?
[Diagram: the Web Server and Service #1 on its Application Server, now with an App Cache added in front of the SQL database; clients still exchange JSON]

Issues
+ Read-only data comes from a cache
- Writes slow down, as both the cache and the database need to be updated
- Cache data must be kept in sync between Application Servers
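Both write-side issues show up in a small write-through cache sketch. The class name and the in-memory maps are hypothetical stand-ins for one application server's cache and the shared SQL database:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a write-through application cache in front of a shared database.
final class WriteThroughCache {
    private final Map<String, String> database;                // shared stand-in for the SQL database
    private final Map<String, String> cache = new HashMap<>(); // private to one application server

    WriteThroughCache(Map<String, String> sharedDatabase) {
        this.database = sharedDatabase;
    }

    // Reads are served from the cache; a miss loads the value from the database once.
    String read(String key) {
        return cache.computeIfAbsent(key, database::get);
    }

    // Every write costs two updates: the database and this server's cache.
    void write(String key, String value) {
        database.put(key, value);
        cache.put(key, value);
    }

    // Two application servers share one database but each keeps its own cache.
    static String staleReadDemo() {
        Map<String, String> database = new HashMap<>();
        WriteThroughCache serverA = new WriteThroughCache(database);
        WriteThroughCache serverB = new WriteThroughCache(database);
        serverA.write("user:1", "v1");
        serverB.read("user:1");          // B now caches "v1"
        serverA.write("user:1", "v2");   // A updates the database and its own cache only
        return serverB.read("user:1");   // B still answers from its stale copy
    }

    public static void main(String[] args) {
        System.out.println(staleReadDemo());
    }
}
```

`staleReadDemo()` returns the stale `"v1"`: server A's write reached the database and A's own cache, but nothing told server B's cache.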

[Image: cutting grass with scissors]
http://community.qlikview.com/cfs-filesystemfile.ashx/__key/CommunityServer.Blogs.Components.WeblogFiles/theqlikviewblog/Cutting-Grass-with-Scissors-_2D00_-2.jpg

[Image: a pack of harvesters]
http://www.bitquill.net/blog/wp-content/uploads/2008/07/pack_of_harvesters.jpg

Big Data at a Glance
[Diagram: a large dataset with “username” as the primary key, split into key ranges a, b, c ... z]

• Systems like Google File System (which inspired Hadoop’s HDFS) and MongoDB’s Sharding handle the scale problem by chunking
• Break up pieces of data into smaller chunks, spread across many data nodes
• Each data node contains many chunks
• If a chunk gets too large or a node is overloaded, data can be rebalanced


[Diagram: the “username” key space divided into chunks a ... z]
MongoDB Sharding (as well as HDFS) breaks data into chunks (~64 MB)

Scaling
[Diagram: large dataset with “username” as the primary key; Data Nodes 1-4 each hold 25% of the chunks a ... z]
Representing data as chunks allows many levels of scale across n data nodes.
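A rough worked example of the chunk arithmetic. The 1 TB dataset is an assumption, not from the slides; 1 TB is taken as 2^40 bytes and every chunk is assumed to sit at the 64 MB (2^26 bytes) default maximum, spread over the four data nodes above:

```latex
\[
\frac{2^{40}\ \text{bytes}}{2^{26}\ \text{bytes per chunk}} = 2^{14} = 16384\ \text{chunks},
\qquad
\frac{16384\ \text{chunks}}{4\ \text{nodes}} = 4096\ \text{chunks per node}\ (25\%).
\]
```

Chunks are split once they exceed the maximum, so most are smaller than 64 MB and the real count is likely higher; treat 16,384 as a rough lower bound.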

Scaling
[Diagram: chunks a ... z spread across Data Nodes 1-5]
The set of chunks can be evenly distributed across n data nodes.

Add Nodes: Chunk Rebalancing
[Diagram: chunks a ... z redistributed across Data Nodes 1-5]
The goal is equilibrium: an equal distribution. As nodes are added (or even removed), chunks can be redistributed for balance.
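A minimal sketch of the balancing goal, assuming a simplified balancer that moves one chunk at a time from the fullest node to the emptiest until no two nodes differ by more than one chunk. The one-chunk threshold and the names are hypothetical; MongoDB's balancer applies its own migration thresholds:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: migrate chunks one at a time from the fullest node to the emptiest node.
final class ChunkBalancer {

    // chunksPerNode.get(i) = chunks currently on node i (at least one node assumed).
    // Mutates the list in place and returns the number of migrations performed.
    static int rebalance(List<Integer> chunksPerNode) {
        int moves = 0;
        while (true) {
            int fullest = 0, emptiest = 0;
            for (int i = 1; i < chunksPerNode.size(); i++) {
                if (chunksPerNode.get(i) > chunksPerNode.get(fullest)) fullest = i;
                if (chunksPerNode.get(i) < chunksPerNode.get(emptiest)) emptiest = i;
            }
            // Equilibrium: no node holds more than one chunk above any other.
            if (chunksPerNode.get(fullest) - chunksPerNode.get(emptiest) <= 1) return moves;
            chunksPerNode.set(fullest, chunksPerNode.get(fullest) - 1);   // migrate one chunk...
            chunksPerNode.set(emptiest, chunksPerNode.get(emptiest) + 1); // ...to the emptiest node
            moves++;
        }
    }

    // Adding a fifth, empty node to four nodes holding 4 chunks each (16 chunks total).
    static List<Integer> addFifthNode() {
        List<Integer> nodes = new ArrayList<>(List.of(4, 4, 4, 4, 0));
        rebalance(nodes);
        return nodes;
    }

    public static void main(String[] args) {
        System.out.println(addFifthNode());
    }
}
```

`addFifthNode()` finishes after three migrations with `[3, 3, 3, 4, 3]`: the 16 chunks are as evenly spread over five nodes as whole chunks allow.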


Writes Routed to Appropriate Chunk
[Diagram: a write to key “ziggy” goes to chunk z on one of Data Nodes 1-5]
Writes are efficiently routed to the appropriate node & chunk.
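Routing a write (or a single-key read) comes down to finding the chunk whose key range contains the key. A minimal sketch, assuming each chunk is identified by its lowest key and the chunk-to-node table is a sorted map; the class name and the one-chunk-per-letter layout over five nodes are hypothetical:

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of range-based routing: each chunk is keyed by the lowest key it holds.
final class ChunkRouter {
    private final TreeMap<String, String> chunkToNode = new TreeMap<>();

    // Registers a chunk whose key range starts at lowerBound and lives on the given node.
    void addChunk(String lowerBound, String node) {
        chunkToNode.put(lowerBound, node);
    }

    // The owning chunk is the one with the greatest lower bound <= key.
    // (MongoDB's first chunk starts at MinKey; here keys below the first bound are rejected.)
    String nodeFor(String key) {
        Map.Entry<String, String> chunk = chunkToNode.floorEntry(key);
        if (chunk == null) throw new IllegalArgumentException("no chunk covers " + key);
        return chunk.getValue();
    }

    // Hypothetical layout: one chunk per letter, assigned round-robin to 5 nodes.
    static ChunkRouter alphabetOverFiveNodes() {
        ChunkRouter router = new ChunkRouter();
        for (char c = 'a'; c <= 'z'; c++) {
            router.addChunk(String.valueOf(c), "Data Node " + ((c - 'a') % 5 + 1));
        }
        return router;
    }

    public static void main(String[] args) {
        System.out.println(alphabetOverFiveNodes().nodeFor("ziggy"));
    }
}
```

With this layout `nodeFor("ziggy")` finds chunk z through `floorEntry` and returns `"Data Node 1"`; the lookup is one O(log n) search of the chunk table rather than a broadcast to every node.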

Chunk Splitting & Balancing
[Diagram: a write to key “ziggy” lands in chunk z]
If a chunk gets too large (default in MongoDB: 64 MB per chunk), it is split into two new chunks.


[Diagram: chunk z split into z1 and z2]
Each new part of the z chunk (left & right) now contains half of the keys.

As chunks continue to grow and split, they can be rebalanced to keep an equal share of data on each server.
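A minimal sketch of the split step, assuming a chunk is modelled as the sorted set of keys it holds and a hypothetical `maxKeys` limit stands in for the 64 MB size limit. Splitting at the median key is a simplification chosen so that each half holds half of the keys:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Sketch of chunk splitting; maxKeys is a stand-in for MongoDB's chunk size limit.
final class ChunkSplitter {

    // Returns the chunk unchanged if it fits, otherwise its two halves split at the median key.
    static List<TreeSet<String>> splitIfTooLarge(TreeSet<String> chunk, int maxKeys) {
        List<TreeSet<String>> result = new ArrayList<>();
        if (chunk.size() <= maxKeys) {
            result.add(chunk);
            return result;
        }
        String median = new ArrayList<>(chunk).get(chunk.size() / 2);
        result.add(new TreeSet<>(chunk.headSet(median, false))); // left half: keys below the median
        result.add(new TreeSet<>(chunk.tailSet(median, true)));  // right half: the median and above
        return result;
    }

    public static void main(String[] args) {
        TreeSet<String> z = new TreeSet<>(List.of("zach", "zane", "zara", "zeke", "ziggy", "zoe"));
        System.out.println(splitIfTooLarge(z, 4));
    }
}
```

Six keys in chunk z with a limit of four produce two chunks of three keys each, which the balancer can then place on different nodes.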

Reads with Key Routed Efficiently
[Diagram: Data Nodes 1-5 holding chunks a ... y, z1 and z2; a read of key “xavier” goes to one chunk]
Reading a single value by Primary Key: the read is routed efficiently to the specific chunk containing the key.


[Diagram: a read of keys “T” -> “X” touches chunks t, u, v, w and x]
Reading multiple values by Primary Key: reads are routed efficiently to the specific chunks in the range.
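For a range read, only the chunk containing the lower end of the range and the chunks that start inside the range need to be visited. A minimal sketch, assuming hypothetical chunks identified by their lowest key, one per letter:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Sketch: find the chunks a range read must visit, given each chunk's lowest key.
final class RangeRouter {
    private final TreeSet<String> chunkLowerBounds = new TreeSet<>();

    RangeRouter(Collection<String> lowerBounds) {
        chunkLowerBounds.addAll(lowerBounds);
    }

    // Chunks that may hold keys in [from, to]; assumes from <= to.
    List<String> chunksFor(String from, String to) {
        List<String> chunks = new ArrayList<>();
        String first = chunkLowerBounds.floor(from);   // chunk that contains 'from'
        if (first != null) chunks.add(first);
        // every chunk that starts after 'from' but no later than 'to'
        chunks.addAll(chunkLowerBounds.subSet(from, false, to, true));
        return chunks;
    }

    // Hypothetical layout: one chunk per letter, "a" ... "z".
    static RangeRouter alphabet() {
        List<String> bounds = new ArrayList<>();
        for (char c = 'a'; c <= 'z'; c++) bounds.add(String.valueOf(c));
        return new RangeRouter(bounds);
    }

    public static void main(String[] args) {
        System.out.println(alphabet().chunksFor("t", "x"));
    }
}
```

`chunksFor("t", "x")` returns `[t, u, v, w, x]`; every other chunk is skipped, so nodes holding only those chunks are never asked.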

Architecture

Adding MongoS
[Diagram: Service #2 reaches the sharded data through MongoS, while Service #1 still uses SQL; clients exchange JSON with the Web Server]

Challenge #3 - Offline Processing

Online / Offline
[Diagram: the Web Server and the Application Server running Service #2 with MongoS, divided into Online and Offline sides; JSON between the tiers]

MongoDB and Hadoop

MongoDB & Hadoop
[Diagram: three config servers and Shards 1-4, each holding numbered chunks; two Hadoop mappers read the config, then read chunks from the shards]

[Diagram: the same cluster; chunks 1 and 2 have been picked up by the two Hadoop mappers]

MongoDB Hadoop Adapter

Classic Hadoop Word Count - Map:

    // word (a Text) and one (an IntWritable set to 1) are fields of the Mapper
    public void map(LongWritable key, Text value, Context context) throws ... {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }

With the MongoDB Hadoop Adapter, each input document arrives as a BSONObject:

    public void map(Object key, BSONObject value, Context context) throws ... {
        // tokenize the document's "line" field instead of a line of a text file
        StringTokenizer tokenizer = new StringTokenizer(value.get("line").toString());
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }

MongoDB Hadoop Adapter is the same code!

Classic Hadoop Word Count - Reduce:

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (final IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
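The word-count logic itself does not depend on Hadoop. Here is a minimal in-memory sketch that runs the same map and reduce steps over documents with a `line` field; the class and method names are hypothetical, and Hadoop's shuffle is reduced to a single sorted map:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Sketch: word-count map and reduce over in-memory documents, without Hadoop.
final class InMemoryWordCount {

    // Map step: emit (word, 1) for every token of each document's "line" field.
    static List<Map.Entry<String, Integer>> map(List<? extends Map<String, ?>> documents) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (Map<String, ?> doc : documents) {
            StringTokenizer tokenizer = new StringTokenizer(doc.get("line").toString());
            while (tokenizer.hasMoreTokens()) {
                emitted.add(Map.entry(tokenizer.nextToken(), 1));
            }
        }
        return emitted;
    }

    // Shuffle + reduce step: group the emitted pairs by word and sum the ones.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> emitted) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : emitted) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
            Map.of("line", "to be or not to be"), Map.of("line", "be quick"));
        System.out.println(reduce(map(docs)));
    }
}
```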

New Hybrid World

New Hybrid World
[Diagram: behind the Web Server, Service #1 uses SQL and Service #2 uses MongoDB through MongoS, with both online and offline processing; clients exchange JSON]

download at mongodb.org
conferences, appearances: http://www.10gen.com/events
We’re Hiring!
Chris Harris
Email: [email protected]
Twitter: cj_harris5
