Berlin Buzzwords 2012 : Hybrid Database Applications

cj_harris5

June 04, 2012
Transcript

  1. [Architecture diagram: an HTML5 client exchanges JSON with a web server; behind it, an application server holds the controllers and talks SQL to the database.]
  2. The Move to Services. The controllers have moved to the client, so expose small JSON services.
    + Death to the monolithic deployment
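    As a rough illustration of such a "small JSON service", here is a minimal sketch in Node.js (the route, port, and payload are invented for illustration):

        // Minimal JSON service: one endpoint that returns JSON.
        // Route name, port, and payload are hypothetical.
        var http = require("http");

        http.createServer(function (req, res) {
          if (req.url === "/books") {
            res.writeHead(200, { "Content-Type": "application/json" });
            res.end(JSON.stringify([{ title: "Cosmos", author: "Carl Sagan" }]));
          } else {
            res.writeHead(404, { "Content-Type": "application/json" });
            res.end(JSON.stringify({ error: "not found" }));
          }
        }).listen(8080);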
  3. Service Architecture. [Diagram: the web server consumes JSON from Service #1, Service #2, and so on; each service is an application server backed by its own SQL database.]
  4. The Service becomes the API. The JSON service becomes an API for the client and for other applications; the clients are not bound to the underlying data store.
    - Complexity due to the mismatch between JSON and SQL
  5. Migrating to MongoDB. [Diagram: the same service architecture, but Service #2 now speaks JSON to its data store instead of SQL; Service #1 remains on SQL.]
  6. Here is a "simple" SQL model:

        mysql> select * from book;
        +----+----------------------------------------------------------+
        | id | title                                                    |
        +----+----------------------------------------------------------+
        |  1 | The Demon-Haunted World: Science as a Candle in the Dark |
        |  2 | Cosmos                                                   |
        |  3 | Programming in Scala                                     |
        +----+----------------------------------------------------------+
        3 rows in set (0.00 sec)

        mysql> select * from bookauthor;
        +---------+-----------+
        | book_id | author_id |
        +---------+-----------+
        |       1 |         1 |
        |       2 |         1 |
        |       3 |         2 |
        |       3 |         3 |
        |       3 |         4 |
        +---------+-----------+
        5 rows in set (0.00 sec)

        mysql> select * from author;
        +----+-----------+------------+-------------+-------------+---------------+
        | id | last_name | first_name | middle_name | nationality | year_of_birth |
        +----+-----------+------------+-------------+-------------+---------------+
        |  1 | Sagan     | Carl       | Edward      | NULL        | 1934          |
        |  2 | Odersky   | Martin     | NULL        | DE          | 1958          |
        |  3 | Spoon     | Lex        | NULL        | NULL        | NULL          |
        |  4 | Venners   | Bill       | NULL        | NULL        | NULL          |
        +----+-----------+------------+-------------+-------------+---------------+
        4 rows in set (0.00 sec)
  7. The Same Data in MongoDB:

        {
            "_id" : ObjectId("4dfa6baa9c65dae09a4bbda5"),
            "title" : "Programming in Scala",
            "author" : [
                { "first_name" : "Martin", "last_name" : "Odersky",
                  "nationality" : "DE", "year_of_birth" : 1958 },
                { "first_name" : "Lex", "last_name" : "Spoon" },
                { "first_name" : "Bill", "last_name" : "Venners" }
            ]
        }
  8. Aggregate Example - twitter:

        db.tweets.aggregate(
            // Predicate
            { $match: { "user.friends_count": { $gt: 0 },
                        "user.followers_count": { $gt: 0 } } },
            // Parts of the document you want to project
            { $project: { location: "$user.location",
                          friends: "$user.friends_count",
                          followers: "$user.followers_count" } },
            // Function to apply to the result set
            { $group: { _id: "$location",
                        friends: { $sum: "$friends" },
                        followers: { $sum: "$followers" } } }
        );
  9. Need to Scale the Datasource. [Diagram: web server and Service #1 application server exchanging JSON; the single SQL database is the bottleneck!]
  11. Issues:
    + Read-only data comes from a cache
    - Writes slow down, as both the cache and the database must be updated
    - Cache data must be kept in sync between application servers
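    A rough sketch of the write path that causes the slowdown (cache-aside in plain JavaScript; the db object is a stand-in, not a real driver API):

        // Toy cache-aside store: every write pays for two updates,
        // and each application server holds its own private cache.
        var cache = {}; // per app server, so copies drift apart
        var db = {
          save: function (key, value) { /* pretend this writes to SQL */ }
        };

        function write(key, value) {
          db.save(key, value); // first cost: the database write
          cache[key] = value;  // second cost: refreshing the local cache
          // other app servers' caches are now stale until they sync
        }

        function read(key) {
          return cache[key]; // fast, but only as fresh as the last sync
        }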
  16. Big Data at a Glance
    • Systems like Google File System (which inspired Hadoop's HDFS) and MongoDB's sharding handle the scale problem by chunking
    • Break up pieces of data into smaller chunks, spread across many data nodes
    • Each data node contains many chunks
    • If a chunk gets too large or a node overloaded, data can be rebalanced
    [Diagram: a large dataset with "username" as the primary key, divided into chunks a b c d e f g h ... s t u v w x y z.]
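    In MongoDB terms, choosing "username" as the partitioning key looks like this (mongo-shell sketch; the database and collection names are placeholders):

        // Enable sharding for the database, then shard one collection
        // on the field playing the "primary key" role above.
        sh.enableSharding("mydb");
        sh.shardCollection("mydb.users", { username: 1 });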
  18. Big Data at a Glance. MongoDB sharding (as well as HDFS) breaks data into chunks (~64 MB). [Diagram: the dataset, keyed by "username", split into chunks a through z.]
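    The chunk ranges are visible in the cluster's config database (mongo-shell sketch; the namespace is assumed from the earlier example):

        // Each document in config.chunks records one chunk's key
        // range and the shard that currently owns it.
        db.getSiblingDB("config").chunks.find({ ns: "mydb.users" }).forEach(printjson);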
  19. Scaling. [Diagram: chunks a through z spread over four data nodes, 25% of the chunks on each.] Representing data as chunks allows many levels of scale across n data nodes.
  21. Scaling. [Diagram: chunks a through z over five data nodes.] The set of chunks can be evenly distributed across n data nodes.
  23. Add Nodes: Chunk Rebalancing. [Diagram: chunks a through z redistributing across five data nodes.] The goal is equilibrium: an equal distribution. As nodes are added (or even removed), chunks can be redistributed for balance.
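    MongoDB's balancer performs this redistribution automatically; a chunk can also be moved by hand through mongos (mongo-shell sketch; the namespace and shard name are placeholders):

        // Move the chunk containing username "ziggy" to shard0001.
        db.adminCommand({
          moveChunk: "mydb.users",
          find: { username: "ziggy" },
          to: "shard0001"
        });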
  27. Writes Routed to Appropriate Chunk. [Diagram: a write to key "ziggy" lands on the z chunk.] Writes are efficiently routed to the appropriate node & chunk.
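    Through mongos this routing is invisible to the application; an ordinary insert reaches only the shard whose chunk covers the key (mongo-shell sketch, names as before):

        // mongos inspects the shard key ("username") and forwards
        // the write to the one shard owning the "z" chunk.
        db.users.insert({ username: "ziggy", plays: "guitar" });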
  29. Chunk Splitting & Balancing. [Diagram: a write to key "ziggy" pushes the z chunk past its limit.] If a chunk gets too large (default in MongoDB: 64 MB per chunk), it is split into two new chunks.
  34. Chunk Splitting & Balancing. [Diagram: the z chunk is now two chunks, z1 and z2.] Each new part of the z chunk (left & right) now contains half of the keys.
  35. Chunk Splitting & Balancing. [Diagram: chunks z1 and z2 placed across the data nodes.] As chunks continue to grow and split, they can be rebalanced to keep an equal share of data on each server.
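    Splits normally happen automatically as a chunk passes the size threshold, but the same operation can be requested explicitly (mongo-shell sketch; namespace as before):

        // Split the chunk containing "ziggy" at its median key,
        // producing the z1 / z2 halves shown above.
        sh.splitFind("mydb.users", { username: "ziggy" });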
  36. Reads with Key Routed Efficiently. [Diagram: reading key "xavier" touches only the chunk that contains it.] Reading a single value by primary key: the read is routed efficiently to the specific chunk containing the key.
  39. Reads with Key Routed Efficiently. [Diagram: reading keys "T" through "X" touches only chunks t u v w x.] Reading multiple values by primary key: reads are routed efficiently to the specific chunks in the range.
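    Both reads are targeted because they include the shard key (mongo-shell sketch, names as before):

        // Single-key read: routed to exactly one chunk.
        db.users.find({ username: "xavier" });

        // Range read on the shard key: only shards holding chunks
        // in the t..x range are consulted.
        db.users.find({ username: { $gte: "t", $lte: "x" } });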
  41. Adding MongoS. [Diagram: the web server's JSON traffic for Service #2 now flows through a MongoS router, while Service #1's application server stays on SQL.]
  42. Online / Offline. [Diagram: web server and application server on the online side; Service #2, reached through MongoS, also serves an offline side.]
  43. MongoDB & Hadoop. [Diagram: three config servers and four shards holding numbered chunks; Hadoop mappers read the config servers to locate the chunks.]
  44. MongoDB & Hadoop. [Diagram: each Hadoop mapper is assigned a set of chunks and reads them directly from the shards.]
  46. MongoDB Hadoop Adapter: classic Hadoop word count, Map side.

        // Stock Hadoop word-count mapper: the input value is a line of text.
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }

        // The same mapper via the MongoDB Hadoop adapter: the input value
        // is a BSON document, so the text comes from its "line" field.
        public void map(Object key, BSONObject value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokenizer =
                new StringTokenizer(value.get("line").toString());
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
  47. MongoDB Hadoop Adapter: the Reduce side is the same code! Classic Hadoop word count, Reduce.

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (final IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
  48. New Hybrid World. [Diagram: the web server fronts Service #1 (application server + SQL) and Service #2 (MongoS-fronted MongoDB), exchanging JSON and serving both online and offline workloads.]