Slide 1

Slide 1 text

1 Jared Rosoff @forjared Open source, high performance database

Slide 2

Slide 2 text

2 Challenges
• Variably typed data
• Complex objects
• High transaction rates
• Large data size
• High availability
Opportunities
• Agility
• Cost reduction
• Simplification

Slide 3

Slide 3 text

3 NEW ARCHITECTURES
• Systems scaling horizontally, not vertically
• Commodity servers
• Cloud computing
VOLUME AND TYPE OF DATA
• Trillions of records
• 10's of millions of queries per second
• Volume of data
• Semi-structured and unstructured data
AGILE DEVELOPMENT
• Iterative and continuous
• New and emerging apps

Slide 4

Slide 4 text

4 DEVELOPER PRODUCTIVITY DECREASES
• Needed to add new software layers of ORM, caching, sharding and message queue
• Polymorphic, semi-structured and unstructured data not well supported
COST OF DATABASE INCREASES
• Increased database licensing cost
• Vertical, not horizontal, scaling
• High cost of SAN
[Timeline: PROJECT START, LAUNCH, +30 DAYS, +90 DAYS, +6 MONTHS, +1 YEAR, with milestones DENORMALIZE DATA MODEL, STOP USING JOINS, CUSTOM CACHING LAYER, CUSTOM SHARDING]

Slide 5

Slide 5 text

5 How we got here

Slide 6

Slide 6 text

6 • Variably typed data
• Complex data objects
• High transaction rates
• Large data size
• High availability
• Agile development

Slide 7

Slide 7 text

7 • Metadata management
• EAV anti-pattern
• Sparse table anti-pattern

Slide 8

Slide 8 text

8 Entity | Attribute    | Value
1      | Type         | Book
1      | Title        | Inside Intel
1      | Author       | Andy Grove
2      | Type         | Laptop
2      | Manufacturer | Dell
2      | Model        | Inspiron
2      | RAM          | 8gb
3      | Type         | Cereal
Difficult to query. Difficult to index. Expensive to query.
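A minimal sketch of the same catalog modeled as documents instead of EAV rows (the "products" collection and field names are illustrative, not from the slides):

> db.products.save({ type: "Book", title: "Inside Intel", author: "Andy Grove" })
> db.products.save({ type: "Laptop", manufacturer: "Dell", model: "Inspiron", ram: "8gb" })
// each document carries only the attributes that apply to it, and can be queried directly
> db.products.find({ type: "Laptop", ram: "8gb" })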

Slide 9

Slide 9 text

9 Type   | Title         | Author     | Manufacturer | Model    | RAM | Screen Width
Book   | Inside Intel  | Andy Grove |              |          |     |
Laptop |               |            | Dell         | Inspiron | 8gb | 12"
TV     |               |            | Panasonic    | Viera    |     | 52"
MP3    | Margin Walker | Fugazi     | Dischord     |          |     |
Constant schema changes. Space inefficient. Overloaded fields.

Slide 10

Slide 10 text

10 One table per type:
Type   | Manufacturer | Model    | RAM | Screen Width
Laptop | Dell         | Inspiron | 8gb | 12"

Manufacturer | Model | Screen Width
Panasonic    | Viera | 52"

Title         | Author | Manufacturer
Margin Walker | Fugazi | Dischord

Querying is difficult. Hard to add more types.

Slide 11

Slide 11 text

11 Complex objects

Slide 12

Slide 12 text

12 Challenges
• Constant load from client
• Far in excess of single server capacity
• Can never take the system down
[Diagram: many concurrent queries hitting a single database]

Slide 13

Slide 13 text

13 Challenges
• Adding more storage over time
• Aging out data that's no longer needed
• Minimizing resource overhead of "cold" data
[Diagram: recent data on fast storage, old data on archival storage, with capacity added over time]

Slide 14

Slide 14 text

14

Slide 15

Slide 15 text

15

Slide 16

Slide 16 text

16 Variably typed data
• Rigid schemas
Complex Objects
• Normalization can be hard
• Dependent on joins
High transaction rate
• Vertical scaling
• Poor data locality
High Availability
• Difficult to maintain consistency & HA
• HA a bolt-on to many RDBMS
Agile Development
• Schema changes
• Monolithic data model

Slide 17

Slide 17 text

17 A new data model

Slide 18

Slide 18 text

18
var post = { author: "Jared",
             date: new Date(),
             text: "NoSQL Now 2012",
             tags: ["NoSQL", "MongoDB"] }

> db.posts.save(post)

Slide 19

Slide 19 text

19 > db.posts.find()
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "Jared",
  date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
  text : "NoSQL Now 2012",
  tags : [ "NoSQL", "MongoDB" ] }

Notes:
- _id is unique, but can be anything you'd like
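For example, the caller can supply its own _id instead of letting the driver generate an ObjectId (the value below is illustrative):

> db.posts.save({ _id: "nosql-now-2012", author: "Jared", text: "NoSQL Now 2012" })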

Slide 20

Slide 20 text

20 Create an index on any field in a document

// 1 means ascending, -1 means descending
> db.posts.ensureIndex({author: 1})

> db.posts.find({author: 'Jared'})
{ _id    : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "Jared",
  ... }

Slide 21

Slide 21 text

21 • Conditional Operators
– $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $size, $type
– $lt, $lte, $gt, $gte

// find posts with any tags
> db.posts.find( {tags: {$exists: true}} )

// find posts matching a regular expression
> db.posts.find( {author: /^Jar*/i} )

// count posts by author
> db.posts.find( {author: 'Jared'} ).count()
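A range query using the comparison operators above might look like this (the date values are illustrative):

// find posts from 2012
> db.posts.find( { date: { $gte: new Date("2012-01-01"), $lt: new Date("2013-01-01") } } )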

Slide 22

Slide 22 text

22 • $set, $unset, $inc, $push, $pushAll, $pull, $pullAll, $bit

> comment = { author: "Brendan",
              date: new Date(),
              text: "I want a freakin pony" }

> db.posts.update( { _id: "..." },
                   { $push: { comments: comment } } );
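The other update operators follow the same shape; for instance, assuming a hypothetical numeric views counter on the post:

// set a field and increment a counter in one update
> db.posts.update( { _id: "..." },
                   { $set: { text: "NoSQL Now 2012 (updated)" }, $inc: { views: 1 } } )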

Slide 23

Slide 23 text

23 { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "Jared",
  date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
  text : "NoSQL Now 2012",
  tags : [ "NoSQL", "MongoDB" ],
  comments : [
    { author : "Brendan",
      date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)",
      text : "I want a freakin pony"
    }
  ] }

Slide 24

Slide 24 text

24 // Index nested documents
> db.posts.ensureIndex( { "comments.author": 1 } )
> db.posts.find( { "comments.author": "Brendan" } )

// Index on tags
> db.posts.ensureIndex( { tags: 1 } )
> db.posts.find( { tags: "MongoDB" } )

// geospatial index
> db.posts.ensureIndex( { "author.location": "2d" } )
> db.posts.find( { "author.location": { $near: [22, 42] } } )

Slide 25

Slide 25 text

25 db.posts.aggregate(
    { $project : {
        author : 1,
        tags : 1
    } },
    { $unwind : "$tags" },
    { $group : {
        _id : { tags : "$tags" },
        authors : { $addToSet : "$author" }
    } }
);

Slide 26

Slide 26 text

26 It's highly available

Slide 27

Slide 27 text

27 [Diagram: driver writes to the primary and can read from the primary or either secondary; data flows from the primary to the secondaries via asynchronous replication]
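A minimal sketch of standing up a replica set like the one in the diagram, run from the mongo shell on one of the members (the set name and host names are illustrative, borrowed from the config shown later in the deck):

> rs.initiate({
    _id: "someSet",
    members: [
      { _id: 0, host: "A" },
      { _id: 1, host: "B" },
      { _id: 2, host: "C" }
    ]
  })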

Slide 28

Slide 28 text

28 [Diagram: the primary fails; the driver can still read from the two secondaries but cannot write]

Slide 29

Slide 29 text

29 [Diagram: automatic leader election; one of the secondaries is elected as the new primary and the driver resumes reads and writes]

Slide 30

Slide 30 text

30 [Diagram: the new topology: Secondary, Primary, Secondary; the driver writes to the new primary and can read from any member]

Slide 31

Slide 31 text

31 With tunable consistency

Slide 32

Slide 32 text

32 [Diagram: the driver sends both reads and writes to the primary, giving strongly consistent reads]

Slide 33

Slide 33 text

33 [Diagram: the driver writes to the primary but also reads from a secondary, trading consistency for read scaling]
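In the mongo shell of this era, reading from a secondary looks roughly like this (drivers expose an equivalent slaveOk / read preference option):

// connect to a secondary and allow reads on it
> rs.slaveOk()
> db.posts.find({ author: "Jared" })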

Slide 34

Slide 34 text

34 Durability  

Slide 35

Slide 35 text

35 • Fire and forget
• Wait for error
• Wait for fsync
• Wait for journal sync
• Wait for replication
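Each level maps to options on the getLastError command issued after a write, as the next slides illustrate (a sketch; drivers wrap this as write concern):

> db.posts.insert(post)                              // fire and forget: no getLastError
> db.runCommand({ getLastError: 1 })                 // wait for error
> db.runCommand({ getLastError: 1, fsync: true })    // wait for fsync
> db.runCommand({ getLastError: 1, j: true })        // wait for journal sync
> db.runCommand({ getLastError: 1, w: 2 })           // wait for replication to one secondary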

Slide 36

Slide 36 text

36 [Diagram: fire and forget; the driver sends the write, the primary applies it in memory, no acknowledgement]

Slide 37

Slide 37 text

37 [Diagram: wait for error; the driver sends the write, then calls getLastError after the primary applies it in memory]

Slide 38

Slide 38 text

38 [Diagram: wait for journal sync; the driver calls getLastError with j:true, which returns after the primary applies the write in memory and writes it to the journal]

Slide 39

Slide 39 text

39 [Diagram: wait for replication; the driver calls getLastError with w:2, which returns after the write is applied on the primary and replicated to a secondary]

Slide 40

Slide 40 text

40 Value       | Meaning
<N>         | Replicate to N members of replica set
"majority"  | Replicate to a majority of replica set members
<mode name> | Use custom error mode name
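For example, waiting for a majority acknowledgement with a timeout, using the getLastError form the deck has been showing (wtimeout is in milliseconds):

> db.runCommand({ getLastError: 1, w: "majority", wtimeout: 5000 })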

Slide 41

Slide 41 text

41 { _id: "someSet",
  members: [
    { _id: 0, host: "A", tags: { dc: "ny" } },
    { _id: 1, host: "B", tags: { dc: "ny" } },
    { _id: 2, host: "C", tags: { dc: "sf" } },
    { _id: 3, host: "D", tags: { dc: "sf" } },
    { _id: 4, host: "E", tags: { dc: "cloud" } }
  ],
  settings: {
    getLastErrorModes: {
      veryImportant: { dc: 3 },
      sortOfImportant: { dc: 2 }
    }
  }
}
These are the modes you can use in write concern
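Using one of those custom modes as a write concern might look like this (a sketch; drivers accept the mode name as the w value):

// returns once members spanning 3 distinct dc tag values have the write
> db.runCommand({ getLastError: 1, w: "veryImportant" })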

Slide 42

Slide 42 text

42 • Between 0..1000
• Highest member that is up to date wins
– Up to date == within 10 seconds of primary
• If a higher priority member catches up, it will force election and win
[Diagram: Primary (priority = 3), Secondary (priority = 2), Secondary (priority = 1)]
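A sketch of assigning those priorities through a reconfiguration from the shell (the member indexes are illustrative):

> cfg = rs.conf()
> cfg.members[0].priority = 3
> cfg.members[1].priority = 2
> cfg.members[2].priority = 1
> rs.reconfig(cfg)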

Slide 43

Slide 43 text

43 • Lags behind master by configurable time delay
• Automatically hidden from clients
• Protects against operator errors
– Accidentally delete database
– Application corrupts data
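A delayed member is declared in the replica set configuration; a sketch (the one-hour delay and member index are illustrative):

> cfg = rs.conf()
> cfg.members[2].priority = 0          // must not be electable
> cfg.members[2].hidden = true         // hidden from clients
> cfg.members[2].slaveDelay = 3600     // stay one hour behind the primary
> rs.reconfig(cfg)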

Slide 44

Slide 44 text

44 • Vote in elections
• Don't store a copy of data
• Use as tie breaker
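An arbiter is added with its own shell helper; a sketch (the host name is illustrative):

> rs.addArb("arbiter.example.net:27017")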

Slide 45

Slide 45 text

45 [Diagram: a single primary in one data center]

Slide 46

Slide 46 text

46 [Diagram: one data center with a primary in Zone 1 and secondaries in Zone 2 and Zone 3]

Slide 47

Slide 47 text

47 [Diagram: one data center; primary in Zone 1, secondary in Zone 2, and a secondary with hidden = true in Zone 3 used for backups]

Slide 48

Slide 48 text

48 [Diagram: active data center with a primary (priority = 1) in Zone 1 and a secondary (priority = 1) in Zone 2; standby data center with a secondary (priority = 0)]

Slide 49

Slide 49 text

49 [Diagram: West Coast DC with a primary (priority = 2) and a secondary (priority = 2) in Zones 1 and 2; Central DC with an arbiter; East Coast DC with two secondaries (priority = 1) in Zones 1 and 2]

Slide 50

Slide 50 text

50 Sharding  

Slide 51

Slide 51 text

51 [Diagram: sharded cluster architecture; clients connect to mongos routers, which consult three config servers and route operations to shards, each shard being a set of mongod processes]

Slide 52

Slide 52 text

52 { name: "Jared", email: "[email protected]" }
{ name: "Scott", email: "[email protected]" }
{ name: "Dan",   email: "[email protected]" }

> db.runCommand( { shardcollection: "test.users", key: { email: 1 } } )
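Sharding has to be enabled on the database before a collection in it can be sharded; a sketch of that prerequisite step, using the same "test" database as the slide:

> db.runCommand( { enablesharding: "test" } )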

Slide 53

Slide 53 text

53 [Diagram: the entire shard key range, from -∞ to +∞, starts as a single chunk]

Slide 54

Slide 54 text

Slide 55

Slide 55 text

Slide 56

Slide 56 text

56 [Diagram: as documents are inserted, the range from -∞ to +∞ splits at shard key values such as [email protected], [email protected] and [email protected]; each range between split points is a chunk]

Slide 57

Slide 57 text

Slide 58

Slide 58 text

Slide 59

Slide 59 text

Slide 60

Slide 60 text

60 • Stored in the config servers
• Cached in MongoS
• Used to route requests and keep cluster balanced

Min Key           | Max Key           | Shard
-∞                | [email protected] | 1
[email protected] | [email protected]    | 1
[email protected]    | [email protected]  | 1
[email protected]  | +∞                | 1
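That chunk metadata can be inspected directly in the config database; a sketch:

> use config
> db.chunks.find({ ns: "test.users" })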

Slide 61

Slide 61 text

61 [Diagram: chunks 1-12 on Shard 1, 13-24 on Shard 2, 25-36 on Shard 3, 37-48 on Shard 4; mongos, the balancer and the config servers sit above the shards. "Chunks!"]

Slide 62

Slide 62 text

62 [Diagram: imbalance; Shard 1 holds chunks 1-12 while Shards 2, 3 and 4 hold only four chunks each]

Slide 63

Slide 63 text

63 [Diagram: the balancer instructs the cluster to move chunk 1 from Shard 1 to Shard 2]

Slide 64

Slide 64 text

64 [Diagram: chunk 1 is migrated from Shard 1 to Shard 2]

Slide 65

Slide 65 text

65 [Diagram: the config servers record that chunk 1 now lives on Shard 2]

Slide 66

Slide 66 text

66 By shard key            | Routed                 | db.users.find( {email: "[email protected]"} )
Sorted by shard key     | Routed in order        | db.users.find().sort({email: -1})
Find by non shard key   | Scatter gather         | db.users.find({state: "CA"})
Sorted by non shard key | Distributed merge sort | db.users.find().sort({state: 1})

Slide 67

Slide 67 text

67 Inserts | Requires shard key | db.users.insert({ name: "Jared", email: "[email protected]" })
Removes | Routed             | db.users.remove({ email: "[email protected]" })
Removes | Scattered          | db.users.remove({ name: "Jared" })
Updates | Routed             | db.users.update( {email: "[email protected]"}, {$set: { state: "CA" }} )
Updates | Scattered          | db.users.update( {state: "FZ"}, {$set: { state: "CA" }}, false, true )

Slide 68

Slide 68 text

68 [Diagram: routed query]
1. Query arrives at MongoS
2. MongoS routes query to a single shard
3. Shard returns results of query
4. Results returned to client

Slide 69

Slide 69 text

69 [Diagram: scatter gather query]
1. Query arrives at MongoS
2. MongoS broadcasts query to all shards
3. Each shard returns results for query
4. Results combined and returned to client

Slide 70

Slide 70 text

70 [Diagram: distributed merge sort]
1. Query arrives at MongoS
2. MongoS broadcasts query to all shards
3. Each shard locally sorts results
4. Results returned to mongos
5. MongoS merge sorts individual results
6. Combined sorted result returned to client

Slide 71

Slide 71 text

71 Use cases

Slide 72

Slide 72 text

72 User Data Management
High Volume Data Feeds
Content Management
Operational Intelligence
Meta Data Management

Slide 73

Slide 73 text

73 Machine Generated Data
• More machines, more sensors, more data
• Variably structured
Stock Market Data
• High frequency trading
Social Media Firehose
• Multiple sources of data
• Each changes their format constantly

Slide 74

Slide 74 text

74 [Diagram: multiple data sources feeding a sharded cluster]
• Asynchronous writes
• Flexible document model can adapt to changes in sensor format
• Write to memory with periodic disk flush
• Scale writes over multiple shards

Slide 75

Slide 75 text

75 Ad Targeting
• Large volume of state about users
• Very strict latency requirements
Real time dashboards
• Expose report data to millions of customers
• Report on large volumes of data
• Reports that update in real time
Social Media Monitoring
• What are people talking about?

Slide 76

Slide 76 text

76 [Diagram: dashboards and APIs reading from the same cluster that collects the data]
• Low latency reads
• Parallelize queries across replicas and shards
• In database aggregation
• Flexible schema adapts to changing input data
• Can use same cluster to collect, store, and report on data

Slide 77

Slide 77 text

77 Intuit relies on a MongoDB-powered real-time analytics tool for small businesses to derive interesting and actionable patterns from their customers' website traffic

Problem
§ Intuit hosts more than 500,000 websites
§ Wanted to collect and analyze data to recommend conversion and lead generation improvements to customers
§ With 10 years worth of user data, it took several days to process the information using a relational database

Impact
§ In one week Intuit was able to become proficient in MongoDB development
§ Developed application features more quickly for MongoDB than for relational databases
§ MongoDB was 2.5 times faster than MySQL

"We did a prototype for one week, and within one week we had made big progress. Very big progress. It was so amazing that we decided, 'Let's go with this.'" -Nirmala Ranganathan, Intuit

Slide 78

Slide 78 text

78 [Diagram: a user 1. sees an ad, 2. sees another ad, 3. clicks, 4. converts]

{ cookie_id: "1234512413243",
  advertiser: {
    apple: {
      actions: [
        { impression: 'ad1', time: 123 },
        { impression: 'ad2', time: 232 },
        { click: 'ad2', time: 235 },
        { add_to_cart: 'laptop',
          sku: 'asdf23f',
          time: 254 },
        { purchase: 'laptop', time: 354 }
      ]
    }
  }
}

• Rich profiles collecting multiple complex actions
• Scale out to support high throughput of activities tracked
• Indexing and querying to support matching, frequency capping
• Dynamic schemas make it easy to track vendor specific attributes

Slide 79

Slide 79 text

79 Data Archiving
• Meta data about artifacts
• Content in the library
Information discovery
• Have data sources that you don't have access to
• Stores meta-data on those stores and figure out which ones have the content
Biometrics
• Retina scans
• Finger prints

Slide 80

Slide 80 text

80 { ISBN: "00e8da9b",
  type: "Book",
  country: "Egypt",
  title: "Ancient Egypt"
}

{ type: "Artefact",
  medium: "Ceramic",
  country: "Egypt",
  year: "3000 BC"
}

db.archives.find({ "country": "Egypt" });

• Flexible data model for similar, but different objects
• Indexing and rich query API for easy searching and sorting

Slide 81

Slide 81 text

81 Shutterfly uses MongoDB to safeguard more than six billion images for millions of customers in the form of photos and videos, and turn everyday pictures into keepsakes

Problem
§ Managing 20TB of data (six billion images for millions of customers) partitioning by function
§ Home-grown key value store on top of their Oracle database offered sub-par performance
§ Codebase for this hybrid store became hard to manage
§ High licensing, HW costs

Why MongoDB
§ JSON-based data structure
§ Provided Shutterfly with an agile, high performance, scalable solution at a low cost
§ Works seamlessly with Shutterfly's services-based architecture

Impact
§ 500% cost reduction and 900% performance improvement compared to previous Oracle implementation
§ Accelerated time-to-market for nearly a dozen projects on MongoDB
§ Improved performance by reducing average latency for inserts from 400ms to 2ms

"The 'really killer reason' for using MongoDB is its rich JSON-based data structure, which offers Shutterfly an agile approach to develop software. With MongoDB, the Shutterfly team can quickly develop and deploy new applications, especially Web 2.0 and social features." -Kenny Gorman, Director of Data Services

Slide 82

Slide 82 text

82 News Site
• Comments and user generated content
• Personalization of content, layout
Multi-Device rendering
• Generate layout on the fly for each device that connects
• No need to cache static pages
Sharing
• Store large objects
• Simple modeling of metadata

Slide 83

Slide 83 text

83 { camera: "Nikon d4",
  location: [ -122.418333, 37.775 ]
}

{ camera: "Canon 5d mkII",
  people: [ "Jim", "Carol" ],
  taken_on: ISODate("2012-03-07T18:32:35.002Z")
}

{ origin: "facebook.com/photos/xwdf23fsdf",
  license: "Creative Commons CC0",
  size: {
    dimensions: [ 124, 52 ],
    units: "pixels"
  }
}

• Flexible data model for similar, but different objects
• Horizontal scalability for large data sets
• Geo spatial indexing for location based searches
• GridFS for large object storage

Slide 84

Slide 84 text

84 Wordnik uses MongoDB as the foundation for its "live" dictionary that stores its entire text corpus – 3.5T of data in 20 billion records

Problem
§ Analyze a staggering amount of data for a system built on a continuous stream of high-quality text pulled from online sources
§ Adding too much data too quickly resulted in outages; tables locked for tens of seconds during inserts
§ Initially launched entirely on MySQL but quickly hit performance road blocks

Why MongoDB
§ Migrated 5 billion records in a single day with zero downtime
§ MongoDB powers every website request: 20m API calls per day
§ Ability to eliminate the memcached layer, creating a simplified system that required fewer resources and was less prone to error

Impact
§ Reduced code by 75% compared to MySQL
§ Fetch time cut from 400ms to 60ms
§ Sustained insert speed of 8k words per second, with frequent bursts of up to 50k per second
§ Significant cost savings and 15% reduction in servers

"Life with MongoDB has been good for Wordnik. Our code is faster, more flexible and dramatically smaller. Since we don't spend time worrying about the database, we can spend more time writing code for our application." -Tony Tam, Vice President of Engineering and Technical Co-founder

Slide 85

Slide 85 text

85 Social Graphs
• Scale out to large graphs
• Easy to search and process
Identity Management
• Authentication, Authorization and Accounting

Slide 86

Slide 86 text

86 [Diagram: a social graph]
• Documents enable disk locality of all profile data for a user
• Sharding partitions user profiles across available servers
• Native support for Arrays makes it easy to store connections inside user profile
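A minimal sketch of storing connections as an array inside the user profile (the collection and field names are illustrative):

> db.users.save({ _id: "jared", name: "Jared", friends: [ "scott", "dan", "brendan" ] })
// an index on the array makes "who lists scott as a connection?" efficient
> db.users.ensureIndex({ friends: 1 })
> db.users.find({ friends: "scott" })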

Slide 87

Slide 87 text

87 Review  

Slide 88

Slide 88 text

88 • Variety, Velocity and Volume make it difficult
• Documents are easier for many use cases

Slide 89

Slide 89 text

89 • Distributed by default
• High availability
• Cloud deployment

Slide 90

Slide 90 text

90 • Document oriented data model
• Highly available deployments
• Strong consistency model
• Horizontally scalable architecture