Slide 1

High Volume Data Feeds
1 August 2012

Slide 2

Agenda
•  Brief overview of MongoDB
•  Challenges for high volume data feeds
•  How you can use MongoDB to solve them
•  Examples of real world scenarios

Slide 3

Agenda
•  Brief overview of MongoDB
•  Challenges for high volume data feeds
•  How you can use MongoDB to solve them
•  Examples of real world scenarios

Slide 4

Volume and Type of Data
•  Trillions of records
•  10s of millions of queries per second
•  Volume of data
•  Semi-structured and unstructured data

Agile Development
•  Iterative & continuous
•  New and emerging apps

New Architectures
•  Systems scaling horizontally, not vertically
•  Commodity servers
•  Cloud computing

Slide 5

Costs
Cost of database increases
•  Increased database licensing cost
•  Vertical, not horizontal, scaling
•  High cost of SAN

Increased complexity, lowering productivity
Developer productivity decreases
•  Need to add new software layers for ORM, caching, sharding, and message queues
•  Polymorphic, semi-structured and unstructured data not well supported

Slide 6

•  Document-oriented storage
   •  Based on JSON documents
   •  Schema-less
•  Scalable architecture
   •  Auto-sharding
   •  Replication & high availability
•  Open source, written in C++
•  Key features include:
   •  Full featured indexes
   •  Query language
   •  Map/Reduce & aggregation
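
To make these features concrete, here is a minimal mongo shell sketch (not from the original deck; the events collection and its source, ts and latency_ms fields are illustrative): insert a JSON document with no schema declaration, index it, query it, and run a simple aggregation.

>  // store a metrics event as a JSON document; no schema is declared up front
>  db.events.insert( { source: "web-01", ts: new Date(), latency_ms: 42 } )
>  // secondary index supporting queries by source and time
>  db.events.ensureIndex( { source: 1, ts: -1 } )
>  db.events.find( { source: "web-01" } ).sort( { ts: -1 } ).limit(10)
>  // aggregation: average latency per source
>  db.events.aggregate( [ { $group: { _id: "$source", avgLatency: { $avg: "$latency_ms" } } } ] )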

Slide 7

[Diagram: a sharded MongoDB cluster with three mongos routers and three config servers in front of three shards, each shard made up of three mongod processes]
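
As a rough sketch of how such a cluster is assembled from the mongos shell (the shard and host names below are hypothetical), each shard is a replica set of mongod processes registered with the router:

>  sh.addShard( "shard0/host1:27018,host2:27018,host3:27018" )
>  sh.addShard( "shard1/host4:27018,host5:27018,host6:27018" )
>  sh.addShard( "shard2/host7:27018,host8:27018,host9:27018" )
>  sh.status()   // summarizes the shards and how data is distributed across them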

Slide 8

General Purpose
•  Multiple data interfaces
•  Full featured indexes
•  Rich data model

Easy to Use
•  Simple to set up and manage
•  Native language drivers in all popular languages
•  Easy mapping to object oriented code

Fast & Scalable
•  Dynamically add / remove capacity with no downtime
•  Auto-sharding built in
•  Operates at in-memory speed wherever possible

Slide 9

Agenda
•  Brief overview of MongoDB
•  Challenges for high volume data feeds
•  How you can use MongoDB to solve them
•  Examples of real world scenarios

Slide 10

•  Server metrics
•  Social media
•  Financial data
•  Web click stream

Slide 11

Challenges
•  Continuous arrival of data
•  Costly to scale disks to accommodate high rates of small writes
•  Can't apply back pressure to the feed

[Diagram: a continuous stream of events flowing into storage]

Slide 12

Challenges
•  Adding more storage over time
•  Aging out data that's no longer needed
•  Minimizing resource overhead of "cold" data

[Diagram: recent data on fast storage, old data on archival storage, with capacity added over time]

Slide 13

Challenges
•  Data in feed can evolve over time
•  Can't take system down when format changes

[Diagram: records arriving over time: a=1 b=2; a=3 b=4; then "c" added to records (a=5 b=6 c=7); then "a" changed to a string (a='foo' b=8 c=9)]

Slide 14

Challenges
•  Query and filter data without transformation
•  Low latency access to data
•  Workload isolation

[Diagram: a data feed writing into storage while clients issue queries against the same store]

Slide 15

Agenda
•  Brief overview of MongoDB
•  Challenges for high volume data feeds
•  How you can use MongoDB to solve them
•  Examples of real world scenarios

Slide 16

•  Spread writes across multiple shards
•  Linearly scale write capacity of cluster

[Diagram: events flowing through a mongos router and distributed across multiple shards]
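
A minimal sketch of enabling this from the mongos shell (the feeds database, events collection and shard key fields are assumptions, not from the deck). A high-cardinality leading field such as source keeps inserts from piling onto a single shard, which a purely time-based key would do:

>  sh.enableSharding( "feeds" )
>  // compound shard key: distribute writes by source, keep each source's events ordered by time
>  sh.shardCollection( "feeds.events", { source: 1, ts: 1 } )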

Slide 17

•  Writes buffered in RAM and periodically written to disk
•  Asynchronous writes decouple app from storage

[Diagram: a write lands in the server's RAM and is acknowledged ("ok") before being flushed to disk]
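
A sketch of what this looks like to the application in the 2012-era shell (the field names are illustrative): drivers of that era sent writes unacknowledged by default, and durability was requested per operation; current drivers acknowledge writes by default instead.

>  db.events.insert( { source: "web-01", ts: new Date() } )   // fire-and-forget: returns before data reaches disk
>  db.runCommand( { getLastError: 1, j: true } )              // opt in: block until the last write is in the journal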

Slide 18

•  RAM acts as LRU cache
•  Recent data is in memory
•  Old data is on disk

[Diagram: RAM holding recent data, disk holding older data]

Slide 19

•  Accommodate changes in feed protocol
•  Zero downtime for feed protocol upgrades

>  db.events.save( { a: 1, b: 2 } )
>  db.events.save( { a: 3, b: 4 } )
>  db.events.save( { a: 5, b: 6, c: 7 } )
>  db.events.save( { a: "foo", b: 8, c: 9 } )
>  db.events.find()
{ "_id" : ObjectId("501a2e263520cae8d164eabd"), "a" : 1, "b" : 2 }
{ "_id" : ObjectId("501a2e263520cae8d164eabe"), "a" : 3, "b" : 4 }
{ "_id" : ObjectId("501a2e263520cae8d164eabf"), "a" : 5, "b" : 6, "c" : 7 }
{ "_id" : ObjectId("501a2e443520cae8d164eac0"), "a" : "foo", "b" : 8, "c" : 9 }
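
Old and new record formats can then be queried side by side without a migration; a short illustrative follow-up (not in the original deck):

>  db.events.find( { c: { $exists: true } } )   // only records written after "c" was added
>  db.events.find( { a: { $type: 2 } } )        // records where "a" is a string (2 = BSON string type)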

Slide 20

•  Writes always go to the primary of a shard
•  Queries can be sent to secondaries by setting a read preference
•  Tags can be used to isolate workloads to different replicas

[Diagram: within a shard, writes go to the primary mongod while queries are served by the secondary mongods]
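
A sketch of directing reads from the shell (the "use: analytics" tag below is hypothetical and would need to match tags configured on the replica set members):

>  // read from any secondary
>  db.events.find( { source: "web-01" } ).readPref( "secondary" )
>  // read only from secondaries tagged for analytics workloads
>  db.events.find().readPref( "secondary", [ { "use": "analytics" } ] )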

Slide 21

Agenda
•  Brief overview of MongoDB
•  Challenges for high volume data feeds
•  How you can use MongoDB to solve them
•  Examples of real world scenarios

Slide 22

Wordnik uses MongoDB as the foundation for its "live" dictionary that stores its entire text corpus: 3.5 TB of data in 20 billion records

Problem
•  Analyze a staggering amount of data for a system built on a continuous stream of high-quality text pulled from online sources
•  Adding too much data too quickly resulted in outages; tables locked for tens of seconds during inserts
•  Initially launched entirely on MySQL but quickly hit performance roadblocks

Why MongoDB
•  Migrated 5 billion records in a single day with zero downtime
•  MongoDB powers every website request: 20M API calls per day
•  Eliminated the memcached layer, creating a simplified system that required fewer resources and was less prone to error

Impact
•  Reduced code by 75% compared to MySQL
•  Fetch time cut from 400ms to 60ms
•  Sustained insert speed of 8k words per second, with frequent bursts of up to 50k per second
•  Significant cost savings and 15% reduction in servers

"Life with MongoDB has been good for Wordnik. Our code is faster, more flexible and dramatically smaller. Since we don't spend time worrying about the database, we can spend more time writing code for our application." - Tony Tam, Vice President of Engineering and Technical Co-founder

Slide 23

Intuit relies on a MongoDB-powered real-time analytics tool for small businesses to derive interesting and actionable patterns from their customers' website traffic

Problem
•  Intuit hosts more than 500,000 websites
•  Wanted to collect and analyze data to recommend conversion and lead generation improvements to customers
•  With 10 years' worth of user data, it took several days to process the information using a relational database

Why MongoDB
•  Copes with a high rate of clickstream traffic
•  Easy to build new features and extend the product
•  Large community provided support and responsiveness, even without a commercial support contract

Impact
•  In one week Intuit was able to become proficient in MongoDB development
•  Developed application features more quickly for MongoDB than for relational databases
•  MongoDB was 2.5 times faster than MySQL

"We did a prototype for one week, and within one week we had made big progress. Very big progress. It was so amazing that we decided, 'Let's go with this.'" - Nirmala Ranganathan, Intuit

Slide 24

More info: http://10gen.com/use-case/high-volume-data-feeds

Thanks!