
Webinar - High Volume Data Feeds

mongodb
August 02, 2012


Ingesting large streams of data such as server logs, telemetry data, stock market data, or social media status updates requires a storage layer that's capable of keeping up with a high volume of writes. In this session, we will cover how MongoDB's scale-out architecture and fast write performance make it a perfect fit for storing and processing such large-volume data feeds.

Transcript

  1. High Volume Data Feeds (August 2012)

  2. Agenda
     •  Brief overview of MongoDB
     •  Challenges for high volume data feeds
     •  How you can use MongoDB to solve them
     •  Examples of real-world scenarios
  3. Agenda
     •  Brief overview of MongoDB
     •  Challenges for high volume data feeds
     •  How you can use MongoDB to solve them
     •  Examples of real-world scenarios
  4. New Architectures
     Volume and Type of Data
     •  Trillions of records
     •  10's of millions of queries per second
     •  Volume of data
     •  Semi-structured and unstructured data
     Agile Development
     •  Iterative & continuous
     •  New and emerging apps
     New Architectures
     •  Systems scaling horizontally, not vertically
     •  Commodity servers
     •  Cloud computing
  5. Costs: increased complexity lowers productivity
     Cost of database increases
     •  Increased database licensing cost
     •  Vertical, not horizontal, scaling
     •  High cost of SAN
     Developer productivity decreases
     •  Needed to add new software layers of ORM, caching, sharding, and message queue
     •  Polymorphic, semi-structured and unstructured data not well supported
  6. •  Document-oriented storage
        •  Based on JSON documents
        •  Schema-less
     •  Scalable architecture
        •  Auto-sharding
        •  Replication & high availability
     •  Open source, written in C++
     •  Key features include:
        •  Full featured indexes
        •  Query language
        •  Map/Reduce & aggregation
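As a sketch of the indexing, query, and aggregation features listed above (mongo shell; the collection and field names are illustrative, and the aggregation framework requires MongoDB 2.2 or later):

```javascript
// Insert a sample event and index the timestamp field
db.events.save({ source: "web", ts: new Date(), value: 42 })
db.events.ensureIndex({ ts: 1 })

// Query and sort using the standard query language
db.events.find({ source: "web" }).sort({ ts: -1 }).limit(10)

// Count events per source with the aggregation framework (2.2+)
db.events.aggregate([
  { $group: { _id: "$source", count: { $sum: 1 } } }
])
```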
  7. [Diagram: sharded cluster with three mongos routers and three config servers in front of three shards, each shard made up of three mongod processes]
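A cluster like the one in the diagram is assembled by registering each shard with the mongos routers; a hedged sketch (hostnames and replica set names are illustrative):

```javascript
// From a mongos shell: register each replica set as a shard
sh.addShard("shardA/host1:27017,host2:27017,host3:27017")
sh.addShard("shardB/host4:27017,host5:27017,host6:27017")
sh.addShard("shardC/host7:27017,host8:27017,host9:27017")

// Inspect the resulting cluster topology
sh.status()
```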
  8. General Purpose
     •  Multiple data interfaces
     •  Full featured indexes
     •  Rich data model
     Easy to Use
     •  Simple to set up and manage
     •  Native language drivers in all popular languages
     •  Easy mapping to object-oriented code
     Fast & Scalable
     •  Dynamically add / remove capacity with no downtime
     •  Auto-sharding built in
     •  Operates at in-memory speed wherever possible
  9. Agenda
     •  Brief overview of MongoDB
     •  Challenges for high volume data feeds
     •  How you can use MongoDB to solve them
     •  Examples of real-world scenarios
  10. •  Server metrics
      •  Social media
      •  Financial data
      •  Web click stream

  11. Challenges
      •  Continuous arrival of data
      •  Costly to scale disks to accommodate high rates of small writes
      •  Can't apply back pressure to the feed
      [Diagram: a continuous stream of events flowing into storage]
  12. Challenges
      •  Adding more storage over time
      •  Aging out data that's no longer needed
      •  Minimizing resource overhead of "cold" data
      [Diagram: recent data on fast storage, old data moved to archival storage; capacity added over time]
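One way to age out data automatically is a TTL index (introduced in MongoDB 2.2) or a capped collection; a hedged sketch with illustrative names and sizes:

```javascript
// TTL index (2.2+): expire documents 30 days after their "ts" timestamp
db.events.ensureIndex({ ts: 1 }, { expireAfterSeconds: 60 * 60 * 24 * 30 })

// Alternatively, a capped collection keeps a fixed-size window of
// recent data, overwriting the oldest documents as new ones arrive
db.createCollection("recent_events", { capped: true, size: 1024 * 1024 * 1024 })
```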
  13. Challenges
      •  Data in feed can evolve over time
      •  Can't take system down when format changes
      [Diagram: records over time: a=1 b=2, a=3 b=4, then a=5 b=6 c=7 ("c" added to records), then a='foo' b=8 c=9 ("a" changed to a string)]
  14. Challenges
      •  Query and filter data without transformation
      •  Low latency access to data
      •  Workload isolation
      [Diagram: data feed writing into storage while a client issues queries against it]
  15. Agenda
      •  Brief overview of MongoDB
      •  Challenges for high volume data feeds
      •  How you can use MongoDB to solve them
      •  Examples of real-world scenarios
  16. •  Spread writes across multiple shards
      •  Linearly scale write capacity of cluster
      [Diagram: mongos routing incoming events across three shards]
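Spreading writes evenly depends on the shard key; a minimal sketch, assuming a `feeds.events` collection (names are illustrative):

```javascript
// Enable sharding for the database and collection
sh.enableSharding("feeds")

// A monotonically increasing key (e.g. a timestamp alone) would send
// every insert to the same "hot" shard; leading with a well-distributed
// field spreads the write load across shards
sh.shardCollection("feeds.events", { source: 1, ts: 1 })
```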
  17. •  Writes buffered in RAM and periodically written to disk
      •  Asynchronous writes decouple app from storage
      [Diagram: server acknowledges writes from RAM before flushing them to disk]
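In 2012-era drivers this decoupling shows up as "fire and forget" inserts, with an explicit acknowledgement only when the application asks for one; a hedged mongo shell sketch:

```javascript
// Default behavior: insert() returns immediately; the server buffers
// the write in RAM and flushes to disk in the background
db.events.insert({ source: "web", ts: new Date(), value: 42 })

// When durability matters, ask for acknowledgement via getLastError;
// j: true additionally waits for the write to reach the journal
db.runCommand({ getLastError: 1, j: true })
```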
  18. •  RAM acts as LRU cache
      •  Recent data is in memory
      •  Old data is on disk
  19. •  Accommodate changes in feed protocol
      •  Zero downtime for feed protocol upgrades

      > db.events.save( { a: 1, b: 2 } )
      > db.events.save( { a: 3, b: 4 } )
      > db.events.save( { a: 5, b: 6, c: 7 } )
      > db.events.save( { a: "foo", b: 8, c: 9 } )
      > db.events.find()
      { "_id" : ObjectId("501a2e263520cae8d164eabd"), "a" : 1, "b" : 2 }
      { "_id" : ObjectId("501a2e263520cae8d164eabe"), "a" : 3, "b" : 4 }
      { "_id" : ObjectId("501a2e263520cae8d164eabf"), "a" : 5, "b" : 6, "c" : 7 }
      { "_id" : ObjectId("501a2e443520cae8d164eac0"), "a" : "foo", "b" : 8, "c" : 9 }
  20. •  Writes always go to the primary of a shard
      •  Queries can be sent only to secondaries by setting a read preference
      •  Tags can be used to isolate workloads to different replicas
      [Diagram: writes routed to the shard's primary; queries routed to its secondaries]
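A hedged mongo shell sketch of routing reads away from the primary (MongoDB 2.2+; the tag name is illustrative and must match tags defined in the replica set configuration):

```javascript
// Send this connection's queries to secondaries, keeping the primary
// free to absorb the write workload from the feed
db.getMongo().setReadPref("secondary")
db.events.find({ source: "web" }).count()

// Tag-aware read preference: target only replicas tagged for analytics
db.getMongo().setReadPref("secondary", [ { use: "analytics" } ])
```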
  21. Agenda
      •  Brief overview of MongoDB
      •  Challenges for high volume data feeds
      •  How you can use MongoDB to solve them
      •  Examples of real-world scenarios
  22. Wordnik uses MongoDB as the foundation for its "live" dictionary that stores its entire text corpus: 3.5TB of data in 20 billion records

      Problem
      §  Analyze a staggering amount of data for a system built on a continuous stream of high-quality text pulled from online sources
      §  Adding too much data too quickly resulted in outages; tables locked for tens of seconds during inserts
      §  Initially launched entirely on MySQL but quickly hit performance roadblocks

      Why MongoDB
      §  Migrated 5 billion records in a single day with zero downtime
      §  MongoDB powers every website request: 20m API calls per day
      §  Ability to eliminate the memcached layer, creating a simplified system that required fewer resources and was less prone to error

      Impact
      §  Reduced code by 75% compared to MySQL
      §  Fetch time cut from 400ms to 60ms
      §  Sustained insert speed of 8k words per second, with frequent bursts of up to 50k per second
      §  Significant cost savings and 15% reduction in servers

      "Life with MongoDB has been good for Wordnik. Our code is faster, more flexible and dramatically smaller. Since we don't spend time worrying about the database, we can spend more time writing code for our application." - Tony Tam, Vice President of Engineering and Technical Co-founder
  23. Intuit relies on a MongoDB-powered real-time analytics tool for small businesses to derive interesting and actionable patterns from their customers' website traffic

      Problem
      §  Intuit hosts more than 500,000 websites
      §  Wanted to collect and analyze data to recommend conversion and lead generation improvements to customers
      §  With 10 years' worth of user data, it took several days to process the information using a relational database

      Why MongoDB
      §  Copes with a high rate of clickstream traffic
      §  Easy to build new features and extend the product
      §  Large community provided support and responsiveness, even without a commercial support contract

      Impact
      §  In one week Intuit was able to become proficient in MongoDB development
      §  Developed application features more quickly for MongoDB than for relational databases
      §  MongoDB was 2.5 times faster than MySQL

      "We did a prototype for one week, and within one week we had made big progress. Very big progress. It was so amazing that we decided, 'Let's go with this.'" - Nirmala Ranganathan, Intuit
  24. More info: http://10gen.com/use-case/high-volume-data-feeds
      Thanks!