Webinar - High Volume Data Feeds

mongodb
August 02, 2012

Ingesting large streams of data such as server logs, telemetry data, stock market data, or social media status updates requires a storage layer that's capable of keeping up with a high volume of writes. In this session, we will cover how MongoDB's scale-out architecture and fast write performance make it a perfect fit for storing and processing such high-volume data feeds.

Transcript

  1. 1
    August  2012  
    High  Volume  Data  Feeds  

  2. 2
    Agenda
    • Brief overview of MongoDB
    • Challenges for high volume data feeds
    • How you can use MongoDB to solve them
    • Examples of real world scenarios

  3. 3
    Agenda
    • Brief overview of MongoDB
    • Challenges for high volume data feeds
    • How you can use MongoDB to solve them
    • Examples of real world scenarios

  4. 4
    Volume and Type of Data
    • Trillions of records
    • Tens of millions of queries per second
    • Volume of data
    • Semi-structured and unstructured data
    Agile Development
    • Iterative & continuous
    • New and emerging apps
    New Architectures
    • Systems scaling horizontally, not vertically
    • Commodity servers
    • Cloud computing

  5. 5
    Increases complexity, lowering productivity
    Costs
    Cost of database increases
    • Increased database licensing cost
    • Vertical, not horizontal, scaling
    • High cost of SAN
    Developer productivity decreases
    • Needed to add new software layers of ORM, caching, sharding, and message queue
    • Polymorphic, semi-structured and unstructured data not well supported

  6. 6
    • Document-oriented storage
    • Based on JSON documents
    • Schema-less
    • Scalable architecture
    • Auto-sharding
    • Replication & high availability
    • Open source, written in C++
    • Key features include:
    • Full featured indexes
    • Query language
    • Map/Reduce & aggregation

  7. 7
    [Diagram: sharded cluster with three mongos routers, three config servers, and three shards of three mongod processes each]

  8. 8
    General Purpose
    • Multiple data interfaces
    • Full featured indexes
    • Rich data model
    Easy to Use
    • Simple to set up and manage
    • Native language drivers in all popular languages
    • Easy mapping to object-oriented code
    Fast & Scalable
    • Dynamically add / remove capacity with no downtime
    • Auto-sharding built in
    • Operates at in-memory speed wherever possible

  9. 9
    Agenda
    • Brief overview of MongoDB
    • Challenges for high volume data feeds
    • How you can use MongoDB to solve them
    • Examples of real world scenarios

  10. 10
    • Server metrics
    • Social media
    • Financial data
    • Web click stream

  11. 11
    Challenges
    • Continuous arrival of data
    • Costly to scale disks to accommodate high rates of small writes
    • Can’t apply back pressure to the feed
    [Diagram: a stream of events flowing into storage]

  12. 12
    Challenges
    • Adding more storage over time
    • Aging out data that’s no longer needed
    • Minimizing resource overhead of “cold” data
    [Diagram: recent data on fast storage, old data on archival storage, with capacity added over time]
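
One way to picture "aging out" is a periodic pass that separates recent events from those past a retention window. The sketch below is plain JavaScript under assumed names (`ageOut`, `receivedAtMs`, and the three-hour window are all hypothetical); it illustrates the idea only and is not how MongoDB itself expires data.

```javascript
// Hypothetical sketch: split events into "recent" and "expired" by age.
// Field and function names are illustrative assumptions, not from the deck.
function ageOut(events, nowMs, maxAgeMs) {
  const recent = [];
  const expired = [];
  for (const event of events) {
    // An event is expired once it is older than the retention window.
    if (nowMs - event.receivedAtMs > maxAgeMs) {
      expired.push(event);
    } else {
      recent.push(event);
    }
  }
  return { recent, expired };
}

const HOUR_MS = 60 * 60 * 1000;
const nowMs = 10 * HOUR_MS;
const sample = [
  { id: 1, receivedAtMs: 1 * HOUR_MS },   // 9 hours old
  { id: 2, receivedAtMs: 8 * HOUR_MS },   // 2 hours old
  { id: 3, receivedAtMs: 9.5 * HOUR_MS }, // 30 minutes old
];

const result = ageOut(sample, nowMs, 3 * HOUR_MS);
console.log(result.recent.map((e) => e.id));  // [ 2, 3 ]
console.log(result.expired.map((e) => e.id)); // [ 1 ]
```

In a real deployment the expired set would be deleted or moved to archival storage rather than held in memory.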

  13. 13
    Challenges
    • Data in feed can evolve over time
    • Can’t take system down when format changes
    Example records over time:
    a=1 b=2
    a=3 b=4
    a=5 b=6 c=7 (“c” added to records)
    a='foo' b=8 c=9 (“a” changed to a string)

  14. 14
    Challenges
    • Query and filter data without transformation
    • Low latency access to data
    • Workload isolation
    [Diagram: a data feed writing to storage while a client queries it]

  15. 15
    Agenda
    • Brief overview of MongoDB
    • Challenges for high volume data feeds
    • How you can use MongoDB to solve them
    • Examples of real world scenarios

  16. 16
    • Spread writes across multiple shards
    • Linearly scale write capacity of cluster
    [Diagram: events arriving at a mongos router, which distributes them across three shards]
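
To make "spread writes across multiple shards" concrete, here is a small simulation in plain JavaScript. It assumes a hypothetical shard key field `sourceId` and uses an FNV-1a hash to pick a shard; this is an illustration of key-based routing in general, not MongoDB's actual routing logic.

```javascript
// Hypothetical sketch: route events to shards by hashing a shard key.
// This simulates the idea only; it is not how mongos is implemented.

// Deterministic 32-bit FNV-1a hash of a string.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Pick a shard index from the event's (assumed) shard key field.
function routeToShard(event, shardCount) {
  return fnv1a(String(event.sourceId)) % shardCount;
}

// Count how many events land on each shard.
function distribute(events, shardCount) {
  const counts = new Array(shardCount).fill(0);
  for (const event of events) {
    counts[routeToShard(event, shardCount)] += 1;
  }
  return counts;
}

const events = [];
for (let i = 0; i < 9000; i++) {
  events.push({ sourceId: "host-" + i, value: i });
}

// Prints three per-shard counts that add up to 9000.
console.log(distribute(events, 3));
```

The point of the sketch is that a key with many distinct values lets each shard take a share of the writes; a key with few distinct values would concentrate them on one shard.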

  17. 17
    • Writes buffered in RAM and periodically written to disk
    • Asynchronous writes decouple app from storage
    [Diagram: server with RAM and disk, returning “ok” to the writer]
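
The buffering idea can be sketched as a store that acknowledges each write as soon as it is in memory and moves batches to disk later. The class below is a hypothetical plain-JavaScript model (`BufferedStore`, `write`, and `flush` are made-up names), not MongoDB's storage engine.

```javascript
// Hypothetical sketch of buffered writes: acknowledge immediately, flush later.
class BufferedStore {
  constructor() {
    this.buffer = []; // stands in for RAM
    this.disk = [];   // stands in for durable storage
  }

  // Accept a write and acknowledge it without waiting for the disk.
  write(event) {
    this.buffer.push(event);
    return "ok";
  }

  // Move everything buffered so far to "disk" in one batch.
  flush() {
    const batch = this.buffer;
    this.buffer = [];
    this.disk.push(...batch);
    return batch.length;
  }
}

const store = new BufferedStore();
for (let i = 0; i < 5; i++) {
  store.write({ seq: i });
}
console.log(store.buffer.length, store.disk.length); // 5 0
console.log(store.flush());                          // 5
console.log(store.buffer.length, store.disk.length); // 0 5
```

In this sketch, anything still in the buffer would be lost if the process stopped before a flush, which is the trade-off that buffering makes for write speed.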

  18. 18
    • RAM acts as LRU cache
    • Recent data is in memory
    • Old data is on disk
    [Diagram: RAM and disk]
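
For readers unfamiliar with LRU (least recently used) eviction, the following is a minimal generic sketch in plain JavaScript. The `LruCache` class and the key names are hypothetical; it shows the eviction policy only and says nothing about how MongoDB manages memory internally.

```javascript
// Hypothetical sketch of an LRU cache: the least recently used entry
// is evicted first when the cache is full.
class LruCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.entries = new Map(); // a Map iterates in insertion order
  }

  // Return the value and mark the key as most recently used.
  get(key) {
    if (!this.entries.has(key)) return undefined; // cache miss
    const value = this.entries.get(key);
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  // Insert or update a key, evicting the oldest entry if over capacity.
  put(key, value) {
    if (this.entries.has(key)) this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      const oldestKey = this.entries.keys().next().value;
      this.entries.delete(oldestKey);
    }
  }
}

const cache = new LruCache(2);
cache.put("event-1", { a: 1 });
cache.put("event-2", { a: 3 });
cache.get("event-1");           // touch event-1 so event-2 becomes the oldest
cache.put("event-3", { a: 5 }); // evicts event-2
console.log([...cache.entries.keys()]); // [ 'event-1', 'event-3' ]
```

For a data feed, recently written events are also the most recently touched, so they tend to be the ones that stay in memory.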

  19. 19
    • Accommodate changes in feed protocol
    • Zero downtime for feed protocol upgrades
    > db.events.save({ a: 1, b: 2 })
    > db.events.save({ a: 3, b: 4 })
    > db.events.save({ a: 5, b: 6, c: 7 })
    > db.events.save({ a: "foo", b: 8, c: 9 })
    > db.events.find()
    { "_id" : ObjectId("501a2e263520cae8d164eabd"), "a" : 1, "b" : 2 }
    { "_id" : ObjectId("501a2e263520cae8d164eabe"), "a" : 3, "b" : 4 }
    { "_id" : ObjectId("501a2e263520cae8d164eabf"), "a" : 5, "b" : 6, "c" : 7 }
    { "_id" : ObjectId("501a2e443520cae8d164eac0"), "a" : "foo", "b" : 8, "c" : 9 }
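
Because documents of different shapes can sit in the same collection, the reading side has to tolerate missing fields and changed types. The sketch below is plain JavaScript operating on copies of the four documents above (without `_id`); the `normalize` function, the `aLabel` field, and the default of 0 for a missing `c` are illustrative assumptions.

```javascript
// The four example events from the slide, without their _id fields.
const events = [
  { a: 1, b: 2 },
  { a: 3, b: 4 },
  { a: 5, b: 6, c: 7 },
  { a: "foo", b: 8, c: 9 },
];

// Hypothetical reader-side normalization:
// - "c" was added later, so default it when absent
// - "a" changed from number to string, so keep each form in its own field
function normalize(event) {
  return {
    a: typeof event.a === "number" ? event.a : null,
    aLabel: typeof event.a === "string" ? event.a : null,
    b: event.b,
    c: "c" in event ? event.c : 0,
  };
}

const normalized = events.map(normalize);
const sumOfC = normalized.reduce((total, e) => total + e.c, 0);
console.log(sumOfC); // 16
```

The storage layer accepts every version of the record, and the decision about how to treat older shapes is made by the code that reads them.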
     

  20. 20
    • Writes always go to primary of shard
    • Queries can be sent only to secondaries with a read preference
    • Tags can be used to isolate workloads to different replicas
    [Diagram: a shard with one primary mongod and three secondary mongod processes; writes go to the primary and queries to a secondary]
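
The routing rules above can be modelled in a few lines. This is a plain-JavaScript simulation with made-up host names and a made-up `use` tag; it mirrors the one-primary, three-secondary layout on the slide but is not the driver's real member-selection code.

```javascript
// Hypothetical replica set description; host names and tags are invented.
const members = [
  { host: "db1", role: "primary", tags: { use: "ingest" } },
  { host: "db2", role: "secondary", tags: { use: "ingest" } },
  { host: "db3", role: "secondary", tags: { use: "reporting" } },
  { host: "db4", role: "secondary", tags: { use: "reporting" } },
];

// Writes always target the primary.
function selectForWrite(replicaSet) {
  return replicaSet.find((m) => m.role === "primary");
}

// Reads target secondaries whose tags match every requested tag.
function selectForRead(replicaSet, wantedTags) {
  return replicaSet.filter(
    (m) =>
      m.role === "secondary" &&
      Object.entries(wantedTags).every(([key, value]) => m.tags[key] === value)
  );
}

console.log(selectForWrite(members).host); // db1
console.log(selectForRead(members, { use: "reporting" }).map((m) => m.host)); // [ 'db3', 'db4' ]
```

With this split, heavy reporting queries land on the tagged secondaries and do not compete with the feed's writes on the primary.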

  21. 21
    Agenda
    • Brief overview of MongoDB
    • Challenges for high volume data feeds
    • How you can use MongoDB to solve them
    • Examples of real world scenarios

  22. 22
    Wordnik uses MongoDB as the foundation for its “live” dictionary that stores its entire text corpus: 3.5T of data in 20 billion records
    Problem
    • Analyze a staggering amount of data for a system built on a continuous stream of high-quality text pulled from online sources
    • Adding too much data too quickly resulted in outages; tables locked for tens of seconds during inserts
    • Initially launched entirely on MySQL but quickly hit performance roadblocks
    Why MongoDB
    • Migrated 5 billion records in a single day with zero downtime
    • MongoDB powers every website request: 20m API calls per day
    • Ability to eliminate memcached layer, creating a simplified system that required fewer resources and was less prone to error
    Impact
    • Reduced code by 75% compared to MySQL
    • Fetch time cut from 400ms to 60ms
    • Sustained insert speed of 8k words per second, with frequent bursts of up to 50k per second
    • Significant cost savings and 15% reduction in servers
    “Life with MongoDB has been good for Wordnik. Our code is faster, more flexible and dramatically smaller. Since we don’t spend time worrying about the database, we can spend more time writing code for our application.” -Tony Tam, Vice President of Engineering and Technical Co-founder

  23. 23
    Intuit relies on a MongoDB-powered real-time analytics tool for small businesses to derive interesting and actionable patterns from their customers’ website traffic
    Problem
    • Intuit hosts more than 500,000 websites
    • Wanted to collect and analyze data to recommend conversion and lead generation improvements to customers
    • With 10 years’ worth of user data, it took several days to process the information using a relational database
    Why MongoDB
    • Cope with high rate of clickstream traffic
    • Easy to build new features and extend the product
    • Large community provided support and responsiveness, even without commercial support contract
    Impact
    • In one week Intuit was able to become proficient in MongoDB development
    • Developed application features more quickly for MongoDB than for relational databases
    • MongoDB was 2.5 times faster than MySQL
    “We did a prototype for one week, and within one week we had made big progress. Very big progress. It was so amazing that we decided, ‘Let’s go with this.’” -Nirmala Ranganathan, Intuit

  24. 24
    More info:
    http://10gen.com/use-case/high-volume-data-feeds
    Thanks!