MongoDB: Architecture and Use Cases

Norberto
November 23, 2013

This is my talk delivered at dbcon13 in Dubai Knowledge Center


Transcript

  1. Global Community
    •  5,000,000+ MongoDB Downloads
    •  100,000+ Online Education Registrants
    •  20,000+ MongoDB User Group Members
    •  20,000+ MongoDB Days Attendees
    •  20,000+ MongoDB Management Service (MMS) Users
  2. NoSQL Features
    Flexible Data Models
      •  Lists, embedded objects
      •  Sparse data
      •  Semi-structured data
      •  Agile development
    High Data Throughput
      •  Reads
      •  Writes
    Big Data
      •  Aggregate Data Size
      •  Number of Objects
    Low Latency
      •  For reads and writes
      •  Millisecond Latency
    Cloud Computing
      •  Runs everywhere
      •  No special hardware
    Commodity Hardware
      •  Ethernet
      •  Local data storage

    •  JSON Based
    •  Dynamic Schemas
    •  Replica Sets to scale reads
    •  Sharding to scale writes
    •  1000s of shards in a single DB
    •  Data partitioning
    •  Designed for “typical” OS and local file system
    •  Scale-out to overcome hardware limitations
    •  In-memory cache
    •  Scale-out working set
  3. Document Data Model
    Relational / MongoDB

    {
      first_name: 'Paul',
      surname: 'Miller',
      city: 'London',
      location: [45.123, 47.232],
      cars: [
        { model: 'Bentley',
          year: 1973,
          value: 100000, … },
        { model: 'Rolls Royce',
          year: 1965,
          value: 330000, … }
      ]
    }
  4. Terminology
    RDBMS ➜ MongoDB
    Table, View ➜ Collection
    Row ➜ Document
    Index ➜ Index
    Join ➜ Embedded Document
    Foreign Key ➜ Reference
    Partition ➜ Shard
  5. Typical (relational) ERD
    User
      ·Name
      ·Email address
    Category
      ·Name
      ·URL
    Comment
      ·Comment
      ·Date
      ·Author
    Article
      ·Name
      ·Slug
      ·Publish date
      ·Text
    Tag
      ·Name
      ·URL
  6. MongoDB ERD
    User
      ·Name
      ·Email address
    Article
      ·Name
      ·Slug
      ·Publish date
      ·Text
      ·Author
      Comment[]
        ·Comment
        ·Date
        ·Author
      Tag[]
        ·Value
      Category[]
        ·Value
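The MongoDB ERD on slide 6 folds comments, tags and categories into the article itself. A minimal sketch of what such a document could look like, using a plain Python dict as a stand-in for a stored document; the field names follow the slide, while all values and the helper `comment_authors` are invented for illustration:

```python
# Hypothetical article document: field names follow the slide, values are made up.
article = {
    "name": "Intro to document stores",
    "slug": "intro-to-document-stores",
    "publish_date": "2013-11-23",
    "text": "...",
    "author": "norberto",  # reference to a user (stored by name in this sketch)
    "comments": [  # Comment[] embedded in the article
        {"comment": "Nice talk", "date": "2013-11-24", "author": "paul"},
        {"comment": "Thanks!", "date": "2013-11-25", "author": "norberto"},
    ],
    "tags": ["mongodb", "nosql"],   # Tag[] as plain values
    "categories": ["databases"],    # Category[] as plain values
}


def comment_authors(doc):
    """Return the distinct comment authors of one article, in first-seen order."""
    seen = []
    for comment in doc.get("comments", []):
        if comment["author"] not in seen:
            seen.append(comment["author"])
    return seen


authors = comment_authors(article)
print(authors)
```

Everything needed to render the article is read from one document, which is the point of replacing joins with embedded documents in the terminology on slide 4.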
  7. Sharding infrastructure
    Diagram components:
    •  3 × Config Server (each labeled “Node 1 Secondary Config Server”)
    •  3 × Shard
    •  3 × Mongos / App Server
  8. Full Featured
    •  Ad Hoc queries
    •  Real time aggregation
    •  Rich query capabilities
    •  Strongly consistent
    •  Geospatial features
    •  Support for most programming languages
    •  Flexible schema
  9. Single Data Center
    •  Automated failover
    •  Tolerates server failures
    •  Tolerates rack failures
    •  Number of replicas defines failure tolerance
    Diagram: three replica sets (A, B, C), each with one primary and two secondaries
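Slide 9 states that the number of replicas defines failure tolerance. A small sketch of the arithmetic, assuming the usual rule that a replica set needs a majority of its voting members to elect a primary; the function names are illustrative and not part of any MongoDB API:

```python
def majority(voting_members: int) -> int:
    """Smallest number of members that forms a majority."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members: int) -> int:
    """How many members can be lost while a majority is still reachable."""
    return voting_members - majority(voting_members)


for n in (3, 4, 5, 7):
    print(f"{n} members: majority {majority(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

Under this rule a three-member set like the ones on the slide survives the loss of one member, and adding a fourth member does not raise that number.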
  10. Active/Standby Data Center
    •  Tolerates server and rack failure
    •  Standby data center
    Data Center - West
      Primary – A, Primary – B, Primary – C
      Secondary – A, Secondary – B, Secondary – C
    Data Center - East
      Secondary – A, Secondary – B, Secondary – C
  11. Active/Active Data Center
    •  Tolerates server, rack, data center failures, network partitions
    Data Center - West
      Primary – A, Primary – B, Primary – C
      Secondary – A, Secondary – B, Secondary – C
    Data Center - East
      Secondary – A, Secondary – B, Secondary – C
      Secondary – B, Secondary – C, Secondary – A
    Data Center - Central
      Arbiter – A, Arbiter – B, Arbiter – C
  12. Global Distribution
    •  1 Primary
    •  7 Secondaries, each labeled “Real-time”
  13. High Volume Data Feeds
    Machine Generated Data
      •  More machines, more sensors, more data
      •  Variably structured
    Securities Data
      •  High frequency trading
      •  Daily closing price
    Social Media / General Public
      •  Multiple data sources
      •  Each changes their format consistently
      •  Student Scores, Telecom logs
  14. High Volume Data Feeds
    Diagram: four Data Sources feeding the cluster
    •  Asynchronous Writes
    •  Flexible document model can adapt to changes in sensor format
    •  Write to memory with periodic disk flush
    •  Scale writes over multiple shards
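Slide 14 mentions scaling writes over multiple shards. The sketch below illustrates the general idea of spreading writes by hashing a key. It is a deliberate simplification (hash modulo shard count) and not how MongoDB assigns data to shards; the shard names, the `sensor_id` field and the `route` function are assumptions made for the example:

```python
import hashlib
from collections import Counter

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names


def route(shard_key: str, shards=SHARDS) -> str:
    """Pick a shard for a key by hashing it (simplified modulo placement)."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


# Invented sensor readings; each write is routed independently by its key.
readings = [{"sensor_id": f"sensor-{i}", "value": i * 0.5} for i in range(300)]
load = Counter(route(r["sensor_id"]) for r in readings)
print(load)
```

Because the target shard depends only on the key, writes from many data sources can proceed in parallel without coordinating with each other.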
  15. Operational Intelligence
    Ad Targeting
      •  Large volume of users
      •  Very strict latency requirements
    Real time dashboards
      •  Expose data to millions of customers
      •  Reports on large volumes of data
      •  Reports that update in real time
    Social Media Monitoring
      •  Join the conversation
  16. Operational Intelligence
    Diagram: Dashboards and API reading from the cluster
    •  Low latency reads
    •  Parallelize queries across replicas and shards
    •  In database aggregation
    •  Flexible schema adapts to changing input data
    •  Can use same cluster to collect, store and report on data
  17. Behavioral Profiles
    { cookie_id: "1234512413243",
      advertiser: {
        apple: {
          actions: [
            { impression: 'ad1', time: 123 },
            { impression: 'ad2', time: 232 },
            { click: 'ad2', time: 235 },
            { add_to_cart: 'laptop',
              sku: 'asdf23f',
              time: 254 },
            { purchase: 'laptop', time: 354 }
          ]
          …

    Steps: 1 See Ad, 2 See Ad, 3 Click, 4 Convert
    •  Rich profiles collecting multiple complex actions
    •  Scale out to support high throughput of activities tracked
    •  Dynamic schemas make it easy to …
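The profile on slide 17 keeps every tracked action in one array per advertiser. As a rough sketch of how such a profile could be summarized, the code below counts actions by type; it copies only the actions visible on the slide (the slide elides the rest), and `EVENT_TYPES` and `funnel` are names made up for this example:

```python
# Profile copied from the slide; anything the slide elides is left out.
profile = {
    "cookie_id": "1234512413243",
    "advertiser": {
        "apple": {
            "actions": [
                {"impression": "ad1", "time": 123},
                {"impression": "ad2", "time": 232},
                {"click": "ad2", "time": 235},
                {"add_to_cart": "laptop", "sku": "asdf23f", "time": 254},
                {"purchase": "laptop", "time": 354},
            ]
        }
    },
}

EVENT_TYPES = ("impression", "click", "add_to_cart", "purchase")


def funnel(profile, advertiser):
    """Count how many actions of each type one advertiser recorded."""
    counts = dict.fromkeys(EVENT_TYPES, 0)
    for action in profile["advertiser"][advertiser]["actions"]:
        for event in EVENT_TYPES:
            if event in action:
                counts[event] += 1
    return counts


counts = funnel(profile, "apple")
print(counts)
```

Note that the actions do not share one shape: only `add_to_cart` carries a `sku`, which a fixed relational row layout would have to model as a sparse column.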
  18. Metadata
    Product Catalogue
      •  Diverse product portfolio
      •  Complex querying and filtering
    Data analysis
      •  Data mining
    Biometric
      •  Retina Scans
      •  Fingerprints
  19. Metadata
    { ISBN: "00e8da9b",
      type: "Book",
      country: "Egypt",
      title: "Ancient Egypt"
    }

    { type: "Artefact",
      medium: "Ceramic",
      country: "Egypt",
      year: "3000 BC"
    }

    •  Flexible data model for similar but different objects
    •  Indexing and rich query API for easy searching and sorting

    db.archives.find({ "country": "Egypt" });
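The query on slide 19 matches both documents even though they share only some fields. The following sketch imitates that equality match over plain Python dicts to show why differently shaped records can live in one collection. It is not the MongoDB query engine, the `find` function here is a hypothetical stand-in, and the third record is invented:

```python
archives = [
    {"ISBN": "00e8da9b", "type": "Book", "country": "Egypt", "title": "Ancient Egypt"},
    {"type": "Artefact", "medium": "Ceramic", "country": "Egypt", "year": "3000 BC"},
    {"type": "Book", "country": "Greece", "title": "Hypothetical extra record"},
]


def find(collection, query):
    """Return documents whose fields equal every key/value pair in the query."""
    return [
        doc
        for doc in collection
        if all(key in doc and doc[key] == value for key, value in query.items())
    ]


egypt = find(archives, {"country": "Egypt"})
books = find(archives, {"country": "Egypt", "type": "Book"})
print([doc["type"] for doc in egypt])
```

A document that lacks a queried field simply does not match, so no placeholder columns are needed for fields such as `medium` or `ISBN`.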
  20. Content Management
    News Site
      •  Comments and user generated content
      •  Personalization of content, layout
    Multi-device rendering
      •  Generate layout on the fly
      •  No need to cache static pages
    Sharing
      •  Store large objects
      •  Simpler modeling of metadata
  21. Content Management
    { camera: "Nikon d4",
      location: [ -122.418333, 37.775 ]
    }

    { camera: "Canon 5d mkII",
      people: [ "Jim", "Carol" ],
      taken_on: ISODate("2012-03-07T18:32:35.002Z")
    }

    { origin: "facebook.com/photos/xwdf23fsdf",
      license: "Creative Commons CC0",
      size: {
        dimensions: [ 124, 52 ],
        units: "pixels"
      }
    }

    •  Flexible data model for similar but different objects
    •  Horizontal scalability for large data sets
    •  Geo spatial indexing for location-based searches
    •  GridFS for large object storage
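Slide 21 lists location-based searches among the benefits. The sketch below shows what such a search computes, using a brute-force scan with the haversine formula rather than a geospatial index. It assumes the `location` array is ordered longitude, latitude; the third photo, the search point and the function names are invented for the example:

```python
import math


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two lon/lat points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


photos = [
    {"camera": "Nikon d4", "location": [-122.418333, 37.775]},
    {"camera": "Canon 5d mkII", "people": ["Jim", "Carol"]},      # no location field
    {"camera": "Hypothetical", "location": [-0.1276, 51.5072]},   # invented record
]


def near(docs, lon, lat, max_km):
    """Return docs that have a location within max_km, nearest first."""
    hits = []
    for doc in docs:
        loc = doc.get("location")
        if loc is None:
            continue  # documents without a location are skipped, not rejected
        dist = haversine_km(lon, lat, loc[0], loc[1])
        if dist <= max_km:
            hits.append((dist, doc))
    hits.sort(key=lambda pair: pair[0])
    return [doc for _, doc in hits]


nearby = near(photos, -122.4, 37.78, 10)
print([doc["camera"] for doc in nearby])
```

A geospatial index answers the same question without visiting every document, which matters once the collection no longer fits a linear scan.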
  22. Application ➜ Why MongoDB might be a good fit
    Large number of objects to store
      •  Sharding lets you split objects across multiple servers
    High write or read throughput
      •  Sharding + Replication lets you scale read and write traffic across multiple servers
    Low latency access
      •  Memory mapped storage engine caches documents in RAM, enabling in-memory performance
      •  Data locality of documents can significantly improve latency over join-based approaches
    Variable data in objects
      •  Dynamic schema and JSON data model enable flexible data storage without sparse tables or complex joins
    Cloud based deployment
      •  Sharding and replication let you work around hardware limitations in clouds
  23. Stores user and location-based data in MongoDB for social networking mobile app
    Case Study
    Problem
      •  Relational architecture could not scale
      •  Check-in data growth hit single-node capacity ceiling
      •  Significant work to build custom sharding layer
    Why MongoDB
      •  Auto-sharding to scale high-traffic and fast-growing application
      •  Geo-indexing for easy querying of location-based data
      •  Simple data model
    Results
      •  Focus engineering on building mobile app vs. back-end
      •  Scale efficiently with limited resources
      •  Increased developer productivity
  24. Serves targeted content to users using MongoDB-powered identity system
    Case Study
    Problem
      •  20M+ unique visitors per month
      •  Rigid relational schema unable to evolve with changing data types and new features
      •  Slow development cycles
    Why MongoDB
      •  Easy-to-manage dynamic data model enables limitless growth, interactive content
      •  Support for ad hoc queries
      •  Highly extensible
    Results
      •  Rapid rollout of new features
      •  Customized, social conversations throughout site
      •  Tracks user data to increase engagement, revenue
  25. Real-time server and website monitoring solution runs on MongoDB
    Case Study
    Problem
      •  Needed to handle thousands of requests per second
      •  MySQL resulted in millions of rows per month, per server
      •  Difficult to scale MySQL with replication
    Why MongoDB
      •  General purpose DB
      •  High-write throughput
      •  Scales easily while maintaining performance
      •  Easy-to-use replication and automated failover
      •  Native PHP and Python drivers
    Results
      •  MongoDB-first policy
      •  12+ TB ingested per month
      •  Increased performance, decreased disk usage
      •  Simplified infrastructure cuts costs, frees up resources for dev
  26. Uses MongoDB to safeguard over 6 billion images served to millions of customers
    Case Study
    Problem
      •  6B images, 20TB of data
      •  Brittle code base on top of Oracle database – hard to scale, add features
      •  High SW and HW costs
    Why MongoDB
      •  JSON-based data model
      •  Agile, high performance, scalable
      •  Alignment with Shutterfly’s services-based architecture
    Results
      •  80% cost reduction
      •  900% performance improvement
      •  Faster time-to-market
      •  Dev. cycles in weeks vs. tens of months
  27. Stores 3.5 TB of data in MongoDB to power real-time dictionary
    Case Study
    Problem
      •  Performance roadblocks with MySQL
      •  Massive data ingestion led to database outages
      •  Tables locked for tens of seconds during inserts
    Why MongoDB
      •  Easy to store, locate, retrieve data
      •  Eliminated Memcached while increasing performance: up to 2M requests per hour, 8,000 words inserted per second
      •  Long runway for scale-out
    Results
      •  Migrated 5B records in 1 day, zero downtime
      •  Reduced code by 75%
      •  Sped up document metadata retrieval from 30 ms to 0.1 ms
      •  Significant cost savings, 15% reduction in servers