MongoDB: Architecture and Use Cases

Slide 1

Slide 1 text

MongoDB: Architecture and Use Cases
Norberto Leite, Senior Solutions Architect, MongoDB
#mongodubai

Slide 2

Slide 2 text

Agenda

Slide 3

Slide 3 text

Agenda
•  MongoDB
•  Architecture
•  Use Cases

Slide 4

Slide 4 text

MongoDB

Slide 5

Slide 5 text

Global Community
•  5,000,000+ MongoDB Downloads
•  100,000+ Online Education Registrants
•  20,000+ MongoDB User Group Members
•  20,000+ MongoDB Days Attendees
•  20,000+ MongoDB Management Service (MMS) Users

Slide 6

Slide 6 text

NoSQL Features

Flexible Data Models
•  Lists, embedded objects
•  Sparse data
•  Semi-structured data
•  Agile development

High Data Throughput
•  Reads
•  Writes

Big Data
•  Aggregate Data Size
•  Number of Objects

Low Latency
•  For reads and writes
•  Millisecond Latency

Cloud Computing
•  Runs everywhere
•  No special hardware

Commodity Hardware
•  Ethernet
•  Local data storage

•  JSON Based
•  Dynamic Schemas
•  Replica Sets to scale reads
•  Sharding to scale writes
•  1000s of shards in a single DB
•  Data partitioning
•  Designed for "typical" OS and local file system
•  Scale-out to overcome hardware limitations
•  In-memory cache
•  Scale-out working set

Slide 7

Slide 7 text

Document Data Model
Relational
MongoDB
{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123, 47.232],
  cars: [
    { model: 'Bentley', year: 1973, value: 100000, … },
    { model: 'Rolls Royce', year: 1965, value: 330000, … }
  ]
}
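
As a rough illustration of what the embedded model buys you, the sketch below treats the slide's document as a plain JavaScript object and reads the nested `cars` array directly. It runs without a database; the elided fields ("…") are simply left out, and the variable names are assumptions made for the example.

```javascript
// Sample document mirroring the slide; elided fields ("…") are omitted.
const person = {
  first_name: "Paul",
  surname: "Miller",
  city: "London",
  location: [45.123, 47.232],
  cars: [
    { model: "Bentley", year: 1973, value: 100000 },
    { model: "Rolls Royce", year: 1965, value: 330000 },
  ],
};

// Embedded data is read by walking the document itself;
// there is no separate "cars" table to join against.
const carModels = person.cars.map((car) => car.model);
const totalCarValue = person.cars.reduce((sum, car) => sum + car.value, 0);

console.log(carModels);
console.log(totalCarValue); // 430000
```

In a relational layout the same read would typically need a join between a person table and a cars table.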

Slide 8

Slide 8 text

Terminology
RDBMS ➜ MongoDB
Table, View ➜ Collection
Row ➜ Document
Index ➜ Index
Join ➜ Embedded Document
Foreign Key ➜ Reference
Partition ➜ Shard

Slide 9

Slide 9 text

Typical (relational) ERD

User
·Name
·Email address

Category
·Name
·URL

Comment
·Comment
·Date
·Author

Article
·Name
·Slug
·Publish date
·Text

Tag
·Name
·URL

Slide 10

Slide 10 text

MongoDB ERD

User
·Name
·Email address

Article
·Name
·Slug
·Publish date
·Text
·Author

Comment[]
·Comment
·Date
·Author

Tag[]
·Value

Category[]
·Value
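
One way to read this ERD is that comments, tags and categories become arrays embedded in the article, while the author stays a reference to a user. The sketch below models that with plain JavaScript objects. All values, the `u1`/`u2` identifiers and the `resolveAuthor` helper are hypothetical, introduced only for illustration.

```javascript
// Hypothetical sample data; field names follow the slide, values are made up.
const users = new Map([
  ["u1", { _id: "u1", name: "Ada", email: "ada@example.com" }],
  ["u2", { _id: "u2", name: "Grace", email: "grace@example.com" }],
]);

const article = {
  name: "Scaling reads",
  slug: "scaling-reads",
  publishDate: new Date("2014-01-15T00:00:00Z"),
  text: "Body text",
  author: "u1", // reference to a user, resolved by the application
  comments: [
    { comment: "Nice overview", date: new Date("2014-01-16T00:00:00Z"), author: "u2" },
  ],
  tags: ["mongodb", "replication"],
  categories: ["databases"],
};

// Follow the reference by looking the user up; returns null if it is missing.
function resolveAuthor(doc, userIndex) {
  return userIndex.get(doc.author) ?? null;
}

const author = resolveAuthor(article, users);
console.log(author.name, article.comments.length, article.tags);
```

Embedding suits data that is read together with its parent; a reference suits data shared across many documents, such as users.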

Slide 11

Slide 11 text

MongoDB has native bindings for over 12 languages

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

Replication

Slide 15

Slide 15 text

Replication
[Diagram: Client Application with Driver; one Primary and two Secondaries; arrows labeled Write, Read, Read]

Slide 16

Slide 16 text

Replica Set – Failover
[Diagram: Node 1 (Secondary), Node 2 (Primary), Node 3; links labeled Replication and Heartbeat]

Slide 17

Slide 17 text

Scalability

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

Vertical Scale

Slide 20

Slide 20 text

Horizontal Scale

Slide 21

Slide 21 text

Sharding @ www.etiennemansard.com

Slide 22

Slide 22 text

Horizontal Scalability (Scale Out)

Slide 23

Slide 23 text

Sharding infrastructure
[Diagram: three App Servers, each with a Mongos; three Config Servers; three Shards, each labeled Node 1 and Secondary]
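
The routing role of the Mongos in this picture can be sketched as a lookup from shard key to shard. The following is a simplified model, not MongoDB's implementation: the chunk table, the shard names and both function names are assumptions, and real chunk metadata lives on the config servers.

```javascript
// Hypothetical chunk table: each chunk covers [min, max) of the shard key
// and is owned by exactly one shard.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: 2000, shard: "shardB" },
  { min: 2000, max: Infinity, shard: "shardC" },
];

// Route a single-key operation to the one shard that owns the key.
function routeByShardKey(key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  return chunk.shard;
}

// A range query on keys lo..hi (inclusive) only needs the shards whose chunks overlap it.
function shardsForRange(lo, hi) {
  return chunks.filter((c) => c.min <= hi && c.max > lo).map((c) => c.shard);
}

console.log(routeByShardKey(1500));     // shardB
console.log(shardsForRange(500, 1500)); // [ 'shardA', 'shardB' ]
```

A query that does not include the shard key cannot be narrowed this way and has to be sent to every shard.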

Slide 24

Slide 24 text

Full Featured
•  Ad Hoc queries
•  Real time aggregation
•  Rich query capabilities
•  Strongly consistent
•  Geospatial features
•  Support for most programming languages
•  Flexible schema

Slide 25

Slide 25 text

Much More Productive

Slide 26

Slide 26 text

Way More Productive

Slide 27

Slide 27 text

Architecture

Slide 28

Slide 28 text

Single Data Center
•  Automated failover
•  Tolerates server failures
•  Tolerates rack failures
•  Number of replicas defines failure tolerance

Primary – A, Secondary – A, Secondary – A
Primary – B, Secondary – B, Secondary – B
Primary – C, Secondary – C, Secondary – C

Slide 29

Slide 29 text

Active/Standby Data Center
•  Tolerates server and rack failure
•  Standby data center

Data Center - West: Primary – A, Primary – B, Primary – C, Secondary – A, Secondary – B, Secondary – C
Data Center - East: Secondary – A, Secondary – B, Secondary – C

Slide 30

Slide 30 text

Active/Active Data Center
•  Tolerates server, rack, data center failures, network partitions

Data Center - West: Primary – A, Primary – B, Primary – C, Secondary – A, Secondary – B, Secondary – C
Data Center - East: Secondary – A, Secondary – B, Secondary – C, Secondary – B, Secondary – C, Secondary – A
Data Center - Central: Arbiter – A, Arbiter – B, Arbiter – C
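
The difference between the Active/Standby and Active/Active layouts comes down to whether a majority of a replica set's voting members survives the loss of a data center. The sketch below is a simplified model that assumes one vote per member and a strict-majority rule; the per-shard member counts are read off the two slides, and the function and variable names are hypothetical.

```javascript
// Simplified model: every member has one vote, and a primary can only be
// elected while a strict majority of all voting members is still reachable.
function canElectPrimary(membersByDataCenter, failedDataCenter) {
  const total = Object.values(membersByDataCenter).reduce((a, b) => a + b, 0);
  const surviving = total - (membersByDataCenter[failedDataCenter] ?? 0);
  return surviving > total / 2;
}

// Voting members of one shard (for example shard A) in each layout.
const activeStandby = { west: 2, east: 1 };            // primary + secondary, secondary
const activeActive = { west: 2, east: 2, central: 1 }; // plus an arbiter in Central

console.log(canElectPrimary(activeStandby, "west")); // false: 1 of 3 members left
console.log(canElectPrimary(activeActive, "west"));  // true: 3 of 5 members left
```

Under these assumptions the standby layout cannot elect a primary automatically once West is lost, while the three-site layout keeps a majority after losing any single data center.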

Slide 31

Slide 31 text

Global Distribution
[Diagram: one Primary replicating in real time to seven Secondaries]

Slide 32

Slide 32 text

Read Global / Write Local
Primary:NYC, Secondary:NYC, Secondary:NYC
Primary:LON, Secondary:LON, Secondary:LON
Primary:SYD, Secondary:SYD, Secondary:SYD

Slide 33

Slide 33 text

Use Cases

Slide 34

Slide 34 text

High Volume Data Feeds

Machine Generated Data
•  More machines, more sensors, more data
•  Variably structured

Securities Data
•  High frequency trading
•  Daily closing price

Social Media / General Public
•  Multiple data sources
•  Each changes their format consistently
•  Student Scores, Telecom logs

Slide 35

Slide 35 text

High Volume Data Feeds
[Diagram: four Data Sources writing into the cluster]
•  Asynchronous writes
•  Flexible document model can adapt to changes in sensor format
•  Write to memory with periodic disk flush
•  Scale writes over multiple shards

Slide 36

Slide 36 text

Operational Intelligence

Ad Targeting
•  Large volume of users
•  Very strict latency requirements

Real time dashboards
•  Expose data to millions of customers
•  Reports on large volumes of data
•  Reports that update in real time

Social Media Monitoring
•  Join the conversation

Slide 37

Slide 37 text

Operational Intelligence
[Diagram labels: Dashboards, API]
•  Low latency reads
•  Parallelize queries across replicas and shards
•  In-database aggregation
•  Flexible schema adapts to changing input data
•  Can use same cluster to collect, store and report on data

Slide 38

Slide 38 text

Behavioral Profiles
{
  cookie_id: "1234512413243",
  advertiser: {
    apple: {
      actions: [
        { impression: 'ad1', time: 123 },
        { impression: 'ad2', time: 232 },
        { click: 'ad2', time: 235 },
        { add_to_cart: 'laptop', sku: 'asdf23f', time: 254 },
        { purchase: 'laptop', time: 354 }
      ]
      …
1 See Ad, 2 See Ad, 3 Click, 4 Convert
•  Rich profiles collecting multiple complex actions
•  Scale out to support high throughput of activities tracked
•  Dynamic schemas make it easy to
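
To make the "see ad, see ad, click, convert" progression concrete, the sketch below tallies the action types in a profile's `actions` array. It uses only the five actions visible on the slide (whatever the "…" hides is left out); `ACTION_TYPES` and `countActions` are hypothetical names, and this is plain JavaScript rather than a database query.

```javascript
// The five actions shown on the slide; anything elided there is not reproduced.
const actions = [
  { impression: "ad1", time: 123 },
  { impression: "ad2", time: 232 },
  { click: "ad2", time: 235 },
  { add_to_cart: "laptop", sku: "asdf23f", time: 254 },
  { purchase: "laptop", time: 354 },
];

// Each action carries its type as a key, so the type is found by checking
// which of the known keys is present on the object.
const ACTION_TYPES = ["impression", "click", "add_to_cart", "purchase"];

function countActions(actionList) {
  const counts = Object.fromEntries(ACTION_TYPES.map((type) => [type, 0]));
  for (const action of actionList) {
    const type = ACTION_TYPES.find((t) => t in action);
    if (type !== undefined) counts[type] += 1;
  }
  return counts;
}

const funnel = countActions(actions);
console.log(funnel);
```

Because each action is a self-describing object, a new action type can be appended to the array without changing the existing documents.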

Slide 39

Slide 39 text

Metadata

Product Catalogue
•  Diverse product portfolio
•  Complex querying and filtering

Data analysis
•  Data mining

Biometric
•  Retina Scans
•  Fingerprints

Slide 40

Slide 40 text

Metadata
{
  ISBN: "00e8da9b",
  type: "Book",
  country: "Egypt",
  title: "Ancient Egypt"
}
{
  type: "Artefact",
  medium: "Ceramic",
  country: "Egypt",
  year: "3000 BC"
}
•  Flexible data model for similar but different objects
•  Indexing and rich query API for easy searching and sorting

db.archives.find({ "country": "Egypt" });
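
The query on this slide matches both records even though they share only some fields. The sketch below imitates that behavior in plain JavaScript with a tiny equality-only matcher; `findByExample` is a hypothetical stand-in for `db.archives.find`, not the real query engine, and the third record is made up to show a non-match.

```javascript
// The two records from the slide plus one hypothetical non-matching record.
const archives = [
  { ISBN: "00e8da9b", type: "Book", country: "Egypt", title: "Ancient Egypt" },
  { type: "Artefact", medium: "Ceramic", country: "Egypt", year: "3000 BC" },
  { type: "Book", country: "Greece", title: "Hypothetical extra record" },
];

// Equality-only matcher on top-level fields: a document matches when every
// field in the query is present in the document with the same value.
function findByExample(docs, query) {
  return docs.filter((doc) =>
    Object.entries(query).every(([field, value]) => doc[field] === value)
  );
}

const fromEgypt = findByExample(archives, { country: "Egypt" });
console.log(fromEgypt.map((doc) => doc.type)); // [ 'Book', 'Artefact' ]
```

Documents that lack a queried field simply do not match, which is what lets books and artefacts live in the same collection.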

Slide 41

Slide 41 text

Content Management

News Site
•  Comments and user generated content
•  Personalization of content, layout

Multi-device rendering
•  Generate layout on the fly
•  No need to cache static pages

Sharing
•  Store large objects
•  Simpler modeling of metadata

Slide 42

Slide 42 text

Content Management
{
  camera: "Nikon d4",
  location: [ -122.418333, 37.775 ]
}
{
  camera: "Canon 5d mkII",
  people: [ "Jim", "Carol" ],
  taken_on: ISODate("2012-03-07T18:32:35.002Z")
}
{
  origin: "facebook.com/photos/xwdf23fsdf",
  license: "Creative Commons CC0",
  size: {
    dimensions: [ 124, 52 ],
    units: "pixels"
  }
}
•  Flexible data model for similar but different objects
•  Horizontal scalability for large data sets
•  Geo spatial indexing for location-based searches
•  GridFS for large object storage
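
A location-based search over records like these can be approximated with a brute-force distance filter. The sketch below assumes `location` is stored as [longitude, latitude] and uses the haversine formula; it scans every record, so it illustrates the query's meaning rather than how a geospatial index works. The London photo, the search point and the function names are hypothetical.

```javascript
const EARTH_RADIUS_KM = 6371;
const toRadians = (degrees) => (degrees * Math.PI) / 180;

// Great-circle distance between two [longitude, latitude] points (haversine).
function haversineKm([lng1, lat1], [lng2, lat2]) {
  const dLat = toRadians(lat2 - lat1);
  const dLng = toRadians(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

const photos = [
  { camera: "Nikon d4", location: [-122.418333, 37.775] },
  { camera: "Canon 5d mkII", people: ["Jim", "Carol"] }, // no location field
  { camera: "Hypothetical camera", location: [-0.1276, 51.5072] }, // made-up record
];

// Keep only photos that have a location within maxKm of the search point.
function photosNear(docs, point, maxKm) {
  return docs.filter(
    (doc) => Array.isArray(doc.location) && haversineKm(doc.location, point) <= maxKm
  );
}

const nearby = photosNear(photos, [-122.4, 37.78], 10);
console.log(nearby.map((doc) => doc.camera));
```

Records without a `location` are skipped rather than rejected, which fits the slide's point about similar but different objects.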

Slide 43

Slide 43 text

Is my use case a good fit for MongoDB?

Slide 44

Slide 44 text

Application: Why MongoDB might be a good fit
•  Large number of objects to store: Sharding lets you split objects across multiple servers.
•  High write or read throughput: Sharding + Replication lets you scale read and write traffic across multiple servers.
•  Low latency access: Memory mapped storage engine caches documents in RAM, enabling in-memory performance. Data locality of documents can significantly improve latency over join-based approaches.
•  Variable data in objects: Dynamic schema and JSON data model enable flexible data storage without sparse tables or complex joins.
•  Cloud based deployment: Sharding and replication let you work around hardware limitations in clouds.

Slide 45

Slide 45 text

Case Study
Stores user and location-based data in MongoDB for social networking mobile app

Problem
•  Relational architecture could not scale
•  Check-in data growth hit single-node capacity ceiling
•  Significant work to build custom sharding layer

Why MongoDB
•  Auto-sharding to scale high-traffic and fast-growing application
•  Geo-indexing for easy querying of location-based data
•  Simple data model

Results
•  Focus engineering on building mobile app vs. back-end
•  Scale efficiently with limited resources
•  Increased developer productivity

Slide 46

Slide 46 text

Case Study
Serves targeted content to users using MongoDB-powered identity system

Problem
•  20M+ unique visitors per month
•  Rigid relational schema unable to evolve with changing data types and new features
•  Slow development cycles

Why MongoDB
•  Easy-to-manage dynamic data model enables limitless growth, interactive content
•  Support for ad hoc queries
•  Highly extensible

Results
•  Rapid rollout of new features
•  Customized, social conversations throughout site
•  Tracks user data to increase engagement, revenue

Slide 47

Slide 47 text

Case Study
Real-time server and website monitoring solution runs on MongoDB

Problem
•  Needed to handle thousands of requests per second
•  MySQL resulted in millions of rows per month, per server
•  Difficult to scale MySQL with replication

Why MongoDB
•  General purpose DB
•  High-write throughput
•  Scales easily while maintaining performance
•  Easy-to-use replication and automated failover
•  Native PHP and Python drivers

Results
•  MongoDB-first policy
•  12+ TB ingested per month
•  Increased performance, decreased disk usage
•  Simplified infrastructure cuts costs, frees up resources for dev

Slide 48

Slide 48 text

Case Study
Uses MongoDB to safeguard over 6 billion images served to millions of customers

Problem
•  6B images, 20TB of data
•  Brittle code base on top of Oracle database – hard to scale, add features
•  High SW and HW costs

Why MongoDB
•  JSON-based data model
•  Agile, high performance, scalable
•  Alignment with Shutterfly's services-based architecture

Results
•  80% cost reduction
•  900% performance improvement
•  Faster time-to-market
•  Dev. cycles in weeks vs. tens of months

Slide 49

Slide 49 text

Case Study
Stores 3.5 TB of data in MongoDB to power real-time dictionary

Problem
•  Performance roadblocks with MySQL
•  Massive data ingestion led to database outages
•  Tables locked for tens of seconds during inserts

Why MongoDB
•  Easy to store, locate, retrieve data
•  Eliminated Memcached while increasing performance: up to 2M requests per hour, 8,000 words inserted per second
•  Long runway for scale-out

Results
•  Migrated 5B records in 1 day, zero downtime
•  Reduced code by 75%
•  Sped up document metadata retrieval from 30 ms to 0.1 ms
•  Significant cost savings, 15% reduction in servers