Webinar - High Volume Data Feeds

1 August 2012 High Volume Data Feeds

2 Agenda
• Brief overview of MongoDB
• Challenges for high volume data feeds
• How you can use MongoDB to solve them
• Examples of real world scenarios

4 New Architectures
• Systems scaling horizontally, not vertically
• Commodity servers
• Cloud computing
Volume and Type of Data
• Trillions of records
• 10's of millions of queries per second
• Volume of data
• Semi-structured and unstructured data
Agile Development
• Iterative & continuous
• New and emerging apps

5 Costs and Productivity
Cost of database increases
• Increased database licensing cost
• Vertical, not horizontal, scaling
• High cost of SAN
Developer productivity decreases
• Needed to add new software layers of ORM, caching, sharding, and message queue
• Polymorphic, semi-structured and unstructured data not well supported
• Increased complexity, lowering productivity

6
• Document-oriented storage
  • Based on JSON documents
  • Schema-less
• Scalable architecture
  • Auto-sharding
  • Replication & high availability
• Open source, written in C++
• Key features include:
  • Full featured indexes
  • Query language
  • Map/Reduce & aggregation

7 [Diagram: sharded cluster architecture with three mongos routers, three config servers, and three shards, each shard a set of three mongod processes]
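In this architecture, applications talk to a mongos router, which uses cluster metadata held on the config servers to decide which shard owns a given document. A rough sketch of that routing idea, assuming range-partitioned chunks on a numeric shard key: the chunk table, shard names, and `routeToShard` are hypothetical, and this is a toy model rather than mongos code.

```javascript
// Hypothetical chunk table, sorted by `min`. Each chunk owns shard-key
// values in the half-open range [min, max). In a real cluster this
// metadata lives on the config servers and is cached by the routers.
const chunkTable = [
  { min: -Infinity, max: 1000, shard: "shard0" },
  { min: 1000, max: 5000, shard: "shard1" },
  { min: 5000, max: Infinity, shard: "shard2" },
];

// Binary search for the chunk whose range contains `key`.
function routeToShard(table, key) {
  let lo = 0;
  let hi = table.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const chunk = table[mid];
    if (key < chunk.min) hi = mid - 1;
    else if (key >= chunk.max) lo = mid + 1;
    else return chunk.shard;
  }
  throw new Error(`no chunk covers key ${key}`);
}

console.log(routeToShard(chunkTable, 42)); // shard0
console.log(routeToShard(chunkTable, 1000)); // shard1 (lower bound is inclusive)
console.log(routeToShard(chunkTable, 99999)); // shard2
```

Because the lookup only needs the chunk table, any number of routers can run side by side, which is why the diagram shows several mongos processes.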

8 General Purpose
• Rich data model
• Full featured indexes
• Multiple data interfaces
Easy to Use
• Simple to set up and manage
• Native language drivers in all popular languages
• Easy mapping to object-oriented code
Fast & Scalable
• Dynamically add / remove capacity with no downtime
• Auto-sharding built in
• Operates at in-memory speed wherever possible

10
• Server metrics
• Social media
• Financial data
• Web click stream

11 Challenges
• Continuous arrival of data
• Costly to scale disks to accommodate high rates of small writes
• Can’t apply back pressure to the feed
[Diagram: a continuous stream of events flowing into storage]

12 Challenges
• Adding more storage over time
• Aging out data that’s no longer needed
• Minimizing resource overhead of “cold” data
[Diagram: recent data on fast storage, old data on archival storage, with capacity added over time]

13 Challenges
• Data in feed can evolve over time
• Can’t take system down when format changes
[Diagram: records arriving over time: a=1 b=2; a=3 b=4; a=5 b=6 c=7 (“c” added to records); a=‘foo’ b=8 c=9 (“a” changed to a string)]

14 Challenges
• Query and filter data without transformation
• Low latency access to data
• Workload isolation
[Diagram: the data feed writes to storage while a client sends queries to the same storage]

16
• Spread writes across multiple shards
• Linearly scale write capacity of cluster
[Diagram: a stream of events passing through mongos and spread across three shards]
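Spreading writes only pays off if new events do not all map to the same place. With contiguous key ranges and an ever-increasing key (a timestamp or a counter), every new event falls into the highest range, so one shard takes the whole insert load. The toy simulation below compares that with hashing the key. The helpers `rangeShard`, `hashShard`, and `distribute` are hypothetical, and FNV-1a is only a stand-in hash, not the scheme MongoDB uses.

```javascript
const SHARDS = 3;

// 32-bit FNV-1a over the key's decimal string (illustrative hash only).
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

// Range placement: every key >= 2000 belongs to the last shard, so a feed
// with increasing keys keeps writing to that one shard.
function rangeShard(key) {
  if (key < 1000) return 0;
  if (key < 2000) return 1;
  return 2;
}

// Hash placement: neighbouring keys scatter across shards.
function hashShard(key) {
  return fnv1a(String(key)) % SHARDS;
}

// Count how many writes each shard receives for a burst of increasing keys.
function distribute(placeFn, firstKey, count) {
  const counts = new Array(SHARDS).fill(0);
  for (let k = firstKey; k < firstKey + count; k++) counts[placeFn(k)]++;
  return counts;
}

const burst = { first: 10000, count: 3000 };
console.log("range:", distribute(rangeShard, burst.first, burst.count)); // all 3000 writes on shard 2
console.log("hash: ", distribute(hashShard, burst.first, burst.count)); // spread over all three shards
```

The trade-off of the hashed layout is that a query over a key range (for example "the last five minutes") can no longer be sent to a single shard.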

17
• Writes buffered in RAM and periodically written to disk
• Asynchronous writes decouple app from storage
[Diagram: a server with RAM and disk; the write is acknowledged (“ok”) before it reaches disk]
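A minimal sketch of the write-behind idea on this slide, assuming a hypothetical `BufferedWriter` whose `sink` callback stands in for the disk: `write` acknowledges as soon as the document is in memory, and whole batches reach the sink when the buffer fills or when `flush` runs (for example on a timer). This is not how mongod is implemented. Anything still in `buffer` would be lost on a crash; that is the durability trade-off MongoDB's journaling and write concern settings exist to manage.

```javascript
class BufferedWriter {
  constructor(sink, maxBatch = 100) {
    this.sink = sink; // stands in for disk: receives whole batches
    this.maxBatch = maxBatch; // flush early once this many writes are pending
    this.buffer = []; // writes held in memory, not yet durable
  }

  // Accept a write into memory and acknowledge immediately; the caller
  // never waits on the sink.
  write(doc) {
    this.buffer.push(doc);
    if (this.buffer.length >= this.maxBatch) this.flush();
    return { ok: 1 };
  }

  // Hand all pending writes to the sink as one batch; returns batch size.
  flush() {
    if (this.buffer.length === 0) return 0;
    const batch = this.buffer;
    this.buffer = [];
    this.sink(batch);
    return batch.length;
  }

  // Periodic flushing; returns a function that stops the timer.
  startAutoFlush(intervalMs) {
    const timer = setInterval(() => this.flush(), intervalMs);
    return () => clearInterval(timer);
  }
}

const disk = [];
const writer = new BufferedWriter((batch) => disk.push(batch), 4);
for (let i = 0; i < 10; i++) writer.write({ seq: i });
console.log(disk.length, writer.buffer.length); // 2 2
writer.flush();
console.log(disk.length); // 3
// In a long-running process: const stop = writer.startAutoFlush(1000);
```

Batching turns many small writes into fewer, larger ones, which is the point of slide 11's complaint about scaling disks for high rates of small writes.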

18
• RAM acts as LRU cache
• Recent data is in memory
• Old data is on disk
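MongoDB of this era does not implement this cache policy itself: it memory-maps its data files and leaves it to the operating system's virtual memory manager, which roughly approximates LRU, to decide which pages stay in RAM. The sketch below only illustrates the LRU policy. `LRUCache`, `diskStore`, and `readEvent` are hypothetical; the cache relies on a JavaScript `Map` keeping insertion order, so the first key is always the least recently used.

```javascript
class LRUCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.map = new Map(); // iteration order = least to most recently used
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    // Re-insert to mark the entry as most recently used.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      // Evict the least recently used entry (the first key in the Map).
      const oldest = this.map.keys().next().value;
      this.map.delete(oldest);
    }
  }
}

// Hypothetical "disk": every miss in the cache costs one disk read.
const diskStore = new Map([["e1", { a: 1 }], ["e2", { a: 3 }], ["e3", { a: 5 }]]);
let diskReads = 0;

function readEvent(cache, id) {
  let doc = cache.get(id);
  if (doc === undefined) {
    diskReads++;
    doc = diskStore.get(id);
    if (doc !== undefined) cache.set(id, doc);
  }
  return doc;
}

const cache = new LRUCache(2);
for (const id of ["e1", "e2", "e1", "e3"]) readEvent(cache, id);
console.log([...cache.map.keys()]); // [ 'e1', 'e3' ]  (e2 was least recently used)
console.log(diskReads); // 3
```

For a feed, reads concentrate on recent events, so a working set sized to the recent window stays in RAM while older data is only read from disk when requested.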

19
• Accommodate changes in feed protocol
• Zero downtime for feed protocol upgrades

    > db.events.save( { a:1, b:2 } )
    > db.events.save( { a:3, b:4 } )
    > db.events.save( { a:5, b:6, c:7 } )
    > db.events.save( { a:"foo", b:8, c:9 } )
    > db.events.find()
    { "_id" : ObjectId("501a2e263520cae8d164eabd"), "a" : 1, "b" : 2 }
    { "_id" : ObjectId("501a2e263520cae8d164eabe"), "a" : 3, "b" : 4 }
    { "_id" : ObjectId("501a2e263520cae8d164eabf"), "a" : 5, "b" : 6, "c" : 7 }
    { "_id" : ObjectId("501a2e443520cae8d164eac0"), "a" : "foo", "b" : 8, "c" : 9 }
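Because old and new records sit in the same collection, it is the reader that has to tolerate both formats. A small sketch over the four documents saved in the shell session above (without `_id`): the filters mirror the shell queries `db.events.find({ c: { $exists: true } })` and `db.events.find({ a: { $type: 2 } })` (type 2 is string), while `normalize` is a hypothetical helper showing one way application code might present a uniform shape.

```javascript
// The documents saved in the shell session (without _id).
const events = [
  { a: 1, b: 2 },
  { a: 3, b: 4 },
  { a: 5, b: 6, c: 7 },
  { a: "foo", b: 8, c: 9 },
];

// Like { c: { $exists: true } }: only records written after "c" was added.
const withC = events.filter((doc) => "c" in doc);

// Like { a: { $type: 2 } }: records where "a" has become a string.
const stringA = events.filter((doc) => typeof doc.a === "string");

// Aggregate only over records where "a" is still numeric.
const numericA = events.filter((doc) => typeof doc.a === "number");
const sumA = numericA.reduce((sum, doc) => sum + doc.a, 0);

// Hypothetical normalizer: fill in the field older records never had,
// without rewriting anything already stored.
function normalize(doc) {
  return {
    a: doc.a,
    aIsNumber: typeof doc.a === "number",
    b: doc.b,
    c: doc.c ?? null, // absent in records written before "c" existed
  };
}

console.log(withC.length, stringA.length, sumA); // 2 1 9
console.log(events.map(normalize));
```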

20
• Writes always go to the primary of a shard
• Queries can be sent only to secondaries by using a read preference
• Tags can be used to isolate workloads to different replicas
[Diagram: one shard as a replica set of a primary and three secondary mongod processes; writes go to the primary, queries go to the secondaries]
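The read preference modes `primary` and `secondary` and replica set tag sets are real MongoDB concepts; in the mongo shell, for instance, `cursor.readPref("secondary", [{ use: "analytics" }])` requests them for one query. The member list, host names, tag values, and `selectMember` below are hypothetical, and the logic is simplified: real drivers also support `primaryPreferred`, `secondaryPreferred`, and `nearest`, and typically choose among eligible members by latency rather than at random.

```javascript
// Hypothetical replica set for one shard: the primary takes the feed's
// writes, and one secondary is tagged so analytics queries stay off the others.
const members = [
  { host: "db1:27017", state: "PRIMARY", tags: { use: "ingest" } },
  { host: "db2:27017", state: "SECONDARY", tags: { use: "serving" } },
  { host: "db3:27017", state: "SECONDARY", tags: { use: "serving" } },
  { host: "db4:27017", state: "SECONDARY", tags: { use: "analytics" } },
];

// A member matches if it carries every tag in the requested tag set.
function matchesTags(member, tagSet) {
  return Object.entries(tagSet).every(([k, v]) => member.tags[k] === v);
}

// Simplified selection covering writes plus the "primary" and "secondary"
// read preference modes.
function selectMember(replicaSet, { op, mode = "primary", tags = {} }) {
  if (op === "write" || mode === "primary") {
    const primary = replicaSet.find((m) => m.state === "PRIMARY");
    if (!primary) throw new Error("no primary available");
    return primary;
  }
  if (mode === "secondary") {
    const eligible = replicaSet.filter((m) => m.state === "SECONDARY" && matchesTags(m, tags));
    if (eligible.length === 0) throw new Error("no secondary matches the tag set");
    return eligible[Math.floor(Math.random() * eligible.length)];
  }
  throw new Error(`mode not modeled in this sketch: ${mode}`);
}

console.log(selectMember(members, { op: "write" }).host); // db1:27017
console.log(selectMember(members, { op: "read", mode: "secondary", tags: { use: "analytics" } }).host); // db4:27017
console.log(selectMember(members, { op: "read", mode: "secondary", tags: { use: "serving" } }).host); // db2 or db3
```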

22 Wordnik uses MongoDB as the foundation for its “live” dictionary that stores its entire text corpus – 3.5T of data in 20 billion records
Problem
§ Analyze a staggering amount of data for a system built on a continuous stream of high-quality text pulled from online sources
§ Adding too much data too quickly resulted in outages; tables locked for tens of seconds during inserts
§ Initially launched entirely on MySQL but quickly hit performance roadblocks
Why MongoDB
§ Migrated 5 billion records in a single day with zero downtime
§ MongoDB powers every website request: 20m API calls per day
§ Ability to eliminate the memcached layer, creating a simplified system that required fewer resources and was less prone to error
Impact
§ Reduced code by 75% compared to MySQL
§ Fetch time cut from 400ms to 60ms
§ Sustained insert speed of 8k words per second, with frequent bursts of up to 50k per second
§ Significant cost savings and 15% reduction in servers
“Life with MongoDB has been good for Wordnik. Our code is faster, more flexible and dramatically smaller. Since we don’t spend time worrying about the database, we can spend more time writing code for our application.” - Tony Tam, Vice President of Engineering and Technical Co-founder

23 Intuit relies on a MongoDB-powered real-time analytics tool for small businesses to derive interesting and actionable patterns from their customers’ website traffic
Problem
§ Intuit hosts more than 500,000 websites
§ Wanted to collect and analyze data to recommend conversion and lead generation improvements to customers
§ With 10 years’ worth of user data, it took several days to process the information using a relational database
Why MongoDB
§ Cope with high rate of clickstream traffic
§ Easy to build new features and extend the product
§ Large community provided support and responsiveness, even without a commercial support contract
Impact
§ In one week Intuit was able to become proficient in MongoDB development
§ Developed application features more quickly for MongoDB than for relational databases
§ MongoDB was 2.5 times faster than MySQL
“We did a prototype for one week, and within one week we had made big progress. Very big progress. It was so amazing that we decided, ‘Let’s go with this.’” - Nirmala Ranganathan, Intuit

24 More info: http://10gen.com/use-case/high-volume-data-feeds
Thanks!


mongodb
