Slide 1

Slide 1 text

An Approach to Design Large Scale Data Centric Architecture Using MongoDB
By Sushmitha Diwakar, Aruljothi Annamalai, Clarence J M Tauro
Department of Computer Science, Christ University, Hosur Road, Bangalore
5th National Conference on Emerging Trends in IT, 26th February 2014

Slide 2

Slide 2 text

Objectives
• What Is Scale?
• How is/was Scale achieved? Traditional Way
• Scaling Today
• Introduction to NoSQL
• MongoDB
• Scaling with MongoDB
• Replication in MongoDB
• Search Design

Slide 3

Slide 3 text

What Is Scale?
• How well a solution to a problem works as the size of the problem increases
• Massive adoption/usage

Slide 4

Slide 4 text

How is/was Scale achieved? Traditional Way
• Fewer joins; fewer triggers
• Denormalize as much as possible
• Horizontal/vertical replication
• Increase hardware
• Traditional RDBMS; use ORMs like Hibernate
• Manual process – the developer's job

Slide 5

Slide 5 text

Scaling Today
• Many more persistence options
• Cloud-based architectures – completely abstract the underlying hardware from the developer
• Use PaaS – CloudFoundry from Pivotal
• Fewer developers

Slide 6

Slide 6 text

Example: Scaling with CloudFoundry v2
  cf scale appName --instances 10

Slide 7

Slide 7 text

Introduction to NoSQL
• NoSQL stands for
  – “NoSQL” = “No SQL” = not using a traditional relational DBMS
  – “No SQL” = don’t use the SQL language
  – No joins
• Usually does not require a fixed table schema
• All NoSQL offerings relax one or more of the ACID properties

Slide 8

Slide 8 text

MongoDB
• MongoDB (from “humongous”)
• Cross-platform, schemaless, document-oriented NoSQL database
• MongoDB uses BSON (JSON-like structure)
• Features include:
  – File storage
  – Indexing
  – Scaling
  – Replication

Slide 9

Slide 9 text

Sample MongoDB Document
{
  _id : ObjectId("4e77bb3b8a3e000000004f7a"),
  when : Date("2014-02-26T02:10:11.3Z"),
  author : "arul",
  title : "MongoDB",
  text : "This is the text of the post",
  tags : [ "JSON", "BSON" ],
  votes : 5,
  voters : [ "sushmita", "clarence", "jothi" ]
}

Slide 10

Slide 10 text

Scaling – Larger Level
• Prefer simpler architectures
• Completely break down the workload
• Fine-tune your workload
• Do NOT use an ORM – unless you really want to
  – Use simpler standards – Spring’s JdbcTemplate
• Use smaller, fine-grained components to deploy your application
• Shard
• Replicate

Slide 11

Slide 11 text

Scaling – Micro Level
• Multiple documents vs. nested documents
• Indexing
  – Need to have the right number of indexes
  – Every extra index must be updated on each write, so too many indexes slow the DB down, especially MongoDB
• Transactions vs. compensating transactions
  – JTA transactions are highly discouraged
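The idea of a compensating transaction can be sketched in plain JavaScript. This is a minimal illustration, not code from the talk: the `accounts` map, `debit`, `credit`, `runWithCompensation`, and `transfer` are all hypothetical names, and an in-memory `Map` stands in for separate collections that share no transaction. Each step is paired with an action that undoes it, and a failure triggers the undo actions of the completed steps in reverse order.

```javascript
// Toy in-memory "collection"; assumed stand-in for data stores with no shared transaction.
const accounts = new Map([["arul", 100], ["jothi", 20]]);

function debit(name, amount) {
  if (!accounts.has(name) || accounts.get(name) < amount) {
    throw new Error("cannot debit " + name);
  }
  accounts.set(name, accounts.get(name) - amount);
}

function credit(name, amount) {
  if (!accounts.has(name)) throw new Error("unknown account " + name);
  accounts.set(name, accounts.get(name) + amount);
}

// Run steps in order; if one fails, undo the completed steps in reverse order.
function runWithCompensation(steps) {
  const done = [];
  try {
    for (const step of steps) {
      step.action();
      done.push(step);
    }
    return true;
  } catch (err) {
    for (const step of done.reverse()) step.compensate();
    return false;
  }
}

// A transfer is two independent writes, each with its own compensation.
function transfer(from, to, amount) {
  return runWithCompensation([
    { action: () => debit(from, amount), compensate: () => credit(from, amount) },
    { action: () => credit(to, amount), compensate: () => debit(to, amount) },
  ]);
}
```

Calling `transfer("arul", "nobody", 10)` debits the first account, fails on the unknown second account, and then credits the first account back, so the data ends up as it started without any distributed lock.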

Slide 12

Slide 12 text

Nested/Embedded Data Model - MongoDB
Single I/O – or at least stored in contiguous blocks

Slide 13

Slide 13 text

Normalized Data Model - MongoDB
How many I/Os? Well, it depends on the storage
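The difference between the two data models can be sketched with in-memory data. This is an assumed example, not taken from the slides: the post and comment fields, `loadEmbedded`, `loadNormalized`, and the `reads` counter are hypothetical, and the counter only approximates round trips to the database rather than real disk I/O.

```javascript
// Embedded model: the comments live inside the post document.
const embeddedPosts = new Map([
  ["p1", {
    _id: "p1",
    title: "MongoDB",
    comments: [{ by: "clarence", text: "Nice" }, { by: "jothi", text: "+1" }],
  }],
]);

// Normalized model: posts and comments are separate collections linked by postId.
const posts = new Map([["p1", { _id: "p1", title: "MongoDB" }]]);
const comments = [
  { _id: "c1", postId: "p1", by: "clarence", text: "Nice" },
  { _id: "c2", postId: "p1", by: "jothi", text: "+1" },
];

let reads = 0; // counts collection accesses, a rough stand-in for round trips

function loadEmbedded(id) {
  reads += 1; // one lookup returns the post together with its comments
  return embeddedPosts.get(id);
}

function loadNormalized(id) {
  reads += 1; // first lookup: the post itself
  const post = posts.get(id);
  reads += 1; // second lookup: the comments that reference the post
  const postComments = comments
    .filter((c) => c.postId === id)
    .map(({ by, text }) => ({ by, text }));
  return { ...post, comments: postComments };
}
```

Both functions return the same post with the same comments, but the normalized version needs one access per collection involved, which is why the answer to "how many I/Os?" grows with the number of references.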

Slide 14

Slide 14 text

Scaling – Shard Keys
• Sharding is the process of storing data records across multiple machines and is MongoDB’s approach to meeting the demands of data growth
  – MongoDB does range-based sharding on the shard key
  – Sharding can increase the number of queries: a query that does not include the shard key has to be sent to every shard
• Figure out the most common use case and then decide on the shard key
  – Do this at design time
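Range-based routing on a shard key can be sketched as follows. This is a toy model, not MongoDB's router: the chunk boundaries, the shard names, the choice of `author` as shard key, and the functions `shardFor` and `targetShards` are all assumptions made for illustration.

```javascript
// Hypothetical chunk ranges on the shard key "author": [min, max) -> shard name.
const chunks = [
  { min: "", max: "h", shard: "shard0" },
  { min: "h", max: "q", shard: "shard1" },
  { min: "q", max: "\uffff", shard: "shard2" },
];

// Find the shard whose range contains the given key value.
function shardFor(keyValue) {
  const chunk = chunks.find((c) => keyValue >= c.min && keyValue < c.max);
  return chunk.shard;
}

// A query that includes the shard key is routed to one shard;
// a query without it has to be sent to every shard.
function targetShards(query) {
  if (typeof query.author === "string") return [shardFor(query.author)];
  return [...new Set(chunks.map((c) => c.shard))];
}
```

With these ranges, `targetShards({ author: "clarence" })` touches a single shard, while `targetShards({ title: "MongoDB" })` fans out to all three. That fan-out is the reason to pick the shard key from the most common query at design time.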

Slide 15

Slide 15 text

Replication in MongoDB
• MongoDB uses replica sets to achieve replication
• A replica set is a group of MongoDB instances that host the same data set
• One node in a replica set is the primary, which receives all write operations; all other instances are secondaries, which apply operations from the primary so that they hold the same data set
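The primary/secondary flow described above can be sketched as a toy model. This is not MongoDB's replication protocol: the `Member` and `ReplicaSet` classes, the in-memory operation log, and the explicit `sync` call are assumptions chosen to show the idea of secondaries replaying the primary's operations.

```javascript
class Member {
  constructor(name) {
    this.name = name;
    this.data = new Map();
    this.applied = 0; // number of log entries this member has applied
  }
  apply(entry) {
    this.data.set(entry.key, entry.value);
    this.applied += 1;
  }
}

class ReplicaSet {
  constructor(names) {
    this.members = names.map((n) => new Member(n));
    this.primary = this.members[0]; // the first member acts as primary in this sketch
    this.oplog = []; // ordered log of write operations
  }
  // All writes go to the primary and are recorded in the log.
  write(key, value) {
    const entry = { key, value };
    this.oplog.push(entry);
    this.primary.apply(entry);
  }
  // Secondaries catch up by replaying the entries they have not applied yet.
  sync() {
    for (const member of this.members) {
      if (member === this.primary) continue;
      while (member.applied < this.oplog.length) {
        member.apply(this.oplog[member.applied]);
      }
    }
  }
}
```

After a `write`, only the primary holds the new value; the secondaries hold it once `sync` has run. The gap between the two is the replication lag that a real deployment has to account for.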

Slide 16

Slide 16 text

Replication in MongoDB

Slide 17

Slide 17 text

Search Functionality
• NoSQL is distinguished by its special characteristic of “multi-attribute querying”
• Multi-attribute querying
  – Using the $and operation
    db.inventory.find( { $and: [ { price: 1.99 }, { qty: { $lt: 20 } }, { sale: true } ] } )
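To show what the `$and` query selects, the sketch below evaluates the same filter against a few in-memory documents. It is a hand-written matcher that covers only equality, `$lt`, and `$and`, not MongoDB's query engine; the `inventory` sample data and the `matches` function are hypothetical.

```javascript
// Assumed sample data; only "pen" satisfies all three conditions.
const inventory = [
  { item: "pen", price: 1.99, qty: 10, sale: true },
  { item: "ink", price: 1.99, qty: 50, sale: true },
  { item: "pad", price: 1.99, qty: 5, sale: false },
  { item: "cap", price: 4.5, qty: 3, sale: true },
];

// Minimal matcher: supports field equality, { $lt: n }, and { $and: [...] }.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    if (field === "$and") return cond.every((sub) => matches(doc, sub));
    if (cond !== null && typeof cond === "object" && "$lt" in cond) {
      return doc[field] < cond.$lt;
    }
    return doc[field] === cond;
  });
}

const query = { $and: [{ price: 1.99 }, { qty: { $lt: 20 } }, { sale: true }] };
const found = inventory.filter((d) => matches(d, query)).map((d) => d.item);

// Listing the fields in one document is an implicit AND and selects the same items.
const implicitQuery = { price: 1.99, qty: { $lt: 20 }, sale: true };
const foundImplicit = inventory.filter((d) => matches(d, implicitQuery)).map((d) => d.item);
```

In MongoDB itself, conditions on different fields in a single query document are also combined with an implicit AND, so the explicit `$and` form is mainly needed when the same field or operator has to appear more than once.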

Slide 18

Slide 18 text

Search Design
• Search is based on the sharding ID (the shard key)
• Horizontal scaling is implemented with the help of indexes

Slide 19

Slide 19 text

Conclusion
• Design Matters!

Slide 20

Slide 20 text

Future Work
• There is more to consider when designing a scalable architecture:
  – Locking
  – Random partitioning
  – Write concerns

Slide 21

Slide 21 text

Questions?
