An Approach to Design Large Scale Data Centric Architecture using MongoDB

Paper Presentation: 5th National Conference on Emerging Trends in IT.
With the help of Arul Jothi Annamalai and Clarence J M Tauro.

Sushmitha Diwakar

February 26, 2014

Transcript

  1. AN APPROACH TO DESIGN LARGE SCALE DATA CENTRIC ARCHITECTURE USING MONGODB
    By SUSHMITHA DIWAKAR and ARULJOTHI ANNAMALAI, Department of Computer Science, Christ University, Hosur Road, Bangalore & CLARENCE J M TAURO, Centre for Research, Christ University, Hosur Road, Bangalore
  2. Objectives
    • What Scale Is?
    • How is/was Scale achieved? Traditional Way
    • Scaling Today
    • Introduction to NoSQL
    • MongoDB
    • Scaling with MongoDB
    • Replication in MongoDB
    • Search Design
  3. What Scale Is?
    • How well a solution to some problem will work when the size of the problem increases
    • Massive adoption/usage
  4. How is/was Scale achieved? Traditional Way
    • Less use of joins; fewer triggers
    • Denormalize as much as possible
    • Horizontal/Vertical replication
    • Increase hardware
    • Traditional RDBMS; use ORMs like Hibernate
    • Manual process: the developer's job
  5. Scaling Today
    • Many more persistence options
    • Cloud-based architectures: completely abstract the underlying hardware from the developer
    • Use PaaS: CloudFoundry from Pivotal
    • Fewer developers
  6. Example: Scaling with CloudFoundry v2
    cf scale appName --instances 10

  7. Introduction to NoSQL
    • NoSQL stands for
      – “NoSQL” = “No SQL” = Not using a traditional relational DBMS
      – “No SQL” ≠ “Don’t use SQL language”
      – No joins
    • Usually do not require a fixed table schema
    • All NoSQL offerings relax one or more of the ACID properties
  8. MongoDB
    • MongoDB (from “humongous”)
    • Cross-platform, schemaless, document-oriented NoSQL database
    • MongoDB uses BSON (a JSON-like structure)
    • Features include:
      – File storage
      – Indexing
      – Scaling
      – Replication
  9. Sample MongoDB Document
    {
      _id : ObjectId("4e77bb3b8a3e000000004f7a"),
      when : Date("2014-02-26T02:10:11.3Z"),
      author : "arul",
      title : "MongoDB",
      text : "This is the text of the post",
      tags : [ "JSON", "BSON" ],
      votes : 5,
      voters : [ "sushmita", "clarence", "jothi" ]
    }
  10. Scaling – Larger Level
    • Prefer simpler architectures
    • Completely break down the workload
    • Fine-tune your workload
    • Do NOT use an ORM unless you really want to
      – Use simpler standards, e.g. Spring’s JdbcTemplate
    • Use smaller and fine-grained components to deploy your application
    • Shard
    • Replicate
  11. Scaling – Micro Level
    • Multiple documents vs. nested documents
    • Indexing
      – Need to have the right number of indexes
      – More indexes make the DB slow, especially MongoDB
    • Transactions vs. Compensating Transactions
      – JTA transactions are highly discouraged
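The "compensating transactions" idea on this slide can be sketched roughly as follows. This is a minimal illustration in plain JavaScript, not the authors' implementation: `runWithCompensation`, the step objects, and the in-memory `state` are all hypothetical names, and the object mutations stand in for separate database writes.

```javascript
// Hypothetical sketch: each step pairs an action with a compensating "undo".
// The in-memory `state` object stands in for separate database writes.
function runWithCompensation(steps, state) {
  const completed = [];
  try {
    for (const step of steps) {
      step.action(state);
      completed.push(step);
    }
    return true;
  } catch (err) {
    // Undo the completed steps in reverse order instead of relying on a
    // distributed (JTA-style) rollback.
    for (const step of completed.reverse()) {
      step.compensate(state);
    }
    return false;
  }
}

const state = { stock: 5, charged: 0 };
const steps = [
  {
    // Step 1: reserve one item.
    action: (s) => { s.stock -= 1; },
    compensate: (s) => { s.stock += 1; },
  },
  {
    // Step 2: always fails here, to exercise the compensation path.
    action: (s) => { throw new Error("payment declined"); },
    compensate: (s) => { s.charged = 0; },
  },
];

const ok = runWithCompensation(steps, state);
```

Only steps that actually completed are compensated, so the failed second step leaves nothing to undo and the stock reservation from the first step is restored.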
  12. Nested/Embedded Data Model - MongoDB
    Single I/O, or at least stored in contiguous blocks
  13. Normalized Data Model - MongoDB
    How many I/Os? Well, it depends on the storage
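To make the embedded vs. normalized trade-off concrete, here is a small sketch in plain JavaScript. The `authors` and `comments` maps, the post documents, and the `loadNormalized`/`loadEmbedded` helpers are hypothetical; the sketch counts logical lookups only, which is not the same as physical I/O (that, as the slide says, depends on the storage).

```javascript
// Hypothetical in-memory "collections"; each Map lookup stands in for one read.
const authors = new Map([["a1", { _id: "a1", name: "arul" }]]);
const comments = new Map([
  ["c1", { _id: "c1", text: "Nice post" }],
  ["c2", { _id: "c2", text: "Thanks" }],
]);

// Normalized: the post stores references that must be resolved separately.
const normalizedPost = {
  _id: "p1",
  title: "MongoDB",
  authorId: "a1",
  commentIds: ["c1", "c2"],
};

// Embedded: the related data lives inside the post document itself.
const embeddedPost = {
  _id: "p1",
  title: "MongoDB",
  author: { name: "arul" },
  comments: [{ text: "Nice post" }, { text: "Thanks" }],
};

function loadNormalized(post) {
  let reads = 1; // reading the post itself
  const author = authors.get(post.authorId);
  reads += 1;
  const loaded = post.commentIds.map((id) => {
    reads += 1; // one lookup per referenced comment
    return comments.get(id);
  });
  return { post: { title: post.title, author, comments: loaded }, reads };
}

function loadEmbedded(post) {
  // Everything arrives with the single document read.
  return { post, reads: 1 };
}

const normalizedResult = loadNormalized(normalizedPost);
const embeddedResult = loadEmbedded(embeddedPost);
```

A real application could fetch all referenced comments in one batched query, so the normalized count here is an upper bound for this toy model rather than a measurement.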
  14. Scaling – Shard Keys
    • Sharding is the process of storing data records across multiple machines and is MongoDB’s approach to meeting the demands of data growth
      – MongoDB does range-based sharding
      – Sharding can increase the number of queries
    • Figure out the most common use case and then decide on sharding
      – Do this at design time
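Range-based sharding can be pictured as a lookup from a shard-key value to the chunk whose range contains it. The sketch below is plain JavaScript under stated assumptions: the `chunks` table, the shard names, and the `userId` shard key are hypothetical, and the routing logic is a simplification of what a real router does. It also shows one reading of the "more queries" bullet: a query that omits the shard key has to be sent to every shard.

```javascript
// Hypothetical chunk table: each chunk covers the key range [min, max).
const chunks = [
  { min: -Infinity, max: 100, shard: "shardA" },
  { min: 100, max: 200, shard: "shardB" },
  { min: 200, max: Infinity, shard: "shardC" },
];

// Route a single shard-key value to the shard that owns its range.
function shardFor(key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  return chunk.shard;
}

// Decide which shards a query must visit. `userId` is the assumed shard key.
function shardsForQuery(query) {
  if (query.userId === undefined) {
    // No shard key in the query: it has to be broadcast to all shards.
    return chunks.map((c) => c.shard);
  }
  return [shardFor(query.userId)];
}

const targeted = shardsForQuery({ userId: 150 });
const broadcast = shardsForQuery({ name: "arul" });
```

This is why the slide recommends choosing the shard key at design time from the most common use case: queries that include the key touch one shard, while the rest fan out.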
  15. Replication in MongoDB
    • MongoDB uses replica sets to achieve replication
    • A replica set is a group of MongoDB instances that host the same data set
    • A replica set has one primary node, which receives all write operations; all other instances are secondaries, which apply operations from the primary so that they have the same data set
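The primary/secondary flow described on this slide can be sketched as an ordered operation log that secondaries replay. This is a toy model in plain JavaScript, not MongoDB's actual replication code: the `Node` and `ReplicaSet` classes and the explicit `sync()` call are hypothetical simplifications, and elections and failover are left out.

```javascript
// Hypothetical model of one replica set member holding a key/value data set.
class Node {
  constructor(name) {
    this.name = name;
    this.data = new Map();
    this.applied = 0; // number of log entries applied so far
  }
  apply(op) {
    this.data.set(op.key, op.value);
    this.applied += 1;
  }
}

class ReplicaSet {
  constructor(primary, secondaries) {
    this.primary = primary;
    this.secondaries = secondaries;
    this.oplog = []; // ordered log of write operations
  }
  // All writes go to the primary and are recorded in the log.
  write(key, value) {
    const op = { key, value };
    this.primary.apply(op);
    this.oplog.push(op);
  }
  // Each secondary replays, in order, the entries it has not applied yet.
  sync() {
    for (const secondary of this.secondaries) {
      while (secondary.applied < this.oplog.length) {
        secondary.apply(this.oplog[secondary.applied]);
      }
    }
  }
}

const rs = new ReplicaSet(new Node("primary"), [new Node("s1"), new Node("s2")]);
rs.write("votes", 5);
rs.write("votes", 6);
const staleBeforeSync = rs.secondaries[0].data.has("votes");
rs.sync();
```

Before `sync()` runs, the secondaries have not yet seen the writes, which mirrors the replication lag a reader of a secondary may observe.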
  16. Replication in MongoDB

  17. Search Functionality
    • NoSQL will be unique due to its special characteristic of “multi-attribute querying”
    • Multi-Attribute Querying
      – Using the $and operation
        db.inventory.find( { $and: [ { price: 1.99 }, { qty: { $lt: 20 } }, { sale: true } ] } )
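To show what the `$and` query above selects, here is a minimal matcher in plain JavaScript. It is an illustrative sketch, not MongoDB's query engine: `matches` is a hypothetical helper that supports only equality, `$lt`, and `$and`, and the `inventory` array is made-up sample data.

```javascript
// Minimal matcher supporting only equality, $lt, and $and (illustration only).
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    if (field === "$and") {
      // Every sub-query in the array must match the document.
      return cond.every((sub) => matches(doc, sub));
    }
    if (cond !== null && typeof cond === "object" && "$lt" in cond) {
      return doc[field] < cond.$lt;
    }
    return doc[field] === cond;
  });
}

// Made-up sample documents.
const inventory = [
  { item: "pen", price: 1.99, qty: 10, sale: true },
  { item: "ink", price: 1.99, qty: 50, sale: true },
  { item: "pad", price: 1.99, qty: 5, sale: false },
];

// The query from the slide, and the same conditions as one flat document.
const explicitAnd = { $and: [{ price: 1.99 }, { qty: { $lt: 20 } }, { sale: true }] };
const implicitAnd = { price: 1.99, qty: { $lt: 20 }, sale: true };

const fromExplicit = inventory.filter((d) => matches(d, explicitAnd)).map((d) => d.item);
const fromImplicit = inventory.filter((d) => matches(d, implicitAnd)).map((d) => d.item);
```

Because the three conditions are on different fields, the flat form selects the same documents; MongoDB likewise treats comma-separated conditions in a query document as an implicit AND.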
  18. Search Design
    • Search is based on the sharding id
    • Horizontal scaling is implemented with the help of indexes
  19. Conclusion • Design Matters!

  20. Future Work
    • There are more things to consider while designing a scalable architecture:
      – Locking
      – Random partitioning
      – Write concerns
  21. Questions?
