Slide 1

International Conference on Emerging Research in Computing, Information, Communication and Applications, ERCICA 2013

Slide 2

A COMPARATIVE ANALYSIS OF DIFFERENT NOSQL DATABASES ON DATA MODEL, QUERY MODEL AND REPLICATION MODEL
By
BASAWANTH RAO, PRASHANTH K R
Centre for Research, Christ University, Hosur Road, Bangalore
&
CLARENCE J M TAURO
Centre for Research, Christ University, Hosur Road, Bangalore

Slide 3

Objectives
• Introduction to NoSQL
• Need for the study
• Are ACID properties always desirable?
• Basically Available, Soft state, Eventually consistent (BASE)
• The CAP theorem
• Motives of NoSQL practitioners
• Aim of the study
• Validation procedure
• Findings
• Conclusion

Slide 4

Introduction
• RDBMS: the predominant technology for storing structured data in web and business applications
• The “one size fits all” thinking about data stores has been questioned
• Approach: apply NoSQL databases to the persistence layer of a collaborative web application

Slide 5

Need for the Study
• What is the problem with traditional databases?

Slide 6

Need for the Study
• What is the problem with traditional databases?

Slide 7

Need for the Study - ACID Properties
• ATOMICITY: All or nothing
• CONSISTENCY: Any transaction takes the database from one consistent state to another, with no broken constraints (e.g. referential integrity)
• ISOLATION: Other operations cannot access data modified by a transaction that has not yet completed
• DURABILITY: Updates made by committed transactions can be recovered after any kind of system failure
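
To make atomicity and consistency concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in relational engine. The `accounts` table, the `transfer` helper and the balances are hypothetical, not part of the study. A CHECK constraint forbids negative balances, so the second UPDATE of an oversized transfer fails and the whole transaction is rolled back:

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:  # raised by the CHECK constraint below
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [("alice", 100), ("bob", 0)])

ok = transfer(conn, "alice", "bob", 150)  # would make alice's balance negative
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, balances)  # False {'alice': 100, 'bob': 0}
```

Because the connection's context manager rolls back on an exception, the credit already applied to `bob` is undone as well: the transfer either happens completely or not at all.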

Slide 8

Are ACID Properties always desirable?
• … But what about:
  – Latency
  – Partition Tolerance
  – High Availability

Slide 9

BASE: Basically Available, Soft state, Eventually consistent
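
Eventual consistency can be illustrated with a toy model. The sketch below assumes a last-writer-wins rule keyed on a timestamp, which is one common way replicas converge but not necessarily what any store in this study uses; the `Replica` class and its methods are hypothetical. Each replica accepts writes on its own (soft state), so for a while the replicas disagree, and a background merge brings them back into agreement:

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy key-value replica that accepts writes locally (soft state)."""
    name: str
    data: dict = field(default_factory=dict)  # key -> (timestamp, value)

    def put(self, key, value, ts):
        # Last-writer-wins: keep the larger (timestamp, value) pair; comparing
        # the value as well breaks timestamp ties deterministically.
        current = self.data.get(key)
        if current is None or (ts, value) > current:
            self.data[key] = (ts, value)

    def get(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def merge_from(self, other):
        """Anti-entropy step: adopt any newer versions held by `other`."""
        for key, (ts, value) in other.data.items():
            self.put(key, value, ts)


a, b = Replica("a"), Replica("b")
a.put("page", "v1", ts=1)  # accepted by replica a while b is unreachable
b.put("page", "v2", ts=2)  # concurrent write accepted by replica b
print(a.get("page"), b.get("page"))  # v1 v2: replicas disagree for now
a.merge_from(b)
b.merge_from(a)
print(a.get("page"), b.get("page"))  # v2 v2: replicas have converged
```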

Slide 10

The CAP Theorem

Slide 11

Choose Any TWO of Consistency, Availability and Partition Tolerance
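
A rough way to see the trade-off: in a distributed store a network partition cannot simply be ruled out, so while one is in progress the store must give up either consistency or availability. The sketch assumes a strict-majority rule for deciding whether a write is safe; `Mode` and `handle_write` are hypothetical names, not an API of any database in the study:

```python
from enum import Enum


class Mode(Enum):
    CP = "consistency over availability"
    AP = "availability over consistency"


def handle_write(mode: Mode, reachable: int, total: int) -> str:
    """What a replicated store does with a write during a partition.

    Assumed rule: a write is safe if a strict majority of replicas can
    receive it. A CP store refuses unsafe writes; an AP store accepts
    them and lets replicas diverge until the partition heals.
    """
    has_majority = reachable > total // 2
    if has_majority:
        return "ack"
    if mode is Mode.CP:
        return "reject"
    return "ack (replicas may diverge)"


# Five replicas split into groups of 2 and 3 by a network partition.
for mode in Mode:
    print(mode.name, handle_write(mode, 2, 5), "|", handle_write(mode, 3, 5))
```

On the minority side the CP store becomes unavailable for writes, while the AP store stays available at the cost of temporarily inconsistent replicas.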

Slide 12

Motives of NoSQL Practitioners
• Avoidance of Unneeded Complexity
• High Throughput
• Horizontal Scalability and Running on Commodity Hardware
• Avoidance of Expensive Object-Relational Mapping
• Complexity and Cost of Setting up Database Clusters
• Compromising Reliability for Better Performance
• The Current “One Size Fits All” Thinking About Databases Was and Is Wrong
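
The object-relational mapping point can be illustrated with a document-style round trip. This is a minimal sketch assuming a JSON document store; `WikiPage` and its fields are hypothetical, and plain `json` serialisation stands in for a real database driver. The nested tags and revisions travel with the object, whereas a relational schema would typically need separate tables and joins (usually managed by an ORM) to rebuild it:

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class WikiPage:
    # Hypothetical domain object with nested data (tags, revisions).
    title: str
    body: str
    tags: list = field(default_factory=list)
    revisions: list = field(default_factory=list)


page = WikiPage("NoSQL", "Intro text", tags=["db", "nosql"],
                revisions=[{"author": "alice", "rev": 1}])

# Document-store style: the whole object, nested parts included,
# is serialised as one JSON document that could be stored under one key.
doc = json.dumps(asdict(page))

# Loading it back needs no joins: the document maps straight onto the object.
restored = WikiPage(**json.loads(doc))
print(restored == page)  # True
```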

Slide 13

Aim of the Study
“To study and apply the available non-relational database systems to persist objects, in order to obtain more specific knowledge about the broad range of existing technologies”

Slide 14

Operational Definitions Used
• NoSQL
• Object Persistence
• The CAP Theorem
• Multi-Version Concurrency Control (MVCC)
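
Of the definitions above, MVCC is the most mechanical. The sketch below shows the core idea under simplifying assumptions (a single thread, no write conflicts, old versions never removed): writes append new versions instead of overwriting, and each reader sees the newest version committed at or before its snapshot, so readers and writers do not block each other. `MVCCStore` is a hypothetical class:

```python
class MVCCStore:
    """Toy multi-version store: each write adds a version; reads use a snapshot.

    Single-threaded sketch: no write-conflict detection and no garbage
    collection of old versions.
    """

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._now = 0        # timestamp of the latest commit

    def snapshot(self):
        """Return a read timestamp; a reader using it sees only earlier commits."""
        return self._now

    def write(self, key, value):
        self._now += 1
        self._versions.setdefault(key, []).append((self._now, value))
        return self._now

    def read(self, key, snapshot_ts):
        # Return the newest version committed at or before the snapshot.
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None


store = MVCCStore()
store.write("doc", "draft")
reader_ts = store.snapshot()   # a long-running reader takes its snapshot here
store.write("doc", "final")    # a later write does not disturb that snapshot
print(store.read("doc", reader_ts))         # draft
print(store.read("doc", store.snapshot()))  # final
```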

Slide 15

Objectives of the Study
• Analyze various non-relational databases
• Analyze a few selected databases, testing them by benchmarking and categorizing their performance
• Develop a framework to assist in creating and executing the benchmarks
• Develop prototypes using NoSQL technologies
• Apply the developed prototypes to compare the NoSQL databases with traditional relational database solutions

Slide 16

Literature Survey
• Survey of various NoSQL databases and their capabilities
• Analysis of data model, query model, replication model and consistency model
• Persistence layer, model-driven development, object notations (not discussed in this presentation)

Slide 17

Factors/Variables of the Study
• The following are the factors considered for the study of various NoSQL databases:
  – Data Model
  – Query Model
  – Replication Model
  – Consistency Model
  – Sharding
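
Sharding spreads data across nodes by key. A minimal sketch of hash-based sharding follows; `shard_for` is a hypothetical helper, and real stores may instead use range partitioning or consistent hashing:

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash.

    hashlib is used instead of the built-in hash(), whose output for
    strings is randomised per process and would move keys on restart.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


shards = {i: [] for i in range(3)}
for key in ["user:1", "user:2", "user:3", "page:home", "page:faq"]:
    shards[shard_for(key, 3)].append(key)
print(shards)
```

Plain modulo hashing keeps the sketch short; its drawback is that changing `num_shards` moves most keys to a different shard, which is one reason consistent hashing is often preferred when nodes are added or removed.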

Slide 18

Factors of the Study for Benchmarking
• The following are the factors considered while benchmarking NoSQL databases:
  – Raw Performance
  – Scalability
  – Elasticity
  – Read/Write Operations
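
A benchmark along these lines can be driven by a small harness that mixes reads and writes and records per-operation latency. This is a sketch, not the framework developed in the study: `run_workload`, its parameters and the in-memory `DictStore` stand-in are hypothetical, and a real run would call a database client instead. Scalability and elasticity would additionally require repeating the run while varying the number of nodes:

```python
import random
import statistics
import time


def run_workload(store, num_ops=10_000, read_ratio=0.9, num_keys=1_000, seed=42):
    """Issue a mixed read/write workload; return {op: (count, mean seconds)}.

    `store` is any object with get(key) and put(key, value) methods.
    Keys are chosen uniformly at random.
    """
    rng = random.Random(seed)
    latencies = {"read": [], "write": []}
    for key in range(num_keys):  # preload so that reads hit existing keys
        store.put(key, f"value-{key}")
    for _ in range(num_ops):
        key = rng.randrange(num_keys)
        op = "read" if rng.random() < read_ratio else "write"
        start = time.perf_counter()
        if op == "read":
            store.get(key)
        else:
            store.put(key, f"value-{key}")
        latencies[op].append(time.perf_counter() - start)
    return {op: (len(s), statistics.mean(s) if s else 0.0)
            for op, s in latencies.items()}


class DictStore:
    """Hypothetical in-memory stand-in for a database client."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value


print(run_workload(DictStore()))
```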

Slide 19

NoSQL Databases Used for the Study

Slide 20

Validation Procedure
• Comparison of the sorting capabilities of the examined NoSQL databases
• Comparison of the range-querying capabilities of the examined NoSQL databases
• Comparison of the aggregation functionalities
• Comparison of the durability properties
• Case study: the performance of the MongoDB store is compared against the stores for MySQL and the in-memory version of HSQL
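
The first three comparisons exercise operations that every store must express somehow. The sketch shows them over plain Python dictionaries standing in for documents (the data is invented for illustration); in MongoDB they roughly correspond to sorting a cursor, range operators such as `$gte`/`$lt`, and a `$group` stage in the aggregation pipeline:

```python
from collections import defaultdict

# Hypothetical documents, shaped like records a document store might hold.
docs = [
    {"title": "Home", "author": "alice", "views": 120},
    {"title": "FAQ", "author": "bob", "views": 45},
    {"title": "Setup", "author": "alice", "views": 80},
    {"title": "News", "author": "carol", "views": 200},
]

# Sorting: order documents by a field, descending.
by_views = sorted(docs, key=lambda d: d["views"], reverse=True)

# Range query: documents with 50 <= views < 150.
in_range = [d["title"] for d in docs if 50 <= d["views"] < 150]

# Aggregation: total views per author (a group-by with a sum).
totals = defaultdict(int)
for d in docs:
    totals[d["author"]] += d["views"]

print([d["title"] for d in by_views])  # ['News', 'Home', 'Setup', 'FAQ']
print(in_range)                        # ['Home', 'Setup']
print(dict(totals))                    # {'alice': 200, 'bob': 45, 'carol': 200}
```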

Slide 21

Findings - 1 (MySQL / HSQL / MongoDB)
• Total time spent on DB operations: 30 min / 51 s / 41 s
• Slowest operation (avg. time): writing one object into a wiki page, 202 ms / getting one object from a job, 3 ms / getting into an object from a job, 11 ms
• Avg. time for writing one object into a wiki page: 102 ms / 2 ms / 1 ms
• Avg. time for getting one object from the wiki page: 2 ms / 1 ms / 2 ms
• Avg. time for the slowest count operation: 2 ms / 1 ms / 3 ms
• Avg. times of the five slowest queries (ms): 13, 5, 5, 4, 2 / 3, 3, 1, 1, 1 / 2, 1, 1, 1, 1

Slide 22

Findings - 2 (MySQL / HSQL / MongoDB)
• Time spent on DB operations (min): 140.625 / 96.458333 / 39.5
• Average times of the five slowest operations (s): 2.85, 2.225, 1.9375, 1.9075, 1.445 / 3.6525, 1.9425, 1.620, 1.3675, 1.7522 / 3.265, 0.52, 0.355, 0.3475, 0.2725
• Five greatest times (min): 47.375, 20.5833, 16.8333, 12.375, 5.6666 / 40.4166, 11.25, 8.70833, 7.75, 5.0416 / 18.0416, 6.25, 4.375, 3.75, 2.3333
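
For a quick reading of the two findings slides, the reported totals can be turned into speedups relative to MySQL. The numbers below are copied from the slides; treating the Findings - 2 totals as minutes is an assumption, and the ratios say nothing about the workloads behind them. `speedup_vs` is a hypothetical helper:

```python
# Totals copied from the two findings slides.
findings_1_s = {"MySQL": 30 * 60, "HSQL": 51, "MongoDB": 41}             # seconds
findings_2_min = {"MySQL": 140.625, "HSQL": 96.458333, "MongoDB": 39.5}  # assumed minutes


def speedup_vs(baseline, totals):
    """Ratio baseline_time / store_time: values above 1 mean faster than baseline."""
    base = totals[baseline]
    return {name: round(base / t, 2) for name, t in totals.items()}


print(speedup_vs("MySQL", findings_1_s))    # {'MySQL': 1.0, 'HSQL': 35.29, 'MongoDB': 43.9}
print(speedup_vs("MySQL", findings_2_min))  # {'MySQL': 1.0, 'HSQL': 1.46, 'MongoDB': 3.56}
```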

Slide 23

Conclusion
• The knowledge acquired and developed in this study can contribute to a systematic approach for solving data-persistence problems with an alternative, non-relational database
• Careful examination of NoSQL databases and their applications yields a common set of design patterns that may be reused when modeling data and designing a database

Slide 24

Future Work
• This project focused mainly on the most widely used NoSQL databases.
  – Many other NoSQL databases exist, each built around different aspects, and these could also be studied
• The benchmarks could be extended to include different workload scenarios
  – by varying the percentage of read and write operations, and by selecting objects with distributions other than the uniform distribution
• Repeat the benchmarks on infrastructure similar to that used in a production environment
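
For the second future-work item, a non-uniform key selector is easy to add. The sketch assumes a Zipf-like distribution, where a few "hot" objects receive most requests; `zipf_keys`, `uniform_keys` and the exponent `s=0.99` are illustrative choices, not parameters from the study:

```python
import random
from collections import Counter


def zipf_keys(num_keys, num_samples, s=0.99, seed=7):
    """Draw keys 0..num_keys-1 with P(k) proportional to 1 / (k + 1) ** s."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** s) for rank in range(1, num_keys + 1)]
    return rng.choices(range(num_keys), weights=weights, k=num_samples)


def uniform_keys(num_keys, num_samples, seed=7):
    """Baseline: every key equally likely, as in the benchmarks reported here."""
    rng = random.Random(seed)
    return [rng.randrange(num_keys) for _ in range(num_samples)]


skewed = Counter(zipf_keys(100, 10_000))
flat = Counter(uniform_keys(100, 10_000))
print("hottest key share, zipf:", skewed.most_common(1)[0][1] / 10_000)
print("hottest key share, uniform:", flat.most_common(1)[0][1] / 10_000)
```

Swapping the key selector in a benchmark harness for one of these functions changes which objects are "hot", which can matter for caching and for how load spreads across shards.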

Slide 25

Questions?
