Slide 1

Slide 1 text

Sridhar Nanjundeswaran
Software Engineer, 10gen
© Copyright 2010 10gen Inc.

Slide 2

Slide 2 text

Overview
• Non-Relational Databases
• MongoDB
• MongoDB and Perl

Slide 3

Slide 3 text

Problems with traditional RDBMS
• Applications are evolving all the time
  • Applications need new fields and new indexes
  • Users need to be able to alter their schemas without making their data unavailable
• Replication is a solution for high read loads, but sooner or later writing becomes a bottleneck
• Sharding: partitioning a logical database across multiple database instances
  • Joins and aggregation become a problem
  • Distributed transactions are too slow for the web
  • Shards must be managed manually

Slide 4

Slide 4 text

Non-Relational Databases
• Why do we need them?
• Types of non-relational databases

Slide 5

Slide 5 text

Non-Relational Data Models
• A non-relational database’s data model determines the kinds of items it can contain and how they can be retrieved
  • What can the system store, and what does it know about what it contains?
• The relational data model is about storing records made up of named, scalar-valued fields, as specified by a schema, or type definition
• What kind of queries can you do?
  • SQL expresses the kinds of queries that follow from relational algebra

Slide 6

Slide 6 text

Types of Non-Relational Data Models
• Key-value stores
• Document stores
• Column-oriented databases
• Graph databases

Slide 7

Slide 7 text

Key-Value Stores
• A mapping from a key to a value
• The store doesn’t know anything about the key or value
  • In particular, it knows nothing about the insides of the value
• Operations
  • Set, get, or delete a key-value pair
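A minimal sketch of the idea, in JavaScript (the language of the MongoDB shell examples in this deck). The `KeyValueStore` class is hypothetical and not any product’s API: it keeps values as opaque strings and supports only set, get, and delete, so the caller has to serialize any structure itself.

```javascript
// Hypothetical in-memory key-value store: keys and values are opaque strings.
// The store never inspects a value's contents; it only maps key -> value.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }

  // Store (or overwrite) the value for a key.
  set(key, value) {
    this.data.set(String(key), String(value));
  }

  // Return the stored value, or undefined if the key is absent.
  get(key) {
    return this.data.get(String(key));
  }

  // Remove the pair; returns true if the key existed.
  delete(key) {
    return this.data.delete(String(key));
  }
}

// The caller serializes structured data itself; the store sees only a string.
const kv = new KeyValueStore();
kv.set("user:42", JSON.stringify({ LastName: "Flintstone", FirstName: "Fred" }));
const fred = JSON.parse(kv.get("user:42"));
kv.delete("user:42");
```

Because the store cannot look inside a value, finding every user with a given last name would mean fetching and decoding every value.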

Slide 8

Slide 8 text

Document Stores
• The store is a container for documents
  • Documents are made up of named fields
  • Fields may or may not have type definitions
    • e.g. XSDs for XML stores vs. schema-less JSON stores
• Can create “secondary indexes”
  • These provide the ability to query on any document field(s)
• Operations:
  • Insert and delete documents
  • Update fields within documents
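The sketch below, a hypothetical `DocumentStore` class rather than any real product’s API, shows why secondary indexes matter. Documents are schema-less objects, and an index on one field (`LastName` here, chosen for illustration) maps field values to document ids so that lookups on that field do not need to scan every document. Updating a field has to keep the index in step.

```javascript
// Hypothetical schema-less document store with one secondary index.
class DocumentStore {
  constructor(indexedField) {
    this.docs = new Map();       // _id -> document
    this.nextId = 1;
    this.indexedField = indexedField;
    this.index = new Map();      // indexed field value -> Set of _ids
  }

  _indexAdd(doc) {
    const v = doc[this.indexedField];
    if (v === undefined) return; // documents may lack the field entirely
    if (!this.index.has(v)) this.index.set(v, new Set());
    this.index.get(v).add(doc._id);
  }

  _indexRemove(doc) {
    const ids = this.index.get(doc[this.indexedField]);
    if (ids) ids.delete(doc._id);
  }

  insert(doc) {
    const stored = { ...doc, _id: this.nextId++ };
    this.docs.set(stored._id, stored);
    this._indexAdd(stored);
    return stored._id;
  }

  delete(id) {
    const doc = this.docs.get(id);
    if (!doc) return false;
    this._indexRemove(doc);
    return this.docs.delete(id);
  }

  // Update individual fields in place, keeping the index consistent.
  update(id, fields) {
    const doc = this.docs.get(id);
    if (!doc) return false;
    this._indexRemove(doc);
    Object.assign(doc, fields);
    this._indexAdd(doc);
    return true;
  }

  // Query on the indexed field via the index instead of a full scan.
  findByIndexed(value) {
    const ids = this.index.get(value) || new Set();
    return [...ids].map((id) => this.docs.get(id));
  }
}

const store = new DocumentStore("LastName");
const fredId = store.insert({ LastName: "Flintstone", FirstName: "Fred" });
store.insert({ LastName: "Rubble", FirstName: "Barney", Pet: "Hoppy" }); // extra field, no schema
store.update(fredId, { LastName: "Slate" });
```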

Slide 9

Slide 9 text

Column-Oriented Stores
• Like a relational store, but flipped around: all data for a column is kept together
  • An index provides a means to get a column value for a record
• Operations:
  • Get, insert, and delete records; update fields
  • Stream column data in and out of Hadoop
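A hypothetical sketch of the column-oriented layout: each column is a separate array, and a record is reassembled from the values at the same position in every column. The `ColumnTable` class and its tombstone-based delete are illustrative assumptions, and the Hadoop streaming mentioned above is not modeled.

```javascript
// Hypothetical column-oriented table: one array per column; a record's id is
// its position, and its fields are the values at that position in each column.
class ColumnTable {
  constructor(columnNames) {
    this.columns = {};
    for (const name of columnNames) this.columns[name] = [];
    this.deleted = []; // tombstones: true once a row has been deleted
  }

  insert(record) {
    for (const name of Object.keys(this.columns)) {
      this.columns[name].push(record[name] ?? null);
    }
    this.deleted.push(false);
    return this.deleted.length - 1;
  }

  // Reassemble a record from all columns (undefined if missing or deleted).
  get(row) {
    if (this.deleted[row] !== false) return undefined;
    const record = {};
    for (const [name, values] of Object.entries(this.columns)) record[name] = values[row];
    return record;
  }

  update(row, field, value) {
    if (!(field in this.columns)) throw new Error(`unknown column ${field}`);
    this.columns[field][row] = value;
  }

  delete(row) {
    if (row < this.deleted.length) this.deleted[row] = true;
  }

  // Aggregating one field reads only that column's array.
  sum(field) {
    return this.columns[field].reduce((acc, v, row) => (this.deleted[row] ? acc : acc + v), 0);
  }
}

const t = new ColumnTable(["name", "age"]);
const r0 = t.insert({ name: "Fred", age: 35 });
const r1 = t.insert({ name: "Barney", age: 33 });
t.update(r1, "age", 34);
t.delete(r0);
```

Summing one column touches a single contiguous array, which is the access pattern columnar layouts are designed for.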

Slide 10

Slide 10 text

Graph Databases
• Store vertex-to-vertex edges
• Operations:
  • Getting and setting edges
  • Sometimes possible to annotate vertices or edges
  • Query languages support finding paths between vertices, subject to various constraints
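A hypothetical in-memory sketch of these operations: edges are stored in adjacency maps with an optional annotation object, and a breadth-first search finds a path with the fewest edges, using a caller-supplied predicate as the constraint. The `Graph` class is illustrative; real graph databases expose their own query languages.

```javascript
// Hypothetical graph store: directed edges with optional annotations,
// plus a breadth-first path search subject to an edge constraint.
class Graph {
  constructor() {
    this.adj = new Map(); // vertex -> Map(neighbor -> annotation)
  }

  setEdge(from, to, annotation = {}) {
    if (!this.adj.has(from)) this.adj.set(from, new Map());
    if (!this.adj.has(to)) this.adj.set(to, new Map());
    this.adj.get(from).set(to, annotation);
  }

  getEdge(from, to) {
    return this.adj.get(from)?.get(to);
  }

  // Path with the fewest edges from start to goal, or null if none exists.
  // `allowEdge(from, to, annotation)` expresses the query's constraint.
  findPath(start, goal, allowEdge = () => true) {
    if (!this.adj.has(start)) return null;
    const prev = new Map([[start, null]]);
    const queue = [start];
    while (queue.length > 0) {
      const v = queue.shift();
      if (v === goal) {
        const path = [];
        for (let cur = goal; cur !== null; cur = prev.get(cur)) path.unshift(cur);
        return path;
      }
      for (const [next, annotation] of this.adj.get(v)) {
        if (!prev.has(next) && allowEdge(v, next, annotation)) {
          prev.set(next, v);
          queue.push(next);
        }
      }
    }
    return null;
  }
}

const g = new Graph();
g.setEdge("A", "B", { weight: 1 });
g.setEdge("B", "C", { weight: 5 });
g.setEdge("A", "D", { weight: 1 });
g.setEdge("D", "C", { weight: 1 });
const anyPath = g.findPath("A", "C");
const lightPath = g.findPath("A", "C", (from, to, a) => a.weight < 5);
```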

Slide 11

Slide 11 text

Consistency Models
• Relational databases support transactions
  • Can only see committed changes
  • Commit/abort span multiple changes
  • Read-only transaction flavors
    • Read committed, repeatable read, etc.
• Single-master vs. multi-master

Slide 12

Slide 12 text

Single Master
• All writes go to a single master and are then replicated
• Replication can provide arbitrary read scalability
  • Subject to coping with read-consistency issues
• Sooner or later, writing becomes a bottleneck
  • Physical limitations (seek time)
  • Throughput of a single I/O subsystem
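To illustrate the read-consistency issue, here is a hypothetical sketch: the master applies each write and appends it to an operation log, and replicas apply that log asynchronously, so a replica that has not caught up returns stale data. The `Master` and `Replica` classes are assumptions for illustration, not any product’s replication protocol.

```javascript
// Hypothetical single-master replication: every write goes to the master,
// which records it in an ordered operation log that replicas replay later.
class Master {
  constructor() {
    this.data = new Map();
    this.oplog = [];
  }

  write(key, value) {
    this.data.set(key, value);
    this.oplog.push({ key, value });
  }
}

class Replica {
  constructor(master) {
    this.master = master;
    this.data = new Map();
    this.applied = 0; // number of oplog entries already applied
  }

  // Replication is asynchronous: the replica only changes when sync() runs.
  sync() {
    const log = this.master.oplog;
    for (; this.applied < log.length; this.applied++) {
      const { key, value } = log[this.applied];
      this.data.set(key, value);
    }
  }

  read(key) {
    return this.data.get(key);
  }
}

const master = new Master();
const replicas = [new Replica(master), new Replica(master)];
master.write("x", 1);
replicas[0].sync();  // replica 0 has seen x = 1
master.write("x", 2); // no replica has seen this write yet
```

Reads scale by adding replicas, but every write still passes through the one master, which is where the write bottleneck comes from.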

Slide 13

Slide 13 text

Single Master - Sharding
• Partition the primary key space via hashing
• Set up a duplicate system for each shard
  • The write-rate limitation now applies to each shard
• Joins or aggregation across shards are problematic
• Can the data be re-sharded on a live system?
• Can shards be re-balanced on a live system?
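The re-sharding question can be made concrete with a small sketch. It assumes shards are chosen as `hash(key) mod n`, with 32-bit FNV-1a as an arbitrary, deterministic string hash; the `shardFor` and `movedFraction` helpers are hypothetical. Counting how many keys change shards when `n` goes from 4 to 5 shows why naive modulo placement is hard to re-shard on a live system: most of the data would have to move.

```javascript
// 32-bit FNV-1a over UTF-16 code units (equivalent to bytes for ASCII keys),
// used here only as a simple deterministic hash.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0; // force an unsigned 32-bit result
}

// Hash partitioning: each key lives on shard hash(key) mod numShards.
function shardFor(key, numShards) {
  return fnv1a(key) % numShards;
}

// Fraction of keys whose shard changes when the shard count changes.
function movedFraction(keys, oldCount, newCount) {
  let moved = 0;
  for (const k of keys) {
    if (shardFor(k, oldCount) !== shardFor(k, newCount)) moved++;
  }
  return moved / keys.length;
}

const keys = Array.from({ length: 1000 }, (_, i) => `key${i}`);
const fraction = movedFraction(keys, 4, 5);
```

With modulo placement a key stays put only when `h mod 4` equals `h mod 5`, so roughly one key in five stays where it was. Schemes such as consistent hashing or range-based chunks limit how much data moves.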

Slide 14

Slide 14 text

Multi-Master
• Dynamo-like solutions
• Writes can occur to any node
  • The same record can be updated on different nodes by different clients
• All writes are replicated everywhere
• Collisions can occur
  • Who wins?
  • A collision resolution strategy is required
  • Vector clocks: http://en.wikipedia.org/wiki/Vector_clock
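A minimal sketch of how vector clocks detect collisions: each node increments its own counter when it updates a record, and two versions collide when neither clock is greater than or equal to the other in every entry. The helper names are hypothetical. Vector clocks only detect concurrent updates; choosing the winner (or merging the values) is still up to the resolution strategy.

```javascript
// A vector clock maps node id -> number of updates seen from that node.
function increment(clock, node) {
  return { ...clock, [node]: (clock[node] || 0) + 1 };
}

// Element-wise maximum, used when reconciling two versions.
function merge(a, b) {
  const out = { ...a };
  for (const [node, n] of Object.entries(b)) out[node] = Math.max(out[node] || 0, n);
  return out;
}

// "before", "after", "equal", or "concurrent" (a collision that needs resolving).
function compare(a, b) {
  let aLess = false;
  let bLess = false;
  for (const node of new Set([...Object.keys(a), ...Object.keys(b)])) {
    const x = a[node] || 0;
    const y = b[node] || 0;
    if (x < y) aLess = true;
    if (y < x) bLess = true;
  }
  if (aLess && bLess) return "concurrent";
  if (aLess) return "before";
  if (bLess) return "after";
  return "equal";
}

// Two clients update the same record on different nodes from a common version.
const base = increment({}, "n1");                    // { n1: 1 }
const onN1 = increment(base, "n1");                  // { n1: 2 }
const onN2 = increment(base, "n2");                  // { n1: 1, n2: 1 }
const resolved = increment(merge(onN1, onN2), "n1"); // { n1: 3, n2: 1 }
```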

Slide 15

Slide 15 text

No-SQL solutions

Consistency model / Data model | Key-Value | Document | Column-Oriented
------------------------------ | --------- | -------- | -----------------
Single Master                  | Membase   | MongoDB  | HBase, Hypertable
Multi-Master/Dynamo            | Riak      | CouchDB  | Cassandra

Slide 16

Slide 16 text

• Where MongoDB fits in the non-relational world
• MongoDB’s architecture and features
• Installing and running MongoDB

Slide 17

Slide 17 text

MongoDB is a Document Store
• MongoDB stores JSON objects as BSON
  • { LastName: 'Flintstone', FirstName: 'Fred', … }
• Secondary indexes
  • db.collection.ensureIndex({ LastName: 1, FirstName: 1 });
• Simple QBE-like query syntax
  • db.collection.find({ LastName: 'Flintstone' });
  • db.collection.find({ LastName: { $gte: 'Flintstone' } });
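Query-by-example means the query is itself a document: plain values must match exactly, and operator documents such as `{ $gte: value }` express comparisons. The hypothetical `matches` and `find` functions below evaluate that idea over in-memory objects. They handle only top-level fields and the `$gt`, `$gte`, `$lt`, and `$lte` operators, and they are not how the server executes queries (for instance, they ignore arrays, nested fields, and ordering across BSON types).

```javascript
// Hypothetical query-by-example matcher (top-level fields only).
const OPERATORS = {
  $gt: (v, arg) => v > arg,
  $gte: (v, arg) => v >= arg,
  $lt: (v, arg) => v < arg,
  $lte: (v, arg) => v <= arg,
};

// A condition like { $gte: 'Flintstone' } is an operator document.
function isOperatorDoc(cond) {
  return cond !== null && typeof cond === "object" &&
    Object.keys(cond).length > 0 && Object.keys(cond).every((k) => k.startsWith("$"));
}

// A document matches when every field condition in the query holds.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    const value = doc[field];
    if (isOperatorDoc(cond)) {
      return Object.entries(cond).every(([op, arg]) => {
        const test = OPERATORS[op];
        if (!test) throw new Error(`unsupported operator ${op}`);
        return value !== undefined && test(value, arg);
      });
    }
    return value === cond; // a plain value means equality
  });
}

function find(collection, query) {
  return collection.filter((doc) => matches(doc, query));
}

const people = [
  { LastName: "Flintstone", FirstName: "Fred" },
  { LastName: "Rubble", FirstName: "Barney" },
  { LastName: "Flintstone", FirstName: "Wilma" },
  { LastName: "Boulder", FirstName: "Bart" },
];
const exact = find(people, { LastName: "Flintstone" });
const atLeast = find(people, { LastName: { $gte: "Flintstone" } });
```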

Slide 18

Slide 18 text

MongoDB vs Traditional RDBMS
[Diagram: a server contains databases, databases contain tables, tables contain rows; schema; joins]

Slide 19

Slide 19 text

MongoDB is a Single-Master System
• A database is served by members of a “replica set”
• The system elects a primary (master)
• Failure of the master is detected, and a new master is elected
• Application writes get an error if there is no quorum to elect a new master
  • Reads continue to be fulfilled
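The failover rules above can be sketched as a simplified model with hypothetical helpers `hasQuorum`, `electPrimary`, and `tryWrite`: a new primary needs a strict majority of the replica set to be reachable, and without one, writes fail while reads can still be served. Preferring the member with the most recent operation (`lastOp`) is an illustrative simplification; real elections also involve heartbeats, priorities, and voting details not modeled here.

```javascript
// A primary can only be elected if a strict majority of members are reachable.
function hasQuorum(members) {
  const reachable = members.filter((m) => m.up).length;
  return reachable * 2 > members.length;
}

// Name of the new primary, or null when there is no quorum.
// Among reachable members, prefer the one with the most recent data.
function electPrimary(members) {
  if (!hasQuorum(members)) return null;
  const candidates = members.filter((m) => m.up);
  return candidates.reduce((best, m) => (m.lastOp > best.lastOp ? m : best)).name;
}

// Writes need a primary; reads could still go to any reachable member.
function tryWrite(members) {
  const primary = electPrimary(members);
  return primary === null ? { ok: false, error: "no primary" } : { ok: true, primary };
}

// The old primary "a" has failed; "b" and "c" still form a majority of three.
const afterOneFailure = [
  { name: "a", up: false, lastOp: 10 },
  { name: "b", up: true, lastOp: 9 },
  { name: "c", up: true, lastOp: 8 },
];
// Only "c" is left: one member out of three is not a majority.
const afterTwoFailures = [
  { name: "a", up: false, lastOp: 10 },
  { name: "b", up: false, lastOp: 9 },
  { name: "c", up: true, lastOp: 8 },
];
```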

Slide 20

Slide 20 text

MongoDB Storage Management
• Data is kept in memory-mapped files
  • Servers should have a lot of memory
• Files are allocated as needed
• Documents in a collection are kept on a list using a geographical (file + offset) addressing scheme
• Indexes (B*-trees) point to documents using geographical addresses
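The “geographical” addresses can be pictured as (file number, byte offset) pairs. In the hypothetical sketch below, records are appended to fixed-size data files, a new file is allocated when the current one fills up, and an index maps a key straight to a record’s address. A plain `Map` stands in for the B-tree, memory mapping is not modeled, and record sizes are approximated by JSON length rather than BSON size.

```javascript
// Hypothetical data files addressed by { file, offset }.
class DataFiles {
  constructor(fileSize) {
    this.fileSize = fileSize;
    this.files = [[]]; // each file: array of { offset, doc }
    this.used = [0];   // bytes used in each file
  }

  // Append a record, allocating a new file when the current one is full.
  insert(doc) {
    const size = JSON.stringify(doc).length;
    let file = this.files.length - 1;
    if (this.used[file] > 0 && this.used[file] + size > this.fileSize) {
      this.files.push([]);
      this.used.push(0);
      file++;
    }
    const offset = this.used[file];
    this.files[file].push({ offset, doc });
    this.used[file] += size;
    return { file, offset };
  }

  // Follow an address directly to the record.
  read({ file, offset }) {
    return this.files[file]?.find((r) => r.offset === offset)?.doc;
  }
}

const storage = new DataFiles(64);
const index = new Map(); // LastName -> address (stands in for a B-tree)
for (const doc of [
  { LastName: "Flintstone", FirstName: "Fred" },
  { LastName: "Rubble", FirstName: "Barney" },
]) {
  index.set(doc.LastName, storage.insert(doc));
}
```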

Slide 21

Slide 21 text

Release History
• First release - February 2009
• v1.0 - August 2009
• v1.2 - December 2009 - Map/Reduce, lots of small things
• v1.4 - March 2010 - Concurrency/Geo
• v1.6 - August 2010 - Sharding/Replica Sets
• v1.8 - March 2011 - Journaling, Covered/Sparse indexes, Geo sphere

Slide 22

Slide 22 text

MongoDB – Advanced Queries
• Geo-spatial queries
  • Create a geo index
  • Find points near a given point, sorted by radial distance
    • Can be planar or spherical
  • Find points within a certain radial distance, within a bounding box, or within a polygon
• Built-in Map-Reduce
  • The caller provides map and reduce functions written in JavaScript
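In MongoDB’s map/reduce, the map function sees the current document as `this` and calls `emit(key, value)`, and the reduce function receives a key and an array of emitted values. The hypothetical `mapReduce` driver below mimics that calling convention in memory to show the data flow. One deliberate simplification: `emit` is passed to the map function as an argument here, whereas on the server it is a global.

```javascript
// Hypothetical in-memory map/reduce driver.
function mapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> array of emitted values
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit); // `this` is the current document

  const results = {};
  for (const [key, values] of groups) {
    // Like MongoDB, skip reduce when a key has only one value.
    results[key] = values.length === 1 ? values[0] : reduce(key, values);
  }
  return results;
}

const family = [
  { LastName: "Flintstone", FirstName: "Fred" },
  { LastName: "Rubble", FirstName: "Barney" },
  { LastName: "Flintstone", FirstName: "Wilma" },
];

// Count documents per last name.
const counts = mapReduce(
  family,
  function (emit) {
    emit(this.LastName, 1);
  },
  function (key, values) {
    return values.reduce((sum, v) => sum + v, 0);
  }
);
```

On the server, reduce may be applied again to its own partial results, so it should return a value of the same shape as the emitted values, as the summing reduce above does.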

Slide 23

Slide 23 text

Scaling MongoDB
• Replication
  • Read scalability
  • Master/Slave
  • Replica Sets
• Sharding
  • A collection can be sharded
  • Each shard is served by its own replica set
  • New shards (each a replica set) can be added at any time
  • Shard key ranges are automatically balanced
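A hypothetical sketch of range-based sharding and balancing: the shard key space is split into chunks `[min, max)`, each owned by one shard, and a balancer migrates chunks from the most-loaded shard to the least-loaded one until their chunk counts differ by at most a threshold. The helpers and the `threshold` value are illustrative assumptions, not MongoDB’s balancer settings.

```javascript
// Which shard owns the chunk whose [min, max) range contains the key?
function shardForKey(chunks, key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  return chunk ? chunk.shard : undefined;
}

function chunkCounts(chunks, shards) {
  const counts = new Map(shards.map((s) => [s, 0]));
  for (const c of chunks) counts.set(c.shard, counts.get(c.shard) + 1);
  return counts;
}

// Move chunks until max - min chunk count <= threshold; returns the migrations.
function balance(chunks, shards, threshold = 1) {
  if (threshold < 1) throw new RangeError("threshold must be at least 1 to terminate");
  const moves = [];
  for (;;) {
    const sorted = [...chunkCounts(chunks, shards).entries()].sort((a, b) => a[1] - b[1]);
    const [minShard, minCount] = sorted[0];
    const [maxShard, maxCount] = sorted[sorted.length - 1];
    if (maxCount - minCount <= threshold) return moves;
    const chunk = chunks.find((c) => c.shard === maxShard);
    chunk.shard = minShard; // "migrate" the chunk to the emptier shard
    moves.push({ min: chunk.min, from: maxShard, to: minShard });
  }
}

const chunks = [
  { min: -Infinity, max: 100, shard: "shardA" },
  { min: 100, max: 200, shard: "shardA" },
  { min: 200, max: 300, shard: "shardA" },
  { min: 300, max: Infinity, shard: "shardA" },
];
// A new shard is added; the balancer spreads the existing ranges.
const moves = balance(chunks, ["shardA", "shardB"]);
```

Adding the second shard triggers two migrations here, after which each shard owns two chunks and key lookups follow the moved ranges.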

Slide 24

Slide 24 text

MongoDB – Sharded Deployment

Slide 25

Slide 25 text

MongoDB Access
• Drivers are available in many languages
• 10gen supported
  • C, C# (.Net), C++, Erlang, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala
• Community supported
  • Clojure, ColdFusion, F#, Go, Groovy, Lua, R
• http://www.mongodb.org/display/DOCS/Overview+-+Writing+Drivers+and+Tools

Slide 26

Slide 26 text

v2.0
• Pretty soon
• Concurrency
• Faster data compaction
• Faster map/reduce
• TTL collections
• Geospatial polygons
• Hash shard key
• Index 2.0 (smaller + faster)

Slide 27

Slide 27 text

Future – a short list
• Full-text search
• More concurrency
• Online compaction
• Internal compression
• New aggregation framework
Vote: http://jira.mongodb.org

Slide 28

Slide 28 text

MongoDB Availability
• Source
  • https://github.com/mongodb/mongo
• Server
  • License: AGPL
  • http://www.mongodb.org/downloads
• Drivers
  • License: Apache
  • http://www.mongodb.org/display/DOCS/Drivers

Slide 29

Slide 29 text

Try it at try.mongodb.org

Slide 30

Slide 30 text

MongoDB and Perl
• MongoDB (official 10gen module)
  • http://search.cpan.org/~kristina/MongoDB-0.43/lib/MongoDB.pm
  • Tutorial: http://search.cpan.org/dist/MongoDB/lib/MongoDB/Tutorial.pod
• Install the MongoDB Perl module
  • Make sure MongoDB is running
  • cpan -i inc::Module::Install
  • cpan -i MongoDB

Slide 31

Slide 31 text

Conferences, appearances, and meetups: http://www.10gen.com/events
Facebook | Twitter | LinkedIn: http://bit.ly/mongofb | @mongodb | http://linkd.in/joinmongo
Download at mongodb.org
We’re hiring! [email protected]