MongoDB and Perl - An Introduction

Overview
• Non-Relational Databases
• MongoDB
• MongoDB and Perl

Problems with traditional RDBMS
• Applications are evolving all the time
  • Applications need new fields, new indexes
  • Users need to be able to alter their schemas without making their data unavailable
• Replication is a solution for high read loads; sooner or later, writing becomes a bottleneck
• Sharding: partitioning a logical database across multiple database instances
  • Joins and aggregation become a problem
  • Distributed transactions are too slow for the web
  • Manual management of shards

Non-Relational Databases
• Why do we need them?
• Types of non-relational databases

Non-Relational Data Models
• A non-relational database's data model determines the kinds of items it can contain and how they can be retrieved
  • What can the system store, and what does it know about what it contains?
  • The relational data model is about storing records made up of named, scalar-valued fields, as specified by a schema, or type definition
• What kind of queries can you do?
  • SQL is a manifestation of the kinds of queries that fall out of relational algebra

Types of Non-Relational Data Models
• Key-value stores
• Document stores
• Column-oriented databases
• Graph databases

Key-Value Stores
• A mapping from a key to a value
• The store doesn't know anything about the key or value
• The store doesn't know anything about the insides of the value
• Operations
  • Set, get, or delete a key-value pair
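
The three key-value operations can be made concrete with a small in-memory sketch in JavaScript, the language of the MongoDB shell examples later in this deck. The class name `KeyValueStore` and the sample key are hypothetical. The point is only that the store never interprets a value and addresses data solely by key.

```javascript
// Hypothetical in-memory key-value store: values are opaque to the store,
// and the only operations are set, get, and delete by key.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }

  set(key, value) {
    this.data.set(key, value);
  }

  // Returns undefined when the key is absent.
  get(key) {
    return this.data.get(key);
  }

  // Returns true if a pair was removed, false if the key was absent.
  delete(key) {
    return this.data.delete(key);
  }
}

const kv = new KeyValueStore();
kv.set("user:42", '{"name":"Fred"}'); // the store never parses this string
console.log(kv.get("user:42")); // {"name":"Fred"}
kv.delete("user:42");
console.log(kv.get("user:42")); // undefined
```

Because the value is opaque, any query on the contents of the value (for example "all users named Fred") has to be done by the application, not the store.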

Document Stores
• The store is a container for documents
  • Documents are made up of named fields
  • Fields may or may not have type definitions
    • e.g. XSDs for XML stores, vs. schema-less JSON stores
• Can create "secondary indexes"
  • These provide the ability to query on any document field(s)
• Operations:
  • Insert and delete documents
  • Update fields within documents
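
A rough JavaScript sketch of a document store with a secondary index helps show what "query on any document field" means in practice. `DocumentStore`, `addToIndex`, and the sample data are hypothetical names for illustration. An indexed field is answered from a value-to-ids map; any other field falls back to a full scan. Inserts, deletes, and field updates all keep the index in sync.

```javascript
// Hypothetical document store with optional secondary indexes.
// docs:    id -> document (a plain object with named fields)
// indexes: field name -> Map(field value -> Set of document ids)
function addToIndex(index, value, id) {
  if (!index.has(value)) index.set(value, new Set());
  index.get(value).add(id);
}

class DocumentStore {
  constructor() {
    this.docs = new Map();
    this.indexes = new Map();
    this.nextId = 1;
  }

  // Build a secondary index over the documents already stored.
  createIndex(field) {
    const index = new Map();
    for (const [id, doc] of this.docs) addToIndex(index, doc[field], id);
    this.indexes.set(field, index);
  }

  insert(doc) {
    const id = this.nextId++;
    this.docs.set(id, doc);
    for (const [field, index] of this.indexes) addToIndex(index, doc[field], id);
    return id;
  }

  // Update one field, keeping any index on that field in sync.
  update(id, field, value) {
    const doc = this.docs.get(id);
    if (!doc) return false;
    const index = this.indexes.get(field);
    if (index) {
      index.get(doc[field])?.delete(id);
      addToIndex(index, value, id);
    }
    doc[field] = value;
    return true;
  }

  remove(id) {
    const doc = this.docs.get(id);
    if (!doc) return false;
    for (const [field, index] of this.indexes) index.get(doc[field])?.delete(id);
    return this.docs.delete(id);
  }

  // Query on any field: use the index if there is one, otherwise scan.
  findBy(field, value) {
    const index = this.indexes.get(field);
    if (index) return [...(index.get(value) ?? [])].map((id) => this.docs.get(id));
    return [...this.docs.values()].filter((doc) => doc[field] === value);
  }
}

const flintstones = new DocumentStore();
flintstones.createIndex("LastName");
const fredId = flintstones.insert({ LastName: "Flintstone", FirstName: "Fred" });
flintstones.insert({ LastName: "Rubble", FirstName: "Barney" });
flintstones.update(fredId, "FirstName", "Frederick"); // unindexed field
console.log(flintstones.findBy("LastName", "Flintstone"));
```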

Column-Oriented Stores
• Like a relational store, but flipped around: all data for a column is kept together
  • An index provides a means to get a column value for a record
• Operations:
  • Get, insert, delete records; updating fields
  • Streaming column data in and out of Hadoop
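
The "flipped around" layout can be illustrated with a small JavaScript sketch. The record data and helper names are hypothetical. It pivots row-shaped records into one array per column, scans a single column, and uses an id-to-position index to fetch one column value for a record.

```javascript
// Illustrative only: the same three records in row layout and column layout.
const rows = [
  { id: 1, name: "Fred", city: "Bedrock" },
  { id: 2, name: "Barney", city: "Bedrock" },
  { id: 3, name: "Wilma", city: "Rockvegas" },
];

// Column layout: one array per field; position i in each array is record i.
function toColumns(records) {
  const result = {};
  for (const field of Object.keys(records[0])) {
    result[field] = records.map((r) => r[field]);
  }
  return result;
}

const columns = toColumns(rows);

// Scanning one column touches only that column's values.
const distinctCities = new Set(columns.city);

// An index from record id to position gets a column value for a record.
const positionById = new Map(columns.id.map((id, pos) => [id, pos]));

function getValue(cols, index, id, field) {
  const pos = index.get(id);
  return pos === undefined ? undefined : cols[field][pos];
}

console.log([...distinctCities]); // [ 'Bedrock', 'Rockvegas' ]
console.log(getValue(columns, positionById, 2, "name")); // Barney
```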

Graph Databases
• Stores vertex-to-vertex edges
• Operations:
  • Getting and setting edges
  • Sometimes possible to annotate vertices or edges
  • Query languages support finding paths between vertices, subject to various constraints
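
As a rough illustration of a path query, here is a JavaScript sketch that stores directed edges as an adjacency list. It answers one simple, unconstrained query: the path with the fewest edges, found by breadth-first search. The function names and the sample "knows" edges are hypothetical. Real graph query languages add constraints such as edge labels or maximum depth.

```javascript
// Hypothetical graph store: directed edges kept as an adjacency list, with a
// breadth-first search that returns a path using the fewest edges.
function buildAdjacency(edges) {
  const adjacency = new Map();
  for (const [from, to] of edges) {
    if (!adjacency.has(from)) adjacency.set(from, []);
    adjacency.get(from).push(to);
  }
  return adjacency;
}

function shortestPath(edges, start, goal) {
  const adjacency = buildAdjacency(edges);
  const previous = new Map([[start, null]]); // vertex -> vertex we came from
  const queue = [start];
  while (queue.length > 0) {
    const vertex = queue.shift();
    if (vertex === goal) {
      const path = [];
      for (let v = goal; v !== null; v = previous.get(v)) path.unshift(v);
      return path;
    }
    for (const next of adjacency.get(vertex) ?? []) {
      if (!previous.has(next)) {
        previous.set(next, vertex);
        queue.push(next);
      }
    }
  }
  return null; // goal is not reachable from start
}

const knows = [
  ["fred", "barney"], ["barney", "betty"],
  ["fred", "wilma"], ["wilma", "pebbles"], ["pebbles", "betty"],
];
console.log(shortestPath(knows, "fred", "betty")); // [ 'fred', 'barney', 'betty' ]
```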

Consistency Models
• Relational databases support transactions
  • Can only see committed changes
  • Commit/abort span multiple changes
  • Read-only transactions
  • Isolation levels: read committed, repeatable read, etc.
• Single vs. multi-master
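
Two of these transaction properties can be shown in a toy JavaScript sketch: readers see only committed changes, and commit or abort applies to several changes at once. It is not a real concurrency-control scheme. `TinyTxStore` and the account keys are hypothetical. The transaction buffers its writes, and outside readers see them only after commit.

```javascript
// Toy illustration of transactional visibility, not a real concurrency-control
// scheme: a transaction buffers its writes, and other readers see them only
// after commit. Abort discards every buffered change together.
class TinyTxStore {
  constructor() {
    this.committed = new Map();
  }

  begin() {
    const committed = this.committed;
    const pending = new Map(); // this transaction's uncommitted writes
    return {
      write(key, value) {
        pending.set(key, value);
      },
      // Inside the transaction: own writes first, then committed data.
      read(key) {
        return pending.has(key) ? pending.get(key) : committed.get(key);
      },
      commit() {
        for (const [key, value] of pending) committed.set(key, value);
        pending.clear();
      },
      abort() {
        pending.clear();
      },
    };
  }

  // Outside any transaction, only committed values are visible.
  read(key) {
    return this.committed.get(key);
  }
}

const db = new TinyTxStore();
const tx = db.begin();
tx.write("checking", 50);
tx.write("savings", 150);
console.log(db.read("checking")); // undefined (not committed yet)
tx.commit();
console.log(db.read("checking")); // 50
```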

Single Master
• All writes go to a single master and are then replicated
• Replication can provide arbitrary read scalability
  • Subject to coping with read-consistency issues
• Sooner or later, writing becomes a bottleneck
  • Physical limitations (seek time)
  • Throughput of a single I/O subsystem
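
The read-consistency issue with single-master replication can be shown in a simplified JavaScript sketch. The classes `Primary` and `Secondary` and the pull-based `sync` are hypothetical. Every write goes to the primary, which appends it to an ordered log. A secondary applies that log only when it syncs, so a read from the secondary can return stale data in between.

```javascript
// Single-master replication sketch: all writes go to the primary, which records
// them in an ordered log; a secondary applies log entries when it syncs.
class Primary {
  constructor() {
    this.data = new Map();
    this.log = []; // ordered list of { key, value } writes
  }

  write(key, value) {
    this.data.set(key, value);
    this.log.push({ key, value });
  }
}

class Secondary {
  constructor(primary) {
    this.primary = primary;
    this.data = new Map();
    this.applied = 0; // number of log entries applied so far
  }

  // Pull and apply up to `max` new entries from the primary's log, in order.
  sync(max = Infinity) {
    while (this.applied < this.primary.log.length && max-- > 0) {
      const { key, value } = this.primary.log[this.applied++];
      this.data.set(key, value);
    }
  }

  read(key) {
    return this.data.get(key);
  }
}

const primary = new Primary();
const secondary = new Secondary(primary);
primary.write("x", 1);
console.log(secondary.read("x")); // undefined: not replicated yet (stale read)
secondary.sync();
console.log(secondary.read("x")); // 1
```

Adding more secondaries adds read capacity, but every write still lands on the one primary, which is the write bottleneck the slide describes.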

Single Master - Sharding
• Partition the primary key space via hashing
• Set up a duplicate system for each shard
  • The write-rate limitation now applies to each shard
• Joins or aggregation across shards are problematic
• Can the data be re-sharded on a live system?
• Can shards be re-balanced on a live system?
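
A small JavaScript sketch shows why re-sharding is a hard question. It assigns keys with `hash(key) mod shardCount` and counts how many keys change shard when a fifth shard joins four. The 32-bit FNV-1a hash, the `user:N` key format, and the shard counts are assumptions chosen for illustration. With a well-distributed hash, a key keeps its shard only when the two remainders agree, so most keys would have to move.

```javascript
// Hash partitioning sketch: shard = hash(key) mod shardCount.
// FNV-1a (32-bit) is used only as a simple deterministic string hash;
// the key format and shard counts are hypothetical.
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // as an unsigned 32-bit integer
}

function shardFor(key, shardCount) {
  return fnv1a(key) % shardCount;
}

// How many keys would have to move if a fifth shard were added to four?
const keys = Array.from({ length: 1000 }, (_, i) => `user:${i}`);
const moved = keys.filter((k) => shardFor(k, 4) !== shardFor(k, 5)).length;
console.log(`${moved} of ${keys.length} keys change shards`);
```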

Multi-Master
• Dynamo-like solutions
• Writes can go to any node
  • The same record can be updated on different nodes by different clients
• All writes are replicated everywhere
• Collisions can occur
  • Who wins?
  • A collision resolution strategy is required
  • Vector clocks: http://en.wikipedia.org/wiki/Vector_clock
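
Vector clocks are what let a multi-master system notice a collision in the first place. A minimal JavaScript sketch follows. The node names are hypothetical, and the final "merge and re-increment" step stands in for whatever resolution strategy an application chooses. Each clock maps node to counter. One clock is "before" another if it is less than or equal in every entry, and if each is ahead somewhere the updates are concurrent.

```javascript
// Vector clocks: each node keeps a counter per node. Comparing two clocks
// tells whether one update happened before the other or they are concurrent
// (a collision that needs a resolution strategy).
function increment(clock, node) {
  return { ...clock, [node]: (clock[node] ?? 0) + 1 };
}

function merge(a, b) {
  const result = { ...a };
  for (const [node, count] of Object.entries(b)) {
    result[node] = Math.max(result[node] ?? 0, count);
  }
  return result;
}

// Returns "before", "after", "equal", or "concurrent".
function compare(a, b) {
  let aBehind = false;
  let bBehind = false;
  for (const node of new Set([...Object.keys(a), ...Object.keys(b)])) {
    const x = a[node] ?? 0;
    const y = b[node] ?? 0;
    if (x < y) aBehind = true;
    if (y < x) bBehind = true;
  }
  if (aBehind && bBehind) return "concurrent";
  if (aBehind) return "before";
  if (bBehind) return "after";
  return "equal";
}

// Two clients update the same record on different nodes, starting from v0.
const v0 = increment({}, "nodeA");  // { nodeA: 1 }
const onA = increment(v0, "nodeA"); // { nodeA: 2 }
const onB = increment(v0, "nodeB"); // { nodeA: 1, nodeB: 1 }
console.log(compare(v0, onA));  // before
console.log(compare(onA, onB)); // concurrent (a collision)
const resolved = increment(merge(onA, onB), "nodeA");
console.log(compare(onB, resolved)); // before
```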

No-SQL solutions, by data model and consistency model
• Key-Value: Membase (single master), Riak (multi-master/Dynamo)
• Document: MongoDB (single master), CouchDB (multi-master/Dynamo)
• Column-Oriented: Cassandra, HBase, Hypertable

• Where MongoDB fits in the non-relational world
• MongoDB's architecture and features
• Installing and running MongoDB

MongoDB is a Document Store
• MongoDB stores JSON objects as BSON
  • { LastName: 'Flintstone', FirstName: 'Fred', … }
• Secondary indexes
  • db.collection.ensureIndex({LastName : 1, FirstName : 1});
• Simple QBE-like query syntax
  • db.collection.find({LastName : 'Flintstone'});
  • db.collection.find({LastName : { $gte : 'Flintstone' }});
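
To show what "QBE-like" means, here is a minimal JavaScript matcher that accepts query objects shaped like the `find()` calls above. It is an illustration of the query shape, not MongoDB's query engine. It supports only exact equality plus `$gt`/`$gte`/`$lt`/`$lte`. The names `matches`, `find`, and the sample collection are hypothetical. Each field in the query is a condition, and a document matches when all of them hold.

```javascript
// Minimal query-by-example matcher modelled on the find() calls above.
// Supports exact equality plus $gt/$gte/$lt/$lte only; it is an illustration
// of the query shape, not MongoDB's query engine (no arrays, nesting, etc.).
const OPS = new Map([
  ["$gt", (value, arg) => value > arg],
  ["$gte", (value, arg) => value >= arg],
  ["$lt", (value, arg) => value < arg],
  ["$lte", (value, arg) => value <= arg],
]);

function isOperatorObject(cond) {
  return cond !== null && typeof cond === "object" &&
    Object.keys(cond).length > 0 && Object.keys(cond).every((k) => OPS.has(k));
}

// A document matches when every field condition in the query holds.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    const value = doc[field];
    if (isOperatorObject(cond)) {
      return value !== undefined &&
        Object.entries(cond).every(([op, arg]) => OPS.get(op)(value, arg));
    }
    return value === cond; // exact match for scalar conditions
  });
}

function find(collection, query) {
  return collection.filter((doc) => matches(doc, query));
}

const people = [
  { LastName: "Flintstone", FirstName: "Fred" },
  { LastName: "Rubble", FirstName: "Barney" },
  { LastName: "Bedrock", FirstName: "Mayor" },
];
console.log(find(people, { LastName: "Flintstone" }).length); // 1
console.log(find(people, { LastName: { $gte: "Flintstone" } }).length); // 2
```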

MongoDB vs Traditional RDBMS
[Diagram: a server contains databases; databases contain tables; tables contain rows; schema; joins]

MongoDB is a Single-Master System
• A database is served by members of a "replica set"
• The system elects a primary (master)
• Failure of the master is detected, and a new master is elected
• Application writes get an error if there is no quorum to elect a new master
  • Reads continue to be fulfilled
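
The quorum rule behind these bullets is plain majority arithmetic, sketched here in JavaScript. The member fields `up` and `optime` and the rule "pick the reachable member with the highest optime" are simplifying assumptions. Real replica-set elections also weigh priorities and votes. Without a strict majority of the whole set, no primary is elected and writes fail.

```javascript
// Sketch of the majority rule behind replica-set elections. Choosing the
// reachable member with the highest `optime` (most recent write) is a
// simplifying assumption; real elections also weigh priorities and votes.
function hasQuorum(reachable, total) {
  return reachable > total / 2; // strict majority of the whole set
}

function electPrimary(members) {
  const up = members.filter((m) => m.up);
  if (!hasQuorum(up.length, members.length)) {
    return null; // no primary: writes fail, reads can still be served
  }
  return up.reduce((best, m) => (m.optime > best.optime ? m : best)).name;
}

const replicaSet = [
  { name: "a", up: false, optime: 10 }, // the old primary has failed
  { name: "b", up: true, optime: 9 },
  { name: "c", up: true, optime: 8 },
];
console.log(electPrimary(replicaSet)); // b
replicaSet[1].up = false;
console.log(electPrimary(replicaSet)); // null
```

Requiring a strict majority means two halves of a partitioned set can never both elect a primary.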

MongoDB Storage Management
• Data is kept in memory-mapped files
  • Servers should have a lot of memory
• Files are allocated as needed
• Documents in a collection are kept on a list using a geographical addressing scheme
• Indexes (B*-trees) point to documents using geographical addresses

Release History
• First release: February 2009
• v1.0: August 2009
• v1.2: December 2009 (Map/Reduce, lots of small things)
• v1.4: March 2010 (Concurrency, Geo)
• v1.6: August 2010 (Sharding, Replica Sets)
• v1.8: March 2011 (Journaling, covered/sparse indexes, spherical Geo)

MongoDB – Advanced Queries
• Geo-spatial queries
  • Create a geo index
  • Find points near a given point, sorted by radial distance
    • Can be planar or spherical
  • Find points within a certain radial distance, within a bounding box, or within a polygon
• Built-in Map-Reduce
  • The caller provides map and reduce functions written in JavaScript
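
The map/reduce contract can be simulated locally in plain JavaScript. The map function runs once per document with `this` bound to the document and emits (key, value) pairs. The reduce function folds all values emitted for one key. `runMapReduce` and the sample orders are hypothetical. In MongoDB the map function calls a global `emit()`, whereas here `emit` is passed in as an argument so the sketch is self-contained.

```javascript
// Local simulation of the map/reduce contract: map runs with `this` bound to
// each document and emits (key, value) pairs; reduce folds all values emitted
// for one key. runMapReduce is a hypothetical helper, and emit is passed in as
// an argument here, whereas MongoDB map functions call a global emit().
function runMapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> emitted values
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit);

  const results = {};
  for (const [key, values] of groups) {
    // Like MongoDB, skip reduce when a key has a single value.
    results[key] = values.length === 1 ? values[0] : reduce(key, values);
  }
  return results;
}

const orders = [
  { customer: "fred", total: 10 },
  { customer: "barney", total: 5 },
  { customer: "fred", total: 7 },
];
// map must be a regular function (not an arrow) so that `this` is the document.
const mapFn = function (emit) {
  emit(this.customer, this.total);
};
const reduceFn = (key, values) => values.reduce((sum, v) => sum + v, 0);

console.log(runMapReduce(orders, mapFn, reduceFn)); // { fred: 17, barney: 5 }
```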

Scaling MongoDB
• Replication
  • Read scalability
  • Master/Slave
  • Replica Sets
• Sharding
  • A collection can be sharded
  • Each shard is served by its own replica set
  • New shards (each a replica set) can be added at any time
  • Shard key ranges are automatically balanced
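
Automatic balancing of shard key ranges can be pictured with a small JavaScript sketch. The key space is split into `[min, max)` chunks, each owned by a shard. A lookup finds the chunk that contains a key, and a toy balancer moves chunks off the busiest shard after a new shard is added. The chunk boundaries, shard names, and "difference of 2" threshold are assumptions for illustration, not MongoDB's actual balancer settings.

```javascript
// Range-based sharding sketch: the shard key space is split into chunks
// [min, max) (null = unbounded), each owned by one shard. Chunk boundaries,
// shard names, and the "difference of 2" balancing threshold are assumptions
// for illustration, not MongoDB's actual balancer settings.
const chunks = [
  { min: null, max: "g", shard: "shard0" },
  { min: "g", max: "n", shard: "shard0" },
  { min: "n", max: "t", shard: "shard0" },
  { min: "t", max: null, shard: "shard1" },
];

function inChunk(key, chunk) {
  return (chunk.min === null || key >= chunk.min) &&
         (chunk.max === null || key < chunk.max);
}

function shardForKey(allChunks, key) {
  const chunk = allChunks.find((c) => inChunk(key, c));
  return chunk ? chunk.shard : undefined;
}

// Move one chunk from the most-loaded to the least-loaded shard, if needed.
function balanceOnce(allChunks, shards) {
  const counts = new Map(shards.map((s) => [s, 0]));
  for (const c of allChunks) counts.set(c.shard, counts.get(c.shard) + 1);
  const sorted = [...counts].sort((a, b) => b[1] - a[1]);
  const [most, mostCount] = sorted[0];
  const [least, leastCount] = sorted[sorted.length - 1];
  if (mostCount - leastCount < 2) return false; // balanced enough
  allChunks.find((c) => c.shard === most).shard = least; // "migrate" a chunk
  return true;
}

// A new shard (shard2) is added; the balancer redistributes chunks.
const shards = ["shard0", "shard1", "shard2"];
while (balanceOnce(chunks, shards)) {
  // keep migrating until the shards are within the threshold
}
console.log(chunks.map((c) => c.shard)); // [ 'shard2', 'shard0', 'shard0', 'shard1' ]
console.log(shardForKey(chunks, "fred")); // shard2
```

Because chunks cover contiguous key ranges, a migration moves only the documents in one range, unlike the mod-N hashing scheme from the sharding slide.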

MongoDB – Sharded Deployment

MongoDB Access
• Drivers are available in many languages
  • 10gen supported: C, C# (.NET), C++, Erlang, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala
  • Community supported: Clojure, ColdFusion, F#, Go, Groovy, Lua, R
• http://www.mongodb.org/display/DOCS/Overview+-+Writing+Drivers+and+Tools

v2.0
• Coming soon
• Concurrency
• Faster data compaction
• Faster map/reduce
• TTL collections
• Geospatial polygons
• Hash shard key
• Index 2.0 (smaller and faster)

Future – a short list
• Full-text search
• More concurrency
• Online compaction
• Internal compression
• New aggregation framework
• Vote: http://jira.mongodb.org

MongoDB Availability
• Source: https://github.com/mongodb/mongo
• Server
  • License: AGPL
  • http://www.mongodb.org/downloads
• Drivers
  • License: Apache
  • http://www.mongodb.org/display/DOCS/Drivers

MongoDB and Perl
• MongoDB: official 10gen module
  • http://search.cpan.org/~kristina/MongoDB-0.43/lib/MongoDB.pm
  • Tutorial: http://search.cpan.org/dist/MongoDB/lib/MongoDB/Tutorial.pod
• Install the MongoDB Perl module
  • Make sure MongoDB is running
  • cpan -i inc::Module::Install
  • cpan -i MongoDB

© Copyright 2010 10gen Inc.
Conferences, appearances, and meetups: http://www.10gen.com/events
Facebook | Twitter | LinkedIn: http://bit.ly/mongofb | @mongodb | http://linkd.in/joinmongo
Download at mongodb.org
We're Hiring!

Sridhar Nanjundeswaran

Sridhar Nanjundeswaran Software Engineer, 10gen © Copyright 2010 10gen Inc.

© Copyright 2010 10gen Inc. try at try.mongodb.org
