Mongodb and Perl - An Introduction

Presented at SF Perl Meetup on 6/28/2011

Sridhar Nanjundeswaran

June 29, 2011

Transcript

  1. Problems with traditional RDBMS • Applications are evolving all the time • Applications need new fields, new indexes • Users need to be able to alter their schemas without making their data unavailable • Replication is a solution for high read loads. Sooner or later, writing becomes a bottleneck • Sharding – partitioning a logical database across multiple database instances • Joins and aggregation become a problem • Distributed transactions are too slow for the web • Manual management of shards
  2. Non-Relational Databases • Why do we need them? • Types of non-relational databases
  3. Non-Relational Data Models • A non-relational database's data model determines the kinds of items it can contain and how they can be retrieved • What can the system store, and what does it know about what it contains? • The relational data model is about storing records made up of named, scalar-valued fields, as specified by a schema, or type definition • What kind of queries can you do? • SQL is a manifestation of the kinds of queries that fall out of relational algebra
  4. Types of Non-Relational Data Models • Key-value stores • Document stores • Column-oriented databases • Graph databases
  5. Key-Value Stores • A mapping from a key to a value • The store doesn't know anything about the key or value • The store doesn't know anything about the insides of the value • Operations • Set, get, or delete a key-value pair
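
The three key-value operations listed above can be sketched in a few lines of JavaScript. This is a toy in-memory illustration, not any product's API; the class name `KeyValueStore` and the key `user:1` are made up for the example. Note that the store only ever sees an opaque string, and only the application parses it.

```javascript
// Toy in-memory key-value store (hypothetical; for illustration only).
// Values are opaque to the store: it never looks inside them.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  set(key, value) {
    this.data.set(key, value);
  }
  get(key) {
    return this.data.get(key); // undefined when the key is absent
  }
  delete(key) {
    return this.data.delete(key); // true if a pair was removed
  }
}

const store = new KeyValueStore();
store.set("user:1", JSON.stringify({ LastName: "Flintstone", FirstName: "Fred" }));
const raw = store.get("user:1");        // the store hands back the opaque string
const user = JSON.parse(raw);           // only the application interprets it
const removed = store.delete("user:1"); // true
```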
  6. Document Stores • The store is a container for documents • Documents are made up of named fields • Fields may or may not have type definitions • e.g. XSDs for XML stores, vs. schema-less JSON stores • Can create “secondary indexes” • These provide the ability to query on any document field(s) • Operations: • Insert and delete documents • Update fields within documents
  7. Column-Oriented Stores • Like a relational store, but flipped around: all data for a column is kept together • An index provides a means to get a column value for a record • Operations: • Get, insert, delete records; updating fields • Streaming column data in and out of Hadoop
  8. Graph Databases • Stores vertex-to-vertex edges • Operations: • Getting and setting edges • Sometimes possible to annotate vertices or edges • Query languages support finding paths between vertices, subject to various constraints
  9. Consistency Models • Relational databases support transactions • Can only see committed changes • Commit/abort span multiple changes • Read-only transaction flavors • Read committed, repeatable read, etc. • Single vs Multi-Master
  10. Single Master • All writes go to a single master and are then replicated • Replication can provide arbitrary read scalability • Subject to coping with read-consistency issues • Sooner or later, writing becomes a bottleneck • Physical limitations (seek time) • Throughput of a single I/O subsystem
  11. Single Master - Sharding • Partition the primary key space via hashing • Set up a duplicate system for each shard • The write-rate limitation now applies to each shard • Joins or aggregation across shards are problematic • Can the data be re-sharded on a live system? • Can shards be re-balanced on a live system?
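
Partitioning the key space via hashing can be sketched as follows. The hash function (an FNV-1a-style hash over UTF-16 code units) and the helper names `hashKey` and `shardFor` are assumptions chosen for the illustration; real systems pick their own hash and routing scheme.

```javascript
// Hypothetical helper: FNV-1a-style 32-bit hash over the key's UTF-16 code units.
function hashKey(key) {
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // keep the value an unsigned 32-bit integer
  }
  return h >>> 0;
}

// Map a primary key to one of `shardCount` shards.
function shardFor(key, shardCount) {
  return hashKey(key) % shardCount;
}

const keys = ["fred", "wilma", "barney", "betty"];
const placement = keys.map((k) => shardFor(k, 4));
// The same key always lands on the same shard while shardCount is fixed.
// Changing shardCount changes the modulus, so many keys would move,
// which is why re-sharding a live system is listed as an open question.
```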
  12. Multi-Master • Dynamo-like solutions • Writes can occur to any node • The same record can be updated on different nodes by different clients • All writes are replicated everywhere • Collisions can occur • Who wins? • A collision resolution strategy is required • Vector clocks • http://en.wikipedia.org/wiki/Vector_clock
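
A minimal sketch of how vector clocks detect a collision, assuming each clock is a plain object mapping a node id to a counter. The function names and the node ids `n1` and `n2` are hypothetical; this shows only the comparison rule, not any particular database's resolution strategy.

```javascript
// Compare two vector clocks of the form { nodeId: counter }.
// Returns "before", "after", "equal", or "concurrent".
function compareClocks(a, b) {
  let aAhead = false;
  let bAhead = false;
  const nodes = new Set([...Object.keys(a), ...Object.keys(b)]);
  for (const node of nodes) {
    const ca = a[node] || 0; // a missing entry counts as 0
    const cb = b[node] || 0;
    if (ca > cb) aAhead = true;
    if (cb > ca) bAhead = true;
  }
  if (aAhead && bAhead) return "concurrent"; // neither descends from the other
  if (aAhead) return "after";
  if (bAhead) return "before";
  return "equal";
}

// Return a new clock with `node`'s counter advanced by one.
function increment(clock, node) {
  return { ...clock, [node]: (clock[node] || 0) + 1 };
}

const base = increment({}, "n1");   // { n1: 1 }
const onN1 = increment(base, "n1"); // { n1: 2 }, written via node n1
const onN2 = increment(base, "n2"); // { n1: 1, n2: 1 }, written via node n2

const ordered = compareClocks(onN1, base);   // "after": onN1 supersedes base
const collision = compareClocks(onN1, onN2); // "concurrent": needs resolution
```

When the result is "concurrent", neither write supersedes the other, which is the case where a collision resolution strategy has to decide who wins.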
  13. No-SQL solutions • Data Model: Key-Value, Document, Column-Oriented • Consistency Model: Single Master, Multi-Master/Dynamo • Single Master: Membase, MongoDB • Multi-Master/Dynamo: Riak, CouchDB • Column-Oriented: Cassandra, HBase, Hypertable
  14. MongoDB is a Document Store • MongoDB stores JSON objects as BSON • { LastName: 'Flintstone', FirstName: 'Fred', …} • Secondary Indexes • db.collection.ensureIndex({LastName : 1, FirstName : 1}); • Simple QBE-like query syntax • db.collection.find({LastName : 'Flintstone'}); • db.collection.find({LastName : { $gte : 'Flintstone' }});
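
To make the query-by-example idea concrete, here is a toy matcher in plain JavaScript that handles only the two query shapes shown on the slide: field equality and `$gte`. It is a sketch of the semantics under simplifying assumptions (plain JavaScript string comparison, no indexes), not how MongoDB evaluates queries. The function `matches` and the sample documents, including the made-up "Adams" record, are hypothetical.

```javascript
// Toy query-by-example matcher; supports equality and $gte only.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    const value = doc[field];
    if (cond !== null && typeof cond === "object") {
      // Operator form, e.g. { $gte: "Flintstone" }
      return Object.entries(cond).every(([op, operand]) => {
        if (op === "$gte") return value !== undefined && value >= operand;
        throw new Error("unsupported operator: " + op);
      });
    }
    return value === cond; // plain equality
  });
}

// Made-up sample documents for the example.
const people = [
  { LastName: "Flintstone", FirstName: "Fred" },
  { LastName: "Rubble", FirstName: "Barney" },
  { LastName: "Adams", FirstName: "Ann" },
];

const exact = people.filter((d) => matches(d, { LastName: "Flintstone" }));
const atOrAfter = people.filter((d) => matches(d, { LastName: { $gte: "Flintstone" } }));
// exact holds the Flintstone document; atOrAfter holds Flintstone and Rubble.
```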
  15. MongoDB is a Single-Master System • A database is served by members of a “replica set” • The system elects a primary (master) • Failure of the master is detected, and a new master is elected • Application writes get an error if there is no quorum to elect a new master • Reads continue to be fulfilled
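
The quorum rule behind "writes get an error if there is no quorum" can be sketched as a simple majority check. This assumes that quorum means a strict majority of the set's configured members; the function name `hasQuorum` is hypothetical, and the sketch ignores details such as non-voting members.

```javascript
// Assumption: a primary can be elected only when a strict majority
// of the configured members can reach each other.
function hasQuorum(totalMembers, reachableMembers) {
  return reachableMembers > Math.floor(totalMembers / 2);
}

const threeNodeOneDown = hasQuorum(3, 2); // true: 2 of 3 is a majority
const threeNodeTwoDown = hasQuorum(3, 1); // false: no election, writes fail
const fourNodeSplit = hasQuorum(4, 2);    // false: an even split has no majority
```

Under this assumption, an even-sized set tolerates no more failures than the next smaller odd-sized one, which is one reason odd member counts are a common choice.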
  16. MongoDB Storage Management • Data is kept in memory-mapped files • Servers should have a lot of memory • Files are allocated as needed • Documents in a collection are kept on a list using a geographical addressing scheme • Indexes (B*-trees) point to documents using geographical addresses
  17. Release History • First release - February 2009 • v1.0 - August 2009 • v1.2 - December 2009 - Map/Reduce, lots of small things • v1.4 - March 2010 - Concurrency/Geo • v1.6 - August 2010 - Sharding/Replica Sets • v1.8 - March 2011 - Journaling, Covered/Sparse indexes, Geo sphere
  18. MongoDB – Advanced Queries • Geo-spatial queries • Create a geo index • Find points near a given point, sorted by radial distance • Can be planar or spherical • Find points within a certain radial distance, within a bounding box, or a polygon • Built-in Map-Reduce • The caller provides map and reduce functions written in JavaScript
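
The map/reduce contract, where the caller supplies a map function that emits key/value pairs and a reduce function that folds the values for one key, can be sketched with a toy single-process runner. The runner `mapReduce` below is hypothetical and is not MongoDB's implementation; in particular, passing `emit` as an argument to the map function is a simplification made so the sketch stays self-contained.

```javascript
// Toy single-process map/reduce runner (illustration only).
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // The map function sees each document as `this` and emits pairs.
  for (const doc of docs) map.call(doc, emit);
  const result = {};
  for (const [key, values] of groups) {
    // A key with a single value needs no reduction.
    result[key] = values.length === 1 ? values[0] : reduce(key, values);
  }
  return result;
}

// Made-up sample documents.
const docs = [
  { LastName: "Flintstone", FirstName: "Fred" },
  { LastName: "Flintstone", FirstName: "Wilma" },
  { LastName: "Rubble", FirstName: "Barney" },
];

// Count documents per last name.
const counts = mapReduce(
  docs,
  function (emit) { emit(this.LastName, 1); },
  (key, values) => values.reduce((sum, v) => sum + v, 0)
);
// counts is { Flintstone: 2, Rubble: 1 }
```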
  19. Scaling MongoDB • Replication • Read scalability • Master/Slave • Replica Sets • Sharding • A collection can be sharded • Each shard is served by its own replica set • New shards (each a replica set) can be added at any time • Shard key ranges are automatically balanced
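
Routing by shard key range can be sketched as a lookup in a table of key ranges. The table layout, the range boundaries, and the shard names below are all invented for the illustration; balancing would amount to moving ranges between shards, which this sketch does not attempt.

```javascript
// Hypothetical range table: each entry covers [min, max) of the shard key.
const chunks = [
  { min: "", max: "h", shard: "shardA" },
  { min: "h", max: "p", shard: "shardB" },
  { min: "p", max: "\uffff", shard: "shardC" },
];

// Find the shard whose range contains the given key.
function shardForKey(key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  if (!chunk) throw new Error("no chunk covers key: " + key);
  return chunk.shard;
}

const a = shardForKey("flintstone"); // "shardA"
const b = shardForKey("jetson");     // "shardB"
const c = shardForKey("rubble");     // "shardC"
```

Unlike hashing, range lookup keeps neighboring keys together, so a range query on the shard key can be sent to only the shards whose ranges overlap it.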
  20. MongoDB Access • Drivers are available in many languages • 10gen supported • C, C# (.Net), C++, Erlang, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala • Community supported • Clojure, ColdFusion, F#, Go, Groovy, Lua, R • http://www.mongodb.org/display/DOCS/Overview+-+Writing+Drivers+and+Tools
  21. V2.0 • Pretty soon • Concurrency • Faster data compaction • Faster map/reduce • TTL collections • Geospatial polygons • Hash shard key • Index 2.0 (smaller+faster)
  22. Future – a short list • Full text Search • More concurrency • Online compaction • Internal compression • New aggregation framework • Vote: http://jira.mongodb.org
  23. MongoDB Availability • Source • https://github.com/mongodb/mongo • Server • License: AGPL • http://www.mongodb.org/downloads • Drivers • License: Apache • http://www.mongodb.org/display/DOCS/Drivers
  24. MongoDB and Perl • MongoDB – Official 10gen module • http://search.cpan.org/~kristina/MongoDB-0.43/lib/MongoDB.pm • Tutorial - http://search.cpan.org/dist/MongoDB/lib/MongoDB/Tutorial.pod • Install MongoDB Perl module • Make sure MongoDB is running • cpan -i inc::Module::Install • cpan -i MongoDB
  25. @mongodb • © Copyright 2010 10gen Inc. • conferences, appearances, and meetups • http://www.10gen.com/events • http://bit.ly/mongofb • Facebook | Twitter | LinkedIn • http://linkd.in/joinmongo • download at mongodb.org • We’re Hiring! • [email protected]