Perl Oasis - Introduction to MongoDB

Brendan McAdams

January 14, 2012

Transcript

  1. Data Access in the Right Context

    • Web apps need lightweight, high-performance, highly scalable data stores.
    • The cloud changes the expectations and performance profile of data stores.
    • The new "NoSQL" paradigm focuses on lots of inexpensive cloud servers and horizontal scalability; slow disks and random failures have less of an impact.
  2. So We’ve Built an Application with a Database

    How do we integrate that database with our application’s object hierarchy?
  3. Stuffing an object graph into a relational model is like fitting a square peg into a round hole.
  4. Sure, we can use an ORM. But who are we really fooling?

    ... and who/what are we going to wake up next to in the morning?
  5. Make Your Data Work for You

    • Object graphs are often orthogonal to relational models.
    • Lifetimes of blood, sweat, tears and late-night plots of homicide surround ORMs. They simply defer the pain (and make debugging harder).
  6. Make Your Data Work for You

    • Object graphs are often orthogonal to relational models.
    • Lifetimes of blood, sweat, tears and late-night plots of homicide surround ORMs. They simply defer the pain (and make debugging harder).
    • (… Is a generic query really the right answer for your app?)
  7. JSON and Complex Documents fit the Frontend* Beautifully

    • Defer the penalty of "flattening" your object graph data to the data warehouse
    • Cache complex joins, views, one-off long-running queries, etc. into a rich, complex representation
    • Access it incredibly quickly
    • Run complex queries, filters and slicing and dicing… Not just a dumb cache!
  8. This is a SQL Model

    mysql> select * from book;
    +----+----------------------------------------------------------+
    | id | title                                                    |
    +----+----------------------------------------------------------+
    |  1 | The Demon-Haunted World: Science as a Candle in the Dark |
    |  2 | Cosmos                                                   |
    |  3 | Programming in Scala                                     |
    +----+----------------------------------------------------------+
    3 rows in set (0.00 sec)

    mysql> select * from bookauthor;
    +---------+-----------+
    | book_id | author_id |
    +---------+-----------+
    |       1 |         1 |
    |       2 |         1 |
    |       3 |         2 |
    |       3 |         3 |
    |       3 |         4 |
    +---------+-----------+
    5 rows in set (0.00 sec)

    mysql> select * from author;
    +----+-----------+------------+-------------+-------------+---------------+
    | id | last_name | first_name | middle_name | nationality | year_of_birth |
    +----+-----------+------------+-------------+-------------+---------------+
    |  1 | Sagan     | Carl       | Edward      | NULL        |          1934 |
    |  2 | Odersky   | Martin     | NULL        | DE          |          1958 |
    |  3 | Spoon     | Lex        | NULL        | NULL        |          NULL |
    |  4 | Venners   | Bill       | NULL        | NULL        |          NULL |
    +----+-----------+------------+-------------+-------------+---------------+
    4 rows in set (0.00 sec)
  9. Joins are great and all ...

    • Potentially organizationally messy
    • Structure of a single object is NOT immediately clear to someone glancing at the shell data
  10. Joins are great and all ...

    • Potentially organizationally messy
    • Structure of a single object is NOT immediately clear to someone glancing at the shell data
    • We have to flatten our object out into three tables
  11. Joins are great and all ...

    • Potentially organizationally messy
    • Structure of a single object is NOT immediately clear to someone glancing at the shell data
    • We have to flatten our object out into three tables
    • 7 separate inserts just to add “Programming in Scala”
  12. Joins are great and all ...

    • Potentially organizationally messy
    • Structure of a single object is NOT immediately clear to someone glancing at the shell data
    • We have to flatten our object out into three tables
    • 7 separate inserts just to add “Programming in Scala”
    • Once we turn the relational data back into objects ...
  13. Joins are great and all ...

    • Potentially organizationally messy
    • Structure of a single object is NOT immediately clear to someone glancing at the shell data
    • We have to flatten our object out into three tables
    • 7 separate inserts just to add “Programming in Scala”
    • Once we turn the relational data back into objects ...
    • We still need to convert it to data for our frontend
  14. Joins are great and all ...

    • Potentially organizationally messy
    • Structure of a single object is NOT immediately clear to someone glancing at the shell data
    • We have to flatten our object out into three tables
    • 7 separate inserts just to add “Programming in Scala”
    • Once we turn the relational data back into objects ...
    • We still need to convert it to data for our frontend
    • I don’t know about you, but I have better things to do with my time.
  15. The Same Data in MongoDB

    > db.books.find().forEach(printjson)
    {
        "_id" : ObjectId("4dfa6baa9c65dae09a4bbda3"),
        "title" : "The Demon-Haunted World: Science as a Candle in the Dark",
        "author" : [
            {
                "first_name" : "Carl",
                "last_name" : "Sagan",
                "middle_name" : "Edward",
                "year_of_birth" : 1934
            }
        ]
    }
    {
        "_id" : ObjectId("4dfa6baa9c65dae09a4bbda4"),
        "title" : "Cosmos",
        "author" : [
            {
                "first_name" : "Carl",
                "last_name" : "Sagan",
                "middle_name" : "Edward",
                "year_of_birth" : 1934
            }
        ]
    }
  16. The Same Data in MongoDB (Part 2)

    {
        "_id" : ObjectId("4dfa6baa9c65dae09a4bbda5"),
        "title" : "Programming in Scala",
        "author" : [
            {
                "first_name" : "Martin",
                "last_name" : "Odersky",
                "nationality" : "DE",
                "year_of_birth" : 1958
            },
            {
                "first_name" : "Lex",
                "last_name" : "Spoon"
            },
            {
                "first_name" : "Bill",
                "last_name" : "Venners"
            }
        ]
    }
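
As a rough sketch of why this document shape suits a frontend, the "Programming in Scala" document above can be written as a plain Perl hash reference and serialized with the core JSON::PP module. No MongoDB server is involved, the `_id` is left out, and the variable names are illustrative assumptions rather than anything from the talk.

```perl
use strict;
use warnings;
use JSON::PP;

# The "Programming in Scala" document from the slide as a plain Perl
# hash reference (_id omitted; MongoDB would generate one on insert).
my $book = {
    title  => 'Programming in Scala',
    author => [
        { first_name => 'Martin', last_name => 'Odersky',
          nationality => 'DE', year_of_birth => 1958 },
        { first_name => 'Lex',  last_name => 'Spoon' },
        { first_name => 'Bill', last_name => 'Venners' },
    ],
};

# One structure holds what the SQL model spread over three tables.
my $json = JSON::PP->new->canonical->pretty->encode($book);
print $json;

# The embedded authors are reachable without any join.
my @last_names = map { $_->{last_name} } @{ $book->{author} };
print join(', ', @last_names), "\n";
```

The same hash reference is what the Perl driver would accept for an insert, so the object, the stored document and the JSON sent to the browser all share one shape.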
  17. Access to the embedded objects is integral

    > db.books.find({"author.first_name": "Martin", "author.last_name": "Odersky"})
    {
        "_id" : ObjectId("4dfa6baa9c65dae09a4bbda5"),
        "title" : "Programming in Scala",
        "author" : [
            {
                "first_name" : "Martin",
                "last_name" : "Odersky",
                "nationality" : "DE",
                "year_of_birth" : 1958
            },
            {
                "first_name" : "Lex",
                "last_name" : "Spoon"
            },
            {
                "first_name" : "Bill",
                "last_name" : "Venners"
            }
        ]
    }
  18. As is manipulation of the embedded data

    > db.books.update({"author.first_name": "Bill", "author.last_name": "Venners"},
    ... {$set: {"author.$.company": "Artima, Inc."}})
    > db.books.update({"author.first_name": "Martin", "author.last_name": "Odersky"},
    ... {$set: {"author.$.company": "Typesafe, Inc."}})
    > db.books.findOne({"title": /Scala$/})
    {
        "_id" : ObjectId("4dfa6baa9c65dae09a4bbda5"),
        "author" : [
            {
                "company" : "Typesafe, Inc.",
                "first_name" : "Martin",
                "last_name" : "Odersky",
                "nationality" : "DE",
                "year_of_birth" : 1958
            },
            {
                "first_name" : "Lex",
                "last_name" : "Spoon"
            },
            {
                "company" : "Artima, Inc.",
                "first_name" : "Bill",
                "last_name" : "Venners"
            }
        ],
        "title" : "Programming in Scala"
    }
  19. Operating System maps files on the Filesystem to Virtual Memory

    • MongoDB revolves around memory-mapped files (200 gigs of MongoDB files creates 200 gigs of virtual memory)
    • The OS controls what data is in RAM
    • When a piece of data isn't found in RAM, a page fault occurs (expensive versus a memory fetch!)
    • The OS goes to disk to fetch the data
    • Compare this to the normal trick of sticking a poorly managed memcached cluster in front of MySQL
    • Dirty pages are flushed to disk in batches, controlled by the OS (but at least every 60 seconds; customise with --syncdelay)
  20. BSON Encoding

    { _id: ObjectId(XXXXXXXXXXXX), hello: "world" }

    \x27\x00\x00\x00 \x07 _id\x00 XXXXXXXXXXXX \x02 hello\x00 \x06\x00\x00\x00 world\x00 \x00

    http://bsonspec.org
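
The byte layout on this slide can be reproduced by hand. The following is a minimal sketch in plain Perl, assuming ASCII keys and values and handling only the two element types the slide uses (string and ObjectId). The helper names are made up for illustration and are not part of the MongoDB driver, which does this encoding for you; the ObjectId is a sample value reused from the earlier slides.

```perl
use strict;
use warnings;

# One BSON string element: type byte 0x02, key as a C string,
# little-endian int32 length (counting the trailing NUL), bytes, NUL.
sub bson_string {
    my ($key, $value) = @_;
    my $bytes = $value . "\x00";
    return "\x02" . $key . "\x00" . pack('V', length($bytes)) . $bytes;
}

# One BSON ObjectId element: type byte 0x07, key, then 12 raw bytes.
sub bson_object_id {
    my ($key, $hex) = @_;
    return "\x07" . $key . "\x00" . pack('H24', $hex);
}

# A document: little-endian int32 total size, the elements, a closing NUL.
sub bson_document {
    my @elements = @_;
    my $body = join('', @elements) . "\x00";
    return pack('V', 4 + length($body)) . $body;
}

my $doc = bson_document(
    bson_object_id('_id', '4dfa6baa9c65dae09a4bbda5'),   # sample id
    bson_string('hello', 'world'),
);

# Show the total size and the 4-byte length prefix in \xNN notation.
my $prefix = join ' ', map { sprintf('\\x%02x', ord($_)) } split //, substr($doc, 0, 4);
printf "%d bytes, length prefix: %s\n", length($doc), $prefix;
```

The total comes to 39 bytes, which is the `\x27` at the start of the slide's encoding.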
  21. How do you scale now?

    • Optimization & Tuning
    • Schema & Index Design
    • O/S tuning
    • Hardware configuration
    • Vertical scaling
    • Hardware is expensive
    • Hard to scale in cloud

    (Chart axes: $$$ vs. throughput)
  22. Write scaling - add Shards

    (Diagram: write and read traffic across shard1 [node_a1, node_b1, node_c1] and shard2 [node_a2, node_b2, node_c2])
  23. Write scaling - add Shards

    (Diagram: write and read traffic across shard1 [node_a1, node_b1, node_c1], shard2 [node_a2, node_b2, node_c2] and shard3 [node_a3, node_b3, node_c3])
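
To make the idea of spreading writes over shards concrete, here is a deliberately simplified sketch of range-based routing on a shard key. The chunk boundaries, the choice of an author's last name as the key, and the `shard_for` helper are all hypothetical; in a real cluster the routing is handled for you and the application never does this itself.

```perl
use strict;
use warnings;

# Hypothetical chunk table: each chunk covers shard-key values below
# 'max' (exclusive) and lives on one shard; undef means no upper bound.
my @chunks = (
    { max => 'h',   shard => 'shard1' },
    { max => 'q',   shard => 'shard2' },
    { max => undef, shard => 'shard3' },
);

# Return the shard whose chunk range contains the given key.
sub shard_for {
    my ($key) = @_;
    for my $chunk (@chunks) {
        return $chunk->{shard}
            if !defined $chunk->{max} || $key lt $chunk->{max};
    }
    return;   # not reached while the last chunk is unbounded
}

# Writes for different keys land on different shards.
for my $author (qw(banker chodorow odersky sagan)) {
    printf "%-10s -> %s\n", $author, shard_for($author);
}
```

Adding a shard in this picture means splitting the key space into more ranges, so each shard takes a smaller share of the writes.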
  24. Getting Started

    # Let's lay out some sane pragmas and the appropriate modules
    use Modern::Perl;
    use MongoDB;
  25. Getting Connected

    # Connect to default host & port (localhost, 27017)
    my $conn = MongoDB::Connection->new;
    # or, a non-default connection
    my $conn = MongoDB::Connection->new("host" => "drunkencamel:27018");
    # Replica Sets can "find" the RS, or you can use a URI
    my $conn = MongoDB::Connection->new("host" => "drunkencamel:27018", "find_master" => 1);
    # URI Style
    my $conn = MongoDB::Connection->new("host" => "mongodb://foo:27017,bar:27018");
    # Getting a DB & Collection handle (Creation is implicit)
    my $db = $conn->tutorial;
    my $books = $db->bookstore;
  26. Creation of Documents

    # The driver for Perl maps MongoDB Documents as hash references
    $books->insert({"author" => "Brian P. Hogan",
                    "title" => "HTML5 and CSS3",
                    "tags" => [ qw/html5 css html css3 pragmatic/ ],
                    "publicationYear" => 2010});
    # An ID is automatically generated in this case
  27. Finding Existing Documents

    # Queries are documents - simple ones use key: value to match exactly
    my $brian_books = $books->find({"author" => "Brian P. Hogan"});
    # If MongoDB gets a "scalar" match but finds an array value it
    # matches if any of the elements in the array match
    my $perl_books = $books->find({"tags" => "perl"});
    # And of course, it wouldn't be Perl if we couldn't use Regex...
    my $programming_books = $books->find({"title" => qr/program/i});
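
The array-matching rule described in the comments above can be imitated in a few lines of plain Perl. This is only an in-memory illustration of the semantics, not how the server evaluates queries; `field_matches` is a made-up helper, and the second book is a hypothetical document added to show the scalar case.

```perl
use strict;
use warnings;

# Imitation of the exact-match rule for a single field: a scalar query
# value matches a scalar field if equal, and matches an array field if
# any element is equal.
sub field_matches {
    my ($doc, $field, $wanted) = @_;
    return 0 unless exists $doc->{$field};
    my $value = $doc->{$field};
    my @candidates = ref($value) eq 'ARRAY' ? @$value : ($value);
    return (grep { $_ eq $wanted } @candidates) ? 1 : 0;
}

my @books = (
    { title => 'HTML5 and CSS3',   tags => [qw/html5 css html css3 pragmatic/] },
    { title => 'Programming Perl', tags => 'perl' },   # hypothetical document
);

# Equivalent in spirit to find({"tags" => "css"}).
my @css_books = grep { field_matches($_, 'tags', 'css') } @books;
print "$_->{title}\n" for @css_books;
```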
  28. More Advanced Queries

    # MongoDB has a query expression language with special "$" operators
    # To find all books published between 2008 and 2011 ...
    my $newer_books = $books->find({"publicationYear" => {'$gte' => 2008, '$lt' => 2012}});
    # Some Perl concepts such as "exists" are possible server-side as well
    my $compiled_books = $books->find({"editor" => {'$exists' => 1}});
    # Negation simply uses the $ne operator
    my $not_brian_books = $books->find({"author" => {'$ne' => "Brian P. Hogan"}});
    # The behavior of combined operators is "AND" but ...
    # There are some nice ways to check multiple values or expressions too
    # ($or takes an array reference of alternative query documents)
    my $mongo_books = $books->find({"tags" => "mongodb", '$or' => [
        {"publisher" => "O'Reilly", "author" => "Kristina Chodorow & Mike Dirolf"},
        {"publisher" => "Manning", "author" => "Kyle Banker"}]});
    my $web_monkey_books = $books->find({"tags" => {'$in' => [ qw/html css/ ]}});
  29. Working with Documents

    # MongoDB uses cursors like a Standard Database to return result sets
    my $newer_books = $books->find({"publicationYear" => {'$gte' => 2008, '$lt' => 2012}});
    # $newer_books is an instance of MongoDB::Cursor
    while (my $book = $newer_books->next) {
        say "Author: " . $book->{'author'} . " Title: " . $book->{'title'};
    }
    # If you like to waste memory you can get an array of hashrefs instead
    # (all is a cursor method, so call it on the result of find)
    my @bigblobofbooks = $books->find->all;
  30. Updating Documents

    # Updates are a combination of a query and a "new" document
    # But we have several special operators to manipulate discretely
    my $today = DateTime->now;
    my $tomorrow = $today->clone->add('days' => 1);
    $users->update({"bday" => {'$gte' => $today, '$lte' => $tomorrow}},
                   {'$set' => {'gift' => $gift}},
                   {'multiple' => 1});
    # We also provide a number of operators to manipulate arrays
    # Add "perl" to tags on books that have "mongodb" as a tag
    $books->update({'tags' => 'mongodb'}, {'$push' => {'tags' => 'perl'}});
    # One downside here: $push doesn't enforce uniqueness... which is where
    # $addToSet comes in, instead
    $books->update({'tags' => 'mongodb'}, {'$addToSet' => {'tags' => 'perl'}});
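
The difference between `$push` and `$addToSet` is easiest to see on a plain Perl array. The two helpers below are hypothetical in-memory stand-ins for the server-side operators, written only to illustrate the uniqueness behaviour the slide mentions.

```perl
use strict;
use warnings;

# Stand-in for $push: always append, duplicates allowed.
sub push_value {
    my ($doc, $field, $value) = @_;
    push @{ $doc->{$field} }, $value;
    return $doc;
}

# Stand-in for $addToSet: append only if the value is not already present.
sub add_to_set {
    my ($doc, $field, $value) = @_;
    push @{ $doc->{$field} }, $value
        unless grep { $_ eq $value } @{ $doc->{$field} || [] };
    return $doc;
}

# Two identical documents that already carry the "perl" tag.
my $pushed = { tags => [qw/mongodb perl/] };
my $set    = { tags => [qw/mongodb perl/] };

push_value($pushed, 'tags', 'perl');
add_to_set($set, 'tags', 'perl');

print "push:     @{ $pushed->{tags} }\n";
print "addToSet: @{ $set->{tags} }\n";
```

After both calls the pushed document carries "perl" twice while the set-style document is unchanged.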
  31. Being a bit Classier...

    package MyApp::Schema::Novel;

    use MongoDBx::Class::Moose;
    use namespace::autoclean;

    with 'MongoDBx::Class::Document';

    has 'title' => (is => 'ro', isa => 'Str', required => 1, writer => 'set_title');

    holds_one 'author' => (is => 'ro', isa => 'MyApp::Schema::PersonName', required => 1, writer => 'set_author');

    has 'year' => (is => 'ro', isa => 'Int', predicate => 'has_year', writer => 'set_year');

    has 'added' => (is => 'ro', isa => 'DateTime', traits => ['Parsed'], required => 1);

    holds_many 'tags' => (is => 'ro', isa => 'MyApp::Schema::Tag', predicate => 'has_tags');

    joins_one 'synopsis' => (is => 'ro', isa => 'Synopsis', coll => 'synopsis', ref => 'novel');

    has_many 'related_novels' => (is => 'ro', isa => 'Novel', predicate => 'has_related_novels',
                                  writer => 'set_related_novels', clearer => 'clear_related_novels');

    joins_many 'reviews' => (is => 'ro', isa => 'Review', coll => 'reviews', ref => 'novel');

    sub print_related_novels {
        my $self = shift;

        foreach my $other_novel ($self->related_novels) {
            print $other_novel->title, ', ',
                  $other_novel->year, ', ',
                  $other_novel->author->name, "\n";
        }
    }

    around 'reviews' => sub {
        my ($orig, $self) = (shift, shift);

        my $cursor = $self->$orig;

        return $cursor->sort([ year => -1, title => 1, 'author.last_name' => 1 ]);
    };

    __PACKAGE__->meta->make_immutable;
  32. download at mongodb.org

    conferences, appearances, and meetups: http://www.10gen.com/events
    Facebook: http://bit.ly/mongofb | Twitter: @mongodb | LinkedIn: http://linkd.in/joinmongo
    We’re Hiring! [email protected] (twitter: @rit)