In The Brain of Brendan McAdams

MongoDB, Scala and Java, and deep in my brain on ODM decoding

Brendan McAdams

November 24, 2011

Transcript

  1. Crowbar + Brain == Ow?
    • Phineas had a simple job as a rock blasting foreman
    • After a coworker bored a hole ...
    • Insert Blasting Powder
    • Insert Fuse
    • Insert Sand
    • Compact charge with a tamping iron (big iron rod)
    • Unfortunately, Phineas failed to follow instructions ...
    Thursday, November 24, 11
  2. Crowbar + Brain == Ow?
    • Records are hazy on why, but it seems poor Phineas forgot the sand
    • When he compressed the unprotected charge it lit the powder... the powder exploded, carrying an instrument through his head an inch and a fourth in [diameter], and three feet and [seven] inches in length, which he was using at the time. The iron entered on the side of his face...passing back of the left eye, and out at the top of the head.
  3. Sometimes, Reading the Instructions Is a Good Thing
    • Medical reports (including a series of recent studies in 1994 and 2004) differ on whether the rod pierced one or both hemispheres of the brain, but all agree there was significant damage to the frontal lobe
    • Amazingly, he made a full recovery and was walking within a month
    • His friends, however, felt he was no longer himself, and the study of his mental changes went on to influence the medical model of the brain
    • Phineas kept what he called “my iron rod” and made it his “constant companion during the remainder of his life”
  4. Er, OK?
    • Sometimes the way we do things exists for a reason beyond tradition (aka “We’ve always done it that way”)
    • Before you decide to break with tradition, do the research
    • Mistakes happen, but process can save you
    • Changing your mind is always a good thing. Just don’t do it with an iron rod through the skull.
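  5. JVM’s BSON Decoding Stack
    [Diagram: a “book” document decodes through several layers. The “Publication Date” field goes from BSON Date (bytes) to a JVM Long, then a JDK Date, then a JODA DateTime; the “Authors” field passes through BSON Array (bytes), Java Array, and DBList to a Scala List; each “Author” goes from BSON Object (bytes) through keys & values and a DBObject to a JVM object instance of “Author”; the whole document becomes DBObject “book” and then a JVM object instance of “Book”.]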
  6. Cure Your Painful Stack Bloat in 3 Simple Steps
    [Diagram: BSON Object (bytes), keys & values, DBObject, and a JVM object instance of “Author”]
  7. Let’s Look at Decoding Fields
    [Diagram: “Publication Date”: BSON Date (bytes) → JVM Long → JDK Date → JODA DateTime]
    • The existing system uses global transformers *after* the core JVM type is instantiated.
  8. Let’s Look at Decoding Fields
    • The global transformers also mean *all* dates must now be converted to JODA DateTime...
  9. Let’s Look at Decoding Fields
    • This adds a lot of bloat to our stack size and hurts performance. Think about how bad embedded arrays & objects with inner data could get...
  10. Let’s Look at Decoding Fields
    • Worse, how do we safely and sanely match/map each field to its slot on an ODM’s JVM object, enforce constraints, etc.?
  11. Encoding can be even worse...
    [Diagram: “Publication Date” as BSON Date (bytes), JVM Long, JDK Date, and JODA DateTime; DBObject “book”; JVM object instance of “Book”]
  12. Encoding can be even worse...
    • Serializing back to BSON presents a number of further problems
    • Each field in the object must be mappable from its JVM type to a BSON type
    • There may be numerous inefficient intermediate conversions, such as List to Array to BSON Array...
    • In addition to object stack bloat, the safety of a loose, mutable DBObject container is questionable
    • Validation/detection of malformed documents happens quite late in the workload
  13. There is a cure in sight!
    • ORMs and ODMs are great rapid prototyping tools, but can limit scalability & growth
    • ODM users may start noticing slow conversions & stack bloat as they get more and more data into their system
    • I’ve been pondering this problem set for about 18 months, and working on a solution for the last few weeks
    • My goal? Eradicate this issue and boost performance for Lift-MongoDB-Record & Salat (Morphia & Spring Data to be evaluated later)
    • I call my new system “BSON Primitives”
  14. BSON Primitives: A New Sanity?
    • Every solution and idea I had over the last year or so ended up either giving only a marginal improvement or somehow requiring the ODM author to write raw BSON
    • Neither of these is a success: making an ODM author implement their own BSON ser/deser routines is the definition of insanity
    • While chasing a narrower problem in the custom serializer (to fix the globals issue), I stumbled on something
  15. BSON Primitives: A New Sanity?
    [Diagram: BSON Date (bytes) → JVM Long]
    • Every BSON type has a simple set of “Primitive” values in between ‘Bytes’ and ‘JVM Instance’
  16. Back to Basics
    • There is a fairly small set of BSON types, each with a well-defined primitive value for the bytestream
    • Why not introduce a customizable, type-parameterized container that simply requires the user to implement methods converting from their type to the primitive and back?
    • The container can internalize knowledge of how to convert between that primitive and a BSON byte stream
    • Custom type implementations then simply become a chain of BSON Primitive Container instances
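The container idea above can be sketched roughly as follows. This is a minimal, hypothetical sketch, not the Casbah 3.0 code: `BSONPrimitive`, `typeCode`, `primitiveToBytes`, `primitiveFromBytes`, `encode`, `decode`, and `JDKDates` are assumed names, `BSONDatePrimitive` borrows the name used on the slides but its body here is my own guess, and `java.util.Date` stands in for JODA DateTime so nothing beyond the JDK is needed. The only BSON fact relied on is that a BSON UTC datetime (type 0x09) is a little-endian int64 of milliseconds since the Unix epoch.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Hypothetical container parameterized on the user's native type N
// and the BSON primitive type P (names are illustrative only).
trait BSONPrimitive[N, P] {
  // BSON element type code this container handles (e.g. 0x09 for UTC datetime).
  def typeCode: Byte

  // User-supplied conversions between the native and primitive values.
  def nativeValue(primitive: P): N
  def primitiveValue(native: N): P

  // Container-internal knowledge: primitive <-> raw BSON value bytes.
  def primitiveToBytes(primitive: P): Array[Byte]
  def primitiveFromBytes(bytes: Array[Byte]): P

  // The full chain: native -> primitive -> bytes, and back again.
  def encode(native: N): Array[Byte] = primitiveToBytes(primitiveValue(native))
  def decode(bytes: Array[Byte]): N = nativeValue(primitiveFromBytes(bytes))
}

// BSON UTC datetime: int64, little-endian, milliseconds since the Unix epoch.
trait BSONDatePrimitive[N] extends BSONPrimitive[N, Long] {
  val typeCode: Byte = 0x09

  def primitiveToBytes(primitive: Long): Array[Byte] =
    ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(primitive).array()

  def primitiveFromBytes(bytes: Array[Byte]): Long =
    ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getLong()
}

// The only code a user writes: the native <-> primitive conversion.
object JDKDates extends BSONDatePrimitive[java.util.Date] {
  def nativeValue(primitive: Long): java.util.Date = new java.util.Date(primitive)
  def primitiveValue(native: java.util.Date): Long = native.getTime
}
```

The design point is that the user only supplies `nativeValue` and `primitiveValue`; the byte-level knowledge lives in the per-BSON-type base trait, so a JODA version would differ only in those two methods.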
  17. Flexible Sanity
    • Encoding an object is really just providing a Map-like structure of (Field, BSON Container)
    • For decoding, two types of custom Decoder systems are needed
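As a hypothetical illustration of “encoding is just a Map-like structure of (Field, BSON Container)”, the sketch below walks an ordered sequence of field/container pairs and writes a BSON document. `Encoded`, its helpers, and `encodeDocument` are invented for this example; only the byte layout (an int32 length prefix that counts itself, then per element a type byte, a NUL-terminated field name and the value bytes, then a trailing 0x00) follows the BSON spec. A `Seq` is used rather than a `Map` so field order is preserved.

```scala
import java.io.ByteArrayOutputStream
import java.nio.{ByteBuffer, ByteOrder}
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical "container" output: a BSON type code plus the rendered value bytes.
final case class Encoded(typeCode: Byte, valueBytes: Array[Byte])

object Encoded {
  private def le(size: Int): ByteBuffer = ByteBuffer.allocate(size).order(ByteOrder.LITTLE_ENDIAN)

  def int32(v: Int): Encoded = Encoded(0x10, le(4).putInt(v).array())
  def date(millis: Long): Encoded = Encoded(0x09, le(8).putLong(millis).array())
  def string(s: String): Encoded = {
    val utf8 = s.getBytes(UTF_8)
    // BSON string: int32 byte length (including the trailing NUL), UTF-8 bytes, NUL.
    Encoded(0x02, le(4).putInt(utf8.length + 1).array() ++ utf8 :+ 0.toByte)
  }
}

// Encoding a document is just walking an ordered (field -> container) structure.
def encodeDocument(fields: Seq[(String, Encoded)]): Array[Byte] = {
  val body = new ByteArrayOutputStream()
  for ((name, enc) <- fields) {
    body.write(enc.typeCode.toInt)   // element type byte
    body.write(name.getBytes(UTF_8)) // field name as a C string...
    body.write(0)                    // ...terminated by NUL
    body.write(enc.valueBytes)       // value bytes rendered by the container
  }
  body.write(0)                      // document terminator
  val total = 4 + body.size()        // the length prefix counts itself
  ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(total).array() ++ body.toByteArray
}
```

For instance, `encodeDocument(Seq("a" -> Encoded.int32(1)))` produces the 12-byte document for `{"a": 1}`.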
  18. Custom Type Decoding
    • Type-based, for registering custom type decoders; replaces the previous global system for userspace work
    • One slot per BSON type; all instances of that type run through the defined deserializer (e.g. BSON Date -> JODA DateTime)
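A type-based decoder registry with one slot per BSON type code could look something like this minimal sketch. `TypeDecoders`, `register`, and `decode` are hypothetical names, not Casbah’s API, and `java.time.Instant` stands in for JODA DateTime in the usage below.

```scala
import scala.collection.mutable

// Hypothetical userspace registry: one decoder slot per BSON type code,
// applied to every value of that type that comes off the wire.
final class TypeDecoders {
  private val slots = mutable.Map.empty[Byte, Any => Any]

  // Registering a decoder replaces whatever was in that type's slot.
  def register(typeCode: Byte)(decoder: Any => Any): Unit =
    slots(typeCode) = decoder

  // Values whose type has no registered decoder pass through unchanged.
  def decode(typeCode: Byte, primitive: Any): Any =
    slots.get(typeCode).map(f => f(primitive)).getOrElse(primitive)
}
```

For example, `decoders.register(0x09) { case millis: Long => java.time.Instant.ofEpochMilli(millis) }` would turn every BSON date primitive into an `Instant`, while types with no registered decoder are left alone. Keeping one registry instance per collection is one way such a design could allow per-collection handling of types.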
  19. A Clear Set of Instructions
    • Field-based, for ODM systems like Salat and Lift-Record
    • Define the object’s fields and which type (or types!) is valid for each
    • Register custom validators to reject invalid docs as early as possible
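The field-based side might be sketched as below: each field declares which runtime classes are acceptable plus any extra validators, and decoding stops at the first bad field instead of after a full object has been built. `FieldSpec` and `FieldDecoder` are illustrative names only, not Salat or Lift-Record APIs. Note that boxed classes such as `java.lang.Long` must be listed, because `Class.isInstance` never matches a primitive class like `classOf[Long]`.

```scala
// Hypothetical field declaration: accepted runtime classes plus extra validators.
final case class FieldSpec(
    name: String,
    allowed: Set[Class[_]],
    validators: List[Any => Boolean] = Nil
)

final class FieldDecoder(specs: List[FieldSpec]) {
  // Returns Left(reason) for the first failing field, or Right(declared fields only).
  def decode(doc: Map[String, Any]): Either[String, Map[String, Any]] = {
    // The iterator is lazy, so checking stops at the first failure.
    val failure = specs.iterator.map { spec =>
      doc.get(spec.name) match {
        case None =>
          Some(s"missing field '${spec.name}'")
        case Some(v) if !spec.allowed.exists(_.isInstance(v.asInstanceOf[AnyRef])) =>
          Some(s"field '${spec.name}' has unexpected type ${v.getClass.getSimpleName}")
        case Some(v) if !spec.validators.forall(check => check(v)) =>
          Some(s"field '${spec.name}' failed validation")
        case _ =>
          None
      }
    }.collectFirst { case Some(msg) => msg }

    failure.toLeft(doc.filter { case (k, _) => specs.exists(_.name == k) })
  }
}
```

A “book” decoder might declare `title` as a non-empty `String` and `published` as either a `java.lang.Long` or a `java.util.Date`; a document with an empty title is then rejected before any `Book` instance exists.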
  20. Containing My Madness
    • Finalizing the interfaces and logic now; the first version of this will be in Casbah 3.0
    • Working on integrating into Salat & Lift-MongoDB-Record once the code is final
    • The userspace type-based decoding system should fix a number of known issues and allow more flexible per-collection handling of types
  21. Containing My Madness
    • To implement a custom BSON Primitive for, say, JODA DateTime, extend the correct base trait for the BSON type and add code to convert between “native” and “primitive” values
    • The code is a bit in flight as I test integrations, run benchmarks and do sanity checks; subject to change
    • This code reflects a more “type class”-like approach, but based on testing it’s changing to a concrete container
  22. Custom BSON Date Handling

    import org.joda.time.DateTime

    trait JodaDateTimePrimitive extends BSONDatePrimitive[DateTime] {
      def nativeValue(primitive: Long): DateTime = { new DateTime(primitive) }
      def primitiveValue(native: DateTime): Long = { native.getMillis }
    }

    trait JDKDatePrimitive extends BSONDatePrimitive[java.util.Date] {
      def nativeValue(primitive: Long): java.util.Date = { new java.util.Date(primitive) }
      def primitiveValue(native: java.util.Date): Long = { native.getTime }
    }
  23. Almost Ready ...
    • Finalizing the interfaces and logic now; the first version of this will be in Casbah 3.0
    • Working on integrating into Salat & Lift-MongoDB-Record once the code is final
    • The userspace type-based decoding system should fix a number of known issues and allow more flexible per-collection handling of types
    • In December, looking to hack out the rest, finish a cycle of heavy testing (aka “try as hard as I can to break it”) and do integration work with other frameworks & users of Casbah 3.0; release around Christmas
    • I plan to explore a Java version next for Morphia, Spring Data, etc.
  24. Some Solid “non-web” Use Patterns
    • Event / Pipeline Processing
    • Logging
      • Graylog2
      • Flume Sink
    • Durable Messaging
    • Broadcast Messaging
    • Pub/Sub
  25. Today, Let’s Discuss Messaging
    • Messaging is something that, with MongoDB, you “roll your own”
    • But there are great built-in facilities to make it easier for you to do it:
      • Capped Collections
      • Tailable cursors
      • findAndModify
  26. WTF Is A Capped Collection?
    • Special size-bounded MongoDB collection designed for replication
    • Created specially with a fixed number of bytes it may hold
    • No _id index
    • Documents are maintained in insertion order
    • No deletes allowed
    • Updates only allowed if the document won’t “grow”
    • As the collection fills up, the oldest entries “fall out”
    • Allows for a special cursor type: Tailable Cursors
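To make these rules concrete, here is a toy in-memory model of the capped-collection behavior listed above: a byte budget, insertion order, the oldest entries falling out, no deletes, and no growing updates. It is an illustration under those assumptions only, not how MongoDB implements capped collections, and `CappedLog` is a made-up name.

```scala
import scala.collection.mutable

// Toy model of capped-collection semantics (not MongoDB's storage engine).
final class CappedLog(maxBytes: Int) {
  private val docs = mutable.ArrayDeque.empty[(Long, String)] // (position, payload)
  private var usedBytes = 0
  private var nextPos = 0L

  private def sizeOf(payload: String): Int = payload.getBytes("UTF-8").length

  // Appends in insertion order, evicting the oldest entries until the new one fits.
  def insert(payload: String): Long = {
    require(sizeOf(payload) <= maxBytes, "document larger than the whole collection")
    while (usedBytes + sizeOf(payload) > maxBytes) {
      val (_, oldest) = docs.removeHead()
      usedBytes -= sizeOf(oldest)
    }
    docs.append((nextPos, payload))
    usedBytes += sizeOf(payload)
    nextPos += 1
    nextPos - 1
  }

  // In-place update, refused if the document would grow.
  def update(pos: Long, payload: String): Boolean =
    docs.indexWhere(_._1 == pos) match {
      case -1 => false
      case i if sizeOf(payload) > sizeOf(docs(i)._2) => false
      case i =>
        usedBytes += sizeOf(payload) - sizeOf(docs(i)._2)
        docs(i) = (pos, payload)
        true
    }

  // Deliberately no delete method: entries only leave by falling out the front.
  def contents: List[String] = docs.map(_._2).toList
}
```

With a 10-byte budget, inserting three 4-byte documents pushes the first one out, and an update that lengthens a document is refused.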
  27. tail -f `mongo ‘db.data.find()’`
    • Tailable cursor mode is a special cursor mode in MongoDB
    • Similar to Unix’s ‘tail -f’: maintains a pointer to the last document seen and keeps moving forward as new documents are added
    • With “Await” cursor mode, the cursor can poll until new documents arrive
    • Incredibly efficient for non-indexed queries
  28. Broadcast Messaging Made Easy
    • This mechanism allows for very easy broadcast messaging
    • ... In fact, it is exactly how MongoDB does replication
    • Because you can’t delete messages, this wouldn’t be ideal for pub/sub
    • But it could be paired carefully with findAndModify
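The tailing and broadcast ideas from the last two slides can be modeled in a few lines: each cursor remembers how far it has read and only returns documents appended since then, so every reader independently sees every message. This is a hypothetical toy (`AppendOnlyLog` and `TailingCursor` are invented names, not driver classes), and it deliberately ignores eviction and the server-side “await” behavior.

```scala
import scala.collection.mutable.ArrayBuffer

// Append-only log standing in for a capped collection (no eviction modeled).
final class AppendOnlyLog[A] {
  private val entries = ArrayBuffer.empty[A]
  def append(a: A): Unit = synchronized { entries.append(a) }
  def readFrom(pos: Int): Vector[A] = synchronized { entries.drop(pos).toVector }
}

// Each cursor keeps its own position, like `tail -f` on the same file.
final class TailingCursor[A](log: AppendOnlyLog[A]) {
  private var seen = 0 // number of documents this cursor has already returned

  // Returns only the documents appended since the previous call.
  def next(): Vector[A] = {
    val fresh = log.readFrom(seen)
    seen += fresh.size
    fresh
  }
}
```

Two cursors opened on the same log both receive every message, each at its own pace, which is the broadcast property; nothing is ever removed, which is why this alone does not give pub/sub work distribution.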
  29. Pub/Sub and findAndModify
    • Compare-and-swap / ABA problems can be tricky
    • AKA “Distributed Locking is Hard - Let’s Go Shopping!”
    • MongoDB’s update doesn’t allow you to fetch the exact document(s) changed
    • The findAndModify command enables a proper mechanism:
      • Find and modify the first matching document, and return either the new doc or the old one
      • Find and remove the first matching document, and return the pre-removed document
      • Isolated; two competing threads won’t get the same document
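A rough model of the findAndModify worker-queue pattern: the find, the modify, and the choice of returning the old or new document happen as one isolated step, so competing workers never claim the same job. Here a JVM lock stands in for MongoDB’s server-side isolation, and `Job`, `JobQueue`, and `runWorkers` are hypothetical names used only for this sketch.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.collection.mutable.ListBuffer

final case class Job(id: Int, state: String)

// Toy stand-in for a collection that supports an isolated find-and-modify.
final class JobQueue(initial: Seq[Job]) {
  private var jobs: Vector[Job] = initial.toVector

  // Finds the first match, applies `update`, and returns the new or old version.
  def findAndModify(query: Job => Boolean, update: Job => Job, returnNew: Boolean): Option[Job] =
    synchronized {
      jobs.indexWhere(query) match {
        case -1 => None
        case i =>
          val before = jobs(i)
          val after = update(before)
          jobs = jobs.updated(i, after)
          Some(if (returnNew) after else before)
      }
    }
}

// Runs `workers` threads that keep claiming pending jobs until none remain.
def runWorkers(queue: JobQueue, workers: Int): List[Int] = {
  val claimed = ListBuffer.empty[Int]
  def claimNext(): Option[Job] =
    queue.findAndModify(_.state == "pending", _.copy(state = "running"), returnNew = true)

  val pool = Executors.newFixedThreadPool(workers)
  for (_ <- 1 to workers) pool.execute(() => {
    var job = claimNext()
    while (job.isDefined) {
      claimed.synchronized { claimed += job.get.id }
      job = claimNext()
    }
  })
  pool.shutdown()
  pool.awaitTermination(10, TimeUnit.SECONDS)
  claimed.synchronized(claimed.toList)
}
```

Because each claim flips `pending` to `running` inside the same isolated step, running four workers over 100 jobs should yield each id exactly once.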
  30. Lots of Ideas to be Explored
    • Akka (Scala distributed computing & actor framework) now includes a MongoDB-based durable mailbox, using these concepts for unbounded (soon: bounded) messaging
    • 10gen’s MMS monitoring service uses findAndModify to facilitate worker queues
  31. Example of Akka Durable

    class MongoBasedMailboxSpec extends DurableMailboxSpec("mongodb", MongoNaiveDurableMailboxStorage) {
      import org.apache.log4j.{ Logger, Level }
      import com.mongodb.async._

      val mongo = MongoConnection("localhost", 27017)("akka")

      mongo.dropDatabase() { success => }

      Logger.getRootLogger.setLevel(Level.DEBUG)
    }

    object DurableMongoMailboxSpecActorFactory {

      class MongoMailboxTestActor extends Actor {
        self.lifeCycle = Temporary
        def receive = {
          case "sum" => self.reply("sum")
        }
      }

      def createMongoMailboxTestActor(id: String)(implicit dispatcher: MessageDispatcher): ActorRef = {
        val queueActor = localActorOf[MongoMailboxTestActor]
        queueActor.dispatcher = dispatcher
        queueActor.start
      }
    }
  32. Example of Akka Durable

    class MongoBasedMailboxSpec extends WordSpec with MustMatchers with BeforeAndAfterEach with BeforeAndAfterAll {
      import DurableMongoMailboxSpecActorFactory._

      implicit val dispatcher = DurableDispatcher("mongodb", MongoNaiveDurableMailboxStorage, 1)

      "A MongoDB based naive mailbox backed actor" should {
        "should handle reply to ! for 1 message" in {
          val latch = new CountDownLatch(1)
          val queueActor = createMongoMailboxTestActor("mongoDB Backend should handle Reply to !")
          val sender = localActorOf(new Actor { def receive = { case "sum" => latch.countDown } }).start

          queueActor.!("sum")(Some(sender))
          latch.await(10, TimeUnit.SECONDS) must be (true)
        }

        "should handle reply to ! for multiple messages" in {
          val latch = new CountDownLatch(5)
          val queueActor = createMongoMailboxTestActor("mongoDB Backend should handle reply to !")
          val sender = localActorOf( new Actor { def receive = { case "sum" => latch.countDown } } ).start
  33. Example of Akka Durable

          queueActor.!("sum")(Some(sender))
          queueActor.!("sum")(Some(sender))
          queueActor.!("sum")(Some(sender))
          queueActor.!("sum")(Some(sender))
          queueActor.!("sum")(Some(sender))
          latch.await(10, TimeUnit.SECONDS) must be (true)
        }
      }

      override def beforeEach() {
        registry.local.shutdownAll
      }
    }
  34. @mongodb
    conferences, appearances, and meetups: http://www.10gen.com/events
    http://bit.ly/mongoS
    Facebook | Twitter | LinkedIn http://linkd.in/joinmongo
    download at mongodb.org
    We’re Hiring! [email protected] (twitter: @rit)