An expanded version of my talk from ScalaDays 2011, delivered at Scalathon 2011, discussing my experiences learning Scala over 18 months by writing a MongoDB driver.
Scala
• 2009 “Learning Year”
  • Deeper Python, started to “get” lambdas, FP
  • Learned C#; loved the “better Java” with lots of Pythonic functional stuff
  • Data processing tools: Disco, Hadoop, OpenCL / CUDA, R, etc.
  • New database technologies (NoSQL): Cassandra, Redis, CouchDB, Riak, MongoDB
• October 2009 ...
  • Put together NY NoSQL Conference (100+ ppl)
  • Job imploded
  • New job (Novus Partners), new to Scala
• October 2010 ...
  • Joined 10gen
  • Full-time MongoDB developer; Hadoop integration, Casbah & general Scala support are a significant portion of my job
Scala
• Big problems, new tools needed
• For much of it, Java wasn’t the answer
• Scala: a brilliant tool for solving problems
• Had read Wampler / Payne, not written code
• Impulse control problem or good gut feeling?
• Akka huge part ... #legendofklang
• Custom formulas, DSLs and other tools
• Began fiddling with MongoDB tools for an interstitial caching layer
• Rose Toomey (@prasinous) took it all and ran with it
• The ultimate result of this project is evident in yesterday’s presentation by Basil Qunibi, Novus’ CEO
‘mongo-scala-wrappers’ Is Born
• Learned MongoDB from Python
• Dynamic language with flexible syntax; dynamic database with flexible schemas
• Tooling for MongoDB + Scala was limited or unsuited, mostly focused on ODM; none of what I loved about Scala or MongoDB was possible together
• Java Driver ... no Scala sugar or tricks
• scamongo (pre-lift): ODM (ORM-ey) or JSON tools
• mongo-scala-driver: a little syntactic sugar but mostly ODM; didn’t “get” it
Absolutely nothing wrong with that syntax... for Java.
• Scala is expressive, fluid and beautiful; so is (IMHO) MongoDB.
• My goal: teach Scala to be as close to Python / the Mongo shell as possible
• Self-imposed limitation: don’t reinvent the wheel. The Java driver’s network layers, BSON encoding, etc. work great.
• Just add syntactic sugar! (A small sketch of what that means follows below.)
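To make the “just add sugar” point concrete, here is a small sketch of my own (not a slide from the talk) contrasting the raw Java driver builder style with the Casbah DSL that the specs below exercise:

import com.mongodb.casbah.Imports._

// Raw Java driver style:
//   new BasicDBObject("foo", new BasicDBObject("$gte", 15).append("$lt", 35))
// Casbah style, much closer to the Mongo shell:
val query = "foo" $gte 15 $lt 35            // { "foo" : { "$gte" : 15, "$lt" : 35 } }
val doc   = MongoDBObject("name" -> "bar")  // { "name" : "bar" }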
expected" in { val nor = $nor { "foo" $gte 15 $lt 35.2 $ne 16 } nor.getAs[MongoDBList]("$nor") must have size (1) nor.as[MongoDBList]("$nor").getAs[DBObject](0) must haveSomeEntries("foo.$gte" -> 15, "foo.$lt" -> 35.2, "foo.$ne" -> 16) } "Work with multiples" in { val nor = $nor { ("foo" $gte 15 $lt 35 $ne 16) + ("x" -> "y") } nor.getAs[MongoDBList]("$nor") must have size (1) nor.as[MongoDBList]("$nor").getAs[DBObject](0) must haveSomeEntries("foo.$gte" -> 15, "foo.$lt" -> 35, "foo.$ne" -> 16, "x" -> "y") } } "Cursor Operations" should { import scala.util.Random val db = MongoConnection()("casbahTest") val coll = db("test_coll_%d".format(System.currentTimeMillis)) for (i <- 1 to 100) coll += MongoDBObject("foo" -> "bar", "x" -> Random.nextDouble()) val first5 = coll.find(MongoDBObject("foo" -> "bar")) limit 5 "Behave in chains" in { "Chain operations must return the proper *subtype*" in { val cur = coll.find(MongoDBObject("foo" -> "bar")) skip 5 cur must haveClass[MongoCursor] val cur2 = coll.find(MongoDBObject("foo" -> "bar")) limit 25 skip 12 cur2 must haveClass[MongoCursor] } } }
• Feb. 12, 2010: Initial open source release (0.1). No tests.
  - Initial import. Compiles, reflects the working code currently in Novus trunk but does not have full documentation, or tests yet. NOT FOR PUBLIC CONSUMPTION - USE AT YOUR OWN RISK.
  * Release 0.1 - May or may not blow your system up...
  - Updated headers, scaladoc/javadoc documentation, etc.
  - Next step: written docs with examples, test classes
• July 17, 2010: Release 1.0
  • New collaborator/contributor Max Afonov (@max4f)
• January 03, 2011: Release 2.0
  • Refactoring & stupidity cleanups
• Today - Solid, stable, robust & used in several large organizations for production code.
/**
 * Hacky mildly absurd method for converting a <code>Product</code> (Example being any <code>Tuple</code>) to
 * a Mongo <code>DBObject</code> on the fly to minimize spaghetti code from long builds of Maps or DBObjects.
 *
 * Intended to facilitate fluid code but may be dangerous.
 *
 * SNIP
 */
implicit def productToMongoDBObject(p: Product): DBObject = {
  val builder = BasicDBObjectBuilder.start
  val arityRange = 0.until(p.productArity)
  //println("Converting Product P %s with an Arity range of %s to a MongoDB Object".format(p, arityRange))
  for (i <- arityRange) {
    val x = p.productElement(i)
    //println("\tI: %s X: %s".format(i, x))
    if (x.isInstanceOf[Tuple2[_, _]]) {
      val t = x.asInstanceOf[Tuple2[String, Any]]
      //println("\t\tT: %s".format(t))
      builder.add(t._1, t._2)
    } else if (p.productArity == 2 && p.productElement(0).isInstanceOf[String]) {
      // backup plan if it's a one entry tuple, the outer wrapper gets stripped
      val t = p.asInstanceOf[Tuple2[String, Any]]
      builder.add(t._1, t._2)
      return builder.get
    } else {
      throw new IllegalArgumentException("Products to convert to DBObject must contain Tuple2's.")
    }
  }
  builder.get
}
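A quick usage sketch of my own (variable names are mine) showing how that implicit was meant to be used at call sites:

// With productToMongoDBObject in scope, bare tuples flatten into DBObjects.
val single: DBObject = "foo" -> "bar"                   // one bare pair
val multi:  DBObject = ("foo" -> "bar", "count" -> 5)   // a tuple of pairs
// multi becomes { "foo" : "bar", "count" : 5 }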
Classes, Abstract & Parameterized Types (Scala’s variant, esp. with covariance/contravariance annotations), structural (aka “sort of a duck”) typing are incredible
• For third party libraries though...
  • “Am I helping my users or hurting them?”
  • “Have I accounted for all the use cases?”
  • “Do I have any idea what the f$%k I’m doing?”
...then again, I’m no carpenter
• Much of what I love about Scala comes down to compile-time checks, and those don’t keep you from misunderstanding things, hurting your users or just plain screwing up.
• Fun with type inference, aka “Oops, I screwed the explicit annotators”
• Know and understand the “fancy” features, but also know when to use them.
• Know when not to use them, and when one choice is better than another.

“The difference between a junior and a senior programmer is that the senior knows when not to write code.”
safety

/**
 * I had used Type classes elsewhere, but when I posted the preceding
 * manifest code as an example of cool stuff to show @ ScalaDays,
 * Jon-Anders Teigen (@jteigen) sent me a gist with a better way.
 * Type Classes for this!
 */
def $type[A](implicit bsonType: BSONType[A]) = op(oper, bsonType.operator)

/**
 * That’s now it for the $type support; it uses a few type class definitions as
 * well to match the BSON types.
 */
implicit object BSONDouble extends BSONType[Double](BSON.NUMBER)
implicit object BSONString extends BSONType[String](BSON.STRING)
implicit object BSONObject extends BSONType[BSONObject](BSON.OBJECT)
implicit object DBObject extends BSONType[DBObject](BSON.OBJECT)
implicit object DBList extends BSONType[BasicDBList](BSON.ARRAY)
implicit object BSONDBList extends BSONType[BasicBSONList](BSON.ARRAY)
implicit object BSONBinary extends BSONType[Array[Byte]](BSON.BINARY)
implicit object BSONObjectId extends BSONType[ObjectId](BSON.OID)
implicit object BSONBoolean extends BSONType[Boolean](BSON.BOOLEAN)
implicit object BSONJDKDate extends BSONType[java.util.Date](BSON.DATE)
implicit object BSONJodaDateTime extends BSONType[org.joda.time.DateTime](BSON.DATE)
implicit object BSONNull extends BSONType[Option[Nothing]](BSON.NULL)
implicit object BSONRegex extends BSONType[Regex](BSON.REGEX)
implicit object BSONSymbol extends BSONType[Symbol](BSON.SYMBOL)
implicit object BSON32BitInt extends BSONType[Int](BSON.NUMBER_INT)
implicit object BSON64BitInt extends BSONType[Long](BSON.NUMBER_LONG)
implicit object BSONSQLTimestamp extends BSONType[java.sql.Timestamp](BSON.TIMESTAMP)
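For context, here is a minimal sketch of my own (not Casbah’s source) of what the BSONType type class those objects extend might look like; the `operator` field name is inferred from the `$type` implementation above and the real definition may differ:

// Minimal sketch: each instance just carries the BSON type code for A.
class BSONType[A](val operator: Byte)

// With the implicits above in scope, a query such as "foo" $type[Double]
// (however the DSL spells it) should render as { "foo" : { "$type" : 1 } },
// since BSON.NUMBER is 1.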
Context bounds let you skip over certain implicit arguments:

def $type[A: BSONType: Manifest]

• Is the equivalent of coding:

def $type[A](implicit evidence$1: BSONType[A], evidence$2: Manifest[A])
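A tiny illustration (a hypothetical helper of mine, reusing the BSONType sketch above; not Casbah code) of why that sugar matters: inside the method you can summon the evidence with implicitly instead of naming the parameters yourself.

// Hypothetical helper: the context bound supplies the BSONType evidence,
// and implicitly[...] retrieves it inside the body.
def bsonTypeCode[A: BSONType]: Byte = implicitly[BSONType[A]].operator

bsonTypeCode[Double]   // resolves to BSON.NUMBER via the implicit BSONDouble object above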
def insert[A <% DBObject](docs: Traversable[A], writeConcern: WriteConcern) = {
  val b = new scala.collection.mutable.ArrayBuilder.ofRef[DBObject]
  b.sizeHint(docs.size)
  for (x <- docs) b += x
  underlying.insert(b.result, writeConcern)
}
def insert[A <% DBObject](docs: Traversable[A], writeConcern: WriteConcern) = {
  val b = new scala.collection.mutable.ArrayBuilder.ofRef[DBObject]
  b.sizeHint(docs.size)
  for (x <- docs) b += x
  underlying.insert(b.result, writeConcern)
}

• Answer: This is a “view bound”
• The code “flattens” at compile time to something like this:

def insert[A](docs: Traversable[A], writeConcern: WriteConcern)(implicit ev: A => DBObject) = {
  val b = new scala.collection.mutable.ArrayBuilder.ofRef[DBObject]
  b.sizeHint(docs.size)
  for (x <- docs) b += ev(x)
  underlying.insert(b.result, writeConcern)
}
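At the call site this means anything with a view to DBObject can be bulk-inserted. A hedged usage sketch (the collection name and values are mine; WriteConcern.SAFE is the Java driver constant):

import com.mongodb.casbah.Imports._

// Anything with an implicit A => DBObject in scope works here.
val coll = MongoConnection()("casbahTest")("insertExample")
val docs = List(MongoDBObject("a" -> 1), MongoDBObject("b" -> 2))
coll.insert(docs, com.mongodb.WriteConcern.SAFE)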
01, 2010: The death of “mongo-scala-wrappers”
  - Changed package to com.novus.casbah.mongodb
  - Rolled Scala version to 2.8.0.RC3 and SBT version to 0.7.4
  - Updated dependency libraries as appropriate for 2.8rc3
  - Cleaned up package declarations in code
  - Rolled module version to 1.0-SNAPSHOT as the next release goal is to be complete at a 1.0
• “Casbah” inspired mostly randomly from The Clash
• 1.1 began a mission of modularisation and functionality expansion
• “casbah-mapper” borne unto git, never released mainstream and ultimately reimagined as “Salat” (the Russian word салат for “salad”)
• salat-avro (@rubbish from T8Webware)
• @coda’s “Jerkson” project using some of the ScalaSig code utils from Salat
Making It Palatable
• There’s a difference between “fixing bugs in production” and “shipping libraries to users”
• Eating my own dog food was great, but in many ways it made me complacent
• In many cases I initially only implemented MongoDB features I was using...
• ... In others, only the way I was using them.
Making It Palatable
• 15 years as a developer taught me this: “Tests seem like a really good idea... I’m tired of fixing my broken crap in production”

for (i <- 1 to ∞) println("Tests. Matter.")
Making It Palatable
• 15 years of reality tempered “nice to have” with “shut up and code, monkey”:
  “I wish I had time to actually write tests and learn to write good tests”
  <Boss> “Just put it in production and fix it later, we don’t have time to wait”
• Let’s face it: this isn’t an excuse, but in many cases it is reality. Ship code or flip burgers.
Making It Palatable
• If you plan to ship code to users, “eating your own dog food” is NEVER ENOUGH*
• Take the time to learn how to write good tests and GOOD DATA
• I am head over heels in love with the testing tools in Scala
  • ScalaTest (I don’t use it anymore, but still amazing)
  • Specs / Specs2: alien technology for breaking my code
  • ScalaCheck - haven’t learned it yet, but does fuzzing, etc.
• Differentiate between integration tests and unit tests
• But *use* integration tests with “conditional skips”, and WRITE THEM (a rough sketch follows below)

* Assuming of course you care about code quality and/or your users
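For the “conditional skip” idea, here is a rough Specs2 sketch of my own (not from Casbah’s suite; the availability probe, names and the args(skipAll = ...) usage are assumptions and may need adjusting for your Specs2 version):

import org.specs2.mutable.Specification
import com.mongodb.casbah.Imports._

class RoundTripIntegrationSpec extends Specification {

  // Crude availability probe: any failure means "no local mongod".
  def mongoUp: Boolean =
    try { MongoConnection().underlying.getDatabaseNames(); true }
    catch { case _: Throwable => false }

  // Skip (rather than fail) every example when the database isn't reachable.
  args(skipAll = !mongoUp)

  "Casbah against a live mongod" should {
    "round-trip a simple document" in {
      val coll = MongoConnection()("casbahIntegration")("roundtrip")
      coll += MongoDBObject("foo" -> "bar")
      coll.findOne(MongoDBObject("foo" -> "bar")) must beSome
    }
  }
}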
Making It Palatable
• Some of why I didn’t test Casbah as well early on is that I couldn’t easily test the values as MongoDB saw them.
• Moving to Specs2 was much more strict, and it inspired me to write custom matchers to do the job; they’re provided for users too! (Tests all the way down...)
• Tests are much cleaner and I feel more confident about them; able to achieve higher coverage
• Higher coverage translates directly into fewer bugs users find in their production apps
/** matches if map contains a pair (key, value) == (k, v)
 *  Will expand out dot notation for matching.
 **/
def haveSomeEntry[V](p: (String, V)) = new Matcher[Option[DBObject]] {
  def apply[S <: Option[DBObject]](map: Expectable[S]) = {
    result(someField(map, p._1).exists(_ == p._2), // match only the value
      map.description + " has the pair " + p,
      map.description + " doesn't have the pair " + p,
      map)
  }
}

/** Special version of "HaveEntry" that expects a list and then uses
 *  "hasSameElements" on it.
 */
def haveListEntry(k: String, l: => Traversable[Any]) = new Matcher[DBObject] {
  def apply[S <: DBObject](map: Expectable[S]) = {
    val objL = listField(map, k).getOrElse(Seq.empty[Any]).toSeq
    val _l = l.toSeq
    result(objL.sameElements(_l), // match only the value
      map.description + " has the pair " + k,
      map.description + " doesn't have the pair " + k,
      map)
  }
}

/** matches if map contains a pair (key, value) == (k, v)
 *  Will expand out dot notation for matching.
 **/
def haveEntry[V](p: (String, V)) = new Matcher[DBObject] {
  def apply[S <: DBObject](map: Expectable[S]) = {
    result(field(map, p._1).exists(_.equals(p._2)), // match only the value
      map.description + " has the pair " + p,
      map.description + "[" + field(map, p._1) + "] doesn't have the pair " + p + "[" + p._2 + "]",
      map)
  }
}

/** matches if Some(map) contains all the specified pairs
 *  can expand dot notation to match specific sub-keys
 */
def haveSomeEntries[V](pairs: (String, V)*) = new Matcher[Option[DBObject]] {
  def apply[S <: Option[DBObject]](map: Expectable[S]) = {
    result(pairs.forall(pair => someField(map, pair._1).exists(_ == pair._2) /* match only the value */),
      map.description + " has the pairs " + pairs.mkString(", "),
      map.description + " doesn't have the pairs " + pairs.mkString(", "),
      map)
  }
}
...but it also has a younger brother/cousin
• “Hammersmith”: purely asynchronous, purely Scala, and a distillation of ~2 years of MongoDB knowledge
• The only Java left is the BSON serialization; still no excuse for reinventing the wheel
• Netty for now, but probably will end up as pure NIO
• NOT (contrary to popular panic/confusion) a replacement for Casbah
• Focused more on framework support than userspace
• Will likely offer optional synchronous and asynchronous hammersmith modules for casbah-core, with the Java driver as casbah-core-classic
• Working on sharing as much code as possible between Hammersmith & Casbah for MongoDBObject, etc.
• Porting casbah-query to target Hammersmith (as well as Lift)
/** ... you want to be serialized or deserialized */
trait SerializableBSONObject[T] {

  def encode(doc: T, out: OutputBuffer)
  def encode(doc: T): Array[Byte]

  def decode(in: InputStream): T
  def decode(bytes: Seq[Array[Byte]]): Seq[T] = for (b <- bytes) yield decode(b)
  def decode(b: Array[Byte]): T = decode(new ByteArrayInputStream(b))

  /**
   * These methods are used to validate documents in certain cases.
   * They will be invoked by the system at the appropriate times and you must
   * implement them in a manner appropriate for your object to ensure proper mongo saving.
   */
  def checkObject(doc: T, isQuery: Boolean = false): Unit
  def checkKeys(doc: T): Unit

  /**
   * Checks for an ID and generates one, returning a new doc with the id.
   * The new doc may be a mutation of the old doc, OR a new object
   * if the old doc was immutable.
   */
  def checkID(doc: T): T

  def _id(doc: T): Option[AnyRef]
}
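To make the contract concrete, here is a minimal sketch of my own (not Hammersmith’s code) of an instance for a small case class, delegating the byte-level work to an assumed, already-available SerializableBSONObject[DBObject] instance:

import java.io.InputStream
import org.bson.io.OutputBuffer
import org.bson.types.ObjectId
import com.mongodb.{ BasicDBObjectBuilder, DBObject }

case class Person(_id: Option[ObjectId], name: String)

// Sketch only: `dbo` stands in for whatever SerializableBSONObject[DBObject]
// instance the library provides; everything is funnelled through a DBObject.
class PersonSerializer(dbo: SerializableBSONObject[DBObject])
    extends SerializableBSONObject[Person] {

  private def toDBObject(p: Person): DBObject = {
    val b = BasicDBObjectBuilder.start()
    p._id.foreach(id => b.add("_id", id))
    b.add("name", p.name)
    b.get
  }

  private def fromDBObject(o: DBObject): Person =
    Person(Option(o.get("_id")).map(_.asInstanceOf[ObjectId]), o.get("name").toString)

  def encode(doc: Person, out: OutputBuffer) = dbo.encode(toDBObject(doc), out)
  def encode(doc: Person): Array[Byte] = dbo.encode(toDBObject(doc))
  def decode(in: InputStream): Person = fromDBObject(dbo.decode(in))

  def checkObject(doc: Person, isQuery: Boolean = false): Unit = ()  // nothing to validate here
  def checkKeys(doc: Person): Unit = ()

  // Person is immutable, so checkID hands back a copy with a generated _id.
  def checkID(doc: Person): Person =
    if (doc._id.isDefined) doc else doc.copy(_id = Some(new ObjectId))

  def _id(doc: Person): Option[AnyRef] = doc._id
}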