Clojure Monger

αλεx π
October 19, 2012

Transcript

  1. Nothing I tell you in this talk can’t be found on the interwebz.
    Friday, October 19, 12
  2. Which Databas do you use? (No, this is not a typo.)
  3. Meanwhile, in the world of Java, they insert documents:
      Mongo mongo = new Mongo("127.0.0.1");
      DB db = mongo.getDB("test");
      DBCollection c = db.getCollection("test");
      DBObject dbobj = BasicDBObjectBuilder.start()
          .add("key1", "value1")
          .add("key2", "value2")
          .get();
      c.insert(dbobj);
  4. Meanwhile, in the world of Java, they query collections:
      Pattern pattern = Pattern.compile("\\w*");
      BasicDBObject query = new BasicDBObject("users",
              new BasicDBObject("$not", new BasicDBObject("$gt", 2)))
          .append("language", "Java");
      DBCursor cursor = collection.find(query);
  5. Meanwhile, in the world of Java, they use atomic modifiers:
      coll.update(new BasicDBObject(),
                  BasicDBObjectBuilder.start()
                      .push("$set")
                      .add("nested.attribute", 2)
                      .get());
  6. Meanwhile, in the world of Java, they use the aggregation framework:
      List<DBObject> groupOperations = new ArrayList<DBObject>();
      groupOperations.add(new BasicDBObject());
      groupOperations.add(new BasicDBObject("countPerName", new BasicDBObject("$sum", 1)));
      DBObject group = new BasicDBObject();
      group.put("_id", "$name");
      group.put("docsPerName", new BasicDBObject("$sum", 1));
      group.put("countPerName", new BasicDBObject("$sum", "$count"));
      AggregationOutput out = c.aggregate(new BasicDBObject("$project", projFields),
                                          new BasicDBObject("$group", group));
  7. Meanwhile, in the world of Java, they even use Hadoop (spooky!):
      public class WordCount {
          private static final Log log = LogFactory.getLog(WordCount.class);
          public static class TokenizerMapper
                  extends Mapper<Object, BSONObject, Text, IntWritable> {
              private final static IntWritable one = new IntWritable(1);
              private final Text word = new Text();
              public void map(Object key, BSONObject value, Context context)
                      throws IOException, InterruptedException {
                  final StringTokenizer itr = new StringTokenizer(value.get("x").toString());
                  while (itr.hasMoreTokens()) {
                      word.set(itr.nextToken());
                      context.write(word, one);
                  }
              }
          }
          public static class IntSumReducer
                  extends Reducer<Text, IntWritable, Text, IntWritable> {
              private final IntWritable result = new IntWritable();
              public void reduce(Text key, Iterable<IntWritable> values, Context context)
                      throws IOException, InterruptedException {
                  int sum = 0;
                  for (final IntWritable val : values) {
                      sum += val.get();
                  }
                  result.set(sum);
                  context.write(key, result);
              }
          }
          public static void main(String[] args) throws Exception {
              final Configuration conf = new Configuration();
              MongoConfigUtil.setInputURI(conf, "mongodb://localhost/test.in");
              MongoConfigUtil.setOutputURI(conf, "mongodb://localhost/test.out");
              System.out.println("Conf: " + conf);
              final Job job = new Job(conf, "word count");
              job.setJarByClass(WordCount.class);
              job.setMapperClass(TokenizerMapper.class);
              job.setCombinerClass(IntSumReducer.class);
              job.setReducerClass(IntSumReducer.class);
              job.setOutputKeyClass(Text.class);
              job.setOutputValueClass(IntWritable.class);
              job.setInputFormatClass(MongoInputFormat.class);
              job.setOutputFormatClass(MongoOutputFormat.class);
              System.exit(job.waitForCompletion(true) ? 0 : 1);
          }
      }
  8. Clojure
    • functional
    • immutable persistent data structures
    • Java interop
    • lazy evaluation
    • built for concurrency
    • homoiconic
    • dynamic
  9. First-class functions, just in case someone asks ;)
      (def hello (fn [] "Hello world"))
  10. Function composition, just in case someone asks ;)
      (def not-zero? (comp not zero?))
      (not-zero? 1) ;=> true
      (def not-zero? (complement zero?))
      (not-zero? 1) ;=> true
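Since functions are ordinary values, the predicates built with comp and complement can be handed straight to sequence functions or stored in data structures. A minimal, self-contained sketch; the names not-zero-comp?, not-zero-compl?, sample and checks are illustrative, not from the talk:

```clojure
;; comp and complement both return new functions.
(def not-zero-comp? (comp not zero?))    ; (not (zero? x))
(def not-zero-compl? (complement zero?)) ; same predicate, built by complement

(def sample [0 1 2 0 3])

;; Both predicates can be passed directly to higher-order functions.
(filter not-zero-comp? sample)   ;=> (1 2 3)
(filter not-zero-compl? sample)  ;=> (1 2 3)

;; Functions can also live in data structures and be looked up later.
(def checks {:non-zero not-zero-compl?, :even even?})
((checks :even) 4)               ;=> true
```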
  11. Why JVM?
    • GC (swappable, performant, tunable)
    • Bytecode + JIT compilation
    • Powerful standard library
    • Ecosystem (jmap, jstack, jstat, VisualVM)
    • Multilingual
    • Performant (especially for long-running tasks)
  12. Persistent data structures?
    • nothing to do with disk persistence
    • efficient creation of “modified” versions
    • structural sharing
    • inherently thread- and iteration-safe
    • immutable (modification yields a new collection)
    • composite
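To see what “immutable, with efficient modified versions” means in practice, here is a small sketch using only clojure.core (the book and tag values are made up for illustration):

```clojure
;; assoc returns a new map; the original map is unchanged.
(def book {:title "Programming Erlang" :year 2007})
(def discounted (assoc book :discount 24.14))

(contains? book :discount)   ;=> false
(:discount discounted)       ;=> 24.14

;; conj on a vector likewise yields a new vector.
(def tags ["erlang" "programming"])
(def more-tags (conj tags "concurrency"))

(count tags)       ;=> 2
(count more-tags)  ;=> 3

;; Unchanged values are shared between versions rather than copied:
;; both maps hold the very same :title string object.
(identical? (:title book) (:title discounted)) ;=> true
```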
  13. STM?
    • references, not values
    • refs / agents / atoms / vars
    • transactional memory access (yes, with retries)
    • no user-level locks, no deadlocks
    • mutation through pure functions
    • coordinated
    • readers can read the value at any point in time
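A minimal sketch of coordinated change with refs, plus an atom for comparison. The account names and amounts are hypothetical; the point is that both alter calls commit together, and if another transaction commits a conflicting change first, the body is retried:

```clojure
;; Two refs that must change together.
(def checking (ref 100))
(def savings  (ref 50))

(defn transfer!
  "Moves amount between two refs in one transaction. The STM may retry
  the body on conflict, so it must be free of side effects."
  [from to amount]
  (dosync
    (alter from - amount)
    (alter to + amount)))

(transfer! checking savings 30)

;; Readers simply deref; no locks are needed to read a consistent value.
@checking ;=> 70
@savings  ;=> 80

;; Atoms are the uncoordinated counterpart: swap! applies a pure function.
(def counter (atom 0))
(swap! counter inc) ;=> 1
```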
  14. A world where you don’t have to select * from table;
  15. ...where you can mutate records without reading them (don’t tell me SQL handles that; it covers only a small part of it)
  16. Our databas requirements
    • A good set of primitives at the databas level
    • Ordering data
    • Limit/offset/pagination
    • Writing complex queries
    • Atomic modifiers
    • Updating a set of records matched by a complex query
    • Batch inserts (available everywhere for years)
    (Also: indexing, flexible schema, ops-friendliness; maybe a native RRDB implementation wouldn’t hurt.)
  17. Primitives
    • Strings
    • Numeric types
    • Date
    • Binary data
    • Embedded records
  18. Primitives, hash
      Clojure: { :key1 "value1" :key2 "value2" }
      Mongo:   { key1: "value1", key2: "value2" }
  19. Primitives, nested data structures
      Clojure: { :key1 { :key2 "value1" } :key3 [ "value2" "value3" "value4" ] }
      Mongo:   { key1: { key2: "value1" }, key3: [ "value2", "value3", "value4" ] }
  20. It’s all cool, but how do I do all that fancy stuff with MongoDB from Clojure?
  21. Monger: a MongoDB client for a more civilized age: friendly, flexible and with batteries included
      https://github.com/michaelklishin/monger
      http://clojuremongodb.info/
  22. Monger
    • Idiomatic Clojure driver
    • Uses the MongoDB Java driver underneath
    • Powerful, expressive query DSL
    • Supports MongoDB 2.0+ features
    • Has next to no performance overhead
    • Well maintained
    • Well documented
  23. Querying: exact match
      (monger.collection/find-maps "books" { :author "Joe Armstrong" })
      db.books.find({"author": "Joe Armstrong"})
      Result: { "title": "Programming Erlang: Software for a Concurrent World", "author": "Joe Armstrong", "publicationYear": 2007, "price": { "currency": "USD", "discount": 24.14, "msrp": 36.95 }, "publisher": "The Pragmatic Programmers, LLC", "tags": [ "erlang", "programming" ] }
  24. Querying: nested attribute match
      (monger.collection/find-maps "books" { :price.currency "USD" })
      db.books.find({"price.currency": "USD"})
      Result: { "title": "Programming Erlang: Software for a Concurrent World", "author": "Joe Armstrong", "publicationYear": 2007, "price": { "currency": "USD", "discount": 24.14, "msrp": 36.95 }, "publisher": "The Pragmatic Programmers, LLC", "tags": [ "erlang", "programming" ] }
  25. Querying: nested array match
      (monger.collection/find-maps "books" {:tags "erlang"})
      db.books.find({"tags": "erlang"})
      Result: { "title": "Programming Erlang: Software for a Concurrent World", "author": "Joe Armstrong", "publicationYear": 2007, "price": { "currency": "USD", "discount": 24.14, "msrp": 36.95 }, "publisher": "The Pragmatic Programmers, LLC", "tags": [ "erlang", "programming" ] }
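To make the three matching rules concrete without a running server, here is a hypothetical in-memory matcher (matches? and field-value are not part of Monger). It assumes a dotted keyword such as :price.currency names a nested path, and that a scalar condition matches an array field when the array contains that value, which is how MongoDB treats these queries:

```clojure
(require '[clojure.string :as str])

(defn field-value
  "Resolves a possibly dotted keyword like :price.currency against a nested map."
  [doc k]
  (get-in doc (map keyword (str/split (name k) #"\."))))

(defn matches?
  "True when every [field expected] pair in query matches doc.
  A sequential field value matches if it contains expected (array match)."
  [query doc]
  (every? (fn [[k expected]]
            (let [actual (field-value doc k)]
              (if (sequential? actual)
                (boolean (some #(= expected %) actual))
                (= expected actual))))
          query))

(def book {:title "Programming Erlang: Software for a Concurrent World"
           :author "Joe Armstrong"
           :price {:currency "USD" :discount 24.14 :msrp 36.95}
           :tags ["erlang" "programming"]})

(matches? {:author "Joe Armstrong"} book)  ;=> true  (exact match)
(matches? {:price.currency "USD"} book)   ;=> true  (nested attribute)
(matches? {:tags "erlang"} book)          ;=> true  (array element)
(matches? {:tags "multicore"} book)       ;=> false
```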
  26. Querying: complex queries
      (monger.collection/find-maps collection {:price.discount {$lt 25.00}})
      db.books.find({"price.discount": {$lt: 25.00}})
      Result: { "title": "Programming Erlang: Software for a Concurrent World", "author": "Joe Armstrong", "publicationYear": 2007, "price": { "currency": "USD", "discount": 24.14, "msrp": 36.95 }, "publisher": "The Pragmatic Programmers, LLC", "tags": [ "erlang", "programming" ] }
  27. Querying: comparison operators
      ;; Return all documents where the :users field is greater than 10
      (mgcol/find collection {:users {$gt 10}})
    • $gt: “greater than” comparator
    • $gte: “greater than or equals” comparator
    • $lt: “less than” comparator
    • $lte: “less than or equals” comparator
    • $all: matches all values in the array
  28. Querying: set matching operators
      ;; Return all documents where the tags field matches all elements
      ;; of the given array
      (mgcol/find-maps collection {:tags {$all [ "functional" "object-oriented" ]}})
    • $in: analogous to the SQL IN modifier
    • $nin: “not in set”
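The comparison and set operators can be read as plain predicates. The sketch below is a hypothetical in-memory evaluator (operators and meets? are illustrative, not Monger functions) that checks a single field value against a condition map such as {"$gt" 10}, to illustrate the semantics listed on the comparison and set-matching slides:

```clojure
(require '[clojure.set :as cset])

;; Hypothetical table from operator string to a predicate on (actual, argument).
(def operators
  {"$gt"  >
   "$gte" >=
   "$lt"  <
   "$lte" <=
   "$in"  (fn [actual xs] (contains? (set xs) actual))
   "$nin" (fn [actual xs] (not (contains? (set xs) actual)))
   "$all" (fn [actual xs] (cset/subset? (set xs) (set actual)))})

(defn meets?
  "True when actual satisfies every operator in condition,
  e.g. (meets? 15 {\"$gt\" 10 \"$lte\" 100})."
  [actual condition]
  (every? (fn [[op arg]] ((operators op) actual arg)) condition))

(meets? 15 {"$gt" 10})                             ;=> true
(meets? 15 {"$gt" 10 "$lte" 12})                   ;=> false
(meets? "Clojure" {"$in" ["Clojure" "Erlang"]})    ;=> true
(meets? ["functional" "object-oriented" "lisp"]
        {"$all" ["functional" "object-oriented"]}) ;=> true
```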
  29. Querying: logical operators $and, $or, $nor
      (mgcol/find collection {$and [{:language "Clojure"} {:users {$gt 10}}]})
  30. Querying: atomic modifiers
    $inc: increments one or more fields by the given value
      (monger.collection/update "scores" { :_id user-id } { $inc { :score 10 } })
    $set: sets a field (or set of fields) to a value
      (monger.collection/update "things" { :_id oid } { $set { :weight 20.5 } })
    $unset: deletes a given field
      (monger.collection/update "things" { :_id oid } { $unset { :weight 1 } })
    $rename: renames a given field
      (monger.collection/update "things" { :_id oid } { $rename { :old_field_name "new_field_name" } })
  31. Querying: atomic modifiers (continued)
    $push: appends a _single_ value to a field
      (mgcol/update "docs" { :_id oid } { $push { :tags "modifiers" } })
    $pushAll: appends each value in value_array to a field
      (mgcol/update coll { :_id oid } { $pushAll { :tags ["mongodb" "docs"] } })
    $addToSet: adds a value to the set (go figure)
      (mgcol/update coll { :_id oid } { $addToSet { :tags "modifiers" } })
    And many, many more...
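Monger sends these modifiers to the server, which applies them atomically to each matched document. To show what each one does to a document, here is a hypothetical in-memory sketch over a Clojure map (apply-modifier and apply-update are illustrative names; dotted paths and atomicity are out of scope):

```clojure
(defn apply-modifier
  "Applies one [operator fields] entry, such as [\"$inc\" {:score 10}], to doc.
  Hypothetical sketch: top-level fields only; unknown operators throw."
  [doc [op fields]]
  (reduce-kv
    (fn [d k v]
      (case op
        "$inc"      (update d k (fnil + 0) v)         ; missing field counts as 0
        "$set"      (assoc d k v)
        "$unset"    (dissoc d k)
        "$rename"   (-> d (assoc (keyword v) (get d k)) (dissoc k))
        "$push"     (update d k (fnil conj []) v)     ; append a single value
        "$pushAll"  (update d k (fnil into []) v)     ; append each value
        "$addToSet" (update d k (fn [xs] (if (some #{v} xs) xs (conj (vec xs) v))))))
    doc
    fields))

(defn apply-update
  "Applies every modifier in an update document, e.g. {\"$inc\" {:score 10}}."
  [doc modifiers]
  (reduce apply-modifier doc modifiers))

(def doc {:_id 1 :score 5 :weight 10 :tags ["mongodb"]})

(apply-update doc {"$inc" {:score 10}})            ; :score becomes 15
(apply-update doc {"$addToSet" {:tags "mongodb"}}) ; :tags unchanged, already present
```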
  32. Querying: query DSL, pagination
      (with-collection "scores"
        (find {})
        (fields [:score :name])
        (skip 10)
        (limit 10))
      ;; or
      (with-collection "scores"
        (find {})
        (fields [:score :name])
        (paginate :page 1 :per-page 10))
  33. Querying: query DSL, sorting
      (with-collection "scores"
        (find {})
        (fields [:score :name])
        (sort {:score -1})
        (limit 10))
  34. Querying: query DSL, batching
      (with-collection coll
        (find {:age_in_days {$gt 365}})
        (sort {:age_in_days -1})
        (batch-size 5000))
  35. Making Monger the Clojure way: conversion, Clojure map -> Java DBObject
      (defprotocol ConvertToDBObject
        (^com.mongodb.DBObject to-db-object [input]))
      (extend-protocol ConvertToDBObject
        String
        (to-db-object [^String input] input)
        IPersistentMap
        (to-db-object [^IPersistentMap input]
          (let [o (BasicDBObject.)]
            (doseq [[k v] input]
              (.put o (to-db-object k) (to-db-object v)))
            o)))
  36. Making Monger the Clojure way: conversion with protocols
    • protocols let you build very flexible and simple conversion
    • implement a single function (to-db-object) for many different types
    • the runtime determines the type and calls the corresponding implementation for you
    • recurse into keys/values simply by calling to-db-object
    • you don’t need to know which type of object you are given _now_; simply implement the function for all possible types
    • library users can extend the conversion simply by implementing the protocol
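The protocol idea can be tried without the MongoDB driver. This sketch assumes java.util.LinkedHashMap and ArrayList as stand-ins for BasicDBObject and BasicDBList, and renames the protocol to the hypothetical ConvertToJava / to-java; it also adds keyword, nil and sequential cases that the slide leaves out:

```clojure
(import '(java.util LinkedHashMap ArrayList))

(defprotocol ConvertToJava
  (to-java [input] "Converts Clojure data into mutable Java collections."))

(extend-protocol ConvertToJava
  nil
  (to-java [_] nil)

  Object                          ; strings, numbers, dates: passed through
  (to-java [input] input)

  clojure.lang.Keyword            ; keywords (keys or values) become plain strings
  (to-java [input] (name input))

  clojure.lang.IPersistentMap     ; stand-in for BasicDBObject
  (to-java [input]
    (let [o (LinkedHashMap.)]
      (doseq [[k v] input]
        (.put o (to-java k) (to-java v)))   ; recurse into keys and values
      o))

  clojure.lang.Sequential         ; stand-in for BasicDBList
  (to-java [input]
    (let [l (ArrayList.)]
      (doseq [v input] (.add l (to-java v)))
      l)))

(def converted (to-java {:title "Fight Club" :tags [:fiction "novel"] :price {:msrp 10}}))
```

Dispatch picks the most specific extension, so maps and vectors never fall through to the Object case.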
  37. Making Monger the Clojure way: query operators, “$gt” vs. $gt (no quotes)
      (defmacro ^{:private true} defoperator [operator]
        `(def ^{:const true} ~(symbol (str operator)) ~(str operator)))
      (defoperator $gt)
      (defoperator $gte)
      (defoperator $lt)
      (defoperator $lte)
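A runnable version of the operator macro. One assumption worth flagging: this variant attaches :const with with-meta, because metadata written directly on an unquoted form inside a syntax-quote may not reach the symbol that def receives:

```clojure
(defmacro ^{:private true} defoperator [operator]
  ;; with-meta puts :const on the symbol that def actually receives
  `(def ~(with-meta (symbol (str operator)) {:const true}) ~(str operator)))

(defoperator $gt)
(defoperator $lt)

$gt                 ;=> "$gt"
{:users {$gt 10}}   ;=> {:users {"$gt" 10}}
```

The query map is still plain data; the operator symbols just save typing quotes.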
  38. Making Monger the Clojure way: query operators, so you can build queries that look exactly like shell ones
      (mc/aggregate "docs" [{$project {:subtotal {$multiply ["$quantity", "$price"]}
                                       :_id 1
                                       :state 1}}
                            {$group {:_id "$state"
                                     :total {$sum "$subtotal"}}}])
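For readers new to the aggregation framework, this is roughly what that pipeline computes, sketched in plain Clojure over hypothetical sample documents (project and group-total are illustrative names, not Monger API):

```clojure
;; Hypothetical sample documents; field names follow the slide.
(def docs [{:state "CA" :quantity 2 :price 10}
           {:state "CA" :quantity 1 :price 5}
           {:state "NY" :quantity 3 :price 4}])

(defn project
  "$project stage: keep :state and compute :subtotal = quantity * price."
  [doc]
  {:state (:state doc) :subtotal (* (:quantity doc) (:price doc))})

(defn group-total
  "$group stage: group by :state, summing :subtotal into :total."
  [projected]
  (for [[state ds] (group-by :state projected)]
    {:_id state :total (reduce + (map :subtotal ds))}))

(group-total (map project docs))
```

With these documents CA totals 2 * 10 + 1 * 5 = 25 and NY totals 3 * 4 = 12; the order of the groups is not something to rely on.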
  39. Making Monger the Clojure way: query operators with macros
    • defer evaluation (more on that later)
    • define constants on the fly
    • extensible (you can add as many defoperators as you want)
    • easy to change the implementation (change it in one place)
    • actually lets you bring your DSL closer to the domain
  40. Making Monger the Clojure way: query operators using native <, >, <=, >= fns
      (defn match-operation [[operation orig-value & rest]]
        (let [value (to-cql-value orig-value)
              res (cond
                    (= operation >) (format " > %s" value)
                    ...
                    (keyword? operation) (format "%s %s" (name operation) value))]
          (if rest
            (conj [res] (match-operation rest))
            [res])))
      (prepare-select-query "column_family_name"
                            :where {:column_1 [>= 1] :column_2 [<= 5]}
                            :limit 5)
  41. Making Monger the Clojure way: query operators, alternative approach
    • used in Cassaforte, the Cassandra driver
    • uses intuitive, built-in operators/fns
    • extensible
    • easy to provide defaults
    • recurs into values
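The same idea in a self-contained form: because >, >= and friends are values, they can serve as map keys and be translated into query syntax. This is a hypothetical sketch, not Cassaforte’s implementation; to-cql-value is a simple stand-in that only quotes strings, and where-clause is an illustrative name:

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for the slide's to-cql-value: quote strings only.
(defn to-cql-value [v]
  (if (string? v) (str "'" v "'") (str v)))

;; Native comparison functions are values, so they can be map keys.
(def operator-strings {> ">", >= ">=", < "<", <= "<=", = "="})

(defn match-operation
  "Renders one [operator value] pair, e.g. [>= 1] as \">= 1\"."
  [[operation value]]
  (str (operator-strings operation) " " (to-cql-value value)))

(defn where-clause
  "Renders {:column_1 [>= 1] :column_2 [<= 5]} as a CQL WHERE fragment."
  [conditions]
  (str/join " AND "
            (for [[column condition] conditions]
              (str (name column) " " (match-operation condition)))))

(where-clause {:column_1 [>= 1] :column_2 [<= 5]})
;=> "column_1 >= 1 AND column_2 <= 5"
```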
  42. Making Monger the Clojure way: multiple connections with with-connection
      (declare ^:dynamic ^Mongo *mongodb-connection*)
      (defmacro with-connection [conn & body]
        `(binding [*mongodb-connection* ~conn]
           (do ~@body)))
      (with-connection (connect :host "server1")
        (mgcol/insert "books" {:title "Fight Club"}))
      (with-connection (connect :host "server2")
        (mgcol/insert "authors" {:name "Chuck Palahniuk"}))
  43. Making Monger the Clojure way: multiple connections with set-connection!
      (defn set-connection! ^Mongo [^Mongo conn]
        (alter-var-root (var *mongodb-connection*) (constantly conn)))
  44. Making Monger the Clojure way: multiple connections with bindings
    • use macros to delay evaluation
    • use bindings to get thread-local values for dynamic vars
    • people don’t have to pass/carry the connection around
    • easy defaults (set the dynamic var’s root value with alter-var-root)
    • easy to understand
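The binding pattern from the with-connection and set-connection! slides can be exercised without a database. In this hypothetical sketch a “connection” is just a keyword, and insert! merely reports which connection it would use:

```clojure
;; Hypothetical stand-ins: a "connection" is just a keyword here.
(declare ^:dynamic *connection*)

(defn set-connection!
  "Sets the root value, seen by every thread that has no binding."
  [conn]
  (alter-var-root #'*connection* (constantly conn)))

(defmacro with-connection
  "Overrides the connection thread-locally for the dynamic extent of body."
  [conn & body]
  `(binding [*connection* ~conn] ~@body))

(defn insert!
  "Stand-in for a driver call: returns the connection it would use."
  [coll doc]
  [*connection* coll doc])

(set-connection! :default-server)

(insert! "books" {:title "Fight Club"})
;=> [:default-server "books" {:title "Fight Club"}]

(with-connection :server2
  (insert! "authors" {:name "Chuck Palahniuk"}))
;=> [:server2 "authors" {:name "Chuck Palahniuk"}]
```

Once with-connection returns, the root value is visible again.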
  45. Making Monger the Clojure way: query DSL, with-collection
      (defmacro with-collection [^String coll & body]
        `(binding [*query-collection* (if (string? ~coll)
                                        ;; currently bound database
                                        (.getCollection ^DB *mongodb-database* ~coll)
                                        ~coll)]
           (let [query# (-> (empty-query *query-collection*) ~@body)]
             (exec query#))))
  46. Making Monger the Clojure way: query DSL, exec
      (defn exec [{:keys [collection query fields skip limit sort...]
                   :or {limit 0 batch-size 256 skip 0}}]
        (let [cursor (doto (.find collection (to-db-object query) (as-field-selector fields))
                       (.limit limit)
                       (.skip skip)
                       (.sort (to-db-object sort))
                       (.batchSize batch-size)
                       (.hint (to-db-object hint)))]
          (when snapshot
            (.snapshot cursor))
          (map (fn [x] (from-db-object x keywordize-fields))
               cursor)))
  47. Making Monger the Clojure way: query DSL implementation
      (defn find [m query]
        (merge m {:query query}))
      (defn fields [m flds]
        (merge m {:fields flds}))
  48. Making Monger the Clojure way: query DSL example
      (with-collection "scores"
        (find {})                         ;;=> {:query {}}
        (fields [:score :name])           ;;=> {:fields [:score :name]}
        (sort {:score -1})                ;;=> {:sort {:score -1}}
        (paginate :page 10 :per-page 10)) ;;=> {:limit 10 :skip 90}
  49. Making Monger the Clojure way: query DSL, macro + thrush
    • macros delay evaluation
    • they make it very easy to create a DSL that’s close to your domain
    • you specify the collection just once
    • the thrush (->) operator threads the query map (which carries the collection) through every form in ~@body
    • easy to understand
    • DSL functions that are evaluated later (find/fields...) return hashes
    • those hashes are destructured via :keys in the exec function
  50. Making Monger the Clojure way: query DSL, macro + thrush (continued)
    • thread-local binding for passing the collection around
    • provide as many operations at once as you want; they stay very readable (and composable)
    • the DSL is very extensible: you can write your own arbitrary functions that return hashes, e.g. (find-person-by-gender :male); paginate is one such example
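Here is a self-contained sketch of the query-map DSL running against an in-memory vector instead of a DBCollection. find and fields mirror the implementation slide; exec, with-collection, paginate, skip, limit, find-person-by-gender and people are simplified, hypothetical stand-ins, and paginate assumes skip = (page - 1) * per-page:

```clojure
(ns query-dsl-sketch
  (:refer-clojure :exclude [find]))

;; Each DSL step is a plain function from query map to query map.
(defn find   [m query] (merge m {:query query}))
(defn fields [m flds]  (merge m {:fields flds}))
(defn skip   [m n]     (merge m {:skip n}))
(defn limit  [m n]     (merge m {:limit n}))
(defn paginate [m & {:keys [page per-page] :or {page 1 per-page 10}}]
  (merge m {:limit per-page :skip (* (dec page) per-page)}))

;; A user-defined step that returns a hash, as the slide suggests.
(defn find-person-by-gender [m gender]
  (find m {:gender gender}))

(defn exec
  "Hypothetical stand-in: evaluates a query map against an in-memory
  collection. A limit of 0 means no limit."
  [{:keys [collection query fields skip limit] :or {skip 0 limit 0}}]
  (cond->> collection
    true         (filter (fn [doc] (every? (fn [[k v]] (= v (get doc k))) query)))
    true         (drop skip)
    (pos? limit) (take limit)
    (seq fields) (map #(select-keys % fields))))

(defmacro with-collection [coll & body]
  `(exec (-> {:collection ~coll} ~@body)))

(def people [{:name "Alice" :gender :female :age 30}
             {:name "Bob"   :gender :male   :age 25}
             {:name "Carl"  :gender :male   :age 41}])

(with-collection people
  (find-person-by-gender :male)
  (fields [:name])
  (paginate :page 1 :per-page 1))
;=> ({:name "Bob"})
```

Because every step only merges keys into a map, adding a new step means writing one more small function; exec is the single place that interprets the map.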
  51. Facts you probably won’t hear at Mongo conferences
    • The “Normal” write concern is not “normal” (you won’t learn about server errors, only network errors)
    • Very (very) high virtual memory usage (yeah, right, let’s just use “repairDatabase”)
    • Repairs require a lot of disk space (as much as the size of your database, so when running into trouble, use --repairpath)
    • The rest I won’t mention in this talk
  52. Hey, high-tech, do you think you can *k with something like that?
  53. Cascalog
    • Simple
    • Utilizes all the power of Hadoop
    • Clojure (+ the entire JVM) available
    • Easy to extend
    • Dynamic queries
    • Arbitrary inputs/outputs
  54. Cascalog
      (?<- (stdout) [?person ?age] (age ?person ?age) (= ?person "alice"))
      Callouts: query operator (?<-), output ((stdout)), output fields ([?person ?age]), input ((age ?person ?age)), query.
  55. Cascalog
      (<- output [!date ?md5 !count]
          (input ?type ?hostname ?received_at ?md5)
          (date-hour-precision ?received_at :> !date)
          (:sort !date)
          (c/count !count))
      Callouts: map operation (date-hour-precision), output sort ((:sort !date)), aggregate op ((c/count !count)), implicit grouping (by the non-aggregated output fields !date and ?md5).
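Cascalog needs Hadoop, so the following is only a plain-Clojure sketch of what the second query computes: map each record’s timestamp to an hour bucket, implicitly group by the non-aggregated output fields (!date, ?md5), count, and sort by date. The record layout and the date-hour-precision implementation (epoch milliseconds truncated to the hour) are assumptions, as is the sample data:

```clojure
;; Hypothetical records: [type hostname received-at-ms md5]
(def input
  [["log" "host-a" 1350640800000 "abc"]
   ["log" "host-b" 1350641000000 "abc"]
   ["log" "host-a" 1350644500000 "abc"]
   ["log" "host-c" 1350641100000 "def"]])

(def hour-ms (* 60 60 1000))

(defn date-hour-precision
  "Truncates an epoch-millisecond timestamp to the start of its hour."
  [ms]
  (* hour-ms (quot ms hour-ms)))

(defn count-by-hour-and-md5 [records]
  (->> records
       ;; map operation: derive !date from ?received_at
       (map (fn [[_ _ received-at md5]] [(date-hour-precision received-at) md5]))
       ;; implicit grouping on [!date ?md5] plus the count aggregator
       frequencies
       (map (fn [[[date md5] n]] [date md5 n]))
       ;; (:sort !date)
       (sort-by first)))

(count-by-hour-and-md5 input)
```

Here the first, second and fourth records fall into the same hour, so the result contains an "abc" count of 2 and a "def" count of 1 for that hour, plus an "abc" count of 1 for the following hour.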
  56. Cascading-MongoDB tap
      It’s an early version (written in under 2 days, a couple of weeks before the conference).
      Works in production (on rather small, ~15 GB datasets).
      https://github.com/ifesdjeen/cascading-mongodb
  57. Takeaway
    • Clojure is concise, performant and productive
    • Amazing for processing (big) data
    • Big set of available tools
    • Tools are easy to use and next-level
    • Works perfectly with MongoDB
    • Will change the way you write programs
    • Benefits ahead
  58. Takeaway
    • Utilize DB abstractions
    • Model your data around your database’s strengths
    • Move away from in-database JavaScript map/reduce in good time
    • Evaluate the upsides of new tech
    • Use tools that make your team more productive
    • Evaluate incidental complexity
  59. Also by ClojureWerkz: Langohr, a feature-rich Clojure RabbitMQ client that embraces the AMQP 0.9.1 model
      https://github.com/michaelklishin/langohr
  60. Also by ClojureWerkz: Welle, an expressive Clojure client for Riak
      https://github.com/michaelklishin/welle
      http://clojureriak.info/
  61. Also by ClojureWerkz: Neocons, a feature-rich, idiomatic Clojure client for the Neo4j REST API
      https://github.com/michaelklishin/neocons
      http://clojureneo4j.info/
  62. Also by ClojureWerkz: Quartzite, a powerful scheduling library for Clojure*
      https://github.com/michaelklishin/quartzite
      http://clojurequartz.info/
      *now comes with Quartz-Mongodb and Quartzite-REST