Clojure Monger

αλεx π
October 19, 2012


  3. DB db = new Mongo().getDB( "test" ); DBCollection coll = db.getCollection( "test" );

    
  4. Pattern pattern = Pattern.compile("\\w*"); new BasicDBObject("users", new BasicDBObject("$not", new BasicDBObject("$gt",

    
  5. coll.update(new BasicDBObject(), BasicDBObjectBuilder.start() .push( "$set" ) .add( "nested.attribute" , 2

    
  6. List<DBObject> groupOperations = new ArrayList<DBObject>(); groupOperations.add(new BasicDBObject()); groupOperations.add(new BasicDBObject("countPerName", new

    
  7. public class WordCount { private static final Log log =

    public static void main( String[] args ) throws Exception {
        final Configuration conf = new Configuration();
        MongoConfigUtil.setInputURI( conf, "mongodb://localhost/test.in" );
        MongoConfigUtil.setOutputURI( conf, "mongodb://localhost/test.out" );
        System.out.println( "Conf: " + conf );
        final Job job = new Job( conf, "word count" );
        job.setJarByClass( WordCount.class );
        job.setMapperClass( TokenizerMapper.class );
        job.setCombinerClass( IntSumReducer.class );
        job.setReducerClass( IntSumReducer.class );
        job.setOutputKeyClass( Text.class );
        job.setOutputValueClass( IntWritable.class );
        job.setInputFormatClass( MongoInputFormat.class );
        job.setOutputFormatClass( MongoOutputFormat.class );
        System.exit( job.waitForCompletion( true ) ? 0 : 1 );
    }
}
  Clojure • functional • immutable persistent data structures • Java interop • lazy evaluation • built for concurrency • homoiconic • dynamic

    
  first-class functions (def hello (fn [] "Hello world")) just in case someone asks ;)

    
  10. fn-composition (def not-zero? (comp not zero?)) (not-zero? 1) => true

    
  11. • GC (swappable, performant, tunable) • Bytecode + JIT compilation

    
  12. • doesn’t have anything to do with disk persistence •
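Assuming this slide refers to the "immutable persistent DS" from the feature list, the following minimal sketch shows what "persistent" means there: each update returns a new version, the old version stays intact, and unchanged values are shared rather than copied. The names `v1`, `v2` and `v3` are illustrative only.

```clojure
;; "Persistent": every update returns a new version, older versions stay valid.
(def v1 {:title "Fight Club" :tags ["novel"]})

;; assoc and update return new maps; v1 and v2 themselves never change
(def v2 (assoc v1 :author "Chuck Palahniuk"))
(def v3 (update v2 :tags conj "classic"))

;; unchanged values are shared between versions rather than copied
(identical? (:title v1) (:title v3))
```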

    
  13. • reference, not value • refs / agents / atoms
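A small sketch of the three reference types named above, using only clojure.core: an atom for independent synchronous updates, two refs changed together inside one transaction, and an agent for asynchronous updates. The vars `counter`, `checking`, `savings`, `log-lines` and the `transfer!` function are hypothetical.

```clojure
;; Atom: one reference, updated synchronously by applying a function.
(def counter (atom 0))
(swap! counter inc)
(swap! counter + 10)

;; Refs: several references changed together in a single transaction.
(def checking (ref 100))
(def savings  (ref 0))

(defn transfer! [amount]
  (dosync
    (alter checking - amount)
    (alter savings + amount)))

(transfer! 30)

;; Agent: updates are queued and applied asynchronously, one at a time.
(def log-lines (agent []))
(send log-lines conj "transfer done")
(await log-lines)

;; Stop the agent thread pool so the JVM can exit when run as a script.
(shutdown-agents)
```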

    
  a world where you don't have to select * from table;

    
  15. where you can mutate records without reading them (don’t tell

    
  16. • Good set of primitives at the database level • Ordering

    
  17. • Strings • Numeric types • Date • Binary data

    
  18. Primitives, hash { :key1 "value1" :key2 "value2" } { key1:

    
  19. Primitives, nested ds { :key1 { :key2 "value1" } :key3

    
  20. it’s all cool but how do I do all that

    
  21. Monger MongoDB client for a more civilized age: friendly, flexible

    
  22. • Idiomatic Clojure driver • Uses MongoDB Java driver underneath

    
  23. Querying Exact match (monger.collection/find-maps "books" { :author "Joe Armstrong" })

    
  24. Querying Nested attribute match (monger.collection/find-maps "books" { :price.currency "USD" })

    
  25. Querying Nested array match (monger.collection/find-maps "books" {:tags "multicore"}) db.books.find({"tags": "multicore"})

    
  26. Querying Complex queries (monger.collection/find-maps collection {:price.discount {$lt 25.00}}) db.books.find({"price.discount": {$lt:

    
  27. Querying Operators Comparison ;; Return all rows where :users field

    
  28. Querying Operators Set Matching ;; Return all rows where tags

    
  Querying Logical Operators $and $or $nor (mgcol/find collection {$and [{:language "Clojure"} {:users {$gt 10}}]})
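The query slides above rely on MongoDB's matching rules. As a rough in-memory illustration of those rules (not Monger's code; the server does the real matching), this sketch evaluates the same query shapes against plain Clojure maps: exact match, dotted nested keys, array membership, `$lt`-style comparisons and `$and`/`$or`/`$nor`. The helpers `lookup`, `value-matches?`, `matches?` and the sample documents are made up for this example, and operator keys are written as the strings Monger's operator vars evaluate to.

```clojure
(require '[clojure.string :as string])

;; Hypothetical helper: resolves a dotted key such as :price.currency
;; against nested maps, the way MongoDB resolves "price.currency".
(defn- lookup [doc k]
  (get-in doc (map keyword (string/split (name k) #"\."))))

;; Operator strings as produced by Monger's operator vars ($lt, $gt, ...).
(def ^:private comparisons {"$lt" < "$lte" <= "$gt" > "$gte" >=})

(defn- value-matches? [actual expected]
  (cond
    ;; {"$lt" 25.0}: every comparison must hold for a numeric field
    (and (map? expected) (seq expected) (every? comparisons (keys expected)))
    (and (number? actual)
         (every? (fn [[op operand]] ((comparisons op) actual operand)) expected))

    ;; an array field matches when any element equals the value
    (and (sequential? actual) (not (sequential? expected)))
    (boolean (some #(= % expected) actual))

    ;; everything else, including whole sub-documents, must be equal
    :else (= actual expected)))

(defn matches? [doc query]
  (every? (fn [[k v]]
            (case k
              "$and" (every? #(matches? doc %) v)
              "$or"  (boolean (some #(matches? doc %) v))
              "$nor" (not-any? #(matches? doc %) v)
              (value-matches? (lookup doc k) v)))
          query))

;; Made-up sample documents for this sketch
(def books
  [{:title "Programming Erlang" :author "Joe Armstrong"
    :price {:currency "USD" :discount 20.0} :tags ["erlang" "multicore"]}
   {:title "Clojure Notes" :author "Someone Else"
    :price {:currency "EUR" :discount 30.0} :tags ["clojure" "jvm"]}])
```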

    
  30. Querying Atomic Modifiers $inc ;; increments one or many fields

    
  31. Querying Atomic Modifiers $push ;; appends _single_ value to field
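To make the effect of `$inc` and `$push` concrete, here is a hypothetical in-memory model that applies a modifier document to a single Clojure map. It only mirrors the resulting document; in MongoDB the server applies these modifiers atomically per document, without the client reading the record first. `apply-modifiers` is an illustrative name, not a Monger function.

```clojure
(defn apply-modifiers
  "Applies a {\"$inc\" {...} \"$push\" {...}} modifier map to doc."
  [doc modifiers]
  (reduce-kv
    (fn [d op fields]
      (case op
        ;; $inc adds the amount to each field; missing fields start at 0
        "$inc"  (reduce-kv (fn [acc k n] (update acc k (fnil + 0) n)) d fields)
        ;; $push appends a single value; missing fields start as []
        "$push" (reduce-kv (fn [acc k v] (update acc k (fnil conj []) v)) d fields)
        (throw (IllegalArgumentException. (str "Unsupported modifier: " op)))))
    doc
    modifiers))
```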

    
  32. Querying Query DSL Pagination (with-collection "scores" (find {}) (fields [:score

    
  Querying Query DSL Sorting (with-collection "scores" (find {}) (fields [:score :name]) (sort {:score -1}) (limit 10))

    
  Querying Query DSL Batching (with-collection coll (find {:age_in_days {$gt 365}}) (sort {:age_in_days -1}) (batch-size 5000))

    
  35. Clojure Way Making Monger Conversion (defprotocol ConvertToDBObject (^com.mongodb.DBObject to-db-object [input]))
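A self-contained sketch of the same protocol-based conversion idea. Monger extends `ConvertToDBObject` so that Clojure data becomes `com.mongodb.DBObject`; to avoid depending on the Java driver, this version (with the hypothetical names `ConvertToJava` and `to-java`) targets plain `java.util` collections instead. Each type gets its own implementation through `extend-protocol`, and maps and lists recurse into their values.

```clojure
(defprotocol ConvertToJava
  "Stand-in for ConvertToDBObject: turns Clojure data into java.util collections."
  (to-java [input]))

(extend-protocol ConvertToJava
  nil
  (to-java [_] nil)

  ;; strings, numbers, booleans, ... pass through unchanged
  Object
  (to-java [input] input)

  ;; keywords become plain strings, as field names do in a DBObject
  clojure.lang.Keyword
  (to-java [input] (name input))

  ;; Clojure maps implement java.util.Map; convert keys and values recursively
  java.util.Map
  (to-java [input]
    (let [m (java.util.LinkedHashMap.)]
      (doseq [[k v] input]
        (.put m (to-java k) (to-java v)))
      m))

  ;; vectors and lists implement java.util.List
  java.util.List
  (to-java [input]
    (java.util.ArrayList. ^java.util.Collection (map to-java input))))

(def converted
  (to-java {:title "Fight Club" :tags [:novel "classic"] :price {:currency "USD"}}))
```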

    
  36. • allow you to build a very flexible and easy

    
  

    … quotes)

    (defmacro ^{:private true} defoperator [operator]
      `(def ^{:const true} ~(symbol (str operator)) ~(str operator)))

    (defoperator $gt)
    (defoperator $gte)
    (defoperator $lt)
    (defoperator $lte)

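A runnable sketch of the `defoperator` macro from the slide, plus a few usage lines: each call defines a constant var whose name and value are the operator itself, so query literals read like the MongoDB shell while remaining ordinary Clojure data. `$in` and the `query` var are added here purely for illustration.

```clojure
(defmacro ^:private defoperator
  "Defines a constant var named after the operator, holding its name as a string."
  [operator]
  `(def ^{:const true} ~(symbol (str operator)) ~(str operator)))

(defoperator $gt)
(defoperator $lt)
(defoperator $in)

;; Looks like a shell query, but it is just a map of keywords and strings.
(def query {:users {$gt 10} :tags {$in ["clojure" "mongodb"]}})
```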
  38. Clojure Way Making Monger Query Operators, so you could build queries that look exactly like shell ones

    (mc/aggregate "docs"
                  [{$project {:subtotal {$multiply ["$quantity", "$price"]}
                              :_id 1
                              :state 1}}
                   {$group {:_id "$state"
                            :total {$sum "$subtotal"}}}])

  39. Clojure Way Making Monger Query Operators Macros

    • defer evaluation (we’ll talk more about it later)
    • define constants on the fly
    • extensible (you can add as many defoperators as you want)
    • easy to change the implementation (change the protocol in one place)
    • actually let you get your DSL closer to the domain

  40. Clojure Way Making Monger Query Operators using native <, >, <=, >= fns

    (defn match-operation [[operation orig-value & rest]]
      (let [value (to-cql-value orig-value)
            res   (cond
                    (= operation >) (format " > %s" value)
                    ...
                    (keyword? operation) (format "%s %s" (name operation) value))]
        (if rest
          (into [res] (match-operation rest)) ; into keeps the fragments in one flat vector
          [res])))

    (prepare-select-query "column_family_name"
                          :where {:column_1 [>= 1] :column_2 [<= 5]}
                          :limit 5)

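The operator-as-function idea can be tried without Cassandra. In this sketch (an assumption, not Cassaforte's actual code) clojure.core's comparison functions are used directly as map keys, which works because functions compare by identity. `cql-operators`, `operator->cql` and `where-clause` are hypothetical names, and values are rendered with `str` instead of a real `to-cql-value`, so string quoting is not handled.

```clojure
(require '[clojure.string :as string])

;; clojure.core comparison fns as keys: (get cql-operators >) finds ">"
(def ^:private cql-operators
  {> ">" < "<" >= ">=" <= "<=" = "="})

(defn- operator->cql [operation]
  (cond
    (contains? cql-operators operation) (cql-operators operation)
    ;; keywords such as :in pass through by name
    (keyword? operation) (name operation)
    :else (throw (IllegalArgumentException. (str "Unsupported operator: " operation)))))

(defn match-operation
  "Turns [op value op value ...] into a flat vector of CQL fragments."
  [[operation value & more]]
  (let [res (str (operator->cql operation) " " value)]
    (if more
      (into [res] (match-operation more))
      [res])))

(defn where-clause
  "Renders {:column [op value ...]} as a CQL WHERE condition (columns sorted)."
  [conditions]
  (->> (sort-by key conditions)
       (mapcat (fn [[column ops]]
                 (map #(str (name column) " " %) (match-operation ops))))
       (string/join " AND ")))
```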
  41. Clojure Way Making Monger Query Operators Alternative Approach

    • used in Cassaforte, a Cassandra driver
    • uses intuitive, built-in operators/fns
    • extensible
    • easy to make defaults
    • recurs into values

  42. Clojure Way Making Monger Multiple Connections with-connection

    (declare ^:dynamic ^Mongo *mongodb-connection*)

    (defmacro with-connection [conn & body]
      `(binding [*mongodb-connection* ~conn]
         (do ~@body)))

    (with-connection (connect :host "server1")
      (mgcol/insert "books" {:title "Fight Club"}))

    (with-connection (connect :host "server2")
      (mgcol/insert "authors" {:name "Chuck Palahniuk"}))

  43. Clojure Way Making Monger Multiple Connections with-connection

    (defn set-connection! ^Mongo [^Mongo conn]
      (alter-var-root (var *mongodb-connection*) (constantly conn)))

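The binding technique can be seen in isolation with a dynamic var that holds a plain map instead of a `com.mongodb.Mongo` connection (a stand-in assumption so the example runs without MongoDB). `with-connection` rebinds the var for the current thread only, `set-connection!` changes the root value every thread falls back to, and a raw `java.lang.Thread` started inside the binding still sees the root value. `*connection*`, `current-host` and `host-in-plain-thread` are illustrative names.

```clojure
;; Stand-in: a connection is just a map here.
(def ^:dynamic *connection* {:host "localhost"})

(defmacro with-connection [conn & body]
  `(binding [*connection* ~conn]
     ~@body))

(defn set-connection! [conn]
  (alter-var-root #'*connection* (constantly conn)))

(defn current-host []
  ;; code deep in the call stack reads the var instead of taking a parameter
  (:host *connection*))

(defn host-in-plain-thread
  "Reads the connection from a fresh java.lang.Thread, which does not
  inherit the caller's thread-local bindings."
  []
  (let [p (promise)]
    (doto (Thread. #(deliver p (current-host))) (.start))
    @p))
```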
  44. Clojure Way Making Monger Multiple Connections Bindings

    • use macros to delay evaluation
    • use bindings to get thread-local dynamic vars
    • people don’t have to pass/carry the connection around
    • easy defaults (set the dynamic var’s root value with alter-var-root)
    • easy to understand

  45. Clojure Way Making Monger Query DSL with-collection

    (defmacro with-collection [^String coll & body]
      `(binding [*query-collection* (if (string? ~coll)
                                      ;; currently bound database
                                      (.getCollection ^DB *mongodb-database* ~coll)
                                      ~coll)]
         (let [query# (-> (empty-query *query-collection*) ~@body)]
           (exec query#))))

  46. Clojure Way Making Monger Query DSL exec

    (defn exec [{:keys [collection query fields skip limit sort ...]
                 :or {limit 0 batch-size 256 skip 0}}]
      (let [cursor (doto (.find collection (to-db-object query) (as-field-selector fields))
                     (.limit limit)
                     (.skip skip)
                     (.sort (to-db-object sort))
                     (.batchSize batch-size)
                     (.hint (to-db-object hint)))]
        (when snapshot
          (.snapshot cursor))
        (map (fn [x] (from-db-object x keywordize-fields)) cursor)))

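The `:keys`/`:or` destructuring used by `exec` works the same way on any map. As a rough illustration (an assumption, not Monger's implementation), this hypothetical `exec-in-memory` takes the same kind of query map but runs it over a vector of Clojure maps instead of a MongoDB cursor, handling an exact-match `:query`, `:fields`, a single-key `:sort`, `:skip` and `:limit`.

```clojure
(defn exec-in-memory
  "Runs a query map over :collection, here a plain vector of maps."
  [{:keys [collection query fields skip limit sort]
    :or   {skip 0 limit 0}}]
  (let [[sort-field direction] (first sort)
        ;; exact-match filtering on every key of :query
        matching (filter (fn [doc] (every? (fn [[k v]] (= (get doc k) v)) query))
                         collection)
        ;; -1 sorts descending, anything else ascending
        sorted   (cond->> matching
                   sort-field (sort-by sort-field
                                       (if (neg? direction) #(compare %2 %1) compare)))
        paged    (cond->> (drop skip sorted)
                   ;; as with a MongoDB cursor, a limit of 0 means "no limit"
                   (pos? limit) (take limit))]
    (map #(if (seq fields) (select-keys % fields) %) paged)))

;; Made-up sample data
(def scores
  [{:name "a" :score 5 :team :red}
   {:name "b" :score 9 :team :red}
   {:name "c" :score 7 :team :blue}
   {:name "d" :score 1 :team :red}])
```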
  47. Clojure Way Making Monger Query DSL impl

    (defn find [m query]
      (merge m {:query query}))

    (defn fields [m flds]
      (merge m {:fields flds}))

  48. Clojure Way Making Monger Query DSL example

    (with-collection "scores"
      (find {})                         ;; => {:query {}}
      (fields [:score :name])           ;; => {:fields [:score :name]}
      (sort {:score -1})                ;; => {:sort {:score -1}}
      (paginate :page 10 :per-page 10)) ;; => {:limit 10 :skip 90}

  49. Clojure Way Making Monger Query DSL Macro/thrush

    • macros delay evaluation
    • make it very easy to create a DSL that’s close to your domain
    • specify the collection just once
    • use the thrush (->) operator to thread the query map (built from the collection) through all parts of ~@body
    • easy to understand
    • DSL functions that will be evaluated later (find/fields...) return hashes
    • those hashes are destructured via :keys in the exec function

  50. Clojure Way Making Monger Query DSL Macro/thrush

    • thread-local binding for passing the collection around
    • provide as many operations at once as you want; they stay very readable (and composable)
    • the DSL is very extensible: you can write arbitrary functions of your own that return hashes, so you can write (find-person-by-gender :male); paginate is one such example

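The macro-plus-thrush pattern can be reproduced without a database. In this sketch, the hypothetical `build-query` macro threads an initial query map through every form in its body with `->` and returns the result instead of executing it. Each DSL step is a plain function from map to map, which is why a domain-specific step such as the made-up `by-gender` plugs in without touching the macro. The `paginate` rule (skip = per-page * (page - 1)) is an assumption about how pages map to skip/limit.

```clojure
(ns query-dsl-sketch
  ;; find and sort are redefined below, so keep clojure.core's versions out
  (:refer-clojure :exclude [find sort]))

(defn empty-query [coll]
  {:collection coll :query {} :fields [] :sort {} :skip 0 :limit 0})

;; Each DSL step takes the query map first and returns an updated map.
(defn find   [m query] (merge m {:query query}))
(defn fields [m flds]  (merge m {:fields flds}))
(defn sort   [m srt]   (merge m {:sort srt}))
(defn limit  [m n]     (merge m {:limit n}))

(defn paginate
  "Assumed pagination rule: skip = per-page * (page - 1)."
  [m & {:keys [page per-page] :or {page 1 per-page 10}}]
  (merge m {:limit per-page :skip (* per-page (dec page))}))

;; A domain-specific step: any map -> map function fits into the DSL.
(defn by-gender [m gender]
  (update m :query assoc :gender gender))

(defmacro build-query
  "Threads (empty-query coll) through body with ->, returning the query map
  at the point where Monger's with-collection would execute it."
  [coll & body]
  `(-> (empty-query ~coll) ~@body))
```

For instance, `(build-query "scores" (find {}) (fields [:score :name]) (sort {:score -1}) (paginate :page 10 :per-page 10))` yields a map with `:skip 90` and `:limit 10`, ready to be handed to an `exec`-style function.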
  51. Facts you probably won’t hear at Mongo conferences

    • “Normal” write concern is not “normal” (you won’t know about server errors, only network errors)
    • Very (very) high virtual memory usage (yeah, right, let’s just use “repairDatabase”)
    • Repairs require a lot of disk space (as much free space as the size of your database; when running short, use --repairpath)
    • The rest I won’t mention in the talk

  52. Hey, high-tech, do you think you can *k with something like that?

  53. Cascalog

    • Simple
    • Uses the full power of Hadoop
    • Clojure (+ the entire JVM) available
    • Easy to extend
    • Dynamic queries
    • Arbitrary inputs/outputs

  54. Cascalog

    (?<- (stdout)              ; query operator (?<-) and output (stdout)
         [?person ?age]        ; output fields
         (age ?person ?age)    ; input
         (= ?person "alice"))  ; query

  55. Cascalog

    (?<- output
         [!date ?md5 !count]                          ; implicit grouping on !date and ?md5
         (input ?type ?hostname ?received_at ?md5)
         (date-hour-precision ?received_at :> !date)  ; map operation
         (:sort !date)                                ; output sort
         (c/count !count))                            ; aggregate op

  56. Cascading-MongoDB tap

    It’s an early version (written in under 2 days, a couple of weeks before the conference).
    Works in production (on rather small, ~15 GB datasets).
    https://github.com/ifesdjeen/cascading-mongodb

  57. Takeaway

    • Clojure is concise, performant and productive
    • Amazing for processing (big) data
    • Big set of available tools
    • Tools are easy to use, next-level
    • Works perfectly with MongoDB
    • Will change the way you write programs
    • Benefits ahead

  58. Takeaway

    • Utilize DB abstractions
    • Model your data around the DB’s strengths
    • Move away from in-database JavaScript map/reduce in good time
    • Evaluate the upsides of new tech
    • Use tools that make your team more productive
    • Evaluate incidental complexity
