Meanwhile, in the world of Java, they even use Hadoop (spooky!).

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.bson.BSONObject;

import com.mongodb.hadoop.MongoInputFormat;
import com.mongodb.hadoop.MongoOutputFormat;
import com.mongodb.hadoop.util.MongoConfigUtil;

public class WordCount {

    private static final Log log = LogFactory.getLog( WordCount.class );

    // Tokenizes the "x" field of each incoming BSON document and emits (word, 1) pairs.
    public static class TokenizerMapper extends Mapper<Object, BSONObject, Text, IntWritable> {

        private final static IntWritable one = new IntWritable( 1 );
        private final Text word = new Text();

        public void map( Object key, BSONObject value, Context context )
                throws IOException, InterruptedException {
            final StringTokenizer itr = new StringTokenizer( value.get( "x" ).toString() );
            while ( itr.hasMoreTokens() ){
                word.set( itr.nextToken() );
                context.write( word, one );
            }
        }
    }

    // Sums the counts for each word; also doubles as the combiner.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private final IntWritable result = new IntWritable();

        public void reduce( Text key, Iterable<IntWritable> values, Context context )
                throws IOException, InterruptedException {
            int sum = 0;
            for ( final IntWritable val : values ){
                sum += val.get();
            }
            result.set( sum );
            context.write( key, result );
        }
    }

    // Wires the job together: MongoDB collections replace HDFS paths on both ends.
    public static void main( String[] args ) throws Exception {
        final Configuration conf = new Configuration();
        MongoConfigUtil.setInputURI( conf, "mongodb://localhost/test.in" );
        MongoConfigUtil.setOutputURI( conf, "mongodb://localhost/test.out" );
        System.out.println( "Conf: " + conf );

        final Job job = new Job( conf, "word count" );
        job.setJarByClass( WordCount.class );
        job.setMapperClass( TokenizerMapper.class );
        job.setCombinerClass( IntSumReducer.class );
        job.setReducerClass( IntSumReducer.class );
        job.setOutputKeyClass( Text.class );
        job.setOutputValueClass( IntWritable.class );
        job.setInputFormatClass( MongoInputFormat.class );
        job.setOutputFormatClass( MongoOutputFormat.class );
        System.exit( job.waitForCompletion( true ) ? 0 : 1 );
    }
}
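Not shown on the slide: the job expects documents shaped like { x: "some text" } in test.in (the field name comes from TokenizerMapper above). A minimal sketch for seeding that collection with the 2.x-era MongoDB Java driver; the Seed class name and the sample sentences are made up for illustration:

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.Mongo;

public class Seed {
    public static void main( String[] args ) throws Exception {
        // Assumes mongod is running locally on the default port.
        final Mongo mongo = new Mongo( "localhost" );
        final DB db = mongo.getDB( "test" );
        final DBCollection in = db.getCollection( "in" );
        // Each document carries its text in the "x" field,
        // matching what TokenizerMapper reads.
        in.insert( new BasicDBObject( "x", "the quick brown fox" ) );
        in.insert( new BasicDBObject( "x", "jumps over the lazy dog" ) );
        mongo.close();
    }
}

After the job completes, each word/count pair lands as a document in test.out.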