Efficient, Concurrent and Concise Data Access in Clojure

Reinventing Haxl
Efficient & Concise Access to Remote Data
by Alexey Kachayev for EuroClojure 2015

About Me
‣ Alexey Kachayev, @kachayev
‣ CTO at Attendify.com
  • Almost-all-day-coding-kind-of-CTO
‣ Active open source contributor

Agenda
‣ The Problem
‣ The Solution
‣ (applause)
‣ Future Plans

Multi-Level Data

Multi-Level Data
(diagram; labels: friendship, payload, likes, comments, claims, profile, mentions, tags, spam, notes)

Simplest “Fetch”

Simplest “Fetch”
‣ Pros
  • concise & clean
‣ Cons
  • very slow with sequential HTTP/TCP calls
  • duplicate requests (the same User is fetched several times)
  • hard to batch requests (e.g. use MGET for Redis)
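The code for this slide did not survive in the transcript. As a minimal sketch of the "simplest fetch" style and its costs, assume two hypothetical functions, `fetch-post` and `fetch-user`, that stand in for remote calls and record every simulated round trip. The names and data are illustrative only and are not taken from the talk.

```clojure
;; Hypothetical stand-ins for remote calls (not from the talk).
;; Each call is recorded so the number of round trips can be inspected.
(def calls (atom []))

(defn fetch-user [id]
  (swap! calls conj [:user id])
  {:id id :name (str "User #" id)})

(defn fetch-post [id]
  (swap! calls conj [:post id])
  {:id id :author 1 :text (str "Post #" id)})

;; Concise and clean, but every call runs one after another and
;; the same author is requested once per post.
(defn fetch-timeline [post-ids]
  (mapv (fn [post-id]
          (let [post (fetch-post post-id)]
            (assoc post :author (fetch-user (:author post)))))
        post-ids))

(fetch-timeline [10 11 12])

(count @calls)
;; => 6 (three posts plus three user requests)

(count (filter #(= [:user 1] %) @calls))
;; => 3 (the same user was fetched three times)
```

With real network latency each of those six calls would block the next one, which is the "very slow" part of the cons listed above.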

core.async & dedup

Optimized “Fetch”
‣ Optimization leads to unnecessary complexity in code
‣ Hard to dig through the code
‣ We want concise code that operates efficiently
‣ Not only a Clojure problem

Known Solutions

Known Solutions
‣ Haxl - Haskell library, Facebook, open sourced
  • Key idea: applicative functors manage dependencies between data fetches under the hood, with implicit concurrency & batching

Known Solutions
‣ Stitch - Scala library, Twitter, not open sourced
  • Key idea: build a declarative AST of all data fetch operations and run a special interpreter that chooses the most efficient evaluation strategy
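To make the "build an AST, then interpret it" idea concrete, here is a minimal sketch in plain Clojure. It is not Stitch's or Muse's implementation: the node shapes and the names `source`, `fmap-node`, `leaves`, `interpret` and `fake-fetch` are all hypothetical, and the sketch only covers independent fetches (no fetch that depends on the result of another).

```clojure
(require '[clojure.set :as set])

;; 1. Describe the work as plain data instead of performing it.
(defn source [kind id] {:op :fetch :kind kind :id id})
(defn fmap-node [f & nodes] {:op :map :f f :args (vec nodes)})

;; 2. Collect the distinct fetches a tree needs.
(defn leaves [node]
  (case (:op node)
    :fetch #{node}
    :map   (into #{} (mapcat leaves) (:args node))))

;; 3. The interpreter decides how to run them. Here it simply fetches
;;    every distinct leaf once, then evaluates the tree from the results.
(defn interpret [fetch-fn node]
  (let [results (into {} (map (juxt identity fetch-fn)) (leaves node))]
    ((fn evaluate [n]
       (case (:op n)
         :fetch (results n)
         :map   (apply (:f n) (map evaluate (:args n)))))
     node)))

;; Usage with a fake fetch function that records what was requested.
(def fetched (atom []))

(defn fake-fetch [{:keys [kind id]}]
  (swap! fetched conj [kind id])
  (set (range id)))

(def tree
  (fmap-node count
             (fmap-node set/intersection
                        (source :friends-of 5)
                        (source :friends-of 5))))

(interpret fake-fetch tree)
;; => 5

@fetched
;; => [[:friends-of 5]] (the duplicate leaf was fetched only once)
```

Because the tree is data, the interpreter is free to pick a different strategy (concurrent fetches, batching) without any change to the code that built the tree.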

Known Solutions ‣ The idea behind Stitch should sound familiar
‣ go macro from core.async uses the same approach

What About Clojure?
‣ Muse library
‣ github.com/kachayev/muse
‣ Open sourced a couple of weeks ago
‣ Still in active development
‣ Uses the idea of building and interpreting an AST
‣ Uses core.async to deal with concurrency

Protocols
(defprotocol DataSource
  (fetch [this]))
(defprotocol LabeledSource
  (resource-id [this]))
(defprotocol BatchedSource
  (fetch-multi [this resources]))

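Latency
(require '[clojure.core.async :refer [go <! timeout]])
;; Simulates a remote call: waits a random time, then yields `result`.
(defn remote-req [id result]
  (let [wait (rand 1000)]
    (println "-->" id ".." wait)
    (go
      (<! (timeout wait))
      (println "<--" id)
      result)))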

DataSource
(defrecord Speaker [id]
  DataSource
  (fetch [_]
    (remote-req id {:id id
                    :name "Alexey"
                    :topic "DataSource"
                    :slides (inc id)})))

Operations Tree
(fmap :name (Speaker. 42))
;; #<MuseMap (:name Speaker[42])>
(fmap inc (fmap :slides (Speaker. 42)))
;; #<MuseMap ...>
(fmap compare
      (fmap :slides (Speaker. 3))
      (fmap :slides (Speaker. 7)))
;; #<MuseMap ...>

Nested Data
(defrecord Speaker [id]
  DataSource
  (fetch [_]
    (remote-req id {:name (str "Alexey #" id)})))
(defrecord Session [id]
  DataSource
  (fetch [_]
    (remote-req id {:id id
                    :topic "Haxl in Clojure"
                    :slides (inc id)
                    :speaker (dec id)})))

Nested Data
(defrecord Speaker [id]
  DataSource
  (fetch [_]
    (remote-req id {:name (str "Alexey #" id)})))
(defrecord Session [id]
  DataSource
  (fetch [_]
    (remote-req id {:id id
                    :topic "Reinventing Haxl"
                    :slides (inc id)
                    :speaker (dec id)})))

Nested Data
(defn speaker-name [speaker-id]
  (fmap :name (Speaker. speaker-id)))
(defn who-speaks-at [id]
  (flat-map #(speaker-name (:speaker %)) (Session. id)))
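The difference between the two combinators used on this slide can be sketched with a deliberately simplified, synchronous model in which a "fetch" is just a zero-argument function. This is an illustration of the semantics only, not how Muse works internally; every name ending in `-sketch` is hypothetical.

```clojure
;; A "fetch" is modelled as a zero-argument function; calling it runs it.

;; fmap: transform the value a fetch produces.
(defn fmap-sketch [f fetch]
  (fn [] (f (fetch))))

;; flat-map: the function itself returns another fetch, which can only be
;; built (and run) after the first fetch has produced its value.
(defn flat-map-sketch [f fetch]
  (fn [] ((f (fetch)))))

(defn session-sketch [id] (fn [] {:id id :speaker (dec id)}))
(defn speaker-sketch [id] (fn [] {:name (str "Alexey #" id)}))

(defn who-speaks-at-sketch [id]
  (flat-map-sketch #(fmap-sketch :name (speaker-sketch (:speaker %)))
                   (session-sketch id)))

((who-speaks-at-sketch 20))
;; => "Alexey #19"
```

In this model the speaker fetch cannot start before the session fetch finishes, because the speaker id comes out of the session. That data dependency is what `flat-map` expresses, while arguments passed to `fmap` stay independent of each other.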

Runner
(run! (Speaker. 10))
;; #<ManyToManyChannel ...>
(<!! (run! (Speaker. 10)))
;; {:name "Alexey #10"}
(run!! (fmap :name (Speaker. 10)))
;; "Alexey #10"
(run!! (who-speaks-at 20))
;; "Alexey #19"

Common Friends
(require '[clojure.set :refer :all])
(defrecord FriendsOf [id]
  DataSource
  (fetch [_] (remote-req id (set (range id)))))
(defn common-friends [x y]
  (fmap intersection (FriendsOf. x) (FriendsOf. y)))
(defn num-common-friends [x y]
  (fmap count (common-friends x y)))

Common Friends
(run!! (num-common-friends 3 4))
;; --> 3 .. 335.57122610718227
;; --> 4 .. 125.78371543747402
;; <-- 4
;; <-- 3
;; 3

Common Friends
(run!! (num-common-friends 5 5))
;; --> 5 .. 145.0165111103837
;; <-- 5
;; 5

Friends Of Friends
(defn friends-of-friends [id]
  (->> (FriendsOf. id)
       (traverse ->FriendsOf)
       (fmap (partial apply union))))

Friends Of Friends
(run!! (friends-of-friends 3))
;; --> 3 .. 268.0963965301999
;; <-- 3
;; --> 0 .. 233.01360724232333
;; --> 1 .. 424.35908747904415
;; --> 2 .. 778.2748225589665
;; <-- 0
;; <-- 1
;; <-- 2
;; #{0 1}

Protocols
(defprotocol DataSource
  (fetch [this]))
(defprotocol LabeledSource
  (resource-id [this]))
(defprotocol BatchedSource
  (fetch-multi [this resources]))

Batched Source
(defrecord FriendsOf [id]
  DataSource
  (fetch [_] (remote-req id (set (range id))))
  BatchedSource
  (fetch-multi [_ users]
    (let [ids (cons id (map :id users))]
      (->> ids
           (map (juxt identity (comp set range)))
           (into {})
           (remote-req ids)))))

Batched Source
(run!! (friends-of-friends 3))
;; --> 3 .. 433.9830317453879
;; <-- 3
;; --> (0 1 2) .. 268.8396567924334
;; <-- (0 1 2)
;; #{0 1}

What Can It Do For You?
‣ Runs independent data fetches concurrently
  • Uses BFS to group fetches level-by-level
‣ Caches previously made fetches during execution
‣ Batches requests when applicable
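The caching and batching points above can be sketched for a single level of fetches. This is a simplified, synchronous illustration under stated assumptions, not Muse's implementation: requests are plain maps with `:kind` and `:id`, and `run-level`, `fake-batch`, `root`, `level-1` and `level-2` are hypothetical names. Concurrency is left out to keep the sketch short.

```clojure
(defn run-level
  "Resolves one level of requests: skips anything already in `cache`,
   removes duplicates, and issues one batched call per kind of resource.
   `batch-fetch` takes a kind and a seq of ids and returns results in order."
  [batch-fetch cache requests]
  (let [missing (->> requests distinct (remove #(contains? cache %)))]
    (reduce-kv (fn [acc kind reqs]
                 (merge acc (zipmap reqs (batch-fetch kind (map :id reqs)))))
               cache
               (group-by :kind missing))))

;; A fake batched backend that records every call it receives.
(def batches (atom []))

(defn fake-batch [kind ids]
  (swap! batches conj [kind (vec ids)])
  (map #(set (range %)) ids))

(def root {:kind :friends-of :id 3})

;; Level 1: the friends of user 3.
(def level-1 (run-level fake-batch {} [root]))

;; Level 2: the friends of each of those friends, in one batched call.
(def level-2
  (run-level fake-batch level-1
             (for [id (sort (get level-1 root))]
               {:kind :friends-of :id id})))

@batches
;; => [[:friends-of [3]] [:friends-of [0 1 2]]]

;; Asking for an already cached resource triggers no further call.
(run-level fake-batch level-2 [root])
(count @batches)
;; => 2
```

The two recorded calls mirror the batched trace shown earlier for `friends-of-friends`: one request for user 3, then a single request for `(0 1 2)`.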

With Muse

With Muse
(diagram; levels: Timeline T1; Posts P1, P2, P3, P4; Users & Scores U1, US1, U2, US2, U3, US3)

With Muse
‣ How did we achieve this?
‣ Nice separation of concerns:
  • Muse says WHAT you want to do
  • core.async says HOW
‣ Generalized abstraction that knows nothing about concrete data stores

Known Restrictions
‣ Assumes your data fetches are “side-effect free”
  • You should not rely on the order in which fetches run
‣ You need enough memory to store fetches
‣ Uses core.async to run fetches concurrently

Future Plans
‣ Better error handling
‣ Debug-mode to trace all fetches with latencies
‣ Applicative functors interface
‣ Get rid of fmap & flat-map
‣ ClojureScript support

Future Plans
‣ Looking for feedback from adopters
‣ Stay tuned for more!

Thank You! Questions?
