Efficient, Concurrent and Concise Data Access in Clojure

Slide 1

Slide 1 text

Reinventing Haxl: Efficient & Concise Access to Remote Data, by Alexey Kachayev, for EuroClojure 2015

Slide 2

Slide 2 text

About Me
‣ Alexey Kachayev, @kachayev
‣ CTO at Attendify.com
  • Almost-all-day-coding-kind-of-CTO
‣ Active open source contributor

Slide 3

Slide 3 text

Agenda
‣ The Problem
‣ The Solution
‣ (applause)
‣ Future Plans

Slide 4

Slide 4 text

Multi-Level Data

Slide 5

Slide 5 text

Multi-Level Data
[Diagram labels: friendship, payload, likes, comments, claims, profile, mentions, tags, spam, notes]

Slide 6

Slide 6 text

Simplest “Fetch”

Slide 7

Slide 7 text

Simplest “Fetch”
‣ Pros
  • concise & clean
‣ Cons
  • very slow with sequential HTTP/TCP calls
  • duplicate requests (the same User is fetched several times)
  • hard to batch requests (e.g. use MGET for Redis)
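The trade-off on this slide can be reproduced without any library. The sketch below is an illustration only: `fetch-user`, `fetch-post` and `timeline` are hypothetical names, and the "remote" calls are plain functions that record which ids they were asked for.

```clojure
;; Hypothetical stand-ins for remote calls (not from the talk).
;; `calls` records every user id that was requested.
(def calls (atom []))

(defn fetch-user [id]
  (swap! calls conj id)
  {:id id :name (str "user-" id)})

(defn fetch-post [id]
  ;; in this toy data set every post is written by user 1
  {:id id :author-id 1})

;; Concise and readable, but each step waits for the previous one,
;; and the same author is requested again for every post.
(defn timeline [post-ids]
  (mapv (fn [post-id]
          (let [post (fetch-post post-id)]
            (assoc post :author (fetch-user (:author-id post)))))
        post-ids))

(timeline [10 11 12])
@calls ;; => [1 1 1], three requests for the same user
```

With real network latency each of those calls would also block the next one, which is the "very slow" part of the Cons list.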

Slide 8

Slide 8 text

core.async & dedup

Slide 9

Slide 9 text

Optimized “Fetch”
‣ Optimization leads to unnecessary complexity in code
‣ Hard to dig through the code
‣ We want concise code that operates efficiently
‣ Not only a Clojure problem

Slide 10

Slide 10 text

Known Solutions

Slide 11

Slide 11 text

Known Solutions
‣ Haxl - Haskell library, Facebook, open sourced
  • Key idea: Applicative functors to manage dependencies between data fetches under the hood with implicit concurrency & batches

Slide 12

Slide 12 text

Known Solutions
‣ Stitch - Scala library, Twitter, not open sourced
  • Key idea: Build declarative AST of all data fetch operations and run special interpreter that will choose the most efficient evaluation strategy

Slide 13

Slide 13 text

Known Solutions
‣ The idea behind Stitch should sound familiar
‣ go macro from core.async uses the same approach
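The "build an AST, then interpret it" idea can be shown in a few lines. This is a minimal sketch under my own assumptions, not code from Stitch, Haxl or Muse: `source`, `mapped`, `sources` and `interpret` are hypothetical names, and the data store is an in-memory map.

```clojure
;; Fetch descriptions are plain data; nothing runs when they are built.
(defn source [kind id] {:node :source :kind kind :id id})
(defn mapped [f & children] {:node :map :f f :children (vec children)})

;; Because the tree is data, it can be inspected before anything runs,
;; e.g. to list the distinct sources it needs.
(defn sources [ast]
  (case (:node ast)
    :source #{(select-keys ast [:kind :id])}
    :map    (reduce into #{} (map sources (:children ast)))))

;; A trivial interpreter: look each source up in `db`, then apply functions.
(defn interpret [db ast]
  (case (:node ast)
    :source (get-in db [(:kind ast) (:id ast)])
    :map    (apply (:f ast) (map #(interpret db %) (:children ast)))))

(def ast
  (mapped +
          (mapped :slides (source :speaker 3))
          (mapped :slides (source :speaker 3))))

(sources ast)                                ;; => #{{:kind :speaker, :id 3}}
(interpret {:speaker {3 {:slides 4}}} ast)   ;; => 8
```

The tree mentions speaker 3 twice, yet `sources` reports it once. A smarter interpreter can use that knowledge to deduplicate, batch or parallelize, which is the freedom the slide refers to as choosing an evaluation strategy.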

Slide 14

Slide 14 text

What About Clojure?
‣ Muse library
‣ github.com/kachayev/muse
‣ Open sourced a couple of weeks ago
‣ Still in active development
‣ Uses the idea of building and interpreting AST
‣ Uses core.async to deal with concurrency

Slide 15

Slide 15 text

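Protocols
(defprotocol DataSource
  (fetch [this]))
(defprotocol LabeledSource
  (resource-id [this]))
;; fetch-multi also receives the other resources in the batch,
;; matching the two-argument implementation on the Batched Source slide
(defprotocol BatchedSource
  (fetch-multi [this resources]))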

Slide 17

Slide 17 text

Latency
(defn remote-req [id result]
  (let [wait (rand 1000)]
    (println "-->" id ".." wait)
    (go (

Slide 19

Slide 19 text

DataSource
(defrecord Speaker [id]
  DataSource
  (fetch [_]
    (remote-req id {:id id
                    :name "Alexey"
                    :topic "DataSource"
                    :slides (inc id)})))

Slide 20

Slide 20 text

Operations Tree
(fmap :name (Speaker. 42))
;; #
(fmap inc (fmap :slides (Speaker. 42)))
;; #
(fmap compare
      (fmap :slides (Speaker. 3))
      (fmap :slides (Speaker. 7)))
;; #

Slide 22

Slide 22 text

Nested Data
(defrecord Speaker [id]
  DataSource
  (fetch [_]
    (remote-req id {:name (str "Alexey #" id)})))
(defrecord Session [id]
  DataSource
  (fetch [_]
    (remote-req id {:id id
                    :topic "Haxl in Clojure"
                    :slides (inc id)
                    :speaker (dec id)})))

Slide 23

Slide 23 text

Nested Data
(defrecord Speaker [id]
  DataSource
  (fetch [_]
    (remote-req id {:name (str "Alexey #" id)})))
(defrecord Session [id]
  DataSource
  (fetch [_]
    (remote-req id {:id id
                    :topic "Reinventing Haxl"
                    :slides (inc id)
                    :speaker (dec id)})))

Slide 24

Slide 24 text

Nested Data
(defn speaker-name [speaker-id]
  (fmap :name (Speaker. speaker-id)))
(defn who-speaks-at [id]
  (flat-map #(speaker-name (:speaker %))
            (Session. id)))
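The difference between `fmap` and `flat-map` is easier to see in a stripped-down model. The sketch below is not how Muse implements them: it swaps the AST and core.async for plain `delay`s, and `fmap-sketch`, `flat-map-sketch`, `session` and `speaker` are hypothetical names. In this model `fmap` applies an ordinary function to fetched values, while `flat-map` takes a function that itself returns another fetch, so the second fetch can only be decided after the first has finished.

```clojure
;; A "data source" in this toy model is just a delay around a value.
(defn session [id] (delay {:id id :speaker (dec id)}))
(defn speaker [id] (delay {:name (str "Alexey #" id)}))

;; fmap: apply a plain function to the results of one or more sources.
(defn fmap-sketch [f & sources]
  (delay (apply f (map deref sources))))

;; flat-map: f returns a new source, which is forced in turn.
(defn flat-map-sketch [f source]
  (delay @(f @source)))

(defn speaker-name [speaker-id]
  (fmap-sketch :name (speaker speaker-id)))

(defn who-speaks-at [id]
  (flat-map-sketch #(speaker-name (:speaker %)) (session id)))

@(who-speaks-at 5) ;; => "Alexey #4"
```

Nothing is evaluated until the final `deref`, which loosely mirrors how the operations tree is only executed by the runner.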

Slide 26

Slide 26 text

Runner
(run! (Speaker. 10))
;; # (

Slide 28

Slide 28 text

Common Friends
(require '[clojure.set :refer :all])
(defrecord FriendsOf [id]
  DataSource
  (fetch [_]
    (remote-req id (set (range id)))))
(defn common-friends [x y]
  (fmap intersection
        (FriendsOf. x)
        (FriendsOf. y)))
(defn num-common-friends [x y]
  (fmap count (common-friends x y)))

Slide 29

Slide 29 text

Slide 30

Slide 30 text

Common Friends
(run!! (num-common-friends 3 4))
;; --> 3 .. 335.57122610718227
;; --> 4 .. 125.78371543747402
;; <-- 4
;; <-- 3
;; 3

Slide 31

Slide 31 text

Common Friends
(run!! (num-common-friends 5 5))
;; --> 5 .. 145.0165111103837
;; <-- 5
;; 5

Slide 32

Slide 32 text

Friends Of Friends
(defn friends-of-friends [id]
  (->> (FriendsOf. id)
       (traverse ->FriendsOf)
       (fmap (partial apply union))))

Slide 33

Slide 33 text

Friends Of Friends
(run!! (friends-of-friends 3))
;; --> 3 .. 268.0963965301999
;; <-- 3
;; --> 0 .. 233.01360724232333
;; --> 1 .. 424.35908747904415
;; --> 2 .. 778.2748225589665
;; <-- 0
;; <-- 1
;; <-- 2
;; #{0 1}

Slide 34

Slide 34 text

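Protocols
(defprotocol DataSource
  (fetch [this]))
(defprotocol LabeledSource
  (resource-id [this]))
;; fetch-multi also receives the other resources in the batch,
;; matching the two-argument implementation on the Batched Source slide
(defprotocol BatchedSource
  (fetch-multi [this resources]))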

Slide 35

Slide 35 text

Batched Source
(defrecord FriendsOf [id]
  DataSource
  (fetch [_]
    (remote-req id (set (range id))))
  BatchedSource
  (fetch-multi [_ users]
    (let [ids (cons id (map :id users))]
      (->> ids
           (map (juxt identity (comp set range)))
           (into {})
           (remote-req ids)))))

Slide 36

Slide 36 text

Batched Source
(run!! (friends-of-friends 3))
;; --> 3 .. 433.9830317453879
;; <-- 3
;; --> (0 1 2) .. 268.8396567924334
;; <-- (0 1 2)
;; #{0 1}
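The step that turns three requests into the single `(0 1 2)` request can be sketched on its own. This is an assumption about the general technique, not Muse's internals: `fetch-friends-batch` and `fetch-level` are hypothetical names, and resources are modelled as `[kind id]` pairs. All pending resources of one level are grouped by kind, and one batch call is made per kind.

```clojure
;; Hypothetical batch endpoint: one call answers many ids.
;; `requests` records the id list of every call that was made.
(def requests (atom []))

(defn fetch-friends-batch [ids]
  (swap! requests conj (vec ids))
  (into {} (map (juxt identity (comp set range))) ids))

;; Group the pending [kind id] resources of one level by kind and issue
;; one batch request per kind; returns a map of [kind id] -> result.
(defn fetch-level [batch-fns pending]
  (into {}
        (mapcat (fn [[kind resources]]
                  (let [ids     (map second resources)
                        results ((get batch-fns kind) ids)]
                    (map (fn [id] [[kind id] (get results id)]) ids))))
        (group-by first pending)))

(fetch-level {:friends fetch-friends-batch}
             [[:friends 0] [:friends 1] [:friends 2]])
;; => {[:friends 0] #{}, [:friends 1] #{0}, [:friends 2] #{0 1}}
@requests ;; => [[0 1 2]]
```

Keying the results by `[kind id]` keeps them usable as cache entries, so a later level that asks for the same resource does not need another request.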

Slide 37

Slide 37 text

What Can It Do For You?
‣ Runs independent data fetches concurrently
  • Uses BFS to group fetches level-by-level
‣ Caches previously made fetches during execution
‣ Batches requests when applicable
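The level-by-level strategy and the cache can be combined in a small interpreter. The sketch below is a simplified illustration, not Muse's implementation: every name in it (`source`, `fmap*`, `flat-map*`, `pending`, `step`, `run-tree`) is hypothetical, and fetches run synchronously. Each round collects the sources that are reachable right now, deduplicates them by key, fetches them, and collapses whatever became computable. A round is the point where a real runner would issue the fetches concurrently.

```clojure
(require '[clojure.set :as set])

;; Tree nodes are plain maps; these constructors are illustrative only.
(defn source [kind id fetch-fn] {:type :source :key [kind id] :fetch fetch-fn})
(defn fmap* [f & args] {:type :map :f f :args (vec args)})
(defn flat-map* [f arg] {:type :flat-map :f f :arg arg})
(defn done [v] {:type :done :value v})
(defn done? [node] (= :done (:type node)))

(defn pending
  "Sources that can be fetched right now, as a map of key -> fetch fn.
   Merging by key removes duplicates within a level."
  [node]
  (case (:type node)
    :done     {}
    :source   {(:key node) (:fetch node)}
    :map      (apply merge {} (map pending (:args node)))
    :flat-map (pending (:arg node))))

(defn step
  "Replaces cached sources with their values and collapses every
   node whose inputs are all available."
  [cache node]
  (case (:type node)
    :done     node
    :source   (if (contains? cache (:key node))
                (done (get cache (:key node)))
                node)
    :map      (let [args (mapv #(step cache %) (:args node))]
                (if (every? done? args)
                  (done (apply (:f node) (map :value args)))
                  (assoc node :args args)))
    :flat-map (let [arg (step cache (:arg node))]
                (if (done? arg)
                  (step cache ((:f node) (:value arg)))
                  (assoc node :arg arg)))))

(defn run-tree
  "Returns [value rounds]; rounds holds the set of keys fetched per level."
  [node]
  (loop [node node, cache {}, rounds []]
    (if (done? node)
      [(:value node) rounds]
      (let [todo    (pending node)
            fetched (into {} (map (fn [[k fetch-fn]] [k (fetch-fn)])) todo)
            cache   (merge cache fetched)]
        (recur (step cache node) cache (conj rounds (set (keys todo))))))))

;; Usage, mirroring the FriendsOf examples from the slides.
(def fetch-log (atom []))

(defn friends-of [id]
  (source :friends id (fn [] (swap! fetch-log conj id) (set (range id)))))

(defn friends-of-friends [id]
  (flat-map* (fn [friends] (apply fmap* set/union (map friends-of friends)))
             (friends-of id)))

(defn num-common-friends [x y]
  (fmap* count (fmap* set/intersection (friends-of x) (friends-of y))))

(run-tree (friends-of-friends 3))
;; value #{0 1}; level 1 fetches [:friends 3], level 2 fetches friends 0, 1 and 2 together
(run-tree (num-common-friends 5 5))
;; => [5 [#{[:friends 5]}]], the duplicated source is fetched once
```

Combined with a batching step, the second level of `friends-of-friends` would become one request instead of three.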

Slide 38

Slide 38 text

With Muse

Slide 39

Slide 39 text

With Muse
[Diagram: fetch tree grouped into three levels. Timeline: T1. Posts: P1, P2, P3, P4. Users & Scores: U1/US1, U2/US2, U3/US3.]

Slide 42

Slide 42 text

With Muse
‣ How did we achieve this?
‣ Nice separation of concerns:
  • Muse says WHAT I want to do
  • core.async says HOW
‣ A generalized abstraction that knows nothing about concrete data storage

Slide 43

Slide 43 text

Known Restrictions
‣ Assumes your data fetches are “side-effect free”
  • You should not rely on the order in which fetches run
‣ You need enough memory to store fetched results
‣ Uses core.async to run fetches concurrently

Slide 44

Slide 44 text

Future Plans
‣ Better error handling
‣ Debug-mode to trace all fetches with latencies
‣ Applicative functors interface
‣ Get rid of fmap & flat-map
‣ ClojureScript support