Slide 1

Reinventing Haxl: Efficient & Concise Access to Remote Data
by Alexey Kachayev for EuroClojure 2015

Slide 2

About Me
‣ Alexey Kachayev, @kachayev
‣ CTO at Attendify.com
  • Almost-all-day-coding-kind-of-CTO
‣ Active open source contributor

Slide 3

Agenda
‣ The Problem
‣ The Solution
‣ (applause)
‣ Future Plans

Slide 4

Multi-Level Data

Slide 5

Multi-Level Data
[Diagram: friendship, payload, likes, comments, claims, profile, mentions, tags, spam, notes]

Slide 6

Simplest “Fetch”

Slide 7

Simplest “Fetch”
‣ Pros
  • concise & clean
‣ Cons
  • very slow because of sequential HTTP/TCP calls
  • duplicate requests (the same User is fetched several times)
  • hard to batch requests (e.g. use MGET for Redis)
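To make the cons concrete, here is a naive fetch against in-memory stand-ins for remote services. The names fetch-post, fetch-user, naive-posts-with-authors and the calls log are hypothetical, introduced only for illustration. The code reads well, but logging every request exposes the duplicates:

```clojure
;; Hypothetical in-memory "remote" services. Every call is logged so the
;; cost of a naive fetch becomes visible.
(def calls (atom []))

(defn fetch-post [id]
  (swap! calls conj [:post id])
  {:id id :author-id (mod id 2)})

(defn fetch-user [id]
  (swap! calls conj [:user id])
  {:id id :name (str "user-" id)})

;; Naive, sequential fetch: clean to read, but it issues one call per
;; post and one call per author, even when authors repeat.
(defn naive-posts-with-authors [post-ids]
  (mapv (fn [pid]
          (let [post (fetch-post pid)]
            (assoc post :author (fetch-user (:author-id post)))))
        post-ids))
```

Calling (naive-posts-with-authors [1 2 3 4]) makes 8 calls, 4 of them for users, although only 2 distinct users (0 and 1) are involved.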

Slide 8

core.async & dedup

Slide 9

Optimized “Fetch”
‣ Optimization leads to unnecessary complexity in the code
‣ The code becomes hard to dig through
‣ We want concise code that still runs efficiently
‣ Not only a Clojure problem
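A hand-optimized variant shows where the complexity creeps in. This sketch uses plain futures instead of core.async, and fetch-user and fetch-users-optimized are hypothetical names. It deduplicates ids and starts one request per distinct id concurrently:

```clojure
(def user-calls (atom 0))

;; Hypothetical remote call with simulated latency.
(defn fetch-user [id]
  (swap! user-calls inc)
  (Thread/sleep 20)
  {:id id :name (str "user-" id)})

;; Hand-optimized version: deduplicate ids and start one future per
;; distinct id, then read results back in the requested order.
(defn fetch-users-optimized [ids]
  (let [pending (into {}
                      (map (fn [id] [id (future (fetch-user id))]))
                      (distinct ids))]
    (mapv #(deref (get pending %)) ids)))
```

Even in this tiny case, caching and concurrency are tangled with the actual question ("which users do I need?"), and every new data access pattern needs the same plumbing again.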

Slide 10

Known Solutions

Slide 11

Known Solutions
‣ Haxl: Haskell library by Facebook, open sourced
  • Key idea: use applicative functors to manage dependencies between data fetches under the hood, with implicit concurrency & batching

Slide 12

Known Solutions
‣ Stitch: Scala library by Twitter, not open sourced
  • Key idea: build a declarative AST of all data fetch operations and run a special interpreter that chooses the most efficient evaluation strategy
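A toy version of the "build an AST, then interpret it" idea fits in a few lines. This is not Stitch's or Muse's implementation; the Fetch and Apply records and the interpret function are hypothetical. Because the whole tree exists before anything runs, the interpreter can see every leaf up front and request each distinct one only once:

```clojure
;; Hypothetical AST: a leaf describes one remote fetch, an inner node
;; applies a pure function to the values of its children.
(defrecord Fetch [source id])
(defrecord Apply [f children])

(defn leaves
  "Collect every Fetch leaf in the tree."
  [node]
  (if (instance? Fetch node)
    [node]
    (mapcat leaves (:children node))))

(defn interpret
  "Evaluate a tree in two steps: hand all distinct leaves to fetch-all
  (which may batch or parallelize them however it likes), then fold the
  results back through the pure functions."
  [node fetch-all]
  (let [results (fetch-all (distinct (leaves node)))]
    (letfn [(ev [n]
              (if (instance? Fetch n)
                (get results n)
                (apply (:f n) (map ev (:children n)))))]
      (ev node))))
```

Records compare by value, so two (->Fetch :user 1) leaves collapse into a single request. This two-step scheme only works when the tree is fully known in advance; a fetch that depends on another fetch's result needs several rounds.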

Slide 13

Known Solutions
‣ The idea behind Stitch should sound familiar
‣ The go macro from core.async uses the same approach

Slide 14

What About Clojure?
‣ Muse library
‣ github.com/kachayev/muse
‣ Open sourced a couple of weeks ago
‣ Still in active development
‣ Uses the idea of building and interpreting an AST
‣ Uses core.async to deal with concurrency

Slide 15

Protocols
(defprotocol DataSource
  (fetch [this]))

(defprotocol LabeledSource
  (resource-id [this]))

(defprotocol BatchedSource
  (fetch-multi [this resources]))
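One plausible use of LabeledSource is to give each fetch a stable identity, so repeated fetches within a run can be served from a cache. A minimal sketch, assuming the cache key is the record type plus resource-id (an assumption for illustration, not taken from the Muse source). The protocols are redefined locally so the block runs without Muse, and fetch-cached is a hypothetical helper:

```clojure
;; Local copies of two protocols from the slide, defined here only so the
;; sketch is self-contained.
(defprotocol DataSource (fetch [this]))
(defprotocol LabeledSource (resource-id [this]))

(def fetch-count (atom 0))

(defrecord Speaker [id]
  DataSource
  (fetch [_]
    (swap! fetch-count inc)
    {:id id :name (str "Speaker #" id)})
  LabeledSource
  (resource-id [_] id))

(defn cache-key
  "Assumption: a source is identified by its record type plus resource-id."
  [source]
  [(class source) (resource-id source)])

(defn fetch-cached
  "Return the cached value for source, fetching and caching it on a miss."
  [cache source]
  (let [k (cache-key source)]
    (if-let [[_ v] (find @cache k)]
      v
      (let [v (fetch source)]
        (swap! cache assoc k v)
        v))))
```

Fetching (->Speaker 42) twice through the same cache hits the data source only once.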

Slide 17

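Latency
(require '[clojure.core.async :refer [go timeout <!]])

(defn remote-req [id result]
  (let [wait (rand 1000)]
    (println "-->" id ".." wait)
    (go
      ;; park for the simulated latency, then deliver the result
      (<! (timeout wait))
      (println "<--" id)
      result)))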
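The helper on this slide needs core.async. For experimenting without that dependency, a similar simulation can be written with a plain future. remote-req-future and its optional max-wait-ms argument are hypothetical, and a blocking Thread/sleep stands in for the parking timeout of the original:

```clojure
;; Simulate a remote call with random latency. Returns a future that
;; completes with `result` after up to `max-wait-ms` milliseconds.
(defn remote-req-future
  ([id result] (remote-req-future id result 1000))
  ([id result max-wait-ms]
   (let [wait (rand-int max-wait-ms)]
     (println "-->" id ".." wait)
     (future
       (Thread/sleep (long wait))
       (println "<--" id)
       result))))
```

Dereferencing, as in @(remote-req-future 42 {:name "Alexey"}), blocks until the simulated response arrives. Unlike go blocks, each pending future occupies a real thread.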

Slide 19

DataSource
(defrecord Speaker [id]
  DataSource
  (fetch [_]
    (remote-req id {:id id
                    :name "Alexey"
                    :topic "DataSource"
                    :slides (inc id)})))

Slide 20

Operations Tree
(fmap :name (Speaker. 42))
;; #<...>

(fmap inc (fmap :slides (Speaker. 42)))
;; #<...>

(fmap compare
      (fmap :slides (Speaker. 3))
      (fmap :slides (Speaker. 7)))
;; #<...>
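The printed results above are tree objects, not values: building them performs no I/O. A minimal sketch of that property with hypothetical helpers (speaker, fmap*, run-tree) and plain maps instead of Muse's internal types:

```clojure
(def fetches (atom 0))

;; Hypothetical node constructors: they only describe work.
(defn speaker [id] {:op :fetch :id id})
(defn fmap* [f & nodes] {:op :fmap :f f :nodes nodes})

;; Hypothetical remote call, counted so we can see when I/O happens.
(defn fetch-speaker [id]
  (swap! fetches inc)
  {:id id :name "Alexey" :slides (inc id)})

(defn run-tree
  "Naive interpreter: fetch the leaves, then apply the mapped functions."
  [node]
  (case (:op node)
    :fetch (fetch-speaker (:id node))
    :fmap  (apply (:f node) (map run-tree (:nodes node)))))
```

Building the compare tree leaves the counter at 0; only run-tree triggers the two fetches. A real runner is free to issue those two independent leaves concurrently instead of one after another.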

Slide 22

Nested Data
(defrecord Speaker [id]
  DataSource
  (fetch [_]
    (remote-req id {:name (str "Alexey #" id)})))

(defrecord Session [id]
  DataSource
  (fetch [_]
    (remote-req id {:id id
                    :topic "Haxl in Clojure"
                    :slides (inc id)
                    :speaker (dec id)})))

Slide 23

Nested Data
(defrecord Speaker [id]
  DataSource
  (fetch [_]
    (remote-req id {:name (str "Alexey #" id)})))

(defrecord Session [id]
  DataSource
  (fetch [_]
    (remote-req id {:id id
                    :topic "Reinventing Haxl"
                    :slides (inc id)
                    :speaker (dec id)})))

Slide 24

Nested Data
(defn speaker-name [speaker-id]
  (fmap :name (Speaker. speaker-id)))

(defn who-speaks-at [id]
  (flat-map #(speaker-name (:speaker %))
            (Session. id)))
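flat-map differs from fmap in that its function returns another tree, which only exists after the first fetch has completed. A minimal sketch with hypothetical node maps (source, fmap*, flat-map*, run-tree are illustrative, not Muse's API) and in-memory data mirroring the Session and Speaker records:

```clojure
;; Hypothetical in-memory data: session `id` is given by speaker `(dec id)`.
(defn fetch-session [id] {:id id :topic "Reinventing Haxl" :speaker (dec id)})
(defn fetch-speaker [id] {:name (str "Alexey #" id)})

(def fetch-fns {:session fetch-session :speaker fetch-speaker})

(defn source [kind id] {:op :fetch :kind kind :id id})
(defn fmap* [f node] {:op :fmap :f f :node node})
;; flat-map: f receives a fetched value and returns a NEW tree.
(defn flat-map* [f node] {:op :flat-map :f f :node node})

(defn run-tree [node]
  (case (:op node)
    :fetch    ((fetch-fns (:kind node)) (:id node))
    :fmap     ((:f node) (run-tree (:node node)))
    ;; run the input, build the dependent tree, then run that tree
    :flat-map (run-tree ((:f node) (run-tree (:node node))))))

(defn speaker-name [speaker-id]
  (fmap* :name (source :speaker speaker-id)))

(defn who-speaks-at [id]
  (flat-map* #(speaker-name (:speaker %)) (source :session id)))
```

Here (run-tree (who-speaks-at 5)) fetches session 5 first and only then learns that speaker 4 is needed, so dependent fetches inevitably take separate rounds.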

Slide 26

Runner
(run! (Speaker. 10))
;; #<...>

(

Slide 28

Common Friends
(require '[clojure.set :refer :all])

(defrecord FriendsOf [id]
  DataSource
  (fetch [_] (remote-req id (set (range id)))))

(defn common-friends [x y]
  (fmap intersection (FriendsOf. x) (FriendsOf. y)))

(defn num-common-friends [x y]
  (fmap count (common-friends x y)))

Slide 30

Common Friends
(run!! (num-common-friends 3 4))
;; --> 3 .. 335.57122610718227
;; --> 4 .. 125.78371543747402
;; <-- 4
;; <-- 3
;; 3

Slide 31

Common Friends
(run!! (num-common-friends 5 5))
;; --> 5 .. 145.0165111103837
;; <-- 5
;; 5

Slide 32

Friends Of Friends
(defn friends-of-friends [id]
  (->> (FriendsOf. id)
       (traverse ->FriendsOf)
       (fmap (partial apply union))))
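traverse maps ->FriendsOf over the fetched set, so the friends of all friends can be requested together in a second round. A hypothetical plain-Clojure equivalent that logs the ids requested per round (fetch-round and rounds are illustrative names; there is no concurrency here, only the grouping):

```clojure
(require '[clojure.set :refer [union]])

;; Log of the ids requested in each round.
(def rounds (atom []))

;; Same toy data as the slides: user n is friends with 0 .. n-1.
(defn friends-of [id] (set (range id)))

(defn fetch-round
  "Fetch the friends of every id in `ids` as one round; returns {id friends}."
  [ids]
  (swap! rounds conj (vec (sort ids)))
  (into {} (map (juxt identity friends-of)) ids))

(defn friends-of-friends [id]
  (let [first-level  (get (fetch-round [id]) id)  ; round 1: the user
        second-level (fetch-round first-level)]   ; round 2: all friends at once
    (apply union (vals second-level))))
```

For user 3 the log is [[3] [0 1 2]] and the result is #{0 1}, matching the two groups of requests in the run!! output.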

Slide 33

Friends Of Friends
(run!! (friends-of-friends 3))
;; --> 3 .. 268.0963965301999
;; <-- 3
;; --> 0 .. 233.01360724232333
;; --> 1 .. 424.35908747904415
;; --> 2 .. 778.2748225589665
;; <-- 0
;; <-- 1
;; <-- 2
;; #{0 1}

Slide 35

Batched Source
(defrecord FriendsOf [id]
  DataSource
  (fetch [_] (remote-req id (set (range id))))

  BatchedSource
  (fetch-multi [_ users]
    (let [ids (cons id (map :id users))]
      (->> ids
           (map (juxt identity (comp set range)))
           (into {})
           (remote-req ids)))))
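How a runner might choose between fetch and fetch-multi can be sketched as follows. The protocols are redefined locally (with fetch-multi taking the other pending resources, as this FriendsOf uses it), latency is dropped, and fetch-all is a hypothetical helper. It assumes, as the slide's implementation suggests, that fetch-multi returns a map keyed by :id:

```clojure
;; Local protocol copies so the sketch runs without Muse.
(defprotocol DataSource (fetch [this]))
(defprotocol BatchedSource (fetch-multi [this resources]))

(def requests (atom []))

(defrecord FriendsOf [id]
  DataSource
  (fetch [_]
    (swap! requests conj id)
    (set (range id)))
  BatchedSource
  (fetch-multi [_ users]
    (let [ids (cons id (map :id users))]
      (swap! requests conj (vec ids))
      (into {} (map (juxt identity (comp set range))) ids))))

(defn fetch-all
  "Hypothetical runner step: one request for a single source, one batched
  request when several sources are pending and the type supports it.
  Returns {source value}."
  [sources]
  (let [[head & more] sources]
    (if (and (seq more) (satisfies? BatchedSource head))
      (let [by-id (fetch-multi head more)]
        (into {} (map (fn [s] [s (get by-id (:id s))])) sources))
      (into {} (map (fn [s] [s (fetch s)])) sources))))
```

Three pending FriendsOf sources turn into the single request [0 1 2], which is what the batched run!! output shows as (0 1 2).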

Slide 36

Batched Source
(run!! (friends-of-friends 3))
;; --> 3 .. 433.9830317453879
;; <-- 3
;; --> (0 1 2) .. 268.8396567924334
;; <-- (0 1 2)
;; #{0 1}

Slide 37

What Can It Do For You?
‣ Runs independent data fetches concurrently
  • Uses BFS to group fetches level by level
‣ Caches previously made fetches during execution
‣ Batches requests when applicable
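The three bullets can be combined in a toy interpreter. This is a minimal sketch, not Muse's implementation: it assumes a single hypothetical source type (a leaf {:op :fetch :id n} means "friends of user n"), and fetch*, fmap*, flat-map*, pending, resolve-node and run-bfs are illustrative names. Each round collects every fetch that does not wait on an unfinished one (one BFS level), deduplicates the ids, sends them as one batch, and caches the results for later rounds. Concurrency inside a round is left to the batch function:

```clojure
(require '[clojure.set :refer [union intersection]])

(defn fetch* [id] {:op :fetch :id id})
(defn fmap* [f & nodes] {:op :fmap :f f :nodes (vec nodes)})
(defn flat-map* [f node] {:op :flat-map :f f :node node})

(defn pending
  "Fetch leaves that can be requested right now (the current BFS level)."
  [node]
  (case (:op node)
    :fetch    [node]
    :value    []
    :fmap     (mapcat pending (:nodes node))
    :flat-map (pending (:node node))))

(defn resolve-node
  "Replace leaves found in `cache` by values and collapse finished nodes.
  A flat-map whose input is known expands into the tree it returns."
  [node cache]
  (case (:op node)
    :fetch    (if (contains? cache (:id node))
                {:op :value :v (get cache (:id node))}
                node)
    :value    node
    :fmap     (let [nodes (mapv #(resolve-node % cache) (:nodes node))]
                (if (every? #(= :value (:op %)) nodes)
                  {:op :value :v (apply (:f node) (map :v nodes))}
                  (assoc node :nodes nodes)))
    :flat-map (let [inner (resolve-node (:node node) cache)]
                (if (= :value (:op inner))
                  (resolve-node ((:f node) (:v inner)) cache)
                  (assoc node :node inner)))))

(defn run-bfs
  "Run level by level: each round sends all distinct pending ids to
  `fetch-batch` in one call; results are cached for later rounds."
  [node fetch-batch]
  (loop [node node, cache {}]
    (let [node (resolve-node node cache)]
      (if (= :value (:op node))
        (:v node)
        (let [ids (distinct (map :id (pending node)))]
          (recur node (merge cache (fetch-batch ids))))))))

(defn friends-of-friends [id]
  (flat-map* (fn [friends] (apply fmap* union (map fetch* friends)))
             (fetch* id)))
```

With a batch function that returns {id (set (range id))}, friends-of-friends of 3 takes two rounds ([3], then [0 1 2]) and yields #{0 1}, while counting common friends of 5 and 5 sends a single request for 5.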

Slide 38

With Muse

Slide 39

With Muse
[Diagram: fetches grouped into three levels: Timeline (T1), Posts (P1, P2, P3, P4), Users & Scores (U1, US1, U2, US2, U3, US3)]

Slide 42

With Muse
‣ How did we achieve this?
‣ A nice separation of concerns:
  • Muse to say WHAT I want to do
  • core.async to say HOW to do it
‣ A generalized abstraction that doesn’t know anything about concrete data stores

Slide 43

Known Restrictions
‣ Assumes your data fetches are “side-effect free”
  • You should not rely on the order in which fetches run
‣ You need enough memory to keep all fetched results for the duration of a run
‣ Uses core.async to run fetches concurrently

Slide 44

Future Plans
‣ Better error handling
‣ Debug mode to trace all fetches with latencies
‣ Applicative functors interface
‣ Get rid of fmap & flat-map
‣ ClojureScript support

Slide 45

Future Plans
‣ Looking for feedback from adopters
‣ Stay tuned for more!

Slide 46

Thank You! Questions?