Efficient, Concurrent and Concise Data Access in Clojure

Business logic often depends on remote data fetched from several sources: databases, caches, web services, or third-party APIs, and there is little room for error. You need to keep that logic free of low-level details while still performing efficiently: fetching data in parallel, batching requests, handling failures and retries, and so forth. A few projects aim to solve this problem: Haxl (an open-source Haskell library from Facebook) and Stitch (a Scala library from Twitter, not open-sourced yet, though a few talks are available). Both let you access remote data sources in a concise and consistent way, while the library handles batching and overlapping requests to multiple data sources behind the scenes. How can you solve the same problem in Clojure? In this talk we'll go over concrete examples and see how to achieve the same results with Clojure.

Oleksii Kachaiev

June 26, 2015

Transcript

  1. Reinventing Haxl: Efficient & Concise Access to Remote Data, by Alexey Kachayev for EuroClojure 2015
  2. About Me
    ‣ Alexey Kachayev, @kachayev
    ‣ CTO at Attendify.com
      • Almost-all-day-coding-kind-of-CTO
    ‣ Active open source contributor
  3. Agenda
    ‣ The Problem
    ‣ The Solution
    ‣ (applause)
    ‣ Future Plans
  4. Multi-Level Data

  5. Multi-Level Data (diagram labels: friendship, payload, likes, comments, claims, profile, mentions, tags, spam, notes)
  6. Simplest “Fetch”

  7. Simplest “Fetch”
    ‣ Pros
      • concise & clean
    ‣ Cons
      • very slow with sequential HTTP/TCP calls
      • duplicate requests (fetch same User a few times)
      • hard to batch requests (i.e. use MGET for Redis)
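
The code shown on the "Simplest Fetch" slides is not part of this transcript. As a rough illustration of the trade-off listed above, here is a minimal sketch of a naive fetch. Everything in it is an assumption made for the example: the in-memory `posts` and `users` tables, the `remote-get` stand-in for a blocking network call, and the function names.

```clojure
;; Hypothetical in-memory "remote" data; not taken from the talk.
(def posts {1 {:id 1 :author 10}
            2 {:id 2 :author 10}
            3 {:id 3 :author 11}})

(def users {10 {:id 10 :name "Ann"}
            11 {:id 11 :name "Bob"}})

;; Counts every simulated remote call.
(def calls (atom 0))

(defn remote-get
  "Stand-in for a blocking HTTP/TCP call."
  [table id]
  (swap! calls inc)
  (get table id))

(defn fetch-post-with-author
  "Fetches a post, then its author, one call after the other."
  [post-id]
  (let [post (remote-get posts post-id)]
    (assoc post :author (remote-get users (:author post)))))

(defn fetch-timeline
  "Concise and readable, but strictly sequential and unaware of duplicates."
  [post-ids]
  (mapv fetch-post-with-author post-ids))

(def timeline (fetch-timeline [1 2 3]))

;; @calls is now 6 although only 5 distinct resources exist:
;; user 10 was requested twice, and no two calls overlapped.
```

The code stays short because it ignores exactly the concerns the slide lists as cons: ordering, duplicates, and batching.
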
  8. core.async & dedup

  9. Optimized “Fetch”
    ‣ Optimization leads to unnecessary complexity in code
    ‣ Hard to dig through the code
    ‣ We want concise code that operates efficiently
    ‣ Not only a Clojure problem
  10. Known Solutions

  11. Known Solutions
    ‣ Haxl - Haskell library, Facebook, open sourced
      • Key idea: Applicative functors to manage dependencies between data fetches under the hood with implicit concurrency & batches
  12. Known Solutions
    ‣ Stitch - Scala library, Twitter, not open sourced
      • Key idea: Build declarative AST of all data fetch operations and run special interpreter that will choose the most efficient evaluation strategy
  13. Known Solutions
    ‣ The idea behind Stitch should sound familiar
    ‣ go macro from core.async uses the same approach
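
The "build an AST, then interpret it" idea from the slides above can be sketched in a few lines of plain Clojure. This is only an illustration under stated assumptions: the node shapes, `fetch-node`, `map-node`, `pending-fetches`, `interpret`, and `fake-remote` are hypothetical names invented here, not the Stitch or Muse API.

```clojure
;; Hypothetical node constructors: a program is plain data until it is run.
(defn fetch-node [source id] {:op :fetch :source source :id id})
(defn map-node [f & children] {:op :map :f f :children (vec children)})

(defn pending-fetches
  "Walks the tree and collects every fetch it contains, without running any.
  Returning a set removes duplicate requests for free."
  [node]
  (case (:op node)
    :fetch #{(select-keys node [:source :id])}
    :map   (reduce into #{} (map pending-fetches (:children node)))))

(defn interpret
  "Evaluates the tree against a map of already fetched results."
  [results node]
  (case (:op node)
    :fetch (get results (select-keys node [:source :id]))
    :map   (apply (:f node) (map #(interpret results %) (:children node)))))

;; Speaker 3 appears twice in the tree.
(def tree
  (map-node +
            (map-node :slides (fetch-node :speaker 3))
            (map-node :slides (fetch-node :speaker 7))
            (map-node :slides (fetch-node :speaker 3))))

;; Stand-in for the remote call; a real interpreter could run these
;; concurrently or as one batch, because it sees them all up front.
(defn fake-remote [{:keys [id]}] {:id id :slides (inc id)})

(def fetches (pending-fetches tree))   ; 2 distinct fetches, not 3
(def results (into {} (map (juxt identity fake-remote) fetches)))
(def answer  (interpret results tree)) ; 4 + 8 + 4 = 16
```

Because the tree is inspected before anything is executed, the choice of evaluation strategy lives in the interpreter rather than in the business logic.
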
  14. What About Clojure?
    ‣ Muse library
    ‣ github.com/kachayev/muse
    ‣ Open sourced a couple of weeks ago
    ‣ Still in active development
    ‣ Uses the idea of building and interpreting AST
    ‣ Uses core.async to deal with concurrency
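  15. Protocols

          (defprotocol DataSource
            (fetch [this]))

          (defprotocol LabeledSource
            (resource-id [this]))

          ;; takes the other resources of the same batch, matching the
          ;; two-argument implementation on the "Batched Source" slide
          (defprotocol BatchedSource
            (fetch-multi [this resources]))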
  17. Latency

          ;; simulates a remote call with a random latency of up to 1 second
          (defn remote-req [id result]
            (let [wait (rand 1000)]
              (println "-->" id ".." wait)
              (go
                (<! (timeout wait))
                (println "<--" id)
                result)))
  19. DataSource

          (defrecord Speaker [id]
            DataSource
            (fetch [_]
              (remote-req id {:id id
                              :name "Alexey"
                              :topic "DataSource"
                              :slides (inc id)})))
  20. Operations Tree

          (fmap :name (Speaker. 42))
          ;; #<MuseMap (:name Speaker[42])>

          (fmap inc (fmap :slides (Speaker. 42)))
          ;; #<MuseMap ...>

          (fmap compare
                (fmap :slides (Speaker. 3))
                (fmap :slides (Speaker. 7)))
          ;; #<MuseMap ...>
  22. Nested Data

          (defrecord Speaker [id]
            DataSource
            (fetch [_]
              (remote-req id {:name (str "Alexey #" id)})))

          (defrecord Session [id]
            DataSource
            (fetch [_]
              (remote-req id {:id id
                              :topic "Haxl in Clojure"
                              :slides (inc id)
                              :speaker (dec id)})))
  23. Nested Data

          (defrecord Speaker [id]
            DataSource
            (fetch [_]
              (remote-req id {:name (str "Alexey #" id)})))

          (defrecord Session [id]
            DataSource
            (fetch [_]
              (remote-req id {:id id
                              :topic "Reinventing Haxl"
                              :slides (inc id)
                              :speaker (dec id)})))
  24. Nested Data

          (defn speaker-name [speaker-id]
            (fmap :name (Speaker. speaker-id)))

          ;; the second fetch depends on the result of the first one
          (defn who-speaks-at [id]
            (flat-map #(speaker-name (:speaker %))
                      (Session. id)))
  26. Runner

          (run! (Speaker. 10))
          ;; #<ManyToManyChannel ...>

          (<!! (run! (Speaker. 10)))
          ;; {:name "Alexey #10"}

          (run!! (fmap :name (Speaker. 10)))
          ;; "Alexey #10"

          (run!! (who-speaks-at 20))
          ;; "Alexey #19"
  28. Common Friends

          (require '[clojure.set :refer :all])

          (defrecord FriendsOf [id]
            DataSource
            (fetch [_] (remote-req id (set (range id)))))

          (defn common-friends [x y]
            (fmap intersection (FriendsOf. x) (FriendsOf. y)))

          (defn num-common-friends [x y]
            (fmap count (common-friends x y)))
  30. Common Friends

          (run!! (num-common-friends 3 4))
          ;; --> 3 .. 335.57122610718227
          ;; --> 4 .. 125.78371543747402
          ;; <-- 4
          ;; <-- 3
          ;; 3
  31. Common Friends

          (run!! (num-common-friends 5 5))
          ;; --> 5 .. 145.0165111103837
          ;; <-- 5
          ;; 5
  32. Friends Of Friends

          (defn friends-of-friends [id]
            (->> (FriendsOf. id)
                 (traverse ->FriendsOf)
                 (fmap (partial apply union))))
  33. Friends Of Friends

          (run!! (friends-of-friends 3))
          ;; --> 3 .. 268.0963965301999
          ;; <-- 3
          ;; --> 0 .. 233.01360724232333
          ;; --> 1 .. 424.35908747904415
          ;; --> 2 .. 778.2748225589665
          ;; <-- 0
          ;; <-- 1
          ;; <-- 2
          ;; #{0 1}
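  34. Protocols

          (defprotocol DataSource
            (fetch [this]))

          (defprotocol LabeledSource
            (resource-id [this]))

          ;; takes the other resources of the same batch, matching the
          ;; two-argument implementation on the "Batched Source" slide
          (defprotocol BatchedSource
            (fetch-multi [this resources]))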
  35. Batched Source

          (defrecord FriendsOf [id]
            DataSource
            (fetch [_] (remote-req id (set (range id))))

            BatchedSource
            ;; one simulated request returns a map of id -> friends
            (fetch-multi [_ users]
              (let [ids (cons id (map :id users))]
                (->> ids
                     (map (juxt identity (comp set range)))
                     (into {})
                     (remote-req ids)))))
  36. Batched Source

          (run!! (friends-of-friends 3))
          ;; --> 3 .. 433.9830317453879
          ;; <-- 3
          ;; --> (0 1 2) .. 268.8396567924334
          ;; <-- (0 1 2)
          ;; #{0 1}
  37. What Can It Do For You?
    ‣ Runs independent data fetches concurrently
      • Uses BFS to group fetches level-by-level
    ‣ Caches previously made fetches during execution
    ‣ Batches requests when applicable
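
The three behaviours listed on this slide (level-by-level grouping, caching, batching) can be approximated without any library. The sketch below is not how Muse is implemented; it is a simplified stand-in that uses `future` instead of core.async, and `batch-fetch`, `run-round`, and `call-log` are hypothetical names introduced for the example.

```clojure
;; Records each simulated remote call so the effect of batching is visible.
(def call-log (atom []))

(defn batch-fetch
  "Hypothetical batched backend call (think MGET): one call for many ids.
  Returns a map of [source id] -> value."
  [source ids]
  (swap! call-log conj [source (vec (sort ids))])
  (into {} (map (fn [id] [[source id] {:source source :id id}]) ids)))

(defn run-round
  "Fetches one level of requests. `requests` is a seq of [source id] pairs.
  Skips anything already cached, sends one batch per source, and returns
  the cache extended with the newly fetched values."
  [cache requests]
  (let [missing (remove #(contains? cache %) (distinct requests))
        groups  (group-by first missing)
        ;; one future per source, so independent sources overlap in time
        futs    (doall (for [[source reqs] groups]
                         (future (batch-fetch source (map second reqs)))))]
    (reduce (fn [acc fut] (merge acc @fut)) cache futs)))

;; Level 1: the same post requested twice becomes one id in one batch.
(def cache-1 (run-round {} [[:post 1] [:post 2] [:post 1]]))

;; Level 2: [:post 2] is served from the cache, both users share a batch.
(def cache-2 (run-round cache-1 [[:user 10] [:user 11] [:post 2] [:user 10]]))

;; @call-log is now [[:post [1 2]] [:user [10 11]]]: two remote calls
;; instead of seven individual requests.
```

A caller would feed each level's results into the computation that produces the next level's requests, which is the breadth-first traversal the slide refers to.
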
  38. With Muse

  39. With Muse (diagram: Timeline T1; Posts P1, P2, P3, P4; Users & Scores U1, US1, U2, US2, U3, US3)
  42. With Muse
    ‣ how did we achieve this?
    ‣ nice separation of concerns:
      • Muse to say WHAT I want to do
      • core.async to say HOW
    ‣ generalized abstraction that knows nothing about concrete data stores
  43. Known Restrictions
    ‣ Assumes your data fetches are “side-effect free”
      • You should not rely on the order
    ‣ You need enough memory to store fetches
    ‣ Uses core.async to run fetches concurrently
  44. Future Plans
    ‣ Better error handling
    ‣ Debug-mode to trace all fetches with latencies
    ‣ Applicative functors interface
    ‣ Get rid of fmap & flat-map
    ‣ ClojureScript support
  45. Future Plans
    ‣ Looking for feedback from adopters
    ‣ Stay tuned for more!
  46. Thank You! Questions?