
The Duality of Transducers

An introduction to Clojure's transducers, aiming to finally get you comfortable with them and using them.

Discusses how to use transducers, how to implement them, and when they might be the tool of choice.

Cameron Desautels

January 11, 2024
Transcript

  1. Presentation Goals 1. Introduce transducers. 2. Show how to use them. 3. Explain the basics of how they are implemented. 4. Describe their tradeoffs, and how to choose between transducers and other similar mechanisms. 5. Convince you that you should probably be using transducers today. 2
  2. Introduction Per the Transducers reference page on clojure.org, transducers… "are composable algorithmic transformations" "are independent from the context of their input and output sources" "specify only the essence of the transformation in terms of an individual element" "are decoupled from input or output sources" "can be used in many different processes—collections, streams, channels, observables, etc." "compose directly, without awareness of input or creation of intermediate aggregates" 3
  3. An Aside About Clojure Docs Clojure documentation can be brilliantly dense—or is it densely brilliant? Every word carries unique semantic weight. A single missed word can cause the reader to lose the thread. Not necessarily an indictment—high density may be optimal for the experienced. But having additional educational resources is important. 4
  4. Obligatory Etymology transducer (n.) 1924, "device which converts energy from

    one form to another," from Latin transducere / traducere "lead across, transfer, carry over," from trans "across, beyond" + ducere "to lead" (from PIE root *deuk- "to lead"). — In Clojure, it may also be helpful to think of this like transform + reducer. Online Etymology Dictionary 5
  5. History & Status Transducers were introduced in Clojure 1.7 in

    2015. 20+ functions in clojure.core can produce transducers. Adopted into other languages: JavaScript (Transducers-JS, RxJS, etc.) Python Surely others Mature, deeply-integrated, and well-regarded. 6
  6. What Problem are We Solving? 1. Traditional transformations (map, reduce, etc.) complect transformation with sequence types / "processes" (collections, streams, etc.). e.g. map returns a lazy sequence… If we map over a vector, do we want a lazy sequence? What if we map over a set? What about "mapping" over the values passing through a channel or an observable? Should each of these require their own function(s)? (e.g. mapv) Or is there a more generic way to tackle this? 7
  7. 2. Performance Lazy sequences have a very real performance cost (more on this later). Should we be required to pay that cost if we don't ultimately want a lazy sequence? Composing transformations compounds the overhead (e.g. mapping twice). Quashing this can reduce readability. n.b. mapv is only one small piece of the story. 8
  8. Lazy Sequence Recap Provide / produce elements on demand In

    batches of 32 Buffered pull model Often we ignore these qualities …and sometimes get reminded when… 1. Side effects in map aren't realized 2. A lazy seq escapes a with-open and we try to read from a closed file 9
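The batches-of-32 behavior is easy to observe at the REPL. A minimal sketch (the `realized` counter is mine, not from the slides): asking for only the first element of a mapped range still realizes the whole first chunk.

```clojure
;; Count how many elements a lazy map actually realizes when we ask
;; for just the first one: chunked seqs realize 32 at a time.
(def realized (atom 0))

(first (map (fn [x] (swap! realized inc) x) (range 100)))
;; => 0

@realized
;; => 32
```

This is also why side effects buried in map fire in surprising bursts.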
  9. Lazy sequences have a lot in common with Unix pipelines: Complex, blocking control flow. "Processes" connected by bounded buffers (backpressure). Max data size not bounded by memory. Challenging to write directly in, say, C. Differences: the pipe version is actually parallel, buffered in bytes (not elements), and doesn't wait for a pull. (require '[clojure.java.io :as io]) (->> "transducers.org" io/reader line-seq (filter (partial re-find #"transducers")) (take 3)) cat transducers.org | grep transducers | head -3 11
  10. Basic Usage (seq → seq) Using transducers is easy, so

    we'll start there and work our way back to how they work and how to implement them. 12
  11. We'll use this data for a few examples… (require '[clojure.string

    :as str]) (def langs [{:id :clojure, :family :lisp, :created 2007} {:id :racket, :family :lisp, :created 1995} {:id :haskell, :family :ml, :created 1990}]) (def cur-year 2024) 13
  12. A familiar transformation pipeline: Such pipelines can be mechanically transformed

    to transducers: (->> langs (filter (comp #{:lisp} :family)) (map (comp str/capitalize name :id)) (take 1)) ("Clojure") (->> langs (into [] (comp (filter (comp #{:lisp} :family)) (map (comp str/capitalize name :id)) (take 1)))) ["Clojure"] 14
  13. Notice: 1. Much of the code remained the same. 2.

    We chose the resulting sequence type ([]). 3. The transformations are directly composed into a single function (that we could let, def, or pass in). 4. …but they appear to be missing arguments. 5. …and their order of application appears backwards. When starting out, I recommend focusing on the mechanical transformation from the familiar form. (->> langs (into [] (comp (filter (comp #{:lisp} :family)) (map (comp str/capitalize name :id)) (take 1)))) 15
  14. How Does This Work? map, filter, and take (and many others) have a 1-arity form which elides the input collection and returns a transducer. Transducers compose with comp …in an order that may seem counter-intuitive—except in mechanical transformation from ->>. into has a 3-arity form that applies a transducer as it pours elements from one collection into another. (->> langs (into [] (comp (filter (comp #{:lisp} :family)) (map (comp str/capitalize name :id)) (take 1)))) 16
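To make the 1-arity forms concrete, here is a small sketch: the composed transducer can be named and reused against several transducible processes (the name `xf` is mine, not from the slides).

```clojure
;; map/filter/take called without a collection return transducers;
;; comp stacks them in the same order ->> would apply them.
(def xf
  (comp (filter odd?)
        (map inc)
        (take 2)))

(into [] xf (range 10))        ;; => [2 4]
(sequence xf (range 10))       ;; => (2 4), produced lazily
(transduce xf + 0 (range 10))  ;; => 6
```

The same `xf` works eagerly with into, lazily with sequence, and as a reduction with transduce.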
  15. What Did We Gain? Composable transformations Control over the result

    type Improved performance No intermediate sequences Efficient use of transients under the hood Compound transformations merged into one pass No lazy seq overhead Increased applicability Transform a core.async channel Extend your own abstractions with transduction (observables, Kafka streams, zippers? Anywhere you want.) À la carte laziness, eager by default 17
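The "transform a core.async channel" point can be sketched like this, assuming core.async is on the classpath (the channel name is illustrative):

```clojure
(require '[clojure.core.async :as async :refer [chan >!! <!! close!]])

;; A channel accepts a transducer; every value put on the channel
;; flows through it before it can be taken off.
(def ch (chan 10 (comp (filter odd?) (map inc))))

(doseq [n (range 5)] (>!! ch n))
(close! ch)

(<!! (async/into [] ch))
;; => [2 4]
```

The exact transformation from slide 12 would drop in the same way, unchanged.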
  16. Composable Transformations Aren't (regular sequence) functions "composable algorithmic transformations"? Sort of. The sequences compose. (def lisps (filter (comp #{:lisp} :family) langs)) #'user/lisps (def lang-names (map (comp str/capitalize name :id) lisps)) #'user/lang-names lang-names ("Clojure" "Racket") 18
  17. With transducers, the transformations compose. (def only-lisps (filter (comp #{:lisp} :family))) #'user/only-lisps (def lang-names (map (comp str/capitalize name :id))) #'user/lang-names (into [] (comp only-lisps lang-names) langs) ["Clojure" "Racket"] (into #{} (comp only-lisps lang-names) langs) #{"Clojure" "Racket"} 19
  18. Compound Transformation Merging Ideally take would have come before map

    to avoid unnecessary work. Even worse, I could have written it like this, generating 4 intermediate sequences: This is even more likely if the pipeline is large and composed in various places. (->> langs (filter (comp #{:lisp} :family)) (map (comp str/capitalize name :id)) (take 1)) (->> langs (filter (comp #{:lisp} :family)) (map :id) (map name) (map str/capitalize) (take 1)) 20
  19. Chunking helps a lot with the map before take problem,

    but the ideal solution would look more like: But this requires controlling the entire transformation pipeline, and is often less readable. (->> langs (keep (fn [x] (when (#{:lisp} (:family x)) (-> x :id name str/capitalize)))) (take 1)) 21
  20. Transducers pull precisely what they need (element-wise), i.e. map before take doesn't matter. n.b. lazy sequences are also a pull model, but operate in chunks (of 32). Transducers compose the individual mapping operations. Meaning we can write it in the most expressive format. And we don't need to control the entire transformation pipeline to get an optimal result. 22
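The element-wise pull can be demonstrated directly (the `calls` counter is mine): take terminates the reduction after exactly three steps, even over an infinite range, with no chunk of 32 realized first.

```clojure
;; The mapping step runs only as many times as take needs, then the
;; reduction stops early via the reduced mechanism.
(def calls (atom 0))

(into []
      (comp (map (fn [x] (swap! calls inc) x))
            (take 3))
      (range))
;; => [0 1 2]

@calls
;; => 3
```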
  21. Performance Benchmarks Clojure Goes Fast has a great breakdown that's worth reading fully. Basic numbers on a simple, multi-step mapping (Variant / Time per call / Alloc per call): Lazy map: 410.22 µs, 480,296 b. Eager mapv: 63.66 µs, 28,465 b. Transducers: 43.95 µs, 6,264 b. "The eager version utilizing mapv is 6.5 times faster and allocates 16 times less for the same result […] The transducer version is even faster at 44 µs and even less garbage spawned because it fuses all the mappings into a single step." 23
  22. Basic Usage (reduce-like) With reduce: With transduce: (->> langs (map

    :created) (map #(- cur-year %)) (reduce +)) 80 (->> langs (transduce (comp (map :created) (map #(- cur-year %))) +)) 80 24
  23. The Homogeneous Type (reduce + [1 2 3]) 6 (reduce merge [{:a 1} {:b 2} {:c 3}]) {:a 1, :b 2, :c 3} (reduce (fn [acc s] (str acc " " s)) "Users:" ["alice" "bob" "charlie"]) "Users: alice bob charlie" 27
  24. The Asymmetric / Aggregative Type We always supply an initial value with these because the first value in the input collection (which would otherwise be used) doesn't have the right type to be the accumulator. (reduce (fn [acc x] (update acc x (fnil inc 0))) {} [:a :b :c :a]) {:a 2, :b 1, :c 1} 28
  25. map (defn map' [f coll] (reduce (fn [acc x] (conj acc (f x))) [] coll)) #'user/map' (map' inc [1 2 3]) [2 3 4] 30
  26. filter (defn filter' [pred coll] (reduce (fn [acc x] (if (pred x) (conj acc x) acc)) [] coll)) #'user/filter' (filter' odd? [1 2 3]) [1 3] 31
  27. One particularly notable reducing function is conj: (defn into' [to from] (reduce conj to from)) #'user/into' (into' #{} [:a :a :b :c]) #{:c :b :a} 33
  28. Duality Wave-particle duality of light: physicists famously discovered that light exhibits properties of both waves and particles (quantum physics). Both at once. Some experiments are more understandable when treating it one way or the other. 34
  29. Dualities aren't foreign to Clojure programmers Sometimes forms are code,

    sometimes data (in macros) Sometimes first-order functions are data, sometimes they're behavior (higher-order functions) Sometimes maps / keywords are functions 35
  30. Extended Reducing Functions Transducers extend the notion of a reducing function with two additional arities. Arity / Purpose / Description: 0 / Init / Call the init arity on the nested transform / transducing process. 1 / Completion / Produce a final value or flush state. 2 / Step / Standard reducer function behavior. 39
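Familiar functions like + and conj already satisfy all three arities, which is why they can serve directly as the reducing function at the bottom of a transducer stack:

```clojure
;; + as a complete reducing function:
(+)          ;; => 0   (init)
(+ 10)       ;; => 10  (completion: identity here)
(+ 10 5)     ;; => 15  (step)

;; conj likewise:
(conj)       ;; => []
(conj [1])   ;; => [1]
(conj [1] 2) ;; => [1 2]
```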
  31. Implementing a Basic Transducer Let's implement a basic transducer that increments each value it sees: (defn inc-xf [rf] (fn ([] (rf)) ([x] (rf x)) ([acc x] (rf acc (inc x))))) #'user/inc-xf (into [] inc-xf [1 2 3]) [2 3 4] (into [] (comp (filter odd?) inc-xf) [1 2 3]) [2 4] 40
  32. Generalizing to map If that was inc, clearly map is something like: (defn map-xf [f] (fn [rf] (fn ([] (rf)) ([x] (rf x)) ([acc x] (rf acc (f x)))))) #'user/map-xf (into [] (map-xf inc) [1 2 3]) [2 3 4] (into [] (comp (filter odd?) (map-xf inc)) [1 2 3]) [2 4] 41
  33. From map to filter From map we can imagine filter: (defn filter-xf [pred] (fn [rf] (fn ([] (rf)) ([x] (rf x)) ([acc x] (if (pred x) (rf acc x) acc))))) #'user/filter-xf (into [] (filter-xf odd?) [1 2 3]) [1 3] (into [] (comp (filter-xf odd?) (map-xf inc)) [1 2 3]) [2 4] 42
  34. Why This rf Business? A transducer's job is to transform

    and call whatever comes next to accumulate (zero to many times) It can't do the accumulation or the data structure gets baked in We escape infinite regress with a final non-transducer reducing function i.e. one that doesn't call out to yet another reducing function Providing this is the job of the transducible process (((map inc) conj) [] 0) [1] 43
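Stepping the same machinery by hand may make the wiring clearer (the name `rf` is mine): the composed transducer wraps conj, and whoever drives the reduction supplies the init value and the elements.

```clojure
;; Manually driving a composed reducing function, one step at a time.
(def rf ((comp (filter odd?) (map inc)) conj))

(-> []
    (rf 1)    ;; 1 is odd, inc'd to 2   => [2]
    (rf 2)    ;; 2 is filtered out      => [2]
    (rf 3))   ;; 3 is odd, inc'd to 4   => [2 4]
```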
  35. Transducible Processes The transducible process drives the reduction and answers "what ultimately happens" (where values go). (defn transduce' [xform f init coll] (let [f' (xform f)] (f' (reduce f' init coll)))) #'user/transduce' (transduce' (map inc) + 0 [1 2 3]) 9 (defn into' [to xform from] (transduce' xform conj to from)) #'user/into' (into' #{} (map inc) [1 2 3]) #{4 3 2} 44
  36. Additional Reading reduced / early termination. Stateful transducers. Transducer examples / utilities: henrygarner/redux, cgrand/xforms. 45
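As a taste of the "reduced / early termination" and "stateful transducers" topics above, here is a sketch of a take-like transducer modeled on clojure.core/take; `take-xf` is my name for it. State lives in a volatile!, and reduced signals the driving process to stop.

```clojure
(defn take-xf [n]
  (fn [rf]
    ;; Per-transduction state: created fresh each time the transducer
    ;; is applied to a reducing function.
    (let [remaining (volatile! n)]
      (fn
        ([] (rf))
        ([acc] (rf acc))
        ([acc x]
         (if (pos? @remaining)
           (let [acc (rf acc x)]
             (if (pos? (vswap! remaining dec))
               acc
               ;; Wrap in reduced so the process stops pulling elements.
               (ensure-reduced acc)))
           (reduced acc)))))))

(into [] (take-xf 2) (range))
;; => [0 1]
```

Note that it terminates even over the infinite (range), which a plain reduce-based take' could not do.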