Stream Fusion, to Completeness

Presentation of "Stream Fusion, to Completeness" at the local annual Athens PL Seminar, organized by Nick Papaspyrou, Kostis Sagonas and Stathes Zachos at NTUA.

For more information about the project, see http://strymonas.github.io

Aggelos Biboudis

December 28, 2016

Transcript

  1. 1.

    Stream Fusion, to Completeness

    Oleg Kiselyov¹, Aggelos Biboudis², Nick Palladinos³ and Yannis Smaragdakis²
    14th Athens Programming Languages Seminar, NTUA, 28/12/2016 (to appear at POPL17)
    ¹ Tohoku University ² University of Athens ³ Nessos I.T.
  2. 2.

    Stream Fusion, to Completeness

    design a stream fusion library …
    … that supports many and complex combinations of operators …
    … and generates loop-based, fused code with zero allocations
  3. 9.

    Stream Shapes

    type α stream = ∀ω. ((α,ω) stream_shape → ω) → ω
    type α stream = {step: unit -> (α, α stream) stream_shape}

    * Yasuhiko Minamide, Greg Morrisett, and Robert Harper. 1996. Typed closure conversion. In POPL ’96
  4. 10.

    Stream Shapes

    type α stream = ∀ω. ((α,ω) stream_shape → ω) → ω
    type α stream = ∃σ. σ * (σ → (α,σ) stream_shape)
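One way the two quantified types above could be written in plain OCaml, as a sketch: the ∀ω becomes a polymorphic record field and the ∃σ becomes a GADT constructor. The definition of stream_shape is an assumption (the slides use it without showing it), and the names push, pull, push_of_list, pull_of_list, push_sum and pull_sum are hypothetical.

```ocaml
(* Assumed definition: the slides use stream_shape without defining it. *)
type ('a, 'z) stream_shape = Nil | Cons of 'a * 'z

(* Push encoding: the universal quantifier is a polymorphic record field. *)
type 'a push = { fold : 'w. (('a, 'w) stream_shape -> 'w) -> 'w }

(* Pull encoding: the existential state type is hidden by a GADT constructor. *)
type 'a pull = Pull : 's * ('s -> ('a, 's) stream_shape) -> 'a pull

(* The producer owns the loop and feeds every element to the folder. *)
let push_of_list (l : 'a list) : 'a push =
  { fold = fun folder ->
      List.fold_left (fun acc x -> folder (Cons (x, acc))) (folder Nil) l }

(* The producer only exposes a state and a step function. *)
let pull_of_list (l : 'a list) : 'a pull =
  Pull (l, function [] -> Nil | x :: xs -> Cons (x, xs))

let push_sum (s : int push) : int =
  s.fold (function Nil -> 0 | Cons (a, acc) -> a + acc)

(* The consumer owns the loop and asks for one element at a time. *)
let pull_sum (Pull (s0, step) : int pull) : int =
  let rec loop acc s =
    match step s with
    | Nil -> acc
    | Cons (a, t) -> loop (acc + a) t in
  loop 0 s0
```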
  5. 11.

    Push

    let of_arr : 'a array -> 'a stream =
      fun arr -> fun folder ->
        let s = ref (folder Nil) in
        for i=0 to Array.length arr - 1 do
          s := folder (Cons (arr.(i),!s))
        done; !s;;

    let map : ('a -> 'b) -> 'a stream -> 'b stream =
      fun f str -> fun folder ->
        str (fun x -> match x with
          | Nil -> folder Nil
          | Cons (a, x) -> folder (Cons (f a, x)))

    let fold : ('z -> 'a -> 'z) -> 'z -> 'a stream -> 'z =
      fun f z str ->
        str (function Nil -> z | Cons (a, x) -> f x a)

    • the for loop is isolated
    • transformations are “pushed” inside, applied to the stream function
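The push combinators above can be exercised on a small pipeline. This is a sketch under one simplification: the answer type ω is an explicit type parameter rather than universally quantified, so the slide's code type-checks as an ordinary type abbreviation. The stream_shape definition and the name sum_of_squares are assumptions.

```ocaml
(* Assumed definition of the shape type used on the slides. *)
type ('a, 'z) stream_shape = Nil | Cons of 'a * 'z

(* Simplification: 'w is a parameter here instead of being ∀-quantified. *)
type ('a, 'w) stream = (('a, 'w) stream_shape -> 'w) -> 'w

let of_arr : 'a array -> ('a, 'w) stream =
  fun arr folder ->
    let s = ref (folder Nil) in
    for i = 0 to Array.length arr - 1 do
      s := folder (Cons (arr.(i), !s))
    done;
    !s

let map : ('a -> 'b) -> ('a, 'w) stream -> ('b, 'w) stream =
  fun f str folder ->
    str (function
      | Nil -> folder Nil
      | Cons (a, x) -> folder (Cons (f a, x)))

let fold : ('z -> 'a -> 'z) -> 'z -> ('a, 'z) stream -> 'z =
  fun f z str -> str (function Nil -> z | Cons (a, x) -> f x a)

(* The only loop that runs is the for loop inside of_arr. *)
let sum_of_squares (arr : int array) : int =
  of_arr arr |> map (fun x -> x * x) |> fold (+) 0
```

Every element still goes through closures and an intermediate Cons value, which is the overhead the staging steps set out to remove.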
  6. 12.

    Pull

    let of_arr : 'a array -> 'a stream =
      let step (i,arr) =
        if i < Array.length arr
        then Cons (arr.(i), (i+1,arr))
        else Nil
      in fun arr -> Stream ((0,arr),step);;

    let map : ('a -> 'b) -> 'a stream -> 'b stream =
      fun f (Stream (s,step)) ->
        let new_step = fun s -> match step s with
          | Nil -> Nil
          | Cons (a,t) -> Cons (f a, t)
        in Stream (s,new_step)

    let fold : ('z -> 'a -> 'z) -> 'z -> 'a stream -> 'z =
      fun f z (Stream (s,step)) ->
        let rec loop z s = match step s with
          | Nil -> z
          | Cons (a,t) -> loop (f z a) t
        in loop z s;;

    • the loop is driven by the consumer, elements are “pulled”
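The pull combinators run as written once the two types are supplied. The sketch below assumes a GADT for the Stream constructor and the same stream_shape definition as before; sum_of_squares is a hypothetical name.

```ocaml
(* Assumed definitions: the slide uses Stream, Nil and Cons without declaring them. *)
type ('a, 'z) stream_shape = Nil | Cons of 'a * 'z
type 'a stream = Stream : 's * ('s -> ('a, 's) stream_shape) -> 'a stream

let of_arr : 'a array -> 'a stream =
  let step (i, arr) =
    if i < Array.length arr then Cons (arr.(i), (i + 1, arr)) else Nil in
  fun arr -> Stream ((0, arr), step)

let map : ('a -> 'b) -> 'a stream -> 'b stream =
  fun f (Stream (s, step)) ->
    let new_step s =
      match step s with
      | Nil -> Nil
      | Cons (a, t) -> Cons (f a, t) in
    Stream (s, new_step)

let fold : ('z -> 'a -> 'z) -> 'z -> 'a stream -> 'z =
  fun f z (Stream (s, step)) ->
    (* The consumer drives the loop by calling step repeatedly. *)
    let rec loop z s =
      match step s with
      | Nil -> z
      | Cons (a, t) -> loop (f z a) t in
    loop z s

let sum_of_squares (arr : int array) : int =
  of_arr arr |> map (fun x -> x * x) |> fold (+) 0
```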
  7. 13.

    Current Status

    • no buffers ✓
    • closure creation ✗
    • function calls ✗
    • deconstructions and constructions of tuples ✗
  8. 14.

    Multi-Stage Programming

    • think of code templates
    • brackets to create well-{formed, scoped, typed} templates
        let c = .< 1 + 2 >.
    • create holes in templates
        let cf x = .< .~x + .~x >.
    • cf c = .< (1 + 2) + (1 + 2) >.
  9. 15.

    Multi-Stage Programming

    let square x = x * x
    let rec power n x =
      if n = 0 then 1
      else if n mod 2 = 0 then square (power (n/2) x)
      else x * (power (n-1) x)
    (* val power : int -> int -> int = <fun> *)

    let rec spower n x =
      if n = 0 then .<1>.
      else if n mod 2 = 0 then .<square .~(spower (n/2) x)>.
      else .<.~x * .~(spower (n-1) x)>.;;
    (* val spower : int -> int code -> int code = <fun> *)

    let spower7_code = .<fun x -> .~(spower 7 .<x>.)>.;;
    (* fun x_2 -> x_2 * (square(x_2 * (square(x_2 * 1)))) *)

    • n is static, x is dynamic
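The effect of specialization can be checked in plain OCaml, without MetaOCaml, by copying the generated code by hand. In this sketch power7 is a hypothetical name for the body printed in the comment above: all tests on n have disappeared and only the operations on x remain.

```ocaml
let square x = x * x

(* The unstaged function: n is inspected at every call. *)
let rec power n x =
  if n = 0 then 1
  else if n mod 2 = 0 then square (power (n / 2) x)
  else x * power (n - 1) x

(* Hand-copied from the code generated by spower 7 on the slide. *)
let power7 x = x * square (x * square (x * 1))

(* Hypothetical helper: both versions agree on a few sample inputs. *)
let agree = List.for_all (fun x -> power 7 x = power7 x) [-3; -1; 0; 1; 2; 3; 10]
```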
  10. 16.

    Staging Streams (step 0)

    type α stream = ∃σ. σ * (σ → (α,σ) stream_shape)
    let map : ('a -> 'b) -> 'a st_stream -> 'b st_stream
  11. 17.

    Staging Streams (step 0)

    type α stream = ∃σ. σ code * (σ code → (α,σ) stream_shape code)
    let map : ('a code -> 'b code) -> 'a st_stream -> 'b st_stream

    • function inlining
    • state not known statically + propagation
    • step is known!
  12. 18.

    Staging map (step 0)

    let map : ('a code -> 'b code) -> 'a stream -> 'b stream =
      fun f (s,step) ->
        let new_step = fun s ->
          .<match .~(step s) with
            | Nil -> Nil
            | Cons (a,t) -> Cons (.~(f .<a>.), t)>.
        in (s,new_step);;
  13. 19.

    Result (step 0)

    let rec loop_1 z_2 s_3 =
      match
        match
          match s_3 with                                  (* iterate *)
          | (i_4, arr_5) ->
            if i_4 < (Array.length arr_5)
            then Cons ((arr_5.(i_4)),((i_4 + 1), arr_5))
            else Nil
        with                                              (* map *)
        | Nil -> Nil
        | Cons (a_6,t_7) -> Cons ((a_6 * a_6), t_7)
      with                                                (* fold *)
      | Nil -> z_2
      | Cons (a_8,t_9) -> loop_1 (z_2 + a_8) t_9
  14. 20.

    Staging Streams (step 0 again)

    type α st_stream = ∃σ. σ code * (σ code → (α,σ) stream_shape code)

    • stream_shape should exist at compile time, not dynamically
  15. 21.

    Staging Streams (step 1 - fusing the stepper)

    type α st_stream =
      ∃σ. σ code * (∀ω. σ code → ((α code,σ code) stream_shape → ω code) → ω code)

    * Anders Bondorf. 1992. Improving binding times without explicit CPS-conversion. In LFP ’92
    * Oleg Kiselyov, Why a program in CPS specializes better, http://okmij.org/ftp/meta-programming/#bti
  16. 22.

    Staging map (step 1 - fusing the stepper)

    let map : ('a code -> 'b code) -> 'a st_stream -> 'b st_stream =
      fun f (s, step) ->
        let new_step s k = step s @@ function
          | Nil -> k Nil
          | Cons (a,t) ->
            .<let a' = .~(f a) in
              .~(k @@ Cons (.<a'>., t))>.
        in (s, new_step) ;;
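The continuation-passing stepper can be tried without staging. The sketch below drops every code annotation and keeps only the shape of the idea: step hands a stream_shape to a continuation k instead of returning it, so each combinator decides what to do with Nil and Cons at the point where the shape is produced. The record type stepper, the GADT Stream and the name sum_of_squares are assumptions, not the library's definitions.

```ocaml
type ('a, 'z) stream_shape = Nil | Cons of 'a * 'z

(* The ∀ω of the slide is a polymorphic record field. *)
type ('a, 's) stepper =
  { step : 'w. 's -> (('a, 's) stream_shape -> 'w) -> 'w }

(* The ∃σ of the slide is hidden by a GADT constructor. *)
type 'a stream = Stream : 's * ('a, 's) stepper -> 'a stream

let of_arr (arr : 'a array) : 'a stream =
  let step (i, arr) k =
    if i < Array.length arr then k (Cons (arr.(i), (i + 1, arr))) else k Nil in
  Stream ((0, arr), { step })

let map (f : 'a -> 'b) (Stream (s, st) : 'a stream) : 'b stream =
  (* The shape is consumed inside the continuation, never returned. *)
  let step s k =
    st.step s (function
      | Nil -> k Nil
      | Cons (a, t) -> k (Cons (f a, t))) in
  Stream (s, { step })

let fold (f : 'z -> 'a -> 'z) (z : 'z) (Stream (s, st) : 'a stream) : 'z =
  let rec loop z s =
    st.step s (function
      | Nil -> z
      | Cons (a, t) -> loop (f z a) t) in
  loop z s

let sum_of_squares (arr : int array) : int =
  of_arr arr |> map (fun x -> x * x) |> fold (+) 0
```

In the staged version the continuation runs at code-generation time, which is why no Nil or Cons survives in the generated loop.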
  17. 23.

    Result (step 1 - fusing the stepper)

    let rec loop_1 z_2 s_3 =
      match s_3 with
      | (i_4, arr_5) ->
        if i_4 < (Array.length arr_5)
        then
          let el_6 = arr_5.(i_4) in
          let a'_7 = el_6 * el_6 in
          loop_1 (z_2 + a'_7) ((i_4 + 1), arr_5)
        else z_2
  18. 24.

    Staging Streams (step 1 again)

    type α st_stream =
      ∃σ. σ code * (∀ω. σ code → ((α code,σ code) stream_shape → ω code) → ω code)
  19. 25.

    Staging Streams (step 2 - fusing the state)

    type α st_stream =
      ∃σ. (∀ω. (σ → ω code) → ω code)                                  (* init *)
        * (∀ω. σ → ((α code, unit) stream_shape → ω code) → ω code)    (* step *)

    • no need to return state
    • state not dynamic
    • let insertion in CPS

    * Anders Bondorf. 1992. Improving binding times without explicit CPS-conversion. In LFP ’92
    * Oleg Kiselyov, Why a program in CPS specializes better, http://okmij.org/ftp/meta-programming/#bti
  20. 26.

    Staging of_arr (step 2 - fusing the state)

    let of_arr : 'a array code -> 'a st_stream =
      let init arr k =
        .<let i = ref 0 and arr = .~arr in
          .~(k (.<i>.,.<arr>.))>.
      and step (i,arr) k =
        .<if !(.~i) < Array.length .~arr
          then
            let el = (.~arr).(!(.~i)) in
            incr .~i;
            .~(k @@ Cons (.<el>., ()))
          else .~(k Nil)>.
      in fun arr -> (init arr,step)

    • state: (int * α array) code ~> int ref code * α array code
    • incr-ing the ref
  21. 27.

    Result (step 2)

    let i_8 = ref 0 and arr_9 = [|0;1;2;3;4|] in
    let rec loop_10 z_11 =
      if ! i_8 < Array.length arr_9
      then
        let el_12 = arr_9.(! i_8) in
        incr i_8;
        let a'_13 = el_12 * el_12 in
        loop_10 (z_11+a'_13)
      else z_11

    • tail rec, loops? what kind?
    • accumulation is threaded
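The generated code above can be run as plain OCaml once it is wrapped in a function and given an initial call. In this sketch the wrapper sum_of_squares, the shortened variable names and the starting call loop 0 are assumptions; the slide shows only the definition of the loop.

```ocaml
let sum_of_squares (arr : int array) : int =
  (* The state is a mutable index; no tuple is rebuilt per element. *)
  let i = ref 0 in
  let rec loop z =
    if !i < Array.length arr then begin
      let el = arr.(!i) in
      incr i;
      let a' = el * el in
      (* The accumulator is threaded through the tail call. *)
      loop (z + a')
    end else z in
  (* Assumed initial accumulator, matching a fold that starts at 0. *)
  loop 0
```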
  22. 28.

    Staging Streams (step 2 again)

    type α st_stream =
      ∃σ. (∀ω. (σ → ω code) → ω code)
        * (∀ω. σ → ((α code, unit) stream_shape → ω code) → ω code)
  23. 29.

    Staging Streams (step 3 - generating imperative loops)

    type (α,σ) producer_t =
      | For of {upb: σ → int code;
                index: σ → int code → (α → unit code) → unit code}
      | Unfold of {term: σ → bool code;
                   step: σ → (α → unit code) → unit code}
    and α st_stream = ∃σ. (∀ω. (σ → ω code) → ω code) * (α,σ) producer_t
    and α stream = α code st_stream

    • step is refactored
    • loop-forms
    • term
    • element’s structure known!
  24. 30.

    Staging of_arr (step 3 - generating imperative loops)

    let of_arr : 'a array code -> 'a stream =
      fun arr ->
        let init k = .<let arr = .~arr in .~(k .<arr>.)>.
        and upb arr = .<Array.length .~arr - 1>.
        and index arr i k =
          .<let el = (.~arr).(.~i) in .~(k .<el>.)>.
        in (init, For {upb;index})
  25. 31.

    Staging map, fold (step 3 - generating imperative loops)

    • internal combinator to convert a for-based producer to a while-based one (for_unfold)
    • extract from map a counterpart responsible for code motion
    • extract from fold a counterpart responsible for loop generation
  26. 32.

    Result (step 3 - generating imperative loops)

    let s_1 = ref 0 in
    let arr_2 = [|0;1;2;3;4|] in
    for i_3 = 0 to (Array.length arr_2) - 1 do
      let el_4 = arr_2.(i_3) in
      let t_5 = el_4 * el_4 in
      s_1 := !s_1 + t_5
    done;
    !s_1
  27. 33.

    Linearity

    • for each element on the input ~> one element on the output
    • linearity breaks with filtering (0 or 1) and nested streams (flat_maps) (0, 1, or more)
    • filter is a particular case of flat_map in our system
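The last point can be illustrated with a deliberately small, unstaged push stream. This is a sketch and not the library's representation: a stream here is just a function that feeds each element to a callback, and all names are hypothetical. A filter is then a flat_map whose inner stream yields zero or one element.

```ocaml
(* Hypothetical minimal push stream: the stream is its own "for each" loop. *)
type 'a stream = ('a -> unit) -> unit

let of_arr (arr : 'a array) : 'a stream = fun k -> Array.iter k arr

(* Each input element expands to a whole inner stream: 0, 1, or more outputs. *)
let flat_map (f : 'a -> 'b stream) (s : 'a stream) : 'b stream =
  fun k -> s (fun a -> f a k)

(* The inner stream emits the element once if p holds, and nothing otherwise. *)
let filter (p : 'a -> bool) : 'a stream -> 'a stream =
  flat_map (fun a k -> if p a then k a)

let to_list (s : 'a stream) : 'a list =
  let acc = ref [] in
  s (fun a -> acc := a :: !acc);
  List.rev !acc
```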
  28. 34.

    Sub-Ranging and Infinite Streams

    • transform for to while
    • must limit both linear and non-linear cases
    • allocate a reference cell to hold the termination state (add to state)
    • propagate the termination test to all producers
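The points above can be sketched by writing, by hand, the kind of loop one would hope to obtain for "take n, then sum". This is an illustration in plain OCaml, not output of the library; sum_take, sum_take_from and the variable names are hypothetical.

```ocaml
(* Finite source: the array's for loop becomes a while loop so that the
   extra termination test introduced by take can stop it early. *)
let sum_take (n : int) (arr : int array) : int =
  let acc = ref 0 in
  let remaining = ref n in   (* reference cell added to the state by take *)
  let i = ref 0 in
  while !remaining > 0 && !i < Array.length arr do
    acc := !acc + arr.(!i);
    incr i;
    decr remaining
  done;
  !acc

(* Infinite source (start, start+1, ...): take's test is the only way out. *)
let sum_take_from (n : int) (start : int) : int =
  let acc = ref 0 in
  let remaining = ref n in
  let cur = ref start in
  while !remaining > 0 do
    acc := !acc + !cur;
    incr cur;
    decr remaining
  done;
  !acc
```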
  29. 35.

    Fusing Parallel Streams

    stream 1: Linear, stream 2: Linear
      fuse whiles (or fors)
    stream 1: Linear, stream 2: Non-Linear (or stream 1: Non-Linear, stream 2: Linear)
      • advance linear only when we get an element of the non-linear
      • push termination check of the linear to the non-linear
    stream 1: Non-Linear, stream 2: Non-Linear
      make one linear (a stream ~> (unit -> a option))
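The first two cases can be illustrated with hand-fused loops in plain OCaml. These are sketches of the intended shape of the result, not library output; dot and dot_with_evens are hypothetical names.

```ocaml
(* Linear with linear: zipping two array streams and summing the products
   collapses into one for loop over the shorter length. *)
let dot (a : int array) (b : int array) : int =
  let acc = ref 0 in
  let n = min (Array.length a) (Array.length b) in
  for i = 0 to n - 1 do
    acc := !acc + a.(i) * b.(i)
  done;
  !acc

(* Linear with non-linear: a is zipped with the even elements of b. *)
let dot_with_evens (a : int array) (b : int array) : int =
  let acc = ref 0 in
  let i = ref 0 in   (* cursor of the linear stream *)
  let j = ref 0 in   (* cursor of the source behind the filter *)
  (* The linear stream's termination check sits in the non-linear loop. *)
  while !i < Array.length a && !j < Array.length b do
    let y = b.(!j) in
    incr j;
    if y mod 2 = 0 then begin
      acc := !acc + a.(!i) * y;
      incr i         (* advance the linear stream only when an element arrives *)
    end
  done;
  !acc
```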
  30. 36.

    Staged Streams

    type card_t = AtMost1 | Many
    type (α,σ) producer_t =
      | For of {upb: σ → int code;
                index: σ → int code → (α → unit code) → unit code}
      | Unfold of {term: σ → bool code;
                   card: card_t;
                   step: σ → (α → unit code) → unit code}
    and α producer = ∃σ. (∀ω. (σ → ω code) → ω code) * (α,σ) producer_t
    and α st_stream =
      | Linear of α producer
      | Nested of ∃β. β producer * (β → α st_stream)
    and α stream = α code st_stream
  31. 39.

    The takeaways

    • stream-fusion is a domain-specific optimization
    • domain-specific optimizations are better tackled outside the general purpose compiler
    • multi-stage programming is not a trivial sprinkling of staging annotations
  32. 40.

    Wednesday 18 Jan 2017 16:55 - 17:20, Compiler Optimisation at Auditorium, POPL17

    strymonas.github.io