From Streams to Object Algebras to Fusion via Staging

Talk at CWI

Aggelos Biboudis

February 02, 2016

Transcript

  1. From Streams to Object Algebras to Fusion via Staging
    Aggelos Biboudis, www.twitter.com/biboudis
    CWI, Research Intern, Jan 2016
  2. Hello!
    • Name stuff
      • Άγγελος Μπιμπούδης (in Greek)
      • Angelos Bimpoudis (ISO)
      • Angelos Mpimpoydhs (what the university gave me)
      • Aggelos Biboudis (how I always preferred :P)
      • Engel Bimpoudis (in Dutch?)
    • PhD Student, University of Athens, advised by Yannis Smaragdakis
  3. Our Quest
    1. Clash of the Lambdas (Biboudis, Palladinos, Smaragdakis, ICOOOLPS 14)
    2. Streams a la Carte (Biboudis, Palladinos, Fourtounis, Smaragdakis, ECOOP 15)
    3. Stream Fusion, Staged (with Oleg Kiselyov, Palladinos, in progress, ICFP 16?)
    4. Internship in CWI (with SWAT, OOPSLA 16?)
  4. What did we want to know?
    • Performance of streaming APIs
    • Benchmarks in Scala, Java, C#, F#
    • Bird’s eye view of the lambda translation techniques
    • Simple pipelines
    • Sequential and Parallel, Windows and Linux
    • Optimising Frameworks (ScalaBlitz, LinqOptimizer): declarative queries -> loop based code
  5. Scala, Lambdas
    • lambda: a class that extends scala.runtime.AbstractFunction[0-22] is generated
    • lambda with free variables: the generated class includes private member fields that get initialised at instantiation time
  6. Scala, Streams
    • Views for Iterable collections are defined by reinterpreting the iterator method
    • e.g., 3 virtual calls (next, hasNext, f) per element pointed to by the iterator of a map operation
  7. Java, Lambdas
    • A lambda can be used anywhere a Functional Interface is needed
    • Like inner and anonymous classes, lambdas can capture variables
    • invokedynamic refers to a recipe instead of generating bytecode (1-time cost)
    • class-generation at compile time is avoided
    • fewer classes for class-loading
    • favours inlining optimisations (e.g., in non-capturing lambdas we get constant loads)
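The difference between capturing and non-capturing lambdas can be seen in a small Java sketch. The class and method names here (LambdaCapture, square, addOffset) are hypothetical and exist only for illustration; how the JVM actually allocates or caches the resulting objects is an implementation detail that the code does not rely on.

```java
import java.util.function.IntUnaryOperator;

class LambdaCapture {
    // Non-capturing: the body uses only its own parameter.
    static IntUnaryOperator square() {
        return x -> x * x;
    }

    // Capturing: the body closes over the effectively final parameter `offset`.
    static IntUnaryOperator addOffset(int offset) {
        return x -> x + offset;
    }

    public static void main(String[] args) {
        System.out.println(square().applyAsInt(7));      // 49
        System.out.println(addOffset(10).applyAsInt(5)); // 15
    }
}
```

Both lambdas are usable wherever an IntUnaryOperator (a functional interface) is expected; only the second one needs state to be supplied when it is created.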
  8. Java, Streams
    • The main philosophy is: stream & bits of characteristics
    • source/generator |> lazy |> lazy |> lazy |> terminal
    • intermediate ops are gathered in CPS-style
    • a compact data structure of transformations is applied at the source
    • flow of characteristics can be used for optimisations
    • a bulk operation can be optimised (e.g. a do … while loop)
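The "source |> lazy |> lazy |> terminal" shape can be observed with the standard java.util.stream API. The sketch below is illustrative only: the class name LazyPipeline and the logging list are assumptions added to make the evaluation order visible, not part of the talk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

class LazyPipeline {
    // Records every element each stage sees, so laziness becomes observable.
    static final List<String> log = new ArrayList<>();

    // Builds source |> filter |> map, but does not run anything yet.
    static IntStream evenSquares(int[] nums) {
        return IntStream.of(nums)
                .filter(x -> { log.add("filter " + x); return x % 2 == 0; })
                .map(x -> { log.add("map " + x); return x * x; });
    }

    public static void main(String[] args) {
        IntStream pipeline = evenSquares(new int[] {1, 2, 3, 4});
        System.out.println(log.isEmpty());  // true: intermediate ops have not run
        System.out.println(pipeline.sum()); // 20: the terminal op drives the pipeline
        System.out.println(log);            // the stages that ran, in order
    }
}
```

Only the terminal operation (sum) pulls data through the gathered transformations; building the pipeline merely records them.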
  9. C#/F#, Lambdas
    • C# lambdas are always assigned to delegates. If they capture free variables, these are fields in a compiler-generated type (otherwise just static methods).
    • F# lambdas are represented as compiler-generated classes that inherit FSharpFunc<T,R>
  10. C#/F#, Streams
    • LINQ was introduced in C# 3.0.
    • as fluent-style method calls
        nums.Where(x => x % 2 == 0).Select(x => x * x).Sum();
    • with the equivalent query comprehension syntactic sugar
        (from x in nums
         where x % 2 == 0
         select x * x).Sum();
    • F# is inspired by OCaml and is a first-class citizen of .NET. The Seq module is the streaming API of F#
        nums |> Seq.filter (fun x -> x % 2 = 0)
             |> Seq.map (fun x -> x * x)
             |> Seq.sum
  11. C#/F#, Streams
    • IEnumerable is a factory for IEnumerator objects
    • IEnumerator keeps state of the iteration (Current & MoveNext)
    • e.g., a .Select(func) combinator:
      • Returns a SelectEnumerable object encapsulating the inner source. We get a SelectEnumerator that passes inner’s Current to func.
      • 3 virtual calls (MoveNext, Current, func) per element per iterator.
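The same pull-style design can be sketched in Java, where Iterator plays the role of IEnumerator. This is a minimal illustration under that assumption, not the .NET implementation; PullMap and its map helper are hypothetical names.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Function;

class PullMap {
    // A lazy map combinator: wraps the inner iterator and applies f on demand.
    static <T, R> Iterator<R> map(Iterator<T> inner, Function<T, R> f) {
        return new Iterator<R>() {
            @Override public boolean hasNext() {
                return inner.hasNext();        // interface call 1
            }
            @Override public R next() {
                return f.apply(inner.next());  // interface calls 2 and 3
            }
        };
    }

    // The consumer pulls elements one at a time through the wrapper.
    static int sumOfSquares(Iterable<Integer> source) {
        Iterator<Integer> it = map(source.iterator(), x -> x * x);
        int total = 0;
        while (it.hasNext()) total += it.next();
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(Arrays.asList(1, 2, 3))); // 14
    }
}
```

Each additional combinator adds another wrapper, so the per-element calls accumulate with the depth of the pipeline.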
  12. What was missing?
    • Current libraries: fixed behaviour and operators
    • We wanted pluggable semantics (e.g., push vs. pull)
    • And the ability to retrofit new operators
  13. Why pluggable semantics?
    • Operators are naturally push or pull ⇒ variable performance
    • Extra functionality mixed in, e.g.:
      • Log with push
      • Fuse with pull
      • Blocking or not with push or pull
  14. • We proposed a library design based on object algebras
    • Provided extensible streams with:
      • Pluggable operators
      • Pluggable behaviours
      • Mixed-in behaviours
    • Affect performance (in a good way)
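A rough idea of how an object algebra makes semantics pluggable: the operators are declared once in an interface that is generic in the stream representation, and each implementation supplies its own behaviour. The sketch below is a simplification under stated assumptions: it fixes the element type to int to avoid encoding higher-kinded types, and all names (IntStreamAlg, PushAlg, PullAlg, Queries) are hypothetical rather than the library's API.

```java
import java.util.OptionalInt;
import java.util.function.IntConsumer;
import java.util.function.IntPredicate;
import java.util.function.IntUnaryOperator;

// The algebra: operators are abstract over the stream representation S.
interface IntStreamAlg<S> {
    S source(int[] array);
    S map(IntUnaryOperator f, S s);
    S filter(IntPredicate p, S s);
    int sum(S s);
}

// Push representation: the stream drives a consumer.
interface Push { void forEach(IntConsumer k); }

class PushAlg implements IntStreamAlg<Push> {
    public Push source(int[] array) { return k -> { for (int x : array) k.accept(x); }; }
    public Push map(IntUnaryOperator f, Push s) { return k -> s.forEach(x -> k.accept(f.applyAsInt(x))); }
    public Push filter(IntPredicate p, Push s) { return k -> s.forEach(x -> { if (p.test(x)) k.accept(x); }); }
    public int sum(Push s) { int[] acc = {0}; s.forEach(x -> acc[0] += x); return acc[0]; }
}

// Pull representation: the consumer asks for the next element (empty = end).
interface Pull { OptionalInt next(); }

class PullAlg implements IntStreamAlg<Pull> {
    public Pull source(int[] array) {
        int[] i = {0};
        return () -> i[0] < array.length ? OptionalInt.of(array[i[0]++]) : OptionalInt.empty();
    }
    public Pull map(IntUnaryOperator f, Pull s) {
        return () -> { OptionalInt o = s.next(); return o.isPresent() ? OptionalInt.of(f.applyAsInt(o.getAsInt())) : o; };
    }
    public Pull filter(IntPredicate p, Pull s) {
        return () -> { OptionalInt o = s.next(); while (o.isPresent() && !p.test(o.getAsInt())) o = s.next(); return o; };
    }
    public int sum(Pull s) {
        int total = 0;
        for (OptionalInt o = s.next(); o.isPresent(); o = s.next()) total += o.getAsInt();
        return total;
    }
}

class Queries {
    // The query is written once and runs under whichever semantics is plugged in.
    static <S> int sumOfEvenSquares(IntStreamAlg<S> alg, int[] nums) {
        return alg.sum(alg.map(x -> x * x, alg.filter(x -> x % 2 == 0, alg.source(nums))));
    }
}
```

Calling Queries.sumOfEvenSquares(new PushAlg(), nums) or Queries.sumOfEvenSquares(new PullAlg(), nums) runs the same pipeline under push or pull semantics; a new behaviour (for example logging) would be another implementation of the same interface.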
  15. Use

  16. Use

  17. Cool stuff
    • we saw implementations that
      • do not support zip (being naturally pull)
      • push to pull transition is not performant
      • infinite & early termination combinators do not play well
      • for-based loops are not yet matched
  18. More cool stuff
    • Stream Fusion is still an open problem:
      • Duncan Coutts et al. (Stream Fusion 2007) proposed a deforestation technique that still didn’t fuse concatMap (a pull-based approach btw)
      • Andrew Farmer et al. (Hermit in the Stream 2014) use Hermit (not just GHC RULES)
      • Implement list fusion using streams instead of foldr/build (ticket opened 9 years ago, “we close this ticket as requiring more research”)
  19. Staging 101
    let sum = fold (fun z a → .⟨∼a + ∼z⟩.) .⟨0⟩.

    ofArr .⟨arr⟩.
    |> map (fun x → .⟨∼x ∗ ∼x⟩.)
    |> sum
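To give a feel for what staging buys, here is a deliberately crude Java sketch in which a "staged" stream is a code generator rather than a data producer. It is only an analogy: MetaOCaml quotations are typed code values, whereas this sketch assembles untyped strings, and every name in it (StagedStream, ofArr, map, sum) is hypothetical.

```java
import java.util.function.UnaryOperator;

// A staged stream: given code for the per-element consumer, it produces
// the source text of a loop that feeds that consumer.
interface StagedStream {
    String generate(UnaryOperator<String> consumer);

    // Source: iterate over an array known only by name at generation time.
    static StagedStream ofArr(String arrayName) {
        return k -> "for (int i = 0; i < " + arrayName + ".length; i++) { "
                + k.apply(arrayName + "[i]") + " }";
    }

    // map: rewrite the element expression before handing it downstream.
    default StagedStream map(UnaryOperator<String> f) {
        return k -> generate(x -> k.apply(f.apply(x)));
    }

    // sum: the terminal operation emits the accumulator around the loop.
    default String sum() {
        return "int acc = 0; " + generate(x -> "acc = acc + " + x + ";") + " return acc;";
    }
}

class StagingDemo {
    static String sumOfSquaresCode() {
        return StagedStream.ofArr("arr")
                .map(x -> "(" + x + " * " + x + ")")
                .sum();
    }

    public static void main(String[] args) {
        // Prints a single fused loop with no intermediate stream objects.
        System.out.println(sumOfSquaresCode());
    }
}
```

The combinators run at generation time, so the pipeline structure disappears and only one loop over arr remains in the emitted text.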
  20. Streams, Staged
    ( zip_with (fun e1 e2 → .⟨(∼e1, ∼e2)⟩.)
        (ofArr .⟨arr1⟩.
         |> map (fun x → .⟨∼x ∗ ∼x⟩.)
         |> take .⟨12⟩.
         |> filter (fun x → .⟨∼x mod 2 = 0⟩.)
         |> map (fun x → .⟨∼x ∗ ∼x⟩.))
        (iota .⟨1⟩.
         |> flat_map (fun x → iota .⟨∼x + 1⟩. |> take .⟨3⟩.)
         |> filter (fun x → .⟨∼x mod 2 = 0⟩.)) )
    |> fold (fun z a → .⟨∼a :: ∼z⟩.) .⟨[]⟩.
  21. Stream Fusion, Staged
    type 'a stream_val =
      | Skip of bool
      | El of 'a code
      | Stream of 'a repr
    and ('a,'st) gen = {
      init: 'w. ('st -> 'w code) -> 'w code;
      advance: 'w. 'st -> ('a stream_val -> 'w code) -> 'w code
    }
    and 'a repr = G : ('a,'st) gen -> 'a repr
  22. init( (st -> .<let rec loop acc = .~(advance st (elem -> …) in loop (zero) >.)))
  23. init(init( (st -> .<let rec loop acc = .~(advance st (elem -> (advance(…)) in loop (zero) >.)))
  24. init(init(init(st -> .<let rec loop acc = .~(advance st (elem -> advance(advance(…)))) in loop (zero) >.)))
  25. Main observation
    • Interfaces of object algebras resemble the methods of computation expressions
    • Computation expressions are a language feature of F# that allows overridable semantics
    • Computation expressions = monadic bind + monadic return + methods that correspond to the syntax of F#
  26. Computation Expressions
    let getLength url = async {
      let! html = fetchAsync url
      do! Async.Sleep 1000
      return html.Length
    }
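An async block like the one above is sugar for calls to builder methods such as Bind and Return. A rough Java analogue, assuming CompletableFuture as the monad M: thenCompose plays the role of bind and completedFuture the role of return. The names AsyncBuilder, ret, bind and the canned fetchAsync are hypothetical stand-ins, and the sleep step is omitted for brevity.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

class AsyncBuilder {
    // Return: wraps a plain value in the monad.
    static <T> CompletableFuture<T> ret(T value) {
        return CompletableFuture.completedFuture(value);
    }

    // Bind: sequences a computation with a function that yields the next one.
    static <T, U> CompletableFuture<U> bind(CompletableFuture<T> m,
                                            Function<T, CompletableFuture<U>> f) {
        return m.thenCompose(f);
    }

    // Stand-in for fetchAsync: no network access, just a canned page.
    static CompletableFuture<String> fetchAsync(String url) {
        return CompletableFuture.supplyAsync(() -> "<html>" + url + "</html>");
    }

    // Desugared shape of: let! html = fetchAsync url ... return html.Length
    static CompletableFuture<Integer> getLength(String url) {
        return bind(fetchAsync(url), html -> ret(html.length()));
    }

    public static void main(String[] args) {
        System.out.println(getLength("x").join());
    }
}
```

Swapping in a different pair of bind and return changes what the same program shape means, which is the overridable-semantics point made on the previous slide.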
  27. Computation Expressions
    Method       Typical signature(s)
    Bind         M<'T> * ('T -> M<'U>) -> M<'U>
    Delay        (unit -> M<'T>) -> M<'T>
    Return       'T -> M<'T>
    ReturnFrom   M<'T> -> M<'T>
    Run          M<'T> -> M<'T>
    Combine      M<'T> * M<'T> -> M<'T>
    For          seq<'T> * ('T -> M<'U>) -> M<'U>
    TryFinally   M<'T> * (unit -> unit) -> M<'T>
    TryWith      M<'T> * (exn -> M<'T>) -> M<'T>
    Using        'T * ('T -> M<'U>) -> M<'U> when 'T :> IDisposable
    While        (unit -> bool) * M<'T> -> M<'T>
    Yield        'T -> M<'T>
    YieldFrom    M<'T> -> M<'T>
    Zero         unit -> M<'T>
  28. So far…
    • Dart has an interesting approach with multiple continuation types (statements, expressions, exceptions, etc.)
    • F# uses the continuation monad for asynchronous workflows
    • async/await in C# and Dart is essentially reset/shift in languages that support it
    • await in expressions means that we could write int i = 1 + await (async { some computation }) + 3
    • Scala supports reset/shift
      • regular continuations (T -> Unit) -> Unit
      • polymorphic in the return type (A -> B) -> C
    • a syntax-driven shift/reset with bind and return is super easy
    • Filinski’s Representing Monads is followed by effects people (all monads are encoded as the cont monad)
    • scala-virtualized follows the tagless interpreters approach (thus object algebras), not Filinski’s
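The "regular continuation" shape (T -> Unit) -> Unit, together with bind and return, can be sketched in a few lines of Java. This is an illustrative encoding only, with hypothetical names (Cont, ret, bind, ContDemo); it does not model shift/reset or the return-type-polymorphic variant.

```java
import java.util.function.Consumer;
import java.util.function.Function;

// A computation that delivers a T by calling a continuation: (T -> Unit) -> Unit.
interface Cont<T> {
    void run(Consumer<T> k);

    // return: a computation that immediately passes the value on.
    static <T> Cont<T> ret(T value) {
        return k -> k.accept(value);
    }

    // bind: run this computation, then continue with the one f produces.
    default <U> Cont<U> bind(Function<T, Cont<U>> f) {
        return k -> run(t -> f.apply(t).run(k));
    }
}

class ContDemo {
    // Mirrors "1 + await (some computation) + 3" with the await made explicit.
    static int compute() {
        Cont<Integer> awaited = Cont.ret(38);  // stand-in for the awaited computation
        Cont<Integer> total = awaited.bind(v -> Cont.ret(1 + v + 3));
        int[] result = new int[1];
        total.run(r -> result[0] = r);
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(compute()); // 42
    }
}
```

The rest of the expression after the await becomes the function passed to bind, which is the rewriting that async/await performs behind the scenes.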
  29. If we could:
    A. DSLs that look like Java with extended syntax
    B. Semantics as libraries
    C. async, cloud, staging