Elixir's GenStage and Flow

Guilherme de Maio, nirev

March 18, 2017

Transcript

  1. @nirev
    Guilherme de Maio
    Go with the Flow
    Elixir's GenStage and Flow

  2. Why?
    Flow and GenStage

  3. Data Pipelines
    Flow and GenStage

  4. Data Pipelines
    Flow and GenStage
    • Log Processing
    • Indexing data in Search systems (ES/Solr)
    • Recommendation Algorithms
    • Realtime dashboards w/ metric aggregation
    • etc…

  5. Data Processing
    1. eager
    2. lazy
    3. ???
    4. PROFIT

  6. Data Processing
    1. eager
    2. lazy
    3. concurrent
    4. distributed

  7. Eager

  8. Eager
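    # File.read!/1 returns a single binary, so split it into lines before enumerating
    File.read!("path/to/some/file")
    |> String.split("\n")
    # String.split/1 splits on whitespace and drops empty strings
    |> Enum.flat_map(&String.split/1)
    |> Enum.reduce(%{}, fn word, acc ->
      Map.update(acc, word, 1, &(&1 + 1))
    end)
    |> Enum.to_list()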

  9. Eager
    • Simple
    • All in memory
    • Slow for large files
    • 0% concurrent

  10. Lazy

  11. Lazy
    File.stream!("path/to/some/file")
    # String.split/1 also drops the trailing newline File.stream! keeps on each line
    |> Stream.flat_map(&String.split/1)
    |> Enum.reduce(%{}, fn word, acc ->
      Map.update(acc, word, 1, &(&1 + 1))
    end)
    |> Enum.to_list()

  12. Lazy
    • Better, one line at a time
    • Less memory usage
    • Still slow for large files
    • 0% concurrent
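
To make the "one line at a time" point concrete, here is a minimal sketch that runs the same word count over an in-memory list instead of a file. The `lines` sample is a hypothetical stand-in for what `File.stream!` would yield; it is not from the original slides.

```elixir
# Hypothetical sample input standing in for the lines of a file.
lines = ["go with the flow\n", "the flow goes on\n"]

# Building the stream does no work yet; it only describes the computation.
words = Stream.flat_map(lines, &String.split/1)

# Enum.reduce/3 drives the stream, pulling lines through as it needs them.
counts =
  Enum.reduce(words, %{}, fn word, acc ->
    Map.update(acc, word, 1, &(&1 + 1))
  end)

IO.inspect(counts)
```

Everything still runs in a single process, which is why this version is no more concurrent than the eager one.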

  13. Concurrent

  14. Concurrent
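    File.stream!("path/to/some/file")
    |> Flow.from_enumerable()
    # String.split/1 also drops the trailing newline File.stream! keeps on each line
    |> Flow.flat_map(&String.split/1)
    # Route each word to a fixed partition so a single process counts it
    |> Flow.partition()
    |> Flow.reduce(fn -> %{} end, fn word, acc ->
      Map.update(acc, word, 1, &(&1 + 1))
    end)
    |> Enum.to_list()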

  15. Concurrent
    • Oh, yeah! Multi-process!
    • Does not guarantee order
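
Because partitions finish in no particular order, the resulting list can differ between runs. A minimal sketch of one way to get stable output is to sort at the end. It assumes the script can fetch Flow through `Mix.install` (the version requirement is a guess), and the `lines` sample is hypothetical.

```elixir
# Assumption: network access is available so Mix.install can fetch Flow.
Mix.install([{:flow, "~> 1.2"}])

# Hypothetical sample input standing in for the lines of a file.
lines = ["go with the flow\n", "the flow goes on\n"]

counts =
  lines
  |> Flow.from_enumerable()
  |> Flow.flat_map(&String.split/1)
  |> Flow.partition()
  |> Flow.reduce(fn -> %{} end, fn word, acc ->
    Map.update(acc, word, 1, &(&1 + 1))
  end)
  |> Enum.to_list()
  # Partitions emit their results in any order, so sort before comparing or printing.
  |> Enum.sort()

IO.inspect(counts)
```

Sorting happens after the flow has finished, so it costs nothing in parallelism; it only normalizes the final list.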

  17. Flow

  18. Flow
    • A way to express computations on collections, similar to Enum and Stream, but executed in parallel on top of GenStage
    • Works for both bounded and unbounded data
    • API inspired by Apache Spark

  19. GenStage

  20. GenStage
    • It’s a behaviour
    • Built for exchanging data among “processing stages” transparently and with back-pressure

  21. GenStage
    Producer -> Producer/Consumer -> Producer/Consumer -> Consumer

  22. GenStage
    • The Consumer subscribes to the Producer
    • The Consumer dictates the pace
    • Producer can send to several Consumers
    • Configurable dispatch policies
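
The stage roles and subscriptions above can be sketched as a three-stage pipeline. This is an illustration, not code from the talk: the module names `ListProducer`, `Doubler` and `Forwarder` are hypothetical, and the script assumes `Mix.install` can fetch `gen_stage` (the version requirement is a guess).

```elixir
# Assumption: network access is available so Mix.install can fetch GenStage.
Mix.install([{:gen_stage, "~> 1.2"}])

defmodule ListProducer do
  use GenStage

  def init(items), do: {:producer, items}

  # Hand out at most `demand` items; once the list is empty nothing is emitted.
  def handle_demand(demand, items) do
    {events, rest} = Enum.split(items, demand)
    {:noreply, events, rest}
  end
end

defmodule Doubler do
  use GenStage

  def init(:ok), do: {:producer_consumer, :ok}

  # Consumes events from upstream and emits transformed events downstream.
  def handle_events(events, _from, state) do
    {:noreply, Enum.map(events, &(&1 * 2)), state}
  end
end

defmodule Forwarder do
  use GenStage

  def init(parent), do: {:consumer, parent}

  # A consumer emits nothing; here it reports each batch to the parent process.
  def handle_events(events, _from, parent) do
    send(parent, {:events, events})
    {:noreply, [], parent}
  end
end

{:ok, producer} = GenStage.start_link(ListProducer, [1, 2, 3])
{:ok, doubler} = GenStage.start_link(Doubler, :ok)
{:ok, consumer} = GenStage.start_link(Forwarder, self())

# Subscriptions are made by the downstream stage, which is what lets it set the pace.
{:ok, _} = GenStage.sync_subscribe(consumer, to: doubler)
{:ok, _} = GenStage.sync_subscribe(doubler, to: producer)
```

The calling process receives the doubled events as `{:events, batch}` messages; nothing moves until a downstream stage has asked for it.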

  23. GenStage
    Consumer -> Producer: asks for X events
    Producer -> Consumer: sends at most X events

  24. GenStage
    Consumer -> Producer: asks for X events
    Producer -> Consumer: sends at most X events
    max_demand: maximum number of events the consumer can have in flight
    min_demand: lower threshold; when the consumer's pending events drop to this number, it asks for more
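
A small sketch of how the two options are passed when subscribing. The modules `LoggingProducer` and `SlowConsumer` are hypothetical names for illustration, and the script assumes `Mix.install` can fetch `gen_stage` (the version requirement is a guess). The producer reports every request it receives, so the first request can be observed.

```elixir
# Assumption: network access is available so Mix.install can fetch GenStage.
Mix.install([{:gen_stage, "~> 1.2"}])

defmodule LoggingProducer do
  use GenStage

  def init(parent), do: {:producer, {parent, 0}}

  # Report how many events were asked for, then emit exactly that many integers.
  def handle_demand(demand, {parent, next}) when demand > 0 do
    send(parent, {:asked, demand})
    events = Enum.to_list(next..(next + demand - 1))
    {:noreply, events, {parent, next + demand}}
  end
end

defmodule SlowConsumer do
  use GenStage

  def init(:ok), do: {:consumer, :ok}

  # Simulate slow work; the producer cannot run ahead of what was asked for.
  def handle_events(_events, _from, state) do
    Process.sleep(100)
    {:noreply, [], state}
  end
end

{:ok, producer} = GenStage.start_link(LoggingProducer, self())
{:ok, consumer} = GenStage.start_link(SlowConsumer, :ok)

{:ok, _} =
  GenStage.sync_subscribe(consumer, to: producer, max_demand: 10, min_demand: 5)

# The initial request equals max_demand.
first =
  receive do
    {:asked, n} -> n
  after
    1_000 -> :timeout
  end

IO.inspect(first, label: "first request")
```

Later requests are smaller top-ups that the consumer sends as it works through its pending events. The stages keep running in the background until the script exits.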

  25. GenStage
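    DemandDispatcher
    Producer dispatches 1,2,3,4; each event goes to exactly one consumer:
    • Consumer 1 receives 1
    • Consumer 2 receives 2,4
    • Consumer 3 receives 3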

  26. GenStage
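    BroadcastDispatcher
    Producer dispatches 1,2; every consumer receives all events:
    • Consumer 1 receives 1,2
    • Consumer 2 receives 1,2
    • Consumer 3 receives 1,2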

  27. GenStage
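    PartitionDispatcher, partitioning by rem(e,3)
    Producer dispatches 1,2,3,4,5,6; each event goes to the consumer of its partition:
    • One consumer receives 1,4
    • One consumer receives 2,5
    • One consumer receives 3,6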
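
The `rem(e,3)` partitioning above can be sketched as follows. This is an illustration under assumptions, not code from the talk: `Numbers` and `PartitionConsumer` are hypothetical module names, and the script assumes `Mix.install` can fetch `gen_stage` (the version requirement is a guess). The dispatcher is chosen in the producer's `init/1`, and each consumer names its partition when subscribing.

```elixir
# Assumption: network access is available so Mix.install can fetch GenStage.
Mix.install([{:gen_stage, "~> 1.2"}])

defmodule Numbers do
  use GenStage

  def init(items) do
    # The hash function returns {event, partition}; here the partition is rem(e, 3).
    dispatcher =
      {GenStage.PartitionDispatcher,
       partitions: 3, hash: fn event -> {event, rem(event, 3)} end}

    {:producer, items, dispatcher: dispatcher}
  end

  # Hand out at most `demand` items; once the list is empty nothing is emitted.
  def handle_demand(demand, items) do
    {events, rest} = Enum.split(items, demand)
    {:noreply, events, rest}
  end
end

defmodule PartitionConsumer do
  use GenStage

  def init({partition, parent}), do: {:consumer, {partition, parent}}

  # Report each batch, tagged with this consumer's partition, to the parent process.
  def handle_events(events, _from, {partition, parent} = state) do
    send(parent, {:partition, partition, events})
    {:noreply, [], state}
  end
end

{:ok, producer} = GenStage.start_link(Numbers, Enum.to_list(1..6))

# One consumer per partition: 0, 1 and 2.
for partition <- 0..2 do
  {:ok, consumer} = GenStage.start_link(PartitionConsumer, {partition, self()})
  {:ok, _} = GenStage.sync_subscribe(consumer, to: producer, partition: partition)
end
```

Swapping the `dispatcher:` option for `GenStage.BroadcastDispatcher` sends every event to every consumer instead; leaving the option out gives the default `GenStage.DemandDispatcher`.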

  28. GenStage
    Example
    (https://github.com/nirev/gen_stage_example)

  29. Distributed

  30. Distributed
    • Not yet
    • Lack of guarantees in Flow
    • Do we really need to?
    • More soon…

  31. Takeaways

  32. Takeaways
    • Data pipelines are now “citizens” of Elixir
    • Very familiar API for those who have used other frameworks
    • Very promising and consistent
    • No guarantees yet, which means: please don’t use it to process payments ;D

  33. Thanks!
    https://xerpa.recruiterbox.com/
    @nirev
    Guilherme Nogueira

  34. References
    Documentation and Talks
    • https://hexdocs.pm/gen_stage
    • https://hexdocs.pm/flow
    • https://www.youtube.com/watch?v=aZuY5-2lwW4
    • https://www.youtube.com/watch?v=IBcLOxW1Zgs
    Cases
    • http://teamon.eu/2016/tuning-elixir-genstage-flow-pipeline-processing/
    • https://blog.discordapp.com/how-discord-handles-push-request-bursts-of-over-a-million-per-minute-with-elixirs-genstage-8f899f0221b4
