
Elixir's GenStage and Flow

Guilherme de Maio, nirev

March 18, 2017

Transcript

  1. Go with the Flow: Elixir's GenStage and Flow (Guilherme de Maio, @nirev)
  2. Why? Flow and GenStage

  3. Data Pipelines

  4. Data Pipelines
     • Log processing
     • Indexing data in search systems (ES/Solr)
     • Recommendation algorithms
     • Real-time dashboards with metric aggregation
     • etc.
  5. Data Processing: 1. eager, 2. lazy, 3. ???, 4. PROFIT

  6. Data Processing: 1. eager, 2. lazy, 3. concurrent, 4. distributed

  7. Eager

  8. Eager

    File.read!("path/to/some/file")
    # File.read!/1 returns one binary, so split it into lines before flat_map
    |> String.split("\n")
    |> Enum.flat_map(&String.split/1)
    |> Enum.reduce(%{}, fn word, acc -> Map.update(acc, word, 1, &(&1 + 1)) end)
    |> Enum.to_list()
  9. Eager
     • Simple
     • Everything in memory
     • Slow for large files
     • 0% concurrent
  10. Lazy

  11. Lazy

    File.stream!("path/to/some/file")
    # String.split/1 also drops the trailing "\n" that File.stream!/1 keeps on each line
    |> Stream.flat_map(&String.split/1)
    |> Enum.reduce(%{}, fn word, acc -> Map.update(acc, word, 1, &(&1 + 1)) end)
    |> Enum.to_list()
  12. Lazy
     • Better: one line at a time
     • Less memory usage
     • Still slow for large files
     • 0% concurrent
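To experiment with the lazy pipeline without a file on disk, the sketch below reads from an in-memory `StringIO` device through `IO.stream/2` instead of `File.stream!/1`; both hand out one line at a time. `LazyWordCount` is a hypothetical module name used only for illustration.

```elixir
defmodule LazyWordCount do
  # Hypothetical helper: counts words in `text`, reading it lazily line by line.
  def count(text) do
    # StringIO gives an IO device backed by a string, standing in for a file.
    {:ok, device} = StringIO.open(text)

    try do
      device
      # Lazy: a line is read only when the reducer below asks for the next word.
      |> IO.stream(:line)
      # String.split/1 splits on any whitespace and drops the trailing "\n".
      |> Stream.flat_map(&String.split/1)
      # Enum.reduce/3 is what actually drives the stream.
      |> Enum.reduce(%{}, fn word, acc -> Map.update(acc, word, 1, &(&1 + 1)) end)
    after
      StringIO.close(device)
    end
  end
end

LazyWordCount.count("the cat\nthe dog\n")
# => %{"cat" => 1, "dog" => 1, "the" => 2}
```

Nothing is read until `Enum.reduce/3` starts pulling, which is why memory stays low; the work still happens in a single process.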
  13. Concurrent

  14. Concurrent

    File.stream!("path/to/some/file")
    |> Flow.from_enumerable()
    |> Flow.flat_map(&String.split/1)
    # Route each word to a single reducer stage so its count is not split across stages
    |> Flow.partition()
    |> Flow.reduce(fn -> %{} end, fn word, acc -> Map.update(acc, word, 1, &(&1 + 1)) end)
    |> Enum.to_list()
  15. Concurrent
     • Oh, yeah! Multi-process!
     • Output order is not guaranteed
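Flow handles the splitting, scheduling and merging for you. To make the idea concrete without the `flow` dependency, here is a rough stand-in built only on the standard library's `Task.async_stream/3`. It is not how Flow works internally, and the module name `ConcurrentWordCount` and its `chunk_size` parameter are hypothetical.

```elixir
defmodule ConcurrentWordCount do
  # Stdlib-only analogue of the Flow pipeline (NOT Flow's implementation):
  # split the lines into chunks, count each chunk in its own process,
  # then merge the partial counts.
  def count(lines, chunk_size \\ 1_000) do
    lines
    |> Stream.chunk_every(chunk_size)
    # Each chunk is counted in a separate process. `ordered: false` mirrors the
    # slide: results come back in whatever order the processes finish.
    |> Task.async_stream(&count_chunk/1, ordered: false)
    |> Enum.reduce(%{}, fn {:ok, partial}, acc ->
      # A word seen in several chunks: add its partial counts together.
      Map.merge(acc, partial, fn _word, a, b -> a + b end)
    end)
  end

  defp count_chunk(lines) do
    lines
    |> Enum.flat_map(&String.split/1)
    |> Enum.reduce(%{}, fn word, acc -> Map.update(acc, word, 1, &(&1 + 1)) end)
  end
end

ConcurrentWordCount.count(["a b", "b c", "c c"], 1)
# => %{"a" => 1, "b" => 2, "c" => 3}
```

It also accepts `File.stream!("path/to/some/file")` directly. Because addition is commutative, the final map is the same whatever order the chunks finish in. `Flow.partition/2` takes a different route: it sends every occurrence of a word to the same reducer stage, so no merge step is needed.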
  17. Flow

  18. Flow
     • A way to express computations on collections, like Enum and Stream, but executed in parallel with GenStage
     • Works with both bounded and unbounded data
     • API inspired by Apache Spark
  19. GenStage

  20. GenStage
     • It's a behaviour
     • Designed for exchanging data between "processing stages" transparently and with back-pressure
  21. GenStage (diagram): a Producer feeding Producer/Consumer stages, which in turn feed a Consumer

  22. GenStage
     • The Consumer subscribes to the Producer
     • The Consumer dictates the pace
     • A Producer can send to several Consumers
     • Configurable dispatch policies
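A minimal producer/consumer pair, in the spirit of the counter example in the GenStage documentation, shows the behaviour's callbacks and the consumer subscribing with its own demand. It assumes Elixir 1.12+ (for `Mix.install/1`), network access to fetch `gen_stage` from Hex, and the version constraint `~> 1.2`; the module names `Counter` and `Forwarder` are illustrative.

```elixir
# Assumes Elixir 1.12+ and network access: fetches gen_stage from Hex.
Mix.install([{:gen_stage, "~> 1.2"}])

defmodule Counter do
  # Producer: emits consecutive integers, but only as many as were demanded.
  use GenStage

  def start_link(first), do: GenStage.start_link(__MODULE__, first)

  @impl true
  def init(first), do: {:producer, first}

  @impl true
  def handle_demand(demand, counter) when demand > 0 do
    events = Enum.to_list(counter..(counter + demand - 1))
    {:noreply, events, counter + demand}
  end
end

defmodule Forwarder do
  # Consumer: forwards every batch it receives to a given process.
  use GenStage

  def start_link(target), do: GenStage.start_link(__MODULE__, target)

  @impl true
  def init(target), do: {:consumer, target}

  @impl true
  def handle_events(events, _from, target) do
    send(target, {:batch, events})
    # Consumers never emit events, hence the empty list.
    {:noreply, [], target}
  end
end

{:ok, producer} = Counter.start_link(0)
{:ok, consumer} = Forwarder.start_link(self())

# The consumer subscribes and sets the pace: at most 10 events in flight,
# with min_demand controlling when it asks for more.
{:ok, _subscription} =
  GenStage.sync_subscribe(consumer, to: producer, max_demand: 10, min_demand: 5)

first_batch =
  receive do
    {:batch, events} -> events
  after
    1_000 -> []
  end

IO.inspect(first_batch, label: "first batch")
```

The producer never decides how much to send: `handle_demand/2` only runs when a consumer has asked, which is the back-pressure described on the slides.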
  23. GenStage (diagram): the Consumer asks the Producer for X events; the Producer sends at most X

  24. GenStage (same diagram, with the demand settings)
     • max_demand: the maximum number of events the Consumer can have in flight
     • min_demand: when the Consumer's pending events drop to this number, it asks for more
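The two demand settings can be modelled with plain processes. The sketch below is a toy under stated assumptions, not GenStage's implementation or API: `ToyDemand`, its `{:ask, ...}` and `{:events, ...}` messages, and the refill rule (ask for `max_demand - min_demand` more once outstanding demand falls to `min_demand`) are illustrative choices.

```elixir
defmodule ToyDemand do
  # Toy model of demand-driven back-pressure using plain processes.
  # NOT GenStage's implementation or API: the consumer asks for N events,
  # and the producer replies with exactly N (it never pushes on its own).

  # Producer: an endless counter that only answers demand.
  def start_producer do
    spawn(fn -> producer_loop(0) end)
  end

  defp producer_loop(next) do
    receive do
      {:ask, consumer, n} ->
        # Tag replies with our pid so a consumer can tell producers apart.
        send(consumer, {:events, self(), Enum.to_list(next..(next + n - 1))})
        producer_loop(next + n)
    end
  end

  # Consumer: collects `total` events. It first asks for `max_demand`; each time its
  # outstanding demand falls to `min_demand`, it asks for `max_demand - min_demand`
  # more, so it never has more than `max_demand` events requested.
  # Assumes 0 <= min_demand < max_demand.
  def consume(producer, total, max_demand, min_demand) do
    send(producer, {:ask, self(), max_demand})
    collect(producer, total, {max_demand, min_demand}, max_demand, [])
  end

  defp collect(_producer, total, _limits, _pending, acc) when length(acc) >= total do
    acc |> Enum.reverse() |> Enum.take(total)
  end

  defp collect(producer, total, {max_demand, min_demand} = limits, pending, acc) do
    receive do
      {:events, ^producer, events} ->
        {pending, acc} =
          Enum.reduce(events, {pending, acc}, fn event, {pending, acc} ->
            # Handling one event settles one unit of outstanding demand.
            pending = pending - 1

            if pending == min_demand do
              send(producer, {:ask, self(), max_demand - min_demand})
              {pending + max_demand - min_demand, [event | acc]}
            else
              {pending, [event | acc]}
            end
          end)

        collect(producer, total, limits, pending, acc)
    end
  end
end

ToyDemand.consume(ToyDemand.start_producer(), 12, 10, 5)
# => [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```

Because the producer only ever replies to `:ask` messages, a slow consumer simply asks less often, and the producer can never flood it.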
  25. GenStage: DemandDispatcher (diagram): the Producer dispatches events 1,2,3,4 according to demand; the three consumers receive 1 | 2,4 | 3

  26. GenStage: BroadcastDispatcher (diagram): the Producer dispatches events 1,2; every consumer receives 1,2

  27. GenStage: PartitionDispatcher (diagram): the Producer dispatches events 1,2,3,4,5,6 by rem(e, 3); the partitions receive 1,4 | 2,5 | 3,6
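In GenStage the policy is picked by the producer, for example by returning `{:producer, state, dispatcher: GenStage.BroadcastDispatcher}` from `init/1`; `GenStage.DemandDispatcher` is the default. The pure functions below are only toy models that reproduce the three pictures: `ToyDispatch` and its function names are hypothetical, and the round-robin rule in `demand/2` is an assumption chosen to match slide 25, not GenStage's exact algorithm.

```elixir
defmodule ToyDispatch do
  # Toy models of the three dispatch pictures. NOT GenStage's dispatcher code.

  # Demand-style (assumed round-robin): visit consumers in turn and give the next
  # event to each one that still has demand.
  # Returns {events_per_consumer, events_left_buffered}.
  def demand(events, demands) do
    empty = Map.new(demands, fn {consumer, _demand} -> {consumer, []} end)
    round_robin(events, demands, empty)
  end

  # Broadcast: every consumer receives every event.
  def broadcast(events, consumers) do
    Map.new(consumers, fn consumer -> {consumer, events} end)
  end

  # Partition: each event goes to exactly one partition, chosen by rem(e, n).
  def partition(events, n) do
    Enum.group_by(events, &rem(&1, n))
  end

  defp round_robin([], _queue, out), do: {finish(out), []}

  defp round_robin(events, queue, out) do
    if Enum.all?(queue, fn {_consumer, demand} -> demand == 0 end) do
      # Nobody is asking: keep the rest buffered instead of pushing it.
      {finish(out), events}
    else
      [{consumer, demand} | rest] = queue

      if demand > 0 do
        [event | more] = events
        out = Map.update!(out, consumer, &[event | &1])
        round_robin(more, rest ++ [{consumer, demand - 1}], out)
      else
        round_robin(events, rest ++ [{consumer, demand}], out)
      end
    end
  end

  # Events were prepended, so restore arrival order.
  defp finish(out), do: Map.new(out, fn {consumer, evs} -> {consumer, Enum.reverse(evs)} end)
end

ToyDispatch.demand([1, 2, 3, 4], a: 1, b: 2, c: 1)
# => {%{a: [1], b: [2, 4], c: [3]}, []}
ToyDispatch.broadcast([1, 2], [:a, :b, :c])
# => %{a: [1, 2], b: [1, 2], c: [1, 2]}
ToyDispatch.partition([1, 2, 3, 4, 5, 6], 3)
# => %{0 => [3, 6], 1 => [1, 4], 2 => [2, 5]}
```

The partition model shows why the word count on slide 14 partitions by key: every occurrence of the same key lands in the same place, so no two stages hold counts for the same word.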

  28. GenStage Example (https://github.com/nirev/gen_stage_example)

  29. Distributed

  30. Distributed
     • Not yet
     • Flow lacks the guarantees for it
     • Do we really need it?
     • More soon…
  31. Takeaways

  32. Takeaways
     • Data pipelines are now "citizens" of Elixir
     • Very familiar API for those who have used other frameworks
     • Very promising and consistent
     • No guarantees yet, which means: please don't use it to process payments ;D
  33. Thanks! https://xerpa.recruiterbox.com/ @nirev Guilherme Nogueira

  34. References
     Documentation and Talks
     • https://hexdocs.pm/gen_stage
     • https://hexdocs.pm/flow
     • https://www.youtube.com/watch?v=aZuY5-2lwW4
     • https://www.youtube.com/watch?v=IBcLOxW1Zgs
     Cases
     • http://teamon.eu/2016/tuning-elixir-genstage-flow-pipeline-processing/
     • https://blog.discordapp.com/how-discord-handles-push-request-bursts-of-over-a-million-per-minute-with-elixirs-genstage-8f899f0221b4