Elixir's GenStage and Flow

Guilherme de Maio, nirev

March 18, 2017

Transcript

  1. Data Pipelines with Flow and GenStage • Log processing • Indexing data in search systems (ES/Solr) • Recommendation algorithms • Real-time dashboards with metric aggregation • etc.
  2. Eager • Simple • All in memory • Slow for large files • 0% concurrent

    File.read!("path/to/some/file")
    # File.read!/1 returns one binary, so split it into lines before mapping
    |> String.split("\n")
    |> Enum.flat_map(&String.split/1)
    |> Enum.reduce(%{}, fn word, acc -> Map.update(acc, word, 1, &(&1 + 1)) end)
    |> Enum.to_list()
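
To try the eager pipeline without a file on disk, here is a minimal sketch that starts from an in-memory string instead of `File.read!/1`. The module name `EagerWordCount` is hypothetical, and it returns the map directly instead of calling `Enum.to_list()`, which makes the result easier to inspect.

```elixir
defmodule EagerWordCount do
  # Hypothetical helper: the slide's eager pipeline, fed a binary instead of
  # File.read!/1. Every step builds a complete intermediate list in memory.
  def count(text) when is_binary(text) do
    text
    |> String.split("\n")
    |> Enum.flat_map(&String.split/1)
    |> Enum.reduce(%{}, fn word, acc -> Map.update(acc, word, 1, &(&1 + 1)) end)
  end
end
```

For example, `EagerWordCount.count("to be or\nnot to be")` returns `%{"be" => 2, "not" => 1, "or" => 1, "to" => 2}`.
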
  3. Lazy • Better: reads one line at a time • Less memory usage • Still slow for large files • 0% concurrent

    File.stream!("path/to/some/file")
    # String.split/1 splits on any whitespace, so each line's trailing "\n" is dropped
    |> Stream.flat_map(&String.split/1)
    |> Enum.reduce(%{}, fn word, acc -> Map.update(acc, word, 1, &(&1 + 1)) end)
    |> Enum.to_list()
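
The difference between the eager and lazy versions is *when* each step runs. The sketch below records the order in which two pipeline steps touch each element; the module `Laziness` and the tags `:split` and `:count` are hypothetical labels, not part of any library. With `Enum`, the first step finishes the whole list before the second starts; with `Stream`, each element passes through both steps before the next one is read, which is why only one line needs to be held at a time.

```elixir
defmodule Laziness do
  # Returns a pass-through function that also sends {tag, element} to the
  # current process, so the mailbox records the order in which steps ran.
  def tracer(tag) do
    fn element ->
      send(self(), {tag, element})
      element
    end
  end

  # Collects the trace messages currently in the mailbox, oldest first.
  def drain(acc \\ []) do
    receive do
      {tag, element} when tag in [:split, :count] -> drain([{tag, element} | acc])
    after
      0 -> Enum.reverse(acc)
    end
  end

  # Enum: each step runs over the whole list before the next step starts.
  def eager_order(lines) do
    lines |> Enum.map(tracer(:split)) |> Enum.map(tracer(:count))
    drain()
  end

  # Stream: nothing runs until Enum.to_list/1; then each element flows
  # through both steps before the next element is pulled.
  def lazy_order(lines) do
    lines |> Stream.map(tracer(:split)) |> Stream.map(tracer(:count)) |> Enum.to_list()
    drain()
  end
end
```

With `["a", "b"]`, `eager_order/1` yields `[{:split, "a"}, {:split, "b"}, {:count, "a"}, {:count, "b"}]`, while `lazy_order/1` yields `[{:split, "a"}, {:count, "a"}, {:split, "b"}, {:count, "b"}]`.
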
  5. Concurrent • Oh, yeah! Multi-process! • Does not guarantee output order • Flow.partition() routes each word to a stage by its hash, so every occurrence of a word is counted by the same reducer

    File.stream!("path/to/some/file")
    |> Flow.from_enumerable()
    |> Flow.flat_map(&String.split/1)
    |> Flow.partition()
    |> Flow.reduce(fn -> %{} end, fn word, acc -> Map.update(acc, word, 1, &(&1 + 1)) end)
    |> Enum.to_list()
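
A sketch of the concurrent version that can run as a standalone script, assuming Elixir 1.12+ (for `Mix.install/1`) and network access to fetch the `flow` package; the version requirement `~> 1.2` and the module name `FlowWordCount` are assumptions. Because Flow does not guarantee output order, the pairs are sorted before being returned.

```elixir
Mix.install([{:flow, "~> 1.2"}])

defmodule FlowWordCount do
  # Hypothetical helper: the slide's Flow pipeline over any enumerable of lines.
  def count(lines) do
    lines
    # Turn the enumerable into the source of a flow.
    |> Flow.from_enumerable()
    # Split lines into words, in parallel across several stages.
    |> Flow.flat_map(&String.split/1)
    # Route words by hash so all copies of a word reach the same reducer stage.
    |> Flow.partition()
    # Each reducer stage starts from its own empty map (hence the fn -> %{} end).
    |> Flow.reduce(fn -> %{} end, fn word, acc -> Map.update(acc, word, 1, &(&1 + 1)) end)
    # The per-stage maps come out as {word, count} pairs in no fixed order.
    |> Enum.sort()
  end
end
```

For example, `FlowWordCount.count(["to be or", "not to be"])` returns `[{"be", 2}, {"not", 1}, {"or", 1}, {"to", 2}]`. For input this small the extra processes cost more than they save; the gain only shows on large inputs.
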
  7. Flow • A way to express computations on collections (like Enum and Stream), but executed in parallel with GenStage • Works for both bounded (finite) and unbounded (infinite) data • API inspired by Apache Spark
  8. GenStage • It’s a behaviour • Designed for exchanging data between “processing stages” • transparently • with back-pressure
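
A minimal producer/consumer pair, assuming Elixir 1.12+ and the `gen_stage` package (the `~> 1.2` version is an assumption). The module names `Counter` and `Collector` are hypothetical. In this sketch the producer only emits events from `handle_demand/2`, i.e. only as many as a consumer has asked for, which is the back-pressure the slide refers to.

```elixir
Mix.install([{:gen_stage, "~> 1.2"}])

defmodule Counter do
  use GenStage

  def start_link(first), do: GenStage.start_link(__MODULE__, first)

  @impl true
  def init(first), do: {:producer, first}

  # Invoked when consumers ask for events: emits exactly `demand`
  # consecutive integers, so it never runs ahead of its consumers.
  @impl true
  def handle_demand(demand, next) when demand > 0 do
    events = Enum.to_list(next..(next + demand - 1))
    {:noreply, events, next + demand}
  end
end

defmodule Collector do
  use GenStage

  def start_link(owner), do: GenStage.start_link(__MODULE__, owner)

  @impl true
  def init(owner), do: {:consumer, owner}

  # Forwards each batch to the owner process; consumers emit no events.
  @impl true
  def handle_events(events, _from, owner) do
    send(owner, {:batch, events})
    {:noreply, [], owner}
  end
end
```

To wire them up, start `Counter.start_link(0)` and `Collector.start_link(self())`, then call `GenStage.sync_subscribe(consumer, to: producer, max_demand: 10)`. The calling process then receives `{:batch, events}` messages starting at `0`, each batch no larger than the requested `max_demand`.
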
  9. GenStage • The consumer subscribes to the producer • The consumer dictates the pace • A producer can send to several consumers • Configurable dispatch policies
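
One dispatch policy that ships with GenStage is `GenStage.BroadcastDispatcher`, which delivers every event to every subscribed consumer (the default, `GenStage.DemandDispatcher`, hands each event to a single consumer). The sketch below assumes Elixir 1.12+ and `gen_stage ~> 1.2`; the module names are hypothetical. It also uses the producer option `demand: :accumulate`, which holds incoming demand until `GenStage.demand/2` switches it to `:forward`, so both consumers are subscribed before the first event goes out.

```elixir
Mix.install([{:gen_stage, "~> 1.2"}])

defmodule BroadcastCounter do
  use GenStage

  def start_link(first), do: GenStage.start_link(__MODULE__, first)

  # BroadcastDispatcher: every consumer receives every event.
  # demand: :accumulate buffers demand until GenStage.demand(pid, :forward).
  @impl true
  def init(first) do
    {:producer, first, dispatcher: GenStage.BroadcastDispatcher, demand: :accumulate}
  end

  @impl true
  def handle_demand(demand, next) when demand > 0 do
    {:noreply, Enum.to_list(next..(next + demand - 1)), next + demand}
  end
end

defmodule TaggedCollector do
  use GenStage

  def start_link(owner, tag), do: GenStage.start_link(__MODULE__, {owner, tag})

  @impl true
  def init(state), do: {:consumer, state}

  # Reports {:batch, tag, events} so the owner can tell consumers apart.
  @impl true
  def handle_events(events, _from, {owner, tag} = state) do
    send(owner, {:batch, tag, events})
    {:noreply, [], state}
  end
end
```

After subscribing two `TaggedCollector`s with `GenStage.sync_subscribe/2` and calling `GenStage.demand(producer, :forward)`, both consumers should see the same events starting at `0`, rather than splitting them between each other.
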
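  10. GenStage: Producer and Consumer • The consumer asks for X events; the producer sends at most X • max_demand: the maximum number of events the consumer can have pending at once (default 1000) • min_demand: when the consumer's outstanding demand drops to this number, it asks for more (default 75% of max_demand)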
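
To see these two numbers in action, the sketch below uses a producer that reports every demand it receives, and a consumer that subscribes from `init/1` with `max_demand: 10, min_demand: 5`. It assumes Elixir 1.12+ and `gen_stage ~> 1.2`; `DemandReporter` and `SlowConsumer` are hypothetical names, and the 10 ms sleep only simulates slow work.

```elixir
Mix.install([{:gen_stage, "~> 1.2"}])

defmodule DemandReporter do
  use GenStage

  def start_link(owner), do: GenStage.start_link(__MODULE__, {owner, 0})

  @impl true
  def init(state), do: {:producer, state}

  # Tells the owner how many events were asked for, then sends exactly that many.
  @impl true
  def handle_demand(demand, {owner, next}) when demand > 0 do
    send(owner, {:demand, demand})
    {:noreply, Enum.to_list(next..(next + demand - 1)), {owner, next + demand}}
  end
end

defmodule SlowConsumer do
  use GenStage

  def start_link(producer, max, min) do
    GenStage.start_link(__MODULE__, {producer, max, min})
  end

  # Subscribes during init/1 with explicit demand bounds.
  @impl true
  def init({producer, max, min}) do
    {:consumer, :ok, subscribe_to: [{producer, max_demand: max, min_demand: min}]}
  end

  # Simulated slow work; returning here lets GenStage ask for more automatically.
  @impl true
  def handle_events(_events, _from, state) do
    Process.sleep(10)
    {:noreply, [], state}
  end
end
```

With `DemandReporter.start_link(self())` followed by `SlowConsumer.start_link(producer, 10, 5)`, the first `{:demand, n}` message is expected to carry the full `max_demand` of 10, and later requests top the consumer back up in smaller increments as it works through its events.
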
  11. Distributed • Not yet • Flow still lacks guarantees • Do we really need it? • More soon…
  12. Takeaways • Data pipelines are now “citizens” of Elixir • A very familiar API for anyone who has used other frameworks such as Apache Spark • Very promising and consistent • No guarantees yet, which means: please don’t use it to process payments ;D
  13. References

    • Documentation and talks: https://hexdocs.pm/gen_stage • https://hexdocs.pm/flow • https://www.youtube.com/watch?v=aZuY5-2lwW4 • https://www.youtube.com/watch?v=IBcLOxW1Zgs
    • Cases: http://teamon.eu/2016/tuning-elixir-genstage-flow-pipeline-processing/ • https://blog.discordapp.com/how-discord-handles-push-request-bursts-of-over-a-million-per-minute-with-elixirs-genstage-8f899f0221b4