Slide 1

Go with the Flow: Elixir's GenStage and Flow
Guilherme de Maio (@nirev)

Slide 2

Slide 2 text

Why Flow and GenStage?

Slide 3

Slide 3 text

Data Pipelines

Slide 4

Slide 4 text

Data Pipelines
• Log processing
• Indexing data in search systems (Elasticsearch/Solr)
• Recommendation algorithms
• Real-time dashboards with metric aggregation
• etc.

Slide 5

Slide 5 text

Data Processing
1. Eager
2. Lazy
3. ???
4. PROFIT

Slide 6

Slide 6 text

Data Processing
1. Eager
2. Lazy
3. Concurrent
4. Distributed

Slide 7

Slide 7 text

Eager

Slide 8

Slide 8 text

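Eager

    # Read the whole file into memory as a single binary
    File.read!("path/to/some/file")
    # A binary is not enumerable, so split it into lines first
    |> String.split("\n")
    # Split every line into words
    |> Enum.flat_map(&String.split(&1, " "))
    # Count occurrences of each word
    |> Enum.reduce(%{}, fn word, acc -> Map.update(acc, word, 1, &(&1 + 1)) end)
    |> Enum.to_list()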

Slide 9

Slide 9 text

Eager
• Simple
• All in memory
• Slow for large files
• 0% concurrent

Slide 10

Slide 10 text

Lazy

Slide 11

Slide 11 text

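Lazy

    # Stream the file line by line instead of reading it all at once
    File.stream!("path/to/some/file")
    # Lazily split each line into words
    |> Stream.flat_map(&String.split(&1, " "))
    # The reduce is what actually pulls lines through the stream
    |> Enum.reduce(%{}, fn word, acc -> Map.update(acc, word, 1, &(&1 + 1)) end)
    |> Enum.to_list()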

Slide 12

Slide 12 text

Lazy
• Better: one line at a time
• Less memory usage
• Still slow for large files
• 0% concurrent
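The memory claim can be checked without a real file. The sketch below is a stdlib-only illustration (the module `LazySketch` and its fake "file" built from `1..1_000_000` are hypothetical, not from the talk): it builds a million-line stream, splits it into words with `Stream.flat_map` as the slide does, and takes only the first few words. The counter kept in the process dictionary shows that only the lines actually needed get generated.

```elixir
defmodule LazySketch do
  # Hypothetical helper (not from the talk): takes the first `n` words from a
  # lazily generated million-line "file" and reports how many lines were built.
  def first_words(n) do
    Process.put(:lines_generated, 0)

    words =
      1..1_000_000
      |> Stream.map(fn i ->
        # Side effect used only to count how many lines are really produced
        Process.put(:lines_generated, Process.get(:lines_generated) + 1)
        "word#{i} other#{i}"
      end)
      # Same shape as the slide: split each line into words, lazily
      |> Stream.flat_map(&String.split(&1, " "))
      # Only now does the pipeline run, and it stops once n words exist
      |> Enum.take(n)

    {words, Process.get(:lines_generated)}
  end
end

{words, lines} = LazySketch.first_words(4)
IO.inspect(words)
# ["word1", "other1", "word2", "other2"]
IO.inspect(lines, label: "lines generated")
```

Swapping `Stream.map`/`Stream.flat_map` for their `Enum` counterparts would build all million lines before taking anything, which is the eager behaviour from the previous slides.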

Slide 13

Slide 13 text

Concurrent

Slide 14

Slide 14 text

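Concurrent

    File.stream!("path/to/some/file")
    # Turn the stream into a Flow; lines are consumed by several stages
    |> Flow.from_enumerable()
    |> Flow.flat_map(&String.split(&1, " "))
    # Route words by hash so the same word always reaches the same stage
    |> Flow.partition()
    # Each partition keeps its own map of counts
    |> Flow.reduce(fn -> %{} end, fn word, acc -> Map.update(acc, word, 1, &(&1 + 1)) end)
    |> Enum.to_list()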

Slide 15

Slide 15 text

Concurrent
• Oh, yeah! Multi-process!
• Does not guarantee order
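Flow is an external dependency, so as a rough, stdlib-only point of comparison (this is not Flow and not how Flow is implemented), the sketch below mimics the same two ideas with `Task.async_stream/3`: splitting lines concurrently, then partitioning words by hash so each partition can be reduced independently. The module name `ConcurrentWordCount`, the partition count of 4, and the use of `:erlang.phash2/2` as the hash are assumptions made for illustration.

```elixir
defmodule ConcurrentWordCount do
  # Stdlib-only sketch of the idea behind the Flow pipeline (NOT Flow itself).
  # Assumed: 4 partitions, chosen with :erlang.phash2/2.
  @partitions 4

  def count(lines) do
    lines
    # Step 1: split lines into words concurrently; result order is not kept
    |> Task.async_stream(&String.split/1, ordered: false)
    |> Enum.flat_map(fn {:ok, words} -> words end)
    # Partition step: the same word always lands in the same partition
    |> Enum.group_by(&:erlang.phash2(&1, @partitions))
    |> Map.values()
    # Step 2: reduce each partition in its own task
    |> Task.async_stream(
      fn words ->
        Enum.reduce(words, %{}, fn word, acc -> Map.update(acc, word, 1, &(&1 + 1)) end)
      end,
      ordered: false
    )
    # Partitions hold disjoint keys, so merging never combines counts
    |> Enum.reduce(%{}, fn {:ok, partial}, acc -> Map.merge(acc, partial) end)
  end
end

ConcurrentWordCount.count(["to be or", "not to be"])
# => %{"be" => 2, "not" => 1, "or" => 1, "to" => 2}
```

Unlike Flow, this materializes every word in the caller between the two steps and has no back-pressure; it only illustrates why partitioning by key makes the final merge conflict-free. Partial results arrive in no particular order, which is harmless here because the output is a map. It also splits on any whitespace with `String.split/1`, so trailing newlines do not end up inside words.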

Slide 17

Slide 17 text

Flow

Slide 18

Slide 18 text

Flow
• A way to express computations on collections, like Enum and Stream, but run in parallel on top of GenStage
• Works with both bounded and unbounded data
• API inspired by Apache Spark

Slide 19

Slide 19 text

GenStage

Slide 20

Slide 20 text

GenStage
• It's a behaviour
• Built for exchanging data between "processing stages", transparently and with back-pressure

Slide 21

Slide 21 text

GenStage
[Diagram: a pipeline of stages: Producer → Producer/Consumer → Producer/Consumer → Consumer]

Slide 22

Slide 22 text

GenStage
• The Consumer subscribes to the Producer
• The Consumer dictates the pace
• A Producer can send to several Consumers
• Dispatch policies are configurable

Slide 23

Slide 23 text

GenStage
[Diagram: the Consumer asks the Producer for X events; the Producer sends at most X.]
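The ask-then-send contract can be modelled with plain processes. The sketch below is hypothetical and uses only `spawn`, `send` and `receive` (it is not the GenStage API, and `PullSketch` is an invented name): the producer never pushes anything on its own, it only answers `{:ask, from, n}` requests with at most `n` events.

```elixir
defmodule PullSketch do
  # Hypothetical, stdlib-only model of demand-driven exchange (NOT GenStage).

  # Producer loop: holds the events it has not sent yet and waits for demand.
  def producer(events) do
    receive do
      {:ask, from, n} ->
        # Send at most n events; keep the rest for later requests
        {batch, rest} = Enum.split(events, n)
        send(from, {:events, batch})
        producer(rest)
    end
  end

  # Consumer side: ask for n events and wait for the reply.
  def ask(producer, n) do
    send(producer, {:ask, self(), n})

    receive do
      {:events, batch} -> batch
    after
      1_000 -> :timeout
    end
  end
end

producer = spawn(fn -> PullSketch.producer(Enum.to_list(1..10)) end)
PullSketch.ask(producer, 3)
# => [1, 2, 3]
PullSketch.ask(producer, 5)
# => [4, 5, 6, 7, 8]
PullSketch.ask(producer, 5)
# => [9, 10]  (fewer than asked: at most X, never more)
```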

Slide 24

Slide 24 text

GenStage
[Diagram: the Consumer asks the Producer for X events; the Producer sends at most X.]
• max_demand: the maximum number of events the consumer may have pending at once
• min_demand: the low-water mark; once pending events drop to this number, the consumer asks for more
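A simplified model of how the two thresholds interact, assuming the producer always has events available and the consumer handles one event at a time. `DemandSketch` is hypothetical and is not GenStage's internal algorithm; it only shows the bookkeeping: ask for `max_demand` up front, then top pending demand back up whenever it falls to `min_demand`.

```elixir
defmodule DemandSketch do
  # Hypothetical, simplified model of consumer-side demand accounting.
  # NOT GenStage's internal algorithm.

  # While more than min_demand events are still pending, ask for nothing;
  # otherwise top pending demand back up to max_demand.
  def next_ask(pending, _max_demand, min_demand) when pending > min_demand, do: 0
  def next_ask(pending, max_demand, _min_demand), do: max_demand - pending

  # Processes `n` events one by one (assuming the producer never runs dry)
  # and returns every demand request the consumer sends, in order.
  def simulate(n, max_demand, min_demand) do
    # The first request is always max_demand
    initial = {[max_demand], max_demand}

    {asks, _pending} =
      Enum.reduce(List.duplicate(:event, n), initial, fn _event, {asks, pending} ->
        # One event has been processed
        pending = pending - 1

        case next_ask(pending, max_demand, min_demand) do
          0 -> {asks, pending}
          ask -> {[ask | asks], pending + ask}
        end
      end)

    Enum.reverse(asks)
  end
end

DemandSketch.simulate(10, 4, 2)
# => [4, 2, 2, 2, 2, 2]
```

For example, with `max_demand` 1000 and `min_demand` 750, `next_ask(750, 1000, 750)` returns 250, so every request after the first comes in batches of `max_demand - min_demand`.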

Slide 25

Slide 25 text

GenStage: DemandDispatcher
[Diagram: the Producer dispatches events 1, 2, 3, 4 according to consumer demand; the consumers receive 1; 2, 4; and 3.]

Slide 26

Slide 26 text

GenStage: BroadcastDispatcher
[Diagram: the Producer dispatches events 1, 2; every consumer receives 1, 2.]
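To contrast the two dispatch policies above, here are hypothetical, stdlib-only models (`DispatchSketch` is not part of GenStage). The demand model assumes a simple rule, sending each event to the consumer with the largest outstanding demand, which is a simplified reading of GenStage's default DemandDispatcher; the real dispatcher works on batches and subscriptions.

```elixir
defmodule DispatchSketch do
  # Hypothetical, stdlib-only models of two dispatch policies. They ignore
  # batching, buffering and subscriptions; they only show who gets what.

  # Demand-style (assumed rule): each event goes to the consumer with the
  # largest outstanding demand; ties go to the consumer listed first.
  # `demands` is a keyword list like [a: 1, b: 2, c: 1] and is assumed to
  # cover at least as many events as are dispatched.
  def by_demand(events, demands) do
    received = Map.new(demands, fn {name, _demand} -> {name, []} end)

    {_demands, received} =
      Enum.reduce(events, {demands, received}, fn event, {demands, received} ->
        {name, demand} = Enum.max_by(demands, fn {_name, d} -> d end)
        # keyreplace keeps list order stable, so tie-breaking stays predictable
        demands = List.keyreplace(demands, name, 0, {name, demand - 1})
        {demands, Map.update!(received, name, &[event | &1])}
      end)

    # Events were prepended, so restore arrival order per consumer
    Map.new(received, fn {name, evs} -> {name, Enum.reverse(evs)} end)
  end

  # Broadcast-style: every consumer receives every event.
  def broadcast(events, consumers) do
    Map.new(consumers, fn name -> {name, events} end)
  end
end

DispatchSketch.by_demand([1, 2, 3, 4], a: 1, b: 2, c: 1)
# => %{a: [2], b: [1, 3], c: [4]}
DispatchSketch.broadcast([1, 2], [:a, :b, :c])
# => %{a: [1, 2], b: [1, 2], c: [1, 2]}
```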

Slide 27

Slide 27 text

GenStage: PartitionDispatcher
[Diagram: the Producer dispatches events 1, 2, 3, 4, 5, 6, partitioned by rem(e, 3); the partitions receive 1, 4; 2, 5; and 3, 6.]
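The slide's `rem(e, 3)` example can be reproduced with a small stdlib-only model (`PartitionSketch` is hypothetical, not `GenStage.PartitionDispatcher`): a hash function maps each event to a partition, so events with the same hash always land on the same consumer.

```elixir
defmodule PartitionSketch do
  # Hypothetical, stdlib-only model of partitioned dispatch (NOT GenStage's
  # PartitionDispatcher): `hash` maps an event to a partition 0..partitions-1.
  def partition(events, partitions, hash) do
    # Start with an empty list for every partition
    empty = Map.new(0..(partitions - 1), &{&1, []})

    events
    # Map.update! raises if the hash returns an unknown partition
    |> Enum.reduce(empty, fn event, acc -> Map.update!(acc, hash.(event), &[event | &1]) end)
    # Events were prepended, so restore their original order
    |> Map.new(fn {p, evs} -> {p, Enum.reverse(evs)} end)
  end
end

PartitionSketch.partition(1..6, 3, &rem(&1, 3))
# => %{0 => [3, 6], 1 => [1, 4], 2 => [2, 5]}
```

In GenStage itself, partitioned dispatch is configured on the producer through `GenStage.PartitionDispatcher` and its `:partitions` and `:hash` options, and each consumer subscribes to one partition.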

Slide 28

Slide 28 text

GenStage Example (https://github.com/nirev/gen_stage_example)

Slide 29

Slide 29 text

Distributed

Slide 30

Slide 30 text

Distributed
• Not yet
• Flow lacks guarantees
• Do we really need it?
• More soon…

Slide 31

Slide 31 text

Takeaways

Slide 32

Slide 32 text

Takeaways
• Data pipelines are now first-class citizens in Elixir
• A very familiar API for those who have used other frameworks
• Very promising and consistent
• No guarantees yet, which means please don't use it to process payments ;D

Slide 33

Thanks!
https://xerpa.recruiterbox.com/
@nirev, Guilherme Nogueira

Slide 34

References

Documentation and talks
• https://hexdocs.pm/gen_stage
• https://hexdocs.pm/flow
• https://www.youtube.com/watch?v=aZuY5-2lwW4
• https://www.youtube.com/watch?v=IBcLOxW1Zgs

Cases
• http://teamon.eu/2016/tuning-elixir-genstage-flow-pipeline-processing/
• https://blog.discordapp.com/how-discord-handles-push-request-bursts-of-over-a-million-per-minute-with-elixirs-genstage-8f899f0221b4