Crashing BEAM Applications @ElixirConf.EU 2020

We often talk about embracing failure, building resilient applications, and how the BEAM helps us do that. Nonetheless, there are several ways you can write code that will crash your application, and sometimes the whole VM! This talk is about those things, showing several ways I may or may not have managed to crash the BEAM in the past: some obvious, and some not so obvious ;)

Guilherme de Maio, nirev

October 08, 2020
Transcript

  1. @nirev
     Based in São Paulo, Brazil
     Elixir since 2015 (mostly Java and C before that)
     co-organizer of SP Elixir User Group ;)
  2. @nirev
     Based in São Paulo, Brazil
     Elixir since 2015 (mostly Java and C before that)
     co-organizer of SP Elixir User Group ;)
     @Telnyx since 2017
  3. @nirev
     defmodule X do
       def explode do
         Enum.each(1..2_000_000, fn i ->
           IO.puts(" #{i}")

           20
           |> :crypto.strong_rand_bytes()
           |> Base.encode64()
           |> String.to_atom()
         end)
       end
     end
  4. @nirev
     82534 ... 82535 ... 82536 ... 82537 ... 82538 ... 82539 ... 82540 ...
     no more index entries in atom_tab (max=100000)
     Crash dump is being written to: erl_crash.dump ...done
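The crash above happens because atoms are never garbage collected, so creating them dynamically from input eventually exhausts the atom table. A minimal sketch of the usual defensive pattern (the `safe_convert` helper is illustrative, not from the talk):

```elixir
# String.to_existing_atom/1 only succeeds for atoms the VM already knows,
# raising ArgumentError instead of leaking a new atom table entry.
safe_convert = fn string ->
  try do
    {:ok, String.to_existing_atom(string)}
  rescue
    ArgumentError -> {:error, :unknown_atom}
  end
end

safe_convert.("ok")
# => {:ok, :ok} (the atom :ok already exists)
```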
  5. @nirev
     defmodule X do
       def spawn do
         spawn(fn ->
           {:ok, _pid} = Agent.start_link(fn -> 42 end)
           Process.sleep(1_000)
           exit(:normal)
         end)
       end
     end
  6. @nirev
     iex(1)> Process.list() |> Enum.count()
     50
     iex(2)> for _ <- 1..10, do: X.spawn
     [#PID<0.94.0>, #PID<0.95.0>, #PID<0.96.0>, #PID<0.97.0>, #PID<0.98.0>,
      #PID<0.99.0>, #PID<0.100.0>, #PID<0.101.0>, #PID<0.102.0>, #PID<0.103.0>]
     iex(3)> Process.list() |> Enum.count()
     70
     iex(4)> Process.sleep(2_000)
     :ok
     iex(5)> Process.list() |> Enum.count()
     60
  7. @nirev
     ref = Process.monitor(pid)

     receive do
       {:DOWN, ^ref, :process, ^pid, _reason} ->
         # receive DOWN message and do something
     end
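Putting the last three slides together: a hedged sketch of using a monitor to stop a leftover agent even when its owner exits with `:normal`, which a link alone would ignore (the `watch` helper is illustrative):

```elixir
# A small watcher process: stops the agent once the watched pid exits,
# for any reason, including exit(:normal).
watch = fn pid, agent ->
  spawn(fn ->
    ref = Process.monitor(pid)

    receive do
      {:DOWN, ^ref, :process, ^pid, _reason} -> Agent.stop(agent)
    end
  end)
end

{:ok, agent} = Agent.start(fn -> 42 end)
owner = spawn(fn -> Process.sleep(100) end)
watch.(owner, agent)

Process.sleep(300)
Process.alive?(agent)
# => false (the watcher stopped it)
```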
  8. @nirev
     [diagram] request -> Tracker: read data from db, api call, update db, publish to queue, CRASH

     breadcrumbs = Tracker.get_breadcrumbs(tracker)
     send_report(exception, breadcrumbs)
  9. @nirev
     For each request:
     - start a new Agent
     - monitor the agent and the process
     - kill the agent when the process ends normally, OR, in case of an exception:
       take breadcrumbs from the agent, report, and kill it
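The breadcrumb store described above can be sketched as one Agent per request; this Tracker module is a hypothetical reconstruction, not the talk's actual code:

```elixir
defmodule Tracker do
  # Hypothetical sketch: an Agent holding the request's breadcrumbs.
  def start, do: Agent.start(fn -> [] end)

  # Prepend for O(1) insert; reverse on read to restore order.
  def add_breadcrumb(tracker, crumb) do
    Agent.update(tracker, fn crumbs -> [crumb | crumbs] end)
  end

  def get_breadcrumbs(tracker) do
    tracker |> Agent.get(& &1) |> Enum.reverse()
  end

  def stop(tracker), do: Agent.stop(tracker)
end

{:ok, tracker} = Tracker.start()
Tracker.add_breadcrumb(tracker, "read data from db")
Tracker.add_breadcrumb(tracker, "api call")
Tracker.get_breadcrumbs(tracker)
# => ["read data from db", "api call"]
Tracker.stop(tracker)
```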
  10. @nirev
      Cowboy implements the keep-alive mechanism by reusing the same process for all requests. This allows Cowboy to save memory. This works well because most code will not have any side effect impacting subsequent requests. But it also means you need to clean up if you do have code with side effects. The terminate/3 function can be used for this purpose.
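A hedged sketch of what that cleanup might look like in a plain Cowboy 2.x handler; the module name, state shape, and response are illustrative, the talk does not show this code:

```elixir
# Sketch of a bare Cowboy 2.x handler that cleans up per-request state in
# terminate/3, since the connection process is reused across keep-alive
# requests. Assumes the handler state is a map.
defmodule BreadcrumbHandler do
  def init(req, state) do
    {:ok, tracker} = Agent.start(fn -> [] end)
    req = :cowboy_req.reply(200, %{}, "ok", req)
    {:ok, req, Map.put(state, :tracker, tracker)}
  end

  # Called when the request is done, even if the process stays alive
  # to serve the next request on the same connection.
  def terminate(_reason, _req, %{tracker: tracker}) do
    Agent.stop(tracker)
    :ok
  end

  def terminate(_reason, _req, _state), do: :ok
end
```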
  11. @nirev
      |No |Pid       |Memory      |Name or Initial Call |Reductions |MsgQueue |Current Function                   |
      |1  |<0.433.0> |137.1796 MB |inet_gethost_native  |3016210668 |179710   |inet_gethost_native:do_handle_call |
  12. @nirev
      Private Heap Garbage Collection
      (Process Control Block, Stack, Heap)

      Generational:
      • young generation: newly allocated data
      • old generation: data that survives GC

      Fullsweep vs generational runs, tuned by:
      • min_heap_size
      • fullsweep_after
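Both knobs can be set per process at spawn time via spawn options; a small sketch (the values are illustrative, not recommendations):

```elixir
# min_heap_size and fullsweep_after are per-process spawn options.
pid =
  Process.spawn(
    fn ->
      receive do
        :stop -> :ok
      end
    end,
    [{:min_heap_size, 4096}, {:fullsweep_after, 10}]
  )

# Both show up in the process's garbage collection info:
{:garbage_collection, info} = Process.info(pid, :garbage_collection)
Keyword.get(info, :fullsweep_after)
# => 10
send(pid, :stop)
```

Note that the VM may round min_heap_size up to the next valid heap size, so the reported value can be slightly larger than the one requested.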
  13. @nirev
      Shared Heap Garbage Collection
      Reference counting:
      • any binary without references will be cleaned
      (refc binaries live on the shared heap)
  14. @nirev
      The problem with Message Router:
      It receives messages > 64 bytes (refc binaries), so it adds references to them on its heap. But since it doesn't use much memory itself, it won't grow past min_heap_size, so GC rarely runs. That means the references are not cleaned, and large binaries linger on the shared heap.
  15. @nirev
      The SOLUTION for Message Router:
      1) the fullsweep_after process flag: configures how often a fullsweep GC will happen, which collects ProcBins
      2) hibernating the process: when it hibernates, a fullsweep GC runs
      3) moving the work to short-lived processes, if possible
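Option 2) can be sketched with a GenServer: returning `:hibernate` from a callback makes the process hibernate, which runs a fullsweep GC and drops its references to large binaries. The Router module here is illustrative, not the talk's code:

```elixir
defmodule Router do
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  @impl true
  def init(:ok), do: {:ok, %{}}

  @impl true
  def handle_cast({:route, _msg}, state) do
    # ... forward the message somewhere ...
    # Returning :hibernate triggers a fullsweep GC after the callback,
    # releasing references to refc binaries this process has seen.
    {:noreply, state, :hibernate}
  end
end

{:ok, pid} = Router.start_link()
GenServer.cast(pid, {:route, :binary.copy("x", 100)})
```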
  16. @nirev
      $ iex
      Erlang/OTP 23 [erts-11.0] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe]
      Interactive Elixir (1.10.4) - press Ctrl+C to exit (type h() ENTER for help)
      iex(1)> Node.list
      []
  17. @nirev
      $ iex --sname node1@localhost --cookie ilovecookies
      Erlang/OTP 23 [erts-11.0] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe]
      Interactive Elixir (1.10.4) - press Ctrl+C to exit (type h() ENTER for help)
      iex(node1@localhost)1> Node.list
      []
  18. @nirev
      $ iex --sname node2 --cookie ilovecookies --remsh node1@localhost
      Erlang/OTP 23 [erts-11.0] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe]
      Interactive Elixir (1.10.4) - press Ctrl+C to exit (type h() ENTER for help)
      iex(node1@localhost)1> Node.list
      [:node2@C02T24Z2HF1R]
  19. @nirev
      $ iex --sname node2 --cookie ilovecookies --remsh node1@localhost
      Erlang/OTP 23 [erts-11.0] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe]
      Interactive Elixir (1.10.4) - press Ctrl+C to exit (type h() ENTER for help)
      iex(node1@localhost)1> Node.list
      [:node2@C02T24Z2HF1R]
  20. @nirev
      https://ferd.github.io/recon/
      "Recon is a library to be dropped into any other Erlang project, to be used to assist DevOps people diagnose problems in production nodes."
      recon
  21. @nirev
      iex( ..)1> :recon.bin_leak(3)
      [
        {#PID<0.80.0>, -606,
         [current_function: {Process, :sleep, 1},
          initial_call: {:erlang, :apply, 2}]},
        {#PID<0.124.0>, -176,
         [:ssl_manager,
          {:current_function, {:gen_server, :loop, 7}},
          {:initial_call, {:proc_lib, :init_p, 5}}]},
        {#PID<0.1905.0>, -165,
         [current_function: {:ranch_conns_sup, :loop, 4},
          initial_call: {:proc_lib, :init_p, 5}]}
      ]
      recon
  22. @nirev
      vmstats: sends VM metrics to statsd (https://github.com/ferd/vmstats)
      prometheus: sends VM metrics to… prometheus (https://github.com/deadtrickster/prometheus.ex)