Zen and the Art of Elixir Applications Monitoring

Application monitoring is one of the most crucial factors in an application's success.
This talk is a journey from metrics to alerting, visiting major highlights along the way, including aggregation and downsampling concerns, and showing how it all ties together.

Aleksei Magusev

November 19, 2020

Transcript

  1. Retention and downsampling

     Before:

     measurement  value  time
     temperature  21     9:11
     temperature  24     9:12
     temperature  21     9:13
     temperature  23     9:14
     temperature  22     9:15
     temperature  23     9:16

     After:

     measurement  value  time
     temperature  22     9:13
     temperature  23     9:16
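The slide does not say which aggregation produces the "After" table. A minimal sketch, assuming the rounded mean over windows of three points and the timestamp of the last point in each window, reproduces the values shown. The module `DownsampleSketch` and its function are hypothetical names, not part of any library.

```elixir
defmodule DownsampleSketch do
  # Hypothetical helper: collapses consecutive points into fixed-size windows.
  # Each point is a {value, time} tuple; `window` is the number of points per bucket.
  # Assumption: the bucket value is the rounded mean and the bucket time is the
  # time of the last point in the window.
  def downsample(points, window) when is_integer(window) and window > 0 do
    points
    |> Enum.chunk_every(window)
    |> Enum.map(fn chunk ->
      values = Enum.map(chunk, fn {value, _time} -> value end)
      {_value, last_time} = List.last(chunk)
      {round(Enum.sum(values) / length(values)), last_time}
    end)
  end
end

points = [
  {21, "9:11"}, {24, "9:12"}, {21, "9:13"},
  {23, "9:14"}, {22, "9:15"}, {23, "9:16"}
]

IO.inspect(DownsampleSketch.downsample(points, 3))
# => [{22, "9:13"}, {23, "9:16"}]
```

Other aggregations (max, last value, percentiles) fit the same shape; which one is appropriate depends on what the metric is used for.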
  2. Alerting

     measurement  value  time
     temperature  22     9:31
     temperature  25     9:32
     temperature  31     9:33
     temperature  35     9:34
     temperature  37     9:35

     If temperature > 33:
     ⚠ The temperature value 35 is beyond the acceptable limit
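The threshold rule on this slide can be sketched as a function that scans the points in order and reports the first value above the limit. This is an illustration only: `AlertSketch` is a hypothetical name, and real alerting systems usually evaluate rules over time windows rather than single points.

```elixir
defmodule AlertSketch do
  # Hypothetical rule evaluator: returns the first point whose value exceeds
  # `threshold`, or :ok when no point does.
  def evaluate(points, threshold) do
    case Enum.find(points, fn {value, _time} -> value > threshold end) do
      {value, time} -> {:alert, value, time}
      nil -> :ok
    end
  end
end

points = [{22, "9:31"}, {25, "9:32"}, {31, "9:33"}, {35, "9:34"}, {37, "9:35"}]

IO.inspect(AlertSketch.evaluate(points, 33))
# => {:alert, 35, "9:34"}
```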
  3. Metric anatomy

     measurement           labels    value  timestamp
     ecto_repo_queue_time  repo=foo  14     123
     ecto_repo_query_time  repo=foo  57     123
     ecto_repo_decode_time repo=foo  4      124
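To make the four parts concrete, here is a small sketch that serializes one row into a single text line in the style of the InfluxDB line protocol (`measurement,tag=value field=value timestamp`). The slide does not name a wire format, so the choice of format, the field name `value`, and the module `MetricSketch` are all assumptions; escaping of special characters is deliberately left out.

```elixir
defmodule MetricSketch do
  # Hypothetical encoder. Assumes `measurement` is a string, `labels` is a
  # keyword list, and `value` and `timestamp` are integers. The trailing "i"
  # marks an integer field in the line protocol. No escaping is performed.
  def to_line(measurement, labels, value, timestamp)
      when is_integer(value) and is_integer(timestamp) do
    tags = Enum.map(labels, fn {key, val} -> ",#{key}=#{val}" end)

    IO.iodata_to_binary([
      measurement,
      tags,
      " value=#{value}i ",
      Integer.to_string(timestamp)
    ])
  end
end

IO.puts(MetricSketch.to_line("ecto_repo_queue_time", [repo: "foo"], 14, 123))
# prints: ecto_repo_queue_time,repo=foo value=14i 123
```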
  4. def handle_event([:my_app, :repo, :query], measurements, metadata, %{}) do
       fields =
         []
         |> Fluxter.put_request_id()
         |> include_if_present(measurements, :queue_time)
         |> include_if_present(measurements, :query_time)
         |> include_if_present(measurements, :decode_time)

       MyApp.Fluxter.write("ecto_repo_exec", [repo: metadata.repo], fields)
     end

     defp include_if_present(fields, measurements, key) do
       case measurements do
         %{^key => value} ->
           microseconds = System.convert_time_unit(value, :native, :microsecond)
           [{key, microseconds} | fields]

         _ ->
           fields
       end
     end

     lexmag/fluxter
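The field-building part of the handler above can be tried in isolation, without Fluxter or a running repo. The sketch below reuses the slide's `include_if_present/3` logic inside a hypothetical `FieldsSketch` module and feeds it a hand-made measurements map; it only shows that keys missing from the map are skipped and that present values are converted from native time units to microseconds.

```elixir
defmodule FieldsSketch do
  # Hypothetical wrapper around the slide's helper, for experimentation only.
  @keys [:queue_time, :query_time, :decode_time]

  def build(measurements) do
    Enum.reduce(@keys, [], fn key, fields ->
      include_if_present(fields, measurements, key)
    end)
  end

  defp include_if_present(fields, measurements, key) do
    case measurements do
      %{^key => value} ->
        microseconds = System.convert_time_unit(value, :native, :microsecond)
        [{key, microseconds} | fields]

      _ ->
        fields
    end
  end
end

# Build a measurement in native units, as a telemetry event would carry it.
five_ms = System.convert_time_unit(5, :millisecond, :native)

IO.inspect(FieldsSketch.build(%{query_time: five_ms}))
IO.inspect(FieldsSketch.build(%{}))
```

Because absent keys produce no field at all, a downstream store sees a missing value rather than a misleading zero.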
  5. VM metrics

     :erlang.system_info/1
     ‣ atom_count
     ‣ process_count
     ‣ port_count

     :erlang.process_info/2
     ‣ message_queue_len

     :erlang.statistics/1
     ‣ run_queue
     ‣ garbage_collection
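The calls listed on this slide can be gathered into one snapshot. The sketch below is one possible arrangement, assuming a hypothetical `VMMetricsSketch` module and field names chosen for illustration; only the `:erlang` functions and their item names come from the slide.

```elixir
defmodule VMMetricsSketch do
  # Hypothetical collector: returns a keyword list of node-wide VM metrics.
  def collect do
    # :erlang.statistics(:garbage_collection) returns
    # {number_of_gcs, words_reclaimed, 0}.
    {gc_count, words_reclaimed, _} = :erlang.statistics(:garbage_collection)

    [
      atom_count: :erlang.system_info(:atom_count),
      process_count: :erlang.system_info(:process_count),
      port_count: :erlang.system_info(:port_count),
      run_queue: :erlang.statistics(:run_queue),
      gc_count: gc_count,
      gc_words_reclaimed: words_reclaimed
    ]
  end

  # Message queue length is per process; returns nil if the process is dead.
  def message_queue_len(pid) when is_pid(pid) do
    case :erlang.process_info(pid, :message_queue_len) do
      {:message_queue_len, len} -> len
      :undefined -> nil
    end
  end
end

IO.inspect(VMMetricsSketch.collect())
IO.inspect(VMMetricsSketch.message_queue_len(self()))
```

A periodic process could call `collect/0` on an interval and write the result through the same reporter used for application metrics.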
  6. Reporting app start

     # We do this at compile-time as we don't have Mix available in releases.
     @version Mix.Project.config().version

     def start(_type, _args) do
       #...
       with {:ok, _} = result <- Supervisor.start_link(children, options) do
         MyApp.Fluxter.write("start", version: @version)
         result
       end
     end
  7. Host metrics

     ‣ CPU (usage, idle, iowait, etc.)
     ‣ Memory (used, swap, etc.)
     ‣ Disks (free_space, utilization, etc.)
     ‣ Network (bytes in/out, drops, errors, etc.)