My Elixir App is crashing more than my Ruby App!

Elle Imhoff
September 01, 2016

This is the story of some of the production bugs I’ve had to fix in our Elixir app that didn’t affect our Ruby apps. Both our Elixir app and our Ruby apps use RabbitMQ clients to talk to RabbitMQ RPC servers on the other apps, so they have similar implementations.

Transcript

  1. My Elixir App is crashing more than my Ruby App!

    The wording in docs is specific and meaningful
  2. Process leaks are memory leaks

    1. Call Phoenix Controller action
    2. Start a GenServer with start_link
    3. Phoenix Controller action process exits
    4. GenServer keeps running

    In the Elixir app I start an RPC client (a GenServer) during controller actions that need data from the Ruby app. I used start_link, so I expected the GenServer to be cleaned up when the controller action exited. Figuring out that this was the memory leak was difficult because our host uses Herokuish, so we couldn’t use Observer. I eventually got that working with reverse SSH tunnels. If anyone needs a similar solution, come talk to me. It’s tied to our RPC servers right now.
  3. Process leaks are memory leaks

    1. Phoenix Controller exit is :normal
    2. :normal exit doesn’t kill linked GenServer

    This bug was due to me not thinking through how start_link works with GenServer: a controller exiting is a normal exit, not a crash, and a normal exit doesn’t kill a linked GenServer.
  4. Process leaks are memory leaks

    Trap Exits!

    This bug killed our Elixir app by eating all 1 GB of its container’s memory, because each client was eating about 4 MB of memory due to deserializing JSONAPI documents. The fix was a single-line change to trap exits.
  5. Process leaks are memory leaks

    The docs for GenServer.start_link/3 say that the GenServer will only exit when the linked process crashes, but it didn’t click that I needed to trap exits for the controller process exiting.
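
The leak and the one-line fix from slides 2 to 5 can be reproduced without Phoenix. The following is a minimal sketch, not the app's actual code: a plain spawned process stands in for the controller action process, and the module names `LeakyClient`, `TrappingClient`, and `Demo` are hypothetical. It relies on standard OTP behaviour: a linked process ignores a `:normal` exit signal unless it traps exits, and a GenServer that traps exits stops when the process that started it exits.

```elixir
defmodule LeakyClient do
  use GenServer

  # Linked to its parent, but a :normal exit signal from the parent is ignored.
  @impl true
  def init(:ok), do: {:ok, %{}}
end

defmodule TrappingClient do
  use GenServer

  @impl true
  def init(:ok) do
    # With exits trapped, the parent's exit (even :normal) reaches the
    # GenServer, which then stops with the same reason.
    Process.flag(:trap_exit, true)
    {:ok, %{}}
  end
end

defmodule Demo do
  # Starts `module` from a short-lived process (the stand-in for a controller
  # action) and reports whether the client outlives it.
  def client_alive_after_parent_exit?(module) do
    caller = self()

    spawn(fn ->
      {:ok, client} = GenServer.start_link(module, :ok)
      send(caller, {:client, client})
      # Returning here ends this process with reason :normal.
    end)

    client =
      receive do
        {:client, pid} -> pid
      end

    ref = Process.monitor(client)

    receive do
      {:DOWN, ^ref, :process, ^client, _reason} -> false
    after
      200 ->
        # Still running: this is the leak. Clean up so the demo does not leak too.
        Process.demonitor(ref, [:flush])
        Process.exit(client, :kill)
        true
    end
  end
end

IO.inspect(Demo.client_alive_after_parent_exit?(LeakyClient), label: "linked only")
IO.inspect(Demo.client_alive_after_parent_exit?(TrappingClient), label: "trapping exits")
```

The linked-only client is still alive after its parent has exited, while the client that traps exits stops together with its parent.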
  6. RabbitMQ crashes if you have too many queues

    1. Declare RabbitMQ
       a. Auto-delete
       b. exclusive

    We kept getting alerts for queues not draining and for dead (unconnected) queues. We declare the queues as auto-delete and exclusive in both Ruby and Elixir, but only Elixir was leaking queues, so what’s the problem?
  7. Ruby

    1. Client creates a connection
    2. Client creates a channel on that connection
    3. Client is destroyed
       a. Connection is destroyed

    In Ruby, the RPC clients and their connections are destroyed when the controller action completes.
  8. Elixir

    1. Client asks GenServer for connection
    2. Client creates a channel on that connection
    3. Client is destroyed

    The RPC clients now properly exit when the controller action exits normally, because of the fix for the last bug. Unlike Ruby, the connection is maintained in a GenServer, as RabbitMQ’s docs tell you to pool the connection, so you only have one connection and multiple channels on that connection.
  9. :exclusive

    :exclusive doesn’t delete the queue until your connection is lost!

    In Ruby, exclusive cleans up the queue immediately because destroying the client closes the connection. In Elixir, we’re pooling the connection, so exclusive does nothing until the connection goes down due to a network error or a RabbitMQ restart.
  10. :"auto-delete"

    :"auto-delete" doesn’t delete the queue until your channel closes!

    Auto-delete is slightly nicer than exclusive in that it’s controlled by the channel, but we weren’t explicitly closing our channels and were assuming the channel would die when the RPC client GenServer died. To fix the bug, we now close the channel explicitly in the terminate callback.
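
The shape of that last fix can be sketched as follows. This is an illustration under stated assumptions, not the app's actual client: the `RpcClient` module and the injected `open` and `close` functions are hypothetical, chosen so the sketch runs without a RabbitMQ broker. If the app uses the `amqp` Hex package (an assumption, the talk does not name the library), `open` would wrap `AMQP.Channel.open/1` on the pooled connection and `close` would wrap `AMQP.Channel.close/1`.

```elixir
defmodule RpcClient do
  use GenServer

  # `open` and `close` are injected so the sketch runs without RabbitMQ.
  # In a real client they would open and close a channel on the pooled connection.
  def start_link(open, close) do
    GenServer.start_link(__MODULE__, {open, close})
  end

  @impl true
  def init({open, close}) do
    # Trap exits so terminate/2 runs when the calling process exits normally.
    Process.flag(:trap_exit, true)
    {:ok, channel} = open.()
    {:ok, %{channel: channel, close: close}}
  end

  @impl true
  def terminate(_reason, %{channel: channel, close: close}) do
    # Close the channel explicitly instead of assuming it dies with this process.
    close.(channel)
  end
end

# Usage: a short-lived process stands in for the controller action.
observer = self()
open = fn -> {:ok, :fake_channel} end
close = fn channel -> send(observer, {:closed, channel}) end

spawn(fn -> {:ok, _client} = RpcClient.start_link(open, close) end)

receive do
  {:closed, channel} -> IO.inspect(channel, label: "closed")
after
  1_000 -> IO.puts("channel was never closed")
end
```

Because the client traps exits, the caller's normal exit stops the GenServer and invokes `terminate/2`, which is where the channel gets closed.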