Bloated Chefs: A Tale of Gluttony and the Path to Enlightenment

As your infrastructure grows and your recipes become more complex, you may suddenly find yourself with chef-client runs that take minutes to complete. With Chef being the primary mechanism for pushing critical fixes in many orgs, the time it takes for the fleet to converge is of the utmost importance. Slow chef-client runs hurt both agility and your ability to scale, so keeping them lean can have a large impact on your operational capacity. You will hear tales of chef-client run duration horror, and how we at PagerDuty brought our chef-client runs back to the land of ponies and rainbows.

Evan Gilman

April 02, 2015

Transcript

  1. 4/3/15 Agenda BLOATED CHEFS

    1. Chef resources in use at PD
    2. Problems encountered as we grew
    3. Measuring chef-client run
    4. How we fixed it
    5. How fast is it now?
  5. As we grew…

    • CPU spikes during chef-client runs
    • Awkward pauses at the beginning of the run
    • chef-client run took several minutes
    • chef-client OOM
  6. Measuring Resources

    • Total number of resources per run, by type
    • Number of updated resources per run, by type
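
The per-type counts on this slide could be gathered from a Chef report handler. The sketch below is a minimal illustration, not the code used at PagerDuty: it assumes the handler would pass in `run_status.all_resources` and `run_status.updated_resources`, and both the `count_by_type` helper and the `FakeResource` stand-in are hypothetical names used so the snippet runs without Chef.

```ruby
# Hypothetical helper: tally resources by type.
# Works on any object that responds to #resource_name, as Chef resources do.
def count_by_type(resources)
  counts = Hash.new(0)
  resources.each { |r| counts[r.resource_name.to_s] += 1 }
  counts
end

# Stand-in for Chef resources so the sketch runs outside a chef-client run.
FakeResource = Struct.new(:resource_name)

all_resources = [
  FakeResource.new(:package),
  FakeResource.new(:package),
  FakeResource.new(:template),
  FakeResource.new(:service)
]
updated_resources = [FakeResource.new(:template)]

total_by_type   = count_by_type(all_resources)
updated_by_type = count_by_type(updated_resources)

puts total_by_type.inspect
puts updated_by_type.inspect
```

Comparing the two tallies run over run shows which resource types dominate a run and which ones actually change anything.
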
  7. Measuring Memory

    • Gather proc stats with sys-proctable
    • Gather GC stats
    • Can be emitted as statsd
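
As a rough illustration of the GC half of this slide, Ruby's built-in `GC.stat` can be turned into statsd gauge lines. This is a sketch under assumptions: the `gc_stats_as_statsd` helper and the `chef.gc` metric prefix are made up for the example, the process stats from sys-proctable are left out, and actually sending the lines to a statsd daemon is not shown.

```ruby
# Hypothetical helper: format Ruby GC statistics as statsd gauge lines
# ("<metric>:<value>|g"). The "chef.gc" prefix is an assumed naming scheme.
def gc_stats_as_statsd(prefix = 'chef.gc')
  GC.stat
    .select { |_name, value| value.is_a?(Numeric) }
    .map { |name, value| "#{prefix}.#{name}:#{value}|g" }
end

lines = gc_stats_as_statsd
puts lines.first(5)
```

Calling something like this at the end of a run (for example from a report handler) gives one gauge per GC counter, which makes memory growth across runs visible on a graph.
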
  12. Partial Searches

    • Provide hash map of desired results
    • Minimizes volume of node data returned/handled
    • hash2node
    • Two searches touched

    90s -> 60s (30%)
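
To make the "hash map of desired results" concrete: a partial search takes a map from result names to attribute paths and returns only those values instead of whole node objects. In a recipe that map is what would be passed to `partial_search(:node, query, keys: ...)` from the partial_search cookbook, or to `search(:node, query, filter_result: ...)` in Chef 12; which of the two the talk used is not stated. The plain-Ruby sketch below only imitates the filtering step so it can run without a Chef server; `extract_keys` and the sample node data are hypothetical.

```ruby
# Hypothetical illustration of what a partial search returns: only the
# attribute paths named in the keys map, instead of the whole node object.
def extract_keys(node_data, keys)
  keys.each_with_object({}) do |(name, path), result|
    # Walk the attribute path; the value is nil if any step is missing.
    result[name] = path.reduce(node_data) do |data, key|
      data.is_a?(Hash) ? data[key] : nil
    end
  end
end

# Made-up node data standing in for a full node object.
full_node = {
  'name' => 'web1',
  'ipaddress' => '10.0.0.5',
  'kernel' => { 'release' => '3.13.0', 'modules' => { 'ext4' => {} } },
  'packages' => { 'lots' => 'of data we do not need' }
}

# Result name => path into the node's attributes.
keys = {
  'name' => ['name'],
  'ip' => ['ipaddress'],
  'kernel_version' => %w(kernel release)
}

slim_node = extract_keys(full_node, keys)
puts slim_node.inspect
```

The saving comes from the server doing this filtering before the data is sent, so the client neither downloads nor deserializes the attributes it never reads.
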
  13. Result Memoization

    • Common search data
    • API-backed LWRPs
    • Can be generalized
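
One way to picture the generalized form of result memoization is a small cache keyed by query, so that a search or API call repeated across recipes or LWRPs runs only once per chef-client run. This is a minimal sketch, not the implementation from the talk; `ResultCache` and the counter standing in for an expensive lookup are hypothetical.

```ruby
# Hypothetical run-scoped cache: the first call for a key runs the block,
# later calls with the same key reuse the stored result.
module ResultCache
  @store = {}

  def self.fetch(key)
    # key? (rather than a truthiness check) so nil/false results are cached too.
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end

  def self.clear
    @store.clear
  end
end

# Counter standing in for an expensive search or API call.
calls = 0
expensive_lookup = lambda do
  calls += 1
  %w(web1 web2 web3)
end

first  = ResultCache.fetch('role:web') { expensive_lookup.call }
second = ResultCache.fetch('role:web') { expensive_lookup.call }

puts "lookups performed: #{calls}"
```

Because the cache lives in the chef-client process, it is naturally discarded at the end of the run, so stale results do not carry over to the next convergence.
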
  16. Other Nasties

    • Too many conditional guards
    • tmpfs storage
    • Multiple package resources (Chef 12)

    Six seconds for twelve packages