4/3/15
@evan2645
EVAN GILMAN
Bloated Chefs
A Tale of Gluttony, and the Path to Enlightenment
Agenda
1. Chef resources in use at PagerDuty (PD)
2. Problems encountered as we grew
3. Measuring the chef-client run
4. How we fixed it
5. How fast is it now?
CHEF @ PAGERDUTY
PD CHEF RESOURCES
pd_iptables
pd-ipsec::policies
sumo_source
pd_datadog_alert
ALL WAS NOT WELL
As we grew…
• CPU spikes during chef-client runs
• Awkward pauses at the beginning of the run
• chef-client runs took several minutes
• chef-client ran out of memory (OOM)
MEASURING
Measuring Run Time
https://github.com/joemiller/chef-handler-profiler
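chef-handler-profiler is a report handler that summarizes where a run spent its time. The plain-Ruby sketch below shows the kind of aggregation such a handler performs, assuming each resource exposes a `cookbook_name` and an `elapsed_time` in seconds. `TimedResource` is a hypothetical stand-in so the example runs without Chef, and the timings are illustrative input, not measurements from the talk.

```ruby
# Hypothetical stand-in for a Chef resource. A real report handler would
# walk all_resources, where each resource carries its own elapsed_time.
TimedResource = Struct.new(:cookbook_name, :name, :elapsed_time)

# Sum elapsed seconds per cookbook and return the slowest cookbooks first.
def time_by_cookbook(resources, top = 5)
  totals = Hash.new(0.0)
  resources.each { |r| totals[r.cookbook_name] += r.elapsed_time.to_f }
  totals.sort_by { |_cookbook, secs| -secs }.first(top)
end

# Illustrative input only.
RESOURCES = [
  TimedResource.new("cookbook_a", "rules", 4.0),
  TimedResource.new("cookbook_b", "policies", 9.5),
  TimedResource.new("cookbook_a", "reload", 1.0)
].freeze

time_by_cookbook(RESOURCES).each do |cookbook, secs|
  puts format("%-12s %6.2fs", cookbook, secs)
end
```

Grouping by cookbook first keeps the report short; the same loop keyed on the resource itself would point at individual slow resources.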
Measuring Resources
• Total number of resources per run, by type
• Number of updated resources per run, by type
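Both counts can come from a report handler, which in Chef can read `all_resources` and `updated_resources` from the run status. The sketch below does only the counting, in plain Ruby, over a hypothetical `TypedResource` struct whose `resource_name` plays the role of the resource type.

```ruby
# Hypothetical stand-in for a Chef resource: resource_name is its type
# (:package, :template, ...); updated says whether it changed this run.
TypedResource = Struct.new(:resource_name, :name, :updated)

# Count resources per type, e.g. { package: 2, template: 1 }.
def counts_by_type(resources)
  resources.each_with_object(Hash.new(0)) { |r, acc| acc[r.resource_name] += 1 }
end

# Illustrative run; in a handler these would be all_resources and
# updated_resources.
ALL_RESOURCES = [
  TypedResource.new(:package, "nginx", true),
  TypedResource.new(:package, "ipsec-tools", false),
  TypedResource.new(:template, "/etc/nginx/nginx.conf", true),
  TypedResource.new(:service, "nginx", false)
].freeze
UPDATED_RESOURCES = ALL_RESOURCES.select(&:updated)

puts "total:   #{counts_by_type(ALL_RESOURCES)}"
puts "updated: #{counts_by_type(UPDATED_RESOURCES)}"
```

Tracking both series over time shows whether a slow run is doing more work or just checking more resources that never change.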
Measuring Memory
• Gather proc stats with sys-proctable
• Gather GC stats
• Can be emitted as statsd metrics
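The talk gathered process stats with the sys-proctable gem. To stay within the Ruby standard library, the sketch below instead reads resident memory from `/proc/self/status` (a Linux-only assumption) and combines it with `GC.stat` counters, formatting everything as statsd gauges (`name:value|g`). The metric names and the `chef_client` prefix are hypothetical.

```ruby
require "socket"

# GC counters worth tracking per run (present in GC.stat on Ruby 2.2+).
GC_KEYS = %i[count heap_live_slots total_allocated_objects].freeze

# Resident set size in kB, or nil where /proc is unavailable (non-Linux).
def rss_kb
  line = File.foreach("/proc/self/status").find { |l| l.start_with?("VmRSS:") }
  line && line.split[1].to_i
rescue Errno::ENOENT
  nil
end

# Build statsd gauge lines for GC stats and, when available, RSS.
def memory_gauges(prefix = "chef_client")
  stats = GC.stat.select { |k, _| GC_KEYS.include?(k) }
  lines = stats.map { |k, v| "#{prefix}.gc.#{k}:#{v}|g" }
  kb = rss_kb
  lines << "#{prefix}.rss_kb:#{kb}|g" if kb
  lines
end

# Fire-and-forget UDP send; statsd conventionally listens on port 8125.
def emit(lines, host = "127.0.0.1", port = 8125)
  sock = UDPSocket.new
  lines.each { |l| sock.send(l, 0, host, port) }
ensure
  sock&.close
end

puts memory_gauges
```

Sampling at the start and end of a run (for example from a start handler and a report handler) would show how much the run itself grows the process.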
WHAT WE FOUND AND WHAT WE DID
Step-through Searches
[Slide images not preserved: before/after comparison, "From this" / "To this"]
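The before/after code from these slides did not survive. One plausible reading of "step-through" is the move from collecting the full result set into an array to handling results as they page in; Chef's `search` also accepts a block for this style. The plain-Ruby sketch below illustrates only that difference, with a hypothetical `fetch_page` and `NODES` list standing in for the Chef server's paged search.

```ruby
# Hypothetical stand-in for the Chef server's node index.
NODES = (1..2500).map do |i|
  { "name" => "node#{i}", "ipaddress" => "10.0.#{i / 256}.#{i % 256}" }
end.freeze

# Hypothetical paged query: one page of up to `rows` results.
def fetch_page(start, rows)
  NODES[start, rows] || []
end

# "From this": every page is appended, so all nodes are held at once.
def search_all(rows = 1000)
  results = []
  start = 0
  loop do
    page = fetch_page(start, rows)
    break if page.empty?
    results.concat(page)
    start += rows
  end
  results
end

# "To this": yield each node; only the current page stays referenced.
def search_each(rows = 1000)
  return enum_for(:search_each, rows) unless block_given?
  start = 0
  loop do
    page = fetch_page(start, rows)
    break if page.empty?
    page.each { |node| yield node }
    start += rows
  end
end

# Keep only what the recipe needs instead of whole node objects.
ips = []
search_each { |node| ips << node["ipaddress"] }
puts ips.size
```

The gain comes from the caller keeping only the fields it needs; if the block stores whole nodes, memory use ends up the same as `search_all`.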
Partial Searches
• Provide a hash map of the desired results
• Minimizes the volume of node data returned and handled
• hash2node
• Two searches touched
90s -> 60s (~30%)
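With the `partial_search` cookbook, the hash map names each result key and the attribute path to fetch, roughly `partial_search(:node, 'role:web', :keys => { 'ip' => ['ipaddress'] })`. The plain-Ruby sketch below mimics the filtering the server does for such a query. The slides do not show hash2node, so `PartialNode` is only a hypothetical guess at its role: wrapping the slim hash so code written against node objects keeps working.

```ruby
# Hypothetical full node document, as a plain search would return it.
FULL_NODE = {
  "name" => "web1",
  "ipaddress" => "10.0.0.5",
  "kernel" => { "version" => "3.13.0", "machine" => "x86_64" },
  "packages" => { "nginx" => { "version" => "1.4.6" } }
}.freeze

# Desired results: output key => attribute path (same shape as :keys).
KEYS = {
  "name" => ["name"],
  "ip" => ["ipaddress"],
  "kernel_version" => ["kernel", "version"]
}.freeze

# Keep only the listed attribute paths (Hash#dig needs Ruby 2.3+).
def partial(node, keys)
  keys.each_with_object({}) { |(out, path), acc| acc[out] = node.dig(*path) }
end

# Hypothetical hash2node-style wrapper: node.name and node["key"] work
# on the slim hash the same way they would on a node object.
class PartialNode
  def initialize(hash)
    @hash = hash
  end

  def name
    @hash["name"]
  end

  def [](key)
    @hash[key]
  end
end

node = PartialNode.new(partial(FULL_NODE, KEYS))
puts "#{node.name} #{node['ip']} #{node['kernel_version']}"
```

Everything not named in `KEYS`, such as the package list here, never leaves the server, which is where the savings come from.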
Result Memoization
• Common search data
• API-backed LWRPs
• Can be generalized
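A generalized version can be as small as a module-level cache keyed by the call's arguments, so the first recipe or provider to ask pays for the search or API call and later callers reuse the result. The sketch below assumes that shape; `RunCache`, `expensive_search`, and `cached_search` are hypothetical names, and the call counter exists only to show the effect.

```ruby
# Process-wide cache shared by every recipe and provider in the run.
module RunCache
  @store = {}

  # Return the cached value for key, computing it with the block once.
  # key? (not ||=) means nil and false results are cached as well.
  def self.fetch(*key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end

  def self.clear
    @store.clear
  end
end

# Hypothetical expensive lookup (a search or an external API call).
CALLS = Hash.new(0)
def expensive_search(query)
  CALLS[query] += 1
  ["#{query}-result"]
end

def cached_search(query)
  RunCache.fetch(:search, query) { expensive_search(query) }
end

3.times { cached_search("role:web") }
puts CALLS["role:web"]
```

If chef-client runs as a long-lived daemon that does not fork per run, a cache like this outlives the run, so it would need to be cleared at the start of each run to avoid serving stale data.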
API Tarpitting
Centralize API calls
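The slide only says to centralize calls. For API-backed LWRPs such as `pd_datadog_alert`, one reading is to fetch the remote state once per run and let each resource check it locally, instead of every resource making its own slow round trip. `FakeAlertApi` and `AlertRegistry` below are hypothetical; the request counter shows how many round trips are made.

```ruby
# Hypothetical remote API client; counts requests so the effect is visible.
class FakeAlertApi
  attr_reader :requests

  def initialize(alerts)
    @alerts = alerts
    @requests = 0
  end

  # One round trip that returns every alert definition.
  def list_alerts
    @requests += 1
    @alerts.dup
  end
end

# Central registry: fetch remote state once, answer lookups locally.
class AlertRegistry
  def initialize(api)
    @api = api
  end

  def exists?(name)
    index.key?(name)
  end

  private

  # Built lazily on first lookup, then reused for the rest of the run.
  def index
    @index ||= @api.list_alerts.each_with_object({}) { |a, h| h[a[:name]] = a }
  end
end

api = FakeAlertApi.new([{ name: "cpu" }, { name: "disk" }])
registry = AlertRegistry.new(api)
%w[cpu disk mem].each { |n| puts "#{n}: #{registry.exists?(n)}" }
puts "API requests: #{api.requests}"
```

With one registry shared across resources, a slow API is waited on once per run rather than once per resource.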
OTHER NASTIES
Other Nasties
• Too many conditional guards
• tmpfs storage
• Multiple package resources (Chef 12)
Six seconds for twelve packages
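Chef 12 can, for some providers such as apt and yum, accept an array of names in a single `package` resource (e.g. `package %w(curl git)`), so the package manager runs once instead of once per resource. The sketch below only builds the command lines to show the difference in invocations; the package names are illustrative and nothing is executed.

```ruby
# One invocation per package: each pays the package manager's startup,
# locking, and metadata-loading cost separately.
def per_package_commands(pkgs)
  pkgs.map { |p| ["apt-get", "install", "-y", p] }
end

# One batched invocation: that fixed cost is paid once.
def batched_command(pkgs)
  ["apt-get", "install", "-y", *pkgs]
end

PKGS = %w[curl git htop jq].freeze

puts per_package_commands(PKGS).size
puts batched_command(PKGS).join(" ")
```

Passing either array to `system(*cmd)` would run it; keeping the arguments as an array avoids shell quoting issues.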