Slide 1

Slide 1 text

Bloated Chefs: A Tale of Gluttony, and the Path to Enlightenment
EVAN GILMAN, @evan2645
4/3/15

Slide 3

Slide 3 text

Agenda
1. Chef resources in use at PD
2. Problems encountered as we grew
3. Measuring chef-client run
4. How we fixed it
5. How fast is it now?

Slide 4

Slide 4 text

CHEF @ PAGERDUTY

Slide 5

Slide 5 text

PD CHEF RESOURCES

Slide 6

Slide 6 text

pd_iptables

Slide 7

Slide 7 text

pd-ipsec::policies

Slide 8

Slide 8 text

sumo_source

Slide 9

Slide 9 text

pd_datadog_alert

Slide 11

Slide 11 text

ALL WAS NOT WELL

Slide 16

Slide 16 text

As we grew…
• CPU spikes during chef-client runs
• Awkward pauses at the beginning of the run
• chef-client run took several minutes
• chef-client OOM

Slide 20

Slide 20 text

MEASURING

Slide 22

Slide 22 text

Measuring Run Time
https://github.com/joemiller/chef-handler-profiler

Slide 23

Slide 23 text

Measuring Resources
• Total number of resources per run, by type
• Number of updated resources per run, by type
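One way to gather these counts is from a report handler, where Chef exposes the run's resources as `run_status.all_resources` and `run_status.updated_resources`. The sketch below is not the code from the talk: to stay runnable on its own it uses a hypothetical `fake_resource` struct in place of real Chef resources, and `count_by_type` is an illustrative helper name.

```ruby
# Hypothetical stand-in for a Chef resource (real ones also expose resource_name).
fake_resource = Struct.new(:resource_name, :updated)

# Tally resources by type, e.g. { file: 2, package: 1 }.
def count_by_type(resources)
  counts = Hash.new(0)
  resources.each { |resource| counts[resource.resource_name] += 1 }
  counts
end

# Sample data standing in for run_status.all_resources.
all_resources = [
  fake_resource.new(:file, false),
  fake_resource.new(:file, true),
  fake_resource.new(:package, false),
  fake_resource.new(:service, true)
]
# Stand-in for run_status.updated_resources.
updated_resources = all_resources.select(&:updated)

total_by_type = count_by_type(all_resources)
updated_by_type = count_by_type(updated_resources)
```

Tracking both tallies over time shows which resource types dominate a run and how few of them actually change anything.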

Slide 24

Slide 24 text

Measuring Memory
• Gather proc stats with sys-proctable
• Gather GC stats
• Can be emitted as statsd
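As a rough illustration of the GC half of this, Ruby's built-in `GC.stat` returns a hash of collector counters that can be formatted as statsd gauge lines (`name:value|g`). This is a minimal sketch, not the talk's implementation: the `chef.gc` prefix and the helper names are assumptions, and the sys-proctable side is left out because it needs an external gem.

```ruby
# Format one metric as a statsd gauge line: "<name>:<value>|g".
def statsd_gauge(name, value)
  "#{name}:#{value}|g"
end

# Turn the numeric entries of a GC.stat-style hash into gauge lines.
# The "chef.gc" prefix is a made-up namespace for illustration.
def gc_gauges(stats = GC.stat, prefix = "chef.gc")
  stats.select { |_key, value| value.is_a?(Numeric) }
       .map { |key, value| statsd_gauge("#{prefix}.#{key}", value) }
end

lines = gc_gauges
```

Each resulting line could then be sent to a statsd daemon over UDP at the end of the chef-client run.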

Slide 25

Slide 25 text

Measuring Memory

Slide 26

Slide 26 text

WHAT WE FOUND AND WHAT WE DID

Slide 29

Slide 29 text

Step-through Searches
From this
To this

Slide 31

Slide 31 text

Step-through Searches
417MB -> 190MB (~54%)
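The before/after code from these slides is not preserved in this text, so the following is only one plausible reading of "step-through": page through search results in fixed-size batches (Chef's search accepts `start` and `rows` paging parameters) and keep just the fields needed, instead of holding every node object in memory at once. The `search` lambda and sample nodes are hypothetical stand-ins for a Chef server query.

```ruby
# Walk a search one page at a time so only `rows` results are held at once.
# `search` is any callable taking (start, rows) and returning an array.
def each_search_result(search, rows)
  start = 0
  loop do
    page = search.call(start, rows)
    break if page.empty?
    page.each { |node| yield node }
    start += page.size
  end
end

# Hypothetical stand-in for a Chef server search over ten nodes.
nodes = (1..10).map { |i| { "name" => "node#{i}", "ipaddress" => "10.0.0.#{i}" } }
fake_search = ->(start, rows) { nodes[start, rows] || [] }

# Keep only the attribute we need from each node as it streams past.
ips = []
each_search_result(fake_search, 3) { |node| ips << node["ipaddress"] }
```

The memory saving in this pattern comes from discarding each page once its useful fields have been extracted.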

Slide 38

Slide 38 text

Partial Searches
• Provide hash map of desired results
• Minimizes volume of node data returned/handled
• hash2node
• Two searches touched
90s -> 60s (30%)
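To make the "hash map of desired results" concrete: a partial search is given a map from result key to attribute path, and only those values come back. The sketch below applies such a map locally to show the effect on data volume. The `extract` helper and the sample node are illustrative assumptions, and it does not attempt to reproduce the `hash2node` helper named on the slide; in a real run the filtering happens on the Chef server.

```ruby
# Map of result key => attribute path, the shape a partial search takes.
keys = {
  "name" => ["name"],
  "ip" => ["ipaddress"],
  "kernel_version" => ["kernel", "version"]
}

# Illustrative helper: pull only the mapped paths out of a full node hash.
def extract(node, keys)
  keys.each_with_object({}) do |(result_key, path), out|
    out[result_key] = node.dig(*path)
  end
end

# Sample node data; a real node object carries far more attributes.
full_node = {
  "name" => "web1",
  "ipaddress" => "10.0.0.5",
  "kernel" => { "version" => "3.13.0", "modules" => { "ext4" => {} } },
  "packages" => { "bash" => "4.3" }
}

slim = extract(full_node, keys)
```

Only three small values survive here, which is the point: less data is serialized, transferred, and parsed per node.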

Slide 42

Slide 42 text

Result Memoization
• Common search data
• API-backed LWRPs
• Can be generalized
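A generalized memoizer can be as small as a hash keyed by the query or API request, so that repeated lookups within one chef-client run hit the backend once. This is a minimal sketch under that assumption; `memoize` and `expensive_search` are hypothetical names, not helpers from the talk.

```ruby
# Return the cached value for key, computing it with the block on first use.
# key? is used so that nil or false results are cached too.
def memoize(cache, key)
  return cache[key] if cache.key?(key)
  cache[key] = yield
end

cache = {}
calls = 0

# Hypothetical stand-in for a slow search or API request.
expensive_search = lambda do |query|
  calls += 1
  ["result for #{query}"]
end

first = memoize(cache, "role:web") { expensive_search.call("role:web") }
second = memoize(cache, "role:web") { expensive_search.call("role:web") }
```

Both calls return the same result while the backend is queried a single time, which is what makes the pattern useful for search data shared across recipes and for API-backed LWRPs.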

Slide 45

Slide 45 text

API Tarpitting
Centralize calls

Slide 46

Slide 46 text

OTHER NASTIES

Slide 50

Slide 50 text

Other Nasties
• Too many conditional guards
• tmpfs storage
• Multiple package resources (Chef 12)
Six seconds for twelve packages

Slide 51

Slide 51 text

BEFORE/AFTER

Slide 55

Slide 55 text

Memory Saved
Before: ~500MB
After: ~60MB
88% less memory!

Slide 59

Slide 59 text

Seconds Saved
Before: ~180s/run
After: ~30s/run
~84% faster!

Slide 60

Slide 60 text

FREEDOM

Slide 61

Slide 61 text

Thank you.
EVAN GILMAN, @evan2645