Slide 1

Slide 1 text

Why Cloud Zombies Are Destroying the Planet and How You Can Stop Them | Holly Cummins, Red Hat | QCon London, March 29, 2023

Slide 2

Slide 2 text


Slide 3

Slide 3 text

@holly_cummins #RedHat

Slide 4

Slide 4 text


Slide 5

Slide 5 text

@therealmarkw1, twitter

Slide 6

Slide 6 text

what do these servers do? @therealmarkw1, twitter

Slide 7

Slide 7 text

what do these servers do? one is a backup for the other. @therealmarkw1, twitter

Slide 8

Slide 8 text

what do these servers do? one is a backup for the other. yes, but what do they do? @therealmarkw1, twitter

Slide 9

Slide 9 text

what do these servers do? one is a backup for the other. yes, but what do they do? no one has known for a couple of decades. @therealmarkw1, twitter

Slide 10

Slide 10 text

Hey boss, I created a Kubernetes cluster. (2018)

Slide 11

Slide 11 text

Hey boss, I created a Kubernetes cluster. I forgot it for 2 months. (2018)

Slide 12

Slide 12 text

Hey boss, I created a Kubernetes cluster. I forgot it for 2 months. … and it’s €1000 a month. (2018)

Slide 13

Slide 13 text

Hey boss, while I was working on a QCon talk about sustainability … (2023)

Slide 14

Slide 14 text

Hey boss, while I was working on a QCon talk about sustainability … I left the Quarkus CI on Mac disabled (2023)

Slide 15

Slide 15 text

Hey boss, while I was working on a QCon talk about sustainability … I left the Quarkus CI on Mac disabled … and the instance is $159 a month. (2023)

Slide 16

Slide 16 text

“measure, don’t guess” (or decide based on stories on the internet)

Slide 17

Slide 17 text

actual picture of a zombie (it’s invisible)

Slide 18

Slide 18 text


Slide 19

Slide 19 text

2015 survey: 30% of 4,000 servers doing no useful work

Slide 20

Slide 20 text

2017 survey: 25% of 16,000 servers doing no useful work

Slide 21

Slide 21 text

zombie: “they haven't delivered any information or computing services for six months or more”

Slide 22

Slide 22 text

“comatose servers”

Slide 23

Slide 23 text

under-utilised servers

Slide 24

Slide 24 text

“much of the energy consumed by U.S. data centers is used to power more than 12 million servers that do little or no work most of the time” (NRDC)

Slide 25

Slide 25 text

the average server runs at 12-18% of capacity, but still draws 30-60% of maximum power (source: https://www.nrdc.org/sites/default/files/data-center-efficiency-assessment-IB.pdf)
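A rough worked example of what these figures imply, under two simplifying assumptions: useful work is proportional to utilisation, and a fully loaded server draws 100% of its maximum power. Energy per unit of work, relative to a fully loaded server, is then the power fraction divided by the utilisation fraction. The function name is hypothetical.

```python
def relative_energy_per_unit_work(utilisation: float, power_fraction: float) -> float:
    """Energy per unit of useful work, relative to a fully loaded server.

    Assumes useful work is proportional to utilisation and that a server
    at 100% utilisation draws 100% of its maximum power.
    """
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    if not 0 <= power_fraction <= 1:
        raise ValueError("power_fraction must be in [0, 1]")
    return power_fraction / utilisation


# Corners of the ranges on the slide: 12-18% of capacity, 30-60% of maximum power.
for u, p in [(0.18, 0.30), (0.12, 0.60)]:
    ratio = relative_energy_per_unit_work(u, p)
    print(f"{u:.0%} utilised, {p:.0%} power: {ratio:.1f}x energy per unit of work")
```

At the corners of the ranges this gives roughly 1.7x (18% utilised, 30% power) up to 5x (12% utilised, 60% power) the energy per unit of work of a fully loaded server.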

Slide 26

Slide 26 text

2014 survey: 29% of 4,000 servers active less than 5% of the time (source: https://www.anthesisgroup.com/wp-content/uploads/2019/11/Comatose-Servers-Redux-2017.pdf)
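The two definitions used so far (no useful work for six months or more; active less than 5% of the time) can be turned into a simple classifier. This is a minimal sketch: the ServerStats record, its fields, and approximating six months as 182 days are assumptions. In practice, collecting trustworthy "useful work" data is the hard part.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: "six months or more" approximated as 182 days.
ZOMBIE_IDLE_PERIOD = timedelta(days=182)
# Threshold from the 2014 survey: active less than 5% of the time.
LOW_ACTIVITY_THRESHOLD = 0.05


@dataclass
class ServerStats:
    """Hypothetical per-server record; real data would come from monitoring."""
    name: str
    last_useful_work: date   # last time it served a real request or ran a job
    active_fraction: float   # fraction of sampled intervals with real activity


def classify(server: ServerStats, today: date) -> str:
    """Apply the zombie rule first, then the low-activity rule."""
    if today - server.last_useful_work >= ZOMBIE_IDLE_PERIOD:
        return "zombie"
    if server.active_fraction < LOW_ACTIVITY_THRESHOLD:
        return "low activity"
    return "active"


today = date(2023, 3, 29)
fleet = [
    ServerStats("build-01", date(2022, 6, 1), 0.00),
    ServerStats("reports", date(2023, 3, 25), 0.03),
    ServerStats("web-01", date(2023, 3, 29), 0.40),
]
for s in fleet:
    print(s.name, classify(s, today))
```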

Slide 27

Slide 27 text

2021 study: https://www.business2community.com/cloud-computing/overprovisioning-always-on-resources-lead-to-26-6-billion-in-public-cloud-waste-expected-in-2021-02381033

Slide 28

Slide 28 text

$26.6 billion (2021 study: https://www.business2community.com/cloud-computing/overprovisioning-always-on-resources-lead-to-26-6-billion-in-public-cloud-waste-expected-in-2021-02381033)

Slide 29

Slide 29 text

$26.6 billion wasted by always-on cloud instances (2021 study: https://www.business2community.com/cloud-computing/overprovisioning-always-on-resources-lead-to-26-6-billion-in-public-cloud-waste-expected-in-2021-02381033)

Slide 30

Slide 30 text

it’s not just runtime costs

Slide 31

Slide 31 text

it’s not just runtime costs: embodied carbon

Slide 32

Slide 32 text

why does this happen?

Slide 33

Slide 33 text

managing machines is hard

Slide 34

Slide 34 text


Slide 35

Slide 35 text

Slide 36

Slide 36 text

“perhaps someone forgot to turn them off” (Anthesis Institute)

Slide 37

Slide 37 text

Slide 38

Slide 38 text

• projects ended

Slide 39

Slide 39 text

• projects ended • business processes changed

Slide 40

Slide 40 text

• projects ended • business processes changed • over-provisioning

Slide 41

Slide 41 text

• projects ended • business processes changed • over-provisioning • isolation requirements

Slide 42

Slide 42 text

risk-averse processes

Slide 43

Slide 43 text

“we run this as a batch job on weekends, but the servers stay up all week”

Slide 44

Slide 44 text


Slide 45

Slide 45 text

“we only use this system in UK working hours, but we leave it running 24/7”

Slide 46

Slide 46 text

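Both quotes describe systems with a known usage window that nevertheless run around the clock. A minimal sketch of writing those windows down so a script can decide whether a system should be up. The Window class, the 09:00-17:30 office hours, and the all-day weekend batch window are assumptions; times are treated as naive UK local time, so time zones and daylight saving are not handled.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Window:
    """Hypothetical on-window; days are datetime.weekday() values (Monday == 0)."""
    days: frozenset
    start_minute: int  # minutes after midnight, inclusive
    end_minute: int    # minutes after midnight, exclusive (1440 == end of day)

    def is_open(self, now: datetime) -> bool:
        minute = now.hour * 60 + now.minute
        return now.weekday() in self.days and self.start_minute <= minute < self.end_minute

    def hours_per_week(self) -> float:
        return len(self.days) * (self.end_minute - self.start_minute) / 60


# "we only use this system in UK working hours" (09:00-17:30 assumed)
UK_OFFICE_HOURS = Window(frozenset(range(5)), 9 * 60, 17 * 60 + 30)
# "we run this as a batch job on weekends" (all of Saturday and Sunday assumed)
WEEKEND_BATCH = Window(frozenset({5, 6}), 0, 24 * 60)

for name, w in [("office hours", UK_OFFICE_HOURS), ("weekend batch", WEEKEND_BATCH)]:
    idle = 1 - w.hours_per_week() / (7 * 24)
    print(f"{name}: needed {w.hours_per_week():.1f} h/week, idle {idle:.0%} if left on 24/7")
```

The office-hours window covers 42.5 of the 168 hours in a week, so a system kept running 24/7 for it sits idle roughly three-quarters of the time.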

Slide 47

Slide 47 text

auto-scaling algorithms are optimised for availability

Slide 48

Slide 48 text

green computing model: the four vowels

Slide 49

Slide 49 text


Slide 50

Slide 50 text

green computing model: the four vowels • elasticity

Slide 51

Slide 51 text

green computing model: the four vowels • elasticity • utilisation

Slide 52

Slide 52 text

green computing model: the four vowels • elasticity • utilisation • efficiency

Slide 53

Slide 53 text

green computing model: the four vowels • elasticity • utilisation • efficiency • utility

Slide 54

Slide 54 text


Slide 55

Slide 55 text

application utilisation

Slide 56

Slide 56 text

application utilisation • high utilisation: good case

Slide 57

Slide 57 text

application utilisation • over-utilisation: very bad case

Slide 58

Slide 58 text

application utilisation • over-utilisation: very bad case • under-utilisation: wasteful case

Slide 59

Slide 59 text

application elasticity • high utilisation: good case

Slide 60

Slide 60 text

application elasticity • scale-up: good utilisation

Slide 61

Slide 61 text

application elasticity • scale-down: good utilisation
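Scaling up and down keeps utilisation near a target instead of provisioning for the peak. A minimal sketch of one common rule, the proportional calculation documented for the Kubernetes Horizontal Pod Autoscaler: desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), skipped when the ratio is within a tolerance (0.1 by default). The function name, the min/max bounds, and the use of integer percentages (to avoid floating-point rounding in the ceiling) are illustrative choices, not the HPA's code.

```python
def desired_replicas(current: int, current_util_pct: int, target_util_pct: int,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance_pct: int = 10) -> int:
    """Proportional scaling: move the replica count so utilisation heads to target."""
    if current <= 0 or target_util_pct <= 0:
        raise ValueError("current replicas and target utilisation must be positive")
    # Within the tolerance band around the target: leave the replica count alone.
    if abs(current_util_pct - target_util_pct) * 100 <= tolerance_pct * target_util_pct:
        return current
    # Integer ceiling of current * current_util / target_util.
    wanted = -(-current * current_util_pct // target_util_pct)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, wanted))


print(desired_replicas(4, 90, 60))  # busy: scale up
print(desired_replicas(4, 30, 60))  # quiet: scale down
```

With a 60% target, four replicas at 90% become six, and four replicas at 30% become two. The real HPA also averages metrics across pods and applies stabilisation windows, which this sketch leaves out, and by default it does not scale below one replica.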

Slide 62

Slide 62 text

green computing model: the four vowels • elasticity • utilisation • efficiency • utility

Slide 63

Slide 63 text


Slide 64

Slide 64 text

why utility matters: “There is nothing so useless as doing efficiently that which should not be done at all.” (Peter Drucker)

Slide 65

Slide 65 text

“efficient zombies”

Slide 66

Slide 66 text

how do we solve the zombie problem?

Slide 67

Slide 67 text

how do we solve the zombie problem? detection and destruction

Slide 68

Slide 68 text


Slide 69

Slide 69 text

system archaeology … is not easy

Slide 70

Slide 70 text

scream test

Slide 71

Slide 71 text

“eco-monkey”

Slide 72

Slide 72 text

the scream is real

Slide 73

Slide 73 text

the scream is real: this internal server doesn’t seem to have a purpose

Slide 74

Slide 74 text

the scream is real: this internal server doesn’t seem to have a purpose. let’s turn it off!

Slide 75

Slide 75 text

the scream is real: this internal server doesn’t seem to have a purpose. let’s turn it off! uh … why did the backbone of a client’s network just vanish?

Slide 76

Slide 76 text

the scream is real: this internal server doesn’t seem to have a purpose. let’s turn it off! uh … why did the backbone of a client’s network just vanish? oops.

Slide 77

Slide 77 text

IT Department, UK Bank: “let’s figure out what all these cloud workloads are, since I’m paying for them” … long meetings

Slide 78

Slide 78 text


Slide 79

Slide 79 text

long emails

Slide 80

Slide 80 text

tags
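Tags only help with zombie hunting if they are actually there. A minimal sketch of a tag audit, assuming an inventory exported as a list of dictionaries; the required tag names (owner, project, expires) and the data shape are hypothetical, not any particular cloud provider's API.

```python
# Hypothetical tagging policy: every resource records who owns it,
# what it is for, and when it should be reviewed.
REQUIRED_TAGS = ("owner", "project", "expires")


def missing_tags(resources: list[dict]) -> dict[str, list[str]]:
    """Map each resource id to the required tags it lacks or has left empty."""
    report = {}
    for resource in resources:
        tags = resource.get("tags", {})
        missing = [tag for tag in REQUIRED_TAGS if not tags.get(tag)]
        if missing:
            report[resource["id"]] = missing
    return report


# Illustrative inventory; real data would come from a cloud provider's export.
inventory = [
    {"id": "vm-123", "tags": {"owner": "team-a", "project": "qcon-demo", "expires": "2023-04-30"}},
    {"id": "vm-456", "tags": {"owner": "", "project": "ci"}},
    {"id": "bucket-789"},
]
print(missing_tags(inventory))
```

Treating an empty tag value as missing catches resources that were tagged only to satisfy a policy check.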

Slide 81

Slide 81 text

all the -Opses

Slide 82

Slide 82 text

GreenOps

Slide 83

Slide 83 text

GreenOps: Greenops is also a mid-sized trilobite (really)

Slide 84

Slide 84 text

FinOps: figuring out who in your company forgot to turn off their cloud

Slide 85

Slide 85 text


Slide 86

Slide 86 text

backstage.io

Slide 87

Slide 87 text

backstage.io • cost insights plugin

Slide 88

Slide 88 text

backstage.io • cost insights plugin • cloud carbon footprint plugin

Slide 89

Slide 89 text

AIOps • Densify • Granulate • Turbonomic Application Resource Management • TSO Logic • etc.

Slide 90

Slide 90 text

21% improvement from installing Turbonomic in the IBM CIO office

Slide 91

Slide 91 text

traffic monitoring

Slide 92

Slide 92 text

but knowing is only half the battle.

Slide 93

Slide 93 text

the IKEA effect

Slide 94

Slide 94 text

the IKEA effect: labour

Slide 95

Slide 95 text


Slide 96

Slide 96 text

the IKEA effect: labour → love

Slide 97

Slide 97 text

shut it down? but … what if I need this cluster later?

Slide 98

Slide 98 text

elasticity: native Quarkus starts faster than a light bulb

Slide 99

Slide 99 text

ultimate elasticity

Slide 100

Slide 100 text

we don’t switch the light off because we’re not sure if it will come back on

Slide 101

Slide 101 text

we don’t switch the server off because we’re not sure if it will come back on (happens all the time)

Slide 102

Slide 102 text

we don’t switch the server off because it would be too much work to recreate it (happens all the time)

Slide 103

Slide 103 text


Slide 104

Slide 104 text


Slide 105

Slide 105 text

turning it off and on again must:

Slide 106

Slide 106 text

turning it off and on again must: • be fast

Slide 107

Slide 107 text

turning it off and on again must: • be fast • actually work

Slide 108

Slide 108 text

turning it off and on again must: • be fast • actually work • idempotency

Slide 109

Slide 109 text

turning it off and on again must: • be fast • actually work • idempotency • resiliency
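Idempotency means flipping the switch twice has the same effect as flipping it once, so a retried or duplicated script run can't do harm. A minimal sketch, using a hypothetical in-memory FakeCloud in place of a real provider API: the switch compares desired and actual state and only acts when they differ.

```python
class FakeCloud:
    """Hypothetical stand-in for a cloud API, recording state-changing calls."""

    def __init__(self):
        self.running = set()
        self.calls = []

    def start(self, name: str) -> None:
        self.calls.append(("start", name))
        self.running.add(name)

    def stop(self, name: str) -> None:
        self.calls.append(("stop", name))
        self.running.discard(name)


def ensure_state(cloud: FakeCloud, name: str, should_run: bool) -> bool:
    """Idempotent switch: act only if actual state differs from desired.

    Returns True if an operation was issued.
    """
    is_running = name in cloud.running
    if should_run and not is_running:
        cloud.start(name)
        return True
    if not should_run and is_running:
        cloud.stop(name)
        return True
    return False  # already in the desired state: nothing to do


cloud = FakeCloud()
ensure_state(cloud, "ci-mac", True)
ensure_state(cloud, "ci-mac", True)    # repeat: no second start
ensure_state(cloud, "ci-mac", False)
ensure_state(cloud, "ci-mac", False)   # repeat: no second stop
print(cloud.calls)
```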

Slide 110

Slide 110 text

making turning servers off as safe and easy as turning lights off

Slide 111

Slide 111 text

LightSwitchOps: making turning servers off as safe and easy as turning lights off

Slide 112

Slide 112 text

simple scripts: “we used to leave our applications running all the time” (@darkandnerdy, Chicago DevOpsDays)

Slide 113

Slide 113 text

simple scripts: “we used to leave our applications running all the time; when we scripted turning them off at night, we reduced our cloud bill by 30%” (@darkandnerdy, Chicago DevOpsDays)

Slide 114

Slide 114 text


Slide 115

Slide 115 text

GitOps

Slide 116

Slide 116 text

GitOps (infrastructure as code)

Slide 117

Slide 117 text


Slide 118

Slide 118 text

spin it down

Slide 119

Slide 119 text

spin it down • spin it up: kubectl apply -f all-my-cluster/

Slide 120

Slide 120 text


Slide 121

Slide 121 text

spin it down • spin it up: kubectl apply -f all-my-cluster/ or ansible-playbook stuff.yml
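When the whole cluster is described in Git, the light switch can be a tiny script. A minimal sketch that builds the kubectl commands for both directions: spin-up uses the kubectl apply -f all-my-cluster/ command from the slide, while using kubectl delete -f with --ignore-not-found for spin-down is an assumption. The function names are hypothetical, and the script defaults to a dry run that only prints the command.

```python
import subprocess

MANIFEST_DIR = "all-my-cluster/"  # manifest directory named on the slide


def lightswitch_command(action: str, manifest_dir: str = MANIFEST_DIR) -> list[str]:
    """Build the kubectl command that spins the workloads up or down."""
    if action == "up":
        # 'apply' is declarative: re-running it converges on the same state.
        return ["kubectl", "apply", "-f", manifest_dir]
    if action == "down":
        # '--ignore-not-found' keeps a repeated spin-down from failing on
        # resources that are already gone.
        return ["kubectl", "delete", "-f", manifest_dir, "--ignore-not-found"]
    raise ValueError(f"unknown action: {action!r}")


def spin(action: str, dry_run: bool = True) -> list[str]:
    """Print the command (dry run) or execute it; returns the command either way."""
    cmd = lightswitch_command(action)
    if dry_run:
        print("would run:", " ".join(cmd))
    else:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


# e.g. triggered by a scheduler at the end and start of the working day
spin("down")
spin("up")
```

Note that kubectl delete -f removes everything defined in the directory, including any PersistentVolumeClaims, which can delete data depending on the storage's reclaim policy; scaling Deployments to zero with kubectl scale --replicas=0 is a gentler alternative.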

Slide 122

Slide 122 text

reducing snowflakes reduces redundancy

Slide 123

Slide 123 text

we need to have another copy of our expensive cluster in another region so we have failover!

Slide 124

Slide 124 text

we need to have another copy of our expensive cluster in another region so we have failover! uh … sounds expensive. are you sure about that?

Slide 125

Slide 125 text

rapid recovery does not require redundant servers

Slide 126

Slide 126 text

zombie reduction does not need to be fancy

Slide 127

Slide 127 text

large bank, 2013: 50% reduction in CPUs with a lease system
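The slide doesn't describe how the bank's lease system worked, so the sketch below only illustrates the general idea under assumed rules: every server is allocated with an expiry date, owners are reminded shortly before it, and servers whose lease lapses without renewal become candidates for shutdown. The Lease class, the 90-day term, and the 14-day warning period are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

LEASE_TERM = timedelta(days=90)      # hypothetical lease length
WARNING_PERIOD = timedelta(days=14)  # hypothetical notice before expiry


@dataclass
class Lease:
    server: str
    owner: str
    expires: date

    def renew(self, today: date) -> None:
        """The owner confirms the server is still needed; extend from today."""
        self.expires = today + LEASE_TERM


def expiring_soon(leases: list[Lease], today: date) -> list[tuple[str, str]]:
    """(server, owner) pairs to remind before their lease runs out."""
    return [(lease.server, lease.owner) for lease in leases
            if today <= lease.expires <= today + WARNING_PERIOD]


def to_reclaim(leases: list[Lease], today: date) -> list[str]:
    """Servers whose lease lapsed without renewal: candidates to switch off."""
    return [lease.server for lease in leases if lease.expires < today]


today = date(2023, 3, 29)
leases = [
    Lease("perf-test-7", "team-a", date(2023, 3, 1)),
    Lease("ci-runner-2", "team-b", date(2023, 4, 5)),
    Lease("analytics", "team-c", date(2023, 6, 30)),
]
print("remind:", expiring_soon(leases, today))
print("reclaim:", to_reclaim(leases, today))
```

The default flips the burden of proof: a server stays up only while someone keeps saying it is needed.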

Slide 128

Slide 128 text


Slide 129

Slide 129 text

things that (maybe) don’t help

Slide 130

Slide 130 text

things that (maybe) don’t help: cloud (“out of sight, out of mind”)

Slide 131

Slide 131 text


Slide 132

Slide 132 text

things that (maybe) don’t help: virtualisation. 2019 survey: 30% of virtual servers doing no useful work

Slide 133

Slide 133 text

things that (maybe) don’t help: virtualisation. 2019 survey: 30% of virtual servers doing no useful work; 50% of virtual servers active less than 5% of the time

Slide 134

Slide 134 text

you still need to remember to turn the virtual machine off

Slide 135

Slide 135 text

what about serverless?

Slide 136

Slide 136 text

modernising to serverless is a big lift

Slide 137

Slide 137 text

may not suit latency-sensitive workloads

Slide 138

Slide 138 text

“we solve the cold-start problem by … keeping an instance running but not billing you”

Slide 139

Slide 139 text

serverless systems may have high overheads [diagram: application]

Slide 140

Slide 140 text

serverless systems may have high overheads [diagram: control plane, application]

Slide 141

Slide 141 text


Slide 142

Slide 142 text


Slide 143

Slide 143 text

https://hotcarbon.org/pdf/hotcarbon22-sharma.pdf

Slide 144

Slide 144 text

virtualisation overheads mean each function request can use 30x more energy than a plain HTTP server (https://hotcarbon.org/pdf/hotcarbon22-sharma.pdf)

Slide 145

Slide 145 text

are all parts of the system elastic?

Slide 146

Slide 146 text

things that definitely don’t help

Slide 147

Slide 147 text

things that don’t help: prevention

Slide 148

Slide 148 text

things that don’t help: prevention (?!)

Slide 149

Slide 149 text

surely shutting the barn door before the horse has left is a good idea?

Slide 150

Slide 150 text

prevention == heavy governance

Slide 151

Slide 151 text

remember the ikea effect?

Slide 152

Slide 152 text

remember the ikea effect? people will not surrender servers that were hard to get

Slide 153

Slide 153 text

zombies are not just servers

Slide 154

Slide 154 text

data

Slide 155

Slide 155 text

traffic

Slide 156

Slide 156 text

zombie packets

Slide 157

Slide 157 text

internet background noise

Slide 158

Slide 158 text

internet background noise: 5.5 gigabits/s
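To put 5.5 gigabits/s in perspective, a quick conversion, assuming the rate is sustained around the clock and using decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes). Integer arithmetic keeps the byte counts exact; it works out to about 59.4 TB of zombie packets a day, or roughly 21.7 PB a year.

```python
# 5.5 gigabits per second of internet background noise (figure from the slide),
# assumed to be sustained around the clock.
BITS_PER_SECOND = 5_500_000_000
SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_day = BITS_PER_SECOND // 8 * SECONDS_PER_DAY
bytes_per_year = bytes_per_day * 365

print(f"{bytes_per_day / 10**12:.1f} TB per day")
print(f"{bytes_per_year / 10**15:.1f} PB per year")
```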

Slide 159

Slide 159 text

unsolved problem == opportunity

Slide 160

Slide 160 text

the double-win: turning things off saves a lot of money

Slide 161

Slide 161 text


Slide 162

Slide 162 text

users …

Slide 163

Slide 163 text

users … • up utilisation • aim for elasticity • limit kubesprawl • de-zombify • know what you’re using • turn it off

Slide 164

Slide 164 text

1-2%

Slide 165

Slide 165 text

tool creators, support 1-2%

Slide 166

Slide 166 text

tool creators, support • better utilisation • elasticity • multi-tenancy • de-zombification • visibility • disposability (1-2%)

Slide 167

Slide 167 text

GreenOps • FinOps • AIOps • GitOps • LightSwitchOps

Slide 168

Slide 168 text


Slide 169

Slide 169 text

thank you (slides)