Slide 1

Slide 1 text

Treating Your Infrastructure Like Garbage Saturday, April 20, 13

Slide 2

Slide 2 text

Mike Fiedler, Datadog
Twitter: @mikefiedler

Slide 3

Slide 3 text

What do we care about?
• servers
• uptime
• load
• alerts

Slide 4

Slide 4 text

What should we care about?
• services
• uptime
• performance
• alerts

Slide 5

Slide 5 text

What’s the difference?
• approach
• perspective
• business goals

Slide 6

Slide 6 text

Stop server hugging!
(C) GigaOM

Slide 7

Slide 7 text

Everything fails. Be ready for it.
Watch this: http://youtu.be/drQlSptFXXI

Slide 8

Slide 8 text

So, how?

Slide 9

Slide 9 text

Popular stack
[Diagram: LB, Data, User, Web]

Slide 10

Slide 10 text

Choose elasticity: Data
• Use distributed, self-healing storage

Slide 11

Slide 11 text

Choose elasticity: Web
• Design web tier to tolerate failure
  • session state?
  • application behavior?
service = n + 1
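One way to read "service = n + 1" is as a capacity rule: run the n servers needed for peak load, plus one spare, so that losing any single server does not take the service below capacity. The sketch below is only an illustration of that reading; the function names and the requests-per-second figures are hypothetical and do not come from the talk.

```python
import math


def servers_needed(peak_rps, per_server_rps, spares=1):
    """Return n + spares, where n is the server count that covers peak load.

    All inputs are illustrative assumptions, not measurements from the demo.
    """
    if peak_rps < 0 or per_server_rps <= 0 or spares < 0:
        raise ValueError("peak_rps and spares must be >= 0, per_server_rps > 0")
    n = max(1, math.ceil(peak_rps / per_server_rps))
    return n + spares


def survives_failures(total_servers, failed, peak_rps, per_server_rps):
    """True if the servers left after `failed` losses can still carry peak load."""
    remaining = total_servers - failed
    return remaining * per_server_rps >= peak_rps


# Hypothetical numbers: 900 req/s at peak, 250 req/s per web server.
pool_size = servers_needed(900, 250)
print(pool_size, survives_failures(pool_size, 1, 900, 250))
```

With these assumed numbers the pool tolerates one failure but not two, which is exactly what a single spare buys.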

Slide 12

Slide 12 text

Choose elasticity: LB
• Monitor web tier health
• Not always as easy as it seems!
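One reason health monitoring can be harder than it looks (an assumption about what the slide has in mind) is that a single failed check may be a blip, and a single passing check does not prove recovery. A common remedy is to require several consecutive results before changing a backend's state, similar in spirit to the rise and fall counters that HAProxy supports. The class below is a minimal sketch of that idea; `HealthTracker` and its default thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    rise: int = 2         # consecutive passes needed to mark a backend up
    fall: int = 3         # consecutive failures needed to mark it down
    healthy: bool = True  # current state as seen by the load balancer
    _streak: int = 0      # consecutive results that disagree with the state

    def record(self, check_passed):
        """Record one check result and return the (possibly updated) state."""
        if check_passed == self.healthy:
            # Result agrees with the current state, so any streak is broken.
            self._streak = 0
            return self.healthy
        self._streak += 1
        threshold = self.rise if check_passed else self.fall
        if self._streak >= threshold:
            self.healthy = check_passed
            self._streak = 0
        return self.healthy


tracker = HealthTracker()
checks = [False, False, True, False, False, False, True, True]
states = [tracker.record(result) for result in checks]
print(states)
```

Two isolated failures leave the backend in rotation; only the third consecutive failure removes it, and two consecutive passes bring it back.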

Slide 13

Slide 13 text

Choose elasticity: User

Slide 14

Slide 14 text

Missing piece?
[Diagram: LB, Data, User, Web]

Slide 15

Slide 15 text

Monitoring
[Diagram: LB, Data, Web, User, Monitor]

Slide 16

Slide 16 text

[demo] https://github.com/miketheman/fullstack

Slide 17

Slide 17 text

Tools used
• AWS EC2
• MongoDB
• Bottle.py
• Python
• Apache HTTP
• HAProxy
• Siege
• Chef
• Ruby
• Spiceweasel
• Datadog
• Money

Slide 18

Slide 18 text

Skills employed
• System architecture design
• Performance monitoring
• Capacity & disaster planning

Slide 19

Slide 19 text

What did you just see?
• A design pattern
• Experiencing failure and recovery
• Disposable servers, valuable service
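The "disposable servers, valuable service" pattern can be illustrated with a toy model: servers are interchangeable, any of them may be terminated, and the pool is refilled to its desired size while the service stays up. This is not the code from the demo repository; `ServicePool` and the `web-N` names are hypothetical, and real provisioning (EC2, Chef) is replaced by an in-memory set.

```python
import itertools


class ServicePool:
    """Toy model of a web tier whose servers are disposable."""

    def __init__(self, desired):
        self._ids = itertools.count(1)
        self.desired = desired
        self.servers = set()
        self.heal()

    def heal(self):
        """Launch replacements until the pool is back at the desired size."""
        launched = []
        while len(self.servers) < self.desired:
            name = f"web-{next(self._ids)}"
            self.servers.add(name)
            launched.append(name)
        return launched

    def terminate(self, name):
        """Throw a server away; the service does not depend on any one of them."""
        self.servers.discard(name)

    def service_up(self):
        """The service is up as long as at least one server remains."""
        return len(self.servers) > 0


pool = ServicePool(desired=3)
pool.terminate("web-2")           # failure
still_up = pool.service_up()      # service survives the loss
replacements = pool.heal()        # recovery with a brand-new server
print(still_up, replacements, sorted(pool.servers))
```

The replacement gets a fresh name rather than reusing the old one, which reflects the point of the slide: the individual server has no identity worth preserving.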

Slide 20

Slide 20 text

#HugOps