Slide 1

Slide 1 text

How Sysadmins View Your Stinkin’ Code
Shawn Stratton, Senior Systems Engineer / Architect
Monday, October 29, 12

Slide 2

Slide 2 text

Who Am I?
Currently work for Discovery Communications, Inc. as a Systems Architect / Engineer.
Became a Systems Architect in 2011.
Worked as a developer for NationalGuard.com and HowStuffWorks.com.
PHP developer since the early 2000s.
Linux user since 1998.

Slide 3

Slide 3 text

Sysadmin at Work!
http://xkcd.com/705/
Sysadmins are neurotic, OCD, and more than a little paranoid. Beware!

Slide 4

Slide 4 text

So what is a Sysadmin?
Server monkey.
Administers servers to ensure uptime, stability, and performance.
Responds to service outages, power outages, errors, etc. Often after hours.

Slide 5

Slide 5 text

Great Sysadmin Qualities
Calm during the inevitable firestorm.
Analytical, with an understanding of everything that is being run.
Obsessed with Single Points of Failure, Bus Factor, and anything else that causes downtime.

Slide 6

Slide 6 text

So who’s our nemesis?

Slide 7

Slide 7 text

Developers! Developers! Developers!

Slide 8

Slide 8 text

What is a Developer?
Compiler: uses caffeine to compile business requirements into software.
Constantly trying to “improve” software and add “functionality.”

Slide 9

Slide 9 text

What’s wrong with that?
Nothing, but...
Sysadmins care about stability; developers introduce instability.

Slide 10

Slide 10 text

And then there are devops...

Slide 11

Slide 11 text

So what do we care about?
Fault tolerance.
Error handling.
Configurability.
Scalability.
Minimal resource utilization.
Sanity.

Slide 12

Slide 12 text

Fault Tolerance
System
  Multiple servers.
  “Self-healing” systems.
Software
  Exception handling.
Tolerance just means trying to handle errors as well as possible.
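
On the software side, "handle errors as well as possible" can be sketched as a small retry wrapper. This is only an illustration: the helper name `call_with_retries` and its parameters are hypothetical, not something from the talk, and a real service would catch narrower exception types than `Exception`.

```python
import time


def call_with_retries(operation, attempts=3, delay_seconds=0.0):
    """Run operation(); retry on failure and re-raise the last error if every attempt fails."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as error:  # assumption: any exception is worth one more try
            last_error = error
            if attempt < attempts:
                time.sleep(delay_seconds)  # pause before the next attempt
    raise last_error
```

The point is that a transient failure (a backend that is briefly unavailable) is absorbed, while a persistent one still surfaces as an exception that can be logged and alerted on.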

Slide 13

Slide 13 text

Error Reporting
Meaningful errors that we can act on. Tell us what failed, not just that a failure has occurred.
Clear messages that we can understand, e.g. "You have a parse error at line x of file y."
Concise errors that don’t overwhelm.
Parseable so we can automate responses.
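
One way to get errors that are both readable and parseable is to emit each one as a single line of JSON. The sketch below assumes that approach; the function name `format_error` and the field names (`component`, `file`, `line`) are made up for illustration.

```python
import json


def format_error(component, message, **context):
    """Build a one-line JSON error record that says what failed and where."""
    record = {"level": "error", "component": component, "message": message}
    record.update(context)  # extra fields such as file and line number
    return json.dumps(record, sort_keys=True)


# Hypothetical usage: one concise line that a log shipper or alerting script can parse.
print(format_error("template", "parse error", file="y.php", line=42))
```

A monitoring script can then `json.loads` each line and react to specific components or files instead of grepping free-form text.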

Slide 14

Slide 14 text

Configurability / Flexibility
Make values configurable:
  Database endpoints.
  HTTP endpoints.
  Directories you write to/read from.
  Cache endpoints.
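
A minimal sketch of what "configurable" can look like in practice, assuming the values are read from environment variables with local defaults. The variable names and default values here are hypothetical placeholders.

```python
import os

# Hypothetical setting names and development defaults; operations overrides them per environment.
DEFAULTS = {
    "DB_HOST": "localhost",
    "CACHE_HOST": "localhost",
    "API_BASE_URL": "http://localhost:8080",
    "UPLOAD_DIR": "/tmp/uploads",
}


def load_settings(environ=None):
    """Return every setting, taking the environment value when present, else the default."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name, default) for name, default in DEFAULTS.items()}
```

With something like this, moving the database or cache to another host is a deployment change rather than a code change.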

Slide 15

Slide 15 text

Scalability
In our world, we talk about Horizontal Scalability (scaling out vs. up).
It basically means the ability to handle more traffic by adding servers.

Slide 16

Slide 16 text

Disk Access
Disk access is per server.
Never assume you’ll talk to the same server twice in a row.
NFS is slooow.
Sticky sessions break more things than they fix.

Slide 17

Slide 17 text

Database
Master / Slave
  Master should be write-only.
  Dedicated slave for reporting / backups.
Clustered
  Don’t expect instant consistency.
  Be aware of what you’ll lose.
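
The master / slave split can be sketched as a tiny router that sends writes to the master and reads to a replica. This is an assumption-laden illustration: the class name `ConnectionRouter`, the host names, and the "starts with SELECT" rule are all hypothetical, and a dedicated reporting / backup slave would simply be left out of the read pool.

```python
import random


class ConnectionRouter:
    """Pick a database host per statement: reads go to replicas, everything else to the master."""

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = list(replicas)

    def host_for(self, sql):
        words = sql.split(None, 1)
        first_word = words[0].upper() if words else ""
        # Assumption: only plain SELECT statements are safe to serve from a replica.
        if first_word == "SELECT" and self.replicas:
            return random.choice(self.replicas)
        return self.master
```

Because replication is not instant, a read routed to a replica right after a write may not see that write yet, so code that needs read-your-own-writes behaviour has to ask for the master explicitly.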

Slide 18

Slide 18 text

That’s all great but...
Where are the tools?

Slide 19

Slide 19 text

Disclaimer
I’m a Linux administrator, a Linux/Mac user, and the majority of PHP serving is done via Linux.
The tools discussed are Linux tools.
I’ve written everything for what’s included in Ubuntu (the common denominator).
Sorry, Windows guys.

Slide 20

Slide 20 text

Making Requests
Browser, like everyone else.
curl from the command line.

Slide 21

Slide 21 text

Curl - Test output

Slide 22

Slide 22 text

Curl - Better than a Browser

Slide 23

Slide 23 text

Checking CPU / Memory utilization
Profiling
  Xdebug + Cachegrind.
  Zend Debugger + Zend Studio.
  XHProf.
Monitoring
  uptime
  top
  htop

Slide 24

Slide 24 text

Load Averages - wtf?
Commonly shown as 3 values, e.g. 1.06 1.14 1.36
The numbers are the load averaged over the last 1 minute, 5 minutes, and 15 minutes.
On a single core, a load of 1 = 100% utilization.
Magic numbers (when you’re good):
  .7 per core is good utilization.
  1 per core is fully utilized (start planning new hw).
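
To compare the reported numbers against the per-core rules of thumb, divide each load average by the number of cores. The sketch below does that with Python's standard library (`os.getloadavg` is Unix-only); the function names and the "headroom" label for anything under .7 are assumptions added for illustration.

```python
import os


def load_per_core(load_averages=None, cores=None):
    """Divide the 1, 5, and 15 minute load averages by the number of cores."""
    if load_averages is None:
        load_averages = os.getloadavg()  # Unix only
    if cores is None:
        cores = os.cpu_count() or 1
    return tuple(value / cores for value in load_averages)


def describe(per_core_load):
    """Apply the slide's rule of thumb to a single per-core value."""
    if per_core_load >= 1.0:
        return "fully utilized"
    if per_core_load >= 0.7:
        return "good utilization"
    return "headroom"  # assumption: the slide gives no name for values under .7
```

For example, the sample values 1.06 1.14 1.36 on a two-core machine work out to roughly .53, .57, and .68 per core, all below the .7 mark.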

Slide 25

Slide 25 text

Xdebug
http://xdebug.org/docs/install
Simple, efficient, open source.
Integrated into PhpStorm, NetBeans, and almost every other IDE (besides Zend Studio).

Slide 26

Slide 26 text

xDebug + Cache Grind

Slide 27

Slide 27 text

Zend Debugger
Very well integrated with Zend Studio & Zend Server.
Dead simple to use.

Slide 28

Slide 28 text

Zend Debugger

Slide 29

Slide 29 text

XHProf
https://github.com/facebook/xhprof
Runs fine when left on all the time, but does not integrate well with an IDE.

Slide 30

Slide 30 text

Traditional (top)

Slide 31

Slide 31 text

Newer (htop)

Slide 32

Slide 32 text

Checking Disk IO
inclued.
dtrace.
strace.

Slide 33

Slide 33 text

inclued
Get a map of your includes.
This is more useful than it seems.

Slide 34

Slide 34 text

dtrace
From Solaris. This will show you every system call; the command is a bit complicated.

Slide 35

Slide 35 text

strace
Different output and interaction than dtrace, but the same purpose.

Slide 36

Slide 36 text

Some other tools for Operations & Development.

Slide 37

Slide 37 text

CPU / System Metrics
Too many to list, but Cacti & Munin are common and have a low barrier to entry.

Slide 38

Slide 38 text

Icinga / Nagios
Alerts you to problems.

Slide 39

Slide 39 text

Wrapping Up

Slide 40

Slide 40 text

Don’t
Tell us it works on this machine or that machine.
Forget to tell us your requirements while you’re chasing a deadline that has already passed.
Claim you’ve found yet another [PHP|Apache|MySQL] bug when something doesn’t work.
Do foolish things, e.g. serve dynamically resized images (this is what Gearman and caching are for).

Slide 41

Slide 41 text

Do
Control your IO & Network access.
Be aware of your environment.
Use the tools you have available and learn new ones.