Slide 1

Slide 1 text

Hello. My name is Mårten Gustafson

Slide 2

Slide 2 text

...I used to work here...

Slide 3

Slide 3 text

...now I work here...

Slide 4

Slide 4 text

...doing mostly this...

Slide 5

Slide 5 text

...but spend a fair share of my time looking at metrics like this...

Slide 6

Slide 6 text

AUTOMATE ALL THE THINGS! ...being rabid about this, which makes me a fan of...

Slide 7

Slide 7 text

...DevOps...and its general concepts...but...

Slide 8

Slide 8 text

the OPS side of DEV ...I’m talking about this

Slide 9

Slide 9 text

OpsDev ...I think: we (as developers) need to think about this!

Slide 10

Slide 10 text

develop for operations * I think we need to get better at this

Slide 11

Slide 11 text

develop for production * I think we need to focus on this, which is why...

Slide 12

Slide 12 text

...I’ll start off with one of the most boring things to most developers...

Slide 13

Slide 13 text

logging

Slide 14

Slide 14 text

* Huge files * Messy log format * Hard to filter * Hard to correlate * Might as well...

Slide 15

Slide 15 text

/bin/my-awesome-service > /dev/null 2>&1

Slide 16

Slide 16 text

surprisingly hard

Slide 17

Slide 17 text

or... we’re surprisingly bad

Slide 18

Slide 18 text

so, logging

Slide 19

Slide 19 text

1. do it

Slide 20

Slide 20 text

framework pick a framework that: * makes sense * is a de-facto standard? * is flexible * is lightweight * is easy to use

Slide 21

Slide 21 text

consistent try to log in a consistent manner * think of log messages in terms of operations * what’s an error (should somebody be woken up?) * what’s a trace (who’s the audience?) * etc.

Slide 22

Slide 22 text

2. rotation & retention

Slide 23

Slide 23 text

resources are finite * rotate your log files * put an upper bound on the size of one log file
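The slides don't tie this to a particular framework. As one JDK-only sketch, `java.util.logging.FileHandler` covers both points: `limit` bounds the size of one file (it rotates after the write that crosses the limit, so a file can overshoot by one record) and `count` bounds how many generations are kept. The temp directory, the `svc.%g.log` names and the 1 KB / 3 file numbers are illustrative assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.FileHandler;
import java.util.logging.Formatter;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class RotatingLogDemo {

    // Illustrative bounds: about 1 KB per file, at most 3 files (svc.0.log .. svc.2.log).
    public static final int LIMIT_BYTES = 1024;
    public static final int FILE_COUNT = 3;

    /** Logs {@code records} lines into a fresh temp directory and returns that directory. */
    public static Path writeLogsToTempDir(int records) {
        try {
            Path dir = Files.createTempDirectory("rotating-logs");
            // %g is the generation number: on rotation svc.0.log becomes svc.1.log,
            // and the oldest generation beyond FILE_COUNT is deleted.
            FileHandler handler = new FileHandler(
                    dir.resolve("svc.%g.log").toString(), LIMIT_BYTES, FILE_COUNT, false);
            handler.setFormatter(new Formatter() {
                @Override
                public String format(LogRecord r) {
                    return r.getLevel() + " " + formatMessage(r) + System.lineSeparator();
                }
            });
            Logger logger = Logger.getLogger("rotating.demo." + dir.getFileName());
            logger.setUseParentHandlers(false); // keep these lines off the console
            logger.addHandler(handler);
            try {
                for (int i = 0; i < records; i++) {
                    logger.info("request " + i + " handled");
                }
            } finally {
                logger.removeHandler(handler);
                handler.close(); // flushes and releases the .lck lock file
            }
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

However much is logged, the directory never holds more than `FILE_COUNT` log files of roughly `LIMIT_BYTES` each.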

Slide 24

Slide 24 text

define window of interest * for the local disk: toss anything that’s older than X, or: * compress (and then toss when they’re even older) * will you ever look in compressed log files?
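A window of interest on local disk can be enforced by a periodic sweep. The sketch below is one assumed way to do it with plain `java.nio`: logs older than one threshold are gzipped, archives older than a second threshold are deleted. The thresholds, the `.log`/`.gz` naming and running it from a scheduled job are hypothetical; tools like logrotate can do the same job outside the JVM.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPOutputStream;

public class LogRetention {

    /**
     * One retention pass over a log directory (thresholds are illustrative):
     * *.log files older than compressAfter are gzipped, and *.gz files older
     * than deleteAfter are deleted.
     */
    public static void sweep(Path dir, Instant now, Duration compressAfter, Duration deleteAfter) {
        try {
            // Snapshot the listing first so files created during this pass are not revisited.
            List<Path> files = new ArrayList<>();
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
                stream.forEach(files::add);
            }
            for (Path file : files) {
                String name = file.getFileName().toString();
                Instant modified = Files.getLastModifiedTime(file).toInstant();
                if (name.endsWith(".gz") && modified.isBefore(now.minus(deleteAfter))) {
                    Files.delete(file);
                } else if (name.endsWith(".log") && modified.isBefore(now.minus(compressAfter))) {
                    compress(file);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void compress(Path file) throws IOException {
        Path gz = file.resolveSibling(file.getFileName() + ".gz");
        FileTime originalTime = Files.getLastModifiedTime(file);
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(gz))) {
            Files.copy(file, out);
        }
        // Keep the original timestamp so the archive ages from when the log was written.
        Files.setLastModifiedTime(gz, originalTime);
        Files.delete(file);
    }
}
```

If nobody ever opens the compressed files, setting `compressAfter` equal to `deleteAfter` and skipping compression altogether is a reasonable simplification.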

Slide 25

Slide 25 text

3. formatting

Slide 26

Slide 26 text

0 [main] INFO Main - foo
0 [main] WARN Main - bar
0 [doer] ERROR Worker - gah
java.lang.NullPointerException: gah
! at Doer.worker(Doer.java:13)

Slide 27

Slide 27 text

0 [main] INFO Main - foo
0 [main] WARN Main - bar
0 [doer] ERROR Worker - gah
java.lang.NullPointerException: gah
! at Doer.worker(Doer.java:13)
0 [main] INFO Main - foo
0 [main] WARN Main - bar
0 [doer] ERROR Worker - gah
java.lang.NullPointerException: gah
! at Doer.worker(Doer.java:13)
0 [main] INFO Main - foo
0 [main] WARN Main - bar
0 [doer] ERROR Worker - gah
java.lang.NullPointerException: gah
! at Doer.worker(Doer.java:13)
0 [main] INFO Main - foo
0 [main] WARN Main - bar
0 [doer] ERROR Worker - gah
java.lang.NullPointerException: gah
! at Doer.worker(Doer.java:13)

Slide 28

Slide 28 text

easy on the eyes * aligned (easier on the eyes)

Slide 29

Slide 29 text

easy on the tools * tail & grep friendly

Slide 30

Slide 30 text

INFO  [2012-02-25 20:24:03] foo.Main - foo
WARN  [2012-02-25 20:24:03] foo.Main - bar
ERROR [2012-02-25 20:24:03] foo.Worker - gah
! java.lang.NullPointerException: gah
! at Doer.worker(Doer.java:13)

Slide 31

Slide 31 text

INFO  [2012-02-25 20:24:03] foo.Main - foo
WARN  [2012-02-25 20:24:03] foo.Main - bar
ERROR [2012-02-25 20:24:03] foo.Worker - gah
! java.lang.NullPointerException: gah
! at Doer.worker(Doer.java:13)
INFO  [2012-02-25 20:24:03] foo.Main - foo
WARN  [2012-02-25 20:24:03] foo.Main - bar
ERROR [2012-02-25 20:24:03] foo.Worker - gah
! java.lang.NullPointerException: gah
! at Doer.worker(Doer.java:13)
INFO  [2012-02-25 20:24:03] foo.Main - foo
WARN  [2012-02-25 20:24:03] foo.Main - bar
ERROR [2012-02-25 20:24:03] foo.Worker - gah
! java.lang.NullPointerException: gah
! at Doer.worker(Doer.java:13)
INFO  [2012-02-25 20:24:03] foo.Main - foo
WARN  [2012-02-25 20:24:03] foo.Main - bar
ERROR [2012-02-25 20:24:03] foo.Worker - gah
! java.lang.NullPointerException: gah
! at Doer.worker(Doer.java:13)

Slide 32

Slide 32 text

https://github.com/codahale/logula * have a look at this
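logula does this for Logback. Without claiming to reproduce its code, a `java.util.logging.Formatter` can produce output with the same properties as the formatted example above: a padded level column, a timestamp, the logger name, and every stack trace line prefixed with `! ` so `tail` and `grep` can include or exclude traces with one pattern. The mapping from JUL levels to ERROR/WARN/INFO/DEBUG/TRACE is an assumption.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.logging.Formatter;
import java.util.logging.Level;
import java.util.logging.LogRecord;

public class AlignedFormatter extends Formatter {

    private static final String NL = System.lineSeparator();
    private final DateTimeFormatter timestamp;

    public AlignedFormatter(ZoneId zone) {
        this.timestamp = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(zone);
    }

    @Override
    public String format(LogRecord record) {
        StringBuilder out = new StringBuilder();
        // Level padded to five characters so the timestamp column lines up.
        out.append(String.format("%-5s [%s] %s - %s",
                levelName(record.getLevel()),
                timestamp.format(Instant.ofEpochMilli(record.getMillis())),
                record.getLoggerName(),
                formatMessage(record))).append(NL);
        Throwable thrown = record.getThrown();
        if (thrown != null) {
            // Every stack trace line starts with "! ", so grep -v '^!' hides traces
            // and grep '^ERROR' finds the entries themselves.
            StringWriter trace = new StringWriter();
            thrown.printStackTrace(new PrintWriter(trace));
            for (String line : trace.toString().split("\\R")) {
                out.append("! ").append(line.trim()).append(NL);
            }
        }
        return out.toString();
    }

    // Hypothetical mapping from java.util.logging levels to the names used on the slides.
    private static String levelName(Level level) {
        int v = level.intValue();
        if (v >= Level.SEVERE.intValue()) return "ERROR";
        if (v >= Level.WARNING.intValue()) return "WARN";
        if (v >= Level.INFO.intValue()) return "INFO";
        if (v >= Level.FINE.intValue()) return "DEBUG";
        return "TRACE";
    }
}
```

Passing the zone explicitly keeps timestamps comparable across machines; UTC is a common choice when logs from several hosts end up side by side.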

Slide 33

Slide 33 text

3. destinations

Slide 34

Slide 34 text

* when this is your reality, you don’t want log files only on the local machine’s disk

Slide 36

Slide 36 text

to name a few

Slide 37

Slide 37 text

SMTP file syslog SQL AMQP IRC XMPP to name a few

Slide 38

Slide 38 text

critical = “real time”

Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

audit = remote + restricted access

Slide 41

Slide 41 text

mix and match

Slide 42

Slide 42 text

always fall back on a local file (fallacies of distributed computing) * not SAN, NFS, NAS, etc.
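Mixing and matching destinations while always keeping a local copy could be sketched as a single `java.util.logging.Handler`. Every record goes to a local handler; records matching a rule also go to another destination (standing in for SMTP, syslog, AMQP and so on); and a destination that throws is noted in the local log instead of taking logging down. The class names and routing rules are hypothetical and not taken from any library mentioned in the talk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;

public class RoutingHandler extends Handler {

    private final Handler local;
    private final List<Predicate<LogRecord>> rules = new ArrayList<>();
    private final List<Handler> destinations = new ArrayList<>();

    public RoutingHandler(Handler local) {
        this.local = local;
    }

    /** Additionally send records matching {@code rule} to {@code destination}. */
    public RoutingHandler route(Predicate<LogRecord> rule, Handler destination) {
        rules.add(rule);
        destinations.add(destination);
        return this;
    }

    @Override
    public void publish(LogRecord record) {
        if (!isLoggable(record)) {
            return;
        }
        local.publish(record); // the local file always gets everything
        for (int i = 0; i < rules.size(); i++) {
            if (!rules.get(i).test(record)) {
                continue;
            }
            try {
                destinations.get(i).publish(record);
            } catch (RuntimeException e) {
                // Remote destination unavailable: note it locally and keep going.
                LogRecord failure = new LogRecord(Level.WARNING, "log destination failed: " + e);
                failure.setLoggerName(RoutingHandler.class.getName());
                local.publish(failure);
            }
        }
    }

    @Override
    public void flush() {
        local.flush();
        destinations.forEach(Handler::flush);
    }

    @Override
    public void close() {
        local.close();
        destinations.forEach(Handler::close);
    }

    /** In-memory handler, handy for trying the routing out. */
    public static class CollectingHandler extends Handler {
        public final List<LogRecord> records = new ArrayList<>();
        @Override public void publish(LogRecord record) { records.add(record); }
        @Override public void flush() { }
        @Override public void close() { }
    }
}
```

In production the local handler would be a `FileHandler` on local disk (not a network mount), with critical records routed to an alerting destination and an audit logger routed to a restricted remote store.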

Slide 43

Slide 43 text

(beware of sensitive data) * security sensitive: keys, passwords, etc. * privacy sensitive: whatever your users might provide that’s not for everyone’s eyes
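One cheap safety net for the security-sensitive case is to mask obvious `key=value` secrets before a message is written, for example from inside a formatter. This is a minimal regex sketch; the list of key names is an assumption to adapt per service. It cannot catch free-form user data, so the privacy-sensitive case still depends on not logging that data in the first place.

```java
import java.util.regex.Pattern;

public class LogSanitizer {

    // Illustrative list of sensitive keys; adapt it to what your service actually handles.
    private static final Pattern SECRET = Pattern.compile(
            "(?i)\\b(password|passwd|secret|api[_-]?key|token)\\s*[=:]\\s*([^\\s,;&]+)");

    /** Replaces the value of any key=value or key: value pair with a sensitive key by ****. */
    public static String mask(String message) {
        if (message == null) {
            return null;
        }
        return SECRET.matcher(message).replaceAll("$1=****");
    }
}
```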

Slide 44

Slide 44 text

4. separation

Slide 45

Slide 45 text

we usually log most things

Slide 46

Slide 46 text

we usually don’t separate

Slide 47

Slide 47 text

UTILIZE ALL THE CONTEXTS!

Slide 48

Slide 48 text

multiple logs & context logs

Slide 49

Slide 49 text

* Look at the SiftingAppender in logback-classic for an example

Slide 50

Slide 50 text

traditional log * Look at the SiftingAppender in logback-classic for an example

Slide 51

Slide 51 text

traditional log userid * Look at the SiftingAppender in logback-classic for an example

Slide 52

Slide 52 text

traditional log userid session id * Look at the SiftingAppender in logback-classic for an example
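logback-classic's SiftingAppender routes on an MDC value. The JDK has no MDC, so this sketch substitutes a hypothetical `ThreadLocal` user id and keeps one delegate handler per value. Attached next to the handler for the traditional log, it makes every record land in both the shared log and that user's context log.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.logging.Handler;
import java.util.logging.LogRecord;

public class SiftingHandler extends Handler {

    /** Minimal MDC-like context (hypothetical): the user id bound to the current thread. */
    public static final ThreadLocal<String> USER_ID = new ThreadLocal<>();

    private final Function<String, Handler> handlerFactory;
    private final Map<String, Handler> handlers = new ConcurrentHashMap<>();

    /** @param handlerFactory creates the per-key handler, e.g. a FileHandler for logs/user-ID.log */
    public SiftingHandler(Function<String, Handler> handlerFactory) {
        this.handlerFactory = handlerFactory;
    }

    @Override
    public void publish(LogRecord record) {
        if (!isLoggable(record)) {
            return;
        }
        // Logger.log invokes handlers synchronously on the calling thread,
        // so the ThreadLocal still holds the caller's user id here.
        String userId = USER_ID.get();
        String key = userId == null ? "unknown" : userId;
        handlers.computeIfAbsent(key, handlerFactory).publish(record);
    }

    @Override
    public void flush() {
        handlers.values().forEach(Handler::flush);
    }

    @Override
    public void close() {
        handlers.values().forEach(Handler::close);
        handlers.clear();
    }
}
```

The same pattern works for a session id. A long-running service would also need to close handlers for keys that have gone idle, which this sketch leaves out.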

Slide 53

Slide 53 text

5. configuration

Slide 54

Slide 54 text

sane defaults * location * rotation

Slide 55

Slide 55 text

per environment * have a configuration that automatically adapts to the environment * log everything to stdout in local development * log everything to file in test * log X to Y and Z to FOO in prod
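A configuration that adapts to its environment needs a way to tell which environment it is running in. This sketch assumes a hypothetical `app.env` system property or `APP_ENV` environment variable and picks a `java.util.logging` handler accordingly. Treating an unset or unknown value as local development is a design choice, and production only gets a placeholder because the slide's "X to Y and Z to FOO" is site-specific.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Map;
import java.util.logging.FileHandler;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.SimpleFormatter;
import java.util.logging.StreamHandler;

public class LoggingEnvironment {

    public enum Env { LOCAL, TEST, PROD }

    /**
     * Detects the environment: the system property wins over the environment
     * variable. "app.env" / APP_ENV are hypothetical names; anything unset or
     * unknown is treated as local development.
     */
    public static Env detect(Map<String, String> environment, String systemProperty) {
        String raw = systemProperty != null ? systemProperty : environment.get("APP_ENV");
        if (raw == null) {
            return Env.LOCAL;
        }
        switch (raw.trim().toLowerCase(Locale.ROOT)) {
            case "prod":
            case "production":
                return Env.PROD;
            case "test":
                return Env.TEST;
            default:
                return Env.LOCAL;
        }
    }

    public static Env detect() {
        return detect(System.getenv(), System.getProperty("app.env"));
    }

    /** Picks a handler per environment, following the slide. */
    public static Handler handlerFor(Env env, Path logDir) {
        if (env == Env.LOCAL) {
            // Everything to stdout; flush per record and never close System.out.
            StreamHandler stdout = new StreamHandler(System.out, new SimpleFormatter()) {
                @Override
                public synchronized void publish(LogRecord record) {
                    super.publish(record);
                    flush();
                }

                @Override
                public synchronized void close() {
                    flush();
                }
            };
            stdout.setLevel(Level.ALL);
            return stdout;
        }
        try {
            // TEST: everything to a rotated file. PROD: same file at INFO and above;
            // the slide's "X to Y and Z to FOO" routing is site-specific and left out.
            FileHandler file = new FileHandler(
                    logDir.resolve("service.%g.log").toString(), 10_000_000, 5, true);
            file.setLevel(env == Env.TEST ? Level.ALL : Level.INFO);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```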

Slide 56

Slide 56 text

re:configurable * don’t require a deploy to change a log level * provide an API * use JMX * so that you can tweak logging (enable tracing) right in production when you need to
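For `java.util.logging`, the JDK already exposes this over JMX: `PlatformLoggingMXBean` is registered as `java.util.logging:type=Logging`, so jconsole or VisualVM can change a logger's level in a running process without a deploy. The small wrapper below shows the same call from code, for example behind an admin endpoint. The wrapper class is illustrative; the MXBean methods are the JDK's.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.PlatformLoggingMXBean;

public class RuntimeLogLevels {

    // The JDK registers this MXBean in the platform MBeanServer as
    // "java.util.logging:type=Logging", so JMX clients can call the
    // same operation remotely against a running service.
    private static PlatformLoggingMXBean logging() {
        return ManagementFactory.getPlatformMXBean(PlatformLoggingMXBean.class);
    }

    /**
     * Sets a logger's level at runtime, e.g. "FINE" to enable tracing.
     * Throws IllegalArgumentException if the logger does not exist.
     */
    public static void setLevel(String loggerName, String levelName) {
        logging().setLoggerLevel(loggerName, levelName);
    }

    /** Returns the level name, "" if the level is inherited, or null if no such logger exists. */
    public static String getLevel(String loggerName) {
        return logging().getLoggerLevel(loggerName);
    }
}
```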

Slide 57

Slide 57 text

No content

Slide 58

Slide 58 text

metrics

Slide 59

Slide 59 text

let your code speak

Slide 60

Slide 60 text

INSTRUMENT ALL THE CODE!

Slide 61

Slide 61 text

meters

Slide 62

Slide 62 text

counters meters timers gauges histograms
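The deck points at Coda Hale's Metrics library for these types. The sketch below is not that library's API. It is a JDK-only illustration of what three of them record: a counter (an event count), a gauge (a value sampled on demand) and a timer (calls plus mean and max duration). Meters (rates) and histograms (distributions) are left out to keep it short.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

public class TinyMetrics {

    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();
    private final Map<String, Supplier<? extends Number>> gauges = new ConcurrentHashMap<>();
    private final Map<String, Timer> timers = new ConcurrentHashMap<>();

    /** Counter: increments of an event count (requests served, errors, ...). */
    public void increment(String name) {
        counters.computeIfAbsent(name, n -> new LongAdder()).increment();
    }

    public long count(String name) {
        LongAdder counter = counters.get(name);
        return counter == null ? 0 : counter.sum();
    }

    /** Gauge: a value sampled on demand (queue depth, cache size, ...). */
    public void gauge(String name, Supplier<? extends Number> value) {
        gauges.put(name, value);
    }

    public Number read(String name) {
        return gauges.get(name).get();
    }

    /** Timer: number of calls plus mean and max duration. */
    public Timer timer(String name) {
        return timers.computeIfAbsent(name, n -> new Timer());
    }

    public static final class Timer {
        private final LongAdder calls = new LongAdder();
        private final LongAdder totalNanos = new LongAdder();
        private final AtomicLong maxNanos = new AtomicLong();

        public <T> T time(Supplier<T> work) {
            long start = System.nanoTime();
            try {
                return work.get();
            } finally {
                long elapsed = System.nanoTime() - start;
                calls.increment();
                totalNanos.add(elapsed);
                maxNanos.accumulateAndGet(elapsed, Math::max);
            }
        }

        public long calls() { return calls.sum(); }

        public double meanMillis() {
            long n = calls.sum();
            return n == 0 ? 0.0 : totalNanos.sum() / 1e6 / n;
        }

        public double maxMillis() { return maxNanos.get() / 1e6; }
    }
}
```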

Slide 63

Slide 63 text

EXPOSE ALL THE VALUES!

Slide 64

Slide 64 text

...or whatever makes sense for you ...and yes, it’s Comic Sans for BAYEUX ...and yes, it’s Helvetica for JMX

Slide 65

Slide 65 text

JMX JSON XML HTTP XMPP AMQP THRIFT BAYEUX RMI CSV ...or whatever makes sense for you ...and yes, it’s Comic Sans for BAYEUX ...and yes, it’s Helvetica for JMX

Slide 66

Slide 66 text

* put all your values into your tools and services * BUT DON’T FORGET THE AD-HOC, LOCAL USAGE (i.e. JMX)
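For the ad-hoc, local case, a standard MBean is enough for jconsole or VisualVM to show live values. In this sketch each getter on the `...MBean` interface becomes a read-only attribute. The `com.example.demo:type=RequestStats` object name and the stats themselves are hypothetical; the interface is nested and public because the JMX introspector rejects non-public MBean interfaces.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.LongAdder;
import javax.management.JMException;
import javax.management.ObjectName;

public class JmxExposure {

    /** Standard MBean contract: JMX tools show each getter as a read-only attribute. */
    public interface RequestStatsMBean {
        long getRequestCount();
        long getErrorCount();
    }

    public static class RequestStats implements RequestStatsMBean {
        private final LongAdder requests = new LongAdder();
        private final LongAdder errors = new LongAdder();

        public void recordRequest(boolean failed) {
            requests.increment();
            if (failed) {
                errors.increment();
            }
        }

        @Override public long getRequestCount() { return requests.sum(); }
        @Override public long getErrorCount() { return errors.sum(); }
    }

    /** Registers the stats in the platform MBeanServer under the given (hypothetical) name. */
    public static ObjectName register(RequestStats stats, String name) {
        try {
            ObjectName objectName = new ObjectName(name);
            ManagementFactory.getPlatformMBeanServer().registerMBean(stats, objectName);
            return objectName;
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads an attribute back the way a JMX client would. */
    public static Object read(ObjectName name, String attribute) {
        try {
            return ManagementFactory.getPlatformMBeanServer().getAttribute(name, attribute);
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The same counters can also be pushed to whatever external tools you use. JMX keeps them reachable when those tools are not.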

Slide 67

Slide 67 text

No content

Slide 68

Slide 68 text

self checks!

Slide 69

Slide 69 text

@Override
protected Result check() throws Exception {
    if (database.ping()) {
        return Result.healthy();
    }
    return Result.unhealthy("Can't ping database");
}
* databases * other services * other dependencies * make them explicitly invokable
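The snippet above appears to follow the health-check shape of Coda Hale's Metrics library. To show the idea without that dependency, here is a self-contained sketch that mirrors the shape: `check()` may throw, `execute()` turns any exception into an unhealthy result, and `runAll` is the single entry point to wire to an HTTP endpoint, JMX operation or CLI so the checks are explicitly invokable. The ping function stands in for a real database driver and is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

public class HealthChecks {

    /** Outcome of one check, mirroring Result.healthy() / Result.unhealthy(msg). */
    public static final class Result {
        public final boolean healthy;
        public final String message;

        private Result(boolean healthy, String message) {
            this.healthy = healthy;
            this.message = message;
        }

        public static Result healthy() { return new Result(true, null); }
        public static Result unhealthy(String message) { return new Result(false, message); }

        @Override public String toString() { return healthy ? "OK" : "FAIL: " + message; }
    }

    public abstract static class HealthCheck {
        protected abstract Result check() throws Exception;

        /** Never throws: an exception inside a check is reported as unhealthy. */
        public final Result execute() {
            try {
                return check();
            } catch (Exception e) {
                return Result.unhealthy(e.toString());
            }
        }
    }

    /** The slide's database check, with a hypothetical ping function in place of a driver. */
    public static class DatabaseHealthCheck extends HealthCheck {
        private final BooleanSupplier ping;

        public DatabaseHealthCheck(BooleanSupplier ping) {
            this.ping = ping;
        }

        @Override
        protected Result check() {
            if (ping.getAsBoolean()) {
                return Result.healthy();
            }
            return Result.unhealthy("Can't ping database");
        }
    }

    /** Runs every registered check, in registration order. */
    public static Map<String, Result> runAll(Map<String, HealthCheck> checks) {
        Map<String, Result> results = new LinkedHashMap<>();
        checks.forEach((name, check) -> results.put(name, check.execute()));
        return results;
    }
}
```

The resulting map is also what a monitoring system would poll to trend and alert on.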

Slide 70

Slide 70 text

trend on them

Slide 71

Slide 71 text

alert on them

Slide 72

Slide 72 text

...render them as markers/buttons/light bulbs/whatever

Slide 73

Slide 73 text

make instrumentation a habit * just do it

Slide 74

Slide 74 text

find optimal usage later * you’ll never use it if it ain’t there

Slide 75

Slide 75 text

No content

Slide 76

Slide 76 text

http://metrics.codahale.com/

Slide 77

Slide 77 text

No content

Slide 78

Slide 78 text

packaging * adaptive to different environments

Slide 79

Slide 79 text

one package * one package, regardless of environment

Slide 80

Slide 80 text

bundle dependencies

Slide 81

Slide 81 text

über jar (maven shade plugin) * for example

Slide 82

Slide 82 text

adaptive configuration * adaptive to different environments

Slide 83

Slide 83 text

isolated * mock dependencies with (static) dummy answers

Slide 84

Slide 84 text

<dependency>
    <groupId>org.mockito</groupId>
    <artifactId>mockito-core</artifactId>
    <scope>test</scope>
</dependency>
* maven example

Slide 87

Slide 87 text

<dependency>
    <groupId>org.mockito</groupId>
    <artifactId>mockito-core</artifactId>
    <scope>compile</scope>
</dependency>
* maven example
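Moving mockito-core from `test` to `compile` scope lets the shipped package build its own stand-ins for dependencies. Mockito would do that with its mocking API. To stay JDK-only, this sketch swaps in `java.lang.reflect.Proxy` to build a stub that returns fixed answers looked up by method name. `PriceService`, its answers and the `isolated` flag are all hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

public class IsolatedMode {

    /** Hypothetical remote dependency of the service. */
    public interface PriceService {
        int priceInCents(String sku);
        String currency();
    }

    /**
     * Creates an implementation of {@code type} whose methods return fixed
     * answers looked up by method name, ignoring the arguments.
     */
    public static <T> T staticStub(Class<T> type, Map<String, Object> answers) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getDeclaringClass() == Object.class) {
                // equals/hashCode/toString: behave like a plain object
                if (method.getName().equals("equals")) return proxy == args[0];
                if (method.getName().equals("hashCode")) return System.identityHashCode(proxy);
                return "stub " + type.getSimpleName();
            }
            Object answer = answers.get(method.getName());
            if (answer == null) {
                throw new UnsupportedOperationException("no dummy answer for " + method.getName());
            }
            return answer;
        };
        return type.cast(Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler));
    }

    /** Returns the real client, or a static stub when running isolated. */
    public static PriceService priceService(boolean isolated, PriceService real) {
        if (!isolated) {
            return real;
        }
        Map<String, Object> answers = new HashMap<>();
        answers.put("priceInCents", 995);
        answers.put("currency", "SEK");
        return staticStub(PriceService.class, answers);
    }
}
```

Started isolated, the package can run on a developer machine or in a smoke test with nothing else available.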

Slide 88

Slide 88 text

127.0.0.1 * expect everything to be available on 127.0.0.1

Slide 89

Slide 89 text

test / qa / staging / prod * the usual suspects

Slide 90

Slide 90 text

either detect your environment * i.e., bundle configurations for all environments

Slide 91

Slide 91 text

or load externalized configuration * DNS * ZooKeeper * CouchDB * Doozer * External property/YML/JSON/whatever files ** in one sane specified location (preferably the working directory)
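The file-based variant can be layered on sane bundled defaults, so the package still starts with zero touch when no external file exists. In this sketch the defaults expect everything on 127.0.0.1, and an optional `service.properties` in one well-known directory (the working directory by default) overrides them. The file name and keys are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class ServiceConfig {

    static final String FILE_NAME = "service.properties"; // hypothetical file name

    /** Bundled defaults: expect every dependency on 127.0.0.1 (keys are hypothetical). */
    static Properties defaults() {
        Properties p = new Properties();
        p.setProperty("db.host", "127.0.0.1");
        p.setProperty("cache.host", "127.0.0.1");
        return p;
    }

    /**
     * Layers an optional external file from one well-known directory over the
     * bundled defaults. A missing file is not an error, so the package still
     * starts without any hand-editing.
     */
    public static Properties load(Path dir) {
        Properties config = new Properties(defaults());
        Path file = dir.resolve(FILE_NAME);
        if (Files.isRegularFile(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                config.load(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return config;
    }

    /** The slide's preferred location: the process working directory. */
    public static Properties loadFromWorkingDirectory() {
        return load(Paths.get("").toAbsolutePath());
    }
}
```

Values must be read with `getProperty`, since `Properties` consults its defaults only through that method.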

Slide 92

Slide 92 text

strive for zero-touch configuration * packages should JUST WORK

Slide 93

Slide 93 text

No content

Slide 94

Slide 94 text

the operational aspect needs to be an integral part of:

Slide 95

Slide 95 text

the operational aspect needs to be an integral part of: development

Slide 96

Slide 96 text

the operational aspect needs to be an integral part of: design

Slide 97

Slide 97 text

the operational aspect needs to be an integral part of: architecture

Slide 98

Slide 98 text

the operational aspect needs to be an integral part of: reasoning

Slide 99

Slide 99 text

And therefore, a quick comment on...

Slide 100

Slide 100 text

*aaS ....this...

Slide 101

Slide 101 text

...[whatever] as a service * whatever as a service

Slide 102

Slide 102 text

OUTSOURCE ALL THE THINGS!

Slide 103

Slide 103 text

to name some * logging * alerting

Slide 104

Slide 104 text

Tools! Nice!

Slide 105

Slide 105 text

(new shiny object syndrome)

Slide 106

Slide 106 text

No content

Slide 107

Slide 107 text

aka

Slide 108

Slide 108 text

fallacies of distributed computing

Slide 109

Slide 109 text

No content

Slide 110

Slide 110 text

1. The network is reliable
2. Latency is zero
3. Bandwidth is infinite
4. The network is secure
5. Topology doesn't change
6. There is one administrator
7. Transport cost is zero
8. The network is homogeneous (- James Gosling)
Fallacies of distributed computing - Peter Deutsch

Slide 112

Slide 112 text

so when using this, we need to seriously consider...

Slide 113

Slide 113 text

1. The network is reliable
2. Latency is zero
3. Bandwidth is infinite
4. The network is secure
5. Topology doesn't change
6. There is one administrator
7. Transport cost is zero
8. The network is homogeneous (- James Gosling)
Fallacies of distributed computing - Peter Deutsch
...this * reliability (overall, geo location, connectivity) * security (communication, retention) * cost (of using, of not being available)

Slide 114

Slide 114 text

by all means...

Slide 115

Slide 115 text

...use services

Slide 116

Slide 116 text

but not only

Slide 117

Slide 117 text

don’t bet your operation on their availability
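One way to use a hosted service without betting on it: always record locally first, then call the service on a separate thread with a bounded wait, so a slow or failing provider costs at most the timeout and never an exception in the caller. The `Consumer` callbacks stand in for a hypothetical hosted logging/alerting client and a local log writer.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

public class GuardedShipper {

    private final Consumer<String> remote; // hypothetical client for a hosted logging/alerting service
    private final Consumer<String> local;  // e.g. append to the local log file
    private final long timeoutMillis;
    private final ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "event-shipper");
        t.setDaemon(true); // never keep the JVM alive just to ship events
        return t;
    });

    public GuardedShipper(Consumer<String> remote, Consumer<String> local, long timeoutMillis) {
        this.remote = remote;
        this.local = local;
        this.timeoutMillis = timeoutMillis;
    }

    /**
     * Always records the event locally, then tries the hosted service with a
     * bounded wait. Returns true only if the service accepted it in time;
     * a slow or failing service never propagates to the caller.
     */
    public boolean ship(String event) {
        local.accept(event);
        Future<?> call = pool.submit(() -> remote.accept(event));
        try {
            call.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            call.cancel(true); // interrupt the stuck call
            return false;
        } catch (ExecutionException e) {
            return false; // the service threw; the local copy is already written
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The return value is worth feeding into a counter of its own, so an unreliable provider shows up in your metrics rather than in your outage.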

Slide 118

Slide 118 text

No content

Slide 119

Slide 119 text

responsibility

Slide 120

Slide 120 text

logging

Slide 121

Slide 121 text

YOU

Slide 122

Slide 122 text

metrics

Slide 123

Slide 123 text

YOU

Slide 124

Slide 124 text

packaging

Slide 125

Slide 125 text

YOU

Slide 126

Slide 126 text

configuration

Slide 127

Slide 127 text

YOU

Slide 128

Slide 128 text

sane defaults

Slide 129

Slide 129 text

YOU

Slide 130

Slide 130 text

YOU

Slide 131

Slide 131 text

NOT operations

Slide 132

Slide 132 text

NOT your hosting provider

Slide 133

Slide 133 text

NOT your boss

Slide 134

Slide 134 text

NOT your service provider

Slide 135

Slide 135 text

NOT your colleague

Slide 136

Slide 136 text

YOU

Slide 137

Slide 137 text

develop accordingly

Slide 138

Slide 138 text

develop for operations

Slide 139

Slide 139 text

love your operations @martengustafson http://marten.gustafson.pp.se/