Slide 1

Slide 1 text

Your Own Metric System
Ian Dees · @undees
OSCON 2012
Hello, and welcome. I’m Ian.

Slide 2

Slide 2 text

By day, I make oscilloscopes. By night, I play guitar irresponsibly.

Slide 3

Slide 3 text

pragprog.com/titles/dhwcr
I also write books, mostly about Ruby topics. A group of us—me, plus two major contributors to the Cucumber test framework—are working on a new book of specific testing techniques. Ruby and its various test frameworks were my gateway drug to code metrics, though for this talk we’ll be concentrating on other languages.

Slide 4

Slide 4 text

Oscilloscopes have been available commercially since the 1940s. Their architecture changes slowly. Software needs to last, and it tends to last whether we wish for a rewrite or not. Our team’s exploration of this mix of old and new code led to our interest in code metrics.

Slide 5

Slide 5 text

Setting
And even if you’re not working on a large legacy code base, there are likely issues that we face in common.

Slide 6

Slide 6 text

The forces against us
❥ Entropy drags our code down
❥ Apathy drags us down
There are a lot of forces that push on us and our teams. Today, I want to talk about two very different forces that have surprisingly similar effects: the entropy that drags our code down over time, and the apathy that drags us down personally over time.

Slide 7

Slide 7 text

Stay engaged and productive
How do we fight these forces? How do we keep our interest after our tenth straight hour wading into the weeds of an incomprehensible legacy routine? How do we prevent the code we write today from being someone’s nightmare tomorrow?

Slide 8

Slide 8 text

Knowing our code can help us do our jobs and have more fun
We have many tools in our tool chest; one is a good set of metrics—information about our code base. My hope is that you’ll consider code metrics at least as an intriguing, low-cost possibility for making the day go by a little better.

Slide 9

Slide 9 text

Risk #1
Missing or poor information can waste our time or lead us to cause harm
The risk with doing this—and there’s always a risk—is that we might waste our time making changes we don’t need, or worse, end up trashing our code in the name of blindly satisfying some target number.

Slide 10

Slide 10 text

Two steps forward
1. Ask questions about your code
2. Choose metrics that answer those questions
How do we address that risk? By letting our project needs dictate our metric choices, not the other way around. It sounds simple. But as we’ll see, it’s possible to misapply a metric and make a big mess.

Slide 11

Slide 11 text

Purpose of metrics
Since getting the reasons right is so important, let’s talk about why we’re gathering this data.

Slide 12

Slide 12 text

purpose of metrics
Help you answer a question
The purpose of any metric should be to help you answer a question. Since we’re developers who maybe also do a little testing, let’s ask a few example questions now.

Slide 13

Slide 13 text

purpose of metrics What mess should I clean up next? For example, if several files need some love, where should I concentrate my efforts?

Slide 14

Slide 14 text

purpose of metrics
The product backlog isn’t a substitute for your brain
Something else may be giving us guidance on what part of the code to work in—like the product backlog. But you may be in a situation where you’ve got a little more leeway, like an explicit charter to pay down technical debt.

Slide 15

Slide 15 text

Risk #2
Making structural changes can introduce new bugs (or expose existing ones)
That said, when you do wander off the map, you do risk creating a bug. With legacy code, you may also uncover an existing bug and get the blame nonetheless. One way to address this risk is to improve your test coverage and make small changes, one at a time. Another is to choose the right metrics; fixing static analysis warnings has anecdotally been one of the lowest-risk change activities I’ve ever seen.

Slide 16

Slide 16 text

purpose of metrics Where are the bugs (likely to be)? Here’s another question we might ask. Where are the bugs? Where are the old bugs we haven’t found yet? Where are the new ones we might have created recently?

Slide 17

Slide 17 text

purpose of metrics
/**
 * REMOVE THIS CODE
 * BEFORE WE SHIP!
 */
We can also turn to our code for ideas of what questions to ask. Has anyone seen something like this comment in production code? The number of these red flags in your code is a kind of code metric you can measure and reduce.

Slide 18

Slide 18 text

purpose of metrics Have we forgotten anything for this release? That quantitative measurement—number of bad comments in the code—is helping us make a qualitative determination.

Slide 19

Slide 19 text

purpose of metrics
These questions are for us
The questions we’ve heard so far are things we might ask,...

Slide 20

Slide 20 text

purpose of metrics
Not for someone else
...not things someone else might ask.

Slide 21

Slide 21 text

purpose of metrics Questions from others: (outside the scope of our metrics) Not that other people’s questions aren’t legitimately interesting, or that they might not apply metrics of their own.

Slide 22

Slide 22 text

purpose of metrics Should we hold the release? For example, the SQA team might be looking for red flags that could hold up the release.

Slide 23

Slide 23 text

purpose of metrics
(chart: errors/KLOC plotted over time)
So they might look at aggregate errors per thousand lines of code. Not something I necessarily use to make decisions as a developer, but it doesn’t scare me if this metric is in use somewhere.

Slide 24

Slide 24 text

purpose of metrics
Who’s got the best KLOC or error rate?
On a more sinister note, tracking rates of code production or error creation/resolution is outright destructive of teams.

Slide 25

Slide 25 text

purpose of metrics
“It was time to fill out the management form for the first time. When he got to the lines of code part, he thought about it for a second, and then wrote in the number: -2000. After a couple more weeks, they stopped asking Bill to fill out the form, and he gladly complied.” —folklore.org
There was apparently a brief, dark time at Apple when employees were tracked by lines of code produced, until Bill Atkinson showed that you can improve and shorten the code at the same time.

Slide 26

Slide 26 text

purpose of metrics Have we met our target complexity or coverage? Another, more subtle trap is setting absolute thresholds for various metrics.

Slide 27

Slide 27 text

Doing so is like blindly obeying a GPS device: sooner or later, you’ll drive off a cliff.

Slide 28

Slide 28 text

purpose of metrics
Metrics serve you, not the other way around
Metrics are supposed to be here for our benefit.

Slide 29

Slide 29 text

purpose of metrics
Keep the job fun
And indeed, in addition to answering specific questions about our projects, they can make coding seem a little bit like a game where the side effect is to produce better code...

Slide 30

Slide 30 text

purpose of metrics More fun than actually working? ...as long as we still get around to writing the code eventually.

Slide 31

Slide 31 text

Risk #3
There is a trap here for the distractible
We have to be careful not to spend all day writing fancier shell scripts and slapping our stats onto elaborate dashboards (though there are quick-and-cheap dashboards I like; see the Tranquil project).

Slide 32

Slide 32 text

Common metrics
Now that we have a few questions in mind about our code base, let’s look at some metrics commonly used by many projects. (Later, we’ll look at writing our own.) The nice thing about prefab metrics is that we can find open source implementations and supporting research.

Slide 33

Slide 33 text

common metrics
Languages
❥ C: a case study
❥ Perl: the beginner’s experience
❥ just ask!
Rather than present you with a laundry list, I’m going to stick to a few targeted examples in C and Perl. But similar tools likely exist for your language; catch me in the hall afterwards if you’d like to explore that together.

Slide 34

Slide 34 text

common metrics
Repo for this talk
github.com/undees/oscon
The code samples you’re about to see are on GitHub; feel free to send a pull request if you’d like your favorite language to be included.

Slide 35

Slide 35 text

common metrics
Cyclomatic complexity
The granddaddy of modern code metrics is McCabe Cyclomatic Complexity. It’s meant to be a loose measure of how many different paths there are through a piece of code.

Slide 36

Slide 36 text

common metrics
E – N + 2P
The fancy explanation is that you draw a graph of control flow through your function, then calculate a score from the number of edges (E), nodes (N), and connected components (P).

Slide 37

Slide 37 text

common metrics
1. Start with a score of 1
2. Add 1 for each if, case, for, or boolean condition
The simpler explanation is that we walk through the code and add a point for each decision the code has to make.

Slide 38

Slide 38 text

Volume speaking_volume(
    bool correct_room,
    bool correct_time) {
  if (correct_room && correct_time) {
    return INTELLIGIBLE;
  } else {
    // rehearsing
    return INAUDIBLE;
  }
}
complexity: 1
So we’d start with a value of 1 for this code sample...

Slide 39

Slide 39 text

Volume speaking_volume(
    bool correct_room,
    bool correct_time) {
  if (correct_room && correct_time) {
    return INTELLIGIBLE;
  } else {
    // rehearsing
    return INAUDIBLE;
  }
}
complexity: 2
...add 1 point for the if statement...

Slide 40

Slide 40 text

Volume speaking_volume(
    bool correct_room,
    bool correct_time) {
  if (correct_room && correct_time) {
    return INTELLIGIBLE;
  } else {
    // rehearsing
    return INAUDIBLE;
  }
}
complexity: 3
...and add 1 final point for the boolean operator. Depending on the implementation, we might add a point for the multiple returns.
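For reference, here’s a minimal compilable version of that example (a sketch; the Volume enum and the file name are my assumptions, since the slides only show the function):

/* complexity_demo.c: self-contained sketch of the slide's example */
#include <stdbool.h>

typedef enum { INAUDIBLE, INTELLIGIBLE } Volume;   /* assumed; not shown on the slides */

Volume speaking_volume(bool correct_room, bool correct_time) {
    if (correct_room && correct_time) {   /* +1 for the if, +1 for the && */
        return INTELLIGIBLE;
    } else {
        // rehearsing
        return INAUDIBLE;
    }
}   /* total: 1 (base) + 1 + 1 = complexity 3 */

The graph formula agrees: if we model the short-circuit && as its own decision node and give the function a single exit node, the control-flow graph has E = 6 edges, N = 5 nodes, and P = 1 connected component, so E – N + 2P = 6 – 5 + 2 = 3.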

Slide 41

Slide 41 text

common metrics
pmccabe
parisc-linux.org/~bame/pmccabe
One easy-to-use implementation of this metric for C code is pmccabe.

Slide 42

Slide 42 text

$ pmccabe *.c | sort -nr | head -10
3   3   3   6    8   oscon.c(6): speaking_volume
1   1   2   16   5   oscon.c(16): main
When we run it, it prints the complexity, size, and location of each function in our project.

Slide 43

Slide 43 text

common metrics
Perl::Metrics::Simple
CPAN has several metrics modules for Perl; Perl::Metrics::Simple is an easy one to get started with.

Slide 44

Slide 44 text

sub speaking_volume {
    my $correct_room = shift;
    my $correct_time = shift;

    if ($correct_room && $correct_time) {
        return 'intelligible';
    } else {
        # rehearsing
        return 'inaudible';
    }
}
Here’s a Perl subroutine similar to the one we saw.

Slide 45

Slide 45 text

$ countperl lib
...
Tab-delimited list of subroutines, with most complex at top
-----------------------------------------------------------
complexity   sub               path           size
4            speaking_volume   lib/OSCON.pm   9
...
Similar to pmccabe, Perl::Metrics::Simple gives us the size and complexity of each method.

Slide 46

Slide 46 text

Speaking of size and complexity, this paper reexamined several previous studies and found that several popular code metrics were effectively just expensive ways...

Slide 47

Slide 47 text

$ wc -l oscon.c
...of counting lines. The paper didn’t consider cyclomatic complexity alone (and there were other issues dealt with in subsequent papers by other authors), but we should always be skeptical of our own metrics. Fortunately, most tools give us both a line count and a complexity metric; we can decide for ourselves.

Slide 48

Slide 48 text

Risk #4
Blindly reducing one number can add complexity and bugs
Some teams set complexity targets. In the degenerate case, they turn their code into a bunch of tiny functions that do nothing—making the overall code base more complex and prone to bugs.
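To make that degenerate case concrete, here is a contrived sketch (mine, not the speaker’s) of the earlier function shredded to chase a per-function complexity target; every piece now scores only 1 or 2, but a reader has to follow three extra calls to trace one decision:

/* shredded.c: same behavior as speaking_volume, harder to follow */
#include <stdbool.h>

typedef enum { INAUDIBLE, INTELLIGIBLE } Volume;   /* assumed, as before */

static bool room_is_right(bool correct_room) { return correct_room; }   /* complexity 1 */
static bool time_is_right(bool correct_time) { return correct_time; }   /* complexity 1 */

static bool should_speak(bool correct_room, bool correct_time) {
    return room_is_right(correct_room) && time_is_right(correct_time);  /* complexity 2 */
}

Volume speaking_volume(bool correct_room, bool correct_time) {
    if (should_speak(correct_room, correct_time)) {                     /* complexity 2 */
        return INTELLIGIBLE;
    }
    return INAUDIBLE;   /* the decisions didn't go away; they just scattered */
}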

Slide 49

Slide 49 text

common metrics
Test coverage
Another widely used metric is the percentage of your code that gets executed by your tests.

Slide 50

Slide 50 text

common metrics
1. Instrument your program
2. Watch your tests run
3. Report which lines get executed
Measuring this typically involves instrumenting your code, so that you can watch which lines execute as your tests run.

Slide 51

Slide 51 text

common metrics
Addresses “epic confidence” fail
opensourcebridge.org/sessions/923
Knowing our test coverage helps address the “epic confidence” problem that Laura Thomson described in her Open Source Bridge talk, “How Not To Release Software.” Teams afflicted by this bug assert without evidence that their tests are great.

Slide 52

Slide 52 text

common metrics
Testable code is more... testable
In addition to combating hubris, measuring coverage helps us make our code more testable. Testability is not an end in itself, but a property with beneficial side effects.

Slide 53

Slide 53 text

common metrics
gcov
For C projects, it’s easy to measure coverage. GCC comes with the gcov coverage tool.

Slide 54

Slide 54 text

int main() {
  assert(speaking_volume(true, true) == INTELLIGIBLE);
  return 0;
}
Here’s a test that exercises just one branch of our code from earlier.
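Filled out as a complete file, the test might look like this (a sketch; oscon.h and its declarations are my assumptions, mirroring the enum used earlier):

/* test_oscon.c: exercises only the (true, true) path */
#include <assert.h>
#include <stdbool.h>
#include "oscon.h"   /* assumed to declare Volume, INTELLIGIBLE, INAUDIBLE, speaking_volume */

int main() {
    assert(speaking_volume(true, true) == INTELLIGIBLE);
    /* the else branch (return INAUDIBLE) never runs in this test */
    return 0;
}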

Slide 55

Slide 55 text

$ gcc -fprofile-arcs \
      -ftest-coverage \
      -c oscon.c
$ gcc -fprofile-arcs \
      oscon.o
First, we’d compile and link our program with a couple of gcov’s required flags.

Slide 56

Slide 56 text

$ gcov oscon.c
$ cat oscon.c.gcov
Then, we’d run our tests and point gcov at the logfiles.

Slide 57

Slide 57 text

    1:    6:Volume speaking_volume(bool correct_room, bool correct_time) {
    1:    7:  if (correct_room && correct_time) {
    1:    8:    return INTELLIGIBLE;
    -:    9:  } else {
    -:   10:    // rehearsing
 ####:   11:    return INAUDIBLE;
    -:   12:  }
    -:   13:}
The result is a list of what lines did and didn’t get executed. In this case, we never ran the “else” clause.

Slide 58

Slide 58 text

common metrics
Devel::Cover
Not to be outdone, Perl provides Devel::Cover.

Slide 59

Slide 59 text

$ cover -test
$ cat cover_db/coverage.html
You just point Devel::Cover at your tests, and it produces an HTML report for you.

Slide 60

Slide 60 text

Devel::Cover gives us more information than gcov did. We executed line 26 once, but didn’t exercise both sides of the “&&”.

Slide 61

Slide 61 text

Risk #5
High code coverage can make you think your code is good
Which brings us to another thing to keep in mind. Hitting each line of code once isn’t the same as hitting each combination of branches. Code coverage is meant to help you look for holes, not to lull you into false security.
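Here’s a hedged illustration of that gap in C, reusing the assumed oscon.h from before: these two asserts execute every line of speaking_volume, so line coverage reads 100%, yet the second operand of the && is never evaluated as false.

/* coverage_gap.c: full line coverage, incomplete condition coverage */
#include <assert.h>
#include <stdbool.h>
#include "oscon.h"   /* assumed, as in the earlier sketch */

int main() {
    assert(speaking_volume(true, true)  == INTELLIGIBLE);  /* && evaluates to true */
    assert(speaking_volume(false, true) == INAUDIBLE);     /* && short-circuits; else branch runs */
    /* correct_time is never seen as false here, which is exactly the kind of
       hole a branch- or condition-aware report (like Devel::Cover's) exposes */
    return 0;
}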

Slide 62

Slide 62 text

Custom metrics
The advantage of applying commonly used measurements is good support. The downside is lack of context; the creators of those metrics have nowhere near the knowledge of your project that you do. So you may want to supplement common metrics with a few of your own. I can’t tell you what those metrics are, but I can tell you a couple of the ones I’ve seen used.

Slide 63

Slide 63 text

X-rated-ness
First, let’s look at what I’ll call X-rated-ness.

Slide 64

Slide 64 text

custom metrics
Carlin’s 7 Dirty Words
1. 2. 3. 4. 5. 6. 7.
Just as George Carlin gave us his famous list of words you can’t say on television,...

Slide 65

Slide 65 text

custom metrics
Our 7 Dirty Words
1. XXX
2. TODO
3. FIXME
4. TBD
5. HACK
6. #if 0
7. #ifndef TESTING
...software teams have their own lists of bad words.

Slide 66

Slide 66 text

custom metrics
Our 7 Dirty Words
1. 2. 3. 4. 5. 6. 7.
(Sorry, I should have blurred those out. ;-)

Slide 67

Slide 67 text

$ ack -cl 'XXX|TODO|FIXME'
oscon.c:1
This is dead simple to do with ack, the modern-day replacement for grep. Just count string occurrences across your files, and optionally do a little sorting.

Slide 68

Slide 68 text

custom metrics
Test::Fixme
Grepping works on nearly every language, of course. But Perl has its own specific implementation of this metric.

Slide 69

Slide 69 text

use Test::Fixme;
run_tests(where => 'lib',
          match => qr/XXX|TODO|FIXME/);
All you have to do is throw a couple of lines into a “.t” file...

Slide 70

Slide 70 text

$ make test
...
t/test-fixme.t .. 1/1
#   Failed test ''lib/OSCON.pm''
#   at t/test-fixme.t line 2.
# File: 'lib/OSCON.pm'
#     34    # XXX:remove the temp limit before we deploy
# Looks like you failed 1 test of 1.
t/test-fixme.t .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/1 subtests
...and Perl won’t even let your tests pass if you’ve got a naughty word in your code.

Slide 71

Slide 71 text

custom metrics
Churn
Another metric that’s not universally used, but can still come in handy, is code churn: how often does a given piece of code change?

Slide 72

Slide 72 text

custom metrics
Recently changed code may have new bugs
Churn can tell us what parts have changed recently; those parts may have new bugs.

Slide 73

Slide 73 text

custom metrics
Frequently-changed code may have problems
Churn can also tell us what parts change often; those parts can become trouble spots.

Slide 74

Slide 74 text

git log --pretty=oneline \
    --since=2012-05-04 \
    oscon.c | wc -l
You can get as crazy as you want with churn: examining which lines have changed the most, which functions have had the most people working on them, and so on. Git can tell you a lot more than a simple metric can, but if you’re on a centralized system you may want to just grab the data yourself and pick it apart with UNIX tools.

Slide 75

Slide 75 text

custom metrics
Missing documentation
If you’re writing code that’s going to get used by developers outside of your team, you might use a metric like documentation coverage to identify the parts of the code that most badly need docs.

Slide 76

Slide 76 text

custom metrics
Errors by time of day
Most of the metrics we’ve seen so far have been one-shot numbers. But it’s also possible to track things over time, like occurrences of compiler errors or test failures.

Slide 77

Slide 77 text

custom metrics
Play by Play: Zed Shaw
peepcode.com/products/play-by-play-zed-shaw
Zed Shaw does a great demo of this in his Play by Play screencast with PeepCode.

Slide 78

Slide 78 text

What do we get from all this? We’ve talked about the kinds of questions we want to ask about our code, and the metrics that can help us answer those questions. Now for the bigger question: what’s the effect on our software? Well, here are some of the things that happened with my team.

Slide 79

Slide 79 text

Found a real dependency problem with pmccabe
One, I found a surprisingly high complexity number in what was supposed to be a simple math routine. Somebody had snuck in an unwanted dependency on an unrelated system.

Slide 80

Slide 80 text

Found dead code with gcov
While looking for untested code, we found some code that didn’t need any tests—because it was never called anyway!

Slide 81

Slide 81 text

Did a quick churn check at manual test time
I personally like to look at what features have changed when it’s time to do manual testing.

Slide 82

Slide 82 text

Found places we can DRY up the code
Some designs come at a time when our understanding of the domain is imperfect. As our understanding improves, we refactor the code. Complexity metrics can be handy for prioritizing.

Slide 83

Slide 83 text

Relative, not absolute! One of the common themes woven through much of this discussion is that absolute limits for code metrics are not as helpful as relative measures within a project.

Slide 84

Slide 84 text

Content-Type: multipart/wish
My hope is that you come away from this session with a couple of ideas for metrics you’d like to try, and with the well-founded belief that you can get started with very little time investment.

Slide 85

Slide 85 text

❥ Find the answers you need
❥ Look like heroes
❥ Have fun
I hope you find the answers you need for your project, and that you have fun getting them.

Slide 86

Slide 86 text

Fin
Thank you, and have a fantastic OSCON.

Slide 87

Slide 87 text

Credits
flickr.com/photos/aussiegall/286709039
flickr.com/photos/bensutherland/252230820
The images in this presentation were used by permission under the terms of a Creative Commons license.