What metrics will help developers?
- Louis Suárez-Potts @luispo
These are notes. All mistakes of thought and typing are therefore to be excused!
I wanted to explore the differences between consumer and producer communities. They are distinct but also share traits. The metrics used for consumer communities can be of use in producer ones, but only to a point. Commodity exchange is mostly quantifiable, as are the discursive contexts summarized by the term “social media.”
Intellectual production is less so. We look not just to the obvious (a bug resolved, a patch submitted, a branch created) but to the invisible: the quality of the code (do others have to rewrite it?), the deeper value of a submission (does it lead to more work, or not?), and the community impact (does it expand the community in interesting ways?).
We are all aware that corporate managers, or really any investor anxious about the merits of her investment, want metrics that justify the risk. They usually want to know if the community, however defined, is “healthy” (a comprehensive term) and, perhaps more important, producing something that can be taken to market and that will also satisfy users' expectations.
Of course, the same anxieties attend traditional, closed-source environments. But in a corporation, the developer is paid to produce, and though everyone recognizes that money alone doesn't guarantee genius, it does make for predictability and regularity, which for most investors is enough. In an open source community, however, that predictability is not guaranteed: the crucial developers may not be paid by the company that most wants their work. In fact, they may not be paid at all; they could be doing what they do simply because (pick your reason).
The importance of identifying the reason, or more accurately the reasons, for engaging the outside developer and then retaining her interest would seem to go without saying. The difficulty lies in the fact that for most developers, and for other contributors too, there may be a kaleidoscope of reasons: economic interest, intellectual satisfaction, social engagement, and the perceived moral or ethical satisfaction of working on an open source project, a kind of social good. Then there is the by-now common fact that many are paid to work on a project by their company, which benefits from the resulting collaboration.
Given the complex, and also interdependent, nature of the community weave, measuring the health of the community entails more than checking its pulse or pace of activity. Not all good work is done fast, and despite every effort to keep activity transparent, work is nevertheless done off-list, and for all that can be quite good. We all have different habits of work, and though one of the tasks of an open source manager is to educate the community on the project's customs of collaboration, we still operate out of our own contexts; if my group happens to work fast on a brilliant new component and does this work offline, it's hard to gripe.
But what if the result is opaque to the larger community? What if it works today, but then my group disbands and no one can add to it tomorrow, let alone update it? There's a point to having everything on the list (or equivalent public forum) and to following the accepted community protocols: it helps others carry on what you began. But the obligation to follow the rules also counters the egotistical element in open source, which is frankly often transformative and even (though not always) usefully productive.
Identifying these meteors is important, as is encouraging their activity. But if their primary motivation is egotistical, that presents a problem for the community as a whole. For one, such developers tend to be unpredictable. For another, almost by definition, their work can cost the community more than it's worth.
Knowing, then, not just the most brilliant stars in the community but also their relations to others, and where the weaknesses as well as the strengths lie, leads to a community that lasts.
Metrics tracking activity can tell the manager (or anyone else; no reason this should be private) who is doing what and with whom on the project. But what would help the developer, star or not? What information about the others making up the project, as well as the architecture and nature of the code, would help the developer in her efforts?
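As a minimal sketch of what such activity metrics might look like, the snippet below uses hypothetical commit records (invented authors and filenames; a real project would derive them from its repository history or issue tracker) to answer the two questions above: who is doing what, and with whom.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical commit records: (author, file touched). A real project
# would derive these from its version-control history or issue tracker.
commits = [
    ("alice", "parser.c"), ("alice", "lexer.c"),
    ("bob", "parser.c"), ("bob", "docs.md"),
    ("carol", "lexer.c"), ("carol", "parser.c"),
]

# Who is doing what: contributions per author.
activity = Counter(author for author, _ in commits)

# With whom: authors who touched the same file have a working relation.
touched = defaultdict(set)
for author, path in commits:
    touched[path].add(author)

relations = Counter()
for authors in touched.values():
    for pair in combinations(sorted(authors), 2):
        relations[pair] += 1

print(activity.most_common())   # most active contributors first
print(relations.most_common())  # strongest working relationships first
```

Nothing here is private to a manager; the same tallies, published to the project, let any contributor see where the activity and the relationships are.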
Part of the answer depends on the developers in question and on their company's engagement. (Anxious companies keep their IP close and expose only as much as they dare. This anxiety affects community dynamics, in that few wish to contribute to a project if they are only perceived as free labor to be exploited. Mature meta-projects today seek to normalize corporate self-interest, and institute mechanisms that level the field.)
Briefly, here are some categories that would help developers and other contributors, both in contributing code and in furthering the community. (An open source project is not reducible to its license or code and must include the productive community.)
• Task difficulty: Let's suppose we can quantify levels of difficulty for any given task within a project. This would be of use to neophytes and mentors alike.
• But is the effort, perhaps only initial, worth the result? (A cost/benefit analysis particular to the project.) If the quantification can be automated, it may prove useful in onboarding, as identifying more or less easy-to-do tasks is a chore for experienced developers. (And we do not want to saddle experienced developers with chores.)
• In fact, charting the mentoring effort can be useful, and making that data available to all even more so. The developer, ideally, is not only interested in the code and what it does; she is also interested in extending the community making the code. Learning how effective mentoring is would clearly help. But what even counts as “mentoring” is in question. For brevity: it's the teaching of the architecture and style of the code and of the collaborative protocols of the project. No one wants to spend much, if any, time on this. But it's necessary, especially for large projects, though early ones pretended it was not.
• What methods of mentoring are most efficient? Broadly, there are two models: a classroom or a tutorial. (A third model, documentation without a tutor, is not generally effective.) The classroom model is cheap; the tutorial is not. But the tutorial is seen as more valuable in the end, at least for the very talented developer.
• But does that help the community overall? Data that reflects these differences would be useful, and not just for the manager but for the contributor, too. It can help any contributor interested in expanding the reach of the community.
• Activity: This refers to work on a) particular issues, but also b) extant branches and c) new branches or modules. From the manager's perspective, knowing these data is important and can help justify the project's continuance. From the developer's view, finding out that a branch that seems interesting has actually been abandoned (for whatever reason), or has a record of unresolved issues, can give pause. Similarly, a popular area may be fun for the developer, but it could be popular just because it's easy and hardly a challenge.
• Who is active and interesting? If a star developer (however defined) starts a new branch or subproject and she is not egotistical (or even if she is), she is likely to attract others, provided they can follow her. (Protocols try to make that so.) But how is a developer, especially a new one, to know who is a star? (The same problem, incidentally, exists in academia.) The developer can look to the lists, to the code, to blogs and other media, social or not; presumably she has done this anyway, and it was one reason she came to the project: because there are interesting people. If the project supports it, contributors can also look at charts indexing contributions by contributor, as well as the trajectory of those contributions.
• Value: Are my contributions valued, and by whom? Or are they just ignored? In large projects, a manager may extol a contributor for her effort and be public about it, but if her peers ignore or dismiss her work, that will matter more. By the same token, if they take up her work and use it, that is likely to encourage her more than any interview, though it may be because of the interview that her peers become aware of her contributions.
• Future: Without money, or the means of getting it, a project's members will likely go elsewhere. Data that tracks the finances of the project, and that can point to funding sources, can make the difference in a project's survival.
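Several of the categories above, task difficulty and the cost/benefit analysis in particular, could under generous assumptions be automated. Here is a minimal sketch, with invented difficulty and value scores standing in for whatever a real project could actually measure, that ranks open tasks so that easy, worthwhile ones surface for newcomers:

```python
# Hypothetical open tasks, each with an estimated difficulty (effort in)
# and an estimated value (benefit out). Both numbers are assumptions a
# real project would have to derive from its own history.
tasks = [
    {"id": 101, "title": "Fix typo in docs", "difficulty": 1, "value": 1},
    {"id": 102, "title": "Refactor parser", "difficulty": 8, "value": 5},
    {"id": 103, "title": "Add logging flag", "difficulty": 2, "value": 4},
]

def benefit_ratio(task):
    """Crude cost/benefit score: value gained per unit of effort."""
    return task["value"] / task["difficulty"]

# Highest benefit-per-effort first: good candidates for neophytes,
# sparing experienced developers the chore of triaging by hand.
onboarding_queue = sorted(tasks, key=benefit_ratio, reverse=True)

for task in onboarding_queue:
    print(task["id"], task["title"], round(benefit_ratio(task), 2))
```

The scoring function is deliberately crude; the point is only that once difficulty and value are quantified at all, the onboarding chore becomes a sort, not a judgment call.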
In drafting these notes I used several sources. One was Stack Overflow; another, the dissertation work of Mekki MacAulay Abdelwahab, whose work on the determinants of success in open source uses the Mozilla BugZilla database. I also drew on my own experiences with OpenOffice and Apache, and as a consultant with Age of Peers, where I've had the opportunity to work with much smaller and even more anxious organizations.