Slide 1

Slide 1 text

Metrics for Large Software Development Teams
Jesus M. Gonzalez-Barahona
jgb@bitergia.com
@jgbarah
Bitergia / LibreSoft (URJC)
http://speakerdeck.com/jgbarah/
Metrics Day at Chalmers University of Technology
Gothenburg (Sweden), November 10th 2016
Jesus Gonzalez-Barahona (Bitergia) Metrics for Large Software Development Teams Metrics Day 2016 1 / 69

Slide 2

Slide 2 text

Structure of the presentation
1. A bit of context
2. Dealing with dynamic complexity
3. Sources of information
4. Activity / size
5. Remaining code
6. Performance
7. Demographics
8. Diversity in FOSS development
9. GrimoireLab: tools for software development analytics
10. Final remarks

Slide 3

Slide 3 text

A bit of context

Slide 4

Slide 4 text

Me and my two hats
Universidad Rey Juan Carlos: LibreSoft research team
  Understanding free, open source software
  Data analytics approach
Bitergia: From research to the real world
  Understanding software development
  Data analytics approach
http://gsyc.es/~jgb

Slide 5

Slide 5 text

The company
The software development analytics company
  dashboards
  reports
  consultancy
  ...
http://bitergia.com

Slide 6

Slide 6 text

The book
Evaluating FOSS Projects: Work in progress
Free / open book
Fork and play!
https://jgbarah.gitbooks.io/evaluating-foss-projects/

Slide 7

Slide 7 text

Recommendations
Open your laptop
Download the slides (they have links)
Visit Cauldron.io and produce your own dashboard
Play with the dashboards
Understand the interpretations behind the numbers
http://cauldron.io
Code: OWL2016

Slide 8

Slide 8 text

The Cauldron
http://cauldron.io/dashboards/elastic

Slide 9

Slide 9 text

Example: OPNFV dashboard
http://opnfv.biterg.io

Slide 10

Slide 10 text

Dealing with dynamic complexity

Slide 11

Slide 11 text

Development projects may be large and complex

Slide 12

Slide 12 text

Projects may be large and complex... and dynamic
It’s difficult to...
  ...track what’s happening
  ...understand why it’s happening
  ...react quickly
  ...evaluate results of reaction
If data is available, analytics may come to the rescue

Slide 13

Slide 13 text

A continuous process
Figure out your interest
Find out available data
Define key parameters
Monitor, understand, detect deviations
Act to correct, improve
Track results
Measure → Monitor → Act

Slide 14

Slide 14 text

A continuous process (example)
Case: Overall development activity
Interest: activity
Data: changes to code, tickets
Parameters: commits, tickets closed
Monitoring: charts, numbers
Observation: numbers declining
Action: allocate more developer effort
Track results...
Measure → Monitor → Act

Slide 15

Slide 15 text

Sources of information

Slide 16

Slide 16 text

Repositories, repositories, repositories

Slide 17

Slide 17 text

Source code management
Centralized or client/server: CVS, Subversion
Decentralized: git, Mercurial, Bazaar, etc.
Today: most of them are accessible through git...
  but the information is not always what it appears to be (e.g.: branches in Subversion and git)
Can be integrated with other tools: Gerrit, GitHub, etc.

Slide 18

Slide 18 text

Issue tracking
Many different systems:
  Bugzilla
  Jira
  GitHub issues
  Phabricator
  Redmine
  Trac
  ...
Each with a different model, data, operations...

Slide 19

Slide 19 text

Code review
More and more projects are using it
Usually:
  peer review
  pre-merge change review
Different methods:
  Mailing lists (e.g.: Linux)
  Gerrit (e.g.: OpenStack)
  GitHub pull requests (e.g.: ElasticSearch)
  or even Jira, Bugzilla...
Usually, references to tickets and commits
Much of the control over the software lies here

Slide 20

Slide 20 text

Asynchronous communication
Mailing lists:
  Mailing list systems (Mailman)
  Google Groups
  Mailing list archivers (Gmane)
Forums: too many to mention
Question/Answer sites: StackOverflow, Askbot
Information is always archived

Slide 21

Slide 21 text

Synchronous communication
Systems:
  Traditionally: IRC
  Nowadays: Slack & many others
  Not always text-based (e.g.: videoconferences)
Notes:
  In many cases, lack of archives
  Privacy concerns: considered informal communication
  Difficult to track identities

Slide 22

Slide 22 text

Tracking involved parties
Development is much more than developers (this is explicit in FOSS & inner sourcing)
Developers: all repositories
Contributors: issue tracking, async communication
Users: async communication, ...
Ecosystem: difficult to track
Software may include beacons: tracking usage
Needed: tracking identities in different data sources

Slide 23

Slide 23 text

Activity / size

Slide 24

Slide 24 text

Activity / size
Many different aspects of activity:
  committing patches: source code management system
  reporting, commenting on or fixing bugs: issue tracking system
  submitting patches or reviewing them: code review system
  sending messages: async or sync communication systems

Slide 25

Slide 25 text

Activity / size (most common cases)
Parameters reflecting activity for a certain period.
People active for a certain period.
Evolution of any of them.
Trends for any of them.
Difficult to compare between projects
Interesting to compare inside a project (different subprojects, different time frames)

Slide 26

Slide 26 text

Activity / size (many facets)
http://cauldron.io/dashboards/elastic

Slide 27

Slide 27 text

Activity / size (many facets)
http://s.bitergia.com/db-fosdem16

Slide 28

Slide 28 text

Remaining code

Slide 29

Slide 29 text

How old is code
[Linux kernel, July 2016, lines in C files by age]
http://linux.biterg.io

Slide 30

Slide 30 text

How old is code (2)
[Linux kernel, July 2016, C files by last commit]
http://linux.biterg.io

Slide 31

Slide 31 text

How old is code (3)
[Linux kernel, July 2016, C files by first remaining commit]
http://linux.biterg.io

Slide 32

Slide 32 text

How old is code? drivers/net in Linux
Age of lines (date of authorship, “.c” files)
From top left, clockwise: Wireless, USB, IRDA, Ethernet

Slide 33

Slide 33 text

Performance

Slide 34

Slide 34 text

Backlog (evolution over time)
Example: backlog of open issues.
http://cauldron.io/dashboards/elastic
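A backlog series like the one charted on this slide could be derived from ticket open and close dates. What follows is a minimal sketch, assuming each ticket is a hypothetical (opened, closed) pair of dates, with closed set to None while the ticket is still open. The function name and the sample data are illustrative only and do not come from any Bitergia tool.

```python
from datetime import date

def backlog_on(tickets, day):
    """Count tickets opened on or before `day` and not yet closed by then."""
    return sum(
        1
        for opened, closed in tickets
        if opened <= day and (closed is None or closed > day)
    )

# Hypothetical tickets: (opened, closed); None means still open.
tickets = [
    (date(2016, 1, 10), date(2016, 2, 1)),
    (date(2016, 1, 20), None),
    (date(2016, 3, 5), date(2016, 3, 6)),
    (date(2016, 3, 15), None),
]

# Sample the backlog at the end of each month to see its evolution.
samples = [date(2016, 1, 31), date(2016, 2, 29), date(2016, 3, 31)]
evolution = [backlog_on(tickets, d) for d in samples]
print(evolution)  # [2, 1, 2]
```

A steadily rising series would suggest tickets arrive faster than they are resolved.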

Slide 35

Slide 35 text

Efficiency
Example: closed / opened tickets per quarter
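The closed / opened ratio per quarter can be sketched in a few lines. This is an illustration under stated assumptions: tickets are hypothetical (opened, closed) date pairs, quarters are calendar quarters, and a ticket counts as closed in the quarter of its closing date, whenever it was opened. The helper names are invented for the example.

```python
from collections import Counter
from datetime import date

def quarter(day):
    """Label a date with its calendar quarter, e.g. '2016Q1'."""
    return f"{day.year}Q{(day.month - 1) // 3 + 1}"

def efficiency_per_quarter(tickets):
    """Return {quarter: closed / opened} for quarters with opened tickets."""
    opened = Counter(quarter(o) for o, _ in tickets)
    closed = Counter(quarter(c) for _, c in tickets if c is not None)
    return {q: closed[q] / opened[q] for q in sorted(opened)}

# Hypothetical tickets: (opened, closed); None means still open.
tickets = [
    (date(2016, 1, 10), date(2016, 2, 1)),
    (date(2016, 2, 20), date(2016, 4, 2)),
    (date(2016, 3, 5), None),
    (date(2016, 3, 15), date(2016, 3, 20)),
    (date(2016, 5, 1), date(2016, 5, 3)),
]
ratios = efficiency_per_quarter(tickets)
print(ratios)  # {'2016Q1': 0.5, '2016Q2': 2.0}
```

A ratio below 1 means the backlog grew during that quarter; a ratio above 1 means the team closed more tickets than were opened.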

Slide 36

Slide 36 text

Tickets

Slide 37

Slide 37 text

Code review (time to merge)

Slide 38

Slide 38 text

Code review (time to merge, metrics)

Slide 39

Slide 39 text

Code review (time to merge, evolution)

Slide 40

Slide 40 text

Code review (number of versions per review)

Slide 41

Slide 41 text

The complete coding process
From idea to implementation
  Story, design
  Ticket(s)
  Code review
  Automated testing
  Commit in code base
The OpenStack case
  Blueprint (if feature), Launchpad
  Ticket (bug, feature), Launchpad
  Code review, Gerrit
  Automated testing, Jenkins
  Commit in code base, Gerrit, Git
Similar cases: GitHub, GitLab, Atlassian
Requires discipline in the development team
Requires enough traces in the repositories

Slide 42

Slide 42 text

Demographics

Slide 43

Slide 43 text

The many identities of anyone
The repository level.
The class of repository level.
The project level.
The global level.

Slide 44

Slide 44 text

Demographics: The aging chart
Attraction
Retention
Newcomers
Expertise

Slide 45

Slide 45 text

Demographics: Contributors funnel
Communities of volunteers
  “Peripheral” activities (questions, reporting bugs)
  Small contributions: answers, bug fixes, change proposals
  Core: design, feature implementation, bug fixes
Inner source
  Questions, reports, etc. in public (no more coffee machine meetings)
  Moving to develop: answers, bug fixes, change proposals
  Core: design, feature implementation, bug fixes, mentorship
Finding traces, visualizing career evolution
Assessments & forecasts of available expertise
Identification of success stories

Slide 46

Slide 46 text

Demographics: Mentorship
Helping newcomers, helping people from other areas
Usually linked to bug fixing and code review
Who is helping others to improve their skills?
Who benefits most from the help of others?
Who are the newcomers, and which of them are not receiving mentorship?
When may a newcomer become a mentor?

Slide 47

Slide 47 text

Diversity in FOSS development

Slide 48

Slide 48 text

Diversity: geographical information (time zones)
http://cauldron.io/dashboards/elastic

Slide 49

Slide 49 text

Diversity: geographical information (GitHub profiles)

Slide 50

Slide 50 text

Diversity: affiliation
http://s.bitergia.com/db-fosdem16

Slide 51

Slide 51 text

Diversity: Apache Pony Factor
In the words of Daniel Gruno:
We [the ASF] created a term we have coined “Pony Factor” (because ASF is full of ponies, or people who think they are ponies). Pony Factor (PF) shows the diversity of a project in terms of the division of labor among committers in a project.
Pony Factor is determined as: “The lowest number of committers whose total contribution constitutes the majority of the codebase”
https://ke4qqq.wordpress.com/2015/02/08/pony-factor-math/
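The definition above can be turned into a short computation. The sketch below rests on two assumptions that the slide does not state: contribution is measured in number of commits, and "majority" means strictly more than half. The function name and the sample data are hypothetical.

```python
from collections import Counter

def pony_factor(commit_authors):
    """Smallest number of committers who together account for more than
    half of the commits (one list entry per commit)."""
    counts = Counter(commit_authors)
    total = sum(counts.values())
    covered = 0
    # Walk committers from most to least active until the majority is covered.
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered * 2 > total:
            return rank
    return 0  # no commits at all

# Hypothetical project: 100 commits by four committers.
authors = ["ana"] * 40 + ["bo"] * 30 + ["cy"] * 20 + ["di"] * 10
print(pony_factor(authors))  # 2
```

Here the two most active committers cover 70 of 100 commits, so the factor is 2. A low value signals that the project depends on very few people.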

Slide 52

Slide 52 text

Diversity: Bitergia Elephant Factor

Slide 53

Slide 53 text

Diversity: Bitergia Elephant Factor
Projects can benefit from powerful collaborations from companies (elephants). The elephant factor shows the diversity of a project in terms of the division of labor among companies (by means of developers affiliated with them).
Elephant factor is determined as: “The lowest number of companies whose total contribution (in commits by their employees) constitutes the majority of the commits”
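Computing the elephant factor needs one extra step compared with a per-developer count: each commit must first be attributed to a company. The sketch below assumes a hypothetical `affiliation` mapping from developer to company, groups developers without a known affiliation into a single "Unknown" bucket, and reads "majority" as strictly more than half. None of these choices come from the slide.

```python
from collections import Counter

def elephant_factor(commit_authors, affiliation, unknown="Unknown"):
    """Smallest number of companies whose employees together authored
    more than half of the commits (one list entry per commit)."""
    # Attribute every commit to the company of its author.
    by_company = Counter(affiliation.get(a, unknown) for a in commit_authors)
    total = sum(by_company.values())
    covered = 0
    for rank, (_, n) in enumerate(by_company.most_common(), start=1):
        covered += n
        if covered * 2 > total:
            return rank
    return 0  # no commits at all

# Hypothetical data: four developers, two of them at the same company.
authors = ["ana"] * 40 + ["bo"] * 30 + ["cy"] * 20 + ["di"] * 10
affiliation = {"ana": "Acme", "bo": "Acme", "cy": "Globex"}
print(elephant_factor(authors, affiliation))  # 1
```

In this example one company accounts for 70 of 100 commits, so the factor is 1 even though several developers are active. Treating unaffiliated developers as one bucket is a simplification; counting each of them separately would be an equally defensible choice.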

Slide 54

Slide 54 text

Diversity: some projects
Project        Pony Factor   Elephant Factor   Commits (excl. bots)
OpenNebula     4             1                 12K
Eucalyptus     5             1                 25K
CloudStack     14            1                 42K
OpenStack      >100          6                 126K
CloudFoundry   41            1                 60K
OpenShift      10            1                 15K
Docker         15            1                 18K
Kubernetes     12            1                 7K
[Circa May 2016]

Slide 55

Slide 55 text

Diversity: Code “owned”
“The land belongs to its workers”
Emiliano Zapata

Slide 56

Slide 56 text

Diversity: Code “owned”
The code changes over time.
The current version is “owned” by the people who produced it.
The code “belongs” to those who wrote it.
Zapata factor (work in progress): “The lowest number of developers for whom the total number of lines of code they “own” (were last touched by them) constitutes the majority of the lines of code”
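One way to approximate "last touched by" is the output of `git blame --line-porcelain`, which repeats an `author <name>` header for every line of a file. The sketch below assumes that input format, counts the lines attributed to each author, and reads "majority" as strictly more than half. It illustrates the definition only; it is not the method used to produce the figures in these slides, and the function names are hypothetical.

```python
from collections import Counter

def line_owners(blame_porcelain):
    """Count lines per author in `git blame --line-porcelain` output.

    Source lines in that format start with a tab, so they are never
    mistaken for an 'author ' header."""
    prefix = "author "
    return Counter(
        line[len(prefix):]
        for line in blame_porcelain.splitlines()
        if line.startswith(prefix)
    )

def zapata_factor(owned_lines):
    """Smallest number of developers owning more than half of the lines."""
    total = sum(owned_lines.values())
    covered = 0
    for rank, (_, n) in enumerate(owned_lines.most_common(), start=1):
        covered += n
        if covered * 2 > total:
            return rank
    return 0  # no lines at all

# Abridged, hypothetical blame output (most header fields omitted).
sample = "\n".join([
    "author Ana", "\t/* author unknown */",
    "author Ana", "\tint main(void) {",
    "author Bo", "\t    return 0;",
    "author Cy", "\t}",
    "author Di", "\t/* end */",
])
owners = line_owners(sample)
print(owners["Ana"], zapata_factor(owners))  # 2 2
```

For a whole project, the per-file counters could be summed before calling `zapata_factor`. Mapping each author to an employer first would give the company-level variant of the same idea.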

Slide 57

Slide 57 text

Diversity: Code “owned”
[Linux kernel, July 2016, Zapata factor: 200]

Slide 58

Slide 58 text

Diversity: Code “owned”
The code “belongs” to the companies that employ the developers changing it.
United Fruit factor (work in progress): “The lowest number of companies for whom the total number of lines of code they “own” (were last touched by their employees) constitutes the majority of the lines of code”

Slide 59

Slide 59 text

Diversity: Gender gap
Commits by women: 6.8% (4 Kcommits)
Women: 9.9% (330 developers)
Linux kernel, Nov 2015 – Oct 2016

Slide 60

Slide 60 text

GrimoireLab: tools for software development analytics

Slide 61

Slide 61 text

GrimoireLab
http://grimoirelab.github.io

Slide 62

Slide 62 text

GrimoireLab
Perceval: data retrieval
Arthur: retrieval orchestration
GelK: enrichment
SortingHat: identity management
ElasticSearch (*): database for storing everything
Kibiter: dashboard (light fork of Kibana)
Panels: visualizations for Kibiter
http://grimoirelab.github.io
(*) Not a part of GrimoireLab

Slide 63

Slide 63 text

GrimoireLab
http://grimoirelab.github.io

Slide 64

Slide 64 text

Final remarks

Slide 65

Slide 65 text

Room for improvement
Many other aspects... explore your own
Refine what is important
Explore new ways of making data useful
Tell interesting stories based on data
Visualization is very important
Higher-order metrics
Simplify results, make them meaningful
Can we characterize many aspects with a small set of metrics?

Slide 66

Slide 66 text

Summary
You cannot improve what you cannot measure
Fortunately, you can measure a lot of things...
http://bitergia.com
http://grimoirelab.github.io
http://speakerdeck.com/jgbarah

Slide 67

Slide 67 text

A moment for a commercial: Join us at MSR 2017!!
http://2017.msrconf.org
14th International Conference on Mining Software Repositories
Co-located with ICSE
Buenos Aires, Argentina
Save the dates: May 20-21 2017
Start the conversation!!! #msr17

Slide 68

Slide 68 text

© 2016 Bitergia
Some rights reserved. This presentation is distributed under the “Attribution-ShareAlike 3.0” license, by Creative Commons, available at http://creativecommons.org/licenses/by-sa/3.0/

Slide 69

Slide 69 text

Credits (1)
“Man With Two Hats”
  Statue by Henk Visch, located in Ottawa, Canada
  Picture by Lezumbalaberenjena in Wikimedia Commons
  License: Public domain
  https://commons.wikimedia.org/wiki/File:Man_With_Two_Hats_Ottawa_Statue_by_lezumbalaberenjena.jpg
“Crowd at FOSDEM 2008” by Jesús Corrius
  License: CC Attribution 2.0
  http://www.flickr.com/photos/jcorrius/2302302707/
“Emiliano Zapata”
  License: Public Domain