
Metrics for Large Software Development Teams

Talk at the Metrics Day, at Chalmers University of Technology, Gothenburg, November 10th 2016.

Jesus M. Gonzalez-Barahona

November 09, 2016


Transcript

  1. Metrics for Large Software Development Teams
    Jesus M. Gonzalez-Barahona
    @jgbarah
    Bitergia / LibreSoft (URJC)
    http://speakerdeck.com/jgbarah/
    Metrics Day at Chalmers University of Technology
    Gothenburg (Sweden), November 10th 2016
    Jesus Gonzalez-Barahona (Bitergia) Metrics for Large Software Development Teams Metrics Day 2016 1 / 69


  2. Structure of the presentation
    1 A bit of context
    2 Dealing with dynamic complexity
    3 Sources of information
    4 Activity / size
    5 Remaining code
    6 Performance
    7 Demographics
    8 Diversity in FOSS development
    9 GrimoireLab: tools for software development analytics
    10 Final remarks

  3. A bit of context

  4. Me and my two hats
    Uni Rey Juan Carlos:
    LibreSoft research team
    Understanding free, open source software
    Data analytics approach
    Bitergia:
    From research to the real world
    Understanding software development
    Data analytics approach
    http://gsyc.es/~jgb

  5. The company
    The software development analytics company
    dashboards
    reports
    consultancy
    ...
    http://bitergia.com

  6. The book
    Evaluating FOSS Projects:
    Work in progress
    Free / open book
    Fork and play!
    https://jgbarah.gitbooks.io/evaluating-foss-projects/

  7. Recommendations
    Open your laptop
    Download the slides (they have links)
    Visit Cauldron.io and produce your own dashboard
    Play with the dashboards
    Understand the interpretations behind the numbers
    http://cauldron.io
    Code: OWL2016

  8. The Cauldron
    http://cauldron.io/dashboards/elastic

  9. Example: OPNFV dashboard
    http://opnfv.biterg.io

  10. Dealing with dynamic complexity

  11. Development projects may be large and complex

  12. Projects may be large and complex... and dynamic
    It’s difficult to...
    ...track what’s happening
    ...understand why it’s happening
    ...react quickly
    ...evaluate results of reaction
    If data is available
    analytics may come to the rescue

  13. A continuous process
    Figure out your interest
    Find out available data
    Define key parameters
    Monitor, understand, detect deviations
    Act to correct, improve
    Track results
    Measure → Monitor → Act

  14. A continuous process (example)
    Case: Overall development activity
    Interest: activity
    Data: changes to code, tickets
    Parameters: commits, tickets closed
    Monitoring: charts, numbers
    Observation: numbers declining
    Action: allocate more developer effort
    Track results...
    Measure → Monitor → Act
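
The "Observation: numbers declining" step in the example above can be automated in many ways. The following is only a minimal sketch, assuming a plain list of commit counts per period; the function name `declining`, the window size, and the tolerance are hypothetical choices, not something defined in the talk.

```python
from statistics import mean

def declining(series, window=3, tolerance=0.9):
    """Return True if the mean of the last `window` values is below
    `tolerance` times the mean of the `window` values before them."""
    if len(series) < 2 * window:
        return False  # not enough history to compare two windows
    recent = mean(series[-window:])
    previous = mean(series[-2 * window:-window])
    return recent < tolerance * previous

# Hypothetical commit counts per month (illustrative values only)
commits_per_month = [120, 115, 125, 100, 90, 80]
print(declining(commits_per_month))
```

Comparing two adjacent windows rather than two single values keeps one unusually quiet month from triggering the "Act" step.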

  15. Sources of information

  16. Repositories, repositories, repositories

  17. Source code management
    Centralized or client/server: CVS, Subversion
    Decentralized: git, Mercurial, Bazaar, etc.
    Today: most of them accessible through git...
    but the information is not always what it appears to be
    (eg: branches in Subversion and git)
    Can be integrated with other tools:
    Gerrit, GitHub, etc.

  18. Issue tracking
    Many different systems:
    Bugzilla
    Jira
    GitHub issues
    Phabricator
    RedMine
    Trac ...
    Each with a different model, data, operations...

  19. Code review
    More and more projects using it
    Usually: pre-merge peer review of changes
    Different methods:
    Mailing lists (eg: Linux)
    Gerrit (eg: OpenStack)
    GitHub pull requests (eg: ElasticSearch)
    or even Jira, Bugzilla...
    Usually, references to tickets and commits
    Much of the control on the software lies here

  20. Asynchronous communication
    Mailing lists:
    Mailing list systems (Mailman)
    Google Groups
    Mailing list archivers (Gmane)
    Forums: too many to mention
    Question/Answer sites: StackOverflow, Askbot
    Information is always archived

  21. Synchronous communication
    Systems:
    Traditionally: IRC
    Nowadays: Slack & many others
    Not always text-based (eg: videoconferences)
    Notes:
    In many cases, lack of archives
    Privacy concerns: considered informal communication
    Difficult to track identities

  22. Tracking involved parties
    Development is much more than developers
    (this is explicit in FOSS & inner sourcing)
    Developers: all repositories
    Contributors: issue tracking, async communication
    Users: async communication, ...
    Ecosystem: difficult to track
    Software may include beacons: tracking usage
    Needed: tracking identities in different data sources
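
One simple way to approach the identity-tracking need above is to group identities that share an email address. This is a deliberately naive sketch: the function name `merge_identities` and the `(source, name, email)` tuple layout are assumptions made for illustration, and real identity merging usually needs more heuristics plus manual review.

```python
from collections import defaultdict

def merge_identities(identities):
    """Group (source, name, email) identities that share an email address.

    Returns a list of sets of (source, name) pairs, one set per inferred person.
    Emails are compared case-insensitively, ignoring surrounding whitespace.
    """
    by_email = defaultdict(set)
    for source, name, email in identities:
        by_email[email.strip().lower()].add((source, name))
    return list(by_email.values())

# Hypothetical identities found in three data sources
identities = [
    ("git", "Jane Doe", "jane@example.org"),
    ("bugzilla", "jdoe", "Jane@Example.org"),
    ("mailing list", "Bob", "bob@example.org"),
]
for person in merge_identities(identities):
    print(sorted(person))
```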

  23. Activity / size

  24. Activity / size
    Many different aspects of activity:
    committing patches:
    source code management system
    reporting, commenting or fixing bugs:
    issue tracking system
    submitting patches or reviewing them:
    code review system
    sending messages:
    async or sync communication systems

  25. Activity / size (most common cases)
    Parameters reflecting activity for a certain period.
    People active for a certain period.
    Evolution of any of them.
    Trends for any of them.
    Difficult to compare between projects
    Interesting to compare inside project
    (different subprojects, different time frames)
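
The two most common parameters above (activity and people active in a period) could be computed along these lines. It is a sketch that assumes commits are available as `(author, date)` pairs; the function name `activity_by_month` and the monthly granularity are illustrative choices.

```python
from collections import defaultdict
from datetime import date

def activity_by_month(commits):
    """commits: iterable of (author, date) pairs.

    Returns {(year, month): (number_of_commits, number_of_active_authors)},
    with keys in chronological order.
    """
    counts = defaultdict(int)
    authors = defaultdict(set)
    for author, day in commits:
        key = (day.year, day.month)
        counts[key] += 1
        authors[key].add(author)
    return {key: (counts[key], len(authors[key])) for key in sorted(counts)}

# Hypothetical commit records
commits = [
    ("ann", date(2016, 1, 5)),
    ("bob", date(2016, 1, 20)),
    ("ann", date(2016, 1, 25)),
    ("ann", date(2016, 2, 1)),
]
print(activity_by_month(commits))
```

The resulting series per month is what the evolution and trend charts would be drawn from.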

  26. Activity / size (many facets)
    http://cauldron.io/dashboards/elastic

  27. Activity / size (many facets)
    http://s.bitergia.com/db-fosdem16

  28. Remaining code

  29. How old is code
    [Linux kernel, July 2016, lines in C files by age]
    http://linux.biterg.io

  30. How old is code (2)
    [Linux kernel, July 2016, C files by last commit]
    http://linux.biterg.io

  31. How old is code (3)
    [Linux kernel, July 2016, C files by first remaining commit]
    http://linux.biterg.io

  32. How old is code? drivers/net in Linux
    Age of lines (date of authorship, “.c” files)
    From top left, clockwise: Wireless, USB, IrDA, Ethernet

  33. Performance

  34. Backlog (evolution over time)
    Example: backlog of open issues.
    http://cauldron.io/dashboards/elastic
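
A backlog series like the one in this example can be derived from opening and closing dates alone. The sketch below assumes issues are given as `(opened, closed_or_None)` pairs; the function name `backlog` and the choice of sampling dates are hypothetical.

```python
from datetime import date

def backlog(issues, sample_dates):
    """issues: list of (opened, closed_or_None) dates.

    Returns a list of (sample_date, open_issues) pairs, counting an issue
    as open on a date if it was opened on or before that date and not yet
    closed by the end of it.
    """
    result = []
    for day in sample_dates:
        open_count = sum(
            1
            for opened, closed in issues
            if opened <= day and (closed is None or closed > day)
        )
        result.append((day, open_count))
    return result

# Hypothetical issues, sampled at the end of three months
issues = [
    (date(2016, 1, 1), date(2016, 2, 15)),
    (date(2016, 1, 10), None),
    (date(2016, 3, 1), date(2016, 3, 5)),
]
samples = [date(2016, 1, 31), date(2016, 2, 29), date(2016, 3, 31)]
print(backlog(issues, samples))
```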

  35. Efficiency
    Example: closed / opened tickets per quarter
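
The closed / opened ratio per quarter could be computed as in the following sketch, which assumes the same `(opened, closed_or_None)` representation of tickets; the helper names `quarter` and `efficiency_by_quarter` are illustrative only. A ratio above 1 means the backlog shrank during that quarter.

```python
from collections import Counter
from datetime import date

def quarter(day):
    """Map a date to a (year, quarter) pair, with quarter in 1..4."""
    return (day.year, (day.month - 1) // 3 + 1)

def efficiency_by_quarter(issues):
    """issues: list of (opened, closed_or_None) dates.

    Returns {(year, quarter): closed / opened}. Quarters in which no
    ticket was opened are skipped, to avoid dividing by zero.
    """
    opened = Counter(quarter(o) for o, _ in issues)
    closed = Counter(quarter(c) for _, c in issues if c is not None)
    return {q: closed[q] / opened[q] for q in sorted(opened)}

# Hypothetical tickets
issues = [
    (date(2016, 1, 5), date(2016, 2, 1)),
    (date(2016, 3, 1), date(2016, 4, 10)),
    (date(2016, 4, 2), None),
    (date(2016, 5, 1), date(2016, 6, 1)),
]
print(efficiency_by_quarter(issues))
```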

  36. Tickets

  37. Code review (time to merge)

  38. Code review (time to merge, metrics)
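
The slide itself only shows charts, so the following is merely one plausible way to obtain time-to-merge figures: a sketch assuming reviews are available as `(submitted, merged_or_None)` timestamps. The function name `time_to_merge_days` is hypothetical, and the median is used here because a few very long reviews can dominate the mean.

```python
from datetime import datetime
from statistics import median

def time_to_merge_days(reviews):
    """reviews: list of (submitted, merged_or_None) datetimes.

    Returns the time to merge, in days, for each merged review.
    Reviews that were never merged are ignored.
    """
    return [
        (merged - submitted).total_seconds() / 86400
        for submitted, merged in reviews
        if merged is not None
    ]

# Hypothetical reviews
reviews = [
    (datetime(2016, 1, 1), datetime(2016, 1, 2)),
    (datetime(2016, 1, 1), datetime(2016, 1, 4)),
    (datetime(2016, 1, 1), datetime(2016, 1, 11)),
    (datetime(2016, 1, 5), None),  # still open, not counted
]
days = time_to_merge_days(reviews)
print(median(days), max(days))
```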

  39. Code review (time to merge, evolution)

  40. Code review (number of versions per review)

  41. The complete coding process
    From idea to implementation
    Story, design
    Ticket(s)
    Code review
    Automated testing
    Commit in code base
    The OpenStack case
    Blueprint (if feature), Launchpad
    Ticket (bug, feature), Launchpad
    Code review, Gerrit
    Automated testing, Jenkins
    Commit in code base, Gerrit, Git
    Similar cases: GitHub, GitLab, Atlassian
    Requires discipline in the developing team
    Requires enough traces in the repositories

  42. Demographics

  43. The many identities of anyone
    The repository level.
    The class of repository level.
    The project level.
    The global level.

  44. Demographics: The aging chart
    Attraction (newcomers)
    Retention (expertise)
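
The data behind an aging chart can be approximated by grouping people into generations by the year of their first activity, then counting how many joined (attraction) and how many are still active (retention). This sketch makes several assumptions not stated in the slide: yearly generations, a 180-day window for "still active", and the function name `aging`.

```python
from collections import Counter
from datetime import date, timedelta

def aging(people, snapshot, active_days=180):
    """people: {name: (first_activity, last_activity)} dates.

    Returns {year_of_first_activity: (attracted, retained)}, where
    `retained` counts people of that generation whose last activity
    falls within `active_days` before the snapshot date.
    """
    threshold = snapshot - timedelta(days=active_days)
    attracted, retained = Counter(), Counter()
    for first, last in people.values():
        attracted[first.year] += 1
        if last >= threshold:
            retained[first.year] += 1
    return {year: (attracted[year], retained[year]) for year in sorted(attracted)}

# Hypothetical contributors
people = {
    "ann": (date(2014, 3, 1), date(2016, 10, 1)),
    "bob": (date(2014, 6, 1), date(2015, 1, 1)),
    "eve": (date(2016, 2, 1), date(2016, 9, 15)),
}
print(aging(people, snapshot=date(2016, 11, 1)))
```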

  45. Demographics: Contributors funnel
    Communities of volunteers
    “Peripheral” activities (questions, reporting bugs)
    Small contributions: answers, bug fixes, change proposals
    Core: design, feature implementation, bug fixes
    Inner source
    Questions, reports, etc. in public (no more coffee machine meetings)
    Moving to develop: answers, bug fixes, change proposals
    Core: design, feature implementation, bug fixes, mentorship
    Finding traces, visualizing career evolution
    Assessments & forecasts of available expertise
    Identification of success stories

  46. Demographics: Mentorship
    Helping newcomers, helping people from other areas
    Usually linked to bug fixing and code review
    Who is helping others to improve their skills?
    Who is benefiting most from the help of others?
    Who are the newcomers, and which of them are not
    receiving mentorship?
    When may a newcomer become a mentor?

  47. Diversity in FOSS development

  48. Diversity: geographical information (time zones)
    http://cauldron.io/dashboards/elastic

  49. Diversity: geographical information (GitHub profiles)

  50. Diversity: affiliation
    http://s.bitergia.com/db-fosdem16

  51. Diversity: Apache Pony Factor
    In the words of Daniel Gruno:
    We [the ASF] created a term we have coined
    “Pony Factor” (because ASF is full of ponies, or
    people who think they are ponies). Pony Factor
    (PF) shows the diversity of a project in terms of
    the division of labor among committers in a
    project.
    Pony Factor is determined as:
    “The lowest number of committers whose
    total contribution constitutes the majority of
    the codebase”
    https://ke4qqq.wordpress.com/2015/02/08/pony-factor-math/
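
The definition above translates almost directly into code. This sketch assumes that "contribution" is measured in commits and that "majority" means strictly more than half; the function name `pony_factor` and the input format (one author name per commit) are illustrative.

```python
from collections import Counter

def pony_factor(commit_authors):
    """commit_authors: iterable with one author name per commit.

    Returns the lowest number of authors whose commits, taken together,
    add up to more than half of all commits (0 if there are no commits).
    """
    counts = Counter(commit_authors)
    total = sum(counts.values())
    covered = 0
    # Walk authors from most to least active until the majority is reached
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered * 2 > total:
            return rank
    return 0

# Hypothetical history: 5 commits by "a", 3 by "b", 2 by "c"
print(pony_factor(["a"] * 5 + ["b"] * 3 + ["c"] * 2))
```

In this example "a" alone holds exactly half of the commits, which is not a majority, so a second author is needed.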

  52. Diversity: Bitergia Elephant Factor

  53. Diversity: Bitergia Elephant Factor
    Projects can benefit from powerful collaborations
    from companies (elephants). The elephant factor
    shows the diversity of a project in terms of the
    division of labor among companies (by means of
    developers affiliated with them).
    Elephant factor is determined as:
    “The lowest number of companies whose
    total contribution (in commits by their
    employees) constitutes the majority of the
    commits”
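
Computing the elephant factor needs one extra step: mapping each commit author to a company before counting. The sketch below assumes an `affiliation` dictionary from author to company is available, and it treats each unaffiliated author as an organization of their own; both the function name `elephant_factor` and that treatment of unknown affiliations are assumptions.

```python
from collections import Counter

def elephant_factor(commit_authors, affiliation):
    """commit_authors: iterable with one author name per commit.
    affiliation: {author: company}; authors missing from it are
    counted as an organization of their own (an assumption).

    Returns the lowest number of companies whose employees' commits
    add up to more than half of all commits (0 if there are no commits).
    """
    counts = Counter(affiliation.get(author, author) for author in commit_authors)
    total = sum(counts.values())
    covered = 0
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered * 2 > total:
            return rank
    return 0

# Hypothetical commits and affiliations
authors = ["ann"] * 4 + ["bob"] * 3 + ["eve"] * 3
print(elephant_factor(authors, {"ann": "Acme", "bob": "Acme", "eve": "Initech"}))
```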

  54. Diversity: some projects
    Project       Pony Factor  Elephant Factor  Commits (excl. bots)
    OpenNebula    4            1                12K
    Eucalyptus    5            1                25K
    CloudStack    14           1                42K
    OpenStack     >100         6                126K
    CloudFoundry  41           1                60K
    OpenShift     10           1                15K
    Docker        15           1                18K
    Kubernetes    12           1                7K
    [Circa May 2016]

  55. Diversity: Code “owned”
    “The land belongs
    to its workers”
    Emiliano Zapata

  56. Diversity: Code “owned”
    The code changes over time. The current version is
    “owned” by the people who produced it.
    The code “belongs” to those who wrote it.
    Zapata factor (work in progress):
    “The lowest number of developers for whom
    the total number of lines of code they “own”
    (were last touched by them) constitutes the
    majority of the lines of code”
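
Since the Zapata factor is described as work in progress, the following is only a sketch of how it might be computed, assuming the last author of every line is already known (for instance, from the output of `git blame`). The function name `zapata_factor` and the `{file: [last_author_of_each_line]}` input format are hypothetical.

```python
from collections import Counter

def zapata_factor(blame):
    """blame: {file_path: [last_author_of_line_1, last_author_of_line_2, ...]}.

    Returns the lowest number of developers who together "own" (were the
    last to touch) more than half of all lines (0 if there are no lines).
    """
    owned = Counter()
    for line_authors in blame.values():
        owned.update(line_authors)
    total = sum(owned.values())
    covered = 0
    for rank, (_, n) in enumerate(owned.most_common(), start=1):
        covered += n
        if covered * 2 > total:
            return rank
    return 0

# Hypothetical per-line ownership for two files
blame = {
    "a.c": ["ann", "ann", "bob"],
    "b.c": ["eve", "ann", "bob", "bob"],
}
print(zapata_factor(blame))
```

Unlike the commit-based factors, this one only reflects the code that remains in the current version, so long-gone authors of rewritten code do not count.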

  57. Diversity: Code “owned”
    [Linux kernel, July 2016, Zapata factor: 200]

  58. Diversity: Code “owned”
    The code “belongs” to companies who employ
    developers changing it.
    United Fruit factor (work in progress):
    “The lowest number of companies for whom
    the total number of lines of code they “own”
    (were last touched by their employees)
    constitutes the majority of the lines of code”

  59. Diversity: Gender gap
    Commits by women: 6.8% (4 Kcommits)
    Women: 9.9% (330 developers)
    Linux kernel, Nov 2015 – Oct 2016

  60. GrimoireLab: tools for software development analytics

  61. GrimoireLab
    http://grimoirelab.github.io

  62. GrimoireLab
    Perceval: data retrieval
    Arthur: retrieval orchestration
    GelK: enrichment
    SortingHat: identity management
    ElasticSearch (*): database for storing everything
    Kibiter: dashboard (light fork of Kibana)
    Panels: visualizations for Kibiter
    http://grimoirelab.github.io
    (*) Not a part of GrimoireLab

  63. GrimoireLab
    http://grimoirelab.github.io

  64. Final remarks

  65. Room for improvement
    Many other aspects... explore your own
    Refine what is important
    Explore new ways of making data useful
    Tell interesting stories based on data
    Visualization is very important
    Higher-order metrics
    Simplify results, make them meaningful
    Can we characterize many aspects with
    a small set of metrics?

  66. Summary
    You cannot improve
    what you cannot measure
    Fortunately, you can measure a lot of things...
    http://bitergia.com
    http://grimoirelab.github.io
    http://speakerdeck.com/jgbarah

  67. A moment for a commercial: Join us at MSR 2017!!
    http://2017.msrconf.org
    14th International Conference on Mining Software Repositories
    Co-located with ICSE
    Buenos Aires, Argentina
    Save the dates:
    May 20-21 2017
    Start the conversation!!!
    #msr17

  68. © 2016 Bitergia
    Some rights reserved. This presentation is distributed under the
    “Attribution-ShareAlike 3.0” license, by Creative Commons, available at
    http://creativecommons.org/licenses/by-sa/3.0/

  69. Credits (1)
    “Man With Two Hats”
    Statue by Henk Visch, located in Ottawa, Canada
    Picture by Lezumbalaberenjena in Wikimedia Commons
    License: Public domain
    https://commons.wikimedia.org/wiki/File:Man_With_Two_Hats_Ottawa_Statue_by_lezumbalaberenjena.jpg
    “Crowd at FOSDEM 2008”
    by Jesús Corrius
    License: CC Attribution 2.0
    http://www.flickr.com/photos/jcorrius/2302302707/
    “Emiliano Zapata”
    License: Public Domain