Software development analytics workshop (Austin 2016)

Workshop sponsored by Bitergia, focused on how to measure free, open source projects, and how these metrics can produce valuable information for different professional profiles. Austin, TX, USA, May 16th 2016.

Transcript

  1. Software Development Analytics Workshop, Part 1
    Jesus M. Gonzalez-Barahona [email protected] @jgbarah
    Bitergia / LibreSoft (URJC)
    http://speakerdeck.com/jgbarah/
    Software Development Analytics Workshop, Austin (Texas, USA), May 16th 2016
    Jesus Gonzalez-Barahona (Bitergia) Software Development Analytics Workshop May 2016 1 / 57
  2. © 2016 Bitergia. Some rights reserved.
    This presentation is distributed under the “Attribution-ShareAlike 3.0” license, by Creative Commons, available at http://creativecommons.org/licenses/by-sa/3.0/
  3. Structure of the presentation
    1 A bit of context
    2 Dealing with dynamic complexity
    3 Sources of information
    4 The community
    5 The processes
    6 Final remarks
  4. The company
    The software development analytics company
    dashboards, reports, consultancy ...
    http://bitergia.com
  5. The people
    Jesus M. Gonzalez-Barahona: Co-founder of Bitergia, researcher at URJC. @jgbarah
    Daniel Izquierdo: Co-founder, data analyst at Bitergia. @dizquierdo
  6. The workshop
    Topic: software development analytics
    Approach:
      Information available (data sources)
      Analytics for development communities
      Analytics for development processes
      Real cases from the real world
  7. The book
    Evaluating FOSS Projects:
      Work in progress
      Free / open book
      Fork and play!
    https://jgbarah.gitbooks.io/evaluating-foss-projects/
  8. Recommendations
    Open your laptop
    Download the slides (they have links)
    Visit Cauldron.io and produce your own dashboard
    Play with the dashboards
    Understand the interpretations behind the numbers
  9. Projects may be large and complex
    [Crowd at FOSDEM 2008, by Jesús Corrius, CC Attribution 2.0]
    http://www.flickr.com/photos/jcorrius/2302302707/
  10. Projects may be large and complex... and dynamic
    It’s difficult to...
      ...track what’s happening
      ...understand why it’s happening
      ...react quickly
      ...evaluate the results of the reaction
    If data is available, analytics may come to the rescue
  11. A continuous process
    Figure out your interest
    Find out available data
    Define key parameters
    Monitor, understand, detect deviations
    Act to correct, improve
    Track results
    Measure → Monitor → Act
  12. A continuous process (example)
    Case: company-led development community
    Interest: activity
    Data: changes to code, tickets
    Parameters: commits, tickets closed
    Monitoring: charts, numbers
    Observation: numbers are falling
    Actions: allocate more developer effort
    Track results...
    Measure → Monitor → Act
  13. Source code management
    Centralized or client/server: CVS, Subversion
    Decentralized: git, Mercurial, Bazaar, etc.
    Today: most of them are accessible through git... but the information is not always what it appears to be (e.g., branches in Subversion and git)
    Can be integrated with other tools: Gerrit, GitHub, etc.
  14. Source code management (example: git)
    git show --pretty=fuller

    commit 364f67f13b0046c0a0a688b30a1341ff9946ac26
    Author: Santiago Dueñas <[email protected]>
    AuthorDate: Fri Oct 11 12:55:44 2013 +0200
    Commit: Santiago Dueñas <[email protected]>
    CommitDate: Fri Oct 11 12:55:44 2013 +0200

    [db] Add author’s commit date

    Some SCMs, like Git, make a distinction between the dates the committer and author pushed the changes.
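The header above can be turned into structured data with a few lines of Python. This is a minimal sketch, assuming the default English date layout that git prints and an English (or C) locale; the function name `parse_fuller_header` and the sample author are placeholders for illustration, not part of any tool shown in the workshop.

```python
from datetime import datetime

# Sample header in the layout printed by `git show --pretty=fuller`.
# The author name and address are placeholders.
SAMPLE = """\
commit 364f67f13b0046c0a0a688b30a1341ff9946ac26
Author:     Jane Doe <jane@example.com>
AuthorDate: Fri Oct 11 12:55:44 2013 +0200
Commit:     Jane Doe <jane@example.com>
CommitDate: Fri Oct 11 12:55:44 2013 +0200
"""

# Default git date layout, e.g. "Fri Oct 11 12:55:44 2013 +0200".
GIT_DATE_FORMAT = "%a %b %d %H:%M:%S %Y %z"
HEADER_KEYS = ("Author", "AuthorDate", "Commit", "CommitDate")


def parse_fuller_header(text):
    """Return a dict with the commit hash, author, committer and both dates."""
    fields = {}
    for line in text.splitlines():
        if line.startswith("commit "):
            fields["commit"] = line.split()[1]
            continue
        key, sep, value = line.partition(":")
        if sep and key in HEADER_KEYS:
            fields[key] = value.strip()
    # Convert both dates to timezone-aware datetime objects.
    for key in ("AuthorDate", "CommitDate"):
        if key in fields:
            fields[key] = datetime.strptime(fields[key], GIT_DATE_FORMAT)
    return fields


header = parse_fuller_header(SAMPLE)
print(header["commit"], header["AuthorDate"].isoformat())
```

Keeping the author date and the commit date as separate fields matters because, as the commit message says, git records both and they can differ.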
  15. Source code management (example: git diff)
    diff --git a/pycvsanaly2/DBContentHandler.py b/pycvsanaly2/DBC
    index 579e103..4b0066d 100644
    --- a/pycvsanaly2/DBContentHandler.py
    +++ b/pycvsanaly2/DBContentHandler.py
    @@ -149,7 +149,7 @@ class DBContentHandler (ContentHandler):
     self.actions = []
     profiler_stop ("Inserting actions for repository
     if self.commits:
    - commits = [(c.repository_id) for c in self.commit
    + commits = [(c.author_date, c.repository_id) for c
     profiler_start ("Inserting commits for repository
     cursor.executemany (statement (DBLog.__insert__,
     self.commits = []
  16. Source code management (git: important notes)
    You only see commits that were committed
      commits may still sit in “children” repos
    History can be rewritten
      forces retrieving the full history again and again
    Commits may change while moving from repo to repo
      e.g., rebasing
    Authors and committers may not be who they should be
      e.g., merges via the web interface in GitHub
    Dates for commits are usually in the developer’s local time
      allows for time zone analysis
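The time zone analysis mentioned in the last note can be sketched as a count of commits per UTC offset. This is only an illustration: the function name and the sample dates are made up, and the date layout is assumed to be git's default English one.

```python
from collections import Counter
from datetime import datetime

# Default git date layout, e.g. "Fri Oct 11 12:55:44 2013 +0200".
GIT_DATE_FORMAT = "%a %b %d %H:%M:%S %Y %z"


def commits_per_utc_offset(author_dates):
    """Count commits per UTC offset (in hours) found in the author dates."""
    counts = Counter()
    for raw in author_dates:
        parsed = datetime.strptime(raw, GIT_DATE_FORMAT)
        offset_hours = parsed.utcoffset().total_seconds() / 3600
        counts[offset_hours] += 1
    return counts


# Hypothetical author dates, as they would appear in `git log` output.
dates = [
    "Fri Oct 11 12:55:44 2013 +0200",
    "Mon Oct 14 09:10:00 2013 +0200",
    "Tue Oct 15 18:30:12 2013 -0500",
]
offsets = commits_per_utc_offset(dates)
for offset, n in sorted(offsets.items()):
    print(f"UTC{offset:+.1f}: {n} commits")
```

An offset only hints at a region. It depends on how each developer's machine is configured and on daylight saving time, so the result is better read as a rough distribution than as a precise location.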
  17. Issue tracking
    Many different systems:
      Bugzilla
      Jira
      GitHub issues
      Phabricator
      RedMine
      Trac
      ...
    Each with a different model, data, operations...
  18. Issue tracking (tickets)
    Usual information in a ticket:
      Identifier. Summary. Description.
      Opening date. Ticket opener. Ticket assignee.
      Priority. State.
    Then you have:
      State changes
      Comments
      Attachments
      Related info
      ...
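The ticket fields listed above map naturally onto a small record type. The following is a hedged sketch, not the data model of any particular tracker: the class names, field names and the `change_state` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class StateChange:
    """One entry in the history of a ticket."""
    changed_at: datetime
    old_state: str
    new_state: str


@dataclass
class Ticket:
    """Usual information in a ticket, plus its history and comments."""
    identifier: str
    summary: str
    description: str
    opened_at: datetime
    opener: str
    assignee: Optional[str]
    priority: str
    state: str
    state_changes: List[StateChange] = field(default_factory=list)
    comments: List[str] = field(default_factory=list)

    def change_state(self, new_state, when):
        """Record the transition and update the current state."""
        self.state_changes.append(StateChange(when, self.state, new_state))
        self.state = new_state


ticket = Ticket(
    identifier="42",
    summary="Crash on startup",
    description="The tool crashes when no config file exists.",
    opened_at=datetime(2016, 5, 1, 10, 0),
    opener="alice",
    assignee=None,
    priority="high",
    state="open",
)
ticket.change_state("closed", datetime(2016, 5, 3, 16, 30))
print(ticket.state, len(ticket.state_changes))
```

Keeping the state changes as a list, rather than only the current state, is what later makes it possible to compute things such as time to close.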
  19. Issue tracking (important notes)
    Many different uses:
      Bug reports
      Feature requests
      Design
      Policy discussions
      ...
    Different models make comparison difficult
    The responsiveness of a project is related to how it deals with tickets
    Tickets are a communication tool too
  20. Code review
    More and more projects are using it
    Usually: peer review, pre-merge change review
    Different methods:
      Mailing lists (e.g., Linux)
      Gerrit (e.g., OpenStack)
      GitHub pull requests (e.g., ElasticSearch)
    Usually, references to tickets and commits
    Much of the control over the software lies here
  21. Asynchronous communication
    Mailing lists:
      Mailing list systems (Mailman)
      Google Groups
      Mailing list archivers (Gmane)
    Forums: too many to mention
    Question/Answer sites: StackOverflow, Askbot
    Information is always archived
  22. Asynchronous communication (notes)
    Many projects: “if it didn’t happen in ACS, it didn’t happen”
    May be difficult to mine:
      no easily downloadable archives
      no-scraping policies
    Privacy issues: mangling email addresses
    Forums: diversity forces many retrieval tools
    Question/answer sites: can be massive
  23. Synchronous communication
    Systems:
      Traditionally: IRC
      Nowadays: Slack & many others
      Not always text-based (e.g., videoconferences)
    Notes:
      In many cases, lack of archives
      Privacy concerns: considered informal communication
      Difficult to track identities
  24. The many communities
    Development community: all repositories
    Contributing community: issue tracking, async communication
    User community: async communication, ...
    Ecosystem community: difficult to track
    Software may include beacons: tracking usage
  25. Tracking activity
    Many different aspects of activity:
      committing patches: source code management system
      reporting, commenting or fixing bugs: issue tracking system
      submitting patches or reviewing them: code review system
      sending messages: async or sync communication systems
  26. Tracking activity (most common cases)
    Parameters reflecting activity for a certain period.
    People active for a certain period.
    Evolution of any of them.
    Trends for any of them.
    Difficult to compare between projects
    Interesting to compare inside a project (different subprojects, different time frames)
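The first two items above (activity for a period, people active for a period) can be computed together from a list of commits. This is a minimal sketch with invented data; the function name and the choice of calendar months as the period are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def activity_per_month(commits):
    """Return {"YYYY-MM": (number of commits, number of active authors)}.

    `commits` is an iterable of (author, datetime) pairs.
    """
    counts = defaultdict(int)
    authors = defaultdict(set)
    for author, when in commits:
        period = when.strftime("%Y-%m")
        counts[period] += 1
        authors[period].add(author)
    return {p: (counts[p], len(authors[p])) for p in sorted(counts)}


# Hypothetical commits: (author, author date).
commits = [
    ("alice", datetime(2016, 1, 5)),
    ("bob", datetime(2016, 1, 20)),
    ("alice", datetime(2016, 1, 25)),
    ("alice", datetime(2016, 2, 2)),
]
activity = activity_per_month(commits)
for period, (n_commits, n_authors) in activity.items():
    print(period, n_commits, n_authors)
```

Plotting the resulting series over time gives the evolution, and comparing the same series across subprojects or time frames stays within one project, which is the kind of comparison the slide recommends.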
  27. The many identities of anyone
    The repository level.
    The repository kind level.
    The project level.
    The global level.
  28. Diversity: Apache Pony Factor
    In the words of Daniel Gruno:
    We [the ASF] created a term we have coined “Pony Factor” (because ASF is full of ponies, or people who think they are ponies).
    Pony Factor (PF) shows the diversity of a project in terms of the division of labor among committers in a project.
    Pony Factor is determined as: “The lowest number of committers whose total contribution constitutes the majority of the codebase”
    https://ke4qqq.wordpress.com/2015/02/08/pony-factor-math/
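The definition above translates into a short computation. This sketch makes two assumptions that the quote does not settle: contribution is measured in number of commits (as a proxy for share of the codebase), and "majority" means strictly more than half. The function name and the data are illustrative.

```python
from collections import Counter


def pony_factor(commit_authors):
    """Lowest number of committers whose commits exceed half of the total.

    `commit_authors` holds one author name per commit.
    """
    counts = Counter(commit_authors)
    total = sum(counts.values())
    covered = 0
    # Walk committers from most to least active, accumulating their commits.
    for rank, (_, n_commits) in enumerate(counts.most_common(), start=1):
        covered += n_commits
        if covered * 2 > total:
            return rank
    return 0  # no commits at all


# Hypothetical project: 10 commits by three people.
authors = ["ann"] * 5 + ["ben"] * 3 + ["cy"] * 2
print(pony_factor(authors))
```

In this example the most active committer has exactly half of the commits, which is not a strict majority, so the second committer is needed and the result is 2.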
  29. Diversity: Bitergia Elephant Factor
    Projects can benefit from powerful collaborations from companies (elephants).
    The elephant factor shows the diversity of a project in terms of the division of labor among companies (by means of developers affiliated with them).
    Elephant factor is determined as: “The lowest number of companies whose total contribution (in commits by their employees) constitutes the majority of the commits”
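The elephant factor follows the same idea, with commits grouped by company instead of by person. The sketch below assumes a ready-made mapping from author to company, a strict majority (more than half of the commits), and that authors without a known affiliation are pooled into a single "Unknown" bucket. All three are choices made for this example, not something the definition prescribes.

```python
from collections import Counter


def elephant_factor(commit_authors, affiliations, unknown="Unknown"):
    """Lowest number of companies whose employees made over half the commits.

    `commit_authors` holds one author name per commit;
    `affiliations` maps author name to company name.
    """
    counts = Counter(affiliations.get(a, unknown) for a in commit_authors)
    total = sum(counts.values())
    covered = 0
    # Walk companies from most to least active, accumulating their commits.
    for rank, (_, n_commits) in enumerate(counts.most_common(), start=1):
        covered += n_commits
        if covered * 2 > total:
            return rank
    return 0  # no commits at all


# Hypothetical data: six commits, two companies.
authors = ["ann", "ann", "ann", "ben", "ben", "cy"]
affiliations = {"ann": "Acme", "ben": "Acme", "cy": "Initech"}
print(elephant_factor(authors, affiliations))
```

Here a single company accounts for five of the six commits, so the elephant factor is 1 even though three people contributed.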
  30. Diversity: some projects
    Project        Pony Factor   Elephant Factor   Commits (excl. bots)
    OpenNebula     4             1                 12K
    Eucalyptus     5             1                 25K
    CloudStack     14            1                 42K
    OpenStack      >100          6                 126K
    CloudFoundry   41            1                 60K
    OpenShift      10            1                 15K
    Docker         15            1                 18K
    Kubernetes     12            1                 7K
  31. Backlog (evolution over time)
    Example: backlog of open issues.
    http://cauldron.io/dashboards/elastic
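A backlog of open issues can be derived from the opening and closing dates of each ticket. This is a minimal sketch with invented tickets; the function names are hypothetical and it is not how the dashboard above is computed.

```python
from datetime import date


def backlog_on(day, tickets):
    """Number of tickets still open at the end of `day`.

    `tickets` is a list of (opened, closed) dates; `closed` is None
    for tickets that are still open.
    """
    return sum(
        1
        for opened, closed in tickets
        if opened <= day and (closed is None or closed > day)
    )


def backlog_series(days, tickets):
    """Backlog size for each of the given days."""
    return [(day, backlog_on(day, tickets)) for day in days]


# Hypothetical tickets: (opening date, closing date or None).
tickets = [
    (date(2016, 1, 10), date(2016, 2, 5)),
    (date(2016, 1, 20), None),
    (date(2016, 2, 15), date(2016, 3, 1)),
]
month_ends = [date(2016, 1, 31), date(2016, 2, 29), date(2016, 3, 31)]
series = backlog_series(month_ends, tickets)
for day, size in series:
    print(day.isoformat(), size)
```

Sampling at regular intervals (month ends here) gives the evolution over time; a backlog that keeps growing suggests tickets are opened faster than they are closed.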
  32. Efficiency
    Example: closed / opened tickets per quarter
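The ratio of closed to opened tickets per quarter can be sketched as follows. The data is invented and the function names are hypothetical; quarters in which no ticket was opened are left out to avoid dividing by zero, which is a simplification chosen for this example.

```python
from collections import Counter
from datetime import date


def quarter(day):
    """Label such as "2016-Q2" for the calendar quarter of `day`."""
    return f"{day.year}-Q{(day.month - 1) // 3 + 1}"


def efficiency_per_quarter(tickets):
    """Return {quarter: tickets closed / tickets opened in that quarter}.

    `tickets` is a list of (opened, closed) dates; `closed` is None
    for tickets that are still open.
    """
    opened = Counter(quarter(o) for o, _ in tickets)
    closed = Counter(quarter(c) for _, c in tickets if c is not None)
    return {q: closed[q] / opened[q] for q in sorted(opened)}


# Hypothetical tickets: (opening date, closing date or None).
tickets = [
    (date(2016, 1, 10), date(2016, 2, 5)),
    (date(2016, 2, 20), date(2016, 4, 3)),
    (date(2016, 4, 15), date(2016, 5, 1)),
    (date(2016, 5, 2), None),
]
efficiency = efficiency_per_quarter(tickets)
for q, ratio in efficiency.items():
    print(q, round(ratio, 2))
```

A ratio below 1 means more tickets were opened than closed in that quarter, so the backlog grew; a ratio above 1 means the project was catching up.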
  33. Code review (number of versions per review)
  34. Summary
    You cannot improve what you cannot measure
    Fortunately, you can measure a lot of things...