Slide 1

Slide 1 text

Software Development Analytics Workshop
Jesus M. Gonzalez-Barahona [email protected] @jgbarah
Bitergia / LibreSoft (URJC)
http://bit.ly/sda-workshop-1
Software Development Analytics Workshop
Brussels (Belgium), January 29th 2016
Jesus Gonzalez-Barahona (Bitergia) Software Development Analytics Workshop Jan 2016 1 / 58

Slide 2

Slide 2 text

© 2016 Bitergia. Some rights reserved.
This presentation is distributed under the “Attribution-ShareAlike 3.0” license, by Creative Commons, available at http://creativecommons.org/licenses/by-sa/3.0/

Slide 3

Slide 3 text

Structure of the presentation
1 A bit of context
2 Dealing with dynamic complexity
3 Sources of information
4 The community
5 The processes
6 Real cases
7 Final remarks

Slide 4

Slide 4 text

A bit of context

Slide 5

Slide 5 text

The company
The software development analytics company
    dashboards
    reports
    consultancy
    ...
http://bitergia.com

Slide 6

Slide 6 text

The people
Jesus M. Gonzalez-Barahona:
    Co-founder of Bitergia
    Researcher at URJC
    @jgbarah
Daniel Izquierdo:
    Co-founder, data analyst at Bitergia
    @dizquierdo

Slide 7

Slide 7 text

The workshop
Topic: software development analytics
Approach:
    Information available (data sources)
    Analytics for development communities
    Analytics for development processes
    Real cases from the real world

Slide 8

Slide 8 text

The book
Evaluating FOSS Projects:
    Work in progress
    Free / open book
    Fork and play!
https://jgbarah.gitbooks.io/evaluating-foss-projects/

Slide 9

Slide 9 text

Dealing with dynamic complexity

Slide 10

Slide 10 text

Projects may be large and complex
[Crowd at FOSDEM 2008, by Jesús Corrius, CC Attribution 2.0]
http://www.flickr.com/photos/jcorrius/2302302707/

Slide 11

Slide 11 text

Projects may be large and complex... and dynamic
It’s difficult to...
    ...track what’s happening
    ...understand why it’s happening
    ...react quickly
    ...evaluate results of reaction
If data is available, analytics may come to the rescue

Slide 12

Slide 12 text

A continuous process
Figure out your interest
Find out available data
Define key parameters
Monitor, understand, detect deviations
Act to correct, improve
Track results
Measure → Monitor → Act

Slide 13

Slide 13 text

A continuous process (example)
Case: company-led development community
Interest: activity
Data: changes to code, tickets
Parameters: commits, tickets closed
Monitoring: charts, numbers
Observation: numbers are falling
Actions: allocate more developer effort
Track results...
Measure → Monitor → Act

Slide 14

Slide 14 text

Sources of information

Slide 15

Slide 15 text

Repositories, repositories, repositories

Slide 16

Slide 16 text

Source code management
Centralized or client/server: CVS, Subversion
Decentralized: git, Mercurial, Bazaar, etc.
Today: most of them are accessible through git...
    but the information is not always what it appears to be (eg: branches in Subversion and git)
Can be integrated with other tools: Gerrit, GitHub, etc.

Slide 17

Slide 17 text

Source code management (example: git)
git show --pretty=fuller
commit 364f67f13b0046c0a0a688b30a1341ff9946ac26
Author: Santiago Dueñas
AuthorDate: Fri Oct 11 12:55:44 2013 +0200
Commit: Santiago Dueñas
CommitDate: Fri Oct 11 12:55:44 2013 +0200
    [db] Add author’s commit date
    Some SCMs, like Git, make a distinction between the dates the committer and author pushed the changes.
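As a minimal sketch of how the header above could be turned into data for analysis, the following Python snippet parses text in the style of `git show --pretty=fuller`. The function name `parse_fuller_header` and the `SAMPLE` text are illustrative assumptions (the sample reuses the values on the slide; email addresses are absent because the slide text does not show them). It also assumes git's default date format and an English locale for day and month names.

```python
from datetime import datetime

# Sample header in the style of `git show --pretty=fuller` (values from the slide).
SAMPLE = """\
commit 364f67f13b0046c0a0a688b30a1341ff9946ac26
Author:     Santiago Dueñas
AuthorDate: Fri Oct 11 12:55:44 2013 +0200
Commit:     Santiago Dueñas
CommitDate: Fri Oct 11 12:55:44 2013 +0200

    [db] Add author's commit date
"""

# Assumption: dates use git's default format, e.g. "Fri Oct 11 12:55:44 2013 +0200".
GIT_DATE_FORMAT = "%a %b %d %H:%M:%S %Y %z"


def parse_fuller_header(text):
    """Return a dict with the commit hash, author, committer and both dates."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the header; the commit message follows
        if line.startswith("commit "):
            fields["hash"] = line.split()[1]
            continue
        # Split "Key: value" on the first colon only (dates contain colons too).
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    # Convert both dates to timezone-aware datetime objects.
    for key in ("AuthorDate", "CommitDate"):
        fields[key] = datetime.strptime(fields[key], GIT_DATE_FORMAT)
    return fields


info = parse_fuller_header(SAMPLE)
print(info["hash"], info["AuthorDate"].isoformat())
```

Keeping the author date and the commit date as separate fields matters because they can differ, for example after a rebase or when a patch is applied by someone other than its author.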

Slide 18

Slide 18 text

Source code management (example: git diff)
diff --git a/pycvsanaly2/DBContentHandler.py b/pycvsanaly2/DBC
index 579e103..4b0066d 100644
--- a/pycvsanaly2/DBContentHandler.py
+++ b/pycvsanaly2/DBContentHandler.py
@@ -149,7 +149,7 @@ class DBContentHandler (ContentHandler):
 self.actions = []
 profiler_stop ("Inserting actions for repository
 if self.commits:
- commits = [(c.repository_id) for c in self.commit
+ commits = [(c.author_date, c.repository_id) for c
 profiler_start ("Inserting commits for repository
 cursor.executemany (statement (DBLog.__insert__,
 self.commits = []

Slide 19

Slide 19 text

Source code management (git: important notes)
You only see commits that were committed
    commits may still sit in “children” repos
History can be rewritten
    this forces retrieving the full history again and again
Commits may change while moving from repo to repo
    eg: rebasing
Authors, committers may not be what they should
    eg: merges via web interface in GitHub
Dates for commits are usually in developer’s time:
    allows for time zone analysis
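The time zone analysis mentioned in the last note can be sketched as follows: since commit dates usually carry the developer's UTC offset, counting commits per offset gives a rough picture of where activity comes from. The function name `commits_per_utc_offset` and the sample dates are illustrative assumptions, and the code assumes git's default date format. An offset is only a hint: it reflects the configuration of the developer's machine and several regions share the same offset.

```python
from collections import Counter
from datetime import datetime

# Assumption: dates use git's default format, e.g. "Fri Oct 11 12:55:44 2013 +0200".
GIT_DATE_FORMAT = "%a %b %d %H:%M:%S %Y %z"


def commits_per_utc_offset(date_strings):
    """Count commits per UTC offset (in hours), as recorded in the commit dates."""
    counts = Counter()
    for text in date_strings:
        when = datetime.strptime(text, GIT_DATE_FORMAT)
        offset_hours = when.utcoffset().total_seconds() / 3600
        counts[offset_hours] += 1
    return counts


# Hypothetical commit dates, for illustration only.
dates = [
    "Fri Oct 11 12:55:44 2013 +0200",
    "Fri Oct 11 09:10:02 2013 -0700",
    "Sat Oct 12 18:30:00 2013 +0200",
    "Mon Oct 14 08:00:00 2013 +0530",
]
histogram = commits_per_utc_offset(dates)
for offset, n in sorted(histogram.items()):
    print(f"UTC{offset:+.1f}: {n} commit(s)")
```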

Slide 20

Slide 20 text

Issue tracking
Many different systems:
    Bugzilla
    Jira
    GitHub issues
    Phabricator
    RedMine
    Trac
    ...
Each with a different model, data, operations...

Slide 21

Slide 21 text

Issue tracking (Bugzilla workflow)

Slide 22

Slide 22 text

Issue tracking (tickets)
Usual information in a ticket:
    Identifier.
    Summary.
    Description.
    Opening date.
    Ticket opener.
    Ticket assignee.
    Priority.
    State.
Then you have:
    State changes
    Comments
    Attachments
    Related info
    ...
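The ticket fields listed above could be modelled, for analysis purposes, with a small data structure such as the one below. This is only a sketch: the class and attribute names (`Ticket`, `StateChange`, `change_state`) and the default values are assumptions made for illustration, and each real tracker (Bugzilla, Jira, GitHub issues...) has its own model.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class StateChange:
    """One transition in the life of a ticket."""
    when: datetime
    old_state: str
    new_state: str


@dataclass
class Ticket:
    """Usual information in a ticket (hypothetical, tracker-independent model)."""
    identifier: str
    summary: str
    description: str
    opened_at: datetime
    opener: str
    assignee: Optional[str] = None
    priority: str = "normal"      # assumed default, not taken from any tracker
    state: str = "open"           # assumed default, not taken from any tracker
    state_changes: List[StateChange] = field(default_factory=list)
    comments: List[str] = field(default_factory=list)

    def change_state(self, new_state: str, when: datetime) -> None:
        """Record a state transition and update the current state."""
        self.state_changes.append(StateChange(when, self.state, new_state))
        self.state = new_state


ticket = Ticket("42", "Crash on start", "Steps to reproduce...",
                datetime(2016, 1, 10, 9, 0), opener="alice")
ticket.change_state("closed", datetime(2016, 1, 12, 17, 30))
print(ticket.state, len(ticket.state_changes))
```

Storing the state changes, and not only the current state, is what makes it possible to compute metrics such as time to close or backlog evolution later on.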

Slide 23

Slide 23 text

Issue tracking (Bugzilla ticket)

Slide 24

Slide 24 text

Issue tracking (GitHub issue)

Slide 25

Slide 25 text

Issue tracking (important notes)
Many different uses:
    Bug reports
    Feature requests
    Design
    Policy discussions
    ...
Different models make comparison difficult
The responsiveness of a project is related to how it deals with tickets
Tickets are a communication tool too

Slide 26

Slide 26 text

Code review
More and more projects are using it
Usually:
    peer review
    pre-merge change review
Different methods:
    Mailing lists (eg: Linux)
    Gerrit (eg: OpenStack)
    GitHub pull requests (eg: ElasticSearch)
Usually, references to tickets and commits
Much of the control on the software lies here

Slide 27

Slide 27 text

Code review (Gerrit)

Slide 28

Slide 28 text

Code review (GitHub Pull Requests)

Slide 29

Slide 29 text

Asynchronous communication
Mailing lists:
    Mailing list systems (Mailman)
    Google Groups
    Mailing list archivers (Gmane)
Forums: too many to mention
Question/Answer sites: StackOverflow, Askbot
Information is always archived

Slide 30

Slide 30 text

Asynchronous communication (notes)
Many projects: “if it didn’t happen in ACS, it didn’t happen”
May be difficult to mine:
    no easily downloadable archives
    policies of no scraping
Privacy issues: mangling email addresses
Forums: diversity forces many retrieval tools
Question/answer sites: can be massive

Slide 31

Slide 31 text

Synchronous communication
Systems:
    Traditionally: IRC
    Nowadays: Slack & many others
    Not always text-based (eg: videoconferences)
Notes:
    In many cases, lack of archives
    Privacy concerns: considered informal communication
    Difficult to track identities

Slide 32

Slide 32 text

The community

Slide 33

Slide 33 text

The many communities
Development community: all repositories
Contributing community: issue tracking, async communication
User community: async communication, ...
Ecosystem community: difficult to track
Software may include beacons: tracking usage

Slide 34

Slide 34 text

Tracking activity
Many different aspects of activity:
    committing patches: source code management system
    reporting, commenting or fixing bugs: issue tracking system
    submitting patches or reviewing them: code review system
    sending messages: async or sync communication systems

Slide 35

Slide 35 text

Tracking activity (most common cases)
Parameters reflecting activity for a certain period.
People active for a certain period.
Evolution of any of them.
Trends for any of them.
Difficult to compare between projects
Interesting to compare inside project (different subprojects, different time frames)
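The two basic parameters above (activity and active people for a certain period) can be sketched in a few lines. The snippet assumes the events (commits, messages, ticket changes...) have already been retrieved as `(date, person)` pairs; the function name `activity_by_month`, the choice of months as the period, and the sample data are illustrative assumptions.

```python
from collections import defaultdict
from datetime import date


def activity_by_month(events):
    """Map (year, month) to (number of events, number of distinct active people).

    `events` is an iterable of (date, person) pairs, e.g. one per commit.
    """
    counts = defaultdict(int)
    people = defaultdict(set)
    for day, person in events:
        period = (day.year, day.month)
        counts[period] += 1
        people[period].add(person)
    return {p: (counts[p], len(people[p])) for p in sorted(counts)}


# Hypothetical events, for illustration only.
events = [
    (date(2015, 12, 3), "alice"),
    (date(2015, 12, 20), "alice"),
    (date(2015, 12, 21), "bob"),
    (date(2016, 1, 5), "carol"),
]
for period, (n_events, n_people) in activity_by_month(events).items():
    print(period, n_events, n_people)
```

Evolution and trends then follow from comparing the values of consecutive periods within the same project.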

Slide 36

Slide 36 text

Tracking activity (many facets)
http://activity.openstack.org

Slide 37

Slide 37 text

Tracking activity (many facets)
http://s.bitergia.com/db-fosdem16

Slide 38

Slide 38 text

The many identities of anyone
The repository level.
The repository kind level.
The project level.
The global level.

Slide 39

Slide 39 text

The aging chart

Slide 40

Slide 40 text

Geographical information (time zones)

Slide 41

Slide 41 text

Geographical information (GitHub profiles)

Slide 42

Slide 42 text

Affiliation

Slide 43

Slide 43 text

Diversity: Apache Pony Factor
In the words of Daniel Gruno:
    We [the ASF] created a term we have coined “Pony Factor” (because ASF is full of ponies, or people who think they are ponies). Pony Factor (PF) shows the diversity of a project in terms of the division of labor among committers in a project.
    Pony Factor is determined as: “The lowest number of committers whose total contribution constitutes the majority of the codebase”
https://ke4qqq.wordpress.com/2015/02/08/pony-factor-math/
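The definition above can be sketched as a short computation. Two assumptions are made here and are not taken from the slide: contribution is approximated by the number of commits per committer (the definition speaks of the codebase), and “majority” is read as strictly more than half. The function name `pony_factor` is illustrative.

```python
from collections import Counter


def pony_factor(commit_authors):
    """Lowest number of committers whose commits add up to more than half.

    `commit_authors` has one entry per commit, naming its committer.
    Assumption: contribution is measured in commits.
    """
    counts = Counter(commit_authors)
    total = sum(counts.values())
    covered = 0
    # Walk committers from most to least active until a majority is covered.
    for rank, (_, n_commits) in enumerate(counts.most_common(), start=1):
        covered += n_commits
        if covered * 2 > total:
            return rank
    return 0  # no commits at all


# Hypothetical history: alice 5 commits, bob 3, carol 2.
history = ["alice"] * 5 + ["bob"] * 3 + ["carol"] * 2
print(pony_factor(history))
```

In this hypothetical history alice alone holds exactly half of the commits, which is not a majority, so bob is needed as well.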

Slide 44

Slide 44 text

Diversity: Bitergia Elephant Factor
Projects can benefit from powerful collaborations from companies (elephants). The elephant factor shows the diversity of a project in terms of the division of labor among companies (by means of developers affiliated with them).
Elephant factor is determined as: “The lowest number of companies whose total contribution (in commits by their employees) constitutes the majority of the commits”
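A minimal sketch of this definition follows. It assumes an affiliation mapping from developer to company is already available (building it is the hard part in practice), reads “majority” as strictly more than half, and groups developers with no known affiliation into a single `Unknown` bucket, which is a simplification that can distort the result. The names `elephant_factor` and `affiliations` are illustrative.

```python
from collections import Counter


def elephant_factor(commit_authors, affiliations, unknown="Unknown"):
    """Lowest number of companies whose employees authored most commits.

    `commit_authors` has one entry per commit; `affiliations` maps a
    developer to a company. Unaffiliated developers share one bucket
    (an assumption made here for simplicity).
    """
    counts = Counter(affiliations.get(author, unknown) for author in commit_authors)
    total = sum(counts.values())
    covered = 0
    # Walk companies from most to least active until a majority is covered.
    for rank, (_, n_commits) in enumerate(counts.most_common(), start=1):
        covered += n_commits
        if covered * 2 > total:
            return rank
    return 0  # no commits at all


# Hypothetical data, for illustration only.
affiliations = {"alice": "Acme", "bob": "Acme", "carol": "Beta"}
history = ["alice"] * 4 + ["bob"] * 3 + ["carol"] * 2 + ["dave"]
print(elephant_factor(history, affiliations))
```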

Slide 45

Slide 45 text

Diversity: some projects
Project         Pony Factor   Elephant Factor   Commits (excl bots)
OpenNebula      4             1                 12K
Eucalyptus      5             1                 25K
CloudStack      14            1                 42K
OpenStack       >100          6                 126K
CloudFoundry    41            1                 60K
OpenShift       10            1                 15K
Docker          15            1                 18K
Kubernetes      12            1                 7K

Slide 46

Slide 46 text

The processes

Slide 47

Slide 47 text

Backlog (evolution over time)
Example: pending (not abandoned, not merged) code reviews.
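One way to compute such a backlog series, sketched under the assumption that each code review is available as an `(opened, closed)` pair of dates, where `closed` is the date it was merged or abandoned and is `None` while the review is still pending. The function names and the sample data are illustrative.

```python
from datetime import date


def backlog_on(day, reviews):
    """Number of reviews pending at the end of `day`.

    A review is pending if it was opened on or before `day` and has not
    been closed (merged or abandoned) by then.
    """
    return sum(
        1
        for opened, closed in reviews
        if opened <= day and (closed is None or closed > day)
    )


def backlog_series(days, reviews):
    """Backlog size for each of the given days, as (day, size) pairs."""
    return [(day, backlog_on(day, reviews)) for day in days]


# Hypothetical reviews, for illustration only.
reviews = [
    (date(2016, 1, 1), date(2016, 1, 10)),
    (date(2016, 1, 5), None),
    (date(2016, 1, 8), date(2016, 1, 9)),
]
days = [date(2016, 1, 4), date(2016, 1, 8), date(2016, 1, 9), date(2016, 1, 10)]
for day, size in backlog_series(days, reviews):
    print(day.isoformat(), size)
```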

Slide 48

Slide 48 text

Backlog (evolution over time)
Example: pending (not abandoned, not merged) code reviews by age.

Slide 49

Slide 49 text

Efficiency
Example: closed / opened tickets per quarter
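The closed / opened ratio per quarter can be sketched as follows, assuming each ticket is available as an `(opened, closed)` pair of dates with `closed` set to `None` for tickets still open. The function names and the sample data are illustrative. A ratio below 1 means the project closed fewer tickets than were opened in that quarter, so its backlog grew.

```python
from collections import Counter
from datetime import date


def quarter(day):
    """Return (year, quarter number 1..4) for a date."""
    return (day.year, (day.month - 1) // 3 + 1)


def closed_opened_ratio(tickets):
    """Map each quarter to closed / opened tickets in that quarter.

    The ratio is None for quarters in which no ticket was opened.
    """
    opened = Counter(quarter(o) for o, _ in tickets)
    closed = Counter(quarter(c) for _, c in tickets if c is not None)
    ratios = {}
    for q in sorted(set(opened) | set(closed)):
        ratios[q] = closed[q] / opened[q] if opened[q] else None
    return ratios


# Hypothetical tickets, for illustration only.
tickets = [
    (date(2015, 10, 1), date(2015, 11, 1)),
    (date(2015, 12, 1), date(2016, 1, 15)),
    (date(2016, 2, 1), None),
    (date(2016, 3, 1), date(2016, 3, 20)),
]
for q, ratio in closed_opened_ratio(tickets).items():
    print(q, ratio)
```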

Slide 50

Slide 50 text

Tickets

Slide 51

Slide 51 text

Code review (time to merge)

Slide 52

Slide 52 text

Code review (time to merge, metrics)
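The slide itself is a chart, so the exact metrics it shows are not recoverable from this text. As a hedged sketch of what time-to-merge metrics could look like, the snippet below computes the time between submission and merge for each merged review and summarizes it with the median and the mean. The function name `time_to_merge_days`, the choice of statistics, and the sample data are assumptions.

```python
from datetime import datetime
from statistics import mean, median


def time_to_merge_days(reviews):
    """Days from submission to merge, for merged reviews only.

    `reviews` is a list of (submitted, merged) datetimes; `merged` is
    None for reviews that are pending or were abandoned.
    """
    return [
        (merged - submitted).total_seconds() / 86400
        for submitted, merged in reviews
        if merged is not None
    ]


# Hypothetical reviews, for illustration only.
reviews = [
    (datetime(2016, 1, 1), datetime(2016, 1, 2)),
    (datetime(2016, 1, 1), datetime(2016, 1, 4)),
    (datetime(2016, 1, 2), datetime(2016, 1, 2, 12, 0)),
    (datetime(2016, 1, 3), None),
]
durations = time_to_merge_days(reviews)
print("median:", median(durations), "mean:", mean(durations))
```

The median is usually the more informative of the two, since a few very old reviews can dominate the mean.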

Slide 53

Slide 53 text

Code review (time to merge, evolution)

Slide 54

Slide 54 text

Code review (number of versions per review)

Slide 55

Slide 55 text

Real cases

Slide 56

Slide 56 text

Final remarks

Slide 57

Slide 57 text

Summary
You cannot improve what you cannot measure
Fortunately, you can measure a lot of things...

Slide 58

Slide 58 text

Preview: the new Kibana-based dashboards
http://s.bitergia.com/db-fosdem16