Slide 1

Slide 1 text

Open Development Analytics: A Step Towards More Project Transparency
Jesus M. Gonzalez-Barahona, @jgbarah, Bitergia / LibreSoft (URJC)
Linux Foundation Collaboration Summit, Lake Tahoe (CA, USA), March 29th 2016
Jesus Gonzalez-Barahona (Bitergia) Open Development Analytics March 2016 1 / 60

Slide 2

Slide 2 text

Open

Slide 3

Slide 3 text

Software development http://xkcd.com/844/

Slide 4

Slide 4 text

Analytics https://en.wikipedia.org/wiki/Charles_Joseph_Minard

Slide 5

Slide 5 text

Open Development Analytics

Slide 6

Slide 6 text

Structure of the presentation
1 A bit of context
2 Transparency and governance
3 A personal journey
4 Open development analytics
5 Who is contributing?
6 How are changes being reviewed?
7 Dependency
8 Dealing with issues?
9 Diversity
10 Bonus track

Slide 7

Slide 7 text

A bit of context

Slide 8

Slide 8 text

Me and my two hats
Uni Rey Juan Carlos: LibreSoft research team
Understanding free, open source software
Data analytics approach
Bitergia: From research to the real world
Understanding software development
Data analytics approach
http://gsyc.es/~jgb

Slide 9

Slide 9 text

The company
The software development analytics company
dashboards
reports
consultancy
...
http://bitergia.com

Slide 10

Slide 10 text

Transparency and governance

Slide 11

Slide 11 text

Who drives open software development?

Slide 12

Slide 12 text

Who drives open software development
A community
Persons (and organizations) with common goals, different interests

Slide 13

Slide 13 text

Working together
Self-awareness
Governance
Transparency

Slide 14

Slide 14 text

Self-awareness
Open development communities need to be self-aware
data is the source for awareness...
when it can be used for “sensing”
The same applies to any open organization

Slide 15

Slide 15 text

Governance
“Establishment of policies, and continuous monitoring of their proper implementation, by the members of the governing body of an organization. It includes the mechanisms required to balance the powers of the members (with the associated accountability), and their primary duty of enhancing the prosperity and viability of the organization.”
http://businessdictionary.com

Slide 17

Slide 17 text

Transparency
It comes in two flavors
Transparency to the community (fairness)
Transparency to third parties (trust)
Which for open organizations are kind of the same

Slide 18

Slide 18 text

Transparency
Example of rationale (OpenStack):
“OpenStack favors disclosure and transparency to promote sharing and collaboration within the OpenStack community”
https://www.openstack.org/legal/transparency-policy/

Slide 19

Slide 19 text

Transparency: showing the data is not enough

Slide 20

Slide 20 text

We need analytics (1)
I’m a company investing heavily in a project. I’m hiring developers, supporting the foundation, sponsoring activities...
Are my developers treated according to the policies?
Are we getting integrated in the community?
How do we compare with other companies of similar characteristics?
Are we having reasonable metrics, according to the current stated policies and agreements?

Slide 21

Slide 21 text

We need analytics (2)
I’m an independent developer, devoting a large fraction of my time to this project.
Are my initiatives being considered on fair terms?
Are employees of other companies dealing with me the same way they do with their company colleagues?
Am I considered based on my merits?
Am I having reasonable metrics, according to the current stated policies?

Slide 22

Slide 22 text

We need analytics (3)
We’re a project foundation, interested in being neutral, inclusive, in making life easy for all contributors.
Are newcomers being treated as they should?
Are we balancing the interests of companies and independent developers?
Do we have subprojects which are outliers in terms of performance, inclusiveness, etc.?
Are the policies we put in place having some impact?
Do metrics show our project is as we intended it to be?

Slide 23

Slide 23 text

A personal journey

Slide 24

Slide 24 text

Counting lines of code http://www.dwheeler.com/sloc/redhat62-v1/redhat62sloc.html

Slide 25

Slide 25 text

Counting lines of code (revisited) http://gsyc.es/~mortuno/articulos/counting_potatoes.pdf

Slide 26

Slide 26 text

Key takeaways (2001)
Interesting questions about software development can be answered with data
There is a lot of value in doing it in the open!!!
But:
Data retrieval is not that easy
We need FOSS tools
We need comparable results
We need visualizations

Slide 27

Slide 27 text

Open development analytics

Slide 28

Slide 28 text

A new dimension of openness
When we develop in the open we produce a great deal of data about how we develop
“Show me the development data” as a step beyond “show me the code”

Slide 29

Slide 29 text

From open development to open development analytics
Information about code, community, development for open development projects can be retrieved, organized, analyzed
Let’s publish analytics results & data
Open Development Analytics: A new standard for transparency

Slide 30

Slide 30 text

Open development analytics
Who may benefit?
Developers
Project managers
Community managers
Evaluators
...
Anyone interested in the health of the project

Slide 31

Slide 31 text

Who may benefit?
Slide used by Jim Zemlin at LF Collab 2016

Slide 32

Slide 32 text

Some areas of interest
Performance (understanding activity)
Company participation (beyond copyright notices)
Transparency (available information)
Auditing (certify participation, experience, etc.)
Profiling (key people, companies)
Neutrality (fair treatment)

Slide 33

Slide 33 text

Who is contributing?

Slide 34

Slide 34 text

The influence of companies
In many projects, companies are main drivers
They join forces to push the project...
...but they watch each other, look for balances
They contribute money, resources...
...and direct development effort
Having an accurate, transparent picture is very important!

Slide 35

Slide 35 text

Affiliation

Slide 36

Slide 36 text

How are changes being reviewed?

Slide 37

Slide 37 text

Some reviewers are more equal than others
http://blog.bitergia.com/2015/12/30/some-developers-are-more-equal-than-others/

Slide 38

Slide 38 text

Neutrality?
[Figure: number of accepted reviews vs. iterations per accepted review (median)]

Slide 39

Slide 39 text

Dependency

Slide 40

Slide 40 text

Apache Pony Factor
In the words of Daniel Gruno:
We [the ASF] created a term we have coined “Pony Factor” (because ASF is full of ponies, or people who think they are ponies). Pony Factor (PF) shows the diversity of a project in terms of the division of labor among committers in a project.
Pony Factor is determined as: “The lowest number of committers whose total contribution constitutes the majority of the codebase”
https://ke4qqq.wordpress.com/2015/02/08/pony-factor-math/
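
The definition above can be sketched in a few lines. This is only an illustration under stated assumptions: contribution is approximated by commit counts (the quote speaks of the codebase), "majority" is read as strictly more than half, and the author names are made up.

```python
from collections import Counter


def pony_factor(commit_authors):
    """Smallest number of authors whose commits add up to more than half of all commits.

    Assumption: one list entry per commit, holding the author's name.
    """
    counts = Counter(commit_authors)
    total = sum(counts.values())
    covered = 0
    # Walk authors from most to least active until a strict majority is covered.
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered * 2 > total:
            return rank
    return 0  # empty history


# Hypothetical commit log: 11 commits by four authors.
authors = ["ana"] * 5 + ["bob"] * 3 + ["eve"] * 2 + ["dan"] * 1
print(pony_factor(authors))  # 2: ana and bob together made 8 of 11 commits
```

A low value means that a handful of people account for most of the activity.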

Slide 41

Slide 41 text

Bitergia Elephant Factor

Slide 42

Slide 42 text

Bitergia Elephant Factor
Projects can benefit from powerful collaborations from companies (elephants). The elephant factor shows the diversity of a project in terms of the division of labor among companies (by means of developers affiliated with them).
Elephant factor is determined as: “The lowest number of companies whose total contribution (in commits by their employees) constitutes the majority of the commits”
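
A minimal sketch of this definition, assuming an author-to-company mapping is already available. The mapping, the company names and the choice to group unaffiliated authors under a single "Unknown" bucket are all assumptions made for illustration, not Bitergia's implementation.

```python
from collections import Counter


def elephant_factor(commit_authors, affiliation, unknown="Unknown"):
    """Smallest number of companies whose employees made more than half of the commits.

    commit_authors: one author name per commit.
    affiliation: dict mapping author name to company (hypothetical data).
    Authors missing from the mapping are pooled under `unknown` (an assumption).
    """
    counts = Counter(affiliation.get(author, unknown) for author in commit_authors)
    total = sum(counts.values())
    covered = 0
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered * 2 > total:
            return rank
    return 0  # empty history


# Hypothetical data: two of the four authors work for the same company.
authors = ["ana"] * 5 + ["bob"] * 3 + ["eve"] * 2 + ["dan"] * 1
companies = {"ana": "Acme", "bob": "Acme", "eve": "Globex"}
print(elephant_factor(authors, companies))  # 1: Acme employees made 8 of 11 commits
```

How unaffiliated authors are treated changes the result, so any published figure should state that choice.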

Slide 43

Slide 43 text

Code “owned”
“The land belongs to its workers” Emiliano Zapata

Slide 44

Slide 44 text

Code “owned”
The code changes over time. The current version is “owned” by the people who produced it.
The code “belongs” to those who wrote it.
Zapata factor (work in progress): “The lowest number of developers for whom the total number of lines of code they “own” (were last touched by them) constitutes the majority of the lines of code”
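
One possible way to obtain "last touched by" data is `git blame --line-porcelain`, which prints an `author <name>` header for every line of a file. The sketch below counts those headers and applies the majority rule. It is an assumption about how the factor could be computed, not the method used for the slides, and the sample text is an abbreviated, made-up excerpt of the porcelain format.

```python
from collections import Counter


def lines_owned(porcelain_text):
    """Count lines per author in `git blame --line-porcelain` output.

    Header lines start with "author "; file content lines start with a TAB,
    so they cannot be mistaken for headers.
    """
    owners = Counter()
    for line in porcelain_text.split("\n"):
        if line.startswith("author "):
            owners[line[len("author "):]] += 1
    return owners


def zapata_factor(owners):
    """Smallest number of developers who own more than half of the lines."""
    total = sum(owners.values())
    covered = 0
    for rank, (_, n) in enumerate(owners.most_common(), start=1):
        covered += n
        if covered * 2 > total:
            return rank
    return 0


# Abbreviated, hypothetical blame output (real output has more header fields).
sample = "\n".join([
    "a1b2c3 1 1 2",
    "author Ana",
    "\tdef main():",
    "a1b2c3 2 2",
    "author Ana",
    "\t    run()",
    "d4e5f6 3 3 1",
    "author Bob",
    "\tmain()",
])
owners = lines_owned(sample)
print(owners["Ana"], owners["Bob"], zapata_factor(owners))  # 2 1 1
```

Grouping the same counts by employer instead of by developer would give the company-level variant.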

Slide 45

Slide 45 text

Code “owned”
The code “belongs” to companies who employ developers changing it.
United Fruit factor (work in progress): “The lowest number of companies for whom the total number of lines of code they “own” (were last touched by their employees) constitutes the majority of the lines of code”

Slide 46

Slide 46 text

Pony / elephant factors for some projects [July 2015]

Project        Pony Factor   Elephant Factor   Commits (excl. bots)
OpenNebula     4             1                 12K
Eucalyptus     5             1                 25K
CloudStack     14            1                 42K
OpenStack      >100          6                 126K
CloudFoundry   41            1                 60K
OpenShift      10            1                 15K
Docker         15            1                 18K
Kubernetes     12            1                 7K

Slide 47

Slide 47 text

Dealing with issues?

Slide 48

Slide 48 text

Issues may be processed not as intended
Policy (or recommendations) may mandate transitions, but are they real?
Time to close when same company reporting / fixing?
Time to close for external bug reports?
Time to close depending on who reports?
Who opens tickets that nobody cares about?
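
The time-to-close questions above can be explored by grouping closed issues by who reported them and comparing medians. A minimal sketch, assuming issues have already been retrieved from the tracker; the field names (`opened`, `closed`, `reporter_org`) and the sample records are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def median_days_to_close(issues):
    """Median time to close, in days, per reporter organization.

    issues: list of dicts with hypothetical keys "opened", "closed"
    (datetime or None) and "reporter_org". Open issues are skipped.
    """
    by_org = defaultdict(list)
    for issue in issues:
        if issue["closed"] is None:
            continue
        elapsed = issue["closed"] - issue["opened"]
        by_org[issue["reporter_org"]].append(elapsed.total_seconds() / 86400)
    return {org: median(days) for org, days in by_org.items()}


# Made-up records: two issues reported by a core company, two by outsiders.
issues = [
    {"opened": datetime(2016, 1, 1), "closed": datetime(2016, 1, 3), "reporter_org": "core"},
    {"opened": datetime(2016, 1, 5), "closed": datetime(2016, 1, 9), "reporter_org": "core"},
    {"opened": datetime(2016, 1, 2), "closed": datetime(2016, 1, 12), "reporter_org": "external"},
    {"opened": datetime(2016, 2, 1), "closed": None, "reporter_org": "external"},
]
print(median_days_to_close(issues))  # {'core': 3.0, 'external': 10.0}
```

The median is used here because a few very old tickets can dominate a mean. Skipping open issues hides tickets nobody cares about, so those need a separate count.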

Slide 49

Slide 49 text

Example: The “mandated” changes of state

Slide 50

Slide 50 text

The real changes of state

Slide 51

Slide 51 text

Diversity

Slide 52

Slide 52 text

Geography
Geographical diversity is difficult to assess
Companies can keep detailed records, but open communities are different
Fortunately, some tools leave traces...
This allows for better knowledge
...and better tracking of initiatives
Example: policies to enlarge the number of developers in XXX region

Slide 53

Slide 53 text

Geography: time zones in git records
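
Git stores each author date together with a UTC offset, so a rough geographical picture can be built by counting commits per offset. A minimal sketch, assuming the dates come from `git log --format=%ai` (strings such as `2016-03-29 10:15:00 -0700`); the sample dates are made up. The offset reflects how the author's machine was configured, so it is only a proxy for location.

```python
from collections import Counter


def commits_per_utc_offset(author_dates):
    """Count commits per UTC offset, in hours, from `git log --format=%ai` lines."""
    counts = Counter()
    for date in author_dates:
        tz = date.split()[-1]  # e.g. "+0530"
        sign = -1 if tz[0] == "-" else 1
        hours = sign * (int(tz[1:3]) + int(tz[3:5]) / 60)
        counts[hours] += 1
    return counts


# Hypothetical author dates, in the format printed by `git log --format=%ai`.
dates = [
    "2016-03-29 10:15:00 -0700",
    "2016-03-29 18:02:11 +0100",
    "2016-03-28 09:30:00 +0100",
    "2016-03-27 23:45:10 +0530",
]
for offset, n in sorted(commits_per_utc_offset(dates).items()):
    print(f"UTC{offset:+.1f}: {n}")
```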

Slide 54

Slide 54 text

Geography: GitHub profiles

Slide 55

Slide 55 text

Gender: Analyzing by name
Current situation of gender imbalance in OpenStack

Gender   Developers   Commits   Commits/devel
Female   750          14,647    19.5
Male     4,632        207,112   44.7

Only names with more than 80% of certainty. [Work in progress, preliminary results]
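
The last column of the table is commits divided by developers. The short check below recomputes it from the figures on the slide. The share of developers is an extra derived value, not shown in the original.

```python
def commits_per_developer(developers, commits):
    """Average number of commits per developer, rounded to one decimal."""
    return round(commits / developers, 1)


# Figures taken from the slide: (developers, commits) per gender.
rows = {"Female": (750, 14647), "Male": (4632, 207112)}
total_developers = sum(devs for devs, _ in rows.values())

for gender, (devs, commits) in rows.items():
    share = 100 * devs / total_developers  # derived, not on the slide
    print(gender, commits_per_developer(devs, commits), f"{share:.1f}% of developers")
```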

Slide 56

Slide 56 text

Bonus track

Slide 57

Slide 57 text

Preview: the new Kibana-based dashboards http://s.bitergia.com/db-fosdem16

Slide 58

Slide 58 text

License
© 2016 Bitergia. Some rights reserved.
This presentation is distributed under the “Attribution-ShareAlike 3.0” license, by Creative Commons, available at http://creativecommons.org/licenses/by-sa/3.0/

Slide 59

Slide 59 text

Credits (1)
“Man With Two Hats” Statue by Henk Visch, located in Ottawa, Canada. Picture by Lezumbalaberenjena in Wikimedia Commons. License: Public domain. https://commons.wikimedia.org/wiki/File:Man_With_Two_Hats_Ottawa_Statue_by_lezumbalaberenjena.jpg
“Napoleon’s Russian campaign of 1812” Original by Charles Minard. License: Public domain. https://en.wikipedia.org/wiki/Charles_Joseph_Minard#/media/File:Minard.png
“Aged Come In We’re Open” Picture by Czarina Alegre in Flickr. License: Creative Commons Attribution 2.0. https://flic.kr/p/fjGamh

Slide 60

Slide 60 text

Credits (2)
“Good code” Comic by Randall Munroe, XKCD 844. License: Creative Commons Attribution-NonCommercial 2.5. http://xkcd.com/844/
“Crowd at FOSDEM 2008” Picture by Jesús Corrius in Flickr. License: Creative Commons Attribution 2.0. http://www.flickr.com/photos/jcorrius/2302302707/
“Elephant” Picture by ajoheyho. License: Creative Commons Public Domain. https://pixabay.com/en/elephant-african-bush-elephant-114543/
“Emiliano Zapata” License: Public Domain