Slide 1

Slide 1 text

Software Development Analytics
Collaboration as Health Indicator
OSS Summit Latin America 2022
Daniel Izquierdo Cortázar, Miguel Ángel Fernández

Slide 2

Slide 2 text

Analytics Specialist & Consultant @ Bitergia
CEO @ Bitergia
Governing Board @ CHAOSS
VP @ InnerSource Commons Foundation

Slide 3

Slide 3 text

Collaboration (from Latin com- "with" + laborare "to labor", "to work") is the process of two or more people, entities or organizations working together to complete a task or achieve a goal.
Wikipedia dixit

Slide 4

Slide 4 text

Welcome to Open Source Communities!

Slide 5

Slide 5 text

What does collaboration look like in open source projects?

Slide 6

Slide 6 text

Data mining process and visualizations powered by GrimoireLab, a CHAOSS project

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

Dots are developers
Squares are repositories
Edge exists if a developer has contributed to a repository
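The network described on this slide can be sketched in a few lines. This is a minimal illustration, not how GrimoireLab builds its visualizations: the contribution records, the node labels and the `build_network` helper are all hypothetical, and only the Python standard library is assumed.

```python
from collections import defaultdict

# Hypothetical contribution records: (developer, repository) pairs.
contributions = [
    ("alice", "repo-a"),
    ("alice", "repo-b"),
    ("bob", "repo-b"),
    ("carol", "repo-c"),
]


def build_network(records):
    """Return an undirected adjacency map linking developers and repositories."""
    graph = defaultdict(set)
    for developer, repository in records:
        # Tag each node with its type so a developer and a repository
        # that happen to share a name never collide.
        dev_node = ("developer", developer)
        repo_node = ("repository", repository)
        # An edge exists if the developer has contributed to the repository.
        graph[dev_node].add(repo_node)
        graph[repo_node].add(dev_node)
    return dict(graph)


network = build_network(contributions)
for node, neighbours in sorted(network.items()):
    print(node, "->", sorted(neighbours))
```

Developers only ever link to repositories, so the result is a two-mode (bipartite) network: two developers are related only through the repositories they share.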

Slide 9

Slide 9 text

Some visualization highlights across communities

Slide 10

Slide 10 text

Collaboration vs Isolated Projects

Slide 11

Slide 11 text

‘Continent’ communities vs Archipelago

Slide 12

Slide 12 text

Single-project developers vs multi-project developers

Slide 13

Slide 13 text

High density areas vs lighter ones

Slide 14

Slide 14 text

Knowledge silos, continent communities

Slide 15

Slide 15 text

Organizational Diversity, areas of expertise

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

Projects are interconnected
Organizations and developers engage at different levels
OSS projects without contributors are dead
OSS projects without collaboration are dead
What is a healthy collaboration?

Slide 18

Slide 18 text

Let’s try to formalize this discussion
Let’s see if we can measure collaboration

Slide 19

Slide 19 text

What is a graph?
Representation of a network as a set of connected elements

Slide 20

Slide 20 text

Creating collaboration networks (I)

Slide 21

Slide 21 text

Creating collaboration networks (II)

Slide 22

Slide 22 text

How to measure collaboration from a network?
Which properties of the network can help us to measure collaboration?
Which metrics should we consider?

Slide 23

Slide 23 text

Applying Graph theory: Network properties
Adjacency: Two nodes are adjacent if there is an edge between them. Two edges are adjacent if they share one of their ends.
Degree: The degree of a node is the number of connections that it has to other nodes in the network.
Connectivity: A node is reachable from another node if there is a path between them. A graph is connected if there is a path for every pair of nodes in the graph.
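The three properties above can be checked directly on a small graph. The sketch below assumes an undirected graph stored as an adjacency map in which every node appears as a key; the example graph and all function names are hypothetical.

```python
from collections import deque

# Hypothetical undirected graph; "d" has no connections at all.
graph = {
    "a": {"b", "c"},
    "b": {"a"},
    "c": {"a"},
    "d": set(),
}


def nodes_adjacent(g, u, v):
    """Two nodes are adjacent if there is an edge between them."""
    return v in g[u]


def edges_adjacent(edge_1, edge_2):
    """Two edges are adjacent if they share one of their ends."""
    return bool(set(edge_1) & set(edge_2))


def degree(g, node):
    """Number of connections a node has to other nodes."""
    return len(g[node])


def reachable_from(g, start):
    """All nodes that have a path from start (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in g[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def is_connected(g):
    """A graph is connected if every node is reachable from any one node."""
    if not g:
        return True
    first = next(iter(g))
    return reachable_from(g, first) == set(g)


print(degree(graph, "a"), nodes_adjacent(graph, "a", "b"), is_connected(graph))
```

For an undirected graph, one search from a single node is enough to decide connectivity, because reachability is symmetric.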

Slide 24

Slide 24 text

Applying Graph theory: Centrality metrics
Betweenness centrality: A way of detecting the amount of influence a node has over the flow of information in a graph. It is often used to find nodes that serve as a bridge from one part of a graph to another.
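One common way to compute betweenness centrality is Brandes' algorithm, which counts how many shortest paths between other node pairs pass through each node. The slides do not say which implementation the speakers used, so the following is only a sketch under stated assumptions: an unweighted, undirected graph stored as an adjacency map with every node present as a key, unnormalised scores, and hypothetical contributor names.

```python
from collections import deque


def betweenness_centrality(graph):
    """Unnormalised betweenness for an unweighted, undirected graph."""
    centrality = {node: 0.0 for node in graph}
    for source in graph:
        # Breadth-first search from source, counting shortest paths.
        order = []
        predecessors = {node: [] for node in graph}
        paths = {node: 0 for node in graph}
        distance = {node: -1 for node in graph}
        paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if distance[w] < 0:  # first time w is seen
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:  # v lies on a shortest path to w
                    paths[w] += paths[v]
                    predecessors[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        dependency = {node: 0.0 for node in graph}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # Every undirected pair was counted once from each endpoint.
    return {node: value / 2 for node, value in centrality.items()}


# Hypothetical chain of collaborators: ana - bea - carl.
collaboration = {
    "ana": {"bea"},
    "bea": {"ana", "carl"},
    "carl": {"bea"},
}
scores = betweenness_centrality(collaboration)
print(scores)
```

In this chain, the only shortest path between ana and carl passes through bea, so bea is the single bridge and the only node with a non-zero score.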

Slide 25

Slide 25 text

Analyzing a real network (I)
Node types: Contributors, Projects
Degree: The number of connections from a Contributor node indicates how many projects that person collaborates in.
Connectivity: A highly connected network indicates a more collaborative community.
Adjacency: Contributor nodes sharing edges to the same Project nodes indicate collaboration among these people.
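The degree and adjacency readings above can be illustrated on a toy dataset. The project-to-contributors mapping and both helper functions below are hypothetical; the sketch only shows the idea of counting projects per contributor and finding contributor pairs that share a project.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical mapping from each project to its contributors.
project_contributors = {
    "project-x": {"ana", "bea"},
    "project-y": {"bea", "carl"},
    "project-z": {"dan"},
}


def projects_per_contributor(projects):
    """Degree of each Contributor node: how many projects they connect to."""
    counts = defaultdict(int)
    for people in projects.values():
        for person in people:
            counts[person] += 1
    return dict(counts)


def shared_projects(projects):
    """Map each contributor pair to the projects they both contributed to."""
    pairs = defaultdict(set)
    for project, people in projects.items():
        # Sorting keeps each pair in a stable (alphabetical) order.
        for first, second in combinations(sorted(people), 2):
            pairs[(first, second)].add(project)
    return dict(pairs)


degrees = projects_per_contributor(project_contributors)
collaborations = shared_projects(project_contributors)
print(degrees)
print(collaborations)
```

Here dan contributes to a project nobody else touches, so he appears in no pair: an isolated project in the sense of the earlier visualization slides.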

Slide 26

Slide 26 text

Analyzing a real network (II)
Node types: Contributors, Projects
Betweenness Centrality: Finding the contributors connected to a greater number of projects helps us find the people acting as bridges in the community.

Slide 27

Slide 27 text

Collaborating to define Collaboration

Slide 28

Slide 28 text

Community Health Analytics for Open Source Software https://chaoss.community

Slide 29

Slide 29 text

Metrics: Implementation-agnostic community development metrics
Software: OSS Tools to Analyze (OSS) Software Development Projects
Certain Intersection

Slide 30

Slide 30 text

Metrics
Implementation-agnostic community development metrics
Work in Progress @ Metrics Models Working Group
Join #wg-metrics-models @ CHAOSS Slack

Slide 31

Slide 31 text

Software
OSS Tools to Analyze (OSS) Software Development Projects
https://chaoss.github.io/grimoirelab/
Raw data
Identities DB
Enriched data
Incremental datasets
Historical data
Focus on data, not on mining processes
OSS metrics lake
Metrics ready for consumption
30+ Data sources

Slide 32

Slide 32 text

Extra Collaboration Metrics in Action [by Bitergia]

Slide 33

Slide 33 text

Bitergia in Action: Santander InnerSource Metrics
https://innersourceportal.santander.com

Slide 34

Slide 34 text

Bitergia in Action: Red Hat and CNCF
From Art to Science: The Evolution of Community Development. Diane Mueller and Daniel Izquierdo. IEEE Software, Volume 36, Issue 6, Nov.-Dec. 2019
https://www.cncf.io/blog/2020/08/04/a-guide-to-untangling-the-cncf-cross-community-relationships/
“Scaling management skills by 10x thanks to data insights”
Discover developer interrelations, onboard newcomers faster, and align project expectations and releases.

Slide 35

Slide 35 text

Bitergia in Action: Mozilla Rebel Alliance
https://report.mozilla.community/
“[...] holistic view of our contributor ecosystem’s network structure, health and impact [...]”
“[...] we’re able to visually describe these distinct contributor communities as well as how they are interconnected [...]”

Slide 36

Slide 36 text

Contact Us
Daniel Izquierdo Cortázar, Miguel Ángel Fernández
Email: [email protected], [email protected]