Authorship dashboard for the Linux Kernel

Talk at LinuxCon 2016, Toronto (unfortunately, I couldn't deliver the talk, but these are the slides I intended to use).

Every single line of code in the Linux kernel has an author. We have built a dashboard to explore, and drill down on, who wrote what. The talk will present that dashboard and show how it can be produced using open source technologies (Python tools, ElasticSearch, Kibana, etc.).

Jesus M. Gonzalez-Barahona

August 22, 2016

Transcript

  1. Authorship dashboard for the Linux Kernel

    Jesus M. Gonzalez-Barahona, Bitergia (jgb@bitergia.com, @jgbarah). http://speakerdeck.com/jgbarah LinuxCon North America, Toronto (Canada), August 22-24, 2016.

    Jesus M. Gonzalez-Barahona (Bitergia) Authorship dashboard for Linux Aug 2016 1 / 32
  2. Structure of the presentation

    1. A bit of context
    2. The Linux development history dashboard
    3. Exploring the Linux history (some examples)
    4. How to build your own dashboard
    5. Final remarks
  3. A bit of context
  4. Me and my circumstances

    Jesus M. Gonzalez-Barahona: Researcher for many years at Uni. Rey Juan Carlos. Understanding free, open source software development. Now collaborating with, and co-founder of, Bitergia.
  5. The company

    The software development analytics company: dashboards, reports, consultancy, ... http://bitergia.com
  6. The Linux development history dashboard
  7. Having data is not like understanding data

    https://en.wikipedia.org/wiki/Charles_Joseph_Minard
  8. What’s in the dashboard

    All commits in the current Linux (Torvalds’) git repo (git log). Panels: Git and Demographics.

    All lines in the current master HEAD, with details about when they were introduced (git blame). Panels: Git Blame, Git Blame (Charts), and Git Blame (Files).

    http://linux.biterg.io
  9. Git Blame panel
  10. Git Blame panel
  11. Git Blame (Charts) panel
  12. Git Blame (Files) panel
  13. Git Blame (Files) panel (2)
  14. How to use the dashboard

    Click almost anywhere to apply the corresponding filter (e.g., click on an author to filter their activity). Interact with filters (green / red buttons that appear at the top). Select dates at the top right. Change the layout by dragging and resizing widgets. Use the icon below the date selector to share (includes current filters and layout).

    Panels are customized Kibana dashboards: https://www.elastic.co/guide/en/kibana/4.4/dashboard.html
  15. Exploring the Linux history (some examples)
  16. How old is the code?

    Age of lines (date of authorship, “.c” files)
  17. How old is the code? (files by last changed)

    Files by last change (date of authorship)
  18. How old is the code? (files by first change)

    Files by first remaining change (date of authorship, “.c” files)
  19. How old is the code? drivers/net

    Age of lines (date of authorship, “.c” files). From top left, clockwise: Wireless, USB, IRDA, Ethernet.
  20. Code “owned”

    “The land belongs to its workers” (Emiliano Zapata)
  21. Code “owned” (authors of remaining code)

    Top authors, by number of lines (since 2002)
  22. Code “owned” (authors of remaining code)

    Top authors, by number of snippets (since 2002). About 200 authors account for 50% of snippets. (Snippet: piece of a file changed in a single commit.)
  23. Where do developers work?

    Number of lines by time zone (lines remaining in current kernel, by time of authorship). Top: lines from 2009; bottom: 2016.
  24. How to build your own dashboard
  25. Ingredients

    A git repository with all Linux history (up to having a meaningful git blame output). Some scripts based on GrimoireLab to analyze it and produce data for the dashboard. Python to run those scripts. ElasticSearch to store the data. Kibitter / Kibana to produce the dashboard.
  26. Reconstructing Linux development history

    From Dave Jones: 0.01 to 2.4.0. From Thomas Gleixner: 2.4.0 to 2.6.12-rc2. From Torvalds’ Linux repo: 2.6.12 to now.

    All put together by Yoann Padioleau and later Rob Landley: https://landley.net/kdocs/fullhist/

    Available for your enjoyment (updated up to now): https://github.com/history-repos/linux/
  27. Analyzing the repository with Git Blame

    Git Blame backend for Perceval. Still not merged upstream, clone from http://github.com/jgbarah/perceval (gitblame branch).

    Ad-hoc script to produce Kibana indexes: https://github.com/jgbarah/blameanalysis.git

    Assuming Linux git repo in ~/linux:

        blame_analysis.py --repodir ~/linux --store linux-store \
          --processed linux-processed --uploaded linux-uploaded \
          git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux --es_url elasticsearch_url
  28. Producing the dashboard

    Starting from a standard Bitergia Analytics Dashboard for the Linux git repository (built with GrimoireLab tools): http://grimoirelab.github.io

    Some new panels (Kibana dashboards) for git blame: Git Blame, Git Blame (Charts), Git Blame (Files).

    http://linux.biterg.io
  29. Final remarks
  30. A complete dashboard for Linux code history

    25 years of Linux development history. This is what the current kernel is made of. Play with the data, explore it! Our contribution to Linux 25th anniversary.

    Dashboard: http://linux.biterg.io Slides: http://speakerdeck.com/jgbarah
  31. License

    © 2016 Bitergia. Some rights reserved. This presentation is distributed under the “Attribution-ShareAlike 3.0” license, by Creative Commons, available at http://creativecommons.org/licenses/by-sa/3.0/
  32. Credits

    “Napoleon’s Russian campaign of 1812”. Original by Charles Minard. License: Public domain. https://en.wikipedia.org/wiki/Charles_Joseph_Minard#/media/File:Minard.png

    “Emiliano Zapata”. License: Public Domain.
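
The slides describe the dashboard data as "all lines in the current master HEAD with details about when they were introduced (git blame)", and count both lines and snippets per author. As a rough illustration of that idea, here is a minimal Python sketch that summarizes the output of `git blame --line-porcelain` for a single file. It is not the Perceval backend or the blame_analysis.py script mentioned in the slides. The function names `blame_porcelain` and `summarize_blame` are hypothetical, and a "snippet" is approximated here as a run of consecutive lines attributed to the same commit, which is an assumption based on the definition given in slide 22.

```python
import re
import subprocess
from collections import Counter
from datetime import datetime, timezone

# Each blamed line starts with a header of the form:
#   <40-hex sha> <orig_line> <final_line> [<lines_in_group>]
HEADER = re.compile(r"^([0-9a-f]{40}) \d+ \d+(?: \d+)?$")


def blame_porcelain(repodir, path, rev="HEAD"):
    """Run git blame on one file and return its porcelain output.

    Hypothetical helper: assumes `git` is on PATH and `repodir` is a clone.
    """
    result = subprocess.run(
        ["git", "-C", repodir, "blame", "--line-porcelain", rev, "--", path],
        check=True, capture_output=True, text=True, errors="replace",
    )
    return result.stdout


def summarize_blame(porcelain_text):
    """Count surviving lines and snippets per author, and lines per year.

    Assumption: a snippet is a run of consecutive lines from the same commit.
    Years are taken from author-time, converted to UTC.
    """
    lines_by_author = Counter()
    snippets_by_author = Counter()
    lines_by_year = Counter()
    sha = author = year = None
    prev_sha = None
    for raw in porcelain_text.split("\n"):
        if raw.startswith("\t"):
            # A tab-prefixed line is the file content and closes one record.
            lines_by_author[author] += 1
            lines_by_year[year] += 1
            if sha != prev_sha:
                snippets_by_author[author] += 1
            prev_sha = sha
            continue
        match = HEADER.match(raw)
        if match:
            sha = match.group(1)
        elif raw.startswith("author "):
            author = raw[len("author "):]
        elif raw.startswith("author-time "):
            timestamp = int(raw.split()[1])
            year = datetime.fromtimestamp(timestamp, tz=timezone.utc).year
    return lines_by_author, snippets_by_author, lines_by_year
```

With a kernel clone in ~/linux (expanded to a full path), something like `summarize_blame(blame_porcelain("/home/me/linux", "kernel/fork.c"))` would return the three counters for that file. Summing them over all files gives the kind of per-author and per-year aggregates that the Git Blame panels display, although the real pipeline stores per-line or per-snippet documents in ElasticSearch rather than in-memory counters.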