Authorship dashboard for the Linux Kernel

Talk at LinuxCon 2016, Toronto (unfortunately, I couldn't deliver the talk, but these are the slides I intended to use).

Every single line of code in the Linux kernel has an author. We have built a dashboard to learn and drill down on who wrote what. The talk will present that dashboard, and will show how it can be produced using open source technologies (Python tools, ElasticSearch, Kibana, etc.)

Jesus M. Gonzalez-Barahona

August 22, 2016

Transcript

  1. Authorship dashboard for the Linux Kernel
    Jesus M. Gonzalez-Barahona
    jgb @ bitergia.com @jgbarah
    Bitergia
    http://speakerdeck.com/jgbarah
    LinuxCon North America
    Toronto (Canada), August 22-24, 2016
    Jesus M. Gonzalez-Barahona (Bitergia) Authorship dashboard for Linux Aug 2016 1 / 32

  2. Structure of the presentation
    1 A bit of context
    2 The Linux development history dashboard
    3 Exploring the Linux history (some examples)
    4 How to build your own dashboard
    5 Final remarks

  3. A bit of context

  4. Me and my circumstances
    Jesus M. Gonzalez-Barahona:
    Researcher for many years at Uni. Rey Juan Carlos
    Understanding free, open source software development
    Now collaborating with, and co-founder of Bitergia

  5. The company
    The software development analytics company
    dashboards
    reports
    consultancy
    ...
    http://bitergia.com

  6. The Linux development history dashboard

  7. Having data is not like understanding data
    https://en.wikipedia.org/wiki/Charles_Joseph_Minard

  8. What’s in the dashboard
    All commits in the current Linux (Torvalds’) git repo
    (git log)
    Git and Demographics
    All lines in the current master HEAD
    with details about when they were introduced
    (git blame)
    Git Blame, Git Blame (Charts), and Git Blame (Files)
    http://linux.biterg.io
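
A minimal sketch of where such per-line data can come from, assuming the `--line-porcelain` output of `git blame` (a standard git option that repeats the commit metadata for every line). This is not the Perceval backend mentioned later in the talk; the function name and the sample data are made up for illustration.

```python
from typing import Dict, Iterator


def parse_line_porcelain(text: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per blamed line of `git blame --line-porcelain` output."""
    record: Dict[str, str] = {}
    for raw in text.split("\n"):
        if raw.startswith("\t"):
            # A tab-prefixed line holds the source line and closes the record.
            record["content"] = raw[1:]
            yield record
            record = {}
        elif not record:
            # Header line: "<sha> <orig_line> <final_line> [<lines_in_group>]"
            fields = raw.split()
            if len(fields) >= 3:
                record = {"commit": fields[0], "final_line": fields[2]}
        else:
            # Metadata lines such as "author-time 1471824000".
            key, _, value = raw.partition(" ")
            record[key] = value


# Hypothetical sample: two lines attributed to a made-up commit.
SAMPLE = "".join(
    f"{'a' * 40} {n} {n} 1\n"
    "author Alice Example\n"
    "author-mail <alice@example.org>\n"
    "author-time 1471824000\n"
    "author-tz +0200\n"
    "summary Add example\n"
    "filename example.c\n"
    f"\tline {n}\n"
    for n in (1, 2)
)

for rec in parse_line_porcelain(SAMPLE):
    print(rec["final_line"], rec["author"], rec["author-time"], rec["author-tz"])
```

Real input would come from running `git blame --line-porcelain <file>` on each file in master HEAD; the records keep every value as a string so that later steps can decide how to interpret dates and time zones.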

  9. Git Blame panel

  10. Git Blame panel

  11. Git Blame (Charts) panel

  12. Git Blame (Files) panel

  13. Git Blame (Files) panel (2)

  14. How to use the dashboard
    Click almost anywhere to apply the corresponding filter
    E.g., click on an author to filter by their activity
    Interact with filters (green / red buttons that appear at the top)
    Select dates at the top right
    Change layout by dragging & resizing widgets
    Use the icon below the date selector to share (includes current filters and layout)
    Panels are customized Kibana dashboards:
    https://www.elastic.co/guide/en/kibana/4.4/dashboard.html

  15. Exploring the Linux history (some examples)

  16. How old is the code?
    Age of lines (date of authorship, “.c” files)
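
As a rough illustration of the "age of lines" view, lines can be bucketed by year of authorship. The sketch below assumes each line carries an author timestamp in seconds since the epoch (as `git blame` reports it) and that years are taken in UTC; the function name is hypothetical, and the dashboard itself does this aggregation in Kibana.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Dict, Iterable


def lines_per_year(author_times: Iterable[int]) -> Dict[int, int]:
    """Count surviving lines by the (UTC) year in which they were authored."""
    years: Counter = Counter()
    for ts in author_times:
        years[datetime.fromtimestamp(int(ts), tz=timezone.utc).year] += 1
    # Sort by year so the result reads like a histogram.
    return dict(sorted(years.items()))


# Made-up timestamps: one line from January 2005, two from August 2016.
print(lines_per_year([1104537600, 1471824000, 1471824000]))
```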

  17. How old is the code? (files by last change)
    Files by last change (date of authorship)

  18. How old is the code? (files by first change)
    Files by first remaining change (date of authorship, “.c” files)

  19. How old is the code? drivers/net
    Age of lines (date of authorship, “.c” files)
    From top left, clockwise: Wireless, USB, IrDA, Ethernet

  20. Code “owned”
    “The land belongs to its workers”
    Emiliano Zapata

  21. Code “owned” (authors of remaining code)
    Top authors, by number of lines (since 2002)

  22. Code “owned” (authors of remaining code)
    Top authors, by number of snippets (since 2002)
    About 200 authors account for 50% of snippets
    (Snippet: piece of a file changed in a single commit)
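
One way to read the snippet definition above, as a sketch: within a file, each run of consecutive lines blamed on the same commit counts as one snippet for that commit's author. That interpretation, the function names, and the sample data are assumptions made for illustration, not the dashboard's actual implementation. The second function finds how many top authors are needed to reach a given share of the total.

```python
from collections import Counter
from itertools import groupby
from typing import Iterable, Mapping, Tuple


def snippets_per_author(blamed: Iterable[Tuple[str, str]]) -> Counter:
    """Count snippets per author for one file.

    `blamed` holds one (commit, author) pair per line, in file order; each
    run of consecutive lines from the same commit counts as one snippet.
    """
    counts: Counter = Counter()
    for (_commit, author), _run in groupby(blamed):
        counts[author] += 1
    return counts


def authors_needed(counts: Mapping[str, int], share: float = 0.5) -> int:
    """Smallest number of top authors whose items reach `share` of the total."""
    total = sum(counts.values())
    covered = 0
    for rank, (_author, n) in enumerate(Counter(counts).most_common(), start=1):
        covered += n
        if covered >= share * total:
            return rank
    return 0


# Made-up file: commit c1 appears in two separate runs, so "ann" has 2 snippets.
file_lines = [("c1", "ann"), ("c1", "ann"), ("c2", "bob"), ("c1", "ann")]
print(snippets_per_author(file_lines))
print(authors_needed({"a": 50, "b": 30, "c": 15, "d": 5}))
```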

  23. Where do developers work?
    Number of lines by time zone
    (lines remaining in current kernel, by time of authorship)
    Top: lines from 2009, bottom: 2016
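
A small sketch of the time zone grouping, assuming each line carries the author's git time zone as a string such as `+0200` (the form `git blame` reports). The function names are hypothetical. Keep in mind that this value reflects how the author's machine was configured when committing, which is only a proxy for location.

```python
from collections import Counter
from typing import Iterable


def tz_offset_hours(tz: str) -> float:
    """Convert a git time zone such as "+0530" into hours east of UTC."""
    sign = -1 if tz.startswith("-") else 1
    digits = tz.lstrip("+-")
    return sign * (int(digits[:2]) + int(digits[2:4]) / 60)


def lines_by_timezone(author_tzs: Iterable[str]) -> Counter:
    """Count lines per UTC offset (in hours)."""
    return Counter(tz_offset_hours(tz) for tz in author_tzs)


# Made-up sample: two lines authored at UTC+2, one at UTC-7.
print(lines_by_timezone(["+0200", "+0200", "-0700"]))
```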

  24. How to build your own dashboard

  25. Ingredients
    A git repository with all Linux history
    (up to having a meaningful git blame output)
    Some scripts based on GrimoireLab
    to analyze it and produce data for the dashboard
    Python to run those scripts
    ElasticSearch to store the data
    Kibitter / Kibana to produce the dashboard
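
To give an idea of the "ElasticSearch to store the data" step, here is a sketch that only builds the request body for the Elasticsearch bulk API (newline-delimited JSON: one action line, then one document line). The index name, document fields, and function name are hypothetical, nothing is sent over the network, and the real scripts may upload data differently. Older Elasticsearch releases may also expect a document type in the action line or the URL.

```python
import json
from typing import Dict, Iterable


def bulk_body(docs: Iterable[Dict], index: str) -> str:
    """Build a newline-delimited JSON body for the Elasticsearch bulk API."""
    lines = []
    for doc_id, doc in enumerate(docs):
        # Action line: index this document under a sequential id.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc_id)}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk API expects every line, including the last, to end in a newline.
    return "".join(line + "\n" for line in lines)


# Made-up documents, one per blamed line.
docs = [
    {"file": "example.c", "author": "Alice Example", "author_time": 1471824000},
    {"file": "example.c", "author": "Bob Example", "author_time": 1104537600},
]
print(bulk_body(docs, "linux-blame"), end="")
```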

  26. Reconstructing Linux development history
    From Dave Jones: 0.01 to 2.4.0
    From Thomas Gleixner: 2.4.0 to 2.6.12-rc2
    From Torvalds’ Linux repo: 2.6.12 to now
    All put together by Yoann Padioleau
    and later Rob Landley
    https://landley.net/kdocs/fullhist/
    Available for your enjoyment (updated up to now)
    https://github.com/history-repos/linux/

  27. Analyzing the repository with Git Blame
    Git Blame backend for Perceval
    Still not merged upstream, clone from
    http://github.com/jgbarah/perceval (gitblame branch)
    Ad-hoc script to produce Kibana indexes
    https://github.com/jgbarah/blameanalysis.git
    Assuming Linux git repo in ~/linux:
    blame_analysis.py --repodir ~/linux --store linux-store \
    --processed linux-processed --uploaded linux-uploaded \
    git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux \
    --es_url elasticsearch_url

  28. Producing the dashboard
    Starting from a standard Bitergia Analytics Dashboard
    for the Linux git repository
    (built with GrimoireLab tools)
    http://grimoirelab.github.io
    Some new panels (Kibana dashboards) for git blame
    Git Blame
    Git Blame (Charts)
    Git Blame (Files)
    http://linux.biterg.io

  29. Final remarks

  30. A complete dashboard for Linux code history
    25 years of Linux development history
    This is what the current kernel is made of
    Play with the data, explore it!
    Our contribution to the Linux 25th anniversary
    Dashboard: http://linux.biterg.io
    Slides: http://speakerdeck.com/jgbarah

  31. License
    © 2016 Bitergia
    Some rights reserved.
    This presentation is distributed under the
    “Attribution-ShareAlike 3.0” license, by Creative Commons,
    available at
    http://creativecommons.org/licenses/by-sa/3.0/

  32. Credits
    “Napoleon’s Russian campaign of 1812”
    Original by Charles Minard
    License: Public domain
    https://en.wikipedia.org/wiki/Charles_Joseph_Minard#/media/File:Minard.png
    “Emiliano Zapata”
    License: Public Domain