Data, data and data about your favourite community: The Grimoire Toolset

Bitergia

June 16, 2015

Transcript

  1. Data, data and data about your favourite community: The Grimoire Toolset
    Daniel Izquierdo-Cortazar
    [email protected]
    http://twitter.com/dizquierdo
    Bitergia
    Madrid, June 16th, 2015
    http://bitergia.com
    Daniel Izquierdo-Cortazar (Bitergia), Data, data and data about your favourite community, Madrid Python Meetup
  2. © 2012-2015 Bitergia. Some rights reserved.
    This presentation is distributed under the “Attribution-ShareAlike 3.0” license, by Creative Commons, available at http://creativecommons.org/licenses/by-sa/3.0/
  3. Why this talk?
    - Massive data are produced by open source projects
    - How can we take advantage of it?
    - Is it useful to analyze my community?
  4. Index: 1 Introduction, 2 Architecture, 3 Metrics and studies, 4 Some Code, 5 Future
  5. The Grimoire Toolset: goals
    - Open Source tools to analyze Open Source Projects
    - Focus on development activity
    - Understand how technical communities evolve and behave
  6. Index: 1 Introduction, 2 Architecture, 3 Metrics and studies, 4 Some Code, 5 Future
  7. The Grimoire Toolset: architecture
    Three main steps:
    - Retrieval: from publicly available data sources -> MySQL database
    - Parsing and cleaning: MySQL database -> JSON files (among other formats)
    - Visualization: JSON files are visualized using JavaScript, CSS and HTML
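The three steps above can be sketched end to end in a few lines. This is only a minimal illustration under stated assumptions, not how the Grimoire tools are implemented: the `commits` table, its columns and the sample rows are hypothetical, and an in-memory `sqlite3` database stands in for the MySQL database named on the slide.

```python
import json
import sqlite3

# Step 1, retrieval: hypothetical commit records land in a relational database.
# sqlite3 (in memory) is an assumed stand-in for MySQL.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE commits (author TEXT, month TEXT)")
db.executemany(
    "INSERT INTO commits VALUES (?, ?)",
    [("alice", "2014-01"), ("bob", "2014-01"), ("alice", "2014-02")],
)

# Step 2, parsing and cleaning: aggregate with SQL, then serialize to JSON.
rows = db.execute(
    "SELECT month, COUNT(*) FROM commits GROUP BY month ORDER BY month"
).fetchall()
commits_per_month = {
    "month": [row[0] for row in rows],
    "commits": [row[1] for row in rows],
}
payload = json.dumps(commits_per_month, sort_keys=True)

# Step 3, visualization: a JavaScript front end would load this JSON document.
print(payload)
```

Keeping JSON as the hand-off format means the visualization layer never needs database access, which matches the separation between steps described on the slide.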
  8. Retrieval: Metrics Grimoire
    - Specialized tools to analyze development repositories
    - Python 2.7 based
    - CVSAnalY for versioning systems such as Git
    - Bicho for issue tracking systems such as Bugzilla or Jira
    - Sibyl for question and answer forums such as Discourse or Askbot
    - Pullpo for code review processes such as GitHub pull requests and others
    https://metricsgrimoire.github.io/
  9. Parsing and cleaning: GrimoireLib
    - Python 2.7 based
    - Transparent database layer for Metrics Grimoire
    - Reuse code: no need to write the same queries over and over
    - Scalable and modular: a new metric is a new class
    https://github.com/VizGrimoire/GrimoireLib
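The "a new metric is a new class" idea can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `Metric` base class, its `sql` attribute and the `commits` table are hypothetical and do not reproduce GrimoireLib's actual classes, and `sqlite3` stands in for MySQL.

```python
import sqlite3


class Metric(object):
    """Hypothetical base class that owns the shared query code."""

    sql = None  # each subclass supplies only its own query

    def __init__(self, connection):
        self.connection = connection

    def get_agg(self):
        # Shared by every metric: run the subclass query, return one number.
        return self.connection.execute(self.sql).fetchone()[0]


class Commits(Metric):
    sql = "SELECT COUNT(*) FROM commits"


class Authors(Metric):
    sql = "SELECT COUNT(DISTINCT author) FROM commits"


# Hypothetical sample data in an in-memory database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE commits (author TEXT)")
connection.executemany(
    "INSERT INTO commits VALUES (?)", [("alice",), ("bob",), ("alice",)]
)

totals = {cls.__name__: cls(connection).get_agg() for cls in (Commits, Authors)}
print(totals)
```

In this sketch, adding a metric means adding one small subclass; the connection handling and result extraction are written once in the base class.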
  10. Visualization: VizGrimoireJS
    JavaScript library to visualize a JSON API
    Examples:
    - OpenStack Foundation: http://activity.openstack.org/dash/browser/
    - Wikimedia Foundation: http://korma.wmflabs.org/browser/
    - Puppet Labs: http://bitergia.dev.puppetlabs.com/browser/
    - Red Hat - Ceph: http://metrics.ceph.com/
    - 'Dogfooding': http://projects.bitergia.com/grimoire-dashboard/
  11. Index: 1 Introduction, 2 Architecture, 3 Metrics and studies, 4 Some Code, 5 Future
  12. Metrics and studies overview: source code and code review
    Source code (git, svn, hg, etc.):
    - usual ones: commits, authors, files, added/removed lines, branches, companies, etc.
    - not so usual: demographics, timezone, developers characterization
    Code review (gerrit, github):
    - merges, abandoned, submitted patchsets or changesets, people, companies, etc.
    - time to close, time waiting for the submitter or reviewer
  13. Metrics and studies overview: communication channels: mailing lists, IRC, Q&A
    Mailing lists:
    - emails, people, companies, hot topics, time to first reply, emails initiating threads, those replying, unanswered posts, timezone analysis
    Questions and answers (stackoverflow, askbot, discourse):
    - top visited questions, labels, people, answers, comments, ...
  14. Metrics and studies overview: ticketing systems: Bugzilla, Jira, Launchpad, ...
    Tickets:
    - opened and closed tickets, efficiency, time to close tickets, time to attend
    Other data sources:
    - Wikis, Downloads, Releases, Apache logs
  15. Available filters
    Filters:
    - general: repository, company, domain, project, people
    - source code: branch, module, file type, log message
    - tickets: ticket type
    Examples:
    - Commits (by company) (and by repo) (and by filetype)
    - Time to close issues (by company) (and by tracker)
    - Top companies (by project)
    - Top developers (by project) (and by company)
  16. Index: 1 Introduction, 2 Architecture, 3 Metrics and studies, 4 Some Code, 5 Future
  17. Metrics main methods
    API:
    - get_agg: aggregated numbers
    - get_ts: evolutionary numbers (time series)
    - get_list: list of elements (e.g. authors)
    - get_trends: trends for a specific date during the last X days
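To make the four method names concrete, here is a toy stand-in that works on an in-memory list instead of a database. It is a sketch under assumptions, not GrimoireLib's implementation: `ToyCommits`, the sample data and every return format are hypothetical, and the reading of `get_trends` (activity in the last X days compared with the X days before) is one plausible interpretation of the slide.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical commit log: (author, commit date). Not real project data.
COMMITS = [
    ("alice", date(2014, 12, 20)),
    ("bob", date(2014, 12, 28)),
    ("alice", date(2014, 12, 30)),
    ("carol", date(2015, 1, 1)),
]


class ToyCommits(object):
    """Toy stand-in exposing the four method names listed on the slide."""

    def __init__(self, commits):
        self.commits = commits

    def get_agg(self):
        # Aggregated number: total commits.
        return len(self.commits)

    def get_ts(self):
        # Evolution over time: commits per month, in chronological order.
        counts = Counter(day.strftime("%Y-%m") for _, day in self.commits)
        return sorted(counts.items())

    def get_list(self):
        # List of elements: the distinct authors, sorted by name.
        return sorted({author for author, _ in self.commits})

    def get_trends(self, day, days):
        # Assumed meaning: commits in the `days` days ending on `day`,
        # compared with the window of the same length just before it.
        def window(end):
            start = end - timedelta(days=days)
            return sum(1 for _, d in self.commits if start < d <= end)

        current = window(day)
        previous = window(day - timedelta(days=days))
        return {"current": current, "previous": previous,
                "change": current - previous}


metric = ToyCommits(COMMITS)
print(metric.get_agg())
print(metric.get_ts())
print(metric.get_list())
print(metric.get_trends(date(2015, 1, 1), 7))
```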
  18. How to
    Import the needed libraries:

        # Database access
        from vizgrimoire.metrics.query_builder import SCMQuery
        # Filters to apply
        from vizgrimoire.metrics.metrics_filter import MetricFilters
        # Let's start playing with git activity metrics
        import vizgrimoire.metrics.scm_metrics as scm
  19. How to
    Database access:

        # Instantiate database access
        # Playing with OpenStack source code database (MySQL) at
        # http://activity.openstack.org/dash/.../source_code.mysql.7z
        # Database named as openstack_source_code_fosdem2015
        user = "root"
        password = ""
        source_code_db = "openstack_source_code_fosdem2015"
        identities_db = "openstack_source_code_fosdem2015"
        dbcon = SCMQuery(user, password, source_code_db, identities_db)
  20. How to
    Instantiate filters:

        # Instantiate some filters to play with
        period = MetricFilters.PERIOD_MONTH
        startdate = "'2014-01-01'"
        enddate = "'2015-01-01'"
        # basic filter
        filters = MetricFilters(period, startdate, enddate)
        # company and repo filter
        filters_r = MetricFilters(period, startdate, enddate)
        filters_r.add_filter(MetricFilters.COMPANY, "Red Hat")
        filters_r.add_filter(MetricFilters.REPOSITORY, "nova.git")
  21. How to
    Instantiate the metric you need:

        # Retrieving data for each filter.
        # Let's start with authors
        authors = scm.Authors(dbcon, filters)
        authors.get_agg()
        authors.get_ts()
        authors.get_list()
        authors.get_trends(filters.enddate, 7)
  22. Index: 1 Introduction, 2 Architecture, 3 Metrics and studies, 4 Some Code, 5 Future
  23. The future of Grimoire
    - Cool access to database, but slow
    - Currently playing with Pandas
    - Addition of modules for reporting
    - Work on documentation and testing
    - Lower the entry barrier
  24. This is not the end
    You can use or contribute to *Grimoire
    - Code and issues at: https://github.com/VizGrimoire/GrimoireLib
    - IRC in Freenode at #metrics-grimoire
    - Mailing list: https://lists.libresoft.es/listinfo/metrics-grimoire
  25. Data, data and data about your favourite community: The Grimoire Toolset
    Daniel Izquierdo-Cortazar
    [email protected]
    http://twitter.com/dizquierdo
    Bitergia
    Madrid, June 16th, 2015
    http://speakerdeck.com/bitergia