A dataset for pull-based development research

Georgios Gousios
May 31, 2014

MSR 2014 best data paper award presentation

Transcript

  1. A dataset for pull request research. Georgios Gousios and Andy Zaidman, @gousiosg
  2. None
  3. 900 projects

  4. 350,000 pull reqs

  5. 40 features! (patch size, code reviews, testing, social)

    lifetime_minutes mergetime_minutes num_commits src_churn test_churn files_added files_modified files_changed src_files doc_files other_files num_commit_comments num_issue_comments num_comments num_participants sloc team_size perc_external_contribs commits_on_files_touched test_lines_per_kloc test_cases_per_kloc asserts_per_kloc watchers prev_pullreqs requester_succ_rate followers
  6. suite of tools in R! (manipulation, selection, machine learning)

  7. None
  8. Wed, Jun 4! 17:30! Hall 3

    Software Engineering Research Group, Delft University of Technology, http://swerl.tudelft.nl/

    A dataset for pull-based development research. Georgios Gousios and Andy Zaidman, {g.gousios, a.e.zaidman}@tudelft.nl

    Projects: Diamond Markus fuel mongo-python-driver openproject pentaho-reporting hazelcast DIRAC rails-i18n gxa hibernate-ogm celery homebrew-php www.gittip.com flask serverspec pybossa head cyder cython maraschino Vanilla Synapse-Repository-Services growstuff oozie Socorro-Tests RxJava ome-documentation s-ramp Terasology mrjob firefox-flicks onepercentclub-site virt-test zanata-server paperclip heroapp-LetsHire cylc cryptography Resteasy conductor culture-hub openengsb-framework pivotal_workstation PyBitmessage topaz linguist jbosstools-openshift sunpy XPrivacy DSpace rspec-rails middleman-guides shinken nagios github-services ncs_navigator_core katello-api django-cms cucumber-jvm veewee jbosstools-integration-tests retrofit metrics gaffer nrv scikit-image Silverpeas-Core mev mne-python infinispan unknown-horizons iris ra1stats lorsource scalding forma-clj marketplace-tests neutrino bosh modeshape hector mcom-tests refinerycms-blog spokenvote Sick-Beard pyon geotrellis ztltest jboss-eap-quickstarts fail2ban gatein-portal bioformats simple_form imagefactory goldrush frontend sequel metasploit-framework CocoaPods platform logstash scrapy docs SciELO-Manager sidekiq chef wakame-vdc ggrc-core hibernate-orm omniauth ckan stacktach sunspot nova brooklyn teiid-designer sentry grape missionhub playframework okhttp pry 24pullrequests progit origin-server OpenSlides scala-ide commcare-hq sbt data-access spree website phoenix Catacomb-Snatch summingbird ralph activerecord-jdbc-adapter rootpy jekyll pelican matplotlib geotools dynmap qiime Spout cobbler compass pyramid tatami biopython hornetq gatling rails_admin raven-python reddit lims-core basho_docs liferay-portal asciidoctor SalesforceMobileSDK-Android the-blue-alliance vumi-go tire META-SHARE GravityBox emesene web2py developer.github.com rubocop bitHopper dagger configuration ninja-ide ruby fog vumi rudder chiliproject netty bedrock zf2-documentation play active_merchant ka-lite HiggsAnalysis-HiggsToTauTau ursula youtube-dl buildbot nltk pyes Addon-Tests tools grails-core psychopy erpnext karma-exchange alaveteli totalfinder-i18n WMCore coworfing CouchPotatoServer rose ecms webpay sympy nexus-oss Osmand active_admin Equivalent-Exchange-3 iSENSE-Hardware zamboni spray gitlabhq errbit usergrid-stack Printrun rspec-core celluloid homebrew-science candlepin core gunicorn hy travis-core socorro BPSF railo geocoder k-9 ADL_LRS otter OTM2 katello openmicroscopy oi-userland zipkin geoserver foreman django-rest-framework django pandas mezzanine whitehall mifosx pyzmq cookbooks mongoengine pulp_rpm werkzeug middleman pentaho-commons-gwt-modules nikola right_link dropwizard rosdistro ycp-killer rspec-expectations salt miro rstat.us components eden Bukkit basex engine maven-android-plugin django-social-auth appscale kotlin hibernate-validator padrino-framework rSENSE refinerycms amu_automata_2011 chillingeffects SynapseWebClient spree_i18n riak-java-client wonder socorro-crashstats homebrew vagrant loomio scalaz sagecell neo4j wildfly jagger formtastic aws-sdk-ruby ipython sufia cas storm exercism.io heroku android-sync kuma carrierwave c2cgeoportal bundler pylearn2 meniscus rspec-mocks hibernate-search jboss-as-quickstart junit Theano jruby oq-engine mcMMO Essentials repose smart-answers androidannotations groovy-core akka elasticsearch ssGWT-lib homebrew-cask sveditor mongo-ruby-driver c-geo-opensource EMS ENdoSnipe mopidy rails adhocracy chef-cookbooks salt-cloud formhub SimpleCV bitmask_client requests ActionBarSherlock jbosstools-jst dcache OpenGenesis subscription-manager sensu-community-plugins socode nipype linkit appscale-tools addon-sdk sumo-tests tornado capybara Superdesk pentaho-platform resque sql-layer open-build-service draper spring-batch otwarchive berkshelf shoulda-matchers rubygems unisubs edx-platform liquibase fpm play1 capybara-webkit www.ruby-lang.org pip TFCraft floodlight qtile lims-api sched.do nexus rubinius octopress fuse Minecraft-Overviewer reddeer opennaas mail django-timepiece picketlink leap_client sequencescape xwiki-platform spring-integration MinecraftForge cgeo OpenMDAO-Framework boto django-tastypie ESP-Website devise orbisgis sensu Catroid MyJobs kitsune django-extensions active_model_serializers statsmodels calcentral rhc jbosstools-base Silverpeas-Components autotest pyload autopsy addmeto.cc coi-services spring-framework remo quickstarts elephant-bird Razor brakeman atlas mule andlytics diaspora jclouds drools druid dipy mongoid CONNECT django-oscar bigbluebutton BuildCraft python-guide spark scala narayana twitter-bootstrap-rails kivy cucumber molgenis diffa Play20 CraftBukkit pulp regulations-site OWD_TEST_TOOLKIT droidplanner openstreetmap-website

    900 projects, 350,000 pull reqs, 40 features

    lifetime_minutes mergetime_minutes num_commits src_churn test_churn files_added files_modified files_changed src_files doc_files other_files num_commit_comments num_issue_comments num_comments num_participants sloc team_size perc_external_contribs commits_on_files_touched test_lines_per_kloc test_cases_per_kloc asserts_per_kloc watchers prev_pullreqs requester_succ_rate followers

    Churn (src, test), Participants, Project popularity, Commits, Files (src, doc), Forward links, Pull request submitter (followers, track record), Code reviews, Merges (also those outside GitHub)

    https://github.com/gousiosg/pullreqs ML suite in R @gousiosg
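
To make the feature names on the slides more concrete, here is a minimal sketch of how per-pull-request rows with those columns could be summarized. It is written in Python with the standard library only and is not part of the R tool suite mentioned in the talk. The CSV layout, the `summarize` helper, and the sample rows are all hypothetical, and it assumes that an empty `mergetime_minutes` marks a pull request that was never merged; the slides do not confirm any of this.

```python
import csv
import io
from statistics import median

# Synthetic rows for illustration only; NOT taken from the real dataset.
# Column names follow the slides. An empty mergetime_minutes is ASSUMED
# to mean the pull request was never merged.
SAMPLE_CSV = """\
lifetime_minutes,mergetime_minutes,num_commits,src_churn,test_churn
30,30,1,12,0
1440,,3,250,40
600,590,2,80,25
10080,,7,900,0
"""


def summarize(csv_text):
    """Return basic summary statistics over pull request rows (hypothetical helper)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Pull requests with a non-empty merge time are counted as merged.
    merged = [r for r in rows if r["mergetime_minutes"] != ""]
    # Pull requests that changed at least one line of test code.
    with_tests = [r for r in rows if int(r["test_churn"]) > 0]
    return {
        "pull_requests": len(rows),
        "merged_fraction": len(merged) / len(rows),
        "median_lifetime_minutes": median(int(r["lifetime_minutes"]) for r in rows),
        "fraction_touching_tests": len(with_tests) / len(rows),
    }


if __name__ == "__main__":
    for name, value in summarize(SAMPLE_CSV).items():
        print(f"{name}: {value}")
```

The same pattern extends to any of the other listed columns, such as `num_comments` or `team_size`, once the actual file layout from https://github.com/gousiosg/pullreqs has been checked.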