A dataset for pull based development research
Georgios Gousios
May 31, 2014
MSR 2014 best data paper award presentation
Transcript
A dataset for pull request research
Georgios Gousios and Andy Zaidman
@gousiosg
900 projects
350,000 pull reqs
40 features (patch size, code reviews, testing, social)
lifetime_minutes mergetime_minutes num_commits src_churn test_churn files_added files_modified files_changed src_files doc_files other_files num_commit_comments num_issue_comments num_comments num_participants sloc team_size perc_external_contribs commits_on_files_touched test_lines_per_kloc test_cases_per_kloc asserts_per_kloc watchers prev_pullreqs requester_succ_rate followers
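As a rough illustration of how the per-pull-request features listed above might be used, here is a minimal Python sketch. The column names are taken from the slide; everything else is an assumption made for the example: the CSV layout, the invented sample values, the helper names `load_pull_requests`, `is_merged`, and `summarize`, and the convention that an empty `mergetime_minutes` marks an unmerged pull request. It is not the dataset's actual tooling, which the slides say is written in R.

```python
import csv
import io
from statistics import median

# Hypothetical sample: column names come from the slides, values are invented.
SAMPLE_CSV = """\
lifetime_minutes,mergetime_minutes,num_commits,src_churn,test_churn,requester_succ_rate
120,118,1,40,10,0.8
4320,,3,300,0,0.0
45,44,2,12,25,0.5
"""


def load_pull_requests(text):
    """Parse CSV text into a list of dicts, one per pull request."""
    return list(csv.DictReader(io.StringIO(text)))


def is_merged(row):
    # Assumption: an empty mergetime_minutes means the pull request was not merged.
    return row["mergetime_minutes"] != ""


def summarize(rows):
    """Compute a few simple aggregates over the feature columns."""
    merged = [r for r in rows if is_merged(r)]
    return {
        "pull_requests": len(rows),
        "merged": len(merged),
        "merge_ratio": len(merged) / len(rows),
        "median_lifetime_minutes": median(int(r["lifetime_minutes"]) for r in rows),
        "total_churn": sum(int(r["src_churn"]) + int(r["test_churn"]) for r in rows),
    }


summary = summarize(load_pull_requests(SAMPLE_CSV))
print(summary)
```

The same pattern extends to any of the other feature columns; a real analysis would read the published data files rather than an inline string.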
Suite of tools in R (manipulation, selection, machine learning)
Wed, Jun 4, 17:30, Hall 3
Software Engineering Research Group, http://swerl.tudelft.nl/
Delft University of Technology
A dataset for pull-based development research
Georgios Gousios and Andy Zaidman
{g.gousios, a.e.zaidman}@tudelft.nl
Projects: Diamond Markus fuel mongo-python-driver openproject pentaho-reporting hazelcast DIRAC rails-i18n gxa hibernate-ogm celery homebrew-php www.gittip.com flask serverspec pybossa head cyder cython maraschino Vanilla Synapse-Repository-Services growstuff oozie Socorro-Tests RxJava ome-documentation s-ramp Terasology mrjob firefox-flicks onepercentclub-site virt-test zanata-server paperclip heroapp-LetsHire cylc cryptography Resteasy conductor culture-hub openengsb-framework pivotal_workstation PyBitmessage topaz linguist jbosstools-openshift sunpy XPrivacy DSpace rspec-rails middleman-guides shinken nagios github-services ncs_navigator_core katello-api django-cms cucumber-jvm veewee jbosstools-integration-tests retrofit metrics gaffer nrv scikit-image Silverpeas-Core mev mne-python infinispan unknown-horizons iris ra1stats lorsource scalding forma-clj marketplace-tests neutrino bosh modeshape hector mcom-tests refinerycms-blog spokenvote Sick-Beard pyon geotrellis ztltest jboss-eap-quickstarts fail2ban gatein-portal bioformats simple_form imagefactory goldrush frontend sequel metasploit-framework CocoaPods platform logstash scrapy docs SciELO-Manager sidekiq chef wakame-vdc ggrc-core hibernate-orm omniauth ckan stacktach sunspot nova brooklyn teiid-designer sentry grape missionhub playframework okhttp pry 24pullrequests progit origin-server OpenSlides scala-ide commcare-hq sbt data-access spree website phoenix Catacomb-Snatch summingbird ralph activerecord-jdbc-adapter rootpy jekyll pelican matplotlib geotools dynmap qiime Spout cobbler compass pyramid tatami biopython hornetq gatling rails_admin raven-python reddit lims-core basho_docs liferay-portal asciidoctor SalesforceMobileSDK-Android the-blue-alliance vumi-go tire META-SHARE GravityBox emesene web2py developer.github.com rubocop bitHopper dagger configuration ninja-ide ruby fog vumi rudder chiliproject netty bedrock zf2-documentation play active_merchant ka-lite HiggsAnalysis-HiggsToTauTau ursula youtube-dl buildbot nltk pyes Addon-Tests tools grails-core psychopy erpnext karma-exchange alaveteli totalfinder-i18n WMCore coworfing CouchPotatoServer rose ecms webpay sympy nexus-oss Osmand active_admin Equivalent-Exchange-3 iSENSE-Hardware zamboni spray gitlabhq errbit usergrid-stack Printrun rspec-core celluloid homebrew-science candlepin core gunicorn hy travis-core socorro BPSF railo geocoder k-9 ADL_LRS otter OTM2 katello openmicroscopy oi-userland zipkin geoserver foreman django-rest-framework django pandas mezzanine whitehall mifosx pyzmq cookbooks mongoengine pulp_rpm werkzeug middleman pentaho-commons-gwt-modules nikola right_link dropwizard rosdistro ycp-killer rspec-expectations salt miro rstat.us components eden Bukkit basex engine maven-android-plugin django-social-auth appscale kotlin hibernate-validator padrino-framework rSENSE refinerycms amu_automata_2011 chillingeffects SynapseWebClient spree_i18n riak-java-client wonder socorro-crashstats homebrew vagrant loomio scalaz sagecell neo4j wildfly jagger formtastic aws-sdk-ruby ipython sufia cas storm exercism.io heroku android-sync kuma carrierwave c2cgeoportal bundler pylearn2 meniscus rspec-mocks hibernate-search jboss-as-quickstart junit Theano jruby oq-engine mcMMO Essentials repose smart-answers androidannotations groovy-core akka elasticsearch ssGWT-lib homebrew-cask sveditor mongo-ruby-driver c-geo-opensource EMS ENdoSnipe mopidy rails adhocracy chef-cookbooks salt-cloud formhub SimpleCV bitmask_client requests ActionBarSherlock jbosstools-jst dcache OpenGenesis subscription-manager sensu-community-plugins socode nipype linkit appscale-tools addon-sdk sumo-tests tornado capybara Superdesk pentaho-platform resque sql-layer open-build-service draper spring-batch otwarchive berkshelf shoulda-matchers rubygems unisubs edx-platform liquibase fpm play1 capybara-webkit www.ruby-lang.org pip TFCraft floodlight qtile lims-api sched.do nexus rubinius octopress fuse Minecraft-Overviewer reddeer opennaas mail django-timepiece picketlink leap_client sequencescape xwiki-platform spring-integration MinecraftForge cgeo OpenMDAO-Framework boto django-tastypie ESP-Website devise orbisgis sensu Catroid MyJobs kitsune django-extensions active_model_serializers statsmodels calcentral rhc jbosstools-base Silverpeas-Components autotest pyload autopsy addmeto.cc coi-services spring-framework remo quickstarts elephant-bird Razor brakeman atlas mule andlytics diaspora jclouds drools druid dipy mongoid CONNECT django-oscar bigbluebutton BuildCraft python-guide spark scala narayana twitter-bootstrap-rails kivy cucumber molgenis diffa Play20 CraftBukkit pulp regulations-site OWD_TEST_TOOLKIT droidplanner openstreetmap-website
900 projects
350,000 pull reqs
40 features
Churn: src, test
Participants
Project popularity
Commits
Files: src, doc
Forward links
Pull request submitter: followers, track record
Code reviews
Merges (also those outside GitHub)
https://github.com/gousiosg/pullreqs
ML suite in R
@gousiosg