A dataset for pull-based development research
Georgios Gousios
May 31, 2014
MSR 2014 best data paper award presentation
Transcript
A dataset for pull request research
Georgios Gousios and Andy Zaidman
@gousiosg
900 projects
350,000 pull reqs
40 features (patch size, code reviews, testing, social)
lifetime_minutes mergetime_minutes num_commits src_churn test_churn files_added files_modified files_changed src_files doc_files other_files num_commit_comments num_issue_comments num_comments num_participants sloc team_size perc_external_contribs commits_on_files_touched test_lines_per_kloc test_cases_per_kloc asserts_per_kloc watchers prev_pullreqs requester_succ_rate followers
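To make a few of the feature names above concrete, here is a minimal Python sketch of how values such as lifetime_minutes, mergetime_minutes, requester_succ_rate and the *_per_kloc ratios could be derived. Everything in it is an assumption made for illustration: the `PullRequest` record and its field names are hypothetical, the definitions are one plausible reading of the feature names, and it is not the R tooling that ships with the dataset.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class PullRequest:
    """Hypothetical record; the fields are illustrative, not the dataset's schema."""
    requester: str
    opened_at: datetime
    closed_at: datetime
    merged_at: Optional[datetime] = None  # None when the pull request was not merged


def minutes_between(start: datetime, end: datetime) -> int:
    """Whole minutes elapsed between two timestamps (rounded down)."""
    return int((end - start).total_seconds() // 60)


def lifetime_minutes(pr: PullRequest) -> int:
    """Assumed meaning: minutes from opening to closing."""
    return minutes_between(pr.opened_at, pr.closed_at)


def mergetime_minutes(pr: PullRequest) -> Optional[int]:
    """Assumed meaning: minutes from opening to merging, None if never merged."""
    if pr.merged_at is None:
        return None
    return minutes_between(pr.opened_at, pr.merged_at)


def requester_succ_rate(pr: PullRequest, history: List[PullRequest]) -> float:
    """Assumed meaning: fraction of the requester's earlier pull requests that were merged."""
    earlier = [
        p for p in history
        if p.requester == pr.requester and p.opened_at < pr.opened_at
    ]
    if not earlier:
        return 0.0  # no track record yet
    merged = sum(1 for p in earlier if p.merged_at is not None)
    return merged / len(earlier)


def per_kloc(count: int, sloc: int) -> float:
    """Normalise a count (test lines, test cases, asserts) per 1,000 source lines."""
    return 1000.0 * count / sloc if sloc else 0.0
```

Calling `requester_succ_rate(pr, history)` only looks at pull requests opened before `pr`, so the value reflects the submitter's track record at submission time rather than in hindsight. That choice is part of the sketch, not something the slides state.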
Suite of tools in R (manipulation, selection, machine learning)
Wed, Jun 4, 17:30, Hall 3
Software Engineering Research Group, http://swerl.tudelft.nl/
Delft University of Technology
A dataset for pull-based development research
Georgios Gousios and Andy Zaidman
{g.gousios, a.e.zaidman}@tudelft.nl
Diamond Markus fuel mongo-python-driver openproject pentaho-reporting hazelcast DIRAC rails-i18n gxa hibernate-ogm celery homebrew-php www.gittip.com flask serverspec pybossa head cyder cython maraschino Vanilla Synapse-Repository-Services growstuff oozie Socorro-Tests RxJava ome-documentation s-ramp Terasology mrjob firefox-flicks onepercentclub-site virt-test zanata-server paperclip heroapp-LetsHire cylc cryptography Resteasy conductor culture-hub openengsb-framework pivotal_workstation PyBitmessage topaz linguist jbosstools-openshift sunpy XPrivacy DSpace rspec-rails middleman-guides shinken nagios github-services ncs_navigator_core katello-api django-cms cucumber-jvm veewee jbosstools-integration-tests retrofit metrics gaffer nrv scikit-image Silverpeas-Core mev mne-python infinispan unknown-horizons iris ra1stats lorsource scalding forma-clj marketplace-tests neutrino bosh modeshape hector mcom-tests refinerycms-blog spokenvote Sick-Beard pyon geotrellis ztltest jboss-eap-quickstarts fail2ban gatein-portal bioformats simple_form imagefactory goldrush frontend sequel metasploit-framework CocoaPods platform logstash scrapy docs SciELO-Manager sidekiq chef wakame-vdc ggrc-core hibernate-orm omniauth ckan stacktach sunspot nova brooklyn teiid-designer sentry grape missionhub playframework okhttp pry 24pullrequests progit origin-server OpenSlides scala-ide commcare-hq sbt data-access spree website phoenix Catacomb-Snatch summingbird ralph activerecord-jdbc-adapter rootpy jekyll pelican matplotlib geotools dynmap qiime Spout cobbler compass pyramid tatami biopython hornetq gatling rails_admin raven-python reddit lims-core basho_docs liferay-portal asciidoctor SalesforceMobileSDK-Android the-blue-alliance vumi-go tire META-SHARE GravityBox emesene web2py developer.github.com rubocop bitHopper
dagger configuration ninja-ide ruby fog vumi rudder chiliproject netty bedrock zf2-documentation play active_merchant ka-lite HiggsAnalysis-HiggsToTauTau ursula youtube-dl buildbot nltk pyes Addon-Tests tools grails-core psychopy erpnext karma-exchange alaveteli totalfinder-i18n WMCore coworfing CouchPotatoServer rose ecms webpay sympy nexus-oss Osmand active_admin Equivalent-Exchange-3 iSENSE-Hardware zamboni spray gitlabhq errbit usergrid-stack Printrun rspec-core celluloid homebrew-science candlepin core gunicorn hy travis-core socorro BPSF railo geocoder k-9 ADL_LRS otter OTM2 katello openmicroscopy oi-userland zipkin geoserver foreman django-rest-framework django pandas mezzanine whitehall mifosx pyzmq cookbooks mongoengine pulp_rpm werkzeug middleman pentaho-commons-gwt-modules nikola right_link dropwizard rosdistro ycp-killer rspec-expectations salt miro rstat.us components eden Bukkit basex engine maven-android-plugin django-social-auth appscale kotlin hibernate-validator padrino-framework rSENSE refinerycms amu_automata_2011 chillingeffects SynapseWebClient spree_i18n riak-java-client wonder socorro-crashstats homebrew vagrant loomio scalaz sagecell neo4j wildfly jagger formtastic aws-sdk-ruby ipython sufia cas storm exercism.io heroku android-sync kuma carrierwave c2cgeoportal bundler pylearn2 meniscus rspec-mocks hibernate-search jboss-as-quickstart junit Theano jruby oq-engine mcMMO Essentials repose smart-answers androidannotations groovy-core akka elasticsearch ssGWT-lib homebrew-cask sveditor mongo-ruby-driver c-geo-opensource EMS ENdoSnipe mopidy rails adhocracy chef-cookbooks salt-cloud formhub SimpleCV bitmask_client requests ActionBarSherlock jbosstools-jst dcache OpenGenesis subscription-manager sensu-community-plugins socode nipype linkit appscale-tools addon-sdk sumo-tests tornado capybara Superdesk pentaho-platform resque sql-layer open-build-service draper spring-batch otwarchive berkshelf shoulda-matchers rubygems unisubs edx-platform
liquibase fpm play1 capybara-webkit www.ruby-lang.org pip TFCraft floodlight qtile lims-api sched.do nexus rubinius octopress fuse Minecraft-Overviewer reddeer opennaas mail django-timepiece picketlink leap_client sequencescape xwiki-platform spring-integration MinecraftForge cgeo OpenMDAO-Framework boto django-tastypie ESP-Website devise orbisgis sensu Catroid MyJobs kitsune django-extensions active_model_serializers statsmodels calcentral rhc jbosstools-base Silverpeas-Components autotest pyload autopsy addmeto.cc coi-services spring-framework remo quickstarts elephant-bird Razor brakeman atlas mule andlytics diaspora jclouds drools druid dipy mongoid CONNECT django-oscar bigbluebutton BuildCraft python-guide spark scala narayana twitter-bootstrap-rails kivy cucumber molgenis diffa Play20 CraftBukkit pulp regulations-site OWD_TEST_TOOLKIT droidplanner openstreetmap-website
900 projects
350,000 pull reqs
40 features
Churn: src, test
Participants
Project popularity
Commits
Files: src, doc
Forward links
Pull request submitter: followers, track record
Code reviews
Merges (also those outside GitHub)
https://github.com/gousiosg/pullreqs
ML suite in R
@gousiosg