A dataset for pull based development research
Georgios Gousios
May 31, 2014
MSR 2014 best data paper award presentation
Transcript
A dataset for pull request research
Georgios Gousios and Andy Zaidman
@gousiosg
900 projects
350,000 pull reqs
40 features (patch size, code reviews, testing, social), including:
lifetime_minutes, mergetime_minutes, num_commits, src_churn, test_churn, files_added, files_modified, files_changed, src_files, doc_files, other_files, num_commit_comments, num_issue_comments, num_comments, num_participants, sloc, team_size, perc_external_contribs, commits_on_files_touched, test_lines_per_kloc, test_cases_per_kloc, asserts_per_kloc, watchers, prev_pullreqs, requester_succ_rate, followers
Suite of tools in R (manipulation, selection, machine learning)
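The tools on this slide are written in R and are not reproduced here. As a rough illustration of what "manipulation" and "selection" over these features could look like, here is a minimal Python sketch. It assumes the dataset is available as CSV with one row per pull request, that the columns carry the feature names shown on the slides, and that an empty `mergetime_minutes` means the pull request was not merged. The `project_name` column, the sample rows, and the function names are hypothetical.

```python
import csv
import io
from statistics import median

# Hypothetical sample: feature names are taken from the slides,
# the project_name column and all values are made up for illustration.
SAMPLE_CSV = """project_name,num_commits,src_churn,test_churn,lifetime_minutes,mergetime_minutes
example/a,1,10,0,30,30
example/a,3,200,50,1440,
example/b,2,40,20,90,90
example/b,5,500,0,10080,10080
"""


def load_pull_requests(text):
    """Parse CSV text into dicts, turning numeric fields into int (None if empty)."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {"project_name": raw["project_name"]}
        for key, value in raw.items():
            if key != "project_name":
                row[key] = int(value) if value else None
        rows.append(row)
    return rows


def merge_summary(rows):
    """Manipulation: summarise how many pull requests were merged and how fast."""
    # Assumption: a missing mergetime_minutes marks an unmerged pull request.
    merge_times = [r["mergetime_minutes"] for r in rows
                   if r["mergetime_minutes"] is not None]
    return {
        "pull_requests": len(rows),
        "merged_fraction": len(merge_times) / len(rows) if rows else None,
        "median_mergetime_minutes": median(merge_times) if merge_times else None,
    }


def with_tests(rows):
    """Selection: keep only pull requests that touch test code."""
    return [r for r in rows if (r["test_churn"] or 0) > 0]


if __name__ == "__main__":
    pull_requests = load_pull_requests(SAMPLE_CSV)
    print(merge_summary(pull_requests))
    print(len(with_tests(pull_requests)), "pull requests include test changes")
```

To try this on real data, replace `SAMPLE_CSV` with the contents of a file exported from the dataset, after checking its actual column names and how it encodes unmerged pull requests.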
Wed, Jun 4, 17:30, Hall 3

Software Engineering Research Group, http://swerl.tudelft.nl/
Delft University of Technology
A dataset for pull-based development research
Georgios Gousios and Andy Zaidman
{g.gousios, a.e.zaidman}@tudelft.nl

[Figure: word cloud of the names of the projects in the dataset, for example rails, django, homebrew, scala, spring-framework, elasticsearch, jekyll, pandas]

900 projects
350,000 pull reqs
40 features:
Churn: src, test
Participants
Project popularity
Commits
Files: src, doc
Forward links
Pull request submitter: followers, track record
Code reviews
Merges (also those outside Github)

https://github.com/gousiosg/pullreqs
ML suite in R
@gousiosg