
Of Software Changes; Large and Small


Keynote Talk at the 30th International Conference on Software Maintenance and Evolution (ICSME), Victoria, BC, Canada, 3 October, 2014.



Arie van Deursen

October 03, 2014



  1. 1  

  2. Acknowledgements
     • ICSME Organizers
     • Co-researchers: Georgios Gousios, Hennie Huijgens, Steven Raemaekers, Martin Pinzger, Peggy Storey, Joost Visser, Andy Zaidman
     • SERG @ TU Delft
     • CHISEL @ UVic
  3. Maintenance and Evolution
     Evolution: curiosity-driven research. Helps to understand the world:
     • How do people manage to change their software?
     Nurtured by challenges faced during software maintenance.
     Maintenance: engineering research. Helps to change the world:
     • How can we support people changing their software?
     Nurtured by software evolution research results.
     https://flic.kr/p/edchNL, steveolmstead
  4. Software Changes: Large and Small
     • Edit
     • Commit
     • Bug fix
     • Pull request
     • Feature
     • Library release
     • Project portfolio
     flickr.com/photos/wackybadger/
  5. Number of Individual Contributors in Most Collaborative GitHub Projects
     Full data on http://www.gousios.gr/blog/The-triumph-of-online-collaboration/
  6. The Pull Request
     Offer a coherent set of changes to a project owner:
     "I take responsibility for this change; are you willing to integrate it?"
  7. Pull Request = Means of Communication

  8. GHTorrent: GitHub Data in a Relational Database
     • Scalable, queryable, offline mirror of data from the GitHub REST API
     • Data since 2012
     • 5.9 TB JSON in MongoDB
     • 600M rows in MySQL
     • 1 GB per hour collected
     Georgios Gousios: The GHTorent dataset and tool suite. MSR 2013: 233-236
     Georgios Gousios, Diomidis Spinellis: GHTorrent: Github's data from a firehose. MSR 2012: 12-21
  9. #Pull requests per project
     • Median = 2
     • 95th percentile = 21
     • Rails / Homebrew: > 10,000 pull requests
     [Histogram: number of projects by number of pull requests (log scale)]
     Georgios Gousios, Martin Pinzger, Arie van Deursen.
     An exploratory study of the pull-based software development model. ICSE 2014: 345-355
  10. Pull Request Sample
      Obtain a better understanding of pull request-intensive projects.
      Sample projects with:
      • > 200 pull requests
      • a test suite
      • Ruby, Python, Java, or Scala
      • at least one commit from a non-core member
      Resulting PR sample: 300 projects; 166,000 pull requests
  11. Most Pull Requests are Accepted

  12. Acceptance is Generally Fast

  13. Key Pull Request Characteristics
      • Decision to merge: depends on how hot the area of code is
      • Time to merge: depends on the test density of the project
      • Pull request rejection: caused by (lack of) task articulation
      Georgios Gousios, Martin Pinzger, Arie van Deursen.
      An exploratory study of the pull-based software development model. ICSE 2014: 345-355
  14. Reaching Out: "Pull Request Performance Reports"
      Lifelines for the 10% slowest pull requests.
      External (red) versus core (blue) contributors.
      http://ghtorrent.org/pullreq-perf/
  15. Reaching Out: Integrators
      • Identified over 3000 projects that received one pull request per week in 2013
      • Generated performance reports
      • Sent personalized email to integrators…
      • …asked a carefully designed set of questions…
      • …and received 750 responses in 2 weeks
      Georgios Gousios, Andy Zaidman, Margaret-Anne Storey, and Arie van Deursen. Work Practices and Challenges in Pull-Based Development: The Integrator's Perspective. Report TUD-SERG-2014-013
  16. Question 26.
      What is the biggest challenge (if any) you face while managing contributions through pull requests?
      (410 answers)
  17. R509: Huge, unwieldy, complected bundles of 'hey, I added a LOT of features and fixes ALL AT ONCE!'
      R255: Telling people that something is wrong without hurting their feelings.
      R449: Dealing with loud and trigger-happy developers.
  18. Biggest Challenges when Working with Pull Requests
      [Bar chart: percentage of responses per challenge, ranked Top / Second / Third. Categories include time, maintaining quality, reviewing, explaining rejection, volume, testing, responsiveness, git knowledge, timezones, politeness, and motivating contributors.]
  19. Question 22.
      What heuristics do you use for assessing the quality of pull requests?
  20. R330: Is the diff minimal?
      R72: Performance related changes require test data or a test case.
      R405: Who submitted the PR and what history did we have with him/her?
  21. Factors developers examine when evaluating quality of contributions
      [Bar chart: percentage of responses per factor, ranked Top / Second / Third. Factors include style conformance, test coverage, code quality, code review, test result, documentation, understandability, author reputation, project conventions, and size.]
  22. Question 20.
      Imagine you frequently have more than 50 pull requests in your inbox.
      How do you triage them?
  23. R446: Bug fixes first, then new features. Only if all bug fix pull requests are treated.
      R490: The lower the number of lines/files changes, the more likely I am to process it first.
      R82: If I know the person, they get high priority. Sorry, strangers.
  24. Pull Request Prioritization?
      1. Criticality (bug fixes)
      2. Urgency (new features)
      3. Size
      20% of integrators do not prioritize
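The triage order integrators report (criticality first, then size) can be sketched as a comparator. This is only an illustration of the heuristic, not a real tool: the `PullRequest` class and its fields are hypothetical, and ranking features by urgency is left out because the survey gives no concrete urgency measure.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of the reported triage heuristic: bug fixes before new
// features, and smaller changes before larger ones.
// The PullRequest class is hypothetical, purely for illustration.
public class PullRequestTriage {

    public static final class PullRequest {
        final String title;
        final boolean isBugFix;
        final int linesChanged;

        public PullRequest(String title, boolean isBugFix, int linesChanged) {
            this.title = title;
            this.isBugFix = isBugFix;
            this.linesChanged = linesChanged;
        }
    }

    // Bug fixes first (false sorts before true), then by size ascending.
    public static final Comparator<PullRequest> TRIAGE_ORDER =
        Comparator.comparing((PullRequest pr) -> !pr.isBugFix)
                  .thenComparingInt(pr -> pr.linesChanged);

    public static List<PullRequest> triage(List<PullRequest> inbox) {
        List<PullRequest> sorted = new ArrayList<>(inbox);
        sorted.sort(TRIAGE_ORDER);
        return sorted;
    }
}
```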
  25. 25  

  26. "Library Upgradeability"

  27. The Maven Dataset
      • 150,000 released jar files; 100,000 with source
      • As in Maven Central on July 30, 2011
      • With resolved usage data
      Steven Raemaekers, Arie van Deursen, Joost Visser.
      The maven repository dataset of metrics, changes, and dependencies. MSR 2013: 221-224
  28. Breaking Changes in APIs

      // Version 1 of Lib1
      public class Lib1 {
          public void foo() {...}
          public int doStuff() {...}
      }

      // method2 uses foo & doStuff
      public class System1 {
          public void method2() {
              Lib1 c1 = new Lib1();
              c1.foo();
              int x = c1.doStuff();
              anUnrelatedChange();
          }
      }

      // Version 2 of Lib1
      public class Lib1 {
          public void foo(int bar) {...}    // parameter added
          public String doStuff() {...}     // return type changed
      }

      // method2 uses foo & doStuff (now broken)
      public class System1 {
          public void method2() {
              Lib1 c1 = new Lib1();
              c1.foo();               // no longer resolves: foo now takes an int
              int x = c1.doStuff();   // doStuff no longer returns an int
          }
      }
  29. Binary Incompatibilities
      Focus of study:
      • Removals: interface, class, method, or field
      • Changes: number of parameters, or change in field / parameter / return type
      "Pre-existing client binaries must link and run with new releases of the component without recompiling." (the CLIRR tool)
      flickr.com/photos/dullhunk/
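A toy version of the removal check can be sketched by diffing the public signatures exported by two library versions. This only illustrates the idea behind the study's categories; it is not how the CLIRR tool is actually implemented, and the signature strings are hypothetical:

```java
import java.util.Set;
import java.util.TreeSet;

// Toy breaking-change detector: any public signature present in the
// old release but missing from the new one breaks pre-existing client
// binaries, which would fail to link at run time without recompiling.
public class RemovalCheck {

    public static Set<String> removedSignatures(Set<String> oldApi, Set<String> newApi) {
        Set<String> removed = new TreeSet<>(oldApi);
        removed.removeAll(newApi);
        return removed;
    }
}
```

Run on the Lib1 example above, both the parameter addition to `foo` and the return-type change of `doStuff` show up as removals of the old signatures.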
  30. Breaking Changes in Maven's 90,000 Updates
      Steven Raemaekers, Arie van Deursen, Joost Visser.
      Semantic Versioning versus Breaking Changes: A Study of the Maven Repository. SCAM 2014.
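The rule the study tests against is simple: under semantic versioning, a release may only introduce breaking changes when the major version number is incremented. A minimal sketch of that rule, assuming plain "major.minor.patch" version strings with no pre-release tags:

```java
// Minimal semantic-versioning rule (semver.org): breaking changes are
// only permitted when the major version number increases.
// Assumes plain "major.minor.patch" strings, no pre-release tags.
public class SemVerRule {

    public static int major(String version) {
        return Integer.parseInt(version.split("\\.")[0]);
    }

    // True if upgrading from 'from' to 'to' may contain breaking changes.
    public static boolean mayBreak(String from, String to) {
        return major(to) > major(from);
    }
}
```

By this rule, 1.2.3 to 2.0.0 may break clients, while 1.2.3 to 1.3.0 must not; the study asks how often Maven releases actually honor this.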
  31. Breaking Changes over Time

  32. [Image: Wikipedia]

  33. "The CHAOS Report"
      [Chart: Original Estimate vs. Actual Cost]
      http://www.projectsmart.co.uk/docs/chaos-report.pdf
  34. 34  

  35. Cost vs Size for 350 Projects
      Hennie Huijgens, Rini van Solingen, Arie van Deursen:
      How to build a good practice software project portfolio? ICSE SEIP, 2014
  36. Cost/Duration Matrix

  37. Data-driven Software Pricing
      • Repository (n=22): historic projects of C, not built by S
      • Baseline (n=16): finalized projects of C, built by S, in scope
      • Pilot (n=10): strictly size-based pricing, 6-month period
      • Forecast (n=29): price agreements based on functional size
      Hennie Huijgens, Georgios Gousios, Arie van Deursen. Pricing via Functional Size: A Case Study of 77 Outsourced Projects. Tech report TUD-SERG-2014-012.
  38. P2: The solution is looked into more detail in order to get the right Function Points […] This helps in early detection of issues and resolution.
      P5: Function point analysis is not applicable to projects where more testing efforts are required for less development changes.
      P3: Too many small projects are negative for Company C due to economy-of-scale effects.
  39. 39  

  40. Cognitive Bias
      "System 1": fast, instinctive, emotional.
      "System 2": slow, deliberative, logical.
      System 2 requires effort.
      It's happy to let System 1 do the work.
  41. The WYSIATI Cognitive Bias

  42. Raising the Change Level
      • Git: assembly language of change management
      • Give change first-class treatment at all levels
      • Parallelize / de-linearize change where possible
      flickr.com/photos/wackybadger/
  43. Data Preparation and Sharing
      • A usable data set requires substantial engineering
      • Rethink schemas to enable data integration
      • Open SE data?
      • Corporate data? Meta-data!
  44. "The major problems of our work are not so much technological as sociological in nature."
      (Tom DeMarco and Timothy Lister, Peopleware, 1987)
  45. Involve the People
      • (Big) data is a means, not an objective
      • True understanding comes from talking to people busy getting their hands dirty
      • Qualitative research is hard work
      • Obligation to share our results
  46. Accountable Curiosity
      • Curiosity drives our research
      • Impact on the cost and quality of software maintenance activities is the ultimate reward
      • Are we trying hard enough?
  47. Concluding Questions:
      • Can we give change first-class treatment at all levels?
      • Are we willing to prepare and share data?
      • How are we involving the people?
      • Do we address evolution AND maintenance?