What if? Supporting decisions with software dynamics simulations

It's awkward to perform science experiments on developers, so let's simulate them instead!

In 1968 Melvin Conway pointed out a seemingly inevitable symmetry between organisations and the software systems they construct. Organisations today are more fluid than they were 40 years ago, with short developer tenure and frequent migration of individuals between projects and employers. In this slot we’ll examine, and perhaps collect, data on the tenure and productivity of programmers, and use it to gain insight into codebases by simulating their growth with simple stochastic models. From such models we can make important predictions about the maintainability and long-term viability of software systems, with implications for how we approach software design and documentation, and how we assemble teams.

Robert Smallshire

October 15, 2015

Transcript

  1. What if? Supporting decisions with software dynamics simulations

    Robert Smallshire (@robsmallshire), @sixty_north
  2. 2

  3. Experimental science: randomised controlled trials

    ‣ Developers don’t like to be watched
    ‣ Eliminating extraneous factors
    ‣ Toy problems aren’t realistic
    ‣ No two projects are the same
    ‣ Can’t do double-blind
    ‣ Students have little experience
    ‣ Time and money
  4. 4

  5. How can we know?

    1. Modelling: formulate a hypothesis; design a conceptual model.
    2. Prediction: run simulations.
    3. Observation: observe and record reality.
    4. Comparison: validate or refute the model.
  6. Outline

    1. Modelling system growth: how many people work on your system?
    2. Predicting project progress: how many people should work on your system?
    3. Software process dynamics: how can you construct models and run simulations?
  7. Systems and their architectures are long-lived

    Lifetimes in the software industry. Chart: half-lives (the number of years over which half the entities are replaced) for developers, Windows XP, applications, CEOs, lines of code, FTSE 100 companies, classes and modules; the values shown are 3.1, 4.7, 6.2, 6.8, 13, 22, 37 and 58 years.
    Sources: Software Lifetime and its Evolution Process over Generations; CEO Succession Practices: 2012 Edition; Investors Chronicle; Half-lives of software related entities.
  8. Simulating developer productivity

    Draw teams at random from a productivity distribution. Chart: a triangular distribution (min, mode, max) of productivity in SLOC/year on a 10 000 SLOC codebase (axis 0 to 30 000 SLOC/year), shown as a probability density and as a cumulative distribution function (0% to 100%).
  9. 9

  10. Modelling team and code evolution

    Use published productivity data (source: COCOMO II) to forward model code size. At any given system size we can predict a distribution for developer productivity; developers are dramatically less productive on larger codebases. Chart: productivity (lines of code/year, 100 to 100 000) against total lines of code (1 000 to 10 000 000), both on log scales, with annotations at 29 000 and 5 500.
  11. Simulating a team of seven over five years

    Start with nothing. Some developers contribute more, others less. When a developer leaves, they are replaced. After 5 years we have 235 k lines of code written by a total of 19 people. Only 37% of the code is by the current team.
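The team-turnover simulation can be sketched as a small Monte Carlo model. Every number here is an assumption for illustration: the placeholder triangular productivity bounds, a monthly leaving probability derived from a hypothetical three-year tenure half-life, immediate replacement of leavers, and constant productivity regardless of codebase size (the talk's model uses COCOMO II data to make productivity fall as the code grows). The output is therefore not expected to reproduce the figures on slides 11, 13 and 14; it shows how "± 1σ" summaries come out of repeated runs.

```python
import random
import statistics

TEAM_SIZE = 7
MONTHS = 12 * 5  # five years in monthly steps
# Hypothetical: half of any cohort has left after three years.
TENURE_HALF_LIFE_YEARS = 3.0
MONTHLY_LEAVE_PROB = 1 - 0.5 ** (1 / (12 * TENURE_HALF_LIFE_YEARS))
# Placeholder triangular productivity bounds, SLOC/year.
MIN_SLOC, MODE_SLOC, MAX_SLOC = 5_000, 12_000, 30_000


def simulate_team(rng):
    """One run: (total SLOC, people who ever joined, share of code by current team)."""
    productivity = {}  # developer id -> SLOC/year
    written = {}  # developer id -> lines written so far
    next_id = 0

    def hire():
        nonlocal next_id
        dev, next_id = next_id, next_id + 1
        productivity[dev] = rng.triangular(MIN_SLOC, MAX_SLOC, MODE_SLOC)
        written[dev] = 0.0
        return dev

    current = [hire() for _ in range(TEAM_SIZE)]
    for _ in range(MONTHS):
        for dev in current:
            written[dev] += productivity[dev] / 12
        # Anyone who leaves this month is replaced immediately by a new hire.
        current = [hire() if rng.random() < MONTHLY_LEAVE_PROB else dev for dev in current]

    total = sum(written.values())
    by_current_team = sum(written[dev] for dev in current)
    return total, next_id, by_current_team / total


rng = random.Random(2015)
runs = [simulate_team(rng) for _ in range(1000)]
totals, people, shares = zip(*runs)
print(f"Code size: {statistics.mean(totals):,.0f} ± {statistics.stdev(totals):,.0f} SLOC @ 1σ")
print(f"People ever on the team: {statistics.mean(people):.1f} ± {statistics.stdev(people):.1f} @ 1σ")
print(f"Code by current team: {statistics.mean(shares):.0%} ± {statistics.stdev(shares):.0%} @ 1σ")
```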
  12. 12

  13. After 3 years: 157 kLoC

    Team size: 7; LoC: 157 k ± 23 k; cumulative team size: 11 ± 2; author present: 70% ± 14% (uncertainties @ 1σ)
  14. After 20 years: 1.8 MLoC

    Team size: 21; LoC: 1.8 M ± 0.08 M; cumulative team size: 114 ± 9; author present: 19% ± 4% (uncertainties @ 1σ)
  15. How long for seven to produce 100 000 lines of code?

    Probability density from 1000 simulations: the probability of delivery on a particular day (days 0 to 800; probability up to 0.006).
  16. How long for seven to produce 100 000 lines of code?

    Cumulative probability from 1000 simulations: the probability of delivery before a particular day (days 0 to 800). The 20% and 80% levels fall at 330 and 470 days.
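Delivery-date percentiles like the 330 and 470 days marked on this chart come from sorting the outcomes of many runs. A minimal sketch, assuming a fixed team of seven with no turnover and the same hypothetical triangular productivity bounds as before (placeholders, not the talk's data). `statistics.quantiles` with `n=5` returns the 20th, 40th, 60th and 80th percentile cut points.

```python
import math
import random
import statistics

TARGET_SLOC = 100_000
TEAM_SIZE = 7
MIN_SLOC, MODE_SLOC, MAX_SLOC = 5_000, 12_000, 30_000  # hypothetical SLOC/year per developer


def days_to_deliver(rng):
    """Days for one randomly drawn team to write TARGET_SLOC, ignoring turnover."""
    team_sloc_per_day = sum(
        rng.triangular(MIN_SLOC, MAX_SLOC, MODE_SLOC) for _ in range(TEAM_SIZE)
    ) / 365
    return math.ceil(TARGET_SLOC / team_sloc_per_day)


rng = random.Random(1)
outcomes = sorted(days_to_deliver(rng) for _ in range(1000))
# Four cut points: the 20th, 40th, 60th and 80th percentiles.
p20, _, _, p80 = statistics.quantiles(outcomes, n=5)
print(f"20% chance of delivery by day {p20:.0f}; 80% chance by day {p80:.0f}")
```

Quoting a range such as "20% by day X, 80% by day Y" conveys the schedule risk that a single average would hide.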
  17. Most authors of your product quit way back when. Who can you still talk to?

    Chart: the proportion of code written by the current team over time (days), down to 20% after 20 years.
  18. Conway’s Law

    Melvin Conway, from the 1968 paper “How do committees invent?”: “Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure” (integrated over time).
  19. Outline (as on slide 6)
  20. 20

  21. Rancho La Brea Tar Pool, Charles R. Knight (1921)

  22. “Adding manpower to a late software project makes it later.”

    Fred Brooks, The Mythical Man-Month (image: Wikimedia Commons)
  23. How can we know? The modelling cycle again, as on slide 5: modelling, prediction, observation, comparison.
  24. System dynamics simulations: model systems for improving structures, policies and interventions

    ‣ Define the problem dynamically, over time
    ‣ Endogenous view of significant dynamics
    ‣ Model reproduces the problem of concern
    ‣ Derive understanding
  25. Events or equations? Discrete versus continuous modelling

    Discrete: ‣ Individuals ‣ Populations ‣ Definite events ‣ Probability distributions ‣ Stochastic ‣ Concrete scenarios ‣ Harder to formulate as code
    Continuous: ‣ Aggregates ‣ Levels of quantities ‣ Flow rates ‣ Equations ‣ Numerical / analytical solutions ‣ More abstract ‣ Easier to formulate as code
  26. Elements of continuous models

    Example: a personnel level with a hiring rate flowing in, an attrition rate flowing out, and a desired personnel level.
    ‣ Source: supply outside the model boundary
    ‣ Sink: repository outside the model boundary
    ‣ Rate: flows cause changes in levels
    ‣ Auxiliary: constants or score-keeping variables
    ‣ Level: repository, stock or accumulation inside the model boundary
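A continuous model like this is integrated numerically. The sketch below, with assumed parameter values, uses Euler's method with a one-day step: the hiring rate closes the gap to the desired level over a hypothetical `HIRING_DELAY_DAYS`, and the attrition rate drains a fixed fraction of staff per day. Because attrition never stops, the level settles slightly below the desired 20, at 20 / (1 + 0.001 × 30) ≈ 19.42.

```python
# Hypothetical parameter values for illustration.
DESIRED_PERSONNEL = 20.0  # auxiliary: the target level
HIRING_DELAY_DAYS = 30.0  # time taken to close the gap to the target
ATTRITION_FRACTION_PER_DAY = 0.001
DT_DAYS = 1.0


def simulate_personnel(days, personnel=0.0):
    """Euler-integrate one level fed by a hiring rate and drained by an attrition rate."""
    history = [personnel]
    for _ in range(int(days / DT_DAYS)):
        # Inflow from a source outside the model boundary.
        hiring_rate = max(DESIRED_PERSONNEL - personnel, 0.0) / HIRING_DELAY_DAYS
        # Outflow to a sink outside the model boundary.
        attrition_rate = ATTRITION_FRACTION_PER_DAY * personnel
        # The level accumulates the net flow over the time step.
        personnel += (hiring_rate - attrition_rate) * DT_DAYS
        history.append(personnel)
    return history


history = simulate_personnel(1000)
print(f"Personnel after 1000 days: {history[-1]:.2f}")  # 19.42
```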
  27. Reference behaviour: Brooks' Law (chart of personnel and productivity over time)

  28. Brooks' Law model

    Elements: requirements (unrealised), developed software, software development rate.
  29. Brooks' Law model, adding personnel

    Elements: requirements (unrealised), developed software, software development rate, personnel, personnel allocation rate, nominal productivity.
  30. Schedule A (baseline)

    500 function points; 20 personnel; 0.1 function points/person/day. Result: 250 days to completion.
  31. Brooks' Law model, splitting personnel into new and experienced

    Elements: requirements (unrealised), developed software, software development rate, personnel allocation rate, nominal productivity, new personnel, experienced personnel, assimilation rate.
  32. Schedule B

    500 function points; 20 inexperienced personnel; 0.08 function points/person/day. Result: 313 days to completion.
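The first two schedules are simple enough to check by hand: with constant productivity, duration is function points divided by (personnel × productivity). A minimal sketch; the `round` call is there only to stop floating-point noise from tipping an exact result over a whole-day boundary.

```python
import math


def days_to_completion(function_points, personnel, productivity_fp_per_person_day):
    """Whole days needed to deliver at a constant team productivity."""
    exact_days = function_points / (personnel * productivity_fp_per_person_day)
    # Round away float noise (e.g. 250.00000000000003) before taking the ceiling.
    return math.ceil(round(exact_days, 9))


print(days_to_completion(500, 20, 0.1))   # Schedule A: 250
print(days_to_completion(500, 20, 0.08))  # Schedule B: 313 (312.5 rounded up)
```

Schedule B's 0.08 is the nominal 0.1 scaled by the 0.8 `new_productivity_weight` that appears in the configuration on slide 42.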
  33. Brooks' Law model (as on slide 31)
  34. Schedule C

    500 function points; 20 inexperienced personnel; 20-day assimilation delay. Result: 215 days to completion.
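Schedule C's result can be reproduced with a back-of-envelope calculation if we assume all 20 people become experienced at once when the 20-day delay expires (a pipeline delay; the real model's assimilation mechanism is not shown on these slides) and take the productivity weights 0.8 and 1.2 from the configuration on slide 42. The team writes 32 function points in the first 20 days; the remaining 468 at 2.4 per day take 195 days, giving 215.

```python
def schedule_c_estimate(function_points=500, personnel=20, nominal_productivity=0.1,
                        new_weight=0.8, experienced_weight=1.2,
                        assimilation_delay_days=20):
    """Days to completion if everyone becomes experienced exactly when the delay ends."""
    new_rate = personnel * nominal_productivity * new_weight  # 1.6 fp/day
    experienced_rate = personnel * nominal_productivity * experienced_weight  # 2.4 fp/day
    done_while_new = new_rate * assimilation_delay_days  # 32 fp
    if done_while_new >= function_points:
        # The project finishes before anyone is assimilated.
        return function_points / new_rate
    remaining = function_points - done_while_new  # 468 fp
    return assimilation_delay_days + remaining / experienced_rate


print(round(schedule_c_estimate()))  # 215
```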
  35. Brooks' Law model, adding training overhead

    Elements as on slide 31, plus experienced personnel for training and training overhead.
  36. Schedule D

    500 function points; 20 inexperienced personnel; 20-day assimilation delay; 25% of an experienced person needed to train each new person during assimilation. Result: 220 days to completion.
  37. Brooks' Law model, adding communication overhead

    Elements as on slide 35, plus communication overhead.
  38. Schedule E

    500 function points; 20 inexperienced personnel; 20-day assimilation delay; 25% of an experienced person needed to train each new person during assimilation; Abdel-Hamid quadratic communication overhead. Result: 286 days to completion.
  39. Schedule E (as on slide 38): 286 days to completion
  40. Schedule E: assimilation delay sensitivity analysis

    ‣ 10-day delay: 280 days
    ‣ 20-day delay: 286 days
    ‣ 30-day delay: 292 days
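A sensitivity analysis is a loop: rerun the model while varying one parameter. The sketch below sweeps the assimilation delay on a deliberately simplified daily-step model with a pipeline delay and no training or communication overhead (both assumptions), so it reports 212, 215 and 219 days rather than Schedule E's 280, 286 and 292. It shows the same qualitative result: a longer assimilation delay costs only a few days overall.

```python
def days_to_complete(assimilation_delay_days, function_points=500, personnel=20,
                     nominal_productivity=0.1, new_weight=0.8, experienced_weight=1.2):
    """Step one day at a time until all function points are developed."""
    developed = 0.0
    day = 0
    # The small tolerance absorbs floating-point drift from repeated addition.
    while developed < function_points - 1e-9:
        # Pipeline delay: everyone counts as new until the delay has elapsed.
        weight = new_weight if day < assimilation_delay_days else experienced_weight
        developed += personnel * nominal_productivity * weight
        day += 1
    return day


for delay in (10, 20, 30):
    print(f"{delay}-day delay: {days_to_complete(delay)} days")
# 10-day delay: 212 days
# 20-day delay: 215 days
# 30-day delay: 219 days
```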
  41. Brooks' Law model, adding planned completion

    Elements as on slide 37, plus planned completion.
  42. schedule_e.py

```python
import brooks.communication


def initial():
    """Configure the initial model state."""
    return dict(
        step_duration_days=1,
        num_function_points_requirements=500,
        num_function_points_developed=0,
        num_new_personnel=20,  # 20 inexperienced personnel at the start
        num_experienced_personnel=0,
        personnel_allocation_rate=0,
        personnel_assimilation_rate=0,
        assimilation_delay_days=20,
        nominal_productivity=0.1,  # function points per person per day
        new_productivity_weight=0.8,  # new staff work at 0.8 x nominal
        experienced_productivity_weight=1.2,  # experienced staff work at 1.2 x nominal
        training_overhead_proportion=0.25,  # 25% of an experienced person per new person
        communication_overhead_function=brooks.communication.quadratic_overhead_proportion,
        software_development_rate=None,
    )


def intervene(step_number, elapsed_time, state):
    """Intervene in the current step before the main simulation step is executed."""
    return state  # Schedule E: no intervention


def is_complete(step_number, elapsed_time_seconds, state):
    """Determine whether the simulation should end."""
    return state.num_function_points_developed >= state.num_function_points_requirements


def complete(step_number, elapsed_time_seconds, state):
    """Finalise the simulation state for the last recorded step."""
    state.software_development_rate = 0
    return state
```
  43. schedule_f_5.py: identical to schedule_e.py except for intervene(), which adds five new personnel on day 110

```python
def intervene(step_number, elapsed_time, state):
    """Intervene in the current step before the main simulation step is executed."""
    if elapsed_time == 110:
        state.num_new_personnel += 5
    return state
```
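The four hooks (`initial`, `intervene`, `is_complete`, `complete`) imply a driver loop. A minimal sketch of one, assuming the engine calls `intervene`, then `is_complete`, then advances one step. The `advance` dynamics (first-order assimilation, no training or communication overhead) and the `run` function are hypothetical stand-ins, not the engine from the brooks repository; the hooks are simplified copies that apply the schedule F5 intervention.

```python
from types import SimpleNamespace


def initial():
    """Simplified initial state (hypothetical: no training or communication overhead)."""
    return dict(
        step_duration_days=1,
        num_function_points_requirements=500,
        num_function_points_developed=0.0,
        num_new_personnel=20.0,
        num_experienced_personnel=0.0,
        assimilation_delay_days=20,
        nominal_productivity=0.1,
        new_productivity_weight=0.8,
        experienced_productivity_weight=1.2,
        software_development_rate=None,
    )


def intervene(step_number, elapsed_time, state):
    """Add five new people on day 110, as schedule_f_5.py does."""
    if elapsed_time == 110:
        state.num_new_personnel += 5
    return state


def is_complete(step_number, elapsed_time, state):
    """Stop once every required function point has been developed."""
    return state.num_function_points_developed >= state.num_function_points_requirements


def complete(step_number, elapsed_time, state):
    """Zero the development rate for the final recorded step."""
    state.software_development_rate = 0
    return state


def advance(state):
    """One Euler step; assimilation is an assumed first-order delay."""
    dt = state.step_duration_days
    state.software_development_rate = state.nominal_productivity * (
        state.new_productivity_weight * state.num_new_personnel
        + state.experienced_productivity_weight * state.num_experienced_personnel
    )
    assimilation_rate = state.num_new_personnel / state.assimilation_delay_days
    state.num_function_points_developed += state.software_development_rate * dt
    state.num_new_personnel -= assimilation_rate * dt
    state.num_experienced_personnel += assimilation_rate * dt
    return state


def run(max_steps=10_000):
    """Drive the hooks: intervene, test for completion, then advance one step."""
    state = SimpleNamespace(**initial())
    elapsed_time = 0
    for step_number in range(max_steps):
        state = intervene(step_number, elapsed_time, state)
        if is_complete(step_number, elapsed_time, state):
            return complete(step_number, elapsed_time, state), elapsed_time
        state = advance(state)
        elapsed_time += state.step_duration_days
    raise RuntimeError("simulation did not finish within max_steps")


final_state, days = run()
print(f"Completed after {days} days")
```

Keeping the scenario in small hook functions, as the slides do, means each schedule differs from the baseline only where its policy differs.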
  44. Schedule F5: add 5 new personnel on day 110

    Schedule E: 286 days; Schedule F5: 283 days
  45. Fred Brooks was WRONG!

  46. Actually…

  47. Schedule F10: add 10 new personnel on day 110

    Schedule E: 286 days; Schedule F5: 283 days; Schedule F10: 307 days
  48. Fred Brooks was RIGHT!

  49. Model limitations

    ValueError: Communication overhead proportion personnel number 34.9 out of range
    Prevent extrapolation outside reasonable bounds!
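One way to honour "prevent extrapolation" is to make a lookup function raise rather than guess outside its table. A minimal sketch: the table below is a hypothetical quadratic curve, 0.0006 × n², sampled up to 30 people, not the Abdel-Hamid data used in the talk, and the error message simply mimics the one on the slide.

```python
from bisect import bisect_right

# Hypothetical table: team size -> fraction of effort lost to communication.
TEAM_SIZES = [0, 5, 10, 15, 20, 25, 30]
OVERHEAD = [0.0006 * n ** 2 for n in TEAM_SIZES]


def communication_overhead_proportion(personnel):
    """Linearly interpolate the table; refuse to extrapolate beyond it."""
    if not TEAM_SIZES[0] <= personnel <= TEAM_SIZES[-1]:
        raise ValueError(
            f"Communication overhead proportion personnel number {personnel} out of range"
        )
    # Index of the upper bracketing point; clamp so the top entry is still usable.
    i = min(bisect_right(TEAM_SIZES, personnel), len(TEAM_SIZES) - 1)
    x0, x1 = TEAM_SIZES[i - 1], TEAM_SIZES[i]
    y0, y1 = OVERHEAD[i - 1], OVERHEAD[i]
    return y0 + (y1 - y0) * (personnel - x0) / (x1 - x0)


print(communication_overhead_proportion(12.5))
try:
    communication_overhead_proportion(34.9)
except ValueError as error:
    print(error)
```

Failing loudly surfaces the moment a scenario, such as adding many people late, pushes the model outside the data it was built from.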
  50. 50

  51. 51

  52. What about cost?

    6625 at 287 days; 5760 at 288 days; 7900 at 301 days; 9865 at 329 days
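The slide does not state units. One hypothesis, consistent with three of the four figures, is that cost is measured in person-days, that is, headcount multiplied by days: 20 people for 288 days gives 5760, and 20 people for 110 days followed by 25 for the remaining 177 gives 6625. The sketch below checks this arithmetic. The 7900 at 301 days does not fit the pattern for an intervention on day 110, so treat this as an assumption, not the talk's cost model.

```python
def person_days(initial_staff, duration_days, added_staff=0, added_on_day=None):
    """Headcount x days, optionally with extra staff from added_on_day onwards."""
    if added_on_day is None:
        return initial_staff * duration_days
    before = initial_staff * added_on_day
    after = (initial_staff + added_staff) * (duration_days - added_on_day)
    return before + after


print(person_days(20, 288))                                    # 5760
print(person_days(20, 287, added_staff=5, added_on_day=110))   # 6625
# Would match 9865 at 329 days if 15 people joined on day 110 (an assumption).
print(person_days(20, 329, added_staff=15, added_on_day=110))  # 9865
```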
  53. Outline (as on slide 6)
  54. Simulation tools

    ‣ iThink / Stella
    ‣ Vensim
    ‣ Excel
    ‣ PowerSim
    ‣ Simile
    ‣ etc.
  55. Program it yourself

    ‣ Python
    ‣ Matplotlib (charting)
    ‣ Pandas (tables, time series)
    ‣ NumPy (fast numerics)
  56. 56

  57. Model implementation: https://github.com/sixty-north/brooks

  58. Software Process Dynamics

  59. ‣ Secure buy-in for modelling and models
    ‣ Parameterise the model
    ‣ As simple as possible, but no simpler
    ‣ Be clear on system boundary / assumptions
    ‣ Experiment!
    ‣ Discuss results
  60. Thank you! Robert Smallshire (@robsmallshire), @sixty_north
    http://sixty-north.com/blog/predictive-models-of-development-teams-and-the-systems-they-build

  61. Thank you! (as on slide 60)

  62. 62

  63. 63