Evidence Based Software Engineering

Software engineering is a field that is highly prone to fads. Development methods, ideas and practices are proposed and (sadly) followed without much evidence to back the claims. In most cases, the proponents present an unvalidated idea without a decent picture of the trade-offs. The new and emerging field of Evidence-based Software Engineering aims to bring a scientific view and validate the fads and ideas. This talk will discuss findings from analysis of software release histories and motivate why it is worth looking at history and what we can learn from it.
TL;DR: the talk will answer questions like:
If god classes are evil, why do we have so many?
Why is software structured fractally?
Why do developers resist change? (and what the level of resistance means)
How wealth inequality and software system design are closely related — as in communism fails even in software!

This talk was delivered at Kiandra IT on 30-Oct-2013. http://kiandra.com.au

Rajesh Vasa

October 30, 2013

Transcript

  1. Evidence Based Software Engineering

    Dr. Rajesh Vasa, Swinburne University of Technology
    @rvasa | linkedin.com/in/rvasa | +Rajesh Vasa
  2. About me (in brief…)

    © Rajesh Vasa, 2013 - Creative Commons BSD License

    • IBM Manufacturing Automation System (first patent in NLP)
    • News Feed Processors (Founder of startup - failed)
    • Yellow Pages Search Engine (Senior Engineer, learnt to hate C++)
    • Telstra Message Bank (Solution Architect / Team Lead, best 3 month project)
    • Telstra e-Commerce Platform (Senior Engineer, B2B and XML hell)
    • Amcor Packaging Intelligence (Senior Engineer, constraint optimisation)
    • jMetric Engine (Founder & Architect - startup acquired by Webgain Inc. - woohoo!)
    • Visual Cafe IDE Modelling & Metrics (Solution Architect, Silicon Valley life!)
    • (Webgain) Oracle Toplink ORM (Architect & Consultant, lived in a plane mostly)
    • Thinking Objects (CTO. AI solutions startup - sold business)
    • United Nations (Consultant - enjoyed the high-life in Asia)
    • cvMail (IT Director, loved the full freedom of running a tech. team)
    • Thomson Reuters (IT Director, Architect & Advisor - learnt business strategy)
    • Swinburne R&D Software Group (Innovation Lead)
    • Advisor for Neon Three (AU), Money Tree (Japan), Suyati (India), M5859 Studios (AU)
    • Academic: 40+ publications, PhD, Judge, Conference Chair, Course coordinator, 10+ Grad students etc.

    I enjoy building software & have been at it for a while
  3. Measured, analysed and reflected on software engineering for nearly 15 years

    I was doing big data, when it was small :)
  4. About me …

    • Built systems using many frameworks, libraries & languages (LISP, Ada, C, C++, Java, C#, Ruby…)
    • Worked in different domains (Manufacturing, Finance, HR, Education, Telecom, Advertising…)
    • Last 3 years mostly mobile (Android and iOS)
    • Good experience building data oriented systems, compilers, AI Techniques, Crypt, processing large data, statistics
    • Very limited experience in building graphics / games & multi-media heavy systems
    • Agile to Waterfall - used all them methods!
  5. Buzz Word Driven Software Engineering (BuDS)

    SCRUM, UML, TDD, BDD, User stories
    Pair programming, CI, Git
    Evolve & Adapt
    Agile, Lean Process, No to Big Upfront Design
    Responsive, Mobile first
    We work in a strange world!
    Rails, JEE, ASP, #, $, @ ….
  6. There is a lot of advice for Engineering Software

    Follow Method X
    Use tool Z, and technique T
    Agile, UML, Testing, User stories, Planning
  7. Here is a bullet list ….

    • Some engineers are 10 times better than others
    • TDD and BDD are how real engineers work
    • SCRUM & Daily stand-ups are effective!
    • Pair programming improves quality, reduces risk
    • God classes are evil - avoid them at all costs
    • Let the architecture evolve (don’t design up-front)
    • Rapid regular releases improve quality
    • Refactoring is a must
    • Low (loose) coupling, but high cohesion
  8. So … what should you follow?

    Is there evidence for practice X?
  9. Evidence based Software Engineering

    Emerging field // gaining momentum
  10. Let’s start…

    (with the easy ones)
  11. Is this for real? Or is it one of them Valley myths?

    Some engineers are 10x better than others

    The only way to recruit is to hire the “10x engineer”
    Aileen Lee, Cowboy Ventures
    http://news.yahoo.com/twitter-pays-engineer-10-million-silicon-valley-tussles-130525209--sector.html
  12. Productivity Myth

    • “A good developer is 30 times more productive than a bad developer”
    • Common source: Sackman et al., 1968: “experimental studies comparing online and offline programming performance”
    • The original was based on 12 programmers for 3 hours. It was checked in the 80s with 50 student developers for an hour (arriving at a 10x productivity factor)
    • Recent evidence (Prechelt, 2009) shows productivity depends on the length of the program text (independent of language abstraction/level)
    • Take-away: Using a DSL will make you more productive (yes, there is a trade-off)
  13. Evidence is mixed

    Does Pair Programming improve quality?

    Moderately effective for complex tasks (but reduces productivity for simpler tasks)
    Vanhanen, HICSS, 2007

    Structured reviews are a better investment
  14. User Reactions and Evolution

    • An analysis of millions of User Reviews shows:
    • Users will complain mostly about lack of (i) reliability, (ii) functionality, and (iii) performance (in this order)
    • Users will praise you for non-functional attributes -- typically cosmetic, usability etc.
    • Users seem to “live with” atrocious user interface and usability if the system is stable and works functionally
    • Functionality does not receive praise :(

    Source: Research in progress by Vasa et al. (to be published in 2014)
  15. “Prediction is difficult, especially about the future” -- Niels Bohr

    Process (measures) offer the most reliable and stable predictor of quality -- Rahman, Devanbu, ICSE 2013

    Errors of omission are the silent killer; checklists & automation around this help -- Balachandran, ICSE 2013
  16. Swinburne UG CS/SE Courses

    • Fully revised from 2015 onwards
    • NO exams in 60% of the subjects
    • Portfolio + Students self grade and defend
    • 6 project units
    • Git / CI / Angular.JS / RESTful / Hadoop / Issue tracker, Patterns, Scalable architectures
    • Math, Physics, Electronics, Computer Systems, OS
    • Usual stuff: Data structures, algorithms, DB, AI etc.
    • R&D Projects with Computer Vision, AI, Robots
  17. Benford’s Law

    In natural systems, when an attribute is measured:
    ~30% of the counts start with 1
    ~17% start with 2
    ~12% start with 3

    Prob(k) = log10(k+1) - log10(k), where k is the leading digit (1 to 9)

    E.g. the law approximately holds for population sizes in cities, river lengths, and many other naturally evolving systems
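The formula on this slide can be checked with a few lines of Python. The following is a minimal sketch, not taken from the talk: it assumes base-10 logarithms, and the `field_counts` list is made up purely for illustration (it is not data from the study).

```python
import math
from collections import Counter

def benford_expected(k: int) -> float:
    """Expected share of values whose leading digit is k (1..9), base-10 logs."""
    return math.log10(k + 1) - math.log10(k)

def leading_digit(n: int) -> int:
    """First decimal digit of a non-zero integer."""
    return int(str(abs(n))[0])

def observed_shares(counts):
    """Share of the positive counts that start with each digit 1..9."""
    digits = [leading_digit(c) for c in counts if c > 0]
    if not digits:
        raise ValueError("need at least one positive count")
    tally = Counter(digits)
    return {k: tally.get(k, 0) / len(digits) for k in range(1, 10)}

# Hypothetical field counts, invented for illustration only.
field_counts = [123, 1987, 14, 27, 310, 45, 18, 9, 102, 76]
shares = observed_shares(field_counts)
for k in range(1, 10):
    print(k, round(benford_expected(k), 3), round(shares[k], 3))
```

With real measurements in place of the made-up list, the two printed columns give the theory and observation series that the following slides compare.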
  18. Benford’s Law in Software

    [Figure: Distribution of Fields (across 1000 releases); y-axis: Percent, x-axis: dd]

    In ~28% of systems: Field count starts with 1 (e.g. 123, 1987, 14 etc.)
  19. Benford’s Law - Approximate Pattern

    [Figure: Theory vs. Observation; y-axis: Percent, x-axis: dd]

    Pattern is similar -- an exponentially decaying curve

    Why bother? It also allows us to understand the underlying physics: this pattern does not change rapidly in a normally evolving system

    This pattern repeats at various levels of abstraction
  20. Spring Framework - Evolution

    [Figure: % Classes (0% to 50%) by Branch Count (0 to 24+)]

    5 years of evolution
    Strong boundaries, highly skewed
  21. (Almost) Nothing ever gets removed!!

    [Figure: Change probability in Spring Framework; % Classes (Unchanged, Modified, Deleted) by Release Sequence Number]

    Source: R.Vasa 2007
  22. Rate of Change reduces with Age

    [Figure: % Classes (Unchanged, Modified, New) by Release Sequence Number]

    Source: R.Vasa 2007
  23. Multiple Changes are Rare

    [Figure: Modification count of classes that have changed; % Classes (Cumulative) by Modification Count for Axis, Azureus, Castor, Checkstyle, Findbugs, Groovy, Hibernate, Jung, Spring, Struts, Webwork, Wicket]

    80% of modified classes are touched fewer than 5 times
  24. Few Classes change a lot...

    [Figure: Magnitude of change (ordinal scale); % Classes (Cumulative) by Modification Distance for Axis, Azureus, Castor, Checkstyle, Findbugs, Groovy, Hibernate, Jung, Spring, Struts, Webwork, Wicket]

    Majority of change is “a couple of lines”
    Bulk of the change is minor tweaks
  25. Change occurs in clusters

    • You can exploit it: “programmers that have changed this file have also changed ....”

    [Figure: ROSE Eclipse plug-in. A) The user inserts a new preference into the field fKeys[]. B) ROSE suggests locations for further changes, e.g. the function initDefaults()]

    Source: Zimmermann 2004
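The "changed this, also changed that" idea can be sketched as simple co-change counting over a version history. This is a simplified illustration under stated assumptions, not the ROSE implementation: the function name `co_change_suggestions`, the `min_support` threshold and the `history` data are all hypothetical. Only the names `fKeys` and `initDefaults` come from the slide.

```python
from collections import Counter

def co_change_suggestions(commits, changed, min_support=2):
    """Rank entities that historically changed together with `changed`.

    Returns (entity, support, confidence) tuples, where support is the number
    of commits touching both entities and confidence is support divided by
    the number of commits touching `changed`.
    """
    touching = [set(c) for c in commits if changed in c]
    if not touching:
        return []
    together = Counter(e for c in touching for e in c if e != changed)
    ranked = [(e, n, n / len(touching))
              for e, n in together.items() if n >= min_support]
    # Highest confidence first; ties broken alphabetically for stable output.
    return sorted(ranked, key=lambda r: (-r[2], r[0]))

# Hypothetical change history: each set is one commit (invented data).
history = [
    {"fKeys", "initDefaults"},
    {"fKeys", "initDefaults", "plugin.properties"},
    {"fKeys", "initDefaults"},
    {"fKeys", "README"},
    {"README"},
]
print(co_change_suggestions(history, "fKeys"))
```

In this made-up history, `initDefaults` changed in 3 of the 4 commits that touched `fKeys`, so it is the only suggestion that clears the support threshold.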
  26. Growth at different levels in Hibernate

    [Figure: Growth (Classes and Methods) - Hibernate; y-axis 0% to 1600%, x-axis 0 to 3000; series: Methods, Classes]

    Fitted curves:
    y = -6E-07x² + 0.0054x + 0.0714, R² = 0.98702
    y = -1E-06x² + 0.0091x - 0.1116, R² = 0.98429

    Methods are growing faster than classes
    Both are sub-linear (slightly)
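A quick way to see what "sub-linear (slightly)" means is to look at the slope of the fitted quadratics. The sketch below assumes the trend-line labels on this slide read y = -6E-07x² + 0.0054x + 0.0714 and y = -1E-06x² + 0.0091x - 0.1116 (the extracted text is garbled, so the signs are a reading, not a certainty). The slide does not say which curve is Methods and which is Classes, so they are labelled A and B here.

```python
def slope(a: float, b: float, x: float) -> float:
    """Growth rate dy/dx of y = a*x**2 + b*x + c at x (c drops out)."""
    return 2 * a * x + b

# (a, b) coefficients as read from the slide's trend-line labels (assumed).
# The mapping of curves to Methods/Classes is not recoverable from the text.
CURVES = {"A": (-6e-07, 0.0054), "B": (-1e-06, 0.0091)}

for name, (a, b) in CURVES.items():
    early, late = slope(a, b, 0), slope(a, b, 3000)
    print(f"curve {name}: slope {early:.4f} at x=0, {late:.4f} at x=3000")
```

Under that reading, both slopes stay positive across the plotted range (0 to 3000) but shrink as x grows: the system keeps growing, just a little more slowly over time.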
  27. Growth Pattern is set early

    [Figure: Growth (Classes and Methods) - Hibernate; y-axis 0% to 1600%, x-axis 0 to 3000]
    y = -6E-07x² + 0.0054x + 0.0714, R² = 0.98702
    y = -1E-06x² + 0.0091x - 0.1116, R² = 0.98429

    [Figure: Checkstyle Growth (Classes, Methods, Public Methods); y-axis 0% to 160%, x-axis 0 to 2500]
    y = -2E-07x² + 0.0006x + 0.7316, R² = 0.90324
    y = -2E-07x² + 0.0006x + 0.6574, R² = 0.88995
    y = -6E-08x² + 0.0002x + 0.5585, R² = 0.64771

    Rate is similar over years …. (study confirmed in over 50 systems)
  28. Super linear growth is possible … but it needs an architecture (software and social) that supports it

    [Figure 5. Growth of the major subsystems (development releases only); Total uncommented LOC (0 to 1,200,000), Jan 1993 to Apr 2001; subsystems: drivers, arch, include, net, fs, kernel, mm, ipc, lib, init]

    Source: Godfrey 2002
  29. Software is not normal

    [Figures: Ant (Version 1.7.1), Percent by Method Count; Hibernate - Method Cyclomatic Complexity (10 versions), % methods by Cyclomatic complexity; Hibernate Cyclomatic Complexity (Class Level), Percent by Branches (in class)]

    • Developers centralise complexity/intelligence into a few key components
    • Driven by short-term memory & structural efficiency
  30. Software is a Small World Network

    [Figure 5. Network topology of object-oriented software Apache-tomcat V6.0.13]

    Source: Li 2008
  31. Small World Network - Linux

    [Figure 3. Network topology of partial kernel functions in the procedure-oriented software Linux V0.12]
    [Figure 4. Network topology of all kernel functions in the procedure-oriented software Linux V0.12]

    The above two examples indicate that the internal structure of software system is not random. It has small world and scale-free features.

    Source: Li 2008
  32. Why do we prefer small-world networks?

    • Software ends up like this due to structural optimisation undertaken by developers (Vasa 2005)
    • The aim is to reduce the communication overhead
    • This structure allows us to get to almost all abstractions within the entire system in ~5 hops (e.g. a.b.c.d.e())
    • No other organisation model yields similar performance

    [Background: excerpt and figures from Li 2008. Linux V0.12 is modelled with functions as nodes and invoke relationships as edges (630 nodes, 1841 edges); a few functions are frequently called while most are rarely called, following a power-law distribution. Apache-tomcat V6.0.13 is modelled with classes as nodes and inheritance or combination relations as edges.]
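The "~5 hops" claim is a statement about average path length in the dependency graph. As a minimal sketch (not code from the talk or from Li 2008), the breadth-first search below measures the mean hop count in two tiny, hypothetical undirected graphs: one organised around a hub and one organised as a chain. All node names are invented.

```python
from collections import deque

def hops_from(graph, start):
    """Breadth-first search: fewest hops from start to each reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def average_hops(graph):
    """Mean hop count over all ordered pairs of distinct, connected nodes."""
    total = pairs = 0
    for node in graph:
        for other, d in hops_from(graph, node).items():
            if other != node:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

# Hypothetical 5-node graphs, adjacency lists listed in both directions.
hub = {"Core": ["A", "B", "C", "D"],
       "A": ["Core"], "B": ["Core"], "C": ["Core"], "D": ["Core"]}
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
         "D": ["C", "E"], "E": ["D"]}

print("hub:", average_hops(hub))
print("chain:", average_hops(chain))
```

Even at this toy size, the hub layout needs fewer hops on average than the chain, which is the communication-overhead argument the slide makes for a few heavily connected components.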
  33. How can we explain Long Exception Traces?
  34. Closing Advice

    • Architect to satisfy non-functional requirements
    • Document how not why (per quality attribute)
    • Use a DSL if one is available for your domain
    • Convention over Configuration
    • Don’t worry if a few modules/classes/methods are big & clunky; they are not a big concern
    • Use checklists, review stuff
    • A consistent process is useful (not what it is called)
    • Good quality requirements are critical (4 line user stories on a wall are not sufficient)