Valuable Software Engineering

Keynote delivered at MODELS 2024 in Linz, Austria, September 26, 2024.

As software engineers, the value we bring to society lies in the software systems we construct, maintain, and operate. Ideally, we do this in a cost-effective and predictable manner. In reality, however, software projects have a reputation for being late and overly costly.

In this talk, I will reflect on this, drawing on experiences in the financial and public sectors. Within the former, I will look at agile software development at scale at ING, a global bank headquartered in The Netherlands. Based on years of historical data from around 300 agile software development teams at ING, I will reflect on effort estimation, on-time delivery, delay, and team dynamics at scale. I will contrast this with the public sector by looking at assessments of over 100 government IT projects, conducted by the Advisory Council on IT Assessment (AcICT) of the Dutch government over the past years. Here I will reflect on the assessment procedure and its outcomes, including the main types of risks identified (which may be technical or organizational) and the nature of the advice given (which can include project termination).

Given the insights from the financial and public sectors, I will conclude by sketching research directions in the areas of governance, predictable value delivery, and modeling.

https://conf.researchr.org/details/models-2024/models-2024-models-2024-keynotes/3/Valuable-Software-Engineering

Arie van Deursen

September 26, 2024

Transcript

  1. Valuable Software Engineering Arie van Deursen, TU Delft MODELS 2024,

    Linz, Austria @[email protected] 1 Adele Bloch-Bauer I, Gustav Klimt, 1907. Wikipedia
  2. David Notkin 1955-2013 “ … the intent is to make

    the engineering of software more effective so that society can benefit even more from the amazing potential of software.” 2 “If we care about influence, as I hope we do, then adding value to society is the real measure we should pursue.” ACM TOSEM Editorial, 2013
  3. My Journey 3 1990 2000 2010 2020 RESEARCH SOCIETY Domain

    models Reconstructed models Variability models Software process models Language models
  4. The Financial Sector • Data intensive • Software intensive •

    High stakes • Highly regulated • Long (system, data) lifetimes 4 High impact societal sector, with critical software engineering challenges (and modeling tradition in finance)
  5. ING Bank Global bank based in The Netherlands Five-year

    collaboration with TU Delft: • Explainable AI • Human-AI decision making • Data integration • Incident management and AIOps • Release planning • Search-based testing and repair 5
  6. Agile at Scale at ING • ING Bank: 15,000 IT

    staff • Self-organizing teams (5-9 developers) • Short iterations (1-4 weeks) • User stories, features, epics • Delivered in releases (2-6 months) • Quarterly planning of all releases 6 Years of high-quality data available at ING
  7. Why is My Project Late? What are factors affecting timely

    epic delivery? • Let’s ask! How do these factors impact schedule deviation? • Let’s measure and model! 8 Elvan Kula et al IEEE TSE 2022
  8. Timely Epic Delivery: Perceived Factors Survey 1: Which factors? •

    289 responses • 25 factors; 5 dimensions Survey 2: Factor importance? • 337 responses • Rated impact level per factor Factor top 10: 1. Requirements refinement 2. Task dependencies 3. Organizational alignment 4. Organizational politics 5. Geographic distribution 6. Technical dependencies 7. Agile maturity 8. Regular delivery 9. Team stability 10. Skills and knowledge 9
  9. Measuring Delay: Balanced Relative Error • If actual delivery date

    after estimated date (”late”, pos%): • If actual delivery date before estimated date (“early”, neg%): • Collected BRE from 3,771 epics (273 teams), for 3 years 10
  10. 13 Predictor Variables • 35 metrics for 20 factors •

    13 metrics explain 67% of variation (MARS model) • Match with perception? ▪ Underestimated: size ▪ Agreed effect: dependencies, seniority, stability ▪ Overestimated: refinement, geography ▪ Agreed little effect: coverage, code smells, … 11
  11. Dynamic Delay Prediction • Delay knowledge increases as epic unfolds

    (in milestones) • Mobility literature: Delay adheres to patterns, which can be learned by clustering delay time series • Is epic delay subject to patterns? • Can patterns improve delay prediction? 12
  12. Epic Delay Patterns 13 Elvan Kula et al FSE 2023

    Dataset: 4,040 epics of at least 10 sprints from 270 teams, 2017-2022 % epics in category: 36% 44% 14% 6%
  13. Epic Conclusions • There are measurable factors contributing to epic

    delay ▪ Size, project dependencies, past performance • Delay follows patterns ▪ Largest pattern is timely at start with delay peak at end, due to security and incidents • Factors + patterns predict delay, dynamically ▪ Beats the global and iterative SoTA baselines 15
  14. Midway Reflection Secrets to success? • A well-chosen meta-model (schema)

    of the data collected • Carefully collected multi-year data • Involvement of people (surveys) to give meaning to results • Learned models that are interpretable 16 Alhambra in winter
  15. The Public Sector • Government digitalization affects all aspects of

    society • Taxes, permits, pensions, social benefits, health insurance, … • Infrastructure, traffic, sector regulation, open government, … 17
  16. Public Sector Challenges • Complex (political) decision making • High

    demands on privacy, availability, transparency, inclusion, accessibility,… • Long (system, data) lifetimes • Accountability to minister, parliament, voter • Poor government digitalization undermines trust and democracy 18
  17. Advisory Council for IT Assessment (AcICT) • Dutch independent council

    (2015) • Advises ministers and parliament on risks and chances of success in complex government IT systems • Enshrined in law since 2024: ▪ Govt obliged to submit systems for assessment, collaborate, and respond • All reports are public 19 https://www.adviescollegeicttoetsing.nl/
  18. Council Organization • Five cabinet-appointed council members ▪ Experts from

    society, industry and academia • Supported by office of ~25 assessors • Assessment takes around six months • Data collection, interviews, analysis and advice formulation, fact checking, response from minister, … • Outcome: 8-10 page advice to minister 20
  19. A Word of Caution • Council focuses on large (>

    €5M) projects ▪ From these, council selects high risk projects ▪ This gives biased, distorted view of government IT • Other sectors have failures too, but these are less … public • The public sector also has plenty of successes • The public sector is full of hard-working, amazing professionals dedicated to making society better 21 Council members, 2023
  20. Assessment Framework Risk Areas 1. Business case, benefits, finance 2.

    Project organization and ownership 3. Risk management and project dependencies 4. Alignment business processes and IT solution 5. Scope control 6. Architecture, functional feasibility, technical realizability 7. Planning and realization 8. Procurement, tendering 9. Acceptance and transition to line 24 https://www.adviescollegeicttoetsing.nl/onze-werkwijze/documenten/publicaties/2021/12/01/toetskader-acict
  21. Levels of Impact Manual inspection of 100 reports Identified 3

    types of impact 1. MINOR: Continue, with suggestions for improvement 2. REVISE: Continue, with urgent interventions 3. MAJOR: Abort or major interventions 26
  22. Project Types • Build new system • Replace existing system

    • Adjust existing system substantially • Engage in major new procurement agreement 27
  23. Impact Differences per Project Type 28 Green field / replacement:

    • 35-40% revise • 35-40% major Evolution: 72% (13/18) minor impact
  24. Example 1: OpenVMS • Unemployment benefit systems: ▪ From: Cobol

    + Codasyl on Itanium/OpenVMS ▪ To: Java + relational database on Linux • Automated code conversion • 2020-2025, budget €36M • Assessment in 2023 ▪ Half (€19M) of budget spent 29
  25. Advice Halfway assessment organization itself decided to terminate project •

    Hard to maintain code explosion: ”JOBOL” Advice: 1. Build multi-disciplinary re-platforming team 2. Develop multiple alternative scenarios 3. Draw lessons from failed conversion 30 https://www.adviescollegeicttoetsing.nl/documenten/publicaties/2023/12/21/bit-advies-programma-openvms
  26. Example 2: Traffic Management • Replace 26 traffic management systems

    • Adopt & customize COTS solution • 2015-2026, budget €166M (originally €35M) • Assessment in 2022: Half (€83M) of budget spent 31
  27. Advice 1. Align ambition and capabilities 2. Take lead over

    supplier 3. Prioritize moving maintenance and operations to line 4. Organize fallback scenario 32
  28. Example 3: Funding Education • Distribute €32B/year over all Dutch

    educational institutions • Modernize current .NET systems ▪ Target: Rule-based platform + Java • 2019-2024, budget €18M ▪ €12M spent in 2022 • Assessments in 2021 and 2023 33
  29. Advice • Professionalize culture: ▪ Governance & finance ▪ Development

    & maintenance • Terminate use of (niche) rule-based platform • Invest in current .NET systems to safeguard continuity • Plan for stepwise modernization 34 https://www.adviescollegeicttoetsing.nl/onderzoeken/documenten/publicaties/2023/09/25/bit-advies-doorontwikkelen-applicatielandschap-bekostiging-2
  30. [ MODELS in Assessments? ] Prevalence Low code: • Blueriq,

    Mendix, Oracle Apex, … Domain-specific frameworks • Rule, case, law, document management and archiving • Real estate, traffic, … • ERP, SAP, MS Dynamics, … General modeling / UML, SysML Risks Identified Investments in ‘economies of scope’ shared with too few stakeholders • Vendor lock-in • Niche technology • Lifetime of technology Use of models not a clear differentiator between success and failure 35
  31. Recurring Advice Based on manual analysis of reports: 1. Reduce

    risk: Make it smaller 2. Articulate needs 3. Strengthen governance 4. Define mitigation measures 5. Invest in own capabilities 36 Egon Schiele, Leutnant Heinrich Wagner, Wikipedia
  32. Contrasting the two Studies Release planning @ ING • Problem

    well-scoped • Rich, well-structured data • Good fit with quantitative research methods • Clear research results • More research has potential to further optimize solutions Assessments @ AcICT • Problem broad & messy • Thick, but unstructured data • Must resort to qualitative research methods • Hard to get clear results • Tempting to move problems out of (SE) research scope 37
  33. Models to the Rescue? “ ... the most direct benefits

    of MDE can be summarized as: • Increase of communication effectiveness of stakeholders • Increase in the productivity of the development team thanks to the (partial) automation of the development process” 38 2012
  34. Increase in Productivity? • Of little use if you don’t

    know what you want … • Benefits come when combined with shorter feedback cycles • More work for the problem owner! ▪ Give feedback, take into production, manage organizational change • Done well, the benefits then become: ▪ Value delivered earlier; lower costs of failure 39
  35. Increase of Communication Effectiveness? • “Tactical” communication: ▪ Domain model,

    bound to code ▪ Within Eric Evans’s “bounded context” • “Strategic” communication: ▪ High level, coarse-grained “features” ▪ Reason about feasibility, cost, progress ▪ Devise alternative roadmaps ▪ Align with (conflicted) stakeholders 40 Substantial body of work on model-driven approaches that support this In practice strong reliance on scaled agile frameworks like SAFe.
  36. AI to the Rescue? • Capabilities of foundation models are

    mind blowing • This will affect many aspects of software engineering • The ambitions of Artificial General Intelligence reach even higher • Will generative AI solve our problems? 41 Sam Altman (img src Wikipedia)
  37. I Large Language Models for Code 42 When to invoke

    code completion? (AIware 2024) Benchmark for long-context tasks, with JetBrains 600,000 actual code completions, ICSE 2024 Summarizing binaries, SANER 2023 Memorization in LLMs, ICSE 2024 Ambition to boost developer productivity!
  38. Generative AI for the Government? Please let’s not have: Given

    that government is document heavy And that complex projects have a large document footprint The last thing we need is continuous generation of plausible-sounding text no one feels responsible for Needed instead: • Clear sense of direction • To the point documents • Crisp communication • Consensus building • Accountability 43 Supporting this is hard: Models might help; LLMs won’t.
  39. Focused AI Timnit Gebru (SaTML 2023): We should build smaller-scale

    systems (that are well-scoped and well-defined) for which we can provide specifications for expected behavior, tolerance and safety protocols. 44 Timnit Gebru, img Wikipedia https://x.com/NicolasPapernot/status/1623885641380425728
  40. Model-Driven Data-Driven Tycho Brahe (1546-1601) Meticulous data collection of planet

    positions 45 Johannes Kepler (1571-1630) Models / laws of planet movement
  41. We need to: • Think about the value we want

    to bring to society • Have the courage to attack hard / urgent problems • Strengthen strategic communication in IT • Let models learn from data • Focus the use of AI: explainable and with guarantees For each of these, modeling is indispensable 46
  42. Valuable Software Engineering Arie van Deursen, TU Delft MODELS 2024,

    Linz, Austria @[email protected] 47 Adele Bloch-Bauer I, Gustav Klimt, 1907. Wikipedia
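
Note on slide 9 (Measuring Delay: Balanced Relative Error): the two formulas on that slide did not survive in the transcript. The sketch below uses the commonly cited definition of balanced relative error, in which the difference between actual and estimated duration is divided by whichever of the two is smaller. Treat it as an assumption rather than the exact formulation used in the study; the function name and the use of durations in days (instead of delivery dates) are also illustrative choices.

```python
def balanced_relative_error(actual_days: float, estimated_days: float) -> float:
    """Balanced relative error (BRE) of an estimate.

    Assumed definition (not taken from the slide itself):
      late  (actual >= estimated): (actual - estimated) / estimated  -> positive
      early (actual <  estimated): (actual - estimated) / actual     -> negative
    Both cases reduce to dividing by the smaller of the two durations.
    """
    if actual_days <= 0 or estimated_days <= 0:
        raise ValueError("durations must be positive")
    return (actual_days - estimated_days) / min(actual_days, estimated_days)


if __name__ == "__main__":
    # Hypothetical epics: (actual duration, estimated duration) in days.
    for actual, estimated in [(120, 100), (80, 100), (100, 100)]:
        bre = balanced_relative_error(actual, estimated)
        print(f"actual={actual} estimated={estimated} BRE={bre:+.0%}")
```

Dividing by the smaller duration keeps over- and underestimates on a comparable scale: an epic that takes 120 days against a 100-day estimate scores +20%, while one that finishes in 80 days against the same estimate scores -25%.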