Explainable Software Engineering in the Public Sector

Inaugural lecture delivered on March 25, 2025, in the context of my appointment as International Francqui Chair at the University of Namur for the year 2024-2025.

https://www.unamur.be/en/info/chaire-francqui-2025

Abstract:

The field of software engineering seeks to devise theories, methods, tools, and techniques that support the development, operation, and evolution of the digital infrastructure modern society relies on. While software engineering capabilities have advanced substantially over the past decades, it remains challenging to deliver high-quality systems in a timely and cost-effective manner. Government systems in particular have a weak reputation in this respect.

To better understand why, we analyze 125 complex software projects in the public sector in the Netherlands. The projects are described in public reports published by the Advisory Council on IT Assessments (AcICT), which advises the Dutch parliament and cabinet on risks and chances of success in complex Information Technology (IT) projects. The projects span a time period of 10 years, represent a total budget of over 14 billion euros, and cover areas such as tax collection, social security, pensions, health, traffic control, defense, and water management.

We study these reports through the lens of "explainability", focusing on supporting decision making. Furthermore, we reflect on current advances in software engineering, including modern software testing and large language models, in addressing current software engineering challenges.

Arie van Deursen

March 25, 2025

Transcript

  1. Explainable Software Engineering In the Public Sector Francqui Inaugural Lecture

    Namur, March 2025 Arie van Deursen Delft University of Technology @avandeursen@mastodon.acm.org 1
  2. THANK YOU! • Fondation Francqui • University of Namur •

    Xavier Devroey • Team AcICT • SERG Delft • AI4Fintech with ING • AI4SE with JetBrains 2
  3. Software Engineering Research [Empirical methods, theory building] Seek to understand

    the methods and techniques that collaborating people use to develop software systems that bring value to society [Design science, interventions] Use this understanding to propose and evaluate novel software development methods and techniques 3
  4. SE ❤ AI AI4SE: Augment the software development life cycle

    with Artificial Intelligence SE4AI: Adjust the software development process to the needs of AI-based systems 4
  5. Explainable Artificial Intelligence “The ability of AI systems to provide

    clear and understandable explanations for their actions and decisions.” “Its central goal is to make the behaviour of these systems understandable to humans by elucidating the underlying mechanisms of their decision-making processes.” 5 European Data Protection Supervisor (EDPS), 2023
  6. Approaches to XAI Self-Interpretable • White box models • Inherently

    “understandable” • Decision trees, linear regression models, … • Humans can trace back outputs to input features After the Fact • Opaque black box models • “Post-hoc” explanations • Global: Explain model (e.g., feature importance) • Local: Explain how specific output might relate to inputs 6
  7. “Explanation in Artificial Intelligence: Insights from the Social Sciences” (Tim

    Miller, Artificial Intelligence, 2018) An explanation is an answer to a why-question 7
  8. Explanations are Contextual • Contrastive: compared to counterfactual alternative •

    Selective: focusing on relevant parts of full causal chain • Social: transferring knowledge, assuming prior knowledge (Tim Miller, Artificial Intelligence, 2018) 8
  9. Counterfactual Explanations • Factual: Model denies loan • Counterfactual: Alternative

    inputs that would accept loan • Algorithmic recourse: Change of behavior to get desired outcome 9
  10. A Library for Generating Counterfactuals • Possible, faithful, plausible, “close”

    to the factual, … • Gradient descent in feature space (with extra cost terms) • Leverage ‘energy’ in input data seen during training • Macro-effects after recourse • Train for better counterfactuals • Rich library of Julia packages 10 https://github.com/JuliaTrustworthyAI Patrick Altmeyer JuliaCon, 2022, 2023 IEEE SaTML, 2023 AAAI 2024
  11. Explainable Software Engineering System Explanations The ability of software systems

    to provide clear and understandable explanations for their actions and decisions Development Explanations The ability of software development organizations to provide clear and understandable explanations for their actions and decisions 11
  12. 12 Tech sector • $$$ income from ads • Billions

    of users • 10s of 1000s of developers (large tech corps) • Metrics driven • High developer autonomy Financial sector • €€€ interest on assets and liabilities • Billions of transactions • 1000s of developers (medium-sized bank) • Critical to society • Highly regulated Public sector • €€€ taxes • Billions of transactions • 1000s of developers (ministry) • All aspects of society • Public accountability Img: Maeslantkering, Rijkswaterstaat
  13. Public Sector Challenges • Complex (political) decision making • High

    demands on privacy, availability, transparency, inclusion, accessibility,… • Long (system, data) lifetimes • Accountability to minister, parliament, voter • Poor government digitalization undermines trust and democracy 13
  14. Advisory Council for IT Assessment (AcICT) • Dutch independent council

    (2015) • Advises ministers and parliament on risks and chances of success in complex government IT systems • Enshrined in law since 2024: § Govt obliged to submit systems for assessment, collaborate, and respond • All reports are public 14 https://www.adviescollegeicttoetsing.nl/
  15. Council Organization • Five cabinet-appointed council members § Experts from

    society, industry and academia • Supported by office of ~25 assessors • Assessment takes around six months • Data collection, interviews, analysis and advice formulation, fact checking, response from minister, … • Outcome: 8-10 page advice to minister 15
  16. A Word of Caution • Council focuses on large (>

    €5M) projects § From these, council selects high risk projects § This gives biased, distorted view of government IT • All sectors have failures, but these are less … public • The public sector also has plenty of successes • The public sector is full of hard-working, amazing professionals dedicated to making society better 16 Council members, 2024
  17. Assessment Framework Risk Areas 1. Business case, benefits, finance 2.

    Project organization and ownership 3. Risk management and project dependencies 4. Alignment business processes and IT solution 5. Scope control 6. Architecture, functional feasibility, technical realizability 7. Planning and realization 8. Procurement, tendering 9. Acceptance and transition to line 19 https://www.adviescollegeicttoetsing.nl/onze-werkwijze/documenten/publicaties/2021/12/01/toetskader-acict
  18. Levels of Impact Manual inspection of 100 reports Identified 3

    types of impact 1. MINOR: Continue, with suggestions for improvement 2. REVISE: Continue, with urgent interventions 3. MAJOR: Abort or major interventions 21
  19. Project Types • Build new system • Replace existing system

    • Adjust existing system substantially • Engage in major new procurement agreement 22
  20. Impact Differences per Project Type 23 Green field / replacement:

    • 35-40% revise • 35-40% major Evolution: 72% (13/18) minor impact
  21. Example 1: OpenVMS • Unemployment benefit systems S1 & S2:

    § From: Cobol + Codasyl on Itanium/OpenVMS § To: Java + relational database on Linux • Automated code conversion • 2020-2025, budget €36M • Assessment in 2023 § Half (€19M) of budget spent 24
  22. Advice • Halfway through assessment, organization itself decided to terminate project •

    Hard to maintain code explosion: “JOBOL” Advice: 1. S1 (no Codasyl): Replatform to Linux 2. S2: Develop multiple alternative scenarios 3. Draw lessons from failed conversion 25 https://www.adviescollegeicttoetsing.nl/documenten/publicaties/2023/12/21/bit-advies-programma-openvms
  23. Example 2: Traffic Management • Replace 26 traffic management systems

    • Adopt & customize solution based on commercial product • 2015-2026, budget €166M (originally €35M) • Assessment in 2022: Half (€83M) of budget spent 26
  24. Advice 1. Align ambition and capabilities 2. Take lead over

    supplier 3. Prioritize moving maintenance and operations to line 4. Organize fallback scenario 27
  25. Example 3: Funding Education • Distribute €32B/year over all Dutch

    educational institutions • Modernize current .NET systems § Target: Rule-based platform + Java • 2019-2024, budget €18M § €12M spent in 2022 • Assessments in 2021 and 2023 28
  26. Advice • Professionalize culture: § Governance & finance § Development

    & maintenance • Terminate use of (niche) rule-based platform • Invest in current .NET systems to safeguard continuity • Plan for stepwise modernization 29 https://www.adviescollegeicttoetsing.nl/onderzoeken/documenten/publicaties/2023/09/25/bit-advies-doorontwikkelen-applicatielandschap-bekostiging-2
  27. Recurring Advice Based on manual analysis of reports: 1. Reduce

    risk: Make it smaller 2. Articulate needs 3. Strengthen governance 4. Define mitigation measures 5. Invest in own capabilities 30 Egon Schiele, Leutnant Heinrich Wagner, Wikipedia
  28. Intermezzo: The US Dept of the Treasury • Handling $5.45

    trillion in federal payments • Managing U.S. government debt • Collecting federal taxes (via Internal Revenue Service IRS) • Enforcing tax laws • Currency circulation • Oversight on financial sector 31 Alexander Hamilton
  29. IRS IT in 2023 Inflation Reduction Act: • 10 year,

    $80 billion extra for IRS • Become a “digital first” tax collector • Reduce $7 trillion of uncollected taxes • Allocate $4.8 billion for IT modernization • Grow IRS from 80,000 to 165,000 employees 32 Janet Yellen
  30. US Treasury IT in 2025 • DOGE team gained access

    to Treasury’s § payment data (privacy) § payment systems code (integrity) • Intent to fire half of 100,000 IRS employees 33 “But the most alarming aspect [is …] the systematic dismantling of security measures that would detect and prevent misuse […] by removing the career officials in charge of those security measures and replacing them with inexperienced operators.” Bruce Schneier, Foreign Policy, Feb 2025
  31. End of US Intermezzo: Thoughts and Prayers • Support civil

    servants who are keeping critical systems up and running • Train our students to act in accordance with codes of ethics and professional conduct (e.g. from Assoc. Computing Machinery ACM) • Resist the illusion that a posse of whiz kids can quickly solve deeply complex (IT) problems • Undermining government IT can bring a nation down 34 Timothy Snyder & Nora Krug. On Tyranny. Twenty Lessons from the Twentieth Century. Graphic Edition, 2021
  32. AI to the Rescue? • Capabilities of foundation models are

    mind blowing • This will affect many aspects of society, including government IT • The ambitions of Artificial General Intelligence reach even higher • Will (generative) AI solve our problems? 35 Sam Altman (img src Wikipedia)
  33. Nothing Beats Good Data • The best AI investment may

    be in your data • Cherish your data, keep it clean • Diffusion and auto-encoders can add superpowers to it • Good data forms basis for good synthetic data • Good data = open data 36 Florence Nightingale, 1855 “Causes of mortality in the army in the east”
  34. I ❤ Large Language Models (for Code) 37 When to

    invoke code completion? (AIware, 2024, 🏆) Benchmark for long-context tasks, with JetBrains 600,000 actual code completions, ICSE 2024 Summarizing binaries, SANER 2023 Memorization in LLMs, ICSE 2024
  35. LLMs for Developers • Coding is language-centered • Development is

    data rich • Fertile ground for large language models: § Powerful tools in hands of skilled developers § Promise of better code with less effort 38
  36. Coding Productivity? • Improving productivity is of little use …

    … if you don’t know what you want • Benefits come when combined with shorter feedback cycles • More work for the “problem owner!” § Give feedback, take into production, manage organizational change • Done well, the benefits then become: § Value delivered earlier; lower costs of failure 39
  37. Language Models for the Government? • Government and its (IT)

    projects are document-heavy • LLMs can easily generate plausible-sounding text no one takes responsibility for § This will slow down rather than speed up government (IT) projects • What’s needed instead: § Clear sense of direction and crisp documentation § Consensus building and accountability 40 Img: JuBi, Wikipedia
  38. Focused AI Timnit Gebru (SaTML 2023): “We should build smaller-scale

    systems (that are well-scoped and well-defined) for which we can provide specifications for expected behavior, tolerance and safety protocols.” 41 Timnit Gebru, img Wikipedia https://x.com/NicolasPapernot/status/1623885641380425728
  39. Modernization • IT systems have life spans of 30+ years

    • Constant change in user expectations, system landscape, technology, … • Large-scale “modernizations” high risk • Council recommendations: 1. Invest in understanding current landscape 2. Modernize in small steps 3. Avoid by pro-active improvements 42
  40. Software Testing • The systematic selection of scenarios from the

    infinite executions domain • So that their pass ratio informs progress monitoring and decisions to ship changes to production • Given that the execution of a scenario is easier to grasp than a structural view on the system 43
  41. The Namur-Francqui Explainability Lectures • A new lens on software

    engineering • Build on software testing theory • A ‘test case’ is an ‘executable example’ • Students work together, collecting ‘why-questions’ and explanations • Exploring systematic approaches to ‘explanation generation’ 44
  42. Open Learning • Learn from each other § All students

    can see all results • Build on your own expertise § Students take their own knowledge as starting point • Set your own goals § Students have freedom to set their own direction • Engage with the outside world § Students reach out to practice and practitioners 45 Erik Duval
  43. Dialectic Learning in Software Engineering 1. Study / internalize existing

    software engineering theories and approaches 2. Do it! Apply these in development activities in realistic setting 3. Confront the two with each other § How does this theory really work? § Does this theory apply to my system? Why? Why not? Thesis: Theory Anti-Thesis: Practice Synthesis: Understanding 46
  44. System Under Study: CryptPad • Open source suite of office

    applications that runs in the browser • All data end-to-end encrypted in the browser • Server only receives encrypted data it cannot grasp • Sharing and collaborative editing without anyone watching [ EU-made, open source, private alternative to Google Docs ] 47 https://cryptpad.fr
  45. 48

  46. Key Take Aways 1. Government digitalization is facing big challenges

    that demand our attention 2. Well functioning IT systems require long term stable leadership and investments 3. (Generative) AI is no silver bullet: We need focused AI 4. SE research needs a stronger focus on process predictability and explainability 49
  47. Explainable Software Engineering In the Public Sector Francqui Inaugural Lecture

    Namur, March 2025 Arie van Deursen Delft University of Technology @avandeursen@mastodon.acm.org 50
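
Slides 9 and 10 describe counterfactual explanations found by gradient descent in feature space with extra cost terms. As a rough illustration only, the idea can be sketched in Python. Everything in this sketch is an assumption made for the example: the loan model is a hypothetical two-feature logistic regression with made-up weights, the names `predict` and `counterfactual` are invented here, and none of it reflects the API of the Julia packages mentioned in the lecture.

```python
import math

# Hypothetical loan model (assumption, not from the lecture):
# logistic regression over two features, (income, debt).
WEIGHTS = [4.0, -3.0]  # illustrative values only
BIAS = -1.0


def predict(x):
    """Return the model's probability that the loan is accepted."""
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


def counterfactual(x_factual, target=1.0, lam=0.1, lr=0.1, steps=500):
    """Search for a nearby input that the model would accept.

    Runs gradient descent in feature space on
        loss(x) = (predict(x) - target)^2 + lam * ||x - x_factual||^2
    where the second term is the extra cost that keeps the
    counterfactual close to the factual input.
    """
    x = list(x_factual)
    for _ in range(steps):
        p = predict(x)
        if p > 0.5:  # decision flipped: the loan would now be accepted
            break
        for i in range(len(x)):
            # Derivative of the squared prediction error w.r.t. x[i].
            grad_pred = 2.0 * (p - target) * p * (1.0 - p) * WEIGHTS[i]
            # Derivative of the distance penalty w.r.t. x[i].
            grad_dist = 2.0 * lam * (x[i] - x_factual[i])
            x[i] -= lr * (grad_pred + grad_dist)
    return x


if __name__ == "__main__":
    factual = [0.3, 0.6]  # hypothetical applicant: low income, high debt
    cf = counterfactual(factual)
    print("factual:", factual, "accept probability:", predict(factual))
    print("counterfactual:", cf, "accept probability:", predict(cf))
```

The difference between the counterfactual and the factual input (here, more income and less debt) is what an applicant could read as algorithmic recourse. The sketch covers only the "close to the factual" criterion. The plausibility and faithfulness criteria listed on slide 10 would need further cost terms.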