Explainable Software Engineering in the Public Sector

Francqui Inaugural Lecture, Namur, March 2025
Arie van Deursen, Delft University of Technology

THANK YOU!
• Fondation Francqui
• University of Namur
• Xavier Devroey
• Team AcICT
• SERG Delft
• AI4Fintech with ING
• AI4SE with JetBrains

Software Engineering Research
[Empirical methods, theory building] Seek to understand the methods and techniques that collaborating people use to develop software systems that bring value to society
[Design science, interventions] Use this understanding to propose and evaluate novel software development methods and techniques

SE ❤ AI
AI4SE: Augment the software development life cycle with Artificial Intelligence
SE4AI: Adjust the software development process to the needs of AI-based systems

Explainable Artificial Intelligence
“The ability of AI systems to provide clear and understandable explanations for their actions and decisions.”
“Its central goal is to make the behaviour of these systems understandable to humans by elucidating the underlying mechanisms of their decision-making processes.”
European Data Protection Supervisor (EDPS), 2023

Approaches to XAI
Self-Interpretable
• White box models
• Inherently “understandable”
• Decision trees, linear regression models, …
• Humans can trace back outputs to input features
After the Fact
• Opaque black box models
• “Post-hoc” explanations
• Global: Explain model (e.g., feature importance)
• Local: Explain how specific output might relate to inputs

“Explanation in Artificial Intelligence: Insights from the Social Sciences” (Tim Miller, Artificial Intelligence, 2018)
An explanation is an answer to a why-question

Explanations are Contextual
• Contrastive: compared to counterfactual alternative
• Selective: focusing on relevant parts of full causal chain
• Social: transferring knowledge, assuming prior knowledge
(Tim Miller, Artificial Intelligence, 2018)

Counterfactual Explanations
• Factual: Model denies loan
• Counterfactual: Alternative inputs that would accept loan
• Algorithmic recourse: Change of behavior to get desired outcome

A Library for Generating Counterfactuals
• Possible, faithful, plausible, “close” to the factual, …
• Gradient descent in feature space (with extra cost terms)
• Leverage ‘energy’ in input data seen during training
• Macro-effects after recourse
• Train for better counterfactuals
• Rich library of Julia packages
https://github.com/JuliaTrustworthyAI
Patrick Altmeyer: JuliaCon 2022, 2023; IEEE SaTML 2023; AAAI 2024
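
The loan example and the "gradient descent in feature space" bullet can be sketched in a few lines. This is a minimal illustration in plain Python, not the API of the JuliaTrustworthyAI packages: the logistic loan model, its weights, and the names `predict` and `find_counterfactual` are assumptions invented for this example. The search minimizes a loss that rewards the desired outcome (approval) plus a cost term that penalizes distance from the factual.

```python
import math


def predict(weights, bias, x):
    """Approval probability under a hypothetical logistic loan model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def find_counterfactual(weights, bias, factual, lam=0.1, lr=0.1, max_steps=1000):
    """Search for an input close to `factual` that the model would approve.

    Runs gradient descent on  -log p(x) + lam * ||x - factual||^2
    and stops as soon as the approval probability reaches 0.5.
    """
    x = list(factual)
    for _ in range(max_steps):
        p = predict(weights, bias, x)
        if p >= 0.5:
            break
        # Gradient of -log p(x) is (p - 1) * w; the cost term adds 2 * lam * (x - factual).
        grad = [(p - 1.0) * w + 2.0 * lam * (xi - fi)
                for w, xi, fi in zip(weights, x, factual)]
        x = [xi - lr * g for xi, g in zip(x, grad)]
    return x


if __name__ == "__main__":
    # Hypothetical features: [income, debt], both on an arbitrary normalized scale.
    weights, bias = [1.5, -2.0], -1.0
    applicant = [0.5, 1.0]
    alternative = find_counterfactual(weights, bias, applicant)
    print("factual approval probability:", predict(weights, bias, applicant))
    print("counterfactual input:", alternative)
    print("counterfactual approval probability:", predict(weights, bias, alternative))
```

The cost weight `lam` is the knob that trades "close to the factual" against reaching the desired outcome; the plausibility and faithfulness criteria listed on the slide would need additional terms that this sketch leaves out.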

Explainable Software Engineering
System Explanations: The ability of software systems to provide clear and understandable explanations for their actions and decisions
Development Explanations: The ability of software development organizations to provide clear and understandable explanations for their actions and decisions

Tech sector
• $$$ income from ads
• Billions of users
• 10s of 1000s of developers (large tech corps)
• Metrics driven
• High developer autonomy
Financial sector
• €€€ interest on assets and liabilities
• Billions of transactions
• 1000s of developers (medium-sized bank)
• Critical to society
• Highly regulated
Public sector
• €€€ taxes
• Billions of transactions
• 1000s of developers (ministry)
• All aspects of society
• Public accountability
Img: Maeslantkering, Rijkswaterstaat

Public Sector Challenges
• Complex (political) decision making
• High demands on privacy, availability, transparency, inclusion, accessibility, …
• Long (system, data) lifetimes
• Accountability to minister, parliament, voter
• Poor government digitalization undermines trust and democracy

Advisory Council for IT Assessment (AcICT)
• Dutch independent council (2015)
• Advises ministers and parliament on risks and chances of success in complex government IT systems
• Enshrined in law since 2024:
§ Government obliged to submit systems for assessment, collaborate, and respond
• All reports are public
https://www.adviescollegeicttoetsing.nl/

Council Organization
• Five cabinet-appointed council members
§ Experts from society, industry and academia
• Supported by office of ~25 assessors
• Assessment takes around six months
• Data collection, interviews, analysis and advice formulation, fact checking, response from minister, …
• Outcome: 8-10 page advice to minister

A Word of Caution
• Council focuses on large (> €5M) projects
§ From these, council selects high risk projects
§ This gives biased, distorted view of government IT
• All sectors have failures, but these are less … public
• The public sector also has plenty of successes
• The public sector is full of hard-working, amazing professionals dedicated to making society better
Img: Council members, 2024

10-20 Assessments per Year; Spread over Ministries
Dataset of > 100 public reports. Analysis is work in progress

Project Cost
Total: €12.8 billion
Median: €34 million
Maximum: Defense, €3.2 billion

Assessment Framework Risk Areas
1. Business case, benefits, finance
2. Project organization and ownership
3. Risk management and project dependencies
4. Alignment business processes and IT solution
5. Scope control
6. Architecture, functional feasibility, technical realizability
7. Planning and realization
8. Procurement, tendering
9. Acceptance and transition to line
https://www.adviescollegeicttoetsing.nl/onze-werkwijze/documenten/publicaties/2021/12/01/toetskader-acict

Risk area prevalence in 100 reports

Levels of Impact
Manual inspection of 100 reports identified 3 types of impact:
1. MINOR: Continue, with suggestions for improvement
2. REVISE: Continue, with urgent interventions
3. MAJOR: Abort or major interventions

Project Types
• Build new system
• Replace existing system
• Adjust existing system substantially
• Engage in major new procurement agreement

Impact Differences per Project Type
Green field / replacement:
• 35-40% revise
• 35-40% major
Evolution: 72% (13/18) minor impact

Example 1: OpenVMS
• Unemployment benefit systems S1 & S2:
§ From: Cobol + Codasyl on Itanium/OpenVMS
§ To: Java + relational database on Linux
• Automated code conversion
• 2020-2025, budget €36M
• Assessment in 2023
§ Half (€19M) of budget spent

Advice
• Halfway through the assessment, the organization itself decided to terminate the project
• Hard-to-maintain code explosion: “JOBOL”
Advice:
1. S1 (no Codasyl): Replatform to Linux
2. S2: Develop multiple alternative scenarios
3. Draw lessons from failed conversion
https://www.adviescollegeicttoetsing.nl/documenten/publicaties/2023/12/21/bit-advies-programma-openvms

Example 2: Traffic Management
• Replace 26 traffic management systems
• Adopt & customize solution based on commercial product
• 2015-2026, budget €166M (originally €35M)
• Assessment in 2022: Half (€83M) of budget spent

Advice
1. Align ambition and capabilities
2. Take lead over supplier
3. Prioritize moving maintenance and operations to line
4. Organize fallback scenario

Example 3: Funding Education
• Distribute €32B/year over all Dutch educational institutions
• Modernize current .NET systems
§ Target: Rule-based platform + Java
• 2019-2024, budget €18M
§ €12M spent in 2022
• Assessments in 2021 and 2023

Advice
• Professionalize culture:
§ Governance & finance
§ Development & maintenance
• Terminate use of (niche) rule-based platform
• Invest in current .NET systems to safeguard continuity
• Plan for stepwise modernization
https://www.adviescollegeicttoetsing.nl/onderzoeken/documenten/publicaties/2023/09/25/bit-advies-doorontwikkelen-applicatielandschap-bekostiging-2

Recurring Advice
Based on manual analysis of reports:
1. Reduce risk: Make it smaller
2. Articulate needs
3. Strengthen governance
4. Define mitigation measures
5. Invest in own capabilities
Img: Egon Schiele, Leutnant Heinrich Wagner, Wikipedia

Intermezzo: The US Dept of the Treasury
• Handling $5.45 trillion in federal payments
• Managing U.S. government debt
• Collecting federal taxes (via Internal Revenue Service, IRS)
• Enforcing tax laws
• Currency circulation
• Oversight on financial sector
Img: Alexander Hamilton

IRS IT in 2023
Inflation Reduction Act:
• 10-year, $80 billion extra for IRS
• Become a “digital first” tax collector
• Reduce $7 trillion of uncollected taxes
• Allocate $4.8 billion for IT modernization
• Grow IRS from 80,000 to 165,000 employees
Img: Janet Yellen

US Treasury IT in 2025
• DOGE team gained access to Treasury’s
§ payment data (privacy)
§ payment systems code (integrity)
• Intent to fire half of 100,000 IRS employees
“But the most alarming aspect [is …] the systematic dismantling of security measures that would detect and prevent misuse […] by removing the career officials in charge of those security measures and replacing them with inexperienced operators.”
Bruce Schneier, Foreign Policy, Feb 2025

End of US Intermezzo: Thoughts and Prayers
• Support civil servants who are keeping critical systems up and running
• Train our students to act in accordance with codes of ethics and professional conduct (e.g., from the Association for Computing Machinery, ACM)
• Resist the illusion that a posse of whiz kids can quickly solve deeply complex (IT) problems
• Undermining government IT can bring a nation down
Timothy Snyder & Nora Krug. On Tyranny: Twenty Lessons from the Twentieth Century. Graphic Edition, 2021

AI to the Rescue?
• Capabilities of foundation models are mind blowing
• This will affect many aspects of society, including government IT
• The ambitions of Artificial General Intelligence reach even higher
• Will (generative) AI solve our problems?
Img: Sam Altman (src Wikipedia)

Nothing Beats Good Data
• The best AI investment may be in your data
• Cherish your data, keep it clean
• Diffusion and auto-encoders can add superpowers to it
• Good data forms basis for good synthetic data
• Good data = open data
Img: Florence Nightingale, 1855, “Causes of mortality in the army in the east”

I ❤ Large Language Models (for Code)
• When to invoke code completion? (AIware, 2024, 🏆)
• Benchmark for long-context tasks, with JetBrains
• 600,000 actual code completions, ICSE 2024
• Summarizing binaries, SANER 2023
• Memorization in LLMs, ICSE 2024

LLMs for Developers
• Coding is language-centered
• Development is data rich
• Fertile ground for large language models:
§ Powerful tools in hands of skilled developers
§ Promise of better code with less effort

Coding Productivity?
• Improving productivity is of little use if you don’t know what you want
• Benefits come when combined with shorter feedback cycles
• More work for the “problem owner”!
§ Give feedback, take into production, manage organizational change
• Done well, the benefits then become:
§ Value delivered earlier; lower costs of failure

Language Models for the Government?
• Government and its (IT) projects are document-heavy
• LLMs can easily generate plausible-sounding text that no one takes responsibility for
§ This will slow down rather than speed up government (IT) projects
• What’s needed instead:
§ Clear sense of direction and crisp documentation
§ Consensus building and accountability
Img: JuBi, Wikipedia

Focused AI
Timnit Gebru (SaTML 2023): “We should build smaller-scale systems (that are well-scoped and well-defined) for which we can provide specifications for expected behavior, tolerance and safety protocols.”
Img: Timnit Gebru, Wikipedia
https://x.com/NicolasPapernot/status/1623885641380425728

Modernization
• IT systems have life spans of 30+ years
• Constant change in user expectations, system landscape, technology, …
• Large-scale “modernizations” are high risk
• Council recommendations:
1. Invest in understanding current landscape
2. Modernize in small steps
3. Avoid large modernizations through proactive improvements

Software Testing
• The systematic selection of scenarios from the infinite execution domain
• So that their pass ratio informs progress monitoring and decisions to ship changes to production
• Given that the execution of a scenario is easier to grasp than a structural view on the system
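
The definition on this slide can be made concrete with a small sketch in Python. Everything in it is hypothetical and invented for illustration: the benefit rule `weekly_benefit`, its 26-week threshold and cap, and the scenario list. Each scenario is one executable example selected from the infinite execution domain, and `pass_ratio` is the single number that would feed progress monitoring or a decision to ship.

```python
from typing import Callable, List, Tuple


def weekly_benefit(last_weekly_wage: float, weeks_worked: int) -> float:
    """Hypothetical rule: 70% of the last wage, capped at 500, after 26 weeks of work."""
    if weeks_worked < 26:
        return 0.0
    return min(0.7 * last_weekly_wage, 500.0)


# A scenario pairs a human-readable description with an executable check.
Scenario = Tuple[str, Callable[[], bool]]

SCENARIOS: List[Scenario] = [
    ("too few weeks worked gives no benefit",
     lambda: weekly_benefit(400.0, 10) == 0.0),
    ("regular case pays 70 percent of the wage",
     lambda: abs(weekly_benefit(400.0, 30) - 280.0) < 1e-9),
    ("high wages are capped",
     lambda: weekly_benefit(2000.0, 52) == 500.0),
    ("exactly at the threshold qualifies",
     lambda: weekly_benefit(100.0, 26) > 0.0),
]


def pass_ratio(scenarios: List[Scenario]) -> float:
    """Fraction of scenarios whose check succeeds."""
    passed = sum(1 for _, check in scenarios if check())
    return passed / len(scenarios)


if __name__ == "__main__":
    for description, check in SCENARIOS:
        print("PASS" if check() else "FAIL", "-", description)
    print("pass ratio:", pass_ratio(SCENARIOS))
```

The scenario descriptions read as statements a non-developer can follow, which is the point of the third bullet: one concrete execution is easier to grasp than the structure of the code behind it.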

The Namur-Francqui Explainability Lectures
• A new lens on software engineering
• Build on software testing theory
• A ‘test case’ is an ‘executable example’
• Students work together, collecting ‘why-questions’ and explanations
• Exploring systematic approaches to ‘explanation generation’

Open Learning
• Learn from each other
§ All students can see all results
• Build on your own expertise
§ Students take their own knowledge as starting point
• Set your own goals
§ Students have freedom to set their own direction
• Engage with the outside world
§ Students reach out to practice and practitioners
Img: Erik Duval

Dialectic Learning in Software Engineering
1. Study / internalize existing software engineering theories and approaches
2. Do it! Apply these in development activities in a realistic setting
3. Confront the two with each other
§ How does this theory really work?
§ Does this theory apply to my system? Why? Why not?
Thesis: Theory
Anti-Thesis: Practice
Synthesis: Understanding

System Under Study: CryptPad
• Open source suite of office applications that runs in the browser
• All data end-to-end encrypted in the browser
• Server only receives encrypted data it cannot grasp
• Sharing and collaborative editing without anyone watching
[ EU-made, open source, private alternative to Google Docs ]
https://cryptpad.fr

Key Take Aways
1. Government digitalization is facing big challenges that demand our attention
2. Well functioning IT systems require long term stable leadership and investments
3. (Generative) AI is no silver bullet: We need focused AI
4. SE research needs a stronger focus on process predictability and explainability

