Explainable Software Engineering

Keynote delivered at the Bits & Chips Event held in Eindhoven, The Netherlands, on October 12, 2023.

In artificial intelligence (AI), it’s increasingly recognized that components that learn from data need to be explainable. In this talk, we take explainability one step further, using it as a lens to rethink the full software engineering life cycle. To that end, we consider explainability of both the software engineering process and the resulting software system. We use this to shed new light on bug fixing, the use of language models during coding, delay prediction during agile planning, and the digitalization of society. Based on this, we envision a future of software engineering in which explainability is a first-class citizen.

The research covered in the talk was co-sponsored in part by ING Bank, through AI for Fintech Research (AFR), an ICAI Lab (2020-2024). The presentation also served as the official announcement of AI4SE, a five-year collaboration between JetBrains Research and TU Delft.

Relevant links and papers covered:

- Tim Miller. Explanation in artificial intelligence: Insights from the social sciences. https://arxiv.org/abs/1706.07269

- Patrick Altmeyer, Arie van Deursen, Cynthia Liem. Explaining Black-Box Models through Counterfactuals. JuliaCon 2023. https://proceedings.juliacon.org/papers/10.21105/jcon.00130

- Patrick Altmeyer, Giovan Angela, Aleksander Buszydlik, Karol Dobiczek, Arie van Deursen, Cynthia Liem. Endogenous Macrodynamics in Algorithmic Recourse. IEEE SaTML, 2023. https://openreview.net/pdf?id=-LFT2YicI9v

- Taija: Trustworthy Artificial Intelligence in Julia. https://github.com/JuliaTrustworthyAI

- Matthías Páll Gissurarson, Leonhard Applis, Annibale Panichella, Arie van Deursen, David Sands. PropR: property-based automatic program repair. ICSE 2022. https://dl.acm.org/doi/abs/10.1145/3510003.3510620

- Code4Me. https://code4me.me/

- The AI for Software Engineering (AI4SE) Lab. JetBrains & TU Delft, 2023. https://lp.jetbrains.com/research/ai-for-se/

- Elvan Kula, Eric Greuter, Arie van Deursen, Georgios Gousios. Dynamic Prediction of Delays in Software Projects using Delay Patterns and Bayesian Modeling. FSE 2023. https://arxiv.org/abs/2309.12449

- Elvan Kula, Eric Greuter, Arie van Deursen, Georgios Gousios. Factors Affecting On-Time Delivery in Large-Scale Agile Software Development. IEEE TSE 2022. https://ieeexplore.ieee.org/abstract/document/9503331

- Adviescollege ICT Toetsing, AcICT / Advisory Council on IT Assessment. 2023. https://www.adviescollegeicttoetsing.nl/

Arie van Deursen

October 12, 2023

Transcript

  1. “Explanation in Artificial Intelligence: Insights from the Social Sciences” (Tim Miller, Artificial Intelligence, 2018)
    An explanation is an answer to a why-question
  2. Explanations are Contextual
    • Contrastive: compared to counterfactual alternative
    • Selective: focusing on relevant parts of full causal chain
    • Social: transferring knowledge, assuming prior knowledge
    (Tim Miller, Artificial Intelligence, 2018)
  3. Counterfactual Reasoning
    • Factual: Model denies loan
    • Counterfactual: Alternative inputs that would accept loan
    • Algorithmic recourse: Change of behavior to get desired outcome
    Patrick Altmeyer
  4. A Library for Generating Counterfactuals
    • Possible, faithful, plausible, “close” to the factual, …
    • Gradient descent in feature space (with extra cost terms)
    • Rich library of Julia packages
    • Macro-effects after recourse adoption
    https://github.com/JuliaTrustworthyAI
    Patrick Altmeyer; IEEE SaTML, 2023; JuliaCon, 2022, 2023
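
The idea of "gradient descent in feature space (with extra cost terms)" can be illustrated with a minimal sketch. It is written in Python rather than Julia and does not use or mirror the API of the Taija packages. The logistic loan model, its weights, the feature names, and the `cost_weight` value are all hypothetical choices made for illustration.

```python
import math


def approval_probability(weights, bias, features):
    """Toy logistic model: probability that the loan is approved."""
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def find_counterfactual(weights, bias, factual, cost_weight=0.05,
                        learning_rate=0.1, max_steps=1000):
    """Move the input by gradient descent until the toy model approves.

    Loss = cross-entropy towards "approve" + cost_weight * ||x - factual||^2.
    The second term is the extra cost that keeps the result close to the factual.
    """
    x = list(factual)
    for _ in range(max_steps):
        p = approval_probability(weights, bias, x)
        if p > 0.5:  # decision flipped: stop at the first accepted input
            break
        # d(cross-entropy)/dx = (p - 1) * w
        # d(cost)/dx          = 2 * cost_weight * (x - factual)
        gradient = [(p - 1.0) * w + 2.0 * cost_weight * (xi - fi)
                    for w, xi, fi in zip(weights, x, factual)]
        x = [xi - learning_rate * g for xi, g in zip(x, gradient)]
    return x


# Hypothetical features: [income, debt], both on an arbitrary normalized scale.
WEIGHTS, BIAS = [1.5, -2.0], -1.0
factual = [0.5, 0.8]            # the toy model denies this applicant
counterfactual = find_counterfactual(WEIGHTS, BIAS, factual)
```

The difference between `counterfactual` and `factual` is the recourse suggested by this toy model: a higher income and a lower debt. Whether such a change is plausible or actionable is exactly what the additional criteria on the slide are about, and this sketch does not address them.
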
  5. (Automated) Program Repair
    • Current (factual) code: yields undesired behavior
    • Imagined (counterfactual) code: should meet desired behavior!
    • Formulate as (evolutionary) search
      § Population of candidate fixes
      § Cross-over & mutation to transform fixes
      § Fitness function (test failure ratio) to guide selection
    https://program-repair.org/
    Elena @ One day at a time
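
The search formulation on this slide (population, cross-over, mutation, test failure ratio as fitness) can be sketched on a toy problem. Everything here is a hypothetical illustration: the buggy `max`-like function, the three-gene encoding of a candidate fix, the test suite, and the parameters are assumptions, and none of it represents how any specific repair tool works.

```python
import random

# Hypothetical test suite for a function that should return max(a, b).
TESTS = [((1, 2), 2), ((5, 3), 5), ((4, 4), 4), ((-1, -7), -1)]
COMPARATORS = ["<", ">", "<=", ">="]
OPERANDS = ["a", "b"]


def run(candidate, a, b):
    """Evaluate a candidate of the form: THEN if a CMP b else ELSE."""
    then_value, comparator, else_value = candidate
    holds = {"<": a < b, ">": a > b, "<=": a <= b, ">=": a >= b}[comparator]
    values = {"a": a, "b": b}
    return values[then_value] if holds else values[else_value]


def failure_ratio(candidate):
    """Fitness: fraction of failing tests (0.0 means all tests pass)."""
    failures = sum(1 for (a, b), expected in TESTS
                   if run(candidate, a, b) != expected)
    return failures / len(TESTS)


def mutate(candidate, rng):
    """Replace one randomly chosen gene by a random alternative."""
    genes = list(candidate)
    i = rng.randrange(3)
    genes[i] = rng.choice(COMPARATORS if i == 1 else OPERANDS)
    return tuple(genes)


def crossover(first, second, rng):
    """Single-point cross-over of two candidates."""
    cut = rng.randrange(1, 3)
    return first[:cut] + second[cut:]


def repair(buggy, generations=30, size=8, seed=0):
    rng = random.Random(seed)
    population = [buggy] + [mutate(buggy, rng) for _ in range(size - 1)]
    for _ in range(generations):
        population.sort(key=failure_ratio)
        if failure_ratio(population[0]) == 0.0:
            break
        parents = population[: size // 2]  # the fittest half survives
        children = [mutate(crossover(rng.choice(parents),
                                     rng.choice(parents), rng), rng)
                    for _ in range(size - len(parents))]
        population = parents + children
    return min(population, key=failure_ratio)


buggy = ("a", "<", "b")   # i.e. `a if a < b else b`, which computes the minimum
best = repair(buggy)
```

Because the fittest candidates always survive, `failure_ratio(best)` can never be worse than that of the buggy original. A candidate that reaches a ratio of 0.0 only satisfies these four tests, which is the overfitting risk the talk returns to later.
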
  6. Searching for Fixes … for Haskell?
    • Strongly typed
    • Property-based tests besides unit tests
    • Rich compiler infrastructure
    Does this help in automated program repair?
    Matti Gissurarson, Leonhard Applis; ICSE 2022
  7. Typed Holes in Haskell Code
    • Compiler can suggest “valid hole fits” in relevant context
    • Compiler can be configured with “fit-finding” plugins
  8. PropR – Property-Based Repair
    Evolutionary search for fix candidates:
    1. Check tests & properties
    2. Find expressions “covered” by violations
    3. Find places to punch holes (“perforation”)
    4. Find / combine hole-fit candidates (custom compiler plugin) so that the fix meets ever more properties
  9. Does this Work? Or is it Overfitting?
    • Experiments on failing student programs
    • Fix proposed for 43% of programs
    • 30% of fixes are good (manual inspection):
      § => Overall fix rate of 13%
      § Tests pass, concise, natural, …
    • 70% of fixes are poor:
      § Overfit on tests; just make them pass
      § Humans can imagine (counterfactual) test cases that still would fail with proposed fix
  10. Code4Me: A plugin for AI-Assisted Completion
    • Amazing (coding) capabilities of large language models
    • Black box models
    • Plenty of success stories, but often self-reported (big tech)
    • What do devs really do with them?
    • Investigating this with Code4Me completion plugin (JB + VS)
    Maliheh Izadi
  11. Code4Me Telemetry Data
    • Suggested insertion
    • Accepted insertion
    • Final insertion
      § Line after 30 seconds
    • Context (opt-in)
    • Model used: InCoder (2022), UniXcoder (2022), CodeGPT (2021)
  12. Why Were (10,000) Completions Rejected?
    • 55%: Model underperforms:
      § Variable, function, literal, …
      § Nr of parameters, types, …
    • 20%: Model not trained for usage:
      § Mid-token, errors in / no context, …
    • 8%: OK, but user knows better:
      § Overlap with right context, …
    • 17%: Not Applicable:
      § No code: comment, latex, settings, …
  13. JetBrains & TU Delft
    • AI4SE: AI for Software Engineering
    • 2023-2028
    • Five research tracks:
      § Validating (AI) generated code
      § Optimizing code language models
      § IDE-AI alignment
      § Run time information in the IDE
      § Programming education
    • ICAI Lab with 10 PhD candidates
    https://jb.gg/ai4se
    Vladimir Kovalenko, Maliheh Izadi
  14. Beyond Code: Agile at Scale
    • ING Bank, with 15,000 IT staff
    • Self-organizing teams (5-9 developers)
    • Short iterations (1-4 weeks)
    • User stories, features, epics
    • Delivered in releases (2-6 months)
    • Quarterly planning of all releases
    Elvan Kula
  15. Why is My Project Late?
    • What are factors affecting timely epic delivery?
      § Let’s ask!
    • How do these factors impact schedule deviation?
      § Let’s measure and model!
    Elvan Kula; IEEE TSE 2022
  16. Timely Epic Delivery: Perceived Factors
    • Survey 1: Which factors?
    • 289 responses
    • 25 factors; 5 dimensions
    • Survey 2: Factor importance?
    • 337 responses
    • Rated impact level per factor
    Factor top 10:
    1. Requirements refinement
    2. Task dependencies
    3. Organizational alignment
    4. Organizational politics
    5. Geographic distribution
    6. Technical dependencies
    7. Agile maturity
    8. Regular delivery
    9. Team stability
    10. Skills and Knowledge
  17. Measuring Delay: Balanced Relative Error
    • If actual delivery date after estimated date (“late”, pos%): [formula shown on slide; not captured in transcript]
    • If actual delivery date before estimated date (“early”, neg%): [formula shown on slide; not captured in transcript]
    • Collected BRE from 3,771 epics (273 teams), for 3 years
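
The transcript does not include the formulas for the two cases on this slide. As a sketch, the following uses the definition of Balanced Relative Error as it is commonly given in the estimation literature: the deviation divided by the smaller of the actual and estimated values. Applying it to durations in days, and the function and parameter names, are assumptions made for illustration; the talk's exact formulation may differ.

```python
def balanced_relative_error(actual_days, estimated_days):
    """Balanced Relative Error (BRE) of a delivery estimate.

    Positive when delivery took longer than estimated ("late"),
    negative when it took less time ("early").
    Assumes both durations are measured from the same start date.
    """
    if actual_days <= 0 or estimated_days <= 0:
        raise ValueError("durations must be positive")
    deviation = actual_days - estimated_days
    # Dividing by the smaller duration balances over- and underestimation:
    #   late:  (actual - estimated) / estimated
    #   early: (actual - estimated) / actual
    return deviation / min(actual_days, estimated_days)


late = balanced_relative_error(actual_days=120, estimated_days=100)   # 0.20
early = balanced_relative_error(actual_days=80, estimated_days=100)   # -0.25
```

Under this definition, an epic that takes 120 days against an estimate of 100 is 20% late, while one that takes 80 days is 25% early. The asymmetric denominator is what keeps the measure from being capped at -100% on the early side.
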
  18. 13 Predictor Variables
    • 35 metrics for 20 factors
    • 13 metrics explain 67% of variation (MARS model)
    • Match with perception?
      § Underestimated: size
      § Agreed: dependencies, seniority, stability
      § Overestimated: geography, regular delivery
      § Agreed low/no effect: coverage, code smells, …
  19. Dynamic Delay Prediction
    • Delay knowledge increases as epic unfolds (in milestones)
    • Mobility literature: Delay adheres to patterns, which can be learned from clustering delay time series
    • Is epic delay subject to patterns?
    • Can patterns improve delay prediction?
  20. Epic Delay Patterns
    Elvan Kula; FSE 2023
    Dataset: 4,040 epics of at least 10 sprints from 270 teams, 2017-2022
    % Epics in category: 36%, 44%, 14%, 6%
  21. Explainable Digitalization?
    • Why-questions in government IT?
    • Explainable to whom?
      § voters, taxpayers, end-users, members of parliament, cabinet, policy makers, civil servants, developers, …?
    • Will digitalization strengthen or weaken our democracy?
  22. Assessment Framework in Nine Chapters
    1. Business case, benefits, finance
    2. Project organization and ownership
    3. Risk management and project dependencies
    4. Alignment business processes and IT solution
    5. Scope management
    6. Architecture, functional feasibility, technical realizability
    7. Realization and planning
    8. Tendering aspects
    9. Acceptance, production, and transition to line
    https://www.adviescollegeicttoetsing.nl/onze-werkwijze/documenten/publicaties/2021/12/01/toetskader-acict
  23. 100 Reports Since 2015
    • Government is complex
    • Common risks
      § Cost and time overruns
      § Wishful thinking in governance
      § Knowledge deficit w.r.t. (IT) suppliers
    • Common advice
      § Keep it small and simple
      § Deliver measurable results early
      § Manage IT suppliers rigorously
  24. (Image-only slide)

  25. The Road Ahead
    • Recognize explainability as first class citizen in software engineering
      § In coding and beyond
    • Build carefully curated (public) software engineering data sets
    • Focus on domains with constrained counterfactuals
    • Thoroughly investigate how language models can deliver true value in SE
  26. Explainable Software Engineering
    Arie van Deursen, Delft University of Technology
    Bits & Chips Event, October 12, 2023
    Radioactive; Marjane Satrapi
  27. Image Credits
    • Rembrandt van Rijn: De anatomische les (1632), Wikipedia
    • Malevich: Suprematism, Wikipedia
    • One day at a time, Season 2. Elena, 2018
    • Simon Greig, Square peg. Flickr, CC-BY-NC-SA
    • Fractal, Wikipedia
    • Red Stage Curtain, Sethoscope, Flickr, CC-BY-NC
    • Visual Inspection, Sanofi, Flickr, CC-BY-NC
    • “JetBrains huurt Terrace Tower”, www.jll.nl, September 2021
    • Edmund Bass, Nederlandse Spoorwegen
    • Franz Kafka, Metamorphosis, Gyan Publishing House
    • Tweede Kamer, plenaire zaal
    • Malevich: Suprematism, Wikipedia
    • Radioactive, directed by Marjane Satrapi