Slide 1

Slide 1 text

Explainable Software Engineering
Arie van Deursen, Delft University of Technology
Bits & Chips Event, October 12, 2023
@[email protected]

Slide 2

Slide 2 text

“Explanation in Artificial Intelligence: Insights from the Social Sciences” (Tim Miller, Artificial Intelligence, 2018)
An explanation is an answer to a why-question.

Slide 3

Slide 3 text

Explanations are Contextual
• Contrastive: compared to a counterfactual alternative
• Selective: focusing on the relevant parts of the full causal chain
• Social: transferring knowledge, assuming prior knowledge
(Tim Miller, Artificial Intelligence, 2018)

Slide 4

Slide 4 text

Counterfactual Reasoning
• Factual: the model denies a loan
• Counterfactual: alternative inputs for which the model would approve the loan
• Algorithmic recourse: a change of behavior that yields the desired outcome
Patrick Altmeyer

Slide 5

Slide 5 text

A Library for Generating Counterfactuals
• Counterfactuals should be possible, faithful, plausible, “close” to the factual, …
• Gradient descent in feature space (with extra cost terms)
• Rich library of Julia packages
• Macro-effects after (large-scale) recourse adoption
https://github.com/JuliaTrustworthyAI
Patrick Altmeyer (IEEE SaTML 2023; JuliaCon 2022, 2023)
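The “gradient descent in feature space” idea can be illustrated with a minimal sketch. It is written in Python and does not use the JuliaTrustworthyAI packages or their API; the logistic loan model, its weights, and the function names are hypothetical. The only extra cost term here is a squared distance to the factual, which keeps the counterfactual “close”; real generators add further terms for plausibility and feasibility.

```python
import math

# Hypothetical loan model: logistic regression over two standardized features,
# [income, existing_debt]. The weights are invented for illustration.
WEIGHTS = [1.5, -2.0]
BIAS = -1.0


def approval_probability(x):
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


def counterfactual(x, lam=0.1, lr=0.1, steps=1000, threshold=0.5):
    """Gradient descent in feature space on the cost
    -log p(approve | x') + lam * ||x' - x||^2, starting from the factual x."""
    xc = list(x)
    for _ in range(steps):
        p = approval_probability(xc)
        if p > threshold:
            break  # the model now approves: a counterfactual has been found
        # d/dx' of -log(sigmoid(w.x' + b)) is -(1 - p) * w;
        # d/dx' of lam * ||x' - x||^2 is 2 * lam * (x' - x).
        grad = [-(1 - p) * w + 2 * lam * (xc_i - x_i)
                for w, xc_i, x_i in zip(WEIGHTS, xc, x)]
        xc = [xc_i - lr * g for xc_i, g in zip(xc, grad)]
    return xc


factual = [0.0, 0.5]  # denied: approval probability is about 0.12
cf = counterfactual(factual)
```

Starting from the denied factual, the search raises income and lowers debt until the model approves; the difference between `cf` and `factual` is the suggested recourse.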

Slide 6

Slide 6 text

Counterfactuals … in bug reports
Factual: buggy behavior
Counterfactual: expected behavior

Slide 7

Slide 7 text

(Automated) Program Repair
• Current (factual) code: yields undesired behavior
• Imagined (counterfactual) code: should meet the desired behavior!
• Formulate as (evolutionary) search:
  § Population of candidate fixes
  § Cross-over & mutation to transform fixes
  § Fitness function (test failure ratio) to guide selection
https://program-repair.org/
Elena @ One day at a time
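The evolutionary formulation above can be sketched on a toy scale. This is a hypothetical illustration, not the algorithm of any specific repair tool: a candidate fix is an (operator, constant) pair for the one-line program `f(x) = x <op> c`, the fitness is the test failure ratio, and selection, cross-over, and mutation are deliberately simple.

```python
import random

# Hypothetical candidate fixes for the one-line program f(x) = x <op> c.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}
CONSTS = list(range(10))

# Hypothetical test suite: (input, expected output); the intended behavior doubles x.
TESTS = [(1, 2), (2, 4), (5, 10)]


def run(candidate, x):
    op, c = candidate
    return OPS[op](x, c)


def failure_ratio(candidate, tests):
    """Fitness: fraction of tests the candidate fails (lower is better)."""
    failed = sum(1 for x, expected in tests if run(candidate, x) != expected)
    return failed / len(tests)


def crossover(a, b):
    """Combine two parents: operator from one, constant from the other."""
    return (a[0], b[1])


def mutate(candidate, rng, rate=0.5):
    """Independently resample each part of the candidate with probability `rate`."""
    op, c = candidate
    if rng.random() < rate:
        op = rng.choice(list(OPS))
    if rng.random() < rate:
        c = rng.choice(CONSTS)
    return (op, c)


def repair(tests, pop_size=20, generations=200, seed=0):
    rng = random.Random(seed)
    population = [(rng.choice(list(OPS)), rng.choice(CONSTS)) for _ in range(pop_size)]
    for _ in range(generations):
        # Sort by fitness; the random second key breaks ties to keep diversity.
        population.sort(key=lambda cand: (failure_ratio(cand, tests), rng.random()))
        if failure_ratio(population[0], tests) == 0:
            break
        parents = population[: pop_size // 2]  # truncation selection keeps the best
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents)), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return min(population, key=lambda cand: failure_ratio(cand, tests))
```

With the three tests, the only candidate that passes everything is `("*", 2)`. With only the test `(2, 4)`, both `("+", 2)` and `("*", 2)` pass, which is exactly the kind of overfitting to the tests that a weak test suite invites.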

Slide 8

Slide 8 text

Searching for Fixes … for Haskell?
• Strongly typed
• Property-based tests besides unit tests
• Rich compiler infrastructure
Does this help in automated program repair?
Matti Gissurarson, Leonhard Applis (ICSE 2022)

Slide 9

Slide 9 text

Unit Tests vs Properties
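The slide's own example did not survive extraction, so the following is a hypothetical illustration of the difference. It is in Python with a tiny hand-rolled, QuickCheck-style runner (no shrinking) rather than Haskell's QuickCheck. A unit test checks one hand-picked example; a property states a law that must hold for all inputs and is checked on many generated ones.

```python
import random
from collections import Counter


def buggy_sort(xs):
    # Hypothetical bug: converting to a set silently drops duplicate elements.
    return sorted(set(xs))


def test_unit():
    # A unit test checks one hand-picked example; this one happens to pass.
    assert buggy_sort([3, 1, 2]) == [1, 2, 3]


def prop_sorted_permutation(xs):
    """Property: the output is ordered and is a permutation of the input."""
    ys = buggy_sort(xs)
    ordered = all(a <= b for a, b in zip(ys, ys[1:]))
    return ordered and Counter(ys) == Counter(xs)


def check_property(prop, trials=100, seed=0):
    """Run `prop` on random small integer lists; return the first failing input, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(0, 5) for _ in range(rng.randint(0, 8))]
        if not prop(xs):
            return xs
    return None
```

The unit test passes, while the property check quickly finds a list containing duplicates for which the output is no longer a permutation of the input.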

Slide 10

Slide 10 text

Typed Holes in Haskell Code
• The compiler can suggest “valid hole fits” in the relevant context
• The compiler can be configured with “fit-finding” plugins
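As a rough intuition for valid hole fits and fit-finding plugins, here is a toy analogue in Python. It is not GHC's mechanism: GHC unifies (possibly polymorphic) types and exposes a real plugin interface, whereas this sketch only compares hypothetical monomorphic type strings. The environment, `valid_hole_fits`, and the example plugin are all illustrative names.

```python
import math

# Toy typing environment: name -> (argument types, result type).
# Types are plain monomorphic strings, a big simplification of GHC's types.
ENV = {
    "sum": (("[Int]",), "Int"),
    "product": (("[Int]",), "Int"),
    "length": (("[Int]",), "Int"),
    "reverse": (("[Int]",), "[Int]"),
    "show": (("Int",), "String"),
}

# Python stand-ins for the environment's functions, used by the example plugin.
IMPLS = {"sum": sum, "product": math.prod, "length": len,
         "reverse": lambda xs: xs[::-1], "show": str}


def valid_hole_fits(hole_type, env, plugin=None):
    """Names whose type equals the hole's type; an optional plugin filters or
    reorders the candidate list (loosely mimicking a fit-finding plugin)."""
    fits = sorted(name for name, sig in env.items() if sig == hole_type)
    return plugin(fits) if plugin else fits


def passes_example(fits):
    """Example plugin: keep only fits that map [2, 3] to 5 (a hypothetical test)."""
    return [name for name in fits if IMPLS[name]([2, 3]) == 5]
```

For a hole of type `[Int] -> Int`, the type alone admits `length`, `product`, and `sum`; a plugin that consults tests narrows this to `sum`. This kind of narrowing is what makes hole fits useful as repair candidates.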

Slide 11

Slide 11 text

PropR – Property-Based Repair
Evolutionary search for fix candidates:
1. Check tests & properties
2. Find expressions “covered” by violations
3. Find places to punch holes (“perforation”)
4. Find / combine hole-fit candidates (custom compiler plugin)
so that the fix satisfies ever more properties
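Steps 3 and 4 can be sketched on a toy expression language. This is a hypothetical, much-simplified illustration, not PropR's implementation: it treats every subexpression as a hole position (PropR narrows this using expressions covered by failing properties), tries one hole fit at a time (PropR combines fits evolutionarily), and scores each candidate by the number of properties it satisfies.

```python
# Toy expressions as nested tuples: ("var",), ("lit", n), ("add", l, r), ("mul", l, r).
def evaluate(expr, x):
    tag = expr[0]
    if tag == "var":
        return x
    if tag == "lit":
        return expr[1]
    left, right = evaluate(expr[1], x), evaluate(expr[2], x)
    return left + right if tag == "add" else left * right


def paths(expr, prefix=()):
    """Yield the path of every subexpression: the candidate hole positions."""
    yield prefix
    if expr[0] in ("add", "mul"):
        yield from paths(expr[1], prefix + (1,))
        yield from paths(expr[2], prefix + (2,))


def replace_at(expr, path, new):
    """Return a copy of expr with the subexpression at `path` replaced by `new`."""
    if not path:
        return new
    parts = list(expr)
    parts[path[0]] = replace_at(expr[path[0]], path[1:], new)
    return tuple(parts)


# Hypothetical properties of the intended function (doubling).
PROPERTIES = [
    lambda f: f(0) == 0,
    lambda f: all(f(x + 1) - f(x) == 2 for x in range(-5, 5)),
]
HOLE_FITS = [("var",), ("lit", 0), ("lit", 1), ("lit", 2)]
BUGGY = ("add", ("var",), ("lit", 1))  # x + 1


def score(expr):
    """Number of properties the expression satisfies."""
    def f(x):
        return evaluate(expr, x)
    return sum(prop(f) for prop in PROPERTIES)


def repair(expr):
    """Perforate every position, fill each hole with every fit, keep the best."""
    candidates = [replace_at(expr, p, fit) for p in paths(expr) for fit in HOLE_FITS]
    best = max(score(c) for c in candidates)
    return best, [c for c in candidates if score(c) == best]
```

Here the buggy `x + 1` satisfies no property, and punching a hole at the literal and filling it with `x` yields `x + x`, the only single-fit candidate that satisfies both.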

Slide 12

Slide 12 text

Does this Work? Or is it Overfitting?
• Experiments on failing student programs
• Fix proposed for 43% of programs
• 30% of these fixes are good (manual inspection):
  § => Overall fix rate of 13%
  § Tests pass; fixes are concise, natural, …
• 70% of fixes are poor:
  § Overfit on the tests; they just make them pass
  § Humans can imagine (counterfactual) test cases that would still fail with the proposed fix
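The overall fix rate follows from multiplying the two (rounded) fractions on the slide:

```latex
\underbrace{0.43}_{\text{fix proposed}} \times \underbrace{0.30}_{\text{fix judged good}} = 0.129 \approx 13\%
```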

Slide 13

Slide 13 text

Code4Me: A Plugin for AI-Assisted Code Completion
• Amazing (coding) capabilities of large language models
• Black-box models
• Plenty of success stories, but often self-reported (big tech)
• What do developers really do with them?
• Investigating this with the Code4Me completion plugin (JetBrains IDEs + VS Code)
Maliheh Izadi

Slide 14

Slide 14 text

Code4Me Telemetry Data
• Suggested insertion
• Accepted insertion
• Final insertion
  § The line as it reads 30 seconds after the completion
• Context (opt-in)
• Model used: InCoder (2022), UniXcoder (2022), CodeGPT (2021)
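A minimal sketch of what one telemetry record and a simple “was it changed afterwards?” check could look like. The field and function names are hypothetical and are not Code4Me's actual schema or analysis; the similarity measure uses Python's `difflib.SequenceMatcher` as one possible choice.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class CompletionEvent:
    """One completion record; field names are hypothetical, not Code4Me's schema."""
    suggested: str                 # insertion proposed by the model
    accepted: bool                 # whether the user accepted the suggestion
    final_line: str                # the line as it reads 30 seconds later
    model: str                     # e.g. "InCoder", "UniXcoder", "CodeGPT"
    context: Optional[str] = None  # surrounding code, only if the user opted in


def kept_verbatim(event: CompletionEvent) -> bool:
    """True if the suggested insertion still appears unchanged in the final line."""
    return event.suggested in event.final_line


def similarity(event: CompletionEvent) -> float:
    """Similarity ratio in [0, 1] between the suggestion and the final line."""
    return SequenceMatcher(None, event.suggested, event.final_line).ratio()
```

Comparing the accepted insertion with the line 30 seconds later is what lets such data show that accepted suggestions are often still edited.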

Slide 15

Slide 15 text

After 12 Months and 2 Million Completions

Slide 16

Slide 16 text

Trigger Points Matter

Slide 17

Slide 17 text

Accepted Suggestions Are Still Changed

Slide 18

Slide 18 text

Why Were (10,000) Completions Rejected?
• 55%: Model underperforms:
  § Variable, function, literal, …
  § Number of parameters, types, …
• 20%: Model not trained for this usage:
  § Mid-token, errors in / no context, …
• 8%: OK, but the user knows better:
  § Overlap with right context, …
• 17%: Not applicable:
  § No code: comment, LaTeX, settings, …

Slide 19

Slide 19 text

[ Intermezzo ] Today: New Lab Announcement!

Slide 20

Slide 20 text

JetBrains & TU Delft
• AI4SE: AI for Software Engineering
• 2023–2028
• Five research tracks:
  § Validating (AI-)generated code
  § Optimizing code language models
  § IDE-AI alignment
  § Run-time information in the IDE
  § Programming education
• ICAI Lab with 10 PhD candidates
https://jb.gg/ai4se
Vladimir Kovalenko, Maliheh Izadi

Slide 21

Slide 21 text

[ End of Intermezzo ]

Slide 22

Slide 22 text

Beyond Code: Agile at Scale
• ING Bank, with 15,000 IT staff
• Self-organizing teams (5-9 developers)
• Short iterations (1-4 weeks)
• User stories, features, epics
• Delivered in releases (2-6 months)
• Quarterly planning of all releases
Elvan Kula

Slide 23

Slide 23 text

Why Is My Project Late?
• Which factors affect timely epic delivery?
  § Let’s ask!
• How do these factors impact schedule deviation?
  § Let’s measure and model!
Elvan Kula (IEEE TSE 2022)

Slide 24

Slide 24 text

Timely Epic Delivery: Perceived Factors
• Survey 1: Which factors?
  § 289 responses
  § 25 factors in 5 dimensions
• Survey 2: How important is each factor?
  § 337 responses
  § Rated impact level per factor
Top 10 factors:
1. Requirements refinement
2. Task dependencies
3. Organizational alignment
4. Organizational politics
5. Geographic distribution
6. Technical dependencies
7. Agile maturity
8. Regular delivery
9. Team stability
10. Skills and knowledge

Slide 25

Slide 25 text

Measuring Delay: Balanced Relative Error (BRE)
• Late (actual delivery after the estimated date): positive %
• Early (actual delivery before the estimated date): negative %
• BRE collected from 3,771 epics (273 teams) over 3 years
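The slide's formulas did not survive extraction. The sketch below assumes the standard BRE definition from the effort-estimation literature, applied to positive durations (for example, days from the start of an epic to its estimated and actual delivery). With actual A and estimate E, BRE = (A − E) / E when A ≥ E, and BRE = (A − E) / A otherwise. Dividing by the smaller of the two values is what makes “late” and “early” errors symmetric. The function name is hypothetical.

```python
def balanced_relative_error(actual: float, estimated: float) -> float:
    """Standard BRE on positive durations: positive means late, negative means early."""
    if actual <= 0 or estimated <= 0:
        raise ValueError("durations must be positive")
    if actual >= estimated:
        return (actual - estimated) / estimated  # late (or on time): divide by the estimate
    return (actual - estimated) / actual         # early: divide by the actual duration
```

For example, an epic estimated at 40 days that takes 50 days scores +25%, and one estimated at 50 days that takes 40 days scores -25%.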

Slide 26

Slide 26 text

13 Predictor Variables
• 35 metrics for 20 factors
• 13 metrics explain 67% of the variation (MARS: multivariate adaptive regression splines)
• Does this match perception?
  § Underestimated: size
  § Agreed: dependencies, seniority, stability
  § Overestimated: geography, regular delivery
  § Agreed low/no effect: coverage, code smells, …
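For reference, a MARS model in its general form is a weighted sum of basis functions, each a product of hinge functions. The coefficients, knots, and selected metrics of the fitted delay model are not in the slide text, so only the general form is shown, with sign s in {+1, -1}, knot t, and v(k, m) selecting a predictor variable:

```latex
\hat{y} = \beta_0 + \sum_{m=1}^{M} \beta_m B_m(\mathbf{x}),
\qquad
B_m(\mathbf{x}) = \prod_{k=1}^{K_m} \max\bigl(0,\; s_{k,m}\,(x_{v(k,m)} - t_{k,m})\bigr)
```

The hinge functions let the model capture effects that only kick in above or below a threshold, which helps in comparing measured effects with the perceived importance of factors.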

Slide 27

Slide 27 text

Dynamic Delay Prediction
• Knowledge about delay increases as an epic unfolds (in milestones)
• Mobility literature: delay follows patterns, which can be learned by clustering delay time series
• Is epic delay subject to patterns?
• Can patterns improve delay prediction?
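Both questions can be made concrete with a small sketch. It assumes each epic is summarized as a fixed-length series of delay values (one per milestone), clusters the series with plain k-means, and predicts the final delay of an unfolding epic from the closest pattern. The study's actual clustering and prediction setup is not given in the slide text, and the data and function names below are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(series, k, iterations=20):
    """Plain k-means on equal-length delay series.
    Centroids start at the first k series (a simplification)."""
    centroids = [list(s) for s in series[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for s in series:
            nearest = min(range(k), key=lambda i: dist(s, centroids[i]))
            clusters[nearest].append(s)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def predict_final_delay(prefix, centroids):
    """Match the milestones observed so far to the closest pattern's prefix
    and return that pattern's delay at the last milestone."""
    best = min(centroids, key=lambda c: dist(prefix, c[: len(prefix)]))
    return best[-1]


# Hypothetical delay (BRE) per milestone for four epics: two on time, two slipping.
EPICS = [
    [0.0, 0.0, 0.05, 0.0],
    [0.0, 0.2, 0.5, 0.9],
    [0.0, 0.05, 0.0, 0.05],
    [0.1, 0.3, 0.6, 1.0],
]
patterns = kmeans(EPICS, k=2)
```

An epic whose first two milestones show delays of 0.05 and 0.25 matches the “slipping” pattern, so its predicted final delay is about 0.95. An epic that starts on time is matched to the on-time pattern.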

Slide 28

Slide 28 text

Epic Delay Patterns
Elvan Kula (FSE 2023)
Dataset: 4,040 epics of at least 10 sprints from 270 teams, 2017-2022
Share of epics per pattern category: 36%, 44%, 14%, 6%

Slide 29

Slide 29 text

Delay Patterns Improve Delay Prediction

Slide 30

Slide 30 text

Explainable Digitalization?
• Why-questions in government IT?
• Explainable to whom?
  § Voters, taxpayers, end users, members of parliament, cabinet, policy makers, civil servants, developers, …?
• Will digitalization strengthen or weaken our democracy?

Slide 31

Slide 31 text

Assessment Framework in Nine Chapters
1. Business case, benefits, finance
2. Project organization and ownership
3. Risk management and project dependencies
4. Alignment of business processes and IT solution
5. Scope management
6. Architecture, functional feasibility, technical realizability
7. Realization and planning
8. Tendering aspects
9. Acceptance, production, and transition to the line organization
https://www.adviescollegeicttoetsing.nl/onze-werkwijze/documenten/publicaties/2021/12/01/toetskader-acict

Slide 32

Slide 32 text

100 Reports Since 2015
• Government is complex
• Common risks:
  § Cost and time overruns
  § Wishful thinking in governance
  § Knowledge deficit w.r.t. (IT) suppliers
• Common advice:
  § Keep it small and simple
  § Deliver measurable results early
  § Manage IT suppliers rigorously

Slide 33

Slide 33 text


Slide 34

Slide 34 text

The Road Ahead
• Recognize explainability as a first-class citizen in software engineering
  § In coding and beyond
• Build carefully curated (public) software engineering data sets
• Focus on domains with constrained counterfactuals
• Thoroughly investigate how language models can deliver true value in SE

Slide 35

Slide 35 text

Explainable Software Engineering
Arie van Deursen, Delft University of Technology
Bits & Chips Event, October 12, 2023
@[email protected]
Image: Radioactive; Marjane Satrapi

Slide 36

Slide 36 text

Image Credits
• Rembrandt van Rijn: De anatomische les (The Anatomy Lesson, 1632), Wikipedia
• Malevich: Suprematism, Wikipedia
• One day at a time, Season 2. Elena, 2018
• Simon Greig, Square peg. Flickr, CC-BY-NC-SA
• Fractal, Wikipedia
• Red Stage Curtain, Sethoscope, Flickr, CC-BY-NC
• Visual Inspection, Sanofi, Flickr, CC-BY-NC
• “JetBrains huurt Terrace Tower” (“JetBrains rents Terrace Tower”), www.jll.nl, September 2021
• Edmund Bass, Nederlandse Spoorwegen
• Franz Kafka, Metamorphosis, Gyan Publishing House
• Tweede Kamer (Dutch House of Representatives), plenaire zaal (plenary hall)
• Radioactive, directed by Marjane Satrapi