Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text

In the realm of software, an AI revolution is afoot, transforming how we create and consume our digital world.

Slide 3

Slide 3 text

AI-powered code editing with Copilot

Slide 4

Slide 4 text

Speed of innovation has increased

Slide 5

Slide 5 text

The rise of AI-powered apps and Copilots

Slide 6

Slide 6 text

Christian Bird, Denae Ford, Thomas Zimmermann, Nicole Forsgren, Eirini Kalliamvakou, Travis Lowdermilk, and Idan Gazit. ACM Queue, Volume 20, Issue 6, November/December 2022, pp 35–57.
• Developers reported spending less time on Stack Overflow due to Copilot’s code suggestions.
• Developers’ roles shifted from primarily writing code to reviewing and understanding code suggested by AI.
• Copilot opened new learning opportunities, like mastering new programming languages.
• Developers’ trust plays a crucial role in adoption, as any unexpected behavior can significantly impact its usage.

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

Central to this grand transformation is the vital role of trust in AI-based software tools.

Slide 9

Slide 9 text

Trust matters for tool adoption and tool usage.
• So many tools... yet so few in use!
• Meanwhile, tools continue to emerge and evolve from traditional to AI-assisted tools.
• Lack of trust in a tool can lead to suboptimal use and poor outcomes.

Slide 10

Slide 10 text

Developers struggle to build appropriate trust with AI-powered code generation tools

Slide 11

Slide 11 text

Developers struggle to build appropriate trust with AI-powered code generation tools

Slide 12

Slide 12 text

Designing AI systems for responsible trust is important.
“Overreliance on AI occurs when users start accepting incorrect AI outputs. This can lead to issues and errors… An important goal of AI system design is to empower users to develop appropriate reliance on AI.” (Passi and Vorvoreanu, 2022)

Slide 13

Slide 13 text

Developers’ calibrated trust in AI is a prerequisite for their safe and effective use of AI tools.
• Lack of trust hinders adoption.
• Blind trust leads to overlooking mistakes.

Slide 14

Slide 14 text

What factors influence developers' trust in software tools?
Brittany Johnson, Christian Bird, Denae Ford, Nicole Forsgren, Thomas Zimmermann: Make Your Tools Sparkle with Trust: The PICSE Framework for Trust in Software Tools. ICSE-SEIP 2023: 409–419

Slide 15

Slide 15 text

What factors influence developers’ trust in software tools?
Method: interviews with 18 practitioners (defining and discussing trust in tools and collaborators, ...); transcribe; code (codebook v1); thematic analysis yielding the PICSE factors; validated with a survey of 300+ practitioners.

Slide 16

Slide 16 text

What factors influence developers’ trust in software tools?
Personal: Intrinsic, extrinsic, and social factors
Interaction: Factors related to engagement with the tool
Control: Factors related to control over usage
System: Properties of the tool before and during use
Expectations: Meeting expectations developers built

Slide 17

Slide 17 text

What factors influence developers’ trust in software tools?
Personal: Intrinsic, extrinsic, and social factors (Community; Source reputation; Clear advantages)
Interaction: Factors related to engagement with the tool (Validation support; Feedback loops; Educational value)
Control: Factors related to control over usage (Ownership; Autonomy)
System: Properties of the tool before and during use (Workflow integration; Ease of installation; Polished presentation; Safe and secure; Correctness; Consistency; Performance)
Expectations: Meeting expectations developers built (Meeting expectations; Transparent data practices; Style matching; Goal matching)

Slide 18

Slide 18 text

What factors influence developers’ trust in software tools?
Personal: Intrinsic, extrinsic, and social factors (Community; Source reputation; Clear advantages)
Interaction: Factors related to engagement with the tool (Validation support; Feedback loops; Educational value)
Control: Factors related to control over usage (Ownership; Autonomy)
System: Properties of the tool before and during use (Workflow integration; Ease of installation; Polished presentation; Safe and secure; Correctness; Consistency; Performance)
Expectations: Meeting expectations developers built (Meeting expectations; Transparent data practices; Style matching; Goal matching)

Slide 19

Slide 19 text

Personal: Intrinsic, extrinsic, and social factors
Community: There's an accessible community of developers that use the tool.
"That's probably recommended because over the community that's how it's preferable. Then you're leaning towards more into the more community-wide practices." - Software Dev Engineer Lead
"Even if I trust the brand, nobody else is on there... I wouldn't download the app, the social media. If there is no network, why would I use it?" - Software Engineer

Slide 20

Slide 20 text

Personal: Intrinsic, extrinsic, and social factors
Source Reputation: The reputation of or familiarity with the individual, organization, or platform associated with introduction to the tool.
"If a person that I personally trust a lot for example, a coworker that I work closely and that I have a lot of respect to, then, of course, that also carries weight." - Software Engineer
"I definitely get more excited about a Microsoft tool or product as opposed to a Google product or an Amazon product." - Senior Software Engineer

Slide 21

Slide 21 text

Personal: Intrinsic, extrinsic, and social factors
Clear Advantages: The ability to see the benefits of using the tool, typically from use and validation by others.
"When I tuned into that, it was a combination of seeing that and seeing how powerful it was and how easy it was... What am I doing? This looks great." - Systems Engineer
"...while I’m in that car and AI is doing the right thing, I’ll see, it actually stopped the right car. It actually identified that someone crossing the road and all those small nitpick details. Then that trust will build up and I can rely on AI okay." - Software Dev Engineer Lead

Slide 22

Slide 22 text

Using PICSE for building trust
Source reputation (especially relevant for adoption): introduce tools via trusted sources; requires knowledge of network (perhaps best for internal tools).
Clear advantages (especially relevant for adoption): provide tool demos and comparisons; create forums for showcasing new tools (particularly internally).
Community (before and during use): build and foster community around the tool; make it visible and accessible (common on GitHub).

Slide 23

Slide 23 text

Are there differences between trust in traditional tools and AI-assisted tools?
Generally, similar priorities for both: consistency and meeting expectations are important; interaction factors are generally less important.
However, for AI-assisted tools, developers:
• Prioritize validation support, autonomy, and source reputation (who built it)
• Deprioritize factors like goal matching and style matching
Our study found several similarities between developer trust in AI-assisted tools and trust in their collaborators!

Slide 24

Slide 24 text

How can we design for trust in AI-powered code generation tools?
Ruotong Wang, Ruijia Cheng, Denae Ford, Thomas Zimmermann: Investigating and Designing for Trust in AI-powered Code Generation Tools. FAccT 2024

Slide 25

Slide 25 text

The MATCH model for responsible trust
Designing for Responsible Trust in AI Systems: A Communication Perspective, Q. Vera Liao & S. Shyam Sundar, FAccT 2022
Trustworthiness of AI systems can be communicated via system design. (Liao and Sundar, FAccT 2022)

Slide 26

Slide 26 text

The MATCH model for responsible trust
Designing for Responsible Trust in AI Systems: A Communication Perspective, Q. Vera Liao & S. Shyam Sundar, FAccT 2022
A trustworthiness cue is any information within a system that can cue, or contribute to, users’ trust judgements.
Trust affordances are displayed properties of a system that engender trustworthiness cues.
Trust heuristics are any rules of thumb applied by a user to associate a given cue with a judgment of trustworthiness.
(Liao and Sundar, 2022)

Slide 27

Slide 27 text

Research Questions
• What do developers need to build appropriate trust with AI code generation tools?
• What challenges do developers face in the current trust-building process?
• How can we design UX enhancements to support users building appropriate trust?
Study 1: Experience sampling + Debrief interview (understand notions of trust)
Study 2: Design probe + Interviews (explore potential design solutions)

Slide 28

Slide 28 text

Study 1: Experience sampling + Debrief interview
Procedures: a week of collecting significant moments of using Copilot via screenshots and short descriptions. Prompt through Microsoft Teams: when you are appreciative of, frustrated by, or hesitant/uncertain to use the code generation tool.
Participants: randomly sampled 1500 internal developers + interns Teams channel; 17 participants with various levels of programming experience and experience with AI-powered code generation tools.
Example of an experience entry

Slide 29

Slide 29 text

Finding 1: Developers’ information needs in building appropriate trust
Developers need to build reasonable expectations of the AI tool’s ability and risks to build appropriate trust:
• What benefits to expect when collaborating with AI
• What use cases to use AI for
• What the security and privacy implications of using AI are
“It comes back to learning what Copilot is suited for versus not suited for, just building the intuition. Once you have that intuition, you don’t put Copilot into positions where you know it will fail...” (P13)

Slide 30

Slide 30 text

Finding 1: Developers’ information needs in building appropriate trust
Developers want information about to what extent, and in what way, they can control and set preferences for:
• What the AI produces
• When and how AI steps in
• What code context AI uses
“I don’t want Copilot to give me anything unless I type trigger.... It’s too much. It started as a co-pilot, but now it’s the pilot and I’m becoming the co-pilot.” (P8)
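As one concrete illustration of this kind of control, editors already expose a few switches for when and where Copilot steps in. A minimal settings.json sketch, assuming the documented setting names editor.inlineSuggest.enabled and github.copilot.enable; exact names and behavior change across versions, so check the current docs:

// settings.json (illustrative sketch; verify setting names against current docs)
{
  // Stop inline suggestions from appearing automatically as you type
  "editor.inlineSuggest.enabled": false,
  // Keep Copilot enabled for most languages, but opt out for prose-like files
  "github.copilot.enable": {
    "*": true,
    "plaintext": false,
    "markdown": false
  }
}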

Slide 31

Slide 31 text

Finding 1: Developers’ information needs in building appropriate trust
The evaluation of AI suggestions in each specific instance forms the basis of developers’ trust perception of AI code generation tools:
• How good the suggestion is
• Why the suggestions are made
Strategies to make sure that “the code is actually correct”:
• Logically go through the problem
• Validate by running the code
• Write formal tests
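To make the last two strategies concrete, here is a minimal sketch of running and testing a suggestion before trusting it; slugify is a hypothetical AI-suggested helper invented for illustration, not an example from the study:

// validate-suggestion.js: a minimal sketch of the "run it, then test it" strategy.
// `slugify` is a hypothetical AI-suggested helper, used here only for illustration.
const assert = require('node:assert');

// Code accepted from an AI suggestion, pasted in for review:
function slugify(title) {
  return title
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, '-')  // collapse runs of non-alphanumerics into dashes
    .replace(/^-+|-+$/g, '');     // strip leading and trailing dashes
}

// Quick checks before relying on the suggestion:
assert.strictEqual(slugify('Hello, World!'), 'hello-world');
assert.strictEqual(slugify('  Trust & AI  '), 'trust-ai');
assert.strictEqual(slugify('---'), '');
console.log('AI-suggested slugify passed the sanity checks');

A handful of such checks is far cheaper than a full review and answers the "how good is the suggestion" question directly.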

Slide 32

Slide 32 text

Finding 1: Developers’ information needs in building appropriate trust
Expectation of AI’s ability and risks:
• What benefits to expect when collaborating with AI
• What use cases to use AI for
• What security and privacy implications the AI brings
Ways to control AI:
• What the AI produces
• When and how AI steps in
• What code context AI uses
Quality and reasons of AI suggestions:
• How good the suggestion is
• Why the suggestions are made

Slide 33

Slide 33 text

Finding 2: Challenges developers face in building appropriate trust
Setting proper expectations
• Bias from initial experience and experience with similar tools
• “It takes three good recommendations to build trust versus one bad recommendation to lose trust.” (P5)
Controlling AI tools
• Lack of guidance to harness AI
• “I felt like a lot of the time I ended up just fighting it.” (P7)
Inadequate support for evaluating individual AI suggestions
• Lack of debugging support and the cognitive load of reviewing
• “The code reviews cost you more than actually writing the code.” (P8)

Slide 34

Slide 34 text

Study 2: Design probe + Interviews
Procedures: using three design probes, interview developers about affordances and trustworthiness cues that support building appropriate trust.
1. Control mechanisms to set preferences
2. Explanation of suggestions
3. Feedback analytics
Participants: 12 internal and external developers with varied experience in code generation tools, work experience, and roles in team.

Slide 35

Slide 35 text

Design recommendations for tool builders
Empower users to build appropriate expectations:
• Communicate the use cases, potential risks, and benefits of the system
• Design for evolving trust
Offer affordances and guidance in customizing the system.
Provide signals for assessing the quality of code suggestions.

Slide 36

Slide 36 text

How do online communities affect developers’ trust in AI-powered tools?
Ruijia Cheng, Ruotong Wang, Thomas Zimmermann, Denae Ford: “It would work for me too”: How Online Communities Shape Software Developers' Trust in AI-Powered Code Generation Tools. To appear in ACM Transactions on Interactive Intelligent Systems

Slide 37

Slide 37 text

Why online communities?
“Trust is shaped by people’s information-seeking and assessment practices through emerging information platforms.” (Zhang et al., 2022)
Yixuan Zhang, Nurul Suhaimi, Nutchanon Yongsatianchot, Joseph D Gaggiano, Miso Kim, Shivani A Patel, Yifan Sun, Stacy Marsella, Jacqueline Griffin, and Andrea G Parker. 2022. Shifting Trust: Examining How Trust and Distrust Emerge, Transform, and Collapse in COVID-19 Information Seeking. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (CHI '22). Association for Computing Machinery, New York, NY, USA, Article 78, 1–21. https://doi.org/10.1145/3491102.3501889

Slide 38

Slide 38 text

Extending the MATCH model for responsible trust System focused Online communities

Slide 39

Slide 39 text

Research Questions
• How do online communities shape developers' trust in AI code generation tools?
• How can we design to facilitate trust building in AI using affordances of online communities?
Semi-structured interviews: 17 developer community participants, recruited for a mix of tools and platforms, on the role of online community in:
○ Expectation on AI
○ Use cases of AI
○ Vulnerable situations w/ AI
○ …
Design probe: develop mockup prototypes; 11 developers think out loud about the prototypes and brainstorm new features.

Slide 40

Slide 40 text

Pathway #1: Community offers evaluation on AI suggestions
When unsure about AI suggestions, users go to online communities for evaluation. Code solutions in online communities are deemed trustworthy because of:
● Transparent source
● Explicit evaluation & triangulation
● Credibility from identity signals
“The code has been posted by other programmers, people voted on it... If others have used the solution and it worked, it gives you a little more faith.”

Slide 41

Slide 41 text

Pathway #2: Users learn from others’ experience with AI
Engagement with specific experiences shared by others helps users to develop:
● Reasonable expectations of AI capability
● Strategies for when to trust AI
● Empirical understanding of suggestions
● Awareness of the broader implications of AI-generated code
“I read a bunch of what people think of the outcome... [It] helps me make my own perception of whether it is something that is useful for me or not. If everyone has bad experience in the use cases that I care about, I won't trust it at all. Otherwise, I can know where to be careful and what to avoid in the future.”

Slide 42

Slide 42 text

Challenges in effectively using online communities
Despite the benefits of sharing specific experiences, users' shared content often lacks:
● Project context & replicability
● Effective description of interaction with AI
● Diversity and relevance
“I once saw an interesting Copilot suggestion and want to try it myself. But I couldn’t get it even with the same prompt. I don’t know what their setup is.”

Slide 43

Slide 43 text

The extended MATCH model with communities
Online communities: community sensemaking, collective heuristics
Design #1: Community evaluation signals
Design #2: Community curated experience

Slide 44

Slide 44 text

Design Probes Design 1: Introducing community evaluation signals to the AI code generation experience

Slide 45

Slide 45 text

Copilot Community Analytics (design mockup)
578 code snippets similar to this have been suggested to users in your organization
52% accepted w/o editing | 12% rejected directly | 36% made edits
Community voting: 11 / 2
See similar suggestions in Copilot Community | Search code snippet in:

Slide 46

Slide 46 text

Copilot Community Analytics (design mockup, same as previous slide)
Highlighted element: community usage statistics (578 similar snippets suggested; 52% accepted w/o editing, 12% rejected directly, 36% made edits)

Slide 47

Slide 47 text

Copilot Community Analytics (design mockup, same as previous slide)
Highlighted element: community voting on the suggestion (11 / 2)

Slide 48

Slide 48 text

Copilot Community Analytics (design mockup, same as previous slide)
Highlighted element: identity/reputation signals ("See user sharings in Community")
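To pull the probe's content together in one place, here is a small sketch of the data the analytics popup surfaces, using the mockup's own numbers; the field names and the deep link are assumptions made for illustration, since the probe is a design mockup rather than a shipped API:

// Sketch of the payload behind the Community Analytics probe (field names and the
// deep link are assumptions for illustration; the probe is a mockup, not a real API).
const communityAnalytics = {
  similarSnippetsSuggested: 578,   // "578 code snippets similar to this have been suggested..."
  outcomes: {
    acceptedWithoutEditing: 0.52,  // 52% accepted w/o editing
    rejectedDirectly: 0.12,        // 12% rejected directly
    madeEdits: 0.36,               // 36% made edits
  },
  votes: { up: 11, down: 2 },      // community voting on the suggestion
  links: {
    similarSuggestions: 'copilot-community://similar',  // hypothetical deep link
  },
};

// One trust heuristic such a tool could surface: flag low community acceptance.
const accepted = communityAnalytics.outcomes.acceptedWithoutEditing;
if (accepted < 0.5) {
  console.log('Most peers edited or rejected similar suggestions. Review carefully.');
} else {
  console.log(`Accepted as-is by ${Math.round(accepted * 100)}% of peers in your org.`);
}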

Slide 49

Slide 49 text

Design Probes: User Feedback
Design 1: Introducing community evaluation signals to the AI code generation experience
Community statistics:
• Helpful, objective metrics for users to decide how to trust an AI suggestion
• Need more scaffolds for interpreting the numbers, e.g., user intention and rationales

Slide 50

Slide 50 text

Design Probes: User Feedback
Design 1: Introducing community evaluation signals to the AI code generation experience
User voting:
• Proactive way to indicate feedback
• Want to see the outcome of voting (e.g., customization) reflected in future AI suggestions

Slide 51

Slide 51 text

Design Probes: User Feedback
Design 1: Introducing community evaluation signals to the AI code generation experience
Identity signals:
• Helpful to further interpret the statistics
• Want more relevance, e.g., expertise in specific tasks
• Need transparency on what data is collected and how the data will be used

Slide 52

Slide 52 text

Design Probes: User Feedback
Design 1: Introducing community evaluation signals to the AI code generation experience
Overall:
• Popup window can be distracting
• Need more seamless integration into the programming workflow, e.g., preview and summary

Slide 53

Slide 53 text

Design Probes
Design 2: Developer community dedicated to specific experience with the AI code generation tool

Slide 54

Slide 54 text

Design mockup: IDE editor showing a gulpfile alongside a Copilot Community side panel.

'use strict';
// Increase max listeners for event emitters
require('events').EventEmitter.defaultMaxListeners = 100;
const gulp = require('gulp');
const util = require('./build/lib/util');
const path = require('path');
const compilation = require('./build/lib/compilation');
// Fast compile for development time
gulp.task('clean-client', util.rimraf('out'));
gulp.task('compile-client', ['clean-client'], compilation.compileTask('out', false));
gulp.task('watch-client', ['clean-client'], compilation.watchTask('out', false));
// Full compile, including nls and inline sources in sourcemaps, for build
gulp.task('clean-client-build', util.rimraf('out-build'));
gulp.task('compile-client-build', ['clean-client-build'], compilation.compileTask('out-build', true));
gulp.task('watch-client-build', ['clean-client-build'], compilation.watchTask('out-build', true));

COPILOT COMMUNITY panel: Similar Experiences. "See how others in your organization interact with Copilot when getting similar suggestions as the one you got just now." Copilot Auto Recording toggle; community posts with counts (12, 4, 8, 2, 3, 3); button: Explore Copilot Community.

Slide 55

Slide 55 text

Design mockup: same gulpfile and Copilot Community side panel as the previous slide, now with a dialog for sharing an interaction.
Post Interaction Snippet to Copilot Community: add title; add comments; edit snippet video (current length: 30s); tags: JavaScript (add tags); allow link to GitHub Project; save as private; share outside your organization; Copilot Auto Recording; Post.

Slide 56

Slide 56 text

Design mockup: Copilot Community Discovery page.
Search; sort and filters: New, Top: Forked, Language, Sentiment, Topics, My likes | My GitHub.
Posts: "Interesting suggestion by Copilot in TS" (156 views, 1 hour ago), "My review of Copilot for Ruby" (97 views, 3 hours ago), "Tricks to prompt Copilot" (97 views, 3 hours ago), "Using Copilot to implement a Web App" (97 views, 3 hours ago).
Similar to Your Experiences: see how others in your organization interact with Copilot when getting similar suggestions as the ones you have gotten.

Slide 57

Slide 57 text

Design Probes: User Feedback
Design 2: Developer community dedicated to specific experience with the AI code generation tool
IDE side panel:
• Expansion on the community statistics
• Time consuming to watch the videos within a programming session
• Need a more efficient way to present AI interaction, e.g., code snippets linked to the project
• Assurance of confidentiality in sharing

Slide 58

Slide 58 text

Design Probes: User Feedback
Design 2: Developer community dedicated to specific experience with the AI code generation tool
External community:
• Great for discovery and learning outside the programming workflow
• Need more enriched content than a screen recording video, e.g., voice over, text-based tutorial
• More lightweight options for replication

Slide 59

Slide 59 text

Design Recommendations
Dedicated user communities can help developers understand, adopt, and develop appropriate trust with code generation AI. The user community should offer:
● Scaffolds to share specific, authentic experience with AI
● Integration into users' workflow
● Assistance to effectively utilize community content
● Assurance for privacy and confidentiality

Slide 60

Slide 60 text

MODELS is a pivotal fellowship in this epic journey, guiding us through the challenges and triumphs of the AI age.

Slide 61

Slide 61 text

#1 AI for the entire software lifecycle
GitHub Copilot was focused on code editing within the IDE. Software creation is more than writing code. Huge opportunity to apply AI to the entire software lifecycle, including modeling of software. The ultimate “shift left”? (AI for MODELS)

Slide 62

Slide 62 text

#2 Help people build AI-powered software
Future software will be AI-powered (“AIware”). How can we model, build, test, and deploy these AIware systems in a scalable and disciplined way? Important to avoid “AI debt”. How can we model the architecture of AIware systems? Explainability, validation, and verification of AIware systems. (MODELS for AI)

Slide 63

Slide 63 text

AIware is democratizing software creation

Slide 64

Slide 64 text

New AI-focused conferences in SE
“Software for all and by all” is the future of humanity.

Slide 65

Slide 65 text

#3 Provide great human-AI interaction
Important to figure out and model how humans will interact with AI systems. Design an experience that makes the interaction seamless. Consider HCI from the beginning. Systems that adapt and respond dynamically to user preferences.

Slide 66

Slide 66 text

Mixed Reality Environments

Slide 67

Slide 67 text

#4 Leverage AI for software science
Huge potential for AI to be used in research design and data analysis. Great brainstorming partner. But keep in mind: AI isn't perfect, so people need to vet suggestions. The role of research is changing given the rapid speed of innovation. The output and artifacts of the scientific process are changing. Can we apply model-driven techniques?

Slide 68

Slide 68 text

Can GPT-4 Summarize Papers as Cartoons? Yes! :-)
Can GPT-4 Replicate Empirical Software Engineering Research? Jenny T. Liang, Carmen Badea, Christian Bird, Robert DeLine, Denae Ford, Nicole Forsgren, Thomas Zimmermann. PACMSE (FSE) 2024.
AI-generated images may be incorrect. None of the authors wore a lab coat during this research. :-)

Slide 69

Slide 69 text

#5 Apply AI in a responsible way
How do we design and build software systems using AI in a responsible, ethical way that users can trust and that does not negatively affect society? What mechanisms and regulations do we need to oversee AI systems? How can we model and verify AI governance and compliance? What about societal impacts, ethical considerations, and human factors?

Slide 70

Slide 70 text

No content

Slide 71

Slide 71 text

No content