Slide 1

Slide 1 text

Hello Clippy! Lessons Learned from RSSEs
Thomas Zimmermann, Microsoft Research
© Microsoft Corporation

Slide 2

Slide 2 text

University of Passau; Saarland University; University of Calgary; Microsoft Research
PhD; Assistant Professor (2007-2008); Researcher (since 2008)

Slide 3

Slide 3 text


Slide 4

Slide 4 text


Slide 5

Slide 5 text


Slide 6

Slide 6 text


Slide 7

Slide 7 text


Slide 8

Slide 8 text

Annotations for Risky Locations

Slide 9

Slide 9 text

"A recommendation system for software engineering is a software application that provides information items estimated to be valuable for a software engineering task in a given context."
[Robillard, Walker, Zimmermann, 2009]
B+

Slide 10

Slide 10 text

Three Things I Think I Know About Software
and are important to RSSEs

Slide 11

Slide 11 text

software is diversity

Slide 12

Slide 12 text

people | projects | knowledge
Developer, Tester, Builder, Dev. Lead, Test Lead, Manager

Slide 13

Slide 13 text

people | projects | knowledge
Number of projects that are needed to cover the Ohloh universe with respect to seven dimensions (language, size, contributors, churn, commits, age, activity). Each point in the graph means that x projects can cover y percent of the universe.
Meiyappan Nagappan, Thomas Zimmermann, Christian Bird: Diversity in software engineering research. ESEC/SIGSOFT FSE 2013: 466-476

Slide 14

Slide 14 text

people | projects | knowledge
Build tool support for frequently needed knowledge
[Chart axes: Frequency, Knowledge]

Slide 15

Slide 15 text

one size does not fit all

Slide 16

Slide 16 text

developers are smart

Slide 17

Slide 17 text

and software is complex

Slide 18

Slide 18 text


Slide 19

Slide 19 text

My wish list: RSSEs for software analytics

Slide 20

Slide 20 text

analytics is the use of analysis, data, and systematic reasoning to make decisions.
Definition by Thomas H. Davenport and Jeanne G. Harris, Analytics at Work – Smarter Decisions, Better Results
software analytics is analytics on software data

Slide 21

Slide 21 text


Slide 22

Slide 22 text

history of software analytics
Tim Menzies, Thomas Zimmermann: Software Analytics: So What? IEEE Software 30(4): 31-37 (2013)

Slide 23

Slide 23 text

Sharing Insights | Sharing Methods | Sharing Models | Sharing Data

Slide 24

Slide 24 text

Sharing Insights | Sharing Methods

Slide 25

Slide 25 text

Example: Branch Analytics
Christian Bird, Thomas Zimmermann: Assessing the value of branches with what-if analysis. SIGSOFT FSE 2012: 45
Emad Shihab, Christian Bird, Thomas Zimmermann: The effect of branching strategies on software quality. ESEM 2012: 301-310
Christian Bird, Thomas Zimmermann, Alex Teterev: A theory of branches as goals and virtual teams. CHASE 2011: 53-56

Slide 26

Slide 26 text


Slide 27

Slide 27 text

Branches at Microsoft
Diagram labels: main

Slide 28

Slide 28 text

Branches at Microsoft
Diagram labels: main, networking, multimedia

Slide 29

Slide 29 text

Branches at Microsoft
Changes are isolated => Fewer build and test breaks
Diagram labels: main, networking, multimedia

Slide 30

Slide 30 text

Branches at Microsoft
Changes are isolated => Fewer build and test breaks
Diagram labels: main, networking, multimedia, integration

Slide 31

Slide 31 text

Branches at Microsoft
Changes are isolated => Fewer build and test breaks
Diagram labels: main, networking, multimedia, integration (x2)

Slide 32

Slide 32 text

Branches at Microsoft
Changes are isolated => Fewer build and test breaks
Diagram labels: main, networking, multimedia, integration (x2)

Slide 33

Slide 33 text

Branches at Microsoft
Changes are isolated => Fewer build and test breaks
Process overhead, Time delay
Diagram labels: main, networking, multimedia, integration (x2)

Slide 34

Slide 34 text

Code Flow for a Single File
Blue nodes are edits to the file
Orange nodes are move operations

Slide 35

Slide 35 text

Branch Decisions
• How do we coordinate parallel development?
• How do we structure the branch hierarchy?
• Can we reduce the complexity of branching?

Slide 36

Slide 36 text

Branch Analytics
Techniques:
• Survey developers to understand problems with branching
• Mine source control for relationship of teams and branches
• Simulate benefits and cost of alternative branch structures
Actions/Tools:
• Alert stakeholders about possible conflicts
• Recommend branch structure (delete, create, fold branches)
• Perform semi-automatic branch refactoring

Slide 37

Slide 37 text

Which Branches Need Coordination?
Compare all pairs of branches by file similarity and developer similarity.
Dark areas mean many branch pairs in that area.
Same files, but different team means potential problems
[Chart axes: Different Files to Same Files; Different Teams to Same Teams]
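The pairwise comparison on this slide could be sketched as follows. The slides do not say which similarity measure was used, so Jaccard similarity is an assumption here, and the branch data, field names (`files`, `devs`), and thresholds are hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def branch_pairs(branches: dict) -> list:
    """Return (branch_a, branch_b, file_similarity, developer_similarity) per pair."""
    rows = []
    for (name_a, a), (name_b, b) in combinations(sorted(branches.items()), 2):
        rows.append((name_a, name_b,
                     jaccard(a["files"], b["files"]),
                     jaccard(a["devs"], b["devs"])))
    return rows


def needs_coordination(rows: list, file_min: float = 0.5, dev_max: float = 0.2) -> list:
    """Pairs that touch the same files but are worked on by different teams."""
    return [(a, b) for a, b, file_sim, dev_sim in rows
            if file_sim >= file_min and dev_sim <= dev_max]


# Hypothetical example data: files edited and developers active on each branch.
BRANCHES = {
    "networking": {"files": {"tcp.c", "dns.c", "http.c"}, "devs": {"ann", "bob"}},
    "multimedia": {"files": {"codec.c", "http.c", "tcp.c"}, "devs": {"cyd", "dee"}},
    "shell": {"files": {"ui.c"}, "devs": {"ann"}},
}

if __name__ == "__main__":
    for pair in needs_coordination(branch_pairs(BRANCHES)):
        print("potential problem:", pair)
```

Plotting `file_similarity` against `developer_similarity` for all rows would give a density picture like the one described on the slide; the flagged pairs sit in the "same files, different teams" corner.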

Slide 38

Slide 38 text

Assessing a Branch
Simulate alternate branch structure to assess cost and benefit of individual branches
• Cost: Average Delay Increase per Edit
  How much delay does a branch introduce into development?
• Cost: Integrations per Edit on a Branch
  How many integrations per edit occur within a branch?
• Benefit: Provided Isolation per Edit
  How many conflicts does a branch prevent per edit?
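As a rough illustration, the three per-edit measures can be computed as simple ratios. This is a sketch under assumptions: the `BranchStats` fields and the example numbers are hypothetical, and the slides do not specify how delay or prevented conflicts are obtained (in the talk they come from a simulation).

```python
from dataclasses import dataclass


@dataclass
class BranchStats:
    """Hypothetical per-branch totals, e.g. taken from a what-if simulation."""
    edits: int                 # edits made on the branch
    integrations: int          # integrations into or out of the branch
    extra_delay_days: float    # total extra delay attributed to the branch
    conflicts_prevented: int   # conflicts avoided thanks to the isolation


def assess(stats: BranchStats) -> dict:
    """Return the two cost measures and the benefit measure, each per edit."""
    if stats.edits <= 0:
        raise ValueError("a branch needs at least one edit to be assessed")
    return {
        "delay_per_edit": stats.extra_delay_days / stats.edits,
        "integrations_per_edit": stats.integrations / stats.edits,
        "isolation_per_edit": stats.conflicts_prevented / stats.edits,
    }


if __name__ == "__main__":
    example = BranchStats(edits=200, integrations=50,
                          extra_delay_days=400.0, conflicts_prevented=25)
    print(assess(example))
```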

Slide 39

Slide 39 text

Simulating Removal of a Single Branch
[Diagram: branches A and B with integrations, then A alone]
Compare 1 with 4 to assess cost and benefit of branch B

Slide 40

Slide 40 text

Diagram labels: Parent Branch, Victim Branch, Child Branch

Slide 41

Slide 41 text

Diagram labels: Parent Branch, Victim Branch, Child Branch
To release branch

Slide 42

Slide 42 text

Simulation (what-if)
Diagram labels (shown twice): Parent Branch, Victim Branch, Child Branch

Slide 43

Slide 43 text

Simulation (what-if)
Diagram labels (shown twice): Parent Branch, Victim Branch, Child Branch
faster code flow

Slide 44

Slide 44 text

Simulation (what-if)
Diagram labels (shown twice): Parent Branch, Victim Branch, Child Branch
faster code flow
unneeded integrations removed

Slide 45

Slide 45 text

Simulation (what-if)
Diagram labels (shown twice): Parent Branch, Victim Branch, Child Branch
faster code flow
unneeded integrations removed
no longer isolated (marked on several edits)
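The structural part of this what-if step can be sketched on a toy hierarchy. This is only an illustration, not the simulation from the cited papers: the child-to-parent dictionary, the branch names, and the use of "hops to the root" as a stand-in for code-flow speed are all assumptions. It does not model edits on the victim branch losing their isolation.

```python
def remove_branch(parents: dict, victim: str) -> dict:
    """Return a new hierarchy without `victim`; its children attach to its parent."""
    if parents.get(victim) is None:
        raise ValueError("victim must be a known, non-root branch")
    new_parent = parents[victim]
    return {branch: (new_parent if parent == victim else parent)
            for branch, parent in parents.items() if branch != victim}


def hops_to_root(parents: dict, branch: str) -> int:
    """Number of integrations a change needs to travel from `branch` to the root."""
    hops = 0
    while parents[branch] is not None:
        branch = parents[branch]
        hops += 1
    return hops


# Hypothetical hierarchy, child -> parent; the root has parent None.
HIERARCHY = {"main": None, "parent": "main", "victim": "parent", "child": "victim"}

if __name__ == "__main__":
    after = remove_branch(HIERARCHY, "victim")
    print("before:", hops_to_root(HIERARCHY, "child"), "hops")
    print("after: ", hops_to_root(after, "child"), "hops")
```

Fewer hops for the child branch corresponds to the "faster code flow" and "unneeded integrations removed" labels; the cost side (edits that are no longer isolated) would need the edit history and is left out here.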

Slide 46

Slide 46 text

Assessing branches
[Scatter plot axes: Delay (Cost), Provided Isolation (Benefit)]
Each dot is a branch
Green dots are branches with high benefit and low cost
Red dots are branches with high cost but low benefit

Slide 47

Slide 47 text

Assessing branches
[Scatter plot axes: Delay (Cost), Provided Isolation (Benefit)]
Each dot is a branch
Green dots are branches with high benefit and low cost
Red dots are branches with high cost but low benefit
If the high-cost, low-benefit branches had been removed, changes would each have saved 8.9 days of delay and introduced only 0.04 additional conflicts.

Slide 48

Slide 48 text

Why did I show you this?
Build tools for frequent questions
Use data scientists for infrequent questions
Make it easier for data scientists to build tools
[Chart axes: Frequency, Questions]

Slide 49

Slide 49 text

http://aka.ms/145Questions
Andrew Begel, Thomas Zimmermann. Analyze This! 145 Questions for Data Scientists in Software Engineering. To appear at ICSE 2014

Slide 50

Slide 50 text

Microsoft’s Top 10 Questions
Question | Essential | Essential + Worthwhile
How do users typically use my application? | 80.0% | 99.2%
What parts of a software product are most used and/or loved by customers? | 72.0% | 98.5%
How effective are the quality gates we run at checkin? | 62.4% | 96.6%
How can we improve collaboration and sharing between teams? | 54.5% | 96.4%
What are the best key performance indicators (KPIs) for monitoring services? | 53.2% | 93.6%
What is the impact of a code change or requirements change to the project and its tests? | 52.1% | 94.0%
What is the impact of tools on productivity? | 50.5% | 97.2%
How do I avoid reinventing the wheel by sharing and/or searching for code? | 50.0% | 90.9%
What are the common patterns of execution in my application? | 48.7% | 96.6%
How well does test coverage correspond to actual code usage by our customers? | 48.7% | 92.0%

Slide 51

Slide 51 text

RSSE for Software Analytics
Opportunities
• Provide recommendations
  – What analysis method to use and when?
• How to understand results from data?
• How to measure success/insight?
• Provide tools to transform manual empirical analysis into reusable analysis

Slide 52

Slide 52 text

Hello Clippy! Lessons Learned from RSSEs
Thomas Zimmermann, Microsoft Research
Sharing Insights | Sharing Methods | Sharing Models | Sharing Data
http://aka.ms/145Questions
Andrew Begel, Thomas Zimmermann. Analyze This! 145 Questions for Data Scientists in Software Engineering. To appear at ICSE 2014
My wish list: RSSEs for software analytics

Slide 53

Slide 53 text

Thank you!