Slide 1

Slide 1 text

Analytics for Smarter Software Development
Thomas Zimmermann, Microsoft Research, USA
Joint work with Chris Bird, Nachi Nagappan and many others.
© Microsoft Corporation

Slide 2

Slide 2 text


Slide 3

Slide 3 text

40 percent of major decisions are based not on facts, but on the manager’s gut.
Accenture survey among 254 US managers in industry.
http://newsroom.accenture.com/article_display.cfm?article_id=4777

Slide 4

Slide 4 text

Analytics is the use of analysis, data, and systematic reasoning to make decisions.
Definition by Thomas H. Davenport and Jeanne G. Harris, Analytics at Work – Smarter Decisions, Better Results

Slide 5

Slide 5 text

History of software analytics
Tim Menzies, Thomas Zimmermann: Software Analytics: So What? IEEE Software 30(4): 31-37 (2013)

Slide 6

Slide 6 text


Slide 7

Slide 7 text

Trinity of software analytics
Dongmei Zhang, Shi Han, Yingnong Dang, Jian-Guang Lou, Haidong Zhang, Tao Xie: Software Analytics in Practice. IEEE Software 30(5): 30-37, September/October 2013.
MSR Asia Software Analytics group: http://research.microsoft.com/en-us/groups/sa/

Slide 8

Slide 8 text

Guidelines for analytics (1)
The Inductive Software Engineering Manifesto: Principles for Industrial Data Mining. Tim Menzies, Christian Bird, Thomas Zimmermann, Wolfram Schulte and Ekrem Kocaguneli. In MALETS 2011: Proceedings of the International Workshop on Machine Learning Technologies in Software Engineering

Slide 9

Slide 9 text

Guidelines for analytics (2)
• Be easy to use. People aren't always analysis experts.
• Be concise. People have little time.
• Measure many artifacts with many indicators.
• Identify important/unusual items automatically.
• Relate activity to features/areas.
• Focus on past & present over future.
• Recognize that developers and managers have different needs.
Information Needs for Software Development Analytics. Ray Buse, Thomas Zimmermann. ICSE 2012 SEIP Track

Slide 10

Slide 10 text

• Smart analytics
• Development analytics
• Usage analytics
• The future
• What’s next?

Slide 11

Slide 11 text

Smart analytics

Slide 12

Slide 12 text


Slide 13

Slide 13 text


Slide 14

Slide 14 text

Jack Bauer

Slide 15

Slide 15 text

Chloe O’Brian

Slide 16

Slide 16 text


Slide 17

Slide 17 text

All he needed was a paper clip

Slide 18

Slide 18 text

Smart analytics is actionable

Slide 19

Slide 19 text

Smart analytics is real time

Slide 20

Slide 20 text

Smart analytics is diversity

Slide 21

Slide 21 text

• The Stakeholders
• The Tools
• The Questions

Slide 22

Slide 22 text

http://aka.ms/145Questions
Andrew Begel, Thomas Zimmermann. Analyze This! 145 Questions for Data Scientists in Software Engineering. To appear ICSE 2014

Slide 23

Slide 23 text

Microsoft’s Top 10 Questions
Question | Essential | Essential + Worthwhile
How do users typically use my application? | 80.0% | 99.2%
What parts of a software product are most used and/or loved by customers? | 72.0% | 98.5%
How effective are the quality gates we run at checkin? | 62.4% | 96.6%
How can we improve collaboration and sharing between teams? | 54.5% | 96.4%
What are the best key performance indicators (KPIs) for monitoring services? | 53.2% | 93.6%
What is the impact of a code change or requirements change to the project and its tests? | 52.1% | 94.0%
What is the impact of tools on productivity? | 50.5% | 97.2%
How do I avoid reinventing the wheel by sharing and/or searching for code? | 50.0% | 90.9%
What are the common patterns of execution in my application? | 48.7% | 96.6%
How well does test coverage correspond to actual code usage by our customers? | 48.7% | 92.0%

Slide 24

Slide 24 text

Smart analytics is people

Slide 25

Slide 25 text

• The Decider
• The Brain
• The Innovator
• The Researcher
Photo of MSA 2010 by Daniel M German

Slide 26

Slide 26 text

Smart analytics is sharing

Slide 27

Slide 27 text

• Sharing Insights
• Sharing Methods
• Sharing Models
• Sharing Data

Slide 28

Slide 28 text

• Sharing Insights
• Sharing Methods

Slide 29

Slide 29 text

Branch Analytics
• Christian Bird, Thomas Zimmermann: Assessing the value of branches with what-if analysis. SIGSOFT FSE 2012: 45
• Emad Shihab, Christian Bird, Thomas Zimmermann: The effect of branching strategies on software quality. ESEM 2012: 301-310
• Christian Bird, Thomas Zimmermann, Alex Teterev: A theory of branches as goals and virtual teams. CHASE 2011: 53-56

Slide 30

Slide 30 text


Slide 31

Slide 31 text

Branches at Microsoft
Diagram: branch main

Slide 32

Slide 32 text

Branches at Microsoft
Diagram: branches main, networking, multimedia

Slide 33

Slide 33 text

Branches at Microsoft
Diagram: branches main, networking, multimedia
Changes are isolated => Fewer build and test breaks

Slide 34

Slide 34 text

Branches at Microsoft
Diagram: branches main, networking, multimedia, with one integration
Changes are isolated => Fewer build and test breaks

Slide 35

Slide 35 text

Branches at Microsoft
Diagram: branches main, networking, multimedia, with two integrations
Changes are isolated => Fewer build and test breaks

Slide 36

Slide 36 text

Branches at Microsoft
Diagram: branches main, networking, multimedia, with two integrations
Changes are isolated => Fewer build and test breaks

Slide 37

Slide 37 text

Branches at Microsoft
Diagram: branches main, networking, multimedia, with two integrations
Changes are isolated => Fewer build and test breaks
Process overhead
Time delay

Slide 38

Slide 38 text

Code Flow for a Single File
• Blue nodes are edits to the file
• Orange nodes are move operations

Slide 39

Slide 39 text

Branch Decisions
• How do we coordinate parallel development?
• How do we structure the branch hierarchy?
• Can we reduce the complexity of branching?

Slide 40

Slide 40 text

Branch Analytics
Techniques:
• Survey developers to understand problems with branching
• Mine source control for the relationship between teams and branches
• Simulate the benefits and costs of alternative branch structures
Actions/Tools:
• Alert stakeholders about possible conflicts
• Recommend branch structure (delete, create, fold branches)
• Perform semi-automatic branch refactoring

Slide 41

Slide 41 text

Assessing a Branch
Simulate an alternate branch structure to assess the cost and benefit of individual branches
• Cost: Average Delay Increase per Edit. How much delay does a branch introduce into development?
• Cost: Integrations per Edit on a Branch. How many integrations per edit occur within a branch?
• Benefit: Provided Isolation per Edit. How many conflicts does a branch prevent per edit?
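The three per-edit measures on this slide can be sketched as simple ratios. This is only an illustration under stated assumptions: the slides do not give the exact formulas, and the `BranchStats` fields, the `assess` function, and the example numbers are hypothetical, not taken from the cited papers.

```python
from dataclasses import dataclass


@dataclass
class BranchStats:
    """Hypothetical per-branch totals (field names are assumptions)."""
    name: str
    edits: int                   # edits made on the branch
    integrations: int            # integrations involving the branch
    delay_days_actual: float     # total delay of edits with the branch in place
    delay_days_simulated: float  # total delay in a what-if run without the branch
    conflicts_prevented: int     # conflicts that appear only in the what-if run


def assess(branch: BranchStats) -> dict:
    """Return the two cost measures and the benefit measure, each per edit."""
    if branch.edits == 0:
        raise ValueError("branch has no edits to normalize by")
    extra_delay = branch.delay_days_actual - branch.delay_days_simulated
    return {
        "delay_increase_per_edit": extra_delay / branch.edits,
        "integrations_per_edit": branch.integrations / branch.edits,
        "isolation_per_edit": branch.conflicts_prevented / branch.edits,
    }


# Invented numbers, for illustration only.
example = BranchStats("networking", edits=200, integrations=50,
                      delay_days_actual=900.0, delay_days_simulated=300.0,
                      conflicts_prevented=10)
result = assess(example)
```

A branch with a high delay increase per edit and low isolation per edit would be a candidate for removal under this reading.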

Slide 42

Slide 42 text

Simulating Removal of a Single Branch
Diagram: four numbered panels showing branches A and B, their integrations, and finally A alone
Compare 1 with 4 to assess cost and benefit of branch B

Slide 43

Slide 43 text

Diagram: Parent Branch, Victim Branch, Child Branch

Slide 44

Slide 44 text

Diagram: Parent Branch, Victim Branch, Child Branch
To release branch

Slide 45

Slide 45 text

Simulation (what-if)
Diagram, original and simulated: Parent Branch, Victim Branch, Child Branch

Slide 46

Slide 46 text

Simulation (what-if)
Diagram, original and simulated: Parent Branch, Victim Branch, Child Branch
Annotations: faster code flow

Slide 47

Slide 47 text

Simulation (what-if)
Diagram, original and simulated: Parent Branch, Victim Branch, Child Branch
Annotations: faster code flow; unneeded integrations removed

Slide 48

Slide 48 text

Simulation (what-if)
Diagram, original and simulated: Parent Branch, Victim Branch, Child Branch
Annotations: faster code flow; unneeded integrations removed; no longer isolated (marked in several places)

Slide 49

Slide 49 text

Assessing branches
Scatter plot axes: Delay (Cost) and Provided Isolation (Benefit)
• Each dot is a branch
• Green dots are branches with high benefit and low cost
• Red dots are branches with high cost but low benefit

Slide 50

Slide 50 text

Assessing branches
Scatter plot axes: Delay (Cost) and Provided Isolation (Benefit)
• Each dot is a branch
• Green dots are branches with high benefit and low cost
• Red dots are branches with high cost but low benefit
If the high-cost, low-benefit branches had been removed, each change would have saved 8.9 days of delay and introduced only 0.04 additional conflicts.

Slide 51

Slide 51 text

Skill in Halo Reach
Jeff Huang, Thomas Zimmermann, Nachiappan Nagappan, Charles Harrison, Bruce C. Phillips: Mastering the art of war: how patterns of gameplay influence skill in Halo. CHI 2013: 695-704

Slide 52

Slide 52 text


Slide 53

Slide 53 text

How do patterns of play affect players’ skill in Halo Reach?
1 General Statistics
2 Play Intensity
3 Skill after Breaks
4 Skill before Breaks
5 Skill and Other Titles
6 Skill Changes and Retention
7 Mastery and Demographics
8 Predicting Skill

Slide 54

Slide 54 text

The Cohort of Players
We looked at the cohort of players who started in the release week, with the complete set of gameplay for those players up to 7 months later (over 3 million players).
TrueSkill in Team Slayer
• The mean skill value µ is recorded for each player after each Team Slayer match
• µ ranges between 0 and 10, although 50% of values fall between 2.5 and 3.5
• Initially µ = 3 for each player, stabilizing after a couple dozen matches
70-Person Survey about Player Experience

Slide 55

Slide 55 text

2 Play Intensity Telegraph operators gradually increase typing speed over time

Slide 56

Slide 56 text

2 Play Intensity
Chart: skill (mu, 2.1 to 3.1) against Games Played So Far (0 to 100)
Median skill typically increases slowly over time

Slide 57

Slide 57 text

2 Play Intensity (Games per Week)
Chart: skill (mu, 2.1 to 3.1) against Games Played So Far (0 to 100), one line per group:
• 0 - 2 games / week [N=59164]
• 2 - 4 games / week [N=101448]
• 4 - 8 games / week [N=226161]
• 8 - 16 games / week [N=363832]
• 16 - 32 games / week [N=319579]
• 32 - 64 games / week [N=420258]
• 64 - 128 games / week [N=415793]
• 128 - 256 games / week [N=245725]
• 256+ games / week [N=115010]
Median skill typically increases slowly over time

Slide 58

Slide 58 text

2 Play Intensity (Games per Week)
Chart: same axes and groups as on Slide 57
Median skill typically increases slowly over time

Slide 59

Slide 59 text

2 Play Intensity (Games per Week)
Chart: same axes and groups as on Slide 57
• Median skill typically increases slowly over time
• Players who play 4–8 games per week do best
• But players who play more overall eventually surpass those who play 4–8 games per week (not shown in chart)
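A minimal sketch of how the Play Intensity curves could be computed: bin each player by games per week using the bin edges from the chart legend, then take the median skill at each game count. The input layout, the function names, and treating the lower bin edge as inclusive are assumptions for illustration, not the study's actual pipeline.

```python
from collections import defaultdict
from statistics import median

# Bin edges from the chart legend: 0-2, 2-4, ..., 128-256, then 256+.
BIN_EDGES = [0, 2, 4, 8, 16, 32, 64, 128, 256]


def intensity_bin(games_per_week: float) -> str:
    """Return the legend label for a play intensity (lower edge inclusive: an assumption)."""
    if games_per_week < 0:
        raise ValueError("games_per_week must be non-negative")
    for low, high in zip(BIN_EDGES, BIN_EDGES[1:]):
        if low <= games_per_week < high:
            return f"{low} - {high}"
    return "256+"


def median_skill_curves(players, max_games=100):
    """players: iterable of (games_per_week, [mu after game 1, mu after game 2, ...]).

    Returns {bin label: {games played so far: median mu}}.
    """
    samples = defaultdict(lambda: defaultdict(list))
    for games_per_week, mus in players:
        label = intensity_bin(games_per_week)
        for game_number, mu in enumerate(mus[:max_games], start=1):
            samples[label][game_number].append(mu)
    return {
        label: {n: median(values) for n, values in sorted(by_game.items())}
        for label, by_game in samples.items()
    }


# Invented toy data, for illustration only.
toy_players = [(5, [3.0, 2.8]), (6, [3.0, 2.6]), (7, [3.0, 3.0]), (300, [3.0])]
curves = median_skill_curves(toy_players)
```

Each inner dictionary corresponds to one line of the chart, with the game count on the x-axis and the median mu on the y-axis.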

Slide 60

Slide 60 text

3 Change in Skill Following a Break “In the most drastic scenario, you can lose up to 80 percent of your fitness level in as few as two weeks [of taking a break]…”

Slide 61

Slide 61 text

3 Change in Skill Following a Break
Chart: change in skill (Δmu, -0.03 to 0.03) against Days of Break (0 to 50), one line each for Next Game, 2 Games Later, 3 Games Later, 4 Games Later, 5 Games Later, 10 Games Later
• Median skill slightly increases after each game played without breaks
• Breaks of 1–2 days correlate with tiny drops in skill
• Longer breaks correlate with larger skill drops, but not linearly
• On average, it takes 8–10 games to regain skill lost after 30-day breaks
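One way to derive the quantity plotted on this slide is to measure, for every gap between two consecutive matches, how much mu has changed a given number of games later, and then take the median per gap length. This is a rough sketch under assumptions: the match layout and the function name are hypothetical, and it ignores any further breaks that fall inside the look-ahead window.

```python
from collections import defaultdict
from statistics import median


def median_change_by_break(players, games_later=1):
    """players: iterable of match lists, each a chronological list of
    (day_number, mu_after_match) for one player.

    Returns {days of break: median change in mu, measured `games_later`
    games after the break relative to mu before the break}.
    """
    changes = defaultdict(list)
    for matches in players:
        for i in range(len(matches) - games_later):
            day_before, mu_before = matches[i]
            day_after, _ = matches[i + 1]           # first game after the gap
            _, mu_later = matches[i + games_later]  # game at which skill is compared
            changes[day_after - day_before].append(mu_later - mu_before)
    return {days: median(deltas) for days, deltas in sorted(changes.items())}


# Invented toy data, for illustration only.
toy = [[(0, 3.0), (0, 3.5), (30, 3.0), (30, 3.25)]]
next_game = median_change_by_break(toy, games_later=1)
two_later = median_change_by_break(toy, games_later=2)
```

Calling the function with `games_later` set to 1, 2, 3, 4, 5 and 10 would give one series per line in the chart.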

Slide 62

Slide 62 text

Analysis of Skill Data
Step 1: Select a population of players. For our Halo study, we selected a cohort of 3.2 million Halo Reach players on Xbox Live who started playing the game in its first week of release.
Step 2: If necessary, sample the population of players and ensure that the sample is representative. In our study we used the complete population of players in this cohort, and our dataset had every match played by that population.
Step 3: Divide the population into groups and plot the development of the dependent variable over time. For example, when plotting the players’ skill in the charts, we took the median skill at every point along the x-axis for each group in order to reduce the bias that would otherwise occur when using the mean.
Step 4: Convert the time series into a symbolic representation to correlate with other factors, for example retention.
Repeat steps 1–4 as needed for any other dependent variables of interest.
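Step 4 can be illustrated with a deliberately simple encoding: average the series over fixed windows and emit one symbol per pair of adjacent windows ('u' for up, 'd' for down, 'f' for flat). This is a stand-in chosen for brevity, not necessarily the representation used in the study; the function names, window size, and tolerance are assumptions.

```python
from collections import Counter


def symbolize(series, window=3, tolerance=0.01):
    """Encode a numeric series as a string of 'u', 'd', 'f' symbols.

    The series is cut into non-overlapping windows; each symbol compares
    the mean of one window with the mean of the previous one.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    means = [
        sum(series[i:i + window]) / window
        for i in range(0, len(series) - window + 1, window)
    ]
    symbols = []
    for previous, current in zip(means, means[1:]):
        diff = current - previous
        if diff > tolerance:
            symbols.append("u")
        elif diff < -tolerance:
            symbols.append("d")
        else:
            symbols.append("f")
    return "".join(symbols)


def retention_by_pattern(players):
    """players: iterable of (skill_series, retained) pairs.

    Returns {pattern: share of players with that pattern who were retained}.
    """
    totals, kept = Counter(), Counter()
    for series, retained in players:
        pattern = symbolize(series)
        totals[pattern] += 1
        kept[pattern] += int(retained)
    return {pattern: kept[pattern] / totals[pattern] for pattern in totals}


# Invented toy data, for illustration only.
pattern = symbolize([3.0] * 3 + [2.7] * 6 + [2.9] * 3)
shares = retention_by_pattern([
    ([1.0, 1.0, 1.0, 2.0, 2.0, 2.0], True),
    ([1.0, 1.0, 1.0, 2.0, 2.0, 2.0], False),
    ([2.0, 2.0, 2.0, 1.0, 1.0, 1.0], False),
])
```

Once each player's curve is a short string, it can be grouped and counted like any categorical variable, which is what makes correlation with factors such as retention straightforward.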

Slide 63

Slide 63 text

What’s next?

Slide 64

Slide 64 text

Call to action

Slide 65

Slide 65 text

Data Analysis Patterns
http://dapse.unbox.org/

Slide 66

Slide 66 text


Slide 67

Slide 67 text

Analytics for Smarter Software Development
Thomas Zimmermann, Microsoft Research, USA
Joint work with Chris Bird, Nachi Nagappan and many others.

Slide 68

Slide 68 text

Analytics for Smarter Software Development
Thomas Zimmermann, Microsoft Research, USA
Joint work with Chris Bird, Nachi Nagappan and many others.

Slide 69

Slide 69 text

Thank you!