
Software Analytics = Sharing Information

Keynote at the PROMISE 2013 conference

Thomas Zimmermann

October 09, 2013

Transcript

  1. © Microsoft Corporation 40 percent of major decisions are based

    not on facts, but on the manager’s gut. Accenture survey among 254 US managers in industry. http://newsroom.accenture.com/article_display.cfm?article_id=4777
  2. © Microsoft Corporation analytics is the use of analysis, data,

    and systematic reasoning to make decisions. Definition by Thomas H. Davenport, Jeanne G. Harris Analytics at Work – Smarter Decisions, Better Results
  3. © Microsoft Corporation history of software analytics Tim Menzies, Thomas

    Zimmermann: Software Analytics: So What? IEEE Software 30(4): 31-37 (2013)
  4. © Microsoft Corporation the many names: software intelligence, software analytics,

    software development analytics, analytics for software development, empirical software engineering, mining software repositories
  5. © Microsoft Corporation Ahmed E. Hassan, Tao Xie: Software intelligence:

    the future of mining software engineering data. FoSER 2010: 161-166. "[Software Intelligence] offers software practitioners (not just developers) up-to-date and pertinent information to support their daily decision-making processes. […]"
    Raymond P. L. Buse, Thomas Zimmermann: Analytics for software development. FoSER 2010: 77-80. "The idea of analytics is to leverage potentially large amounts of data into real and actionable insights."
    Dongmei Zhang, Yingnong Dang, Jian-Guang Lou, Shi Han, Haidong Zhang, Tao Xie: Software Analytics as a Learning Case in Practice: Approaches and Experiences. MALETS 2011. "Software analytics is to enable software practitioners* to perform data exploration and analysis in order to obtain insightful and actionable information for data-driven tasks around software and services." (*Software practitioners typically include software developers, testers, usability engineers, managers, etc.)
    Raymond P. L. Buse, Thomas Zimmermann: Information needs for software development analytics. ICSE 2012: 987-996. "Software development analytics […] empower[s] software development teams to independently gain and share insight from their data without relying on a separate entity."
    Tim Menzies, Thomas Zimmermann: Software Analytics: So What? IEEE Software 30(4): 31-37 (2013). "Software analytics is analytics on software data for managers and software engineers with the aim of empowering software development individuals and teams to gain and share insight from their data to make better decisions."
    Dongmei Zhang, Shi Han, Yingnong Dang, Jian-Guang Lou, Haidong Zhang, Tao Xie: Software Analytics in Practice. IEEE Software 30(5): 30-37 (2013). "With software analytics, software practitioners explore and analyze data to obtain insightful, actionable information for tasks regarding software development, systems, and users."
  6. © Microsoft Corporation trinity of software analytics Dongmei Zhang, Shi

    Han, Yingnong Dang, Jian-Guang Lou, Haidong Zhang, Tao Xie: Software Analytics in Practice. IEEE Software 30(5): 30-37, September/October 2013. MSR Asia Software Analytics group: http://research.microsoft.com/en-us/groups/sa/
  7. © Microsoft Corporation inductive engineering The Inductive Software Engineering Manifesto:

    Principles for Industrial Data Mining. Tim Menzies, Christian Bird, Thomas Zimmermann, Wolfram Schulte and Ekrem Kocaguneli. In MALETS 2011: Proceedings of the International Workshop on Machine Learning Technologies in Software Engineering
  8. © Microsoft Corporation guidelines for analytics

    Be easy to use. People aren't always analysis experts.
    Be concise. People have little time.
    Measure many artifacts with many indicators.
    Identify important/unusual items automatically.
    Relate activity to features/areas.
    Focus on past & present over future.
    Recognize that developers and managers have different needs.
    Information Needs for Software Development Analytics. Ray Buse, Thomas Zimmermann. ICSE 2012 SEIP Track
  9. © Microsoft Corporation Information Needs for Software Development Analytics. Ray

    Buse, Thomas Zimmermann. ICSE 2012 SEIP Track
    Technique | Description | Insight | Relevant techniques
    Summarization | Search for important or unusual factors associated with a time range | Characterize events, understand why they happened | Topic analysis, NLP
    Alerts (& correlations) | Continuous search for unusual changes or relationships in variables | Notice important events | Statistics, repeated measures
    Forecasting | Search for and predict unusual events in the future based on current trends | Anticipate events | Extrapolation, statistics
    Trends | How is an artifact changing? | Understand the direction of the project | Regression analysis
    Overlays | What artifacts account for current activity? | Understand the relationships between artifacts | Cluster analysis, repository mining
    Goals | How are features/artifacts changing in the context of completion or some other goal? | Assistance for planning | Root-cause analysis
    Modeling | Compares the abstract history of similar artifacts | Identify important factors in history; learn from previous projects | Machine learning
    Benchmarking | Identify vectors of similarity/difference across artifacts | Assistance for resource allocation and many other decisions | Statistics
    Simulation | Simulate changes based on other artifact models | Assistance for general decisions | What-if analysis
  10. © Microsoft Corporation Measurements, Surveys, Benchmarking, Qualitative Analysis, Clustering, Prediction,

    What-if Analysis, Segmenting, Multivariate Analysis, Interviews (diagram: stakeholders, questions, tools)
  11. © Microsoft Corporation Build tools for frequent questions. Use data

    scientists for infrequent questions. (chart: questions ranked by frequency; diagram: stakeholders, questions, tools)
  12. © Microsoft Corporation Percentages per question: Essential / Worthwhile+ / Unwise

    Q27 (DP) How do users typically use my application? 80.0% / 99.2% / 0.8%
    Q18 (CR) What parts of a software product are most used and/or loved by customers? 72.0% / 98.5% / 0.0%
    Q50 (DP) How effective are the quality gates we run at checkin? 62.4% / 96.6% / 0.8%
    Q115 (TC) How can we improve collaboration and sharing between teams? 54.5% / 96.4% / 0.0%
    Q86 (SVC) What are the best key performance indicators (KPIs) for monitoring services? 53.2% / 93.6% / 0.9%
    Q40 (DP) What is the impact of a code change or requirements change to the project and tests? 52.1% / 94.0% / 0.0%
    Q74 (PROD) What is the impact of tools on productivity? 50.5% / 97.2% / 0.9%
    Q84 (RSC) How do I avoid reinventing the wheel by sharing and/or searching for code? 50.0% / 90.9% / 0.9%
    Q28 (DP) What are the common patterns of execution in my application? 48.7% / 96.6% / 0.8%
    Q66 (EQ) How well does test coverage correspond to actual code usage by our customers? 48.7% / 92.0% / 0.0%
    Q42 (DP) What tools can help us measure and estimate the risk associated with code changes? 47.8% / 92.2% / 0.0%
    Q59 (EQ) What are effective metrics for ship quality? 47.8% / 96.5% / 1.7%
    Q100 (SL) How much do design changes cost us and how can we reduce their risk? 46.6% / 94.8% / 0.8%
    Q19 (CR) What are the best ways to change a product's features without losing customers? 46.2% / 92.3% / 1.5%
    Q131 (TP) Which test strategies find the most impactful bugs (e.g., assertions, in-circuit testing, A/B testing)? 44.5% / 91.8% / 0.9%
    Q83 (RSC) When should I write code from scratch vs. reuse legacy code? 44.5% / 84.5% / 3.6%
    Q1 (BUG) What is the impact and/or cost of finding bugs at a certain stage in the development cycle? 43.1% / 87.9% / 2.5%
    Q92 (SVC) What is the tradeoff between releasing more features or releasing more often? 42.5% / 79.6% / 0.0%
    Q2 (BUG) What kinds of mistakes do developers make in their software? Which ones are the most common? 41.7% / 98.3% / 0.0%
    Q25 (CR) How important is a particular requirement? 41.7% / 87.4% / 2.3%
    Q60 (EQ) How should we use metrics to help us decide when a feature is good enough to release (or poor enough to cancel)? 41.1% / 90.2% / 3.5%
    Q17 (CR) What is the best way to collect customer feedback? 39.8% / 93.0% / 1.5%
    Q3 (BUG) In what places in their software code do developers make the most mistakes? 35.0% / 94.0% / 0.0%
    What kinds of problems happen because there is too much software process?
    Analyze This! 145 Questions for Data Scientists in Software Engineering. Andrew Begel, Thomas Zimmermann.
  13. © Microsoft Corporation Customer. Practices and processes.

    Product quality. Analyze This! 145 Questions for Data Scientists in Software Engineering. Andrew Begel, Thomas Zimmermann.
  14. © Microsoft Corporation Percentages per question: Essential / Worthwhile+ / Unwise

    Q72 (PROD) Which individual measures correlate with employee productivity (e.g., employee age, tenure, engineering skills, education, promotion velocity, IQ)? 7.3% / 44.5% / 25.5%
    Q71 (PROD) Which coding measures correlate with employee productivity (e.g., lines of code, time it takes to build the software, a particular tool set, pair programming, number of hours of coding per day, language)? 15.6% / 56.9% / 22.0%
    Q75 (PROD) What metrics can be used to compare employees? 19.4% / 67.6% / 21.3%
    Q70 (PROD) How can we measure the productivity of a Microsoft employee? 19.1% / 70.9% / 20.9%
    Q6 (BUG) Is the number of bugs a good measure of developer effectiveness? 16.4% / 54.3% / 17.2%
    Q128 (TP) Can I generate 100% test coverage? 15.3% / 44.1% / 14.4%
    Q113 (PROC) Who should be in charge of creating and maintaining a consistent company-wide software process and tool chain? 21.9% / 55.3% / 12.3%
    Q112 (PROC) What are the benefits of a consistent, company-wide software process and tool chain? 25.2% / 78.3% / 10.4%
    Q34 (DP) When are code comments worth the effort to write them? 7.9% / 41.2% / 9.6%
    Q24 (CR) How much time and money does it cost to add customer input into your design? 15.9% / 68.2% / 8.3%
    Analyze This! 145 Questions for Data Scientists in Software Engineering. Andrew Begel, Thomas Zimmermann.
    Not every question is "wise".
  15. © Microsoft Corporation Defect prediction

    • Learn a prediction model from historic data
    • Predict defects for the same project
    • Hundreds of prediction models exist
    • Models work fairly well, with precision and recall of up to 80%.
    Predictor | Precision | Recall
    Pre-release bugs | 73.8% | 62.9%
    Test coverage | 83.8% | 54.4%
    Dependencies | 74.4% | 69.9%
    Code complexity | 79.3% | 66.0%
    Code churn | 78.6% | 79.9%
    Org. structure | 86.2% | 84.0%
    From: N. Nagappan, B. Murphy, and V. Basili. The influence of organizational structure on software quality. ICSE 2008.
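To make the basic recipe concrete, here is a minimal within-project sketch in Python with scikit-learn. The file name, the metric columns (churn, complexity, dependencies), and the post_release_defect label are hypothetical stand-ins for a project's historic data; none of them come from the talk.

```python
# Within-project defect prediction: train on part of the project's history,
# evaluate on the held-out rest. The data layout below is hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score

data = pd.read_csv("project_modules.csv")           # hypothetical: one row per module
features = ["churn", "complexity", "dependencies"]  # hypothetical metric columns
X, y = data[features], data["post_release_defect"]  # y: 1 = defect after release

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0, stratify=y)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = model.predict(X_test)
print("precision:", precision_score(y_test, pred))
print("recall:   ", recall_score(y_test, pred))
```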
  16. © Microsoft Corporation Why cross-project prediction? Some projects do

    not have enough data to train prediction models, or the data is of poor quality. New projects have no data yet. Can such projects use models from other projects? (= cross-project prediction) Thomas Zimmermann, Nachiappan Nagappan, Harald Gall, Emanuel Giger, Brendan Murphy: Cross-project defect prediction: a large scale experiment on data vs. domain vs. process. ESEC/SIGSOFT FSE 2009: 91-100
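Cross-project prediction reuses the same pipeline but swaps the training source: fit on project A, evaluate on project B. The sketch below assumes, hypothetically, that both projects expose the same metric columns as in the previous example.

```python
# Cross-project defect prediction sketch: train on one project, test on another.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score

FEATURES = ["churn", "complexity", "dependencies"]  # hypothetical shared schema

def cross_project(train_csv: str, test_csv: str):
    train, test = pd.read_csv(train_csv), pd.read_csv(test_csv)
    model = LogisticRegression(max_iter=1000)
    model.fit(train[FEATURES], train["post_release_defect"])
    pred = model.predict(test[FEATURES])
    return (precision_score(test["post_release_defect"], pred),
            recall_score(test["post_release_defect"], pred))

# Direction matters: A -> B may work well while B -> A fails,
# as the Firefox/IE experiment on the next slide shows.
print("A -> B:", cross_project("project_a.csv", "project_b.csv"))
print("B -> A:", cross_project("project_b.csv", "project_a.csv"))
```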
  17. © Microsoft Corporation A first experiment: Firefox and IE

    Firefox can predict defects in IE (precision = 0.76, recall = 0.88). But IE cannot predict Firefox (precision = 0.54, recall = 0.04). WHY?
  18. © Microsoft Corporation Sharing models Sharing models does not always

    work. In what situations does sharing models work?
  19. © Microsoft Corporation Skill in Halo Reach Jeff Huang, Thomas

    Zimmermann, Nachiappan Nagappan, Charles Harrison, Bruce C. Phillips: Mastering the art of war: how patterns of gameplay influence skill in Halo. CHI 2013: 695-704
  20. How do patterns of play affect players' skill in Halo

    Reach? 1 General Statistics, 2 Play Intensity, 3 Skill after Breaks, 4 Skill before Breaks, 5 Skill and Other Titles, 6 Skill Changes and Retention, 7 Mastery and Demographics, 8 Predicting Skill
  21. The Cohort of Players We looked at the cohort of

    players who started in the release week, with the complete set of gameplay for those players up to 7 months later (over 3 million players). TrueSkill in Team Slayer: the mean skill value µ for each player after each Team Slayer match. µ ranges between 0 and 10, although 50% fall between 2.5 and 3.5. Initially µ = 3 for each player, stabilizing after a couple dozen matches. Plus a 70-person survey about player experience.
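For intuition on where the per-match µ values come from, here is a sketch using the open-source trueskill Python package. The talk does not give Halo Reach's actual TrueSkill configuration; the mu=3.0, sigma=1.0 environment below is an assumption chosen only to mimic the 0-10 scale with initial µ = 3 described above.

```python
# TrueSkill update after one 4-vs-4 Team Slayer match (sketch).
from trueskill import TrueSkill

env = TrueSkill(mu=3.0, sigma=1.0)  # ASSUMED parameters for a 0-10-style scale

red = [env.create_rating() for _ in range(4)]   # every player starts at mu = 3
blue = [env.create_rating() for _ in range(4)]

# ranks: lower is better, so [0, 1] means the red team won this match.
red, blue = env.rate([red, blue], ranks=[0, 1])
print([round(r.mu, 2) for r in red])   # winners' mu moves up
print([round(r.mu, 2) for r in blue])  # losers' mu moves down
```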
  22. © Microsoft Corporation Analysis of Skill Data Step 1: Select

    a population of players. For our Halo study, we selected a cohort of 3.2 million Halo Reach players on Xbox Live who started playing the game in its first week of release. Step 2: If necessary, sample the population of players and ensure that the sample is representative. In our study we used the complete population of players in this cohort, and our dataset had every match played by that population. Step 3: Divide the population into groups and plot the development of the dependent variable over time. For example, when plotting the players’ skill in the charts, we took the median skill at every point along the x-axis for each group in order to reduce the bias that would otherwise occur when using the mean. Step 4: Convert the time series into a symbolic representation to correlate with other factors, for example retention. Repeat steps 1–4 as needed for any other dependent variables of interest.
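A minimal pandas sketch of steps 3 and 4, assuming a hypothetical table with one row per (player, match) holding the player's µ after that match, a running game number, and a play-intensity group label. The up/down/flat encoding is a simplified stand-in for whatever symbolic representation the study actually used.

```python
# Steps 3-4 of the skill analysis, sketched on a hypothetical data layout.
import pandas as pd

matches = pd.read_csv("skill_history.csv")  # columns: player, game_no, mu, group

# Step 3: median mu (median, not mean, to reduce bias from skewed skill
# values) at each point along the x-axis, one curve per play-intensity group.
curves = (matches.groupby(["group", "game_no"])["mu"]
                 .median()
                 .unstack("group"))

# Step 4: encode each player's trajectory as symbols (u = up, d = down,
# f = flat) so trajectories can be grouped and correlated with retention.
def symbolize(mu_series, eps=0.01):
    return "".join("u" if d > eps else "d" if d < -eps else "f"
                   for d in mu_series.diff().dropna())

patterns = (matches.sort_values("game_no")
                   .groupby("player")["mu"]
                   .apply(symbolize))
print(patterns.value_counts().head())  # most frequent skill trajectories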
  23. 2 Play Intensity

    [Chart: median skill (mu, 2.1-3.1) vs. games played so far (0-100)] Median skill typically increases slowly over time.
  24. 2 Play Intensity (Games per Week)

    [Chart: median skill (mu, 2.1-3.1) vs. games played so far (0-100), one curve per play-intensity group: 0-2 games/week (N=59164), 2-4 (N=101448), 4-8 (N=226161), 8-16 (N=363832), 16-32 (N=319579), 32-64 (N=420258), 64-128 (N=415793), 128-256 (N=245725), 256+ (N=115010)] Median skill typically increases slowly over time. Players who play 4-8 games per week do best. But players who play more overall eventually surpass those who play 4-8 games per week (not shown in chart).
  25. 3 Change in Skill Following a Break “In the most

    drastic scenario, you can lose up to 80 percent of your fitness level in as few as two weeks [of taking a break]…”
  26. 3 Change in Skill Following a Break

    [Chart: change in skill (Δmu, -0.03 to +0.03) vs. days of break (0-50), with curves for the next game and for 2, 3, 4, 5, and 10 games later] Median skill slightly increases after each game played without breaks. Longer breaks correlate with larger skill drops, but not linearly. Breaks of 1-2 days correlate with tiny drops in skill. On average, it takes 8-10 games to regain skill lost after a 30-day break.
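The underlying computation can be sketched as follows: pair consecutive matches per player, derive the break length and the skill delta, then aggregate. The column names (player, played_at, mu) are hypothetical.

```python
# Skill change after breaks, sketched on a hypothetical per-match table.
import pandas as pd

m = pd.read_csv("matches.csv", parse_dates=["played_at"])  # player, played_at, mu
m = m.sort_values(["player", "played_at"])

by_player = m.groupby("player")
m["break_days"] = by_player["played_at"].diff().dt.days  # gap before this match
m["delta_mu"] = by_player["mu"].diff()                   # change vs. previous match

# Median change in skill in the next game, by length of the preceding break.
recovery = (m.dropna(subset=["break_days", "delta_mu"])
             .groupby("break_days")["delta_mu"]
             .median())
print(recovery.reindex([1, 2, 7, 30]))  # short breaks vs. a month off
```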
  27. Time-series of skill measured for first 100 games. Most common

    pattern is steady improvement of skill; the next most common pattern is a steady decline in skill. 6 Skill Changes and Retention
    Pattern | Frequency | Total games (the 16 patterns are shown as sparkline images on the slide)
    #1 | 61791 | 217
    #2 | 45814 | 252
    #3 | 36320 | 257
    #4 | 27290 | 219
    #5 | 22759 | 216
    #6 | 22452 | 253
    #7 | 20659 | 260
    #8 | 20633 | 222
    #9 | 19858 | 247
    #10 | 19292 | 216
    #11 | 17573 | 219
    #12 | 17454 | 245
    #13 | 17389 | 260
    #14 | 15670 | 215
    #15 | 13692 | 236
    #16 | 12516 | 239
  28. Time-series of skill measured for first 100 games (same

    pattern table as the previous slide). Improving players actually end up playing fewer games than players with declining skill. 6 Skill Changes and Retention
  29. © Microsoft Corporation Social behavior in a Shooter game Sauvik

    Das, Thomas Zimmermann, Nachiappan Nagappan, Bruce Phillips, Chuck Harrison. Revival Actions in a Shooter Game. DESVIG 2013 Workshop
  30. © Microsoft Corporation Impact of social behavior on retention

    AAA title. Random sample of 26,000 players with ~1,000,000 sessions of gameplay data.
  31. © Microsoft Corporation Players who revive other players

    Dimension | Characteristic | Change
    Engagement | Session count | +297.44%
    Skill | Kills | +100.21%
    Skill | Was revived | -54.55%
    Skill | Deaths | -12.44%
    Success | Likelihood to win match | +18.88%
    Social | Gave weapon | +486.14%
  32. © Microsoft Corporation Analysis pattern: Cluster + Contrast

    1. Use k-means clustering to cluster players in the sample along the social features.
    2. Analyze the cluster centroids to understand the differences in social behavior across clusters.
    3. Run a survival analysis to observe trends in retention across clusters.
    A code sketch of this pattern follows below.
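Below is a sketch of the pattern under assumed names: hypothetical per-player social features, scikit-learn's k-means for step 1, a per-cluster means comparison for step 2, and Kaplan-Meier curves from the lifelines package as the survival analysis in step 3. The talk does not specify these libraries or columns.

```python
# Cluster + Contrast sketch: k-means on social features, then per-cluster
# retention via Kaplan-Meier survival curves. All names are hypothetical.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from lifelines import KaplanMeierFitter

players = pd.read_csv("players.csv")  # hypothetical per-player summary table
SOCIAL = ["revives_given", "weapons_given", "times_revived"]  # assumed features

# 1. Cluster players along the (standardized) social features.
X = StandardScaler().fit_transform(players[SOCIAL])
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
players["cluster"] = km.labels_

# 2. Contrast: per-cluster feature means characterize each cluster's behavior.
print(players.groupby("cluster")[SOCIAL].mean())

# 3. Survival analysis: one retention curve per cluster. weeks_active is the
# observed playing span; churned marks players who actually stopped playing.
kmf = KaplanMeierFitter()
for c, grp in players.groupby("cluster"):
    kmf.fit(grp["weeks_active"], event_observed=grp["churned"], label=f"cluster {c}")
    kmf.plot_survival_function()
```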
  33. © Microsoft Corporation smart analytics is: actionable, real time,

    diversity, people, sharing. Usage analytics: analytics for Xbox games. Sharing Insights, Sharing Methods, Sharing Models, Sharing Data.