
Software Analytics = Sharing Information

Keynote at the PROMISE 2013 conference

Thomas Zimmermann

October 09, 2013

Transcript

  1. Software Analytics = Sharing Information
     Thomas Zimmermann, Microsoft Research, USA


  3. 40 percent of major decisions are based not on facts, but on the manager’s gut.
     Accenture survey among 254 US managers in industry.
     http://newsroom.accenture.com/article_display.cfm?article_id=4777

  4. Analytics is the use of analysis, data, and systematic reasoning to make decisions.
     Definition by Thomas H. Davenport and Jeanne G. Harris, Analytics at Work – Smarter Decisions, Better Results

  5. web analytics
     (Slide by Ray Buse)

  6. game analytics
     Halo heat maps
     Free to play



  9. FREE for ESEM Attendees:
     http://qmags.com/ISW/free-esem

  10. history of software analytics
      Tim Menzies, Thomas Zimmermann: Software Analytics: So What? IEEE Software 30(4): 31-37 (2013)

  11. the many names
      software intelligence
      software analytics
      software development analytics
      analytics for software development
      empirical software engineering
      mining software repositories

  12. the many definitions

  13. Ahmed E. Hassan, Tao Xie: Software intelligence: the future of mining software engineering data. FoSER 2010: 161-166
      “[Software Intelligence] offers software practitioners (not just developers) up-to-date and pertinent information to support their daily decision-making processes. […]”

      Raymond P. L. Buse, Thomas Zimmermann: Analytics for software development. FoSER 2010: 77-80
      “The idea of analytics is to leverage potentially large amounts of data into real and actionable insights.”

      Dongmei Zhang, Yingnong Dang, Jian-Guang Lou, Shi Han, Haidong Zhang, and Tao Xie: Software Analytics as a Learning Case in Practice: Approaches and Experiences. MALETS 2011
      “Software analytics is to enable software practitioners¹ to perform data exploration and analysis in order to obtain insightful and actionable information for data-driven tasks around software and services.”
      ¹ Software practitioners typically include software developers, testers, usability engineers, and managers, etc.

      Raymond P. L. Buse, Thomas Zimmermann: Information needs for software development analytics. ICSE 2012: 987-996
      “Software development analytics […] empower[s] software development teams to independently gain and share insight from their data without relying on a separate entity.”

      Tim Menzies, Thomas Zimmermann: Software Analytics: So What? IEEE Software 30(4): 31-37 (2013)
      “Software analytics is analytics on software data for managers and software engineers with the aim of empowering software development individuals and teams to gain and share insight from their data to make better decisions.”

      Dongmei Zhang, Shi Han, Yingnong Dang, Jian-Guang Lou, Haidong Zhang, Tao Xie: Software Analytics in Practice. IEEE Software 30(5): 30-37 (2013)
      “With software analytics, software practitioners explore and analyze data to obtain insightful, actionable information for tasks regarding software development, systems, and users.”


  15. trinity of software analytics
      Dongmei Zhang, Shi Han, Yingnong Dang, Jian-Guang Lou, Haidong Zhang, Tao Xie: Software Analytics in Practice. IEEE Software 30(5): 30-37, September/October 2013.
      MSR Asia Software Analytics group: http://research.microsoft.com/en-us/groups/sa/

  16. inductive engineering
      The Inductive Software Engineering Manifesto: Principles for Industrial Data Mining. Tim Menzies, Christian Bird, Thomas Zimmermann, Wolfram Schulte and Ekrem Kocaguneli. In MALETS 2011: Proceedings International Workshop on Machine Learning Technologies in Software Engineering

  17. guidelines for analytics
      Be easy to use. People aren't always analysis experts.
      Be concise. People have little time.
      Measure many artifacts with many indicators.
      Identify important/unusual items automatically.
      Relate activity to features/areas.
      Focus on past & present over future.
      Recognize that developers and managers have different needs.
      Information Needs for Software Development Analytics. Ray Buse, Thomas Zimmermann. ICSE 2012 SEIP Track

  18. Information Needs for Software Development Analytics. Ray Buse, Thomas Zimmermann. ICSE 2012 SEIP Track

      Summarization. Search for important or unusual factors associated with a time range. Insight: characterize events, understand why they happened. Techniques: topic analysis, NLP.
      Alerts (& Correlations). Continuous search for unusual changes or relationships in variables. Insight: notice important events. Techniques: statistics, repeated measures.
      Forecasting. Search for and predict unusual events in the future based on current trends. Insight: anticipate events. Techniques: extrapolation, statistics.
      Trends. How is an artifact changing? Insight: understand the direction of the project. Techniques: regression analysis.
      Overlays. What artifacts account for current activity? Insight: understand the relationships between artifacts. Techniques: cluster analysis, repository mining.
      Goals. How are features/artifacts changing in the context of completion or some other goal? Insight: assistance for planning. Techniques: root-cause analysis.
      Modeling. Compares the abstract history of similar artifacts; identify important factors in history. Insight: learn from previous projects. Techniques: machine learning.
      Benchmarking. Identify vectors of similarity/difference across artifacts. Insight: assistance for resource allocation and many other decisions. Techniques: statistics.
      Simulation. Simulate changes based on other artifact models. Insight: assistance for general decisions. Techniques: what-if? analysis.

  19. Smart analytics



  22. Jack Bauer

  23. Chloe O’Brian


  25. All he needed was a paper clip

  26. smart analytics is actionable

  27. smart analytics is real time

  28. Scene from the movie WarGames (1983).

  29. smart analytics is diversity

  30. stakeholders, tools, questions
      Stakeholders: Researcher, Developer, Tester, Dev. Lead, Test Lead, Manager

  31. stakeholders, tools, questions
      Tools: Measurements, Surveys, Benchmarking, Qualitative Analysis, Clustering, Prediction, What-if analysis, Segmenting, Multivariate Analysis, Interviews

  32. stakeholders, tools, questions
      Build tools for frequent questions; use data scientists for infrequent questions.
      [Chart: questions ranked by frequency]

  33. Analyze This! 145 Questions for Data Scientists in Software Engineering. Andrew Begel, Thomas Zimmermann.
      Percentages: Essential / Worthwhile+ / Unwise.
      Q27 How do users typically use my application? [DP] 80.0% / 99.2% / 0.8%
      Q18 What parts of a software product are most used and/or loved by customers? [CR] 72.0% / 98.5% / 0.0%
      Q50 How effective are the quality gates we run at checkin? [DP] 62.4% / 96.6% / 0.8%
      Q115 How can we improve collaboration and sharing between teams? [TC] 54.5% / 96.4% / 0.0%
      Q86 What are best key performance indicators (KPIs) for monitoring services? [SVC] 53.2% / 93.6% / 0.9%
      Q40 What is the impact of a code change or requirements change to the project and tests? [DP] 52.1% / 94.0% / 0.0%
      Q74 What is the impact of tools on productivity? [PROD] 50.5% / 97.2% / 0.9%
      Q84 How do I avoid reinventing the wheel by sharing and/or searching for code? [RSC] 50.0% / 90.9% / 0.9%
      Q28 What are the common patterns of execution in my application? [DP] 48.7% / 96.6% / 0.8%
      Q66 How well does test coverage correspond to actual code usage by our customers? [EQ] 48.7% / 92.0% / 0.0%
      Q42 What tools can help us measure and estimate the risk associated with code changes? [DP] 47.8% / 92.2% / 0.0%
      Q59 What are effective metrics for ship quality? [EQ] 47.8% / 96.5% / 1.7%
      Q100 How much do design changes cost us and how can we reduce their risk? [SL] 46.6% / 94.8% / 0.8%
      Q19 What are the best ways to change a product's features without losing customers? [CR] 46.2% / 92.3% / 1.5%
      Q131 Which test strategies find the most impactful bugs (e.g., assertions, in-circuit testing, A/B testing)? [TP] 44.5% / 91.8% / 0.9%
      Q83 When should I write code from scratch vs. reuse legacy code? [RSC] 44.5% / 84.5% / 3.6%
      Q1 What is the impact and/or cost of finding bugs at a certain stage in the development cycle? [BUG] 43.1% / 87.9% / 2.5%
      Q92 What is the tradeoff between releasing more features or releasing more often? [SVC] 42.5% / 79.6% / 0.0%
      Q2 What kinds of mistakes do developers make in their software? Which ones are the most common? [BUG] 41.7% / 98.3% / 0.0%
      Q25 How important is a particular requirement? [CR] 41.7% / 87.4% / 2.3%
      Q60 How should we use metrics to help us decide when a feature is good enough to release (or poor enough to cancel)? [EQ] 41.1% / 90.2% / 3.5%
      Q17 What is the best way to collect customer feedback? [CR] 39.8% / 93.0% / 1.5%
      Q3 In what places in their software code do developers make the most mistakes? [BUG] 35.0% / 94.0% / 0.0%
      What kinds of problems happen because there is too much software process?

  34. • Customer
      • Practices and processes
      • Product quality
      Analyze This! 145 Questions for Data Scientists in Software Engineering. Andrew Begel, Thomas Zimmermann.

  35. not every question is “wise”
      Percentages: Essential / Worthwhile+ / Unwise.
      Q72 Which individual measures correlate with employee productivity (e.g., employee age, tenure, engineering skills, education, promotion velocity, IQ)? [PROD] 7.3% / 44.5% / 25.5%
      Q71 Which coding measures correlate with employee productivity (e.g., lines of code, time it takes to build the software, a particular tool set, pair programming, number of hours of coding per day, language)? [PROD] 15.6% / 56.9% / 22.0%
      Q75 What metrics can be used to compare employees? [PROD] 19.4% / 67.6% / 21.3%
      Q70 How can we measure the productivity of a Microsoft employee? [PROD] 19.1% / 70.9% / 20.9%
      Q6 Is the number of bugs a good measure of developer effectiveness? [BUG] 16.4% / 54.3% / 17.2%
      Q128 Can I generate 100% test coverage? [TP] 15.3% / 44.1% / 14.4%
      Q113 Who should be in charge of creating and maintaining a consistent company-wide software process and tool chain? [PROC] 21.9% / 55.3% / 12.3%
      Q112 What are the benefits of a consistent, company-wide software process and tool chain? [PROC] 25.2% / 78.3% / 10.4%
      Q34 When are code comments worth the effort to write them? [DP] 7.9% / 41.2% / 9.6%
      Q24 How much time and money does it cost to add customer input into your design? [CR] 15.9% / 68.2% / 8.3%
      Analyze This! 145 Questions for Data Scientists in Software Engineering. Andrew Begel, Thomas Zimmermann.

  36. smart analytics is people

  37. The Decider, The Brain, The Innovator

  38. The Researcher
      PROMISE 2011, Banff, Canada.

  39. smart analytics is sharing

  40. Sharing Insights
      Sharing Methods
      Sharing Models
      Sharing Data

  41. Sharing Data


  44. Sharing Models

  45. Defect prediction
      • Learn a prediction model from historic data
      • Predict defects for the same project
      • Hundreds of prediction models exist
      • Models work fairly well, with precision and recall of up to 80%.

      Predictor          Precision   Recall
      Pre-Release Bugs   73.80%      62.90%
      Test Coverage      83.80%      54.40%
      Dependencies       74.40%      69.90%
      Code Complexity    79.30%      66.00%
      Code Churn         78.60%      79.90%
      Org. Structure     86.20%      84.00%
      From: N. Nagappan, B. Murphy, and V. Basili. The influence of organizational structure on software quality. ICSE 2008.
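      To make the setup concrete, here is a minimal within-project sketch; it is illustrative only (the metric names and the synthetic data are assumptions, not the models behind the numbers above):

```python
# Minimal within-project defect prediction sketch (illustrative only).
# Metrics such as churn and complexity are assumed placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# Synthetic per-module metrics: code churn, complexity, pre-release bugs.
X = rng.gamma(2.0, 1.0, size=(n, 3))
# Synthetic labels: high-churn, high-complexity modules fail more often.
y = (X @ np.array([0.8, 0.5, 0.9]) + rng.normal(0, 1, n) > 3.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
pred = LogisticRegression().fit(X_tr, y_tr).predict(X_te)
print(f"precision={precision_score(y_te, pred):.2f} "
      f"recall={recall_score(y_te, pred):.2f}")
```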


  46. Why cross-project prediction?
      Some projects do not have enough data to train prediction models, or the data is of poor quality. New projects have no data yet. Can such projects use models from other projects? (= cross-project prediction)
      Thomas Zimmermann, Nachiappan Nagappan, Harald Gall, Emanuel Giger, Brendan Murphy: Cross-project defect prediction: a large scale experiment on data vs. domain vs. process. ESEC/SIGSOFT FSE 2009: 91-100


  47. A first experiment: Firefox and IE
      Firefox can predict defects in IE: precision = 0.76, recall = 0.88.
      But IE cannot predict Firefox: precision = 0.54, recall = 0.04. WHY?
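      The cross-project variant just swaps the training and test projects. A hedged sketch of that experimental loop on synthetic stand-in data (load_metrics and the project names are hypothetical); the FSE 2009 study counted a run as successful only when precision, recall, and accuracy all exceeded 0.75:

```python
# Cross-project defect prediction sketch: train on one project's data,
# test on another's. load_metrics() is a synthetic stand-in for real data.
from itertools import permutations
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score

def load_metrics(seed, n=500):
    rng = np.random.default_rng(seed)
    X = rng.gamma(2.0, 1.0, size=(n, 3))   # synthetic module metrics
    y = (X.sum(axis=1) + rng.normal(0, 1, n) > 6).astype(int)
    return X, y

projects = {name: load_metrics(seed) for seed, name in
            enumerate(["projectA", "projectB", "projectC"])}

for src, tgt in permutations(projects, 2):
    model = LogisticRegression().fit(*projects[src])
    X_tgt, y_tgt = projects[tgt]
    pred = model.predict(X_tgt)
    scores = (precision_score(y_tgt, pred), recall_score(y_tgt, pred),
              accuracy_score(y_tgt, pred))
    # Success criterion from the FSE 2009 study: all three above 0.75.
    ok = all(s > 0.75 for s in scores)
    print(f"{src} -> {tgt}: p={scores[0]:.2f} r={scores[1]:.2f} "
          f"a={scores[2]:.2f} {'success' if ok else 'fail'}")
```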


  48. 622 experiments later: only 3.4% successful


  50. Sharing models
      Sharing models does not always work. In what situations does sharing models work?

  51. Sharing Insights, Sharing Methods

  52. Skill in Halo Reach
      Jeff Huang, Thomas Zimmermann, Nachiappan Nagappan, Charles Harrison, Bruce C. Phillips: Mastering the art of war: how patterns of gameplay influence skill in Halo. CHI 2013: 695-704

  53. How do patterns of play affect players’ skill in Halo Reach?
      1 General Statistics
      2 Play Intensity
      3 Skill after Breaks
      4 Skill before Breaks
      5 Skill and Other Titles
      6 Skill Changes and Retention
      7 Mastery and Demographics
      8 Predicting Skill

  54. The Cohort of Players
      TrueSkill in Team Slayer: the mean skill value µ for each player after each Team Slayer match. µ ranges between 0 and 10, although 50% fall between 2.5 and 3.5. Initially µ = 3 for each player, stabilizing after a couple dozen matches.
      We looked at the cohort of players who started in the release week, with the complete set of gameplay data for those players up to 7 months later (over 3 million players).
      70-person survey about player experience.

  55. Analysis of Skill Data
      Step 1: Select a population of players. For our Halo study, we selected a cohort of 3.2 million Halo Reach players on Xbox Live who started playing the game in its first week of release.
      Step 2: If necessary, sample the population of players and ensure that the sample is representative. In our study we used the complete population of players in this cohort, and our dataset had every match played by that population.
      Step 3: Divide the population into groups and plot the development of the dependent variable over time. For example, when plotting the players’ skill in the charts, we took the median skill at every point along the x-axis for each group in order to reduce the bias that would otherwise occur when using the mean.
      Step 4: Convert the time series into a symbolic representation to correlate with other factors, for example retention.
      Repeat steps 1–4 as needed for any other dependent variables of interest.
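      Steps 1–3 map directly onto a pandas group-by; a minimal sketch on synthetic data, where the column names, group labels, and one-row-per-match layout are assumptions:

```python
# Step 3 sketch: median skill per game index for each play-intensity group.
# DataFrame columns ('player', 'group', 'game_index', 'mu') are assumed names.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
matches = pd.DataFrame({
    "player": np.repeat(np.arange(200), 100),
    "group": np.repeat(rng.choice(["0-2/wk", "4-8/wk", "32-64/wk"], 200), 100),
    "game_index": np.tile(np.arange(100), 200),
})
# Synthetic TrueSkill-like mu values that drift upward with games played.
matches["mu"] = 3 + 0.003 * matches["game_index"] + rng.normal(0, 0.5, len(matches))

# Median (not mean) per group at each point on the x-axis, to reduce bias
# from outliers, as described in step 3.
curves = matches.groupby(["group", "game_index"])["mu"].median().unstack(level=0)
print(curves.head())
```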


  56. 2 Play Intensity
    Telegraph operators gradually increase typing speed over time


  57. 2 Play Intensity
      [Chart: median skill (mu) over games played so far]
      Median skill typically increases slowly over time

  58. 2 Play Intensity (Games per Week)
      [Chart: median skill (mu) over games played so far, by play intensity: 0–2 games/week (N=59164), 2–4 (N=101448), 4–8 (N=226161), 8–16 (N=363832), 16–32 (N=319579), 32–64 (N=420258), 64–128 (N=415793), 128–256 (N=245725), 256+ (N=115010)]
      Median skill typically increases slowly over time.
      Players who play 4–8 games per week do best.
      But players who play more overall eventually surpass those who play 4–8 games per week (not shown in chart).

  59. 3 Change in Skill Following a Break
      “In the most drastic scenario, you can lose up to 80 percent of your fitness level in as few as two weeks [of taking a break]…”

  60. 3 Change in Skill Following a Break
      [Chart: change in skill (Δmu) by days of break, for the next game and 2, 3, 4, 5, and 10 games later]
      Median skill slightly increases after each game played without breaks.
      Breaks of 1–2 days correlate with tiny drops in skill.
      Longer breaks correlate with larger skill drops, but not linearly.
      On average, it takes 8–10 games to regain skill lost after 30-day breaks.

  61. 6 Skill Changes and Retention
      SAX (Symbolic Aggregate approXimation) discretizes time series into a symbolic representation
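      A minimal SAX sketch, assuming a 4-letter alphabet with the standard Gaussian breakpoints (±0.6745 and 0) and an arbitrary segment count:

```python
# Minimal SAX sketch: z-normalize a skill time series, reduce it with
# piecewise aggregate approximation (PAA), then map segment means to
# letters using Gaussian breakpoints (here: 4-letter alphabet).
import numpy as np

def sax(series, segments=8, breakpoints=(-0.6745, 0.0, 0.6745)):
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / x.std()                 # z-normalize
    paa = [chunk.mean() for chunk in np.array_split(x, segments)]
    letters = "abcd"                             # alphabet = len(breakpoints) + 1
    return "".join(letters[np.searchsorted(breakpoints, v)] for v in paa)

# Example: a skill trajectory that steadily improves over 100 games.
mu = 2.5 + 0.006 * np.arange(100) + np.random.default_rng(2).normal(0, 0.05, 100)
print(sax(mu))   # steady improvement yields something like 'aabbccdd'
```

      Grouping identical strings then yields pattern frequencies like those on the next slide.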


  62. 6 Skill Changes and Retention
      Time series of skill measured for the first 100 games.
      Most common pattern is steady improvement of skill; next most common pattern is a steady decline in skill.
      Frequency / Total Games (pattern shapes shown graphically on the slide):
      61791 / 217;  45814 / 252;  36320 / 257;  27290 / 219;  22759 / 216;  22452 / 253;  20659 / 260;  20633 / 222;  19858 / 247;  19292 / 216;  17573 / 219;  17454 / 245;  17389 / 260;  15670 / 215;  13692 / 236;  12516 / 239

  63. 6 Skill Changes and Retention
      Improving players actually end up playing fewer games than players with declining skill.

  64. Social behavior in a Shooter game
      Sauvik Das, Thomas Zimmermann, Nachiappan Nagappan, Bruce Phillips, Chuck Harrison. Revival Actions in a Shooter Game. DESVIG 2013 Workshop

  65. Impact of social behavior on retention
      AAA title: random sample of 26,000 players with ~1,000,000 sessions of game play data.

  66. Players who revive other players
      Dimension    Characteristic             Change
      Engagement   Session count              +297.44%
      Skill        Kills                      +100.21%
                   Was revived                –54.55%
                   Deaths                     –12.44%
      Success      Likelihood to win match    +18.88%
      Social       Gave weapon                +486.14%
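      The Change column reads as a relative difference between the group means of revivers and non-revivers; a one-line check of that arithmetic, with made-up inputs chosen to reproduce the first row:

```python
# Relative change between two group means, as in the table above.
# The means below are hypothetical, picked to reproduce +297.44%.
revivers, others = 39.744, 10.0   # hypothetical session-count means
print(f"{(revivers - others) / others * 100:+.2f}%")   # -> +297.44%
```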


  67. A simple social model
      Player-instigated, Team-instigated, With-Enemy

  68. Analysis pattern: Cluster + Contrast
      1. Use k-means clustering to cluster players in the sample along the social features.
      2. Analyze the cluster centroids to understand the differences in social behavior across clusters.
      3. Run a survival analysis to observe trends in retention across clusters.
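      A hedged sketch of the pattern on synthetic data; the feature names, k = 4, and the simplified retention curve are assumptions (the workshop paper ran a proper survival analysis):

```python
# Cluster + Contrast sketch: k-means over per-player social features,
# inspect centroids, then contrast retention across clusters.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
# Synthetic per-player features: revives given, weapons given, assists.
social = rng.gamma(2.0, 1.0, size=(5000, 3))
days_played = rng.integers(1, 180, size=5000)        # retention proxy

scaler = StandardScaler().fit(social)
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(
    scaler.transform(social))

# Step 2: centroids (in original units) summarize each cluster's behavior.
print(np.round(scaler.inverse_transform(km.cluster_centers_), 2))

# Step 3 (simplified): share of each cluster still playing after t days,
# a naive stand-in for a full survival analysis.
for c in range(4):
    days = days_played[km.labels_ == c]
    curve = [(days >= t).mean() for t in (30, 60, 90)]
    print(f"cluster {c}: retained@30/60/90 = {np.round(curve, 2)}")
```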

  69. Call to Action

  70. Book “Analyzing Software Data”
      http://menzies.us/asd
      Proposals due October 15

  71. Data Analysis Patterns
      http://dapse.unbox.org/


  73. smart analytics is
      actionable
      real time
      diversity
      people
      sharing

      Usage analytics
      Analytics for Xbox games

      Sharing Insights
      Sharing Methods
      Sharing Models
      Sharing Data

  74. Thank you!