Analytics for Smarter Software Development

Presented at the Software Experts Summit 2014

Thomas Zimmermann

May 30, 2014

Transcript

  1. © Microsoft Corporation
    Analytics for Smarter Software Development
    Thomas Zimmermann, Microsoft Research, USA
    Joint work with Chris Bird, Nachi Nagappan and many others.


  2. © Microsoft Corporation


  3. © Microsoft Corporation
    40 percent of major
    decisions are based
    not on facts, but on
    the manager’s gut.
    Accenture survey among 254 US managers in industry.
    http://newsroom.accenture.com/article_display.cfm?article_id=4777


  4. © Microsoft Corporation
    analytics is the use
    of analysis, data, and
    systematic reasoning
    to make decisions.
    Definition by Thomas H. Davenport, Jeanne G. Harris
    Analytics at Work – Smarter Decisions, Better Results


  5. © Microsoft Corporation
    history of software analytics
    Tim Menzies, Thomas Zimmermann: Software Analytics: So What?
    IEEE Software 30(4): 31-37 (2013)


  6. © Microsoft Corporation


  7. © Microsoft Corporation
    trinity of software analytics
    Dongmei Zhang, Shi Han, Yingnong Dang, Jian-Guang Lou, Haidong Zhang, Tao Xie:
    Software Analytics in Practice. IEEE Software 30(5): 30-37, September/October 2013.
    MSR Asia Software Analytics group: http://research.microsoft.com/en-us/groups/sa/


  8. © Microsoft Corporation
    guidelines for analytics (1)
    The Inductive Software Engineering Manifesto: Principles for Industrial Data Mining.
    Tim Menzies, Christian Bird, Thomas Zimmermann, Wolfram Schulte and Ekrem
    Kocaguneli. In MALETS 2011: Proceedings of the International Workshop on Machine
    Learning Technologies in Software Engineering

  9. © Microsoft Corporation
    guidelines for analytics (2)
    Be easy to use. People aren't always analysis experts.
    Be concise. People have little time.
    Measure many artifacts with many indicators.
    Identify important/unusual items automatically.
    Relate activity to features/areas.
    Focus on past & present over future.
    Recognize that developers and managers have different needs.
    Information Needs for Software Development Analytics.
    Ray Buse, Thomas Zimmermann. ICSE 2012 SEIP Track


  10. © Microsoft Corporation
    Smart analytics
    Development analytics
    Usage analytics
    The future
    What’s next?

  11. © Microsoft Corporation
    Smart analytics


  12. © Microsoft Corporation


  13. © Microsoft Corporation


  14. © Microsoft Corporation
    Jack Bauer


  15. © Microsoft Corporation
    Chloe O’Brian

  16. © Microsoft Corporation


  17. © Microsoft Corporation
    All he needed was a paper clip


  18. © Microsoft Corporation
    smart analytics is
    actionable


  19. © Microsoft Corporation
    smart analytics is
    real time


  20. © Microsoft Corporation
    smart analytics is
    diversity


  21. © Microsoft Corporation
    The Stakeholders
    The Tools
    The Questions

  22. © Microsoft Corporation
    http://aka.ms/145Questions
    Andrew Begel, Thomas Zimmermann. Analyze This! 145 Questions for Data Scientists
    in Software Engineering. To appear at ICSE 2014

  23. © Microsoft Corporation
    Microsoft’s Top 10 Questions
    Essential   Essential +   Question
                Worthwhile
    80.0%       99.2%         How do users typically use my application?
    72.0%       98.5%         What parts of a software product are most used and/or loved by customers?
    62.4%       96.6%         How effective are the quality gates we run at checkin?
    54.5%       96.4%         How can we improve collaboration and sharing between teams?
    53.2%       93.6%         What are the best key performance indicators (KPIs) for monitoring services?
    52.1%       94.0%         What is the impact of a code change or requirements change to the project and its tests?
    50.5%       97.2%         What is the impact of tools on productivity?
    50.0%       90.9%         How do I avoid reinventing the wheel by sharing and/or searching for code?
    48.7%       96.6%         What are the common patterns of execution in my application?
    48.7%       92.0%         How well does test coverage correspond to actual code usage by our customers?

  24. © Microsoft Corporation
    smart analytics is
    people


  25. © Microsoft Corporation
    The Decider
    The Brain
    The Innovator
    The Researcher
    Photo of MSA 2010 by Daniel M German

  26. © Microsoft Corporation
    smart analytics is
    sharing


  27. © Microsoft Corporation
    Sharing Insights
    Sharing Methods
    Sharing Models
    Sharing Data


  28. © Microsoft Corporation
    Sharing Insights
    Sharing Methods

  29. © Microsoft Corporation
    Branch Analytics
    Christian Bird, Thomas Zimmermann: Assessing the value of branches with
    what-if analysis. SIGSOFT FSE 2012: 45
    Emad Shihab, Christian Bird, Thomas Zimmermann: The effect of branching
    strategies on software quality. ESEM 2012: 301-310
    Christian Bird, Thomas Zimmermann, Alex Teterev: A theory of branches as
    goals and virtual teams. CHASE 2011: 53-56


  30. © Microsoft Corporation


  31. © Microsoft Corporation
    main
    Branches at Microsoft


  32. © Microsoft Corporation
    main
    networking
    multimedia
    Branches at Microsoft


  33. © Microsoft Corporation
    main
    networking
    multimedia
    Branches at Microsoft
    Changes are isolated
    => Fewer build and test breaks

  34. © Microsoft Corporation
    main
    networking
    multimedia
    Branches at Microsoft
    Changes are isolated
    => Fewer build and test breaks
    integration

  35. © Microsoft Corporation
    main
    networking
    multimedia
    Branches at Microsoft
    Changes are isolated
    => Fewer build and test breaks
    integration
    integration

  36. © Microsoft Corporation
    main
    networking
    multimedia
    Branches at Microsoft
    Changes are isolated
    => Fewer build and test breaks
    integration
    integration

  37. © Microsoft Corporation
    main
    networking
    multimedia
    Branches at Microsoft
    Changes are isolated
    => Fewer build and test breaks
    Process overhead
    Time delay
    integration
    integration

  38. © Microsoft Corporation
    Code Flow for a Single File
    Blue nodes are
    edits to the file
    Orange nodes are
    move operations


  39. © Microsoft Corporation
    Branch Decisions
    How do we coordinate parallel
    development?
    How do we structure the branch
    hierarchy? Can we reduce the
    complexity of branching?


  40. © Microsoft Corporation
    Branch Analytics
    Techniques:
    • Survey developers to understand problems with branching
    • Mine source control for relationship of teams and branches
    • Simulate benefits and cost of alternative branch structures
    Actions/Tools:
    • Alert stakeholders about possible conflicts
    • Recommend branch structure (delete, create, fold branches)
    • Perform semi-automatic branch refactoring


  41. © Microsoft Corporation
    Assessing a Branch
    Simulate alternate branch structure to assess cost and
    benefit of individual branches
    • Cost: Average Delay Increase per Edit
    How much delay does a branch introduce into development?
    • Cost: Integrations per Edit on a Branch
    What is the ratio of integrations to edits within a branch?
    • Benefit: Provided Isolation per Edit
    How many conflicts does a branch prevent per edit?
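    A minimal sketch of how the three measures above could be computed once a simulation has produced per-branch totals. The BranchStats fields and the assess function are hypothetical names chosen for illustration; the slides do not specify a data format.

```python
from dataclasses import dataclass


@dataclass
class BranchStats:
    """Hypothetical per-branch totals produced by a what-if simulation."""
    name: str
    edits: int                # edits made on the branch
    integrations: int         # integrations into or out of the branch
    delay_days_added: float   # extra days edits took because the branch exists
    conflicts_prevented: int  # conflicts that appear only if the branch is removed


def assess(b: BranchStats) -> dict:
    """Return the two cost measures and the benefit measure, each per edit."""
    if b.edits == 0:
        raise ValueError(f"branch {b.name!r} has no edits to normalize by")
    return {
        "delay_per_edit": b.delay_days_added / b.edits,            # cost
        "integrations_per_edit": b.integrations / b.edits,         # cost
        "isolation_per_edit": b.conflicts_prevented / b.edits,     # benefit
    }
```

    Normalizing by the number of edits makes small and large branches comparable on the same scale.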


  42. © Microsoft Corporation
    Simulating Removal of a Single Branch
    [Diagram in four numbered panels: (1) branches A and B joined by integrations,
    (2) A and B, (3) A and B, (4) A alone]
    Compare 1 with 4 to assess cost and benefit of branch B
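    The cost side of such a what-if removal can be sketched as follows, under simplifying assumptions that are not from the slides: the branch hierarchy is a parent map, each branch has a fixed integration delay into its parent, and edits made on the removed branch would be made on its parent instead. All function names and the "wifi" branch are hypothetical.

```python
def path_to_root(parent, branch):
    """Return the branches an edit passes through, from `branch` up to the root."""
    path = [branch]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def delay_to_root(parent, hop_delay, branch):
    """Sum the integration delay (in days) of every hop from `branch` to the root."""
    # The last element of the path is the root, which has no upward hop.
    return sum(hop_delay[b] for b in path_to_root(parent, branch)[:-1])


def remove_branch(parent, victim):
    """Return a new parent map in which `victim` is folded into its parent."""
    if parent[victim] is None:
        raise ValueError("the root branch cannot be removed")
    return {
        b: (parent[victim] if p == victim else p)
        for b, p in parent.items()
        if b != victim
    }


def delay_saved(parent, hop_delay, edits, victim):
    """Total days of delay saved across all edits if `victim` did not exist."""
    new_parent = remove_branch(parent, victim)
    saved = 0.0
    for branch, n_edits in edits.items():
        before = delay_to_root(parent, hop_delay, branch)
        # Assumption: edits made on the victim would be made on its parent instead.
        home = parent[victim] if branch == victim else branch
        after = delay_to_root(new_parent, hop_delay, home)
        saved += n_edits * (before - after)
    return saved


# Hypothetical example data ("wifi" is an invented child branch).
parent = {"main": None, "networking": "main", "wifi": "networking"}
hop_delay = {"networking": 5.0, "wifi": 2.0}  # days to integrate into the parent
edits = {"main": 10, "networking": 20, "wifi": 30}

saved = delay_saved(parent, hop_delay, edits, "networking")
print(saved / sum(edits.values()))  # average delay saved per edit, in days
```

    This covers only the delay cost. Estimating the benefit side would also require detecting which edits start to conflict once the branch no longer isolates them.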


  43. © Microsoft Corporation
    Parent Branch
    Victim Branch
    Child Branch

  44. © Microsoft Corporation
    Parent Branch
    Victim Branch
    Child Branch
    To release branch

  45. © Microsoft Corporation
    Simulation (what-if)
    Parent Branch
    Victim Branch
    Child Branch

  46. © Microsoft Corporation
    Simulation (what-if)
    Parent Branch
    Victim Branch
    Child Branch
    faster code flow

  47. © Microsoft Corporation
    Simulation (what-if)
    Parent Branch
    Victim Branch
    Child Branch
    faster code flow
    unneeded integrations removed

  48. © Microsoft Corporation
    Simulation (what-if)
    Parent Branch
    Victim Branch
    Child Branch
    faster code flow
    unneeded integrations removed
    no longer isolated

  49. © Microsoft Corporation
    Assessing branches
    [Scatter plot of Delay (Cost) against Provided Isolation (Benefit); each dot is a branch]
    Green dots are branches with high benefit and low cost
    Red dots are branches with high cost but low benefit

  50. © Microsoft Corporation
    Assessing branches
    [Scatter plot of Delay (Cost) against Provided Isolation (Benefit); each dot is a branch]
    Green dots are branches with high benefit and low cost
    Red dots are branches with high cost but low benefit
    If the high-cost, low-benefit branches had been removed, each change would
    have saved 8.9 days of delay and introduced only 0.04 additional conflicts.

  51. © Microsoft Corporation
    Skill in Halo Reach
    Jeff Huang, Thomas Zimmermann, Nachiappan Nagappan, Charles
    Harrison, Bruce C. Phillips: Mastering the art of war: how patterns of
    gameplay influence skill in Halo. CHI 2013: 695-704


  52. © Microsoft Corporation


  53. How do patterns of play affect
    players’ skill in Halo Reach?
    1 General Statistics
    2 Play Intensity
    3 Skill after Breaks
    4 Skill before Breaks
    5 Skill and Other Titles
    6 Skill Changes and Retention
    7 Mastery and Demographics
    8 Predicting Skill

  54. The Cohort of Players
    We looked at the cohort of players who started in the release week,
    with the complete set of gameplay for those players up to 7 months later
    (over 3 million players)
    70-person survey about player experience
    TrueSkill in Team Slayer
    The mean skill value µ for each player after each Team Slayer match
    µ ranges between 0 and 10, although 50% fall between 2.5 and 3.5
    Initially µ = 3 for each player, stabilizing after a couple dozen matches

  55. 2 Play Intensity
    Telegraph operators gradually increase typing speed over time


  56. 2 Play Intensity
    [Chart: Skill (mu, axis 2.1 to 3.1) against Games Played So Far (0 to 100)]
    Median skill typically increases slowly over time

  57. 2 Play Intensity (Games per Week)
    [Chart: Skill (mu, axis 2.1 to 3.1) against Games Played So Far (0 to 100), one series per group]
    0 - 2 games / week [N=59164]
    2 - 4 games / week [N=101448]
    4 - 8 games / week [N=226161]
    8 - 16 games / week [N=363832]
    16 - 32 games / week [N=319579]
    32 - 64 games / week [N=420258]
    64 - 128 games / week [N=415793]
    128 - 256 games / week [N=245725]
    256+ games / week [N=115010]
    Median skill typically increases slowly over time

  58. 2 Play Intensity (Games per Week)
    [Chart: Skill (mu, axis 2.1 to 3.1) against Games Played So Far (0 to 100), one series per group]
    0 - 2 games / week [N=59164]
    2 - 4 games / week [N=101448]
    4 - 8 games / week [N=226161]
    8 - 16 games / week [N=363832]
    16 - 32 games / week [N=319579]
    32 - 64 games / week [N=420258]
    64 - 128 games / week [N=415793]
    128 - 256 games / week [N=245725]
    256+ games / week [N=115010]
    Median skill typically increases slowly over time

  59. 2 Play Intensity (Games per Week)
    [Chart: Skill (mu, axis 2.1 to 3.1) against Games Played So Far (0 to 100), one series per group]
    0 - 2 games / week [N=59164]
    2 - 4 games / week [N=101448]
    4 - 8 games / week [N=226161]
    8 - 16 games / week [N=363832]
    16 - 32 games / week [N=319579]
    32 - 64 games / week [N=420258]
    64 - 128 games / week [N=415793]
    128 - 256 games / week [N=245725]
    256+ games / week [N=115010]
    Median skill typically increases slowly over time
    Players who play 4–8 games per week do best
    But players who play more overall eventually surpass those who play
    4–8 games per week (not shown in chart)

  60. 3 Change in Skill Following a Break
    “In the most drastic scenario, you can lose
    up to 80 percent of your fitness level in as
    few as two weeks [of taking a break]…”


  61. 3 Change in Skill Following a Break
    [Chart: Change in Skill (Δmu, axis -0.03 to 0.03) against Days of Break (0 to 50)]
    Series: Next Game, 2 Games Later, 3 Games Later, 4 Games Later, 5 Games Later, 10 Games Later
    Median skill slightly increases after each game played without breaks
    Breaks of 1–2 days correlate with tiny drops in skill
    Longer breaks correlate with larger skill drops, but not linearly
    On average, it takes 8–10 games to regain skill lost after 30-day breaks

  62. Analysis of Skill Data
    Step 1: Select a population of players.
    For our Halo study, we selected a cohort of 3.2 million Halo Reach players
    on Xbox Live who started playing the game in its first week of release.
    Step 2: If necessary, sample the population of players and ensure that
    the sample is representative.
    In our study we used the complete population of players in this cohort, and
    our dataset had every match played by that population.
    Step 3: Divide the population into groups and plot the development of
    the dependent variable over time.
    For example, when plotting the players’ skill in the charts, we took the
    median skill at every point along the x-axis for each group in order to
    reduce the bias that would otherwise occur when using the mean.
    Step 4: Convert the time series into a symbolic representation to
    correlate with other factors, for example retention.
    Repeat steps 1–4 as needed for any other dependent variables of interest.
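    Steps 3 and 4 can be sketched in a few lines, assuming the data is available as (group, games played, skill) records. The function names and record layout are hypothetical, and the up/down/steady encoding is a simple stand-in for illustration, not necessarily the symbolic representation used in the study.

```python
from collections import defaultdict
from statistics import median


def median_skill_by_group(records):
    """Step 3: median skill per group at each point on the x-axis.

    `records` is an iterable of (group, games_played, skill) tuples.
    Returns {group: [(games_played, median_skill), ...]} sorted by games_played.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for group, games_played, skill in records:
        buckets[group][games_played].append(skill)
    return {
        group: [(x, median(values)) for x, values in sorted(by_x.items())]
        for group, by_x in buckets.items()
    }


def to_symbols(series, tolerance=0.01):
    """Step 4: encode a series as 'U' (up), 'D' (down), or 'S' (steady) per step."""
    symbols = []
    for (_, previous), (_, current) in zip(series, series[1:]):
        delta = current - previous
        if delta > tolerance:
            symbols.append("U")
        elif delta < -tolerance:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)
```

    The resulting strings can then be counted or grouped and compared against another factor, such as whether the player kept playing.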


  63. © Microsoft Corporation
    What’s next?


  64. © Microsoft Corporation
    call to action


  65. © Microsoft Corporation
    Data Analysis Patterns
    http://dapse.unbox.org/


  66. © Microsoft Corporation


  69. © Microsoft Corporation
    Thank you!
