Hello Clippy! Lessons Learned from RSSEs

Keynote at the RSSE 2014 workshop

Thomas Zimmermann

June 03, 2014

Transcript

  1. © Microsoft Corporation
    Hello Clippy!
    Lessons Learned from RSSEs
    Thomas Zimmermann
    Microsoft Research

  2. © Microsoft Corporation
    University of Passau, Saarland University, University of Calgary, Microsoft Research
    PhD, Assistant Professor (2007-2008), Researcher (since 2008)

  8. © Microsoft Corporation
    Annotations for Risky Locations

  9. © Microsoft Corporation
    "A recommendation system for software
    engineering is a software application that
    provides information items estimated to
    be valuable for a software engineering
    task in a given context."
    [Robillard, Walker, Zimmermann, 2009]
    B+

  10. © Microsoft Corporation
    Three Things I Think
    I Know About Software
    and that are important to RSSEs

  11. © Microsoft Corporation
    software is
    diversity

  12. © Microsoft Corporation
    Developer, Tester, Builder, Dev. Lead, Test Lead, Manager
    people projects knowledge

  13. © Microsoft Corporation
    Number of projects that are needed to cover the Ohloh universe with respect to
    seven dimensions (language, size, contributors, churn, commits, age, activity).
    Each point in the graph means that x projects can cover y percent of the universe.
    Meiyappan Nagappan, Thomas Zimmermann, Christian Bird: Diversity in software
    engineering research. ESEC/SIGSOFT FSE 2013: 466-476
    people projects knowledge
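
The coverage curve described in the caption above can be approximated with a greedy selection. The following is a minimal sketch, not the method from the cited paper: it assumes only two of the seven dimensions (language and size), and the `similar` rule (same language, size within half an order of magnitude) and the sample `universe` are hypothetical.

```python
import math

def similar(p, q, tol=0.5):
    """Assumed similarity rule: same language and size within `tol` orders of magnitude."""
    return (p["language"] == q["language"]
            and abs(math.log10(p["size"]) - math.log10(q["size"])) <= tol)

def coverage_curve(universe):
    """Greedily pick projects; return the percent of the universe covered after each pick."""
    uncovered = set(range(len(universe)))
    curve = []
    while uncovered:
        # Pick the project that is similar to the most still-uncovered projects.
        best = max(range(len(universe)),
                   key=lambda i: sum(similar(universe[i], universe[j]) for j in uncovered))
        uncovered -= {j for j in uncovered if similar(universe[best], universe[j])}
        curve.append(100.0 * (len(universe) - len(uncovered)) / len(universe))
    return curve

# Hypothetical four-project universe.
universe = [
    {"language": "Java", "size": 1000},
    {"language": "Java", "size": 2000},
    {"language": "Java", "size": 500000},
    {"language": "C", "size": 1500},
]

if __name__ == "__main__":
    print(coverage_curve(universe))
```

Each entry of the returned list is one point on the curve: the index plus one is x (number of projects picked) and the value is y (percent covered).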

  14. © Microsoft Corporation
    people projects knowledge
    Build tool support for frequently needed knowledge
    [Chart axes: Frequency, Knowledge]
  15. © Microsoft Corporation
    one size does
    not fit all

  16. © Microsoft Corporation
    developers are
    smart

  17. © Microsoft Corporation
    and software is
    complex

  19. © Microsoft Corporation
    My wish list:
    RSSEs for software analytics

  20. © Microsoft Corporation
    analytics is the use of analysis,
    data, and systematic reasoning
    to make decisions.
    Definition by Thomas H. Davenport, Jeanne G. Harris
    Analytics at Work – Smarter Decisions, Better Results
    software analytics is analytics
    on software data

  22. © Microsoft Corporation
    history of software analytics
    Tim Menzies, Thomas Zimmermann: Software Analytics: So What?
    IEEE Software 30(4): 31-37 (2013)

  23. © Microsoft Corporation
    Sharing Insights
    Sharing Methods
    Sharing Models
    Sharing Data

  24. © Microsoft Corporation
    Sharing
    Insights
    Sharing Insights Sharing Methods

  25. © Microsoft Corporation
    Example:
    Branch Analytics
    Christian Bird, Thomas Zimmermann: Assessing the value of branches with
    what-if analysis. SIGSOFT FSE 2012: 45
    Emad Shihab, Christian Bird, Thomas Zimmermann: The effect of branching
    strategies on software quality. ESEM 2012: 301-310
    Christian Bird, Thomas Zimmermann, Alex Teterev: A theory of branches as
    goals and virtual teams. CHASE 2011: 53-56

  27. © Microsoft Corporation
    main
    Branches at Microsoft

  28. © Microsoft Corporation
    main
    networking
    multimedia
    Branches at Microsoft

  29. © Microsoft Corporation
    main
    networking
    multimedia
    Branches at Microsoft
    Changes are isolated
    => Fewer build and test breaks

  30. © Microsoft Corporation
    main
    networking
    multimedia
    Branches at Microsoft
    Changes are isolated
    => Fewer build and test breaks
    integration

  31. © Microsoft Corporation
    main
    networking
    multimedia
    Branches at Microsoft
    Changes are isolated
    => Fewer build and test breaks
    integration
    integration

  32. © Microsoft Corporation
    main
    networking
    multimedia
    Branches at Microsoft
    Changes are isolated
    => Fewer build and test breaks
    integration
    integration

  33. © Microsoft Corporation
    main
    networking
    multimedia
    Branches at Microsoft
    Changes are isolated
    => Fewer build and test breaks
    Process overhead
    Time delay
    integration
    integration

  34. © Microsoft Corporation
    Code Flow for a Single File
    Blue nodes are
    edits to the file
    Orange nodes are
    move operations

  35. © Microsoft Corporation
    Branch Decisions
    How do we coordinate parallel
    development?
    How do we structure the branch
    hierarchy? Can we reduce the
    complexity of branching?

  36. © Microsoft Corporation
    Branch Analytics
    Techniques:
    • Survey developers to understand problems with branching
    • Mine source control for relationship of teams and branches
    • Simulate benefits and cost of alternative branch structures
    Actions/Tools:
    • Alert stakeholders about possible conflicts
    • Recommend branch structure (delete, create, fold branches)
    • Perform semi-automatic branch refactoring

  37. © Microsoft Corporation
    Which Branches Need Coordination?
    Compare all pairs of branches by file similarity and developer
    similarity. Dark areas mean many branch pairs in that area.
    Same files, but different team
    means potential problems
    [Figure axes: file similarity (Different Files to Same Files) and team similarity (Different Teams to Same Teams)]
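
The pairwise comparison on this slide can be sketched as follows. This is an illustration under assumptions, not the analysis behind the slide: Jaccard similarity stands in for the unspecified similarity measure, and the thresholds `file_min` and `dev_max` as well as the sample data are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def risky_pairs(branches, file_min=0.5, dev_max=0.2):
    """Return branch pairs that touch similar files but are worked on by different teams.

    `branches` maps a branch name to (set of files edited, set of developers).
    """
    flagged = []
    for (n1, (f1, d1)), (n2, (f2, d2)) in combinations(sorted(branches.items()), 2):
        file_sim, dev_sim = jaccard(f1, f2), jaccard(d1, d2)
        if file_sim >= file_min and dev_sim <= dev_max:
            flagged.append((n1, n2, file_sim, dev_sim))
    return flagged

# Hypothetical sample data.
branches = {
    "networking": ({"net.c", "sock.c", "tcp.c"}, {"alice", "bob"}),
    "multimedia": ({"net.c", "sock.c", "codec.c"}, {"carol", "dave"}),
    "main": ({"build.cfg"}, {"alice", "erin"}),
}

if __name__ == "__main__":
    for pair in risky_pairs(branches):
        print(pair)
```

Pairs returned by `risky_pairs` correspond to the "same files, different team" corner of the plot, where coordination problems are most likely.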

  38. © Microsoft Corporation
    Assessing a Branch
    Simulate alternate branch structure to assess cost and
    benefit of individual branches
    • Cost: Average Delay Increase per Edit
    How much delay does a branch introduce into development?
    • Cost: Integrations per Edit on a Branch
    What is the ratio of integrations to edits within a branch?
    • Benefit: Provided Isolation per Edit
    How many conflicts does a branch prevent per edit?
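
As a rough illustration, the three per-edit measures above could be computed from simulation totals like this. The `BranchSimulation` fields and the sample numbers are hypothetical; the slide does not say how the underlying totals are obtained.

```python
from dataclasses import dataclass

@dataclass
class BranchSimulation:
    edits: int                 # edits made on the branch
    integrations: int          # integrations involving the branch
    delay_with_days: float     # total delay of those edits in the real history
    delay_without_days: float  # total delay in the simulated history without the branch
    conflicts_prevented: int   # conflicts that appear only when the branch is removed

def assess(sim):
    """Return the two cost measures and the benefit measure, each normalized per edit."""
    if sim.edits <= 0:
        raise ValueError("a branch needs at least one edit to be assessed")
    return {
        "delay_increase_per_edit": (sim.delay_with_days - sim.delay_without_days) / sim.edits,
        "integrations_per_edit": sim.integrations / sim.edits,
        "isolation_per_edit": sim.conflicts_prevented / sim.edits,
    }

# Hypothetical totals for one branch.
example = BranchSimulation(edits=200, integrations=50,
                           delay_with_days=1800.0, delay_without_days=200.0,
                           conflicts_prevented=8)

if __name__ == "__main__":
    print(assess(example))
```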

  39. © Microsoft Corporation
    Simulating Removal of a Single Branch
    [Figure: four numbered diagrams of branches A and B and their integrations; the last shows A alone]
    Compare 1 with 4 to assess cost and benefit of branch B

  40. © Microsoft Corporation
    Parent Branch
    Victim Branch
    Child Branch

  41. © Microsoft Corporation
    Parent Branch
    Victim Branch
    Child Branch
    To release
    branch

  42. © Microsoft Corporation
    Parent Branch
    Victim Branch
    Child Branch
    Parent Branch
    Victim Branch
    Child Branch
    Simulation (what-if)

  43. © Microsoft Corporation
    Parent Branch
    Victim Branch
    Child Branch
    faster
    code flow
    Parent Branch
    Victim Branch
    Child Branch
    Simulation (what-if)

  44. © Microsoft Corporation
    Parent Branch
    Victim Branch
    Child Branch
    faster
    code flow
    unneeded
    integrations removed
    Parent Branch
    Victim Branch
    Child Branch
    Simulation (what-if)

  45. © Microsoft Corporation
    Parent Branch
    Victim Branch
    Child Branch
    no longer
    isolated
    faster
    code flow
    unneeded
    integrations removed
    Parent Branch
    Victim Branch
    Child Branch
    no longer
    isolated
    Simulation (what-if)

  46. © Microsoft Corporation
    Assessing branches
    Axes: Delay (Cost) and Provided Isolation (Benefit)
    Each dot is a branch
    Green dots are branches with high benefit and low cost
    Red dots are branches with high cost but low benefit

  47. © Microsoft Corporation
    Assessing branches
    Axes: Delay (Cost) and Provided Isolation (Benefit)
    Each dot is a branch
    Green dots are branches with high benefit and low cost
    Red dots are branches with high cost but low benefit
    If the high-cost, low-benefit branches had been removed,
    each change would have saved 8.9 days of delay
    and introduced only 0.04 additional conflicts.
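
The green/red grouping in this plot amounts to thresholding each branch on its two measures. A minimal sketch follows, assuming hypothetical cut-offs (`delay_max`, `isolation_min`) and sample values; the slide does not state where the actual boundaries lie.

```python
def classify(delay_days, isolation, delay_max=1.0, isolation_min=0.1):
    """Label a branch by its cost (delay per edit) and benefit (isolation per edit)."""
    low_cost = delay_days <= delay_max
    high_benefit = isolation >= isolation_min
    if high_benefit and low_cost:
        return "green"  # worth keeping
    if not high_benefit and not low_cost:
        return "red"    # candidate for removal
    return "other"

# Hypothetical (delay per edit in days, isolation per edit) for three branches.
measures = {
    "feature-a": (0.5, 0.30),
    "feature-b": (9.0, 0.02),
    "feature-c": (5.0, 0.50),
}

labels = {name: classify(delay, isolation) for name, (delay, isolation) in measures.items()}

if __name__ == "__main__":
    print(labels)
```

Branches labelled "red" are the ones whose removal the what-if simulation would evaluate first.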

  48. © Microsoft Corporation
    Why did I show you this?
    Build tools for frequent questions
    Use data scientists for infrequent questions
    Make it easier for data scientists to build tools
    [Chart axes: Frequency, Questions]

  49. © Microsoft Corporation
    http://aka.ms/145Questions
    Andrew Begel, Thomas Zimmermann. Analyze This! 145 Questions for Data Scientists
    in Software Engineering. To appear ICSE 2014

  50. © Microsoft Corporation
    Microsoft’s Top 10 Questions
    Question | Essential | Essential + Worthwhile
    How do users typically use my application? | 80.0% | 99.2%
    What parts of a software product are most used and/or loved by customers? | 72.0% | 98.5%
    How effective are the quality gates we run at checkin? | 62.4% | 96.6%
    How can we improve collaboration and sharing between teams? | 54.5% | 96.4%
    What are the best key performance indicators (KPIs) for monitoring services? | 53.2% | 93.6%
    What is the impact of a code change or requirements change to the project and its tests? | 52.1% | 94.0%
    What is the impact of tools on productivity? | 50.5% | 97.2%
    How do I avoid reinventing the wheel by sharing and/or searching for code? | 50.0% | 90.9%
    What are the common patterns of execution in my application? | 48.7% | 96.6%
    How well does test coverage correspond to actual code usage by our customers? | 48.7% | 92.0%

  51. © Microsoft Corporation
    RSSE for Software Analytics
    Opportunities
    • Provide recommendations
      – What analysis method to use and when?
      – How to understand results from data?
      – How to measure success/insight?
    • Provide tools to transform manual
      empirical analysis into reusable analysis

  52. © Microsoft Corporation
    Hello Clippy!
    Lessons Learned from RSSEs
    Thomas Zimmermann
    Microsoft Research
    Sharing Insights
    Sharing Methods
    Sharing Models
    Sharing Data
    http://aka.ms/145Questions
    Andrew Begel, Thomas Zimmermann. Analyze This! 145 Questions for Data Scientists
    in Software Engineering. To appear ICSE 2014
    My wish list:
    RSSEs for software analytics

  53. © Microsoft Corporation
    Thank you!
