Hello Clippy! Lessons Learned from RSSEs

Keynote at the RSSE 2014 workshop

Thomas Zimmermann

June 03, 2014

Transcript

  1. © Microsoft Corporation Hello Clippy! Lessons Learned from RSSEs. Thomas Zimmermann, Microsoft Research
  2. University of Passau, Saarland University, University of Calgary, Microsoft Research. PhD, Assistant Professor (2007-2008), Researcher (since 2008)
  3. © Microsoft Corporation

  4. © Microsoft Corporation

  5. © Microsoft Corporation

  6. © Microsoft Corporation

  7. © Microsoft Corporation

  8. Annotations for Risky Locations

  9. "A recommendation system for software engineering is a software application that provides information items estimated to be valuable for a software engineering task in a given context." [Robillard, Walker, Zimmermann, 2009] B+
  10. Three Things I Think I Know About Software That Are Important to RSSEs
  11. software is diversity

  12. people, projects, knowledge. People: Developer, Tester, Builder, Dev. Lead, Test Lead, Manager
  13. people, projects, knowledge. Number of projects that are needed to cover the Ohloh universe with respect to seven dimensions (language, size, contributors, churn, commits, age, activity). Each point in the graph means that x projects can cover y percent of the universe. Meiyappan Nagappan, Thomas Zimmermann, Christian Bird: Diversity in software engineering research. ESEC/SIGSOFT FSE 2013: 466-476
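A coverage curve like the one on slide 13 could be sketched as follows. This is a minimal sketch under assumptions, not necessarily the authors' method: it assumes each project "covers" a known set of projects (for example, those similar to it along the seven dimensions), and it picks projects greedily. The function name `greedy_coverage_curve` and the `covers` mapping are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def greedy_coverage_curve(
    covers: Dict[str, Set[str]], universe: Set[str]
) -> List[Tuple[int, float]]:
    """Return points (x, y): x selected projects cover y percent of the universe.

    `covers[p]` is the (assumed, precomputed) set of projects that p covers.
    """
    covered: Set[str] = set()
    remaining = dict(covers)
    curve: List[Tuple[int, float]] = []
    while remaining and covered != universe:
        # Greedy step: take the project that adds the most uncovered projects.
        best = max(remaining, key=lambda p: len((remaining[p] & universe) - covered))
        gain = (remaining.pop(best) & universe) - covered
        if not gain:
            break  # no remaining project adds coverage
        covered |= gain
        curve.append((len(curve) + 1, 100.0 * len(covered) / len(universe)))
    return curve
```

Each returned point corresponds to one point in the graph: after selecting x projects, y percent of the universe is covered.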
  14. people, projects, knowledge. Build tool support for frequently needed knowledge. Plot axes: Frequency, Knowledge
  15. one size does not fit all

  16. developers are smart

  17. and software is complex

  18. © Microsoft Corporation

  19. My wish list: RSSEs for software analytics

  20. analytics is the use of analysis, data, and systematic reasoning to make decisions. Definition by Thomas H. Davenport, Jeanne G. Harris: Analytics at Work – Smarter Decisions, Better Results. software analytics is analytics on software data
  21. © Microsoft Corporation

  22. history of software analytics. Tim Menzies, Thomas Zimmermann: Software Analytics: So What? IEEE Software 30(4): 31-37 (2013)
  23. Sharing Insights, Sharing Methods, Sharing Models, Sharing Data
  24. Sharing Insights, Sharing Methods

  25. Example: Branch Analytics
    • Christian Bird, Thomas Zimmermann: Assessing the value of branches with what-if analysis. SIGSOFT FSE 2012: 45
    • Emad Shihab, Christian Bird, Thomas Zimmermann: The effect of branching strategies on software quality. ESEM 2012: 301-310
    • Christian Bird, Thomas Zimmermann, Alex Teterev: A theory of branches as goals and virtual teams. CHASE 2011: 53-56
  26. © Microsoft Corporation

  27. Branches at Microsoft: main
  28. Branches at Microsoft: main, networking, multimedia
  29. Branches at Microsoft: main, networking, multimedia. Changes are isolated => fewer build and test breaks
  30. Branches at Microsoft: main, networking, multimedia. Changes are isolated => fewer build and test breaks. integration
  31. Branches at Microsoft: main, networking, multimedia. Changes are isolated => fewer build and test breaks. integration, integration
  32. Branches at Microsoft: main, networking, multimedia. Changes are isolated => fewer build and test breaks. integration, integration
  33. Branches at Microsoft: main, networking, multimedia. Changes are isolated => fewer build and test breaks. Process overhead, time delay. integration, integration
  34. Code Flow for a Single File. Blue nodes are edits to the file. Orange nodes are move operations.
  35. Branch Decisions: How do we coordinate parallel development? How do we structure the branch hierarchy? Can we reduce the complexity of branching?
  36. Branch Analytics
    Techniques:
    • Survey developers to understand problems with branching
    • Mine source control for relationship of teams and branches
    • Simulate benefits and cost of alternative branch structures
    Actions/Tools:
    • Alert stakeholders about possible conflicts
    • Recommend branch structure (delete, create, fold branches)
    • Perform semi-automatic branch refactoring
  37. Which Branches Need Coordination? Compare all pairs of branches by file similarity and developer similarity. Dark areas mean many branch pairs in that area. Same files, but different team means potential problems. Plot axes: Different Files to Same Files, Different Teams to Same Teams
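The pairwise comparison on slide 37 could be sketched as below. The slide does not say how similarity is measured, so Jaccard similarity is an assumption here, and the `Branch` type, the function names, and both cutoff values are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, NamedTuple, Set, Tuple


class Branch(NamedTuple):
    files: Set[str]        # files edited on the branch
    developers: Set[str]   # developers committing to the branch


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairs_needing_coordination(
    branches: Dict[str, Branch],
    min_file_similarity: float = 0.5,  # hypothetical cutoff for "same files"
    max_team_similarity: float = 0.2,  # hypothetical cutoff for "different team"
) -> List[Tuple[str, str, float, float]]:
    """Flag branch pairs that touch the same files but have different teams."""
    flagged = []
    for first, second in combinations(sorted(branches), 2):
        file_sim = jaccard(branches[first].files, branches[second].files)
        team_sim = jaccard(branches[first].developers, branches[second].developers)
        if file_sim >= min_file_similarity and team_sim <= max_team_similarity:
            flagged.append((first, second, file_sim, team_sim))
    return flagged
```

Pairs returned by this sketch correspond to the "same files, but different team" corner of the plot, which the slide marks as a potential problem.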
  38. Assessing a Branch. Simulate alternate branch structure to assess cost and benefit of individual branches
    • Cost: Average Delay Increase per Edit. How much delay does a branch introduce into development?
    • Cost: Integrations per Edit on a Branch. What is the ratio of integrations to edits within a branch?
    • Benefit: Provided Isolation per Edit. How many conflicts does a branch prevent per edit?
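The three per-edit measures on slide 38 can be illustrated with simple ratios. This is a sketch assuming the simulation has already produced per-branch totals; the class `BranchSimulation` and its field names are hypothetical, and the numbers in the usage note are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BranchSimulation:
    edits: int                 # edits made on the branch
    added_delay_days: float    # total delay the branch adds, summed over edits
    integrations: int          # integrations performed on the branch
    conflicts_prevented: int   # conflicts the branch's isolation avoided

    def _per_edit(self, total: float) -> float:
        if self.edits <= 0:
            raise ValueError("a branch needs at least one edit")
        return total / self.edits

    def delay_per_edit(self) -> float:
        """Cost: average delay increase per edit."""
        return self._per_edit(self.added_delay_days)

    def integrations_per_edit(self) -> float:
        """Cost: integrations per edit on the branch."""
        return self._per_edit(self.integrations)

    def isolation_per_edit(self) -> float:
        """Benefit: conflicts prevented per edit."""
        return self._per_edit(self.conflicts_prevented)
```

For example, `BranchSimulation(edits=200, added_delay_days=500.0, integrations=50, conflicts_prevented=10)` yields 2.5 days of delay, 0.25 integrations, and 0.05 prevented conflicts per edit.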
  39. Simulating Removal of a Single Branch. [Diagram: branches A and B with integrations.] Compare 1 with 4 to assess cost and benefit of branch B
  40. Parent Branch, Victim Branch, Child Branch
  41. Parent Branch, Victim Branch, Child Branch. To release branch
  42. Simulation (what-if). Two diagrams, each showing Parent Branch, Victim Branch, Child Branch
  43. Simulation (what-if). Two diagrams, each showing Parent Branch, Victim Branch, Child Branch. Annotation: faster code flow
  44. Simulation (what-if). Two diagrams, each showing Parent Branch, Victim Branch, Child Branch. Annotations: faster code flow, unneeded integrations removed
  45. Simulation (what-if). Two diagrams, each showing Parent Branch, Victim Branch, Child Branch. Annotations: no longer isolated, faster code flow, unneeded integrations removed
  46. Assessing branches. Plot axes: Delay (Cost), Provided Isolation (Benefit). Each dot is a branch. Green dots are branches with high benefit and low cost. Red dots are branches with high cost but low benefit.
  47. Assessing branches. Plot axes: Delay (Cost), Provided Isolation (Benefit). Each dot is a branch. Green dots are branches with high benefit and low cost. Red dots are branches with high cost but low benefit. If high-cost-low-benefit branches had been removed, changes would each have saved 8.9 days of delay and only introduced 0.04 additional conflicts.
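The green/red labeling on slides 46 and 47 could be sketched as a threshold rule. The slides do not give the cutoffs that separate "high" from "low", so both thresholds are caller-supplied assumptions, and the function name `classify_branch` and the "neutral" label are hypothetical.

```python
def classify_branch(
    delay_per_edit: float,
    isolation_per_edit: float,
    delay_threshold: float,      # assumed cutoff: above this, cost is "high"
    isolation_threshold: float,  # assumed cutoff: above this, benefit is "high"
) -> str:
    """Label a branch by its cost (delay) and benefit (provided isolation)."""
    high_cost = delay_per_edit > delay_threshold
    high_benefit = isolation_per_edit > isolation_threshold
    if high_benefit and not high_cost:
        return "green"    # high benefit, low cost
    if high_cost and not high_benefit:
        return "red"      # high cost, low benefit: candidate for removal
    return "neutral"      # mixed case, not colored on the slide
```

Branches labeled "red" by such a rule are the high-cost-low-benefit candidates whose removal the what-if simulation evaluates.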
  48. Why did I show you this? Build tools for frequent questions. Use data scientists for infrequent questions. Make it easier for data scientists to build tools. Plot axes: Frequency, Questions
  49. http://aka.ms/145Questions Andrew Begel, Thomas Zimmermann. Analyze This! 145 Questions for Data Scientists in Software Engineering. To appear ICSE 2014
  50. Microsoft’s Top 10 Questions (percentage of respondents rating each question Essential, and Essential + Worthwhile)
    • How do users typically use my application? (Essential: 80.0%; Essential + Worthwhile: 99.2%)
    • What parts of a software product are most used and/or loved by customers? (Essential: 72.0%; Essential + Worthwhile: 98.5%)
    • How effective are the quality gates we run at checkin? (Essential: 62.4%; Essential + Worthwhile: 96.6%)
    • How can we improve collaboration and sharing between teams? (Essential: 54.5%; Essential + Worthwhile: 96.4%)
    • What are the best key performance indicators (KPIs) for monitoring services? (Essential: 53.2%; Essential + Worthwhile: 93.6%)
    • What is the impact of a code change or requirements change to the project and its tests? (Essential: 52.1%; Essential + Worthwhile: 94.0%)
    • What is the impact of tools on productivity? (Essential: 50.5%; Essential + Worthwhile: 97.2%)
    • How do I avoid reinventing the wheel by sharing and/or searching for code? (Essential: 50.0%; Essential + Worthwhile: 90.9%)
    • What are the common patterns of execution in my application? (Essential: 48.7%; Essential + Worthwhile: 96.6%)
    • How well does test coverage correspond to actual code usage by our customers? (Essential: 48.7%; Essential + Worthwhile: 92.0%)
  51. RSSE for Software Analytics: Opportunities
    • Provide recommendations
      – What analysis method to use and when?
    • How to understand results from data?
    • How to measure success/insight?
    • Provide tools to transform manual empirical analysis into reusable analysis
  52. Hello Clippy! Lessons Learned from RSSEs (Thomas Zimmermann, Microsoft Research); Sharing Insights, Sharing Methods, Sharing Models, Sharing Data; http://aka.ms/145Questions (Andrew Begel, Thomas Zimmermann. Analyze This! 145 Questions for Data Scientists in Software Engineering. To appear ICSE 2014); My wish list: RSSEs for software analytics
  53. Thank you!