Keynote at the RSSE 2014 workshop
Hello Clippy! Lessons Learned from RSSEs. Thomas Zimmermann, Microsoft Research. © Microsoft Corporation
University of Passau, Saarland University, University of Calgary, Microsoft Research. PhD, Assistant Professor (2007-2008), Researcher (since 2008)
Annotations for Risky Locations
"A recommendation system for software engineering is a software application that provides information items estimated to be valuable for a software engineering task in a given context." [Robillard, Walker, Zimmermann, 2009] B+
Three Things I Think I Know About Software (and that are important to RSSEs)
software is diversity
Developer, Tester, Builder, Dev. Lead, Test Lead, Manager. people, projects, knowledge
Figure: Number of projects that are needed to cover the Ohloh universe with respect to seven dimensions (language, size, contributors, churn, commits, age, activity). Each point in the graph means that x projects can cover y percent of the universe. Meiyappan Nagappan, Thomas Zimmermann, Christian Bird: Diversity in software engineering research. ESEC/SIGSOFT FSE 2013: 466-476
Build tool support for frequently needed knowledge. (Chart axes: Frequency, Knowledge.)
one size does not fit all
developers are smart
and software is complex
My wish list: RSSEs for software analytics
"analytics is the use of analysis, data, and systematic reasoning to make decisions." Definition by Thomas H. Davenport, Jeanne G. Harris: Analytics at Work: Smarter Decisions, Better Results. software analytics is analytics on software data
history of software analytics. Tim Menzies, Thomas Zimmermann: Software Analytics: So What? IEEE Software 30(4): 31-37 (2013)
Sharing Insights, Sharing Methods, Sharing Models, Sharing Data
Sharing Insights, Sharing Methods
Example: Branch Analytics
Christian Bird, Thomas Zimmermann: Assessing the value of branches with what-if analysis. SIGSOFT FSE 2012: 45
Emad Shihab, Christian Bird, Thomas Zimmermann: The effect of branching strategies on software quality. ESEM 2012: 301-310
Christian Bird, Thomas Zimmermann, Alex Teterev: A theory of branches as goals and virtual teams. CHASE 2011: 53-56
Branches at Microsoft. (Diagram: branches main, networking, and multimedia, with integrations between them.) Changes are isolated => fewer build and test breaks. Process overhead. Time delay.
Code Flow for a Single File. Blue nodes are edits to the file. Orange nodes are move operations.
Branch Decisions. How do we coordinate parallel development? How do we structure the branch hierarchy? Can we reduce the complexity of branching?
Branch Analytics
Techniques:
• Survey developers to understand problems with branching
• Mine source control for relationship of teams and branches
• Simulate benefits and cost of alternative branch structures
Actions/Tools:
• Alert stakeholders about possible conflicts
• Recommend branch structure (delete, create, fold branches)
• Perform semi-automatic branch refactoring
Which Branches Need Coordination? Compare all pairs of branches by file similarity and developer similarity. Dark areas mean many branch pairs in that area. Same files but different teams mean potential problems. (Plot axes: Different Files to Same Files; Different Teams to Same Teams.)
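The pairwise comparison on this slide could be sketched as follows. This is only an illustration: the slide does not say which similarity measure was used, so Jaccard similarity is an assumption here, and the branch data, the function names, and both thresholds are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def compare_branches(branches):
    """Return (branch1, branch2, file_similarity, developer_similarity) for every pair."""
    rows = []
    for n1, n2 in combinations(sorted(branches), 2):
        b1, b2 = branches[n1], branches[n2]
        rows.append((
            n1,
            n2,
            jaccard(b1["files"], b2["files"]),
            jaccard(b1["developers"], b2["developers"]),
        ))
    return rows


def potential_problems(rows, min_file_sim=0.5, max_dev_sim=0.2):
    """Pairs that touch similar files but are worked on by different teams.

    Both thresholds are assumed values chosen for this example only.
    """
    return [(n1, n2) for n1, n2, f, d in rows if f >= min_file_sim and d <= max_dev_sim]


# Hypothetical data: files edited on each branch and the developers who edited them.
branches = {
    "main": {"files": {"build.cfg"}, "developers": {"alice", "bob"}},
    "networking": {"files": {"tcp.c", "dns.c", "ui.c"}, "developers": {"alice", "bob"}},
    "multimedia": {"files": {"tcp.c", "dns.c", "codec.c"}, "developers": {"carol"}},
}

rows = compare_branches(branches)
flagged = potential_problems(rows)
print(flagged)
```

With this toy data, only the multimedia/networking pair shares files without sharing developers, which is the "same files, different teams" corner of the plot.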
Assessing a Branch
Simulate an alternate branch structure to assess the cost and benefit of individual branches
• Cost: Average Delay Increase per Edit. How much delay does a branch introduce into development?
• Cost: Integrations per Edit on a Branch. What is the ratio of integrations to edits within a branch?
• Benefit: Provided Isolation per Edit. How many conflicts does a branch prevent per edit?
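One way to read the three measures above is as simple per-edit averages. The sketch below assumes that a simulation has already produced, for each edit, its delay with and without the branch and the number of conflicts the branch prevented. The `EditOutcome` record, its field names, and the sample numbers are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class EditOutcome:
    delay_with_branch: float     # days for the edit to flow onward in the real structure
    delay_without_branch: float  # days in the simulated structure without the branch
    conflicts_prevented: int     # conflicts the branch kept away from this edit


def assess_branch(edits, integrations):
    """Compute the two cost measures and the benefit measure as per-edit averages."""
    n = len(edits)
    if n == 0:
        raise ValueError("cannot assess a branch without edits")
    return {
        "delay_increase_per_edit": sum(
            e.delay_with_branch - e.delay_without_branch for e in edits
        ) / n,
        "integrations_per_edit": integrations / n,
        "isolation_per_edit": sum(e.conflicts_prevented for e in edits) / n,
    }


# Hypothetical branch with four edits and two integrations.
edits = [
    EditOutcome(10.0, 2.0, 0),
    EditOutcome(12.0, 2.0, 1),
    EditOutcome(8.0, 2.0, 0),
    EditOutcome(6.0, 2.0, 0),
]
result = assess_branch(edits, integrations=2)
print(result)
```

The hard part in practice is the simulation that yields the per-edit inputs; this sketch only covers the aggregation step.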
Simulating Removal of a Single Branch. (Diagram: branches A and B with integrations between them, shown in numbered states 1 to 4.) Compare 1 with 4 to assess cost and benefit of branch B
(Diagram, built up over several slides: Parent Branch, Victim Branch, Child Branch, with code flowing to the release branch.) Simulation (what-if): faster code flow; unneeded integrations removed; no longer isolated.
Assessing branches. (Scatter plot: Delay (Cost) against Provided Isolation (Benefit).) Each dot is a branch. Green dots are branches with high benefit and low cost. Red dots are branches with high cost but low benefit. If the high-cost-low-benefit branches had been removed, changes would each have saved 8.9 days of delay and only introduced 0.04 additional conflicts.
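The green/red split in the plot can be illustrated with a small classifier. The slides do not state how the colors were assigned, so the threshold rule, the threshold values, and the branch names and numbers below are all assumptions made for this example.

```python
def classify_branch(delay_cost, isolation_benefit, cost_threshold, benefit_threshold):
    """Label a branch by comparing its cost and benefit against assumed thresholds."""
    high_cost = delay_cost >= cost_threshold
    high_benefit = isolation_benefit >= benefit_threshold
    if high_benefit and not high_cost:
        return "green"   # high benefit, low cost
    if high_cost and not high_benefit:
        return "red"     # high cost, low benefit: candidate for removal
    return "neutral"     # everything else


# Hypothetical branches: (delay per edit in days, conflicts prevented per edit).
measurements = {
    "networking": (1.5, 0.30),
    "legacy-ui": (9.0, 0.02),
    "multimedia": (8.0, 0.40),
}

labels = {
    name: classify_branch(cost, benefit, cost_threshold=5.0, benefit_threshold=0.1)
    for name, (cost, benefit) in measurements.items()
}
removal_candidates = sorted(name for name, label in labels.items() if label == "red")
print(labels, removal_candidates)
```

A recommender built on this idea would surface only the "red" branches, leaving the final decision to the team that owns them.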
Why did I show you this? Build tools for frequent questions. Use data scientists for infrequent questions. Make it easier for data scientists to build tools. (Chart axes: Frequency, Questions.)
http://aka.ms/145Questions Andrew Begel, Thomas Zimmermann. Analyze This! 145 Questions for Data Scientists in Software Engineering. To appear at ICSE 2014
Microsoft’s Top 10 Questions
Essential   Essential + Worthwhile   Question
80.0%       99.2%                    How do users typically use my application?
72.0%       98.5%                    What parts of a software product are most used and/or loved by customers?
62.4%       96.6%                    How effective are the quality gates we run at checkin?
54.5%       96.4%                    How can we improve collaboration and sharing between teams?
53.2%       93.6%                    What are the best key performance indicators (KPIs) for monitoring services?
52.1%       94.0%                    What is the impact of a code change or requirements change to the project and its tests?
50.5%       97.2%                    What is the impact of tools on productivity?
50.0%       90.9%                    How do I avoid reinventing the wheel by sharing and/or searching for code?
48.7%       96.6%                    What are the common patterns of execution in my application?
48.7%       92.0%                    How well does test coverage correspond to actual code usage by our customers?
RSSE for Software Analytics: Opportunities
• Provide recommendations
  – What analysis method to use and when?
• How to understand results from data?
• How to measure success/insight?
• Provide tools to transform manual empirical analysis into reusable analysis
Thank you!