Confessions of an Industrial Researcher: A Typical Bollywood Story

Keynote at the SER&IPs 2014 workshop. https://sites.google.com/site/serips2014/

Thomas Zimmermann

June 01, 2014

Transcript

  1. © Microsoft Corporation Confessions of an Industrial Researcher: A Typical Bollywood Story

    Thomas Zimmermann, Microsoft Research
  2. In the end, how they unite is the story of many Bollywood movies.
  6. Bollywood: Boy/Girl. Research: Academia/Industry.

    In the beginning they don’t like each other.
    • “Practitioners are reluctant to share real industry data due to confidentiality agreements”
    • “Researchers are mostly working on some dated or futuristic theoretical challenges”
    • “Practitioners are looking for quick fixes to their problems instead of using systematic methods”
    • “Case studies in research do not represent complexities of real projects”
    Then they fall in love.
    But when they fall in love their families do not accept.
    • Difficult to publish results from industrial practice.
    In the end, how they unite is the story of many Bollywood movies.
    • THIS WORKSHOP
  7. University of Passau; Saarland University (PhD); University of Calgary (Assistant Professor, 2007-2008); Microsoft Research (Researcher, since 2008)

    “Primary responsibilities of a researcher [at Microsoft] include conducting basic and applied research on the most challenging computer science problems.”
  8. Refactoring Benefits – Beliefs

    • Refactoring improves software quality and maintainability
    • A lack of refactoring incurs technical debt
    vs.
    • Refactoring does not provide immediate benefits, unlike bug fixes and new features
  9. Refactoring Benefits – Evidence

    • Bug fix time decreases after refactoring
    • Defect density decreases after refactoring
    vs.
    • Inconsistent refactoring causes bugs
    • Code churn is correlated with defect density
  10. Refactoring at Microsoft

    • A survey of refactoring practices
    • Interviews with the Windows refactoring team
    • Quantitative analysis of Windows 7 version history
  11. Key findings

    • Refactoring is not confined to behavior-preserving transformation.
    • Engineers face various challenges in doing refactoring.
    • Refactoring engines are not used much.
    • Developers perceive that refactoring involves substantial cost and risk.
    • Refactoring is driven by immediate, concrete needs.
    • Refactored modules experienced significant reduction in dependencies and post-release defects.
    Miryung Kim, Thomas Zimmermann, Nachiappan Nagappan: A field study of refactoring challenges and benefits. SIGSOFT FSE 2012: 50
  12. ONE BUG, MANY FIXES: FIND OUT THE TRUTH ABOUT HOW SOFTWARE ENGINEERS FIX BUGS

    Emerson R. Murphy-Hill, Thomas Zimmermann, Christian Bird, Nachiappan Nagappan: The design of bug fixes. ICSE 2013: 332-341
  13. Outline

    • the design space: what are the different ways that bugs can be fixed? (RQ1)
    • navigating the design space: what factors influence which fix an engineer chooses? (RQ2)
    • implications
  14. Four methods, each described by goal, protocol, participants (dev + test), and data

    opportunistic interviews
    • goal: qualitative, minimally obtrusive
    • protocol: pick engineers in a building who appeared to be available
    • participants: 32 participants (8 each for four product groups)
    • data: coding with Atlas.TI
    firehouse interviews
    • goal: qualitative, “fresh in mind”
    • protocol: pick engineers in a building who just closed a bug report
    • participants: 8 participants from a fifth product group
    • data: coding with Atlas.TI
    both interview types
    • protocol: introductory exercise; ask about a/the most recent bug: software, symptoms, causes, more than one way to fix; if yes, explain in detail
    triage meetings
    • goal: qualitative, collaborative decisions
    • protocol: take notes and observe in silence; limited value because teams rarely discussed how to fix a bug
    • participants: 6 triage meetings
    • data: read notes
    survey
    • goal: quantify observations
    • protocol: 15-20 minute anonymous survey; questions informed by qualitative findings
    • participants: 324 responses out of a random sample of 2000
    • data: descriptive statistics
  15. Outline

    • the design space: what are the different ways that bugs can be fixed? (RQ1)
    • navigating the design space: what factors influence which fix an engineer chooses? (RQ2)
    • implications
  17. the design space, by dimension (one extreme ↔ the other)

    • data propagation (across components): how far is information allowed to propagate? fix at source ↔ away from source
    • error surface: how much information is revealed to users? error not revealed ↔ detailed error
    • behavioral alternatives: is a fix perceptible to the user? no change ↔ must change behavior
    • functionality removal: how much of a feature is removed during a bug fix? nothing ↔ everything
    • refactoring: degree to which code is restructured. no restructuring ↔ significant
    • internal vs. external: how much internal/external code is changed? only internal ↔ only external
    • accuracy: degree to which the fix utilizes accurate information. accurate ↔ heuristics
    • hardcoding: degree to which a fix hardcodes data. data generated ↔ data specified
    same bug: fix A, fix B
  18. Outline

    • the design space: what are the different ways that bugs can be fixed? (RQ1)
    • navigating the design space: what factors influence which fix an engineer chooses? (RQ2)
    • implications
  19. navigating the design space: factors that influence the choice of fix

    • risk management/development phase: taking more risks in earlier phases; risk of new bugs and risk of spending too much time on one bug (development phase: 72% “usually”/“always” in survey)
    • interface breakage: degree to which a fix breaks existing interfaces (89%)
    • consistency: degree to which a fix will be consistent with the original design of the code (78%)
    • user behavior: effect that users have on the fix (usage frequency: 41%)
    • cause understanding: how thoroughly an engineer understands why a bug occurs
    • social factors: communication, feedback from other people, finding knowledgeable people, code ownership
  20. Inductive engineering

    The Inductive Software Engineering Manifesto: Principles for Industrial Data Mining. Tim Menzies, Christian Bird, Thomas Zimmermann, Wolfram Schulte and Ekrem Kocaguneli. In MALETS 2011: Proceedings of the International Workshop on Machine Learning Technologies in Software Engineering
  21. Principle #1: Users before algorithms

    Mining algorithms are only useful in industry if users fund their use in real-world applications.
  22. Principle #2: Plan for scale

    In any industrial application, the data mining method is repeated multiple times to either answer an extra user question, make some enhancement and/or bug fix to the method, or to deploy it to a different set of users.
  23. Principle #3: Early feedback

    Continuous and early feedback from users allows needed changes to be made as soon as possible (e.g. when they find that assumptions don’t match the users’ perception) and without wasting heavy up-front investment.
  24. Principle #4: Be open-minded

    It is unwise to enter into an inductive study with fixed hypotheses or approaches, particularly for data that has not been mined before. Don’t resist exploring additional avenues when a particular idea doesn’t work out.
  25. Principle #5: Do smart learning

    Important outcomes are riding on your conclusions. Make sure that you check and validate them.
  26. Principle #6: Live with the data you have

    You go mining with the data you have, not the data you might want or wish to have at a later time.
  27. Principle #7: Broad skill set, big toolkit

    Successful inductive engineers routinely try multiple inductive technologies.