Slide 1

Slide 1 text

Confessions of an Industrial Researcher
A Typical Bollywood Story
Thomas Zimmermann, Microsoft Research
© Microsoft Corporation

Slide 2

Slide 2 text

There’s a guy. (Pictures from Kaho Naa Pyaar Hai)

Slide 3

Slide 3 text

There’s a girl.

Slide 4

Slide 4 text

In the beginning they don’t like each other. Then they fall in love.

Slide 5

Slide 5 text


Slide 6

Slide 6 text

But when they fall in love their families do not accept.

Slide 7

Slide 7 text

In the end, how they unite is the story of many Bollywood movies.

Slide 8

Slide 8 text

Bollywood: Boy/Girl
Research: Academia/Industry

Slide 9

Slide 9 text

Bollywood: Boy/Girl
Research: Academia/Industry
In the beginning they don’t like each other.
“Practitioners are reluctant to share real industry data due to confidentiality agreements”
“Researchers are mostly working on some dated or futuristic theoretical challenges”
“Practitioners are looking for quick fixes to their problems instead of using systematic methods”
“Case studies in research do not represent complexities of real projects”

Slide 10

Slide 10 text

Bollywood: Boy/Girl
Research: Academia/Industry
In the beginning they don’t like each other.
“Practitioners are reluctant to share real industry data due to confidentiality agreements”
“Researchers are mostly working on some dated or futuristic theoretical challenges”
“Practitioners are looking for quick fixes to their problems instead of using systematic methods”
“Case studies in research do not represent complexities of real projects”
Then they fall in love.

Slide 11

Slide 11 text

Bollywood: Boy/Girl
Research: Academia/Industry
In the beginning they don’t like each other.
“Practitioners are reluctant to share real industry data due to confidentiality agreements”
“Researchers are mostly working on some dated or futuristic theoretical challenges”
“Practitioners are looking for quick fixes to their problems instead of using systematic methods”
“Case studies in research do not represent complexities of real projects”
Then they fall in love.
But when they fall in love their families do not accept.
Difficult to publish results from industrial practice.

Slide 12

Slide 12 text

Bollywood: Boy/Girl
Research: Academia/Industry
In the beginning they don’t like each other.
“Practitioners are reluctant to share real industry data due to confidentiality agreements”
“Researchers are mostly working on some dated or futuristic theoretical challenges”
“Practitioners are looking for quick fixes to their problems instead of using systematic methods”
“Case studies in research do not represent complexities of real projects”
Then they fall in love.
But when they fall in love their families do not accept.
Difficult to publish results from industrial practice.
In the end, how they unite is the story of many Bollywood movies.
THIS WORKSHOP

Slide 13

Slide 13 text

University of Passau, Saarland University, University of Calgary, Microsoft Research
PhD, Assistant Professor (2007-2008), Researcher (since 2008)
“Primary responsibilities of a researcher [at Microsoft] include conducting basic and applied research on the most challenging computer science problems.”

Slide 14

Slide 14 text


Slide 15

Slide 15 text

My role as a match maker

Slide 16

Slide 16 text

Three visiting researchers

Slide 17

Slide 17 text

Miryung Kim

Slide 18

Slide 18 text

Refactoring Benefits – Beliefs
Refactoring improves software quality and maintainability
A lack of refactoring incurs technical debt
vs.
Refactoring does not provide immediate benefits, unlike bug fixes and new features

Slide 19

Slide 19 text

Refactoring Benefits – Evidence
Bug fix time decreases after refactoring
Defect density decreases after refactoring
vs.
Inconsistent refactoring causes bugs
Code churn is correlated with defect density

Slide 20

Slide 20 text

Refactoring at Microsoft
A survey of refactoring practices
Interviews with the Windows refactoring team
Quantitative analysis of Windows 7 version history

Slide 21

Slide 21 text

Key findings
• Refactoring is not confined to behavior-preserving transformations.
• Engineers face various challenges when refactoring.
• Refactoring engines are not used much.
• Developers perceive that refactoring involves substantial cost and risk.
• Refactoring is driven by immediate, concrete needs.
• Refactored modules experienced a significant reduction in dependencies and post-release defects.
Miryung Kim, Thomas Zimmermann, Nachiappan Nagappan: A field study of refactoring challenges and benefits. SIGSOFT FSE 2012: 50

Slide 22

Slide 22 text

Emerson Murphy-Hill

Slide 23

Slide 23 text

ONE BUG MANY FIXES
FIND OUT THE TRUTH ABOUT HOW SOFTWARE ENGINEERS FIX BUGS
Emerson R. Murphy-Hill, Thomas Zimmermann, Christian Bird, Nachiappan Nagappan: The design of bug fixes. ICSE 2013: 332-341

Slide 24

Slide 24 text

the design space: what are the different ways that bugs can be fixed? (RQ1)
navigating the design space: what factors influence which fix an engineer chooses? (RQ2)
implications

Slide 25

Slide 25 text

opportunistic interviews
  goal: qualitative, minimally obtrusive
  protocol: pick engineers in a building who appeared to be available
  participants (dev + test): 32 participants (8 each for four product groups)
  data: coding with Atlas.TI
firehouse interviews
  goal: qualitative, “fresh in mind”
  protocol: pick engineers in a building who just closed a bug report
  participants (dev + test): 8 participants from a fifth product group
  data: coding with Atlas.TI
both interview types
  protocol: introductory exercise; ask about a/the most recent bug: software, symptoms, causes, more than one way to fix; if yes, explain in detail
triage meetings
  goal: qualitative, collaborative decisions
  protocol: take notes and observe in silence; limited value because teams rarely discussed how to fix a bug
  participants (dev + test): 6 triage meetings
  data: read notes
survey
  goal: quantify observations
  protocol: 15-20 minute anonymous survey; questions informed by qualitative findings
  participants (dev + test): 324 responses out of a random sample of 2000
  data: descriptive statistics
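The participant counts on this slide imply two derived figures that the slide leaves implicit: the total number of interviewees and the survey response rate. A minimal sketch of that arithmetic follows; the variable names are illustrative and do not come from the study.

```python
# Counts taken directly from the study-design slide.
opportunistic_interviewees = 8 * 4  # 8 participants each for four product groups
firehouse_interviewees = 8          # participants from a fifth product group
survey_responses = 324
survey_sample = 2000

# Derived figures (simple arithmetic, not numbers reported on the slide).
interview_participants = opportunistic_interviewees + firehouse_interviewees
response_rate = survey_responses / survey_sample

print(f"interview participants: {interview_participants}")
print(f"survey response rate: {response_rate:.1%}")
```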

Slide 26

Slide 26 text

the design space: what are the different ways that bugs can be fixed? (RQ1)
navigating the design space: what factors influence which fix an engineer chooses? (RQ2)
implications

Slide 27

Slide 27 text

data propagation (across components): how far is information allowed to propagate? (from “fix at source” to “away from source”)
error surface: how much information is revealed to users? (from “error not revealed” to “detailed error”)
behavioral alternatives: is a fix perceptible to the user? (from “no change” to “must change behavior”)
functionality removal: how much of a feature is removed during a bug fix? (from “nothing” to “everything”)
refactoring: degree to which code is restructured. (from “no restructuring” to “significant”)
internal vs. external: how much internal/external code is changed? (from “only internal” to “only external”)
accuracy: degree to which the fix utilizes accurate information. (from “accurate” to “heuristics”)
hardcoding: degree to which a fix hardcodes data. (from “data generated” to “data specified”)

Slide 28

Slide 28 text

data propagation (across components): how far is information allowed to propagate? (from “fix at source” to “away from source”)
error surface: how much information is revealed to users? (from “error not revealed” to “detailed error”)
behavioral alternatives: is a fix perceptible to the user? (from “no change” to “must change behavior”)
functionality removal: how much of a feature is removed during a bug fix? (from “nothing” to “everything”)
refactoring: degree to which code is restructured. (from “no restructuring” to “significant”)
internal vs. external: how much internal/external code is changed? (from “only internal” to “only external”)
accuracy: degree to which the fix utilizes accurate information. (from “accurate” to “heuristics”)
hardcoding: degree to which a fix hardcodes data. (from “data generated” to “data specified”)
same bug: fix A, fix B
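One way to picture "same bug: fix A, fix B" is to treat each fix as a point along the eight dimensions listed on this slide. The sketch below is an illustration only: the 0.0 to 1.0 scale, the `FixDesign` class, the `differing_dimensions` helper, and the positions given to the two fixes are all hypothetical and are not taken from the study.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class FixDesign:
    """Position of one fix on each dimension: 0.0 is the first end point
    named on the slide, 1.0 the second (the numeric scale is an assumption)."""
    data_propagation: float         # 0 = fix at source, 1 = away from source
    error_surface: float            # 0 = error not revealed, 1 = detailed error
    behavioral_alternatives: float  # 0 = no change, 1 = must change behavior
    functionality_removal: float    # 0 = nothing, 1 = everything
    refactoring: float              # 0 = no restructuring, 1 = significant
    internal_vs_external: float     # 0 = only internal, 1 = only external
    accuracy: float                 # 0 = accurate, 1 = heuristics
    hardcoding: float               # 0 = data generated, 1 = data specified


def differing_dimensions(a: FixDesign, b: FixDesign) -> list[str]:
    """Return the names of the dimensions on which two fixes differ."""
    return [f.name for f in fields(a)
            if getattr(a, f.name) != getattr(b, f.name)]


# Hypothetical positions for two alternative fixes of the same bug.
fix_a = FixDesign(data_propagation=0.0, error_surface=0.0,
                  behavioral_alternatives=0.0, functionality_removal=0.0,
                  refactoring=0.75, internal_vs_external=0.0,
                  accuracy=0.0, hardcoding=0.0)
fix_b = FixDesign(data_propagation=1.0, error_surface=0.0,
                  behavioral_alternatives=0.0, functionality_removal=0.0,
                  refactoring=0.0, internal_vs_external=0.0,
                  accuracy=1.0, hardcoding=0.0)

print(differing_dimensions(fix_a, fix_b))
```

With these made-up positions, the two fixes differ on data propagation, refactoring, and accuracy while agreeing on the other five dimensions.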

Slide 29

Slide 29 text

the design space: what are the different ways that bugs can be fixed? (RQ1)
navigating the design space: what factors influence which fix an engineer chooses? (RQ2)
implications

Slide 30

Slide 30 text

risk management/development phase: taking more risks in earlier phases; risk of new bugs and risk of spending too much time on one bug (development phase: 72% “usually”/“always” in survey)
interface breakage: degree to which a fix breaks existing interfaces (89%)
consistency: degree to which a fix will be consistent with the original design of the code (78%)
user behavior: effect that users have on the fix (usage frequency: 41%)
cause understanding: how thoroughly an engineer understands why a bug occurs
social factors: communication, feedback from other people, finding knowledgeable people, code ownership

Slide 31

Slide 31 text

Tim Menzies

Slide 32

Slide 32 text

Inductive engineering
The Inductive Software Engineering Manifesto: Principles for Industrial Data Mining. Tim Menzies, Christian Bird, Thomas Zimmermann, Wolfram Schulte and Ekrem Kocaguneli. In MALETS 2011: Proceedings of the International Workshop on Machine Learning Technologies in Software Engineering

Slide 33

Slide 33 text

Principle #1: Users before algorithms
Mining algorithms are only useful in industry if users fund their use in real-world applications.

Slide 34

Slide 34 text

Principle #2: Plan for scale
In any industrial application, the data mining method is repeated multiple times to either answer an extra user question, make some enhancement and/or bug fix to the method, or to deploy it to a different set of users.

Slide 35

Slide 35 text

Principle #3: Early feedback
Continuous and early feedback from users allows needed changes to be made as soon as possible (e.g. when they find that assumptions don’t match the users’ perception) and without wasting heavy upfront investment.

Slide 36

Slide 36 text

Principle #4: Be open-minded
It is unwise to enter into an inductive study with fixed hypotheses or approaches, particularly for data that has not been mined before. Don’t resist exploring additional avenues when a particular idea doesn’t work out.

Slide 37

Slide 37 text

Principle #5: Do smart learning
Important outcomes are riding on your conclusions. Make sure that you check and validate them.

Slide 38

Slide 38 text

Principle #6: Live with the data you have
You go mining with the data you have, not the data you might want or wish to have at a later time.

Slide 39

Slide 39 text

Principle #7: Broad skill set, big toolkit
Successful inductive engineers routinely try multiple inductive technologies.

Slide 40

Slide 40 text

ESE Group in Summer 2012
ESE Group in Summer 2013

Slide 41

Slide 41 text

Thank you!