At first, researchers and practitioners don’t like each other.
“Practitioners are reluctant to share real industry data due to confidentiality agreements.”
“Researchers are mostly working on dated or futuristic theoretical challenges.”
“Practitioners are looking for quick fixes to their problems instead of using systematic methods.”
“Case studies in research do not represent the complexities of real projects.”
Then they fall in love. But when they fall in love, their families do not accept it: it is difficult to publish results from industrial practice. In the end, how they unite is the story of many Bollywood movies, and of THIS WORKSHOP.
PhD; Assistant Professor, University of Calgary (2007-2008); Researcher, Microsoft Research (since 2008).
“Primary responsibilities of a researcher [at Microsoft] include conducting basic and applied research on the most challenging computer science problems.”
Refactoring improves quality and maintainability: a lack of refactoring incurs technical debt.
vs.
Refactoring does not provide immediate benefits, unlike bug fixes and new features.
Defect density decreases after refactoring.
vs.
Inconsistent refactoring causes bugs; code churn is correlated with defect density.
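The churn claim is a statistical association that can be checked on project data. Below is a minimal sketch, assuming hypothetical per-module churn and defect-density numbers and that SciPy is available; nothing in it comes from the studies cited on these slides.

```python
# Minimal sketch: correlating code churn with defect density across modules.
# All module names and numbers are invented for illustration.
from scipy.stats import spearmanr

modules = {
    # module:    (churned lines of code, defects per KLOC)
    "parser":    (1200, 4.1),
    "scheduler": (300,  0.9),
    "ui":        (2100, 6.3),
    "storage":   (800,  2.7),
    "net":       (150,  0.4),
}

churn = [c for c, _ in modules.values()]
defect_density = [d for _, d in modules.values()]

# Spearman's rank correlation is robust to the skewed distributions churn tends to have.
rho, p_value = spearmanr(churn, defect_density)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```

A strong positive rho on real data would be consistent with the slide’s claim; it does not by itself establish causation.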
The definition of refactoring in practice is broader than behavior-preserving transformation.
• Engineers face various challenges when doing refactoring.
• Refactoring engines are not used much.
• Developers perceive that refactoring involves substantial cost and risk.
• Refactoring is driven by immediate, concrete needs.
• Refactored modules experienced a significant reduction in dependencies and post-release defects.
Miryung Kim, Thomas Zimmermann, Nachiappan Nagappan: A field study of refactoring challenges and benefits. SIGSOFT FSE 2012: 50
THE TRUTH ABOUT HOW SOFTWARE ENGINEERS FIX BUGS
Emerson R. Murphy-Hill, Thomas Zimmermann, Christian Bird, Nachiappan Nagappan: The design of bug fixes. ICSE 2013: 332-341
Study design (goal / protocol / participants / data analysis) across four methods:
• Observations. Goal: qualitative, minimally obtrusive. Protocol: pick engineers in a building who appeared to be available; take notes and observe in silence. Participants (dev + test): 32 participants (8 each from four product groups). Data: coding with Atlas.TI.
• Interviews. Goal: qualitative, “fresh in mind”. Protocol: pick engineers in a building who just closed a bug report; introductory exercise; ask about the most recent bug: software, symptoms, causes, more than one way to fix; if yes, explain in detail. Participants: 8 participants from a fifth product group. Data: coding with Atlas.TI.
• Triage meetings. Goal: qualitative, collaborative decisions. Protocol: limited value because teams rarely discussed how to fix a bug. Participants: 6 triage meetings. Data: read notes.
• Survey. Goal: quantify observations. Protocol: 15-20 minute anonymous survey; questions informed by qualitative findings. Participants: 324 responses out of a random sample of 2000. Data: descriptive statistics (illustrated in the sketch below).
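As a minimal sketch of the survey’s analysis step, descriptive statistics here amount to response rates and simple proportions. The response records and the single question shown are invented; this is not the study’s actual analysis script.

```python
# Minimal sketch of survey descriptive statistics; all records are invented.
from collections import Counter

invited = 2000            # random sample of engineers invited to the survey
responses = [             # one record per returned survey (invented data)
    {"multiple_ways_to_fix": True},
    {"multiple_ways_to_fix": False},
    {"multiple_ways_to_fix": True},
    # ... the real study collected 324 responses
]

response_rate = len(responses) / invited
counts = Counter(r["multiple_ways_to_fix"] for r in responses)
share_multiple = counts[True] / len(responses)

print(f"response rate:              {response_rate:.1%}")
print(f"reported more than one fix: {share_multiple:.1%}")
```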
Dimensions along which alternative fixes for the same bug (fix A vs. fix B) can differ; a minimal modeling sketch follows the list:
• data propagation: how far is erroneous information allowed to propagate? (fix at source → away from source)
• error surface: how much information is revealed to users? (error not revealed → detailed error)
• behavioral alternatives: is a fix perceptible to the user? (no change → must change behavior)
• functionality removal: how much of a feature is removed during a bug fix? (nothing → everything)
• refactoring: degree to which code is restructured (no restructuring → significant)
• internal vs. external: how much internal/external code is changed? (only internal → only external)
• accuracy: degree to which the fix utilizes accurate information (accurate → heuristics)
• hardcoding: degree to which a fix hardcodes data (data generated → data specified)
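Here is a minimal sketch of how these dimensions could be modeled to compare two candidate fixes for the same bug. The field names and the 0.0-1.0 scales are illustrative choices of mine, not an artifact of the paper.

```python
# Minimal sketch: encode each design dimension as a value in [0.0, 1.0],
# where 0.0 is the left end of the slide's scale and 1.0 is the right end.
from dataclasses import dataclass, fields

@dataclass
class FixDesign:
    propagation: float        # 0 = fix at source,        1 = away from source
    error_surface: float      # 0 = error not revealed,   1 = detailed error
    behavioral_change: float  # 0 = no change,            1 = must change behavior
    removal: float            # 0 = nothing removed,      1 = everything removed
    refactoring: float        # 0 = no restructuring,     1 = significant
    external_code: float      # 0 = only internal,        1 = only external
    heuristics: float         # 0 = accurate information, 1 = heuristics
    hardcoding: float         # 0 = data generated,       1 = data specified

def compare(a: FixDesign, b: FixDesign) -> None:
    """Print the dimensions on which two fixes for the same bug differ."""
    for f in fields(FixDesign):
        va, vb = getattr(a, f.name), getattr(b, f.name)
        if va != vb:
            print(f"{f.name}: fix A = {va}, fix B = {vb}")

# e.g. a quick patch that hides the error vs. a deeper fix at the source
fix_a = FixDesign(1.0, 0.0, 0.0, 0.2, 0.0, 0.0, 1.0, 1.0)
fix_b = FixDesign(0.0, 1.0, 1.0, 0.0, 0.8, 0.0, 0.0, 0.0)
compare(fix_a, fix_b)
```

Encoding the scale endpoints numerically makes it easy to see, per dimension, where two proposed fixes diverge before deciding which one to ship.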
Factors that influence how a fix is designed:
• development phase: more substantial fixes are acceptable in earlier phases, weighed against the risk of new bugs and the risk of spending too much time on one bug (72% “usually”/“always” in the survey)
• interface breakage: degree to which a fix breaks existing interfaces (89%)
• consistency: degree to which a fix will be consistent with the original design of the code (78%)
• user behavior: effect that users have on the fix (usage frequency: 41%)
• cause understanding: how thoroughly an engineer understands why a bug occurs
• social factors: communication, feedback from other people, finding knowledgeable people, code ownership
Tim Menzies, Christian Bird, Thomas Zimmermann, Wolfram Schulte, Ekrem Kocagüneli: Principles for Industrial Data Mining. In MALETS 2011: Proceedings of the International Workshop on Machine Learning Technologies in Software Engineering.
In an industrial application, the data mining method is repeated multiple times, either to answer an extra user question, to make an enhancement or bug fix to the method, or to deploy it to a different set of users.
Early feedback from users allows needed changes to be made as soon as possible (e.g., when they find that assumptions don’t match the users’ perception) and without wasting heavy up-front investment.
It is unwise to enter into an inductive study with fixed hypotheses or approaches, particularly for data that has not been mined before. Don’t resist exploring additional avenues when a particular idea doesn’t work out.