
Confessions of an Industrial Researcher: A Typical Bollywood Story

Keynote at the SER&IPs 2014 workshop. https://sites.google.com/site/serips2014/

Thomas Zimmermann

June 01, 2014

Transcript

  1. © Microsoft Corporation
    Confessions of an
    Industrial Researcher
    A Typical Bollywood Story
    Thomas Zimmermann, Microsoft Research


  2. © Microsoft Corporation
    There’s a guy.
    (Pictures from Kaho Naa Pyaar Hai)


  3. © Microsoft Corporation
    There’s a girl.


  4. © Microsoft Corporation
    In the beginning they don’t like each other.
    Then they fall in love.


  5. © Microsoft Corporation


  6. © Microsoft Corporation
    But when they fall in love
    their families do not accept.


  7. © Microsoft Corporation
    In the end, how they unite is the
    story of many Bollywood movies.


  8. © Microsoft Corporation
    Bollywood: Boy/Girl
    Research: Academia/Industry


  12. © Microsoft Corporation
    Bollywood: Boy/Girl
    Research: Academia/Industry
    In the beginning they don’t like each other.
      “Practitioners are reluctant to share real industry data due to confidentiality agreements”
      “Researchers are mostly working on some dated or futuristic theoretical challenges”
      “Practitioners are looking for quick fixes to their problems instead of using systematic methods”
      “Case studies in research do not represent complexities of real projects”
    Then they fall in love.
    But when they fall in love their families do not accept.
      Difficult to publish results from industrial practice.
    In the end, how they unite is the story of many Bollywood movies.
      THIS WORKSHOP


  13. © Microsoft Corporation
    University of Passau
    Saarland University: PhD
    University of Calgary: Assistant Professor (2007-2008)
    Microsoft Research: Researcher (since 2008)
    “Primary responsibilities of a researcher [at Microsoft] include conducting basic and applied research on the most challenging computer science problems.”


  14. © Microsoft Corporation


  15. © Microsoft Corporation
    My role as a match maker


  16. © Microsoft Corporation
    Three visiting researchers


  17. © Microsoft Corporation
    Miryung Kim


  18. © Microsoft Corporation
    Refactoring Benefits – Beliefs
    Refactoring improves software quality and maintainability
    A lack of refactoring incurs technical debt
    vs.
    Refactoring does not provide immediate benefits, unlike bug fixes and new features

  19. © Microsoft Corporation
    Refactoring Benefits – Evidence
    Bug fix time decreases after refactoring
    Defect density decreases after refactoring
    vs.
    Inconsistent refactoring causes bugs
    Code churn is correlated with defect density

  20. © Microsoft Corporation
    Refactoring at Microsoft
    A survey of refactoring practices
    Interviews with the Windows refactoring team
    Quantitative analysis of Windows 7 version history

  21. © Microsoft Corporation
    Key findings
    • Refactoring is not confined to behavior-preserving transformation.
    • Engineers face various challenges when refactoring.
    • Refactoring engines are not used much.
    • Developers perceive that refactoring involves substantial cost and risk.
    • Refactoring is driven by immediate, concrete needs.
    • Refactored modules experienced significant reduction in dependencies and post-release defects.
    Miryung Kim, Thomas Zimmermann, Nachiappan Nagappan: A field study of refactoring challenges and benefits. SIGSOFT FSE 2012: 50


  22. © Microsoft Corporation
    Emerson Murphy-Hill


  23. © Microsoft Corporation
    ONE BUG MANY FIXES
    FIND OUT THE TRUTH ABOUT HOW SOFTWARE ENGINEERS FIX BUGS
    Emerson R. Murphy-Hill, Thomas Zimmermann, Christian Bird, Nachiappan Nagappan: The design of bug fixes. ICSE 2013: 332-341


  24. © Microsoft Corporation
    the design space: what are the different ways that bugs can be fixed? (RQ1)
    navigating the design space: what factors influence which fix an engineer chooses? (RQ2)
    implications


  25. © Microsoft Corporation
    opportunistic interviews
      goal: qualitative, minimally obtrusive
      protocol: pick engineers in a building who appeared to be available
      participants (dev + test): 32 participants (8 each for four product groups)
      data: coding with Atlas.TI
    firehouse interviews
      goal: qualitative, “fresh in mind”
      protocol: pick engineers in a building who just closed a bug report
      participants (dev + test): 8 participants from a fifth product group
      data: coding with Atlas.TI
    protocol shared by both kinds of interviews
      introductory exercise; ask about a/the most recent bug: software, symptoms, causes, more than one way to fix; if yes, explain in detail
    triage meetings
      goal: qualitative, collaborative decisions
      protocol: take notes and observe in silence; limited value because teams rarely discussed how to fix a bug.
      participants (dev + test): 6 triage meetings
      data: read notes
    survey
      goal: quantify observations
      protocol: 15-20 minute anonymous survey; questions informed by qualitative findings.
      participants (dev + test): 324 responses out of a random sample of 2000
      data: descriptive statistics


  27. © Microsoft Corporation
    data propagation (across components): how far is information allowed to propagate?
      from “fix at source” to “away from source”
    error surface: how much information is revealed to users?
      from “error not revealed” to “detailed error”
    behavioral alternatives: is a fix perceptible to the user?
      from “no change” to “must change behavior”
    functionality removal: how much of a feature is removed during a bug fix?
      from “nothing” to “everything”
    refactoring: degree to which code is restructured.
      from “no restructuring” to “significant”
    internal vs. external: how much internal/external code is changed?
      from “only internal” to “only external”
    accuracy: degree to which the fix utilizes accurate information.
      from “accurate” to “heuristics”
    hardcoding: degree to which a fix hardcodes data.
      from “data generated” to “data specified”


  28. © Microsoft Corporation
    data propagation (across components): how far is information allowed to propagate?
      from “fix at source” to “away from source”
    error surface: how much information is revealed to users?
      from “error not revealed” to “detailed error”
    behavioral alternatives: is a fix perceptible to the user?
      from “no change” to “must change behavior”
    functionality removal: how much of a feature is removed during a bug fix?
      from “nothing” to “everything”
    refactoring: degree to which code is restructured.
      from “no restructuring” to “significant”
    internal vs. external: how much internal/external code is changed?
      from “only internal” to “only external”
    accuracy: degree to which the fix utilizes accurate information.
      from “accurate” to “heuristics”
    hardcoding: degree to which a fix hardcodes data.
      from “data generated” to “data specified”
    same bug: fix A, fix B


  30. © Microsoft Corporation
    risk management/development phase: taking more risks in earlier phases; risk of new bugs and risk of spending too much time on one bug (development phase: 72% “usually”/“always” in survey)
    interface breakage: degree to which a fix breaks existing interfaces (89%)
    consistency: degree to which a fix will be consistent with the original design of the code (78%)
    user behavior: effect that users have on the fix (usage frequency: 41%)
    cause understanding: how thoroughly an engineer understands why a bug occurs
    social factors: communication, feedback from other people, finding knowledgeable people, code ownership


  31. © Microsoft Corporation
    Tim Menzies


  32. © Microsoft Corporation
    Inductive engineering
    Tim Menzies, Christian Bird, Thomas Zimmermann, Wolfram Schulte and Ekrem Kocaguneli: The Inductive Software Engineering Manifesto: Principles for Industrial Data Mining. In MALETS 2011: Proceedings International Workshop on Machine Learning Technologies in Software Engineering


  33. © Microsoft Corporation
    Principle #1:
    Users before algorithms
    Mining algorithms are only useful in industry
    if users fund their use in real-world
    applications.


  34. © Microsoft Corporation
    Principle #2:
    Plan for scale
    In any industrial application, the data mining method is repeated multiple times, either to answer an extra user question, to make some enhancement and/or bug fix to the method, or to deploy it to a different set of users.


  35. © Microsoft Corporation
    Principle #3:
    Early feedback
    Continuous and early feedback from users allows needed changes to be made as soon as possible (e.g. when they find that assumptions don’t match the users’ perception) and without wasting heavy up-front investment.


  36. © Microsoft Corporation
    Principle #4:
    Be open-minded
    It is unwise to enter into an inductive study with fixed hypotheses or approaches, particularly for data that has not been mined before. Don’t resist exploring additional avenues when a particular idea doesn’t work out.


  37. © Microsoft Corporation
    Principle #5:
    Do smart learning
    Important outcomes are riding on your
    conclusions. Make sure that you check and
    validate them.


  38. © Microsoft Corporation
    Principle #6:
    Live with the data you have
    You go mining with the data you have—
    not the data you might want or wish to have
    at a later time.


  39. © Microsoft Corporation
    Principle #7:
    Broad skill set, big toolkit
    Successful inductive engineers routinely try
    multiple inductive technologies.


  40. © Microsoft Corporation
    ESE Group in Summer 2012
    ESE Group in Summer 2013


  41. © Microsoft Corporation
    Thank you!
