Data Hard with a Vengeance

Invited talk presented at the FSE 2014 conference.

Thomas Zimmermann

November 20, 2014

Transcript

  1. © Microsoft Corporation

  2.

  3.

  4.
    “On the fountain, there should be 2 jugs, do you see
    them? A 5 gallon and a 3 gallon. Fill one of the jugs
    with exactly 4 gallons of water and place it on the
    scale and the timer will stop. It must be precise; one
    ounce more or less will result in detonation.
    If you're still alive in 5 minutes, we'll speak.”
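
The fountain riddle is the classic water-jug puzzle. As an illustrative sketch that is not part of the talk, a breadth-first search over the pair of fill levels (5-gallon jug, 3-gallon jug) finds a shortest sequence of fill, empty, and pour moves. The function name `solve_jugs` and its parameters are hypothetical.

```python
from collections import deque


def solve_jugs(cap_a=5, cap_b=3, target=4):
    """Breadth-first search over (a, b) fill levels.

    Returns the shortest list of states from (0, 0) to a state in which
    either jug holds exactly `target` gallons, or None if unreachable."""
    start = (0, 0)
    parent = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        a, b = queue.popleft()
        if a == target or b == target:
            # Walk parent links back to the start to rebuild the path.
            path, state = [], (a, b)
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        pour_ab = min(a, cap_b - b)  # how much fits when pouring A into B
        pour_ba = min(b, cap_a - a)  # how much fits when pouring B into A
        for nxt in (
            (cap_a, b), (a, cap_b),      # fill one jug at the fountain
            (0, b), (a, 0),              # empty one jug
            (a - pour_ab, b + pour_ab),  # pour A into B
            (a + pour_ba, b - pour_ba),  # pour B into A
        ):
            if nxt not in parent:
                parent[nxt] = (a, b)
                queue.append(nxt)
    return None


print(solve_jugs())  # [(0, 0), (5, 0), (2, 3), (2, 0), (0, 2), (5, 2), (4, 3)]
```

Six moves suffice: fill the 5-gallon jug, pour it into the 3-gallon jug, empty the 3-gallon jug, pour the remaining 2 gallons across, refill the 5-gallon jug, and top up the 3-gallon jug, which leaves exactly 4 gallons in the large jug.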

  5.

  6.

  7.
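    Action movies vs. software development
    Action movies: heroes save the world. Software development: engineers build software.
    Tight deadlines: yes (action movies); yes (software development).
    Wrong information can be disastrous: exploding bombs and world domination (action movies); cancelled or delayed projects, low-quality software, and lost data (software development).
    The ending: usually a happy ending (action movies); sometimes a happy ending (software development).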

  8.
    The personal health assistant Baymax
    from the Disney picture
    “Big Hero 6”

  9.
    Empower people involved with software to make sound
    data-driven decisions about software.
    Bug tracking
    Software analytics
    Games analytics
    Software quality
    Process improvement (branches, build)
    Productivity

  10.
    ESE Group in
    Summer 2014
    ESE Group in
    Summer 2013

  11.

  12.
    CodeMine

  13.
    Six years ago…

  14.
    Windows
    Brendan Murphy, Nachi Nagappan

  15.
    Figure: CodeMine data model covering build, organization, source code, work item, code review, test, and process information. Entities include Product, Schedule, Branch, Change, Integration, Source File, Procedure / Method, Class / Type, Feature / Defect, Review, Test Job, Executable, and Person, linked by relations such as submits, resolves, edits, implements, belongs to, comments on, tests, and ships.
    Jacek Czerwonka, Nachiappan Nagappan, Wolfram Schulte, Brendan Murphy:
    CODEMINE: Building a Software Development Data Analytics Platform at Microsoft.
    IEEE Software 30(4): 64-71 (2013)
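
To make the idea of a linked development-data model concrete, here is a minimal in-memory sketch. It is not CodeMine's implementation: the class `EntityGraph`, the node identifiers, and which entities each relation connects are hypothetical, although the relation names (submits, edits, resolves, submitted into) appear on the slide.

```python
from collections import defaultdict


class EntityGraph:
    """Hypothetical property graph: (kind, id) nodes joined by labeled,
    directed edges. Not CodeMine's actual storage or schema."""

    def __init__(self):
        self.out = defaultdict(list)  # node -> list of (label, target)

    def link(self, src, label, dst):
        self.out[src].append((label, dst))

    def targets(self, src, label):
        return [dst for lbl, dst in self.out[src] if lbl == label]


g = EntityGraph()
# Illustrative edges; the endpoints of each relation are assumptions.
g.link(("Person", "alice"), "submits", ("Change", 101))
g.link(("Change", 101), "edits", ("File", "net/socket.c"))
g.link(("Change", 101), "resolves", ("Defect", 7))
g.link(("Change", 101), "submitted into", ("Branch", "networking"))

# Query across entity types: which files did alice touch via her changes?
files = [f for c in g.targets(("Person", "alice"), "submits")
         for f in g.targets(c, "edits")]
print(files)  # [('File', 'net/socket.c')]
```

The value of such a model lies in queries that cross data sources (people, changes, files, defects) that would otherwise live in separate tools.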

  16.
    Risk Prediction

  17.

  18.
    Branches at Microsoft
    Figure: a main branch with networking and multimedia branches, connected by integrations.
    Benefit: changes are isolated, so there are fewer build and test breaks.
    Costs: process overhead and time delay (velocity).

  19.
    Blue nodes are edits to the file; orange nodes are move operations.

  20.
    Visualizing code velocity
    Figure labels: mostly edits; mostly integrations; average time for a change list (CL) to reach the next branch, ranging from <= 1 week to >= 3 weeks.
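
The velocity measure on this slide is an average of elapsed times. A minimal sketch, assuming each change list (CL) is logged with its check-in time and the time it arrived in the next branch; the tuple layout, the function names, and the middle bucket label are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean


def avg_days_to_next_branch(events):
    """events: (cl_id, checked_in, arrived_in_next_branch) tuples
    (hypothetical log layout). Returns the mean elapsed time in days."""
    return mean((arrived - checked_in) / timedelta(days=1)
                for _cl, checked_in, arrived in events)


def velocity_bucket(avg_days):
    """Map an average delay to the two extremes shown on the slide;
    the middle label is a hypothetical catch-all."""
    if avg_days <= 7:
        return "<= 1 week"
    if avg_days >= 21:
        return ">= 3 weeks"
    return "between 1 and 3 weeks"


t0 = datetime(2014, 1, 6)
events = [
    ("CL1", t0, t0 + timedelta(days=2)),
    ("CL2", t0, t0 + timedelta(days=5)),
    ("CL3", t0 + timedelta(days=1), t0 + timedelta(days=9)),
]
avg = avg_days_to_next_branch(events)
print(avg, velocity_bucket(avg))  # 5.0 <= 1 week
```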

  21.
    Assessing branches
    Figure: each dot is a branch, plotted by delay (cost) against provided isolation (benefit). Green dots are branches with high benefit and low cost; red dots are branches with high cost but low benefit.
    Christian Bird, Thomas Zimmermann:
    Assessing the value of branches with what-if analysis. SIGSOFT FSE 2012
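
One way to reproduce the coloring on this plot, sketched under assumptions: score each branch by its delay in days (cost) and by a numeric isolation score (benefit), split both at the median, and label the quadrants. The median thresholds, the score units, and `classify_branches` are illustrative choices, not the method of the FSE 2012 paper.

```python
from statistics import median


def classify_branches(branches):
    """branches: name -> (delay_days, isolation_score), both hypothetical.
    Splits each dimension at its median and labels the quadrants."""
    cost_cut = median(d for d, _ in branches.values())
    benefit_cut = median(i for _, i in branches.values())
    labels = {}
    for name, (delay, isolation) in branches.items():
        high_cost = delay > cost_cut
        high_benefit = isolation > benefit_cut
        if high_benefit and not high_cost:
            labels[name] = "green"    # high benefit, low cost
        elif high_cost and not high_benefit:
            labels[name] = "red"      # high cost, low benefit
        else:
            labels[name] = "neutral"  # the two mixed quadrants
    return labels


branches = {
    "networking": (2.0, 40.0),
    "multimedia": (9.0, 35.0),
    "legacy_fix": (12.0, 3.0),
    "tools": (4.0, 5.0),
}
print(classify_branches(branches))
```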

  22.
    If the high-cost, low-benefit branches had been removed, changes would each have saved 8.9 days of delay and introduced only 0.04 additional conflicts.
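
The what-if idea can be illustrated with a deliberately simplified model. Assume each change records the days it waited in every branch on its way to main, and that removing a branch saves exactly that wait while adding a fixed, estimated number of conflicts for each change routed through it. The data and `what_if_remove` are hypothetical; the paper's analysis is more involved.

```python
def what_if_remove(changes, removed, extra_conflicts):
    """Average per-change effect of removing the branches in `removed`.

    changes: list of paths, each a list of (branch, days_waited) pairs.
    extra_conflicts: branch -> estimated extra conflicts for each change
    that passed through it once its isolation is gone (hypothetical).
    Returns (days saved per change, extra conflicts per change)."""
    saved = 0.0
    conflicts = 0.0
    for path in changes:
        for branch, days in path:
            if branch in removed:
                saved += days
                conflicts += extra_conflicts.get(branch, 0.0)
    n = len(changes)
    return saved / n, conflicts / n


changes = [
    [("legacy_fix", 10.0), ("main", 1.0)],
    [("legacy_fix", 6.0), ("main", 2.0)],
    [("networking", 1.5), ("main", 1.0)],
    [("multimedia", 3.0), ("main", 1.0)],
]
days_saved, extra = what_if_remove(changes, {"legacy_fix"}, {"legacy_fix": 0.05})
print(days_saved, extra)
```

Averaging over all changes, not only those that crossed a removed branch, mirrors the per-change framing of the result above.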

  23.
    Simplified branch trees
    B. Murphy, J. Czerwonka, L. Williams: Branching Taxonomy. Microsoft Research Technical Report MSR-TR-2014-23. http://research.microsoft.com/apps/pubs/?id=209683

  24.
    Field Studies

  25.

  26.
    Emerson R. Murphy-Hill, Thomas Zimmermann, Nachiappan Nagappan: Cowboys, ankle sprains, and keepers of quality: How is video game development different from software development? ICSE 2014
    Shaun Phillips, Thomas Zimmermann, Christian Bird: Understanding and improving software build teams. ICSE 2014
    Miryung Kim, Thomas Zimmermann, Nachiappan Nagappan: A field study of refactoring challenges and benefits. SIGSOFT FSE 2012
    Refactoring

  27.
    http://aka.ms/145Questions
    Andrew Begel, Thomas Zimmermann: Analyze This! 145 Questions for Data Scientists in Software Engineering. ICSE 2014

  28.
    Microsoft’s Top 10 Questions (Essential / Essential + Worthwhile)
    How do users typically use my application? 80.0% / 99.2%
    What parts of a software product are most used and/or loved by customers? 72.0% / 98.5%
    How effective are the quality gates we run at checkin? 62.4% / 96.6%
    How can we improve collaboration and sharing between teams? 54.5% / 96.4%
    What are the best key performance indicators (KPIs) for monitoring services? 53.2% / 93.6%
    What is the impact of a code change or requirements change to the project and its tests? 52.1% / 94.0%
    What is the impact of tools on productivity? 50.5% / 97.2%
    How do I avoid reinventing the wheel by sharing and/or searching for code? 50.0% / 90.9%
    What are the common patterns of execution in my application? 48.7% / 96.6%
    How well does test coverage correspond to actual code usage by our customers? 48.7% / 92.0%
    More at http://aka.ms/145Questions
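
The two percentage columns are simple proportions over the ratings each question received. A sketch, assuming every response is a single category label; the labels other than Essential and Worthwhile, the tie-breaking rule, and the function names are assumptions for illustration.

```python
from collections import Counter


def rating_shares(ratings):
    """Percent of ratings that are Essential, and Essential or Worthwhile.

    ratings: one category label per respondent for a single question; the
    label set (e.g. "Essential", "Worthwhile", "Unimportant") is assumed."""
    counts = Counter(ratings)
    n = len(ratings)
    essential = 100.0 * counts["Essential"] / n
    essential_or_worthwhile = 100.0 * (counts["Essential"] + counts["Worthwhile"]) / n
    return round(essential, 1), round(essential_or_worthwhile, 1)


def top_questions(survey, k=10):
    """Rank questions by % Essential, breaking ties by
    % Essential + Worthwhile (the tie-break is an assumption)."""
    scored = {q: rating_shares(r) for q, r in survey.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)[:k]


survey = {
    "Q1": ["Essential"] * 8 + ["Worthwhile"] * 2,
    "Q2": ["Essential"] * 5 + ["Worthwhile"] * 4 + ["Unimportant"],
    "Q3": ["Essential"] * 5 + ["Worthwhile"] * 5,
}
for question, (ess, ess_w) in top_questions(survey, k=2):
    print(question, ess, ess_w)
```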

  29.
    Games

  30.
    Thanks to our collaborators in Xbox, Microsoft Game Studios, and Turn 10.
    Thanks to interns Ken Hullett, Sauvik Das, Jeff Huang, Gifford Cheung, Thomas Debeauvais,
    Erik Harpstead and visiting researchers Tim Menzies and Emerson Murphy-Hill.
    Xbox Live
      Influence of games and achievements on (paid) Xbox Live memberships
      Influence of friends on titles played
      Characterizing players with Xbox Live data
    Gameplay
      Impact of social behavior on retention (beta of a AAA title)
      Influence of gameplay on skill (Halo Reach) => CHI 2013
      Assists in a car racing game (Forza 4) => FDG 2014
      How to create a successful initial session in games => CHI Play 2014
    Engineering
      Differences between game and traditional software development => ICSE 2014
      Lessons learned from game development (ongoing)
      Mining software repositories from games (ongoing)
    Exploratory
      Personalization with avatars in Xbox
      Geographic influence, temporal influence, and structural influence

  31.
    Driving skill in Forza Motorsports 4
    5% of the player base, sampled randomly: 200k players who played 25M races
    Assist usage
    Assist transitions
    Thomas Debeauvais, Thomas Zimmermann, Nachiappan Nagappan, Kevin Carter, Ryan Cooper,
    Dan Greenawalt, Tyson Solberg: An Empirical Study of Driving Skill in Forza Motorsports 4.
    FDG 2014

  32.

  33.
    Approaching a turn in Forza 4
    – in EASY mode –

  34.
    Approaching a turn in Forza 4
    – in HARD mode –

  35.
    The assist bundles in Forza 4 (presets: Easy, Medium, Hard, Advanced, Expert)
    Stability: prevents the car from spinning when cornering too fast. Settings: ON, OFF.
    Traction: prevents the car from spinning when accelerating. Settings: ON, OFF.
    Braking: supports the player when he/she brakes or should brake. Settings: Assisted, w/ ABS, ABS OFF.
    Shifting: helps the player change gears. Settings: Automatic w/o clutch, Manual w/o clutch, Manual w/ clutch.
    Line: overlays the optimal trajectory to follow on the track. Settings: Full, Brake, OFF.
    Damage: determines how much the performance of the car can change during the race. Settings: Cosmetic, Limited, Simulation.

  36.
    Figure: Assist usage over number of races, with panels for career mode and online multiplayer (x-axis: number of races).
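
A curve like the one on this slide can be computed by taking, at each race index, the share of players who still have the assist enabled. The sketch assumes one boolean per race per player and handles one assist and one game mode at a time; the log format and `usage_curve` are hypothetical.

```python
def usage_curve(histories, max_races):
    """Share of players with the assist enabled at each race index.

    histories: player -> list of booleans, one per race (True = enabled);
    a hypothetical log format. Players with fewer races drop out of the
    denominator, so each point only counts players still racing."""
    curve = []
    for n in range(max_races):
        states = [h[n] for h in histories.values() if len(h) > n]
        curve.append(sum(states) / len(states) if states else None)
    return curve


histories = {
    "p1": [True, True, False, False],
    "p2": [True, False, False],
    "p3": [True, True, True, True],
}
print(usage_curve(histories, 5))
```

Dropping players who stopped racing from the denominator avoids counting them as having disabled the assist, at the cost of a shrinking sample at high race counts.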

  37.
    Assist transitions
    Figure: timelines of an assist's state (enabled or disabled) in the race before and the race after the player disables the assist, illustrating the outcomes success, failure, and yoyo.
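
The outcome categories can be made precise with a small classifier over an assist's per-race on/off history. The rules below are assumptions chosen to match the labels used in this talk (success, failure, yoyo, never disabled), not the study's definitions, and `classify_transition` is hypothetical.

```python
from collections import Counter


def classify_transition(states):
    """Classify one assist's per-race history (True = enabled).

    Hypothetical rules, assuming the history starts with the assist on:
    never switched off -> "never disabled"; switched off once and still
    off in the last race -> "success"; switched off once but back on at
    the end -> "failure"; switched off more than once -> "yoyo"."""
    if not states or not states[0]:
        raise ValueError("history must start with the assist enabled")
    # Count on -> off transitions between consecutive races.
    disables = sum(1 for before, after in zip(states, states[1:])
                   if before and not after)
    if disables == 0:
        return "never disabled"
    if disables == 1:
        return "failure" if states[-1] else "success"
    return "yoyo"


histories = {
    "p1": [True, False, False, False],  # turns it off and keeps it off
    "p2": [True, False, True, True],    # turns it off, then back on
    "p3": [True, False, True, False],   # toggles repeatedly
    "p4": [True, True, True, True],     # never turns it off
}
tally = Counter(classify_transition(h) for h in histories.values())
print(tally)
```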

  38.

  39.
    Figure: Assist transitions. Percentages (0% to 100%) by outcome: success, failure, yoyo, never disabled.

  40.
    Factors that contribute to the success of disabling an assist
    (factor: players more likely to keep the assist disabled; assists for which the factor is significant)
    Number of races: players who disable an assist early; all assists.
    Races per day: players who race fewer times a day; all assists.
    Rear-wheel drive (race before): players who drove a rear-wheel-drive car; all assists.
    Car Performance Index (race before): players who drove a car with a lower PI; all assists.
    Position (race before): players who finished first; all assists except Traction and Clutch.
    Career mode (race before): players who did not play career mode; Autobrake, ABS, Autoshift, Full line, Brake line.

  41.
    “Your work has been incredibly helpful to my
    team. Just this week, we’ve had 8 hours of
    meeting to design our core gameplay loops
    based directly on your data. Quite literally,
    we project your data and the player profiles
    on the wall while we design. It’s awesome.”
    About our work on a different game title.

  42.
    How to measure insight?
    Amount of discussion the insight generates?
    Number of times the users invite you back?
    Number of issues visited and retired in a meeting?
    Number of hypotheses rejected?
    Tim Menzies

  43.
    Figure (Inductive Engineering): questions you want to ask, questions the data supports, and questions the user cares about.
    Tim Menzies, Christian Bird, Thomas Zimmermann, Wolfram Schulte, Ekrem Kocaguneli:
    The Inductive Software Engineering Manifesto: Principles for Industrial Data Mining.
    MALETS 2011

  44.
    Inductive Engineering
    1. Users before algorithms
    2. Plan for scale
    3. Early feedback
    4. Be open-minded
    5. Do smart learning
    6. Live with the data you have
    7. Broad skill set, big toolkit
