Data Hard with a Vengeance

Invited talk presented at the FSE 2014 conference.

Thomas Zimmermann

November 20, 2014

Transcript

  1. © Microsoft Corporation

  4. “On the fountain, there should be 2 jugs, do you see them? A 5 gallon and a 3 gallon. Fill one of the jugs with exactly 4 gallons of water and place it on the scale and the timer will stop. It must be precise; one ounce more or less will result in detonation. If you're still alive in 5 minutes, we'll speak.”
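
The jug puzzle on slide 4 (from the film Die Hard with a Vengeance, which the talk's title riffs on) is a small state-space search. The Python sketch below is an illustration, not something from the talk: it treats each pair of fill levels (a, b) as a state and runs breadth-first search over the six possible moves (fill, empty, or pour either jug). The function name solve_jugs and its defaults are hypothetical.

```python
from collections import deque


def solve_jugs(cap_a=5, cap_b=3, target=4):
    """Breadth-first search over (a, b) fill levels, starting from two empty jugs.

    Returns the shortest list of states from (0, 0) to a state in which either
    jug holds exactly `target` gallons, or None if the target is unreachable.
    """
    start = (0, 0)
    parent = {start: None}  # each visited state -> the state it was reached from
    queue = deque([start])
    while queue:
        a, b = queue.popleft()
        if target in (a, b):
            # Rebuild the path by walking the parent links back to the start.
            path, state = [], (a, b)
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        pour_ab = min(a, cap_b - b)  # how much A can pour into B
        pour_ba = min(b, cap_a - a)  # how much B can pour into A
        for nxt in (
            (cap_a, b), (a, cap_b),      # fill A or B from the fountain
            (0, b), (a, 0),              # empty A or B
            (a - pour_ab, b + pour_ab),  # pour A into B
            (a + pour_ba, b - pour_ba),  # pour B into A
        ):
            if nxt not in parent:
                parent[nxt] = (a, b)
                queue.append(nxt)
    return None
```

Calling solve_jugs() returns [(0, 0), (5, 0), (2, 3), (2, 0), (0, 2), (5, 2), (4, 3)]: fill the 5-gallon jug, pour it into the 3-gallon jug, empty the 3-gallon jug, pour the remaining 2 gallons across, refill the 5-gallon jug, and top up the 3-gallon jug, which leaves exactly 4 gallons. Because BFS explores states level by level, the first goal state it dequeues is reached in the fewest moves.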
  7. Action movies vs. software development (action movies / software development):
     Heroes save the world / Engineers build software
     Tight deadlines: yes / yes
     Wrong information can be disastrous: exploding bombs, world domination / cancelled or delayed projects, low-quality software, lost data
     The ending: usually a happy end / sometimes a happy end
  8. The personal health assistant Baymax from the Disney picture “Big Hero 6”
  9. Empower people involved with software to make sound, data-driven decisions about software. Areas: bug tracking, software analytics, games analytics, software quality, process improvement (branches, build), productivity.
  10. The ESE group in summer 2014 and in summer 2013
  12. CodeMine
  13. Six years ago…
  14. Windows: Brendan Murphy, Nachi Nagappan
  15. [Figure: the CodeMine data model. Groups: Build, Organization, Source Code, Work Item, Code Review, Test, Process Information. Entities: Schedule, Product, Test Job, Executable, Integration, Branch, Change, Source File, Procedure / Method, Class / Type, Review, Feature / Defect, Person. Relationships include calls, opens, resolves, submits, submits as, belongs to, implements, requests, comments on, tests, uses, edits, submitted into, moves, defines, works with, ships, ships from, created on, built on.] Jacek Czerwonka, Nachiappan Nagappan, Wolfram Schulte, Brendan Murphy: CODEMINE: Building a Software Development Data Analytics Platform at Microsoft. IEEE Software 30(4): 64-71 (2013)
  16. Risk Prediction
  18. Branches at Microsoft [diagram: the main, networking, and multimedia branches, connected by integrations]. Changes are isolated, so there are fewer build and test breaks; the price is process overhead and time delay (velocity).
  19. Blue nodes are edits to the file; orange nodes are move operations.
  20. Visualizing code velocity [legend: mostly edits vs. mostly integrations; average time for a changelist (CL) to reach the next branch, from <= 1 week to >= 3 weeks]
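
The velocity measure on slide 20, the average time a changelist takes to reach the next branch, comes down to a mean over timestamp differences. A minimal Python sketch, assuming each CL's submit time and its integration time into the next branch are already known; the input format and the function names are hypothetical, and the middle "1-3 weeks" bucket is an assumption, since the slide only labels the two ends of the scale.

```python
from datetime import datetime
from statistics import mean


def average_delay_days(changes):
    """Mean time, in days, for a branch's CLs to reach the next branch.

    `changes` is a list of (submitted_at, integrated_at) datetime pairs, one per
    changelist: when it was submitted and when it arrived in the next branch.
    """
    return mean((integrated - submitted).total_seconds() / 86400
                for submitted, integrated in changes)


def velocity_bucket(avg_days):
    """Map an average delay onto the slide's scale (the middle bucket is assumed)."""
    if avg_days <= 7:
        return "<= 1 week"
    if avg_days >= 21:
        return ">= 3 weeks"
    return "1-3 weeks"


# Hypothetical changelists for one branch (not data from the talk).
changes = [
    (datetime(2014, 1, 1), datetime(2014, 1, 4)),  # 3 days
    (datetime(2014, 1, 2), datetime(2014, 1, 7)),  # 5 days
]
print(average_delay_days(changes), velocity_bucket(average_delay_days(changes)))
```

For the two example changelists the mean delay is 4.0 days, which falls in the "<= 1 week" bucket.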
  21. Assessing branches [plot: delay (cost) versus provided isolation (benefit); each dot is a branch]. Green dots are branches with high benefit and low cost; red dots are branches with high cost but low benefit. Christian Bird, Thomas Zimmermann: Assessing the value of branches with what-if analysis. SIGSOFT FSE 2012
  22. Assessing branches: if the high-cost, low-benefit branches had been removed, each change would have saved 8.9 days of delay and introduced only 0.04 additional conflicts.
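
The cost/benefit picture on slides 21 and 22 can be illustrated with a simple quadrant rule. This Python sketch is not the what-if analysis from the FSE 2012 paper: it assumes each branch already has a delay (cost) and an isolation (benefit) score, and it splits both axes at their medians, which is an arbitrary choice. The class Branch, the function classify, and the example numbers are hypothetical.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Branch:
    name: str
    delay_days: float  # cost: average delay the branch adds to changes
    isolation: float   # benefit: e.g. conflicts kept out of the parent (assumed measure)


def classify(branches):
    """Label branches by comparing cost and benefit against the medians."""
    cost_cut = median(b.delay_days for b in branches)
    benefit_cut = median(b.isolation for b in branches)
    labels = {}
    for b in branches:
        high_cost = b.delay_days > cost_cut
        high_benefit = b.isolation > benefit_cut
        if high_benefit and not high_cost:
            labels[b.name] = "green"    # high benefit, low cost
        elif high_cost and not high_benefit:
            labels[b.name] = "red"      # high cost, low benefit: candidate for removal
        else:
            labels[b.name] = "neutral"  # both high or both low
    return labels


# Hypothetical branches with made-up scores.
branches = [
    Branch("b1", 2.0, 40), Branch("b2", 15.0, 1),
    Branch("b3", 9.0, 35), Branch("b4", 1.0, 30),
]
print(classify(branches))
```

Here the medians are 5.5 days and 32.5, so b1 is green, b2 is red, and b3 and b4 are neutral. A real assessment would replace the median split with thresholds that reflect what the team considers acceptable delay.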
  23. Simplified branch trees. B. Murphy, J. Czerwonka, L. Williams: Branching Taxonomy. Microsoft Research Technical Report MSR-TR-2014-23. http://research.microsoft.com/apps/pubs/?id=209683
  24. Field Studies
  26. Field studies:
      Emerson R. Murphy-Hill, Thomas Zimmermann, Nachiappan Nagappan: Cowboys, ankle sprains, and keepers of quality: How is video game development different from software development? ICSE 2014
      Shaun Phillips, Thomas Zimmermann, Christian Bird: Understanding and improving software build teams. ICSE 2014
      Miryung Kim, Thomas Zimmermann, Nachiappan Nagappan: A field study of refactoring challenges and benefits. SIGSOFT FSE 2012
  27. Andrew Begel, Thomas Zimmermann: Analyze This! 145 Questions for Data Scientists in Software Engineering. ICSE 2014. http://aka.ms/145Questions
  28. Microsoft’s Top 10 Questions (columns: Essential | Essential + Worthwhile)
      How do users typically use my application? 80.0% | 99.2%
      What parts of a software product are most used and/or loved by customers? 72.0% | 98.5%
      How effective are the quality gates we run at checkin? 62.4% | 96.6%
      How can we improve collaboration and sharing between teams? 54.5% | 96.4%
      What are the best key performance indicators (KPIs) for monitoring services? 53.2% | 93.6%
      What is the impact of a code change or requirements change to the project and its tests? 52.1% | 94.0%
      What is the impact of tools on productivity? 50.5% | 97.2%
      How do I avoid reinventing the wheel by sharing and/or searching for code? 50.0% | 90.9%
      What are the common patterns of execution in my application? 48.7% | 96.6%
      How well does test coverage correspond to actual code usage by our customers? 48.7% | 92.0%
      More at http://aka.ms/145Questions
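
The two columns on slide 28 are cumulative: the second adds the Worthwhile ratings to the Essential ones. A small Python sketch of that computation for one question, assuming the percentages are taken over all responses to that question (the slide does not state the denominator). The function name, the "Unimportant" label, and the sample responses are hypothetical.

```python
from collections import Counter


def rating_shares(ratings):
    """Percent of responses rating a question Essential, and Essential or Worthwhile.

    `ratings` holds one category label per response. Using all responses as the
    denominator is an assumption about how the slide's numbers were computed.
    """
    counts = Counter(ratings)
    total = len(ratings)
    essential = 100 * counts["Essential"] / total
    essential_or_worthwhile = 100 * (counts["Essential"] + counts["Worthwhile"]) / total
    return round(essential, 1), round(essential_or_worthwhile, 1)


# Hypothetical responses for one question (not data from the study).
ratings = ["Essential"] * 4 + ["Worthwhile"] * 5 + ["Unimportant"]
print(rating_shares(ratings))
```

With 4 Essential and 5 Worthwhile ratings out of 10 responses, the function returns (40.0, 90.0).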
  29. Games
  30. Thanks to our collaborators in Xbox, Microsoft Games Studios, and Turn 10. Thanks to interns Ken Hullett, Sauvik Das, Jeff Huang, Gifford Cheung, Thomas Debeauvais, and Erik Harpstead, and to visiting researchers Tim Menzies and Emerson Murphy-Hill.
      Xbox Live: influence of games and achievements on (paid) Xbox Live memberships; influence of friends on titles played; characterizing players with Xbox Live data.
      Gameplay: impact of social behavior on retention (beta of a AAA title); influence of gameplay on skill (Halo Reach, CHI 2013); assists in a car racing game (Forza 4, FDG 2014); how to create a successful initial session in games (CHI Play 2014).
      Engineering: differences between game and traditional software development (ICSE 2014); lessons learned from game development (ongoing); mining software repositories from games (ongoing).
      Exploratory: personalization with avatars in Xbox; geographic influence, temporal influence, and structural influence.
  31. Driving skill in Forza Motorsport 4. Data: 5% of the player base, sampled randomly; 200k players who played 25M races. Analyses: assist usage and assist transitions. Thomas Debeauvais, Thomas Zimmermann, Nachiappan Nagappan, Kevin Carter, Ryan Cooper, Dan Greenawalt, Tyson Solberg: An Empirical Study of Driving Skill in Forza Motorsport 4. FDG 2014
  33. Approaching a turn in Forza 4 (EASY mode)
  34. Approaching a turn in Forza 4 (HARD mode)
  35. The assist bundles in Forza 4: Easy, Medium, Hard, Advanced, Expert. Assists and their settings:
      Stability prevents the car from spinning when cornering too fast: ON, OFF
      Traction prevents the car from spinning when accelerating: ON, OFF
      Braking supports the player when he/she brakes or should brake: Assisted w/ ABS, ABS, OFF
      Shifting helps the player change gears: Automatic w/o clutch, Manual w/o clutch, Manual w/ clutch
      Line overlays the optimal trajectory to follow on the track: Full, Brake, OFF
      Damage determines how much the performance of the car can change during the race: Cosmetic, Limited, Simulation
  36. Assist usage over number of races [chart: career mode and online multiplayer, plotted against number of races]
  37. Assist transitions [diagram: the player disables the assist; the assist's state (enabled or disabled) over time, comparing the race before and the race after; outcomes: success, failure, yoyo]
  39. Assist transitions [chart: share of outcomes from 0% to 100%: success, failure, yoyo, never disabled]
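
The outcome categories on slides 37 to 39 (success, failure, yoyo, never disabled) suggest a per-player classification of an assist's on/off history. The talk does not spell out the exact rules, so this Python sketch uses assumed definitions: a player who never turns the assist off is "never disabled"; one who turns it off more than once is "yoyo"; one who turns it off exactly once is a "success" if it stays off and a "failure" if it is turned back on. The function names and the example histories are hypothetical.

```python
from collections import Counter


def classify_transitions(enabled_per_race):
    """Classify one player's history for one assist (assumed definitions).

    `enabled_per_race` holds one boolean per race, in order (True = assist on).
    Assumes the assist starts enabled before the first race.
    """
    states = [True] + list(enabled_per_race)
    # Count enabled -> disabled transitions between consecutive races.
    disables = sum(1 for before, after in zip(states, states[1:])
                   if before and not after)
    if disables == 0:
        return "never disabled"
    if disables > 1:
        return "yoyo"  # turned off more than once, so it was switched back on in between
    return "success" if not states[-1] else "failure"


def outcome_shares(players):
    """Fraction of players in each outcome category."""
    counts = Counter(classify_transitions(history) for history in players)
    total = sum(counts.values())
    return {outcome: n / total for outcome, n in counts.items()}


# Hypothetical histories for four players.
players = [
    [True, True],           # never disabled
    [True, False, False],   # disabled once and kept off
    [True, False, True],    # disabled once, then re-enabled
    [False, True, False],   # toggled back and forth
]
print(outcome_shares(players))
```

Each of the four example players lands in a different category, so every share is 0.25. Counting disable events rather than looking only at the final state is what separates "yoyo" players from one-time successes or failures.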
  40. Factors that contribute to the success of disabling an assist (factor: who is more likely to keep the assist disabled; assists for which the factor is significant)
      Number of races: players who disable an assist early; all assists
      Races per day: players who play fewer races per day; all assists
      Rear-wheel drive (race before): players who drove a car with rear-wheel drive; all assists
      Car Performance Index (race before): players who drove a car with a lower PI; all assists
      Position (race before): players who finished first; all assists except Traction and Clutch
      Career mode (race before): players who did not play career mode; Autobrake, ABS, Autoshift, Full line, Brake line
  41. “Your work has been incredibly helpful to my team. Just this week, we’ve had 8 hours of meetings to design our core gameplay loops based directly on your data. Quite literally, we project your data and the player profiles on the wall while we design. It’s awesome.” (About our work on a different game title.)
  42. How to measure insight? The amount of discussion the insight generates? The number of times the users invite you back? The number of issues visited and retired in a meeting? The number of hypotheses rejected? (Tim Menzies)
  43. [Diagram: Inductive Engineering, relating questions you want to ask, questions the data supports, and questions the user cares about] Tim Menzies, Christian Bird, Thomas Zimmermann, Wolfram Schulte, Ekrem Kocaguneli: The Inductive Software Engineering Manifesto: Principles for Industrial Data Mining. MALETS 2011
  44. Inductive Engineering
      1. Users before algorithms
      2. Plan for scale
      3. Early feedback
      4. Be open-minded
      5. Do smart learning
      6. Live with the data you have
      7. Broad skill set, big toolkit