Debugging in the (Very) Large: Ten Years of Implementation and Experience
Kirk Glerum, Kinshuman Kinshumann, Steve Greenberg, Gabriel Aul, Vince Orgovan, Greg Nichols, David Grant, Gretchen Loihle, and Galen Hunt
presented by raven 2019/7/1
These slides are based on the authors’ slides.
supports debugging for large systems.
• Authors: Kirk Glerum, Kinshuman Kinshumann, Steve Greenberg, Gabriel Aul, Vince Orgovan, Greg Nichols, David Grant, Gretchen Loihle, and Galen Hunt
• Authors’ institution: Microsoft Corporation
• Appeared in: Symposium on Operating Systems Principles (SOSP) 2009
the number of software components in a single system grows to the hundreds and the number of deployed systems grows to the millions, strategies that worked in the small, like asking programmers to triage individual error reports, fail. With hundreds of components, it becomes much harder to isolate the root cause of an error. With millions of systems, the sheer volume of error reports for even obscure bugs can become overwhelming. Worse still, prioritizing error reports from millions of users becomes arbitrary and ad hoc.
How do we find out when things go wrong?
• They want to
  ◦ fix bugs on every Windows system.
  ◦ collect every error.
  ◦ prioritize bugs that affect the most users.
  ◦ generalize the solution to be used by any programmer.
the Debugging Tools for Windows
  - input is an error report, output is a bucket ID
  - runs on WER servers (and programmers’ desktops)
• 500 heuristics
  - grows by ~1 heuristic/week
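The input/output contract above (error report in, bucket ID out, decided by a growing list of heuristics) can be sketched as follows. This is only an illustration under stated assumptions: the `ErrorReport` fields, the two heuristics, the system-module list, and the bucket ID format are all hypothetical and are not taken from the real tool.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ErrorReport:
    # Hypothetical, simplified fields; real error reports carry far more data.
    app_name: str
    app_version: str
    module_name: str
    exception_code: str
    stack: List[str] = field(default_factory=list)  # innermost frame first


# A heuristic inspects a report and returns a blamed symbol, or None to decline.
Heuristic = Callable[[ErrorReport], Optional[str]]


def blame_first_non_system_frame(report: ErrorReport) -> Optional[str]:
    """Skip frames in well-known system modules; blame the first other frame."""
    system_modules = {"ntdll", "kernel32"}  # illustrative list only
    for frame in report.stack:
        module = frame.split("!")[0]
        if module not in system_modules:
            return frame
    return None


def blame_faulting_module(report: ErrorReport) -> Optional[str]:
    """Fallback: blame the module named in the report itself."""
    return report.module_name or None


# New heuristics are appended here as new failure patterns are understood.
HEURISTICS: List[Heuristic] = [blame_first_non_system_frame, blame_faulting_module]


def bucket_id(report: ErrorReport) -> str:
    """Map a report to a bucket ID using the first heuristic that applies."""
    prefix = f"{report.app_name}_{report.app_version}_{report.exception_code}"
    for heuristic in HEURISTICS:
        blamed = heuristic(report)
        if blamed is not None:
            return f"{prefix}_{blamed}"
    return f"{prefix}_unknown"
```

Two reports whose stacks differ only in system frames get the same bucket ID under this sketch, which is the property that lets reports from many machines be counted together.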
• sends report to servers
• !analyze buckets the error with similar reports
• increments the bucket count
• programmers prioritize buckets with the highest count
• Problems
  - only upload the first few hits on a bucket.
  - programmers request additional data as needed
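The flow above (count every hit, keep full data only for the first few hits on a bucket, let programmers ask for more) might look like the sketch below. The class name `BucketStore`, its methods, and the threshold value are assumptions made for illustration, not the actual server design.

```python
from collections import Counter
from typing import Dict, List, Tuple

FULL_UPLOAD_LIMIT = 3  # hypothetical: full data is kept for the first 3 hits


class BucketStore:
    """Toy server-side store that counts hits per bucket."""

    def __init__(self, full_upload_limit: int = FULL_UPLOAD_LIMIT) -> None:
        self.counts: Counter = Counter()
        self.full_upload_limit = full_upload_limit
        self.wanted: Dict[str, int] = {}  # extra reports requested per bucket

    def record_hit(self, bucket: str) -> bool:
        """Count one hit; return True if the client should upload full data."""
        self.counts[bucket] += 1
        if self.counts[bucket] <= self.full_upload_limit:
            return True
        # Past the limit, upload only if a programmer asked for more data.
        if self.wanted.get(bucket, 0) > 0:
            self.wanted[bucket] -= 1
            return True
        return False

    def request_more_data(self, bucket: str, extra_reports: int) -> None:
        """Programmer-side request for additional full reports on a bucket."""
        self.wanted[bucket] = self.wanted.get(bucket, 0) + extra_reports

    def top_buckets(self, n: int) -> List[Tuple[str, int]]:
        """Buckets with the highest hit counts, for prioritization."""
        return self.counts.most_common(n)
```

Every hit still increments the count, so prioritization stays accurate even when most clients upload nothing beyond the bucket ID.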
  - up to 40% of error reports
  - extra server load
  - duplicate buckets must be triaged
• Multiple bugs can hit one bucket
  - up to 4% of error reports
  - harder to isolate each bug
by WER grew by a factor of 30 (2003-2009)
• Finding Bugs
  - The Windows Vista programmers fixed 5,000 bugs.
• Bucketing Effectiveness
  - The top 500 buckets account for 65% of all error reports for Vista.
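A statistic like "the top 500 buckets account for 65% of all error reports" is a cumulative share over buckets sorted by hit count. A minimal sketch of that arithmetic follows; the function name `top_k_coverage` is hypothetical, and the counts in the usage example are made up, not Vista data.

```python
from typing import Dict


def top_k_coverage(bucket_counts: Dict[str, int], k: int) -> float:
    """Fraction of all error reports that fall into the k largest buckets."""
    total = sum(bucket_counts.values())
    if total == 0:
        return 0.0
    largest = sorted(bucket_counts.values(), reverse=True)[:k]
    return sum(largest) / total


# Made-up counts, for illustration only.
counts = {"a": 50, "b": 30, "c": 15, "d": 5}
print(top_k_coverage(counts, 2))  # share of reports in the two largest buckets
```

When this share is high for a small k, fixing the bugs behind a few buckets addresses most of the reports users actually send.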
reporting system with automatic diagnosis
  - the largest client-server system in the world (by installs)
  - helped 700 companies fix 1000s of bugs and billions of errors
  - fundamentally changed software development at MS