Exit Code
• Service variables
• Limited window into service internals
• Can “track” variables to graph changes
• Player Concurrency
• Customer Service contacts
• Player reports on forums
• Protocol Buffers only
• All data is associated with a registered Schema
• Telemetry Development Kit
• Multiple Datastores (Elastic, HDFS, Cassandra)
• 7-day TTL for Elastic, much longer for HDFS
• Cassandra for specific use cases
spread across all servers
• Did any services or hosts unexpectedly terminate?
• Check for server crash emails
• Compare concurrency to other Overwatch platforms
• Compare concurrency to other Blizzard games
• Spin up a bunch of resources to investigate if the drop was bad enough
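The cross-platform comparison above can be sketched roughly as follows. This is a minimal illustration and not the tooling described in the talk: the function names, the platform labels (`pc`, `console_a`, `console_b`), the sample values, and the 20% threshold are all assumptions chosen for the example. It compares average concurrency before and after a split point on each platform, then reports whether a drop on one platform is isolated or shared.

```python
from statistics import mean


def relative_drop(before, after):
    """Fractional drop between the averages of two sample windows."""
    baseline = mean(before)
    if baseline <= 0:
        return 0.0
    return (baseline - mean(after)) / baseline


def classify_drop(samples_by_platform, target, split, threshold=0.2):
    """Classify a concurrency drop on `target` against the other platforms.

    samples_by_platform: hypothetical mapping of platform name to a list of
    concurrency samples; `split` is the index separating "before" from
    "after" and must leave at least one sample on each side.
    threshold: assumed fraction (20%) above which a drop counts as significant.
    """
    drops = {
        name: relative_drop(samples[:split], samples[split:])
        for name, samples in samples_by_platform.items()
    }
    if drops[target] < threshold:
        return "no significant drop"
    others = [d for name, d in drops.items() if name != target]
    if others and all(d >= threshold for d in others):
        return "drop shared across platforms"
    return "drop isolated to " + target


# Illustrative samples only, not real concurrency data.
samples = {
    "pc": [100, 100, 50, 50],
    "console_a": [80, 80, 79, 81],
    "console_b": [60, 60, 60, 60],
}
result = classify_drop(samples, "pc", split=2)
print(result)
```

A drop shared across platforms points toward something common (a shared backend or an external event), while an isolated drop narrows the search to that platform's services and hosts.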
of our data platform
• Identify what is critical and focus there
• Common flows like login and playing a game
• Critical flows like purchasing
• Your instrumentation should get better over time
• Define your KPIs
Overwatch
• 78% were detected first by an Alert
• 30% were recommended for review to improve monitoring
• Did the alert identify the root cause?
• Time to detect incident
• Time taken for ops staff to validate incident
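As a rough sketch of how the incident-review measures listed above could be computed from incident records: the `Incident` record, its field names, and the timestamps below are hypothetical and invented for illustration. They are not the data or the process behind the figures quoted on the slide.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical incident record; field names are assumptions."""
    started: datetime        # when the problem began
    detected: datetime       # when it was first noticed
    validated: datetime      # when ops staff confirmed it was real
    detected_by_alert: bool  # True if an alert noticed it first


def summarize(incidents):
    """Return the share detected first by an alert and the mean timings."""
    count = len(incidents)
    if count == 0:
        raise ValueError("no incidents to summarize")
    alert_first = sum(1 for i in incidents if i.detected_by_alert)
    detect_total = sum((i.detected - i.started for i in incidents), timedelta())
    validate_total = sum((i.validated - i.detected for i in incidents), timedelta())
    return {
        "alert_first_pct": 100.0 * alert_first / count,
        "mean_time_to_detect": detect_total / count,
        "mean_time_to_validate": validate_total / count,
    }


# Illustrative records only.
t0 = datetime(2020, 1, 1, 12, 0)
incidents = [
    Incident(t0, t0 + timedelta(minutes=5), t0 + timedelta(minutes=15), True),
    Incident(t0, t0 + timedelta(minutes=15), t0 + timedelta(minutes=20), False),
]
summary = summarize(incidents)
print(summary)
```

Tracking these numbers per review period gives a simple way to see whether alerting is improving over time.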
Creative Commons and the double C in a circle are registered trademarks of Creative Commons in the United States and other countries. Third party marks and brands are the property of their respective holders. Please attribute Elastic with a link to elastic.co