Elastic{ON} 2018 - Watching Over Overwatch: An Elastic Story

Elastic Co

March 01, 2018

Transcript

  1. Watching Overwatch at Activision Blizzard

    Blizzard Entertainment, February 28, 2018
    Chris Burkhart (@ctide), Technical Lead, Principal I
    Bill Warnecke (@ww), Lead Software Engineer, Principal I
  2. Who are we?

    Chris Burkhart: Technical Lead, Principal I, Battle.net – Data Team
    William Warnecke: Lead Software Engineer, Principal I, Team 4 – Overwatch
  3. What This Talk Covers

    • Quick History
    • Blizzard’s Global Data Platform
    • Walkthrough of BEAM
    • Overwatch Monitoring
    • Future
  4. Quick History

    • 1991 - Founded as Silicon & Synapse
    • 1996 - Battle.net Classic
    • 2000 - Diablo II
    • 2004 - World of Warcraft
    • 2016 - Overwatch
  5. Earliest Monitoring

    • Host status
    • Physical or VM compute
    • Basic Hardware Utilization
    • CPU
    • Memory
    • Disk
    • OS Data
    • TCP Retransmit
    • File Descriptor Count
  6. Earliest Monitoring

    • Service Status
    • PID Monitoring
    • OS Exit Code
    • Service variables
    • Limited window into service internals
    • Can “track” variables to graph changes
    • Player Concurrency
    • Customer Service contacts
    • Player reports on forums
  7. Global Data Platform

    • 28 Person Team
    • 14 Software Engineers, 4 System Engineers
    • 6 PMs, 4 Tech Leads
    • 6 Production Datacenters
    • Telem-Telem – Monitoring Pipeline in each datacenter
    • 7 SDKs
    • Events, Logs, Metrics
  8. Global Data Platform

    • Microservices (Node.js / Scala / Java)
    • Protocol Buffers only
    • All data is associated with registered Schema
    • Telemetry Development Kit
    • Multiple Datastores (Elastic, HDFS, Cassandra)
    • 7 day TTL for Elastic, much longer for HDFS
    • Cassandra for specific use cases
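
The per-datastore retention described on this slide can be illustrated with a minimal sketch. Only the 7-day Elastic TTL comes from the slide; the `RETENTION` table and the `is_expired` helper are hypothetical names invented for illustration, and the HDFS value is deliberately left unmodeled because the talk only says "much longer".

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

# Retention per datastore. Only the Elastic value is stated in the talk.
RETENTION: Dict[str, Optional[timedelta]] = {
    "elasticsearch": timedelta(days=7),  # "7 day TTL for Elastic"
    "hdfs": None,  # "much longer"; no figure given, so treated as no expiry here
}


def is_expired(store: str, written_at: datetime, now: datetime) -> bool:
    """Return True when data written at `written_at` is past the store's TTL."""
    ttl = RETENTION[store]
    if ttl is None:
        return False
    return now - written_at > ttl


now = datetime(2018, 3, 1, tzinfo=timezone.utc)
eight_days_old = now - timedelta(days=8)
print(is_expired("elasticsearch", eight_days_old, now))
print(is_expired("hdfs", eight_days_old, now))
```

Under these assumptions, the same event ages out of the short-lived store while remaining available in the long-lived one, which is the trade-off the slide points at.
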
  9. GDP Microservices Architecture

    [Architecture diagram; labels: Enrichment, Ingest Topic, Specific Topics, Schema Reg, ES Processor, Cassandra Processor, HDFS Processor, Kafka]
  10. GDP Microservices Architecture

    [Architecture diagram; labels: Enrichment, Ingest Topic, Specific Topics, Schema Reg, ES Processor, Cassandra Processor, HDFS Processor, HDFS, Cassandra, Elasticsearch, Kafka]
  11. BEAM

    • Blizzard’s custom monitoring solution
    • Poll datasources periodically
    • Transform data
    • Check conditions
    • Perform actions
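
The four steps listed for BEAM (poll, transform, check, act) can be sketched as a small rule loop. This is not BEAM's actual design or API, which the talk does not show; `Rule`, `run_once`, `run_forever`, and the example login-failure rule with its numbers are all hypothetical and exist only to illustrate the pattern.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Sample = Dict[str, float]


@dataclass
class Rule:
    name: str
    poll: Callable[[], Sample]            # fetch raw data from a datasource
    transform: Callable[[Sample], float]  # reduce the sample to a single value
    condition: Callable[[float], bool]    # True when the rule should fire
    action: Callable[[str, float], None]  # side effect, e.g. notify someone


def run_once(rules: List[Rule]) -> List[str]:
    """Evaluate every rule once and return the names of the rules that fired."""
    fired = []
    for rule in rules:
        value = rule.transform(rule.poll())
        if rule.condition(value):
            rule.action(rule.name, value)
            fired.append(rule.name)
    return fired


def run_forever(rules: List[Rule], interval_seconds: float) -> None:
    """Poll periodically; not called here because it never returns."""
    while True:
        run_once(rules)
        time.sleep(interval_seconds)


# Hypothetical rule with made-up numbers: fire when over 10% of logins fail.
alerts: List[Tuple[str, float]] = []
rule = Rule(
    name="login_failure_ratio",
    poll=lambda: {"logins": 200.0, "failures": 30.0},
    transform=lambda s: s["failures"] / s["logins"],
    condition=lambda ratio: ratio > 0.1,
    action=lambda name, value: alerts.append((name, value)),
)
fired = run_once([rule])
print(fired)
```

Keeping each step as a separate callable means a datasource, a transformation, or an action can be swapped without touching the loop itself.
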
  12. Incident Response - Without Data Platform?

    • Was the drop spread across all servers
    • Did any services or hosts unexpectedly terminate
    • Check for server crash emails
    • Compare concurrency to other Overwatch platforms
    • Compare concurrency to other Blizzard games
    • Spin up a bunch of resources to investigate if the drop was bad enough
  13. With Data Platform

    • Pipeline
    • Supports client telemetry
    • Data
    • Metrics have more associated data
    • Reporting
    • Easy to discover and pivot
  14. Operationalizing Overwatch

    • Everyone was very excited about the potential of our data platform
    • Identify what is critical and focus there
    • Common flows like login, play a game
    • Critical flows like purchasing
    • Your instrumentation should get better over time
    • Define your KPIs
  15. Incident Management

    • 134 Major Incidents in 2017 that affected Overwatch
    • 78% were detected first by an Alert
    • 30% were recommended for review to improve monitoring
    • Did alert identify root cause
    • Time to detect incident
    • Time taken for ops staff to validate incident
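
For a sense of scale, the percentages on this slide can be turned into approximate incident counts. The slide gives only the total and the rounded percentages, so the counts below are estimates and not figures from the talk; `approx_count` is a hypothetical helper.

```python
total_incidents = 134            # major incidents in 2017, from the slide
detected_by_alert_pct = 78       # detected first by an alert
recommended_for_review_pct = 30  # recommended for monitoring review


def approx_count(total: int, pct: int) -> int:
    """Estimate a count from a total and a rounded percentage."""
    return round(total * pct / 100)


alert_first = approx_count(total_incidents, detected_by_alert_pct)
review = approx_count(total_incidents, recommended_for_review_pct)
print(alert_first, review)  # prints "105 40"
```

So roughly 105 of the 134 incidents were caught by alerting first, and roughly 40 were flagged for a monitoring review.
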
  16. Future – BEAM

    • RPC Message
    • Autoremediation?
    • Autoscaling?
    • Rules templates
    • Better auditing
    • Stateful Alerts
    • Maintenance Mode
  17. Future – Leveraging Elasticsearch

    • Cross cluster search
    • Multitenancy Challenges in Kibana
    • Hundreds of broken visualizations and dashboards
    • Unified data access layer / Query Engine
    • Presto, SparkSQL, Query Grid, Drill, Qubole?
  18. Except where otherwise noted, this work is licensed under http://creativecommons.org/licenses/by-nd/4.0/

    Creative Commons and the double C in a circle are registered trademarks of Creative Commons in the United States and other countries. Third party marks and brands are the property of their respective holders. Please attribute Elastic with a link to elastic.co
  19. Future Plans – Pipeline

    • Isolated pipelines for specific usecases
    • Higher guarantees, lower latencies
    • Still have lots of data flowing through old pipelines
    • Expanding esports initiatives
    • Self supporting Kafka