Upgrade to Pro — share decks privately, control downloads, hide ads and more …

MZ Games: The Evolution of Log Insight

Elastic Co
March 07, 2017

MZ Games: The Evolution of Log Insight

Machine Zone is the company behind massively-multiplayer online games Game of War, Fire Age, and Mobile Strike. Real-time operations is essential for a great user experience. The MZ engineering team shares how they use the Elastic Stack to track player activity to support customer service and more.

Eric Liaw l Director of Engineering l Machine Zone
Reggie Meisler l Developer l Machine Zone

Elastic Co

March 07, 2017
Tweet

More Decks by Elastic Co

Other Decks in Technology

Transcript

  1. MZ March 7th 2017 [email protected], [email protected] Evolution of log insight

    @ mz games Eric Liaw, Director of Game Engineering Reggie Meisler, Software Engineer
  2. Agenda 2 1 Who is MZ? 2 “Application log insight

    w/ Kibana” by Eric Liaw 3 “Player ledger for customer service” by Reggie Meisler 4 Learnings & Best Practices 5 Questions
  3. 2014 - pssh is all we need! # Finding a

    specific Error message instance $ pssh -h first50.txt -o logs grep ‘Unable to create instance’ /path/to/logs/application.log # Finding all logs for user_id ‘44273942’ $ pssh -h all_hosts.txt -o user_logs xzgrep ‘44273942’ /path/to/logs/application.log.gz
  4. pssh - Is it all we need? Couldn’t aggregate without

    writing code Couldn’t easily share the results Couldn’t explore the data Slow! (Not real time!)
  5. Elasticsearch/Logstash/Kibana a.k.a Elastic Stack Used by both engineers and non-engineers

    4+ Billion logs per day 100+ Dashboards created Used in 100% of incident response
  6. How to maximize insight from logs 1. Automatic Fields (5

    examples) 2. Case Specific Fields (2 examples)
  7. Automatic Fields (1/5): log_level What: 10 (Debug) - minutia (Not

    indexed) 20 (Info) - interesting information 30 (Warning) - unexpected, but recoverable 40 (Error) - incorrect state 50 (Critical) - catastrophic, unrecoverable events Sample Queries log_level: 50 log_level: [40 TO 50] NOT log_level: 20
  8. Automatic Fields (2/5): Controller/Action What: Public facing controller/action this log

    was triggered by Why: How does this impact the customer? Sample queries: controller: login AND action: register log_level: 50 AND controller: purchase
  9. Automatic Fields (3/5): hostname What: hostname that generated the log

    Why: Allows us to find bad hosts quickly Sample Queries: hostname: web-1-046 hostname: worker* hostname: web-1-0*
  10. Automatic Fields (4/5): fileline What: Concatenation of the filename &

    line number that generated the log Example: User.php:1337 Why: Logically groups logs by source Avoids searching for specific log terminology Bonus: Helps with log clean up
  11. Automatic Fields (5/5): request_id What: Generated uuid for a single

    controller/action instance Why: Get all logs from a specific API call Useful when backtracking from an error log state to see what lead to it
  12. Automatic Fields Review log_level: 50 AND controller: login AND action:

    register AND host: web* AND fileline: User.php:1337 AND NOT request_id: 6168f35d5a4ec33
  13. Case Specific Fields (1/2): user identification user_id What: Unique numerical

    identifier for a user Why: Track all logs generated by a particular player admin_user What: Email of a user using our administration tool Why: Track all logs generated by a particular CS agent or game operations manager
  14. Case Specific Fields(2/2): DB endpoints What: Data_source_name concatenates the hostname

    and database name for a MySQL log entry Why: Pinpoint application perceived issues on db endpoints
  15. Customer Service for Games • Player creates CS ticket ◦“I

    should have gotten X, but did not!” •CS opens admintool, looks up user history ◦True ▪Award them X ◦False ▪“Sorry, here’s why you did not get X”
  16. Customer Service for Games Game Servers Desk SQL Admintool Award

    user via game api Access to game dbs How does this piece work? Looks up history
  17. Tracking Player History • Leverages existing ORM ◦Define a model,

    including declarative metadata ◦Construct during player action, set fields, persist ◦Query models from admintool •Player state and history are first class data ◦Stored in SQL and cached in Redis ◦Accessible to runtime
  18. Tracking Player History class CombatReport extends Model { public $timestamp

    = [Type::timeuuid, INDEXKEY]; public $att_user_id = [Type::int64, INDEXKEY]; public $def_user_id = [Type::int64, INDEXKEY]; public $def_hp = [Type::int32]; public $def_starting_hp = [Type::int32]; // … } Describes SQL table schema Backed by SQL/Redis
  19. Tracking Player History public function performCombat() { $starting_hp = $defender->hp;

    // ...mutate data based on combat... $summary = new CombatReport(); $summary->timestamp = time(); $summary->att_user_id = $attacker->user_id; $summary->def_user_id = $defender->user_id; $summary->def_hp = $defender->hp; $summary->def_starting_hp = $starting_hp; $summary->save(); } Capture state and persist
  20. Tracking Player History public function getCombatReports($session, $from, $to) { $summaries

    = CombatReport::GetWhere([ “att_user_id” => $session->user_id, [“timestamp > %i”, $from], [“timestamp < %i”, $to] ]); // Return to admintool client for display return $summaries; } Query reports for display in CS admintool
  21. Tracking Player History - Challenges • Difficult to add new

    fields ◦Schema migration for every little change ◦Requires DBA oversight • Difficult to scale ◦Data grows rapidly for transactional data ◦Need to constantly add more DB shards 50,000 new records every second!
  22. Using Elastic Stack • Pros ◦High-throughput pipeline already set up

    for logs ◦Easy to add shards and fields ◦Rolling time series data •Cons ◦Impractical to perform updates ◦Lack of internal experts
  23. Using Elastic Stack class CombatReport extends ElasticsearchModel { public $timestamp

    = [Type::timeuuid, INDEXKEY]; public $att_user_id = [Type::int64, INDEXKEY]; public $def_user_id = [Type::int64, INDEXKEY]; public $def_hp = [Type::int32]; public $def_starting_hp = [Type::int32]; // … } Describes type mapping Backed by Elasticsearch
  24. curl -XPUT “$url/_template/$template” -d ‘{ “mappings”: { “type”: { “properties”:

    { “@timestamp”: {“type”: “date”}, “CombatReport”: { “properties”: { “att_user_id”: {“type”: “long”}, “def_user_id”: {“type”: “long”}, “def_hp”: {“type”: “int”, “index”: “no”}, “def_starting_hp”: {“type”: “int”, “index”:“no”}, ...more fields... } ... Using Elastic Stack
  25. Using Elastic Stack public function performCombat() { $starting_hp = $defender->hp;

    // ...mutate data based on combat... $summary = new CombatReport(); $summary->timestamp = time(); $summary->att_user_id = $attacker->user_id; $summary->def_user_id = $defender->user_id; $summary->def_hp = $defender->hp; $summary->def_starting_hp = $starting_hp; $summary->save(); } Same ORM api, no change necessary
  26. Using Elastic Stack public function getCombatReports($session, $from, $to) { $summaries

    = CombatReport::GetWhere([ “att_user_id” => $session->user_id, [“timestamp > %i”, $from], [“timestamp < %i”, $to] ]); // Return to admintool client for display return $summaries; } Equivalent to filter query
  27. Using Elastic Stack public function getCombatReports($session, $from, $to) { $summaries

    = CombatReport::GetWhere([ “att_user_id” => $session->user_id, [“timestamp > %i”, $from], [“timestamp < %i”, $to] ]); // Return to admintool client for display return $summaries; } “term”: { “att_user_id”:$user_id }
  28. Using Elastic Stack public function getCombatReports($session, $from, $to) { $summaries

    = CombatReport::GetWhere([ “att_user_id” => $session->user_id, [“timestamp > %i”, $from], [“timestamp < %i”, $to] ]); // Return to admintool client for display return $summaries; } “range”: { “@timestamp”:{“gt”:$date} }
  29. Using Elastic Stack public function getCombatReports($session, $from, $to) { $summaries

    = CombatReport::GetWhere([ “att_user_id” => $session->user_id, [“timestamp > %i”, $from], [“timestamp < %i”, $to] ]); // Return to admintool client for display return $summaries; } “range”: { “@timestamp”:{“lt”:$date} }
  30. •Ran some tests against a dev cluster ◦Determined we were

    bottlenecked on JVM memory ◦Hosts could fit 4x the JVM memory cap ◦Created 3 isolated clusters per ELS host •Initially overwhelmed our main cluster ◦> 50,000 index/sec (25,000 per game) ◦Game logs emit large indexed docs (30+ fields) Using Elastic Stack 1. Game of War 2. Mobile Strike 3. Player Ledger Number of Indexed Fields Index Rate
  31. •Elastic Stack was a great fit for this data ◦Easy

    to add fields ◦Easy to scale as throughput increased •The largest hurdle we faced was knowledge ◦Elastic.co has very good documentation ◦Company training was helpful Using Elastic Stack - Conclusion
  32. Learnings & Best Practices • Index only what is necessary

    ◦Indexed fields are determining factor for write speed •Loosen type restrictions ◦Less fragile to use largest type container •Improve visibility on issues with pipeline ◦Dump all error logs to Elastic Stack + ElastAlert
  33. Learnings & Best Practices •Reduce queries to “term” and “range”

    ◦Avoids overhead of scoring •Make pipeline idempotent ◦Use custom id and perform batch “create” • Some CS data should still go into SQL ◦Payments & permanent state changes