Monitoring of SmartNews

Nobutoshi Ogata

March 24, 2016

Transcript

  1. Self Introduction
    • Nobutoshi Ogata
    • Manager, Site Reliability Engineering
    • @nobu666
    • ❤ Whiskey, Cat, Heavy Metal
    • Contract development (10y) ➡ GREE infrastructure division (3y) ➡ A startup (1y) ➡ SmartNews (2015/05-)
  2. Before Datadog
    • We used:
    • Munin
    • GrowthForecast
    • CloudWatch
    • We wanted centralized management!
  3. After Datadog - Phase 1
    • OK, we can manage everything centrally
    • But...?
    • We respect our engineers' freedom in how they develop!
    • Problem: monitoring settings get missed
  4. Phase 2
    • Introduced Interferon
    • A DSL for Datadog
    • We could now monitor all resources automatically
    • But...?
    • It is not actively maintained!
    • Can't freely mute monitors from the Web UI
    • Lacks flexibility
  5. Phase 3
    • Integrated itamae
    • Our engineers were used to writing Chef
    • Easy to override default settings
    • It's asynchronous, so feel free to mute from the Web UI
    • Integrated dogaws (by @takus)
    • Yet another Datadog CloudWatch integration
    • We use it in combination with itamae
  6. Datadog tips
    • Easy anomaly detection
    • Until quite recently, comparisons were limited to 24 hours
    • We requested the ability to compare over longer periods. Thanks to Datadog for implementing it!
    • This is a non-public feature. If you want to use it, ask Datadog support
  7. For example
    • Compare the Kinesis record count (EWMA):
      pct_change(median(last_1h), 1w_ago):ewma_20(avg:aws.kinesis.incoming_records{env:production,cost:smartnews} by {name}) > 50
    • Compare application warn logs:
      change(median(last_1h),1w_ago): sum:app.log.warn{env:production} by {autoscaling_group} > 25
  8. We're hiring! Only two people on the Site Reliability Engineering team!
    • SmartNews is recruiting Site Reliability Engineers!
    • http://about.smartnews.com/en/careers/
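
The Kinesis query on slide 7 smooths the series with an EWMA, takes the median of the last hour, and compares it with the median of the same window one week earlier, alerting when the percent change exceeds 50. The Python sketch below illustrates that arithmetic only; it is not Datadog's implementation. The function names are hypothetical, and both the smoothing factor alpha = 2 / (span + 1) and the percent-change formula are assumptions about how `ewma_20` and `pct_change` behave.

```python
from statistics import median


def ewma(values, span=20):
    """Exponentially weighted moving average.

    Assumption: alpha = 2 / (span + 1), a common convention; Datadog's
    exact ewma_20 smoothing may differ.
    """
    alpha = 2.0 / (span + 1)
    smoothed = []
    for v in values:
        if not smoothed:
            smoothed.append(float(v))  # seed with the first observation
        else:
            smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed


def pct_change(current_window, baseline_window):
    """Percent change of the current median relative to the baseline median."""
    base = median(baseline_window)
    if base == 0:
        raise ValueError("baseline median is zero; percent change is undefined")
    return (median(current_window) - base) / base * 100.0


def should_alert(current_window, baseline_window, threshold_pct=50.0, span=20):
    """Mimic `pct_change(median(last_1h), 1w_ago): ewma_20(...) > 50`."""
    change = pct_change(ewma(current_window, span), ewma(baseline_window, span))
    return change > threshold_pct


if __name__ == "__main__":
    last_hour = [200] * 60      # hypothetical per-minute record counts, last hour
    week_ago = [100] * 60       # same window one week earlier
    print(should_alert(last_hour, week_ago))
```

Comparing medians rather than single points keeps one spiky minute from triggering the alert, and the one-week baseline lines up weekly traffic patterns, which is why the longer comparison window mentioned on slide 6 matters.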