Monitoring of SmartNews

Nobutoshi Ogata

March 24, 2016

Transcript

  1. Self Introduction
    • Nobutoshi Ogata
    • Manager, Site Reliability Engineering
    • @nobu666
    • ❤ Whiskey, Cat, Heavy Metal
    • Contract development (10y) ➡ GREE infrastructure division (3y) ➡ A startup (1y) ➡ SmartNews (2015/05-)
  2. Before Datadog
    • We used:
      • Munin
      • GrowthForecast
      • CloudWatch
    • Wanted to centralize management!
  3. After Datadog - Phase 1
    • OK, we can manage centrally
    • But...?
      • We respect our engineers' freedom to develop as they like!
      • Problem: some resources ended up without monitoring settings
  4. Phase 2
    • Introduced Interferon
      • Datadog DSL
    • Well, we can monitor all resources automatically
    • But...?
      • Not actively maintained!
      • Can't freely mute from the Web UI
      • Lack of flexibility
  5. Phase 3
    • Integrated itamae
      • Our engineers were used to writing Chef
      • Easy to override default settings
      • It's asynchronous. Feel free to mute from the Web UI
    • Integrated dogaws (by @takus)
      • Yet another Datadog CloudWatch integration
      • We use it in combination with itamae
  6. Datadog tips
    • Easy anomaly detection
    • Until quite recently, you couldn't compare against data more than 24 hours old
    • We requested the ability to compare over longer periods. Thanks to Datadog for implementing it!
    • This is a closed feature. If you want to use it, ask Datadog support
  7. For example
    • Compare Kinesis record count (EWMA)
      pct_change(median(last_1h),1w_ago):ewma_20(avg:aws.kinesis.incoming_records{env:production,cost:smartnews} by {name}) > 50
    • Compare application warn log
      change(median(last_1h),1w_ago):sum:app.log.warn{env:production} by {autoscaling_group} > 25
  8. We're hiring!
    • Only two people on the Site Reliability Engineering team!
    • SmartNews is recruiting Site Reliability Engineers!
    • http://about.smartnews.com/en/careers/
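
The week-over-week comparison on slide 7 can be illustrated outside Datadog. The following is a minimal sketch, not Datadog's implementation: it assumes `ewma_20` means an exponentially weighted moving average with a span of 20 using the common `alpha = 2 / (span + 1)` convention, and that `pct_change(median(last_1h), 1w_ago)` compares the median of the last hour with the median of the same hour one week earlier. The function names `ewma`, `pct_change`, and `should_alert` are hypothetical helpers chosen for this example.

```python
from statistics import median


def ewma(values, span=20):
    """Smooth a series with an exponentially weighted moving average.

    Assumption: alpha = 2 / (span + 1), the usual "span" convention.
    """
    alpha = 2.0 / (span + 1)
    smoothed = []
    for v in values:
        if not smoothed:
            # Seed the average with the first observation.
            smoothed.append(float(v))
        else:
            smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed


def pct_change(current_window, previous_window):
    """Percent change of the median of the current window vs. the old one."""
    now = median(current_window)
    before = median(previous_window)
    if before == 0:
        raise ValueError("baseline median is zero; percent change is undefined")
    return (now - before) / before * 100.0


def should_alert(current_window, previous_window, threshold=50.0, span=20):
    """Smooth both windows, then alert if the change exceeds the threshold."""
    change = pct_change(ewma(current_window, span), ewma(previous_window, span))
    return change > threshold


if __name__ == "__main__":
    # Hypothetical per-minute record counts: last hour vs. same hour a week ago.
    last_hour = [200] * 60
    week_ago = [100] * 60
    print(should_alert(last_hour, week_ago))  # doubled traffic exceeds 50%
```

Smoothing before comparing keeps a single noisy minute from triggering the alert, which is presumably why the Kinesis query wraps the metric in `ewma_20` while the warn-log query compares raw sums.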