Monitoring of SmartNews

Nobutoshi Ogata

March 24, 2016

Transcript

  1. Monitoring of SmartNews 2016/03/24 GREE Tech Talk #10

  2. Self Introduction • Nobutoshi Ogata • Manager, Site Reliability Engineering • @nobu666 • ❤ Whiskey, Cats, Heavy Metal • Contract development (10y) ➡ GREE infrastructure division (3y) ➡ A startup (1y) ➡ SmartNews (2015/05-)
  3. (no text)
  4. SmartNews

  5. 16,000,000+ downloads worldwide

  6. (no text)
  7. (no text)
  8. (no text)
  9. Before Datadog • We used: • Munin • GrowthForecast • CloudWatch • We wanted centralized management!
  10. After Datadog - Phase 1 • OK, now we can manage everything centrally • But...? • We respect engineers' freedom to develop as they see fit! • Problem: monitoring settings get left out
  11. Phase 2 • Introduced Interferon • Datadog DSL • Well, now we can monitor all resources automatically • But...? • Not actively maintained! • Can't freely mute from the Web UI • Lack of flexibility
  12. Phase 3 • Integrated with itamae • Our engineers were used to writing Chef recipes • Easy to override default settings • It runs asynchronously, so we can freely mute from the Web UI • Integrated dogaws (@takus) • Yet another Datadog CloudWatch integration • We use it in combination with itamae
  13. Datadog tips • Collect events and overlay them on graphs easily • Provisioning • Deploys • etc.
  14. Datadog tips • Easy anomaly detection • Until quite recently, we couldn't compare over periods longer than 24 hours • We requested the ability to compare over longer periods. Thanks to Datadog for implementing it! • This is a non-public feature. If you want to use it, ask Datadog support
  15. For example • Compare the Kinesis record count (EWMA): pct_change(median(last_1h), 1w_ago):ewma_20(avg:aws.kinesis.incoming_records{env:production,cost:smartnews} by {name}) > 50 • Compare application warn logs: change(median(last_1h),1w_ago): sum:app.log.warn{env:production} by {autoscaling_group} > 25
  16. Want to talk more? • Join our free lunch at the Tokyo office! • Ask me later
  17. We're hiring! Only two people on the Site Reliability Engineering team! • SmartNews is looking for Site Reliability Engineers! • http://about.smartnews.com/en/careers/
  18. (no text)
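Slide 11 says that with Interferon every resource gets monitored automatically. The idea behind that (one monitor per resource, rendered from a per-kind template) can be sketched in Python as below. This is not Interferon's actual Ruby DSL: the `Resource` type, the `TEMPLATES` table, the query and its `< 1` threshold are all hypothetical, and the monitor dicts only loosely follow the shape of a Datadog monitor definition (`name`, `type`, `query`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A hypothetical inventory entry, e.g. one Kinesis stream."""
    kind: str
    name: str
    env: str


# Hypothetical per-kind templates; {name} and {env} are filled in per resource.
# Doubled braces ({{ }}) render as literal braces around the tag list.
TEMPLATES = {
    "kinesis": {
        "name": "[{env}] Kinesis {name}: incoming records dropped",
        "query": "avg(last_10m):avg:aws.kinesis.incoming_records{{name:{name},env:{env}}} < 1",
    },
}


def generate_monitors(resources):
    """Render one monitor definition for each resource whose kind has a template."""
    monitors = []
    for r in resources:
        template = TEMPLATES.get(r.kind)
        if template is None:
            continue  # no template for this kind of resource: skip it
        monitors.append({
            "name": template["name"].format(name=r.name, env=r.env),
            "type": "metric alert",
            "query": template["query"].format(name=r.name, env=r.env),
        })
    return monitors
```

Because every generated monitor is fully determined by the template, a tool built this way tends to overwrite whatever was changed by hand in the Web UI on its next run, which is consistent with the "can't freely mute" complaint on the same slide.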
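Slide 12 mentions two properties of the itamae-based setup: default settings that are easy to override, and mutes set from the Web UI that are not clobbered by provisioning. One way to get both, sketched under assumptions: deep-merge per-monitor overrides onto shared defaults, and when planning an update, copy the remote monitor's mute state into the payload instead of overwriting it. The `DEFAULT_OPTIONS` values and all helper names are hypothetical, and mute state is modeled as a `silenced` key; this illustrates the idea and is not how SmartNews's itamae integration is actually implemented.

```python
import copy

# Hypothetical defaults shared by every monitor.
DEFAULT_OPTIONS = {
    "notify_no_data": False,
    "renotify_interval": 0,
    "thresholds": {"critical": 50},
}


def deep_merge(base, override):
    """Return a new dict: base with override applied recursively (override wins)."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def desired_options(override):
    """Options for one monitor: the shared defaults plus its own overrides."""
    return deep_merge(DEFAULT_OPTIONS, override)


def plan_update(desired, remote):
    """Build the update payload while keeping any mute set by a human.

    `remote` is the monitor's current options as last fetched; its
    hypothetical "silenced" key is carried over untouched, so a mute
    applied from the Web UI survives the next provisioning run.
    """
    payload = copy.deepcopy(desired)
    if "silenced" in remote:
        payload["silenced"] = copy.deepcopy(remote["silenced"])
    return payload
```

Everything except the mute state comes from code, so defaults stay centrally managed while on-call engineers can still silence a noisy monitor by hand.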
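The queries on slide 15 are change alerts: they aggregate the last hour with `median` and compare it to the same window one week earlier (`1w_ago`), as a percent change (`pct_change`) for Kinesis records and as an absolute change (`change`) for warn logs. A simplified Python model of that comparison follows. It assumes `ewma_20` uses the common smoothing factor alpha = 2 / (span + 1) and smooths each window on its own, whereas Datadog smooths the continuous series; the sample values are made up. Treat it as an aid for reasoning about thresholds such as `> 50` and `> 25`, not as Datadog's exact evaluation logic.

```python
from statistics import median


def ewma(values, span=20):
    """Exponentially weighted moving average, alpha = 2 / (span + 1) (assumed)."""
    alpha = 2.0 / (span + 1)
    smoothed = []
    current = None
    for v in values:
        # The first point seeds the average; later points are blended in.
        current = v if current is None else alpha * v + (1 - alpha) * current
        smoothed.append(current)
    return smoothed


def change(current_window, past_window):
    """Absolute change between the medians of two windows."""
    return median(current_window) - median(past_window)


def pct_change(current_window, past_window):
    """Relative change, in percent, between the medians of two windows."""
    past = median(past_window)
    if past == 0:
        raise ValueError("baseline median is zero; percent change is undefined")
    return (median(current_window) - past) / past * 100.0


if __name__ == "__main__":
    # Hypothetical per-minute Kinesis record counts: last hour vs. one week ago.
    now = ewma([120, 130, 125, 160, 170, 180], span=20)
    week_ago = ewma([100, 105, 98, 102, 101, 99], span=20)
    print("kinesis alert:", pct_change(now, week_ago) > 50)

    # Hypothetical warn-log counts for one autoscaling group.
    print("warn-log alert:", change([40, 30, 50], [10, 5, 15]) > 25)
```

Comparing against the same hour one week earlier, rather than the previous hour, keeps normal daily and weekly traffic patterns from triggering alerts, which is why the longer comparison window mentioned on slide 14 matters.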