Slide 1

Slide 1 text

Monitoring of SmartNews 2016/03/24 GREE Tech Talk #10

Slide 2

Slide 2 text

Self Introduction • Nobutoshi Ogata • Manager, Site Reliability Engineering • @nobu666 • ❤ Whiskey, Cat, Heavy Metal • Contract development (10y) ➡ GREE infrastructure division (3y) ➡ A startup (1y) ➡ SmartNews (2015/05-)

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

SmartNews

Slide 5

Slide 5 text

16,000,000+ downloads worldwide

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

Before Datadog • We used: • Munin • GrowthForecast • CloudWatch • We wanted centralized management

Slide 10

Slide 10 text

After Datadog - Phase1 • OK, we can manage centrally • But...? • We respect our engineers' freedom to develop as they like • Problem: monitoring settings get missed

Slide 11

Slide 11 text

Phase2 • Introduced Interferon • Datadog DSL • Well, we can monitor all resources automatically • But...? • Not actively maintained • Can't freely mute from the Web UI • Lack of flexibility

Slide 12

Slide 12 text

Phase3 • Integrated itamae • Our engineers were used to writing Chef • Easy to override default settings • It's asynchronous. Feel free to mute from the Web UI • Integrated dogaws (@takus) • Yet another Datadog CloudWatch Integration • We use it in combination with itamae

Slide 13

Slide 13 text

Datadog tips • Event collection and easy overlay • Provisioning • Deploy • etc.
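As a minimal sketch of the deploy-event tip: a deploy script can post an event to Datadog so it can be overlaid on graphs. This assumes the public Datadog v1 events endpoint and its `title`, `text`, `date_happened`, `tags`, and `alert_type` fields. The helper names, tag names, and service/version values are hypothetical and not taken from the talk. The request is only built here, not sent.

```python
import json
import time
import urllib.request

# Assumption: the public Datadog v1 events endpoint.
DATADOG_EVENTS_URL = "https://api.datadoghq.com/api/v1/events"


def build_deploy_event(service, version, env, timestamp=None):
    """Build an event payload describing one deploy (hypothetical tag scheme)."""
    happened = int(timestamp if timestamp is not None else time.time())
    return {
        "title": f"Deploy {service} {version}",
        "text": f"Deployed {service} version {version} to {env}",
        "date_happened": happened,  # POSIX timestamp in seconds
        "tags": [f"env:{env}", f"service:{service}", "event_type:deploy"],
        "alert_type": "info",
    }


def build_request(event, api_key):
    """Wrap the payload in an HTTP POST request without sending it."""
    return urllib.request.Request(
        DATADOG_EVENTS_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json", "DD-API-KEY": api_key},
        method="POST",
    )


if __name__ == "__main__":
    event = build_deploy_event("example-api", "v1.2.3", "production")
    request = build_request(event, api_key="YOUR_API_KEY")
    # urllib.request.urlopen(request) would actually send the event.
    print(request.full_url, json.dumps(event, indent=2))
```

Tagging the event with the same `env` tag used on metrics makes it easy to filter the overlay down to the relevant deploys.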

Slide 14

Slide 14 text

Datadog tips • Easy anomaly detection • Until quite recently, we couldn't compare over periods longer than 24 hours • We requested the ability to compare over longer periods. Thanks to Datadog for implementing it • This is a non-public feature. If you want to use it, ask Datadog support

Slide 15

Slide 15 text

For example • Compare Kinesis records count EWMA pct_change(median(last_1h), 1w_ago):ewma_20(avg:aws.kinesis.incoming_records{env:production,cost:smartnews} by {name}) > 50 • Compare application warn log change(median(last_1h),1w_ago): sum:app.log.warn{env:production} by {autoscaling_group} > 25
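To make the first query easier to read, here is a rough sketch of the arithmetic it expresses: smooth the series with an EWMA, take the median over the last hour, and compare it with the median from one week earlier as a percent change. This is not Datadog's implementation. It assumes `ewma_20` means a span of 20 points (smoothing factor 2 / (span + 1)), and it smooths each window independently, which is a simplification. All function names are hypothetical.

```python
from statistics import median


def ewma(values, span=20):
    """Exponentially weighted moving average with alpha = 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    smoothed = []
    current = None
    for value in values:
        # Seed with the first point, then blend each new point in.
        current = value if current is None else alpha * value + (1 - alpha) * current
        smoothed.append(current)
    return smoothed


def pct_change(now_window, past_window):
    """Percent change of median(now_window) relative to median(past_window)."""
    baseline = median(past_window)
    if baseline == 0:
        raise ValueError("baseline median is zero; percent change is undefined")
    return (median(now_window) - baseline) / baseline * 100.0


def should_alert(now_window, past_window, threshold=50.0, span=20):
    """True when the smoothed last-hour median rose more than `threshold` percent."""
    return pct_change(ewma(now_window, span), ewma(past_window, span)) > threshold


if __name__ == "__main__":
    last_hour = [200.0] * 60      # one point per minute, made-up values
    one_week_ago = [100.0] * 60
    print(should_alert(last_hour, one_week_ago))
```

The second query uses `change` instead of `pct_change`, so it compares the absolute difference of the two medians against the threshold rather than a percentage.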

Slide 16

Slide 16 text

Talk more? • Join our free lunch at the Tokyo office • Ask me later

Slide 17

Slide 17 text

We're hiring! Only two people on the Site Reliability Engineering Team • SmartNews is recruiting Site Reliability Engineers! • http://about.smartnews.com/en/careers/

Slide 18

Slide 18 text

No content