Slide 1

Slide 1 text

Seeing Inside Your Service
Monitoring and logging in GCP
Amir Hermelin, Product Manager
Asaph Zemach, Engineering Manager

Slide 2

Slide 2 text

Agenda
1. Cloud Platform vision for monitoring and logging
2. New logs pipeline and viewer
3. The Time Series API in depth
4. Putting it all together using Compute Engine

Slide 3

Slide 3 text

Monitoring at Google
What we’ve learned:
• Catch problems early, before they turn into user-visible outages.
• Noise is bad: reduce false positives!
• Alert on symptoms: will this affect users?
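One common way to reduce alert noise is to fire only when a symptom persists across several consecutive samples. The sketch below illustrates that idea only; it is not a description of how Google's monitoring works, and the function name, threshold, and sample values are hypothetical.

```python
from typing import Iterable, List


def sustained_breaches(samples: Iterable[float], threshold: float,
                       min_consecutive: int) -> List[int]:
    """Return the sample indices at which an alert would fire.

    An alert fires at the sample that completes a run of `min_consecutive`
    consecutive values above `threshold`. It does not fire again until the
    value has dropped back to the threshold or below.
    """
    fired = []
    run = 0
    for i, value in enumerate(samples):
        if value > threshold:
            run += 1
            if run == min_consecutive:
                fired.append(i)
        else:
            run = 0  # a single healthy sample resets the run
    return fired


# Hypothetical symptom: fraction of user requests that failed, per minute.
error_rates = [0.01, 0.20, 0.01, 0.15, 0.18, 0.22, 0.25, 0.02]
print(sustained_breaches(error_rates, threshold=0.10, min_consecutive=3))
```

The isolated spike at index 1 is ignored; only the sustained run starting at index 3 produces an alert.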

Slide 4

Slide 4 text

Monitoring Customer Perspective
Here’s what we’re hearing from...
• Absolutely nobody: “I love being woken up at 3am to reboot my VM instance”
• Lots of customers: “We had alerts but they were too noisy so we ended up just ignoring them.”
• Too many customers: “We produce too much logs volume to use the GAE logs viewer”
• Most customers: “I know we need better monitoring and alerting for our production services but who has the time (or expertise) to set it up?”

Slide 8

Slide 8 text

Monitoring Customer Perspective
You’re telling us we need to be...
• Actionable
• Scalable
• Easy to use
• Smart

Slide 9

Slide 9 text

Where We Are Going
We want dashboards and alerts that:
• Surface only relevant metrics and events
• Minimize false positives
• Automatically detect issues and help find related events
Timely and scalable metrics gathering, along with reliable, efficient logs collection and search, means YOU can connect the dots faster, minimize troubleshooting time, and take immediate action.

Slide 10

Slide 10 text

Agenda
1. Cloud Platform vision for monitoring and logging
2. New logs pipeline and viewer
3. The Time Series API in depth
4. Putting it all together using Compute Engine

Slide 11

Slide 11 text

New Logs Pipeline and Viewer
[Diagram labels: App Engine, Cloud Storage, BigQuery, Logs viewer in Cloud Console, Logs Pipeline, Buffer]

Slide 12

Slide 12 text

Logs Viewer
[Image; source: Google data]

Slide 13

Slide 13 text

Logs Viewer Improvements
• Infinite scroll
• Automatically searches through logs until enough results are found
• Search supports both labels and regexp
• Suggests labels as you type
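To make "labels and regexp" and "searches until enough results are found" concrete, here is a minimal local sketch of that kind of search. It is not the Logs Viewer's implementation or API; the entry shape (`labels`, `message`) and the `search_logs` helper are hypothetical.

```python
import re
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, Optional

LogEntry = Dict[str, Any]  # hypothetical shape: {"labels": {...}, "message": "..."}


def search_logs(entries: Iterable[LogEntry],
                labels: Optional[Dict[str, str]] = None,
                pattern: Optional[str] = None) -> Iterator[LogEntry]:
    """Lazily yield entries that match every given label and the regexp."""
    regex = re.compile(pattern) if pattern else None
    for entry in entries:
        entry_labels = entry.get("labels", {})
        if labels and any(entry_labels.get(k) != v for k, v in labels.items()):
            continue
        if regex and not regex.search(entry.get("message", "")):
            continue
        yield entry


entries = [
    {"labels": {"module": "default", "severity": "ERROR"},
     "message": "Timeout after 30s calling backend"},
    {"labels": {"module": "default", "severity": "INFO"},
     "message": "Request served in 12ms"},
    {"labels": {"module": "api", "severity": "ERROR"},
     "message": "Timeout after 5s calling datastore"},
    {"labels": {"module": "default", "severity": "ERROR"},
     "message": "Out of memory"},
]

# Because the search is a generator, scanning stops as soon as enough
# results have been collected (here: the first match only).
first_page = list(islice(
    search_logs(entries, labels={"severity": "ERROR"}, pattern=r"Timeout after \d+s"),
    1,
))
print(first_page[0]["message"])
```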

Slide 14

Slide 14 text

Agenda
1. Cloud Platform vision for monitoring and logging
2. New logs pipeline and viewer
3. The Time Series API in depth
4. Putting it all together using Compute Engine

Slide 15

Slide 15 text

Collection of System Metrics
Periodically sample important system counters
[Diagram labels: Monitoring API, Monitoring Data, Satisfied User, Google Metrics Store]
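As a rough illustration of what periodic sampling of a cumulative counter can yield, the sketch below turns successive readings into per-interval points. This is an assumption-laden sketch, not the platform's collection mechanism; the reading format, the `to_delta_points` helper, and the reset handling are all hypothetical.

```python
from typing import Dict, List, Tuple

Reading = Tuple[float, float]  # (timestamp in seconds, cumulative counter value)


def to_delta_points(readings: List[Reading]) -> List[Dict[str, float]]:
    """Convert successive counter readings into per-interval delta points.

    Intervals in which the counter went backwards (for example after a
    reboot reset it) are skipped rather than reported as negative deltas.
    """
    points = []
    for (t0, v0), (t1, v1) in zip(readings, readings[1:]):
        if v1 < v0:
            continue  # counter reset; the true delta is unknown
        points.append({"start": t0, "end": t1, "value": v1 - v0})
    return points


# Hypothetical samples taken every 60 seconds; the counter resets before t=120.
readings = [(0, 100), (60, 160), (120, 40), (180, 100)]
for point in to_delta_points(readings):
    print(point)
```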

Slide 16

Slide 16 text

Time Series Data in Cloud Console
[Image; source: Google data]

Slide 17

Slide 17 text

Metrics Available for Query
App Engine:
• /http/server/pagespeed_response_count
• /http/server/response_count
• /http/server/response_latencies
• /http/server/response_style_count
• /http/server/dos_intercept_count
• /http/server/quota_denial_count
• /system/cpu/usage
• /system/network/pagespeed_sent_bytes_count
• /system/network/received_bytes_count
• /system/network/sent_bytes_count
Compute Engine:
• /instance/uptime
• /instance/cpu/usage_time
• /instance/cpu/reserved_cores
• /instance/disk/read_ops_count
• /instance/disk/write_ops_count
• /instance/disk/read_bytes_count
• /instance/disk/write_bytes_count
• /instance/disk/read_latencies
• /instance/disk/write_latencies
• /instance/network/received_bytes_count
• /instance/network/sent_bytes_count
• /instance/network/received_packets_count
• /instance/network/sent_packets_count
• /firewall/dropped_bytes_count
• /firewall/dropped_packets_count

Slide 18

Slide 18 text

Metrics Read API Request
Example of what a read request looks like
    GET https://www.googleapis.com/cloudmonitoring/ \   # Access monitoring API
        v2beta1/ \                                      # (that’s still in beta)
        projects/myproject/ \                           # for myproject
        timeseries/ \                                   # to get a time series of points
        compute.googleapis.com/ \                       # for the Compute Engine service
        /instance/cpu/usage_time                        # that has CPU usage by instance
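The request pieces on this slide can be assembled programmatically. The sketch below only builds the URL and sends nothing. The endpoint root and path segments come from the slide; percent-encoding the slashes in the metric name is an assumption, the `timeseries_url` helper is hypothetical, and authentication and any required query parameters are left out.

```python
from urllib.parse import quote

# Endpoint root as shown on the slide (v2beta1, in beta at the time).
API_ROOT = "https://www.googleapis.com/cloudmonitoring/v2beta1"


def timeseries_url(project: str, service: str, metric_path: str) -> str:
    """Build a time series read URL for one metric of one service.

    Assumption: the full metric name (service + path) is passed as a single
    path segment, so its slashes are percent-encoded.
    """
    metric = service + metric_path
    return (f"{API_ROOT}/projects/{quote(project, safe='')}"
            f"/timeseries/{quote(metric, safe='')}")


print(timeseries_url("myproject", "compute.googleapis.com", "/instance/cpu/usage_time"))
```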

Slide 19

Slide 19 text

Metrics Read API Response
Example of what a read response looks like
    {
      "kind": "cloudmonitoring#listTimeseriesResponse",
      ...
      "timeseries": [
        {
          "timeseriesDesc": {
            "project": "1016230248573",
            "metric": "compute.googleapis.com/instance/cpu/usage_time",
            "labels": {
              "cloud.googleapis.com/service": "compute.googleapis.com",
              ...
              "compute.googleapis.com/instance_name": "ae-engine-1-03-0"
            }
          },
          "points": [
            {
              "start": "2014-03-07T18:57:09.000Z",
              "end": "2014-03-20T00:13:13.000Z",
              "singularValue": 13400.60009765625
            },
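A response of this shape can be walked with ordinary JSON handling. The sketch below uses only the fields visible on the slide, with the elided parts dropped and the structure closed so that it parses. The `summarize` helper is hypothetical, and the real response may contain more fields than shown here.

```python
import json
from datetime import datetime, timezone

# The slide's response, minus the elided ("...") parts.
SAMPLE_RESPONSE = """
{
  "kind": "cloudmonitoring#listTimeseriesResponse",
  "timeseries": [
    {
      "timeseriesDesc": {
        "project": "1016230248573",
        "metric": "compute.googleapis.com/instance/cpu/usage_time",
        "labels": {
          "cloud.googleapis.com/service": "compute.googleapis.com",
          "compute.googleapis.com/instance_name": "ae-engine-1-03-0"
        }
      },
      "points": [
        {
          "start": "2014-03-07T18:57:09.000Z",
          "end": "2014-03-20T00:13:13.000Z",
          "singularValue": 13400.60009765625
        }
      ]
    }
  ]
}
"""


def parse_timestamp(text: str) -> datetime:
    """Parse the UTC timestamp format used in the sample response."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


def summarize(response: dict):
    """Yield (instance, metric, interval_seconds, value) for every point."""
    for series in response.get("timeseries", []):
        desc = series["timeseriesDesc"]
        instance = desc["labels"].get("compute.googleapis.com/instance_name")
        for point in series.get("points", []):
            interval = parse_timestamp(point["end"]) - parse_timestamp(point["start"])
            yield instance, desc["metric"], interval.total_seconds(), point["singularValue"]


rows = list(summarize(json.loads(SAMPLE_RESPONSE)))
for row in rows:
    print(row)
```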

Slide 20

Slide 20 text

Agenda
1. Cloud Platform vision for monitoring and logging
2. New logs pipeline and viewer
3. The Time Series API in depth
4. Putting it all together using Compute Engine

Slide 21

Slide 21 text

Example using GCE
[Diagram labels: CloudMemeBackEnd, Request Metrics, Google Metrics Store]

Slide 22

Slide 22 text

Complete Example: Demo

Slide 23

Slide 23 text

Upcoming Metrics and APIs
Platform Metrics under development:
• Cloud SQL
• API usage metrics
• more coming...
API Features:
• Read
• List
• Create
• Write

Slide 24

Slide 24 text

Summary
1. Cloud Platform vision for monitoring and logging
2. New logs pipeline and viewer
3. The Time Series API in depth
4. Putting it all together using Compute Engine

Slide 25

Slide 25 text

Thank You and Questions
We’d LOVE your feedback and thoughts!
[email protected]