
Seeing Inside Your Service

from Google Cloud Platform Live 2014
YouTube Video: https://www.youtube.com/watch?v=aAE4-oLsTUU

Kazunori Sato

April 24, 2014

Transcript

  1. Seeing Inside Your Service: Monitoring and logging in GCP
    Amir Hermelin, Product Manager; Asaph Zemach, Engineering Manager
  2. Agenda
    1) Cloud Platform vision for monitoring and logging
    2) New logs pipeline and viewer
    3) The Time Series API in depth
    4) Putting it all together using Compute Engine
  3. Monitoring at Google: What we've learned
    • Catch problems early, before they turn into user-visible outages.
    • Noise is bad: reduce false positives!
    • Alert on symptoms: will this affect users?
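The last two lessons can be combined into a single alerting rule. The sketch below is illustrative only: it assumes a hypothetical `Window` record of request and error counts per evaluation window, alerts on a user-visible symptom (error ratio) rather than a cause, and suppresses one-off spikes by requiring the threshold to be breached in several consecutive windows. The 5% threshold and the three-window requirement are arbitrary example values, not anything prescribed in the talk.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Window:
    """Hypothetical per-window request counts for one service."""
    total: int
    errors: int


def should_alert(windows: List[Window], max_error_ratio: float = 0.05,
                 consecutive: int = 3) -> bool:
    """Alert on a symptom (user-visible error ratio), and only when it
    exceeds the threshold in each of the last `consecutive` windows,
    so that a single noisy window does not page anyone."""
    if len(windows) < consecutive:
        return False
    recent = windows[-consecutive:]
    return all(w.total > 0 and w.errors / w.total > max_error_ratio
               for w in recent)


spike = [Window(1000, 2), Window(1000, 90), Window(1000, 3)]
sustained = [Window(1000, 2), Window(1000, 80), Window(1000, 95), Window(1000, 120)]
print(should_alert(spike))      # False: one bad window is treated as noise
print(should_alert(sustained))  # True: users are affected for three windows
```

Raising `consecutive` trades detection speed for fewer false positives, which is exactly the tension between "catch problems early" and "noise is bad".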
  8. Monitoring: Customer Perspective
    Here's what we're hearing from...
    • Absolutely nobody: "I love being woken up at 3am to reboot my VM instance"
    • Lots of customers: "We had alerts but they were too noisy so we ended up just ignoring them."
    • Too many customers: "We produce too much logs volume to use the GAE logs viewer"
    • Most customers: "I know we need better monitoring and alerting for our production services but who has the time (or expertise) to set it up?"
    You're telling us we need to be... Actionable, Scalable, Easy to use, Smart
  9. Where We Are Going
    We want dashboards and alerts that:
    • Surface only relevant metrics and events
    • Minimize false positives
    • Automatically detect issues and help find related events
    Timely, scalable metrics gathering, along with reliable, efficient log collection and search, means YOU can connect the dots faster, minimize troubleshooting time, and take immediate action.
  10. Agenda
    1) Cloud Platform vision for monitoring and logging
    2) New logs pipeline and viewer
    3) The Time Series API in depth
    4) Putting it all together using Compute Engine
  11. New Logs Pipeline and Viewer
    [Diagram components: App Engine, Buffer, Logs Pipeline, Cloud Storage, BigQuery, Logs viewer in Cloud Console]
  12. Logs Viewer Improvements
    • Infinite scroll
    • Automatically searches through logs until enough results are found
    • Search supports both labels and regular expressions
    • Suggests labels as you type
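The "keep searching until enough results are found" behavior can be pictured as a lazy scan over pages of logs, combining an exact label filter with a regular expression on the message. This is a minimal sketch, not the viewer's implementation: `LogEntry`, `search_logs`, and the page layout are hypothetical, and each page stands in for one fetch from the log store.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class LogEntry:
    """Hypothetical log record: key/value labels plus a text message."""
    labels: Dict[str, str]
    message: str


def search_logs(pages: Iterable[List[LogEntry]], labels: Dict[str, str],
                pattern: str, want: int) -> List[LogEntry]:
    """Scan pages in order and stop as soon as `want` matches are found.

    An entry matches when every requested label has the requested value
    and the regular expression matches somewhere in the message."""
    regex = re.compile(pattern)
    results: List[LogEntry] = []
    for page in pages:  # later pages are never pulled once we have enough
        for entry in page:
            label_ok = all(entry.labels.get(k) == v for k, v in labels.items())
            if label_ok and regex.search(entry.message):
                results.append(entry)
                if len(results) == want:
                    return results
    return results  # ran out of logs before reaching `want`


pages = [
    [LogEntry({"module": "default"}, "GET /index 200"),
     LogEntry({"module": "api"}, "POST /v1/items 500")],
    [LogEntry({"module": "api"}, "GET /v1/items 500"),
     LogEntry({"module": "api"}, "GET /v1/items 200")],
    [LogEntry({"module": "api"}, "DELETE /v1/items 500")],
]
hits = search_logs(iter(pages), {"module": "api"}, r" 5\d\d$", want=2)
print([h.message for h in hits])  # ['POST /v1/items 500', 'GET /v1/items 500']
```

Because `pages` is consumed as an iterator, the third page is never touched in this example; with a real backend that is the difference between one extra round trip and none.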
  13. Agenda
    1) Cloud Platform vision for monitoring and logging
    2) New logs pipeline and viewer
    3) The Time Series API in depth
    4) Putting it all together using Compute Engine
  14. Collection of System Metrics
    Periodically sample important system counters.
    [Diagram components: Monitoring Data, Google Metrics Store, Monitoring API, Satisfied User]
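Many of the counters listed on the next slide (for example the `*_bytes_count` metrics) only ever increase, so periodic sampling usually means reading the counter at a fixed interval and turning consecutive readings into a rate. The sketch below assumes hypothetical names throughout (`sample_rates`, `FakeHost`) and uses a simulated clock so the result is deterministic; it does not handle counter resets, which a real collector would need to.

```python
import time
from typing import Callable, List, Tuple


def sample_rates(read_counter: Callable[[], float], interval_s: float,
                 samples: int,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep
                 ) -> List[Tuple[float, float]]:
    """Read a monotonically increasing counter every `interval_s` seconds
    and return (timestamp, per-second rate) points for each interval."""
    points = []
    prev_t, prev_v = clock(), read_counter()
    for _ in range(samples):
        sleep(interval_s)
        t, v = clock(), read_counter()
        points.append((t, (v - prev_v) / (t - prev_t)))
        prev_t, prev_v = t, v
    return points


class FakeHost:
    """Simulated host whose sent-bytes counter grows 500 bytes per second."""

    def __init__(self) -> None:
        self.now = 0.0

    def clock(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds  # advance simulated time instead of blocking

    def sent_bytes(self) -> float:
        return 500.0 * self.now


host = FakeHost()
print(sample_rates(host.sent_bytes, 10.0, 3, clock=host.clock, sleep=host.sleep))
# [(10.0, 500.0), (20.0, 500.0), (30.0, 500.0)]
```

Injecting `clock` and `sleep` keeps the sampler testable; in production the defaults would use the real monotonic clock and block between readings.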
  15. Compute Engine and App Engine Metrics Available for Query
    App Engine:
    • /http/server/pagespeed_response_count
    • /http/server/response_count
    • /http/server/response_latencies
    • /http/server/response_style_count
    • /http/server/dos_intercept_count
    • /http/server/quota_denial_count
    • /system/cpu/usage
    • /system/network/pagespeed_sent_bytes_count
    • /system/network/received_bytes_count
    • /system/network/sent_bytes_count
    Compute Engine:
    • /instance/uptime
    • /instance/cpu/usage_time
    • /instance/cpu/reserved_cores
    • /instance/disk/read_ops_count
    • /instance/disk/write_ops_count
    • /instance/disk/read_bytes_count
    • /instance/disk/write_bytes_count
    • /instance/disk/read_latencies
    • /instance/disk/write_latencies
    • /instance/network/received_bytes_count
    • /instance/network/sent_bytes_count
    • /instance/network/received_packets_count
    • /instance/network/sent_packets_count
    • /firewall/dropped_bytes_count
    • /firewall/dropped_packets_count
  16. Metrics Read API Request: example of what a read request looks like
        GET https://www.googleapis.com/cloudmonitoring/ \   # Access the monitoring API
            v2beta1/ \                                        # (that's still in beta)
            projects/myproject/ \                             # for myproject
            timeseries/ \                                     # to get a time series of points
            compute.googleapis.com/ \                         # for the Compute Engine service
            instance/cpu/usage_time                           # that has CPU usage by instance
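The annotated request above is just a path assembled from a few parts. The helper below is a hypothetical sketch that joins those parts in the order the slide shows; it assumes the metric name is placed in the path unencoded, exactly as on the slide, and it omits authentication and any query parameters the real API may require.

```python
BASE = "https://www.googleapis.com/cloudmonitoring"


def timeseries_url(project: str, metric: str, version: str = "v2beta1") -> str:
    """Join API base, version, project, and metric name into a read URL
    (hypothetical helper; mirrors the slide's layout only)."""
    return f"{BASE}/{version}/projects/{project}/timeseries/{metric}"


print(timeseries_url("myproject", "compute.googleapis.com/instance/cpu/usage_time"))
```

Swapping the metric argument for any name from the previous slide, prefixed with its service (for example `compute.googleapis.com/instance/disk/read_ops_count`), yields the corresponding request path.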
  17. Metrics Read API Response: example of what a read request looks like
        {
          "kind": "cloudmonitoring#listTimeseriesResponse",
          ...
          "timeseries": [
            {
              "timeseriesDesc": {
                "project": "1016230248573",
                "metric": "compute.googleapis.com/instance/cpu/usage_time",
                "labels": {
                  "cloud.googleapis.com/service": "compute.googleapis.com",
                  ...
                  "compute.googleapis.com/instance_name": "ae-engine-1-03-0"
                }
              },
              "points": [
                {
                  "start": "2014-03-07T18:57:09.000Z",
                  "end": "2014-03-20T00:13:13.000Z",
                  "singularValue": 13400.60009765625
                },
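Reading a response of this shape mostly means walking `timeseries`, picking a label out of `timeseriesDesc`, and collecting the points. The sketch below uses a trimmed copy of the slide's sample; the slide elides some fields and cuts off after the first point, so the closing brackets here are added only to make the JSON parse. `points_by_instance` is a hypothetical helper, and it reads only the `singularValue` field shown on the slide.

```python
import json

# Trimmed copy of the slide's sample; elided fields omitted, brackets closed.
SAMPLE = """
{
  "kind": "cloudmonitoring#listTimeseriesResponse",
  "timeseries": [
    {
      "timeseriesDesc": {
        "project": "1016230248573",
        "metric": "compute.googleapis.com/instance/cpu/usage_time",
        "labels": {
          "cloud.googleapis.com/service": "compute.googleapis.com",
          "compute.googleapis.com/instance_name": "ae-engine-1-03-0"
        }
      },
      "points": [
        {
          "start": "2014-03-07T18:57:09.000Z",
          "end": "2014-03-20T00:13:13.000Z",
          "singularValue": 13400.60009765625
        }
      ]
    }
  ]
}
"""


def points_by_instance(response: dict) -> dict:
    """Map each instance name to a list of (end timestamp, value) pairs,
    skipping points that carry no singularValue."""
    out: dict = {}
    for series in response.get("timeseries", []):
        labels = series["timeseriesDesc"].get("labels", {})
        name = labels.get("compute.googleapis.com/instance_name", "<unknown>")
        for point in series.get("points", []):
            value = point.get("singularValue")
            if value is not None:
                out.setdefault(name, []).append((point["end"], value))
    return out


print(points_by_instance(json.loads(SAMPLE)))
# {'ae-engine-1-03-0': [('2014-03-20T00:13:13.000Z', 13400.60009765625)]}
```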
  18. Agenda
    1) Cloud Platform vision for monitoring and logging
    2) New logs pipeline and viewer
    3) The Time Series API in depth
    4) Putting it all together using Compute Engine
  19. Upcoming Metrics and APIs
    Platform metrics under development:
    • Cloud SQL
    • API usage metrics
    • More coming...
    API features:
    • Read
    • List
    • Create
    • Write
  20. Summary
    1) Cloud Platform vision for monitoring and logging
    2) New logs pipeline and viewer
    3) The Time Series API in depth
    4) Putting it all together using Compute Engine