Cloud Run Reliability/Observability at ソウゾウ
Ryuzo Yamamoto
April 19, 2023
Transcript
1 Cloud Run Reliability/Observability at ソウゾウ
Ryuzo Yamamoto
Cloud Run Casual Talk! #2
2 Self-introduction
Ryuzo Yamamoto (山本 竜三), @dragon3
Software Engineer
Lead Architect / SRE at Souzoh in Fukuoka
3 ソウゾウ / メルカリShops
4 Agenda
• Architecture, Tech Stack
• Observability
  ◦ Logs, Metrics, Traces
• Reliability
  ◦ SLOs & Monitors as Code
5 Architecture (diagram; components shown)
• Cloud Load Balancing
• Next.js (Cloud Run)
• GraphQL (Cloud Run)
• imgproxy (Cloud Run)
• microservices (Cloud Run, 70+ services)
• Cloud SQL, Memorystore, Cloud Storage
6 Tech Stack
• Monorepo
  ◦ Go, TypeScript, Python, Java
  ◦ 70+ microservices
• Bazel, Turborepo
• GraphQL / gRPC
• Serverless (Cloud Run)
• PostgreSQL, Redis
• Cloud PubSub, Tasks, Workflows, Scheduler, VertexAI
7 Agenda
• Architecture, Tech Stack
• Observability
  ◦ Logs, Metrics, Traces
• Reliability
  ◦ SLOs & Monitors as Code
8 Observability
• Logs
  ◦ JSON structured logging
  ◦ Cloud Logging -> BigQuery
• Metrics
  ◦ Log-based Metrics
  ◦ Cloud Monitoring -> Datadog
• Traces
  ◦ OpenTelemetry -> Datadog
9 Observability - Logs
microservice (Cloud Run) -> container log (STDOUT / STDERR) -> Logging -> Sink -> BigQuery
Example log entry:
  {
    "message": "failed to say hello",
    "something_id": "xxxxxxxx",
    "serviceContext": {
      "version": "1.0.1",
      "service": "echo"
    },
    "metadata": {
      "user-agent": "graphql/1.0.0 grpc-node-js/1.7.3"
    }
  }
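The logging slide can be illustrated with a minimal sketch of JSON structured logging to STDOUT. The field names mirror the example entry on the slide. The helper name `log_json` and its parameters are hypothetical and are not taken from ソウゾウ's codebase.

```python
import json
import sys
from typing import Any, TextIO


def log_json(message: str, *, service: str, version: str,
             stream: TextIO = sys.stdout, **fields: Any) -> str:
    """Write one JSON object per line and return the line that was written.

    One object per line lets the log collector parse each entry as a
    structured payload instead of plain text.
    """
    entry = {
        "message": message,
        "serviceContext": {"service": service, "version": version},
        **fields,  # extra keys such as something_id or metadata
    }
    line = json.dumps(entry, ensure_ascii=False)
    stream.write(line + "\n")
    stream.flush()
    return line


if __name__ == "__main__":
    # Values copied from the example entry on the slide.
    log_json(
        "failed to say hello",
        service="echo",
        version="1.0.1",
        something_id="xxxxxxxx",
        metadata={"user-agent": "graphql/1.0.0 grpc-node-js/1.7.3"},
    )
```

Because every entry is a flat JSON object with stable keys, the same lines can later be queried by field once the sink has exported them to BigQuery.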
10 Observability - Metrics
microservice (Cloud Run, gRPC interceptor) -> container log (STDOUT / STDERR) -> Logging -> Log-based Metrics -> Monitoring (Log-based Metrics + Other GCP Metrics)
Example log entry written by the gRPC interceptor:
  {
    "message": "grpc: finished server unary /echo.EchoService/Hello",
    "grpc": {
      "type": "unary",
      "kind": "server",
      "latency": 0.002360152,
      "code": "OK",
      "method": "Hello",
      "service": "echo.EchoService"
    },
    "serviceContext": {
      "version": "1.0.1",
      "service": "echo"
    },
    "metadata": {
      "user-agent": "graphql/1.0.0 grpc-node-js/1.7.3"
    }
  }
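The interceptor idea on this slide can be sketched without a gRPC dependency: wrap a unary handler, time it, and emit one log entry per call in the shape shown above. This is a stand-in for a real gRPC server interceptor, not the author's implementation. The name `logged_unary` is hypothetical, latency is assumed to be in seconds, and the error-to-status mapping is deliberately simplified.

```python
import json
import sys
import time
from typing import Any, Callable, TextIO


def logged_unary(full_method: str, handler: Callable[[Any], Any],
                 stream: TextIO = sys.stdout) -> Callable[[Any], Any]:
    """Wrap a unary handler so that every call emits one JSON log line."""
    # "/echo.EchoService/Hello" -> service "echo.EchoService", method "Hello"
    _, service, method = full_method.split("/")

    def wrapper(request: Any) -> Any:
        start = time.perf_counter()
        code = "OK"
        try:
            return handler(request)
        except Exception:
            # Simplification: a real interceptor would map the error
            # to the matching gRPC status code.
            code = "UNKNOWN"
            raise
        finally:
            entry = {
                "message": f"grpc: finished server unary {full_method}",
                "grpc": {
                    "type": "unary",
                    "kind": "server",
                    "latency": time.perf_counter() - start,  # seconds (assumed unit)
                    "code": code,
                    "method": method,
                    "service": service,
                },
            }
            stream.write(json.dumps(entry) + "\n")

    return wrapper


if __name__ == "__main__":
    hello = logged_unary("/echo.EchoService/Hello", lambda name: f"hello {name}")
    hello("world")
```

A log-based metric can then count entries by `grpc.code` or build a latency distribution from `grpc.latency`, keyed by `grpc.service` and `grpc.method`, without the service exporting metrics itself.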
11 Observability - Traces
Next.js, GraphQL, microservice(s) (all on Cloud Run) -> OTLP (gRPC) -> datadog-agent (Cloud Run)
12 Agenda
• Architecture, Tech Stack
• Observability
  ◦ Logs, Metrics, Traces
• Reliability
  ◦ SLOs & Monitors as Code
13 Reliability - SLOs & Monitors as Code
• SLOs are set for every gRPC method
  ◦ Availability (e.g. 99.9% / 30 days)
  ◦ Latency (e.g. p95 100ms)
• Automated configuration
  ◦ protobuf plugin + Terraform module
• Multiwindow, Multi-Burn-Rate Alerts
  ◦ https://docs.datadoghq.com/monitors/service_level_objectives/burn_rate/
  ◦ https://sre.google/workbook/alerting-on-slos/#6-multiwindow-multi-burn-rate-alerts
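The multiwindow, multi-burn-rate idea can be made concrete with a small worked sketch. The window pairs and budget fractions below are the ones recommended in the Google SRE Workbook chapter linked above. Whether ソウゾウ uses exactly these values is not stated on the slide, so treat them as an assumption. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BurnRateRule:
    long_window_hours: float
    short_window_hours: float
    budget_consumed: float  # fraction of the error budget spent within the long window


SLO_PERIOD_HOURS = 30 * 24  # 30-day SLO period, as in the slide's example

# Window pairs from the SRE Workbook (assumed here, not confirmed by the slide).
RULES = [
    BurnRateRule(long_window_hours=1, short_window_hours=5 / 60, budget_consumed=0.02),
    BurnRateRule(long_window_hours=6, short_window_hours=0.5, budget_consumed=0.05),
    BurnRateRule(long_window_hours=72, short_window_hours=6, budget_consumed=0.10),
]


def burn_rate_threshold(rule: BurnRateRule) -> float:
    """Burn rate that spends `budget_consumed` of the budget in the long window."""
    return rule.budget_consumed * SLO_PERIOD_HOURS / rule.long_window_hours


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than allowed the error budget is being spent."""
    return error_ratio / (1.0 - slo_target)


def should_alert(rule: BurnRateRule, long_error_ratio: float,
                 short_error_ratio: float, slo_target: float) -> bool:
    """Alert only when both the long and the short window exceed the threshold."""
    threshold = burn_rate_threshold(rule)
    return (burn_rate(long_error_ratio, slo_target) > threshold
            and burn_rate(short_error_ratio, slo_target) > threshold)


if __name__ == "__main__":
    for rule in RULES:
        print(rule.long_window_hours, round(burn_rate_threshold(rule), 2))
```

With a 30-day period these rules give thresholds of roughly 14.4, 6 and 1. Requiring the short window to exceed the threshold as well means the alert stops firing soon after the error rate recovers.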
14 Reliability - SLOs & Monitors as Code
  ...
  rpc Hello(HelloRequest) returns (HelloResponse) {
    option (extension.v2.method_monitoring) = {
      availability: { goal: 99.5 }
      latency: { threshold_ms: 100 percentile: 95 }
    };
  }
  ...
protoc -> Terraform configuration -> apply by CI (GitHub Actions) -> SLO monitors
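The translation step from method options to Terraform input might look like the sketch below. It starts from already-parsed option values and builds a JSON variable map that a Terraform module could consume. Everything here is an assumption for illustration: the real protoc plugin, the name of the `slos` variable, the key layout, and the pairing of `Hello` with `echo.EchoService` are not shown on the slide.

```python
import json
from typing import Any, Dict


def slo_variables(service: str, method: str,
                  monitoring: Dict[str, Any]) -> Dict[str, Any]:
    """Build one entry of a hypothetical `slos` Terraform variable."""
    key = f"{service}/{method}"
    slo: Dict[str, Any] = {"service": service, "method": method}
    if "availability" in monitoring:
        slo["availability_goal"] = monitoring["availability"]["goal"]
    if "latency" in monitoring:
        slo["latency_threshold_ms"] = monitoring["latency"]["threshold_ms"]
        slo["latency_percentile"] = monitoring["latency"]["percentile"]
    return {key: slo}


if __name__ == "__main__":
    # Option values copied from the Hello example on the slide.
    options = {
        "availability": {"goal": 99.5},
        "latency": {"threshold_ms": 100, "percentile": 95},
    }
    variables = {"slos": slo_variables("echo.EchoService", "Hello", options)}
    # Terraform automatically loads variable files named *.auto.tfvars.json,
    # so this output could be written to such a file before CI runs apply.
    print(json.dumps(variables, indent=2))
```

Keeping the SLO definition next to the rpc declaration means a new method gets its monitors in the same change that introduces it, and CI applies them without manual setup.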
15 Wrap Up
• Architecture, Tech Stack
• Observability
  ◦ Logs, Metrics, Traces
• Reliability
  ◦ SLOs & Monitors as Code