Upgrade to Pro — share decks privately, control downloads, hide ads and more …

fourkeys基盤を構築した話

Sponsored · Ship Features Fearlessly Turn features on and off without deploys. Used by thousands of Ruby developers.
Avatar for uncle uncle
September 12, 2022

 fourkeys基盤を構築した話

Avatar for uncle

uncle

September 12, 2022
Tweet

More Decks by uncle

Other Decks in Technology

Transcript

  1. Agenda • 生産性の可視化 • fourkeys基盤の概要 • デプロイ頻度の計測 • リードタイムの計測 •

    Incident flowの整備 • incidents view • merged_pull_requests view • 変更障害率の計測 • サービス復元時間の計測 • さいごに
  2. そもそもなんで計測するの? 計測するにあたり指標をどうするかを考えfourkeysを採用した 理由は下記 • 定量的に計測できる ◦ 定義を明確にすれば計測できる • 世の中の水準で現状を評価できる ◦

    Google Cloudで実行されているDevOps組織の有効性を評価する にもあるように、それぞれの KeyごとにElite、High、Medium、Lowの4レベルで評価できる • 自動化できる ◦ GitHubやSlack、CircleCI等のAPIを使って自動計測が可能
  3. 計測対象 fourkeysのそれぞれの定義は各社・各Teamで決めてよいとされてるが、Kyashでは DORA(DevOps Research and Assessment)に合わせて下記のように定義してる 指標 概要 デプロイ頻度 組織による正常な本番へのリリース頻度

    変更のリードタイム commitから本番環境稼働までの所要時間 変更障害率 デプロイが原因で本番環境で障害が発生する割合 (%) サービス復元時間 組織が本番環境での障害から回復するのに掛かる時間
  4. SELECT day, IFNULL(ANY_VALUE(med_time_to_change)/60, 0) AS median_time_to_change, # Hours FROM (

    SELECT d.deploy_id, TIMESTAMP_TRUNC(d.time_created, DAY) AS day, PERCENTILE_CONT( # Ignore automated pushes IF(TIMESTAMP_DIFF(d.time_created, c.time_created, MINUTE) > 0, TIMESTAMP_DIFF(d.time_created, c.time_created, MINUTE), NULL), 0.5) # Median OVER (PARTITION BY TIMESTAMP_TRUNC(d.time_created, DAY)) AS med_time_to_change, # Minutes FROM four_keys.deployments d, d.changes LEFT JOIN four_keys.changes c ON changes = c.change_id ) GROUP BY day ORDER BY day; Query
  5. merged_pull_requests View # Merged Pull Request Table SELECT JSON_EXTRACT_SCALAR(metadata, '$.pull_request.html_url')

    AS url, PARSE_TIMESTAMP('%Y-%m-%dT%H:%M:%SZ',JSON_EXTRACT_SCALAR(metadata, '$.pull_request.created_at')) AS time_created, PARSE_TIMESTAMP('%Y-%m-%dT%H:%M:%SZ',JSON_EXTRACT_SCALAR(metadata, '$.pull_request.merged_at')) AS time_merged, JSON_EXTRACT_SCALAR(metadata, '$.pull_request.merge_commit_sha') AS commit_sha, JSON_EXTRACT_SCALAR(metadata, '$.pull_request.base.repo.name') AS repository FROM `four_keys.events_raw` WHERE source = 'github' AND event_type = 'pull_request' AND json_extract_scalar(metadata, '$.pull_request.merged') = 'true' AND json_extract_scalar(metadata, '$.action') = 'closed' GROUP BY 1,2,3,4,5
  6. Query SELECT TIMESTAMP_TRUNC(d.time_created, DAY) as day, IF(COUNT(DISTINCT d.deploy_id) = 0,

    0, SUM(IF(i.incident_id is NULL, 0, 1)) / COUNT(DISTINCT d.deploy_id)) as change_fail_rate, d.repository as metric, FROM four_keys.deployments d, d.changes LEFT JOIN four_keys.merged_pull_requests m ON changes = m.commit_sha and m.repository = d.repository LEFT JOIN four_keys.incidents i ON i.cause_pr_url = m.url GROUP BY day, d.repository ORDER BY day
  7. Query SELECT day, label as metric, daily_med_time_to_restore FROM ( SELECT

    TIMESTAMP_TRUNC(time_created, DAY) AS day, label, #### Median time to resolve PERCENTILE_CONT( TIMESTAMP_DIFF(time_resolved, time_created, HOUR), 0.5) OVER(PARTITION BY TIMESTAMP_TRUNC(time_created, DAY), label ) AS daily_med_time_to_restore, FROM four_keys.incidents, UNNEST(label_names) AS label ) GROUP BY day, label, daily_med_time_to_restore ORDER BY day