each stage / Impact of failures on developers
• 24+ hours (run weekly): Release impacted. Hot fix or delay release. Code not top of mind. Heavy context switch.
• 6-24 hours (run nightly): Requires triage to pin source of issues. Likely delays other features. Fix flows into multiple days at the cost of other work for the release.
• 1-6 hours (run continuously): Main branch is red. Impacts others on the team.
• 1-30 minutes (run pre-merge): Same-day fix. Impacts the dev who wrote the code. Needs a “coffee-break” level context switch.
• Quick iterative development. No context switch.
Flag issues earlier
[Figure: timeline of Test A / Test B / Test C results (pass or fail) triggered by git push and git merge events, separating tests run on the PR branch from tests run on the main branch. Panels: pull request without PTS; pull request with PTS.]
[Figure: timeline of Test A / Test B / Test C results (pass or fail) triggered by git push and git merge events, separating tests run on the PR branch from tests run on the main branch. Panels: pull request with PTS (Shift Left); pull request without PTS.]
Pre-Merge Test Metrics (average)
[Figure: dashboard with three time-series panels (Duration, Failure rate, Flakiness score) and a test result history panel. Legend: Pass, Fail, Flaky, Infra outage.]
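The three time series on this dashboard could be derived from raw test results roughly as follows. This is a minimal sketch, not the method used in the slides: the `ResultRecord` type, its field names, the status strings, and the decision to exclude infra outages from the averages are all assumptions made for illustration. In particular, the slides do not define the flakiness score, so it is approximated here as the share of results labeled flaky.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ResultRecord:
    """One test result (hypothetical schema, not taken from the slides)."""
    run_id: str          # test run identifier, e.g. "ID1"
    test_case: str
    status: str          # "pass", "fail", "flaky", or "infra_outage"
    duration_sec: float


def metrics_per_run(results):
    """Return {run_id: {"duration", "failure_rate", "flakiness_score"}}.

    Runs appear in first-seen order, so the result can be plotted as a
    time series when the input is ordered by time.
    """
    by_run = defaultdict(list)
    for r in results:
        by_run[r.run_id].append(r)

    series = {}
    for run_id, rows in by_run.items():
        # Assumption: infra outages say nothing about the tests themselves,
        # so they are left out of every average.
        valid = [r for r in rows if r.status != "infra_outage"]
        if not valid:
            continue
        series[run_id] = {
            "duration": mean(r.duration_sec for r in valid),
            "failure_rate": sum(r.status == "fail" for r in valid) / len(valid),
            # Assumption: flakiness score = share of results labeled flaky.
            "flakiness_score": sum(r.status == "flaky" for r in valid) / len(valid),
        }
    return series
```

Each value in the returned dictionary corresponds to one point on the Duration, Failure rate, and Flakiness score panels.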
[Figure: test result history grid. Rows: Test case 1 to Test case 12. Columns: test run IDs ID1 to ID5 … ID95 to ID99. Each cell is marked Pass, Fail, Flaky, or Infra outage.]
[Figure: two ranking tables of test cases (Test case 1 to Test case 5): a Flakiness score ranking table and a Duration ranking table.]
▶ Ranking tables are shown for a given test run ID or a specified period
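A ranking table of this kind could be built as sketched below, filtering by test run ID or by a date range. This is an illustration under stated assumptions rather than the implementation behind the slides: `ResultRecord`, `ranking_table`, the field names, and the status strings are hypothetical, and the flakiness score is approximated as the share of a test case's results labeled flaky because the slides do not give its definition.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ResultRecord:
    """One test result (hypothetical schema, not taken from the slides)."""
    run_id: str
    run_date: date
    test_case: str
    status: str          # "pass", "fail", "flaky", or "infra_outage"
    duration_sec: float


def ranking_table(records, metric, run_id=None, start=None, end=None, top_n=5):
    """Rank test cases by "duration" or "flakiness", highest first.

    Filter by a single run_id, by an inclusive [start, end] date range,
    or by neither to rank over all records.
    """
    if metric not in ("duration", "flakiness"):
        raise ValueError(f"unknown metric: {metric}")

    by_case = defaultdict(list)
    for r in records:
        if run_id is not None and r.run_id != run_id:
            continue
        if start is not None and r.run_date < start:
            continue
        if end is not None and r.run_date > end:
            continue
        if r.status == "infra_outage":
            continue  # assumption: outages are not counted against a test
        by_case[r.test_case].append(r)

    scores = {}
    for case, rows in by_case.items():
        if metric == "duration":
            scores[case] = sum(r.duration_sec for r in rows) / len(rows)
        else:
            # Assumption: flakiness score = share of results labeled flaky.
            scores[case] = sum(r.status == "flaky" for r in rows) / len(rows)

    # Highest score first; ties are broken by test case name for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

For example, `ranking_table(records, "flakiness", start=date(2024, 1, 1), end=date(2024, 1, 31))` would list the flakiest test cases for that month, while `ranking_table(records, "duration", run_id="ID1")` would list the slowest test cases in a single run.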
Developer Productivity references
• Scaling High Performing Technology Organizations”, Nicole Forsgren, Jez Humble, Gene Kim, 2018.
• “The SPACE of Developer Productivity”, https://queue.acm.org/detail.cfm?id=3454124
• “Introducing Developer Velocity Lab to improve developers’ work and well-being”, https://www.youtube.com/watch?v=t7SXM7njKXw
• “エリート DevOps チームであることを Four Keys プロジェクトで確認する” [Confirming that you are an elite DevOps team with the Four Keys project], https://cloud.google.com/blog/ja/products/gcp/using-the-four-keys-to-measure-your-devops-performance
• “2022 年の Accelerate State of DevOps Report を発表: セキュリティに焦点” [Announcing the 2022 Accelerate State of DevOps Report: a focus on security], https://cloud.google.com/blog/ja/products/devops-sre/dora-2022-accelerate-state-of-devops-report-now-out
Site Reliability Engineering references
• Systems”, https://sre.google/sre-book/table-of-contents/
• “SRE NEXT 2020 [C6] Designing fault-tolerant microservices with SRE and circuit breaker centric architecture”, https://speakerdeck.com/takanabe/sre-next-2020-c6-designing-fault-tolerant-microservices-with-sre-and-circuit-breaker-centric-architecture?slide=105
• “Challenges for Global Service from a Perspective of SRE”, https://speakerdeck.com/takanabe/challenges-for-global-service-from-a-perspective-of-sre?slide=109
References on test operation strategies using machine learning and data science
• to ensure reliability of code changes”, https://engineering.fb.com/2018/11/21/developer-tools/predictive-test-selection/
• “Taming Google-Scale Continuous Testing”, Atif Memon, Bao Nguyen, Eric Nickell, John Micco, Sanjeev Dhanda, Rob Siemborski, Zebao Gao, Proceedings of the 39th International Conference on Software Engineering, 2017.
• “Predictive Test Selection”, Mateusz Machalica, Alex Samylkin, Meredith Porth, Satish Chandra, International Conference on Software Engineering, 2019.
• “Assessing Transition-based Test Selection Algorithms at Google”, Claire Leong, Abhayendra Singh, John Micco, Mike Papadakis, Yves Le Traon, International Conference on Software Engineering, 2019.
• “A Survey on Regression Test-Case Prioritization”, Yiling Lou, Junjie Chen, Lingming Zhang, Dan Hao, Advances in Computers, Volume 113, 2019.
Flaky test references
• https://github.blog/2020-12-16-reducing-flaky-builds-by-18x/
• “The state of the art in tackling Flaky Tests”, https://www.youtube.com/watch?v=PjYIcCnkLhg
• “Spotify R&D | Engineering Blog: Test Flakiness – Methods for identifying and dealing with flaky tests”, https://engineering.atspotify.com/2019/11/test-flakiness-methods-for-identifying-and-dealing-with-flaky-tests/
• “Engineering at Meta: How do you test your tests?”, https://engineering.fb.com/2020/12/10/developer-tools/probabilistic-flakiness/
• “Google Testing Blog: Flaky Tests at Google and How We Mitigate Them”, https://testing.googleblog.com/2016/05/flaky-tests-at-google-and-how-we.html
• “Dropbox: Athena: Our automated build health management system”, https://dropbox.tech/infrastructure/athena-our-automated-build-health-management-system
• “Cybozu Inside Out: Flaky Testとの戦い” [The fight against flaky tests], https://blog.cybozu.io/entry/2020/12/23/100000
Industry reports
• in an insecure world”, https://about.gitlab.com/developer-survey/
• “Announcing the 2022 Accelerate State of DevOps Report: A deep dive into security”, https://cloud.google.com/blog/products/devops-sre/dora-2022-accelerate-state-of-devops-report-now-out
• “2021 State of DevOps Report presented by Puppet”, https://www.puppet.com/success/resources/state-of-devops-report
• “State of DevOps Report 2023: Platform Engineering Edition”, https://www.puppet.com/success/resources/state-of-platform-engineering