SLOs for your service at design time. Document the availability expectations of external dependencies. Avoid single points of failure by not depending on a single global resource.
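As a rough illustration of why dependency availability belongs next to your own SLO: when a request needs every dependency to succeed, and failures are assumed to be independent, the best availability the service can offer is the product of the individual availabilities. The dependency names and figures below are hypothetical, and the independence assumption is a simplification.

```python
from math import prod


def composite_availability(availabilities):
    """Upper bound on availability when every dependency must succeed.

    Assumes dependency failures are independent (a simplification).
    """
    return prod(availabilities)


# Hypothetical availability expectations for three hard dependencies.
dependencies = {"auth": 0.999, "database": 0.9995, "queue": 0.999}

bound = composite_availability(dependencies.values())
print(f"Best achievable availability: {bound:.4%}")
```

Under these assumptions the bound is lower than the availability of any single dependency, which is why an SLO set without this information can be unreachable.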
the outages of the external systems. Otherwise, your ability to roll out will always be capped by the least available external service, and it won’t be possible to roll out or roll back on demand. Deliverables: • Use X to ensure the source tree is hosted internally. • Use Y for continuous integration. • Don’t mandate the external test coverage service before merges.
Use a configuration delivery service for everything else. Development configuration shouldn’t inherit from production. Document dynamic configuration capabilities.
Document how releases affect metrics. Document your canary release process. Document how to revert canaries. Ensure that rollbacks use the same process that rollouts use.
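One way to read the last point: a rollback is a rollout whose target is a previously released version, so both go through a single code path. The sketch below is hypothetical; `Deployer`, `rollout`, and `rollback` are illustrative names, and the real deployment steps are replaced by a version history.

```python
class Deployer:
    """Toy release tracker; real deployment steps are out of scope."""

    def __init__(self):
        self.history = []  # versions in the order they went live

    def rollout(self, version):
        # Single code path for every version change; canary steps and
        # health checks would live here in a real system.
        self.history.append(version)
        return version

    def rollback(self):
        # A rollback is a rollout of the previously live version.
        if len(self.history) < 2:
            raise RuntimeError("no previous version to roll back to")
        return self.rollout(self.history[-2])

    @property
    def live(self):
        return self.history[-1] if self.history else None


deployer = Deployer()
deployer.rollout("v1")
deployer.rollout("v2")
deployer.rollback()
print(deployer.live)
```

Because `rollback` delegates to `rollout`, any check added to the rollout path also applies to reverts.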
by your SLOs. Ensure client-side and server-side data can be differentiated. Include (cloud) platform metrics in your dashboards. Set up alerting for your external service dependencies. Always propagate the incoming distributed trace context header.
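A minimal sketch of trace context propagation, assuming the W3C Trace Context header names `traceparent` and `tracestate`. The text above does not name a specific header, so substitute whatever your tracing system uses. The function name `propagate_trace_context` is illustrative.

```python
TRACE_HEADERS = ("traceparent", "tracestate")  # assumed: W3C Trace Context


def propagate_trace_context(incoming_headers, outgoing_headers=None):
    """Return outgoing headers with trace context copied from the incoming request.

    Header names are matched case-insensitively, as in HTTP.
    """
    outgoing = dict(outgoing_headers or {})
    lowered = {name.lower(): value for name, value in incoming_headers.items()}
    for name in TRACE_HEADERS:
        if name in lowered:
            outgoing[name] = lowered[name]
    return outgoing


incoming = {
    "Traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
    "Accept": "application/json",
}
outgoing = propagate_trace_context(incoming, {"Content-Type": "application/json"})
print(outgoing)
```

A full tracing library would also record a new span ID for the outgoing call. This sketch only forwards the header unchanged so that the trace is not broken at this service.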
all production projects have proper IAM configuration. Use subnetworks for isolation. Use a VPN to connect to remote networks. Document and monitor user data access. Ensure debugging endpoints are restricted by ACLs.
for user input. Ensure your service can block incoming traffic selectively per user. Avoid external endpoints that trigger a large number of internal fanouts.
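Selective per-user blocking can be sketched as a blocklist checked before any real work is done. Everything here is hypothetical: the `UserBlocklist` class, the `handle_request` function, and the choice of a 403 response are illustrative. A production setup would more likely keep the list in dynamic configuration so it can change without a release.

```python
class UserBlocklist:
    """In-memory set of user IDs whose traffic should be rejected."""

    def __init__(self):
        self._blocked = set()

    def block(self, user_id):
        self._blocked.add(user_id)

    def unblock(self, user_id):
        self._blocked.discard(user_id)

    def allows(self, user_id):
        return user_id not in self._blocked


def handle_request(user_id, blocklist):
    # Reject blocked users before doing any expensive work or fanout.
    if not blocklist.allows(user_id):
        return 403, "blocked"
    return 200, "ok"


blocklist = UserBlocklist()
blocklist.block("user-42")
print(handle_request("user-42", blocklist))
print(handle_request("user-7", blocklist))
```

Checking the list first keeps a misbehaving user from reaching the internal calls that the endpoint would otherwise trigger.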
requirements for your service. Document resource constraints: resource type, region, etc. Document quota restrictions on creating new resources. Document load tests for performance regressions if possible.
team. • Research practices that apply and consult domain experts. • Start having production readiness discussions early. • Learn from failure and share knowledge widely. • Enforce production readiness practices.