The cloud-native approach has taken the DevOps world by a pleasant surprise by the welcome adoption of Kubernetes across all categories - from Developers to SREs to VP of digital transformation. As the huge mass of legacy applications move Cloud-Native platforms, an important problem arises. How do SREs make sure the systems do not have weaknesses and have the required level of resilience? A well thought out chaos engineering methodology is the right answer. And for a large number of fast-changing applications and infrastructure, finding the right set of chaos experiments and identifying if the impact of chaos has resulted in showing up a weakness in the system is almost an impossible task.
In Cloud-Native Chaos engineering, the developers develop chaos tests as an extension of the development process. These tests are developed using standard Kubernetes Custom Resources or CRs so that they are easier to manipulate according to the environment. These chaos experiments are groomed in CI pipelines and finally published in the Chaos Hub so that they are available to SREs using the Cloud-Native applications in production. SREs use such chaos experiments of various microservices to schedule chaos in a random fashion to find weaknesses in their deployments, which leads to increased reliability.