exactly what went wrong in an incident. A postmortem is a written record of an incident, its impact, the actions taken to mitigate it, the root cause, and the follow-up actions to prevent the incident from recurring.
[Figure: History of Chaos Engineering timeline, 2016 to 2020. Recoverable labels: Simian Army launched; Gremlin born; Chaos IQ born; SRE USENIX; ChaosConf; Chaos massification; Chaos Monkey for Spring Boot; book publications.]
metrics when the different Spark components fail. To simulate such failures, we employed a whack-a-mole approach and killed the various Spark components.
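As a rough illustration of the whack-a-mole idea, the sketch below picks one component at random and kills its process. It is only a sketch under stated assumptions: the function name `kill_random_component`, the component labels, and the process IDs are hypothetical and do not come from the original experiment, which does not say how the components were killed. The `dry_run` flag defaults to True so that nothing is killed unless explicitly requested.

```python
import os
import random
import signal


def kill_random_component(pids, rng=random, dry_run=True):
    """Pick one component at random and (optionally) send it SIGKILL.

    pids maps a component label to an OS process ID. Both the labels and
    the PIDs are supplied by the caller; nothing here is Spark-specific.
    Returns the label of the component that was selected.
    """
    if not pids:
        raise ValueError("no components to kill")
    # Sort the labels so that a seeded rng gives a reproducible choice.
    name = rng.choice(sorted(pids))
    if not dry_run:
        # SIGKILL cannot be caught, so the process dies abruptly,
        # which approximates a crash rather than a clean shutdown.
        os.kill(pids[name], signal.SIGKILL)
    return name


if __name__ == "__main__":
    # Hypothetical labels and PIDs, for illustration only.
    components = {"driver": 111, "executor": 222, "master": 333}
    victim = kill_random_component(components, rng=random.Random(0))
    print("would kill:", victim)
```

Calling the function repeatedly, with the metrics of interest recorded between calls, gives a simple loop for observing how the system behaves as each component fails in turn.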
large amounts of data in an efficient way.
• Ingest data from multiple sources without any lag.
• Learn from new data and update constantly using the right learning algorithms.
• Continue with tasks without getting tired or needing breaks.

Postmortems