Defensive Programming & Resilient systems in Real World (TM)

Defensive programming & resilient systems
Don’t trust anyone, not even yourself
A “not just testing” talk
kini@tuenti.com @kinisoftware

Self-promotion

Why this talk?
• 100% test coverage (as you told us, Kini)
• Code reviews
• Manual testing
• So, after release, my job is done. Right?

Why this talk?
No, it isn’t

Why this talk?
Inside every enterprise today is a mesh of interconnected, interdependent systems. They cannot, must not, allow bugs to cause a chain of failures.
Bugs will happen. They cannot be eliminated, so they must be survived instead.
Production is the only place to learn how the software will respond to real-world [...]
Release 1.0 is the beginning of your software’s life, not the end of the project.

Why this talk?
• Early detection
• Reduce impact to customers
• Know why an issue happened
• Don’t depend on somebody looking at the error log, a daily email, ...
• Prevent different conditions in dev/testing & production

Glossary
Defensive programming is a form of defensive design intended to ensure the continuing function of a piece of software in spite of unforeseeable usage of said software. The idea can be viewed as reducing or eliminating the prospect of Murphy's Law having effect.
A resilient system stays responsive in the face of failure; any system that is not resilient will be unresponsive after a failure. Resilience is achieved by replication, containment, isolation and delegation.

State of the art @Tuenti

State of the art @Tuenti
Service Api - Monolith PHP: ChargingApi, ProvisioningApi, SubscriptionsApi, EventHistoryApi
µServices - Java: Charging, Provisioning, Subscriptions, Notifications
BSS GW, Providers, WS, Notifications / Files
Mobile Apps, Web, Admin Tools

Real World(TM)
• Avoid wrong dependencies
• Feature disabling
• Detect unfinished processes
• Go async (and retry)
• Log inputs & outputs
• Monitoring
• Alarms

Real World(TM)
Avoid wrong dependencies
• Integration points are the number-one killer of systems
• A subsystem should be as isolated as possible
• Consider health checks
• Design and architecture decisions are also financial decisions
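
One way to read the "Consider health checks" bullet is a small routine that probes each dependency and reports which ones are down. This is only a sketch: the function names and the dependency names (`database`, `provider_ws`) are hypothetical and do not come from the talk.

```python
from typing import Callable, Dict


def run_health_checks(checks: Dict[str, Callable[[], bool]]) -> Dict[str, str]:
    """Run every dependency probe and report 'up' or 'down' per dependency.

    A probe that raises is reported as 'down' instead of crashing the check.
    """
    report = {}
    for name, probe in checks.items():
        try:
            report[name] = "up" if probe() else "down"
        except Exception:
            report[name] = "down"
    return report


def is_healthy(report: Dict[str, str]) -> bool:
    """The subsystem counts as healthy only if every dependency is up."""
    return all(status == "up" for status in report.values())


def _failing_probe() -> bool:
    # Simulated outage of a third-party integration point.
    raise ConnectionError("provider unreachable")


# Hypothetical dependency names, for illustration only.
example_report = run_health_checks({
    "database": lambda: True,
    "provider_ws": _failing_probe,
})
```

Reporting per dependency, rather than a single boolean, makes it easier to see which integration point is the one that failed.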

Real World(TM)
Feature disabling
• Allow teams to modify system behavior without changing code (no release needed)
• Do “dark launches” when possible
• When replacing old code, always keep it until you know the new code works fine
• Configuration files (overriding & hot reloading)
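
A minimal sketch of feature disabling through a hot-reloaded configuration file, assuming the flags live in a JSON file. The class name, the flag name `new_charging_flow` and the `charge` function are hypothetical; the talk does not say how Tuenti implemented this.

```python
import json
import os


class FeatureToggles:
    """Reads boolean flags from a JSON file and reloads it when the file changes."""

    def __init__(self, path: str):
        self._path = path
        self._mtime = None
        self._flags = {}

    def _reload_if_changed(self) -> None:
        mtime = os.stat(self._path).st_mtime_ns
        if mtime != self._mtime:
            with open(self._path, encoding="utf-8") as fh:
                self._flags = json.load(fh)
            self._mtime = mtime

    def is_enabled(self, name: str, default: bool = False) -> bool:
        try:
            self._reload_if_changed()
        except (OSError, ValueError):
            # Missing or malformed file: keep the last flags that loaded correctly.
            pass
        return bool(self._flags.get(name, default))


def charge(amount: int, toggles: FeatureToggles) -> str:
    """Keep the old code path until the new one is known to work."""
    if toggles.is_enabled("new_charging_flow"):
        return f"new flow charged {amount}"
    return f"old flow charged {amount}"
```

Because the flag is read on every call, editing the file switches between the old and new paths without a release. Unknown flags default to disabled, so a typo in the file leaves the old code running.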

Real World(TM)
Detect unfinished processes
• Your business logic is composed of many methods
• Each one of them can fail for a lot of reasons
• Depending on the underlying tech, not all of those failures may be catchable
• How do you detect that something is failing?

Real World(TM)
Detect unfinished processes
• Just detecting differences between events started and events ended tells you something is wrong
• Integration points without timeouts are a surefire way to create cascading failures
• Consider failing fast
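
The started-versus-ended comparison can be sketched as a tracker that records when each process starts, forgets it when it ends, and lists whatever has stayed open too long. The class and the process identifiers are hypothetical; a real system would more likely persist these events than hold them in memory.

```python
import time
from typing import Dict, List, Optional


class ProcessTracker:
    """Flags processes that logged a start event but no end event in time."""

    def __init__(self, max_duration_seconds: float):
        self._max_duration = max_duration_seconds
        self._started_at: Dict[str, float] = {}

    def started(self, process_id: str, now: Optional[float] = None) -> None:
        self._started_at[process_id] = time.monotonic() if now is None else now

    def ended(self, process_id: str) -> None:
        # Ending an unknown process is ignored rather than treated as an error.
        self._started_at.pop(process_id, None)

    def unfinished(self, now: Optional[float] = None) -> List[str]:
        """Return the ids of processes open for longer than the allowed duration."""
        current = time.monotonic() if now is None else now
        return sorted(
            pid
            for pid, started in self._started_at.items()
            if current - started > self._max_duration
        )
```

This catches failures that never raise a catchable error, such as a worker that dies mid-process: nothing is thrown, but the end event never arrives.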

Real World(TM)
Go async (and retry)
• Each system is protected from other systems' failures or service degradation
• Be careful with operations that make changes
• Don’t make requests too quickly
• Check whether the operation is pending, even if the previous call failed
• Circuit breaker
• Limit the number of retries & log them
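
The retry and circuit breaker bullets could look roughly like the sketch below. It is hand-written for illustration and is not the Hystrix API; every name and default value here is an assumption.

```python
import logging
import time

logger = logging.getLogger("retry")


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying it."""


def retry(operation, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call operation up to max_attempts times with exponential backoff.

    Only safe for operations that can be repeated without side effects.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            logger.warning("attempt %d/%d failed: %s", attempt, max_attempts, exc)
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...


class CircuitBreaker:
    """Fails fast after repeated failures, then allows a trial call later."""

    def __init__(self, failure_threshold, reset_timeout, clock=time.monotonic):
        self._failure_threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def call(self, operation):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                raise CircuitOpenError("circuit open, failing fast")
            # Timeout elapsed: let one trial call through (half-open state).
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if self._failures >= self._failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

The backoff covers "don't make requests too quickly", the attempt limit and warning log cover the last bullet, and the breaker stops a degraded dependency from being hammered while it recovers.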

Real World(TM)
Log inputs & outputs
If you don’t log…

Real World(TM)
Log inputs & outputs
• Log service & third-party inputs & outputs
• As humans read (or even just scan) log files for a new system, they are learning what “normal” means for that system
• Reserve “ERROR” for a serious system problem
• Don’t leave log files on production systems. Copy them to a staging area for analysis
• Log file rotation
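
Logging inputs and outputs at a service boundary can be sketched with a decorator. The logger name, the message format and the `charge` function are assumptions made for the example.

```python
import functools
import logging

logger = logging.getLogger("io")


def log_io(func):
    """Log the arguments and the result (or the failure) of every call."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.info("%s input args=%r kwargs=%r", func.__name__, args, kwargs)
        try:
            result = func(*args, **kwargs)
        except Exception:
            # WARNING rather than ERROR: ERROR stays reserved for serious problems.
            logger.warning("%s failed", func.__name__, exc_info=True)
            raise
        logger.info("%s output %r", func.__name__, result)
        return result

    return wrapper


@log_io
def charge(user_id, amount):
    """Hypothetical service call used only to demonstrate the decorator."""
    return {"user_id": user_id, "charged": amount}
```

For the rotation bullet, the standard library's `logging.handlers.RotatingFileHandler` can be attached to the same logger. Real inputs may contain personal or payment data, which would need masking before being logged.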

Real World(TM)
Monitoring
• A system without transparency cannot survive long in production
• Good data enables good decision making
• Logging and monitoring are both good for exposing and understanding the immediate behavior of an application or system

Real World(TM)
Monitoring
• Transparency: historical trending, predictive forecasting, present status, and instantaneous behavior
• Dashboards
• Messages should include an identifier that can be used to trace the steps of a transaction
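
The last bullet, an identifier that traces the steps of a transaction, could be implemented by stamping every log line with a per-transaction id. The sketch below assumes Python's standard `logging` and `contextvars` modules; the logger name `subscriptions` and the id `abc123` are made up for the example.

```python
import contextvars
import io
import logging
import uuid

# Holds the id of the transaction currently being processed.
trace_id_var = contextvars.ContextVar("trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Copies the current trace id onto every log record."""

    def filter(self, record):
        record.trace_id = trace_id_var.get()
        return True


def start_transaction(trace_id=None):
    """Set the trace id for this context, generating one if none was received."""
    trace_id = trace_id or uuid.uuid4().hex
    trace_id_var.set(trace_id)
    return trace_id


def build_logger(handler):
    handler.addFilter(TraceIdFilter())
    handler.setFormatter(logging.Formatter("%(trace_id)s %(name)s %(message)s"))
    log = logging.getLogger("subscriptions")
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    log.propagate = False
    return log


# Usage: both steps of the same transaction carry the same identifier.
stream = io.StringIO()
log = build_logger(logging.StreamHandler(stream))
start_transaction("abc123")
log.info("subscription created")
log.info("charge requested")
```

Passing the same identifier on to downstream services, for example in a request header, would let one search find every step of a transaction across systems.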

Real World(TM)
Alarms
• Independent alarms system
• Log: event happens, event data, error detected, …
• Priorities: Critical, Error and Warning
• Predicting the future
• Document alarms
• Reporting system
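
A rough sketch of alarm priorities and a very naive form of "predicting the future": a rule maps a metric value to Warning, Error or Critical, and a linear projection estimates where the metric is heading. The rule name, thresholds and projection method are all assumptions; the talk does not describe how its alarms were computed.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Priority(IntEnum):
    WARNING = 1
    ERROR = 2
    CRITICAL = 3


@dataclass(frozen=True)
class AlarmRule:
    name: str
    description: str  # documents what the alarm means and what to do about it
    warning_at: float
    error_at: float
    critical_at: float

    def evaluate(self, value: float) -> Optional[Priority]:
        """Return the highest priority whose threshold is reached, if any."""
        if value >= self.critical_at:
            return Priority.CRITICAL
        if value >= self.error_at:
            return Priority.ERROR
        if value >= self.warning_at:
            return Priority.WARNING
        return None


def project(previous: float, current: float, steps_ahead: int) -> float:
    """Linear extrapolation from the last two samples."""
    return current + (current - previous) * steps_ahead


# Hypothetical rule: number of charges started but not finished.
unfinished_charges = AlarmRule(
    name="unfinished_charges",
    description="Charges started but not ended; check the provider integration.",
    warning_at=5,
    error_at=20,
    critical_at=50,
)
```

Evaluating the projected value instead of the current one raises the alarm before the threshold is actually crossed, at the cost of more false positives on noisy metrics.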

Q&A kini@tuenti.com @kinisoftware

Thanks!! kini@tuenti.com @kinisoftware

Extra Ball
• Tools:
  • Graphite & Grafana (metrics & monitoring)
  • Cabot (alarms)
  • Elasticsearch, Logstash and Kibana (logging & monitoring)
  • Hystrix (circuit breaker)

Extra Ball
• Resources:
  • Release It!
  • Reactive Manifesto
  • Feature Toggles
  • Microservices Guide
  • Publish-Subscribe pattern
