Making the Impossible Impossible: Improving Reliability by Preventing Classes of Problems

This talk was given at SREcon EMEA 22, in Amsterdam: https://www.usenix.org/conference/srecon22emea/presentation/sinjakli

---

Service Level Objectives (SLOs) are a familiar topic in SRE circles. They provide a framework for measuring and thinking about the reliability of a service in terms of a percentage of successful operations, such as HTTP requests.

That key strength of SLOs - viewing reliability as a percentage game - can also be a weakness. Within that framing, there are certain solutions we're likely to overlook.

This talk explores another lens for reliability - one that's complementary to SLOs: structuring software in a way that rules out entire classes of problem.

We'll explore this idea via three worked examples, and finish with some concrete take-aways, including how to spot problems that fit this shape.

Chris Sinjakli

October 26, 2022

Transcript

  1. Hi

  2. A refresher: Measuring the performance of a service as a percentage of successful operations
  3. Today's talk:
      - Another lens for reliability
      - Examples in the wild
      - How to spot problems of this shape
  6. This is not:
      - An attack on SLOs
      - One-size-fits-all solution
      - Possible if you can't edit software
  9. Simple model

      id  description            state
      1   Laptop                 submitted
      2   Phone                  collected
      3   Unused domain renewal  collected
  11. Simple model

      id  description            state
      1   Laptop                 collected
      2   Phone                  collected
      3   Unused domain renewal  collected
  12. Simple model

      id  description            state
      1   Laptop                 paid_out
      2   Phone                  collected
      3   Unused domain renewal  collected
  13. Simple model

      id  description            state
      1   Laptop                 submitted
      2   Phone                  collected
      3   Unused domain renewal  collected
  14. Simple model

      id  description            state
      1   Laptop                 failed
      2   Phone                  collected
      3   Unused domain renewal  collected
  15. State restriction pseudocode

        class Payment
          def fail()
            if state == "submitted"
              state = "failed"
            else
              raise "Cannot fail from state: #{state}"
  16. State restriction pseudocode

        class Payment
          def submit()
            if state == "created"
              state = "submitted"
            else
              raise "Cannot submit from state: #{state}"
  17. State restriction pseudocode

        class Payment
          def fail()
            if state in ["submitted", "payout_submitted"]
              state = "failed"
            else
              raise "Cannot fail from state: #{state}"
  18. State machine:
      - A set of states
      - A set of allowed transitions between those states
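
The definition on this slide can be sketched as a small transition table in Python. This is a minimal sketch, not the speaker's implementation: the state names are taken from the earlier slides, but the exact set of edges in `ALLOWED_TRANSITIONS`, the `InvalidTransition` exception, and the `transition_to` method are assumptions made for illustration.

```python
from enum import Enum


class State(Enum):
    CREATED = "created"
    SUBMITTED = "submitted"
    COLLECTED = "collected"
    PAID_OUT = "paid_out"
    FAILED = "failed"


# Assumed transition table: any edge not listed here cannot happen.
ALLOWED_TRANSITIONS = {
    State.CREATED: {State.SUBMITTED},
    State.SUBMITTED: {State.COLLECTED, State.FAILED},
    State.COLLECTED: {State.PAID_OUT},
    State.PAID_OUT: set(),
    State.FAILED: set(),
}


class InvalidTransition(Exception):
    """Raised when a transition is not in the allowed set."""


class Payment:
    def __init__(self):
        self.state = State.CREATED

    def transition_to(self, new_state):
        # One central check replaces the per-method `if state == ...` guards.
        if new_state not in ALLOWED_TRANSITIONS[self.state]:
            raise InvalidTransition(
                f"Cannot move from {self.state.value} to {new_state.value}"
            )
        self.state = new_state
```

With this shape, a payment that has already been collected cannot be marked as failed: `transition_to(State.FAILED)` raises and the state is left unchanged. Adding a new state means editing one table rather than auditing every method.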
  19. Use-after-free in C

        char *ptr = malloc(SIZE);
        do_stuff(ptr);
        free(ptr);
        // Many lines more code
        do_other_stuff(ptr);
  20. Garbage collection pseudocode

        def main()
          name = "Chris"
          greet(name)

        def greet(name)
          puts("Hello #{name}")

      Annotation: Falls out of scope
  21. Rust greetings

        fn main() {
            let name = String::from("Chris");
            greet(name);
        }

        fn greet(name: String) {
            println!("Hello {}", name);
        }
  22. Rust greetings (same code as slide 21)

      Annotation: Owner transferred
  23. Rust greetings (same code as slide 21)

      Annotations: Owner transferred, Falls out of scope
  24. Rust greetings

        fn main() {
            let name = String::from("Chris");
            greet(name);
            say_goodbye(name);
        }

        fn greet(name: String) {
            println!("Hello {}", name);
        }

      Annotation: Compiler error
  25. Rust greetings

        fn main() {
            let name = String::from("Chris");
            greet(&name);
            say_goodbye(name);
        }

        fn greet(name: &String) {
            println!("Hello {}", name);
        }

      Annotation: Borrow
  26. -- Create a table
      CREATE TABLE payments (
        id int NOT NULL,
        ...
      )

      -- Realise `int` isn't large enough (2^32)
      -- You're going to run out of IDs
      ALTER TABLE payments MODIFY id bigint;
  28. -- Create a table
      CREATE TABLE payments (
        id int NOT NULL,
        ...
      )

      -- Realise `int` isn't large enough (2^32)
      -- You're going to run out of IDs
      ALTER TABLE payments MODIFY id bigint;

      Annotation: Blocks all other queries
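
As a rough worked example of the ID exhaustion on this slide, the arithmetic below compares 4-byte and 8-byte integer ranges. It assumes the column widths used by MySQL's `INT` and `BIGINT`; the helper names are made up for illustration. Note that a signed 4-byte column only offers 2^31 - 1 positive IDs, roughly half of the 2^32 distinct values.

```python
def max_signed(bits):
    """Largest value a signed two's-complement integer of this width can hold."""
    return 2 ** (bits - 1) - 1


def max_unsigned(bits):
    """Largest value an unsigned integer of this width can hold."""
    return 2 ** bits - 1


INT_BITS = 32     # assumed width of `int`
BIGINT_BITS = 64  # assumed width of `bigint`

print(max_signed(INT_BITS))     # 2147483647
print(max_unsigned(INT_BITS))   # 4294967295
print(max_signed(BIGINT_BITS))  # 9223372036854775807
```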
  29. -- Slow transaction
      START TRANSACTION;
      SELECT * FROM payments;

      -- Forces this to queue
      ALTER TABLE payments ADD COLUMN refunded boolean;

      -- Which blocks these
      SELECT * FROM payments WHERE id = 123;
  32. id (bigint)  description
      1            Laptop

      ALTER TABLE payments MODIFY id bigint;

      id (int)  description
      1         Laptop
      2         Phone
  33. id (bigint)  description
      1            Laptop

      ALTER TABLE payments MODIFY id bigint;

      id (int)  description
      1         Laptop
      2         Phone
      3         Unused domain renewal
  34. id (bigint)  description
      1            Laptop
      2            Phone

      ALTER TABLE payments MODIFY id bigint;

      id (int)  description
      1         Laptop
      2         Phone
      3         Unused domain renewal
  35. id (bigint)  description
      1            Laptop
      2            Phone
      3            Unused domain renewal

      ALTER TABLE payments MODIFY id bigint;

      id (int)  description
      1         Laptop
      2         Phone
      3         Unused domain renewal
  37. id (bigint)  description
      1            Laptop
      2            Phone
      3            Unused domain renewal

      ALTER TABLE payments MODIFY id bigint;

      id (int)  description
      1         Laptop
      2         Phone
      3         Unused domain renewal

      User queries (via proxy)
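
The copy-to-a-new-table sequence on these slides can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` module rather than the database from the talk, SQLite does not distinguish `int` from `bigint`, and the `payments_new` table, the `mirror_insert` trigger, and the `copy_in_batches` helper are hypothetical names. The slides do not say how new writes reach the new table; a trigger is one possible mechanism, and it only covers inserts here. The final cutover of user queries (via a proxy in the slides) is left out.

```python
import sqlite3

BATCH_SIZE = 2  # deliberately tiny so the example needs several batches


def copy_in_batches(conn, batch_size=BATCH_SIZE):
    """Backfill payments_new from payments using short transactions."""
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, description FROM payments "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        with conn:  # commit after each batch so locks are held briefly
            conn.executemany(
                "INSERT OR IGNORE INTO payments_new (id, description) "
                "VALUES (?, ?)",
                rows,
            )
        last_id = rows[-1][0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, description TEXT)")
with conn:
    conn.executemany(
        "INSERT INTO payments (description) VALUES (?)",
        [("Laptop",), ("Phone",)],
    )

# Create the new table and start mirroring inserts before the backfill.
conn.executescript(
    """
    CREATE TABLE payments_new (id INTEGER PRIMARY KEY, description TEXT);
    CREATE TRIGGER mirror_insert AFTER INSERT ON payments
    BEGIN
        INSERT OR IGNORE INTO payments_new (id, description)
        VALUES (NEW.id, NEW.description);
    END;
    """
)

# A write that arrives while the migration is in progress.
with conn:
    conn.execute(
        "INSERT INTO payments (description) VALUES (?)",
        ("Unused domain renewal",),
    )

copy_in_batches(conn)
```

After the backfill, both tables hold the same three rows, and no single statement had to hold a lock on the original table for the full duration of the copy.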
  40. Take aways:
      - Complementary technique
      - You have to write software
      - It's not easy to spot
  42. Take aways:
      - Complementary technique
      - You have to write software
      - It's not easy to spot
      - But there are some tells
  44. Examples:
      - State machines
      - Memory safety
      - Database migrations

      Add more unit tests
      Write better C
      Just hire
  45. Smug comments:
      - State machines
      - Memory safety
      - Database migrations

      Write better C
      Just hire
  46. Smug comments:
      - State machines
      - Memory safety
      - Database migrations

      Add more unit tests
      Write better C
      Just hire
  48. Smug comments:
      - State machines
      - Memory safety
      - Database migrations

      Add more unit tests
      Write better C
      Just hire a DBA
  49. Image credits
      • Poker Winnings - slgckgc - CC-BY - https://www.flickr.com/photos/slgc/42157896194/
      • Thinking Face - Twemoji - CC-BY - https://github.com/twitter/twemoji
      • Ferris (Extra-cute) - Unofficial Rust mascot - Copyright waived - https://rustacean.net/
      • A350 Board - Mark Turnauckas - CC-BY - https://www.flickr.com/photos/marktee/17118767669/
      • Play - Annie Roi - CC-BY - https://www.flickr.com/photos/annieroi/4421442720/
  50. Image credits
      • White jigsaw puzzle with missing piece - Marco Verch Professional Photographer - CC-BY - https://www.flickr.com/photos/30478819@N08/50605134766/
      • Hedge maze - claumoho - CC-BY - https://flickr.com/photos/claudiah/3929921991/
      • photo_1405_20060410 - Robo Android - CC-BY - https://www.flickr.com/photos/49140926@N07/6798304070/
      • Gears - Mustang Joe - Public Domain - https://www.flickr.com/photos/mustangjoe/20437315996/