DevoxxFR 2021 - Systematic error management in application

fanf42
September 29, 2021

"Our work as developers is mainly to discover and manage the non-nominal cases of applications"

Under that stated simplicity lies a complex reality that is a burden for developers around the world.
You, too, have already wondered: "But that behavior, is it an error? Do I even care about it?"

This presentation tries to explain what errors are, and how they play a major signaling role for the people who need to receive them: users, ops, devs. We will see how you can make an inventory of errors by making your code contracts WYSIWYG with types, and how you can leverage the compiler and a modern Scala effect management library, ZIO, to make their handling automatic.

We will also see how systems analysis can be used to understand what your model's subsystems are, what their nominal cases and errors are, and how you can analyse their interactions through the prism of interfaces, protocols and promises.

And finally, how we can use that today in a common project, like our 10-year-old Rudder, to make error management a joy.

Transcript

  1. francois@rudder.io @fanf42 Systematic error management in application To make them

    useful #DevoxxFR
  2. Hi! devops automation/compliance app, manages tens of thousands of computers

    François ARMAND, CTO, Founder, Free Software Company, “Stay Up”
  3. Hi! devops automation/compliance app, manages tens of thousands of computers

    François ARMAND, CTO, Founder, Free Software Company, “Stay Up”, Developer
  4. Developer ? • Model the world into code ◦ Try

    to make it useful 4
  5. Developer ? • Model the world into code ◦ Try

    to make it useful • Nominal case necessary (of course) 5
  6. Developer ? • Model the world into code ◦ Try

    to make it useful • Nominal case necessary (of course) • But not sufficient (models are false) ◦ Bugs ◦ Misunderstanding of needs ◦ Open world 6
  7. This talk: systematic management of errors (with the help

    of types, functional programming… And well, systems)
  8. This talk: systematic management of errors • I’m a Scala

    dev, mainly ▪ expect Scala terminology (ask if unclear!) ▪ statically typed language with sum types, interfaces • examples use ZIO - https://zio.dev
  9. This talk • Scala pure functional programming framework • manages

    concurrency, asynchronicity, resources, errors, ... • state of the art "principled effect management for everyone" • made with non-FP developers in mind (Java, etc) • Smells like Spring framework in 2006 ◦ opinionated core framework + domain oriented projects ◦ tries to tackle the hard problems of the time ◦ the "80% of devs" in mind
  10. 10 Not so popular opinions - 4 Hills I would

    die on -
  11. Our work as developers is to discover and assess failure

    modes 11 Not so popular opinion 1/4
  12. It’s YOUR work to choose the SEMANTIC between nominal case

    and error and KEEP your PROMISES Not so popular opinion 2/4 12
  13. ERRORS are a SOCIAL construction to give AGENCY to the

    receiver of the error 13 Not so popular opinion 3/4
  14. An application has always at least 3 kinds of users:

    users ; ops ; and dev. Don’t forget any. 14 Not so popular opinion 4/4
  15. 15 4 principles: • Assess failure modes • You are

    responsible to keep promises made. • Give agency to your users • and don’t forget any of them. In that talk, we are looking for interaction between things (APIs, not internal logic, tests, etc)
  16. I. Assess failure modes: don't lie in your code, model with types.

    II. and III. You are responsible to keep promises made: systematic error management, at micro scale in code (II) and at macro scale in systems (III). IV. Give agency to your users, and don’t forget any of them: errors are a signal for users, ops, dev.
  17. I 17 Assess errors Discover where API lies. Understand your

    model. Assess its limits.
  18. How to make contracts WYSIWYG just with a naively

    descriptive signature I.1
  19. getUserFromDB(id: String): User 19 Assess failure mode: Don’t lie!

  20. getUserFromDB(id: String): User 20 Assess failure mode: Don’t lie! Where

    are the lies?
  21. 👍 rules of thumb: be naively explicit contract: • structure

    inputs • enumerate outputs • no hidden constraint, dependency, or side effects getUserFromDB(id: String): User 21 Where are the lies? Assess failure mode: Don’t lie!
  22. getUserFromDB(id: UserId): IO[RudderError, Option[User]] 22 In Rudder, we write it

    like that! Assess failure mode: make your contract WYSIWYG Yep, longer. But naively explicit. Let's see why, step by step
  23. 23 Assess failure mode: WYSIWYG contract 1: structured data types

    getUserFromDB(id: String): User
  24. 24 Assess failure mode: WYSIWYG contract 1: structured data types

    • we don't get user by any string, but by ID. • Don't lie in your code. getUserFromDB(id: String): User
  25. getUserFromDB(id: String): User 25 Assess failure mode: WYSIWYG contract 1:

    structured data types • we don't get user by any string, but by ID. • Don't lie in your code • make your contract WYSIWYG getUserFromDB(id: UserID): User "I give you a user by its ID, no way I will succeed if you give a random sentence".
  26. 26 Assess failure mode: WYSIWYG contract 2: total function getUserFromDB(id:

    UserID): User
  27. 27 Assess failure mode: WYSIWYG contract 2: total function •

    for some valid id, there's no user. • function is not total. It's always a problem. getUserFromDB(id: UserID): User
  28. 28 Assess failure mode: WYSIWYG contract 2: total function getUserFromDB(id:

    UserID): User "I give you a user by its ID only if it exists: sometimes you will have to deal with nobody" getUserFromDB(id: UserID): Option[User] • for some valid id, there's no user. • function is not total. It's always a problem. • don't lie on your part of the contract:
  29. 29 Assess failure mode: WYSIWYG contract 3: control side effects

    getUserFromDB(id: UserID): Option[User]
  30. 30 Assess failure mode: WYSIWYG contract 3: control side effects

    • sometimes, the environment fails • side effects are always a hidden error waiting to happen getUserFromDB(id: UserID): Option[User]
  31. 31 Assess failure mode: WYSIWYG contract 3: control side effects

    • sometimes, the environment fails • side effects are always a hidden error waiting to happen • make explicit that sometimes, things fail "I give you a user by its ID if it exists and nothing fails, else YOU deal with the error." getUserFromDB(id: UserID): Option[User]
  32. 32 Assess failure mode: WYSIWYG contract 3: control side effects

    • sometimes, the environment fails • side effects are always a hidden error waiting to happen • make explicit that sometimes, things fail "I give you a user by its ID if it exists and nothing fails, else YOU deal with the error." getUserFromDB(id: UserID): Option[User] getUserFromDB(id: UserId): IO[RudderError, Option[User]]
  33. getUserFromDB(id: UserId): IO[RudderError, Option[User]] 33 Assess failure mode: make your

    contract WYSIWYG • Ask yourself: "what are all the cases I have no idea how to deal with ?" • Model with types. • Assess failure mode by making your code contract WYSIWYG.
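
The three refinement steps above (structured input, total function, explicit failure) can be sketched with the Scala standard library alone. This is only an illustration under stated assumptions: `Either` stands in for ZIO's `IO` error channel and is evaluated eagerly, and `InvalidUserId`, `DbUnavailable`, `UserDb`, the id format and the in-memory map are hypothetical names invented for the example, not Rudder code.

```scala
object WysiwygContract {
  // Hypothetical error hierarchy; on the slides this role is played by RudderError.
  sealed trait RudderError { def msg: String }
  final case class InvalidUserId(msg: String) extends RudderError
  final case class DbUnavailable(msg: String) extends RudderError

  // Step 1, structured input: the only way to obtain a UserId is to parse a string.
  final class UserId private (val value: String)
  object UserId {
    // The accepted format is an assumption made for this sketch.
    def parse(raw: String): Either[RudderError, UserId] =
      if (raw.matches("[a-z0-9-]{1,32}")) Right(new UserId(raw))
      else Left(InvalidUserId(s"'$raw' is not a valid user id"))
  }

  final case class User(id: String, name: String)

  // Stand-in for IO[RudderError, A]: a plain Either, not a lazy effect.
  type Result[A] = Either[RudderError, A]

  // Hypothetical in-memory "database"; `online` simulates the environment.
  final class UserDb(users: Map[String, User], online: Boolean) {
    // Step 2: Option, because a valid id may match nobody.
    // Step 3: Either, because the environment may fail.
    def getUserFromDB(id: UserId): Result[Option[User]] =
      if (online) Right(users.get(id.value))
      else Left(DbUnavailable("database is not reachable"))
  }
}
```

A caller now sees all three situations in the signature: a user, nobody, or an error it has to deal with.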
  34. I.2 34 models are false by construction What are nominal

    case, errors, defects ?
  35. Model? Systems? 35 Code is a model of interacting systems.

    Interaction can be expected (nominal case) or not (error) How do you decide what is what? getUserFromDB(id: UserId): IO[RudderError, Option[User]] • why is "no user for that id" a nominal case ?
  36. Model? Systems? 36 Code is a model of interacting systems.

    Interaction can be expected (nominal case) or not (error) How do you decide what is what? getUserFromDB(id: UserId): IO[RudderError, Option[User]] • why is "no user for that id" a nominal case ? It depends on the system !
  37. Systems? 37 A school is a system

  38. Systems? 38 ◦ BOUNDED group of things ◦ with a

    NAME Interacting ◦ with other systems A school is a system
  39. System interactions: nominal cases, non nominal cases Expected interaction or

    error ? ◦ Play marbles: ◦ win or lose => both nominal cases ◦ marble broke => likely an error ◦ game interrupted => not sure ? A school is a system
  40. Nominal cases vs Errors 40 getUserFromDB(id: UserId): IO[RudderError, Option[User]] ▪

    why is "no user for that id" a nominal case ? It depends on the system !
  41. Nominal cases vs Errors 41 getUserFromDB(id: UserId): IO[RudderError, Option[User]] ▪

    why is "no user for that id" a nominal case ? It depends on the system ! You, the developer, decide what is nominal.
  42. Nominal cases vs Errors 42 Nominal cases • expected output

    NOT ONLY the "good one"! "the game can be lost or won" • reflected in types with enumeration Errors • expected non-nominal case "a teacher interrupted the game" • reflected in type with an error type
  43. Nominal cases vs Errors 43 Nominal cases • expected output

    NOT ONLY the "good one"! "the game can be lost or won" • reflected in types with enumeration Errors • expected non-nominal case "a teacher interrupted the game" • reflected in type with an error type Everything reflected in types?
  44. Model everything? 44 getUserFromDB(id: UserId): IO[RudderError, Option[User]]

  45. Model everything? 45 java.lang.SecurityException? (jvm permission to access network) getUserFromDB(id:

    UserId): IO[RudderError, Option[User]]
  46. Model everything? 46 ⟹ where do you put the limit?

    getUserFromDB(id: UserId): IO[RudderError, Option[User]] java.lang.SecurityException? (jvm permission to access network)
  47. Systems have horizon. 47 ◦ nothing exists beyond horizon

  48. Systems have horizon. Horrors lie beyond. 48 ◦ nothing exists

    beyond horizon ◦ Like with Lovecraft: if something from beyond interacts with a system, the system becomes inconsistent
  49. Errors vs Defects 49 Errors • expected non nominal case

    • reflected in types • signal for users • social construction: you propose alternatives or error Defects • unexpected case: by definition, application is in an unknown state • not reflected in types • only choice is to stop as cleanly as possible (coredump)
  50. Nominal vs Errors vs Defects 50 Errors • expected (modeled)

    non nominal cases • reflected in types with error channel Defects • non expected (out of model) cases • not reflected in types Nominal cases • expected (modeled) nominal cases • reflected in types output with enumeration
  51. Nominal vs Errors vs Defects 51 Errors • expected (modeled)

    non nominal cases • reflected in types with error channel Defects • non expected (out of model) cases • not reflected in types Nominal cases • expected (modeled) nominal cases • reflected in types output with enumeration But who chooses what is what?
  52. Nominal vs Errors vs Defects 52 Errors • expected (modeled)

    non nominal cases • reflected in types with error channel Defects • non expected (out of model) cases • not reflected in types Nominal cases • expected (modeled) nominal cases • reflected in types output with enumeration But who chooses what is what? YOU
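
The nominal / error / defect split can be made concrete with the marble game from the school example. The sketch below is an assumption-laden illustration: the `play` rule, its parameters and all type names are invented. Nominal outcomes are an enumeration in the success type, errors are an enumeration in the error channel, and defects are whatever is not modeled at all (for instance an exception thrown from outside the model).

```scala
object MarbleGame {
  // Nominal cases: every expected outcome, not only the "good" one.
  sealed trait Outcome
  case object Won extends Outcome
  case object Lost extends Outcome

  // Errors: expected non-nominal cases, visible in the signature.
  sealed trait GameError
  case object InterruptedByTeacher extends GameError
  case object MarbleBroke extends GameError

  // Hypothetical rule, invented for the sketch: the higher throw wins.
  def play(mine: Int, theirs: Int, teacherNearby: Boolean): Either[GameError, Outcome] =
    if (teacherNearby) Left(InterruptedByTeacher)
    else if (mine > theirs) Right(Won)
    else Right(Lost)

  // Every modeled case gets an answer; defects have no case here by construction.
  def describe(result: Either[GameError, Outcome]): String = result match {
    case Right(Won)                 => "nominal: won"
    case Right(Lost)                => "nominal: lost"
    case Left(InterruptedByTeacher) => "error: ask the teacher when to resume"
    case Left(MarbleBroke)          => "error: get a new marble"
  }
}
```

Moving a case from `GameError` to `Outcome` (or dropping it from the model entirely) is exactly the choice the slide attributes to the developer.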
  53. Horizon limit is your choice - by definition 53 java.lang.SecurityException?

  54. 54 java.lang.SecurityException? execScript(js: String): IOResult[String] In Rudder, we have a

    JS engine (JS from users): Horizon limit is your choice - by definition
  55. 55 java.lang.SecurityException? execScript(js: String): IOResult[String] In Rudder, we have a

    JS engine (JS from users): ⟹ SecurityException is an expected error case here Horizon limit is your choice - by definition
  56. 56 java.lang.SecurityException? execScript(js: String): IOResult[String] In Rudder, we have a

    JS engine (JS from users): ⟹ SecurityException is an expected error case here … but nowhere else in Rudder. By our choice. Horizon limit is your choice - by definition
  57. Your code should paint a clean and understandable model: •

    Don't lie in your code: ◦ explicit data structure ◦ impossible state unrepresentable ◦ total functions ◦ control side effects • model code as systems with: ◦ nominal cases (enumeration) ◦ errors (either success "or") ◦ defects (out of model) I. Take away: WYSIWYG-ify your code with data structures
  58. II 58 Systematic error management Code: get the help of

    the compiler Parse, don't validate Effects as first class citizens Dedicated error channels
  59. Systematic error management? Like? Checking errors at

    each line?
  60. Systematic error management ? 60 kubernetes/pkg/controller/daemon/daemon_controller.go

  61. Systematic error management ? 61 kubernetes/pkg/controller/daemon/daemon_controller.go It's REALLY good error

    management
  62. Systematic error management ? 62 kubernetes/pkg/controller/daemon/daemon_controller.go • 23 lines •

    13 for error control • nothing automated: only rely on developer diligence It's REALLY good error management, but:
  63. 63 Systematic error management ? SEEMS EXTREMELY PAINFUL AND ERROR

    PRONE kubernetes/pkg/controller/daemon/daemon_controller.go
  64. 64 • We did progress in the last 25 years

    • Let the compiler do the job for you • two tools: ◦ parse, don't validate ▪ make impossible states unrepresentable ◦ dedicated error channel ▪ effects as first class citizens Systematic error management ?
  65. II.1 65 Parse, don't validate Make your code more and

    more precise and simple
  66. Prevention 66 Make impossible states unrepresentable Refine iteratively getUserFromDB(id: String):

    User • not all strings are a valid user id. ◦ parse only one time at the edge ◦ let all your code know about it with a dedicated data structure
  67. Prevention 67 Make impossible states unrepresentable Refine iteratively getUserFromDB(id: String):

    User • not all strings are a valid user id. ◦ parse only one time at the edge ◦ let all your code know about it with a dedicated data structure getUserFromDB(id: UserID): User
  68. Prevention 68 https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/ Make impossible states unrepresentable Refine iteratively

  69. Example - real code from Rudder 69 • Need to

    parse license for plugin validation
  70. Example - real code from Rudder 70 • Need to

    parse license for plugin validation • we described all possible cases ◦ from unstructured (a binary blob) ◦ to checked license
  71. Example - real code from Rudder 71 • Need to

    parse license for plugin validation • we described all possible cases • then, iteratively refine case
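
The Rudder license code itself is not part of this transcript, so the following is only a hypothetical sketch of the idea of iterative refinement. Each type records what has been established so far, from an unstructured blob to a checked license. The type names, the `licensee;endYear` format and the expiry rule are all assumptions made for illustration.

```scala
import java.nio.charset.StandardCharsets.UTF_8

object LicenseParsing {
  sealed trait LicenseError
  final case class Malformed(reason: String) extends LicenseError
  final case class Expired(endYear: Int) extends LicenseError

  // Each type states how much we know (all names are hypothetical).
  final case class RawLicense(bytes: Array[Byte])                 // nothing known yet
  final case class ParsedLicense(licensee: String, endYear: Int)  // structure is valid
  final case class CheckedLicense(licensee: String, endYear: Int) // and not expired

  // Invented "licensee;endYear" format, parsed once at the edge.
  def parse(raw: RawLicense): Either[LicenseError, ParsedLicense] =
    new String(raw.bytes, UTF_8).split(";") match {
      case Array(name, year)
          if name.nonEmpty && year.nonEmpty && year.length <= 4 && year.forall(_.isDigit) =>
        Right(ParsedLicense(name, year.toInt))
      case _ => Left(Malformed("expected 'licensee;endYear'"))
    }

  def check(parsed: ParsedLicense, currentYear: Int): Either[LicenseError, CheckedLicense] =
    if (parsed.endYear >= currentYear) Right(CheckedLicense(parsed.licensee, parsed.endYear))
    else Left(Expired(parsed.endYear))

  // Only a CheckedLicense is accepted here, so the check cannot be forgotten.
  def enablePlugin(license: CheckedLicense): String =
    s"plugin enabled for ${license.licensee}"
}
```

Because `enablePlugin` does not accept a `RawLicense` or a `ParsedLicense`, skipping a refinement step is a compile error rather than a runtime surprise.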
  72. II.2 Let the compiler help you Effects as values

    with explicit error channel
  73. 73 • Use the type system to automate classification of

    errors? Effect as first class citizens - effect system
  74. 74 A type system is a tractable syntactic method for

    proving the absence of certain program behaviors by classifying phrases according to the kinds of values they compute. Benjamin Pierce • Use the type system to automate classification of errors? Let the compiler help you
  75. 75 A type system is a tractable syntactic method for

    proving the absence of certain program behaviors by classifying phrases according to the kinds of values they compute. Benjamin Pierce Let the compiler help you By definition, a type system automatically categorizes results ⟹ need for a dedicated error channel + a common error trait
  76. 76 trait MyAppError // common properties of errors type PureResult[A]

    = Either[MyAppError, A] def divide(a: Int, b: Int): PureResult[Int] Let the compiler help you A type system is a tractable syntactic method for proving the absence of certain program behaviors by classifying phrases according to the kinds of values they compute. Benjamin Pierce By definition, a type system automatically categorizes results ⟹ need for a dedicated error channel + a common error trait
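
The slide only gives the signatures, so here is one possible way to fill them in. The `msg` member, the `DivisionByZero` case and `divideTwice` are assumptions added for the example. The point is that the `Either` error channel lets a `for` comprehension stop at the first failure without any explicit check.

```scala
object PureDivide {
  // Common parent for all application errors, as on the slide.
  // The `msg` member is an assumption about what "common properties" could be.
  trait MyAppError { def msg: String }
  final case class DivisionByZero(msg: String) extends MyAppError

  type PureResult[A] = Either[MyAppError, A]

  def divide(a: Int, b: Int): PureResult[Int] =
    if (b == 0) Left(DivisionByZero(s"cannot divide $a by zero"))
    else Right(a / b)

  // Errors short-circuit: the second division never runs if the first fails.
  def divideTwice(a: Int, b: Int, c: Int): PureResult[Int] =
    for {
      first  <- divide(a, b)
      second <- divide(first, c)
    } yield second
}
```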
  77. 77 Effect as first class citizens - effect system •

    data structure to capture side effects with a dedicated error channel ? Same for effectful functions!
  78. 78 Effect as first class citizens - effect system •

    data structure to capture side effects with a dedicated error channel ? • this problem is hard. • (but we have a 25-year-old solution in prod in thousands of big shops) Same for effectful functions!
  79. 79 Effect as first class citizens - effect system •

    solution: effects as first class value (I will let you dig into effect systems, monads, referential transparency, etc - for now: let's just say we have tech so that:) getUserFromDB(id: UserId): IO[RudderError, Option[User]] • "IO" here means: your side effecting function is now pure, with an error channel.
  80. 80 • With a dedicated error channel ◦ ~ Either[E,

    A] for pure code, ◦ else ~ IO[E, A] for effect management • and parent trait for common error properties… • we get automatic categorization of errors by compiler Let the compiler help you
  81. 81 Let the compiler help you • Remember the example

    with refined inputs? • error in any line stops the process • error is automatically returned ◦ no boilerplate • effects are values - possibility to augment error: ◦ aggregate similar cases, ◦ add your own control structure, ◦ decompose as you need
  82. Systematic error management ? 82 kubernetes/pkg/controller/daemon/daemon_controller.go • 23 lines •

    13 for error control • nothing automated: only rely on developer diligence It's REALLY good error management, but:
  83. 83 Systematic error management ? • 15 lines • 2

    for errors (for) • error management automated: developer focus on nominal case It's REALLY good error management, and: daemon_controller.scala - idea about how it could be • automatic error management ! • getPod can be moved around without fear of unhandled error • catchAll is a built-in combinator of ZIO • notOptional() is a one-liner self-made combinator • contextualizeError() is another one
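
The `daemon_controller.scala` code shown on the slide is not in the transcript. As a rough idea of what such self-made combinators could look like, here is a sketch on top of the standard library's `Either` rather than ZIO. `getPod`, `getNode`, the data they return and the error types are hypothetical, and the two combinators are only guesses at the shape of `notOptional` and `contextualizeError`.

```scala
object DaemonSketch {
  sealed trait AppError { def msg: String }
  final case class Unexpected(msg: String) extends AppError
  final case class Contextualized(context: String, cause: AppError) extends AppError {
    def msg: String = s"$context <- ${cause.msg}"
  }

  type Result[A] = Either[AppError, A]

  // Self-made combinators, each a one-liner (hypothetical shapes).
  implicit class OptionalOps[A](res: Result[Option[A]]) {
    // Turn "found nothing" into an error when nothing is not acceptable here.
    def notOptional(msg: String): Result[A] = res.flatMap(_.toRight[AppError](Unexpected(msg)))
  }
  implicit class ContextOps[A](res: Result[A]) {
    // Wrap any error with information about what we were doing.
    def contextualizeError(context: String): Result[A] = res.left.map(Contextualized(context, _))
  }

  final case class Pod(name: String, node: String)
  final case class Node(name: String, ready: Boolean)

  // Hypothetical lookups; real ones would call a cluster API.
  def getPod(name: String): Result[Option[Pod]] =
    Right(Map("web" -> Pod("web", "node-1")).get(name))
  def getNode(name: String): Result[Option[Node]] =
    Right(Map("node-1" -> Node("node-1", ready = true)).get(name))

  // Only the nominal path is written; any failing step ends the computation.
  def isPodNodeReady(podName: String): Result[Boolean] =
    (for {
      pod  <- getPod(podName).notOptional(s"pod '$podName' not found")
      node <- getNode(pod.node).notOptional(s"node '${pod.node}' not found")
    } yield node.ready).contextualizeError(s"checking pod '$podName'")
}
```

Steps inside the `for` can be reordered or extracted without adding any error-handling code, which is the property the slide calls moving `getPod` around without fear.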
  84. Compilers are now very potent, they can help you systematically

    assess properties: • Parse, don't validate ◦ precise parameter types ◦ iteratively refine from unstructured to structured data • Effects as values with dedicated error channel ◦ unbloat your error management ◦ let the compiler check it ◦ build your own combinators 84 II. Take away: Use types for automatic help from compiler
  85. III 85 Systematic error management Macro: use systems to materialize

    promises Program to strict interfaces and protocols
  86. 86 A bit more about systems We don't have a

    compiler everywhere to help us.
  87. We don't have a compiler everywhere to help us. Then,

    we have system analysis. 87 A bit more about systems
  88. Need for a systematic approach to error management 88 ◦

    BOUNDED group of things ◦ with a NAME Interacting ◦ with other systems A school of systems A bit more about systems
  89. A bit more about systems Need for a systematic approach

    to error management 89 ◦ BOUNDED group of things ◦ with a NAME Interacting ◦ via INTERFACES ◦ by a PROTOCOL with other systems ◦ And PROMISING to have a behavior A school of systems
  90. A bit more about systems 90 Systematic error management possible

    with a clear definition of consistent sub-systems in interaction. Find out: • interfaces, protocols, promises Write down expectations: • nominal cases, errors, out of model Look for consistency in: • lifecycle, constraints, actors, locations, dependencies, maturity, ...
  91. Example? 91 Typical web application.

  92. Example? 92 Typical web application. How to keep contradictory promises?

    Promises to third parties about REST behaviour Promises to business and developers about code manageability
  93. Make promises, Keep them 93 • systems allow to bound

    responsibilities Look for consistency in: • lifecycle • actors
  94. Make promises, Keep them 94 • systems allow to bound

    responsibilities
  95. Make promises, Keep them 95 • systems allow to bound

    responsibilities Business Core sub-system: • own ADT / logic (mostly pure) • lifecycle bounded to developers understanding of needs (rapid changes)
  96. Make promises, Keep them 96 • systems allow to bound

    responsibilities Business Core sub-system: • own ADT / logic (mostly pure) • lifecycle bounded to developers understanding of needs (rapid changes) Pattern: “A pure heart (core) surrounded by side effects”* * excuse my french
  97. Make promises, Keep them 97 • systems allow to bound

    responsibilities Users of the API want stability and to know what errors can happen Business Core sub-system: • own ADT / logic (mostly pure) • lifecycle bounded to developers understanding of needs (rapid changes)
  98. Make promises, Keep them 98 • systems allow to bound

    responsibilities Business Core sub-system: • own ADT / logic (mostly pure) • lifecycle bounded to developers understanding of needs (rapid changes) REST sub-system : • own ADT / logic (mostly effects) • lifecycle bounded to REST contract: strict versioning, changes are breaking changes Users of the API want stability and to know what errors can happen
  99. Make promises, Keep them 99 • systems allow to bound

    responsibilities Business Core sub-system: • own ADT / logic (mostly pure) • lifecycle bounded to developers understanding of needs (rapid changes) REST sub-system : • own ADT / logic (mostly effects) • lifecycle bounded to REST contract: strict versioning, changes are breaking changes Stable API : interface, strict protocol & promises (nominal cases + errors) Users of the API have agency (able to react efficiently)
  100. Make promises, Keep them 100 • systems allow to bound

    responsibilities Business Core sub-system: • own ADT / logic (mostly pure) • lifecycle bounded to developers understanding of needs (rapid changes) REST sub-system : • own ADT / logic (mostly effects) • lifecycle bounded to REST contract: strict versioning, changes are breaking changes Stable API : interface, strict protocol & promises (nominal cases + errors) Users of the API have agency (able to react efficiently) Translation between sub-systems: API: interface, protocol & promises!
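
The translation between the business core and the REST sub-system can be pictured as a single function from one error vocabulary to the other. Everything in this sketch (error cases, status codes, error codes, messages) is an assumption chosen for illustration. The only idea taken from the slides is that each sub-system owns its ADT and that the boundary translates between them.

```scala
object RestAdapter {
  // Business core: its own ADT, free to change as understanding of needs evolves.
  sealed trait CoreError
  final case class UserNotFound(id: String) extends CoreError
  final case class InvalidInput(reason: String) extends CoreError
  final case class StorageFailure(detail: String) extends CoreError

  // REST sub-system: a small, versioned set of promises made to API users.
  final case class RestError(status: Int, code: String, message: String)

  // The translation is the only place where both vocabularies meet.
  def toRest(err: CoreError): RestError = err match {
    case UserNotFound(id)  => RestError(404, "user_not_found", s"No user with id '$id'")
    case InvalidInput(why) => RestError(400, "invalid_input", why)
    // Internal details stay inside; the API user gets something actionable.
    case StorageFailure(_) => RestError(503, "unavailable", "Service temporarily unavailable, retry later")
  }

  // Adapt a core result to the REST contract, rendering the nominal case.
  def respond[A](result: Either[CoreError, A])(render: A => String): Either[RestError, String] =
    result.left.map(toRest).map(render)
}
```

Adding a new `CoreError` case forces a decision in `toRest`, so the REST promises cannot drift silently when the core changes.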
  101. Make promises, Keep them 101 • discover sub systems and

    their limits ◦ explore how components are coupled ◦ find or create loosely coupled sub-systems • find nominal case and error, translate them between sub-systems ◦ make errors relevant to their users • It’s a model, it’s false - but useful ◦ there is NO definitive answer. ◦ discuss, share, iterate • the bigger the promises, the stricter the API
  102. Your code, the IT on which it runs are interactive

    systems: • look for their perimeters ◦ program to interface with protocols • understand life cycle ◦ parse at the edge ◦ core business need purity for rapid iteration ◦ use adapter subsystem to manage contradictory promises 102 III. Take away: look for system limits and contracts
  103. IV. 103 Errors are a social construction to give agency

    to dev, ops, users
  104. 104 It’s a signal

  105. 105 It’s a signal The only goal of an error

    is to be analyzed by someone who will have to deal with the problem. Make that person's* life easier. * it could be you. In the middle of the night.
  106. • Don’t assume what’s obvious • It’s an open world

    out there • Don’t force users to reverse-engineer possible cases It’s a signal make it unambiguous
  107. Checked exceptions are a good signal for users 107 Unpopular

    opinion (sure)
  108. Checked exceptions are a good signal for users Are they

    ? 108 Unpopular opinion (sure)
  109. • exceptions* are often a pile of useless ambiguity ◦

    Error ? Fatal error ? Checked ? Unchecked ? ◦ most exceptions are just a message ◦ … or hidden behind a generic throws Exception • signal must be unambiguous and actionable It’s a signal make it unambiguous * NPE, anyone?
  110. • exceptions* are often a pile of useless ambiguity ◦

    Error ? Fatal error ? Checked ? Unchecked ? ◦ most exceptions are just a message ◦ … or hidden behind a generic throws Exception • signal must be unambiguous and actionable 110 It’s a signal make it unambiguous ➢ be precise with your contracts and errors
  111. • exceptions* are often a pile of useless ambiguity ◦

    Error ? Fatal error ? Checked ? Unchecked ? ◦ most exceptions are just a message ◦ … or hidden behind a generic throws Exception • signal must be unambiguous and actionable 111 It’s a signal make it unambiguous ➢ think "who will react to that case ?" User, ops or dev? ➢ be precise with your contracts and errors
  112. • It's OK to not know how to deal with

    a case at some point • give agency* to deal with it at the right time * capacity to influence environment 112 It’s a signal make it unambiguous give agency
  113. 113 • 👍 Rule of thumb ◦ app/service user: can

    influence inputs. Be precise with your function parameters. ◦ ops: concerned with environment and system interaction. Likely what is in the error channel. ◦ developers: make model hypothesis, contract and limit unambiguous. On defect, core dump info. It’s a signal make it unambiguous give agency users ops dev
  114. 114 IV.Take away: give agency to dev, ops, users with

    clear signals
  115. 115 Not so popular opinion 5/4 If it's NOT on

    the path of least resistance, it won't be done consistently
  116. What’s missing for good error management in code ? 116

  117. What’s missing for good error management in code ? •

    exceptions or Go errors are A PAIN to deal with ◦ nothing is automatable ▪ no help from compiler, no tooling, no inference, nothing ◦ no composition ▪ composition: • ability to build solutions to more complex problems from solutions to simpler ones. • and provably be sure that all properties checked in the small are kept in the result. ▪ lose referential transparency* * the single biggest win regarding code comprehension
  118. Make it a joy! 118 • fearless refactoring: focus on

    domain logic, deconstruct problems as you need ◦ automatic error management ◦ composition (referential transparency…) • boilerplate free, makes the code extremely readable ◦ able to add all the combinators we need! ◦ it’s cheap and can fit your domains • give access to higher level compositional tools ! ◦ ex: automatically manage resources in ALL cases ◦ simple async & concurrent structure: queues, etc Today, we have the tooling to make managing error enjoyable!
  119. 119 Let correct error management be the path of least

    resistance. Make it a joy.
  120. Question? Contact me / Chat with me! https://twitter.com/fanf42 https://github.com/fanf irc/freenode:

    fanf @fanf42:matrix.org francois@rudder.io Resources ◦ Error management: future vs ZIO A much more detailed presentation of ZIO error management capabilities https://www.slideshare.net/jdegoes/error-management-future-vs-zio ◦ Understand Things As Interacting Systems More insights on systems. https://medium.com/@fanf42/understand-things-as-interacting-systems-b273bdba5dec ◦ Parse, don't validate ! the reference article about making impossible state unrepresentable and getting help from the compiler https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/ ◦ Effect tracking is commercially worthless https://degoes.net/articles/no-effect-tracking ◦ Stay Up! Journey of a Free Software Company. One decade in search for a sustainable model https://medium.com/@fanf42/stay-up-5b780511109d Images ◦ scientists checking logs: https://www.quantamagazine.org/hope-rekindled-for-abc-proof-20151221 ◦ mountains: Game arts for Forest of Liars https://imgur.com/t/gaming/zOcPpG1
  121. I. Assess failure modes. Don't lie in your code, model with types: explicit data types; total and pure functions; know your limits (defects vs errors).

    II. and III. You are responsible to keep promises made. Systematic error management: at micro scale, in code (parse, don't validate; use a dedicated error channel); at macro scale, in systems (program to strict interfaces and protocols). IV. Give agency to your users, and don’t forget any of them. Errors are a signal for users, ops, dev: users get agency to understand the nominal case; ops get agency to correct errors; devs get agency to model deliberately (with joy). V. Make it extremely convenient.
  122. Full example - real code from Rudder 122 • inference

    just works • each sub-system adds relevant information • simple combinators (in white) used as syntax sugar (None, msg) => Unexpected(msg) PureResult[A] => IOResult[A] (err: RudderError[A], msg) => Chained(msg, err) error contextualisation between systems
  123. Some questions asked after the talk • What about making impossible states unrepresentable from the beginning?

    ◦ That’s a very good point and you should ALWAYS try to do so. The idea is to change the method’s domain of definition (i.e., the shape of its parameters) so that it only works on inputs that can’t raise errors. Typically, in my trivial “divide” example, we should have used a “non zero integer” type for the denominator input. ◦ Alexis King (@lexi_lambda) wrote a wonderful article on that, so just go read it, she explains it better than I can: “Parse, don’t validate” https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/ ◦ We use that technique a lot in Rudder to drive understanding of what is possible. Each time we can restrict the domain of definition, we try to keep that information for later use. ◦ Typical examples: parsing a plugin license (we have 4 “xxxLicenses” classes depending on what we know about its state); validating a user policy (again several “SomethingPolicyDraft” classes with the different hypotheses needed to build the “Something”). ◦ The general goal is the same as with error management: assess failure modes, give agency to users to react efficiently. ◦ There are still plenty of cases where that technique is hard to use (fuzzy business cases…) or is not what you are looking for (you just want to tell users whether something is the nominal case or not, and give them agency to react accordingly).
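
For the “divide” example mentioned in this answer, restricting the domain of definition could look like the sketch below. `NonZeroInt` and its `parse` constructor are hypothetical names. The possible failure moves to the single place where the denominator is built, and `divide` itself no longer needs an error channel.

```scala
object TotalDivide {
  // A denominator that cannot be zero: the constructor is private,
  // so `parse` is the only way to build one.
  final class NonZeroInt private (val value: Int)
  object NonZeroInt {
    def parse(n: Int): Option[NonZeroInt] =
      if (n == 0) None else Some(new NonZeroInt(n))
  }

  // No division by zero is possible, so the result is a plain Int.
  def divide(a: Int, b: NonZeroInt): Int = a / b.value
}
```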
  124. Some questions asked after the talk • Is SystemError

    used to catch / materialize failures? ◦ No, SystemError is here to translate errors that need to be dealt with (like a connection error to the DB, an FS-related problem, etc.) but are encoded in Java as an Exception. SystemError is not used to catch Java’s “OutOfMemoryError”. Such errors kill Rudder. We use the JVM’s Thread.setDefaultUncaughtExceptionHandler to try to give more information to dev/ops and clean things up before killing the app.
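
A standard-library approximation of that split is sketched below; it is not Rudder's actual `SystemError` code. `attempt` and `installDefectHandler` are hypothetical helpers, and the shape of `SystemError` is assumed. `scala.util.control.NonFatal` does not match `VirtualMachineError` subclasses such as `OutOfMemoryError`, so those keep propagating as defects.

```scala
import scala.util.control.NonFatal

object SystemErrors {
  sealed trait RudderError { def msg: String }
  // Assumed shape: a message plus the exception that was translated.
  final case class SystemError(msg: String, cause: Throwable) extends RudderError

  type Result[A] = Either[RudderError, A]

  // Translate non-fatal exceptions into the error channel.
  // Fatal ones (OutOfMemoryError, ...) are not matched by NonFatal and propagate.
  def attempt[A](context: String)(effect: => A): Result[A] =
    try Right(effect)
    catch { case NonFatal(ex) => Left(SystemError(s"$context: ${ex.getMessage}", ex)) }

  // Last-resort hook for defects: report what we can before the app dies.
  def installDefectHandler(): Unit =
    Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler {
      def uncaughtException(t: Thread, e: Throwable): Unit =
        System.err.println(s"Defect in thread ${t.getName}: $e")
    })
}
```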
  125. Some questions asked after the talk • You have

    only one parent type for errors. Don’t you lose a lot of detail, with all the special errors in sub-systems losing their specificities when they are seen as RudderError? ◦ This is a very pertinent question, and we spent a lot of time weighing the current design against one where all sub-systems would have their own error type (with no common super type). In the end, we settled on the current design because: ▪ no common super type means no automatic inference. You need to guide it with transformers, and even if ZIO provides tooling to map errors, that means a lot of useless boilerplate that pollutes the readability of your code. ▪ there is common tooling that you really want to have for all errors (Chained, SystemError, but also “notOptional”, etc). You don’t want to rewrite it. Yes, type classes could be a solution, but you still have to write them, for no clear gain here. ▪ you are fighting the automatic categorization done by the compiler instead of leveraging it. ▪ The gain (detailed errors) is actually almost never needed. When we switched to “only one super class for all errors”, we saw that “Chained” is sufficient to deal with general trans-system cases, and in some very, very rare cases, you can build ad-hoc combinators when needed, it’s cheap. ◦ So all in all, the wins in convenience and the joy of just having everything work without boilerplate clearly outweighed the unclear gain of having different error hierarchies. ◦ The problem would have been different if Rudder was not one monolithic app, with a need for separate compilation between services. I think we would have made an “error” lib in that case.
  126. Some questions asked after the talk 126 • We use

    Future[Either[E,A]] + MTL, why should we switch to ZIO? ◦ Well, the decision to switch is yours, and I don’t know the specific context of your company well enough to give advice on that. Nonetheless, here is my personal opinion: ▪ The ZIO stack seems simpler (fewer concepts) and works perfectly with inference. Thus it may be simpler to teach to new people, and to maintain. YMMV. ▪ ZIO performance is excellent, especially regarding concurrent code. Fibers are a very nice abstraction to work with. ▪ ZIO enforces pure code, which is generally simpler to compose/refactor. ▪ ZIO tooling and related constructs (Managed resources, Async Queues, STM, etc) are a joy to code with. They remove a lot of pain in tedious, boring, complicated tasks (closing resources correctly, sync between concurrent accesses, etc) ▪ relevant stack traces in concurrent code are a major win • But at the end of the day, you decide!
  127. Some questions asked after the talk 127 • How long

    did it take to port Rudder to ZIO? ◦ It’s complicated :). 1 month of part time (me), plus lots more time for teaching, refactoring, understanding the limits of the new paradigm, etc ▪ 1/ we didn’t start from nothing. We were using Box from liftweb, and a lot of the code in Rudder was already “shaped” to deal with errors as explained in the talk (see https://issues.rudder.io/issues/14870 for context) ▪ 2/ we didn’t port all of Rudder to ZIO. I estimate that we ported ~ 40% of the code (60k-70k lines ?). ▪ 3/ we did some major refactoring along the way, using new combinators and higher level structures (like async queues) ▪ 4/ we started at the end of 2018, when ZIO code was still moving a lot, and we switched to new things when they became available (ZIO 1.0.0 is around the corner and it has been quite stable for months now) ▪ we spent quite some time looking for the best choice for errors between sub-systems (see other question)
  128. Some questions asked after the talk 128 • Your system

    part is very interesting, thank you, but what about hexagonal architecture / clean code / onion architecture / etc ? ◦ I don't really care about the exact name. What is important for me, the core idea that needs to be internalized and shared, is that in any complex construction there are specialized subparts that communicate with each other, and that high coupling really means "no subparts". ◦ and the first step, a baby step but a huge one, is to identify that these subparts exist, discuss what they are, and give them autonomy. ◦ and the first practice that helps with that is: make your system interfaces as explicit as possible
  129. Some questions asked after the talk 129 • next time

  130. OK. But why does it matter for me? 130 ?

  131. Why does it matter for me ? 131 • Social

    aspects : ◦ "good citizenship" ▪ in devops and other polycultural teams, don't be the bad guy ▪ you're likely to become the one on call at some point ◦ accountability ▪ growing tendency to make companies, and devs, legally responsible for critical bugs • Tech aspects : ◦ efficiency of development ▪ don't become a liability, especially for your future self ▪ get leverage on your code ◦ joy of development ▪ make the path of least resistance the correct one
  132. Color me interested. But in concrete terms ? 132 ?

  133. core idea: • developers are responsible for their code

    and are accountable for it • this implies a fine-grained understanding of what is handled and what is not • error cases are very numerous, so we need help: ◦ by reducing their number, ◦ by specifying which cases are expected and which are not ◦ by using types and the help of the compiler, which excels at this ◦ by adopting methods for systematic error tracing
  134. Composable code 134 Composable, in functional programming: • The ability

    to build solutions to more complex problems from solutions to simpler ones. • and be provably sure that all properties checked in the small are kept in the result. • use referential transparency, ◦ WYSIWYG of code: everything needed for the function is stated as input or output: no side effects, everything is exhaustive; global states passed as arguments
  135. Which intent is less ambiguous? 135 blobzurg(a: Int, b: Int):

    Option[Int] blobzurg(a: Int, b: Int): PureResult[DivideByZero, Int] It’s a signal: make it unambiguous, give agency
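To make the comparison above concrete, here is a small sketch of both signatures. It assumes that `blobzurg` is an integer division (suggested by the `DivideByZero` error name) and that `PureResult` behaves like `Either`; the alias, the `DivideByZero` case class and the `blobzurgOpt` name are illustrative, not taken from Rudder.

```scala
object Blobzurg {
  // Assumption: PureResult is Either-like (error on the left, value on the right).
  type PureResult[+E, +A] = Either[E, A]

  // Hypothetical error value naming the one failure mode of the function.
  final case class DivideByZero(dividend: Int)

  // Ambiguous: None could mean "no result", "invalid input", or anything else.
  def blobzurgOpt(a: Int, b: Int): Option[Int] =
    if (b == 0) None else Some(a / b)

  // Unambiguous: the return type states exactly what can go wrong.
  def blobzurg(a: Int, b: Int): PureResult[DivideByZero, Int] =
    if (b == 0) Left(DivideByZero(a)) else Right(a / b)
}
```

With the second signature the caller knows what happened and can decide what to do about it, which is the "agency" part of the signal.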
  136. 136 Effect as first class citizens - Referential transparency •

    the single thing that makes code simpler and more maintainable: ◦ referential transparency "An expression is called referentially transparent if it can be replaced with its corresponding value without changing the program's behavior"
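The definition quoted above can be checked on a toy case. In this sketch (all names are made up for illustration), replacing a pure call by its value changes nothing, while doing the same with a call that mutates hidden state changes the result.

```scala
object RefTransparency {
  // Pure: the result depends only on the input.
  def pureDouble(x: Int): Int = x * 2

  // Impure: each call mutates hidden state.
  final class Counter {
    private var n = 0
    def next(): Int = { n += 1; n }
  }

  // Two separate calls: 1 + 2
  def sumOfTwoCalls(c: Counter): Int = c.next() + c.next()

  // "Same" code after replacing the call by its value: 1 + 1
  def sumAfterSubstitution(c: Counter): Int = {
    val v = c.next()
    v + v
  }
}
```

`sumOfTwoCalls` and `sumAfterSubstitution` look like a harmless refactoring of each other, yet they disagree, because `next()` is not referentially transparent.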
  137. 137 Effect as first class citizens - Referential transparency •

    starting to use referential transparency is liberating and exhilarating • fearless refactoring, simpler code, autonomous snippets, no mutable global state, no side effects, focus on logic, … • MASSIVE win for systematic error management and actionable code
  138. 138 Effect as first class citizens - Referential transparency •

    Problem: effects ? • They break referential transparency • They force you to do this, then do that
  139. 139 Effect as first class citizens - Referential transparency •

    Solution: effects as values • lets you deconstruct the program as you see fit
  140. 140 Empty list ? • non total functions are a

    lie ("total": all inputs lead to a well-defined output, i.e. not to an exception) ◦ your promises are unsound ◦ your users can’t react appropriately head(l: List[Int]): Int Assess failure mode: Don’t lie!
  141. Don’t lie! Model output 141 Empty list ? • make

    functions total • make it unambiguous & let others decide what is an error head(l: List[Int]): Either[EmptyListError, Int] head(l: List[Int]): Option[Int] head(l: List[Int]): Int
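The three `head` signatures above can be written out as follows. This is a sketch: `EmptyListError` is named on the slide but its shape is assumed here, and the `unsafeHead` / `headOption` names are only there so the three variants can coexist.

```scala
object TotalHead {
  // Assumed shape for the error named on the slide.
  final case class EmptyListError(message: String)

  // Non-total: throws NoSuchElementException on Nil, which the type does not reveal.
  def unsafeHead(l: List[Int]): Int = l.head

  // Total: absence is modeled, the caller decides whether it is an error.
  def headOption(l: List[Int]): Option[Int] = l.headOption

  // Total and explicit: the failure mode has a name.
  def head(l: List[Int]): Either[EmptyListError, Int] = l match {
    case x :: _ => Right(x)
    case Nil    => Left(EmptyListError("head called on an empty list"))
  }
}
```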
  142. Don’t lie! 142 getUserFromDB(id: UserId): User

  143. Don’t lie! 143 No such user ? (non total) getUserFromDB(id:

    UserId): User
  144. Don’t lie! 144 No such user ? (non total) DB

    connection error? getUserFromDB(id: UserId): User
  145. Don’t lie! 145 No such user ? (non total) DB

    connection error? • non pure functions are a lie ◦ your promises are unsound ◦ your users can’t react appropriately getUserFromDB(id: UserId): User
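A sketch of a `getUserFromDB` that no longer hides the two questions raised above ("no such user?", "DB connection error?"). The error types and the `lookup` parameter are hypothetical: `lookup` stands in for a Java-style driver call that may throw or return null. Note that this version only makes the failure modes visible in the type; it still runs the lookup eagerly, so it is not an effect system.

```scala
import scala.util.control.NonFatal

object UserRepo {
  final case class UserId(value: String)
  final case class User(id: UserId, name: String)

  // Hypothetical error model: one case per failure mode we want callers to see.
  sealed trait UserError
  final case class NoSuchUser(id: UserId) extends UserError
  final case class DbConnectionError(cause: Throwable) extends UserError

  // `lookup` is an assumed stand-in for the real, exception-throwing DB access.
  def getUserFromDB(id: UserId)(lookup: UserId => User): Either[UserError, User] =
    try {
      // A null result is read as "no such user".
      Option(lookup(id)).toRight(NoSuchUser(id))
    } catch {
      // Non-fatal exceptions become values; fatal ones (OOM, etc) still propagate.
      case NonFatal(e) => Left(DbConnectionError(e))
    }
}
```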
  146. Don’t lie! 146 • non pure functions are a lie

    ◦ your promises are unsound ◦ your users can’t react appropriately getUserFromDB(id: UserId): User • use effect systems: magically transform impure code into pure code
  147. Don’t lie! 147 • assess the fact that code is

    doing side effects (remember: WYSIWYG) • make effects first class citizens: ◦ they are just values that can be transformed, stored, etc ◦ referential transparency What do we gain ?
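To illustrate "effects are just values", here is a deliberately tiny effect type written with the standard library only. It is a teaching sketch, not ZIO and not Rudder code: `Effect`, `attempt` and `unsafeRun` are hypothetical names. The point is that building and combining effects runs nothing; side effects happen only when `unsafeRun` is called.

```scala
import scala.util.control.NonFatal

object EffectAsValue {
  // A description of a computation that may fail with E or succeed with A.
  final class Effect[+E, +A](val unsafeRun: () => Either[E, A]) {
    def map[B](f: A => B): Effect[E, B] =
      new Effect[E, B](() => unsafeRun().map(f))

    def flatMap[E1 >: E, B](f: A => Effect[E1, B]): Effect[E1, B] =
      new Effect[E1, B](() => unsafeRun() match {
        case Right(a) => f(a).unsafeRun()
        case Left(e)  => Left(e)
      })
  }

  object Effect {
    // Capture side-effecting code; non-fatal exceptions land in the error channel.
    def attempt[A](code: => A): Effect[Throwable, A] =
      new Effect[Throwable, A](() =>
        try Right(code)
        catch { case NonFatal(e) => Left(e) }
      )
  }
}
```

Because an `Effect` is an ordinary value, it can be stored, passed around and reused; each run re-executes the described code.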
  148. Don’t lie! 148 • "The most disruptive simplification brought by

    FP" • WYSIWYG of code: referential transparency, i.e. replacing a method call by its result does not change the program ▪ code only works with function inputs (no global environment) ▪ the only changes are in outputs (no side effects)
  149. 149

  150. Sound promises 150 • use total functions ◦ or make

    them total with a union return type • use pure functions ◦ or make them pure with effect systems • Don’t lie in your code, • allow devs to react efficiently: • make effects simple values and profit from referential transparency
  151. 151 In Rudder: Why ZIO?

  152. Why ZIO ? 152 • you still have to think

    in systems by yourself
  153. Why ZIO ? 153 • you still have to think

    in systems by yourself • then ZIO provides : ◦ effect management ◦ with an explicit error channel ◦ IO[+E, +A] val pureCode = IO.effect(effectfulCode)
  154. Why ZIO ? 154 • you still have to think

    in systems by yourself • then ZIO provides : ◦ debuggable failures Complex error composition Async code trace
  155. Why ZIO ? 155 • you still have to think

    in systems by yourself • then ZIO provides : ◦ tons of convenience to manipulate errors ▪ create: from Option, Either, value... ▪ transform: mapError, fold, foldM, .. ▪ recovery: total, partial, or else ◦ composable effects ▪ .bracket / Managed, async queues, STM, etc • safe, composable resource management
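The create / transform / recover vocabulary above can be previewed with the standard library's `Either`, which offers similar (eager, effect-free) combinators. This is an analogy only, not the ZIO API; the config-parsing domain and every name below are invented for the example.

```scala
object ErrorCombinators {
  sealed trait ConfigError
  final case class MissingKey(key: String) extends ConfigError
  final case class NotANumber(key: String, raw: String) extends ConfigError

  // create: from an Option
  def lookup(conf: Map[String, String], key: String): Either[ConfigError, String] =
    conf.get(key).toRight(MissingKey(key))

  // compose: the first error short-circuits the rest
  def port(conf: Map[String, String]): Either[ConfigError, Int] =
    lookup(conf, "port").flatMap { raw =>
      raw.toIntOption.toRight(NotANumber("port", raw))
    }

  // transform: change the error type (similar in spirit to mapError)
  def portOrMessage(conf: Map[String, String]): Either[String, Int] =
    port(conf).left.map(e => s"Invalid configuration: $e")

  // recovery, total: every error is replaced by a default
  def portOrDefault(conf: Map[String, String]): Int =
    port(conf).fold(_ => 8080, identity)

  // recovery, partial: only a missing key is forgiven
  def portLenient(conf: Map[String, String]): Either[ConfigError, Int] =
    port(conf) match {
      case Left(MissingKey(_)) => Right(8080)
      case other               => other
    }
}
```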
  156. Why ZIO ? 156 • you still have to think

    in systems by yourself • then ZIO provides : ◦ effect management ◦ with an explicit error channel ◦ debuggable failures ◦ tons of convenience to manipulate errors ◦ composable
  157. Why ZIO ? 157 • you still have to think

    in systems by yourself • then ZIO provides : ◦ effect management ◦ with an explicit error channel ◦ debuggable failures ◦ tons of convenience to manipulate errors ◦ composable • Everything works in parallel, asynchronous code too! • Inference just works!
  158. Why ZIO ? 158 • you still have to think

    in systems by yourself • then ZIO provides : ◦ effect management ◦ with an explicit error channel ◦ debuggable failures ◦ tons of convenience to manipulate errors ◦ composable • Everything works in parallel, concurrent code too! • Inference just works! Lots of details: “Error Management: Future vs ZIO” https://www.slideshare.net/jdegoes/error-management-future-vs-zio
  159. 159 In Rudder, with ZIO: we settled on that

  160. One error hierarchy 160 • One error type (trait) providing

    common tooling
  161. Unambiguous type 161 • one result type for pure terms,

    • one that encapsulates effects
  162. Generic, useful errors 162 • java exceptions are translated into

    SystemError • Chained lets you add context for humans • Accumulated groups several errors into one
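A simplified sketch of what such generic errors could look like. The names `SystemError`, `Chained` and `Accumulated` come from the slides, but their fields, the `AppError` trait, the `Result` alias and the three helper functions are assumptions made for this example; Rudder's real definitions are ZIO-based and may differ.

```scala
import scala.util.control.NonFatal

object Errors {
  // One parent trait providing common tooling (here: just a printable message).
  trait AppError { def msg: String }

  // A Java exception that must be dealt with, translated into a value.
  final case class SystemError(context: String, cause: Throwable) extends AppError {
    def msg: String = s"$context; cause was: ${cause.getClass.getName}: ${cause.getMessage}"
  }

  // Adds human-readable context on top of an underlying error.
  final case class Chained(hint: String, underlying: AppError) extends AppError {
    def msg: String = s"$hint <- ${underlying.msg}"
  }

  // Groups several errors into one.
  final case class Accumulated(all: List[AppError]) extends AppError {
    def msg: String = all.map(_.msg).mkString(" ; ")
  }

  // Hypothetical result type for pure terms.
  type Result[+A] = Either[AppError, A]

  def attempt[A](context: String)(code: => A): Result[A] =
    try Right(code)
    catch { case NonFatal(e) => Left(SystemError(context, e)) }

  def chain[A](hint: String)(r: Result[A]): Result[A] =
    r.left.map(e => Chained(hint, e))

  // Collect every failure instead of stopping at the first one.
  def accumulate[A](rs: List[Result[A]]): Result[List[A]] = {
    val errors = rs.collect { case Left(e) => e }
    if (errors.isEmpty) Right(rs.collect { case Right(a) => a })
    else Left(Accumulated(errors))
  }
}
```

Chaining keeps the low-level cause while adding the context a human needs, so the final message reads from the user-facing problem down to the technical one.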
  163. Specialized error for subsystems 163 • real code from rudder

    ⇒ specialized errors for the LDAP subsystem ⇒ adapts the semantics of the Java lib (exceptions) to pure values that can be composed and behave like other errors in Rudder (printable information)
  164. 164 Assess failure modes. Give agency to your users and don’t forget any of them.

    You are responsible to keep promises made. • Pure, total functions: don’t lie about your promises • Explicit error channel: make it unambiguous in your types • Program to strict interfaces and protocols: use systems to materialize promises • Composition and tooling: make it extremely convenient to use • Failures vs Errors: models are false by construction