Keep Your Data Safe With Refined Types

Regardless of whether you use a statically or dynamically typed language, specifying your inputs and outputs is a very important step in system design. If you are not surgically precise in defining which data your program takes and produces, you are looking for trouble during the operation phase of the system lifecycle. Making guesses and undeclared assumptions might be easier when writing the code but will certainly bite you as your system lives in production.

Clojure, being dynamically typed, might not give you strong compile-time guarantees. But enforcing the shape of the data on system boundaries allows us to have an untyped data transformation layer and stay sane. Today, we will look into specific Clojure instruments for dealing with strongly shaped data, pitfalls, and hard lessons we’ve learned so far delivering reliable and maintainable systems in Clojure.

Oleksii Kachaiev

June 19, 2018

Transcript

  1. Keep Your Data Safe With Refined Types

    Doing Clojure, Sleeping Well ™ Oleksii Kachaiev, @kachayev
  2. @Me

    • CTO at Attendify
    • 5+ years with Clojure in production
    • Creator of Muse | Aleph & Netty contributor
    • More: protocols, algebras, Haskell, Idris
    • @kachayev on Twitter & GitHub
  3. What Am I Even Talking About?

  4. "No Types" ™ In The Wild

    I don't like the term "dynamic language". But you all know what I mean. Almost no compile-time correctness guarantees.
  5. "No Types" ™ In The Wild

    You can still do a lot and go really far. Fewer data structures require fewer checks, right? Kind of a "banned" topic in the community.
  6. What Can Possibly Go Wrong?™

  7. U Y No Types?

    We still need some kind of "types"
    • to model data in advance
    • to validate your data
    Otherwise you'll mess something up quickly.
  8. "a choice between 'you want to take your pain up front or gradually over time'"

    Source: "Clojure the Devil…is in the detail"
  9. Any data-intense application built around a model that's implemented in a dynamically typed language remains informally defined

    and requires a number of prayers quadratic to the number of undefined data types. At some point, supporting such a system becomes indistinguishable from magic.
  10. I call this "Harry-from-Hogwarts isomorphism". - Oleksii Kachaiev

  11. To Take From This Talk

    • non-defined data shape = someone's assumption
    • simple when designing, impossible when operating
    • typing data with Int and String doesn't help a lot
    • you don't have to type data transformations
    • (as long as input & output are covered)
  12. When To Validate?

    • RPC request comes in
    • RPC response comes out
    • Reading from & writing to DB (disks, caches etc)
    • Reading from & writing to Kafka (queues, logs etc)
    • And more!
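
The boundary points listed on this slide can be sketched as a small wrapper around a handler. This is only an illustration, not code from the talk: `with-boundary-checks`, `create-ticket`, and both request/response schemas are hypothetical names, and the only library calls assumed are `s/validate` and the built-in leaf schemas from `schema.core`.

```clojure
(require '[schema.core :as s])

;; Hypothetical request/response shapes, for illustration only.
(def CreateTicketRequest
  {:title s/Str
   :quantity s/Int})

(def CreateTicketResponse
  {:id s/Uuid
   :title s/Str
   :quantity s/Int})

(defn with-boundary-checks
  "Wraps `handler` so that its input is validated before the call and its
   output is validated before it leaves the system. `s/validate` returns
   the value when it matches and throws an ex-info when it does not."
  [in-schema out-schema handler]
  (fn [request]
    (s/validate in-schema request)
    (s/validate out-schema (handler request))))

;; The handler itself stays a plain, unchecked data transformation.
(def create-ticket
  (with-boundary-checks
    CreateTicketRequest
    CreateTicketResponse
    (fn [request]
      (assoc request :id (java.util.UUID/randomUUID)))))

(create-ticket {:title "Early Bird" :quantity 100})
```

Only the edges are checked here; everything between the two `s/validate` calls is free to stay untyped, which is the trade-off the talk argues for.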
  13. Introducing schema

  14. (require '[schema.core :as s])

      (def Event
        {:id s/Uuid
         :name s/Str
         :online? s/Bool
         :sits s/Num
         :tickets [{:id s/Uuid
                    :title s/Str
                    :description (s/maybe s/Str)
                    :quantity s/Num
                    (s/optional-key :price) s/Num
                    :status (s/enum :open :closed)}]})
  15. Are Those Even "Types"?

    Checking things at runtime opens a lot of doors! Idris is built on the idea that types are values. Same goes for your runtime. It's just data! You can operate on it as you need.
  16. (def TicketStatus (s/enum :open :closed))

      (def Ticket
        {:id s/Uuid
         :title s/Str
         :description (s/maybe s/Str)
         :quantity s/Num
         (s/optional-key :price) s/Num
         :status TicketStatus})

      (def Event
        {:id s/Uuid
         :name s/Str
         :online? s/Bool
         :sits s/Num
         :tickets [Ticket]})

      (def CreateTicketRequest (dissoc Ticket :id))
  17. (s/check Ticket
               {:id 42
                :title "Early Bird"
                :text "Some randomness"
                :quantity 100
                :status :skipped})
      ;; => {:id (not (instance? java.util.UUID 42))
      ;;     :description missing-required-key
      ;;     :status (not (#{:open :closed} :skipped))
      ;;     :text disallowed-key}
  18. How Safe Are We Now?

  19. Our Goals Are

    • Readability and Soundness (harder than it seems)
    • Being as precise as we can
    • Avoid as many bugs as possible
    • Provide clean and useful error messages
    • Keep serialization and business logic separated
  20. Story #1

    Fantastic nil And Where To Find It (shameless spoiler: everywhere)
  21. How To Define Optional? #1: s/maybe

    (def TicketDescription
      {:description (s/maybe s/Str)})

    ;; works
    {:description nil}
    {:description "A lot of free places!"}

    ;; doesn't
    {:description 1457}
    {:text "Really a lot!"}
  22. How To Define Optional? #1: s/maybe

    Looks good! It's "type-safe" and it's functional! (functor, wheeee) But it's still error-prone ☹
  23. How To Define Optional? #1: s/maybe

    (let [draft (read-from-db db-conn ticket-id)]
      (rpc/call "createFreeTicket"
                {:id ticket-id
                 :title (:title draft)
                 ;; note the misspelled key: the lookup quietly returns nil
                 :description (sanitize-html (:decsription draft))
                 :price default-price
                 :quantity 100}))
  24. Clojure is a nil-tolerant language.

    You will miss something. Sooner or later.
  25. Tolerant-Reader They Say

    ;; clojure.spec has "open keys space" design
    ;; meaning unknown keys are OK
    ;; combine with nil-able values
    (rpc/call "createFreeTicket"
              {:id ticket-id
               :title (:title draft)
               :decsription (sanitize-html (:description draft))
               :price default-price
               :quantity 100})
    ;; welcome to data hell
    ;; :trollface:
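
The failure mode from the last few slides can be reproduced in isolation. This is a sketch with made-up data: `draft`, `LenientTicket`, and `StrictTicket` are hypothetical names, and only `s/check`, `s/maybe`, and `s/Str` from `schema.core` are assumed. It shows that a misspelled key lookup yields `nil`, that a `s/maybe` field accepts that `nil` without complaint, and that a required non-nil field reports it.

```clojure
(require '[schema.core :as s])

;; Hypothetical draft record, as if read from a database.
(def draft {:title "Early Bird" :description "A lot of free places!"})

;; Misspelled key: keyword lookup does not throw, it just returns nil.
(def description (:decsription draft))

(def LenientTicket {:title s/Str :description (s/maybe s/Str)})
(def StrictTicket  {:title s/Str :description s/Str})

(def ticket {:title (:title draft) :description description})

;; s/check returns nil when the value matches the schema,
;; so the lenient schema lets the lost description through.
(s/check LenientTicket ticket)

;; The strict schema returns an error map with an entry under :description.
(s/check StrictTicket ticket)
```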
  26. How To Define Optional? #2: optional-key

    (def TicketDescription
      {(s/optional-key :description) s/Str})

    ;; now you have more work to do
    (cond-> ticket
      (do-i-have-description?) (assoc :description "This would be amazing!"))

    (harder to mess up, but still...)
  27. How To Define Optional? #3: All The Above!

    (def TicketDescription
      {(s/optional-key :description) (s/maybe s/Str)})

    ;; hmm... now you can pass whatever you want!
  28. How To Define Optional? #3: All The Above!

    Your API users will beg you for this! It's so super flexible! Please, just don't.
  29. How To Define Optional?

    Being "type-safe" is not a goal. Being "functional" is not a goal. Our goal is to reduce the number of errors.
  30. Define Optional With Own "Void"

    ;; domain specific voided value
    (def UnlimitedPurchase {:limited? (s/eq false)})
    (def PurchaseLimit {:limited? (s/eq true) :limit s/Num})

    ;; generic voided value
    (def NoTicketDescription {:description {:nothing (s/eq true)}})
    (def TicketDescription {:description {:just s/Str}})
  31. Story #2 Staying Precise With Constraints

  32. Be Precise!

    (def PositiveInt
      (s/constrained s/Int pos? 'should-be-positive))

    (def NonEmptyStr
      (s/constrained s/Str
                     #(not (clojure.string/blank? %))
                     'should-not-be-blank))
  33. Combine Things!

    (defn BoundedListOf [dt left right]
      (s/constrained [dt]
                     #(<= left (count %) right)
                     'collection-length-should-conform-boundaries))

    {:id s/Uuid
     :name NonEmptyStr
     :online? s/Bool
     :sits PositiveInt
     :tickets (BoundedListOf Ticket 1 25)}
  34. Express Business Rules

    (def -Event
      {:id s/Uuid
       :name s/Str
       :online? s/Bool
       :sits s/Num
       :tickets [Ticket]})

    (def Event
      (-> -Event
          (s/constrained
           (fn [{:keys [sits tickets]}]
             (>= sits (apply + (map :quantity tickets))))
           'tickets-quantities-should-not-exceed-sits-count)
          (s/constrained ...)))
  35. Story #3 Sum Types ↠ Conditionals

  36. Sum Types

    data Result a b = Ok a | Error b

    enum Result<T, E> {
        Ok(T),
        Error(E),
    }

    type result('good, 'bad) =
      | Ok('good)
      | Error('bad);
  37. Sum Types In Clojure

    (defn Result [ok error]
      (s/either {:ok ok} {:error error}))

    WARN: Deprecated!
  38. Sum Types In Clojure

    (personal opinion) schema is designed for validation, not modeling. No "difference by construction". Easy to mess up.
  39. s/conditional Instead

  40. Just Specify Discriminator

    (defn Result [ok error]
      (s/conditional
       #(contains? % :ok) {:ok ok}
       #(contains? % :error) {:error error}))

    (better, but still...)
  41. Example #2.1 Free or Paid?

  42. This Is Bad :(

    (def Ticket
      {:id Id
       :type (s/enum "free" "paid")
       :name NonEmptyStr
       :quantity (TypedRange int 1 1e4)
       :description (Maybe NonEmptyStr)
       (s/optional-key :priceInCents) PositiveInt
       (s/optional-key :taxes) [Tax]
       (s/optional-key :fees) (s/enum :absorb :pass)
       :status (s/enum :open :closed)})
  43. Way Better!

    (def FreeTicket
      {:id Id
       :type (s/eq "free")
       :title NonEmptyStr
       :quantity (TypedRange int 1 1e4)
       :description (Maybe NonEmptyStr)
       :status (s/enum :open :closed)})

    (def PaidTicket
      (assoc FreeTicket
             :type (s/eq "paid")
             :priceInCents PositiveInt
             :taxes [Tax]
             :fees (s/enum :absorb :pass)))
  44. After Cosmetic Changes...

    (def Ticket
      (s/conditional
       #(= "free" (:type %)) FreeTicket
       #(= "paid" (:type %)) PaidTicket))

    turned into

    (def Ticket
      (dispatch-on :type
                   "free" FreeTicket
                   "paid" PaidTicket))
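
The slide does not show how `dispatch-on` is built. One way such a helper could be written on top of `s/conditional` is sketched below; `dispatch-on*` is a hypothetical name and this is not the talk's or the library's implementation. It assumes only that `s/conditional` takes alternating predicate/schema arguments and accepts `:else` as the final predicate. The ticket schemas are trimmed-down stand-ins.

```clojure
(require '[schema.core :as s])

(defn dispatch-on*
  "Hypothetical helper: builds an s/conditional schema that picks a branch
   by comparing (key-fn value) with each listed dispatch value.
   A dispatch value of :else is passed through as the catch-all branch."
  [key-fn & value-schema-pairs]
  (apply s/conditional
         (mapcat (fn [[value schema]]
                   [(if (= :else value)
                      :else
                      (fn [x] (= value (key-fn x))))
                    schema])
                 (partition 2 value-schema-pairs))))

;; Simplified stand-ins for the ticket schemas from the slides.
(def FreeTicket {:type (s/eq "free") :title s/Str})
(def PaidTicket (assoc FreeTicket
                       :type (s/eq "paid")
                       :priceInCents s/Int))

(def Ticket
  (dispatch-on* :type
                "free" FreeTicket
                "paid" PaidTicket))

(s/check Ticket {:type "free" :title "Early Bird"})
(s/check Ticket {:type "paid" :title "VIP"})
```

Because `key-fn` is an arbitrary function, the same sketch also covers dispatching on `(juxt :hasNext :hasPrev)` with vector dispatch values.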
  45. Example #2.2 Scroll API

  46. (def EmptyScrollableList
        {:items (s/eq [])
         :totalCount (s/eq 0)
         :hasNext (s/eq false)
         :hasPrev (s/eq false)
         :nextPageCursor (s/eq nil)
         :prevPageCursor (s/eq nil)})

      (defn NonEmptyScrollableList [dt]
        (dispatch-on (juxt :hasNext :hasPrev)
                     [false false] (SinglePage dt)
                     [true false] (FirstPage dt)
                     [false true] (LastPage dt)
                     [true true] (ScrollableListSlice dt)))

      (defn ScrollableList [dt]
        (dispatch-on :totalCount
                     0 EmptyScrollableList
                     :else (NonEmptyScrollableList dt)))
  47. Refine All The Types!

  48. How Are We Doing So Far?

  49. (def -Ticket
        {:id s/Uuid
         :title s/Str
         :description (s/maybe s/Str)
         :quantity s/Num
         (s/optional-key :price) s/Num
         :status (s/enum :open :closed)})

      (def Ticket
        (s/constrained -Ticket
                       (fn [{:keys [quantity status]}]
                         (or (= :closed status) (< 0 quantity)))))

      (def CreateTicketRequest (dissoc Ticket :id :status))
  50. So Far So Good?

    (s/check CreateTicketRequest
             {:title "Works?"
              :description "Probably not :("
              :quantity 10})
    ;;=> {:id missing-required-key, :status missing-required-key}

    ;; but why?
    (class -Ticket) ;;=> clojure.lang.PersistentArrayMap
    (class Ticket)  ;;=> schema.core.Constrained
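
The surprise on this slide comes from how `dissoc` behaves on a record: `schema.core.Constrained` is a record wrapping the map, so removing `:id` or `:status` (which are not fields of that record) leaves the wrapped map untouched. The sketch below reproduces this with a shortened schema and shows one manual workaround, assuming you are willing to restate the rule for the reduced shape. `open-ticket-has-quantity?` and the `'quantity-should-be-positive` label are names made up for this example.

```clojure
(require '[schema.core :as s])

(def -Ticket
  {:id s/Uuid
   :title s/Str
   :quantity s/Num
   :status (s/enum :open :closed)})

;; Hypothetical named predicate for the business rule on the slide.
(defn open-ticket-has-quantity? [{:keys [quantity status]}]
  (or (= :closed status) (< 0 quantity)))

(def Ticket (s/constrained -Ticket open-ticket-has-quantity?))

;; :id and :status are not fields of the Constrained record,
;; so dissoc hands back a record equal to the original one.
(= Ticket (dissoc Ticket :id :status))

;; Manual workaround: reduce the plain map first, then constrain again
;; with a rule that only mentions keys that are still present.
(def CreateTicketRequest
  (s/constrained (dissoc -Ticket :id :status)
                 #(pos? (:quantity %))
                 'quantity-should-be-positive))

(s/check CreateTicketRequest {:title "Works?" :quantity 10})
```

Doing this by hand for every derived schema is exactly the bookkeeping that a map-like schema type with tracked guards is meant to take over.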
  51. None
  52. [com.attendify/schema-refined "0.3.0-alpha4"]

  53. schema-refined

    • https://github.com/KitApps/schema-refined
    • schema on steroids (a lot of them)
    • refined: constrained on steroids
    • Struct: product types (maps) on steroids
    • StructDispatch: conditional on steroids
  54. Very Motivational Examples

  55. Predicates

    ;; "manually" with refined and predicates
    (def LatCoord (r/refined double (r/OpenClosedInterval -90.0 90.0)))

    ;; the same using built-in types
    ;; (or functions to create types from other types, a.k.a. generics)
    (def LngCoord (r/OpenClosedIntervalOf double -180.0 180.0))

    ;; Product type using a simple map
    (def GeoPoint {:lat LatCoord :lng LngCoord})

    ;; using built-in types
    (def Route (r/BoundedListOf GeoPoint 2 50))
  56. Now What?

    (def input [{:lat 48.8529 :lng 2.3499}
                {:lat 51.5085 :lng -0.0762}
                {:lat 40.0086 :lng 28.9802}])

    ;; Route now is a valid schema,
    ;; so you can use it as any other schema
    (s/check Route input)
  57. Predicates... More!

    (def InZurich
      {:lat (r/refined double (r/OpenInterval 47.34 47.39))
       :lng (r/refined double (r/OpenInterval 8.51 8.57))})

    (def InRome
      {:lat (r/refined double (r/OpenInterval 41.87 41.93))
       :lng (r/refined double (r/OpenInterval 12.46 12.51))})
  58. Predicates... Compose!

    ;; you can use schemas as predicates
    ;; First/Last are good examples of predicate "generics"
    (def RouteFromZurich (r/refined Route (r/First InZurich)))
    (def RouteToRome (r/refined Route (r/Last InRome)))

    ;; And, Or, Not, On
    (def RouteFromZurichToRome
      (r/refined Route (r/And (r/First InZurich) (r/Last InRome))))
  59. Predicates... Compose!

    ;; or even more
    (def FromZurichToRome (r/And (r/First InZurich) (r/Last InRome)))

    (defn LessNHops [n]
      (r/BoundedSize 2 (+ 2 n)))

    ;; the hop limit is passed explicitly: 3, as the schema name says
    (def RouteFromZurichToRomeWithLess3Hops
      (r/refined Route (r/And FromZurichToRome (LessNHops 3))))
  60. Readability Matters. A Lot

    ;; following the rule
    ;; {v: T | P(v)}

    (def Coord (refined double (OpenClosedInterval -180.0 180.0)))
    ;; #Refined{v: double | v ∈ (-180.0, 180.0]}

    (def QuickRoute (BoundedListOf double 2 4))
    ;; #Refined{v: [double] | (count v) ∈ [2, 4]}

    (refined [double] (Rest (OpenInterval 0 1)))
    ;; #Refined{v: [double] | ∀v' ∊ (rest v): v' ∈ (0, 1)}
  61. Products Sums Guards

  62. (def -FreeTicket
        (Struct
         :id Id
         :type (s/eq "free")
         :title NonEmptyStr
         :quantity (OpenIntervalOf int 1 1e4)
         :description (s/maybe NonEmptyStr)
         :status (s/enum :open :closed)))

      (def FreeTicket
        (guard -FreeTicket '(:quantity :status) enough-sits-when-open))

      ;; #<StructMap {:description (constrained Str should-not-be-blank)
      ;;              :type (eq "free")
      ;;              :title (constrained Str should-not-be-blank)
      ;;              :status (enum :open :closed)
      ;;              :id java.lang.String
      ;;              :quantity (constrained int should-be-bounded-by-range-given)}
      ;;   Guarded with
      ;;     enough-sits-when-open over '(:quantity :status)>
  63. Carry Guards Carefully

    (def -PaidTicket
      (assoc FreeTicket
             :type (s/eq "paid")
             :priceInCents PositiveInt
             :taxes [Tax]
             :fees (s/enum :absorb :pass)))

    (def PaidTicket
      (guard -PaidTicket '(:taxes :fees) pass-tax-included))

    ;; #<StructMap {...}
    ;;   Guarded with
    ;;     enough-sits-when-open over '(:quantity :status)
    ;;     pass-tax-included over '(:taxes :fees)>
  64. Respectful Sum Type

    (def Ticket
      (StructDispatch :type
                      "free" FreeTicket
                      "paid" PaidTicket))

    ;; #<StructDispatch on '(:type):
    ;;     free => {...}
    ;;     paid => {...}>

    (def CreateTicketRequest (dissoc Ticket :id :status))
    ;; this works!
  65. Track Guards Applicability

    (dissoc PaidTicket :status)
    ;; #<StructMap {...}
    ;;   Guarded with
    ;;     pass-tax-included over '(:taxes :fees)>
    ;; (only one guard left)
  66. Prevent Self-Shooting

  67. Catch Modeling Issues In Advance

    (def CreateFreeTicket (dissoc Ticket :type))
    ;; CompilerException java.lang.IllegalArgumentException:
    ;; You are trying to dissoc key ':type'
    ;; that is used in dispatch function.
    ;; Even thought it's doable theoretically,
    ;; we are kindly encourage you
    ;; to avoid such kind of manipulations.
    ;; Otherwise it's gonna be a mess.
    ;; , compiling:(form-init467997445288647843.clj:1:23)
  68. Philosophical

    Extending a type (assoc, merge etc) is simpler
    • by implementation
    • to catch mentally
    We still fully support reduction (dissoc).
    "Request" types are a perfect use case.
  69. What's Inside StructMap

    ;; potemkin's helper creates a type that acts like a map
    (def-map-type StructMap
      [data   ;; <- key/value pairs themselves
       guards ;; <- guards appended
       mta]   ;; <- meta information
      (meta [_] ...)
      (with-meta [_ m] ...)
      (keys [_] ...)
      (assoc [_ k v] ...)
      (dissoc [_ k] ...)
      (get [_ k default-value] ...))
  70. What's Inside

    (def-map-type StructDispatchMap
      [keys-slice
       downstream-slice ;; <- keys slices collected from options
       dispatch-fn
       options
       guards
       updates ;; <- delayed assoc, dissoc operations
       mta]
      ...)
  71. Protocol To Deal With Guards

    (defprotocol Guardable
      (append-guard [this guard])
      (get-guards [this]))
  72. Put Everything Together

    (extend-type StructMap ;; same for StructDispatch
      Guardable
      (append-guard [^StructMap this guard])
      (get-guards [^StructMap this])
      s/Schema
      (spec [this] this)
      (explain [^StructMap this])
      schema-spec/CoreSpec
      (subschemas [^StructMap this])
      (checker [^StructMap this params]))

    (defmethod print-method StructMap ;; same for StructDispatch
      [^StructMap struct ^java.io.Writer writer])
  73. So What?..

  74. What Do We Have Now?

    • Fewer tests (way-way-way fewer)
    • Fewer bugs (way-way-way fewer)
    • More confidence
    • Better sleep
  75. Next...

  76. Error Messages

    • Clean & friendly errors are hard
    • You should invest a lot from the very beginning
    • .. just this to make it happen
    • Context sensitivity is super useful
    • Machines and humans crave different messages
  77. More Features

    • Separated "business" and "serialization" logic
    • Catch more "unreasonable" predicates
    • Support for "generics"
    • ... functions are not always the best fit
  78. Thanks! Q&A please