Refine your Scala code

Refine your Scala Code Tech Triveni 2.0 | New Delhi
Ajay Viswanathan

About me
- Sr. Software Engineer @ Agoda
- My FP journey has taken me through Big Data, BackEnd and FrontEnd
- Learning FP and Scala for the past 5 years
- Reach out to me on my blog, scala.ninja
- This presentation was written in Markdown and compiled using marp-cli
- Some code examples may require Scala 2.13 or Dotty

Motivation for this talk
In my experience of working with teams building products in Scala, my #1 gripe has always been:
- If you wanted to write it that way, why didn't you do it in Java? Why even use Scala?
- Instead, I would rather write it using <insert XYZ pattern/library>
A math problem can have multiple solutions, but the elegant ones are sparse. Similarly, merely idiomatic Scala code is not enough, especially if you don't utilize the full potential of what is available.

Controversial statement alert!
Scala is like a Liberal Oligarchy
- Liberal, because it accepts a wide variety of coding styles without much complaint
- Oligarchy, because the core language is in the hands of a few people and cannot keep up with the advancements all around it

Typelevel 101
Before I head into the main topic of this presentation, allow me to introduce a few terms that will come in handy in keeping up with some of the concepts.
- Scala is powered by an under-appreciated type system
- Without getting tangled up in terms like SKI Combinator Calculus, let me just say that Scala's type system is Turing complete
- This comes in handy when the compiler has to make inferences about code that carries no explicit type annotations
- There is nothing magical about working with types. It is based on inductive logic
- I make no pretense of mathematical formality in the oversimplifications that follow

A statically typed language like Scala has two domains: a type domain and a value domain.

    case class Name(firstName: String, lastName: String)
    val n1 = Name("ajay", "viswanathan")
    // Name -> Type
    // n1   -> Value

Like classes, types can also have constructors

    type IntMap[T] = Map[Int, T]

The key concept to note is that types can be reasoned about at compile time, whereas values can only be reasoned about at runtime.
What this implies is that typelevel programming allows us to catch errors at compile time which would otherwise have to be dealt with at runtime.
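A minimal sketch of that difference, using only the standard library. The names CompileVsRuntime, headOrThrow and safeHead are made up for illustration; `::` is the standard library's non-empty list cell.

```scala
object CompileVsRuntime {
  // Value-level reasoning: List[Int] admits Nil, so a bad input is only
  // discovered when this line actually runs (NoSuchElementException).
  def headOrThrow(xs: List[Int]): Int = xs.head

  // Type-level reasoning: `::[Int]` is the type of non-empty lists,
  // so the compiler rejects safeHead(Nil) before the program ever runs.
  def safeHead(xs: ::[Int]): Int = xs.head

  val ok: Int = safeHead(::(1, List(2, 3)))
  // safeHead(Nil) // does not compile: type mismatch
}
```

The second signature costs nothing at runtime; it just moves the failure to the place where it is cheapest to fix.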

Design Pattern #1: Phantom Types
- An abstract type that is never instantiated, hence has no effect at runtime
- It is only used to prove static properties using type evidence
- The compiler erases these once it can prove that the constraints hold

    sealed trait Status
    sealed trait Red extends Status
    sealed trait Orange extends Status
    sealed trait Green extends Status

    class TrafficSignal[T <: Status] {
      // The state lives only in the type parameter, so a cast is all we need
      private def to[U <: Status]: TrafficSignal[U] = this.asInstanceOf[TrafficSignal[U]]
      def stop(implicit ev: T =:= Orange): TrafficSignal[Red] = to[Red]
      def start(implicit ev: T =:= Red): TrafficSignal[Orange] = to[Orange]
      def go(implicit ev: T =:= Orange): TrafficSignal[Green] = to[Green]
      def slow(implicit ev: T =:= Green): TrafficSignal[Orange] = to[Orange]
    }

    val signal = new TrafficSignal[Red]
    signal.stop     // Compilation Error: Cannot prove that TrafficLight.Red =:= TrafficLight.Orange
    signal.start.go // Compilation Successful

Design Pattern #2: Path dependent types
- Scala allows you to define types inside a type
- This is useful when only partial type information is available at compile time
- The runtime type can be encapsulated

    case class Cart(user: String, value: Int = 0) {
      case class Item(name: String, amount: Int)
      def add(item: this.Item): Cart = copy(value = value + item.amount)
    }

    val cart = Cart("suresh")
    val item: cart.Item = cart.Item("perk", 2)
    val valid = cart.add(item)              // Cart(suresh, 2)
    val invalid = Cart("ramesh").add(item)  // Compilation Error: item belongs to a different cart

Design Pattern #2, Path Dependent Types Redux: the Aux pattern
- We don't always have the luxury of knowing certain data types at compile time
- When we want a function to depend on a runtime type, we can use typeclasses to work around this problem
- Say there exists a function that looks up a typeclass based on the runtime value

    trait Param[T, U] {
      def convert(obj: T): Option[U]
    }
    trait Predicate[U] {
      def eval(values: List[U], value: U): Boolean
    }

    def evaluate[T, U](request: T, values: List[U])(implicit param: Param[T, U], predicate: Predicate[U]): Boolean = {
      param.convert(request).exists(v => predicate.eval(values, v))
    }

    import scala.util.Try

    implicit val stringPredicate: Predicate[String] = new Predicate[String] {
      def eval(values: List[String], value: String): Boolean = values.contains(value)
    }
    implicit val intPredicate: Predicate[Int] = new Predicate[Int] {
      def eval(values: List[Int], value: Int): Boolean = values.forall(_ > value)
    }
    implicit val stringIntParam: Param[String, Int] = new Param[String, Int] {
      def convert(obj: String): Option[Int] = Try(obj.toInt).toOption
    }
    ...
    evaluate("3", List(2, 3, 4)) // false
    evaluate("3", List(4, 5, 6)) // true

We could rewrite the same using path dependent types as

    trait Param[T] {
      type U
      def convert(obj: T): Option[U]
    }

    // Does not compile in Scala 2: param.U is referenced in the same parameter list that declares param
    def evaluate[T, U](request: T, values: List[U])(implicit param: Param[T], predicate: Predicate[param.U])

We introduce the Aux pattern here to help the compiler with the type inference

    object Param {
      type Aux[T0, U0] = Param[T0] { type U = U0 }
    }

    def evaluate[T, U](request: T, values: List[U])(implicit ev: Param.Aux[T, U], predicate: Predicate[U]): Boolean = {
      ev.convert(request).exists(v => predicate.eval(values, v))
    }
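Putting the pieces of the last three slides together, here is a self-contained sketch of the Aux version that should compile as-is. The wrapper object AuxDemo and the placement of the instances in companion objects are my assumptions for the sake of a runnable example; the traits and evaluate follow the slides.

```scala
import scala.util.Try

object AuxDemo {
  trait Param[T] {
    type U
    def convert(obj: T): Option[U]
  }
  object Param {
    // Lifts the type member U into a type parameter so it can be unified with List[U]
    type Aux[T0, U0] = Param[T0] { type U = U0 }

    implicit val stringToInt: Param.Aux[String, Int] = new Param[String] {
      type U = Int
      def convert(obj: String): Option[Int] = Try(obj.toInt).toOption
    }
  }

  trait Predicate[U] {
    def eval(values: List[U], value: U): Boolean
  }
  object Predicate {
    implicit val intPredicate: Predicate[Int] = new Predicate[Int] {
      def eval(values: List[Int], value: Int): Boolean = values.forall(_ > value)
    }
  }

  def evaluate[T, U](request: T, values: List[U])(
      implicit ev: Param.Aux[T, U], predicate: Predicate[U]): Boolean =
    ev.convert(request).exists(v => predicate.eval(values, v))
}
```

With T and U fixed by the first parameter list, the compiler only has to find a Param.Aux[String, Int] and a Predicate[Int], which is an ordinary implicit search.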

Design Pattern #3: Singleton types
- Every singleton object in Scala is its own type
- Singleton types bridge the gap between the value level and the type level

    object Data {
      val v1: String = "data"
    }
    object FakeData {
      val v1: String = "virus"
    }

    def process(obj: Data.type) = println(obj.v1)
    process(Data)     // data
    process(FakeData) // Does not compile: FakeData.type is not Data.type

Design Pattern #3 (Experimental): Literal-Based Singleton Types
- SIP-23: proposal to integrate with Scala 3
- Will allow literals to appear in type position

    val one: 1 = 1

Sample use case: matrix multiplication

    type Dim = Singleton & Int

    case class Matrix[A <: Dim, B <: Dim](a: A, b: B) {
      def *[C <: Dim](other: Matrix[B, C]): Matrix[A, C] = Matrix(a, other.b)
    }
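A usage sketch for the matrix example, written so that it should also compile on Scala 2.13. Two things here are my assumptions and not from the slide: the wrapper object MatrixDemo, and spelling the bound as `Int with Singleton` (the `&` form is Dotty syntax). The dimensions are passed as explicit type arguments to keep inference out of the picture.

```scala
object MatrixDemo {
  // A dimension is an Int whose exact value is known in its type
  type Dim = Int with Singleton

  case class Matrix[A <: Dim, B <: Dim](a: A, b: B) {
    // Only a matrix whose row count is our column count B is accepted
    def *[C <: Dim](other: Matrix[B, C]): Matrix[A, C] = Matrix[A, C](a, other.b)
  }

  val left  = Matrix[2, 3](2, 3)
  val right = Matrix[3, 4](3, 4)
  val product: Matrix[2, 4] = left * right

  // right * left // does not compile: found Matrix[2, 3], required Matrix[4, ?]
}
```

The dimension mismatch never reaches runtime; it is a plain type error at the call site.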

Designing your Data Types
Problems with stringly-typed code
- Prone to bad inputs if not error-checked
- Have to handle plenty of meaningless inputs as edge cases
- Much of enterprise code ends up relying on trusting developers not to break it
Solution: make bad input impossible to construct
- Why should we write code that handles what it shouldn't be handling?
- It only leads to more tests, more stupid tests
What are the issues you may face with such a data domain?

    case class Timestamp(date: String, hour: Int)

    case class Timestamp(date: String, hour: Int)

So many questions:
- What is the expected date format?
- Are all formats handled?
- Day/Month/Year???
- Is the hour in 12h or 24h format?
- Is the hour 0-indexed?

Let's try to fix this

    // With self-checks
    def verifyDate(date: String, format: String = "yyyyMMdd"): Boolean

    case class Timestamp(date: String, hour: Int) {
      require(hour > 0 && hour <= 24)
      require(verifyDate(date))
    }

    // With structural enforcement (the date is built in the same yyyyMMdd format that verifyDate expects)
    object Timestamp {
      def apply(year: Int, month: Int, day: Int, hour: Int): Timestamp =
        Timestamp(f"$year%04d$month%02d$day%02d", hour)
    }
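The slide leaves verifyDate without a body. A minimal runnable sketch, assuming java.time is an acceptable way to check the format (the slide does not say how the author implemented it); the wrapper object TimestampDemo is hypothetical, and the hour range is copied from the slide as-is.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

object TimestampDemo {
  // Assumption: a date is "valid" if java.time can parse it with the given pattern
  def verifyDate(date: String, format: String = "yyyyMMdd"): Boolean =
    Try(LocalDate.parse(date, DateTimeFormatter.ofPattern(format))).isSuccess

  final case class Timestamp(date: String, hour: Int) {
    // Self-checks: construction throws IllegalArgumentException on bad input
    require(hour > 0 && hour <= 24, s"hour out of range: $hour")
    require(verifyDate(date), s"not a yyyyMMdd date: $date")
  }

  object Timestamp {
    // Structural enforcement: callers cannot pick a date format at all
    def apply(year: Int, month: Int, day: Int, hour: Int): Timestamp =
      Timestamp(f"$year%04d$month%02d$day%02d", hour)
  }
}
```

Both variants still fail at runtime, only earlier; that is the gap the later slides on refinement types try to close.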

Opaque vs Transparent Design
- The first consideration while designing a DataType is the level of flexibility or encapsulation you wish to provide
- In other words, how opaque or transparent do you want your DataType to be?

    // Opaque DataType: callers only see what is derived from index and input
    class ParseError(index: Int, input: String) {
      def line(): Int = ???
      def position(): Int = ???
      def message(): String = ???
    }

    // Transparent DataType: callers supply and see every field
    case class ParseError(line: Int, position: Int, message: String)
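To make the contrast concrete, here is one way the opaque variant could be filled in. How line, position and message are derived is my assumption (1-based line and column counted from the input), as are the names ParseErrorDemo, OpaqueParseError and TransparentParseError.

```scala
object ParseErrorDemo {
  // Opaque: the raw index and input stay hidden, callers only get derived values
  final class OpaqueParseError(index: Int, input: String) {
    require(index >= 0 && index <= input.length, "index must point into the input")

    private val consumed: String = input.take(index)

    // 1-based line number: one more than the newlines seen before the index
    def line: Int = consumed.count(_ == '\n') + 1

    // 1-based column: distance from the start of the current line
    def position: Int = index - (consumed.lastIndexOf('\n') + 1) + 1

    def message: String = s"[ERROR] Parse failure at line $line, column $position"
  }

  // Transparent: nothing stops a caller from building a nonsensical value
  final case class TransparentParseError(line: Int, position: Int, message: String)

  val nonsense = TransparentParseError(-1, -3, "random log")
}
```

The opaque class cannot produce a negative line or an unformatted message, but a reader now has to look inside it to know what it computes and when.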

Opacity and Transparency is a Spectrum
Opacity enforces invariants
- ParseError(-1, -3, "random log") vs ParseError.message = "[ERROR] Message"
Opacity can save on defensiveness
- Transparent design can enforce invariants too, but only by adding asserts and other checks, using self-checks or structural enforcement
Transparency reduces complexity
- Is the opaque data computed at initialization? Recomputed? Cached?
- How much memory does it use in storing the input?

Example of a bad data domain

    case class Request(guid: String)

Approach #1: Using a type alias

    type GUID = String
    case class Request(guid: GUID)

- Does nothing more than make it easier to read

Approach #2: Using a (value) class

    case class GUID(value: String)

- The compiler can check for the right instance, but there is still nothing in terms of safety of input
- Boxing overhead

    class GUID(val value: String) extends AnyVal

- Still doesn't solve the validation problem
- Value classes can still get instantiated when used in collections, method signatures etc.

Approach #3: Using Smart Constructors

    case class GUID private (value: String)
    object GUID {
      def apply(value: String): Option[GUID] =
        if (value.startsWith("valid")) Some(new GUID(value)) else None
    }

    new GUID("valid")                     // Does not compile - Good!
    GUID("valid")                         // Some(GUID(valid)) - Validations work - Great!!
    GUID("valid").map(_.copy("invalid"))  // Some(GUID(invalid)) - Copy constructor is vulnerable - Bad!!!

- By this time, we've already introduced immense complexity to a simple DataType
- A return type of Option is a bit frustrating for developers to handle
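For completeness, one commonly used way to close the copy loophole is the sealed abstract case class idiom. This is not from the talk (the slides move on to refinement types instead); it is sketched here only to show how much ceremony the smart-constructor route ends up needing. SmartConstructorDemo is a hypothetical wrapper.

```scala
object SmartConstructorDemo {
  // `abstract` stops the compiler from generating apply and copy;
  // `sealed` plus the private constructor stops anyone else from subclassing.
  sealed abstract case class GUID private (value: String)

  object GUID {
    // The only way in: validation happens here, then an anonymous subclass is created
    def apply(value: String): Option[GUID] =
      if (value.startsWith("valid")) Some(new GUID(value) {}) else None
  }

  // GUID("valid").map(_.copy("invalid")) // does not compile: copy is not a member of GUID
}
```

Pattern matching, equality and toString still behave like a case class, but the Option in the return type remains.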

The Narrow-Widen approach

    String => GUID                                    not all strings are valid
    String => Option[GUID]                            widen the output to accommodate
    List[Char] => Option[GUID]
    NonEmptyList[Char] => Option[GUID]                slight refinement of the input
    NonEmptyList[HexDigit] => Option[GUID]            further refining the input
    ThirtyTwoList[HexDigit] => Option[GUID] => GUID   total refinement of the input, plus un-widening of the output

When you want to narrow the input, you end up widening the output, but if you narrow the input enough, you can un-widen the output.

Approach #4: Refining the constructor
A refinement type is basically Base Type + Predicate
Subtyping of refinements
- type T + P <: T
- type T + P <: type T + Q if, for all t in T + P, Q(t) is true
E.g. A = the set of Int > 10, B = the set of Int > 5, so A <: B
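The "Base Type + Predicate" idea can be sketched by hand in a few lines. This is not the Refined library's implementation, and every name below (RefinementSketch, Check, Checked, refine, widen, GreaterThan10, GreaterThan5) is hypothetical; it only mirrors the shape of the idea using the Int > 10 and Int > 5 example from the slide.

```scala
object RefinementSketch {
  // A predicate over base type T, identified by the marker type P
  trait Check[T, P] {
    def holds(t: T): Boolean
    def describe: String
  }

  // Base type + predicate: a T that is known to satisfy P.
  // The constructor is private to this object, so `refine` is the only way in.
  final class Checked[T, P] private[RefinementSketch] (val value: T)

  final class RefinePartiallyApplied[P] {
    def apply[T](t: T)(implicit check: Check[T, P]): Either[String, Checked[T, P]] =
      if (check.holds(t)) Right(new Checked[T, P](t))
      else Left(s"$t does not satisfy ${check.describe}")
  }
  def refine[P]: RefinePartiallyApplied[P] = new RefinePartiallyApplied[P]

  sealed trait GreaterThan10
  sealed trait GreaterThan5

  object Check {
    implicit val gt10: Check[Int, GreaterThan10] = new Check[Int, GreaterThan10] {
      def holds(t: Int): Boolean = t > 10
      val describe: String = "Int > 10"
    }
    implicit val gt5: Check[Int, GreaterThan5] = new Check[Int, GreaterThan5] {
      def holds(t: Int): Boolean = t > 5
      val describe: String = "Int > 5"
    }
  }

  // A <: B from the slide: every Int > 10 is also an Int > 5, so no re-check is needed
  def widen(a: Checked[Int, GreaterThan10]): Checked[Int, GreaterThan5] =
    new Checked[Int, GreaterThan5](a.value)
}
```

Here the subtyping rule has to be written out as an explicit widen function; a real refinement library can derive such implications for its built-in predicates.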

    type HexDigit = Char Refined LetterOrDigit
    type ThirtyTwoList[T] = List[T] Refined Size[Equal[32]]  // Size takes a predicate on the size
    type GUID = ThirtyTwoList[HexDigit]

    case class Request(guid: GUID)

    val r1 = Request("2312k3j123...123dasd")           // Compiles
    val r2 = Request("asdas_asda_21#!@##$@#$...2234")  // Does not compile

    // At runtime
    refineV[GUID]("asdasd2323...12312asd")  // Either[String, GUID]

- The refinement types are erased at compile time, leaving you with only primitives
- At runtime, predicates are computed on values as would normally be the case

You can do much more with Refined
The library comes with these predefined predicates

Boolean
- True: constant predicate that is always true
- False: constant predicate that is always false
- Not[P]: negation of the predicate P
- And[A, B]: conjunction of the predicates A and B
- Or[A, B]: disjunction of the predicates A and B
- Xor[A, B]: exclusive disjunction of the predicates A and B
- Nand[A, B]: negated conjunction of the predicates A and B
- Nor[A, B]: negated disjunction of the predicates A and B
- AllOf[PS]: conjunction of all predicates in PS
- AnyOf[PS]: disjunction of all predicates in PS
- OneOf[PS]: exclusive disjunction of all predicates in PS

Char
- Digit: checks if a Char is a digit
- Letter: checks if a Char is a letter
- LetterOrDigit: checks if a Char is a letter or digit
- LowerCase: checks if a Char is a lower case character
- UpperCase: checks if a Char is an upper case character
- Whitespace: checks if a Char is white space

Collection
- Contains[U]: checks if a Traversable contains a value equal to U
- Count[PA, PC]: counts the number of elements in a Traversable which satisfy the predicate PA and passes the result to the predicate PC
- Empty: checks if a Traversable is empty
- NonEmpty: checks if a Traversable is not empty
- Forall[P]: checks if the predicate P holds for all elements of a Traversable
- Exists[P]: checks if the predicate P holds for some elements of a Traversable
- Head[P]: checks if the predicate P holds for the first element of a Traversable
- Index[N, P]: checks if the predicate P holds for the element at index N of a sequence
- Init[P]: checks if the predicate P holds for all but the last element of a Traversable
- Last[P]: checks if the predicate P holds for the last element of a Traversable
- Tail[P]: checks if the predicate P holds for all but the first element of a Traversable
- Size[P]: checks if the size of a Traversable satisfies the predicate P
- MinSize[N]: checks if the size of a Traversable is greater than or equal to N
- MaxSize[N]: checks if the size of a Traversable is less than or equal to N

Generic
- Equal[U]: checks if a value is equal to U

Numeric
- Less[N]: checks if a numeric value is less than N
- LessEqual[N]: checks if a numeric value is less than or equal to N
- Greater[N]: checks if a numeric value is greater than N
- GreaterEqual[N]: checks if a numeric value is greater than or equal to N
- Positive: checks if a numeric value is greater than zero
- NonPositive: checks if a numeric value is zero or negative
- Negative: checks if a numeric value is less than zero
- NonNegative: checks if a numeric value is zero or positive
- Interval.Open[L, H]: checks if a numeric value is in the interval (L, H)
- Interval.OpenClosed[L, H]: checks if a numeric value is in the interval (L, H]
- Interval.ClosedOpen[L, H]: checks if a numeric value is in the interval [L, H)
- Interval.Closed[L, H]: checks if a numeric value is in the interval [L, H]
- Modulo[N, O]: checks if an integral value modulo N is O
- Divisible[N]: checks if an integral value is evenly divisible by N
- NonDivisible[N]: checks if an integral value is not evenly divisible by N
- Even: checks if an integral value is evenly divisible by 2
- Odd: checks if an integral value is not evenly divisible by 2
- NonNaN: checks if a floating-point number is not NaN

String
- EndsWith[S]: checks if a String ends with the suffix S
- IPv4: checks if a String is a valid IPv4
- IPv6: checks if a String is a valid IPv6
- MatchesRegex[S]: checks if a String matches the regular expression S
- Regex: checks if a String is a valid regular expression
- StartsWith[S]: checks if a String starts with the prefix S
- Uri: checks if a String is a valid URI
- Url: checks if a String is a valid URL
- Uuid: checks if a String is a valid UUID
- ValidByte: checks if a String is a parsable Byte
- ValidShort: checks if a String is a parsable Short
- ValidInt: checks if a String is a parsable Int
- ValidLong: checks if a String is a parsable Long
- ValidFloat: checks if a String is a parsable Float
- ValidDouble: checks if a String is a parsable Double
- ValidBigInt: checks if a String is a parsable BigInt
- ValidBigDecimal: checks if a String is a parsable BigDecimal
- Xml: checks if a String is well-formed XML
- XPath: checks if a String is a valid XPath expression
- Trimmed: checks if a String has no leading or trailing whitespace
- HexStringSpec: checks if a String represents a hexadecimal number

Tagging Types
Let's take another common example where a specific type in a domain can be critical

    def getWeight(): Double
    def getHeight(): Double
    def bmi(weight: Double, height: Double): Double // Let's assume SI units as input

    bmi(getWeight(), getHeight()) // OK
    bmi(getHeight(), getWeight()) // Still OK, but not ideal behavior

Besides refining, tagging types offers a low-cost alternative to value classes

    type Tagged[U] = { type Tag = U }
    type @@[T, U] = T with Tagged[U]

    implicit class TaggedOps[T](val v: T) extends AnyVal {
      @inline def taggedWith[U]: T @@ U = v.asInstanceOf[T @@ U]
      @inline def @@[U]: T @@ U = taggedWith[U]
    }

    trait Kilogram
    trait Metres
    type Weight = Double @@ Kilogram
    type Height = Double @@ Metres

    def getWeight(): Weight
    def getHeight(): Height
    // Units are already encoded in the type now
    def bmi(weight: Weight, height: Height): Double

    bmi(getWeight(), getHeight()) // OK
    bmi(getHeight(), getWeight()) // Compilation error, just like we wanted

- No runtime cost of type tags
- Minor overhead of TaggedOps
- Code remains sane and self-documenting
- Scala is introducing NewType and Opaque and Transparent types soon enough, at which point tagging and zero-overhead encapsulated constructors would become part of the standard library

Use Case: Loading Configs
Why configs?
- Because we sanitize everything else, like DB and user input, but never config values
- It is dangerous to blindly trust your developers. They are human, and mistakes get made which can be hard to find and costly
- Validation of config should check more than that a value is present; it should check that the value is usable

    case class Endpoint(host: String, port: Int)

    // Based on what we've learnt so far, let's rewrite that as
    type Host = MatchesRegex["^[A-Z0-9-_]{1,255}$"]
    type Port = Interval.Closed[1024, 65535]
    case class Endpoint(host: String Refined Host, port: Int Refined Port)
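For comparison, this is roughly what the same "usable, not just present" check looks like when written by hand with only the standard library. ConfigSketch and validate are hypothetical names, the bounds mirror the Host and Port predicates above, and the character class is written with the hyphen last ([A-Z0-9_-]), which is my assumption about the intended set.

```scala
object ConfigSketch {
  final case class Endpoint(host: String, port: Int)

  // Collects every problem instead of stopping at the first one
  def validate(host: String, port: Int): Either[List[String], Endpoint] = {
    val errors = List(
      if (host.matches("^[A-Z0-9_-]{1,255}$")) None else Some(s"invalid host: $host"),
      if (port >= 1024 && port <= 65535) None else Some(s"port out of range: $port")
    ).flatten

    if (errors.isEmpty) Right(Endpoint(host, port)) else Left(errors)
  }
}
```

Nothing in the Endpoint type records that validation happened, so every consumer has to trust that validate was called; the Refined version puts that guarantee in the field types.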

Experimental: Adding side-effects to Refinement types

    import java.net.ServerSocket

    case class OpenPort()

    // The predicate holds if a server socket can be opened (and closed) on the port
    implicit val openPortValidate: Validate.Plain[Int, OpenPort] =
      Validate.fromPartial(new ServerSocket(_).close(), "OpenPort", OpenPort())

    type AvailablePort = Int Refined OpenPort

Refined Types for the win
- Typesafety ++
- Uniform validation
- Compile time refinement of literal values
- Runtime safe extraction of values
- Self-documented code that is easier to reason about and debug
- Fewer test cases to write
- Push primitives to the edge of the system

Limitations of Refinement Types
- IntelliJ support sucks
- Avoid infix notations

    String Refined XXX And YYY     // parses as (String Refined XXX) And YYY
    String Refined And[XXX, YYY]
    String Refined (XXX And YYY)

- Refined primitives are always boxed, as is usual with normal Scala
- Validation errors are not always clear
- Increased compile times
- Scala support is still not very mature (might change after v3)

References
- https://github.com/fthomas/refined
- https://slideslive.com/38908213/strings-are-evil-methods-to-hide-the-use-of-primitive-types
- http://www.lihaoyi.com/post/StrategicScalaStyleDesigningDatatypes.html
- https://slideslive.com/38907881/literal-types-what-they-are-good-for
- https://kwark.github.io/refined-in-practice-bescala/#94
- https://meta.plasm.us/slides/circe/tour/#36
- https://beyondthelines.net/programming/refined-types/
- http://leifwickland.github.io/presentations/configBomb/#67
- https://github.com/wjlow/blog/blob/3c27de716b40660801e68561252883fd0428395e/Tests.md
- https://gigiigig.github.io/posts/2015/09/13/aux-pattern.html
- https://docs.scala-lang.org/sips/42.type.html
