Scala Implicits Are Everywhere: A Large-Scale Study of the Use of Scala Implicits in the Wild

Slide 1

Slide 1 text

Scala Implicits are Everywhere: A Large-Scale Study of the Use of Scala Implicits in the Wild
OOPSLA, October 24th 2019, Athens, Greece
Filip Křikava, Czech Technical University
Heather Miller, Carnegie Mellon University
Jan Vitek, Northeastern University

Slide 2

Slide 2 text


Slide 3

Slide 3 text

A quick sense of implicits "Just like magic!".enEspanol

Slide 4

Slide 4 text

A quick sense of implicits
    "Just like magic!".enEspanol
This is a normal string. Clearly this shouldn’t compile… Surely String does not have a method called enEspanol!

Slide 5

Slide 5 text

A quick sense of implicits
    "Just like magic!".enEspanol
This is a normal string. Clearly this shouldn’t compile… Surely String does not have a method called enEspanol!
Actually, this might compile! If the compiler can find a method that converts a String into an instance of a class that has the required method (which resolves the type error), it silently inserts that conversion. At runtime, the method is invoked and returns a value, perhaps "Como por arte de magia!"
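
One way the call above could be made to compile is an implicit class. The sketch below is only an illustration: the wrapper name `SpanishOps` and the one-entry "dictionary" are assumptions, since the slide names nothing but the method `enEspanol`.

```scala
object EnEspanolDemo {
  // Hypothetical wrapper class; only the method name `enEspanol` comes from the slide.
  implicit class SpanishOps(private val s: String) {
    // Toy lookup: the one known phrase is translated, anything else is returned unchanged.
    def enEspanol: String = s match {
      case "Just like magic!" => "Como por arte de magia!"
      case other              => other
    }
  }

  // String has no `enEspanol`, so the compiler rewrites this call to
  // new SpanishOps("Just like magic!").enEspanol
  val translated: String = "Just like magic!".enEspanol
}
```

Nothing at the call site shows that `SpanishOps` is involved, which is exactly why such uses cannot be found by looking at syntax alone.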

Slide 6

Slide 6 text

“If there's one feature that makes Scala 'Scala', I would pick implicits” – M. Odersky, Scala Days Chicago 2017

Slide 7

Slide 7 text

Implicits? Why do this study?
A group of language features:
• implicit parameters
• implicit conversions
• implicit classes
• implicit objects
GOAL: Reduce boilerplate by leveraging what compilers know about your code.
Make it easier to embed DSLs and to implement what look like language features outside of the compiler.

Slide 8

Slide 8 text

Slide 9

Slide 9 text

Implicits? Why do this study?
A group of language features:
• implicit parameters
• implicit conversions
• implicit classes
• implicit objects
GOAL: Reduce boilerplate by leveraging what compilers know about your code.
Make it easier to embed DSLs and to implement what look like language features outside of the compiler.
This paper is an attempt to catalog, for language designers and software engineers, how this feature is really used in the wild. It’s meant as a retrospective on:
• the result of introducing this feature into the wild, and
• how practicing software engineers tend to use and misuse it
This paper is meant to inform designers of future languages who are interested in similar features about the good and the bad that people do with them.
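
The four features listed on this slide can be sketched side by side. Every name below (`Greeter`, `PoliteGreeter`, `Meters`, `Shouting`) is hypothetical and chosen only for illustration; the comments show what the compiler is expected to insert.

```scala
import scala.language.implicitConversions

object ImplicitFormsDemo {
  trait Greeter { def greet(name: String): String }

  // Implicit object: a value the compiler may pass as an implicit argument.
  implicit object PoliteGreeter extends Greeter {
    def greet(name: String): String = s"Good day, $name"
  }

  // Implicit parameter: `g` is filled in by the compiler when the caller omits it.
  def welcome(name: String)(implicit g: Greeter): String = g.greet(name)

  final case class Meters(value: Double)

  // Implicit conversion: inserted where a Meters is expected but a Double is given.
  implicit def doubleToMeters(d: Double): Meters = Meters(d)

  def describe(m: Meters): String = s"${m.value} m"

  // Implicit class: makes `shout` callable on any String.
  implicit class Shouting(private val s: String) {
    def shout: String = s.toUpperCase + "!"
  }

  val a: String = welcome("Ada")  // welcome("Ada")(PoliteGreeter)
  val b: String = describe(2.5)   // describe(doubleToMeters(2.5))
  val c: String = "hello".shout   // new Shouting("hello").shout
}
```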

Slide 10

Slide 10 text

Slide 11

Slide 11 text

Slide 12

Slide 12 text

Slide 13

Slide 13 text

Slide 14

Slide 14 text

Implicits are Everywhere
• 7,280 Scala projects (15% GH)
• 18M lines of code
• 29M call sites
• 370K implicit declarations
• 78% of projects define implicits
• 8M implicit call sites
• 98% of projects use implicits
• 27% of call sites involve implicits

Slide 15

Slide 15 text

How many call sites are there? (1, 2) < (2, 1)

Slide 16

Slide 16 text

How many call sites are there? (1, 2) < (2, 1) 3?

Slide 17

Slide 17 text

How many call sites are there? (1, 2) < (2, 1) 3 11

Slide 18

Slide 18 text

How many call sites are there?
    (1, 2) < (2, 1)
    orderingToOrdered((1, 2))(Tuple2(ordered(Int), ordered(Int)))
• implicit conversion: orderingToOrdered(...)
• implicit parameters: Tuple2(ordered(Int), ordered(Int))

Slide 19

Slide 19 text

SO, how are Scala implicits used in the wild?

Slide 20

Slide 20 text

How are Scala implicits used in the wild?
• Collect real-world Scala projects from GitHub, split into:
  1. libraries (projects with binaries in Maven Central / Bintray)
  2. small apps (< 1000 SLOC)
  3. large apps (>= 1000 SLOC)
  4. test code (from both libraries and applications)
• Find out how implicits are used
  • Do projects use them / define them?
  • Where do they come from (Scala standard library, dependencies, project local)?
  • What patterns are being used?
  • Is there a compilation overhead?

Slide 21

Slide 21 text

Slide 22

Slide 22 text

How many call sites are there? (1, 2) < (2, 1)

Slide 23

Slide 23 text

How many call sites are there?
    (1, 2) < (2, 1)
    orderingToOrdered((1, 2))(Tuple2(Int, Int)) < orderingToOrdered((2, 1))(Tuple2(Int, Int))
    implicit def orderingToOrdered[T](x: T)(implicit ord: Ordering[T]): Ordered[T]
    implicit def Tuple2[T1, T2](implicit ord1: Ordering[T1], ord2: Ordering[T2]): Ordering[(T1, T2)]
    implicit object Int extends IntOrdering
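
The expansion can be checked by hand against the standard library. The sketch below assumes `orderingToOrdered` is brought into scope by an explicit import from `scala.math.Ordered`, and it wraps only the left operand, because `Ordered[T]#<` takes a plain `T` on the right; it is a hand-written approximation of what the compiler inserts, not compiler output.

```scala
import scala.math.Ordered.orderingToOrdered

object TupleCompareDemo {
  // What the programmer writes: a single visible call, to `<`.
  val sugared: Boolean = (1, 2) < (2, 1)

  // The implicit parameters, spelled out: an Ordering for pairs built from two Int orderings.
  val tupleOrdering: Ordering[(Int, Int)] =
    Ordering.Tuple2(Ordering.Int, Ordering.Int)

  // The implicit conversion, spelled out: wrap the left tuple, then call `<` on the wrapper.
  val explicitCall: Boolean =
    orderingToOrdered((1, 2))(tupleOrdering).<((2, 1))
}
```

Both values are computed by the same lexicographic comparison, so they agree; the only difference is how much of the call chain is visible in the source.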

Slide 24

Slide 24 text

How many call sites are there?
    (1, 2) < (2, 1)
• Implicit use sites are not present in the syntax, so we need semantic data
• Implicits are resolved statically at compile time, with no dynamic dispatch

Slide 25

Slide 25 text

How many call sites are there?
    (1, 2) < (2, 1)
• Implicit use sites are not present in the syntax, so we need semantic data
• Implicits are resolved statically at compile time, with no dynamic dispatch
→ we need to compile the code

Slide 26

Slide 26 text

Slide 27

Slide 27 text

WE USED ScalaMeta [1]
• Library to read, analyze, transform and generate Scala programs
• Stores information in SemanticDB
  ◦ a data model of semantic information extracted from Scala (and Java) code
  ◦ run as a compiler plugin
  ◦ contains the symbol table
  ◦ contains synthetics: trees added by compilers that do not appear in the original source
[1] https://scalameta.org

Slide 28

Slide 28 text

Analysis Pipeline

Slide 29

Slide 29 text

Analysis Pipeline
Intel Xeon 6140, 2.30GHz with 72 cores and 256GB of RAM
• Project names from GHTorrent [1] and Scaladex [2]
• Get Git and GitHub metadata (e.g., num. of stars, num. of commits, first push, last push)
• Guess build system (e.g., SBT, Maven, Gradle, ...)
• Count lines of code
[1] Queryable offline mirror of GitHub -- http://ghtorrent.org/
[2] The Scala library index -- https://index.scala-lang.org/

Slide 30

Slide 30 text

Analysis Pipeline
FILTERED OUT:
• unsupported projects (non-SBT projects or projects not supported by ScalaMeta)
• duplicate projects (using DéjàVu [1])
• uninteresting projects (less than 2 commits, less than 2 months active)
37% of the code base was in 102 repositories containing copies of Apache Spark (biggest Scala project, 100K+ SLOC)
Threw away over half of the projects and half of the code, yet kept 97% of GitHub stars
[1] Lopes, Cristina V., Petr Maj, Pedro Martins, Vaibhav Saini, Di Yang, Jakub Zitny, Hitesh Sajnani, and Jan Vitek. "DéjàVu: a map of code duplicates on GitHub.", OOPSLA 2017
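
The three exclusion criteria amount to a predicate over project metadata. The sketch below is not the study's pipeline: the `Project` record, its field names and the sample rows are all made up, and it assumes that failing either the commit threshold or the activity threshold is enough to mark a project as uninteresting.

```scala
object CorpusFilterDemo {
  // Hypothetical metadata record; field names are assumptions, not the study's schema.
  final case class Project(
      name: String,
      usesSbt: Boolean,
      isDuplicate: Boolean,
      commits: Int,
      activeMonths: Int
  )

  // Keep a project only if none of the three exclusion criteria applies.
  def keep(p: Project): Boolean =
    p.usesSbt && !p.isDuplicate && p.commits >= 2 && p.activeMonths >= 2

  // Made-up sample rows, for illustration only.
  val candidates: List[Project] = List(
    Project("lib-a", usesSbt = true, isDuplicate = false, commits = 120, activeMonths = 14),
    Project("spark-copy", usesSbt = true, isDuplicate = true, commits = 9000, activeMonths = 60),
    Project("one-off", usesSbt = true, isDuplicate = false, commits = 1, activeMonths = 0),
    Project("maven-app", usesSbt = false, isDuplicate = false, commits = 40, activeMonths = 8)
  )

  // Names of the projects that survive filtering.
  val corpus: List[String] = candidates.filter(keep).map(_.name)
}
```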

Slide 31

Slide 31 text

Analysis Pipeline
Source code + generated classes
• SBT is difficult to parallelize
  • Intel Xeon 6140, 2.30GHz with 72 cores and 256GB of RAM, yet only 11 projects in parallel
• 4K / 11K failed to build
  • 2K missing dependencies (despite downloading 204K artifacts -- 110GB)
  • 1K compilation errors
  • 1K broken build, empty build, or failed semanticdb generation

Slide 32

Slide 32 text

Analysis Pipeline
• Convert semanticdb into our data model
• Extract implicit declarations, call sites and parameters into CSV files
• Analyze results in R
• 500 lines of Makefile, 8K lines of Scala, 5K lines of R code

Slide 33

Slide 33 text

Extract implicits step EXAMPLE

Slide 34

Slide 34 text

Final Corpus RESULTS
15% of all publicly available Scala code on GitHub (1/2019)
[Table: corpus statistics, with sum and mean columns]

Slide 35

Slide 35 text

What we have data on RESULTS • Overall usage • Usage of patterns • Compilation overhead • Complexity

Slide 36

Slide 36 text

Implicit call sites RESULTS How many call sites involving implicits are there? 8.1M / 29.6M (27%) call sites involve implicits

Slide 37

Slide 37 text

Implicit call sites RESULTS Where do the implicit declarations come from?

Slide 38

Slide 38 text

Implicit conversion RESULTS

Slide 39

Slide 39 text

Patterns: what we’ll cover RESULTS
• Late trait implementation
• Extension methods
• Type classes
• Extension Syntax Methods
• Type proofs
• Contexts
• Implicit conversions anti-patterns
  • unrelated conversions
  • bi-directional conversions

Slide 40

Slide 40 text

Patterns: what we’ll cover RESULTS In the talk • Late trait implementation • Extension methods • Type classes • Extension Syntax Methods • Type proofs • Contexts  • Implicit conversions anti-patterns • unrelated conversions • bi-directional conversions

Slide 41

Slide 41 text

Patterns: what we’ll cover RESULTS The rest is in the paper! • Late trait implementation • Extension methods • Type classes • Extension Syntax Methods • Type proofs • Contexts  • Implicit conversions anti-patterns • unrelated conversions • bi-directional conversions

Slide 42

Slide 42 text

Patterns: Type classes RESULTS
• 11K type classes
• 30% of the implicit calls
  ◦ Scala standard library (42%): scala.collection, scala.Predef and scala.math
  ◦ Testing libraries (15%)
  ◦ FP frameworks
    ▪ typelevel cats
    ▪ scalaz
    40% of projects use one of them!
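
For readers unfamiliar with the pattern being counted here, a minimal type class might look like the sketch below. `Show`, its instances and `render` are illustrative names written for this example; they are not taken from the study or from any particular library.

```scala
object TypeClassDemo {
  // The type class: an interface parameterised by the type it describes.
  trait Show[A] { def show(a: A): String }

  object Show {
    // Instances live in the companion, so implicit search finds them without imports.
    implicit val intShow: Show[Int] = new Show[Int] {
      def show(a: Int): String = a.toString
    }

    // A derived instance: it needs an instance for the element type.
    implicit def listShow[A](implicit ev: Show[A]): Show[List[A]] =
      new Show[List[A]] {
        def show(as: List[A]): String = as.map(ev.show).mkString("[", ", ", "]")
      }
  }

  // A use site: the instance arrives as an implicit parameter.
  def render[A](a: A)(implicit ev: Show[A]): String = ev.show(a)

  // Resolved by the compiler as render(List(1, 2, 3))(Show.listShow(Show.intShow))
  val rendered: String = render(List(1, 2, 3))
}
```

The nested resolution in the last line is the same mechanism that assembles `Ordering[(Int, Int)]` from two `Ordering[Int]` instances in the standard library.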

Slide 43

Slide 43 text

Antipatterns RESULTS
• Unrelated conversions
  • a public, top-level definition defined outside of both the from and the to compilation units
  • 7.9K conversions defined in 1.2K projects (16%)
  • 1.6K from 552 projects involve Scala primitive types
  • only 81 in 47 projects convert just between primitive types
• Bi-directional conversions
  • a pair of conversions from S -> T and T -> S
  • 1.1K conversions defined in 209 projects (2.9%)
  • used in 1.9K projects (26.5%)
  • Scala-Java collection conversions used in 728 (10%) projects
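
Both anti-patterns are easy to reproduce in a few lines. The sketch below is a deliberately bad example under loose assumptions: `stringToInt`, `Wrapper` and the two wrapper conversions are invented for illustration, and the enclosing object only approximates the "defined away from both types" condition in the definition above.

```scala
import scala.language.implicitConversions

object ConversionPitfallsDemo {
  // Unrelated conversion (hypothetical): neither String nor Int is defined here,
  // yet this code decides how one silently turns into the other.
  implicit def stringToInt(s: String): Int = s.length

  def double(n: Int): Int = n * 2

  // Type-checks only because the compiler inserts stringToInt("four").
  val surprising: Int = double("four")

  // Bi-directional conversion (hypothetical): a pair S -> T and T -> S.
  final class Wrapper(val items: List[Int])
  implicit def listToWrapper(xs: List[Int]): Wrapper = new Wrapper(xs)
  implicit def wrapperToList(w: Wrapper): List[Int] = w.items

  // A value can make a round trip with no conversion visible in the source.
  val start: List[Int] = List(1, 2, 3)
  val wrapped: Wrapper = start      // listToWrapper(start)
  val back: List[Int] = wrapped     // wrapperToList(wrapped)
}
```

In both cases a reader of the call site has no hint that a conversion ran, which is the readability concern the talk raises.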

Slide 44

Slide 44 text

Overhead RESULTS
Example: JSON serialization in Scala using circe.io [1]
    case class Student(id: Int, name: String)
    case class Course(name: String, students: List[Student])
    val course = Course("FP", List(Student(1, "Alicia Sophía")))
    println(course.asJson)
    // {"name": "FP", "students": [{"id": 1, "name": "Alicia Sophía"}]}
[1] JSON library for Scala -- https://circe.github.io/circe/

Slide 45

Slide 45 text

Overhead RESULTS
Example: JSON serialization in Scala using circe.io [1]
    import io.circe.generic.auto._
    import io.circe.syntax._
    case class Student(id: Int, name: String)
    case class Course(name: String, students: List[Student])
    val course = Course("FP", List(Student(1, "Alicia Sophía")))
    println(course.asJson)
    // {"name": "FP", "students": [{"id": 1, "name": "Alicia Sophía"}]}
[1] JSON library for Scala -- https://circe.github.io/circe/

Slide 46

Slide 46 text

Overhead RESULTS
    println(course.asJson)
Lots of code gets generated when you use it!
• Implicit type class derivation using macros (using the Shapeless [1] library)
• Can cause compilation overhead
[1] Library for generic programming in Scala -- https://github.com/milessabin/shapeless

Slide 47

Slide 47 text

Slide 48

Slide 48 text

Overhead RESULTS
Can cache the generated encoder so that it does not have to be regenerated again and again!
    import io.circe.Encoder
    import io.circe.generic.semiauto._ // deriveEncoder lives here, not in generic.auto
    import io.circe.syntax._
    case class Student(id: Int, name: String)
    case class Course(name: String, students: List[Student])
    // Derive each encoder once and reuse it at every call site
    implicit val studentEncoder: Encoder[Student] = deriveEncoder
    implicit val courseEncoder: Encoder[Course] = deriveEncoder
    val course = Course("FP", List(Student(1, "Alicia Sophía")))
    println(course.asJson)
    // {"name": "FP", "students": [{"id": 1, "name": "Alicia Sophía"}]}
[1] JSON library for Scala -- https://circe.github.io/circe/

Slide 49

Slide 49 text

Overhead RESULTS
• There are 1,969 projects (8.4M LOC) using Scala 2.12.4+
• 488 projects (2.8M LOC) use shapeless for implicit type-class derivation

Slide 50

Slide 50 text

Complexity RESULTS
• The majority of implicit declarations take 0 or 1 implicit parameters

Slide 51

Slide 51 text

Takeaways CONCLUSION
Implicits have percolated to almost every nook and cranny of the Scala ecosystem. There is hardly any API without them, as they enable elegant architectural design. They allow one to remove a lot of boilerplate by leveraging the compiler’s knowledge about the code. However, they can also be easily misused and, if taken too far, seriously hurt the readability of code.

Slide 52

Slide 52 text

Takeaways CONCLUSION
Implicits really are everywhere.
Advice:
For language designers:
• compile-time performance!
• coherence = complexity
• consider limiting expressivity
• avoid relying on names
• tool support is key
For library designers:
• compile-time performance!
• easy to over-engineer libraries
• can good defaults be provided? then use implicits
• do not use unrelated implicits!
• do not use conversions that go both ways!
• do not use conversions that might change semantics!

Slide 53

Slide 53 text

Results: Overhead

Slide 54

Slide 54 text

Results: Overhead
https://github.com/mesosphere/marathon/commit/fbf7f29468bda2ec29b7fbf80b6864f46a825b7a

Slide 55

Slide 55 text

The Ugly - Understandability
https://www.scala-lang.org/api/2.12.10/scala/collection/immutable/List.html