Scala Implicits Are Everywhere: A Large-Scale Study of the Use of Scala Implicits in the Wild

OOPSLA, October 24th 2019, Athens, Greece
Filip Křikava, Czech Technical University
Heather Miller, Carnegie Mellon University
Jan Vitek, Northeastern University



A quick sense of implicits

    "Just like magic!".enEspanol

This is a normal string. Clearly this shouldn't compile… Surely String does not have a method called enEspanol!

Actually, this might compile! If the compiler is able to find a method that converts a String object to an instance of a class that has the required method (which resolves the type error), then the compiler silently inserts that conversion and, at runtime, the method is invoked to return a value, perhaps "Como por arte de magia!"
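A minimal Scala 2 sketch of such a conversion, written as an implicit class. The `Spanish` object, the `EspanolOps` class, and its one-entry dictionary are hypothetical stand-ins for a real translation library:

```scala
object Spanish {
  // Hypothetical lookup table standing in for a real translation service.
  val dictionary: Map[String, String] =
    Map("Just like magic!" -> "Como por arte de magia!")

  // An implicit class: when `enEspanol` is called on a String, the compiler
  // silently rewrites the call to `new EspanolOps(s).enEspanol`.
  implicit class EspanolOps(private val s: String) extends AnyVal {
    // Falls back to the original text when no translation is known.
    def enEspanol: String = dictionary.getOrElse(s, s)
  }
}

object EspanolDemo extends App {
  import Spanish._
  println("Just like magic!".enEspanol) // prints: Como por arte de magia!
}
```

Without `import Spanish._` the call does not compile, which is exactly the "magic" the slide describes: the conversion is invisible at the call site.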

“If there's one feature that makes Scala 'Scala', I would pick implicits” – M. Odersky, Scala Days Chicago 2017

Implicits? Why do this study?
Implicits are a group of language features:
• implicit parameters
• implicit conversions
• implicit classes
• implicit objects
GOAL: reduce boilerplate by leveraging what the compiler knows about your code. This makes it easier to embed DSLs and to implement what look like language features outside of the compiler.
Implicits can be used for:
• extension methods
• type classes
• contexts
• proofs
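Two of these uses, contexts and proofs, can be sketched in a few lines of Scala 2. The names `Locale`, `greet`, and `sumAll` are hypothetical; `<:<` is the standard library's subtype-evidence type:

```scala
object ContextsAndProofs {
  // Context: a hypothetical setting passed implicitly instead of on every call.
  final case class Locale(language: String)

  def greet(name: String)(implicit locale: Locale): String =
    if (locale.language == "es") s"Hola, $name" else s"Hello, $name"

  // Proof: a call only compiles if the compiler can find evidence A <:< Int.
  // The evidence is also a function A => Int, used here to convert elements.
  def sumAll[A](xs: List[A])(implicit ev: A <:< Int): Int =
    xs.map(ev).sum
}

object ContextsAndProofsDemo extends App {
  import ContextsAndProofs._

  implicit val locale: Locale = Locale("es")

  println(greet("Ana"))          // prints: Hola, Ana
  println(sumAll(List(1, 2, 3))) // prints: 6
  // sumAll(List("a")) would not compile: there is no evidence that String <:< Int.
}
```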

This paper represents an attempt to catalog, for language designers and software engineers, how this feature is really used in the wild. It is meant as a retrospective on:
• the result of introducing this feature into the wild, and
• how practicing software engineers tend to use and misuse it.
It is meant to inform designers of future languages interested in similar features about the good and the bad that people do with them.

PL designers/researchers vs. software engineers
As PL researchers, we more often do this: Eureka! A new language/technique that is more [correct/general/etc.] than before!
But why don't we do this more?: Hrmm… Does [language/technique] actually help people?
As PL designers, shouldn't we care more about how humans actually use our designs?

Implicits are Everywhere
• 7,280 Scala projects (15% of the Scala code on GitHub)
• 18M lines of code
• 29M call sites
• 370K implicit declarations
• 78% of projects define implicits
• 8M implicit call sites
• 98% of projects use implicits
• 27% of call sites involve implicits

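How many call sites are there?

    (1, 2) < (2, 1)

Looking at the source: 3? After implicit resolution: 11. The compiler rewrites the expression as

    orderingToOrdered((1, 2))(Tuple2(ordered(Int), ordered(Int))) < (2, 1)

where orderingToOrdered is an implicit conversion and the arguments in its second parameter list are implicit parameters.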

How are Scala implicits used in the wild?
• Collect real-world Scala projects from GitHub, split into:
  1. libraries (projects with binaries in Maven Central / Bintray)
  2. small apps (< 1000 SLOC)
  3. large apps (>= 1000 SLOC)
  4. test code (from both libraries and applications)
• Find out how implicits are used:
  ◦ Do projects use them / define them?
  ◦ Where do they come from (Scala standard library, dependencies, project-local)?
  ◦ What patterns are being used?
  ◦ Is there a compilation overhead?
We should be done in a few weeks…, right?

The implicit declarations involved, from the Scala standard library:

    implicit def orderingToOrdered[T](x: T)(implicit ord: Ordering[T]): Ordered[T]
    implicit def Tuple2[T1, T2](implicit ord1: Ordering[T1], ord2: Ordering[T2]): Ordering[(T1, T2)]
    implicit object Int extends IntOrdering

and the call they produce:

    orderingToOrdered((1, 2))(Tuple2(Int, Int)) < orderingToOrdered((2, 1))(Tuple2(Int, Int))
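A minimal sketch, assuming Scala 2.12 or 2.13 and an explicit import of `scala.math.Ordered.orderingToOrdered`, that writes the conversion and the `Ordering` arguments out by hand and checks that they agree with the sugared comparison:

```scala
import scala.math.Ordered.orderingToOrdered

object TupleComparison {
  val lhs: (Int, Int) = (1, 2)
  val rhs: (Int, Int) = (2, 1)

  // Sugared form: the compiler inserts orderingToOrdered(lhs) and its
  // implicit Ordering[(Int, Int)] argument, built from Ordering.Tuple2.
  val sugared: Boolean = lhs < rhs

  // The same comparison with the implicit conversion and the implicit
  // arguments written out by hand. Ordered[T].< expects a plain T,
  // so rhs is passed as an ordinary tuple.
  val desugared: Boolean =
    orderingToOrdered(lhs)(Ordering.Tuple2(Ordering.Int, Ordering.Int)) < rhs

  def main(args: Array[String]): Unit = {
    println(sugared)   // true: the first components decide, and 1 < 2
    println(desugared) // true
  }
}
```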


• Implicit use-sites are not present in the syntax, so we need semantic data
• Implicits are resolved statically at compile time; there is no dynamic dispatch
→ we need to compile the projects
→ we "just" need to compile them
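The consequence of static resolution shows up in a small, hypothetical Scala 2 sketch (the `Label` type class and the `Animal`/`Dog` classes are illustrative only): the same runtime object receives different implicit instances depending on its static type, so the implicit actually used can only be recovered from compile-time information.

```scala
object StaticResolution {
  // A hypothetical type class, used only for this illustration.
  trait Label[A] { def label(a: A): String }

  class Animal
  class Dog extends Animal

  implicit val animalLabel: Label[Animal] =
    new Label[Animal] { def label(a: Animal): String = "some animal" }
  implicit val dogLabel: Label[Dog] =
    new Label[Dog] { def label(d: Dog): String = "a dog" }

  // The instance is chosen from the static type A at compile time.
  def labelOf[A](a: A)(implicit l: Label[A]): String = l.label(a)

  val dog: Dog = new Dog
  val dogAsAnimal: Animal = dog // same object, wider static type

  val viaDog: String = labelOf(dog)            // resolves Label[Dog]
  val viaAnimal: String = labelOf(dogAsAnimal) // resolves Label[Animal]
}
```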

What we used: ScalaMeta¹
• A library to read, analyze, transform, and generate Scala programs
• Stores information in SemanticDB:
  ◦ a data model of semantic information extracted from Scala (and Java) code
  ◦ runs as a compiler plugin
  ◦ contains a symbol table
  ◦ contains synthetics: trees added by the compiler that do not appear in the original source
¹ https://scalameta.org
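For readers who want to inspect SemanticDB output themselves, sbt 1.3 and later can emit it through a built-in setting. This is a hedged build.sbt sketch, not necessarily how the study's pipeline configured its builds; the synthetics flag is the one documented for the Scala 2 semanticdb-scalac plugin:

```scala
// build.sbt (sbt 1.3+): load the SemanticDB compiler plugin for all projects.
ThisBuild / semanticdbEnabled := true

// Assumption: synthetics (e.g., inserted implicit conversions and arguments)
// are off by default in semanticdb-scalac and must be switched on explicitly.
ThisBuild / scalacOptions += "-P:semanticdb:synthetics:on"
```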

Analysis Pipeline

Intel Xeon 6140, 2.30 GHz, with 72 cores and 256 GB of RAM
• Project names from GHTorrent¹ and Scaladex²
• Get Git and GitHub metadata (e.g., number of stars, number of commits, first push, last push)
• Guess the build system (e.g., SBT, Maven, Gradle, ...)
• Count lines of code
¹ Queryable offline mirror of GitHub: http://ghtorrent.org/
² The Scala library index: https://index.scala-lang.org/

Filtered out:
• unsupported projects (non-SBT projects or projects not supported by ScalaMeta)
• duplicate projects (using DéjàVu¹)
• uninteresting projects (fewer than 2 commits, less than 2 months active)
37% of the code base was in 102 repositories containing copies of Apache Spark (the biggest Scala project, 100K+ SLOC).
We threw away over half of the projects and half of the code, yet kept 97% of GitHub stars.
¹ Lopes, Cristina V., Petr Maj, Pedro Martins, Vaibhav Saini, Di Yang, Jakub Zitny, Hitesh Sajnani, and Jan Vitek. "DéjàVu: a map of code duplicates on GitHub." OOPSLA 2017.
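The filtering criteria can be restated as a predicate over per-project metadata. This is a hypothetical sketch, not the study's pipeline code: the `ProjectMeta` fields and the sample projects are invented for illustration, and it assumes that missing either activity threshold is enough to drop a project.

```scala
object CorpusFilter {
  // Hypothetical per-project metadata; the field names are illustrative only.
  final case class ProjectMeta(
      name: String,
      buildSystem: String,
      supportedByScalaMeta: Boolean,
      isDuplicate: Boolean,
      commits: Int,
      monthsActive: Int
  )

  def isUnsupported(p: ProjectMeta): Boolean =
    p.buildSystem != "sbt" || !p.supportedByScalaMeta

  // Assumption: missing EITHER threshold makes a project uninteresting.
  def isUninteresting(p: ProjectMeta): Boolean =
    p.commits < 2 || p.monthsActive < 2

  def keep(p: ProjectMeta): Boolean =
    !isUnsupported(p) && !p.isDuplicate && !isUninteresting(p)

  // Made-up sample data, for illustration only.
  val sample: List[ProjectMeta] = List(
    ProjectMeta("lib-a", "sbt", supportedByScalaMeta = true, isDuplicate = false, commits = 120, monthsActive = 24),
    ProjectMeta("app-b", "maven", supportedByScalaMeta = true, isDuplicate = false, commits = 50, monthsActive = 12),
    ProjectMeta("spark-copy", "sbt", supportedByScalaMeta = true, isDuplicate = true, commits = 300, monthsActive = 36),
    ProjectMeta("toy-c", "sbt", supportedByScalaMeta = true, isDuplicate = false, commits = 1, monthsActive = 1)
  )

  def main(args: Array[String]): Unit =
    println(sample.filter(keep).map(_.name)) // List(lib-a)
}
```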

• SBT is difficult to parallelize: on an Intel Xeon 6140 (2.30 GHz, 72 cores, 256 GB of RAM) we could still build only 11 projects in parallel
• 4K of 11K projects failed to build:
  ◦ 2K due to missing dependencies (despite downloading 204K artifacts, 110 GB)
  ◦ 1K due to compilation errors
  ◦ 1K due to broken or empty builds, or failed SemanticDB generation
Source code + generated classes

• Convert SemanticDB data into our data model
• Extract implicit declarations, call sites, and parameters into CSV files
• Analyze the results in R
• 500 lines of Makefile, 8K lines of Scala, 5K lines of R code

Example: the extract-implicits step

Results: Final Corpus
15% of all publicly available Scala code on GitHub (as of January 2019)

Results: What we have data on
• Overall usage
• Usage of patterns
• Compilation overhead
• Complexity

Results: Implicit call sites
How many call sites involve implicits? 8.1M of 29.6M call sites (27%).

Where do the implicit declarations come from?

Results: Implicit conversions

Results: Patterns (what we'll cover)
• Late trait implementation
• Extension methods
• Type classes
• Extension syntax methods
• Type proofs
• Contexts
• Implicit conversion anti-patterns:
  ◦ unrelated conversions
  ◦ bi-directional conversions
Some of these are covered in the talk; the rest are in the paper!

Results: Type classes
• 11K type classes
• 30% of the implicit calls, coming from:
  ◦ the Scala standard library (42%): scala.collection, scala.Predef and scala.math
  ◦ testing libraries (15%)
  ◦ FP frameworks (40% of projects use one of them!):
    ▪ Typelevel Cats
    ▪ Scalaz
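The type class pattern counted above can be sketched in plain Scala 2. The `Show` type class and its instances here are hypothetical, not the standard library's or Cats': a trait describing a capability, implicit instances in its companion, an implicit def that derives instances for lists, and an implicit class that provides extension syntax.

```scala
object TypeClassSketch {
  // The type class: a capability described independently of the data type.
  trait Show[A] {
    def show(a: A): String
  }

  object Show {
    // Instances live in the companion, so they are found via implicit scope.
    implicit object IntShow extends Show[Int] {
      def show(a: Int): String = s"Int($a)"
    }

    // Derived instance: a List[A] can be shown whenever an A can.
    implicit def listShow[A](implicit sa: Show[A]): Show[List[A]] =
      new Show[List[A]] {
        def show(xs: List[A]): String = xs.map(sa.show).mkString("[", ", ", "]")
      }
  }

  // Extension syntax: `x.show` for any x whose type has a Show instance.
  implicit class ShowSyntax[A](private val a: A) extends AnyVal {
    def show(implicit s: Show[A]): String = s.show(a)
  }

  // A context bound `A: Show` is shorthand for an implicit Show[A] parameter.
  def describeAll[A: Show](xs: List[A]): String = xs.show
}
```

With `import TypeClassSketch._` in scope, `List(1, 2, 3).show` evaluates to `"[Int(1), Int(2), Int(3)]"`: the compiler supplies `listShow(IntShow)` without any of it appearing at the call site.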

Results: Anti-patterns
• Unrelated conversions
  ◦ a public, top-level conversion defined outside the compilation units of both its source (from) and target (to) types
  ◦ 7.9K conversions defined in 1.2K projects (16%)
  ◦ 1.6K of them (from 552 projects) involve Scala primitive types
  ◦ only 81 (in 47 projects) convert just between primitive types
• Bi-directional conversions
  ◦ a pair of conversions S -> T and T -> S
  ◦ 1.1K conversions defined in 209 projects (2.9%)
  ◦ used in 1.9K projects (26.5%)
  ◦ Scala-Java collection conversions used in 728 projects (10%)
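A bi-directional pair is easy to write and hard to spot at use sites. In this hypothetical Scala 2 sketch (`Meters` and both conversions are illustrative), `Meters` and `Double` become silently interchangeable, so a unit mistake such as taking the square root of a length still type-checks as a length:

```scala
import scala.language.implicitConversions

object BidirectionalConversions {
  final case class Meters(value: Double)

  // Anti-pattern: conversions in BOTH directions between Meters and Double.
  implicit def doubleToMeters(d: Double): Meters = Meters(d)
  implicit def metersToDouble(m: Meters): Double = m.value

  // Compiles via doubleToMeters(16.0).
  val length: Meters = 16.0

  // Compiles via doubleToMeters(math.sqrt(metersToDouble(Meters(16.0)))),
  // even though the square root of a length is not a length.
  val side: Meters = math.sqrt(Meters(16.0))
}
```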

Results: Overhead
Example: JSON serialization in Scala using circe¹

    import io.circe.generic.auto._
    import io.circe.syntax._

    case class Student(id: Int, name: String)
    case class Course(name: String, students: List[Student])

    val course = Course("FP", List(Student(1, "Alicia Sophía")))
    println(course.asJson.noSpaces)
    // {"name":"FP","students":[{"id":1,"name":"Alicia Sophía"}]}

¹ JSON library for Scala: https://circe.github.io/circe/

Results: Overhead
• Implicit type class derivation using macros (via the Shapeless¹ library)
• Can cause compilation overhead: lots of code gets generated when you use it, e.g., at println(course.asJson)
¹ Library for generic programming in Scala: https://github.com/milessabin/shapeless

Results: Overhead
With circe¹, the generated encoders can be cached so that they do not have to be regenerated again and again:

    import io.circe.Encoder
    import io.circe.generic.semiauto.deriveEncoder
    import io.circe.syntax._

    case class Student(id: Int, name: String)
    case class Course(name: String, students: List[Student])

    // Derived once, then reused at every call site that needs them.
    implicit val studentEncoder: Encoder[Student] = deriveEncoder
    implicit val courseEncoder: Encoder[Course] = deriveEncoder

    val course = Course("FP", List(Student(1, "Alicia Sophía")))
    println(course.asJson.noSpaces)
    // {"name":"FP","students":[{"id":1,"name":"Alicia Sophía"}]}

¹ JSON library for Scala: https://circe.github.io/circe/

Results: Overhead
• There are 1,969 projects (8.4M LOC) using Scala 2.12.4+
• 488 projects (2.8M LOC) use Shapeless for implicit type class derivation

Results: Complexity
• The majority of implicit declarations take 0 or 1 implicit parameters

Conclusion: Takeaways
Implicits have percolated into almost every nook and cranny of the Scala ecosystem. There is hardly any API without them, as they enable elegant architectural designs. They allow one to remove a lot of boilerplate by leveraging the compiler's knowledge about the code. However, they can also easily be misused and, if taken too far, seriously hurt the readability of code.

Implicits really are everywhere.
Advice for language designers:
• compile-time performance!
• coherence = complexity
• consider limiting expressivity
• avoid relying on names
• tool support is key
Advice for library designers:
• compile-time performance!
• it is easy to over-engineer libraries
• can good defaults be provided? Then use implicits
• do not use unrelated implicits!
• do not use conversions that go both ways!
• do not use conversions that might change semantics!
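The advice about good defaults can be illustrated with a common Scala 2 idiom, sketched here with a hypothetical `Timeout` setting and `fetch` method: put the default instance in the companion object, where the compiler finds it only when the caller has not provided one of its own.

```scala
object DefaultsViaImplicits {
  // Hypothetical setting a library wants callers to be able to override.
  final case class Timeout(seconds: Int)

  object Timeout {
    // The companion is part of the implicit scope of Timeout, so this default
    // is used only when no Timeout is visible in the caller's lexical scope.
    implicit val default: Timeout = Timeout(30)
  }

  def fetch(url: String)(implicit timeout: Timeout): String =
    s"GET $url (timeout ${timeout.seconds}s)"
}

object DefaultsDemo {
  import DefaultsViaImplicits._

  // No Timeout in lexical scope: the companion default is picked up.
  val withDefault: String = fetch("https://example.org")

  // A local implicit takes precedence over the companion default.
  val withOverride: String = {
    implicit val short: Timeout = Timeout(5)
    fetch("https://example.org")
  }
}
```

Callers who never think about timeouts get a sensible value, while those who care override it locally without changing any call sites.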

Results: Overhead
https://github.com/mesosphere/marathon/commit/fbf7f29468bda2ec29b7fbf80b6864f46a825b7a

The Ugly: Understandability
https://www.scala-lang.org/api/2.12.10/scala/collection/immutable/List.html
http://scalapuzzlers.com/#pzzlr-054
