Scala Implicits Are Everywhere: A Large-Scale Study of the Use of Scala Implicits in the Wild


The Scala programming language offers two distinctive language features, implicit parameters and implicit conversions, often referred to together as implicits. Announced without fanfare in 2004, implicits have quickly grown to become a widely and pervasively used feature of the language. They provide a way to reduce boilerplate code in Scala programs. They are also used to implement certain language features without having to modify the compiler. We report on a large-scale study of the use of implicits in the wild. For this, we analyzed 7,280 Scala projects hosted on GitHub, spanning over 8.1M call sites involving implicits and 370.7K implicit declarations across 18.7M lines of Scala code.


Heather Miller

October 24, 2019

Transcript

  1. Scala Implicits are Everywhere: A Large-Scale Study of the Use of Scala Implicits in the Wild

    OOPSLA, October 24th 2019, Athens, Greece
    Filip Křikava, Czech Technical University
    Heather Miller, Carnegie Mellon University
    Jan Vitek, Northeastern University
  3. A quick sense of implicits: "Just like magic!".enEspanol

  4. A quick sense of implicits

    "Just like magic!".enEspanol: this is a normal string. Clearly this shouldn't compile… Surely String does not have a method called enEspanol!
  5. A quick sense of implicits

    Actually, this might compile! If the compiler can find a method that converts a String into an instance of a class that has the required method (thereby resolving the type error), it silently inserts that conversion. At runtime, the method is then invoked to return a value, perhaps "Como por arte de magia!"
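A minimal sketch, assuming Scala 2.13 syntax, of how "Just like magic!".enEspanol could be made to compile. Translatable, Translations, stringToTranslatable and the one-entry dictionary are hypothetical names invented for this illustration; they are not part of any library.

```scala
import scala.language.implicitConversions

// Hypothetical class that provides the missing enEspanol method.
class Translatable(s: String) {
  // A toy one-entry dictionary; unknown phrases are returned unchanged.
  private val dictionary = Map("Just like magic!" -> "Como por arte de magia!")
  def enEspanol: String = dictionary.getOrElse(s, s)
}

object Translations {
  // Implicit conversion: when a String lacks a requested member, the compiler
  // may wrap it with this function if the result type provides that member.
  implicit def stringToTranslatable(s: String): Translatable = new Translatable(s)
}

import Translations._

// Type-checks only because the compiler silently rewrites the call to
// stringToTranslatable("Just like magic!").enEspanol
val spanish: String = "Just like magic!".enEspanol
println(spanish)
```

The conversion is only considered because String has no enEspanol member of its own; an ordinary String method such as length is called directly, without any wrapping.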
  6. “If there's one feature that makes Scala 'Scala', I would pick implicits” – M. Odersky, Scala Days Chicago 2017
  7. Implicits?

    A group of language features:
    • implicit parameters
    • implicit conversions
    • implicit classes
    • implicit objects
    GOAL: Reduce boilerplate by leveraging what compilers know about your code. Make it easier to embed DSLs and to implement what look like language features outside of the compiler.
  8. Implicits?

    They can be used for:
    • Extension methods
    • Type classes
    • Contexts
    • Proofs
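The contexts and proofs uses listed above both rest on implicit parameters. The sketch below (Scala 2.13 syntax) shows one of each; UserLocale, greet, addOne and Session are hypothetical names, while =:= is the standard library's type-equality evidence.

```scala
// Hypothetical context type: the user's preferred language.
final case class UserLocale(language: String)

// Context: the locale is threaded through calls without being passed explicitly.
def greet(name: String)(implicit locale: UserLocale): String =
  if (locale.language == "es") s"Hola, $name" else s"Hello, $name"

// Proof: ev is compiler-supplied evidence that A is Int. Call sites where the
// compiler cannot produce an A =:= Int instance are rejected at compile time.
def addOne[A](a: A)(implicit ev: A =:= Int): Int = ev(a) + 1

object Session {
  implicit val locale: UserLocale = UserLocale("es")
}
import Session._

val greeting = greet("Ana") // the compiler passes Session.locale
val eleven = addOne(10)     // the compiler passes the proof Int =:= Int
// addOne("ten") would not compile: no String =:= Int evidence exists.
```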
  9. Why do this study?

    This paper represents an attempt to catalog, for language designers and software engineers, how this feature is really used in the wild. It is meant as a retrospective on:
    • the result of introducing this feature into the wild, and
    • how practicing software engineers tend to use and misuse it.
    It is meant to inform designers of future languages interested in similar features about the good and the bad that people do with them.
  10. Why do this study?

    As PL designers, shouldn't we care more about how humans actually use our designs? (PL designers/researchers and software engineers)
  11. Why do this study?

    As PL researchers, we more often do this: "Eureka! A new language/technique that is more [correct/general/etc.] than before!"
  12. Why do this study?

    But why don't we do this more? "Hrmm… Does [language/technique] actually help people?"
  14. Implicits are Everywhere

    • 7,280 Scala projects (15% of the Scala code on GitHub)
    • 18M lines of code
    • 29M call sites
    • 370K implicit declarations
    • 78% of projects define implicits
    • 8M implicit call sites
    • 98% of projects use implicits
    • 27% of call sites involve implicits
  15. How many call sites are there?

    (1, 2) < (2, 1)
  16. How many call sites are there?

    (1, 2) < (2, 1): 3?
  17. How many call sites are there?

    (1, 2) < (2, 1): not 3, but 11!
  18. How many call sites are there?

    (1, 2) < (2, 1) becomes
    orderingToOrdered((1, 2))(Tuple2(ordered(Int), ordered(Int)))
    (the outer call is an implicit conversion; the second argument list holds implicit parameters)
  19. So, how are Scala implicits used in the wild?

  20. How are Scala implicits used in the wild?

    • Collect real-world Scala projects from GitHub, split into:
      1. libraries (projects with binaries in Maven Central / Bintray)
      2. small apps (< 1000 SLOC)
      3. large apps (>= 1000 SLOC)
      4. test code (from both libraries and applications)
    • Find out how implicits are used:
      ◦ Do projects use them / define them?
      ◦ Where do they come from (Scala standard library, dependencies, project-local)?
      ◦ What patterns are being used?
      ◦ Is there a compilation overhead from implicits?
  21. How are Scala implicits used in the wild?

    We should be done in a few weeks…, right?
  23. How many call sites are there?

    (1, 2) < (2, 1) is expanded to:
    orderingToOrdered((1, 2))(Tuple2(Int, Int)) < orderingToOrdered((2, 1))(Tuple2(Int, Int))
    using these implicit declarations from the standard library:
    implicit def orderingToOrdered[T](x: T)(implicit ord: Ordering[T]): Ordered[T]
    implicit def Tuple2[T1, T2](implicit ord1: Ordering[T1], ord2: Ordering[T2]): Ordering[(T1, T2)]
    implicit object Int extends IntOrdering
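The expansion of (1, 2) < (2, 1) can be reproduced in a small sketch, assuming Scala 2.13, where orderingToOrdered is brought into scope by an explicit import from scala.math.Ordered. In the sketch only the left operand is wrapped: Ordered[T] declares <(that: T), so the right operand already has the expected type.

```scala
import scala.language.implicitConversions
import scala.math.Ordered.orderingToOrdered

// What the programmer writes. Tuple2 has no < method, so the compiler searches
// for an implicit conversion; the extra parentheses pass the tuple as one argument.
val written: Boolean = (1, 2) < ((2, 1))

// Approximately what the compiler inserts: one implicit conversion whose implicit
// Ordering argument is itself assembled from further implicit definitions.
val expanded: Boolean =
  orderingToOrdered((1, 2))(Ordering.Tuple2(Ordering.Int, Ordering.Int)) < ((2, 1))

println(s"written = $written, expanded = $expanded")
```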
  24. How many call sites are there?

    • Implicit use sites are not present in the syntax; we need semantic data
    • Implicits are resolved statically at compile time; there is no dynamic dispatch
  25. How many call sites are there?

    → we need to compile
  26. How many call sites are there?

    → we just need to compile
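Static resolution has a visible consequence: the implicit instance is chosen from the static type at the call site, not from the runtime class of the value. A minimal sketch, assuming Scala 2.13; Animal, Dog, Show, ShowInstances and describe are hypothetical names.

```scala
sealed trait Animal { def name: String }
final case class Dog(name: String) extends Animal

// A hypothetical Show type class, invariant in A.
trait Show[A] { def show(a: A): String }

object ShowInstances {
  implicit val showAnimal: Show[Animal] = (a: Animal) => s"an animal called ${a.name}"
  implicit val showDog: Show[Dog] = (d: Dog) => s"a dog called ${d.name}"
}
import ShowInstances._

// The instance is picked by the compiler from the static type A at each call site.
def describe[A](a: A)(implicit s: Show[A]): String = s.show(a)

val rex = Dog("Rex")
val rexAsAnimal: Animal = rex

val precise = describe(rex)         // A = Dog, so showDog is used
val widened = describe(rexAsAnimal) // A = Animal, so showAnimal is used,
                                    // even though the runtime value is a Dog
```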
  27. What we used: Scalameta¹

    • A library to read, analyze, transform and generate Scala programs
    • Stores information in SemanticDB:
      ◦ a data model of semantic information extracted from Scala (and Java) code
      ◦ runs as a compiler plugin
      ◦ contains a symbol table
      ◦ contains synthetics: trees added by the compiler that do not appear in the original source
    ¹ https://scalameta.org
  28. Analysis Pipeline

  29. Analysis Pipeline

    Hardware: Intel Xeon 6140, 2.30GHz with 72 cores and 256GB of RAM
    • Project names from GHTorrent¹ and Scaladex²
    • Get Git and GitHub metadata (e.g., number of stars, number of commits, first push, last push)
    • Guess the build system (e.g., SBT, Maven, Gradle, ...)
    • Count lines of code
    ¹ Queryable offline mirror of GitHub: http://ghtorrent.org/
    ² The Scala library index: https://index.scala-lang.org/
  30. Analysis Pipeline

    FILTERED OUT:
    • unsupported projects (non-SBT projects or projects not supported by Scalameta)
    • duplicate projects (using DéjàVu¹)
    • uninteresting projects (fewer than 2 commits, or active for less than 2 months)
    37% of the code base was in 102 repositories containing copies of Apache Spark (the biggest Scala project, 100K+ SLOC).
    We threw away over half of the projects and half of the code, yet kept 97% of GitHub stars.
    ¹ Lopes, Cristina V., Petr Maj, Pedro Martins, Vaibhav Saini, Di Yang, Jakub Zitny, Hitesh Sajnani, and Jan Vitek. "DéjàVu: a map of code duplicates on GitHub." OOPSLA 2017.
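The filtering criteria above can be sketched as plain predicates over a hypothetical Project record. The field names and the exact-fingerprint duplicate check are assumptions for illustration; the study used DéjàVu, and grouping by an exact hash is only a simplification of real clone detection.

```scala
// Hypothetical project record; the fields mirror the filtering criteria on the slide.
final case class Project(
    name: String,
    buildSystem: String, // e.g. "sbt", "maven", "gradle"
    commits: Int,
    activeMonths: Int,
    contentHash: String, // stand-in for a clone-detection fingerprint
    stars: Int
)

// Unsupported: anything that is not built with SBT.
def isSupported(p: Project): Boolean = p.buildSystem == "sbt"

// Uninteresting: fewer than 2 commits, or active for less than 2 months.
def isInteresting(p: Project): Boolean = p.commits >= 2 && p.activeMonths >= 2

// Duplicates: keep only the most-starred project per fingerprint.
def dedupe(ps: List[Project]): List[Project] =
  ps.groupBy(_.contentHash).values.map(_.maxBy(_.stars)).toList

def filterCorpus(ps: List[Project]): List[Project] =
  dedupe(ps.filter(p => isSupported(p) && isInteresting(p)))

val candidates = List(
  Project("spark", "sbt", 20000, 60, "h1", 30000),
  Project("spark-copy", "sbt", 500, 24, "h1", 3),
  Project("hello-world", "sbt", 1, 0, "h2", 0),
  Project("maven-app", "maven", 40, 12, "h3", 5),
  Project("small-app", "sbt", 12, 5, "h4", 2)
)

// groupBy does not preserve order, so the names are sorted for a stable result.
val kept = filterCorpus(candidates).map(_.name).sorted
println(kept)
```

Keeping the most-starred copy of each duplicate group is one way to discard most of the code while retaining most of the stars, in the spirit of the slide's numbers.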
  31. Analysis Pipeline

    • SBT is difficult to parallelize: despite an Intel Xeon 6140 (2.30GHz, 72 cores, 256GB of RAM), only 11 projects could be built in parallel
    • 4K of 11K projects failed to build:
      ◦ 2K missing dependencies (despite downloading 204K artifacts, 110GB)
      ◦ 1K compilation errors
      ◦ 1K broken or empty builds, or SemanticDB generation failed
    (Source code + generated classes)
  32. Analysis Pipeline

    • Convert SemanticDB data into our data model
    • Extract implicit declarations, call sites and parameters into CSV files
    • Analyze the results in R
    • 500 lines of Makefile, 8K lines of Scala, 5K lines of R code
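A rough sketch of what per-call-site extraction output could look like, assuming a hypothetical CallSite record and a naive CSV encoding without quoting or escaping. It is not the study's actual schema; it only illustrates how such a table supports aggregate questions like the share of call sites involving implicits.

```scala
// Hypothetical, simplified record for one call site; not the study's actual schema.
final case class CallSite(
    project: String,
    file: String,
    line: Int,
    symbol: String,
    implicitArgs: Int,     // number of implicit arguments inserted by the compiler
    viaConversion: Boolean // true if the receiver was wrapped by an implicit conversion
) {
  def involvesImplicits: Boolean = implicitArgs > 0 || viaConversion
}

val csvHeader = "project,file,line,symbol,implicit_args,via_conversion"

// Naive CSV encoding: assumes no field contains a comma or a quote.
def toCsvRow(c: CallSite): String =
  List(c.project, c.file, c.line, c.symbol, c.implicitArgs, c.viaConversion).mkString(",")

// Fraction of call sites involving implicits
// (corpus-wide, the slides report 8.1M of 29.6M, about 27%).
def implicitShare(sites: Seq[CallSite]): Double =
  if (sites.isEmpty) 0.0 else sites.count(_.involvesImplicits).toDouble / sites.size

val sample = List(
  CallSite("demo", "Main.scala", 3, "orderingToOrdered", 1, viaConversion = true),
  CallSite("demo", "Main.scala", 3, "Tuple2.apply", 0, viaConversion = false),
  CallSite("demo", "Main.scala", 5, "Future.apply", 1, viaConversion = false),
  CallSite("demo", "Main.scala", 7, "println", 0, viaConversion = false)
)

val csv = (csvHeader :: sample.map(toCsvRow)).mkString("\n")
println(csv)
println(implicitShare(sample))
```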
  33. Example: the extract-implicits step

  34. Results: Final Corpus

    15% of all publicly available Scala code on GitHub (January 2019)
    [Table: corpus summary statistics (sum and mean)]
  35. Results: What we have data on

    • Overall usage
    • Usage of patterns
    • Compilation overhead
    • Complexity
  36. Results: Implicit call sites

    How many call sites involving implicits are there? 8.1M of 29.6M call sites (27%) involve implicits.
  37. Results: Implicit call sites

    Where do the implicit declarations come from?
  38. Results: Implicit conversions

  39. Results: Patterns we'll cover

    • Late trait implementation
    • Extension methods
    • Type classes
    • Extension syntax methods
    • Type proofs
    • Contexts
    • Implicit conversion anti-patterns:
      ◦ unrelated conversions
      ◦ bi-directional conversions
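Extension methods, one of the patterns listed above, are commonly written as implicit classes in Scala 2. A minimal sketch; StringExtensions, RichWords and wordCount are hypothetical names.

```scala
object StringExtensions {
  // An implicit class: when a String lacks wordCount, the compiler wraps it in
  // RichWords, so wordCount reads as if it were a method defined on String.
  implicit class RichWords(s: String) {
    def wordCount: Int = s.split("\\s+").count(_.nonEmpty)
  }
}
import StringExtensions._

val n = "Scala implicits are everywhere".wordCount
println(n)
```

Scala 3 later added a dedicated extension syntax for this purpose, so the wrapper class no longer has to be spelled out.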
  40. Results: Patterns we'll cover in the talk (a subset of the list above)
  41. Results: Patterns; the rest is in the paper!
  42. Results: Type classes

    • 11K type classes
    • 30% of the implicit calls, coming from:
      ◦ the Scala standard library (42%): scala.collection, scala.Predef and scala.math
      ◦ testing libraries (15%)
      ◦ FP frameworks: Typelevel Cats and Scalaz
    40% of projects use one of them!
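The type class pattern combines several implicit features: instances declared as implicit objects or implicit defs, and functions that request them through implicit parameters. A minimal sketch assuming Scala 2.13; Monoid and combineAll are hypothetical here and only loosely modeled on what libraries such as Cats provide.

```scala
// A type class: behaviour for a type A, supplied separately from A itself.
trait Monoid[A] {
  def empty: A
  def combine(x: A, y: A): A
}

object Monoid {
  // Instances in the companion object are part of the implicit scope of Monoid[A].
  implicit object IntAddition extends Monoid[Int] {
    def empty: Int = 0
    def combine(x: Int, y: Int): Int = x + y
  }
  implicit object StringConcat extends Monoid[String] {
    def empty: String = ""
    def combine(x: String, y: String): String = x + y
  }
  // Derived instance: a Monoid[Option[A]] exists whenever a Monoid[A] does.
  implicit def optionMonoid[A](implicit m: Monoid[A]): Monoid[Option[A]] =
    new Monoid[Option[A]] {
      def empty: Option[A] = None
      def combine(x: Option[A], y: Option[A]): Option[A] = (x, y) match {
        case (Some(a), Some(b)) => Some(m.combine(a, b))
        case (Some(a), None)    => Some(a)
        case (None, other)      => other
      }
    }
}

// The implicit parameter list asks the compiler to supply the matching instance.
def combineAll[A](xs: List[A])(implicit m: Monoid[A]): A =
  xs.foldLeft(m.empty)(m.combine)

val total = combineAll(List(1, 2, 3))
val word = combineAll(List("im", "pli", "cits"))
val maybe = combineAll(List(Some(1), None, Some(4)))
```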
  43. Results: Anti-patterns

    • Unrelated conversions: a public, top-level conversion defined outside the compilation units of both its source (from) and target (to) types
      ◦ 7.9K such conversions defined in 1.2K projects (16%)
      ◦ 1.6K of them, in 552 projects, involve Scala primitive types
      ◦ only 81, in 47 projects, convert purely between primitive types
    • Bi-directional conversions: a pair of conversions S -> T and T -> S
      ◦ 1.1K conversions defined in 209 projects (2.9%)
      ◦ used in 1.9K projects (26.5%)
      ◦ Scala-Java collection conversions are used in 728 projects (10%)
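For the Scala-Java collection case, one common remedy is to replace bi-directional implicit conversions with explicit, one-way decorators. A sketch assuming Scala 2.13 or later, where scala.jdk.CollectionConverters provides asJava and asScala (the older, implicit scala.collection.JavaConversions was deprecated in Scala 2.12).

```scala
import scala.jdk.CollectionConverters._

// Scala -> Java: an explicit asJava call instead of a silent implicit conversion.
val scalaNames: List[String] = List("Alicia", "Filip", "Jan")
val javaNames: java.util.List[String] = scalaNames.asJava

// Java -> Scala: asScala wraps the Java collection rather than copying it,
// so later changes to the Java list are visible through the Scala view.
val javaNumbers = new java.util.ArrayList[Int]()
javaNumbers.add(1)
javaNumbers.add(2)
val scalaView: scala.collection.mutable.Buffer[Int] = javaNumbers.asScala
javaNumbers.add(3)

println(scalaView)
```

Because each direction is a visible method call, a reader can see exactly where a collection crosses the language boundary.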
  44. Results: Overhead

    Example: JSON serialization in Scala using circe¹
    case class Student(id: Int, name: String)
    case class Course(name: String, students: List[Student])
    val course = Course("FP", List(Student(1, "Alicia Sophía")))
    println(course.asJson)
    // {"name": "FP", "students": [{"id": 1, "name": "Alicia Sophía"}]}
    ¹ JSON library for Scala: https://circe.github.io/circe/
  45. Results: Overhead

    The example needs two imports:
    import io.circe.generic.auto._
    import io.circe.syntax._
  46. Results: Overhead

    • Implicit type class derivation using macros (using the Shapeless¹ library)
    • Can cause compilation overhead: lots of code gets generated when you use it, e.g. println(course.asJson)!
    ¹ Library for generic programming in Scala: https://github.com/milessabin/shapeless
  48. Results: Overhead

    One can cache the generated encoders so that they are not regenerated again and again!
    import io.circe.Encoder
    import io.circe.generic.semiauto.deriveEncoder
    import io.circe.syntax._
    case class Student(id: Int, name: String)
    case class Course(name: String, students: List[Student])
    implicit val studentEncoder: Encoder[Student] = deriveEncoder
    implicit val courseEncoder: Encoder[Course] = deriveEncoder
    val course = Course("FP", List(Student(1, "Alicia Sophía")))
    println(course.asJson)
    // {"name": "FP", "students": [{"id": 1, "name": "Alicia Sophía"}]}
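To show what cached encoders amount to without depending on circe, here is a hand-written sketch. Encoder, Syntax, EncoderOps and CachedEncoders are hypothetical stand-ins, not circe's API, and strings are quoted but not escaped. Each implicit val is constructed once and then found by implicit search wherever an Encoder[Student] or Encoder[Course] is needed.

```scala
// Hypothetical minimal Encoder type class; NOT circe's API.
trait Encoder[A] { def encode(a: A): String }

object Encoder {
  implicit val intEncoder: Encoder[Int] = (i: Int) => i.toString
  // Simplification: strings are quoted but not escaped.
  implicit val stringEncoder: Encoder[String] = (s: String) => "\"" + s + "\""
  implicit def listEncoder[A](implicit e: Encoder[A]): Encoder[List[A]] =
    (xs: List[A]) => xs.map(e.encode).mkString("[", ", ", "]")
}

object Syntax {
  // Extension syntax, in the spirit of importing io.circe.syntax._
  implicit class EncoderOps[A](a: A) {
    def asJson(implicit e: Encoder[A]): String = e.encode(a)
  }
}
import Syntax._

final case class Student(id: Int, name: String)
final case class Course(name: String, students: List[Student])

object CachedEncoders {
  // Built once, reused at every use site: the role deriveEncoder plays on the
  // slide, written out by hand here.
  implicit val studentEncoder: Encoder[Student] = (stu: Student) =>
    s"""{"id": ${stu.id.asJson}, "name": ${stu.name.asJson}}"""
  implicit val courseEncoder: Encoder[Course] = (c: Course) =>
    s"""{"name": ${c.name.asJson}, "students": ${c.students.asJson}}"""
}
import CachedEncoders._

val course = Course("FP", List(Student(1, "Alicia Sophía")))
val json = course.asJson
println(json)
```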
  49. Results: Overhead

    • 1,969 projects (8.4M LOC) use Scala 2.12.4+
    • 488 projects (2.8M LOC) use shapeless for implicit type-class derivation
  50. Results: Complexity

    • The majority of implicit declarations take zero or one implicit parameters
  51. Conclusion: Takeaways

    Implicits have percolated into almost every nook and cranny of the Scala ecosystem. There is hardly any API without them, as they enable elegant architectural designs. They remove a lot of boilerplate by leveraging the compiler's knowledge about the code. However, they can also be easily misused and, if taken too far, seriously hurt the readability of code.
  52. Conclusion: Takeaways

    Implicits really are everywhere.
    Advice for language designers:
    • compile-time performance!
    • coherence = complexity
    • consider limiting expressivity
    • avoid relying on names
    • tool support is key
    Advice for library designers:
    • compile-time performance!
    • it is easy to over-engineer libraries
    • can good defaults be provided? Then use implicits
    • do not use unrelated implicits!
    • do not use conversions that go both ways!
    • do not use conversions that might change semantics!
  53. Results: Overhead

  54. Results: Overhead https://github.com/mesosphere/marathon/commit/fbf7f29468bda2ec29b7fbf80b6864f46a825b7a

  55. The Ugly - Understandability https://www.scala-lang.org/api/2.12.10/scala/collection/immutable/List.html

  56. The Ugly - Understandability http://scalapuzzlers.com/#pzzlr-054

  57. The Ugly - Understandability