Scala Implicits Are Everywhere: A Large-Scale Study of the Use of Scala Implicits in the Wild

The Scala programming language offers two distinctive language features, implicit parameters and implicit conversions, often referred to together as implicits. Announced without fanfare in 2004, implicits have quickly grown to become a widely and pervasively used feature of the language. They provide a way to reduce the boilerplate code in Scala programs. They are also used to implement certain language features without having to modify the compiler. We report on a large-scale study of the use of implicits in the wild. For this, we analyzed 7,280 Scala projects hosted on GitHub, spanning over 8.1M call sites involving implicits and 370.7K implicit declarations across 18.7M lines of Scala code.

Heather Miller

October 24, 2019

Transcript

  1. Scala Implicits are Everywhere
    A Large-Scale Study of the Use of Scala Implicits in the Wild
    OOPSLA, October 24th 2019, Athens Greece
    Filip Křikava, Czech Technical University
    Heather Miller, Carnegie Mellon University
    Jan Vitek, Northeastern University

  3. A quick sense of implicits
    "Just like magic!".enEspanol

  4. A quick sense of implicits
    This is a normal string. Clearly this shouldn’t compile…
    Surely String does not have a method called enEspanol!
    "Just like magic!".enEspanol

  5. A quick sense of implicits
    Actually, this might compile!
    If the compiler can find a method that converts a String to an
    instance of a class that has the required method (which resolves
    the type error), it silently inserts that conversion. At runtime,
    the method is then invoked on the converted value and returns a
    result, perhaps "Como por arte de magia!"
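
The mechanism described on this slide can be sketched in a few lines of Scala. This is only an illustration: the wrapper name `SpanishOps` and the hard-coded translation are assumptions made for the example, not something taken from the talk.

```scala
object Magic {
  // Hypothetical wrapper class: an implicit class is shorthand for a class
  // plus an implicit conversion from String to that class.
  implicit class SpanishOps(private val s: String) extends AnyVal {
    // Toy "translation": only one phrase is known, everything else is returned unchanged.
    def enEspanol: String = s match {
      case "Just like magic!" => "Como por arte de magia!"
      case other              => other
    }
  }

  // String has no enEspanol method, so the compiler inserts the conversion.
  val translated: String = "Just like magic!".enEspanol

  // Roughly what the compiler generates for the line above.
  val desugared: String = new SpanishOps("Just like magic!").enEspanol
}
```

Both values hold the same string; the only difference is whether the wrapper is written out by hand or inserted by the compiler.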

  6. “If there's one feature that makes Scala
    'Scala', I would pick implicits”
    – M. Odersky, Scala Days Chicago 2017

  7. Implicits? Why do this study?
    A group of language features:
    • implicit parameters
    • implicit conversions
    • implicit classes
    • implicit objects
    GOAL:
    • Reduce boilerplate by leveraging what compilers know about your
    code.
    • Make it easier to embed DSLs and to implement what look like
    language features outside of the compiler

  8. Implicits? Why do this study?
    Implicits can be used for:
    • Extension methods
    • Type classes
    • Contexts
    • Proofs
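
Two of the uses listed here, contexts and proofs, can be sketched as follows. All names (`Locale`, `greet`, `Box`) are hypothetical and chosen only for illustration; the evidence type `=:=` is from the Scala standard library.

```scala
object ContextsAndProofs {
  // Context: a value that is threaded through calls by the compiler
  // instead of being passed explicitly at every call site.
  final case class Locale(lang: String)

  def greet(name: String)(implicit locale: Locale): String =
    if (locale.lang == "es") s"Hola, $name" else s"Hello, $name"

  // Proof: increment is only callable when the compiler can supply
  // evidence that the element type A is exactly Int.
  final case class Box[A](value: A) {
    def increment(implicit ev: A =:= Int): Int = ev(value) + 1
  }

  implicit val defaultLocale: Locale = Locale("es")

  val greeting: String = greet("Ada")   // the locale argument is filled in implicitly
  val next: Int = Box(41).increment     // compiles because A is Int
  // Box("x").increment                 // would be rejected: no evidence that String =:= Int
}
```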

  9. Implicits? Why do this study?
    This paper represents an attempt to
    catalog, for language designers and
    software engineers, how this feature
    is really used in the wild.
    It’s meant as a retrospective on:
    • the result of introducing this
    feature into the wild, and
    • how practicing software engineers
    tend to use and misuse it
    This paper is meant to inform
    designers of future languages who are
    interested in similar features about
    the good and the bad that people do
    with them.

  10. Implicits? Why do this study?
    As PL designers, shouldn’t we
    care more about how humans
    actually use our designs?
    PL designers/researchers
    Software engineers

  11. Implicits? Why do this study?
    As PL researchers we more often do this:
    Eureka!
    A new language/technique that is more
    [correct/general/etc] than before!

  12. Implicits? Why do this study?
    But why don’t we do this more?
    Hrmm… Does [language/technique]
    actually help people?

  14. Implicits are Everywhere
    • 7,280 Scala projects (15% of the Scala code on GitHub)
    • 18M lines of code
    • 29M call sites
    • 370K implicit declarations
    • 78% of projects define implicits
    • 8M implicit call sites
    • 98% of projects use implicits
    • 27% of call sites involve implicits

  15. How many call sites are there?
    (1, 2) < (2, 1)

  16. How many call sites are there?
    (1, 2) < (2, 1)
    3?

  17. How many call sites are there?
    (1, 2) < (2, 1)
    3
    11

  18. How many call sites are there?
    (1, 2) < (2, 1)
    orderingToOrdered((1, 2))(Tuple2(ordered(Int), ordered(Int)))
    Annotations: implicit conversion, implicit parameters

  19. So, how are Scala implicits used in the wild?

  20. How are Scala implicits used in the wild?
    • Collect real-world Scala projects from GitHub, split into:
    1. libraries (projects with binaries in Maven Central / Bintray)
    2. small apps (< 1000 SLOC)
    3. large apps (>= 1000 SLOC)
    4. test code (from both libraries and applications)
    • Find out how implicits are used
    ◦ Do projects use them / define them?
    ◦ Where do they come from (Scala standard library, dependencies,
    project local)?
    ◦ What patterns are being used?
    ◦ Is there a compilation overhead?

  21. How are Scala implicits used in the wild?
    We should be done in a few weeks…, right?

  23. How many call sites are there?
    (1, 2) < (2, 1)
    orderingToOrdered((1, 2))(Tuple2(Int, Int)) < orderingToOrdered((2, 1))(Tuple2(Int, Int))
    implicit def orderingToOrdered[T](x: T)(implicit ord: Ordering[T]): Ordered[T]
    implicit def Tuple2[T1, T2](implicit ord1: Ordering[T1], ord2: Ordering[T2]): Ordering[(T1, T2)]
    implicit object Int extends IntOrdering
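
As a rough, compilable approximation of the expansion on this slide, the sketch below writes the hidden calls out by hand using the standard library's `Ordered.orderingToOrdered`, `Ordering.Tuple2` and `Ordering.Int`. It assumes that `orderingToOrdered` is brought into scope with an explicit import, and it wraps only the left operand; the exact tree the compiler produces may differ from this hand-written form.

```scala
import scala.math.Ordered.orderingToOrdered

object CallSites {
  // What the programmer writes: tuples have no `<` method of their own.
  val sugared: Boolean = (1, 2) < (2, 1)

  // The same comparison with the implicit conversion and the implicit
  // Ordering arguments spelled out explicitly.
  val expanded: Boolean =
    orderingToOrdered((1, 2))(Ordering.Tuple2(Ordering.Int, Ordering.Int)) < ((2, 1))
}
```

One visible operator in the source therefore stands for several calls, which is why counting call sites requires semantic rather than purely syntactic information.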

  24. How many call sites are there?
    (1, 2) < (2, 1)
    • Implicit use sites are not present in the syntax, so we need
    semantic data
    • Implicits are resolved statically at compile time; there is no
    dynamic dispatch

  25. How many call sites are there?
    → need to compile

  26. How many call sites are there?
    → just need to compile

  27. We used ScalaMeta [1]
    • Library to read, analyze, transform and generate
    Scala programs
    • Stores information in SemanticDB
    ◦ a data model of semantic information extracted from
    Scala (and Java) code
    ◦ run as a compiler plugin
    ◦ contains symbol table
    ◦ contains synthetics: trees added by compilers that do
    not appear in the original source
    [1] https://scalameta.org

  28. Analysis Pipeline

  29. Analysis Pipeline
    Intel Xeon 6140, 2.30GHz with 72
    cores and 256GB of RAM
    • Project names from GHTorrent [1] and Scaladex [2]
    • Get Git and GitHub metadata (e.g., number of
    stars, number of commits, first push, last push)
    • Guess build system (e.g., SBT, Maven,
    Gradle, ...)
    • Count lines of code
    [1] Queryable offline mirror of GitHub -- http://ghtorrent.org/
    [2] The Scala library index -- https://index.scala-lang.org/

  30. Analysis Pipeline
    FILTERED OUT:
    • unsupported projects (non-SBT projects or projects not
    supported by ScalaMeta)
    • duplicate projects (using DéjàVu [1])
    • uninteresting projects (fewer than 2 commits, less than 2 months active)
    37% of the code base was
    in 102 repositories
    containing copies of
    Apache Spark (biggest
    Scala project, 100K+ SLOC)
    Threw away over half of
    the projects, half of the
    code, yet kept 97% of
    GitHub stars
    [1] Lopes, Cristina V., Petr Maj, Pedro Martins, Vaibhav Saini, Di Yang, Jakub Zitny, Hitesh Sajnani, and Jan Vitek. "DéjàVu: a map of code duplicates on GitHub." OOPSLA 2017

  31. Analysis Pipeline
    • SBT is difficult to parallelize
    • 4K / 11K failed to build
    ◦ 2K missing dependencies (despite downloading 204K artifacts -- 110GB)
    ◦ 1K compilation errors
    ◦ 1K broken build, empty build, or SemanticDB generation failed
    Intel Xeon 6140, 2.30GHz with 72 cores and 256GB of RAM, yet only
    11 projects in parallel
    Source code + generated classes

  32. Analysis Pipeline
    • Convert SemanticDB into our data model
    • Extract implicit declarations, call sites and parameters into CSV files
    • Analyze results in R
    • 500 lines of Makefile, 8K lines of Scala, 5K lines of R code

  33. Example: extract implicits step

  34. Final Corpus
    RESULTS
    15% of all publicly available Scala code on GitHub (1/2019)
    [Table: corpus statistics, with sum and mean columns]

  35. What we have data on
    RESULTS
    • Overall usage
    • Usage of patterns
    • Compilation overhead
    • Complexity

  36. Implicit call sites
    RESULTS
    How many call sites involving implicits are there?
    8.1M / 29.6M (27%) call sites involve implicits

  37. Implicit call sites
    RESULTS
    Where do the implicit declarations come from?

  38. Implicit conversion
    RESULTS

  39. Patterns: what we’ll cover
    RESULTS
    • Late trait implementation
    • Extension methods
    • Type classes
    • Extension Syntax Methods
    • Type proofs
    • Contexts
    • Implicit conversions anti-patterns
    ◦ unrelated conversions
    ◦ bi-directional conversions

  40. Patterns: what we’ll cover
    RESULTS
    In the talk

  41. Patterns: what we’ll cover
    RESULTS
    The rest is in the paper!

  42. Patterns: Type classes
    RESULTS
    • 11K type classes
    • 30% of the implicit calls
    ◦ Scala standard library (42%): scala.collection, scala.Predef and scala.math
    ◦ Testing libraries (15%)
    ◦ FP frameworks
    ▪ typelevel cats
    ▪ scalaz
    40% of projects use one of them!
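
For readers unfamiliar with the pattern counted here, a minimal type class can be sketched as below. The trait `Show` and its instances are hypothetical stand-ins written for this example; they are not the definitions from cats, scalaz or the standard library.

```scala
object TypeClassSketch {
  // The type class: behaviour declared separately from the types it applies to.
  trait Show[A] {
    def show(a: A): String
  }

  object Show {
    // Instance for Int.
    implicit val intShow: Show[Int] = new Show[Int] {
      def show(a: Int): String = a.toString
    }

    // Instance for List[A], derived from an instance for A.
    // The implicit parameter is itself resolved by the compiler.
    implicit def listShow[A](implicit ev: Show[A]): Show[List[A]] =
      new Show[List[A]] {
        def show(as: List[A]): String = as.map(ev.show).mkString("[", ", ", "]")
      }
  }

  // A call to render takes one visible argument; the instance is supplied implicitly.
  def render[A](a: A)(implicit s: Show[A]): String = s.show(a)

  val rendered: String = render(List(1, 2, 3))
}
```

Here `render(List(1, 2, 3))` expands to `render(List(1, 2, 3))(Show.listShow(Show.intShow))`, so one source-level call involves two implicit resolutions.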

  43. Antipatterns
    RESULTS
    • Unrelated conversions
    ◦ a public, top-level conversion defined outside the compilation
    units of both its source (from) and target (to) types
    ◦ 7.9K conversions defined in 1.2K projects (16%)
    ◦ 1.6K conversions from 552 projects involve Scala primitive types
    ◦ only 81 in 47 projects convert just between primitive types
    • Bi-directional conversions
    ◦ a pair of conversions from S -> T and T -> S
    ◦ 1.1K conversions defined in 209 projects (2.9%)
    ◦ used in 1.9K projects (26.5%)
    ◦ Scala-Java collection conversions used in 728 (10%) projects
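
The bi-directional case can be illustrated with a small sketch. The types `Millis` and `Seconds` and every method name below are assumptions invented for the example; the point is only that a pair of implicit conversions lets a value silently round-trip and lose information, whereas an explicit method makes the step visible.

```scala
import scala.language.implicitConversions

object ConversionSketch {
  final case class Millis(value: Long)
  final case class Seconds(value: Long)

  // Anti-pattern: a pair of implicit conversions S -> T and T -> S.
  object Bidirectional {
    implicit def millisToSeconds(m: Millis): Seconds = Seconds(m.value / 1000) // truncates
    implicit def secondsToMillis(s: Seconds): Millis = Millis(s.value * 1000)
  }

  def timeoutIn(s: Seconds): Seconds = s
  def delayOf(m: Millis): Millis = m

  // Two conversions are inserted silently; 1500 ms comes back as 1000 ms.
  val roundTrip: Millis = {
    import Bidirectional._
    delayOf(timeoutIn(Millis(1500)))
  }

  // Alternative: an explicit, named conversion that is visible at the call site.
  implicit class MillisOps(private val m: Millis) extends AnyVal {
    def toSecondsTruncated: Seconds = Seconds(m.value / 1000)
  }

  val explicitSeconds: Seconds = Millis(1500).toSecondsTruncated
}
```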

  44. Overhead
    RESULTS
    Example: JSON serialization in Scala using circe [1]
    case class Student(id: Int, name: String)
    case class Course(name: String, students: List[Student])
    val course = Course("FP", List(Student(1, "Alicia Sophía")))
    println(course.asJson) // {"name": "FP", "students": [{"id": 1, "name": "Alicia Sophía"}]}
    [1] JSON library for Scala -- https://circe.github.io/circe/

  45. Overhead
    RESULTS
    Example: JSON serialization in Scala using circe [1]
    import io.circe.generic.auto._ // automatic derivation of encoders for case classes
    import io.circe.syntax._       // adds the .asJson extension method
    case class Student(id: Int, name: String)
    case class Course(name: String, students: List[Student])
    val course = Course("FP", List(Student(1, "Alicia Sophía")))
    println(course.asJson) // {"name": "FP", "students": [{"id": 1, "name": "Alicia Sophía"}]}
    [1] JSON library for Scala -- https://circe.github.io/circe/

  46. Overhead
    RESULTS
    println(course.asJson)
    Lots of code gets generated when you use it!
    • Implicit type class derivation using macros (using the Shapeless [1] library)
    • Can cause compilation overhead
    [1] Library for generic programming in Scala -- https://github.com/milessabin/shapeless

  48. Overhead
    RESULTS
    Can cache the generated encoders so that they are not regenerated again and again!
    import io.circe.Encoder
    import io.circe.generic.semiauto.deriveEncoder // semi-automatic derivation provides deriveEncoder
    import io.circe.syntax._                       // adds the .asJson extension method
    case class Student(id: Int, name: String)
    case class Course(name: String, students: List[Student])
    // Each encoder is derived once and then reused from implicit scope.
    implicit val studentEncoder: Encoder[Student] = deriveEncoder
    implicit val courseEncoder: Encoder[Course] = deriveEncoder
    val course = Course("FP", List(Student(1, "Alicia Sophía")))
    println(course.asJson) // {"name": "FP", "students": [{"id": 1, "name": "Alicia Sophía"}]}
    circe, a JSON library for Scala -- https://circe.github.io/circe/

  49. Overhead
    RESULTS
    • There are 1,969 projects (8.4M LOC) using Scala 2.12.4+
    • 488 projects (2.8M LOC) use shapeless for implicit
    type-class derivation

  50. Complexity
    RESULTS
    • The majority of implicit declarations take 0 or 1 implicit
    parameters

  51. Takeaways
    CONCLUSION
    Implicits have percolated to almost every nook and cranny of the
    Scala ecosystem.
    There is hardly any API without them as they enable elegant
    architectural design.
    They allow one to remove a lot of boilerplate by leveraging the
    compiler’s knowledge about the code.
    However, they can also be easily misused and, if taken too far,
    seriously hurt the readability of code.

  52. Takeaways
    CONCLUSION
    Implicits really are everywhere.
    Advice:
    For language designers:
    • compile-time performance!
    • coherence = complexity
    • consider limiting expressivity
    • avoid relying on names
    • tool support is key
    For library designers:
    • compile-time performance!
    • easy to over-engineer libraries
    • can good defaults be provided? then use implicits
    • do not use unrelated implicits!
    • do not use conversions that go both ways!
    • do not use conversions that might change semantics!

  53. Results: Overhead

  54. Results: Overhead
    https://github.com/mesosphere/marathon/commit/fbf7f29468bda2ec29b7fbf80b6864f46a825b7a

  55. The Ugly - Understandability
    https://www.scala-lang.org/api/2.12.10/scala/collection/immutable/List.html

  56. The Ugly - Understandability
    http://scalapuzzlers.com/#pzzlr-054

  57. The Ugly - Understandability