Scala collection methods flatMap and flatten are more powerful than monadic flatMap and flatten

The flatten and flatMap methods of Scala's monad instances do not support mixing of collection types, such as flattening a List of Sets. However, the native flatten and flatMap methods of Scala collections like List, Vector, and Set do support mixing types. They are implemented using traversals, builders, and implicit CanBuildFrom builder factories that can build collections with heterogeneous element types. When flattening or flatMapping a collection, the appropriate CanBuildFrom is used to get a builder that adds elements of the nested collection to the output collection being built.

Philip Schwarz

February 18, 2018

Transcript

  1. Scala collection methods flatMap and flatten are more powerful than monadic flatMap and flatten

    A monad is an implementation of one of the minimal sets of monadic combinators, satisfying the laws of associativity and identity.

    Here is a monad trait implementing monadic combinators unit and flatMap:

        trait Monad[F[_]] {
          def unit[A](a: ⇒ A): F[A]
          def flatMap[A,B](ma: F[A])(f: A ⇒ F[B]): F[B]
          def map[A,B](m: F[A])(f: A ⇒ B): F[B] = flatMap(m)(a ⇒ unit(f(a)))
          def flatten[A](mma: F[F[A]]): F[A] = flatMap(mma)(ma ⇒ ma)
          …
        }

    And here is a monad trait implementing monadic combinators unit, map and flatten:

        trait Monad[F[_]] {
          def unit[A](a: ⇒ A): F[A]
          def map[A,B](m: F[A])(f: A ⇒ B): F[B]
          def flatten[A](mma: F[F[A]]): F[A]
          def flatMap[A,B](ma: F[A])(f: A ⇒ F[B]): F[B] = flatten(map(ma)(f))
          …
        }
  2. The flatten function takes an F[F[A]] and returns an F[A]

        def flatten[A](mma: F[F[A]]): F[A]

    What it does is “remove a layer” of F.

    The flatMap function takes an F[A] and a function from A to F[B] and returns an F[B]:

        def flatMap[A,B](ma: F[A])(f: A ⇒ F[B]): F[B]

    What it does is apply to each A element of ma a function f producing an F[B], but instead of returning the resulting F[F[B]], it flattens it and returns an F[B].

    In the first monad trait, flatten is defined in terms of flatMap:

        def flatten[A](mma: F[F[A]]): F[A] = flatMap(mma)(ma => ma)

    So flattening is just flatMapping the identity function x => x.

    In the second monad trait, flatMap is defined in terms of map and flatten:

        def flatMap[A,B](ma: F[A])(f: A ⇒ F[B]): F[B] = flatten(map(ma)(f))

    So flatMapping a function is just mapping the function first and then flattening the result.

    flattening is just flatMapping identity – flatMapping is mapping and then flattening
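    The equivalence between the two formulations can be checked on a small instance. The following is only a sketch: the names MonadSketch, optionMonad and flatMapViaFlatten are hypothetical and do not come from the slides, and Option is chosen purely because its flatMap is easy to write by hand.

```scala
object MonadSketch {
  // First formulation from the slides: unit and flatMap are primitive,
  // map and flatten are derived from them.
  trait Monad[F[_]] {
    def unit[A](a: => A): F[A]
    def flatMap[A, B](ma: F[A])(f: A => F[B]): F[B]
    def map[A, B](m: F[A])(f: A => B): F[B] = flatMap(m)(a => unit(f(a)))
    def flatten[A](mma: F[F[A]]): F[A] = flatMap(mma)(ma => ma)
  }

  // Hypothetical instance for Option (an assumption for illustration).
  val optionMonad: Monad[Option] = new Monad[Option] {
    def unit[A](a: => A): Option[A] = Some(a)
    def flatMap[A, B](ma: Option[A])(f: A => Option[B]): Option[B] = ma match {
      case Some(a) => f(a)
      case None    => None
    }
  }

  // Second formulation, expressed on top of the first:
  // flatMapping is mapping and then flattening.
  def flatMapViaFlatten[A, B](ma: Option[A])(f: A => Option[B]): Option[B] =
    optionMonad.flatten(optionMonad.map(ma)(f))
}
```

    Both routes should agree: flattening Option(Option(3)) removes one layer, and flatMapViaFlatten gives the same answer as optionMonad.flatMap for the same function.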
  3. A monad can flatten F[F[A]], but not F[G[A]]

    If we instantiate the first Monad trait using List’s own flatMap method, then the trait gives us a flatten method for free:

        trait Monad[F[_]] {
          def unit[A](a: ⇒ A): F[A]
          def flatMap[A,B](ma: F[A])(f: A ⇒ F[B]): F[B]
          def map[A,B](m: F[A])(f: A ⇒ B): F[B] = flatMap(m)(a ⇒ unit(f(a)))
          def flatten[A](mma: F[F[A]]): F[A] = flatMap(mma)(ma ⇒ ma)
        }

        val listMonad = new Monad[List] {
          override def unit[A](a: ⇒ A) = List(a)
          override def flatMap[A,B](ma: List[A])(f: A ⇒ List[B]): List[B] = ma flatMap f
        }

    We can now use listMonad’s flatten method to flatten a List of Lists:

        assert(listMonad.flatten(List(List(1,2,3),List[Int](),List(4,5,6))) == List(1,2,3,4,5,6))

    Similarly for other collections like Set, Vector, etc:

        val setMonad = new Monad[Set] { … }
        val vectorMonad = new Monad[Vector] { … }
        assert(setMonad.flatten(Set(Set(1,2,3),Set[Int](),Set(4,5,6))) == Set(1,2,3,4,5,6))
        assert(vectorMonad.flatten(Vector(Vector(1,2,3),Vector[Int](),Vector(4,5,6))) == Vector(1,2,3,4,5,6))

    But what we cannot do is mix types. E.g. we can’t flatten a List of Sets:

        assert(listMonad.flatten(List(Set(1,2,3),Set[Int](),Set(4,5,6))) == List(1,2,3,4,5,6))
        error: type mismatch;
          found: List[scala.collection.immutable.Set[Int]]
          required: List[List[?]]

    The reason is that the signature of flatten expects an F[F[A]], not an F[G[A]]. E.g. it expects a List[List[A]] or a Set[Set[A]], not a List[Set[A]].

    We can flatten F[F[A]], but not F[G[A]]. E.g. we can flatten List[List[A]], not List[Set[A]].
  4. A monad can flatMap F[A] with a function returning F[B], but not with a function returning G[B]

    If we instantiate the second Monad trait using List’s own map and flatten methods, then the trait gives us a flatMap method for free:

        trait Monad[F[_]] {
          def unit[A](a: ⇒ A): F[A]
          def map[A,B](m: F[A])(f: A ⇒ B): F[B]
          def flatten[A](mma: F[F[A]]): F[A]
          def flatMap[A,B](ma: F[A])(f: A ⇒ F[B]): F[B] = flatten(map(ma)(f))
        }

        val listMonad = new Monad[List] {
          def unit[A](a: ⇒ A): List[A] = List(a)
          def map[A,B](m: List[A])(f: A ⇒ B): List[B] = m map f
          def flatten[A](mma: List[List[A]]): List[A] = mma.flatten
        }

    We can now use listMonad’s flatMap method to flatMap a list with a function that creates a list:

        assert(listMonad.flatMap(List(1,0,4)){case 0 => List[Int]() case x => List(x,x+1,x+2)} == List(1,2,3,4,5,6))

    Similarly for other collections like Set, Vector, etc:

        val setMonad = new Monad[Set] { … }
        val vectorMonad = new Monad[Vector] { … }
        assert(setMonad.flatMap(Set(1,0,4)){case 0 => Set[Int]() case x => Set(x,x+1,x+2)} == Set(1,2,3,4,5,6))
        assert(vectorMonad.flatMap(Vector(1,0,4)){case 0 => Vector[Int]() case x => Vector(x,x+1,x+2)} == Vector(1,2,3,4,5,6))

    But what we cannot do is mix types. E.g. we can’t flatMap a List with a function that creates a Set:

        assert(listMonad.flatMap(List(1,0,4)){case 0 => Set[Int]() case x => Set(x,x+1,x+2)} == Set(1,2,3,4,5,6))
        error: type mismatch;
          found: scala.collection.immutable.Set[Int]
          required: List[?]

    The reason is that the signature of flatMap operates on an F[A] and a function that creates an F[B], not a G[B]. E.g. it operates on a List[A] and a function that creates a List[B], not a Set[B].

    We can flatMap F[A] with a function that creates an F[B], not one that creates a G[B]. E.g. we can flatMap List[A] with a function that creates a List[B], not one that creates a Set[B].
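    One way to live within the F[F[A]] restriction is to convert the inner collections to the outer type before handing them to the monad. The sketch below assumes the first Monad trait from the slides; the names SameShapeSketch, nested, toSet and the intermediate values are hypothetical, introduced only for illustration.

```scala
object SameShapeSketch {
  // First Monad trait from the slides (unit and flatMap primitive).
  trait Monad[F[_]] {
    def unit[A](a: => A): F[A]
    def flatMap[A, B](ma: F[A])(f: A => F[B]): F[B]
    def map[A, B](m: F[A])(f: A => B): F[B] = flatMap(m)(a => unit(f(a)))
    def flatten[A](mma: F[F[A]]): F[A] = flatMap(mma)(ma => ma)
  }

  val listMonad: Monad[List] = new Monad[List] {
    def unit[A](a: => A): List[A] = List(a)
    def flatMap[A, B](ma: List[A])(f: A => List[B]): List[B] = ma.flatMap(f)
  }

  val nested: List[Set[Int]] = List(Set(1, 2, 3), Set.empty[Int], Set(4, 5, 6))

  // listMonad.flatten(nested) would not compile: List[Set[Int]] is not List[List[A]].
  // Converting every inner Set to a List first gives the F[F[A]] shape flatten expects.
  val asLists: List[List[Int]] = listMonad.map(nested)(_.toList)
  val flattened: List[Int] = listMonad.flatten(asLists)

  // The same idea for flatMap: adapt a Set-producing function so it returns a List.
  val toSet: Int => Set[Int] = {
    case 0 => Set.empty[Int]
    case x => Set(x, x + 1, x + 2)
  }
  val flatMapped: List[Int] = listMonad.flatMap(List(1, 0, 4))(x => toSet(x).toList)
}
```

    The explicit toList calls are exactly the conversions that the monadic signatures refuse to perform on our behalf.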
  5. The flatten and flatMap methods of our monad instances don’t support mixing of types

    E.g. the following does not compile:

        assert(listMonad.flatten(List(Set(1,2,3),Set[Int](),Set(4,5,6))) == List(1,2,3,4,5,6))
        assert(listMonad.flatMap(List(1,0,4)){case 0 => Set[Int]() case x => Set(x,x+1,x+2)} == List(1,2,3,4,5,6))

    The flatten and flatMap methods of List, on the other hand, do allow mixing of types. E.g. the following works:

        assert(List(Set(1,2,3),Set[Int](),Set(4,5,6)).flatten == List(1,2,3,4,5,6))
        assert(List(1,0,4).flatMap{case 0 => Set[Int]() case x => Set(x,x+1,x+2)} == List(1,2,3,4,5,6))

    In fact List supports even more mixing of types:

        assert(List(Set(1,2,3),Vector[Int](),List(4,5,6)).flatten == List(1,2,3,4,5,6))
        assert(List(1,0,4).flatMap{case 0 => Set[Int]() case x => Vector(x,x+1,x+2)} == List(1,2,3,4,5,6))

    How do the flatten and flatMap methods of Scala collections support mixing of types?

    The monadic flatten and flatMap methods don’t support mixing of types, but the flatten and flatMap methods of Scala collections do
  6. The flatMap and flatten methods of Scala collections rely on traversals, collection builders and builder factories

    From https://docs.scala-lang.org/overviews/core/architecture-of-scala-collections.html

    Almost all collection operations are implemented in terms of traversals and builders. Traversals are handled by Traversable’s foreach method, and building new collections is handled by instances of class Builder.

        trait Builder[-Elem, +To]

    Builders are generic in both the element type, Elem, and in the type, To, of collections they return. … You can add an element x to a builder b with b += x. There’s also syntax to add more than one element at once, for instance b += (x, y). Adding another collection with b ++= xs works as for buffers. The result() method returns a collection from a builder. …

    [flatMap] uses a builder factory that’s passed as an additional implicit parameter of type CanBuildFrom.

        def flatMap[B, That](f: A => scala.collection.GenTraversableOnce[B])(implicit bf: CanBuildFrom[Repr, B, That]): That

    … CanBuildFrom is a factory for a builder:

        trait CanBuildFrom[-From, -Elem, +To]

    CanBuildFrom represents builder factories. It has three type parameters:

    • From indicates the type for which this builder factory applies
    • Elem indicates the element type of the collection to be built
    • To indicates the type of collection to build

    From https://www.scala-lang.org/blog/2017/05/30/tribulations-canbuildfrom.html

    CanBuildFrom is probably the most infamous abstraction of the current collections. It is mainly criticised for making scary type signatures.
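    The builder operations quoted above (+=, ++= and result()) can be tried directly. This is a minimal sketch; the names BuilderSketch, listBuilder and setBuilder are hypothetical. It uses only List.newBuilder and Set.newBuilder and deliberately avoids CanBuildFrom, so it should not depend on which 2.x collections library is in use.

```scala
import scala.collection.mutable.Builder

object BuilderSketch {
  // Builder[Elem, To]: accepts Int elements and eventually produces a List[Int].
  val listBuilder: Builder[Int, List[Int]] = List.newBuilder[Int]
  listBuilder += 1              // add a single element
  listBuilder ++= Vector(2, 3)  // add every element of another collection
  listBuilder ++= Set(4)        // the source collection's type does not matter
  val builtList: List[Int] = listBuilder.result()

  // The same operations with a different target collection type.
  val setBuilder: Builder[Int, Set[Int]] = Set.newBuilder[Int]
  setBuilder ++= List(1, 1, 2)  // duplicates collapse because the target is a Set
  val builtSet: Set[Int] = setBuilder.result()
}
```

    The point to notice is that ++= accepts elements from any traversable source, while the builder alone decides what kind of collection comes out.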
  7. How List’s flatMap method builds a List

        List(1,0,4).flatMap{case 0 => Set[Int]() case x => Set(x,x+1,x+2)}

    The following implicit builder factory in the List companion object is selected:

        implicit def canBuildFrom[A]: CanBuildFrom[Coll, A, List[A]] = ReusableCBF.asInstanceOf[GenericCanBuildFrom[A]]

    ReusableCBF delegates creation of a builder to this method:

        def newBuilder[A]: Builder[A, List[A]] = new ListBuffer[A]

    The following flatMap definition in List is selected (List’s flatMap method):

        final override def flatMap[B,That](f:A=>GenTraversableOnce[B])(implicit bf: CanBuildFrom[List[A],B,That]):That

    bf: builder factory creating a builder (a ListBuffer), that can be used to build a List.

    NOTE: f does not return a List, it returns something that can be traversed using its foreach method.

    If the selected list builder factory bf is ReusableCBF, then List’s flatMap doesn’t use the factory at all! Instead, it does its own list building: for each list element, flatMap adds (to the list it is building) the elements of the traversable created by applying f to the list element.

    If the selected builder factory is some other factory, then flatMap delegates to the flatMap in trait TraversableLike:

        def flatMap[B, That](f: A => GenTraversableOnce[B])(implicit bf: CanBuildFrom[Repr, B, That]): That = {
          def builder = bf(repr)
          val b = builder
          for (x <- this) b ++= f(x).seq
          b.result
        }

    NOTE: f returns something that can be traversed using its foreach method.

    flatMap first uses builder factory bf to create a list builder. For each list element, flatMap then gets the builder to add (to the list it is building) the elements of the traversable (e.g. a Set) created by applying f to the list element.
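    The traverse-and-build loop described above can be imitated outside the library. The following is a simplified sketch, not the library's implementation: flatMapWith and FlatMapSketch are hypothetical names, the builder is passed explicitly instead of being obtained from an implicit CanBuildFrom, and Iterable stands in for GenTraversableOnce.

```scala
import scala.collection.mutable.Builder

object FlatMapSketch {
  // For each element of xs, apply f and let the builder absorb the resulting elements.
  // The parameter order mirrors the library: collection, function, then the builder source.
  def flatMapWith[A, B, To](xs: Iterable[A])(f: A => Iterable[B])(b: Builder[B, To]): To = {
    for (x <- xs) b ++= f(x)
    b.result()
  }

  // f returns Sets, but the builder decides that the result is a List.
  val asList: List[Int] =
    flatMapWith(List(1, 0, 4)) {
      case 0 => Set.empty[Int]
      case x => Set(x, x + 1, x + 2)
    }(List.newBuilder[Int])

  // Same traversal and same function, but a Vector builder yields a Vector.
  val asVector: Vector[Int] =
    flatMapWith(List(1, 0, 4)) {
      case 0 => Set.empty[Int]
      case x => Set(x, x + 1, x + 2)
    }(Vector.newBuilder[Int])
}
```

    Because the function only has to return something traversable, and the builder alone fixes the output type, mixing collection types costs nothing in this design.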
  8. How Vector’s flatMap method builds a Vector

        Vector(1,0,4).flatMap{case 0=>Set[Int]() case x=>Set(x,x+1,x+2)}

    The following implicit builder factory in the Vector companion object is selected:

        implicit def canBuildFrom[A]: CanBuildFrom[Coll, A, Vector[A]] = ReusableCBF.asInstanceOf[GenericCanBuildFrom[A]]

    ReusableCBF delegates creation of a builder to this method:

        def newBuilder[A]: Builder[A, Vector[A]] = new VectorBuilder[A]

    The following flatMap definition in TraversableLike is selected:

        def flatMap[B, That](f: A => GenTraversableOnce[B])(implicit bf: CanBuildFrom[Repr, B, That]): That = {
          def builder = bf(repr)
          val b = builder
          for (x <- this) b ++= f(x).seq
          b.result
        }

    bf: builder factory creating a builder (a VectorBuilder) that can be used to build a Vector.

    For each vector element, flatMap gets VectorBuilder to add (to the vector it is building) the elements of the traversable (e.g. a Set) created by applying f to the vector element.
  9. How Set’s flatMap method builds a Set

        Set(1,0,4).flatMap{case 0=>Vector[Int]() case x=>Vector(x,x+1,x+2)}

    The following implicit builder factory in the Set companion object is selected:

        implicit def canBuildFrom[A]: CanBuildFrom[Coll, A, Set[A]] = setCanBuildFrom[A]

    which delegates to the following method in abstract class GenSetFactory:

        def setCanBuildFrom[A] = new CanBuildFrom[CC[_], A, CC[A]] {
          def apply(from: CC[_]) = from match {
            case from: Set[_] => from.genericBuilder.asInstanceOf[Builder[A, CC[A]]]
            case _ => newBuilder[A]
          }
          def apply() = newBuilder[A]
        }

    which delegates creation of a builder to the following method in abstract class ImmutableSetFactory:

        def newBuilder[A]: Builder[A, CC[A]] = new SetBuilder[A, CC[A]](empty[A])

    The following flatMap definition in TraversableLike is selected:

        def flatMap[B, That](f: A => GenTraversableOnce[B])(implicit bf: CanBuildFrom[Repr, B, That]): That = {
          def builder = bf(repr)
          val b = builder
          for (x <- this) b ++= f(x).seq
          b.result
        }

    bf: builder factory creating a builder (a SetBuilder) that can be used to build a Set.

    For each set element, flatMap gets SetBuilder to add (to the set it is building) the elements of the traversable (e.g. a Vector) created by applying f to the set element.
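    One visible consequence of the builder determining the result type is that the same function can give different results depending on the collection being flatMapped. A small sketch (SetFlatMapSketch and its members are hypothetical names):

```scala
object SetFlatMapSketch {
  // f produces Vectors whose elements overlap: 1 -> Vector(1, 2), 2 -> Vector(2, 3).
  val f: Int => Vector[Int] = x => Vector(x, x + 1)

  // Built as a Set: the duplicate 2 collapses.
  val viaSet: Set[Int] = Set(1, 2).flatMap(f)

  // Built as a List: the duplicate 2 is kept.
  val viaList: List[Int] = List(1, 2).flatMap(f)
}
```

    Here viaSet is Set(1, 2, 3) while viaList is List(1, 2, 2, 3), even though f never produces a Set or a List.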
  10. Unlike the flatten method of our monad instances, the flatten method of Scala collections, e.g. List/Vector/Set, can do the following:

        assert(List(Set(1,2,3),Set[Int](),Set(4,5,6)).flatten == List(1,2,3,4,5,6))
        assert(Vector(Set(1,2,3),Set[Int](),Set(4,5,6)).flatten == Vector(1,2,3,4,5,6))
        assert(Set(Vector(1,2,3),Vector[Int](),Vector(4,5,6)).flatten == Set(1,2,3,4,5,6))

    How does the flatten method convert the nested collections, whose types differ from the type of the enclosing collection, to collections of the same type as the enclosing collection?

    In all the above three cases, flatten is implemented in trait GenericTraversableTemplate:

        def flatten[B](implicit asTraversable: A => GenTraversableOnce[B]): CC[B] = {
          val b = genericBuilder[B]
          for (xs <- sequential) b ++= asTraversable(xs).seq
          b.result()
        }

    asTraversable: an implicit conversion which asserts that the element type of this collection, e.g. List/Vector/Set, is a GenTraversableOnce, which provides a foreach method giving access to its elements.

    b: a collection builder - the type of builder depends on the type of the collection to be built (see below).

    When the enclosing collection is List, the following builder in the List companion object is used:

        def newBuilder[A]: Builder[A, List[A]] = new ListBuffer[A]

    When the enclosing collection is Vector, the following builder in the Vector companion object is used:

        def newBuilder[A]: Builder[A, Vector[A]] = new VectorBuilder[A]

    When the enclosing collection is Set, the following builder in abstract class ImmutableSetFactory is used:

        def newBuilder[A]: Builder[A, CC[A]] = new SetBuilder[A, CC[A]](empty[A])

    For each collection-typed element (e.g. a Set or Vector), flatten gets the builder (the ListBuffer / VectorBuilder / SetBuilder) to add (to the List / Vector / Set it is building) the elements of the collection-typed element (a traversable whose elements can be accessed with its foreach method).

    How the flatten method of a collection handles nested collections of types differing from that of the collection
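    The role of the implicit asTraversable evidence can be imitated with a hand-written flatten. This is a sketch under stated assumptions, not the library code: flattenWith and FlattenSketch are hypothetical names, Iterable replaces GenTraversableOnce, and the builder is supplied explicitly rather than through genericBuilder.

```scala
import scala.collection.mutable.Builder

object FlattenSketch {
  // asIterable plays the part of asTraversable: evidence that each element of xs
  // can be viewed as something whose elements can be traversed.
  def flattenWith[A, B, To](xs: Iterable[A], b: Builder[B, To])(implicit asIterable: A => Iterable[B]): To = {
    for (x <- xs) b ++= asIterable(x)
    b.result()
  }

  // Elements of three different collection types, all viewable as Iterable[Int].
  val nested: List[Iterable[Int]] = List(Set(1, 2, 3), Vector.empty[Int], List(4, 5, 6))

  // The builder alone decides the type of the flattened result.
  val asList: List[Int] = flattenWith(nested, List.newBuilder[Int])
  val asSet: Set[Int] = flattenWith(nested, Set.newBuilder[Int])
}
```

    Since every element here already is an Iterable[Int], the compiler can supply the evidence from the standard subtype witness; a list of plain Ints would offer no such evidence, and the call would be rejected at compile time.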
  11. The flatMap and flatten methods of Scala collections can operate on elements of many types, including Option, Range and Map

    Option, Range and Map can all be used where a GenTraversableOnce is expected (Range and Map directly, Option through the implicit conversion option2Iterable):

        import scala.collection.GenTraversableOnce
        val o: GenTraversableOnce[Int] = Some(3)
        val r: GenTraversableOnce[Int] = Range(1,3)
        val m: GenTraversableOnce[(String,Int)] = Map("1"->1,"2"->2)

    So the flatMap and flatten methods of a collection, e.g. a List, can operate on those types:

        assert(List(Some(1),None,Some(2),None,Some(3)).flatten == List(1,2,3))
        assert(List(0,1,2).flatMap{case 0 => None case x => Some(x)} == List(1,2))

        assert(List(Range(1,4),Range(4,7)).flatten == List(1,2,3,4,5,6))
        assert(List(0,1,4).flatMap{case 0 => Range(0,0) case x => Range(x,x+3)} == List(1,2,3,4,5,6))

        assert(List(Map("1" -> 1, "2" -> 2), Map[String,Int](), Map("3" -> 3, "4" -> 4)).flatten == List("1"->1, "2"->2, "3"->3, "4"->4))
        assert(List(0,1,3).flatMap{case 0 => Map[String,Int]() case x => Map(s"$x"->x, s"${x+1}"->(x+1))} == List("1"->1, "2"->2, "3"->3, "4"->4))
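    Because for comprehensions desugar to flatMap and map, the same flexibility lets a List generator be followed by an Option generator. A minimal sketch, with half and ForComprehensionSketch as hypothetical names:

```scala
object ForComprehensionSketch {
  // Returns an Option, not a List.
  def half(x: Int): Option[Int] = if (x % 2 == 0) Some(x / 2) else None

  // Desugars to List(1, 2, 3, 4).flatMap(x => half(x).map(h => h)):
  // List's flatMap accepts the Option produced for each element.
  val viaFor: List[Int] = for {
    x <- List(1, 2, 3, 4)
    h <- half(x)
  } yield h

  // The equivalent explicit call.
  val viaFlatMap: List[Int] = List(1, 2, 3, 4).flatMap(x => half(x))
}
```

    Both values are List(1, 2): the odd inputs yield None and contribute nothing. A strictly monadic flatMap, as in the Monad trait, would reject this mix of List and Option.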