Slide 1

Slide 1 text

Lazy Instantiation for Object Yuta Ono scala meetup 2016-05-26

Slide 2

Slide 2 text

Intro

java.io.NotSerializableException

dstream foreachRDD { rdd =>
  val w1 = "driver exe"      // evaluated on the driver
  rdd foreach { rec =>
    val w2 = "executor exe"  // evaluated on an executor
  }
}

[Figure: spark cluster with one driver (job, job) and two executors (task, task each)]
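The exception named on this slide can be reproduced without a cluster. The following is only a sketch under stated assumptions: plain Java serialization stands in for the way Spark ships closures from the driver to the executors, and `Connection` and `ClosureDemo` are hypothetical names that do not appear in the talk.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical resource that is NOT Serializable (for example, something holding a socket).
class Connection {
  def send(s: String): Int = s.length
}

object ClosureDemo {
  // Java-serialize a value; a stand-in for what must succeed before a closure can leave the driver.
  def serialize(value: AnyRef): Unit = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try out.writeObject(value) finally out.close()
  }

  // True when serializing `value` fails with NotSerializableException.
  def failsToSerialize(value: AnyRef): Boolean =
    try { serialize(value); false }
    catch { case _: NotSerializableException => true }

  // The connection is created outside the closure ("driver side") and captured by it,
  // so serializing the closure also tries to serialize the connection.
  def capturing: String => Int = {
    val conn = new Connection
    (rec: String) => conn.send(rec)
  }

  // The connection is created inside the closure body ("executor side"),
  // so nothing non-serializable is captured.
  def selfContained: String => Int =
    (rec: String) => new Connection().send(rec)
}
```

`ClosureDemo.failsToSerialize(ClosureDemo.capturing)` is expected to be true and `ClosureDemo.failsToSerialize(ClosureDemo.selfContained)` false, which mirrors the `w1` versus `w2` distinction on the slide.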

Slide 3

Slide 3 text

• Until now we worked around this with a lazy val or by referencing the object directly
 => DI is not possible

trait Repository {
  def put(s: Iterator[String]): Unit
}

object RepositoryImpl extends Repository {
  if (notInitialize()) connect()
  def put(s: Iterator[String]): Unit = { … }
}

class A {
  lazy val repo = RepositoryImpl  // hard-wired implementation, cannot be injected
  def f(dstream: DStream[String]): Unit = {
    dstream foreachRDD { rdd =>
      rdd foreachPartition { par =>
        repo.put(par)
      }
    }
  }
}

Slide 4

Slide 4 text

Motivation • A repository object that opens a connection when it is instantiated should be instantiated on the Spark executor side • We want DI • We want unit tests

Slide 5

Slide 5 text

Lazy Evaluations In Scala • lazy val • Stream • view • call-by-name parameter a: => A
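The four mechanisms listed on this slide can be compared side by side. This is a minimal sketch, not taken from the talk: `LazyEvaluationDemo`, `expensive`, and the other member names are hypothetical, and `Stream` is used because the slide names it (it is deprecated in favour of `LazyList` since Scala 2.13).

```scala
object LazyEvaluationDemo {
  // Counts how often the "expensive" computation actually runs.
  var evaluations = 0
  def expensive(): Int = { evaluations += 1; 21 }

  // lazy val: evaluated on first access, then cached.
  lazy val cached: Int = expensive()

  // call-by-name parameter: `a` is re-evaluated at every use inside the body.
  def twice(a: => Int): Int = a + a

  // Stream: elements are computed on demand, so an infinite source is fine.
  def firstEvens(n: Int): List[Int] = Stream.from(1).map(_ * 2).take(n).toList

  // view: `map` is applied only to the elements that are actually taken.
  def firstViaView(n: Int): List[Int] = (1 to 1000000).view.map(_ + 1).take(n).toList
}
```

Reading `cached` twice runs `expensive()` once, while `twice(expensive())` runs it twice; that difference between caching and re-evaluation is the same one that separates `run` and `rerun` in `LazyInstantiate`.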

Slide 6

Slide 6 text

LazyInstantiate

case class LazyInstantiate[+A](private val _run: Unit => A) {
  lazy val run: A = _run(())  // evaluated once, on first access
  def rerun: A = _run(())     // evaluated again on every call
}

object LazyInstantiate {
  type LI[A] = LazyInstantiate[A]

  // Wraps a by-name expression without evaluating it.
  def apply[A](a: => A): LazyInstantiate[A] = LazyInstantiate((_: Unit) => a)

  // Adds `.toLI` to any expression; the expression is passed by name.
  implicit class AtoLI[+A](a: => A) {
    def toLI: LazyInstantiate[A] = apply[A](a)
  }
}

https://gist.github.com/yutaono/4a80212dd51c9e31272bef23d9229a9c
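To make the difference between `run` and `rerun` concrete, here is a small sketch. The class is restated (using `new` in the companion to keep overload resolution trivial) so the block compiles on its own; `CountingRepository` and `LazyInstantiateDemo` are hypothetical names introduced only for illustration.

```scala
// Restated from the slide so that this block is self-contained.
case class LazyInstantiate[+A](private val _run: Unit => A) {
  lazy val run: A = _run(())
  def rerun: A = _run(())
}

object LazyInstantiate {
  def apply[A](a: => A): LazyInstantiate[A] = new LazyInstantiate[A]((_: Unit) => a)
}

// Hypothetical stand-in for a repository that "connects" when it is constructed.
class CountingRepository {
  CountingRepository.created += 1
}
object CountingRepository {
  var created = 0
}

object LazyInstantiateDemo {
  // Returns the number of instances created after wrapping,
  // after calling `run` twice, and after one additional `rerun`.
  def counts(): (Int, Int, Int) = {
    CountingRepository.created = 0
    val li = LazyInstantiate(new CountingRepository) // nothing is constructed yet
    val afterWrap = CountingRepository.created
    val first = li.run
    val second = li.run                              // cached: same instance as `first`
    assert(first eq second)
    val afterRun = CountingRepository.created
    li.rerun                                         // constructs a fresh instance
    val afterRerun = CountingRepository.created
    (afterWrap, afterRun, afterRerun)
  }
}
```

Wrapping creates nothing, `run` creates exactly one instance however often it is called, and each `rerun` creates another.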

Slide 7

Slide 7 text

Solution

class A(repository: LI[Repository]) extends Serializable {
  def f(dstream: DStream[String]): Unit = {
    dstream foreachRDD { rdd =>
      rdd foreachPartition { par =>
        repository.run.put(par)  // instantiated on the executor, on first use
      }
    }
  }
}

new A(RepositoryImpl.toLI)
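Why this works can be sketched without Spark: the wrapper is serialized while still unevaluated, and the repository is only constructed after deserialization. The block below is an illustration under assumptions, not the author's code: a Java serialization round trip stands in for shipping to an executor, function values are assumed to be serializable (as in Scala 2.12 and 2.13), and `FakeRepository`, `ShippingDemo`, `lazily`, and `roundTrip` are hypothetical names.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Restated from the earlier slide so that this block is self-contained.
case class LazyInstantiate[+A](private val _run: Unit => A) {
  lazy val run: A = _run(())
  def rerun: A = _run(())
}

// Hypothetical repository: NOT Serializable, and it "connects" in its constructor.
class FakeRepository {
  FakeRepository.connections += 1
  def put(s: Iterator[String]): Int = s.size
}
object FakeRepository {
  var connections = 0
}

object ShippingDemo {
  // Wraps a by-name expression without evaluating it.
  def lazily[A](a: => A): LazyInstantiate[A] = new LazyInstantiate[A]((_: Unit) => a)

  // Serialize and deserialize a value; a stand-in for driver-to-executor shipping.
  def roundTrip[T <: AnyRef](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(value) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // Returns connections opened before `run`, connections opened after `run`,
  // and the number of records passed to `put`.
  def demo(): (Int, Int, Int) = {
    FakeRepository.connections = 0
    val onDriver = lazily(new FakeRepository) // no connection yet
    val onExecutor = roundTrip(onDriver)      // ships fine: nothing non-serializable inside
    val before = FakeRepository.connections
    val stored = onExecutor.run.put(Iterator("a", "b"))
    (before, FakeRepository.connections, stored)
  }
}
```

Calling `run` on the driver before shipping would store the non-serializable instance in the lazy val's field and bring the exception back, which is why the appendix slide uses `rerun` on the driver side.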

Slide 8

Slide 8 text

Remaining issues (lazy..) • org.specs2.mock.Mockito does not work on Spark, so we have not yet been able to write unit tests in a Spark context Caused by: java.io.NotSerializableException: Repository$$EnhancerByMockitoWithCGLIB$$2b303ed2

Slide 9

Slide 9 text

Links • https://gist.github.com/yutaono/4a80212dd51c9e31272bef23d9229a9c • http://allegro.tech/2015/08/spark-kafka-integration.html • http://spark.apache.org/docs/1.4.1/streaming-programming-guide.html#design-patterns-for-using-foreachrdd

Slide 10

Slide 10 text

Appendix

When you want to use the repository on both the driver and the executors

class A(repository: LI[Repository]) extends Serializable {
  def f(dstream: DStream[String]): Unit = {
    // rerun does not populate the lazy val, so `repository` stays unevaluated
    val repoForDriver = repository.rerun
    dstream foreachRDD { rdd =>
      repoForDriver.put()        // driver side
      rdd foreachPartition { par =>
        repository.run.put(par)  // executor side
      }
    }
  }
}

new A(RepositoryImpl.toLI)