Lazy Instantiation for Object

Introduces a Scala snippet that uses Scala's lazy evaluation features to avoid java.io.NotSerializableException in some Spark applications.

yutaono

May 26, 2016

Transcript

  1. Intro: java.io.NotSerializableException

    dstream foreachRDD { rdd =>
      val w1 = "driver exe"        // evaluated on the driver
      rdd foreach { rec =>
        val w2 = "executor exe"    // evaluated on an executor
      }
    }

    [Diagram: a Spark cluster in which the driver holds the jobs and each of two executors runs tasks]
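To make the failure mode concrete, here is a minimal sketch that reproduces it with plain Java serialization and no Spark dependency. The names `Connection`, `CapturingTask`, `SelfContainedTask`, and `SerializationCheck.canShip` are hypothetical and do not appear in the deck. The two task classes stand in for closures passed to `foreach`: one holds an instance created on the driver side, the other creates its instance where it runs.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for a client that holds a live connection.
// It is deliberately NOT Serializable.
class Connection {
  def send(rec: String): Unit = println(rec)
}

// Stand-in for a closure that captured an instance created on the driver.
class CapturingTask(conn: Connection) extends (String => Unit) with Serializable {
  def apply(rec: String): Unit = conn.send(rec)
}

// Stand-in for a closure that creates its own instance where it runs.
class SelfContainedTask extends (String => Unit) with Serializable {
  def apply(rec: String): Unit = new Connection().send(rec)
}

object SerializationCheck {
  // Returns true if Java serialization accepts the task, false if it
  // fails with NotSerializableException.
  def canShip(task: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      try out.writeObject(task) finally out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    println(canShip(new CapturingTask(new Connection))) // captured field cannot be serialized
    println(canShip(new SelfContainedTask))             // nothing non-serializable to write
  }
}
```

The first task fails because serializing it also has to serialize its `Connection` field. The second carries no such field, which is the property the rest of the deck aims for.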
  2. • Previously solved by using a lazy val or by calling the object directly
       => DI is not possible

    trait Repository {
      def put(s: Iterator[String]): Unit
    }

    object RepositoryImpl extends Repository {
      if (notInitialize()) connect()
      def put(s: Iterator[String]): Unit = { … }
    }

    class A {
      lazy val repo = RepositoryImpl

      def f(dstream: DStream[String]): Unit = {
        dstream foreachRDD { rdd =>
          rdd foreachPartition { par =>
            repo.put(par)
          }
        }
      }
    }
  3. Motivation
    • A repository object that opens a connection when it is instantiated should be instantiated on the Spark executor side
    • We want DI
    • We want unit tests
  4. Lazy Evaluations In Scala
    • lazy val
    • Stream
    • view
    • call-by-name parameter a: => A
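As a rough illustration of three of these features, the sketch here records when each expression is actually evaluated. `LazyDemo`, `twice`, and `traced` are hypothetical helper names, not part of the deck. `Stream` is left out because it is deprecated in favour of `LazyList` from Scala 2.13 onward.

```scala
import scala.collection.mutable.ArrayBuffer

object LazyDemo {
  // Call-by-name: `a` is re-evaluated every time the body refers to it.
  def twice[A](a: => A): (A, A) = (a, a)

  // Returns the order in which the traced expressions were evaluated.
  def run(): List[String] = {
    val log = ArrayBuffer.empty[String]
    def traced[A](label: String, value: A): A = { log += label; value }

    lazy val x = traced("lazy val", 1)   // not evaluated here
    log += "start"
    val sum = x + x                      // evaluated once, on first access

    val pair = twice(traced("by-name", 2)) // evaluated twice inside `twice`

    // A view defers `map`; only the element that is requested is computed.
    val squares = (1 to 3).view.map(i => traced(s"view $i", i * i))
    val first = squares.head

    log += s"sum=$sum pair=$pair first=$first"
    log.toList
  }

  def main(args: Array[String]): Unit = run().foreach(println)
}
```

The `lazy val` entry is logged after `start` and only once, while the by-name argument is logged once per reference. That difference is what `run` and `rerun` rely on in `LazyInstantiate`.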
  5. LazyInstantiate

    case class LazyInstantiate[+A](private val _run: Unit => A) {
      lazy val run: A = _run(())   // evaluated once, on first access
      def rerun: A = _run(())      // evaluated again on every call
    }

    object LazyInstantiate {
      type LI[A] = LazyInstantiate[A]

      // `Unit` here is only the name of the lambda's (ignored) parameter
      def apply[A](a: => A): LazyInstantiate[A] = LazyInstantiate(Unit => a)

      implicit class AtoLI[+A](a: => A) {
        def toLI: LazyInstantiate[A] = apply[A](a)
      }
    }

    https://gist.github.com/yutaono/4a80212dd51c9e31272bef23d9229a9c
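The following sketch shows why a wrapper like this can be shipped before the wrapped object exists. It uses a trimmed-down stand-in for `LazyInstantiate` (a plain class around a `() => A` thunk), not the deck's exact definition. `InstanceCounter`, `Repo`, and `ShipDemo` are hypothetical names, and the Java serialization round trip only approximates what Spark does when it sends a closure to an executor. It assumes Scala 2.12 or 2.13, where function literals and by-name thunks are serializable.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.util.concurrent.atomic.AtomicInteger

// Trimmed-down stand-in for the deck's LazyInstantiate (assumption, not the original).
final class LazyInstantiate[+A](thunk: () => A) extends Serializable {
  lazy val run: A = thunk()   // memoized after the first access
  def rerun: A = thunk()      // evaluates the thunk again, result not stored
}

object LazyInstantiate {
  def apply[A](a: => A): LazyInstantiate[A] = new LazyInstantiate(() => a)
}

// Counts how many Repo instances have been created in this JVM.
object InstanceCounter { val created = new AtomicInteger(0) }

// Hypothetical repository: not Serializable, "connects" when constructed.
class Repo {
  InstanceCounter.created.incrementAndGet()
  def put(s: Iterator[String]): Unit = s.foreach(_ => ())
}

object ShipDemo {
  // Serializes and deserializes a value, standing in for shipping it to an executor.
  def roundTrip[T <: AnyRef](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(value) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // Returns the instance count after wrapping, shipping, run (twice), and rerun.
  def run(): List[Int] = {
    InstanceCounter.created.set(0)
    val wrapped = LazyInstantiate(new Repo)
    val afterWrap = InstanceCounter.created.get
    val shipped = roundTrip(wrapped)        // works: no Repo exists yet
    val afterShip = InstanceCounter.created.get
    val r1 = shipped.run
    val r2 = shipped.run
    assert(r1 eq r2)                        // same memoized instance
    val afterRun = InstanceCounter.created.get
    shipped.rerun.put(Iterator("x"))        // creates a fresh instance
    List(afterWrap, afterShip, afterRun, InstanceCounter.created.get)
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Note that forcing `run` before the round trip would store a `Repo` in the wrapper and make serialization fail, which is presumably why the deck offers `rerun` for driver-side use.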
  6. Solution

    class A(repository: LI[Repository]) extends Serializable {
      def f(dstream: DStream[String]): Unit = {
        dstream foreachRDD { rdd =>
          rdd foreachPartition { par =>
            repository.run.put(par)   // run is forced on the executor, not on the driver
          }
        }
      }
    }

    new A(RepositoryImpl.toLI)
  7. Remaining issues (lazy..)
    • org.specs2.mock.Mockito does not work with Spark, so unit tests in a Spark context are not yet possible

    Caused by: java.io.NotSerializableException: Repository$$EnhancerByMockitoWithCGLIB$$2b303ed2
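  8. Appendixes: using the repository on both the driver and the executor

    class A(repository: LI[Repository]) extends Serializable {
      def f(dstream: DStream[String]): Unit = {
        // rerun does not populate the lazy val, so `repository` stays unevaluated when it is serialized
        val repoForDriver = repository.rerun
        dstream foreachRDD { rdd =>
          repoForDriver.put()           // driver side (arguments omitted on the slide)
          rdd foreachPartition { par =>
            repository.run.put(par)     // executor side
          }
        }
      }
    }

    new A(RepositoryImpl.toLI)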