Alice and travelling back in time

Roksolana
February 18, 2022

"Alice found out that her friends from the world of pods and higher-order functions are in danger. The key to their happy future is their turbulent past. Alice is going to use the power of Delta Lake to explore the records of the past and find a way to save her friends."
Presented at Scala Love 2022 (virtual)

Transcript

  1. Roksolana Diachuk • Big Data Developer at Captify • Diversity & Inclusion ambassador at Captify • Women Who Code Kyiv Data Engineering Lead • Speaker
  2. “Oh, don’t be rude, Alice! I will only entertain you a bit with my riddle”
     “Like it explains anything”
  3. Alice kept thinking of the note she got and the journey which led her to it. Was someone helping her? And what did this note actually mean?
  4. magicdb
       event_date = 20140608
         event_hour = 1402185600
         event_hour = 1402196720
         …
       event_date = 20140609
       …
       event_date = 20211031
  5. Transactions issue in Spark
     Failure during write -> data loss
     Data update -> data is not consistent
  6. Delta Lake setup
     libraryDependencies += "io.delta" %% "delta-core" % "1.1.0"

     import org.apache.spark.sql.SparkSession

     val spark = SparkSession
       .builder()
       .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
       .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
       .getOrCreate()
  7. Timestamp    Entity_name        Entity_type   Belongs to
     2014-06-07   factory-worker-0   Pod           StatefulSet
     2014-06-07   factory-worker-1   Pod           StatefulSet
     2014-06-07   factory-worker-2   Pod           StatefulSet
     …

     spark.read.format("delta").load("/magic-db")
  8. Version   Timestamp    User        Operation   Comment
     1         2014-06-07   Architect   CREATE      Default name space city
     2         2014-06-08   Architect   CREATE      Factory
     …

     import io.delta.tables.DeltaTable

     val deltaTable = DeltaTable.forPath(spark, "/magic-db")
     deltaTable.history()
  9. 5. Pod lost
     NAME                     READY   STATUS    AGE
     pod/magic-db-cluster-0   1/1     Error     15m
     pod/magic-db-cluster-1   1/1     Running   15m
     pod/magic-db-cluster-2   1/1     Running   15m
  10. Alice was sitting at the cafe. Even her favourite cheesecake could not console her. She knew she had to do something to save the pods and higher-order functions.
  11. import spark.implicits._ // needed for toDF on a Seq of tuples

      val newPods = Seq(
        ("2021-11-01", "magic-db-0", "Pod", "CRD"),
        ("2021-11-01", "magic-db-1", "Pod", "CRD"),
        ("2021-11-01", "magic-db-2", "Pod", "CRD")
      ).toDF("timestamp", "entity_name", "entity_type", "belongs_to")
  12. Version   Timestamp    User     Operation   Comment
      9         2021-11-01   Alice    CREATE      New pods created
      10        2021-11-01   System   DELETE      All pods lost

      val deltaTable = DeltaTable.forPath(spark, "/magic-db")
      deltaTable.history()
  13. “I need to get back to the world of pods and higher-order functions”
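
The history tables on slides 8 and 12 list numbered versions of the table at /magic-db. As a minimal sketch of how an earlier version might be read back, assuming Delta Lake 1.1.0 and the session settings from slide 6, the snippet below uses the versionAsOf reader option. The object name TimeTravelSketch, the local master setting, and the choice of version 9 (the "New pods created" entry on slide 12) are illustrative assumptions and do not come from the slides.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object; the slides do not show where this code lives.
object TimeTravelSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .master("local[*]") // assumption: local run for illustration
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      .getOrCreate()

    // Read the table as it was at version 9, i.e. before the DELETE
    // recorded as version 10 in the history on slide 12.
    val podsAtVersion9 = spark.read
      .format("delta")
      .option("versionAsOf", 9L)
      .load("/magic-db")

    podsAtVersion9.show()

    spark.stop()
  }
}
```

Reading by version leaves the table itself unchanged: the DELETE at version 10 is still the latest state, and the old rows are only visible through the returned DataFrame.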