Alice and travelling back in time

Roksolana Diachuk • Big Data Developer at Captify • Diversity & Inclusion ambassador at Captify • Women Who Code Kyiv Data Engineering Lead • Speaker

In previous talks…

Functional forest

magic-db-cluster-0

2 years later

magic-db-cluster-0 Long time no see!

NAME          READY  STATUS   AGE
launcher-crd  1/1    Running  33s

magic-db-cluster-0

3 months passed

{ "predictions": [ [ "B-natural phenomenon", "O", "B-geographical entity", "B-time indicator" ] ] }

{ "instances": ["Pods and higher-order functions are in danger"] } Test

5 days passed

“Oh, don’t be rude, Alice! I will only entertain you a bit with my riddle”
“Like it explains anything”

What was lost can be found only in the anomalies of this world

ML model Source data Anomalies detected

01010100 01101000 01100101 00100000 01110000 01100001 01110011 …

The past is the key to the future

1 day later

Alice kept thinking about the note she got and the journey that led her to it. Was someone helping her? And what did this note actually mean?

“I need to check magic-db”

alice% _

alice% _ You’re now connected to the magic-db

magicdb
  event_date = 20140608
    event_hour = 1402185600
    event_hour = 1402196720
    …
  event_date = 20140609
  …
  event_date = 20211031

“Wait, there are some tools that let you look into data versions”

Cloud storage
Delta Lake storage layer
Apache Spark services

Storage layer
Data files
Metadata

Ingestion tables
Refined tables
Agg data store

ACID
Atomicity
Consistency
Isolation
Durability

Transaction issues in Spark
Failure during write -> data loss
Data update -> data is not consistent

ACID transactions
Schema enforcement
Metadata handling
Upserts and deletes
Time travel
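To make "upserts and deletes" concrete, here is a minimal sketch using the Delta Lake Scala API (`DeltaTable.merge` and `DeltaTable.delete`). It assumes the `delta-core` dependency is on the classpath. The object and method names are hypothetical, and the column names `entity_name` and `entity_type` are borrowed from the tables shown later in the deck.

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical helper object; not part of the talk.
object UpsertSketch {

  // Upsert: update rows whose entity_name already exists, insert the rest.
  // The merge runs as a single transaction on the Delta table.
  def upsertEntities(spark: SparkSession, path: String, updates: DataFrame): Unit =
    DeltaTable.forPath(spark, path).as("target")
      .merge(updates.as("source"), "target.entity_name = source.entity_name")
      .whenMatched().updateAll()
      .whenNotMatched().insertAll()
      .execute()

  // Delete: remove every row of the given entity type.
  def deleteByType(spark: SparkSession, path: String, entityType: String): Unit =
    DeltaTable.forPath(spark, path).delete(col("entity_type") === entityType)
}
```

Both operations are recorded as new table versions, so they remain visible to time travel.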

Time travel
“That’s exactly what I need”

magicdb
  event_date = 20140608
    event_hour = 1402185600
    event_hour = 1402196720
    …
  event_date = 20140609
  …
  _metadata

“Hm, there are already metadata files available”

magicdb
  …
  _metadata
    _last_checkpoint
    00000000000000005536
    00000000000000005537
    00000000000000005538
    00000000000000005539
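The numbered files follow the naming convention of Delta Lake's transaction log: each commit is stored under its table version, zero-padded to 20 digits, with a `.json` extension. In a standard Delta table that log directory is called `_delta_log`; the slide labels it `_metadata`. The sketch below only illustrates the naming scheme, and the object and method names are hypothetical.

```scala
import scala.util.Try

// Hypothetical helper; illustrates the file naming only, not the file contents.
object DeltaLogNames {
  private val CommitFile = """(\d{20})\.json""".r

  // Build the commit file name for a table version, e.g. 5536 -> 00000000000000005536.json
  def commitFileName(version: Long): String = f"$version%020d.json"

  // Recover the version from a commit file name; None for anything else
  // (for example the _last_checkpoint pointer file).
  def versionOf(fileName: String): Option[Long] = fileName match {
    case CommitFile(digits) => Try(digits.toLong).toOption
    case _                  => None
  }
}
```

Listing the directory and mapping each name through `versionOf` is enough to see which versions a table has committed.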

Delta Lake setup

libraryDependencies += "" %% "delta-core" % "1.1.0"

val spark = SparkSession
  .builder()
  .config("spark.sql.extensions", "")
  .config("spark.sql.catalog.spark_catalog", "")
  .getOrCreate()
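For reference, a sketch of the same setup with the values left blank in the transcript filled in from the Delta Lake 1.1.0 quick-start documentation rather than from the slide itself. The application name and the `local[*]` master are assumptions for a local experiment.

```scala
// build.sbt (coordinates as documented for Delta Lake 1.1.0, not read from the slide):
// libraryDependencies += "io.delta" %% "delta-core" % "1.1.0"

import org.apache.spark.sql.SparkSession

// Hypothetical helper object; not part of the talk.
object DeltaSession {
  def create(): SparkSession =
    SparkSession.builder()
      .appName("magic-db-time-travel") // hypothetical name
      .master("local[*]")              // assumption: local run
      // Enables Delta-specific SQL commands and parsing.
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
      // Lets the session catalog resolve Delta tables.
      .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      .getOrCreate()
}
```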

Timestamp   Entity_name       Entity_type  Belongs to
2014-06-07  factory-worker-0  Pod          StatefulSet
2014-06-07  factory-worker-1  Pod          StatefulSet
2014-06-07  factory-worker-2  Pod          StatefulSet
…

spark.read.format("delta").load("/magic-db")

“It’s possible to look directly into the history of the data!”

Version  Timestamp   User       Operation  Comment
1        2014-06-07  Architect  CREATE     Default namespace city
2        2014-06-08  Architect  CREATE     Factory
…

val deltaTable = DeltaTable.forPath(spark, "/magic-db")
deltaTable.history()
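The table on the slide is a stylised view. In Delta Lake, `history()` returns a DataFrame with columns such as `version`, `timestamp`, `userName`, `operation` and `operationParameters`. A minimal sketch of narrowing it down follows; the object and method names are hypothetical.

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object; not part of the talk.
object HistoryView {
  // Return the most recent `limit` commits with only the columns
  // needed to answer "who did what, and when".
  def recentHistory(spark: SparkSession, path: String, limit: Int): DataFrame =
    DeltaTable.forPath(spark, path)
      .history(limit)
      .select("version", "timestamp", "userName", "operation", "operationParameters")
}
```

Calling `HistoryView.recentHistory(spark, "/magic-db", 10).show(truncate = false)` would print the last ten commits.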

0. World creation

1. Default namespace city creation

2. Factory creation

3. Forest discovery

4. The first human’s visit

5. Pod lost

NAME                    READY  STATUS   AGE
pod/magic-db-cluster-0  1/1    Error    15m
pod/magic-db-cluster-1  1/1    Running  15m
pod/magic-db-cluster-2  1/1    Running  15m

spark.read
  .format("delta")
  .option("versionAsOf", "4")
  .load("/magic-db")
  .filter($"entity_type" === "Pod")
  .count

Result: 10400

spark.read
  .format("delta")
  .option("versionAsOf", "5")
  .load("/magic-db")
  .filter($"entity_type" === "Pod")
  .count

Result: 10399
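The two counts differ by one pod, and time travel can also name it. A minimal sketch, assuming the `entity_name` and `entity_type` columns shown in the table earlier: read both versions and keep the names present in the older one but missing from the newer one. The object and method names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical helper object; not part of the talk.
object VersionDiff {

  // Names of all pods as they were at a given table version.
  def podNamesAt(spark: SparkSession, path: String, version: Long): DataFrame =
    spark.read
      .format("delta")
      .option("versionAsOf", version)
      .load(path)
      .filter(col("entity_type") === "Pod")
      .select("entity_name")

  // Pods that exist in `before` but no longer exist in `after`.
  def lostPods(spark: SparkSession, path: String, before: Long, after: Long): DataFrame =
    podNamesAt(spark, path, before).except(podNamesAt(spark, path, after))
}
```

`VersionDiff.lostPods(spark, "/magic-db", 4, 5).show()` would list the names that disappeared between the two versions. Comparing names only, rather than whole rows, avoids flagging pods whose other columns merely changed.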

6. Anomalies PodLost PodLost PodLost PodLost PodLost PodLost

spark.read
  .format("delta")
  .option("versionAsOf", "6")
  .load("/magic-db")
  .filter($"entity_type" === "Pod")
  .count

Result: 6130
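The counts reported for versions 4, 5 and 6 can be turned into per-version losses with a small piece of plain Scala. This sketch hard-codes the numbers from the slides; in practice they would come from the `versionAsOf` queries. The object and method names are hypothetical.

```scala
// Hypothetical helper; the counts are the ones reported on the slides.
object PodCountDeltas {
  // (table version, number of pods at that version)
  val podCounts: List[(Int, Long)] = List(4 -> 10400L, 5 -> 10399L, 6 -> 6130L)

  // For each consecutive pair of versions, how many pods disappeared
  // by the time the later version was committed.
  def losses(counts: List[(Int, Long)]): List[(Int, Long)] =
    counts.sliding(2).collect {
      case List((_, before), (version, after)) => version -> (before - after)
    }.toList
}
```

With the slide numbers this gives a loss of 1 pod at version 5 and 4269 pods at version 6, which matches the jump from a single lost pod to the "Anomalies" event.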

“What happened to the pods?”

7. Second human’s visit

8. Factory failure

“Although there was this mysterious visitor…”

Alice was sitting at the cafe. Even her favourite cheesecake could not console her. She knew she had to do something to save the pods and higher-order functions.

“I can just write them back into the database”

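import org.apache.spark.sql.SaveMode
import spark.implicits._ // needed for toDF and the $"..." column syntax

val newPods = Seq(
  ("2021-11-01", "magic-db-0", "Pod", "CRD"),
  ("2021-11-01", "magic-db-1", "Pod", "CRD"),
  ("2021-11-01", "magic-db-2", "Pod", "CRD")
).toDF("timestamp", "entity_name", "entity_type", "belongs_to")

// Append the new pods to the Delta table
newPods
  .write
  .format("delta")
  .mode(SaveMode.Append)
  .save("/magic-db")

// Read the table back, newest pods first
spark.read
  .format("delta")
  .load("/magic-db")
  .filter($"entity_type" === "Pod")
  .orderBy($"timestamp".desc)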

Timestamp   Entity_name  Entity_type  Belongs to
2021-11-01  magic-db-0   Pod          CRD
2021-11-01  magic-db-1   Pod          CRD
2021-11-01  magic-db-2   Pod          CRD
…

val deltaTable = DeltaTable.forPath("/magic-db")

Version  Timestamp   User    Operation  Comment
9        2021-11-01  Alice   CREATE     New pods created
10       2021-11-01  System  DELETE     All pods lost

val deltaTable = DeltaTable.forPath(spark, "/magic-db")
deltaTable.history()

10. All pods lost

  .filter($"entity_type" === "Pod")
  .count

Result: 0
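The talk stops at inspecting old versions, but the same read path can also bring one back. A minimal sketch of a rollback that should work with Delta Lake 1.1.0, assuming the schema has not changed between versions: read the table as of an earlier version and overwrite the current contents with it. The object and method names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object; not part of the talk.
object Rollback {
  // Overwrite the current table contents with the snapshot at `version`.
  // The overwrite is itself committed as a new version, so the history,
  // including the deletion, stays available for later inspection.
  def rollbackTo(spark: SparkSession, path: String, version: Long): Unit =
    spark.read
      .format("delta")
      .option("versionAsOf", version)
      .load(path)
      .write
      .format("delta")
      .mode("overwrite")
      .save(path)
}
```

For example, `Rollback.rollbackTo(spark, "/magic-db", 9)` would restore the state from version 9, in which Alice's new pods had just been written.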

“Hm, there was nothing about the functional forest”

spark.read
  .format("delta")
  .option("versionAsOf", "4")
  .load("/magic-db")
  .select("entity_type")
  .distinct

Result: Pod, ReplicaSet, Deployment, CRD…

“What if the architect could help me?”

“I should find a way to help them”

“I need to get back to the world of pods and higher-order functions”

To be continued…

Thank you for your attention

