War stories of lighting a Spark in the Kubernetes sea

Roksolana
September 14, 2019

"Big data, Spark, Kubernetes: you may have heard these buzzwords far too many times. Have you ever thought of combining all of them? Running Spark jobs on Kubernetes sounds intriguing! But sailing the unknown sea always comes with a challenge, and usually more than one. And who knows what is hiding deep in the water? If you are ready, let’s embark on this dangerous yet exciting journey."

Presented at OSDN (Kyiv, Ukraine) and Scala Berlin User Group meetup
Video recording from OSDN (Kyiv, Ukraine): https://youtu.be/uQtjAUzglEY

Transcript

  1. Roksolana Diachuk
    • Big data developer at Captify
    • Diversity & Inclusion ambassador for Captify Kyiv office
    • Women Who Code Kyiv Data Engineering Lead and Mentor
    • Speaker and traveller
  2. Agenda
    01 How to run Spark on Kubernetes
    02 Research stage
    03 Project development stage
    04 Conclusions
  3. Kubernetes operator (diagram labels): kubectl apply; Postgres Operator; Custom automation for workflow actions; etcd; State; Postgres Deployment/StatefulSet (two instances)
  4. Spark Kubernetes operator (diagram labels): kubectl; spark-app.yaml; API Server / Scheduler; Spark Application Object; Controllers; Submission runner; Spark Pod Monitor; Mutating Admission Webhook; Spark Application; Pod Events; Driver; Executor; Executor
  5. Story №1
    NAME          READY   STATUS       RESTARTS   AGE
    spark-driver  0/1     Pending      0          0s
    spark-driver  0/1     Init:0/1     0          0s
    spark-driver  0/1     Init:Error   0          3s
  6. Spark-app.yaml
    apiVersion: sparkoperator.k8s.io/v1alpha1
    kind: SparkApplication
    metadata:
      name: spark-pi
      namespace: default
    spec:
      type: Scala
      image: gcr.io/ynli-k8s/spark:v2.4.0
      mainClass: org.apache.spark.examples.SparkPi
      mainApplicationFile: local:///tmp/jars/spark_example.jar
      mode: cluster
      deps: {}
  7. Spark-app.yaml (continued, under spec)
      driver:
        coreLimit: 1000m
        cores: 0.1
        labels:
          version: 2.4.0
        memory: 1024m
        serviceAccount: spark
      executor:
        cores: 1
        instances: 1
        labels:
          version: 2.4.0
        memory: 1024m
      imagePullPolicy: Never
      restartPolicy: Never
  8. Story №2. Docker images
    NAME           READY   STATUS              RESTARTS   AGE
    sparkoperator  0/1     Pending             0          0s
    sparkoperator  0/1     ContainerCreating   0          0s
    sparkoperator  0/1     Error               0          5s
    sparkoperator  0/1     CrashLoopBackOff    0          9s
  9. Story №4. Subresources
    "The server could not find the requested resource" message at REST API call:
    http://host:port/apis/sparkoperator.k8s.io/v1alpha1/namespaces/default/sparkapplications/spark-example/status
  10. Lessons learned
    • Thorough research and results discussion
    • Read the documentation carefully
    • Community is everything
  11. Story №1. Expertise
    Problem: Lack of expertise with big data stack on k8s
    Solution: Constant discussions and team education
    Consequence: CI/CD creation took months
  12. Story №2. Logging and monitoring
    Problem: log4j file configuration is not picked up
    Solution: Building a config map and mounting it into Spark custom object
  13. Story №3. Infrastructure support
    Problem: Shared development cluster
    Solution: Agreements about the policies
    Consequence: Data loss and constant infrastructure changes
  14. Lessons learned
    • Keep in mind challenges while choosing the tech stack
    • Make sure there’s enough expertise for the project development
    • Communicate a lot
  15. Results
    • Expertise development in the department
    • 1 production-level project with missed deadlines
    • …lots of sleepless nights ¯\_(ツ)_/¯
  16. Building big data infrastructures on top of Kubernetes is very challenging, but do not give up, it is fun! (may produce headaches and eye twitching)
  17. Resources
    • Running Spark on Kubernetes documentation
    • Kubernetes documentation
    • K. Hightower. Kubernetes: Up & Running
    • G. Kim. Project Phoenix
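
The URL on the "Story №4. Subresources" slide follows the general Kubernetes layout for a namespaced custom resource with a subresource appended. As a minimal sketch of how that path is assembled, the hypothetical helper below (`custom_resource_path` is not part of any Kubernetes client library) rebuilds the path from the slide. The slides do not state the root cause of the error; in general, a "could not find the requested resource" response on a `/status` path is what the API server returns when the status subresource is not enabled for that custom resource definition, so that is one thing worth checking.

```python
def custom_resource_path(group, version, namespace, plural, name, subresource=None):
    """Build the API server path for a namespaced custom resource.

    Hypothetical helper for illustration only. The layout
    /apis/<group>/<version>/namespaces/<ns>/<plural>/<name>[/<subresource>]
    is the standard Kubernetes convention for namespaced custom resources.
    """
    path = f"/apis/{group}/{version}/namespaces/{namespace}/{plural}/{name}"
    if subresource:
        # Subresources such as "status" are addressed as one extra path segment.
        path += f"/{subresource}"
    return path


# Values taken from the slide; the host and port are left out on purpose.
status_path = custom_resource_path(
    group="sparkoperator.k8s.io",
    version="v1alpha1",
    namespace="default",
    plural="sparkapplications",
    name="spark-example",
    subresource="status",
)
print(status_path)
```

Dropping the `subresource` argument gives the path of the SparkApplication object itself, which is a quick way to check whether the object exists while only its `/status` endpoint is missing.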