How It Works - Spark

A series of talks on data engineering

Yuri Ostapchuk

September 13, 2021

Transcript

  1. PROBLEM: HADOOP WEAK POINTS

    slow: intermediate results are saved to disk; complex: imperative style, overly verbose APIs, not accessible to regular humans
  2. IDEA

    let's keep all data being processed in memory; let's treat the whole dataset simply as a collection; let's build a functional API for processing
  3. RDD FEATURES

    immutable; lazy; partitioned, location-aware & location-transparent; persistence; distributed, scalable; in-memory; fault-tolerant, lineage: child knows its parents; functional API: declarative, typed
  4. SQL API, functional API, typed/untyped; interactive, analytical interface, unified programming model

    distributed, scalable; code generation, out-of-the-box optimizations = Catalyst engine; memory & binary & compute optimizations = Tungsten engine; integration: multiple data sources, single representation, Hive metastore
  5. DEMO

    spark-shell; text file (RDD); load into memory; filter, map, group by; reduce; save; show UI; show plan, explain; caching; RDD -> DataFrame
  6. CALL TO ACTION

    High Performance Spark - Holden Karau; install Spark, run spark-shell, load a text file, play with it; http://learn.mapr.com/dev-360-apache-spark-essentials
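
The demo steps on slide 5 and the call to action on slide 6 could look roughly like the sketch below. The actual demo code is not in the transcript, so this is an assumption-heavy reconstruction: it uses PySpark rather than the Scala spark-shell, picks a word count as the task, and the names `tokenize`, `run_demo`, `input_path`, and `output_path` are hypothetical. It needs a local Spark installation with PySpark available.

```python
import re


def tokenize(line):
    """Split a line into lowercase words (hypothetical helper, plain Python)."""
    return re.findall(r"[a-z']+", line.lower())


def run_demo(input_path, output_path):
    """Walk through the demo steps; both paths are placeholders you supply."""
    # Imported here so tokenize() stays usable without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("how-it-works-demo").getOrCreate()
    sc = spark.sparkContext

    # text file (RDD): lazy, nothing is read yet
    lines = sc.textFile(input_path)
    # caching / load into memory: partitions are kept after the first action
    lines.cache()

    counts = (
        lines
        .filter(lambda l: bool(l.strip()))   # filter: drop blank lines
        .flatMap(tokenize)                   # one record per word
        .map(lambda w: (w, 1))               # map: word -> (word, 1)
        .reduceByKey(lambda a, b: a + b)     # group by key and reduce in one step
    )

    # save: an action, so the whole chain runs here.
    # saveAsTextFile fails if output_path already exists.
    counts.saveAsTextFile(output_path)

    # RDD -> DataFrame, then show plan / explain
    df = counts.toDF(["word", "count"])
    df.explain()
    df.orderBy("count", ascending=False).show(10)

    # show UI: while the application runs, the Spark UI is served on
    # port 4040 of the driver by default.
    spark.stop()
```

Calling `run_demo("some-file.txt", "out-dir")` from a Python session with PySpark installed runs the whole sequence; in spark-shell the same method names (`textFile`, `filter`, `flatMap`, `map`, `reduceByKey`, `cache`, `toDF`, `explain`) are available in Scala.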
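
Several of the RDD features from slide 3 (immutable, lazy, partitioned, lineage where the child knows its parent) can be illustrated with a toy model in plain Python. This is only a sketch for intuition, not how Spark is implemented; the class `ToyRDD` and all of its methods are hypothetical, and everything runs in a single process.

```python
from functools import reduce


class ToyRDD:
    """Toy stand-in for an RDD: immutable, lazy, partitioned, lineage-aware."""

    def __init__(self, compute, num_partitions, parent=None, op="source"):
        self._compute = compute              # partition index -> iterator
        self.num_partitions = num_partitions
        self.parent = parent                 # lineage: the child knows its parent
        self.op = op

    @classmethod
    def parallelize(cls, data, num_partitions=2):
        data = tuple(data)                   # frozen copy: the source never changes

        def compute(i):
            return iter(data[i::num_partitions])   # round-robin partitioning

        return cls(compute, num_partitions)

    # Transformations are lazy: they return a new ToyRDD describing the work.
    def map(self, f):
        return ToyRDD(lambda i: (f(x) for x in self._compute(i)),
                      self.num_partitions, parent=self, op="map")

    def filter(self, pred):
        return ToyRDD(lambda i: (x for x in self._compute(i) if pred(x)),
                      self.num_partitions, parent=self, op="filter")

    # Actions trigger the computation, partition by partition.
    def collect(self):
        return [x for i in range(self.num_partitions) for x in self._compute(i)]

    def reduce(self, f):
        return reduce(f, self.collect())

    def lineage(self):
        node, ops = self, []
        while node is not None:
            ops.append(node.op)
            node = node.parent
        return " <- ".join(ops)


calls = []                                   # records when square() actually runs


def square(x):
    calls.append(x)
    return x * x


numbers = ToyRDD.parallelize(range(1, 7), num_partitions=2)
evens_squared = numbers.map(square).filter(lambda x: x % 2 == 0)

calls_before_action = len(calls)             # 0: transformations did no work
result = sorted(evens_squared.collect())     # the action runs the whole chain
print(evens_squared.lineage())               # filter <- map <- source
print(result)                                # [4, 16, 36]
```

Because every action recomputes partitions from the parent chain, a lost partition in this model could be rebuilt the same way, which is the intuition behind lineage-based fault tolerance.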