How It Works - Spark

HOW IT WORKS: SPARK

PLAN
- Hadoop weak points
- Spark core ideas & concepts
- Applications & Ecosystem
- Demo

RECAP: HADOOP & MAPREDUCE

PROBLEM: HADOOP WEAK POINTS
- slow: intermediate results are saved to disk
- complex: imperative style, overly verbose APIs, not accessible to regular humans

IDEA
- let's keep all data being processed in memory
- let's treat the whole dataset simply as a collection
- let's build a functional API for processing
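
The three ideas above can be sketched in plain Python, without Spark: the data stays in memory, the dataset is an ordinary collection, and processing is a chain of functions. The sample `lines` and all names here are hypothetical; this is only an illustration of the style, not Spark code.

```python
from collections import Counter
from functools import reduce

# Hypothetical in-memory "dataset". In Spark it would be distributed;
# here it is just a plain list of lines.
lines = [
    "spark keeps data in memory",
    "",
    "hadoop saves intermediate results to disk",
    "spark treats data as a collection",
]

# filter -> flat-map -> map -> reduce, written as a chain of functions
non_empty = filter(lambda line: line.strip(), lines)             # drop blank lines
words = (word for line in non_empty for word in line.split())    # one word per element
pairs = map(lambda word: Counter({word: 1}), words)              # (word, 1) as a Counter
word_counts = reduce(lambda left, right: left + right, pairs, Counter())

print(word_counts["spark"])
```

Nothing is written to disk between the steps, which is the contrast with MapReduce that the slide draws.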

SPARK CORE CONCEPTS

RDD
Resilient Distributed Dataset

RDD FEATURES
- immutable
- lazy
- partitioned, location-aware & location-transparent
- persistence
- distributed, scalable
- in-memory
- fault-tolerant, lineage: child knows its parents
- functional API: declarative, typed
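
A few of these features (immutable, lazy, partitioned, lineage) can be shown with a toy model in plain Python. `ToyRDD` and every name in it are hypothetical and exist only for this sketch; this is not Spark's implementation or API, and it runs in a single process.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)  # frozen: instances cannot be modified after creation
class ToyRDD:
    """Toy model of an RDD for illustration only, NOT Spark's API."""
    compute: Callable[[int], Iterable]  # recipe that produces one partition
    num_partitions: int
    parent: Optional["ToyRDD"] = None   # lineage: the child knows its parent
    op: str = "source"

    @staticmethod
    def from_partitions(partitions: List[list]) -> "ToyRDD":
        frozen = tuple(tuple(p) for p in partitions)
        return ToyRDD(lambda i: iter(frozen[i]), len(frozen))

    def map(self, f) -> "ToyRDD":
        # Returns a new ToyRDD; nothing is computed yet (lazy).
        return ToyRDD(lambda i: (f(x) for x in self.compute(i)),
                      self.num_partitions, self, "map")

    def filter(self, pred) -> "ToyRDD":
        return ToyRDD(lambda i: (x for x in self.compute(i) if pred(x)),
                      self.num_partitions, self, "filter")

    def collect(self) -> list:
        # The "action": only now are the partitions actually computed.
        return [x for i in range(self.num_partitions) for x in self.compute(i)]

    def lineage(self) -> List[str]:
        node, ops = self, []
        while node is not None:
            ops.append(node.op)
            node = node.parent
        return ops


calls = []

def traced_double(x):
    calls.append(x)  # records when the function really runs
    return x * 2

source = ToyRDD.from_partitions([[1, 2, 3], [4, 5, 6]])
result = source.map(traced_double).filter(lambda x: x > 4)

calls_before_action = len(calls)             # transformations alone run nothing
collected = result.collect()                 # triggers the computation
recomputed = list(result.compute(1))         # rebuild one partition from lineage
```

Rebuilding a single partition by replaying its recipe is the idea behind lineage-based fault tolerance: a lost partition is recomputed from its parents rather than restored from a replica.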

DAG
Directed Acyclic Graph
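
As a rough illustration of why the graph must be directed and acyclic: each dataset depends on its parents, so a valid execution order computes parents before children. The sketch below uses Python's standard `graphlib`; the dataset names are hypothetical, and Spark's real scheduler does considerably more than this ordering step.

```python
from graphlib import TopologicalSorter

# Hypothetical job: each key is a dataset, each value is the set of
# parent datasets it is derived from.
dependencies = {
    "lines": set(),
    "words": {"lines"},
    "stopwords": set(),
    "filtered": {"words", "stopwords"},
    "counts": {"filtered"},
}

# static_order() yields every node after all of its parents.
# It raises graphlib.CycleError if the graph contains a cycle.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Several orders are valid here (for example, `stopwords` may come before or after `lines`), so the printed result can vary; what is guaranteed is that no dataset appears before its parents.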

EXECUTION MODEL

DEPLOYMENT

API

COMPONENTS

SPARK SQL & DATAFRAME

- SQL API, functional API, typed/untyped
- interactive, analytical interface, unified programming model
- distributed, scalable
- code generation, out-of-the-box optimizations = Catalyst engine
- memory & binary & compute optimizations = Tungsten engine
- integration: multiple data sources, single representation, Hive metastore

ECOSYSTEM & USE CASES

DEMO
- spark-shell
- text file (rdd)
- load into memory
- filter, map, group by
- reduce
- save
- show ui
- show plan, explain
- caching
- rdd -> dataframe

PLACE OF SPARK IN BIGDATA ECOSYSTEM

CALL TO ACTION
- "High Performance Spark" by Holden Karau
- install spark, run spark-shell, load a text file, play with it
- http://learn.mapr.com/dev-360-apache-spark-essentials

More Decks by Yuri Ostapchuk