How It Works - Spark
Yuri Ostapchuk
September 13, 2021
Programming
A series of talks on data engineering
Transcript
HOW IT WORKS: SPARK
PLAN
- Hadoop weak points
- Spark core ideas & concepts
- Applications & Ecosystem
- Demo
RECAP: HADOOP & MAPREDUCE
PROBLEM: HADOOP WEAK POINTS
- slow: intermediate results are saved to disk
- complex: imperative style, too verbose APIs, not accessible to regular humans
IDEA
- let's keep all data being processed in memory
- let's treat the whole dataset simply as a collection
- let's build a functional API for processing
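The three ideas above can be illustrated without Spark at all. What follows is a minimal sketch in plain Python, assuming a small in-memory list stands in for the dataset. The sample lines and the name `word_count` are illustrative assumptions, not material from the talk, and the comments only point at the Spark operations each step loosely resembles.

```python
from functools import reduce

# Hypothetical sample data standing in for a dataset held in memory.
lines = [
    "spark keeps data in memory",
    "hadoop saves intermediate results to disk",
    "spark treats a dataset as a collection",
]


def word_count(dataset):
    """Count words using only collection-style, functional operations."""
    # Split every line into words (loosely analogous to flatMap).
    words = (word for line in dataset for word in line.split())
    # Pair each word with a count of one (loosely analogous to map).
    pairs = ((word, 1) for word in words)

    # Merge counts per word (loosely analogous to reduceByKey).
    def add_pair(counts, pair):
        word, n = pair
        counts[word] = counts.get(word, 0) + n
        return counts

    return reduce(add_pair, pairs, {})


counts = word_count(lines)
print(counts["spark"])
```

Nothing here touches disk between steps, and the whole job is expressed as transformations over a collection, which is the contrast with MapReduce that the slide draws.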
SPARK CORE CONCEPTS
RDD: Resilient Distributed Dataset
RDD FEATURES
- immutable
- lazy
- partitioned, location-aware & location-transparent
- persistence
- distributed, scalable
- in-memory
- fault-tolerant, lineage: child knows its parents
- functional API: declarative, typed
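Several of these features (immutable, lazy, partitioned, lineage) are easier to see in a toy model. The sketch below is a hypothetical `ToyRDD` class in plain Python. It is not the Spark API and ignores distribution and persistence; it only assumes that a dataset is a tuple of partitions and that each derived dataset remembers its parent and the function that produces it.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)  # frozen: instances cannot be modified (immutable)
class ToyRDD:
    """Hypothetical toy model of an RDD; not the Spark API."""
    partitions: Tuple[Tuple, ...] = ()    # data lives only in source datasets
    parent: Optional["ToyRDD"] = None     # lineage: child knows its parent
    transform: Optional[Callable] = None  # per-partition function
    name: str = "source"

    def map(self, f):
        # Lazy: only records what to do, nothing is computed here.
        return ToyRDD(parent=self, name="map",
                      transform=lambda part: tuple(f(x) for x in part))

    def filter(self, pred):
        return ToyRDD(parent=self, name="filter",
                      transform=lambda part: tuple(x for x in part if pred(x)))

    def num_partitions(self):
        node = self
        while node.parent is not None:
            node = node.parent
        return len(node.partitions)

    def compute(self, index):
        """Compute (or recompute) one partition by walking the lineage."""
        if self.parent is None:
            return self.partitions[index]
        return self.transform(self.parent.compute(index))

    def collect(self):
        """Action: triggers the actual computation of every partition."""
        return [x for i in range(self.num_partitions())
                for x in self.compute(i)]

    def lineage(self):
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names


calls = []  # records every element the map function actually sees


def double(x):
    calls.append(x)
    return x * 2


source = ToyRDD(partitions=((1, 2, 3), (4, 5, 6)))
result = source.map(double).filter(lambda x: x > 4)
calls_before_action = len(calls)  # still 0: transformations are lazy
collected = result.collect()      # the action runs the whole chain
print(collected, result.lineage())
```

Because each partition can be rebuilt from its parent with `compute(index)`, a lost partition can be recomputed on its own instead of being restored from a replica. That is the intuition behind "fault-tolerant, lineage" in the list above.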
DAG: Directed Acyclic Graph
EXECUTION MODEL
DEPLOYMENT
API
COMPONENTS
SPARK SQL & DATAFRAME
- SQL API, functional API, typed/untyped
- interactive, analytical interface, unified programming model
- distributed, scalable
- code generation, out-of-the-box optimizations = Catalyst engine
- memory & binary & compute optimizations = Tungsten engine
- integration: multiple data sources, single representation, Hive metastore
ECOSYSTEM & USE CASES
DEMO
- spark-shell
- text file (rdd)
- load into memory
- filter, map, group by
- reduce
- save
- show ui
- show plan, explain
- caching
- rdd -> dataframe
PLACE OF SPARK IN BIG DATA ECOSYSTEM
CALL TO ACTION
- High Performance Spark - Holden Karau
- install spark, run spark-shell, load text file, play with it
- http://learn.mapr.com/dev-360-apache-spark-essentials