How It Works - Spark
Yuri Ostapchuk
September 13, 2021
Programming
A series of talks on data engineering
Transcript
HOW IT WORKS: SPARK
PLAN
- Hadoop weak points
- Spark core ideas & concepts
- Applications & ecosystem
- Demo
RECAP: HADOOP & MAPREDUCE
PROBLEM: HADOOP WEAK POINTS
- slow: intermediate results are saved to disk
- complex: imperative style, overly verbose APIs, not accessible to regular humans
IDEA
- let's keep all data being processed in memory
- let's treat the whole dataset simply as a collection
- let's build a functional API for processing
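To make the "dataset as a collection" idea concrete, here is a minimal sketch in plain Python (not Spark itself). The input lines and the word-length threshold are made up for illustration. The point is only the shape of the code: the data stays in memory and is processed by chaining small functional steps.

```python
from collections import Counter
from functools import reduce

# Hypothetical input: in a real job these lines would come from files on a cluster.
lines = [
    "spark keeps data in memory",
    "hadoop saves intermediate results to disk",
    "spark treats a dataset as a collection",
]

# Treat the dataset as an ordinary collection and chain functional steps.
words = (word for line in lines for word in line.split())  # flatMap-like step
long_words = filter(lambda w: len(w) > 4, words)            # filter step
pairs = map(lambda w: (w, 1), long_words)                   # map step


def merge(counts, pair):
    """Fold one (word, 1) pair into the running totals (the reduce step)."""
    word, n = pair
    counts[word] += n
    return counts


word_counts = reduce(merge, pairs, Counter())
print(word_counts.most_common(1))
```

The same pipeline written as explicit map and reduce phases with intermediate files would be much longer, which is the verbosity the previous slide complains about.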
SPARK CORE CONCEPTS
RDD
Resilient Distributed Dataset
RDD FEATURES
- immutable
- lazy
- partitioned, location-aware & location-transparent
- persistence
- distributed, scalable
- in-memory
- fault-tolerant, lineage: each child knows its parents
- functional API: declarative, typed
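Three of the properties above (immutable, lazy, lineage) can be illustrated with a toy class. This is a single-machine sketch in plain Python, not Spark's implementation; `ToyRDD` and all of its methods are hypothetical names invented for this example.

```python
class ToyRDD:
    """Toy stand-in for an RDD: immutable, lazy, and aware of its parent."""

    def __init__(self, source=None, parent=None, transform=None, label="source"):
        self._source = tuple(source) if source is not None else None
        self.parent = parent          # lineage: the child knows its parent
        self._transform = transform   # function applied to the parent's records
        self.label = label

    def map(self, fn):
        # Returns a NEW dataset; nothing is computed yet (lazy).
        return ToyRDD(parent=self, transform=lambda rows: (fn(r) for r in rows), label="map")

    def filter(self, pred):
        return ToyRDD(parent=self, transform=lambda rows: (r for r in rows if pred(r)), label="filter")

    def lineage(self):
        """List the chain of steps from the original source to this dataset."""
        steps, node = [], self
        while node is not None:
            steps.append(node.label)
            node = node.parent
        return list(reversed(steps))

    def collect(self):
        """Action: only now is the chain of transformations evaluated."""
        if self.parent is None:
            return list(self._source)
        return list(self._transform(self.parent.collect()))


calls = []


def double(x):
    calls.append(x)  # record when the function actually runs
    return x * 2


numbers = ToyRDD(source=[1, 2, 3, 4])
evens_doubled = numbers.filter(lambda x: x % 2 == 0).map(double)

print(calls)                    # still empty: no work has been done yet
print(evens_doubled.lineage())  # ['source', 'filter', 'map']
print(evens_doubled.collect())  # [4, 8]
```

Because every dataset remembers how it was derived, a lost result can be rebuilt by calling `collect()` again and replaying the lineage from the source. That is the intuition behind lineage-based fault tolerance, with partitioning and distribution left out of this sketch.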
DAG
Directed Acyclic Graph
EXECUTION MODEL
DEPLOYMENT
API
COMPONENTS
SPARK SQL & DATAFRAME
- SQL API, functional API, typed/untyped
- interactive, analytical interface, unified programming model
- distributed, scalable
- code generation, out-of-the-box optimizations = Catalyst engine
- memory & binary & compute optimizations = Tungsten engine
- integration: multiple data sources, single representation, Hive metastore
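As a rough intuition for "out-of-the-box optimizations", a query optimizer can rewrite a logical plan before it runs. The sketch below is a toy in plain Python, not Catalyst's code. The `Scan` and `Filter` node types, the `combine_filters` rule, and the `users` table are all hypothetical. It shows one simple rewrite: merging adjacent filters into a single filter.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Scan:
    """Leaf node: read every row of a table."""
    table: str


@dataclass(frozen=True)
class Filter:
    """Keep only the child's rows that match a condition."""
    condition: str
    child: "Plan"


Plan = Union[Scan, Filter]


def combine_filters(plan: Plan) -> Plan:
    """Rewrite Filter(a, Filter(b, child)) into Filter('(b) AND (a)', child), bottom-up."""
    if isinstance(plan, Scan):
        return plan
    child = combine_filters(plan.child)
    if isinstance(child, Filter):
        return Filter(f"({child.condition}) AND ({plan.condition})", child.child)
    return Filter(plan.condition, child)


def explain(plan: Plan) -> str:
    """Render a plan as a one-line string, outermost operator first."""
    if isinstance(plan, Scan):
        return f"Scan({plan.table})"
    return f"Filter[{plan.condition}] <- {explain(plan.child)}"


plan = Filter("country = 'UA'", Filter("age > 18", Scan("users")))
print(explain(plan))
print(explain(combine_filters(plan)))
```

The user writes the query once, whether through SQL or through the functional API, and rewrites of this kind happen without any change to the user's code.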
ECOSYSTEM & USE CASES
DEMO
- spark-shell
- text file (rdd)
- load into memory
- filter, map, group by
- reduce
- save
- show ui
- show plan, explain
- caching
- rdd -> dataframe
PLACE OF SPARK IN BIGDATA ECOSYSTEM
CALL TO ACTION
- High Performance Spark - Holden Karau
- install spark, run spark-shell, load text file, play with it
- http://learn.mapr.com/dev-360-apache-spark-essentials