How It Works - Spark
Yuri Ostapchuk
September 13, 2021
A series of talks on data engineering
Transcript
HOW IT WORKS: SPARK
PLAN
- Hadoop weak points
- Spark core ideas & concepts
- Applications & Ecosystem
- Demo
RECAP: HADOOP & MAPREDUCE
PROBLEM: HADOOP WEAK POINTS
- slow: intermediate results are saved to disk
- complex: imperative style, too verbose APIs, not available to regular humans
IDEA
- let's keep all data being processed in memory
- let's treat the whole dataset simply as a collection
- let's build a functional API for processing
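The three ideas above can be illustrated in miniature without Spark. The sketch below is plain Python on an in-memory list, not Spark code; the sample `lines` data and the `word_counts` helper are made up for illustration. It only shows what "treat the dataset as a collection and process it with a functional API" can look like.

```python
from collections import Counter
from functools import reduce

# Hypothetical sample data standing in for a dataset held in memory.
lines = [
    "spark keeps data in memory",
    "hadoop saves intermediate results to disk",
    "spark treats data as a collection",
]

def word_counts(dataset):
    """Count words with a chain of collection-style transformations."""
    # Flatten every line into its words (a flatMap-like step).
    words = (word for line in dataset for word in line.split())
    # Map each word to a partial count of one.
    pairs = map(lambda w: Counter({w: 1}), words)
    # Reduce: merge all partial counts into a single result.
    return reduce(lambda a, b: a + b, pairs, Counter())

counts = word_counts(lines)
print(counts["spark"], counts["data"], counts["disk"])
```

Nothing is written to disk between the steps, and the whole computation is described as transformations of a collection rather than as explicit loops over files.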
SPARK CORE CONCEPTS
RDD
Resilient Distributed Dataset
RDD FEATURES
- immutable
- lazy
- partitioned, location-aware & location-transparent
- persistence
- distributed, scalable
- in-memory
- fault-tolerant, lineage: child knows its parents
- functional API: declarative, typed
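Several of these features (immutable, lazy, partitioned, persistence, lineage) can be shown with a toy model. The `ToyRDD` class below is a hypothetical single-machine stand-in written for this illustration; it is not Spark's implementation or API, and its method names only loosely mirror the concepts on the slide.

```python
class ToyRDD:
    """A toy stand-in for an RDD: immutable, lazy, partitioned, with lineage."""

    def __init__(self, partitions=None, parent=None, op="source", fn=None):
        self._partitions = partitions  # only set on a source dataset
        self.parent = parent           # lineage: a child knows its parent
        self.op = op
        self._fn = fn                  # per-partition function, applied lazily
        self._keep = False
        self._cache = None

    def map(self, f):
        # Transformations return a new ToyRDD; the receiver is never modified.
        return ToyRDD(parent=self, op="map", fn=lambda part: [f(x) for x in part])

    def filter(self, pred):
        return ToyRDD(parent=self, op="filter",
                      fn=lambda part: [x for x in part if pred(x)])

    def persist(self):
        # Ask for the computed partitions to be kept after the first action.
        self._keep = True
        return self

    def lineage(self):
        # Walk from this dataset back to its source.
        node, chain = self, []
        while node is not None:
            chain.append(node.op)
            node = node.parent
        return chain

    def _compute(self):
        if self._cache is not None:
            return self._cache
        if self.parent is None:
            result = [list(p) for p in self._partitions]
        else:
            # Recompute from the parent: this is what lineage makes possible.
            result = [self._fn(p) for p in self.parent._compute()]
        if self._keep:
            self._cache = result
        return result

    def collect(self):
        # Action: only now does any work happen.
        return [x for part in self._compute() for x in part]


calls = []

def square(x):
    calls.append(x)  # record each invocation to make laziness visible
    return x * x

source = ToyRDD(partitions=[[1, 2, 3], [4, 5, 6]])
evens = source.map(square).filter(lambda x: x % 2 == 0)
calls_before_action = len(calls)  # nothing has been computed yet
result = evens.collect()
print(calls_before_action, result, evens.lineage())
```

Because each dataset only records its parent and the function to apply, a lost result can be rebuilt by replaying the chain from the source, which is the intuition behind lineage-based fault tolerance.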
DAG
Directed Acyclic Graph
EXECUTION MODEL
DEPLOYMENT
API
COMPONENTS
SPARK SQL & DATAFRAME
- SQL API, functional API, typed/untyped
- interactive, analytical interface, unified programming model
- distributed, scalable
- code generation, out-of-the-box optimizations = Catalyst engine
- memory & binary & compute optimizations = Tungsten engine
- integration: multiple data sources, single representation, Hive metastore
ECOSYSTEM & USE CASES
DEMO
- spark-shell
- text file (rdd)
- load into memory
- filter, map, group by
- reduce
- save
- show UI
- show plan, explain
- caching
- rdd -> dataframe
PLACE OF SPARK IN BIGDATA ECOSYSTEM
CALL TO ACTION
- High Performance Spark - Holden Karau
- install spark, run spark-shell, load text file, play with it
- http://learn.mapr.com/dev-360-apache-spark-essentials