How It Works - Spark
Yuri Ostapchuk
September 13, 2021
Programming
A series of talks on data engineering
Transcript
HOW IT WORKS: SPARK
PLAN
Hadoop weak points
Spark core ideas & concepts
Applications & Ecosystem
Demo
RECAP: HADOOP & MAPREDUCE
PROBLEM: HADOOP WEAK POINTS
slow: intermediate results are saved to disk
complex: imperative style, overly verbose APIs, not accessible to regular humans
IDEA
let's keep all data being processed in memory
let's treat the whole dataset simply as a collection
let's build a functional API for processing
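The three ideas above can be illustrated without Spark at all. The following is a minimal plain-Python sketch, not Spark code: the `log_lines` data and all variable names are made up for illustration, and an ordinary Python list stands in for a dataset held in memory.

```python
from functools import reduce

# Hypothetical sample data: the whole "dataset" is just an in-memory collection.
log_lines = [
    "INFO job started",
    "ERROR disk full",
    "INFO job finished",
    "ERROR network timeout",
]

# Functional pipeline: each step takes a function and yields a new sequence.
errors = filter(lambda line: line.startswith("ERROR"), log_lines)
lengths = map(len, errors)
total_error_chars = reduce(lambda a, b: a + b, lengths, 0)

print(total_error_chars)
```

The point of the sketch is the shape of the code: no explicit loops over files or intermediate writes to disk, only a chain of small functions applied to a collection.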
SPARK CORE CONCEPTS
RDD
Resilient Distributed Dataset
RDD FEATURES
immutable
lazy
partitioned, location awareness & location transparency
persistence
distributed, scalable
in-memory
fault-tolerant, lineage: child knows its parents
functional API: declarative, typed
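Three of the features listed above (immutable, lazy, lineage) can be made concrete with a toy model. The `MiniRDD` class below is a hypothetical illustration written in plain Python; it is not Spark's RDD implementation or API, and it ignores partitioning and distribution entirely. It only shows that transformations return new objects, that no work happens until an action is called, and that each dataset remembers its parent.

```python
from typing import Callable, Iterable, Iterator, List, Optional


class MiniRDD:
    """Toy stand-in for an RDD: immutable, lazy, and aware of its parent."""

    def __init__(self, source: Callable[[], Iterator],
                 parent: Optional["MiniRDD"] = None, op: str = "source"):
        self._source = source  # function that produces the elements on demand
        self.parent = parent   # lineage: each child knows its parent
        self.op = op           # label of the step that produced this dataset

    @staticmethod
    def from_list(items: Iterable) -> "MiniRDD":
        data = tuple(items)    # snapshot, so later changes to `items` are not seen
        return MiniRDD(lambda: iter(data))

    def map(self, f: Callable) -> "MiniRDD":
        # Returns a NEW dataset; nothing is computed yet (lazy transformation).
        return MiniRDD(lambda: (f(x) for x in self._source()), parent=self, op="map")

    def filter(self, p: Callable) -> "MiniRDD":
        return MiniRDD(lambda: (x for x in self._source() if p(x)),
                       parent=self, op="filter")

    def collect(self) -> List:
        # Action: only here is the chain of transformations actually evaluated.
        return list(self._source())

    def lineage(self) -> List[str]:
        # Walk from this dataset back to the original source.
        steps, node = [], self
        while node is not None:
            steps.append(node.op)
            node = node.parent
        return list(reversed(steps))


calls = []  # records when the mapped function really runs


def double(x):
    calls.append(x)
    return x * 2


base = MiniRDD.from_list([1, 2, 3, 4])
evens_doubled = base.filter(lambda x: x % 2 == 0).map(double)

calls_before_action = list(calls)  # still empty: no work has been done yet
result = evens_doubled.collect()   # triggers the computation
print(calls_before_action, result, evens_doubled.lineage())
```

Because every dataset can be recomputed from its parent chain, a lost result can be rebuilt by replaying the lineage, which is the intuition behind the fault-tolerance bullet.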
DAG
Directed Acyclic Graph
EXECUTION MODEL
DEPLOYMENT
API
COMPONENTS
SPARK SQL & DATAFRAME
SQL API, functional API, typed/untyped
interactive, analytical interface, unified programming model
distributed, scalable
code generation, out-of-the-box optimizations = Catalyst engine
memory & binary & compute optimizations = Tungsten engine
integration: multiple data sources, single representation, Hive metastore
ECOSYSTEM & USE CASES
DEMO
spark-shell
text file (rdd)
load into memory
filter, map, group by
reduce
save
show ui
show plan, explain
caching
rdd -> dataframe
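For readers without a Spark installation at hand, the data flow of the demo can be approximated in plain Python. This is not spark-shell code: the sample text, the file names, and the helper variables are assumptions made for illustration, and ordinary Python collections stand in for the RDD. Only the load, filter, map, group by, reduce, and save steps are covered; the UI, plan, and caching steps have no equivalent here.

```python
import os
import tempfile
from collections import defaultdict
from functools import reduce

# Hypothetical input: write a small text file so the example is self-contained.
workdir = tempfile.mkdtemp()
input_path = os.path.join(workdir, "input.txt")
with open(input_path, "w", encoding="utf-8") as f:
    f.write("spark is fast\n\nhadoop is slow\nspark is simple\n")

# load: read the text file into memory as a collection of lines.
with open(input_path, encoding="utf-8") as f:
    lines = f.read().splitlines()

# filter: drop empty lines.
non_empty = [line for line in lines if line.strip()]

# map: split lines into words and emit (word, 1) pairs
# (splitting one line into many words is a flat map in Spark terms).
pairs = [(word, 1) for line in non_empty for word in line.split()]

# group by: collect the counts that belong to the same word.
grouped = defaultdict(list)
for word, one in pairs:
    grouped[word].append(one)

# reduce: sum the counts within each group.
counts = {word: reduce(lambda a, b: a + b, ones) for word, ones in grouped.items()}

# save: write the result next to the input, one "word<TAB>count" per line.
output_path = os.path.join(workdir, "counts.txt")
with open(output_path, "w", encoding="utf-8") as f:
    for word in sorted(counts):
        f.write(f"{word}\t{counts[word]}\n")

print(counts)
```

In spark-shell the same flow would typically be written with RDD operations such as `textFile`, `filter`, `flatMap`, `map`, `reduceByKey`, and `saveAsTextFile`, with the work spread across partitions instead of a single Python process.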
PLACE OF SPARK IN BIGDATA ECOSYSTEM
CALL TO ACTION
High Performance Spark - Holden Karau
install spark, run spark-shell, load text file, play with it
http://learn.mapr.com/dev-360-apache-spark-essentials