Spark in Action - Overview

Dulitha Wijewantha (Chan)

May 14, 2014

An overview of Spark

Transcript

Spark
Cluster computing with working sets
Dulitha Wijewantha @dulitharw

Introduction
Spark aims to solve these main use cases:
- Iterative jobs: machine learning algorithms
- Interactive analytics
Hadoop is slow when it performs an operation multiple times, since each pass runs as a separate MapReduce job.
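
The cost described on this slide can be illustrated with a toy model in plain Python. This is not Hadoop or Spark code: `Storage`, `run_as_separate_jobs`, and `run_with_working_set` are hypothetical names invented for the sketch, and the "job" is a simple gradient step toward the mean. The sketch only counts how often an iterative job reads its input when every pass is an independent job, compared with keeping the working set in memory.

```python
# Toy model only: counts input reads for an iterative job.
# All names here are hypothetical and do not come from Hadoop or Spark.


class Storage:
    """Stand-in for a dataset on disk that counts how often it is read."""

    def __init__(self, points):
        self._points = list(points)
        self.reads = 0

    def load(self):
        self.reads += 1
        return list(self._points)


def step(points, estimate, rate=0.5):
    """One gradient step that moves the estimate toward the mean."""
    gradient = sum(estimate - p for p in points) / len(points)
    return estimate - rate * gradient


def run_as_separate_jobs(storage, iterations):
    """Every iteration re-reads its input, like a chain of independent jobs."""
    estimate = 0.0
    for _ in range(iterations):
        estimate = step(storage.load(), estimate)
    return estimate


def run_with_working_set(storage, iterations):
    """The input is read once and reused across all iterations."""
    points = storage.load()
    estimate = 0.0
    for _ in range(iterations):
        estimate = step(points, estimate)
    return estimate
```

Both functions compute the same estimate; the difference is that the first reads the input once per iteration while the second reads it once in total.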
Ñ Spark works on Resilient Distributed Data sets – an
abstraction over data objects Ñ Spark is implemented in Scala Ñ Many distributed operators are available [count, collect, first] Ñ 10x faster than Hadoop in iterative machine learning Ñ Sub-‐‑second latency to scan a 39GB dataset Introduction
Ñ Represented by a Scala object Ñ Can be
created by files in file system [HDFS], transforming an existing RDD etc. Ñ Cacheable Ñ Tracks the lineage (how it was built) – this allows Spark to rebuild a lost RDD Resilient Distributed Dataset
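
The lineage idea can be sketched with a small toy class in plain Python. This is an illustration under stated assumptions, not Spark's implementation or API: `ToyRDD`, `from_source`, and `lose_cache` are hypothetical names, while `filter`, `cache`, `count`, `collect`, and `first` are only named after the Spark operators they imitate. Each dataset stores a function that can recompute it from its parent, so a lost cached copy can be rebuilt.

```python
# Toy model only: a dataset that remembers how it was built.
# ToyRDD, from_source and lose_cache are hypothetical names.


class ToyRDD:
    def __init__(self, compute, lineage):
        self._compute = compute    # rebuilds the data from scratch
        self.lineage = lineage     # readable list of build steps
        self._cache_enabled = False
        self._cached = None

    @classmethod
    def from_source(cls, read, name):
        """Create a dataset from a function that reads the source."""
        return cls(read, [f"read({name})"])

    def filter(self, predicate, label="filter"):
        """Derive a new dataset; nothing is computed until it is used."""
        return ToyRDD(
            lambda: [x for x in self._data() if predicate(x)],
            self.lineage + [label],
        )

    def cache(self):
        """Keep the computed data in memory after first use."""
        self._cache_enabled = True
        return self

    def lose_cache(self):
        """Simulate losing the in-memory copy (for example, a failed node)."""
        self._cached = None

    def _data(self):
        if not self._cache_enabled:
            return self._compute()
        if self._cached is None:
            # Rebuild from lineage when no cached copy exists.
            self._cached = self._compute()
        return self._cached

    def count(self):
        return len(self._data())

    def collect(self):
        return list(self._data())

    def first(self):
        return self._data()[0]
```

A short usage example: build a filtered dataset, cache it, drop the cached copy, and read it again.

```python
log = ["INFO start", "ERROR disk", "ERROR net"]
errors = ToyRDD.from_source(lambda: list(log), "log").filter(
    lambda line: line.startswith("ERROR"), "filter(ERROR)"
).cache()

errors.count()        # computes and caches the filtered lines
errors.lose_cache()   # the cached copy is gone
errors.collect()      # recomputed from the recorded lineage
```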
Ñ Broadcast variables -‐‑ sent to worker node once
Ñ Accumulators – only operation available is add. Available at the master node (driver program) Shared Variables
Au Revoir