Deploy your own Spark cluster in 4 minutes using sbt.
Pishen Tsai
December 05, 2015
Transcript
Deploy your own Spark cluster in 4 minutes using sbt
Pishen Tsai @ KKBOX
KKBOX / spark-deployer
• sbt plugin.
• Used in production at KKBOX.
• 100% Scala.
https://github.com/KKBOX/spark-deployer
write the code → compile & assembly → create cluster → submit job → destroy cluster
Solutions
• spark-ec2
• amazon emr (Elastic MapReduce)
• spark-deployer
spark-ec2: http://spark.apache.org/docs/latest/ec2-scripts.html
amazon emr: https://aws.amazon.com/elasticmapreduce/details/spark
spark-ec2
write the code → compile & assembly (sbt) → create cluster (spark-ec2) → submit job (scp & ssh) → destroy cluster (spark-ec2)
spark-ec2’s commands
$ sbt assembly
$ spark-ec2 -k awskey -i ~/.ssh/awskey.pem -r us-west-2 -z us-west-2a --vpc-id=vpc-a28d24c7 --subnet-id=subnet-4eb27b39 -s 2 -t c4.xlarge -m m4.large --spark-version=1.5.2 --copy-aws-credentials launch my-spark-cluster
$ scp -i ~/.ssh/awskey.pem target/scala-2.10/my_job-assembly-0.1.jar root@<copy-master-ip-by-yourself>:~/job.jar
$ ssh -i ~/.ssh/awskey.pem root@<master-ip> './spark/bin/spark-submit --class mypackage.Main --master spark://<master-ip>:7077 --executor-memory 6G job.jar arg0'
$ spark-ec2 -r us-west-2 destroy my-spark-cluster
spark-ec2, tied together with make
write the code → compile & assembly (sbt) → create cluster (spark-ec2) → submit job (scp & ssh) → destroy cluster (spark-ec2)
spark-ec2’s bad parts
• Need to install sbt and spark-ec2.
• Need to design and maintain Makefiles.
• Slow startup time (~20 mins).
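To give a feel for the glue code that has to be maintained around spark-ec2, here is a minimal sketch of the "scp & ssh" step as a shell function. It is not taken from the talk: `submit_to_master`, `KEY_FILE`, `JAR`, and `DRY_RUN` are hypothetical names introduced for illustration, while the key path, jar path, class name, and spark-submit flags are the ones shown in the commands above. The master address still has to be supplied by hand.

```shell
# Hypothetical helper: copy the assembled jar to the master and submit it.
# Usage: submit_to_master <master-ip> [job args...]
submit_to_master() {
  local master="$1"; shift
  # Defaults mirror the values used in the commands above (assumed, overridable).
  local key="${KEY_FILE:-$HOME/.ssh/awskey.pem}"
  local jar="${JAR:-target/scala-2.10/my_job-assembly-0.1.jar}"
  # With DRY_RUN=1 the commands are printed instead of executed.
  local run=""
  if [ "${DRY_RUN:-0}" = "1" ]; then
    run="echo"
  fi
  # Step 1: copy the jar into root's home directory on the master.
  $run scp -i "$key" "$jar" "root@${master}:~/job.jar" || return 1
  # Step 2: run spark-submit on the master against its own standalone URL.
  $run ssh -i "$key" "root@${master}" \
    "./spark/bin/spark-submit --class mypackage.Main --master spark://${master}:7077 --executor-memory 6G job.jar $*"
}
```

Running `DRY_RUN=1 submit_to_master <master-ip> arg0` prints the two commands without touching the network, which is a cheap way to check the quoting before pointing it at a real cluster.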
emr
write the code → compile & assembly (sbt) → create cluster, submit job, destroy cluster (emr)
emr’s commands
$ sbt assembly
$ aws emr create-cluster --name my-spark-cluster --release-label emr-4.2.0 --instance-type m3.xlarge --instance-count 2 --applications Name=Spark --ec2-attributes KeyName=awskey --use-default-roles
$ aws emr put --cluster-id j-2AXXXXXXGAPLF --key-pair-file ~/.ssh/mykey.pem --src target/scala-2.10/my_job-assembly-0.1.jar --dest /home/hadoop/job.jar
$ aws emr add-steps --cluster-id j-2AXXXXXXGAPLF --steps Type=Spark,Name=my-emr,ActionOnFailure=CONTINUE,Args=[--executor-memory,13G,--class,mypackage.Main,/home/hadoop/job.jar,arg0]
$ aws emr terminate-clusters --cluster-id j-2AXX
emr, tied together with make
write the code → compile & assembly (sbt) → create cluster, submit job, destroy cluster (emr)
emr’s bad parts
• Need to install sbt and the AWS CLI (for aws emr).
• Need to design and maintain Makefiles.
• Spark’s version is old.
• Restricted machine types.
Since sbt is itself a powerful build tool, why don’t we let it handle all the dirty work for us?
spark-deployer
write the code → compile & assembly → create cluster → submit job → destroy cluster (all through sbt)
spark-deployer’s commands
$ sbt "sparkCreateCluster 2"
$ sbt "sparkSubmitJob arg0"
$ sbt "sparkDestroyCluster"
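For short-lived clusters, the three commands above can be chained so that the cluster is torn down even when the job fails. The following is only a sketch: `run_on_cluster` is a hypothetical helper name, and it assumes each sbt command runs non-interactively and exits non-zero on failure, which the slides do not state.

```shell
# Hypothetical helper: create a cluster, run one job, always destroy the cluster.
# Usage: run_on_cluster <number-of-workers> [job args...]
run_on_cluster() {
  local workers="$1"; shift
  local status=0
  # If the cluster cannot be created there is nothing to submit to.
  sbt "sparkCreateCluster ${workers}" || return 1
  # Remember a failed submission instead of returning early...
  sbt "sparkSubmitJob $*" || status=$?
  # ...so that the cluster is destroyed in either case.
  sbt "sparkDestroyCluster" || status=$?
  return "$status"
}
```

Calling `run_on_cluster 2 arg0` issues the same three commands as above; a non-zero return value means the submission or the teardown reported a failure.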
spark-deployer’s good parts
• Need to install only sbt.
• No Makefile.
• Easy to use.
• Lets you focus on your code.
• Fast and parallel startup (~4 mins).
• Dynamic scale out.
• Flexible design.
How to use it?
Prerequisites
• java
• sbt
• export AWS_ACCESS_KEY_ID=...
  export AWS_SECRET_ACCESS_KEY=...
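The prerequisites above can be checked up front with a small shell function. This is a sketch, not part of spark-deployer: `check_prereqs` is a hypothetical name, and it only verifies that `java` and `sbt` are on the PATH and that the two AWS variables are non-empty, not that the credentials are valid.

```shell
# Hypothetical helper: report every missing prerequisite, return non-zero if any.
check_prereqs() {
  local missing=0 tool
  # Both tools must be resolvable from the current shell.
  for tool in java sbt; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool" >&2
      missing=1
    fi
  done
  # The AWS credentials are read from the environment.
  if [ -z "${AWS_ACCESS_KEY_ID:-}" ]; then
    echo "missing variable: AWS_ACCESS_KEY_ID" >&2
    missing=1
  fi
  if [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "missing variable: AWS_SECRET_ACCESS_KEY" >&2
    missing=1
  fi
  return "$missing"
}
```

A typical use is `check_prereqs || exit 1` at the top of a script, so that a missing tool or variable is reported before any sbt command runs.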
sbt installation: http://www.scala-sbt.org/0.13/tutorial/Manual-Installation.html#Unix
Demo
KKBOX / spark-deployer
Give it a try, and share!
• Report issues.
• Join our Gitter channel.
• Send pull requests.
https://github.com/KKBOX/spark-deployer
Thank you
Pishen Tsai @ KKBOX
KKBOX / spark-deployer