Deploy your own Spark cluster in 4 minutes using sbt.

Pishen Tsai
December 05, 2015

Transcript

  1. KKBOX / spark-deployer
     • SBT plugin.
     • Productively used in KKBOX.
     • 100% Scala.
     https://github.com/KKBOX/spark-deployer
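
     As an sbt plugin, spark-deployer is wired in through
     project/plugins.sbt. A minimal sketch, assuming the "net.pishen"
     group ID and using a placeholder version; take the exact
     coordinates from the README:

       // project/plugins.sbt
       // coordinates are illustrative and "X.Y.Z" is a placeholder;
       // check https://github.com/KKBOX/spark-deployer for the real ones
       addSbtPlugin("net.pishen" % "spark-deployer-sbt" % "X.Y.Z")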
  2. Solutions
     • spark-ec2
       http://spark.apache.org/docs/latest/ec2-scripts.html
     • amazon emr (Elastic MapReduce)
       https://aws.amazon.com/elasticmapreduce/details/spark
     • spark-deployer
  3. spark-ec2 workflow: write the code → compile & assembly (sbt)
     → create cluster (spark-ec2) → submit job (scp & ssh)
     → destroy cluster (spark-ec2)
  4. spark-ec2’s commands
     $ sbt assembly
     $ spark-ec2 -k awskey -i ~/.ssh/awskey.pem \
         -r us-west-2 -z us-west-2a \
         --vpc-id=vpc-a28d24c7 --subnet-id=subnet-4eb27b39 \
         -s 2 -t c4.xlarge -m m4.large \
         --spark-version=1.5.2 --copy-aws-credentials \
         launch my-spark-cluster
     $ scp -i ~/.ssh/awskey.pem \
         target/scala-2.10/my_job-assembly-0.1.jar \
         root@<copy-master-ip-by-yourself>:~/job.jar
     $ ssh -i ~/.ssh/awskey.pem root@<master-ip> \
         './spark/bin/spark-submit --class mypackage.Main \
          --master spark://<master-ip>:7077 \
          --executor-memory 6G job.jar arg0'
     $ spark-ec2 -r us-west-2 destroy my-spark-cluster
  5. spark-ec2 workflow, wrapped in make: write the code
     → compile & assembly (sbt) → create cluster (spark-ec2)
     → submit job (scp & ssh) → destroy cluster (spark-ec2),
     with make driving every step
  6. spark-ec2’s bad parts
     • Need to install sbt and spark-ec2.
     • Need to design and maintain Makefiles.
     • Slow startup time (~20 mins).
  7. emr’s commands
     $ sbt assembly
     $ aws emr create-cluster --name my-spark-cluster \
         --release-label emr-4.2.0 --instance-type m3.xlarge \
         --instance-count 2 --applications Name=Spark \
         --ec2-attributes KeyName=awskey --use-default-roles
     $ aws emr put --cluster-id j-2AXXXXXXGAPLF \
         --key-pair-file ~/.ssh/mykey.pem \
         --src target/scala-2.10/my_job-assembly-0.1.jar \
         --dest /home/hadoop/job.jar
     $ aws emr add-steps --cluster-id j-2AXXXXXXGAPLF \
         --steps Type=Spark,Name=my-emr,ActionOnFailure=CONTINUE,Args=[--executor-memory,13G,--class,mypackage.Main,/home/hadoop/job.jar,arg0]
     $ aws emr terminate-clusters --cluster-ids j-2AXX
  8. emr workflow, wrapped in make: write the code
     → compile & assembly (sbt) → create cluster, submit job,
     and destroy cluster (emr), with make driving every step
  9. emr’s bad parts
     • Need to install sbt and emr.
     • Need to design and maintain Makefiles.
     • Spark’s version is old.
     • Restricted machine types.
  10. Since sbt is a powerful build tool itself, why don’t
      we let it handle all the dirty work for us?
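
      With spark-deployer, the create/submit/destroy cycle from the
      spark-ec2 and emr command listings above collapses into sbt
      tasks. A sketch of the intended usage, with task names taken
      from the project README (verify them against the repository):

        $ sbt
        > sparkCreateCluster
        > sparkSubmitJob arg0
        > sparkDestroyCluster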
  11. spark-deployer’s good parts
      • Need to install only sbt. No Makefile.
      • Easy to use. Lets you focus on your code.
      • Fast and parallel startup (~4 mins).
      • Dynamic scale out.
      • Flexible design.
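
      "Dynamic scale out" here means resizing a running cluster
      instead of rebuilding it. A sketch, under the same assumption
      that the task names match the README:

        > sparkAddWorkers 2
        > sparkRemoveWorkers 1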
  12. Prerequisites
      • java
      • sbt (installation:
        http://www.scala-sbt.org/0.13/tutorial/Manual-Installation.html#Unix)
      • export AWS_ACCESS_KEY_ID=...
        export AWS_SECRET_ACCESS_KEY=...
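
      The credentials are read from the environment, so a quick sanity
      check that the shell running sbt can see them is plain Scala in
      sbt's console (sys.env is standard library, nothing
      plugin-specific):

        scala> Seq("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY").map(k => k -> sys.env.contains(k))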
  13. KKBOX / spark-deployer
      Give it a try, and share!
      • Report issues.
      • Join our gitter channel.
      • Send pull requests.
      https://github.com/KKBOX/spark-deployer