Slide 1

Slide 1 text

sbt-emr-spark
Run your Spark on AWS EMR with sbt

Slide 2

Slide 2 text

@pishen (pronounced "pishen")

Slide 3

Slide 3 text

We came from Taiwan!!

Slide 4

Slide 4 text

We are members of a Meetup group of that name. We meet once a month, about 20 people, and we have a gitter channel. Please call us if you come to Taipei!

Slide 5

Slide 5 text

pishen / sbt-emr-spark
Please check it out!

Slide 6

Slide 6 text

Why did I make this tool?

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

$ pip install awscli

What is pip? Is it food?

Slide 9

Slide 9 text

aws emr create-cluster \
  --release-label emr-5.2.1 \
  --name my-cluster \
  --instance-groups \
    InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge \
    InstanceGroupType=CORE,InstanceCount=2,InstanceType=m3.xlarge \
  --log-uri s3://my-bucket/log-dir/ \
  --service-role EMR_DefaultRole \
  --ec2-attributes SubnetId=subnet-xxxxxxxx,AdditionalMasterSecurityGroups=sg-xxxxxxxx,InstanceProfile=EMR_EC2_DefaultRole,AdditionalSlaveSecurityGroups=sg-xxxxxxxx

sbt assembly
aws s3 cp target/... s3://...
aws emr add-steps ...

So long! What a hassle!

Slide 10

Slide 10 text

Just use SBT and get things done!

Slide 11

Slide 11 text

project/plugins.sbt:

resolvers += Resolver.bintrayIvyRepo("pishen", "sbt-plugins")

addSbtPlugin("net.pishen" % "sbt-emr-spark" % "0.3.0")

Slide 12

Slide 12 text

build.sbt:

name := "sbt-emr-spark-test"

scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.0.2" % "provided"
)

sparkAwsRegion := "ap-northeast-1"
sparkSubnetId := Some("subnet-xxxxxxxx")
sparkAdditionalSecurityGroupIds := Some(Seq("sg-xxxxxxxx"))
sparkS3JarFolder := "s3://my-emr-bucket/my-emr-folder/"
sparkInstanceCount := 2

Slide 13

Slide 13 text

package mypackage

import org.apache.spark._

object Main {
  def main(args: Array[String]): Unit = {
    // set up Spark
    val sc = new SparkContext(new SparkConf())

    // your algorithm: Monte Carlo estimate of Pi
    val n = 10000000
    val count = sc.parallelize(1 to n).map { i =>
      val x = scala.math.random
      val y = scala.math.random
      if (x * x + y * y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
  }
}
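The slide's job estimates Pi by sampling random points in the unit square and counting those inside the quarter circle. To see the math without an EMR cluster, here is a minimal plain-Scala sketch of the same Monte Carlo estimate (no Spark; the object name `LocalPi`, the `estimate` helper, and the fixed seed are assumptions for illustration):

```scala
object LocalPi {
  // Estimate Pi by sampling n random points in the unit square;
  // the fraction landing inside the quarter circle approaches Pi/4.
  def estimate(n: Int, seed: Long = 42L): Double = {
    val rng = new scala.util.Random(seed) // fixed seed for reproducibility
    var count = 0
    var i = 0
    while (i < n) {
      val x = rng.nextDouble()
      val y = rng.nextDouble()
      if (x * x + y * y < 1) count += 1
      i += 1
    }
    4.0 * count / n
  }

  def main(args: Array[String]): Unit =
    println("Pi is roughly " + estimate(10000000))
}
```

The Spark version on the slide distributes the same per-sample work across the cluster with `parallelize` and combines the partial counts with `reduce`.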

Slide 14

Slide 14 text

$ sbt
> sparkCreateCluster
> sparkSubmitJob

Slide 15

Slide 15 text

pishen / sbt-emr-spark
Please see the README for more information.