sbt-emr-spark

Pishen Tsai
February 26, 2017

Transcript

  1. aws emr create-cluster \
       --release-label emr-5.2.1 \
       --name my-cluster \
       --instance-groups \
         InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge \
         InstanceGroupType=CORE,InstanceCount=2,InstanceType=m3.xlarge \
       --log-uri s3://my-bucket/log-dir/ \
       --service-role EMR_DefaultRole \
       --ec2-attributes SubnetId=subnet-xxxxxxxx,AdditionalMasterSecurityGroups=sg-xxxxxxxx,InstanceProfile=EMR_EC2_DefaultRole,AdditionalSlaveSecurityGroups=sg-xxxxxxxx

     sbt assembly
     aws s3 cp target/... s3://...
     aws emr add-steps ...

     Too long! Such a hassle!
  2. build.sbt:

     name := "sbt-emr-spark-test"

     scalaVersion := "2.11.8"

     libraryDependencies ++= Seq(
       "org.apache.spark" %% "spark-core" % "2.0.2" % "provided"
     )

     sparkAwsRegion := "ap-northeast-1"

     sparkSubnetId := Some("subnet-xxxxxxxx")

     sparkAdditionalSecurityGroupIds := Some(Seq("sg-xxxxxxxx"))

     sparkS3JarFolder := "s3://my-emr-bucket/my-emr-folder/"

     sparkInstanceCount := 2
  3. package mypackage

     import org.apache.spark._

     object Main {
       def main(args: Array[String]): Unit = {
         // setup spark
         val sc = new SparkContext(new SparkConf())

         // your algorithm
         val n = 10000000
         val count = sc.parallelize(1 to n).map { i =>
           val x = scala.math.random
           val y = scala.math.random
           if (x * x + y * y < 1) 1 else 0
         }.reduce(_ + _)

         println("Pi is roughly " + 4.0 * count / n)
       }
     }
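
The algorithm on slide 3 is a Monte Carlo estimate of pi: it samples n random points in the unit square and counts how many land inside the quarter circle of radius 1, so that 4 * count / n approaches pi. As a rough sketch that is not part of the deck, the same sampling can be tried without Spark or a cluster before submitting anything to EMR. The names `LocalPi` and `estimatePi`, the smaller sample size, and the fixed seed are assumptions introduced here for illustration only.

```scala
import scala.util.Random

// Hypothetical local counterpart of the Spark job on slide 3.
// It uses plain Scala collections instead of an RDD.
object LocalPi {
  // Sample n points in the unit square and count those that fall
  // inside the quarter circle x^2 + y^2 < 1; the ratio approximates pi / 4.
  def estimatePi(n: Int, rng: Random): Double = {
    val inside = (1 to n).count { _ =>
      val x = rng.nextDouble()
      val y = rng.nextDouble()
      x * x + y * y < 1
    }
    4.0 * inside / n
  }

  def main(args: Array[String]): Unit = {
    // A fixed seed (an assumption, not in the deck) keeps runs reproducible.
    val estimate = estimatePi(1000000, new Random(42))
    println("Pi is roughly " + estimate)
  }
}
```

The Spark version distributes the same per-point work with `sc.parallelize(1 to n).map { ... }.reduce(_ + _)`; here `count` plays both roles on a single machine.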