Spark Demo

Noah Cornwell

July 16, 2013

Transcript

  1. About Me
    • twitter: @noahcornwell
    • http://stackoverflow.com/users/47496/noah
    • https://github.com/ncornwell
    • https://github.com/BostonTechnologies
    • http://www.linkedin.com/in/noahcornwell
    Tuesday, July 16, 13
  2. What is it?
    • Cluster computing framework
    • Compatible with the Hadoop ecosystem
    • Accessible via Scala, Java, and Python
    • Fast!
  3. How it Works
    • Resilient Distributed Dataset (RDD)
    • Fault tolerant
    • Partitioned across the cluster
    • Immutable
    • Manipulate your RDD to easily MapReduce your dataset:

      // Load a text file from HDFS as an RDD of lines
      val file = spark.textFile("hdfs://…")
      // Split lines into words, pair each word with 1, then sum the counts per word
      val counts = file.flatMap(line => line.split(" "))
                       .map(word => (word, 1))
                       .reduceByKey(_ + _)
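The same flatMap / map / reduce-by-key shape can be tried without a cluster. The following is a minimal sketch on plain Scala collections, not Spark itself: `WordCountSketch` and `wordCount` are hypothetical names, and `groupBy` followed by a sum stands in for `reduceByKey`, which ordinary collections do not have. Filtering out empty strings is an addition the slide's version does not make.

```scala
object WordCountSketch {
  // Local stand-in for the slide's RDD pipeline (assumption: plain Seq, no Spark).
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" "))   // one element per word
      .filter(_.nonEmpty)                 // drop empties produced by repeated spaces
      .map(word => (word, 1))             // pair each word with a count of 1
      .groupBy(_._1)                      // gather the pairs for each word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum counts per word
}
```

Calling `WordCountSketch.wordCount(Seq("a b a", "b c"))` yields a map from each word to its number of occurrences. On a real RDD the grouping and summing happen per partition and are then merged across the cluster.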
  4. FX Market Data
    • Uses the FIX protocol to send data
    • Sample message (the non-printing SOH field delimiters are not visible): 8=FIX.4.29=030235=X49=PRI_SITE56=BRIDGE_QA234=51695152=20130703-00:40:44262=MdRequestId_QIHkguAmN1268=2279=1269=0278=GBP/USD-51903968861703854855=GBP/USD270=1.5145315=GBP271=200000282=na279=1269=1278=GBP/USD-51903968861823855255=GBP/USD270=1.5145815=GBP271=200000282=na60=20130703-00:40:44.00910=044
    • Not as fast as equities, but trades 24/5
    • Internally we use Scribe to aggregate logs
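A FIX message is a sequence of `tag=value` fields separated by the SOH character (ASCII 0x01), which is why the sample above looks run together once the delimiters are lost. As a rough sketch of how such a message might be split and the bid/offer prices pulled out: `FixParseSketch`, `parseFields`, and `prices` are hypothetical names, not code from the talk. It relies only on standard FIX tags that appear in the sample, 269 (MDEntryType, 0 = bid, 1 = offer) and 270 (MDEntryPx), and assumes 269 precedes 270 within each entry, as it does there.

```scala
object FixParseSketch {
  // Standard FIX field delimiter: SOH (ASCII 0x01).
  val Soh: Char = '\u0001'

  // Split a raw message into ordered (tag, value) pairs.
  // Order is kept because repeating groups reuse the same tags.
  def parseFields(raw: String): Seq[(Int, String)] =
    raw.split(Soh).toSeq
      .filter(_.contains('='))
      .map { field =>
        val idx = field.indexOf('=')
        (field.substring(0, idx).toInt, field.substring(idx + 1))
      }

  // Name the side for an MDEntryType value; other values pass through unchanged.
  private def sideName(v: String): String = v match {
    case "0"   => "bid"
    case "1"   => "offer"
    case other => other
  }

  // Walk the fields, remembering the latest 269 and emitting a price at each 270.
  def prices(fields: Seq[(Int, String)]): Seq[(String, BigDecimal)] = {
    val start = ("", Vector.empty[(String, BigDecimal)])
    val (_, out) = fields.foldLeft(start) {
      case ((_, acc), (269, v))    => (sideName(v), acc)
      case ((side, acc), (270, v)) => (side, acc :+ (side -> BigDecimal(v)))
      case (state, _)              => state
    }
    out
  }
}
```

Feeding it a message whose fields are joined with SOH returns the bid and offer prices in the order they appear. A production parser would also validate the body length (tag 9) and checksum (tag 10), which this sketch ignores.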
  5. Our Setup
    • 3 Slaves + 1 Master
    • 32 GB RAM per server
    • Dual 8-core Xeons (32 logical cores with HT)
    • 300 GB RAID 10
    • 30 GB of log data in HDFS, replicated on all nodes
  6. Resources
    • Spark presentations from Scala Days:
    • http://www.parleys.com/play/51c2df18e4b0ed877035684d/chapter0/about
    • http://www.parleys.com/play/51c37a11e4b0ed8770356864/chapter0/about
    • http://spark-project.org/
    • http://www.mlbase.org/
    • Shark - Hive on Spark: https://github.com/amplab/shark
  7. Code from this talk
    • Spark example: https://github.com/ncornwell/spark/blob/master/examples/src/main/scala/spark/examples/SparkFX.scala
    • Streaming example: https://github.com/ncornwell/spark/blob/master/examples/src/main/scala/spark/streaming/examples/FXLogisticRegression.scala