Spark Demo

Noah Cornwell

July 16, 2013

Transcript

  1. About Me
    • twitter: @noahcornwell
    • http://stackoverflow.com/users/47496/noah
    • https://github.com/ncornwell
    • https://github.com/BostonTechnologies
    • http://www.linkedin.com/in/noahcornwell
    Tuesday, July 16, 13
  2. What is it?
    • Cluster computing framework
    • Compatible with the Hadoop ecosystem
    • Accessible via Scala, Java, and Python
    • Fast!
  3. How it Works
    • Resilient Distributed Dataset (RDD)
    • Fault tolerant
    • Partitioned across the cluster
    • Immutable
    • Manipulate your RDD to easily MapReduce your dataset

        val file = spark.textFile("hdfs://…")
        val counts = file.flatMap(line => line.split(" "))
                         .map(word => (word, 1))
                         .reduceByKey(_ + _)
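To see what the word count on this slide computes without a cluster, here is a minimal sketch that mirrors the same steps on plain Scala collections. It is not Spark code: `reduceByKey` is an RDD operation, so the sketch stands in for it with `groupBy` plus a sum, and the object name `WordCountSketch` and the sample lines are hypothetical.

```scala
object WordCountSketch {
  // Local stand-in for the slide's RDD pipeline; no Spark dependency.
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" "))   // one element per word, as on the slide
      .map(word => (word, 1))             // pair each word with a count of 1
      .groupBy(_._1)                      // assumption: local substitute for reduceByKey
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    val lines = Seq("to be or", "not to be") // hypothetical input
    countWords(lines).toSeq.sortBy(_._1).foreach {
      case (word, n) => println(s"$word $n")
    }
  }
}
```

On an RDD the same chain is evaluated lazily and per partition; the local version only illustrates the shape of the data at each step.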
  4. FX Market Data
    • Uses FIX protocol to send data
    • 8=FIX.4.29=030235=X49=PRI_SITE56=BRIDGE_QA234=51695152=20130703-00:40:44262=MdRequestId_QIHkguAmN1268=2279=1269=0278=GBP/USD-51903968861703854855=GBP/USD270=1.5145315=GBP271=200000282=na279=1269=1278=GBP/USD-51903968861823855255=GBP/USD270=1.5145815=GBP271=200000282=na60=20130703-00:40:44.00910=044
    • Not as fast as equities, but trades 24/5
    • Internally we use Scribe to aggregate logs
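On the wire, FIX tag=value fields are separated by the non-printing SOH character (0x01), which is why the message on this slide reads as one unbroken run. The following is a minimal sketch of splitting such a message into ordered (tag, value) pairs. The object name `FixSketch`, its helpers, and the shortened sample message (assembled from a few fields visible on the slide) are assumptions for illustration, not the code used in the talk.

```scala
object FixSketch {
  // FIX tag=value fields are separated by SOH (0x01) on the wire.
  val Soh: Char = '\u0001'

  // Parse into ordered (tag, value) pairs. Order is kept because
  // repeating groups reuse the same tags. Assumes well-formed fields;
  // a field without '=' or with a non-numeric tag will throw.
  def parse(message: String): Vector[(Int, String)] =
    message.split(Soh).toVector
      .filter(_.nonEmpty)
      .map { field =>
        val eq = field.indexOf('=')
        (field.substring(0, eq).toInt, field.substring(eq + 1))
      }

  // All values carried by a given tag, in message order.
  def valuesOf(fields: Vector[(Int, String)], tag: Int): Vector[String] =
    fields.collect { case (t, v) if t == tag => v }

  def main(args: Array[String]): Unit = {
    // Hypothetical shortened message: two price entries for GBP/USD.
    val sample = Seq(
      "8=FIX.4.2", "35=X",
      "55=GBP/USD", "270=1.51453",
      "55=GBP/USD", "270=1.51458"
    ).mkString(Soh.toString)

    val fields = parse(sample)
    println(valuesOf(fields, 270)) // tag 270 is MDEntryPx (entry price)
  }
}
```

Keeping the pairs in a `Vector` rather than a `Map` is a deliberate choice here: a market data message repeats tags such as 55 and 270 once per entry, and a map would keep only the last one.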
  5. Our Setup
    • 3 Slaves + 1 Master
    • 32GB RAM per server
    • Dual 8-core Xeons (32 cores with HT)
    • 300GB RAID 10
    • 30GB of log data in HDFS replicated on all nodes
  6. Resources
    • Spark presentations from Scala Days:
    • http://www.parleys.com/play/51c2df18e4b0ed877035684d/chapter0/about
    • http://www.parleys.com/play/51c37a11e4b0ed8770356864/chapter0/about
    • http://spark-project.org/
    • http://www.mlbase.org/
    • Shark - Hive on Spark: https://github.com/amplab/shark
  7. Code from this talk
    • Spark example: https://github.com/ncornwell/spark/blob/master/examples/src/main/scala/spark/examples/SparkFX.scala
    • Streaming example: https://github.com/ncornwell/spark/blob/master/examples/src/main/scala/spark/streaming/examples/FXLogisticRegression.scala