
Spark Streaming HadoopCon 2016

Presented by Mark Yang

Erica Li

September 12, 2016

Transcript

  1. NEAR REAL TIME

  2. About Me • Mark Yang 楊擇中 • Co-founder, Taiwan Spark User Group (Taipei) • scala / akka / spark • Technical Manager, Boundless Cloud 大千雲端
  3. About Boundless Cloud

  4. Employee Benefits

  5. Let's Begin

  6. What is Spark Streaming

  7. What is Spark Streaming • Scalable • High-throughput • Fault-tolerant

  8. Discretized Streams (DStreams)

  9. What is a DStream • A continuous stream of data • A continuous sequence of RDDs

  10. Simple Example Word Count
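
     The classic example on this slide is a streaming word count. A minimal runnable sketch, assuming text arrives on a local socket (for example via nc -lk 9999); the 5-second batch interval and host/port are illustrative:

     import org.apache.spark.SparkConf
     import org.apache.spark.streaming.{Seconds, StreamingContext}

     object StreamingWordCount {
       def main(args: Array[String]): Unit = {
         val conf = new SparkConf().setAppName("StreamingWordCount").setMaster("local[2]")
         val ssc  = new StreamingContext(conf, Seconds(5))     // one micro-batch every 5 seconds

         val lines  = ssc.socketTextStream("localhost", 9999)  // DStream[String]
         val words  = lines.flatMap(_.split(" "))
         val pairs  = words.map(word => (word, 1))
         val counts = pairs.reduceByKey(_ + _)                 // counts within each batch

         counts.print()
         ssc.start()
         ssc.awaitTermination()
       }
     }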

  11. Demo

  12. Operations on DStreams • Transformations ◦ Stateless ◦ Stateful ◦ Window • Output
  13. Stateless
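
     Stateless transformations are applied to each batch's RDD independently; nothing is carried across batch intervals. A small sketch (lines is assumed from the word-count example above):

     // Every micro-batch is transformed on its own; no state crosses batches.
     val errorLines  = lines.filter(_.contains("ERROR"))
     val errorCounts = errorLines.map(line => (line.split(" ")(0), 1))
                                 .reduceByKey(_ + _)           // per-batch counts only
     errorCounts.print()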

  14. Stateful - updateStateByKey

     val pairs = ...
     val wordCounts = pairs.updateStateByKey[Int](updateFunction _)

     (slide diagram: pairs and wordCounts RDDs across batches t-1, t, t+1, t+2, t+3)
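
     A fuller sketch of the pattern above (ssc and words are assumed from the word-count example; the update function and checkpoint path are illustrative, and updateStateByKey requires checkpointing to be enabled):

     ssc.checkpoint("/tmp/spark-streaming-checkpoint")          // required for stateful ops

     // Merge this batch's counts into the running total for each key.
     def updateFunction(newValues: Seq[Int], runningCount: Option[Int]): Option[Int] =
       Some(newValues.sum + runningCount.getOrElse(0))

     val pairs      = words.map(word => (word, 1))
     val wordCounts = pairs.updateStateByKey[Int](updateFunction _)
     wordCounts.print()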
  15. Checkpointing • Metadata ◦ Configuration ◦ DStream operations ◦ Incomplete batches • Data
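
     A common way to wire checkpointing up is to create the context through StreamingContext.getOrCreate, so a restarted driver recovers from the checkpoint; the directory and function name below are illustrative:

     import org.apache.spark.SparkConf
     import org.apache.spark.streaming.{Seconds, StreamingContext}

     val checkpointDir = "hdfs:///tmp/streaming-checkpoint"     // illustrative path

     def createContext(): StreamingContext = {
       val conf = new SparkConf().setAppName("CheckpointedApp")
       val ssc  = new StreamingContext(conf, Seconds(5))
       ssc.checkpoint(checkpointDir)         // enables metadata and data checkpointing
       // ... define DStreams and output operations here ...
       ssc
     }

     // Recover from the checkpoint if it exists, otherwise build a new context.
     val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
     ssc.start()
     ssc.awaitTermination()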
  16. Demo

  17. Window ❏ window length = 3 ❏ sliding interval = 2
  18. Window Transformations

     • window(windowLength, slideInterval)
     • countByWindow(windowLength, slideInterval)
     • reduceByWindow(func, windowLength, slideInterval)
     • reduceByKeyAndWindow(func, windowLength, slideInterval, [numTasks])
     • reduceByKeyAndWindow(func, invFunc, windowLength, slideInterval, [numTasks])
     • countByValueAndWindow(windowLength, slideInterval, [numTasks])
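
     A sketch of a windowed word count using reduceByKeyAndWindow (pairs is assumed from the earlier example; the 30-second window and 10-second slide are illustrative and must be multiples of the batch interval):

     // Count words over the last 30 seconds, recomputed every 10 seconds.
     val windowedCounts = pairs.reduceByKeyAndWindow(
       (a: Int, b: Int) => a + b,    // reduce function
       Seconds(30),                  // window length
       Seconds(10)                   // sliding interval
     )
     windowedCounts.print()

     // The variant with an inverse function only adjusts for data entering and
     // leaving the window; it additionally requires ssc.checkpoint(...).
     val incrementalCounts = pairs.reduceByKeyAndWindow(
       (a: Int, b: Int) => a + b,    // add counts entering the window
       (a: Int, b: Int) => a - b,    // subtract counts leaving the window
       Seconds(30), Seconds(10)
     )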
  19. Demo

  20. Output

     • print()
     • saveAsTextFiles(prefix, [suffix])
     • saveAsObjectFiles(prefix, [suffix])
     • saveAsHadoopFiles(prefix, [suffix])
     • foreachRDD(func)
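
     For example (the HDFS path is illustrative; each batch produces files named prefix-TIME_IN_MS.suffix):

     wordCounts.print()                                               // first elements of each batch, on the driver
     wordCounts.saveAsTextFiles("hdfs:///tmp/wordcounts/out", "txt")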
  21. foreachRDD - Design Pattern 1

     dstream.foreachRDD { rdd =>
       val connection = createNewConnection()   // created at the driver, once per batch
       rdd.foreach { record =>
         connection.send(record)                // used at the workers: the connection must be serialized and shipped
       }
     }
  22. foreachRDD - Design Pattern 2

     dstream.foreachRDD { rdd =>
       rdd.foreachPartition { partitionOfRecords =>
         val connection = createNewConnection()   // created at the worker, one per partition
         partitionOfRecords.foreach(record => connection.send(record))
         connection.close()
       }
     }
  23. foreachRDD - Design Pattern 3

     dstream.foreachRDD { rdd =>
       rdd.foreachPartition { partitionOfRecords =>
         // ConnectionPool is a static, lazily initialized pool of connections
         val connection = ConnectionPool.getConnection()
         partitionOfRecords.foreach(record => connection.send(record))
         ConnectionPool.returnConnection(connection)   // return to the pool for future reuse
       }
     }
  24. DStream + RDD: interoperating with MLlib and DataFrames • MLlib: Streaming Linear Regression, Streaming KMeans, etc. • DataFrame
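
     A minimal StreamingKMeans sketch for the MLlib side (k, the 2-dimensional input, and the source DStream are illustrative; the DataFrame side is shown on the next slide):

     import org.apache.spark.mllib.clustering.StreamingKMeans
     import org.apache.spark.mllib.linalg.Vectors

     // Training data arrives as a DStream[Vector], here parsed from text lines.
     val trainingData = lines.map(line => Vectors.dense(line.split(' ').map(_.toDouble)))

     val model = new StreamingKMeans()
       .setK(3)                      // number of clusters (illustrative)
       .setDecayFactor(1.0)          // how quickly old batches are forgotten
       .setRandomCenters(2, 0.0)     // 2-dimensional points, zero initial weight

     model.trainOn(trainingData)     // cluster centers are updated with every batch
     model.predictOn(trainingData).print()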
  25. Example DStream + DataFrame

     words.foreachRDD { rdd =>
       val sparkSession =
         // SQLContext.getOrCreate(rdd.sparkContext)                          // Spark 1.x
         SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()  // Spark 2.0
       import sparkSession.implicits._
       val wordsDataFrame = rdd.toDF("word")
       wordsDataFrame
         // .registerTempTable("words")       // Spark 1.x
         .createOrReplaceTempView("words")    // Spark 2.0
       val wordCountsDataFrame =
         sparkSession.sql("select word, count(*) as total from words group by word")
       wordCountsDataFrame.show()
     }
  26. Thank You

  27. Receivers

  28. Spark Streaming Custom Receivers
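
     A custom receiver extends Receiver, pushes records to Spark with store(), and is plugged in through ssc.receiverStream. A sketch modeled on the socket example in the Spark documentation (host, port, and the class name are illustrative):

     import java.io.{BufferedReader, InputStreamReader}
     import java.net.Socket
     import java.nio.charset.StandardCharsets

     import org.apache.spark.storage.StorageLevel
     import org.apache.spark.streaming.receiver.Receiver

     class SocketLineReceiver(host: String, port: Int)
         extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

       def onStart(): Unit = {
         // Read on a separate thread so onStart() returns immediately.
         new Thread("Socket Receiver") {
           override def run(): Unit = receive()
         }.start()
       }

       def onStop(): Unit = ()   // the reading thread exits once isStopped() is true

       private def receive(): Unit = {
         try {
           val socket = new Socket(host, port)
           val reader = new BufferedReader(
             new InputStreamReader(socket.getInputStream, StandardCharsets.UTF_8))
           var line = reader.readLine()
           while (!isStopped() && line != null) {
             store(line)            // hand each record to Spark Streaming
             line = reader.readLine()
           }
           reader.close()
           socket.close()
           restart("Trying to connect again")
         } catch {
           case t: Throwable => restart("Error receiving data", t)
         }
       }
     }

     // Usage: val lines = ssc.receiverStream(new SocketLineReceiver("localhost", 9999))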

  29. What is Kafka • Kafka Cluster • Producer & Consumer (slide diagram: several producers writing to the cluster and several consumers reading from it)
  30. What is Kafka • Topic & Partition & Offset

  31. What is Kafka • Broker

  32. What is Kafka • Producer • Consumer • Broker • Topic • Partition
  33. Getting Started with Kafka

     • Start ZooKeeper: bin/zookeeper-server-start.sh config/zookeeper.properties
     • Start the Kafka server: bin/kafka-server-start.sh config/server.properties
  34. Getting Started with Kafka

     • Create a topic: bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
     • List topics: bin/kafka-topics.sh --list --zookeeper localhost:2181
  35. Getting Started with Kafka

     • Produce messages: bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
     • Consume messages: bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
  36. Kafka+Spark Streaming • Receiver-based
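
     A sketch of the receiver-based approach with the spark-streaming-kafka (0.8) connector current at the time; ssc is assumed from earlier, and the consumer group, topic, and thread count are illustrative:

     import org.apache.spark.streaming.kafka.KafkaUtils

     // A Kafka high-level consumer runs inside a Spark receiver;
     // offsets are tracked in ZooKeeper.
     val zkQuorum = "localhost:2181"
     val groupId  = "spark-streaming-demo"           // illustrative consumer group
     val topics   = Map("test" -> 1)                 // topic -> number of receiver threads

     val kafkaStream = KafkaUtils.createStream(ssc, zkQuorum, groupId, topics)
     val lines = kafkaStream.map(_._2)               // (key, message) pairs; keep the message
     lines.print()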

  37. Kafka+Spark Streaming • Direct
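
     A sketch of the direct (receiver-less) approach with the same 0.8 connector; Spark reads each Kafka partition itself and tracks the offsets, instead of relying on a receiver and ZooKeeper. The broker address and topic are illustrative:

     import kafka.serializer.StringDecoder
     import org.apache.spark.streaming.kafka.KafkaUtils

     val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
     val topics      = Set("test")                   // illustrative topic

     val directStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
       ssc, kafkaParams, topics)
     val lines = directStream.map(_._2)              // keep the message, drop the key
     lines.print()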

  38. Demo

  39. Thank You