DevSumi 2018: A Talk on Apache Kafka

yuuki takezawa
February 02, 2018


Slides from Developers Summit 2018:
"Scalable Application Development with Apache Kafka"


Transcript

  1. 2.

    Agenda • What is Apache Kafka • Kafka Connect / Kafka
 Streams • Kappa Architecture • Using it in applications
  2. 6.

    Apache Kafka • High availability through clustering with ZooKeeper • Message persistence, replication, and re-consumption • Built for big data • Uses the file system,
 with sequential access for high throughput • Stream-capable messaging middleware • Strong integration with surrounding systems via Kafka Connect
 (roughly comparable to Amazon Kinesis)
  3. 8.

    Apache Kafka overview • Producer
 Publishes messages
 Uses the client library for each language • Consumer
 Subscribes to messages
 Consumed messages are not discarded; they are retained
 for a fixed period • Broker
 Kafka itself; the queue between Producers and Consumers
  4. 9.

    Apache Kafka overview • Topic
 Messages from Producers are stored in a Topic
 Messages are managed uniquely and processed FIFO (per partition, described below) • Partition
 Used for load distribution
 Multiple Consumers each read their own Partition
 and process it independently
 Many usage patterns depending on the processing-flow design
  5. 12.

    import (
        "github.com/confluentinc/confluent-kafka-go/kafka"
    )

    func (kc *KafkaClient) PrepareProducer() {
        p, err := kafka.NewProducer(&kafka.ConfigMap{
            // specify the Kafka brokers to connect to
            "bootstrap.servers":            *kc.BootstrapServers,
            "broker.address.family":        "v4",
            "queue.buffering.max.messages": "1000",
            "client.id":                    "testingClient",
        })
        // omitted
    }
  6. 13.

    Kafka Connect • Kafka Connect supports two kinds of connectors:
 importing data from surrounding systems (Source) and
 sending data out (Sink) • It can ingest data from Amazon SQS or MongoDB into Kafka,
 or store messages as-is into Elasticsearch or an RDBMS • Connect is freely extensible, so you can implement your own connectors
 (Java, Scala)
  7. 18.

    $ confluent load elasticsearch-sink -d \
      /etc/kafka-connect-elasticsearch/sample-es.properties
    $ connect-standalone \
      -daemon /etc/schema-registry/elasticsearch-connect.properties \
      /etc/kafka-connect-elasticsearch/sample-es.properties
    Register the connector; the transfer then starts
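The contents of sample-es.properties are not shown in the slides; a minimal Elasticsearch sink configuration might look like the following. The connector class is Confluent's real Elasticsearch sink; the topic name and connection URL are placeholders, not values from the talk.

```properties
name=elasticsearch-sink
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
tasks.max=1
# topic(s) to read from -- placeholder name
topics=sample-topic
# messages have no meaningful key in this sketch
key.ignore=true
# Elasticsearch endpoint -- placeholder
connection.url=http://localhost:9200
type.name=kafka-connect
```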
  8. 19.
  9. 24.

    Kafka Streams • KStream / KTable • A KStream is stream data flowing through as key-value pairs, as-is •
 A KStream suits simple filtering and value-replacement use cases
 e.g. logs, tweets, timeline-like data
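The filter/replace use cases above are stateless, record-by-record transformations. A minimal sketch, with plain Go slices standing in for a KStream (the record type and log contents are made up for illustration):

```go
package main

import (
	"fmt"
	"strings"
)

type record struct{ key, value string }

// filterThenMap mimics KStream's filter + mapValues: each record is
// inspected and transformed on its own, with no cross-record state.
func filterThenMap(in []record) []record {
	var out []record
	for _, r := range in {
		if !strings.Contains(r.value, "error") {
			continue // filter: keep only error lines
		}
		r.value = strings.ToUpper(r.value) // mapValues: replace the value
		out = append(out, r)
	}
	return out
}

func main() {
	logs := []record{
		{"host1", "error: disk full"},
		{"host2", "info: ok"},
	}
	for _, r := range filterThenMap(logs) {
		fmt.Println(r.key, r.value)
	}
}
```

Because no state is kept between records, this kind of processing scales out trivially across partitions, which is why it fits logs, tweets, and timeline-style data.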
  10. 26.

    // def executed by Kafka Streams
    def main(args: Array[String]) {
      println("Kafka Streams Sample.")
      val config: Properties = {
        val prop = new Properties()
        prop.load(new java.io.FileInputStream(args(0).toString))
        // do not use the same app_id across the whole service
        prop.put(StreamsConfig.APPLICATION_ID_CONFIG, prop.getProperty("sample.app_id"))
        // specify the servers to connect to
        prop.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, prop.getProperty("sample.bootstrap.servers"))
        prop.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass)
        prop.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass)
        // exactly once
        prop.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE)
        prop
      }
    }
  11. 27.

    // topic to read the stream from
    val ft: String = config.getProperty("sample.stream.topic")
    // topic to store results in after stream processing
    val tt: String = config.getProperty("sample.streamed.topic")
    println("stream topic: from " + ft)
    println("to " + tt)
    val stringSerde: Serde[String] = Serdes.String()
    val builder: StreamsBuilder = new StreamsBuilder
    val rawStream: KStream[String, String] = builder.stream(ft)
    // process with KStream: replace each value
    val mapStream: KStream[String, String] = rawStream.mapValues(new ValueMapper[String, String] {
      override def apply(value: String): String = new ElementAppender(value).append()
    })
    // store into the output topic after processing
    mapStream.to(stringSerde, stringSerde, tt)
    val streams: KafkaStreams = new KafkaStreams(builder.build(), config)
    // start the stream processing; it runs continuously
    streams.start()
  12. 30.

    Message delivery guarantees • At least once semantics
 duplicates allowed • At most once semantics
 loss allowed • Exactly once semantics
 neither duplicates nor loss allowed

 See "Exactly-once Semantics are Possible: Here's How Kafka Does it", Confluent blog.
 https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/
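The difference between the three guarantees comes down to when the consumer commits its offset relative to processing. A toy model (this is a hypothetical simulation, not Kafka's implementation) where processing crashes once before succeeding:

```go
package main

import "fmt"

// deliver simulates consuming one message with a single simulated crash.
// ackFirst=true  -> at-most-once: commit the offset before processing,
//                   so a crash loses the message (processed 0 times).
// ackFirst=false -> at-least-once: commit after processing, so a crash
//                   before the commit causes redelivery and the message
//                   is processed twice (a duplicate).
func deliver(ackFirst bool) int {
	processed := 0
	crashed := false
	committed := false
	for !committed {
		if ackFirst {
			committed = true // offset committed before processing
		}
		if !crashed {
			crashed = true // simulated crash
			if ackFirst {
				return processed // at-most-once: message is lost
			}
			processed++ // work was done, but the commit never happened
			continue    // broker redelivers
		}
		processed++ // retry succeeds
		committed = true
	}
	return processed
}

func main() {
	fmt.Println("at-most-once processed:", deliver(true))   // 0: lost
	fmt.Println("at-least-once processed:", deliver(false)) // 2: duplicate
}
```

Exactly-once would yield a count of 1 here: it requires the processing and the offset commit to succeed or fail together, which is what Kafka's transactional machinery (described in the Confluent post above) provides.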
  13. 37.

    Background • Application responsibilities were growing large and complex
 -> provide only a limited set of features
 -> we needed to move toward microservices • How do we keep scaling?
 -> there is a limit to what good hardware alone can do • High availability and fault tolerance
 -> we wanted something that guarantees against data loss, supports retries, and so on!
  14. 38.

    Adopted for the notification feature • "So-and-so did something somewhere",
 "You earned N points!", etc. • Used across services • Several million events (at busy times)
    • Data processing has to be combined with an RDBMS,
 and some services get complex,
 so we wanted Consumer load balancing to be easy • We wanted to minimize message loss
  15. 41.
  16. 44.

    $ confluent load named-kafka-connect-rabbitmq -d \
      /etc/kafka-connect-rabbitmq/rabbitmq-source-connect.properties
    $ connect-standalone \
      -daemon /etc/schema-registry/connect-json-standalone.properties \
      /etc/kafka-connect-rabbitmq/rabbitmq-source-connect.properties
  17. 49.

    Lambda Architecture • Composed of a batch layer, a serving layer, and a speed layer • The batch layer handles aggregation over large data sets and bulk data analysis -> Hadoop (MapReduce), Spark
    • The serving layer exposes the batch layer's aggregated results
 Hive, HBase, ElephantDB, Splout SQL, PipelineDB… • The speed layer serves real-time processing results
 Spark, Storm, Kafka, Cassandra, etc. • Values from both the serving layer and the speed layer are merged and returned
 For some services this gets quite difficult..!
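The merge step described above can be sketched as summing a precomputed batch view with the real-time delta from the speed layer. The event names and counts below are made up for illustration:

```go
package main

import "fmt"

// mergeViews implements the Lambda-architecture read path: the answer
// is the batch layer's precomputed count plus the speed layer's count
// for events that arrived after the last batch run.
func mergeViews(batch, speed map[string]int) map[string]int {
	merged := make(map[string]int)
	for k, v := range batch {
		merged[k] += v
	}
	for k, v := range speed {
		merged[k] += v
	}
	return merged
}

func main() {
	batchView := map[string]int{"page_view": 1000, "signup": 40} // from the batch layer
	speedView := map[string]int{"page_view": 12, "login": 3}     // from the speed layer
	fmt.Println(mergeViews(batchView, speedView)["page_view"]) // 1012
}
```

The difficulty the slide alludes to is that real merges are rarely simple sums: late or duplicated speed-layer events, and views that aren't additive, make this read path much harder.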
  18. 54.

    def main(args: Array[String]) {
      // Kafka connection settings
      val kafkaParams = Map[String, Object](
        "bootstrap.servers" -> "localhost:9092",
        "key.deserializer" -> classOf[StringDeserializer],
        "value.deserializer" -> classOf[StringDeserializer],
        "group.id" -> "kafka_builderscon_stream",
        // always process the newest incoming data
        "auto.offset.reset" -> "latest",
        "enable.auto.commit" -> (false: java.lang.Boolean)
      )
      // Spark app name; streaming results are written to Cassandra
      val spark = SparkSession
        .builder
        .master("local[*]")
        .appName("buildersconSmaple")
        .config("spark.cassandra.connection.host", "192.168.10.10")
        .getOrCreate()
      // write out all data every 5 seconds
      val streamingContext = new StreamingContext(spark.sparkContext, Seconds(5))
  19. 55.

      // checkpoint processing state so we can restart from here after a crash
      streamingContext.checkpoint("/tmp/")
      val topics = Array("message-topic")
      // feed the Kafka topic's data into a Spark stream
      val stream = KafkaUtils.createDirectStream[String, String](
        streamingContext,
        PreferConsistent,
        Subscribe[String, String](topics, kafkaParams)
      )
      val pairs = stream.map(record => (record.value, 1))
      val count = pairs.updateStateByKey(updateFunc)
  20. 56.

      // start processing each incoming RDD
      count.foreachRDD((rdd, time) => {
        val count = rdd.count()
        if (count > 0) {
          // map the aggregated results onto Cassandra columns and write them
          rdd.map(record => ("spark", streamMessageParse(record._1.toString).message, record._2))
            .saveToCassandra("builderscon", "counter", SomeColumns("stream", "message", "counter"))
        }
      })
      count.print()
      streamingContext.start()
      streamingContext.awaitTermination()
    }
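The updateStateByKey call in the Spark job above keeps a running count per key across micro-batches. The state update itself reduces to a fold like the following (a deliberate simplification of what the job does, with made-up keys):

```go
package main

import "fmt"

// updateState folds one micro-batch of (key, 1) pairs into the running
// per-key totals, mirroring what updateStateByKey does batch after batch.
func updateState(state map[string]int, batch []string) map[string]int {
	for _, key := range batch {
		state[key]++
	}
	return state
}

func main() {
	state := make(map[string]int)
	// two micro-batches of messages from the topic
	updateState(state, []string{"hello", "world", "hello"})
	updateState(state, []string{"hello"})
	fmt.Println(state["hello"], state["world"]) // 3 1
}
```

Because the state survives between batches, Spark checkpoints it (the streamingContext.checkpoint call above) so the totals can be rebuilt after a crash.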
  21. 65.

    {
      "tableName": "action",
      "schemaName": "analyze",
      "topicName": "analyze.action",
      "message": {
        "dataFormat": "json",
        "fields": [
          { "name": "uuid", "mapping": "uuid", "type": "VARCHAR" },
          { "name": "uri", "mapping": "uri", "type": "VARCHAR" },
          { "name": "name", "mapping": "name", "type": "VARCHAR" }
        ]
      }
    }
    topicName records the Kafka topic information; fields defines the columns and converts them to primitive types
  22. 67.

    SELECT redttt._key, redttt._value, test_id, test_name, created_at, uri, uuid
    FROM my_tests.testing.tests AS myttt
    INNER JOIN red_tests.test.string AS redttt
      ON redttt._key = myttt.test_name
    INNER JOIN kafka_tests.analyze.action AS kafkataa
      ON kafkataa.name = myttt.test_name
    WHERE myttt.test_name = '{$name}'
    LIMIT 1
  23. 69.

    Apache Kafka GUI • Cluster management
 https://github.com/yahoo/kafka-manager • Message management
 https://github.com/landoop/kafka-topics-ui
 https://github.com/ldaniels528/trifecta
 https://github.com/Landoop/kafka-connect-ui
 etc.