Building a Real-Time Stream Processing Pipeline with Apache Flink #scalafukuoka

Lightning talk slides for Scala Fukuoka 2017.

Manabu Matsuzaki

July 29, 2017

Transcript

  1. Many Source (input) and Sink (output) options to choose from:
    • Twitter (source)
    • Kafka (source/sink)
    • RabbitMQ (source/sink)
    • Apache NiFi (source/sink)
    • AWS Kinesis (source/sink)
  2. Many Source (input) and Sink (output) options to choose from:
    • HDFS (sink)
    • Elasticsearch (sink)
    • Cassandra (sink)
    • Redis (sink)
    • Flume (sink)
    • ActiveMQ (sink)
    • Third-Party Projects (e.g. Apache Zeppelin)
  3. // Word count in Scala
    // brings the implicit TypeInformation the Scala DataSet API needs into scope
    import org.apache.flink.api.scala._

    // set up the execution environment
    val env = ExecutionEnvironment.getExecutionEnvironment

    // get input data
    val text = env.fromElements(
      "To be, or not to be --that is the question:--",
      "Whether 'tis nobler in the mind to suffer",
      "The slings and arrows of outrageous fortune",
      "Or to take arms against a sea of troubles")

    // count: split into lower-case words (dropping empty tokens), pair each with 1, sum per word
    val counts = text
      .flatMap { _.toLowerCase.split("\\W+").filter(_.nonEmpty) }
      .map { (_, 1) }
      .groupBy(0)
      .sum(1)

    // emit result and print result
    counts.print()
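  4. // Word count in Java
    import java.util.Arrays;
    import org.apache.flink.api.common.typeinfo.TypeHint;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.util.Collector;

    // set up the execution environment
    // (omitted)

    // get input data
    // (omitted)

    // count
    DataSet<Tuple2<String, Integer>> counts = text
      .flatMap((String line, Collector<String> out) -> {
        String[] tokens = line.toLowerCase().split("\\W+");
        Arrays.stream(tokens)
          .filter(t -> !t.isEmpty())
          .forEach(out::collect);
      })
      // Java lambdas erase generic types, so declare each operator's output type explicitly
      .returns(String.class)
      .map(s -> new Tuple2<>(s, 1))
      .returns(new TypeHint<Tuple2<String, Integer>>() {})
      .groupBy(0)
      .sum(1);

    // emit result and print result
    // (omitted)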
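  5. • Count per HTTP status code
    • Share of each HTTP status class (2xx, 3xx, 4xx, 5xx)
    • Share of requests by response time
    • Response-time percentiles (avg, min, 50, 90, 95, 98, 99)
    By the way, this is the kind of access-log aggregation that has come up in talks at this conference.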
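To show what the word-count slides compute without pulling in Flink, here is a minimal sketch that replays the same four input lines through plain `java.util.stream`. The class name `WordCountSketch` is hypothetical, and `Collectors.groupingBy(..., Collectors.counting())` stands in for the `(word, 1)` / `groupBy(0)` / `sum(1)` steps: it is an equivalent local computation, not how Flink executes the job.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper: the word-count logic from slides 3 and 4, run locally.
final class WordCountSketch {

    static final List<String> SAMPLE = List.of(
        "To be, or not to be --that is the question:--",
        "Whether 'tis nobler in the mind to suffer",
        "The slings and arrows of outrageous fortune",
        "Or to take arms against a sea of troubles");

    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
            // flatMap: lower-case each line and split it on runs of non-word characters
            .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
            // drop the empty token that appears when a line starts with punctuation
            .filter(word -> !word.isEmpty())
            // map + groupBy(0) + sum(1): group identical words and count them (sorted by word)
            .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> counts = countWords(SAMPLE);
        System.out.println(counts.get("to"));  // prints 4
        System.out.println(counts.get("the")); // prints 3
    }
}
```

"to" occurs twice in the first line and once each in the second and fourth, which is why the first print is 4.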
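The slide-5 metrics can be sketched as plain Java over one batch of already-parsed log records. Everything here is an assumption for illustration: the hypothetical `AccessLogStats` class, status codes and response times passed in as arrays, and the nearest-rank definition of percentile (the slide does not say which definition is used). The response-time share is left out because the slide does not give its bucket boundaries.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: batch versions of the slide-5 access-log metrics.
final class AccessLogStats {

    // Requests per exact status code, sorted by code.
    static Map<Integer, Integer> countByStatus(int[] statuses) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int status : statuses) {
            counts.merge(status, 1, Integer::sum);
        }
        return counts;
    }

    // Fraction of requests in each status class ("2xx", "3xx", ...).
    static Map<String, Double> ratioByClass(int[] statuses) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int status : statuses) {
            counts.merge((status / 100) + "xx", 1, Integer::sum);
        }
        Map<String, Double> ratios = new TreeMap<>();
        counts.forEach((cls, n) -> ratios.put(cls, (double) n / statuses.length));
        return ratios;
    }

    // Nearest-rank percentile over an ascending array: the smallest sample
    // such that at least p% of all samples are <= it.
    static long percentile(long[] sortedMillis, int p) {
        int rank = (int) Math.ceil(p * sortedMillis.length / 100.0);
        return sortedMillis[Math.max(rank, 1) - 1];
    }

    // avg, min and the slide's percentiles in milliseconds; assumes at least one sample.
    static Map<String, Double> responseTimeSummary(long[] millis) {
        long[] sorted = millis.clone();
        Arrays.sort(sorted);
        Map<String, Double> summary = new LinkedHashMap<>();
        summary.put("avg", Arrays.stream(sorted).average().orElse(Double.NaN));
        summary.put("min", (double) sorted[0]);
        for (int p : new int[] {50, 90, 95, 98, 99}) {
            summary.put("p" + p, (double) percentile(sorted, p));
        }
        return summary;
    }

    public static void main(String[] args) {
        int[] statuses = {200, 200, 304, 404, 500, 200, 200, 301, 200, 503};
        long[] millis = {120, 15, 400, 22, 30, 45, 60, 80, 12, 20};
        System.out.println(countByStatus(statuses)); // {200=5, 301=1, 304=1, 404=1, 500=1, 503=1}
        System.out.println(ratioByClass(statuses));  // {2xx=0.5, 3xx=0.2, 4xx=0.1, 5xx=0.2}
        System.out.println(responseTimeSummary(millis));
        // {avg=80.4, min=12.0, p50=30.0, p90=120.0, p95=400.0, p98=400.0, p99=400.0}
    }
}
```

Nearest rank always returns a value that was actually observed; interpolating percentile definitions can give somewhat different p95/p99 values on small samples like this one.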