Data Stream Processing with AWS Kinesis

Alexey Novakov

May 03, 2021

Transcript

  1. Agenda

    - Kinesis Services
    - Stream (1)
    - Analytics Application (2)
    - Firehose (3)
    - Summary
  2. Kinesis Services

    - Data Stream (acts as a buffer)
    - Analytics App
    - Firehose (acts as a buffer):
      - unlike a Data Stream, it can deliver data to a destination
      - it can transform data as well
    - Video Stream

    [Diagram: read and write flows between the Data Stream, the Analytics App, Firehose, and S3]
  3. Data Streams

    - collect gigabytes of data per second
    - make it available for processing and analysis in real time
    - serverless
    - SDKs for Java, Scala, Python, Go, Rust, etc.
    - data retention: 1-365 days
    - AWS Glue Schema Registry
    - payload of up to 1 MB, passed as Array[Byte]

    Comparison with Kafka (concepts):

    Concept          Kinesis   Kafka
    Message holder   stream    topic
    Throughput       shard     partition
    Server           N/A       broker
  4. Shards

    // Builds one entry of a PutRecords batch; the partition key decides the target shard
    def putEntry(key: String, data: String) =
      PutRecordsEntry(
        partitionKey = key,
        data = data.getBytes("UTF-8"),
        explicitHashKey = None
      )
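How a partition key ends up on a shard can be sketched in plain Scala. Kinesis hashes the partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value. The `ShardRouting` object and the assumption that all shards own equal, contiguous ranges are illustrative only; a real stream reports its actual ranges per shard, and they become uneven after splits and merges.

```scala
import java.security.MessageDigest

object ShardRouting {
  // Kinesis hash keys are unsigned 128-bit integers: 0 to 2^128 - 1
  val HashSpace: BigInt = BigInt(2).pow(128)

  // MD5 of the UTF-8 partition key, read as an unsigned 128-bit integer
  def hashKey(partitionKey: String): BigInt = {
    val digest = MessageDigest.getInstance("MD5").digest(partitionKey.getBytes("UTF-8"))
    BigInt(1, digest)
  }

  // Assumption: `shardCount` shards with equal, contiguous hash key ranges
  def shardIndex(partitionKey: String, shardCount: Int): Int = {
    val rangeSize = HashSpace / shardCount
    // The last shard absorbs the remainder of the integer division
    (hashKey(partitionKey) / rangeSize).min(BigInt(shardCount - 1)).toInt
  }
}
```

Records with the same partition key always map to the same shard, which is why a low-cardinality key can overload a single shard.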
  5. Data Stream Cost

    Property               Spec
    Records / second       100
    Avg. record size, KB   100
    Consumer count         10

    Number of shards: Max (9.77 shards needed for ingress, 48.85 shards needed for egress, 0.100 shards needed for records) = 48.85

    Total monthly cost*
    Frankfurt, EU   662.26 USD
    Ohio, US        551.27 USD

    *as of April 2021
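The shard count on this slide can be reproduced with a short calculation. The sketch below assumes the per-shard limits of a provisioned stream (1 MB/s or 1,000 records/s for writes, 2 MB/s for reads shared by all consumers); the `ShardEstimate` object and its names are hypothetical. It does not round intermediate values, so it yields 48.83 rather than 48.85; the slide's figure appears to come from rounding ingress to 9.77 before computing egress.

```scala
object ShardEstimate {
  // Assumed per-shard limits of a provisioned Kinesis data stream
  val WriteMBPerSec = 1.0
  val WriteRecordsPerSec = 1000.0
  val ReadMBPerSec = 2.0

  // Shards needed = the largest of the ingress, egress, and record-rate requirements
  def shardsNeeded(recordsPerSec: Double, avgRecordKB: Double, consumers: Int): Double = {
    val ingressMB = recordsPerSec * avgRecordKB / 1024
    val forIngress = ingressMB / WriteMBPerSec
    val forEgress = ingressMB * consumers / ReadMBPerSec
    val forRecords = recordsPerSec / WriteRecordsPerSec
    List(forIngress, forEgress, forRecords).max
  }

  // Shards are provisioned in whole units, so round up
  def provisionedShards(recordsPerSec: Double, avgRecordKB: Double, consumers: Int): Int =
    math.ceil(shardsNeeded(recordsPerSec, avgRecordKB, consumers)).toInt
}
```

With the slide's inputs (100 records/s, 100 KB, 10 consumers), egress dominates, so the number of consumers drives the cost more than the write rate does.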
  6. Analytics Applications

    Some use cases:
    - Generate time-series analytics
    - Feed real-time dashboards
    - Create real-time metrics

    Option 1: ANSI 2008 SQL standard with extensions
    Option 2: Scala/Java Flink application (jar on S3)

    Reminiscent of the Kafka tools:
    - Kafka Streams library
    - KSQL
    - Any Kafka client application

    SELECT STREAM "number", AVG("temperature") AS avg_temperature
    FROM "sensor-temperature_001"
    -- Uses a 10-second tumbling time window
    GROUP BY "number",
      FLOOR(("sensor-temperature_001".ROWTIME - TIMESTAMP '1970-01-01 00:00:00') SECOND / 10 TO SECOND);
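What the GROUP BY clause of the query computes can be sketched over an in-memory collection: each timestamp is floored to a 10-second bucket, and the average is taken per sensor number and bucket. The `Reading` type and the `TumblingAverage` object are hypothetical stand-ins for the stream's rows, and a batch `groupBy` only approximates what the service evaluates continuously.

```scala
object TumblingAverage {
  // Hypothetical row type: sensor number, temperature, event time in milliseconds
  final case class Reading(number: Int, temperature: Double, timestampMs: Long)

  val WindowMs = 10000L

  // Returns (sensor number, window start in ms) -> average temperature
  def averages(readings: Seq[Reading]): Map[(Int, Long), Double] =
    readings
      // Flooring the timestamp assigns every reading to exactly one window
      .groupBy(r => (r.number, Math.floorDiv(r.timestampMs, WindowMs) * WindowMs))
      .map { case (key, rs) => (key, rs.map(_.temperature).sum / rs.size) }
}
```

Because tumbling windows do not overlap, every reading contributes to exactly one average.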
  7. Flink Option: consumer

    val input = createConsumer(env, consumerProps)
    input
      .flatMap { json =>
        Option(json)
          .filter(_.trim.nonEmpty)
          .map(j => Json.readValue(j, classOf[Event]))
      }
      .keyBy(_.sensor.number) // Logically partition the stream per sensor id
      .timeWindow(Time.seconds(10), Time.seconds(5)) // Sliding window definition
      .apply(new TemperatureAverager)
      .name("TemperatureAverager")
      .map(Json.writeAsString(_))
      .addSink(createProducer(producerProps))
      .name("Kinesis Stream")
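The sliding window in this pipeline (10 seconds long, advancing every 5 seconds) places each event in two overlapping windows. The sketch below shows that assignment in plain Scala, without Flink on the classpath. It assumes windows are aligned to the epoch with no offset; `SlidingWindows` and `windowStarts` are hypothetical names, not Flink API.

```scala
object SlidingWindows {
  // Start timestamps (ms) of every window of length `sizeMs`, advancing by
  // `slideMs`, that contains `timestampMs`. Newest window first.
  def windowStarts(timestampMs: Long, sizeMs: Long, slideMs: Long): List[Long] = {
    // Latest window start that is not after the event
    val lastStart = timestampMs - Math.floorMod(timestampMs, slideMs)
    Iterator
      .iterate(lastStart)(_ - slideMs)
      // A window [start, start + size) contains the event while start > timestamp - size
      .takeWhile(_ > timestampMs - sizeMs)
      .toList
  }
}
```

For an event at 12,000 ms this gives the windows starting at 10,000 ms and 5,000 ms, so the averaging function sees that event twice, once per window.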
  8. Flink Option: TemperatureAverager

    /** apply() is invoked once for each window */
    override def apply(
        sensorId: Int,
        window: TimeWindow,
        events: Iterable[Event],
        out: Collector[Event]
    ): Unit = {
      // Count the events and sum their temperatures in a single pass
      val (count, sum) = events.foldLeft((0, 0.0)) {
        case ((count, temperature), e) => (count + 1, temperature + e.temperature)
      }
      val avgTemp = if (count == 0) 0.0 else sum / count
      // emit an Event with the average temperature
      out.collect(Event(window.getEnd, avgTemp, events.head.sensor))
    }
  9. Analytics Application Cost

    Unit conversions (SQL):
    SQL KPUs: 5 per day * (730 hours in a month / 24 hours in a day) = 152.08 per month

    Pricing calculations:
    10 applications x 152.08 KPUs x 0.127 USD = 193.14 USD per month for SQL applications

    Kinesis Data Analytics for SQL applications cost (monthly): 193.14 USD*

    *as of April 2021
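The pricing arithmetic on this slide can be checked with a few lines of Scala. The sketch takes the slide's inputs as given (5 KPUs per day, 10 applications, 0.127 USD per KPU unit) and rounds to two decimals after each step, which is an assumption about how the quoted figures were produced; `AnalyticsCost` and its method names are hypothetical.

```scala
object AnalyticsCost {
  import scala.math.BigDecimal.RoundingMode.HALF_UP

  // 730 hours in a month / 24 hours in a day converts a daily figure to a monthly one
  def monthlyKpus(kpusPerDay: Int): BigDecimal =
    (BigDecimal(kpusPerDay) * 730 / 24).setScale(2, HALF_UP)

  // Monthly cost = applications x monthly KPUs x unit price, rounded to cents
  def monthlyCostUsd(applications: Int, kpusPerDay: Int, usdPerKpu: BigDecimal): BigDecimal =
    (monthlyKpus(kpusPerDay) * applications * usdPerKpu).setScale(2, HALF_UP)
}
```

Calling `AnalyticsCost.monthlyCostUsd(10, 5, BigDecimal("0.127"))` reproduces the 193.14 USD shown above.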
  10. Firehose: data flow

    Sources:
    1. Data Streams
    2. Direct PUT

    Processing features:
    1. Convert data to Parquet/ORC
    2. Transform data with AWS Lambda

    Destinations:
    1. S3
    2. Redshift
    3. Elasticsearch
    4. Splunk
    5. HTTP endpoint
  11. Firehose Data Delivery

    Frequency:
    - Depends on the destination: S3, Redshift, etc.
    - Firehose buffers data, so the flow is not true real-time streaming
    - S3:
      - Buffer size: 1-128 MB
      - Buffer interval: 60-900 seconds
    - ….
    - ….
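The effect of the two buffer settings can be sketched as a small simulation: a batch is delivered once it reaches the size limit or once the interval has elapsed, whichever happens first. `BufferSketch` and `Record` are hypothetical names, and the model is simplified: it checks the interval only when the next record arrives, whereas a real delivery stream flushes on a timer.

```scala
object BufferSketch {
  // Hypothetical record: arrival time in seconds and payload size in bytes
  final case class Record(arrivalSec: Long, sizeBytes: Long)

  // Splits records into delivered batches. A batch closes when it reaches
  // maxBytes, or when maxIntervalSec has passed since its first record.
  def batches(records: Seq[Record], maxBytes: Long, maxIntervalSec: Long): Vector[Vector[Record]] = {
    val start = (Vector.empty[Vector[Record]], Vector.empty[Record])
    val (delivered, pending) = records.foldLeft(start) {
      case ((closed, current), r) =>
        val expired = current.nonEmpty && r.arrivalSec - current.head.arrivalSec >= maxIntervalSec
        // Interval hit: deliver the current batch and start a new one with r
        val (closed1, current1) =
          if (expired) (closed :+ current, Vector(r)) else (closed, current :+ r)
        // Size hit: deliver immediately
        if (current1.map(_.sizeBytes).sum >= maxBytes) (closed1 :+ current1, Vector.empty[Record])
        else (closed1, current1)
    }
    // Whatever is still buffered at the end forms the last batch
    if (pending.isEmpty) delivered else delivered :+ pending
  }
}
```

With a low-volume source the interval decides the delivery latency, so data can reach S3 up to the configured interval after it was produced.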
  12. Kinesis Monitoring

    Streams, Analytics Applications, Firehose:
    - CloudWatch Logs, Metrics
    - Custom Metrics
    - CloudTrail
  14. Thank you! Questions?

    Twitter: @alexey_novakov
    Blog: https://novakov-alexey.github.io/
    Example project to create a stream, an analytics app, and a Firehose, and to run a producer: https://github.com/novakov-alexey/kinesis-ingest