AWS Kinesis

Quick Introduction to Amazon Kinesis Stream

Julian Kleinhans

June 02, 2017

Transcript

  1. 1 02.06.17 AWS Kinesis Quick Introduction to Amazon Kinesis Stream

    02.06.2017, AOE Meetup, Julian Kleinhans
  2. Julian Kleinhans, Software Architect @ AOE GmbH, @kj187

  3. Amazon Kinesis

    Amazon Kinesis is a real-time data processing platform that makes it easier to work with real-time streaming data in the AWS Cloud.
  4. Kinesis Product Family

    • Kinesis Firehose (available since 2015): load massive volumes of streaming data into Amazon S3 and Redshift
    • Kinesis Analytics (available since 2016): analyze data streams using SQL queries
    • Kinesis Streams (available since 2014): build your own custom applications that process or analyze streaming data
  5. AWS Kinesis Streams

    High-throughput, low-latency service for real-time data processing over large, distributed data streams.
  6. AWS Kinesis Streams

    It's like a message queue, but more scalable and with multiple concurrent readers of each message.
  7. Typical Use Cases

    Process and analyze log data, finance data, and mobile or online gaming data in real time.
  8. High Level Architecture

    Source: http://docs.aws.amazon.com/streams/latest/dev/key-concepts.html

  9. Key Concepts: Shards

    • Streams are made of shards
    • One shard provides a capacity of 1 MB/sec data input and 2 MB/sec data output
    • One shard can support up to 1000 PUT records per second
    • Add or remove shards dynamically by resharding the stream

    [Diagram: producers write through the endpoint to Shard 1 … Shard n]
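
The per-shard limits above translate into a simple sizing rule: take the largest shard count demanded by any single limit. The following is a minimal sketch of that arithmetic, not an official AWS formula; the function name `shards_needed` and the sample workload are made up for illustration.

```python
import math

# Per-shard limits as stated on the slide.
WRITE_MB_PER_SEC = 1.0
READ_MB_PER_SEC = 2.0
PUT_RECORDS_PER_SEC = 1000


def shards_needed(write_mb_per_sec, records_per_sec, read_mb_per_sec):
    """Return the smallest shard count that satisfies all three limits."""
    by_write = math.ceil(write_mb_per_sec / WRITE_MB_PER_SEC)
    by_records = math.ceil(records_per_sec / PUT_RECORDS_PER_SEC)
    by_read = math.ceil(read_mb_per_sec / READ_MB_PER_SEC)
    # A stream always has at least one shard.
    return max(1, by_write, by_records, by_read)


if __name__ == "__main__":
    # Hypothetical workload: 3.5 MB/s in, 2500 records/s, 5 MB/s out.
    print(shards_needed(3.5, 2500, 5.0))
```

In this example the write throughput is the binding limit, so resharding would be driven by ingest volume rather than by record count or read volume.
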
  10. Key Concepts: Data Record

    • A record is the unit of data stored in a stream
    • A record is composed of a partition key, a data blob and a self-generated unique sequence number
    • Max size of payload is 1 MB (after base64-decoding)
    • Accessible for a default of 24 hours (up to 7 days)

    [Diagram: a data record inside a shard, consisting of partition key, data blob (payload) and a unique sequence number auto-generated by Kinesis]
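
The partition key decides which shard a record lands in: Kinesis hashes the key with MD5 into a 128-bit integer and picks the shard whose hash key range contains that value. The sketch below illustrates the idea only. It assumes the hash space is split into equal contiguous ranges, which holds for a freshly created stream but not necessarily after resharding, and the helper names `hash_key` and `shard_for` are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit integer


def hash_key(partition_key):
    """Map a partition key to a 128-bit integer via MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key, shard_count):
    """Pick a shard index, assuming equal-sized contiguous hash key ranges."""
    range_size = HASH_SPACE // shard_count
    # The last shard absorbs the remainder of the hash space.
    return min(hash_key(partition_key) // range_size, shard_count - 1)


if __name__ == "__main__":
    for key in ("user-1", "user-2", "user-3"):
        print(key, "->", shard_for(key, 4))
```

Records sharing a partition key always map to the same shard, which is why a skewed key distribution can overload a single shard.
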
  11. Key Concepts: Producer and Consumer

    Producer (data ingestion)
    • Options for writing: AWS SDKs (PutRecord), Kinesis Producer Library (KPL), Amazon Kinesis Agent ...
    • KPL is an easy-to-use, highly configurable, Java-based library developed by Amazon

    Consumer
    • Options for reading: AWS SDKs, Kinesis Client Library (KCL), EC2, Lambda ...
    • KCL = life saver! Also developed by Amazon
    • Available in Java, Python, Ruby, Node.js and .NET
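
For the SDK route, a single write could look roughly like the sketch below. It uses the keyword arguments of boto3's `put_record` call (`StreamName`, `Data`, `PartitionKey`); the helper `publish_event` and the `FakeProducerClient` class are hypothetical and exist only so the example runs without AWS credentials. With real access, a client from `boto3.client("kinesis")` would be passed in instead.

```python
import json


def publish_event(client, stream_name, event, partition_key):
    """Send one JSON-encoded event with PutRecord and return where it landed."""
    response = client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=partition_key,
    )
    return response["ShardId"], response["SequenceNumber"]


class FakeProducerClient:
    """Hypothetical in-memory stand-in for a Kinesis client."""

    def __init__(self):
        self.records = []

    def put_record(self, StreamName, Data, PartitionKey):
        self.records.append((StreamName, Data, PartitionKey))
        # Made-up response values that mimic the shape of the real response.
        return {
            "ShardId": "shardId-000000000000",
            "SequenceNumber": str(len(self.records)),
        }


if __name__ == "__main__":
    fake = FakeProducerClient()
    print(publish_event(fake, "aws-kinesis-demo", {"id": 1}, "user-1"))
```
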
  12. Consumer: Sequential reading is a two-step process

    1) GetShardIterator, to establish the position within the shard

    Options:
    • AT_SEQUENCE_NUMBER
    • AFTER_SEQUENCE_NUMBER
    • TRIM_HORIZON (all records in the last 24h)
    • LATEST (new records only)
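
The four iterator types differ only in where reading starts. The sketch below models one shard as a plain Python list of sequence numbers, oldest first, to show those start positions; it is a toy simulation rather than a Kinesis API call, and `start_index` is a hypothetical name.

```python
def start_index(sequence_numbers, iterator_type, sequence_number=None):
    """Return the list index where reading starts for a given iterator type.

    sequence_numbers holds the records still retained in one shard,
    oldest first.
    """
    if iterator_type == "TRIM_HORIZON":
        return 0  # oldest retained record
    if iterator_type == "LATEST":
        return len(sequence_numbers)  # only records written from now on
    if iterator_type not in ("AT_SEQUENCE_NUMBER", "AFTER_SEQUENCE_NUMBER"):
        raise ValueError(f"unknown iterator type: {iterator_type}")
    position = sequence_numbers.index(sequence_number)
    # AT starts on the given record, AFTER starts on the one following it.
    return position if iterator_type == "AT_SEQUENCE_NUMBER" else position + 1


if __name__ == "__main__":
    retained = ["101", "102", "103"]
    for kind in ("TRIM_HORIZON", "AT_SEQUENCE_NUMBER",
                 "AFTER_SEQUENCE_NUMBER", "LATEST"):
        print(kind, start_index(retained, kind, "102"))
```
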
  13. Consumer: Sequential reading is a two-step process

    2) GetRecords, with the shardIterator from step 1

    • Max 2 MB/sec
    • Use getRecords inside a loop (low level API)
    • Or use KCL (high level API)

    [Diagram: reading from AT_SEQUENCE_NUMBER toward new records at max 2 MB/sec]
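
Put together, the low-level read path is: fetch a shard iterator once, then call GetRecords repeatedly with the iterator returned by the previous call. The sketch below follows the boto3 call shapes (`get_shard_iterator`, `get_records`, `NextShardIterator`). The helper `read_shard`, the `max_polls` cap, the batch limit of 100 and the `FakeConsumerClient` class are assumptions made so the example runs offline.

```python
def read_shard(client, stream_name, shard_id,
               iterator_type="TRIM_HORIZON", max_polls=10):
    """Step 1: get a shard iterator. Step 2: call GetRecords in a loop."""
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType=iterator_type,
    )["ShardIterator"]

    payloads = []
    for _ in range(max_polls):
        if iterator is None:  # no further iterator: nothing left to read
            break
        response = client.get_records(ShardIterator=iterator, Limit=100)
        payloads.extend(record["Data"] for record in response["Records"])
        iterator = response.get("NextShardIterator")
    return payloads


class FakeConsumerClient:
    """Hypothetical in-memory stand-in for a Kinesis client."""

    def __init__(self, data):
        self.data = data

    def get_shard_iterator(self, StreamName, ShardId, ShardIteratorType):
        return {"ShardIterator": "0"}  # always starts at the oldest record

    def get_records(self, ShardIterator, Limit):
        start = int(ShardIterator)
        batch = self.data[start:start + Limit]
        end = start + len(batch)
        # Simplification: stop handing out iterators once the data is drained.
        next_iterator = str(end) if end < len(self.data) else None
        return {"Records": [{"Data": d} for d in batch],
                "NextShardIterator": next_iterator}


if __name__ == "__main__":
    fake = FakeConsumerClient([b"a", b"b", b"c"])
    print(read_shard(fake, "aws-kinesis-demo", "shardId-000000000000"))
```

A real consumer would also pause between polls and handle throttling errors, which is much of what the KCL takes care of.
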
  14. Pricing

    • Shard-hour: $0.015
    • PUT payload units (1 unit = 25 KB), per 1,000,000 units: $0.014
    • Extended data retention (up to 7 days), per shard-hour: $0.020
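
To get a feel for these numbers, the sketch below estimates a monthly bill from the three prices on the slide. It takes the PUT price as applying per 1,000,000 payload units and uses a 720-hour month; both should be treated as assumptions to check against current AWS pricing, and the function names are hypothetical.

```python
import math

SHARD_HOUR_USD = 0.015
PUT_UNITS_PER_MILLION_USD = 0.014  # assumption: price per 1,000,000 units
EXTENDED_RETENTION_SHARD_HOUR_USD = 0.020
PUT_UNIT_KB = 25


def put_payload_units(record_size_kb):
    """Each record is billed in 25 KB chunks, rounded up."""
    return max(1, math.ceil(record_size_kb / PUT_UNIT_KB))


def monthly_cost(shards, records_per_sec, record_size_kb,
                 extended_retention=False, hours=720):
    """Estimate the monthly cost in USD for a steady workload."""
    shard_rate = SHARD_HOUR_USD
    if extended_retention:
        shard_rate += EXTENDED_RETENTION_SHARD_HOUR_USD
    shard_cost = shards * hours * shard_rate
    units = records_per_sec * 3600 * hours * put_payload_units(record_size_kb)
    put_cost = units / 1_000_000 * PUT_UNITS_PER_MILLION_USD
    return round(shard_cost + put_cost, 2)


if __name__ == "__main__":
    # Hypothetical workload: 1 shard, 100 records/s of 10 KB each.
    print(monthly_cost(1, 100, 10))
```

For small records the shard-hours dominate the bill; the PUT charge grows with record count and with every additional 25 KB chunk per record.
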
  15. DEMO

    Terraform

        provider "aws" {}

        resource "aws_kinesis_stream" "test_stream" {
          name             = "aws-kinesis-demo"
          shard_count      = 1
          retention_period = 24
        }

    AWS Utility: https://github.com/kj187/aws-utility

        $ php bin/aws-utility.php kinesis:produce
        $ php bin/aws-utility.php kinesis:consume
  16. Thank you. Any questions?