AWS Kinesis

Quick Introduction to Amazon Kinesis Stream

Julian Kleinhans

June 02, 2017

Transcript

  1. 02.06.17 Amazon Kinesis
    Amazon Kinesis is a real-time data processing platform which makes it easier to work with real-time, streaming data in the AWS Cloud.
  2. Kinesis Product Family
    • Kinesis Firehose (available since 2015): Load massive volumes of streaming data into Amazon S3 and Redshift
    • Kinesis Analytics (available since 2016): Analyze data streams using SQL queries
    • Kinesis Streams (available since 2014): Build your own custom applications that process or analyze streaming data
  3. AWS Kinesis Streams
    High-throughput, low-latency service for real-time data processing over large, distributed data streams
  4. AWS Kinesis Streams
    It's like a message queue, but more scalable and with multiple concurrent readers of each message
  5. Typical Use Cases
    Process and analyze log data, finance data, and mobile or online gaming data in real time
  6. Key Concepts: Shards
    • Streams are made of shards
    • One shard provides a capacity of 1 MB/sec data input and 2 MB/sec data output
    • One shard can support up to 1000 PUT records per second
    • Add or remove shards dynamically by resharding the stream
    [Diagram: producers write to the stream endpoint, which spreads records across Shard 1 … Shard n]
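The per-shard limits above can be turned into a rough sizing rule: take the largest shard count demanded by any single limit. The following is a minimal sketch of that arithmetic; the function name and the example workload are hypothetical and not part of the deck.

```python
import math

# Per-shard limits as stated on the slide.
WRITE_MB_PER_SEC = 1.0
READ_MB_PER_SEC = 2.0
WRITE_RECORDS_PER_SEC = 1000


def shards_needed(write_mb_per_sec: float, read_mb_per_sec: float,
                  records_per_sec: float) -> int:
    """Return the smallest shard count that satisfies all three limits."""
    by_write = math.ceil(write_mb_per_sec / WRITE_MB_PER_SEC)
    by_read = math.ceil(read_mb_per_sec / READ_MB_PER_SEC)
    by_records = math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC)
    # A stream always has at least one shard.
    return max(1, by_write, by_read, by_records)


if __name__ == "__main__":
    # Hypothetical workload: 3.5 MB/s in, 5 MB/s out, 2,500 records/s.
    print(shards_needed(3.5, 5.0, 2500))
```

In this example the write throughput is the binding limit, so the estimate is driven by the 1 MB/sec input capacity rather than by record count or read throughput.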
  7. Key Concepts: Data Record
    • A record is the unit of data stored in a stream
    • A record is composed of a partition key, a data blob and a self-generated unique sequence number
    • Max size of payload is 1 MB (after base64-decoding)
    • Accessible for a default of 24 hours (up to 7 days)
    [Diagram: each shard (Shard 1 … Shard n) holds data records; a data record consists of a partition key, a data blob (payload) and a sequence number that is unique and auto-generated by Kinesis]
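The partition key decides which shard a record lands in: Kinesis hashes the key with MD5 into a 128-bit integer and picks the shard whose hash key range contains that value. The sketch below illustrates the idea under the assumption that the shards split the hash space into equal-width ranges, which is only an approximation and need not hold after resharding. The function names are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit value


def hash_key(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer via MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, byteorder="big")


def shard_index(partition_key: str, shard_count: int) -> int:
    """Pick the shard whose range contains the key.

    Assumption: shards cover equal-width slices of the hash space.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    return hash_key(partition_key) * shard_count // HASH_SPACE


if __name__ == "__main__":
    # The same key always maps to the same shard index.
    for key in ("user-1", "user-2", "user-1"):
        print(key, shard_index(key, 4))
```

Because the mapping is deterministic, all records that share a partition key go to the same shard and keep their relative order there.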
  8. Key Concepts: Producer and Consumer
    Producer (data ingestion)
    • Options for writing
      • AWS SDKs (PutRecord), Kinesis Producer Library (KPL), Amazon Kinesis Agent ...
      • KPL is an easy-to-use, highly configurable, Java-based library developed by Amazon
    Consumer
    • Options for reading
      • AWS SDKs, Kinesis Client Library (KCL), EC2, Lambda ...
      • KCL = Life Saver !! Also developed by Amazon
      • Available in Java, Python, Ruby, NodeJS and .NET
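As a minimal sketch of the SDK option for writing, the function below sends one record through the PutRecord call. It assumes the Python SDK (boto3), where a client would be created with `boto3.client("kinesis")`; the helper name `put_event`, the JSON encoding and the choice of partition key are illustrative assumptions, not something the deck prescribes.

```python
import json
from typing import Any, Dict


def put_event(client: Any, stream_name: str, event: Dict[str, Any],
              partition_key: str) -> str:
    """Write one JSON-encoded event and return its sequence number.

    `client` is expected to behave like a boto3 Kinesis client,
    e.g. boto3.client("kinesis") (assumption: boto3 is installed
    and AWS credentials are configured).
    """
    response = client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=partition_key,
    )
    # Kinesis assigns the sequence number; the producer does not choose it.
    return response["SequenceNumber"]
```

Passing the client in as a parameter keeps the function easy to exercise with a stub, and leaves region and credential handling to the caller.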
  9. Consumer: sequential reading is a two-step process
    1) GetShardIterator, to establish the position within the shard
    • Options
      • AT_SEQUENCE_NUMBER: start at the record with the given sequence number
      • AFTER_SEQUENCE_NUMBER: start right after the record with the given sequence number
      • TRIM_HORIZON: start at the oldest available record (all records in the last 24h)
      • LATEST: start after the most recent record (new records only)
  10. Consumer: sequential reading is a two-step process
    2) GetRecords, with the shardIterator from step 1
    • Max 2 MB/sec
    • Use getRecords inside a loop (low level API)
    • Or use KCL (high level API)
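The two-step read can be sketched with the low-level API as follows. This assumes a boto3-style Kinesis client (for example `boto3.client("kinesis")`); the function name `read_shard`, the batch limit of 100 and the polling parameters are illustrative assumptions. It reads a single shard only and does no checkpointing, which is the kind of work the KCL takes over.

```python
import time
from typing import Any, Dict, Iterator, Optional


def read_shard(client: Any, stream_name: str, shard_id: str,
               iterator_type: str = "TRIM_HORIZON",
               sequence_number: Optional[str] = None,
               poll_seconds: float = 1.0,
               max_polls: Optional[int] = None) -> Iterator[Dict[str, Any]]:
    """Yield records from one shard using GetShardIterator + GetRecords."""
    params = {
        "StreamName": stream_name,
        "ShardId": shard_id,
        "ShardIteratorType": iterator_type,
    }
    if iterator_type in ("AT_SEQUENCE_NUMBER", "AFTER_SEQUENCE_NUMBER"):
        if sequence_number is None:
            raise ValueError("sequence_number is required for " + iterator_type)
        params["StartingSequenceNumber"] = sequence_number

    # Step 1: establish the position within the shard.
    iterator = client.get_shard_iterator(**params)["ShardIterator"]

    # Step 2: call GetRecords in a loop, following NextShardIterator.
    polls = 0
    while iterator is not None and (max_polls is None or polls < max_polls):
        response = client.get_records(ShardIterator=iterator, Limit=100)
        for record in response["Records"]:
            yield record
        # A missing NextShardIterator means the shard is closed.
        iterator = response.get("NextShardIterator")
        polls += 1
        time.sleep(poll_seconds)  # pause between polls to respect read limits
```

With the default `max_polls=None` the generator keeps polling an open shard indefinitely, so a caller that only wants a bounded read should pass a limit.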
  11. Pricing
    • Shard hour: $0.015
    • PUT payload units (1 unit = 25 KB), per 1,000,000 units: $0.014
    • Extended data retention (up to 7 days), per shard hour: $0.020
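The prices above combine into a simple cost estimate. The sketch below is a back-of-the-envelope calculation only: it assumes the PUT payload unit price applies per 1,000,000 units, that 25 KB means 25 * 1024 bytes, and a 730-hour month; the function names are hypothetical, and current prices vary by region.

```python
import math

SHARD_HOUR_USD = 0.015
PUT_UNITS_PER_MILLION_USD = 0.014  # assumption: price is per 1,000,000 units
EXTENDED_RETENTION_SHARD_HOUR_USD = 0.020
PUT_UNIT_BYTES = 25 * 1024  # assumption: 25 KB = 25 * 1024 bytes


def put_payload_units(record_bytes: int) -> int:
    """Each record is billed in 25 KB chunks, with a minimum of one unit."""
    return max(1, math.ceil(record_bytes / PUT_UNIT_BYTES))


def estimated_cost(shards: int, records_per_sec: float, record_bytes: int,
                   extended_retention: bool = False, hours: float = 730) -> float:
    """Estimate the cost in USD for running a stream for `hours` hours."""
    units = records_per_sec * hours * 3600 * put_payload_units(record_bytes)
    cost = shards * hours * SHARD_HOUR_USD
    cost += units / 1_000_000 * PUT_UNITS_PER_MILLION_USD
    if extended_retention:
        cost += shards * hours * EXTENDED_RETENTION_SHARD_HOUR_USD
    return cost


if __name__ == "__main__":
    # Hypothetical workload: 1 shard, 100 records/s of 2 KB each, one month.
    print(round(estimated_cost(1, 100, 2048), 2))
```

Small records are billed as a full unit each, so batching several small events into one record can reduce the PUT payload unit portion of the bill.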
  12. DEMO
    Terraform:
        provider "aws" {}

        resource "aws_kinesis_stream" "test_stream" {
          name             = "aws-kinesis-demo"
          shard_count      = 1
          retention_period = 24
        }
    AWS Utility (https://github.com/kj187/aws-utility):
        $ php bin/aws-utility.php kinesis:produce
        $ php bin/aws-utility.php kinesis:consume