• Kinesis Firehose: load massive volumes of streaming data into Amazon S3 and Redshift
• Kinesis Analytics (available since 2016): analyze data streams using SQL queries
• Kinesis Streams (available since 2014): build your own custom applications that process or analyze streaming data
Shards
• One shard provides a capacity of 1 MB/sec data input and 2 MB/sec data output
• One shard can support up to 1,000 PUT records per second
• Add or remove shards dynamically by resharding the stream
(Diagram: producers write through the stream endpoint into shards 1 … n)
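The per-shard limits above translate directly into a sizing rule: the stream needs enough shards to cover the highest of the three demands. A minimal sketch of that arithmetic (the function name and example numbers are illustrative, not part of any AWS API):

```python
import math

def shards_needed(write_mb_per_sec, read_mb_per_sec, records_per_sec):
    """Smallest shard count covering the per-shard limits above:
    1 MB/s ingest, 2 MB/s egress, 1,000 PUT records/s per shard."""
    return max(
        math.ceil(write_mb_per_sec / 1.0),    # ingest limit
        math.ceil(read_mb_per_sec / 2.0),     # egress limit
        math.ceil(records_per_sec / 1000.0),  # PUT-rate limit
    )

# e.g. 5 MB/s in, 8 MB/s out, 3,500 records/s:
print(shards_needed(5, 8, 3500))  # -> 5 (ingest is the bottleneck here)
```

Resharding (splitting or merging shards) is how you move between these counts as traffic changes.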
Data Record
• A record is the unit of data stored in a stream
• A record is composed of a partition key, a data blob (payload), and a self-generated unique sequence number, auto-generated by Kinesis
• Max size of the payload is 1 MB (after base64 decoding)
• Records are accessible for a default of 24 hours (up to 7 days)
(Diagram: data records, each with partition key, data blob, and sequence number, spread across shards 1 … n)
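The partition key is what ties records to shards: Kinesis takes an MD5 hash of the key, yielding a 128-bit integer, and routes the record to the shard whose hash-key range contains it. A small sketch of that routing, assuming the ranges split the hash space evenly (as they do after a fresh stream creation):

```python
import hashlib

def shard_for_key(partition_key, shard_count):
    """Map a partition key to a shard index via MD5, mimicking how
    Kinesis assigns records to shard hash-key ranges."""
    h = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    range_size = 2 ** 128 // shard_count  # each shard owns a contiguous range
    return min(h // range_size, shard_count - 1)

# Records sharing a partition key always land on the same shard,
# which is what preserves per-key ordering:
assert shard_for_key("user-42", 4) == shard_for_key("user-42", 4)
```

This is also why a skewed partition-key distribution creates "hot" shards: all records for a popular key go through one shard's 1 MB/sec limit.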
Producer
• Options for writing: AWS SDKs (PutRecord), Kinesis Producer Library (KPL), Amazon Kinesis Agent, ...
• KPL is an easy-to-use, highly configurable, Java-based library developed by Amazon
Consumer
• Options for reading: AWS SDKs, Kinesis Client Library (KCL), EC2, Lambda, ...
• KCL = life saver!! Also developed by Amazon; available in Java, Python, Ruby, Node.js, and .NET
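At the SDK level, a PutRecord call needs exactly the three pieces described on the Data Record slide: the stream name, the payload bytes, and a partition key. A minimal sketch (the helper function, stream name, and payload are illustrative; the actual boto3 call is shown commented out so the snippet runs without AWS credentials):

```python
import json

def build_put_record(stream_name, payload, partition_key):
    """Build the parameter dict for a Kinesis PutRecord call.
    The partition key picks the shard; Kinesis assigns the
    sequence number on ingestion."""
    return {
        "StreamName": stream_name,
        "Data": json.dumps(payload).encode("utf-8"),  # payload is a byte blob
        "PartitionKey": partition_key,
    }

params = build_put_record("clickstream", {"page": "/home"}, "user-42")

# With boto3 installed and credentials configured, this would be sent as:
#   import boto3
#   kinesis = boto3.client("kinesis")
#   response = kinesis.put_record(**params)
#   # response carries the ShardId and assigned SequenceNumber
```

For high-throughput producers the KPL is preferable to raw PutRecord calls, since it batches and aggregates records behind the scenes.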
Step 1: getShardIterator
• Call getShardIterator to establish the reading position within the shard
• Options:
  • AT_SEQUENCE_NUMBER (start at a given sequence number)
  • AFTER_SEQUENCE_NUMBER (start just after a given sequence number)
  • TRIM_HORIZON (oldest available record, i.e. all records in the last 24 h)
  • LATEST (only new records)
(Diagram: iterator positions within a shard, from TRIM_HORIZON at the oldest record up to LATEST ahead of incoming records)
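The four iterator types above reduce to picking a starting offset inside the shard's retained records. A small in-memory sketch of that resolution logic (the function and record list are illustrative stand-ins, not an AWS API):

```python
def resolve_iterator(records, iterator_type, sequence_number=None):
    """Return the index in `records` (oldest first, standing in for a
    shard's retained records) where reading would begin."""
    if iterator_type == "TRIM_HORIZON":
        return 0                       # oldest record still retained
    if iterator_type == "LATEST":
        return len(records)            # only records arriving from now on
    seqs = [seq for seq, _ in records]
    idx = seqs.index(sequence_number)
    if iterator_type == "AT_SEQUENCE_NUMBER":
        return idx                     # include the given record
    if iterator_type == "AFTER_SEQUENCE_NUMBER":
        return idx + 1                 # start just past it
    raise ValueError(iterator_type)

records = [("100", "a"), ("101", "b"), ("102", "c")]
print(resolve_iterator(records, "AFTER_SEQUENCE_NUMBER", "101"))  # -> 2
```

AT/AFTER_SEQUENCE_NUMBER is what a checkpointing consumer uses to resume exactly where it left off after a restart.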
Step 2: getRecords
• Call getRecords with the shardIterator from step 1
• Read capacity is max 2 MB/sec per shard
• Use getRecords inside a loop (low-level API), or use the KCL (high-level API)
(Diagram: reading new records from shards 1 … n at max 2 MB/sec)
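The getRecords loop works by paging: each call returns a batch plus a next-iterator to feed into the following call. An in-memory stand-in for that loop (FakeShard and its method are illustrative; a real consumer would call the SDK's get_records and sleep between polls):

```python
class FakeShard:
    """In-memory stand-in for one shard, mimicking the GetRecords
    response shape: a batch of records plus a NextShardIterator."""
    def __init__(self, records):
        self.records = records  # oldest first

    def get_records(self, iterator, limit=2):
        batch = self.records[iterator:iterator + limit]
        return {"Records": batch, "NextShardIterator": iterator + len(batch)}

shard = FakeShard(["r1", "r2", "r3", "r4", "r5"])
iterator = 0  # TRIM_HORIZON: start at the oldest retained record
consumed = []
while True:
    resp = shard.get_records(iterator)
    if not resp["Records"]:
        break  # caught up; a real consumer would sleep and poll again
    consumed.extend(resp["Records"])
    iterator = resp["NextShardIterator"]

print(consumed)  # -> ['r1', 'r2', 'r3', 'r4', 'r5']
```

The KCL wraps exactly this loop, and adds shard discovery, checkpointing, and load balancing across consumer instances, which is why the slides call it a life saver.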