AWS Kinesis

Quick Introduction to Amazon Kinesis Stream

Julian Kleinhans

June 02, 2017

Transcript

  1. 1 02.06.17
    AWS Kinesis
    Quick Introduction to Amazon Kinesis Stream
    02.06.2017, AOE Meetup, Julian Kleinhans

  2. Julian Kleinhans
    Software Architect @ AOE GmbH
    @kj187

  3. Amazon Kinesis
    Amazon Kinesis is a real-time data processing platform which makes it easier to work with real-time streaming data in the AWS Cloud.

  4. Kinesis Product Family
    • Kinesis Firehose (available since 2015): load massive volumes of streaming data into Amazon S3 and Redshift
    • Kinesis Analytics (available since 2016): analyze data streams using SQL queries
    • Kinesis Streams (available since 2014): build your own custom applications that process or analyze streaming data

  5. AWS Kinesis Streams
    High-throughput, low-latency service for real-time data processing over large, distributed data streams

  6. AWS Kinesis Streams
    It's like a message queue, but more scalable and with multiple concurrent readers of each message

  7. Typical Use Cases
    Process and analyze log data, financial data, and mobile or online gaming data in real time

  8. High Level Architecture
    Source: http://docs.aws.amazon.com/streams/latest/dev/key-concepts.html

  9. Key Concepts
    Shards
    • Streams are made of shards
    • One shard provides a capacity of 1 MB/sec data input and 2 MB/sec data output
    • One shard can support up to 1000 PUT records per second
    • Add or remove shards dynamically by resharding the stream
    [Diagram: producers write through the stream endpoint into Shard 1 to Shard n]
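The per-shard limits above make it possible to estimate how many shards a stream needs. A rough sketch in Python, assuming every consumer reads the full stream (so read demand grows with the number of consumers) and ignoring bursts and unevenly distributed partition keys; the function name `shards_needed` is illustrative.

```python
import math

# Per-shard limits as stated on the slide.
SHARD_WRITE_MB_PER_SEC = 1.0
SHARD_WRITE_RECORDS_PER_SEC = 1000
SHARD_READ_MB_PER_SEC = 2.0


def shards_needed(write_mb_per_sec: float, records_per_sec: float,
                  consumers: int = 1) -> int:
    """Smallest shard count that satisfies all three per-shard limits.

    Assumes each consumer reads everything that is written, so the read
    demand is write_mb_per_sec * consumers (a simplification).
    """
    by_write_bytes = write_mb_per_sec / SHARD_WRITE_MB_PER_SEC
    by_write_records = records_per_sec / SHARD_WRITE_RECORDS_PER_SEC
    by_read_bytes = write_mb_per_sec * consumers / SHARD_READ_MB_PER_SEC
    # The tightest limit wins; a stream always has at least one shard.
    return max(1, math.ceil(max(by_write_bytes, by_write_records, by_read_bytes)))
```

For example, 3.5 MB/sec from 2,000 records/sec is bound by write bandwidth (4 shards), while 0.5 MB/sec from 5,000 small records/sec is bound by the record rate (5 shards). If the load changes later, resharding adjusts the count.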

  10. Key Concepts
    Data Record
    • A record is the unit of data stored in a Kinesis stream
    • A record is composed of a partition key, a data blob and a unique sequence number auto-generated by Kinesis
    • Max size of payload is 1 MB (after base64-decoding)
    • Accessible for a default of 24 hours (up to 7 days)
    [Diagram: each shard (Shard 1 to Shard n) holds a sequence of data records; a record consists of a partition key, a data blob (payload) and a sequence number uniquely auto-generated by Kinesis]
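The partition key decides which shard a record lands in: Kinesis hashes the key with MD5 into a 128-bit integer, and each shard owns a range of that hash key space. A sketch in Python, assuming the key is hashed as UTF-8 bytes and the key space is split into equal contiguous ranges, as for a freshly created stream (the exact boundaries AWS assigns may differ slightly; real code would read each shard's hash key range from the stream description). The helpers `hash_key` and `shard_index` are hypothetical.

```python
import hashlib

HASH_KEY_SPACE = 2 ** 128  # MD5 yields a 128-bit value


def hash_key(partition_key: str) -> int:
    """MD5 of the partition key (UTF-8 bytes, assumed) as an unsigned 128-bit integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_index(partition_key: str, shard_count: int) -> int:
    """Index of the shard whose equal-width hash key range contains the key."""
    if shard_count < 1:
        raise ValueError("a stream has at least one shard")
    # Integer arithmetic maps [0, 2**128) onto [0, shard_count) without rounding errors.
    return hash_key(partition_key) * shard_count // HASH_KEY_SPACE
```

Because the mapping is deterministic, all records with the same partition key end up in the same shard, which is what preserves their order.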

  11. Key Concepts
    Producer (data ingestion)
    • Options for writing
    • AWS SDKs (PutRecord), Kinesis Producer Library (KPL), Amazon Kinesis Agent ...
    • KPL is an easy-to-use, highly configurable, Java-based library developed by Amazon
    Consumer
    • Options for reading
    • AWS SDKs, Kinesis Client Library (KCL), EC2, Lambda ...
    • KCL = life saver! Also developed by Amazon
    • Available in Java, Python, Ruby, Node.js and .NET
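With the AWS SDK for Python (boto3), writing a single record is one `put_record` call, whose response contains the `ShardId` and the `SequenceNumber` Kinesis assigned. A minimal sketch, assuming a client created with `boto3.client("kinesis")` is passed in and that events are JSON; the helper `send_event` is hypothetical. Injecting the client keeps the function testable without AWS access.

```python
import json
from typing import Any, Dict

MAX_PAYLOAD_BYTES = 1024 * 1024  # 1 MB data blob limit from the slides (assumed 1 MiB)


def send_event(client, stream_name: str, partition_key: str,
               event: Dict[str, Any]) -> str:
    """Write one JSON-encoded event with PutRecord and return its sequence number.

    `client` is expected to behave like boto3.client("kinesis").
    """
    data = json.dumps(event).encode("utf-8")
    if len(data) > MAX_PAYLOAD_BYTES:
        raise ValueError(f"payload is {len(data)} bytes, limit is {MAX_PAYLOAD_BYTES}")
    response = client.put_record(
        StreamName=stream_name,
        Data=data,                   # raw bytes; the SDK handles the wire encoding
        PartitionKey=partition_key,  # determines the target shard
    )
    # The sequence number is assigned by Kinesis, not chosen by the producer.
    return response["SequenceNumber"]
```

For high-volume producers, the KPL or batching with PutRecords is usually preferable to one call per record.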

  12. Consumer
    Sequential reading -> Two-step process
    1) GetShardIterator, to establish the position within the shard
    • Options
    • AT_SEQUENCE_NUMBER
    • AFTER_SEQUENCE_NUMBER
    • TRIM_HORIZON
    • LATEST
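    [Diagram: starting positions within a shard: TRIM_HORIZON starts at the oldest retained record (all records from the last 24h by default), AT_SEQUENCE_NUMBER and AFTER_SEQUENCE_NUMBER start at or right after a given record, LATEST starts with new records only]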
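The four iterator types can be modelled on a single shard whose retained records have ascending sequence numbers. This is a toy Python model, not the AWS API: real sequence numbers are large numeric strings assigned by Kinesis, `start_position` is a hypothetical name, and looking up a sequence number with bisection is a simplification. It returns the index at which reading would begin.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional


def start_position(retained: List[int], iterator_type: str,
                   sequence_number: Optional[int] = None) -> int:
    """Index into `retained` (ascending sequence numbers still stored in the shard)
    at which a reader starts for the given ShardIteratorType. Toy model only."""
    if iterator_type == "TRIM_HORIZON":
        return 0                      # oldest record still within the retention period
    if iterator_type == "LATEST":
        return len(retained)          # skip everything; only new records will follow
    if sequence_number is None:
        raise ValueError(f"{iterator_type} needs a sequence number")
    if iterator_type == "AT_SEQUENCE_NUMBER":
        return bisect_left(retained, sequence_number)   # include that record
    if iterator_type == "AFTER_SEQUENCE_NUMBER":
        return bisect_right(retained, sequence_number)  # start just after it
    raise ValueError(f"unknown iterator type: {iterator_type}")
```

In boto3 the same choice is the `ShardIteratorType` argument of `get_shard_iterator`, with `StartingSequenceNumber` supplied for the two sequence-number types.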

  13. Consumer
    Sequential reading -> Two-step process
    2) GetRecords, with the ShardIterator from step 1
    • Max. 2 MB/sec per shard
    • Call GetRecords inside a loop (low-level API)
    • Or use the KCL (high-level API)
    [Diagram: a consumer reads a shard from AT_SEQUENCE_NUMBER onwards towards the new records, at max. 2 MB/sec]
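Putting both steps together with boto3 gives the low-level read loop. A sketch, assuming a boto3 Kinesis client is passed in and an iterator type that needs no sequence number (TRIM_HORIZON or LATEST); `read_shard` and its parameters are hypothetical. It always continues from `NextShardIterator` and pauses between calls, since each shard also allows only five read transactions per second.

```python
import time
from typing import Any, Dict, Iterator


def read_shard(client, stream_name: str, shard_id: str,
               iterator_type: str = "TRIM_HORIZON", max_polls: int = 10,
               poll_interval: float = 1.0) -> Iterator[Dict[str, Any]]:
    """Yield records from one shard with GetShardIterator + GetRecords.

    `client` is expected to behave like boto3.client("kinesis"). Stops after
    `max_polls` GetRecords calls or once the shard is closed and fully read.
    """
    # Step 1: establish the starting position within the shard.
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType=iterator_type,
    )["ShardIterator"]

    # Step 2: call GetRecords in a loop, always continuing from NextShardIterator.
    for _ in range(max_polls):
        if iterator is None:
            break  # closed shard (e.g. after resharding): nothing more to read
        response = client.get_records(ShardIterator=iterator, Limit=100)
        yield from response["Records"]
        iterator = response.get("NextShardIterator")
        # Pause between calls to stay below the per-shard read transaction limit.
        time.sleep(poll_interval)
```

The KCL additionally handles shard discovery, checkpointing and resharding, all of which this loop ignores.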

  14. Pricing
    Shard hour: $0.015
    PUT payload units (1 unit = 25 KB): $0.014 per 1,000,000 units
    Extended data retention (up to 7 days): $0.020 per shard hour
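A back-of-the-envelope monthly estimate from these prices, as a Python sketch. Assumptions: the slide's 2017 prices, a 730-hour month, a constant record rate and size, and 1 PUT payload unit = 25,000 bytes with each record rounded up to whole units. Check the current AWS pricing page for your region before relying on the numbers; the function names are illustrative.

```python
import math

# Prices from the slide (2017); they differ by region and may have changed since.
SHARD_HOUR_USD = 0.015
PUT_MILLION_UNITS_USD = 0.014          # per 1,000,000 PUT payload units
EXTENDED_RETENTION_SHARD_HOUR_USD = 0.020
PUT_UNIT_BYTES = 25_000                # 1 unit = 25 KB (assumed decimal KB)
HOURS_PER_MONTH = 730                  # assumed average month


def put_units(record_bytes: int) -> int:
    """PUT payload units for one record: 25 KB chunks, rounded up."""
    return math.ceil(record_bytes / PUT_UNIT_BYTES)


def monthly_cost(shards: int, records_per_sec: float, record_bytes: int,
                 extended_retention: bool = False,
                 hours: int = HOURS_PER_MONTH) -> float:
    """Estimated monthly cost in USD for a constant write load."""
    shard_cost = shards * hours * SHARD_HOUR_USD
    units = records_per_sec * 3600 * hours * put_units(record_bytes)
    put_cost = units / 1_000_000 * PUT_MILLION_UNITS_USD
    retention_cost = (shards * hours * EXTENDED_RETENTION_SHARD_HOUR_USD
                      if extended_retention else 0.0)
    return round(shard_cost + put_cost + retention_cost, 2)
```

For example, one shard receiving 100 records/sec of 1 KB each costs about $10.95 in shard hours plus about $3.68 for 262.8 million PUT payload units, i.e. roughly $14.63 per month.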

  15. DEMO
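    Terraform
      # Region and credentials come from the environment (e.g. AWS_DEFAULT_REGION)
      provider "aws" {}

      resource "aws_kinesis_stream" "test_stream" {
        name             = "aws-kinesis-demo"
        shard_count      = 1  # 1 MB/sec in, 2 MB/sec out
        retention_period = 24 # hours (the default)
      }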
    AWS Utility
    https://github.com/kj187/aws-utility
    $ php bin/aws-utility.php kinesis:produce
    $ php bin/aws-utility.php kinesis:consume
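The Terraform resource above can also be created directly with boto3. A sketch, assuming a boto3 Kinesis client is passed in; `create_demo_stream` is a hypothetical helper that mirrors the demo's stream name, shard count and retention period. CreateStream returns before the stream is usable, so the sketch waits with the SDK's `stream_exists` waiter.

```python
def create_demo_stream(client, name: str = "aws-kinesis-demo",
                       shard_count: int = 1, retention_hours: int = 24) -> str:
    """Create a stream like the Terraform example and return its status.

    `client` is expected to behave like boto3.client("kinesis").
    """
    client.create_stream(StreamName=name, ShardCount=shard_count)
    # CreateStream is asynchronous: block until the stream is ACTIVE.
    client.get_waiter("stream_exists").wait(StreamName=name)
    if retention_hours > 24:
        # 24 hours is the default; longer retention is billed as extended retention.
        client.increase_stream_retention_period(
            StreamName=name, RetentionPeriodHours=retention_hours)
    description = client.describe_stream(StreamName=name)["StreamDescription"]
    return description["StreamStatus"]
```

Terraform remains the better fit for managing the stream's lifecycle declaratively; the SDK route is handy for scripts and tests.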

  16. Thank you
    Any questions?
