INTRO: KEVIN TINN
• Over a decade of experience in the industry
• Software Development
  • .NET (C# and VB.NET)
  • JVM (Scala)
  • JavaScript
• Data Streaming Architectures
  • Kafka, Kinesis, Event Hubs
• Application Architecture
  • AWS, Azure, and on-prem solutions
• Incessant traveler with a newfound skiing addiction
• I live in Denver, by way of St. Louis, and grew up in TN
REPO INFO
• …are available at https://github.com/kevasync/aws-meetup-group-data-services
• This link is also in the comments of the Meetup deets: https://www.meetup.com/AWSMeetupGroup/events/269768602/
• Please join the Meetup if you haven't already
GOAL: MOVE FROM BATCH TO EVENT
• …batches, which only allows apps to be as up to date as the latest batch; highly dependent systems may even write to each other's persistence layers
DEMO DATA PIPELINE OVERVIEW
• …data
• Produce messages for temperature and pressure readings at a manufacturing site
• Store raw data for compliance purposes
• Split the data into two streams: one for temperature, the other for pressure
• Enrich pressure data with altitude reference data
• Enrich temperature data with ambient weather reference data
• Store enriched data in an S3 data lake with Athena query capabilities
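The archive-then-split step of the pipeline can be sketched in a few lines of Python. This is an in-memory illustration only: plain lists stand in for the raw, temperature, and pressure streams, and the record shape (a dict with a hypothetical `type` field of `"temperature"` or `"pressure"`) is an assumption, not the demo's actual schema.

```python
from typing import Any, Dict, Iterable, List

Reading = Dict[str, Any]


def route(readings: Iterable[Reading]) -> Dict[str, List[Reading]]:
    """Archive every raw reading, then fan it out to a per-type stream.

    Hypothetical record shape: {"site_id": str, "type": str, "value": float}.
    Lists stand in for the raw, temperature, and pressure streams.
    """
    streams: Dict[str, List[Reading]] = {"raw": [], "temperature": [], "pressure": []}
    for reading in readings:
        streams["raw"].append(reading)  # untouched copy kept for compliance
        kind = reading.get("type")
        if kind in ("temperature", "pressure"):
            streams[kind].append(reading)
        # Readings of any other type stay only in the raw archive.
    return streams


if __name__ == "__main__":
    sample = [
        {"site_id": "plant-a", "type": "temperature", "value": 21.4},
        {"site_id": "plant-a", "type": "pressure", "value": 101.2},
        {"site_id": "plant-b", "type": "humidity", "value": 40.0},
    ]
    out = route(sample)
    print({name: len(records) for name, records in out.items()})
    # prints {'raw': 3, 'temperature': 1, 'pressure': 1}
```

Writing the raw copy before any filtering means the compliance archive never depends on the split logic being correct.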
INTRO TO CODED INFRA AND DEPLOYMENT
• …out repo from my Terraform and Pulumi Meetup: https://github.com/kevasync/aws-meetup-group-terraform
• Wanted to use Pulumi on this to try out the new v2.0 release
  • Introduces full fidelity between languages, including full C# support
• Love Terraform too
• Demo
  • Get into the repo and take a spin around the project
  • Deploy from the Pulumi CLI
COMPONENTS: KINESIS STREAM
• …durable data ingestion and processing service optimized for streaming data
• Allows for many-to-many communication with low latency
• Fully managed service
• A stream consists of shards, which allow for parallelism
• Messages are produced with a partition key; records with the same key go to the same shard, which preserves their order
• When dealing with stream processing, always make consumption idempotent, since a record can be delivered more than once
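Two of the points on this slide can be made concrete in Python: how a partition key picks a shard, and what an idempotent consumer looks like. Kinesis maps a partition key to a 128-bit integer with MD5 and sends the record to the shard whose hash-key range contains it. The sketch assumes the range is split evenly across shards (true for a freshly created stream, not necessarily after resharding). The `record_id` is a hypothetical producer-assigned unique ID; sequence numbers alone do not catch producer-side retries, because a retried put gets a new sequence number.

```python
import hashlib
from typing import Any, Callable, Dict, Set

HASH_SPACE = 2 ** 128  # Kinesis hash keys range over [0, 2**128 - 1]


def hash_key(partition_key: str) -> int:
    """MD5 of the partition key, read as an unsigned 128-bit integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Index of the shard whose hash-key range contains this key.

    ASSUMPTION: the hash range is split evenly across shards, as for a
    freshly created stream; after resharding, use the ranges the stream reports.
    """
    return hash_key(partition_key) * shard_count // HASH_SPACE


class IdempotentConsumer:
    """Hands each record to `handler` at most once, keyed by a record id.

    Kinesis delivery is at-least-once, so retries can replay records.
    The in-memory `seen` set is for illustration only; a real consumer
    would persist it (or make the handler's writes naturally idempotent).
    """

    def __init__(self, handler: Callable[[Dict[str, Any]], None]) -> None:
        self.handler = handler
        self.seen: Set[str] = set()

    def consume(self, record_id: str, payload: Dict[str, Any]) -> bool:
        if record_id in self.seen:
            return False  # duplicate delivery: skip side effects
        self.handler(payload)
        self.seen.add(record_id)  # mark only after the handler succeeds
        return True


if __name__ == "__main__":
    for key in ("plant-a", "plant-b", "plant-c"):
        print(key, "-> shard", shard_for_key(key, 4))

    totals = []
    consumer = IdempotentConsumer(lambda p: totals.append(p["value"]))
    consumer.consume("evt-1", {"value": 10})
    consumer.consume("evt-1", {"value": 10})  # redelivered, ignored
    print(sum(totals))  # prints 10
```

Because the mapping only depends on the key, every reading from one site (if the site ID is the partition key) lands on the same shard and is read back in order.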
COMPONENTS: KINESIS FIREHOSE DELIVERY STREAM
• …for delivering real-time streaming data to destinations
• Sink (dump) data to destinations without writing code
• Mapping transformations allow for light ETL tasks
• Supported destinations include:
  • S3
  • Redshift
  • Elasticsearch
  • Splunk
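One way Firehose handles light ETL is a Lambda data-transformation function; the slide does not say whether the demo uses one, so treat this as an illustration. The sketch follows the documented contract: each input record has a `recordId` and base64 `data`, and each output record echoes the `recordId` with a `result` of `"Ok"`, `"Dropped"`, or `"ProcessingFailed"`. The `temp_c` and `temp_f` field names are hypothetical.

```python
import base64
import json
from typing import Any, Dict, List


def lambda_handler(event: Dict[str, Any], context: Any = None) -> Dict[str, List[Dict[str, str]]]:
    """Firehose transformation: add a Fahrenheit field to each temperature reading.

    Field names are hypothetical. Every output record must echo its input
    recordId and carry base64-encoded data.
    """
    output = []
    for record in event["records"]:
        try:
            reading = json.loads(base64.b64decode(record["data"]))
            reading["temp_f"] = round(reading["temp_c"] * 9 / 5 + 32, 2)
            # One JSON object per line, so the S3 objects stay readable by Athena.
            line = json.dumps(reading) + "\n"
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(line.encode("utf-8")).decode("ascii"),
            })
        except (ValueError, KeyError, TypeError):
            # Return the original bytes; Firehose writes ProcessingFailed
            # records to an error prefix instead of the normal destination.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}


if __name__ == "__main__":
    raw = json.dumps({"site_id": "plant-a", "temp_c": 20.0}).encode("utf-8")
    event = {"records": [{"recordId": "1", "data": base64.b64encode(raw).decode("ascii")}]}
    result = lambda_handler(event)["records"][0]
    print(result["result"], base64.b64decode(result["data"]).decode("utf-8").strip())
    # prints Ok {"site_id": "plant-a", "temp_c": 20.0, "temp_f": 68.0}
```

Appending a newline per record matters because Firehose concatenates records into one S3 object, and Athena's JSON SerDe expects one object per line.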
COMPONENTS: KINESIS ANALYTICS APPLICATION
• …can process and analyze streaming data using standard SQL. The service enables you to quickly author and run powerful SQL code against streaming sources to perform time series analytics, feed real-time dashboards, and create real-time metrics
• Allows stream input to be combined with other streams, as well as with reference data from S3
• Uses SQL syntax that is relatively intuitive
• Apache Flink can be used as well
• Similar to KSQL in the Confluent Platform
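Combining a stream with reference data from S3 amounts to a join: the Analytics Application loads the S3 object into an in-application reference table and the SQL joins each incoming row against it. The Python below mirrors only the semantics of that join, not the service's API. It assumes a LEFT JOIN (the slide does not say which join the demo uses), a hypothetical `site_id` key, and a hypothetical `altitude_m` reference field.

```python
from typing import Any, Dict, Iterable, Iterator, Sequence

Record = Dict[str, Any]


def enrich(
    stream: Iterable[Record],
    reference: Dict[Any, Record],
    key: str,
    fields: Sequence[str],
) -> Iterator[Record]:
    """Left-join each streaming record with the reference row sharing `key`.

    Records with no matching reference row are still emitted, with the
    reference fields set to None (LEFT JOIN semantics).
    """
    for record in stream:
        ref = reference.get(record.get(key), {})
        enriched = dict(record)  # never mutate the input record
        for field in fields:
            enriched[field] = ref.get(field)
        yield enriched


if __name__ == "__main__":
    # Hypothetical reference data, as if loaded from an object in S3.
    altitude_by_site = {"plant-a": {"altitude_m": 1609}, "plant-b": {"altitude_m": 142}}
    pressure = [
        {"site_id": "plant-a", "value": 83.4},
        {"site_id": "plant-z", "value": 100.9},
    ]
    for row in enrich(pressure, altitude_by_site, "site_id", ["altitude_m"]):
        print(row)
    # prints {'site_id': 'plant-a', 'value': 83.4, 'altitude_m': 1609}
    # prints {'site_id': 'plant-z', 'value': 100.9, 'altitude_m': None}
```

A left join keeps readings from sites missing in the reference file instead of silently dropping them; an inner join would be the choice if unmatched readings should be discarded.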
COMPONENTS: S3 & ATHENA
• …offered by Amazon Web Services (AWS) that provides object storage through a web service interface
• Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run
• Athena tables are configured to read from a location in an S3 bucket
• An Athena table specifies the schema of the files as well as their format:
  • CSV
  • TSV
  • JSON
  • Parquet
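An Athena table is defined with Hive-style `CREATE EXTERNAL TABLE` DDL that names the columns, the file format, and the S3 location. This helper is a sketch that builds such a statement for the four formats listed; the database, table, column, and bucket names in the example are hypothetical. For JSON it uses the OpenX JSON SerDe, which expects one JSON object per line.

```python
from typing import Dict, List, Tuple

# Storage clause per file format, in the Hive DDL dialect Athena uses.
FORMAT_CLAUSES: Dict[str, str] = {
    "csv": "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','",
    "tsv": "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'",
    "json": "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'",  # one object per line
    "parquet": "STORED AS PARQUET",
}


def athena_table_ddl(
    database: str,
    table: str,
    columns: List[Tuple[str, str]],
    fmt: str,
    location: str,
) -> str:
    """Build a CREATE EXTERNAL TABLE statement over files under an S3 prefix."""
    if fmt not in FORMAT_CLAUSES:
        raise ValueError(f"unsupported format: {fmt}")
    if not location.startswith("s3://"):
        raise ValueError("location must be an s3:// URI")
    # Backticks guard column names that collide with reserved words.
    column_lines = ",\n".join(f"  `{name}` {col_type}" for name, col_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"{column_lines}\n"
        ")\n"
        f"{FORMAT_CLAUSES[fmt]}\n"
        f"LOCATION '{location}';"
    )


if __name__ == "__main__":
    print(athena_table_ddl(
        "sensor_lake",
        "pressure_enriched",
        [("site_id", "string"), ("value", "double"), ("altitude_m", "int")],
        "json",
        "s3://example-enriched-bucket/pressure/",
    ))
```

The table stores no data itself: dropping and recreating it only changes how Athena reads the files already sitting under `LOCATION`.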
MANUAL STEPS
• Create Athena tables
• Demo data going through Analytics Applications
• Python data producer
• View data in raw, enriched, and reference buckets
• Dive into the Analytics Application in the console
  • SQL syntax
• Check out the Athena query interface
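A Python data producer mostly comes down to shaping readings into entries for the Kinesis `PutRecords` call, which accepts at most 500 records per request. This sketch generates fake readings and batches them into the `Records` shape boto3's `put_records` expects (`Data` bytes plus `PartitionKey`); the value ranges, units, field names, and the stream name in the comment are hypothetical, and the actual send is left commented out so the sketch runs without AWS credentials.

```python
import json
import random
import time
import uuid
from typing import Any, Dict, Iterator, List

MAX_RECORDS_PER_CALL = 500  # Kinesis PutRecords limit per request


def make_reading(site_id: str, kind: str, rng: random.Random) -> Dict[str, Any]:
    """One fake reading; value ranges and units are hypothetical."""
    if kind == "temperature":
        value = rng.uniform(15.0, 30.0)   # degrees C
    else:
        value = rng.uniform(80.0, 105.0)  # kPa
    return {
        "event_id": str(uuid.uuid4()),  # lets consumers de-duplicate retries
        "site_id": site_id,
        "type": kind,
        "value": round(value, 2),
        "ts": int(time.time() * 1000),
    }


def to_put_records_batches(readings: List[Dict[str, Any]]) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists shaped like boto3 Kinesis put_records(Records=...) entries."""
    entries = [
        # Partitioning by site keeps each site's readings ordered within a shard.
        {"Data": json.dumps(r).encode("utf-8"), "PartitionKey": r["site_id"]}
        for r in readings
    ]
    for start in range(0, len(entries), MAX_RECORDS_PER_CALL):
        yield entries[start:start + MAX_RECORDS_PER_CALL]


if __name__ == "__main__":
    rng = random.Random(42)
    readings = [make_reading(site, kind, rng)
                for site in ("plant-a", "plant-b")
                for kind in ("temperature", "pressure")]
    for batch in to_put_records_batches(readings):
        print(len(batch), "records, first key:", batch[0]["PartitionKey"])
        # prints 4 records, first key: plant-a
        # With boto3 and AWS credentials, the send would be:
        #   boto3.client("kinesis").put_records(StreamName="raw-readings", Records=batch)
        # ("raw-readings" is a hypothetical stream name.)
```

`put_records` can partially succeed, so a real producer should check `FailedRecordCount` in the response and resend only the failed entries.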
CONCLUSION
• …event-driven architectures
• Overview of demo architecture
• Deployment of coded infrastructure
• Overview of AWS components used
• Manual steps to complete setup of Analytics Applications and Athena
• Thank you for coming
• Please talk to me if you have further questions
UPCOMING MEETUPS
• …impl
• Data warehousing with Redshift
• Shared app layer with Elasticsearch
• Redshift/S3 integration using Redshift Spectrum
• Other sweet AWS data things…
• Crowdsourcing ideas welcome!
  • Curiosities
  • Problems
• AppSync Part 2, second Wednesday of May – Austin Loveless
• IoT Core, last Wednesday of June – Kevin Tinn