Slide 1

Slide 1 text

clda.co/jeffconf-hamburg Serverless Data Warehousing & Data Analysis on AWS 2/16/2018

Slide 2

Slide 2 text

About Me: twitter://@alex_casalboni; Computer Science Background; Master in Sound & Music Engineering; Sr. Software Engineer & Web Developer

Slide 3

Slide 3 text

Agenda: Why do you need a DWH?; Warehouses Vs. Lakes; Serverless Architecture; Q & A

Slide 4

Slide 4 text

Why do you need a DWH? #bigdata

Slide 5

Slide 5 text

Data Warehousing goals: Historical data repository; Reporting & DDDM; Data Analysis & ML; Data integration

Slide 6

Slide 6 text

How “Big” is your Data?

Slide 7

Slide 7 text

How “Correct” is your Data?

Slide 8

Slide 8 text

Data-Driven Decision Making

Slide 9

Slide 9 text

Warehouses Vs. Lakes #buzzwordschallenge

Slide 10

Slide 10 text

Warehouses Vs. Lakes. Warehouses: Only structured Data; Rigid & Expensive; Business-Analyst-friendly. Lakes: Literally any kind of Data; Agile & Cheap; Data-Scientists-friendly

Slide 11

Slide 11 text

Hybrid approaches: DWH + Data Lake; Amazon Redshift; Amazon Athena; Redshift Spectrum; Amazon S3

Slide 12

Slide 12 text

Separation of compute and storage: Independent scaling; Storage stays cheap and highly available; Compute scales out only if/when needed; Data sources can be reused

Slide 13

Slide 13 text


Slide 14

Slide 14 text

Serverless Data Ingestion & Data Analytics Architecture #JeffFTW

Slide 15

Slide 15 text

Architecture black box: 1. Submit event/data; 2. Submit query/analysis; 3. Fetch analysis results
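
Steps 2 and 3 of the black box (submit a query, fetch its results) can be sketched as a submit-then-poll loop. This is a minimal sketch, assuming the query engine behind the box is Amazon Athena (named on other slides of this deck) and that `athena` is a boto3 Athena client or any object exposing the same three calls. The function name, the polling interval, and the `max_polls` limit are hypothetical choices made for illustration.

```python
import time


def run_athena_query(athena, sql, database, output_location,
                     poll_seconds=1.0, max_polls=60):
    """Submit a query, wait until it finishes, return rows as lists of strings."""
    # Step 2: submit the query. Athena writes results under output_location.
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Queries run asynchronously, so poll the execution state.
    for _ in range(max_polls):
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"query {query_id} ended in state {state}")
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"query {query_id} did not finish in time")

    # Step 3: fetch the results (first page only in this sketch).
    results = athena.get_query_results(QueryExecutionId=query_id)
    rows = results["ResultSet"]["Rows"]
    # For SELECT statements the first row holds the column names.
    return [[cell.get("VarCharValue") for cell in row["Data"]] for row in rows]
```

With boto3 installed, the client would be created with `boto3.client("athena")`. Larger result sets need pagination, which this sketch leaves out.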

Slide 16

Slide 16 text

Architecture goals: No hourly/monthly costs; No servers to manage; No scale limitations or resize; Possibly anonymous producers; Storage as cheap as possible; Data validation / manipulation; Intuitive data exploration & reporting; Real-time metrics & alerts

Slide 17

Slide 17 text

1. Get Credentials; 2. HTTP POST; 3. Put Records; 4. Filter / Manipulate; 5. Compress & Encrypt; 6. Query; 7. SPICE Import; 8. Analyse; 9. Sliding SQL; 10. Process aggregates; 11. Update Realtime Metrics
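
Step 4 (Filter / Manipulate) could look like the sketch below. It assumes the step runs as an AWS Lambda data-transformation function attached to a Kinesis Data Firehose delivery stream, which the slide does not state explicitly. The handler follows the Firehose transformation contract: each incoming record carries a `recordId` and base64-encoded `data`, and each returned record carries the same `recordId`, a `result` of `Ok`, `Dropped`, or `ProcessingFailed`, and base64-encoded `data`. The required fields and the lower-casing rule are hypothetical, chosen only to illustrate filtering and manipulation.

```python
import base64
import json

# Hypothetical schema: fields every event must carry to be kept.
REQUIRED_FIELDS = ("event_type", "timestamp")


def handler(event, context):
    """Validate and normalize each record; drop or fail the ones that don't fit."""
    output = []
    for record in event["records"]:
        record_id = record["recordId"]
        try:
            payload = json.loads(base64.b64decode(record["data"]))
        except ValueError:
            # Not valid base64/UTF-8/JSON: report the record as failed, unchanged.
            output.append({"recordId": record_id,
                           "result": "ProcessingFailed",
                           "data": record["data"]})
            continue

        # Filter: silently drop events that miss a required field.
        if not isinstance(payload, dict) or any(
                field not in payload for field in REQUIRED_FIELDS):
            output.append({"recordId": record_id,
                           "result": "Dropped",
                           "data": record["data"]})
            continue

        # Manipulate: normalize a value and emit newline-delimited JSON,
        # so the stored objects hold one event per line.
        payload["event_type"] = str(payload["event_type"]).lower()
        data = (json.dumps(payload) + "\n").encode("utf-8")
        output.append({"recordId": record_id,
                       "result": "Ok",
                       "data": base64.b64encode(data).decode("ascii")})
    return {"records": output}
```

Writing one JSON object per line keeps the stored files easy to query line by line later in the pipeline.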

Slide 18

Slide 18 text

Gotchas: Kinesis Data Analytics & Streams are not 100% serverless; API Gateway isn’t cheap (directly using PutRecords might help); Don’t forget Athena Partitions to reduce cost and latency; AWS Glue is your friend for ETL and schema discovery
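
The Athena partitions gotcha can be illustrated with a small helper. This is a minimal sketch, assuming events are stored in S3 under Hive-style `year=/month=/day=` prefixes and that the table is partitioned on those three columns. The table, bucket, and prefix names are hypothetical and do not come from the talk.

```python
from datetime import datetime


def partition_prefix(base_prefix, ts):
    """Return the Hive-style S3 key prefix for the day that contains ts."""
    return f"{base_prefix.rstrip('/')}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/"


def add_partition_sql(table, bucket, base_prefix, ts):
    """Build the Athena DDL statement that registers one daily partition."""
    location = f"s3://{bucket}/{partition_prefix(base_prefix, ts)}"
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (year='{ts:%Y}', month='{ts:%m}', day='{ts:%d}') "
        f"LOCATION '{location}'"
    )


if __name__ == "__main__":
    day = datetime(2018, 2, 16)
    # Hypothetical names, for illustration only.
    print(partition_prefix("events", day))
    print(add_partition_sql("events", "my-data-lake", "events", day))
```

Once partitions are registered, a query that filters on them (for example `WHERE year = '2018' AND month = '02'`) only scans the matching prefixes, which is where the cost and latency savings come from.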

Slide 19

Slide 19 text

Deploy it with AWS SAM! github.com/alexcasalboni/serverless-data-pipeline-sam

Slide 20

Slide 20 text


Slide 21

Slide 21 text

Thank you very much :) Q & A 2/16/2018