
Serverless Data Warehousing & Data Analysis on AWS

Alex Casalboni
February 16, 2018

Data science teams need to reduce storage and maintenance costs, and at the same time provide analytics tools for data analysts and scientists.
How can we make data collection and data analysis exciting, performant, and cost-effective in the Cloud?
Alex will connect the dots around the data processing building blocks provided by AWS, without managing any servers!

Transcript

  1. About Me: twitter://@alex_casalboni; Computer Science Background; Master in Sound & Music Engineering; Sr. Software Engineer & Web Developer. clda.co/jeffconf-hamburg
  2. Agenda: Why do you need a DWH? Warehouses Vs. Lakes; Serverless Architecture; Q & A
  3. Warehouses Vs. Lakes. Warehouses: Only structured Data; Rigid & Expensive; Business-Analyst-friendly. Lakes: Literally any kind of Data; Agile & Cheap; Data-Scientists-friendly
  4. Separation of compute and storage: Independent scaling; Storage stays cheap and highly available; Compute scales out only if/when needed; Data sources can be reused
  5. Architecture goals: No hourly/monthly costs; No servers to manage; No scale limitations or resize; Possibly anonymous producers; Storage as cheap as possible; Data validation / manipulation; Intuitive data exploration & reporting; Real-time metrics & alerts
  6. 1. Get Credentials; 2. HTTP POST; 3. Put Records; 4. Filter / Manipulate; 5. Compress & Encrypt; 6. Query; 7. SPICE Import; 8. Analyse; 9. Sliding SQL; 10. Process aggregates; 11. Update Realtime Metrics
  7. Gotchas: Kinesis Data Analytics & Streams are not 100% serverless; API Gateway isn’t cheap (directly using PutRecords might help); Don’t forget Athena Partitions to reduce cost and latency; AWS Glue is your friend for ETL and schema discovery
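
Two of the gotchas above (calling PutRecords directly, and Athena partitions) can be sketched in a few lines of Python. This is a minimal illustration and not code from the talk: the `user_id` partition key field, the `events` table and prefix, the `my-bucket` bucket, and the `year=/month=/day=` layout are all assumptions chosen for the example. The `send` function expects a boto3 Kinesis client to be passed in; the 500-records-per-request limit is the documented PutRecords limit.

```python
import json
from datetime import datetime, timezone

MAX_RECORDS_PER_CALL = 500  # PutRecords accepts at most 500 records per request


def to_put_records_batches(events, partition_key_field="user_id"):
    """Yield lists of PutRecords entries, at most 500 entries per list.

    `partition_key_field` is a hypothetical field name used for this sketch.
    """
    batch = []
    for event in events:
        batch.append({
            "Data": json.dumps(event).encode("utf-8"),
            "PartitionKey": str(event[partition_key_field]),
        })
        if len(batch) == MAX_RECORDS_PER_CALL:
            yield batch
            batch = []
    if batch:
        yield batch


def send(kinesis_client, stream_name, events):
    """Send events through a boto3 Kinesis client supplied by the caller.

    A real producer should also inspect FailedRecordCount in each response
    and retry the failed records; that is omitted here for brevity.
    """
    for batch in to_put_records_batches(events):
        kinesis_client.put_records(StreamName=stream_name, Records=batch)


def partition_prefix(base, ts):
    """Build a Hive-style S3 prefix (assumed layout: year=/month=/day=)."""
    return f"{base}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/"


def add_partition_sql(table, bucket, base, ts):
    """Build the Athena DDL statement that registers one daily partition."""
    location = f"s3://{bucket}/{partition_prefix(base, ts)}"
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (year='{ts:%Y}', month='{ts:%m}', day='{ts:%d}') "
        f"LOCATION '{location}'"
    )


if __name__ == "__main__":
    day = datetime(2018, 2, 16, tzinfo=timezone.utc)
    print(add_partition_sql("events", "my-bucket", "events", day))
```

With partitions registered this way, a query that filters on `year`, `month`, and `day` only scans the matching S3 prefixes, which is where the cost and latency savings come from.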