Introduction to AWS Lake Formation

Ricardo Sueiras
November 14, 2019

This is a level 200 deck that introduces the concept of data lakes and shows how AWS Lake Formation makes our customers' lives easier by simplifying the steps to set up, secure, and use their business data.

Transcript

  1. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

    Agenda: Why data lakes? What is hard about building data lakes? What is AWS Lake Formation? How it works!
  2. Decision making used to revolve around the Enterprise Data Warehouse

    Diagram labels: OLTP, ERP, CRM, LOB; Enterprise Data Warehouse; Business Intelligence.
  3. Data no longer fits

    There is more data than people think: data grows >10x every 5 years. Data platforms need to live for 15 years and scale 1,000x. Data is more diverse.
  4. Broader workloads

    There are more people accessing data: data scientists, analysts, business users, and applications. They want to analyze it in different ways: machine learning, SQL analytics, scientific, real-time, and streaming. And there are more rules around data use.
  5. Data lake: The new information hub

    A centralized, secure repository that enables you to govern, discover, share, and analyze structured and unstructured data at any scale.
  6. Agenda

    Why data lakes? What is hard about building data lakes? What is AWS Lake Formation? How it works!
  7. Typical steps of building a data lake

    1) Set up storage 2) Move data 3) Cleanse, prep, and catalog data 4) Configure and enforce security and compliance policies 5) Make data available for analytics

    Roles: Data Engineer (ingestion & cleaning), Data Security Officer (security), Data Analyst (analytics & ML).
  8. Manually building secure data lakes is hard
  9. Sample of steps required

    Find sources; create Amazon Simple Storage Service (Amazon S3) locations; configure access policies; map tables to Amazon S3 locations; ETL jobs to load and clean data; create metadata access policies; configure access from analytics services. Rinse and repeat for other data sets, users, and end-services.

    And more: manage and monitor ETL jobs; update the metadata catalog as data changes; update policies across services as users and permissions change; manually maintain cleansing scripts; create audit processes for compliance; …

    Manual | Error-prone | Time consuming
  10. Agenda

    Why data lakes? What is hard about building data lakes? What is AWS Lake Formation? How it works!
  11. Lake Formation lets you build secure data lakes in days
  12. Built on Amazon S3, a robust data lake infrastructure

    Amazon S3 data lake storage: cost-effective, durable storage with global replication capabilities.
  13. Automates manual, repetitive, low-value tasks

    Amazon S3 data lake storage: cost-effective, durable storage with global replication capabilities.

    Lake Formation (AWS Glue, blueprints, ML transforms): simplified ingest & cleaning enables data engineers to build faster.
  14. Provides a central locus of control

    Amazon S3 data lake storage: cost-effective, durable storage with global replication capabilities.

    Lake Formation (AWS Glue, blueprints, ML transforms): simplified ingest & cleaning enables data engineers to build faster.

    Lake Formation (Data Catalog, access control): centralized management of fine-grained permissions empowers security officers.
  15. Enables all your data users

    Amazon S3 data lake storage: cost-effective, durable storage with global replication capabilities.

    Lake Formation (AWS Glue, blueprints, ML transforms): simplified ingest & cleaning enables data engineers to build faster.

    Lake Formation (Data Catalog, access control): centralized management of fine-grained permissions empowers security officers.

    Amazon Athena, Amazon QuickSight, Amazon Redshift, Amazon SageMaker, Amazon EMR: a comprehensive set of integrated tools enables every user equally.
  16. Fastest way to build secure data lakes

    Enables all your users to run any analytics workload, at any scale, in a secure and cost-effective manner.

    Diagram labels: Amazon S3 data lake storage; Lake Formation (AWS Glue, blueprints, ML transforms, Data Catalog, access control); Amazon Athena, Amazon QuickSight, Amazon Redshift, Amazon SageMaker, Amazon EMR.
  17. Tools that enable data engineers, security officers & data analysts
  18. Building data lakes with Lake Formation

    Ingestion & cleaning (Data Engineer): serverless Spark, blueprints, ML transforms.

    Security (Data Security Officer): data catalog, centralized permissions, real-time monitoring, auditing.

    Analytics & ML (Data Analyst): comprehensive portfolio of integrated tools (Redshift, Glue, EMR, Athena).
  19. AWS Lake Formation is fully integrated w/ AWS Glue

    Diagram labels: Blueprints; Glue ETL Jobs; Workflow; Glue Crawlers; Glue Data Catalog; Connections, Databases, Tables; Monitoring; Security, search, collaboration; AWS Glue; AWS Lake Formation.
  20. Easily load data into your data lake w/ blueprints

    Prebuilt templates to serve common ingestion use cases. Automatically build AWS Glue workflows. AWS Glue jobs and crawlers discover, transform, and structure data. Load data incrementally or in full. Automatically populate the Data Catalog.

    Diagram labels: Logs, DBs; Amazon CloudFront, Elastic Load Balancing, Amazon RDS, Amazon Aurora, AWS CloudTrail; AWS Glue Workflows.
  21. Blueprints create AWS Glue workflows
  22. With blueprints

    You: point to the data source; specify the data lake location; specify the data load frequency.

    Blueprints: discover source table(s) schema; convert to target data format; partition data automatically; track data that was already processed.

    Customize to your needs.
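
Because blueprints produce AWS Glue workflows, a generated workflow can in principle be started and watched through the Glue API. The following is a minimal sketch under that assumption, not part of the deck: `glue_client` is expected to be something like `boto3.client("glue")`, and the workflow name and helper name `run_workflow` are hypothetical.

```python
import time

# Workflow run states after which polling can stop.
TERMINAL_STATES = {"COMPLETED", "STOPPED", "ERROR"}


def run_workflow(glue_client, workflow_name, poll_seconds=30.0, max_polls=120):
    """Start a Glue workflow run and poll until it reaches a terminal state.

    `workflow_name` is a hypothetical name for a workflow that a blueprint
    created; look up the real name in your own account.
    """
    run_id = glue_client.start_workflow_run(Name=workflow_name)["RunId"]
    for _ in range(max_polls):
        run = glue_client.get_workflow_run(Name=workflow_name, RunId=run_id)["Run"]
        if run["Status"] in TERMINAL_STATES:
            return run["Status"]
        time.sleep(poll_seconds)
    raise TimeoutError(f"workflow {workflow_name!r} did not finish in time")
```

A blueprint normally schedules its own workflow according to the load frequency you specify, so an explicit start like this is mostly useful for on-demand runs and testing.
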
  23. Leverage machine learning to solve hard problems

    Record matching: finding the relationships between multiple datasets, even when those datasets do not share an identifier (or when their identifier is unreliable).

    Deduplication: transforming a dataset that has multiple rows referring to the same actual thing into a dataset where no two rows refer to the same actual thing.

    ML transform: FindMatches.
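
As an illustration of how a FindMatches transform might be defined programmatically, here is a minimal sketch that builds the request for the AWS Glue `create_ml_transform` call. The helper name, database, table, key column, and role ARN are all hypothetical; the deck itself does not show any API usage.

```python
def find_matches_request(name, database, table, primary_key, role_arn,
                         precision_recall=0.5):
    """Build keyword arguments for glue_client.create_ml_transform.

    All argument values are placeholders supplied by the caller; nothing
    here is taken from the deck.
    """
    if not 0.0 <= precision_recall <= 1.0:
        raise ValueError("precision_recall must be between 0.0 and 1.0")
    return {
        "Name": name,
        # Catalog table whose rows should be matched or deduplicated.
        "InputRecordTables": [{"DatabaseName": database, "TableName": table}],
        "Parameters": {
            "TransformType": "FIND_MATCHES",
            "FindMatchesParameters": {
                "PrimaryKeyColumnName": primary_key,
                # Closer to 1.0 favors precision, closer to 0.0 favors recall.
                "PrecisionRecallTradeoff": precision_recall,
            },
        },
        "Role": role_arn,
    }


# Usage, assuming glue_client = boto3.client("glue"):
#   glue_client.create_ml_transform(**find_matches_request(...))
```
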
  24. Securing data lakes with Lake Formation

    Ingestion & cleaning (Data Engineer): serverless Spark, AWS Glue, Glue ML transformations, blueprints.

    Security (Data Security Officer): data catalog, centralized permissions, real-time monitoring, integrated auditing.

    Analytics & ML (Data Analyst): comprehensive portfolio of integrated tools (Redshift, Glue, EMR, Athena).
  25. Data Catalog & Permissions

    Permissions are set on Data Catalog objects. Lake Formation & AWS Glue use the same Data Catalog. You can choose between the Glue and the Lake Formation permissions systems. For backwards compatibility, the default settings enable the Glue permissions system. Existing Glue crawlers, jobs, triggers, and workflows will not change. Existing access to Glue resources will still be governed by IAM & S3 policies.

    Diagram labels: Data Catalog, ETL Jobs, Access Control, Crawlers, Workflows.
  26. Upgrading to the Lake Formation permissions model

    Not using the Glue Catalog? Change the default settings to start using the Lake Formation permissions system.

    Using the Glue Catalog? Explicitly upgrade each data location, database, and table when ready: 1) Understand existing policies / access / usage 2) Configure corresponding Lake Formation policies 3) Remove the Glue permissions system by changing the default settings 4) Turn on the Lake Formation permissions system by registering the location
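
The "change the default settings" step can be sketched with the Lake Formation data lake settings API. This is a minimal sketch, assuming `client` is something like `boto3.client("lakeformation")` and that clearing the default create-database and create-table permissions is the settings change the slide refers to; the helper names are hypothetical.

```python
from copy import deepcopy


def without_default_iam_permissions(settings):
    """Return a copy of the data lake settings with default permissions cleared.

    With these two lists empty, newly created databases and tables no longer
    receive the backwards-compatible default grant.
    """
    updated = deepcopy(settings)
    updated["CreateDatabaseDefaultPermissions"] = []
    updated["CreateTableDefaultPermissions"] = []
    return updated


def switch_to_lake_formation_permissions(client):
    """Read the current settings, clear the defaults, and write them back.

    The put call replaces the whole settings object, so the current settings
    are read first to preserve fields such as the data lake administrators.
    """
    current = client.get_data_lake_settings()["DataLakeSettings"]
    updated = without_default_iam_permissions(current)
    client.put_data_lake_settings(DataLakeSettings=updated)
    return updated
```

Existing databases and tables keep whatever grants they already have, which is why the slide calls for upgrading each location, database, and table explicitly.
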
  27. Centralized permissions

    Diagram labels: Data Catalog, Access Control; Lake Formation; Amazon S3 data lake storage; Redshift, Glue, EMR, Athena; Data Security Officer, Data Analyst.
  28. Security – Deep dive

    Diagram labels: User (IAM users, roles, and Active Directory); Amazon S3; Lake Formation; Redshift, Glue, EMR, Athena; Data Analyst.
  29. Security permissions in Lake Formation

    Control data access with simple grant and revoke permissions. Specify permissions on tables and columns rather than on buckets and objects. Easily view permissions granted to a particular user. Audit all data access in one place.

    Diagram labels: User 1, User 2.
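
The grant/revoke model on tables and columns can be sketched as follows. This is a minimal sketch, assuming `client` is something like `boto3.client("lakeformation")`; the principal ARN, database, table, and column names are hypothetical, as are the helper names.

```python
def column_grant_request(principal_arn, database, table, columns,
                         permissions=("SELECT",)):
    """Build keyword arguments shared by grant_permissions and revoke_permissions.

    The resource is a catalog table restricted to specific columns, not an
    S3 bucket or object.
    """
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": list(permissions),
    }


def grant(client, request):
    """Grant the permissions described by `request`."""
    client.grant_permissions(**request)


def revoke(client, request):
    """Revoke the permissions described by `request`."""
    client.revoke_permissions(**request)


def permissions_for(client, principal_arn):
    """List the permissions currently granted to one principal (first page only)."""
    response = client.list_permissions(
        Principal={"DataLakePrincipalIdentifier": principal_arn}
    )
    return response["PrincipalResourcePermissions"]
```

Building the request once and passing it to either `grant` or `revoke` keeps the two operations symmetric, which mirrors the "simple grant and revoke" wording on the slide.
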
  30. Data catalog and metadata management

    Text-based search across all metadata. Add attributes like data owners, stewards, and others as table properties. Add data sensitivity level, column definitions, and others as column properties.

    Screen labels: Text-based search and filtering; Query data in Amazon Athena.
  31. Audit and monitor in real time

    See detailed activity in the console. Analyze audit logs in CloudTrail using Amazon Athena. Data ingest and catalog notifications are also published to Amazon CloudWatch Events.
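
Analyzing CloudTrail audit logs with Athena could look roughly like the query built below. This is a minimal sketch, assuming a CloudTrail table has already been defined in Athena with the usual column names (`eventtime`, `eventsource`, `eventname`, `useridentity`); the table name `cloudtrail_logs` and the helper name are hypothetical.

```python
import re


def lake_formation_audit_query(table, limit=50):
    """Return an Athena SQL string listing recent Lake Formation API events.

    `table` is the name of a CloudTrail table that already exists in Athena.
    """
    # Accept only a plain identifier so the name is safe to interpolate.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"unexpected table name: {table!r}")
    return (
        "SELECT eventtime, eventname, useridentity.arn AS caller_arn\n"
        f"FROM {table}\n"
        "WHERE eventsource = 'lakeformation.amazonaws.com'\n"
        "ORDER BY eventtime DESC\n"
        f"LIMIT {int(limit)}"
    )


print(lake_formation_audit_query("cloudtrail_logs", limit=10))
```
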
  32. Accessing data lakes with Lake Formation

    Ingestion & cleaning (Data Engineer): serverless Spark, AWS Glue, Glue ML transformations, blueprints.

    Security (Data Security Officer): data catalog, centralized permissions, real-time monitoring, auditing.

    Analytics & ML (Data Analyst): comprehensive portfolio of integrated tools (Redshift, Glue, EMR, Athena).
  33. Comprehensive portfolio of integrated tools

    Compliant services honor Lake Formation permissions. They guarantee that users only see tables & columns they have access to. All access is logged and auditable.

    Services: Amazon Redshift, AWS Glue, Amazon EMR, Amazon Athena.
  34. Agenda

    Why data lakes? Why choose AWS for data lakes? What is AWS Lake Formation? How it works!
  35. Step 1: Register S3 path as data lake location (Data Engineer)
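
Step 1 can also be done through the API rather than the console. This is a minimal sketch, assuming `client` is something like `boto3.client("lakeformation")` and that the service-linked role is acceptable for the location; the bucket and prefix values and the helper names are hypothetical.

```python
def register_request(bucket, prefix=""):
    """Build keyword arguments for register_resource on an S3 path."""
    cleaned = prefix.strip("/")
    path = f"{bucket}/{cleaned}" if cleaned else bucket
    return {
        "ResourceArn": f"arn:aws:s3:::{path}",
        # Let Lake Formation use its service-linked role to access the path.
        "UseServiceLinkedRole": True,
    }


def register_location(client, bucket, prefix=""):
    """Register the S3 path as a data lake location."""
    client.register_resource(**register_request(bucket, prefix))
```
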
  36. Step 2: Load data with blueprints (Data Engineer)
  37. Step 3: Grant permissions to users
  38. Step 4: Query data with compatible services (Data Analyst)
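
With Amazon Athena as the compatible service, step 4 might be sketched like this. It assumes `athena_client` is something like `boto3.client("athena")`; the SQL, database name, S3 output location, and helper names are hypothetical.

```python
def athena_query_request(sql, database, output_location):
    """Build keyword arguments for start_query_execution."""
    return {
        "QueryString": sql,
        # Data Catalog database the query runs against.
        "QueryExecutionContext": {"Database": database},
        # S3 location where Athena writes the query results.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(athena_client, sql, database, output_location):
    """Submit the query and return its execution ID for later polling."""
    response = athena_client.start_query_execution(
        **athena_query_request(sql, database, output_location)
    )
    return response["QueryExecutionId"]
```

The caller's credentials determine what the query can read: per the deck, compliant services such as Athena only expose the tables and columns the user has been granted.
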
  39. Step 5: Audit and monitor in real time
  40. AWS Lake Formation Pricing

    No additional charges: only pay for the underlying services used.