Data platform example architectures

Bhuvanesh

April 02, 2020

Transcript

  1. Agenda
    1. Introduction - Why do we need to think beyond Data Lakes?
    2. Driving automation and insights utilizing AWS data services
    3. Best Practices for Data Architecture Implementation
    4. Case Studies and Outcomes Delivered by Searce
    5. Q&A
  2. Know your data
    Categories: Structured, Unstructured, Semi-structured
    Examples:
    • CRM
    • ERP
    • SQL Databases
    • Log Files
    • Image files
    • Calls
    • Mobile Data
    • IoT Sensors
    • Social media data
  3. Data Life Cycle in AWS Data Platform
    • Data Ingestion: Get your data into S3 securely
    • Data Catalog: Access & search metadata
    • Process & Analytics: Get insights from your data
    Services shown: Kinesis, SFTP, DMS, Snowball, Direct Connect, MSK, DynamoDB, Elasticsearch, Glue Catalog, Glue, EMR, Redshift, Athena, QuickSight
  4. Data Lake or Data Warehouse
    |                 | Data Lake                                      | Data Warehouse                           |
    |-----------------|------------------------------------------------|------------------------------------------|
    | PROCESSING      | Schema on read                                 | Schema on write                          |
    | DATA            | Structured, Semi-structured, Unstructured, Raw | Structured and Processed                 |
    | STORAGE         | Designed for low-cost storage                  | Expensive for large data volumes         |
    | DATA PROCESSING | Helps for fast ingestion of new data           | Time-consuming to introduce new content  |
    | USERS           | Data Scientists, etc.                          | Business Professionals                   |
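The PROCESSING row above (schema on read versus schema on write) can be illustrated with a minimal in-memory sketch. The `SCHEMA` dictionary, the `Lake` and `Warehouse` classes, and the sample records are all hypothetical names invented for this illustration; they do not correspond to any AWS API.

```python
import json

# Hypothetical schema, used only for this illustration.
SCHEMA = {"order_id": int, "amount": float}


def conform(record: dict) -> dict:
    """Coerce a raw record to SCHEMA; raises if a field is missing or invalid."""
    return {name: cast(record[name]) for name, cast in SCHEMA.items()}


class Warehouse:
    """Schema on write: every record is validated before it is stored."""

    def __init__(self):
        self.rows = []

    def write(self, raw_line: str) -> None:
        # A record that does not fit the schema is rejected at ingestion time.
        self.rows.append(conform(json.loads(raw_line)))

    def read(self) -> list:
        return list(self.rows)


class Lake:
    """Schema on read: raw lines are stored as-is and interpreted at query time."""

    def __init__(self):
        self.objects = []

    def write(self, raw_line: str) -> None:
        # Ingestion never fails on content; nothing is parsed here.
        self.objects.append(raw_line)

    def read(self) -> list:
        rows = []
        for line in self.objects:
            try:
                rows.append(conform(json.loads(line)))
            except (KeyError, ValueError, TypeError):
                continue  # skip records that do not fit this particular schema
        return rows
```

In this sketch the lake accepts any line quickly and defers the cost of interpretation to each reader, while the warehouse pays that cost once at load time, which is the trade-off the table describes.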
  5. AWS Data Lake Infrastructure
    1. Highly durable & unlimited storage
    2. Support for open file formats
    3. Easy integration with other AWS services
    4. Secure, compliant & auditable
    5. Decoupled storage and compute
  6. ETL for Analytics
    • RDS - Source
    • Glue - ETL
    • S3 - Storage
    • Athena - Interactive query service
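One detail the slide above leaves implicit is how ETL output might be laid out in S3 so that Athena can query it efficiently. A common convention is Hive-style `key=value` path segments, which Athena can treat as partitions. The sketch below only builds such prefixes in plain Python; the table name, the `order_date` field, and the year/month/day scheme are assumptions made for illustration and are not taken from the deck.

```python
from collections import defaultdict
from datetime import date


def partition_prefix(table: str, day: date) -> str:
    """Build a Hive-style partition prefix for one day of data."""
    return f"{table}/year={day.year}/month={day.month:02d}/day={day.day:02d}/"


def group_by_partition(table: str, rows: list) -> dict:
    """Group rows by the prefix they would be written under.

    Each row is assumed to carry a hypothetical `order_date` field of type date.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[partition_prefix(table, row["order_date"])].append(row)
    return dict(groups)


rows = [
    {"order_id": 1, "order_date": date(2020, 4, 1)},
    {"order_id": 2, "order_date": date(2020, 4, 2)},
    {"order_id": 3, "order_date": date(2020, 4, 2)},
]
for prefix, batch in group_by_partition("orders", rows).items():
    print(prefix, len(batch))
```

A query that filters on the partition columns then only needs to scan the objects under the matching prefixes.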
  7. Streaming Data Solutions with Amazon Kinesis
    Components:
    • Kinesis Data Streams
    • Kinesis Firehose
    • Kinesis Analytics
    • Lambda
    • DynamoDB
    • SNS
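For the Kinesis Data Streams component above, it helps to see how a record's partition key decides which shard receives it: Kinesis maps the key through MD5 to a 128-bit integer and picks the shard whose hash key range contains that value. The sketch below reproduces that mapping locally, assuming a stream whose shards split the hash space evenly; the function names and the example key are hypothetical, and no AWS call is made.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces 128-bit values


def hash_key(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer via MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def even_shard_ranges(shard_count: int) -> list:
    """Assumption: shards divide the hash space into equal, contiguous ranges."""
    size = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key: str, ranges: list) -> int:
    """Return the index of the shard whose range contains the key's hash."""
    value = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= value <= end:
            return index
    raise ValueError("hash value outside all shard ranges")


ranges = even_shard_ranges(4)
print(shard_for("user-42", ranges))
```

Because the mapping is deterministic, all records with the same partition key land on the same shard and stay ordered relative to each other, which is why a key such as a user or session ID is a typical choice for clickstream data.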
  8. Streaming Relational Database Solution - CDC
    Components:
    • RDS MySQL
    • Debezium Connector
    • Amazon MSK
    • S3
    • Elasticsearch
    • EMR
    • Redshift
    • Consumer App
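To make the CDC flow above more concrete, here is a minimal sketch of what a consumer app might do with change events. Debezium change events carry an `op` code (`c` create, `u` update, `d` delete, `r` snapshot read) together with `before` and `after` row images; the sketch replays such events into an in-memory dictionary. The `id` primary-key column and the sample rows are assumptions made for illustration, and reading from Kafka/MSK is deliberately left out.

```python
def apply_change(table: dict, event: dict, key: str = "id") -> None:
    """Apply one Debezium-style change event to an in-memory replica.

    `table` maps primary-key values to the latest row image.
    `key` is the (assumed) primary-key column name.
    """
    op = event["op"]
    if op in ("c", "r", "u"):
        # Create, snapshot read, and update all carry the new row in `after`.
        row = event["after"]
        table[row[key]] = row
    elif op == "d":
        # Deletes carry the removed row in `before`.
        table.pop(event["before"][key], None)
    else:
        raise ValueError(f"unsupported op: {op!r}")


# Hypothetical event stream for a `customers` table.
events = [
    {"op": "c", "before": None, "after": {"id": 1, "city": "Pune"}},
    {"op": "c", "before": None, "after": {"id": 2, "city": "Delhi"}},
    {"op": "u", "before": {"id": 1, "city": "Pune"}, "after": {"id": 1, "city": "Mumbai"}},
    {"op": "d", "before": {"id": 2, "city": "Delhi"}, "after": None},
]

replica = {}
for event in events:
    apply_change(replica, event)
print(replica)
```

Replaying events in order yields the current state of the source table, which is the basic idea behind keeping S3, Elasticsearch, or Redshift copies in sync with the database.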
  9. Data Lake Lifecycle
    1. Extract
    2. Transform & Process
    3. Data Lake (Storage)
    4. Visualization
    5. AI/ML
    6. Security
  10. Data Governance
    “Data governance is the formal orchestration of people, processes, and technology that enables an organization to leverage data as an enterprise asset.”
    Data Governance on AWS:
    • De-identified data lake
    • Data matching
    • Data transformation
    • Data catalog
    • Analytics and data processing
    • Monitoring
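The first two governance items above (a de-identified data lake that still supports data matching) can be sketched with keyed hashing: identifying fields are replaced by HMAC-SHA256 tokens, so the same input always produces the same token and records can still be joined without exposing the original value. This is one possible technique, not necessarily the one the speakers used; the field names and the hard-coded secret are hypothetical, and a real deployment would fetch the key from a secrets store.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret: bytes) -> str:
    """Replace a value with a keyed, deterministic SHA-256 token."""
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()


def de_identify(record: dict, pii_fields: set, secret: bytes) -> dict:
    """Return a copy of `record` with the listed PII fields tokenized."""
    return {
        field: pseudonymize(str(value), secret) if field in pii_fields else value
        for field, value in record.items()
    }


# Hypothetical key and records, for illustration only.
SECRET = b"replace-with-a-managed-key"
order = {"email": "asha@example.com", "amount": 499}
ticket = {"email": "asha@example.com", "issue": "late delivery"}

safe_order = de_identify(order, {"email"}, SECRET)
safe_ticket = de_identify(ticket, {"email"}, SECRET)

# The two datasets can still be matched on the token.
print(safe_order["email"] == safe_ticket["email"])
```

Using a keyed hash rather than a plain hash means someone without the key cannot rebuild the tokens by hashing a list of known email addresses.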
  11. Where are you in your Data Journey?
    Ecommerce or Retail - Real-time Analytics
    • Real-time clickstream data
    • Use ML for recommendation engine
    Services: 1. Kinesis 2. SageMaker 3. DynamoDB
    Digital Native already on Cloud - Cost optimization
    • Move complex ETL workloads to Big Data clusters
    • Move large volumes of cold data to the Data Lake
    Services: 1. Redshift 2. EMR 3. Glue 4. S3 5. Athena 6. Spectrum
    Traditional Enterprise or DNB - DW/DL - Security
    • Move your Glue catalog and Athena to Lake Formation
    • Control database/storage level access with AWS Lake Formation
    Services: 1. Lake Formation 2. IAM 3. KMS
  12. Best Practices for Data Architecture
    Speaker: Wei Chung Low, Sr. Specialist Partner Solution Architect, Big Data and Analytics, Amazon Web Services
  13. Case Studies | AWS | FlowerAura
    FlowerAura is an online flower store that delivers the best quality fresh cut flowers in more than 220 cities across India using a strong affiliate network and channel stores.
    Industry: E-Commerce
    Workload: AWS S3, Redshift, DynamoDB, QuickSight, SageMaker, AWS Glue/Kinesis
    Challenge
    Needed reliable Data Lake solutions to:
    • Collect and process POS as well as website/mobile application data
    • Support analytics-based services for a deeper understanding of purchasing behaviors
    • Help customers and visitors make better decisions while purchasing
    Solution
    • Built a Data Lake that collected real-time data from the existing data sources and used AWS Glue to perform ETL on the collected data
    • Trained a model on the transformed data using SageMaker, which provided recommendations to customers based on their browsing and purchasing history
    Business Impact
    • A single source of truth with all data sources in one repository
    • The recommendation engine presented users with choices of items based on their selections and the list of available items
    • It led to upsell, higher offtake, greater retention of existing customers, and lower advertising costs
  14. Case Studies | AWS | Britannia
    Britannia Industries Limited is one of the oldest existing Indian food-products corporations.
    Industry: FMCG
    Workload: S3, Redshift, VPC, EC2, SageMaker
    Challenge
    • Manual processes for ETL and consolidating data took 3-4 days, and scale was a big bottleneck
    • Fulfillment for 18,000+ stores all across India by analyzing the purchase behavior of customers, to help Britannia identify the demand-supply pattern and keep up to date with the SKUs
    Solution
    • Provisioned infrastructure on AWS: VPC, ETL instances, Redshift, a processing server, and a server to host Tableau
    • Established a Site-to-Site VPN
    • Initiated a one-time dump of the data from the on-premises SQL Server to S3
    • Authored ETL jobs to load data from multiple on-premises sources into AWS S3, and helped the Emisha team connect Tableau Server to Redshift
    • Deployed and served an ML model
    Business Impact
    • The manual process of consolidating data for analytics and predictions on customers' buying patterns took 3-4 days; it is now replaced by a real-time dashboard, which makes tracking and management easy
  15. Generative Designs: Data meets AI, meets creativity
    For the world’s 5th largest watchmaker, Searce created a Deep Learning model capable of generating watch designs based on input parameters. These parameters included:
    • Band Color
    • Dial Color
    • Gender
    • Dial Size
    • Band Material