What is the Modern Data Stack?

Marketing OGZ
September 20, 2022

Transcript

  1. What is the Modern Data Stack? What is it, how did we get here, and where is it going?
  2. What is the Modern Data Stack?
    Frank Knot, Solution Architect, Fivetran. 23 August 2022.
  3. Agenda: What is it, how did we get here, and where is it going?
    1. How strides in technology enabled the current state of data platforms
    2. The essential components & features that make up a modern data stack
    3. What data teams need to know to take advantage of emerging trends
  4. The modern data stack.
    Every department & individual wants to be data-driven, and teams are using hundreds of tools to do so. The most important difference between a modern data stack and a legacy data stack is that the modern data stack is hosted in the cloud, requires little technical configuration, and lowers the barrier to entry for more users.
  5. The History of the MDS
    • Pre-2012: Business data scattered across siloed databases, machine-generated data relied on Hadoop, analytics lived in local Excel spreadsheets.
    • 2012: Amazon Redshift is released. This is regarded as the historical event that caused a major shift.
    • 2012 - 2016: Components of the modern data stack built and strengthened their features; innovation is king.
    • 2022: The modern data stack is becoming, and setting, the standard in data & analytics.
  6. ETL is standard. More data lives in more places and needs to get to other places, now including the cloud.
    • Apps: cloud-based
    • Databases: cloud-based and on-prem
    • Customized scripts tailored to each data source to extract and transform data
    • Cloud-native destinations with limited support for processing to transform data
    • Reporting for departmental executives
  7. Welcome, ELT. Rapid expansion of tools creating data, destination types, and departments to serve means data needs to be more accessible to more people.
    Diagram labels: Many Sources; Infinitely Elastic Modern EDW or Data Lake; Many Use Cases; Huge Audience; Modern Data Stack
  8. The MDS
    The modern data stack is a set of technologies that power data analytics efficiently.
  9. The Modern Data Stack: Visualized
    Diagram stages: Data Sources → Ingestion → Destinations → Transformations → Output
    • Data Sources: Marketing Analytics, Finance Analytics, Product Analytics, Sales Analytics, Databases, Files, Event Collectors
    • Ingestion: Data Pipeline, Replicated Data
    • Destinations and Transformations: Cloud Data Warehouse with a Raw Database and Modeled Data (each Schema 1 to Schema n, table 1 to table n), Data Transformations (Transformation Inputs, Transformation Outputs), Data Lake
    • Output: Business Intelligence, Reverse ETL, Embedded Analytics, Ad Hoc Reporting, Data Science & AI/ML
  10. Modern Data Stack: Data Sources
    Data sources are representative of any type of platform that stores your data. Common classifications for 1st party data include:
    • Databases
    • Files
    • Applications, categorized by use case
      ◦ e.g. marketing analytics, finance analytics, product analytics, sales analytics
    • Event Collectors
  11. Modern Data Stack: Ingestion
    A data pipeline extracts data from your sources and loads that data to your destination. A modern data pipeline needs to be:
    • Low code / no code
    • Normalized & automated
    • Maintained for data integrity
    Diagram labels: Data Pipeline, Replicated Data
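
The extract-and-load step described on this slide can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the in-memory SQLite database stands in for a cloud destination, and the names `extract`, `load`, and `raw_orders` are hypothetical.

```python
import sqlite3


def extract(source_rows):
    """Stand-in for reading from a source system; here it just yields dicts."""
    yield from source_rows


def load(conn, table, rows):
    """Create the raw table if needed and append the extracted rows unchanged."""
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} (id INTEGER, email TEXT, amount REAL)"
    )
    conn.executemany(
        f"INSERT INTO {table} (id, email, amount) VALUES (:id, :email, :amount)",
        rows,
    )
    conn.commit()


# Hypothetical source records; a real pipeline would read from an API or database.
source = [
    {"id": 1, "email": "a@example.com", "amount": 10.0},
    {"id": 2, "email": "b@example.com", "amount": 25.5},
]

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for the destination
load(conn, "raw_orders", extract(source))
row_count = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
```

Note that nothing is transformed on the way in: the rows land as-is, which is what leaves room for transformations to run later inside the destination.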
  12. Modern Data Stack: Destinations
    Destinations refer to where your targeted data lands. Destinations in the Modern Data Stack are natively cloud-based and can compute and store more data at lower costs.
    • Data warehouses are relational databases designed to store and transform data, best for quickly analyzing historical data.
    • Data lakes hold high volumes of data in a raw format, supporting structured, unstructured, and semi-structured types of data, best for building AI models.
    Diagram labels: Cloud Data Warehouse (Raw Database, Modeled Data, Data Transformations with Transformation Inputs and Transformation Outputs), Data Lake
  13. Modern Data Stack: Transformations
    Data transformation is the process of revising, computing, separating and combining raw data into analysis-ready data models. A modern transformation tool needs to:
    • Be automated and run regularly
    • Ensure data integrity
    • Use SQL, the common language for analysts and engineers
    Diagram labels: Cloud Data Warehouse (Raw Database, Modeled Data, Data Transformations with Transformation Inputs and Transformation Outputs), Data Lake
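
As a rough illustration of a SQL transformation that turns raw data into an analysis-ready model, the sketch below filters, computes, and combines rows from a raw table into a modeled table. The table names (`raw_orders`, `customer_revenue`) and the sample rows are made up for the example, and SQLite is only a stand-in for a cloud data warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for the warehouse

# Raw, replicated data as it might land from a pipeline (hypothetical rows).
conn.executescript("""
CREATE TABLE raw_orders (
    order_id INTEGER, customer TEXT, amount_cents INTEGER, status TEXT
);
INSERT INTO raw_orders VALUES
    (1, 'acme',   1200, 'paid'),
    (2, 'acme',    800, 'paid'),
    (3, 'globex',  500, 'refunded'),
    (4, 'globex',  700, 'paid');
""")

# The transformation: drop non-paid orders, convert cents to currency units,
# and aggregate per customer. Rebuilding the table on each run keeps it repeatable.
conn.executescript("""
DROP TABLE IF EXISTS customer_revenue;
CREATE TABLE customer_revenue AS
SELECT customer,
       COUNT(*)                  AS paid_orders,
       SUM(amount_cents) / 100.0 AS revenue
FROM raw_orders
WHERE status = 'paid'
GROUP BY customer;
""")

model = conn.execute(
    "SELECT customer, paid_orders, revenue FROM customer_revenue ORDER BY customer"
).fetchall()
```

Because the model is rebuilt from the raw table with plain SQL, the same statement can be scheduled to run regularly, which is the "automated and regular" property the slide asks for.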
  14. Modern Data Stack: Output
    Now your data is extracted, stored, cleaned, and ready to be put to use.
    Where data goes to get analyzed. Needs to be accessible and easy to understand.
    • Business intelligence
    • Embedded analytics
    • Ad hoc reporting
    Newer to the modern data stack:
    • Reverse ETL: Process of loading data from a data warehouse into an application or tool
    • Data Science & AI/ML: Enables machines to make decisions based on learned data models
  15. The Modern Data Stack: Cloud-First
    Legacy systems are inelastic & expensive to scale; cloud destinations enable higher volumes of data movement & compute at a lower cost.
    Essential Features:
    • ETL → ELT: With cloud-first technology growth, ETL evolved into ELT, enabling transformations to execute after loading, in the cloud destination.
    • SQL-Based: SQL is inherent in the MDS for data modeling, which allows data analysts to take action earlier in the MDS, in transformations.
    • Fully Managed: Data as an automated service is the cornerstone of the MDS, instead of being created and maintained by you & your team.
  16. The Modern Data Stack: The Benefits
    • Higher Volumes + Lower Costs: Compute & store higher volumes of data at a significantly faster rate & reduced cost
    • Data Accessibility: Self-service analytics programs increase your organization’s data literacy
    • Built to Scale: Pricing & products support scale for data volumes, # of users, and use cases
    • Best-in-Breed Technologies: Specialization drives innovation and modularity gives your team flexibility
  17. The Fivetran Modern Data Stack
    Fully-Managed Data Integration:
    • Managed data connectors
    • Automatic schema migration
    • Integrated transformations
    Target Destinations:
    • Cloud Data Warehouses
    • Cloud Data Lakes
    Support for Downstream Use Cases:
    • Data-driven models (Customer Journey or Experience)
    • BI reporting / visualizations
    • Powering customer applications
    Diagram stages: Data Sources, Automated Ingestion, Cloud Infrastructure, Transformation, Output
    Data Sources:
    • OLTP Databases via CDC
    • Applications/ERP (Oracle, Salesforce, Netsuite…)
    • Event Collectors (Segment, Snowplow)
    • Logs
    • 3rd Party APIs (Zuora, Stripe, Facebook)
    • File and Object Storage
    Output:
    • Data Science & AI/ML (Domino, DataRobot, DataIku, Sagemaker…)
    • Operational Sync (Census, Thoughtspot)
    • Ad Hoc (Tableau, Google Sheets, Excel)
    • Dashboards (Looker, Tableau, Superset, Mode)
    • Embedded Analytics (Sisense, Looker)
    • Augmented Analytics (Thoughtspot, Sisu)
    Other diagram labels: Managed Data Connectors, Integrated Transforms, Normalized Schemas, Dimensional Schemas, Cloud Data Warehouse, Data Lake
    Legend: Fivetran Data Flows, Other Data Flows
  18. The Fivetran Advantage
    ➔ Automatic Data Updates (DML)
    ➔ Automatic Schema Migrations (DDL)
    ➔ Automated Recovery from Failure (Idempotent)
    ➔ Micro-batched architecture
    ➔ Query Ready Data in 5 minutes
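
To make the idea of idempotent recovery concrete, here is a minimal sketch of a micro-batch load that can be replayed safely after a failure. It is not Fivetran's implementation: the `customers` table and the `apply_batch` helper are hypothetical, and SQLite's `INSERT OR REPLACE` stands in for whatever upsert or merge mechanism a real destination provides.

```python
import sqlite3


def apply_batch(conn, batch):
    """Upsert one micro-batch keyed on the primary key, so replays are harmless."""
    conn.executemany(
        "INSERT OR REPLACE INTO customers (id, name) VALUES (?, ?)",
        batch,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for the destination
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

batch = [(1, "Ada"), (2, "Grace")]
apply_batch(conn, batch)
apply_batch(conn, batch)              # simulated retry of the same batch after a failure
apply_batch(conn, [(2, "Grace H.")])  # a later update (DML) to an existing row

rows = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
```

Applying the same batch twice leaves the table unchanged, so a pipeline that is unsure whether a batch was committed can simply send it again.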
  19. Connectors, Destinations, and BI Tool Support
    Heterogeneous data sources, by category: Marketing Analytics; Finance Analytics; Product Analytics; Sales Analytics; Customer Success Analytics; Engineering Analytics; E-Commerce & POS Analytics; HR Analytics; Databases; Files, Events, Technical Sources
    Connectors shown: PBF, Marketo, Salesforce Marketing Cloud, Hubspot, LinkedIn Ads, Facebook Pages, Youtube Analytics, Google Analytics, Google Ads, Twitter Ads, Pinterest Ads, Appsflyer, Adobe Analytics, Google Campaign Manager, Klaviyo, TikTok Ads, Segment, Snapchat, Criteo, Anaplan, Quickbooks, Xero, NetSuite, SAP, Stripe, Shopify, Coupa, Magento, Square, Zuora, Salesforce, Microsoft Dynamics 365, Asana, Jira, Delighted, Zendesk, Mixpanel, Pendo, Heap, Optimizely, GitHub, Freshdesk, Google Sheets, Microsoft Ads, Facebook Ads, Amazon S3, Google Search Console, Intercom, Kustomer, Iterable, Front, Survey Monkey, Workday, Greenhouse, Oracle, MongoDB, MariaDB, SQL Server, MySQL, Google Cloud Postgres, DynamoDB, FTP, Webhooks, Azure Blob Storage, Dropbox, AWS Lambda, Email, Twilio, Lightspeed Retail, Pipedrive, Outreach
    Destinations: Snowflake, Amazon Redshift, S3, Azure Synapse, Relational Database AWS, Google BigQuery, Databricks
    BI: Looker, dbt, Tableau, Sigma, ThoughtSpot, Sisense
  20. Fivetran Reference Architecture
    Make every decision data driven with automated, scalable, and secure data movement.
    Diagram labels: Your network (Cloud / On-premise) with Events, Application, Databases, Files; Source Agent*; HVR Hub; Destination Agent; Your network (Cloud) with Data Lake, Data Warehouse, Event Stream (Kafka), File; Unpack/ Appconnect/Table Explore
    *High Volume Real Time Replication
  21. Securing your pipelines
    • Flexible connection methods: Secure, encrypted connections, whether via SSH, VPN, or PrivateLink.
    • Anonymize personal data: Block or hash sensitive data before it touches a destination.
    • Data doesn’t persist: Data is purged as soon as it’s successfully written to the destination.
    • Transparency & auditability: Full control over data access and detailed logging means full control over how data is handled, and by whom.
    • Control data access: Granular role-based access control ensures users get the right level of access.
    • Control user access: Role-based access control and user provisioning via SSO providers.
    CCPA, GDPR, HIPAA, ISO, PCI, SOC2
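
The "block or hash sensitive data before it touches a destination" step might look like the sketch below. It only illustrates the general idea: the column sets `BLOCKED` and `HASHED`, the `SALT` value, and the `protect` helper are all hypothetical, and a salted SHA-256 digest is one possible hashing choice rather than a description of how any particular product does it.

```python
import hashlib

# Hypothetical policy: which columns to drop entirely and which to hash.
BLOCKED = {"ssn"}
HASHED = {"email"}
SALT = b"example-salt"  # assumption: a real deployment would manage this as a secret


def protect(record):
    """Return a copy of the record that is safe to write to the destination."""
    out = {}
    for key, value in record.items():
        if key in BLOCKED:
            continue  # blocked columns never leave the pipeline
        if key in HASHED:
            digest = hashlib.sha256(SALT + str(value).encode("utf-8")).hexdigest()
            out[key] = digest  # same input always yields the same digest
        else:
            out[key] = value
    return out


record = {"id": 7, "email": "a@example.com", "ssn": "000-00-0000"}
safe = protect(record)
```

Because hashing is deterministic, hashed columns can still be used as join keys or for counting distinct values downstream, while the original value is not stored in the destination.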
  23. Rise of the Modern Data Platform
    The Modern Data Stack is evolving into the Modern Data Platform. Now, the concept of the “lakehouse” creates a convergence of traditional BI and AI/ML/multi-model data processing infrastructures. Fivetran is positioned to be the key data movement solution for the MDP (ingestion & transformation).