Slide 1

Slide 1 text

What is the Modern Data Stack? What is it, how did we get here, and where is it going?

Slide 2

Slide 2 text

What is the Modern Data Stack?
Frank Knot, Solution Architect, Fivetran
23 August 2022

Slide 3

Slide 3 text

Agenda
What is it, how did we get here, and where is it going?
1. How strides in technology enabled the current state of data platforms
2. The essential components & features that make up a modern data stack
3. What data teams need to know to take advantage of emerging trends

Slide 4

Slide 4 text

The history of the MDS

Slide 5

Slide 5 text

The modern data stack.
Every department & individual wants to be data-driven, and teams are using 100s of tools to do so. The most important difference between a modern data stack and a legacy data stack is that the modern data stack is hosted in the cloud, requires little technical configuration, and lowers the barrier to entry for more users.

Slide 6

Slide 6 text

The History of the MDS
● Pre-2012: Business data scattered across siloed databases, machine-generated data relied on Hadoop, analytics lived in local Excel spreadsheets.
● 2012: Amazon Redshift is released; this is regarded as the historical event that caused a major shift.
● 2012 - 2016: Components of the modern data stack built and strengthened their features; innovation is king.
● 2022: The modern data stack is becoming and setting the standard in data & analytics.

Slide 7

Slide 7 text

ETL is standard.
More data lives in more places and needs to get to other places, including now the cloud.
● Apps: Cloud-based
● Databases: Cloud-based and on-prem
● Customized scripts tailored to each data source to extract and transform data
● Cloud-native destinations with limited support for processing to transform data
● Reporting for departmental executives

Slide 8

Slide 8 text

Welcome, ELT.
Rapid expansion of tools creating data, destination types, and departments to serve means data needs to be more accessible to more people.
Modern Data Stack diagram: Many Sources; Infinitely Elastic Modern EDW or Data Lake; Many Use Cases; Huge Audience

Slide 9

Slide 9 text

What makes up the modern data stack

Slide 10

Slide 10 text

THE MDS
The modern data stack is a set of technologies that power data analytics efficiently.

Slide 11

Slide 11 text

The Modern Data Stack: Visualized
Diagram stages: Data Sources, Ingestion, Destinations, Transformations, Output
● Data Sources: Marketing Analytics, Finance Analytics, Product Analytics, Sales Analytics, Databases, Files, Event Collectors
● Ingestion: Data Pipeline, Replicated Data
● Destinations: Cloud Data Warehouse (Raw Database and Modeled Data, each with Schema 1 to Schema n holding table 1 to table n), Data Lake
● Transformations: Data Transformations (Transformation Inputs, Transformation Outputs)
● Output: Business Intelligence, Reverse ETL, Embedded Analytics, Ad Hoc Reporting, Data Science & AI/ML

Slide 12

Slide 12 text

Modern Data Stack: Data Sources
Data sources are representative of any type of platform that stores your data. Common classifications for 1st party data include:
● Databases
● Files
● Applications, categorized by use case
○ e.g. marketing analytics, finance analytics
● Event Collectors
Diagram: Data Sources: Marketing Analytics, Finance Analytics, Product Analytics, Sales Analytics, Databases, Files, Event Collectors

Slide 13

Slide 13 text

Modern Data Stack: Ingestion
A data pipeline extracts data from your sources and loads that data to your destination. A modern data pipeline needs to be:
● Low code / no code
● Normalized & automated
● Maintained for data integrity
Diagram: Data Pipeline, Replicated Data
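As a rough illustration of the extract-and-load step described on this slide, the sketch below copies rows unchanged from a stand-in source into a raw table in a stand-in destination. The in-memory SQLite databases, the orders table, and the extract_rows and load_rows helpers are all assumptions made for illustration; they are not part of any vendor's product, and a managed pipeline would do this work for you.

```python
import sqlite3


def extract_rows(source, table):
    """Read every row of a source table; returns (column names, rows)."""
    cursor = source.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()


def load_rows(destination, table, columns, rows):
    """Create the raw table if it is missing, then append the rows unchanged."""
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    destination.execute(f"CREATE TABLE IF NOT EXISTS {table} ({column_list})")
    destination.executemany(
        f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})", rows
    )
    destination.commit()
    return len(rows)


# Hypothetical source system with a small orders table.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
source.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])

# Hypothetical destination standing in for a cloud data warehouse.
destination = sqlite3.connect(":memory:")
columns, rows = extract_rows(source, "orders")
loaded = load_rows(destination, "raw_orders", columns, rows)
```

Nothing is reshaped on the way in; the data lands raw so that transformation can happen later, inside the destination.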

Slide 14

Slide 14 text

Modern Data Stack: Destinations
Destinations refer to where your targeted data lands. Destinations in the Modern Data Stack are natively cloud-based and can compute and store more data at lower costs.
● Data warehouses are relational databases designed to store and transform data, best for quickly analyzing historical data.
● Data lakes hold high volumes of data in a raw format, supporting structured, unstructured, and semi-structured types of data, best for building AI models.
Diagram: Cloud Data Warehouse with a Raw Database (Transformation Inputs) and Modeled Data (Transformation Outputs), each holding Schema 1 to Schema n with table 1 to table n, linked by Data Transformations; Data Lake

Slide 15

Slide 15 text

Modern Data Stack: Transformations
Data transformation is the process of revising, computing, separating and combining raw data into analysis-ready data models. A modern transformation tool needs to:
● Be automated and run regularly
● Ensure data integrity
● Use SQL, the common language for analysts and engineers
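A SQL transformation of this kind might look roughly like the sketch below, which turns a raw table into an analysis-ready model inside the destination. The raw_orders and customer_revenue tables, the sample rows, and the run_transformation helper are hypothetical, and SQLite stands in for a cloud data warehouse.

```python
import sqlite3

# Stand-in warehouse holding raw, replicated data (including a duplicate row).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE raw_orders (id INTEGER, customer TEXT, amount REAL)")
warehouse.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, "acme", 10.0), (2, "acme", 15.0), (3, "globex", 7.5), (3, "globex", 7.5)],
)

# The transformation itself is plain SQL: de-duplicate, then aggregate per customer.
TRANSFORM_SQL = """
CREATE TABLE customer_revenue AS
SELECT customer,
       COUNT(*)    AS order_count,
       SUM(amount) AS total_amount
FROM (SELECT DISTINCT id, customer, amount FROM raw_orders)
GROUP BY customer
"""


def run_transformation(conn):
    """Rebuild the modeled table from the raw table; safe to run on a schedule."""
    conn.execute("DROP TABLE IF EXISTS customer_revenue")
    conn.execute(TRANSFORM_SQL)
    conn.commit()
    return conn.execute(
        "SELECT customer, order_count, total_amount "
        "FROM customer_revenue ORDER BY customer"
    ).fetchall()


model = run_transformation(warehouse)
```

Because the model is rebuilt from the raw table each time, the raw data stays untouched and the transformation can be rerun or changed without re-extracting anything.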

Slide 16

Slide 16 text

Modern Data Stack: Output
Now your data is extracted, stored, cleaned, and ready to be put to use.
Where data goes to get analyzed. Needs to be accessible and easy to understand.
● Business intelligence
● Embedded analytics
● Ad hoc reporting
Newer to the modern data stack:
● Reverse ETL: Process of loading data from a data warehouse into an application or tool
● Data Science & AI/ML: Enables machines to make decisions based on learned data models
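Reverse ETL, as defined on this slide, can be pictured with the minimal sketch below: modeled values are read from the warehouse and written into an application's records. The customer_ltv table, the crm dictionary standing in for an application, and the reverse_etl helper are assumptions for illustration only; a real tool would call the application's API instead.

```python
import sqlite3


def reverse_etl(warehouse, crm):
    """Copy each customer's modeled lifetime value into the application's records."""
    rows = warehouse.execute(
        "SELECT customer_id, lifetime_value FROM customer_ltv"
    ).fetchall()
    updated = 0
    for customer_id, lifetime_value in rows:
        record = crm.get(customer_id)
        if record is None:
            continue  # the application does not know this customer; skip it
        record["lifetime_value"] = lifetime_value
        updated += 1
    return updated


# Stand-in warehouse with a modeled table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customer_ltv (customer_id TEXT, lifetime_value REAL)")
warehouse.executemany(
    "INSERT INTO customer_ltv VALUES (?, ?)",
    [("c1", 120.0), ("c2", 45.5), ("c9", 3.0)],
)

# Hypothetical application state, keyed by customer id.
crm = {"c1": {"name": "Acme"}, "c2": {"name": "Globex"}}
synced = reverse_etl(warehouse, crm)
```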

Slide 17

Slide 17 text

What’s important about a modern data stack

Slide 18

Slide 18 text

The Modern Data Stack: Cloud-First
Legacy systems are inelastic & expensive to scale; cloud destinations enable higher volumes of data movement & compute at a lower cost.
Essential Features
● ETL → ELT: With cloud-first technology growth, ETL evolved into ELT, enabling post-load transformations to execute in the cloud destination.
● SQL-Based: SQL is inherent in the MDS for data modeling, which allows data analysts to act earlier in the MDS, in transformations.
● Fully Managed: Data as an automated service is the cornerstone of the MDS, rather than something created and maintained by you & your team.

Slide 19

Slide 19 text

The Modern Data Stack: The Benefits
● Higher Volumes + Lower Costs: Compute & store higher volumes of data at a significantly faster rate & reduced cost
● Data Accessibility: Self-service analytics programs increase your organization’s data literacy
● Built to Scale: Pricing & products support scale for data volumes, # of users, and use cases
● Best-in-Breed Technologies: Specialization drives innovation and modularity gives your team flexibility

Slide 20

Slide 20 text

Making Data as Reliable as Electricity

Slide 21

Slide 21 text

The Fivetran Modern Data Stack
Fully-Managed Data Integration:
● Managed data connectors
● Automatic schema migration
● Integrated transformations
Target Destinations:
● Cloud Data Warehouses
● Cloud Data Lakes
Support for Downstream Use Cases:
● Data-driven models (Customer Journey or Experience)
● BI reporting / visualizations
● Powering customer applications
Diagram stages: Data Sources, Automated Ingestion, Cloud Infrastructure, Transformation, Output
● Data Sources: OLTP Databases via CDC; Applications/ERP (Oracle, Salesforce, Netsuite…); Event Collectors (Segment, Snowplow); Logs; 3rd Party APIs (Zuora, Stripe, Facebook); File and Object Storage
● Automated Ingestion: Managed Data Connectors
● Cloud Infrastructure and Transformation: Cloud Data Warehouse, Data Lake, Integrated Transforms, Normalized Schemas, Dimensional Schemas
● Output: Data Science & AI/ML (Domino, DataRobot, DataIku, Sagemaker…); Operational Sync (Census, Thoughtspot); Ad Hoc (Tableau, Google Sheets, Excel); Dashboards (Looker, Tableau, Superset, Mode); Embedded Analytics (Sisense, Looker); Augmented Analytics (Thoughtspot, Sisu)
Legend: Fivetran Data Flows, Other Data Flows

Slide 22

Slide 22 text

The Fivetran Advantage
➔ Automatic Data Updates (DML)
➔ Automatic Schema Migrations (DDL)
➔ Automated Recovery from Failure (Idempotent)
➔ Micro-batched architecture
➔ Query Ready Data in 5 minutes
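To make the DML, DDL, and idempotency points concrete, here is a minimal sketch of a batch sync that adds unseen columns and replaces rows by primary key, so retrying a batch after a failure does not duplicate data. It is not Fivetran's implementation; the users table, the sample batch, and the sync_batch helper are hypothetical, and SQLite stands in for the destination.

```python
import sqlite3


def sync_batch(dest, table, batch):
    """Apply a batch of row dicts to a table; re-running the same batch is harmless."""
    # Schema migration (DDL): add any column the destination has not seen yet.
    known = {info[1] for info in dest.execute(f"PRAGMA table_info({table})")}
    incoming = {column for row in batch for column in row}
    for column in sorted(incoming - known):
        dest.execute(f"ALTER TABLE {table} ADD COLUMN {column}")
    # Data updates (DML): replace by primary key, so retries never duplicate rows.
    for row in batch:
        columns = ", ".join(row)
        marks = ", ".join("?" for _ in row)
        dest.execute(
            f"INSERT OR REPLACE INTO {table} ({columns}) VALUES ({marks})",
            tuple(row.values()),
        )
    dest.commit()


destination = sqlite3.connect(":memory:")
destination.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

batch = [
    {"id": 1, "name": "Ada", "plan": "pro"},  # "plan" is a new source column
    {"id": 2, "name": "Lin", "plan": "free"},
]
sync_batch(destination, "users", batch)
sync_batch(destination, "users", batch)  # simulated retry after a failure

row_count = destination.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

One caveat of this simplified approach: INSERT OR REPLACE rewrites the whole row, so a batch row that omits a column would reset that column to NULL.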

Slide 23

Slide 23 text

CONNECTORS, DESTINATIONS, AND BI TOOL SUPPORT
Heterogeneous Data Sources Marketing Analytics Finance Analytics Product Analytics PBF Customer Success Analytics Engineering Analytics Databases E-Commerce & POS Analytics HR Analytics Marketo Salesforce Marketing Cloud Hubspot LinkedIn Ads Facebook Pages Youtube Analytics Google Analytics Google Ads Twitter Ads Pinterest Ads Appsflyer Adobe Analytics Google Campaign Manager Klaviyo TikTok Ads Segment Snapchat Criteo Anaplan Quickbooks Xero NetSuite SAP Stripe Shopify Coupa Magento Stripe Square Shopify Zuora Sales Analytics Salesforce Microsoft Dynamics 365 Asana Jira Delighted Zendesk Mixpanel Pendo Heap Optimizely Asana Jira GitHub Zendesk Freshdesk Google Sheets Microsoft Ads Facebook Ads Amazon S3 Quickbooks Google Search Console Salesforce Intercom Kustomer Iterable Front Freshdesk Survey Monkey Workday Greenhouse Oracle Mongo DB Maria DB SQL Server MySQL Google Cloud Postgres DynamoDB Files, Events, Technical Sources FTP Webhooks Azure Blob Storage Dropbox Google Sheets AWS Lambda Amazon S3 Email Destinations Snowflake Amazon Redshift S3 Azure Synapse Relational Database AWS Google BigQuery Databricks BI Looker dbt Tableau Sigma Google Ads ThoughtSpot Sisense Twilio Lightspeed Retail Pipedrive Outreach

Slide 24

Slide 24 text

Fivetran Reference Architecture
Make every decision data driven with automated, scalable, and secure data movement.
Diagram labels:
● Your network (Cloud / On-premise): Events, Application Databases, Files
● Source Agent*, HVR Hub, Destination Agent
● Your network (Cloud): Data Lake, Data Warehouse, Event Stream (Kafka), File
● Unpack/ Appconnect/Table Explore
*High Volume Real Time Replication

Slide 25

Slide 25 text

Securing your pipelines
● Flexible connection methods: Secure, encrypted connections whether it's via SSH, VPN, or PrivateLink.
● Anonymize personal data: Block or hash sensitive data before it touches a destination.
● Data doesn’t persist: Data is purged as soon as it’s successfully written to the destination.
● Transparency & auditability: Full control over data access and detailed logging means full control over how data is handled, and by whom.
● Control data access: Granular role-based access control ensures users get the right level of access.
● Control user access: Role-based access control and user provisioning via SSO providers.
Compliance: CCPA, GDPR, HIPAA, ISO, PCI, SOC2
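The idea of blocking or hashing sensitive data before it touches a destination can be sketched as below. This is only an illustration under stated assumptions, not the product's mechanism: the protect_row and hash_value helpers, the column names, and the salt are hypothetical, and SHA-256 is chosen here simply because it is available in the Python standard library.

```python
import hashlib


def hash_value(value, salt):
    """Return a salted SHA-256 hex digest of a string value."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def protect_row(row, hashed=(), blocked=(), salt=""):
    """Drop blocked columns and hash sensitive ones before the row is loaded."""
    protected = {}
    for column, value in row.items():
        if column in blocked:
            continue  # blocked columns never reach the destination
        if column in hashed and value is not None:
            protected[column] = hash_value(str(value), salt)
        else:
            protected[column] = value
    return protected


# Hypothetical source row containing personal data.
row = {"id": 1, "email": "ada@example.com", "ssn": "000-00-0000"}
safe_row = protect_row(row, hashed={"email"}, blocked={"ssn"}, salt="demo-salt")
```

Hashing keeps the column usable as a join key (equal inputs produce equal digests) while hiding the original value; blocking removes the column entirely.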

Slide 26

Slide 26 text

What’s next for the modern data stack

Slide 27

Slide 27 text


Slide 28

Slide 28 text

Rise of the Modern Data Platform
The Modern Data Stack is evolving into the Modern Data Platform. Now, the concept of the “lakehouse” creates a convergence of traditional BI and AI/ML/multimodel data processing infrastructures. Fivetran is positioned to be the key data movement solution for the MDP (ingestion & transformation).

Slide 29

Slide 29 text

Questions?

Slide 30

Slide 30 text

Meet us at Booth 44