[Figure: architecture diagram. Labels: Data Sources; data extraction and load; Raw Data Lake; data extraction, transformation, and load; data transformation; Data Warehouse; Data Marts (Curate); BI Dashboard; Explorer; analysis; ML Enrich; ML Model.]
[Figure: architecture diagram with zones. Labels: Data Sources; data extraction and load; Raw Data Lake; Enrich; Curate; data transformation; Data Warehouse; Data Marts (Curate); BI Dashboard; Explorer; analysis; ML Enrich; ML Model.]
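• Cost efficiency and flexibility inherited from the data lake
• Scalability through the separation of storage and compute
• Transparent data access through both SQL and Python APIs
[Figure: Lakehouse. Labels: storage layer (Raw, Enrich, Curate); compute engine; API (SQL, Python); BI Dashboard; Explorer; ML Model; ML.]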
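The zone layout and the dual SQL / Python access described above can be sketched in a few lines. This is only an illustrative toy, not a lakehouse implementation: an in-memory SQLite database stands in for the storage layer, and the table names (`raw_sales`, `enrich_sales`, `curate_sales_by_region`) and sample rows are hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the shared storage layer.
conn = sqlite3.connect(":memory:")

# Raw zone: data is landed as-is, including a duplicate and an invalid amount.
conn.execute("CREATE TABLE raw_sales (order_id TEXT, region TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_sales VALUES (?, ?, ?)",
    [("1", "JP", "100"), ("1", "JP", "100"), ("2", "US", "250"), ("3", "US", "n/a")],
)

# Enrich zone: deduplicated, typed, and filtered to rows with a numeric amount.
conn.execute(
    """
    CREATE TABLE enrich_sales AS
    SELECT DISTINCT CAST(order_id AS INTEGER) AS order_id,
                    region,
                    CAST(amount AS REAL) AS amount
    FROM raw_sales
    WHERE amount GLOB '[0-9]*'
    """
)

# Curate zone: aggregated for a specific business question.
conn.execute(
    """
    CREATE TABLE curate_sales_by_region AS
    SELECT region, SUM(amount) AS total_amount
    FROM enrich_sales
    GROUP BY region
    """
)

# SQL-style access, as a BI dashboard might issue it against the Curate zone.
sql_result = conn.execute(
    "SELECT region, total_amount FROM curate_sales_by_region ORDER BY region"
).fetchall()

# Python-style access, as an ML workload might read the Enrich zone directly.
py_result = {}
for region, amount in conn.execute("SELECT region, amount FROM enrich_sales"):
    py_result[region] = py_result.get(region, 0.0) + amount
```

Both access paths read the same stored data, so the SQL aggregate and the Python aggregate agree without copying data into a separate warehouse.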
Storage
Other references on data lake zones
• Shaping The Lake: Data Lake Framework | Adatis
• Zones in a Data Lake | SQL Chick
• How to structure the Data Lake | The Digital Talk
• The Fundamentals of Data Warehouse + Data Lake = Lake House | by Garrett R Peternel | Towards Data Science
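Regional and global lakes
• Pros
  • Lower latency thanks to geographic proximity
  • Compliance with data export requirements such as GDPR
  • Scalability from using multiple storage accounts
• Cons
  • More complex configuration
  • Additional costs from data replication
[Figure: Global Data Lake; Region (JP) Data Lake; Region (US) Data Lake.]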
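The trade-off above (read from the nearest lake, unless export rules pin the data to its home region) can be illustrated with a small routing sketch. Everything here is hypothetical: the `Dataset` class, the `choose_lake` function, and the latency figures are assumptions made for illustration and do not correspond to any Azure API.

```python
from dataclasses import dataclass, field

# Assumed round-trip latencies in milliseconds: (client region, lake region).
LATENCY_MS = {
    ("JP", "JP"): 5,
    ("JP", "US"): 120,
    ("US", "US"): 5,
    ("US", "JP"): 120,
}


@dataclass
class Dataset:
    name: str
    home_region: str
    export_restricted: bool = False  # e.g. data that must not leave its region
    replicas: set = field(default_factory=set)  # extra regions holding a copy

    def available_regions(self) -> set:
        # Export-restricted data is only ever served from its home region.
        if self.export_restricted:
            return {self.home_region}
        return {self.home_region} | self.replicas


def choose_lake(dataset: Dataset, client_region: str) -> str:
    """Return the permitted regional lake with the lowest latency for the client."""
    candidates = dataset.available_regions()
    return min(candidates, key=lambda lake: LATENCY_MS[(client_region, lake)])


# Replicated data is read locally; restricted data stays in its home region.
sales = Dataset("sales", home_region="US", replicas={"JP"})
customers = Dataset("customers", home_region="JP", export_restricted=True)
```

Here `choose_lake(sales, "JP")` picks the JP replica for a client in Japan, while `choose_lake(customers, "US")` still resolves to the JP lake because the data may not be exported. The replica set is also where the replication cost listed under Cons comes from.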
Data Mesh: other references
• Centralized ownership vs decentralized ownership | James Serra's Blog
• Building a Successful Data Mesh: More Than Just a Technology Initiative | LinkedIn (linkedin.com)
• Data Trends: Comparing Data Fabrics, Data Meshes, And Knowledge Graphs | Diffblog (diffbot.com)
• Data Mesh: The Balancing Act of Centralization and Decentralization | by Piethein Strengholt | Mar, 2022 | Towards Data Science
overview
[Figure: solution architecture diagram.
Stages: Ingest; Store; Process; Enrich; Serve; Consume.
Sources: Operational system; Streaming data; Application; Excel; Azure Blob Storage; Cosmos DB; Dataverse; Microsoft SQL.
Ingest: Event Hubs; Pipelines; Upload; Push; Extract; Movement.
Store: Azure Data Lake Storage with Landing, Raw, Enrich, Curate, and Analytical sandbox zones.
Process: Azure Synapse Analytics (Data Explorer Pool, Serverless SQL Pool, Dedicated SQL Pool, Spark Pool); Azure Stream Analytics.
Enrich: Azure Machine Learning (Managed Online Endpoint; experiments, models); Azure Cognitive Service.
Serve and Consume: Power BI Service (Streaming Dataset, Dataflow, Datamart, Dataset, Reports).
Flow annotations: Streaming Ingest; Stream Processing (Hot path); Near real-time Analytics (Warm path); Store historical data (Cold path); Data Integration; Big data Processing; Data warehousing; Lakehouse view; No-ETL data integration and HTAP with Synapse Link; Model train and MLOps; Deploy custom ML model; Predict; Prediction with custom ML model; Prediction with built-in AI model; Self-service Preparation(s); Cache/Query for enterprise data model; Extract subjective data / Preparation / Model; Model Specific business data; Cache/Query; Visualize; Analyze in Excel.]

Databricks and Power BI Analytics solution overview
[Figure: solution architecture diagram.
Stages: Ingest; Store; Process; Serve; Consume.
Sources: Operational system; Streaming data; Application; Excel; Azure Blob Storage.
Ingest: Event Hubs; Pipelines (Synapse/Data Factory); Upload; Push; Extract; Movement.
Store: Azure Data Lake Storage with Landing and Analytical sandbox zones.
Process: Azure Databricks (Data Science / Data Engineering; SQL Analytics); Azure Cognitive Service.
Serve and Consume: Power BI Service (Dataflow, Datamart, Dataset, Reports).
Flow annotations: Data Integration with auto loader; Stream processing with Spark streaming; Lakehouse SQL workload; Azure ML Integration with mlflow; MLOps; Prediction with custom ML model; Prediction with built-in AI model; Self-service Preparation(s); Cache/Query for enterprise data model; Extract subjective data / Preparation / Model; Model Specific business data; Cache/Query; Visualize; Analyze in Excel.]