Image prompt: bridge crossing big gap landscape view sunset drawing black white, https://replicate.com/stability-ai/stable-diffusion
Niels Zeilemaker & Jovan Gligorevic
EMR substantially reduced the effort required to build and deploy a Hadoop platform. Deploying a cluster now took only ~1.5 hours, instead of the 5 days it typically took before.
[Diagram labels: Gateway Node, HDInsight Cluster]
it easier and less error-prone to run jobs on these platforms. Their capabilities also started to align with those of traditional data warehouses, and hence they started to become an alternative.
Synapse, BigQuery, and Snowflake can be used as drop-in replacements for most data warehouses.
[Diagram "Data Lake V2", labels: Blob Storage, Landing Zone, Data Factory, Scheduling, Databricks, Data Lake Gen2, Raw, Prepared, Synapse Analytics, Data Mart, Reporting, Data Science, ML Workspace, Datasets, Experiments, Pipelines, Models, Notebook VMs]
being able to ingest streaming and batch sources, dedicated data science environments, and data marts to support the business
[Diagram "Data Lake V2", labels: Blob Storage, Landing Zone, Staging, Cleaned, Aligned, Databricks, Data Lake Gen2, Raw, Prepared, Data Mart, Event Hub, Event Hub Capture, Business Events, Raw Events, Integration, Shared Compute, Data Science, ML Workspace, Datasets, Experiments, Pipelines, Models, Notebook VMs, Reporting, Reporting API, Kubernetes Service, Application Gateway, Cosmos DB, Synapse Analytics, Data Factory, Scheduling]
allows you to build data transformation pipelines
• Being SQL-based, it allows many more people to contribute to the ETL pipelines compared to PySpark
• It follows software engineering best practices like version control, modularity, portability, and CI/CD
• dbt comes with built-in documentation support, keeping code and documentation in the same place
Image prompt: 9 people getting new job black white drawing blue hue, https://replicate.com/stability-ai/stable-diffusion
[Diagram, labels: Aligned, Blob Storage, Landing Zone, Data Lake Gen2, Databricks, Raw, Prepared, Data Mart, Event Hub, Event Hub Capture, Business Events, Raw Events, Integration, Shared Compute, Data Science, ML Workspace, Datasets, Experiments, Pipelines, Models, Notebook VMs, Reporting, Reporting API, Kubernetes Service, Application Gateway, Cosmos DB, Synapse Analytics, Data Factory, Scheduling]
it is included in Office 365), but typically requires either a Pro or Premium licence. It consists of two main components:
• Power BI Desktop
• Power BI Service
one or more datasets, uses those to create a report/visualisation, and finally publishes it to the Power BI service inside a workspace. Consumers can access reports directly from the service, or through an app, which can be a collection of reports.
[Diagram labels: Report Designers, Power BI Desktop, create, publish, Power BI service, Power BI app, Consumers, interact]
is typically used by a single team. This team needs data, hence a connection (datasource) is created to the data platform. Another team has a slightly different data need, and hence another connection is made. The next team has the same data requirements as the first, but doesn't know that the first workspace already has the data, resulting in yet another connection.
[Diagram labels: Power BI service with three Workspaces (each: Datasets, Reports, Dashboards, Tables), Prepared, Synapse Analytics]
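The duplication described on this slide can be made visible with a small inventory check. The following is a minimal sketch, assuming you can list which workspace connects to which data source; the workspace names and source identifiers are hypothetical and not taken from the talk.

```python
from collections import defaultdict

# Hypothetical inventory of (workspace, data source) pairs.
# Team A and Team C need the same data; Team B needs something slightly different.
connections = [
    ("Team A workspace", "prepared/sales"),
    ("Team B workspace", "prepared/marketing"),
    ("Team C workspace", "prepared/sales"),
]


def find_duplicate_sources(pairs):
    """Return the data sources that more than one workspace connects to."""
    by_source = defaultdict(set)
    for workspace, source in pairs:
        by_source[source].add(workspace)
    # Keep only sources with two or more distinct workspaces attached.
    return {
        source: sorted(workspaces)
        for source, workspaces in by_source.items()
        if len(workspaces) > 1
    }


duplicates = find_duplicate_sources(connections)
print(duplicates)
```

Each key in `duplicates` is a candidate for a single shared dataset instead of one connection per team.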
platform to also include a single Power BI workspace, the platform team can be made responsible for creating and maintaining the link to the platform. This means no more duplicate connections, and also a much better user experience in Power BI.
[Diagram labels: Power BI service with one Workspace (Datasets) and two Workspaces (each: Datasets, Reports, Dashboards, Tables), Prepared, Synapse Analytics]
of the data in Power BI to get a better understanding of the data they are working with. However, it's not trivial to document your datasets, as doing so requires many separate steps.
made by Marc Lelijveld, improves upon this with a dedicated desktop app which allows you to document your models. Internally, it uses the XMLA endpoint of a single workspace.
dbt. The same is true for column descriptions. By extending dbt, we can make sure that this documentation also lands in Power BI.

version: 2
models:
  - name: events
    description: This table contains clickstream events from the marketing website
    columns:
      - name: event_id
        description: This is a unique identifier for the event
        tests:
          - unique
          - not_null
      - name: user-id
        description: The user who performed the event
        tests:
          - not_null

A dbt model
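As an illustration of the first half of that idea, the sketch below collects table and column descriptions from dbt's compiled metadata. It assumes the descriptions are read from the `manifest.json` artifact that dbt writes to its `target/` directory; the manifest fragment is hand-written for this example, the project name `demo` is hypothetical, and `collect_descriptions` is not part of dbt or of the authors' extension. Writing the result into Power BI (for example over the XMLA endpoint) is not shown.

```python
import json

# Hand-written fragment in the shape of dbt's target/manifest.json,
# mirroring the "events" model above. The project name "demo" is hypothetical.
manifest_json = """
{
  "nodes": {
    "model.demo.events": {
      "resource_type": "model",
      "name": "events",
      "description": "This table contains clickstream events from the marketing website",
      "columns": {
        "event_id": {"name": "event_id", "description": "This is a unique identifier for the event"},
        "user-id": {"name": "user-id", "description": "The user who performed the event"}
      }
    }
  }
}
"""


def collect_descriptions(manifest):
    """Map each model name to its description and its column descriptions."""
    docs = {}
    for node in manifest.get("nodes", {}).values():
        # The manifest also lists tests, seeds, and so on; keep models only.
        if node.get("resource_type") != "model":
            continue
        docs[node["name"]] = {
            "description": node.get("description", ""),
            "columns": {
                name: column.get("description", "")
                for name, column in node.get("columns", {}).items()
            },
        }
    return docs


docs = collect_descriptions(json.loads(manifest_json))
print(docs["events"]["columns"]["event_id"])
```

The resulting dictionary is keyed by model name, so it can be matched against table names in a Power BI dataset before any descriptions are written.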