ABS2024: Building an Enterprise Data Platform with Microsoft Fabric by Gerald Reif & Weili Gao

⭐️ Building an Enterprise Data Platform with Microsoft Fabric
In this session we share our hands-on experience with using Microsoft Fabric to build a new enterprise data platform on one of our client projects. We’ll discuss the key differences between Fabric and classic data platform products such as Azure Synapse, its advantages and disadvantages, why we chose Fabric as the solution for our platform, and what impact Fabric has on the future of data platforms.
🙂 GERALD REIF ⚡️ Principal Architect, Director @ ipt Innovation Process Technology AG
🙂 WEILI GAO ⚡️ Principal Architect @ ipt Innovation Process Technology AG

Transcript

  1. Building an Enterprise Data Platform with Microsoft Fabric

    Weili Gao, Dr. Gerald Reif
    Azure Bootcamp Switzerland 2024, Bern, May 16, 2024
  2. Who We Are (www.ipt.ch)

    • Weili Gao: Principal Architect @ ipt (Azure, cloud-native development, DevOps)
    • Gerald Reif: Principal Architect, Director @ ipt (Azure, enterprise data platforms)
  3. Agenda: Experiences of an early adopter

    • What is Microsoft Fabric?
    • Our Experience working with Fabric
    • Would we choose Fabric again?
  4. Project Background

    Swissmedic aims to become a data-driven authority
    [Diagram: Current data integration]
    Project Goals
    • Build a data platform that automates the E2E process from data ingestion to the visualization in dashboards
    • Enable users to share and consume data via self-service
    • Create the basis for advanced data services (ML)
  5. Which Options to consider?

    Swissmedic preset: data platform to be based on Azure-native services
    1. Azure Synapse Analytics
       - Lack of development on Synapse (last update April 2023; Delta Lake update)
    2. Azure Databricks
       - More platform engineering effort (SCC, service principals, secure context, user management, IP range, ...)
    3. Microsoft Fabric
       - Great vision on paper, generally available since Nov. 2023
       - Not yet feature complete, but important features for us are on the roadmap
       - Client is OK with waiting for those features
  6. Fabric Overview

    • A single unified SaaS data platform
    • One shared experience for data engineers, analysts, scientists and citizen developers
    • OneLake as unified data store, reducing the need to move or duplicate data
  7. Think Different: PaaS → SaaS

    • Less platform engineering: no resource groups, individual resources (storage, key vault, Synapse) or private endpoints
    • Organization within Fabric in Domains and Subdomains, not via subscriptions and resource groups
  8. SaaS: Built-in Deployment Pipelines

    No external deployment pipelines required for dev → test → prod
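
The dev → test → prod promotion order can be pictured as a small state machine. The sketch below is a plain-Python illustration only: `STAGES` and `next_stage` are hypothetical names chosen for this example, and nothing here calls a Fabric API.

```python
from typing import Optional

# Stage order as named on the slide; the identifiers are illustrative only.
STAGES = ("dev", "test", "prod")


def next_stage(current: str) -> Optional[str]:
    """Return the stage content is promoted to next, or None after prod."""
    index = STAGES.index(current)  # raises ValueError for an unknown stage
    return STAGES[index + 1] if index + 1 < len(STAGES) else None


if __name__ == "__main__":
    stage: Optional[str] = "dev"
    while stage is not None:
        print(stage)
        stage = next_stage(stage)
```

Keeping the order in a single tuple means a stage could be added or renamed in one place, which mirrors the idea of a fixed promotion path rather than ad hoc deployments.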
  9. Flat Fee for the Service

    • Starting from F64 capacity, several relevant enterprise features are included
    • Free trial license corresponds to F64 +/-
  10. Many Options to Work with Data

    • Notebook
    • Data pipeline
    • Dataflow Gen 2
  11. Notebook

    • Spark notebooks supporting PySpark, Spark (Scala), Spark SQL, SparkR
    • Application: Data ingestion, transformation & visualization
    • “Closer” to software engineering
    • Currently the only option that works with private endpoints
  12. Data Pipeline

    • Similar to Azure Data Factory pipeline
    • Application: Data ingestion & simple transformations
    • Orchestration of notebooks, dataflows and other pipelines
  13. Dataflow Gen 2

    • Low-code tool for data ingestion & transformation
    • Can “code” in Power Query using the M formula language
    • Experience very similar to Excel or Power BI
  14. Many Options to Work with Data

    • Data Ingestion
      Professional Developer: Spark, Data pipeline
      Citizen Developer: Data pipeline, Dataflow Gen 2
    • Data Transformation
      Professional Developer: Spark + Lakehouse, T-SQL + Warehouse
      Citizen Developer: Data pipeline, Dataflow Gen 2
    • Orchestration
      Data pipeline
    • Reports & visualization
      Professional Developer: Spark (pandas, R)
      Citizen Developer: Power BI
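
The options matrix above can be read as a lookup from task and developer type to candidate tools. The following is a minimal sketch of that reading; the dictionary layout, the key names and the `options_for` helper are assumptions made for illustration, and the column assignment follows one interpretation of the flattened slide text.

```python
from typing import Dict, List

# Tool options per task and developer type, transcribed from the slide.
# Key names are hypothetical. "Orchestration" is left out because the slide
# lists "Data pipeline" only once, without a per-developer split.
OPTIONS: Dict[str, Dict[str, List[str]]] = {
    "ingestion": {
        "professional": ["Spark", "Data pipeline"],
        "citizen": ["Data pipeline", "Dataflow Gen 2"],
    },
    "transformation": {
        "professional": ["Spark + Lakehouse", "T-SQL + Warehouse"],
        "citizen": ["Data pipeline", "Dataflow Gen 2"],
    },
    "reporting": {
        "professional": ["Spark (pandas, R)"],
        "citizen": ["Power BI"],
    },
}


def options_for(task: str, developer: str) -> List[str]:
    """Return a copy of the tool list for a task and developer type."""
    return list(OPTIONS[task][developer])


if __name__ == "__main__":
    print(options_for("ingestion", "citizen"))
```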
  15. Data Ingestion Architecture @ Swissmedic

    • Three network environments (for now)
    • Moving targets in the architecture
    • Reuse of existing solutions
    • Challenge: Data contract with source systems
  16. How do we deal with missing Enterprise Features

    Missing features
    • Managed Identity
    • Managed Virtual Endpoints
    • Double Encryption
    • Git Support
    • GitHub Support
    • IaC Support, e.g. Terraform
    • Data Catalog and OneSecurity
    Our Approach
    • Analysis of Requirements
    • Fabric Release Plan
    • Risk Analysis
    • Mitigation Actions
    • Microsoft FastTrack
    Got the OK from the Enterprise Architecture Board ✅
  17. Workspaces

    • Place to create a collection of items such as warehouses, lakehouses, notebooks, pipelines, …
    • Access managed by workspace roles
    • Each workspace can be assigned to a
      • capacity
      • folder & Git branch
      • private endpoint
      • managed identity
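
The workspace properties listed above can be summarized as a simple record. This is a sketch of the concept only, not a Fabric SDK type: the `Workspace` class, its field names and the example values are hypothetical, while the four role names are the standard Fabric workspace roles.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Standard Fabric workspace roles.
ROLES = ("Admin", "Member", "Contributor", "Viewer")


@dataclass
class Workspace:
    """Illustrative model of a workspace and what it can be assigned to."""
    name: str
    capacity: Optional[str] = None
    git_branch: Optional[str] = None
    private_endpoint: Optional[str] = None
    managed_identity: Optional[str] = None
    roles: Dict[str, str] = field(default_factory=dict)  # user -> role

    def grant(self, user: str, role: str) -> None:
        """Assign a workspace role to a user, rejecting unknown roles."""
        if role not in ROLES:
            raise ValueError(f"unknown workspace role: {role}")
        self.roles[user] = role


if __name__ == "__main__":
    # Example values are made up for this sketch.
    dev = Workspace(name="DEV", capacity="F64", git_branch="main")
    dev.grant("alice@example.com", "Contributor")
    print(dev)
```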
  18. Choosing a Workspace model

    [Diagram labels: Workspace DEV, Workspace TEST, Workspace PROD; Private Workspace; Git feature branch; main branch; PR; deployment pipeline; data source]
  19. Would we choose Fabric again?

    Yes, we would! ✅
    Our Experiences so far
    • Enterprise features were released or are on the release plan
      • Features delivered according to release plan
      • New features are announced in the blog
    • Microsoft delivered, so far at least in parts
    • Fabric is a product that keeps platform engineering low ➢ Important for Swissmedic
    • All data processing related features are available
    Still open but on the roadmap:
    • Full integration with Purview
    • OneSecurity: great vision, implementation to be seen