

apidays
December 22, 2024

apidays Paris 2024 - Expose and Govern real-time business critical data with APIs, El Ghali Benchekroun, Databricks

Expose and Govern real-time business critical data with APIs
El Ghali Benchekroun, Senior Specialist Solutions Engineer at Databricks

apidays Paris 2024 - The Future API Stack for Mass Innovation
December 3 - 5, 2024

Transcript

  1. DATABRICKS AT API DAYS x About me: El Ghali BENCHEKROUN

    Sr. Specialist Solutions Engineer, https://www.linkedin.com/in/elghali-benchekroun/
    I help customers with:
    • Cloud Infrastructure / Security & Networking
    • Infrastructure as Code & CI/CD
    • Platform Administration
    • MLOps
  2. The data and AI company

    • 10,000+ global customers
    • $1.5B+ in revenue
    • $4B in investment
    • Inventor of the lakehouse & pioneer of generative AI
    • Gartner-recognized Leader: Database Management Systems; Data Science and Machine Learning Platforms
    • Creator of …
  3. [Figure: Gartner Magic Quadrants for Database Management Systems and for Data Science and Machine Learning, with Databricks in the Leaders quadrant of each]
  4. Mission Statement

    Democratize Data & AI for every enterprise to become a Data Forward Company.
  5. Today's Agenda

    1. Introduction: Unlock the business with Data & AI APIs
    2. Govern your Data & AI with Unity Catalog Open APIs
    3. Expose your Data to any consumers with APIs
    4. Monetize your Data & AI assets securely
    5. Q&A
    (Timings shown on the slide: 5 min, 10 min, 10 min.)
  6. Problem Statement - Wind Turbine Farm: When predictive maintenance unlocks the business

    A global renewable energy company has experienced a sharp decline in revenue due to downtime caused by wind turbine failures. The business has determined that if it could proactively identify and repair wind turbines before they fail, it could increase energy production by 20%. The business meets with IT and requests a predictive dashboard that would allow its Turbine Maintenance group to identify, in real time, the wind turbines that are about to fail.
  7. Problem Statement - Wind Turbine Farm: Why predictive maintenance?

    Cost decreases from reactive to predictive maintenance:
    • Reactive maintenance (reactive): failure-based, too late
    • Preventive maintenance (proactive): time-based, too early
    • Predictive maintenance (proactive): condition-based, on time
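The three strategies differ only in what triggers a maintenance visit. The toy sketch below contrasts them; the `TurbineState` fields, the 4,000-hour service interval and the 0.7 risk threshold are hypothetical values chosen for illustration, not figures from the talk.

```python
from dataclasses import dataclass


@dataclass
class TurbineState:
    hours_since_service: float   # operating hours since the last maintenance visit
    failed: bool                 # the turbine has already broken down
    failure_probability: float   # model-estimated risk of failing soon (0..1)


# Hypothetical thresholds, for illustration only.
SERVICE_INTERVAL_HOURS = 4000.0
RISK_THRESHOLD = 0.7


def reactive(state: TurbineState) -> bool:
    """Failure-based: act only once the turbine has broken (too late)."""
    return state.failed


def preventive(state: TurbineState) -> bool:
    """Time-based: service on a fixed schedule, whatever the condition (often too early)."""
    return state.hours_since_service >= SERVICE_INTERVAL_HOURS


def predictive(state: TurbineState) -> bool:
    """Condition-based: service when the predicted risk crosses a threshold."""
    return state.failed or state.failure_probability >= RISK_THRESHOLD


healthy_but_due = TurbineState(hours_since_service=4200, failed=False, failure_probability=0.05)
degrading = TurbineState(hours_since_service=1500, failed=False, failure_probability=0.85)

for name, policy in [("reactive", reactive), ("preventive", preventive), ("predictive", predictive)]:
    print(name, policy(healthy_but_due), policy(degrading))
# reactive False False
# preventive True False
# predictive False True
```

Only the predictive policy flags the degrading turbine while leaving the healthy one alone, which is the "on time" behavior the slide describes; its quality depends entirely on how good the risk model is.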
  8. Problem Statement - Wind Turbine Farm: Why predictive maintenance?

    Data + AI maturity curve (energy production increases with maturity): Clean Data → Reports → Ad Hoc Queries → Data Exploration → Predictive Modeling → Prescriptive Analytics → Automated Decision Making.
    Applied to the turbine fleet:
    • Know which turbines are broken
    • Predict which ones will break
    • Schedule maintenance before failure
    • Automatically adjust production / spare stock for the entire fleet
  9. Problem Statement - Wind Turbine Farm: Predictive Maintenance with Lakehouse

    Data sources and message buses feed a Lakehouse pipeline:
    • Ingest new data: incrementally load data in real time (files, Kafka…)
    • Data cleanup: enforce data quality
    • Build training dataset: join sensor data with past turbine status
    • Training model: predict turbine status based on the historical dataset
    • Infer damaged turbines: load the model and get turbine status in real time
    • Detect faulty equipment, leading to increased energy production
    • Sharing & monetizing data & ML models
    Tables shown: sensors_bronze, sensors_silver, status_turbine and turbine_gold (all Streaming Tables).
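The first three pipeline steps (ingest, clean up, build the training set) can be sketched in plain Python. This is a toy model, not Databricks code: the column names (`turbine_id`, `ts`, `vibration`, `abnormal`) and the quality rules are hypothetical, since the slide names the tables but not their schemas.

```python
from typing import Iterable

# Hypothetical record layouts: the slide names the tables but not their columns.
RawReading = dict  # e.g. {"turbine_id": "T1", "ts": 1, "vibration": 0.4}
Status = dict      # e.g. {"turbine_id": "T1", "abnormal": False}


def to_bronze(batches: Iterable[list[RawReading]]) -> list[RawReading]:
    """Ingest: append each incoming micro-batch unchanged (like sensors_bronze)."""
    bronze: list[RawReading] = []
    for batch in batches:
        bronze.extend(batch)
    return bronze


def to_silver(bronze: list[RawReading]) -> list[RawReading]:
    """Cleanup: keep only rows passing simple, hypothetical quality rules (like sensors_silver)."""
    return [
        r for r in bronze
        if r.get("turbine_id")
        and isinstance(r.get("vibration"), (int, float))
        and r["vibration"] >= 0
    ]


def training_set(silver: list[RawReading], status: list[Status]) -> list[dict]:
    """Join sensor readings with past turbine status to label each reading."""
    abnormal_by_turbine = {s["turbine_id"]: s["abnormal"] for s in status}
    return [
        {**r, "abnormal": abnormal_by_turbine[r["turbine_id"]]}
        for r in silver
        if r["turbine_id"] in abnormal_by_turbine
    ]


batches = [
    [{"turbine_id": "T1", "ts": 1, "vibration": 0.4}, {"turbine_id": "T2", "ts": 1, "vibration": None}],
    [{"turbine_id": "T2", "ts": 2, "vibration": 2.9}, {"turbine_id": "", "ts": 2, "vibration": 0.1}],
]
status = [{"turbine_id": "T1", "abnormal": False}, {"turbine_id": "T2", "abnormal": True}]

bronze = to_bronze(batches)          # 4 raw rows
silver = to_silver(bronze)           # 2 rows survive the quality rules
gold = training_set(silver, status)  # 2 labelled rows, ready for model training
```

Bronze keeps everything for replay and audit, while silver drops the rows with a missing reading or an empty turbine id; the labelled rows are what the training step would consume.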
  12. The Ideal Governance Solution: Govern your Data & AI with Unity Catalog Open APIs

    Unity Catalog provides unified governance for data and AI, with open access and open connectivity.
    • Assets governed: tables, files, models, AI tools
    • Capabilities: lineage, monitoring, auditing, access control, data sharing, discovery
    • Services shown connecting to it: Amazon EMR, Amazon Redshift, AWS Glue, Amazon S3
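Because the catalog is exposed over REST, any HTTP client can discover governed assets. The sketch below only builds the request; it assumes the Databricks endpoint `GET /api/2.1/unity-catalog/tables` with `catalog_name` and `schema_name` query parameters, and the host, token, catalog and schema names are placeholders.

```python
import urllib.parse
import urllib.request


def list_tables_request(host: str, token: str, catalog: str, schema: str) -> urllib.request.Request:
    """Build (but do not send) a Unity Catalog request listing the tables of one schema."""
    query = urllib.parse.urlencode({"catalog_name": catalog, "schema_name": schema})
    url = f"https://{host}/api/2.1/unity-catalog/tables?{query}"
    # Personal access tokens are passed as a standard Bearer token.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")


# Placeholder workspace host and token: replace with real values before sending.
req = list_tables_request("example.cloud.databricks.com", "dapi-PLACEHOLDER", "main", "iot")
print(req.full_url)
```

Sending it with `urllib.request.urlopen(req)` against a real workspace should return a JSON document whose `tables` array describes each table; access control still applies, so the caller only sees what it is allowed to see.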
  13. Centralized Access Controls

    Centrally grant and manage access permissions across workloads and foreign databases.
    • Using ANSI SQL DCL: GRANT <privilege> ON <securable_type> <securable_name> TO `<principal>`, for example GRANT SELECT ON iot.events TO engineers
    • Using the UI: choose the permission level
    • Sync groups from your identity provider
    A securable is a catalog, schema, table, view, function, share, volume, model, etc.; a 'table' is a collection of files in S3/ADLS.
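A simplified model of how such grants resolve is sketched below. It reflects two Unity Catalog rules: a privilege granted on a parent (catalog or schema) is inherited by its children, and reading a table also requires `USE CATALOG` and `USE SCHEMA` on its parents. The `ToyUnityCatalog` class, the three-level name `main.iot.events` and the user/group names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyUnityCatalog:
    # (principal, securable full name) -> set of privileges
    grants: dict = field(default_factory=dict)
    # user -> groups, e.g. synced from an identity provider
    groups: dict = field(default_factory=dict)

    def grant(self, privilege: str, securable: str, principal: str) -> None:
        """Mimics: GRANT <privilege> ON <securable> TO `<principal>`."""
        self.grants.setdefault((principal, securable), set()).add(privilege)

    def _held(self, user: str, privilege: str, securables: list[str]) -> bool:
        """True if the user, or any group it belongs to, holds the privilege on any listed securable."""
        principals = {user} | self.groups.get(user, set())
        return any(
            privilege in self.grants.get((p, s), set())
            for p in principals
            for s in securables
        )

    def can_select(self, user: str, table: str) -> bool:
        catalog, schema, _ = table.split(".")
        schema_fqn = f"{catalog}.{schema}"
        return (
            self._held(user, "USE CATALOG", [catalog])
            and self._held(user, "USE SCHEMA", [schema_fqn, catalog])    # inherited from catalog
            and self._held(user, "SELECT", [table, schema_fqn, catalog])  # inherited from parents
        )


uc = ToyUnityCatalog(groups={"alice": {"engineers"}})
uc.grant("USE CATALOG", "main", "engineers")
uc.grant("USE SCHEMA", "main.iot", "engineers")
uc.grant("SELECT", "main.iot.events", "engineers")

print(uc.can_select("alice", "main.iot.events"))  # True: alice inherits the grants of `engineers`
print(uc.can_select("bob", "main.iot.events"))    # False: bob is in no granted group
```

Granting to groups rather than to individual users is what makes the "sync groups from your identity provider" step useful: membership changes in the identity provider change access without touching any GRANT statement.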
  15. Problem Statement - Wind Turbine Farm: What is a Materialized View (MV)?

    A special type of view that pre-computes and stores the results of a SQL query and keeps them fresh over time.
    Benefits:
    1. Accelerate BI dashboards. Querying pre-computed data is much faster than querying the base tables.
    2. Reduce data processing costs. MV results are refreshed incrementally, avoiding the need to completely rebuild the view when new data arrives.
    3. Improve data access control. Govern more tightly what data consumers can see by controlling access to the base tables.
    Example (built on the orders and customers tables; results are pre-computed and incrementally refreshed):
      CREATE MATERIALIZED VIEW customer_orders AS
      SELECT customers.name, sum(orders.amount) AS total_amount, orders.orderdate
      FROM orders LEFT JOIN customers ON orders.custkey = customers.c_custkey
      GROUP BY customers.name, orders.orderdate;
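The incremental-refresh benefit can be illustrated with a toy version of `customer_orders`. This is only the idea, not how the Databricks engine refreshes MVs: it assumes `orders` is append-only and the customer lookup never changes; updates, deletes or customer changes would require recomputing the affected groups or the whole view.

```python
from collections import defaultdict


class CustomerOrdersMV:
    """Toy model of customer_orders: SUM(amount) grouped by (name, orderdate)."""

    def __init__(self, customers: dict[int, str]):
        self.customers = customers        # custkey -> name
        self.totals = defaultdict(float)  # (name, orderdate) -> total_amount
        self.rows_processed = 0           # how many order rows have been folded in so far

    def refresh(self, orders: list[dict]) -> None:
        """Fold in only the orders that arrived since the last refresh (append-only assumption)."""
        for order in orders[self.rows_processed:]:
            name = self.customers.get(order["custkey"])  # LEFT JOIN: None when no customer matches
            self.totals[(name, order["orderdate"])] += order["amount"]
        self.rows_processed = len(orders)


mv = CustomerOrdersMV({1: "Acme", 2: "Globex"})
orders = [
    {"custkey": 1, "orderdate": "2024-12-03", "amount": 100.0},
    {"custkey": 2, "orderdate": "2024-12-03", "amount": 50.0},
]
mv.refresh(orders)
orders.append({"custkey": 1, "orderdate": "2024-12-03", "amount": 25.0})
mv.refresh(orders)  # only the single new row is processed
print(dict(mv.totals))
```

The second refresh touches one row instead of three, which is where the cost saving comes from; the cost of a full rebuild grows with the table, while an incremental refresh grows only with the new data.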
  16. Problem Statement - Wind Turbine Farm: What is a Streaming Table (ST)?

    A special type of table for ingesting and processing streaming data on the Lakehouse, from cloud storage (S3, ADLS, GCS) or message queues (Kafka, Pub/Sub, Kinesis, etc.).
    Benefits:
    1. Unlock real-time use cases. Support real-time analytics/BI, machine learning and operational use cases with streaming data.
    2. Better scalability. Handle high volumes of data more efficiently through incremental processing instead of large batches.
    3. Enable more practitioners. Simple SQL syntax makes data streaming accessible to all data engineers and analysts.
    Examples:
      CREATE STREAMING TABLE web_clicks AS SELECT * FROM STREAM read_files('s3://mybucket')
      CREATE STREAMING TABLE server_logs AS SELECT from_json(...) AS data FROM STREAM read_kafka(...)
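The key property of a streaming table fed by `read_files` is that each refresh picks up only data it has not ingested before. The toy sketch below models that with a set of already-seen file paths standing in for the stream's checkpoint; the class and the file names under `s3://mybucket` are hypothetical.

```python
class ToyStreamingTable:
    """Toy streaming table: each source file is ingested exactly once."""

    def __init__(self):
        self.rows: list[dict] = []
        self._seen_files: set[str] = set()  # plays the role of the stream's checkpoint

    def refresh(self, source: dict[str, list[dict]]) -> int:
        """Append rows from files not seen before; return how many rows were added."""
        added = 0
        for path in sorted(source):
            if path in self._seen_files:
                continue  # already ingested on an earlier refresh
            self.rows.extend(source[path])
            self._seen_files.add(path)
            added += len(source[path])
        return added


# Hypothetical bucket contents: file path -> records in that file.
bucket = {"s3://mybucket/clicks-001.json": [{"user": "u1"}, {"user": "u2"}]}
web_clicks = ToyStreamingTable()
first = web_clicks.refresh(bucket)   # ingests 2 rows
bucket["s3://mybucket/clicks-002.json"] = [{"user": "u3"}]
second = web_clicks.refresh(bucket)  # ingests only the 1 row in the new file
```

Because work is proportional to the new files rather than to the whole bucket, refreshes stay cheap as history accumulates, which is the scalability benefit listed on the slide.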
  17. MVs and STs can be chained together to create data (or ETL) pipelines

    • Sources: cloud storage (S3, ADLS, GCS) and message queues (Kafka, Pub/Sub, Kinesis, etc.)
    • Bronze layer: orders_raw and customers_raw (Streaming Tables)
    • Silver layer: orders and customers (Streaming Tables)
    • Gold layer: customer_orders (Materialized View)
    Enabled by Project Enzyme, these pipelines process data incrementally to unlock real-time streaming use cases. This is a big differentiator for Databricks.
  21. Fine-Grained Access Control: Querying Behavior

    • The analyzer is responsible for looking up table names in UC.
    • If a table lookup includes row/column policies, the analyzer modifies the query plan to add a filter and/or projection that implements the policy.
    • DML commands by users with MODIFY privileges are supported. Filters and masks are applied to the data that is read by UPDATE and DELETE statements and are not applied to data that is written (including INSERT).
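The DML rule in the last bullet has a consequence worth seeing concretely: a DELETE only affects rows the user can read, while an INSERT can write rows the user will never see. The sketch below models tables as Python lists; the `non_german` policy and the sample rows are hypothetical.

```python
from typing import Callable

Row = dict
RowFilter = Callable[[Row], bool]


def visible(table: list[Row], row_filter: RowFilter) -> list[Row]:
    """What a SELECT (or the read side of UPDATE/DELETE) sees."""
    return [r for r in table if row_filter(r)]


def delete_where(table: list[Row], row_filter: RowFilter, predicate: Callable[[Row], bool]) -> list[Row]:
    """DELETE only removes rows that pass the row filter AND match the predicate."""
    return [r for r in table if not (row_filter(r) and predicate(r))]


def insert(table: list[Row], new_rows: list[Row]) -> list[Row]:
    """INSERT is not filtered: written rows land as-is."""
    return table + new_rows


def non_german(row: Row) -> bool:
    """Hypothetical row filter: this user may only read non-German rows."""
    return row["country"] != "Germany"


sales = [{"id": 1, "country": "France"}, {"id": 2, "country": "Germany"}]
after_delete = delete_where(sales, non_german, lambda r: True)           # like "DELETE FROM sales_raw"
after_insert = insert(after_delete, [{"id": 3, "country": "Germany"}])  # written, but not readable
print(after_delete, visible(after_insert, non_german))
```

An unqualified delete leaves the hidden German row in place, and the newly inserted German row is stored even though the same user cannot read it back.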
  22. Fine-Grained Access Control: Querying Behavior (example)

    Query: SELECT user_id, email, country, product, total FROM sales_raw WHERE country != 'Germany'
    Resulting plan: Scan sales_raw table → Filter → Projection → Secure View Filter
    • Column mask on email: CASE WHEN is_account_group_member('auditors') THEN email ELSE 'REDACTED' END
    • Row filter: CASE WHEN is_account_group_member('managers') THEN TRUE ELSE total <= 1000000 END
    • Query filter: WHERE country != 'Germany'
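The effect of the rewritten plan can be reproduced in a few lines. The sketch below mirrors the slide's two policies as Python functions, with the caller's group memberships passed in to stand in for `is_account_group_member`; the sample rows are hypothetical and assumed to contain no NULLs, so SQL's three-valued logic is ignored.

```python
def email_mask(email: str, groups: set[str]) -> str:
    """CASE WHEN is_account_group_member('auditors') THEN email ELSE 'REDACTED' END"""
    return email if "auditors" in groups else "REDACTED"


def total_row_filter(row: dict, groups: set[str]) -> bool:
    """CASE WHEN is_account_group_member('managers') THEN TRUE ELSE total <= 1000000 END"""
    return True if "managers" in groups else row["total"] <= 1_000_000


def run_query(sales_raw: list[dict], groups: set[str]) -> list[dict]:
    """SELECT user_id, email, country, product, total FROM sales_raw WHERE country != 'Germany',
    after the analyzer has injected the row filter and the column mask."""
    out = []
    for row in sales_raw:                      # Scan sales_raw
        if not total_row_filter(row, groups):  # row-filter policy
            continue
        if row["country"] == "Germany":        # the user's own WHERE clause
            continue
        out.append({                           # projection with the column mask applied
            "user_id": row["user_id"],
            "email": email_mask(row["email"], groups),
            "country": row["country"],
            "product": row["product"],
            "total": row["total"],
        })
    return out


sales_raw = [
    {"user_id": 1, "email": "a@example.com", "country": "France", "product": "turbine", "total": 500_000},
    {"user_id": 2, "email": "b@example.com", "country": "France", "product": "blade", "total": 2_000_000},
    {"user_id": 3, "email": "c@example.com", "country": "Germany", "product": "gearbox", "total": 10_000},
]
analyst = run_query(sales_raw, set())                     # no special groups
manager = run_query(sales_raw, {"managers", "auditors"})  # sees large deals and real emails
```

The user writes the same query in both cases; only group membership changes the result, which is why the policy lives in the catalog rather than in each consumer's SQL.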
  24. Problem Statement - Wind Turbine Farm: Augmenting engineers with a comprehensive, context-aware Q&A app
  27. Unlock more value with AI: Open new revenue streams by sharing AI models

    Share AI models, notebooks & volumes: this supports diverse use cases, from data exploration to model deployment. Customers can derive quicker value with notebooks/dashboards and broader value with AI/ML and analytics.
    [Figure: Databricks Marketplace, through which a data provider shares AI models, notebooks, volumes, tables and views with a data consumer]