UMC_UtrechtXebia-Modular_Working_with_a_Knowledge-_and_Data_Platform_Building_Blocks_for_a_Flexible_Ecosystem.pdf

Marketing OGZ
September 29, 2025

Transcript

  1. The Universal Data Connector

    Modular Working with a Knowledge- and Data Platform: Building Blocks for a Flexible Ecosystem
    Teus Kappen, Chief Science Information Officer
    Julian de Ruiter, Field CTO - Data & AI
  2. THE DOCTOR WILL SEE YOU NOW

    HOW AI IS GOING TO CURE OUR SICK HEALTH CARE SYSTEM
  3. Healthcare is ready for A.I. and machine learning

    • Standardization: …the standardization of medical concepts dramatically eases communication between software…
    • Investments: …the global Artificial Intelligence (AI) in healthcare market is projected to grow from $13.82 billion in 2022 to $164.10 billion by 2029…
    • Complexity: …more complex neural networks and the ability to learn from high dimensional data allow models to learn and extract as much knowledge as possible from the various kinds of medical data…
    • Big data: …healthcare is uniquely primed for machine learning due to the exponential increase in the volume of patient data over the past two decades. Today, around 30% of the world's data is generated by the healthcare industry.
  4. Reality: a leaky AI pipeline

    van Royen FS, Moons KGM, Geersing G-J, et al. Developing, validating, updating and judging the impact of prognostic models for respiratory diseases. Eur Respir J 2022; 60: 2200250
  7. The number of AI models

    One new model every 20 minutes… is developed and reported
    Figure 3. Cumulative estimated number of regression and non-regression-based CPM development articles between 1950 and 2024.
  9. Actual Reality: a leaky AI pipeline

    van Royen FS, Moons KGM, Geersing G-J, et al. Developing, validating, updating and judging the impact of prognostic models for respiratory diseases. Eur Respir J 2022; 60: 2200250
  11. Why a central (cloud) data platform?

    • Breaking silos: Empower users with easier data discovery and central access to high-quality data.
    • Governance: Manage security, data access controls, and auditing from one central location.
    • Scalability: Handle growing data volumes and user demands, across a wide variety of workloads.
  12. What are the main challenges?

    • Low latency: Some use cases require low-latency data (< 15 minutes); daily loads are not enough.
    • Security: Patient data is highly sensitive and should only be accessed by the right people at the right time.
    • Maintainability: The platform should not be too complex for the UMCU to maintain in the long run.
  13. Implementation: a Databricks Lakehouse

    • Unified platform: unify data, analytics and AI workloads on a single platform, simplifying the overall architecture.
    • Support for batch + streaming: supports both batch and streaming use cases, scales well to larger workloads.
    • Security and governance: data access managed via Unity Catalog, with dynamic masking of sensitive columns.
  14. Platform core: a Databricks Lakehouse

    [Diagram labels: HIX; cold source, batch ingestion, cold path (batch); warm source, incremental ingestion, warm path (incremental); Lakehouse with Bronze layer and Gold layer; consumers]
  18. Challenge: making it faster

    Operational applications require lower data latencies (< 1s). However, the lakehouse was not designed for this:
    • Azure Data Factory does not support low-latency ingestion.
    • Databricks Delta is not ideal for sub-second workloads.
    • Complex transformations (e.g. joins) introduce even more latency.
  20. Let’s turn up the heat

    By adding an additional “hot” path:
    • Provides a subset of data with low latency (< 1s).
    • With support for simple data transformations.
    Built around event-streaming technologies:
    • Change-data-capture with Debezium.
    • Azure Event Hub for storing events.
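As a rough illustration of what change-data-capture on the hot path involves, the sketch below applies Debezium-style change events to an in-memory table. It assumes the standard Debezium envelope fields `op`, `before`, and `after`. The `id` key, the record contents, and the `apply_change` helper are hypothetical, and reading the events from Azure Event Hub is left out.

```python
from typing import Any

# Current state of one table, keyed by primary key (hypothetical key name "id").
State = dict[int, dict[str, Any]]


def apply_change(state: State, event: dict[str, Any]) -> None:
    """Apply one Debezium-style change event to the in-memory state."""
    op = event["op"]  # "c" = create, "u" = update, "d" = delete, "r" = snapshot read
    if op in ("c", "u", "r"):
        row = event["after"]
        state[row["id"]] = row
    elif op == "d":
        state.pop(event["before"]["id"], None)
    else:
        raise ValueError(f"unsupported operation: {op!r}")


# Hypothetical events, in the order they might arrive from the event stream.
events = [
    {"op": "c", "before": None, "after": {"id": 1, "temperature": 37.2}},
    {"op": "u", "before": {"id": 1, "temperature": 37.2},
     "after": {"id": 1, "temperature": 38.6}},
    {"op": "c", "before": None, "after": {"id": 2, "temperature": 36.9}},
    {"op": "d", "before": {"id": 2, "temperature": 36.9}, "after": None},
]

state: State = {}
for event in events:
    apply_change(state, event)
```

Because every event is applied as soon as it arrives, a consumer sees the latest row without waiting for a batch load. Only simple per-record transformations fit this pattern comfortably, which matches the scope described for the hot path.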
  21. What does this look like?

    [Diagram labels: hot source, streaming ingestion, hot path (streaming); warm source, incremental ingestion; cold source, batch ingestion; Lakehouse with Bronze and Gold; consumers]
  22. Each possible disease may have its own AI risk model…

    …but should be based on shared interpretations
  23. In the Lakehouse: SQL in DBT

    • We already use DBT to define data models in SQL + metadata.
    • Allows you to implement rules using SQL transformations.
  24. In the hot path: Hamilton

    • Open-source (Apache) general-purpose framework for calculating features.
    • Allows you to implement rules with just a few lines of Python.
    https://hamilton.apache.org
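To give a feel for rules written as a few lines of Python, here is a minimal sketch that imitates Hamilton's core idea using only the standard library: the function name is the feature it produces, and each parameter names an input or another feature. It deliberately does not call Hamilton's own API. The `compute` resolver, the `RULES` registry, and the feature names are hypothetical, and the rule itself borrows the temperature example that appears later in this deck.

```python
import inspect
from typing import Any, Callable, Optional


def has_high_temperature(temperature: float) -> bool:
    """Feature: is the measured temperature above the 38.3 threshold?"""
    return temperature > 38.3


def temperature_label(has_high_temperature: bool, diagnosis: str) -> Optional[str]:
    """Feature: label a high temperature based on the diagnosis."""
    if has_high_temperature and diagnosis == "infection":
        return "fever"
    if has_high_temperature and diagnosis == "heat stroke":
        return "hyperthermia"
    return None  # no label defined for any other case


# Registry of rules, keyed by the name of the feature each one produces.
RULES: dict[str, Callable[..., Any]] = {
    rule.__name__: rule for rule in (has_high_temperature, temperature_label)
}


def compute(target: str, inputs: dict[str, Any]) -> Any:
    """Resolve `target` by recursively computing the features it depends on."""
    if target in inputs:
        return inputs[target]
    rule = RULES[target]
    args = {name: compute(name, inputs) for name in inspect.signature(rule).parameters}
    return rule(**args)


label = compute("temperature_label", {"temperature": 38.9, "diagnosis": "infection"})
```

The point of this style is that each rule stays small and declares its own dependencies, so adding or changing one rule does not require touching the others.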
  25. In the hot path: Hamilton

    [Diagram labels: hot source, streaming ingestion, hot path (streaming), features/rules (Hamilton); Bronze, Gold; consumers]
  26. There are many rules. And they change… a lot

    Data and knowledge management by domain experts
  27. There are many, many interpretations

    UMC Utrecht:
    • 36,000 local healthcare protocols
    • These cover only 20% of all healthcare processes (estimated)
    • Each protocol: 1 to 700 decisions (estimated)
    • Each decision: many data interpretations
  32. Healthcare professionals…

    …should be responsible for the creation of interpretation rules
    …should be responsible for the maintenance of interpretation rules
    Because they are responsible for the interpretation (and we have 6,000 of them in our hospital)
    …but they can’t code
  33. Healthcare workers capture knowledge in data all the time

    • It is how they register data in an EMR
    • Local healthcare protocols contain many tables and flowcharts
    • Plenty of spreadsheets and databases for quality and research purposes
  35. Storing knowledge as data…

    …allows healthcare professionals to better administer knowledge, resulting in more efficient use of human resources
    …allows you to create a system of dependencies rather than a system of rules
  36. When somebody ran a marathon on a hot day

    IF T1 > 38.3 & D1 == "infection" THEN label_T1 = "fever"
    ELSE IF T1 > 38.3 & D1 == "heat stroke" THEN label_T1 = "hyperthermia"
  37. The code could also be…

    IF T1 > 38.3 & (D1 == "infection" | D1 == "atypical pneumonia" | D1 == "pneumonia" | D1 == "viral pneumonia" | D1 == "urinary tract infection" | D1 == "wound infection" | D1 == "tuberculosis" | D1 == "malaria" | D1 == "meningitis" | D1 == "cholecystitis" | D1 == "conjunctivitis" | D1 == "arthritis" | D1 == "appendicitis" | D1 == "pharyngitis" | D1 == "gingivitis" | D1 == "otitis media" | D1 == "otitis externa" | D1 == "endocarditis" | D1 == "myocarditis" | D1 == "hepatitis" | …) THEN label_T1 = "fever"
    ELSE IF T1 > 38.3 & D1 == "heat stroke" THEN label_T1 = "hyperthermia"
  38. Which infections are there?

    [Diagram, taxonomy: Infections → Appendicitis; Conjunctivitis (Viral); Pneumonia (Bacterial, Viral, TBC)]
    [Diagram, code list: Viral conjunctivitis = Infection; Bacterial Pneumonia = Infection; Viral Pneumonia = Infection; Tuberculosis = Infection]
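The difference between enumerating every infection in the rule and looking it up in a taxonomy can be sketched as follows. This is a minimal illustration, assuming the taxonomy is stored as a child-to-parent mapping. The `PARENT` table only mirrors the few concepts named on the slide, and `is_a` and `label_temperature` are hypothetical helper names.

```python
from typing import Optional

# Taxonomy stored as data: each concept points to its parent concept.
# The entries mirror the slide; a real taxonomy would be far larger.
PARENT: dict[str, str] = {
    "appendicitis": "infection",
    "conjunctivitis": "infection",
    "pneumonia": "infection",
    "viral conjunctivitis": "conjunctivitis",
    "bacterial pneumonia": "pneumonia",
    "viral pneumonia": "pneumonia",
    "tuberculosis": "pneumonia",
}


def is_a(concept: str, ancestor: str) -> bool:
    """Return True if `concept` equals `ancestor` or descends from it."""
    current: Optional[str] = concept
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)  # None once the top of the tree is reached
    return False


def label_temperature(t1: float, d1: str) -> Optional[str]:
    """The temperature rule, with the long list of diagnoses replaced by a lookup."""
    if t1 > 38.3 and is_a(d1, "infection"):
        return "fever"
    if t1 > 38.3 and is_a(d1, "heat stroke"):
        return "hyperthermia"
    return None  # no label defined for any other case
```

With this layout, adding a newly recognised infection means adding one row of data. The rule itself does not change, so the table could in principle be maintained by someone who does not write code.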
  40. But before diagnosis there is no data

    [Diagram repeated: taxonomy of infections and code list]
  41. Ontology

    [Diagram labels: Temperature (Low, High (>38.3)); Hyperthermia, Fever; Cause (Internal, External); Infection, Heat Stroke, Fire, Weather; Pneumonia, Appendicitis; Symptoms (Coughing, Abdominal Pain)]
  42. The system doesn’t know: fever or hyperthermia

    [Same ontology diagram, with question marks at Hyperthermia and Fever and one "data" marker]
  43. The professional probably knows whether it’s hyperthermia

    [Same ontology diagram]
  44. Or that it is indeed a fever…

    [Same ontology diagram]
  45. With the right knowledge the system can infer from the data…

    [Same ontology diagram, with two "data" markers]
  46. …in any scenario

    [Same ontology diagram, with two "data" markers]
  47. …from any starting point

    [Same ontology diagram, with markers "referral data: suspected pneumonia" and "data"]
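One way to picture inference "from any starting point" is a small lookup over relations stored as data. The sketch below is an assumption-laden illustration, not the platform's implementation: the three relation tables are one hypothetical reading of the ontology diagram, `infer_label` is an invented helper, and none of it is a validated clinical model.

```python
from typing import Optional

# Ontology stored as data (hypothetical reading of the diagram).
CAUSE_OF: dict[str, str] = {          # condition or circumstance -> cause
    "pneumonia": "infection",
    "appendicitis": "infection",
    "fire": "heat stroke",
    "weather": "heat stroke",
}
SUGGESTS: dict[str, str] = {          # symptom -> condition it points to
    "coughing": "pneumonia",
    "abdominal pain": "appendicitis",
}
LABEL_FOR_CAUSE: dict[str, str] = {   # cause of a high temperature -> label
    "infection": "fever",             # internal cause
    "heat stroke": "hyperthermia",    # external cause
}
HIGH_TEMPERATURE = 38.3


def infer_label(temperature: float, observations: set[str]) -> Optional[str]:
    """Infer 'fever' or 'hyperthermia' from whatever observations are available."""
    if temperature <= HIGH_TEMPERATURE:
        return None
    labels: set[str] = set()
    for observation in observations:
        condition = SUGGESTS.get(observation, observation)  # symptom -> condition
        cause = CAUSE_OF.get(condition, condition)          # condition -> cause
        if cause in LABEL_FOR_CAUSE:
            labels.add(LABEL_FOR_CAUSE[cause])
    # Only answer when the evidence points one way; otherwise leave it open.
    return labels.pop() if len(labels) == 1 else None
```

The same function accepts a symptom such as "coughing", a referral reason such as "pneumonia", or a circumstance such as "weather", because each starting point is followed along the stored relations until it reaches a cause. With no usable observation, or with conflicting ones, it returns `None`, which corresponds to the situation where the system cannot tell fever from hyperthermia.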
  48. Storing knowledge as data…

    …allows healthcare professionals to better administer knowledge, resulting in more efficient use of human resources
    …allows you to create a system of dependencies rather than a system of rules, resulting in adaptive care pathways by handling complexity
  49. The combined data + knowledge platform

    [Diagram labels: hot source, streaming ingestion, hot path; warm source, incremental ingestion; cold source, batch ingestion; Lakehouse; knowledge platform; consumers]
  50. A shared patient view: for both AI models and apps

    [Diagram labels: hot source, streaming ingestion, hot path; warm source, incremental ingestion; cold source, batch ingestion; Lakehouse; knowledge platform; shared interpretations; consumers: AI model #1, AI model #..., Application #1, Application #...]
  51. Conclusion

    A combined data and knowledge platform as a Universal Data Connector
    Two large requirements:
    • store knowledge as data in an ontological form
    • add knowledge to data in near real time (hot path)
    Optimal use of technological and domain expertise
    A flexible ecosystem that allows adaptive collaborations between modules
  52. We’re (both) hiring!

    Are you an experienced Platform Engineer who wants to make a social impact? The UMCU is still looking for skilled Platform Engineer(s) to strengthen the team and help with the build, architecture, and setup of our Cloud Data Platform! Interested? Get in touch or stop by for a chat at the Xebia booth! Contact: [email protected]