Slide 1

Slide 1 text

Role of {Active} Metadata {Lake} in any successful data architecture
Big Data Expo {Utrecht}
14th September 2022

Slide 2

Slide 2 text

Agenda
1. What is Metadata
2. Types of Metadata
3. What does Metadata {Lake} mean
4. What is {Active} metadata
5. Metadata use case

Slide 3

Slide 3 text

What is metadata

Slide 4

Slide 4 text

Raw data without metadata

32423    110    30     AF
65433    70     25     AL
45645    50     20     DZ
98777    67     21     EG
NULL     200    50     N/A

Slide 5

Slide 5 text

Raw data with metadata

Employee #    Salary    Age    Nationality
32423         110       30     AF
65433         70        25     AL
45645         50        150    DZ
98777         -28       21     EG
NULL          200       50     N/A

Salary: salary of the employee in thousands
Age: age of the employee in years
Nationality: ISO 3166-1 alpha-2 country code

Data Quality, Data Gaps, Data Mapping, Data Transformation
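The checks that these column descriptions make possible can be sketched in Python. This is a minimal illustration and not part of the original deck: the `COLUMN_RULES` dictionary, the age bound of 0 to 120 and the small set of country codes are assumptions chosen for the example.

```python
# Rows from the slide, in column order: employee #, salary, age, nationality.
ROWS = [
    ("32423", 110, 30, "AF"),
    ("65433", 70, 25, "AL"),
    ("45645", 50, 150, "DZ"),
    ("98777", -28, 21, "EG"),
    (None, 200, 50, "N/A"),
]

# Hypothetical metadata: one validity rule per column.
# The age range and the code subset are assumptions for this sketch.
KNOWN_ALPHA2 = {"AF", "AL", "DZ", "EG"}
COLUMN_RULES = {
    "employee_id": lambda v: v is not None,                   # data gap
    "salary_k": lambda v: v is not None and v >= 0,           # thousands
    "age_years": lambda v: v is not None and 0 <= v <= 120,   # years
    "nationality": lambda v: v in KNOWN_ALPHA2,               # alpha-2 code
}


def find_issues(rows):
    """Return (row_index, column_name, value) for every rule violation."""
    issues = []
    for index, row in enumerate(rows):
        for (name, rule), value in zip(COLUMN_RULES.items(), row):
            if not rule(value):
                issues.append((index, name, value))
    return issues


if __name__ == "__main__":
    for issue in find_issues(ROWS):
        print(issue)
```

Without the column meanings, a salary of -28 or an age of 150 is just a number; with them, both rows can be flagged automatically.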

Slide 6

Slide 6 text

Metadata is the key to unlock value from data

Slide 7

Slide 7 text

Types of metadata

Slide 8

Slide 8 text

Business metadata is a category of metadata that describes all aspects used for governance, finding & understanding data.

Dataset business details
➔ What is this dataset about?
➔ What is the formal description of the dataset?
➔ Where in the organization does it belong?
➔ Who is using this dataset?
➔ Where is it stored?

Data ownership
➔ Who is the data owner?
➔ Who is the data steward?
➔ To which department do the owner and the steward belong?
➔ What is the data quality of this dataset?
➔ Are there any data gaps in the dataset?

Slide 9

Slide 9 text

Technical metadata is a category of metadata that describes the structural aspects of data at design time. It includes database system names, table and column names, sizes, data types and allowed values.

System details
➔ What is the name of the system that contains this dataset?
➔ What kind of technology is being used?
➔ Who is the system application owner who knows the details of the system?

Dataset structure details
➔ What kind of database is it?
➔ What are the table and column names?
➔ What are the column data types?
➔ What are the allowed values of each column?
➔ What is the description of each column?
➔ Which conditions are associated with this table, view or stored procedure?
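As an illustration of harvesting technical metadata, the sketch below reads table names, column names, declared types and nullability from a SQLite database using Python's standard `sqlite3` module. SQLite is only a convenient stand-in here; the `employee` table and the function name `harvest_technical_metadata` are assumptions made for the example.

```python
import sqlite3


def harvest_technical_metadata(conn):
    """Collect table names, column names, declared types and nullability."""
    metadata = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields: cid, name, type, notnull, dflt_value, pk
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        metadata[table] = [
            {"column": name, "type": col_type, "nullable": not notnull}
            for _cid, name, col_type, notnull, _default, _pk in columns
        ]
    return metadata


# Hypothetical example table, created in memory for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "employee_id INTEGER NOT NULL, salary_k INTEGER, "
    "age INTEGER, nationality TEXT)"
)
catalog = harvest_technical_metadata(conn)
```

Other database systems expose the same kind of information through their own catalogs, so a real connector would swap the two queries and keep the output shape.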

Slide 10

Slide 10 text

Operational metadata is a category of metadata that describes the processing aspects of data at run time. It includes information about the operational side of a system, for example when a file was received, when it was processed and how much time it took to distribute it.

Operational details
➔ How long does it take to process one file?
➔ How many resources did it take to run this process?
➔ Who triggered this process and when?
➔ How was the system utilization before and after running this process?
➔ How many resource units did it take to run a use case end to end on a daily basis?
➔ Are we utilizing the hardware that we own or rent?
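One way to capture this kind of run-time information is to wrap each processing step so that it records who ran it, when it started and how long it took. The decorator below is a minimal sketch; `RUN_LOG`, `record_run` and `process_file` are hypothetical names, and reading the user from the `USER` environment variable is an assumption that will not hold on every platform.

```python
import functools
import os
import time
from datetime import datetime, timezone

RUN_LOG = []  # stand-in for an operational metadata store


def record_run(func):
    """Record who ran a process, when it started and how long it took."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        started_at = datetime.now(timezone.utc)
        t0 = time.perf_counter()
        status = "failed"
        try:
            result = func(*args, **kwargs)
            status = "succeeded"
            return result
        finally:
            # Runs on success and on error, so failed runs are logged too.
            RUN_LOG.append({
                "process": func.__name__,
                "triggered_by": os.environ.get("USER", "unknown"),
                "started_at": started_at.isoformat(),
                "duration_s": time.perf_counter() - t0,
                "status": status,
            })

    return wrapper


@record_run
def process_file(lines):
    """Toy processing step: count the non-empty lines of a file."""
    return sum(1 for line in lines if line.strip())


non_empty = process_file(["a", "", "b"])
```

Each call appends one record to `RUN_LOG`, which is the raw material for questions such as how long one file takes to process.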

Slide 11

Slide 11 text

Social metadata is a category of metadata that describes the user perspective of the data by its consumers: social media, tags, labels, notes, user experience.

User details
➔ Who is logging in to the system?
➔ Which departments are the users coming from?
➔ What kind of activities does this user usually do in my system?
➔ How can I improve the user experience based on the metadata that we collect about him/her?

Slide 12

Slide 12 text

Metadata Lake
To be a metadata-driven company, all types of metadata need to be collected, stored, aggregated and analyzed in one central place to benefit from the potential of metadata.

Business metadata
Technical metadata
Operational metadata
Social metadata

Slide 13

Slide 13 text

Metadata {Lake}

Slide 14

Slide 14 text

Metadata Lake
A lake for metadata!

Metadata Lake is an architecture proposal that enables capturing and managing the origins, movement, characteristics and transformations of data as it moves through the organisation across various systems, processes and people, in order to support data lineage needs and some data governance reporting.

Slide 15

Slide 15 text

Store: all types of metadata (business, technical, operational and social) coming from all your company systems that are needed for lineage and all the supported use cases.
Combine: the metadata when needed to create a complete data set without creating a new dataset.
Integrate: the connected metadata parts to show lineage across systems.
Generate: new insights derived from metadata to support operational excellence and regulatory requirements and to show compliance.
Profile: the harvested metadata to check its quality.
Verify: whether the harvested metadata is correct, accurate and delivered according to the SLA.
Process: all kinds of different metadata in the lake for future use.
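The store, combine and integrate capabilities can be sketched with a small in-memory class. This is an illustration under stated assumptions, not the architecture from the talk: the `MetadataLake` class, its method names and the asset names in the usage lines are all hypothetical.

```python
from collections import defaultdict


class MetadataLake:
    """Toy in-memory lake: metadata per asset plus lineage links."""

    def __init__(self):
        self.records = defaultdict(dict)  # asset -> {metadata_type: attrs}
        self.upstream = defaultdict(set)  # asset -> direct source assets

    def store(self, asset, metadata_type, attrs):
        """Store one kind of metadata (business, technical, ...)."""
        self.records[asset].setdefault(metadata_type, {}).update(attrs)

    def link(self, source, target):
        """Register that `target` is derived from `source`."""
        self.upstream[target].add(source)

    def combine(self, asset):
        """Merged view over all metadata types, built on request."""
        stored = self.records.get(asset, {})
        return {kind: dict(attrs) for kind, attrs in stored.items()}

    def lineage(self, asset):
        """All upstream assets across systems, found by depth-first search."""
        seen, stack = set(), [asset]
        while stack:
            for source in self.upstream.get(stack.pop(), ()):
                if source not in seen:
                    seen.add(source)
                    stack.append(source)
        return seen


lake = MetadataLake()
lake.store("crm.customers", "business", {"owner": "Sales"})
lake.store("crm.customers", "technical", {"columns": ["id", "email"]})
lake.link("crm.customers", "dwh.dim_customer")
lake.link("dwh.dim_customer", "bi.churn_report")
```

`combine` returns a view assembled at query time, which mirrors the idea of creating a complete picture without materialising a new dataset.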

Slide 16

Slide 16 text

Metadata Lake reference architecture (diagram). Labels in the figure:
Actors: Data Providers, Data User, Data Consumers, Data Owner/Steward
Self-service: DIAL onboarding and approving requests; requesting data
Metadata Lake: NoSQL, SQL
Metadata Connectors: ready made, custom (harvesting metadata)
Sources: System 1 to System 5 and further, Sub-system 1 to Sub-system 12
Data Marketplace
Use cases: Use Case 1, Use Case 2, Use Case 3
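The connector layer in this architecture can be pictured as a common interface that every source system implements, whether the connector is ready made or custom. The sketch below is an assumption about how such an interface might look; `MetadataConnector`, `StaticConnector` and `harvest_all` are hypothetical names, and the list returned by `harvest_all` merely stands in for the lake.

```python
from typing import Iterable, Protocol


class MetadataConnector(Protocol):
    """Interface every connector is assumed to implement."""

    system: str

    def harvest(self) -> Iterable[dict]:
        ...


class StaticConnector:
    """Custom connector that returns records supplied up front."""

    def __init__(self, system, records):
        self.system = system
        self._records = list(records)

    def harvest(self):
        for record in self._records:
            # Tag each record with the system it was harvested from.
            yield {"system": self.system, **record}


def harvest_all(connectors):
    """Pull metadata from every connector into one collection."""
    harvested = []
    for connector in connectors:
        harvested.extend(connector.harvest())
    return harvested


connectors = [
    StaticConnector("System 1", [{"asset": "orders", "type": "table"}]),
    StaticConnector("System 2", [{"asset": "sales", "type": "report"}]),
]
harvested = harvest_all(connectors)
```

Keeping the interface this small means a new source system only needs one `harvest` method to join the lake.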

Slide 17

Slide 17 text

{Active} Metadata

Slide 18

Slide 18 text

Active Metadata
Active metadata means utilizing the harvested metadata and putting it into action in your day-to-day data activities. To be a truly metadata-driven organization, you need to move from passive to active metadata.

What if?
What happens when you apply machine learning (ML) to metadata so that it can be used to make decisions and trigger actions?

Value
Active metadata needs to be insightful to be useful for action, and it needs to be stored and made available in a way that enables operational use. Metadata management platforms apply ML to create insight about the metadata and take action on it. The action can occur by triggering a workflow, or the platform may take some action automatically.

Don't be passive
Passive metadata is the standard way of aggregating and storing metadata into a static data catalog. This usually covers basic technical metadata: schemas, data types, models, etc.
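The step from passive to active metadata can be illustrated with a tiny engine that evaluates incoming metadata records and triggers an action when a condition holds. Note the substitution: the slide talks about ML-derived insight, while this sketch uses a fixed threshold rule in its place to stay self-contained. `ActiveMetadataEngine`, the `null_ratio` field and the 0.2 threshold are assumptions for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ActiveMetadataEngine:
    """Evaluates metadata records against rules and triggers actions."""

    rules: list = field(default_factory=list)

    def on(self, condition, action):
        """Register an action to run whenever `condition(record)` is true."""
        self.rules.append((condition, action))

    def ingest(self, record):
        """Check one metadata record; return the results of fired actions."""
        fired = []
        for condition, action in self.rules:
            if condition(record):
                fired.append(action(record))
        return fired


engine = ActiveMetadataEngine()
engine.on(
    # A simple threshold stands in for an ML-based quality score.
    condition=lambda r: r.get("null_ratio", 0.0) > 0.2,
    action=lambda r: f"notify steward: {r['dataset']} has too many gaps",
)

alerts = engine.ingest({"dataset": "employees", "null_ratio": 0.35})
quiet = engine.ingest({"dataset": "orders", "null_ratio": 0.01})
```

In a passive catalog the `null_ratio` would only be stored; here the same value starts a workflow.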

Slide 19

Slide 19 text

Active metadata use cases

Active Metadata for Proactive Data Quality
Systems that can analyze data and evaluate data quality may include data quality information as active metadata.

Active Metadata for Improved Data Context
In most instances, data does not come in perfectly labeled and defined columns. Sometimes column names are so obscure that they don't look or sound anything like the data in the underlying column or data asset.

Active Metadata for Privacy and Regulation Compliance
Metadata management platforms that can identify personal and sensitive information generate active metadata that may be used for regulation compliance.
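For the privacy use case, a very simple form of identifying personal information is matching column names against known patterns and storing the result as tags. The sketch below does only that; the patterns in `SENSITIVE_PATTERNS` are assumptions for illustration, and name matching alone would miss obscurely named columns, which is why real platforms also inspect the data itself.

```python
import re

# Hypothetical name patterns; a real platform would also profile the values.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"e[-_]?mail", re.IGNORECASE),
    "phone": re.compile(r"phone|mobile", re.IGNORECASE),
    "birth_date": re.compile(r"birth|dob", re.IGNORECASE),
}


def tag_sensitive_columns(columns):
    """Return {column_name: [labels]} for columns that look sensitive."""
    tags = {}
    for column in columns:
        labels = [
            label
            for label, pattern in SENSITIVE_PATTERNS.items()
            if pattern.search(column)
        ]
        if labels:
            tags[column] = labels
    return tags


tags = tag_sensitive_columns(["cust_email", "order_total", "DOB", "mobile_nr"])
```

The resulting tags are themselves metadata, and they become active once something such as an access policy or a masking job reacts to them.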

Slide 20

Slide 20 text

{Active} Metadata {Lake} Use case

Slide 21

Slide 21 text

Analysing PowerBI metadata | use case

Step 1: Connect to PowerBI metadata
Step 2: Extract the metadata and reverse engineer it to understand the relation between all the data points
Step 3: Store the harvested metadata into the metadata lake (MDL)
Step 4: Apply basic analytics to bring value out of PowerBI metadata
Step 5: Connect the dots and link the technical metadata to the other types of metadata
Step 6: Act, respond and improve the organization based on the power of metadata

Diagram labels: PowerBI, Scanner API, MDL
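Steps 2 to 4 can be sketched on a small sample. The dictionary below is a simplified, hypothetical stand-in for a scan result: its field names are modeled loosely on what the Power BI Scanner API returns and should be treated as assumptions, and no API call is made. The function names are also invented for the example.

```python
from collections import Counter

# Simplified, hypothetical scan result; field names are assumptions.
SCAN_RESULT = {
    "workspaces": [
        {
            "name": "Finance",
            "datasets": [
                {"id": "ds-1", "name": "Ledger"},
                {"id": "ds-2", "name": "Budget"},
            ],
            "reports": [
                {"id": "r-1", "name": "P&L", "datasetId": "ds-1"},
                {"id": "r-2", "name": "Cash flow", "datasetId": "ds-1"},
            ],
        }
    ]
}


def link_reports_to_datasets(scan):
    """Step 2: reconstruct which report reads from which dataset."""
    edges = []
    for workspace in scan.get("workspaces", []):
        names = {d["id"]: d["name"] for d in workspace.get("datasets", [])}
        for report in workspace.get("reports", []):
            edges.append({
                "workspace": workspace["name"],
                "report": report["name"],
                "dataset": names.get(report.get("datasetId")),
            })
    return edges


def reports_per_dataset(edges):
    """Step 4: basic analytics, counting reports built on each dataset."""
    return Counter(edge["dataset"] for edge in edges)


def unused_datasets(scan, edges):
    """Step 4: datasets that no report refers to."""
    used = {edge["dataset"] for edge in edges}
    return [
        dataset["name"]
        for workspace in scan.get("workspaces", [])
        for dataset in workspace.get("datasets", [])
        if dataset["name"] not in used
    ]


edges = link_reports_to_datasets(SCAN_RESULT)  # step 3 would persist these
```

The `edges` list is the piece that would be stored in the metadata lake and later linked to business, operational and social metadata in step 5.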

Slide 22

Slide 22 text

Use case outcome | KPIs

Data usage
Data security

Top 10 dashboards
Unused dashboards
Usage per department
User demographics
Top used datasets
Worst performing dashboards
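Several of these KPIs reduce to simple counting once usage events are available. The sketch below computes top dashboards, unused dashboards and usage per department; the `DASHBOARDS` list and the `VIEWS` events are invented sample data, not results from the talk.

```python
from collections import Counter

# Invented sample data: known dashboards and (dashboard, user, department) views.
DASHBOARDS = ["Sales", "Churn", "Inventory"]
VIEWS = [
    ("Sales", "ann", "Finance"),
    ("Sales", "bob", "Marketing"),
    ("Churn", "ann", "Finance"),
]


def usage_kpis(dashboards, views, top_n=10):
    """Compute usage KPIs from a list of view events."""
    per_dashboard = Counter(dashboard for dashboard, _user, _dept in views)
    per_department = Counter(dept for _dashboard, _user, dept in views)
    return {
        "top_dashboards": per_dashboard.most_common(top_n),
        "unused_dashboards": [d for d in dashboards if per_dashboard[d] == 0],
        "usage_per_department": dict(per_department),
    }


kpis = usage_kpis(DASHBOARDS, VIEWS)
```

Unused dashboards only show up because the list of all dashboards (technical metadata) is combined with the view events (social and operational metadata).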

Slide 23

Slide 23 text

Metadata enables Data Mesh Architecture
Detailed article about Active Metadata Lake
Mahmoud LinkedIn

Let's stay connected
Mahmoud Yassin
Senior Data Manager @Booking.com

Slide 24

Slide 24 text

Thank you