Role of {Active} Metadata {Lake} in any successful data architecture

Marketing OGZ
September 20, 2022

Transcript

  1. Role of {Active} Metadata {Lake} in any successful data architecture

    Big Data Expo {Utrecht} 14th September 2022
  2. Agenda

    1. What is metadata?
    2. Types of metadata
    3. What does Metadata {Lake} mean?
    4. What is {Active} metadata?
    5. Metadata use case
  3. Raw data without metadata

    32423 | 110 | 30 | AF
    65433 | 70  | 25 | AL
    45645 | 50  | 20 | DZ
    98777 | 67  | 21 | EG
    NULL  | 200 | 50 | N/A
  4. Raw data with metadata

    Employee # | Salary | Age | Nationality
    32423      | 110    | 30  | AF
    65433      | 70     | 25  | AL
    45645      | 50     | 150 | DZ
    98777      | -28    | 21  | EG
    NULL       | 200    | 50  | N/A

    Salary: salary of the employee, in thousands. Age: age of the employee, in years. Nationality: ISO 3166-1 alpha-2 country code standard.
    Callouts: Data quality, Data gaps, Data mapping, Data transformation.
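To make the slide concrete, here is a minimal Python sketch, assuming the metadata is kept as a plain dictionary per column. The column keys (`employee_id`, `salary`, ...), the allowed age range of 16 to 100, the salary multiplier, and the helper names are hypothetical. Only the descriptions, units, and the ISO alpha-2 convention come from the slide. The same metadata drives all four callouts: gap detection, range checks (data quality), code lookup (data mapping), and unit conversion (data transformation).

```python
# Hypothetical column metadata for the employee table on this slide.
# Descriptions and units come from the slide; ranges and the multiplier are assumptions.
COLUMN_METADATA = {
    "employee_id": {"description": "Employee number"},
    "salary": {"description": "Salary of employee", "unit": "thousands", "multiplier": 1000, "min": 0},
    "age": {"description": "Age of employee", "unit": "years", "min": 16, "max": 100},
    "nationality": {"description": "Nationality", "standard": "ISO 3166-1 alpha-2"},
}

# Illustrative subset of the ISO 3166-1 alpha-2 code list (data mapping).
ISO_ALPHA2 = {"AF": "Afghanistan", "AL": "Albania", "DZ": "Algeria", "EG": "Egypt"}

# Placeholder tokens treated as missing values (data gaps).
MISSING = {"", "NULL", "N/A"}

COLUMNS = list(COLUMN_METADATA)
ROWS = [
    ("32423", "110", "30", "AF"),
    ("65433", "70", "25", "AL"),
    ("45645", "50", "150", "DZ"),
    ("98777", "-28", "21", "EG"),
    ("NULL", "200", "50", "N/A"),
]


def check_row(row):
    """Return (column, issue) findings for one raw row, driven only by the metadata."""
    findings = []
    for name, raw in zip(COLUMNS, row):
        meta = COLUMN_METADATA[name]
        if raw in MISSING:
            findings.append((name, "data gap"))
            continue
        if "min" in meta or "max" in meta:
            value = float(raw)
            if not meta.get("min", float("-inf")) <= value <= meta.get("max", float("inf")):
                findings.append((name, "data quality: out of range"))
        if meta.get("standard") == "ISO 3166-1 alpha-2" and raw not in ISO_ALPHA2:
            findings.append((name, "data mapping: unknown code"))
    return findings


def to_base_units(name, raw):
    """Data transformation: apply the column's unit multiplier (e.g. thousands)."""
    return float(raw) * COLUMN_METADATA[name].get("multiplier", 1)


def country_name(code):
    """Data mapping: resolve an ISO alpha-2 code to a country name (None if unknown)."""
    return ISO_ALPHA2.get(code)


for row in ROWS:
    print(row, check_row(row))
```

Without the metadata, neither the age of 150 nor the salary of -28 could be flagged, because nothing would say what those numbers mean.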
  5. Business metadata

    Business metadata is a category of metadata that describes all aspects used for governance, finding, and understanding data.

    Dataset business details
    ➔ What is this dataset about?
    ➔ What is the formal description of the dataset?
    ➔ Where in the organization does it belong?
    ➔ Who is using this dataset?
    ➔ Where is it stored?

    Data ownership
    ➔ Who is the data owner?
    ➔ Who is the data steward?
    ➔ Which department do the owner and the steward belong to?
    ➔ What is the data quality of this dataset?
    ➔ Are there any data gaps in the dataset?
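One way to capture the answers to these questions is a typed record per dataset. The sketch below is a hypothetical Python dataclass. The field names, `REQUIRED_FIELDS`, and the `missing_governance_fields` helper are illustrative and do not belong to any specific catalog product.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical choice of fields that must be filled in before a dataset counts as governed.
REQUIRED_FIELDS = ("description", "domain", "owner", "steward", "department")


@dataclass
class BusinessMetadata:
    """Hypothetical record holding answers to the business-metadata questions."""
    dataset: str
    description: str = ""       # what is this dataset about / formal description
    domain: str = ""            # where in the organization it belongs
    storage_location: str = ""  # where it is stored
    owner: str = ""             # data owner
    steward: str = ""           # data steward
    department: str = ""        # department of the owner and steward
    users: list[str] = field(default_factory=list)       # who is using it
    quality_score: float | None = None                   # e.g. share of rows passing checks
    known_gaps: list[str] = field(default_factory=list)  # documented data gaps

    def missing_governance_fields(self) -> list[str]:
        """Required fields that are still empty, e.g. for a catalog completeness report."""
        return [name for name in REQUIRED_FIELDS if not getattr(self, name)]


employees = BusinessMetadata(
    dataset="employees",
    description="One row per employee with salary, age and nationality",
    domain="HR",
    storage_location="hr_db.employees",
    steward="HR data steward",
    department="HR",
)
print(employees.missing_governance_fields())  # ['owner']
```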
  6. Technical metadata

    Technical metadata is a category of metadata that describes the structural aspects of data at design time. It includes database system names, table and column names, sizes, data types, and allowed values.

    System details
    ➔ What is the name of the system that contains this dataset?
    ➔ What kind of technology is being used?
    ➔ Who is the system/application owner who knows the details of the system?

    Dataset structure details
    ➔ What kind of database is it?
    ➔ What are the table and column names?
    ➔ What are the column data types?
    ➔ What are the allowed values for each column?
    ➔ What is the description of each column?
    ➔ Which conditions are associated with this table, view, or stored procedure?
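Much technical metadata can be harvested directly from the database itself. As a sketch under stated assumptions, the example below uses SQLite (through Python's built-in `sqlite3` module and `PRAGMA table_info`) to collect table names, column names, declared types, nullability, defaults, and the `CREATE` statement, whose `CHECK` clauses hint at allowed values. The record layout and the function name are hypothetical. Other engines expose the same information through their own catalogs (for example `information_schema`), which this sketch does not cover.

```python
import sqlite3


def harvest_technical_metadata(conn: sqlite3.Connection) -> list[dict]:
    """Collect table/column names, declared types, nullability and DDL from SQLite."""
    records = []
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for table, ddl in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
        for _cid, column, col_type, notnull, default, pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            records.append({
                "system": "sqlite",
                "table": table,
                "column": column,
                "data_type": col_type,
                "nullable": not notnull,
                "default": default,
                "primary_key": bool(pk),
                "ddl": ddl,  # the CREATE statement; CHECK constraints hint at allowed values
            })
    return records


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    " employee_id INTEGER PRIMARY KEY,"
    " salary REAL,"
    " age INTEGER CHECK (age BETWEEN 16 AND 100),"
    " nationality TEXT NOT NULL)"
)
for record in harvest_technical_metadata(conn):
    print(record["table"], record["column"], record["data_type"], record["nullable"])
```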
  7. Operational metadata

    Operational metadata is a category of metadata that describes the processing aspects of data at run time. It includes information about the operational side of a system, for example when a file was received, when it was processed, and how much time it took to distribute it.

    Operational details
    ➔ How long does it take to process one file?
    ➔ How many resources did it take to run this process?
    ➔ Who triggered this process, and when?
    ➔ What was the system utilization before and after running this process?
    ➔ How many cloud resource units does it take to run a use case end to end on a daily basis?
    ➔ Are we utilizing the hardware that we own or rent?
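Operational metadata is usually captured around the job itself. Here is a minimal Python sketch, assuming a hypothetical in-memory `OPERATIONAL_LOG` as the sink. A context manager records who triggered a run, when it started, wall-clock and CPU time, and whether it succeeded. The function and field names are illustrative. System-wide utilization before and after the run would need an external monitoring source and is not shown.

```python
from __future__ import annotations

import time
from contextlib import contextmanager
from datetime import datetime, timezone

# Hypothetical sink; in the architecture described here these records would go to the metadata lake.
OPERATIONAL_LOG: list[dict] = []


@contextmanager
def track_run(process: str, triggered_by: str = "unknown"):
    """Capture operational metadata for one run: who, when, how long, CPU used, outcome."""
    record = {
        "process": process,
        "triggered_by": triggered_by,
        "started_at": datetime.now(timezone.utc).isoformat(),
        "status": "running",
    }
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    try:
        yield record  # the wrapped job can add its own fields, e.g. files processed
        record["status"] = "succeeded"
    except Exception:
        record["status"] = "failed"
        raise
    finally:
        record["duration_s"] = time.perf_counter() - wall_start
        record["cpu_s"] = time.process_time() - cpu_start
        OPERATIONAL_LOG.append(record)


with track_run("load_employee_file", triggered_by="nightly-scheduler") as run:
    run["files_processed"] = 1
    total = sum(range(1_000_000))  # stand-in for real processing work

print(OPERATIONAL_LOG[-1])
```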
  8. Social metadata

    Social metadata is a category of metadata that describes the user perspective on the data, as seen by its consumers: social media, tags, labels, and notes.

    User experience: user details
    ➔ Who is logging in to the system?
    ➔ Which departments do the users come from?
    ➔ What kinds of activities does this user usually perform in my system?
    ➔ How can I improve the user experience based on the metadata that we collect about them?
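Answering the user-detail questions is mostly aggregation over access events. The events below are invented sample data, and the `(user, department, activity)` shape and function names are assumptions. A real source might be a platform's audit or activity log.

```python
from collections import Counter, defaultdict

# Hypothetical access-log events: (user, department, activity).
EVENTS = [
    ("ana", "Finance", "view_dashboard"),
    ("ana", "Finance", "export_csv"),
    ("ana", "Finance", "view_dashboard"),
    ("ben", "Marketing", "search"),
    ("ben", "Marketing", "view_dashboard"),
    ("chen", "Finance", "add_tag"),
]


def active_users(events):
    """Who is logging in to the system."""
    return sorted({user for user, _, _ in events})


def users_per_department(events):
    """Which departments the users come from, counted as distinct users."""
    departments = defaultdict(set)
    for user, department, _ in events:
        departments[department].add(user)
    return {department: len(users) for department, users in departments.items()}


def typical_activities(events, user, top_n=2):
    """What this user usually does, most frequent activities first."""
    counts = Counter(activity for u, _, activity in events if u == user)
    return counts.most_common(top_n)


print(users_per_department(EVENTS))       # {'Finance': 2, 'Marketing': 1}
print(typical_activities(EVENTS, "ana"))  # [('view_dashboard', 2), ('export_csv', 1)]
```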
  9. Metadata Lake

    To be a metadata-driven company, all types of metadata (business, technical, operational, and social) need to be collected, stored, aggregated, and analyzed in one central place to realize the full potential of metadata.
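To illustrate the "one central place" idea, here is a toy in-memory store in Python. `MetadataLake`, `MetadataRecord`, and the asset naming are hypothetical; a real metadata lake would sit on persistent SQL/NoSQL storage. The sketch collects metadata of all four types per asset, aggregates it into a single view, and reports which types are still missing.

```python
from collections import defaultdict
from dataclasses import dataclass, field

METADATA_TYPES = ("business", "technical", "operational", "social")


@dataclass
class MetadataRecord:
    asset: str            # hypothetical asset key, e.g. "hr_db.employees"
    kind: str             # one of METADATA_TYPES
    source_system: str
    attributes: dict = field(default_factory=dict)


class MetadataLake:
    """Toy in-memory stand-in for a central metadata store (not a real product API)."""

    def __init__(self):
        self._by_asset = defaultdict(list)

    def ingest(self, record: MetadataRecord) -> None:
        if record.kind not in METADATA_TYPES:
            raise ValueError(f"unknown metadata type: {record.kind}")
        self._by_asset[record.asset].append(record)

    def profile(self, asset: str) -> dict:
        """Aggregate every metadata type collected for one asset into a single view."""
        view = {kind: {} for kind in METADATA_TYPES}
        for record in self._by_asset.get(asset, []):
            view[record.kind].update(record.attributes)
        return view

    def missing_types(self, asset: str) -> list:
        """Metadata types not yet collected for this asset."""
        present = {record.kind for record in self._by_asset.get(asset, [])}
        return [kind for kind in METADATA_TYPES if kind not in present]


lake = MetadataLake()
lake.ingest(MetadataRecord("hr_db.employees", "business", "catalog", {"owner": "HR lead"}))
lake.ingest(MetadataRecord("hr_db.employees", "technical", "sqlite", {"columns": 4}))
print(lake.profile("hr_db.employees"))
print(lake.missing_types("hr_db.employees"))  # ['operational', 'social']
```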
  10. Metadata Lake

    Metadata Lake is a proposed architecture that enables capturing and managing the origins, movement, characteristics, and transformations of data as it moves through the organisation across various systems, processes, and people. It supports data lineage needs and some data governance reporting. A lake for metadata!
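Lineage is naturally a directed graph: each edge records that data moved from a source to a target through some process or transformation. The Python sketch below uses hypothetical asset names and a hypothetical `LineageGraph` class. It walks the graph upstream breadth-first to list every hop and the original sources of an asset. It is a simplification, since real lineage usually also tracks column-level edges and timestamps.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Directed lineage edges source -> target, each labeled with the process that moved the data."""

    def __init__(self):
        self._upstream = defaultdict(list)  # target -> [(source, process), ...]

    def add_edge(self, source: str, target: str, process: str) -> None:
        self._upstream[target].append((source, process))

    def trace_upstream(self, asset: str) -> list:
        """Breadth-first walk back from `asset`; returns (source, target, process) hops."""
        hops, seen, queue = [], {asset}, deque([asset])
        while queue:
            current = queue.popleft()
            for source, process in self._upstream.get(current, []):
                hops.append((source, current, process))
                if source not in seen:  # also protects against cycles
                    seen.add(source)
                    queue.append(source)
        return hops

    def origins(self, asset: str) -> list:
        """Upstream assets with no recorded inputs of their own, i.e. where the data originated."""
        sources = {source for source, _, _ in self.trace_upstream(asset)}
        return sorted(s for s in sources if s not in self._upstream)


lineage = LineageGraph()
lineage.add_edge("hr_system.employees", "staging.employees", "nightly extract")
lineage.add_edge("staging.employees", "dwh.dim_employee", "clean, map ISO codes")
lineage.add_edge("finance.payroll", "dwh.dim_employee", "join salary")
lineage.add_edge("dwh.dim_employee", "powerbi.hr_dashboard", "dataset refresh")

for hop in lineage.trace_upstream("powerbi.hr_dashboard"):
    print(hop)
print(lineage.origins("powerbi.hr_dashboard"))  # ['finance.payroll', 'hr_system.employees']
```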
  11. Metadata Lake capabilities

    ➔ Store all types of metadata (business, technical, operational, and social) coming from all your company systems that are needed for lineage and for all the supported use cases.
    ➔ Combine metadata when needed to give a complete view of a dataset without creating a new dataset.
    ➔ Integrate the connected metadata parts to show lineage across systems.
    ➔ Generate new insights from metadata to support operational excellence and regulatory requirements, and to demonstrate compliance.
    ➔ Profile the harvested metadata to check its quality.
    ➔ Verify that the harvested metadata is correct, accurate, and delivered according to the SLA.
    ➔ Process all kinds of metadata in the lake for future use.
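The profile and verify capabilities can be expressed as checks over each harvested batch. This sketch assumes a hypothetical record schema (`asset`, `kind`, `source_system`, `harvested_at`), a list of expected assets, and a delivery deadline standing in for the SLA. It reports incomplete records, assets that were not delivered, and late records.

```python
from datetime import datetime

REQUIRED_FIELDS = ("asset", "kind", "source_system", "harvested_at")  # hypothetical schema


def verify_harvest(records: list, expected_assets: list, deadline: datetime) -> dict:
    """Profile one harvested batch: completeness, coverage and on-time delivery."""
    incomplete = [r for r in records if any(not r.get(f) for f in REQUIRED_FIELDS)]
    delivered = {r.get("asset") for r in records}
    missing_assets = sorted(set(expected_assets) - delivered)
    late = [r for r in records if r.get("harvested_at") and r["harvested_at"] > deadline]
    return {
        "records": len(records),
        "incomplete": len(incomplete),
        "missing_assets": missing_assets,
        "late": len(late),
        "sla_met": not (incomplete or missing_assets or late),
    }


deadline = datetime(2022, 9, 14, 6, 0)  # assumed SLA: batch delivered by 06:00
batch = [
    {"asset": "hr_db.employees", "kind": "technical", "source_system": "sqlite",
     "harvested_at": datetime(2022, 9, 14, 5, 30)},
    {"asset": "finance.payroll", "kind": "technical", "source_system": "",
     "harvested_at": datetime(2022, 9, 14, 6, 15)},
]
report = verify_harvest(batch, ["hr_db.employees", "finance.payroll", "crm.customers"], deadline)
print(report)
```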
  12. Metadata Lake reference architecture

    Diagram: source systems (System 1 to System 5 and their sub-systems) → harvesting metadata via metadata connectors (ready-made and custom) → Metadata Lake (SQL and NoSQL) → Data Marketplace and use cases 1 to 3. Actors: data providers, data users, data consumers, data owner/steward. Self-service flows: DIAL onboarding and approving requests; requesting data.
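The connector layer in this architecture can be sketched as a small interface that every connector implements, whether ready-made or custom. The Python below is hypothetical: `MetadataConnector`, `harvest`, and `run_harvest` are not from the slide. It tags each harvested record with its source system and keeps going when one system fails, so a single broken source does not block the whole harvest.

```python
from __future__ import annotations

from typing import Callable, Iterable, Protocol


class MetadataConnector(Protocol):
    """Interface every connector is assumed to implement (hypothetical)."""
    system: str

    def harvest(self) -> Iterable[dict]: ...


class ReadyMadeConnector:
    """Stand-in for an off-the-shelf connector; here it just returns canned records."""

    def __init__(self, system: str, records: list[dict]):
        self.system = system
        self._records = records

    def harvest(self) -> list[dict]:
        return list(self._records)


class CustomConnector:
    """Stand-in for a custom connector wrapping any function that fetches metadata."""

    def __init__(self, system: str, fetch: Callable[[], Iterable[dict]]):
        self.system = system
        self._fetch = fetch

    def harvest(self) -> list[dict]:
        return list(self._fetch())


def run_harvest(connectors: Iterable[MetadataConnector], sink: list[dict]) -> dict[str, str]:
    """Harvest from every connector into `sink`; return {system: error} for failures."""
    failures: dict[str, str] = {}
    for connector in connectors:
        try:
            records = list(connector.harvest())
        except Exception as exc:  # one broken source should not stop the others
            failures[connector.system] = repr(exc)
            continue
        sink.extend({**record, "source_system": connector.system} for record in records)
    return failures


def flaky_fetch():
    raise TimeoutError("System 3 did not respond")


lake_sink: list[dict] = []
errors = run_harvest(
    [
        ReadyMadeConnector("System 1", [{"asset": "hr_db.employees"}]),
        CustomConnector("System 2", lambda: [{"asset": "finance.payroll"}]),
        CustomConnector("System 3", flaky_fetch),
    ],
    lake_sink,
)
print(len(lake_sink), errors)
```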
  13. Active metadata

    Utilizing the harvested metadata and putting it into action in your day-to-day data activities. To be a truly metadata-driven organization, you need to move from passive to active metadata.

    ➔ Don't be passive: passive metadata is the standard way of aggregating and storing metadata in a static data catalog. It usually covers basic technical metadata: schemas, data types, models, etc.
    ➔ What if? What happens when you apply machine learning (ML) to metadata so that it can be used to make decisions and trigger actions?
    ➔ Value: active metadata needs to be insightful to be useful for action, and it needs to be stored and made available in a way that enables operational use. Metadata management platforms apply ML to create insights about the metadata and take action on it. The action can be triggering a workflow, or the platform may take some action automatically.
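The slide describes applying ML to metadata so that it can trigger actions. As a much simpler stand-in for an ML model, the Python sketch below applies a z-score to each asset's history of row counts (operational metadata) and calls a workflow callback when a new load looks anomalous. The class name, the threshold of 3, the five-load warm-up, and the `open_ticket` callback are assumptions for illustration.

```python
import statistics
from typing import Callable, Dict, List


class RowCountMonitor:
    """Flags loads whose row count deviates strongly from that asset's history.

    A simple z-score rule standing in for the ML the slide mentions.
    """

    def __init__(self, on_anomaly: Callable[[str, int, float], None],
                 z_threshold: float = 3.0, min_history: int = 5):
        self.on_anomaly = on_anomaly  # the "action", e.g. open a ticket or pause a refresh
        self.z_threshold = z_threshold
        self.min_history = min_history
        self.history: Dict[str, List[int]] = {}

    def observe(self, asset: str, row_count: int) -> bool:
        """Record one load; return True (and trigger the action) if it looks anomalous."""
        past = self.history.setdefault(asset, [])
        anomalous = False
        if len(past) >= self.min_history:
            mean = statistics.mean(past)
            spread = statistics.pstdev(past) or 1.0  # avoid dividing by zero on a flat history
            z = abs(row_count - mean) / spread
            if z > self.z_threshold:
                anomalous = True
                self.on_anomaly(asset, row_count, z)
        past.append(row_count)
        return anomalous


def open_ticket(asset: str, rows: int, z: float) -> None:
    # Hypothetical workflow trigger; a real one might call a ticketing or orchestration API.
    print(f"ALERT {asset}: {rows} rows (z={z:.1f}), pausing downstream refresh")


monitor = RowCountMonitor(open_ticket)
for count in [1000, 1010, 990, 1005, 995, 1002, 10]:
    monitor.observe("hr_db.employees", count)
```

The point of the design is that the metadata does not just sit in a catalog: a new observation can change what happens downstream.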
  14. Active metadata use cases

    ➔ Active metadata for proactive data quality: systems that can analyze data and evaluate data quality may include data quality information as active metadata.
    ➔ Active metadata for improved data context: in most instances, data does not come in perfectly labeled and defined columns. Sometimes column names are so obscure that they bear no resemblance to the data in the underlying column or data asset.
    ➔ Active metadata for privacy and regulatory compliance: metadata management platforms that can identify personal and sensitive information generate active metadata that may be used for regulatory compliance.
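For the privacy use case, a platform might tag columns that look like personal data, even when the column names are obscure (which also touches the improved-context use case). The Python sketch below is a deliberately small heuristic, assuming regex patterns for emails, IBANs, and phone numbers plus a few name hints; the patterns, hints, and the 80% threshold are illustrative. Evidence from sample values takes precedence over the column name. Real PII classification needs far broader rules or trained models.

```python
import re

# Illustrative patterns only; real PII detection needs far more than this.
VALUE_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$"),
    "iban": re.compile(r"^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$"),
    "phone": re.compile(r"^\+?\d[\d\s-]{7,14}\d$"),
}
# Hypothetical column-name hints, used only when the values are inconclusive.
NAME_HINTS = {"email": "email", "mail": "email", "iban": "iban", "phone": "phone", "tel": "phone"}


def classify_column(name, sample_values, min_share=0.8):
    """Return a PII tag for a column, or None; value evidence beats the column name."""
    values = [v for v in sample_values if v]
    for tag, pattern in VALUE_PATTERNS.items():
        if values and sum(bool(pattern.match(v)) for v in values) / len(values) >= min_share:
            return tag
    lowered = name.lower()
    for hint, tag in NAME_HINTS.items():
        if hint in lowered:
            return tag
    return None


print(classify_column("c_07", ["ana@example.com", "ben@example.org"]))  # email
print(classify_column("amount", ["12.50", "99.00"]))                    # None
```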
  15. Analysing PowerBI metadata | use case

    Step 1: Connect to PowerBI metadata (PowerBI Scanner API).
    Step 2: Extract the metadata and reverse engineer it to understand the relations between all the data points.
    Step 3: Store the harvested metadata in the metadata lake (MDL).
    Step 4: Apply basic analytics to bring value out of PowerBI metadata.
    Step 5: Connect the dots and link the technical metadata to the other types of metadata.
    Step 6: Act, respond, and improve the organization based on the power of metadata.
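Steps 2 and 4 can be sketched in Python over a scan result. The JSON below is a simplified, hypothetical excerpt loosely modeled on the workspace/report/dataset structure that the PowerBI Scanner API returns; the field names (`workspaces`, `reports`, `datasets`, `datasetId`) are assumptions to check against the API reference before use. The sketch links reports to their datasets across workspaces and derives two basic metrics.

```python
import json
from collections import Counter

# Simplified, hypothetical excerpt of a scan result; verify field names against the real API.
SCAN_RESULT = json.loads("""
{
  "workspaces": [
    {"id": "w1", "name": "HR",
     "datasets": [{"id": "d1", "name": "Employees"}, {"id": "d2", "name": "Payroll"}],
     "reports": [{"id": "r1", "name": "Headcount", "datasetId": "d1"},
                 {"id": "r2", "name": "Attrition", "datasetId": "d1"}]},
    {"id": "w2", "name": "Finance",
     "datasets": [{"id": "d3", "name": "Budget"}],
     "reports": [{"id": "r3", "name": "Spend", "datasetId": "d3"}]}
  ]
}
""")


def link_reports_to_datasets(scan):
    """Step 2: reverse engineer report -> dataset relations into flat lineage rows."""
    dataset_names = {d["id"]: d["name"]
                     for ws in scan.get("workspaces", []) for d in ws.get("datasets", [])}
    rows = []
    for ws in scan.get("workspaces", []):
        for report in ws.get("reports", []):
            rows.append({
                "workspace": ws["name"],
                "report": report["name"],
                "dataset": dataset_names.get(report.get("datasetId"), "<unknown>"),
            })
    return rows


def basic_analytics(scan):
    """Step 4: simple counts, e.g. reports per dataset and datasets no report uses."""
    links = link_reports_to_datasets(scan)
    reports_per_dataset = Counter(row["dataset"] for row in links)
    all_datasets = {d["name"] for ws in scan.get("workspaces", []) for d in ws.get("datasets", [])}
    return {
        "reports_per_dataset": dict(reports_per_dataset),
        "datasets_without_reports": sorted(all_datasets - set(reports_per_dataset)),
    }


print(basic_analytics(SCAN_RESULT))
```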
  16. Use case outcome | KPIs

    Data usage and data security KPIs:
    ➔ Top 10 dashboards
    ➔ Unused dashboards
    ➔ Usage per department
    ➔ User demographics
    ➔ Top used datasets
    ➔ Worst-performing dashboards
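Several of these KPIs can be derived from a dashboard inventory plus view events. In this Python sketch the inputs, including the per-view load times used for "worst performing", are invented sample data. Using the median load time as the performance measure is an assumption, as is the function name.

```python
from collections import Counter, defaultdict
from statistics import median

# Hypothetical inputs: dashboard inventory and view events (dashboard, department, load_seconds).
DASHBOARDS = ["Headcount", "Attrition", "Spend", "Legacy KPIs"]
VIEWS = [
    ("Headcount", "HR", 1.2), ("Headcount", "HR", 1.4), ("Headcount", "Finance", 1.1),
    ("Attrition", "HR", 6.5), ("Attrition", "HR", 7.0),
    ("Spend", "Finance", 2.0),
]


def dashboard_kpis(dashboards, views, top_n=10):
    """Compute top, unused, per-department and worst-performing dashboard KPIs."""
    view_counts = Counter(dashboard for dashboard, _, _ in views)
    load_times = defaultdict(list)
    for dashboard, _, seconds in views:
        load_times[dashboard].append(seconds)
    slowest = sorted(((d, median(t)) for d, t in load_times.items()),
                     key=lambda item: item[1], reverse=True)
    return {
        "top_dashboards": view_counts.most_common(top_n),
        "unused_dashboards": sorted(set(dashboards) - set(view_counts)),
        "usage_per_department": dict(Counter(dept for _, dept, _ in views)),
        "worst_performing": slowest[:top_n],  # highest median load time first
    }


for kpi, value in dashboard_kpis(DASHBOARDS, VIEWS).items():
    print(kpi, value)
```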
  17. Metadata enables Data Mesh architecture

    Detailed article about Active Metadata Lake. Mahmoud on LinkedIn: let's stay connected.
    Mahmoud Yassin, Senior Data Manager @Booking.com