Updating Data Programs with Responsible and Ethical AI

Karen Lopez
InfoAdvisors

Karen Lopez
Microsoft MVP, Data Platform; Microsoft Certified Trainer; vExpert
Data management expert, space enthusiast, and #TeamData evangelist
www.datamodel.com
@datachick.bksy.social

Bjarni Valdimar Tryggvason
Icelandic-born Canadian engineer and an NRC/CSA astronaut. He served as a Payload Specialist on Space Shuttle mission STS-85 in 1997, a nearly 12-day mission to study changes in the Earth's atmosphere. Bjarni is the first, and as of 2024, only Canadian astronaut of Icelandic birth.
https://en.wikipedia.org/w/index.php?title=Bjarni_Tryggvason&oldid=1289917372

2 AI Perspectives
• Preparing and managing data for AI uses
• Leveraging AI for data management and governance programs

Why and How?

Why This Matters
• AI is here
• We need to be prepared
• We need to be responsible
• We might want to be lazy

Letter | Core Principle | Explanation
F | Fairness | Avoiding bias, ensuring equity across demographics, and addressing systemic inequalities in data and outcomes.
A | Accountability | Assigning clear responsibility for AI decisions, enabling auditability, and ensuring recourse mechanisms.
S | Sustainability | Minimizing environmental impact, ensuring long-term viability, and supporting social sustainability.
T | Transparency | Making AI systems explainable, understandable, and open to scrutiny by stakeholders.

Story time… Hallucinations & Horsefeathers

By 2015, it was clear that the system was not
rating candidates in a gender-neutral way because it was built on data accumulated from CVs submitted to the firm mostly from males, Reuters claimed. The system started to penalise CVs which included the word "women". The program was edited to make it neutral to the term but it became clear that the system could not be relied upon, Reuters was told.

Other fake headlines said that U.S. Secretary of Defence nominee
Pete Hegseth had been "fired," that Secretary of State nominee Marco Rubio had been "confirmed," and that Israeli Prime Minister Benjamin Netanyahu had been "arrested." None were true.

IBM Watson for Oncology
In 2018, IBM’s Watson for Oncology, lauded as a revolutionary tool for AI-enabled personalized cancer treatment, encountered significant setbacks due to inaccuracies and unsafe treatment recommendations. The system's reliance on synthetic data, coupled with limited real-world patient data, underscored the critical importance of data quality and diversity in AI-driven healthcare solutions. Consequently, the accuracy and efficacy of the AI-generated outcomes were insufficient, and IBM ultimately decided to discontinue its Watson for Oncology solution. This case exemplifies the imperative for rigorous data validation protocols to generate high-value recommendations; further, an overreliance on synthetic data can diminish AI effectiveness and model accuracy. Further, this AI setback may also be an example of a problem better left to humans – in this case oncologists with years of specialized training, experience, and highly contextual knowledge of the most complex of systems, the human body.
https://www.ethicsc.harvard.edu/blog/post-8-abyss-examining-ai-failures-and-lessons-learned

When Air Canada was taken to court and asked to pay a refund offered by its chatbot, the company tried to argue that “the chatbot is a separate legal entity that is responsible for its own actions.” Air Canada’s argument was that because the chatbot response included a link to a page on the site outlining the policy correctly, Moffat should’ve known better. At the moment, the Air Canada chatbot is not on the website. Feel free to imagine it locked in a room somewhere, having its algorithms hit with hockey sticks, if you like.

The real estate listing company took a $304 million inventory write-down in the third quarter, which it blamed on having recently purchased homes for prices that are higher than it thinks it can sell them. The company saw its stock plunge and it now plans to cut 2,000 jobs, or 25% of its staff. The algorithms continued to assume that the market was still hot and overestimated home prices. In machine learning (ML), this kind of problem is known as “concept drift” and this does appear to be at the heart of the problem with Zillow Offers.
https://insideainews.com/2021/12/13/the-500mm-debacle-at-zillow-offers-what-went-wrong-with-the-ai-models/
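One simple way to watch for the concept drift described above is to compare a model's recent prediction error against its error on an earlier baseline period. The sketch below is illustrative only: the function names, the 1.5x tolerance, and the sample prices are assumptions made for this example, not details from the Zillow case, and it assumes actual sale prices are never zero.

```python
from statistics import mean


def mean_abs_pct_error(predicted, actual):
    """Mean absolute percentage error between two equal-length sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and the same length")
    # Assumes every actual value is non-zero (true for sale prices).
    return mean(abs(p - a) / abs(a) for p, a in zip(predicted, actual))


def drift_suspected(baseline_error, recent_error, tolerance=1.5):
    """Flag drift when recent error exceeds baseline error times `tolerance`."""
    return recent_error > baseline_error * tolerance


# Hypothetical home prices in thousands of dollars.
baseline_pred = [300, 410, 520, 250]
baseline_sold = [310, 400, 500, 255]
recent_pred = [320, 430, 540, 270]   # model still assumes a hot market
recent_sold = [290, 380, 470, 240]   # actual sale prices have cooled

baseline_error = mean_abs_pct_error(baseline_pred, baseline_sold)
recent_error = mean_abs_pct_error(recent_pred, recent_sold)
drift_flag = drift_suspected(baseline_error, recent_error)
print(f"baseline={baseline_error:.3f} recent={recent_error:.3f} drift={drift_flag}")
```

A check like this only raises a flag; deciding whether to retrain, pause, or escalate remains a governance decision with a human in the loop.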

Data Programs Data Governance, Data Management, and DataOps

Data Governance Components
• Business Goals & Stakeholders
• Ethics and Responsibility
• Data Privacy, Security, and Compliance
• Policies & Standards
• Monitoring and Measuring
• Data Quality
• Metadata
• Reference Data Management
• Data Catalogs and Inventories

Data Governance
• Change Management
• Program and Project Management
• Data Lifecycle Management
• Data Literacy & Culture
• Data Products
• Data Contracts
• DG Operating Models
• Data Governance Governance ☺

Data Management Components
• Data Architecture
• Data Modeling & Design
• Data Storage & Operations
• Data Security
• Data Integration & Interoperability
• Document & Content Management
• Reference & Master Data
• Data Warehousing & Business Intelligence
• Metadata
• Data Quality

DataOps Components
• Automated Data Pipelines
• CI/CD Approaches
• Data Quality Monitoring
• Data Testing and Validation
• Version Control
• Security & Access
• Operational Monitoring
• Orchestration
• Collaboration
• Agile & Iterative
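As a rough illustration of the "Data Testing and Validation" item, a pipeline step can run a few rule checks and report the rows that fail. This is a minimal sketch: `validate_rows`, the field names, and the age range are hypothetical choices for the example, not part of any specific DataOps tool.

```python
def validate_rows(rows, required, ranges):
    """Return (row_index, problem) tuples for rows that fail basic checks."""
    problems = []
    for index, row in enumerate(rows):
        # Completeness check: required fields must be present and non-empty.
        for field in required:
            if row.get(field) in (None, ""):
                problems.append((index, f"missing {field}"))
        # Range check: numeric fields must fall inside the allowed bounds.
        for field, (low, high) in ranges.items():
            value = row.get(field)
            if isinstance(value, (int, float)) and not low <= value <= high:
                problems.append((index, f"{field} out of range"))
    return problems


# Hypothetical sample records.
rows = [
    {"id": 1, "age": 34},
    {"id": 2, "age": -5},
    {"id": None, "age": 50},
]
problems = validate_rows(rows, required=["id", "age"], ranges={"age": (0, 120)})
print(problems)
```

In a CI/CD setting, a non-empty result could fail the build or route the batch to quarantine, depending on the program's policies.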

People and Process Problems There’s always someone…

People Challenges
• FUD
• Trust
• Job loss potential
• Change fatigue
• Loss of control fears
• Ethical / legal concerns

Fix it
• Start with Co-Piloting & laziness
• Encourage traceability analysis and work
• Engage everyone
• Measure success rates
• Lots of training and support

Where AI Can be Used in Data Programs
• Data Quality
• Data Classification
• Data Preparation
• Data Security
• Data Observation & Monitoring
• Data Support and Literacy
• Data Lineage Detection
• Glossary Building
• Auditing and Compliance
• Anomaly Detection
• Data Profiling
• Data Growth and Capacity Planning
• Generative use cases – test data
• Semantic Data Mapping
• Ethical Risks
• Data Cleansing

What Karen Worries About
• Not enough time allocated
• People being people
• Magical thinking
• Missed disclosures
• Biased data & files
• Skills gaps
• High-risk blindness
• Data & Model Poisoning
• Legislation that can’t keep up
• AI trained on AI slop (model collapse)
• Model drift

Data Literacy Challenges
• Lessons learned from the DW era
• Data history & stories
• Stats knowledge
• Understanding bias
• Understanding AI limits
• Dealing with missing data

Data Program AI Readiness
• Invest in Data Governance
• Integrate Data Governance with AI Governance
• Improve Data Quality processes
• Implement Bias Detection and Monitoring
• Ensure Documentation
• Use Synthetic Data

Data Program AI Readiness
Data Management AI Readiness
• Refocus on metadata programs
• Strengthen security measures
• Do more automation
• Build a data-driven culture
• Human-in-the-loop
• Respect Data Privacy

Data Program AI Readiness
• Use Protection Techniques
• Protect and Assess Training Data
• Use Third-party reviewers
• Design for Accessibility and Inclusion
• Learn AI Safety Defenses
• Leverage Ethical Frameworks
• Share Professional Learning

10 Tips
• Do personal learning and education on AI topics
• Speak up
• Build up your Data Governance programs
• Bring metadata methods back to the forefront
• Communicate, often, the importance of monitoring and auditing

10 Tips
• Leverage AI, smartly and ethically, to build these programs
• Enhance all data management segments with AI
• Ensure data professionals are part of the strategic planning
• Evangelize the importance of cross-group collaboration
• Build Data Literacy programs

Karen Lopez
Datachick.bksy.social
