Securing and Protecting Your Data in the AI Era

Securing and Protecting Your Data in the AI Era Karen
Lopez InfoAdvisors 1

Karen Lopez Microsoft MVP, Data Platform Microsoft Certified Trainer, vExpert
Data management expert, space enthusiast, and #TeamData evangelist www.datamodel.com @datachick.bsky.social

ANY TIME Ask questions Raise issues Leave kudos Don’t hold
them until the end.

www.spaceappschallenge.org

SpotTheStation.NASA.gov 7

Why this topic? Because We Love Our Data

What is new? – Adversarial ML (AML) Generative AI generally
connects to data, records, and databases, providing yet another vector to our data Machine Learning processes give attackers more avenues to manipulate and abuse data Vendors typically do not share what data has been used or will be used to train models 9

Vendors might not even tell you where or how AI
is being used 10

Great Focus on Security, Privacy, and Compliance Data privacy is
becoming increasingly important, with regulations placing greater emphasis on data protection. There is a growing need for data security due to increasing cyber attacks and data breaches. Compliance with regulations and standards is crucial for avoiding legal and financial penalties.

More focus on AI and Data Artificial intelligence (AI) is
growing in popularity as a tool for data analysis. Machine learning, a subset of AI, is used to make predictions and decisions based on data. Neural networks, another subset of AI, apply pattern recognition to data. There is a growing need for ethical considerations and responsible use of data. 12

AI Security Concerns –Generative AI Prompt- Protection Sensitive Data Training
Data Manipulation Model Manipulation Ethics

AI Security Concerns – Generative AI Data Bias Data Misunderstanding
Social Engineering / Fakes Compliance Transparency / Validation

Training • Data (Training and Testing) • Labels • Parameters
• Code Model Deployment • Evasion • Privacy Predictive AI Exposure Points

AI Attacks

Data Poisoning Attackers inject malicious data into the training dataset
to manipulate the model's output.

Label Flipping Attackers deliberately mislabel data to manipulate the model's
prediction output

Targeted Poisoning Attackers manipulate subsets of training data to influence
the model to make specific incorrect predictions.

Backdoor Poisoning Adding a tiny change that humans can’t easily
detect and then using that poison on future data. Examples include objects, reflections, and small triggers

Prompt Stealing Taking carefully crafted prompts and using them without
permission or asking the model to share previous prompts and instructions

Prompt Injection Ask the model to give you the data
or telling it to ignore previous instructions

Jailbreaks and Role- playing Instructions Asking the model to respond
as another persona to bypass guardrails or to otherwise break its rules

NIST on AI Manipulation Evasion Attacks Poisoning Attacks Privacy Attacks
Abuse https://www.nist.gov/news-events/news/2024/01/nist-identifies-types- cyberattacks-manipulate-behavior-ai-systems Predictive AI Generative AI

Data Reconstruction Attackers use a variety of methods to determine
training data 25

Membership Attacks Determine if a particular data point was used
to train a machine learning model 26

Memorization Attacks Extracting sensitive information from a machine learning model
that has “memorized” Training data 2 7

Model Extraction Attacks Stealing the architecture and parameters of a
machine learning model. 28

Property Inference Attacks Inferring the sensitive properties of training data
29

Prompt Leaks Accessing sensitive data, files, or other resources provided
in a prompt 30

Demands for Better Data Trust Data Catalog Data Governance Data
Security Transparent Policies Data Contract Continuous Monitoring 31

Data an AI Success Challenges Data Literacy Data Ethics Tools
Not Keeping Up Changing Data Processes

Data and AI Opportunities Using AI for Data Management Data-driven
Projects Will Increase Demand Increased Focus on Data Quality Extend Data Management Capabilities

No time today, but.. We should certainly be looking at
how to use AI tools and techniques to protect data, just like any other tools We’d need to secure those AI systems from the same things we are talking about today All data protection best practices from non-AI systems still apply.

Every Design Decision must be based on Cost, Benefit and
Risk

What Can You Do? - Mitigations

Data Quality Fund and support data quality and data governance
initiatives

Adversarial Training Training a model on both clean and adversarial
data in order to increase the model's robustness.

Randomized Smoothing Adds random noise to training data in order
to increase the model's robustness by averaging out bad data

Differential Privacy Protecting against reconstruction and membership detection

Verification Checking the accuracy of the model's predictions against a
test dataset using mathematical models

The process of selecting and preparing data for use in
machine learning models. It includes continuous monitoring and protection of that data Training Data Curation

Label Monitoring Monitoring the accuracy of labels in a machine
learning dataset, identifying mislabeled data and correcting the labels.

Trigger Identification Triggers can be words, phrases, or other features
that cause a model to produce incorrect predictions.

Model Inspection Model inspection is important for understanding how a
model is making predictions and identifying potential attacks and manipulations

Supply Chain Security Ensuring the security of all components of
a machine learning system: code, data, configurations, models

Machine Unlearning Removing data from a machine learning model's training
dataset.

Reinforcement Learning from Human Feedback (RPHF) Using feedback from humans
in the training of a model.

Takeaways 1. Data protection in AI opens more attack surfaces
2. Data may be at risk in ways most have not thought of 3. We can use AI to help us secure data 49 Training data Models Prompts Instructions Properties and Attributes

Other Resources https://media.defense.gov/2024/Apr/15/2003439257/-1/-1/0/CSI-DEPLOYING- AI-SYSTEMS-SECURELY.PDF https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/artificial- intelligence/guidance-on-ai-and-data-protection/

Thank You @DataChick [email protected] 54

Securing and Protecting Your Data in the AI Era

Securing and Protecting Your Data in the AI Era

More Decks by Karen Lopez

Other Decks in Technology

Featured

Transcript