
PPML PyConDE

Privacy is to date one of the major impediments for Machine Learning (ML) when applied to sensitive datasets. One popular example is ML applied to the medical domain, but this generally extends to any scenario in which sensitive data cannot be shared, or simply cannot be used. Moreover, data anonymisation methods are not enough to guarantee that privacy will be completely preserved: it is possible to exploit the _memoisation_ effect of DL models to extract sensitive information about individual samples, and about the original dataset used for training. However, *privacy-preserving machine learning* (PPML) methods promise to overcome all these issues, allowing Machine Learning models to be trained on "data that cannot be seen".

The workshop will be organised in **two parts**: (1) in the first part, we will work on attacks against Deep Learning models, leveraging their vulnerabilities to extract insights about the original (sensitive) data. We will then explore potential counter-measures to work around these issues.
Examples will include image data as well as textual data, where attacks and counter-measures highlight different nuances and corner cases.
(2) In the second part of the workshop, we will delve into PPML methods, focusing on mechanisms to train DL networks on encrypted data, as well as on specialised _distributed federated_ training strategies for multiple _sensitive_ datasets.

Valerio Maggio

April 13, 2022

Transcript

  1. Privacy Preserving Machine Learning
    Machine Learning on Data you’re not allowed to see
    [email protected]
    @leriomaggio
    github.com/leriomaggio/ppml-pyconde
    speakerdeck.com/leriomaggio/ppml-pyconde


  2. Aim of this Tutorial
    Privacy-Preserving Machine Learning (PPML)
    - Approach: Data Scientist
      - Always favour dev/practical aspects (tools & sw)
      - Work on the full pipeline
    - Perspective: Researcher
      - References and Further Readings to know more
    - Live Coding 🧑‍💻 (wish me luck! 🤞)
      - non-live coding bits will have exercises to play with.
    Provide an overview of the emerging technologies (in the Python ecosystem) for Privacy Protection, with specific focus on Machine Learning


  3. SSI Fellowship Plans: PPML
    • Privacy-Preserving Machine Learning (PPML) technologies have the huge potential to be the Data Science paradigm of the future
    • Joint effort of the Open Source & ML & Security Communities
    • I wish to disseminate the knowledge about these new methods and technologies among researchers
    • Focus on Reproducibility of PPML workflows
    What I would like to do / How I would like to do it
    gather.town
    Any help or suggestions about Use/Data cases or, more generally, Case studies, or any contribution to shape the repository will be very much appreciated!
    Looking forward to collaborations and contributions ☺


  4. Warm up
    DL Basics & PyTorch Quick Refresher


  5. Deep Learning Terminology
    Everyone on the same page? (also ref: bit.ly/nvidia-dl-glossary)
    Epochs
    Batches and mini-batch learning
    Parameters vs HyperParameters (e.g. weights vs layers)
    Loss & Optimiser (e.g. Cross Entropy & SGD)
    Transfer learning
    Gradient & Backward Propagation
    Tensor


  6. Python has its say
    Machine Learning
    Deep Learning
    “There should be one-- and preferably only one --obvious way to do it”
    The Zen of Python


  7. Multiple Frameworks?
    Data APIs: Standardization of N-dimensional arrays and dataframes, by Stephannie Jimenez Gacha

    https://2022.pycon.de/program/BMFVFG/


  8. Deep Learning Frameworks
    Static Graph vs Dynamic Graph
    Computational Graph Models: a Linear (or Dense) layer computes σ(xᵀW + b) from input x, weights W, and bias b.
    [figure: the same network (layers fc1..fc5, prediction y’ vs target y, loss L) built once as a static graph, vs rebuilt at every step as a dynamic graph (epoch 1, batch 1; epoch 1, batch 2)]


  9. Deep Learning Frameworks
    Static Graph vs Dynamic Graph
    Backwards and Gradient Computation for the Linear (or Dense) layer σ(xᵀW + b)
    [figure: in a static graph, backprop runs over the pre-compiled graph; in a dynamic graph, Autograd uses Record & Replay over the graph traced at each forward pass]
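    A minimal sketch of the dynamic (define-by-run) style: PyTorch records the graph of the Linear unit σ(xᵀW + b) as the forward pass executes, then Autograd replays it backwards. The shapes here are arbitrary.

```python
import torch

# A linear unit sigma(x^T W + b): the graph is recorded dynamically
# as the operations execute (define-by-run).
x = torch.randn(1, 4)                      # input sample
W = torch.randn(4, 3, requires_grad=True)  # weights, tracked by autograd
b = torch.zeros(3, requires_grad=True)     # bias, tracked by autograd

y = torch.sigmoid(x @ W + b)  # forward pass builds the graph
y.sum().backward()            # "replay": gradients via backprop

print(W.grad.shape, b.grad.shape)  # gradients w.r.t. W and b
```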

  10. Main features overview
    review of basic PyTorch features we will see soon


  11. Tensors, NumPy, Devices
    NumPy-like API
    tensor -> ndarray
    tensor <- ndarray
    CUDA support
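    A quick illustrative sketch of these features (shapes and values are arbitrary):

```python
import numpy as np
import torch

t = torch.arange(6, dtype=torch.float32).reshape(2, 3)  # NumPy-like API

a = t.numpy()             # tensor -> ndarray (zero-copy on CPU)
t2 = torch.from_numpy(a)  # ndarray -> tensor (also shares memory)

# CUDA support: move tensors to the GPU when one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
t_dev = t.to(device)
```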

  12. Neural Module subclassing
    Definition of layers (i.e. tensors)
    Definition of graph (i.e. network)
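    A minimal sketch of nn.Module subclassing; the MLP below and its layer sizes are hypothetical (the actual networks used in the tutorial live in the workshop repository):

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, in_features: int, n_classes: int):
        super().__init__()
        # definition of layers (i.e. parameter tensors)
        self.fc1 = nn.Linear(in_features, 64)
        self.fc2 = nn.Linear(64, n_classes)

    def forward(self, x):
        # definition of the graph (i.e. the network), traced per call
        return self.fc2(torch.relu(self.fc1(x)))
```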

  13. Loss and Gradients
    optimiser
    criterion & loss
    backprop & update
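    A sketch of one training step tying these together, reusing the hypothetical MLP above (Cross Entropy & SGD, as in the terminology slide; the batch is random):

```python
import torch

model = MLP(in_features=20, n_classes=2)
criterion = torch.nn.CrossEntropyLoss()                   # criterion & loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # optimiser

x, y = torch.randn(8, 20), torch.randint(0, 2, (8,))  # one mini-batch
optimizer.zero_grad()          # reset gradients from the previous step
loss = criterion(model(x), y)  # forward pass + loss
loss.backward()                # backprop: compute gradients
optimizer.step()               # update the parameters
```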

  14. Dataset and DataLoader
    transforms
    Dataset
    DataLoader

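    A sketch of a map-style Dataset with an optional transform, fed to a DataLoader for mini-batch learning; the data here is random and purely illustrative:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ArrayDataset(Dataset):
    """Minimal map-style Dataset wrapping in-memory tensors."""
    def __init__(self, X, y, transform=None):
        self.X, self.y = X, y
        self.transform = transform  # optional per-sample transform

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        x = self.X[idx]
        if self.transform is not None:
            x = self.transform(x)
        return x, self.y[idx]

ds = ArrayDataset(torch.randn(100, 20), torch.randint(0, 2, (100,)))
loader = DataLoader(ds, batch_size=16, shuffle=True)  # mini-batches
```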

  15. Let’s Talk about Privacy


  16. The Data vs Privacy AI Dilemma
    AI models are data hungry:
    • The more the data, the better the model
    • Push for High-quality and Curated* Open Datasets
    * More on the possible meanings of Curated in the next slides!
    Highly sensitive data: we need to keep data safe from both intentional and accidental leakage
    Data &| Models are kept in silos!


  17. The Data vs Privacy AI Dilemma
    AI models are data hungry:
    • The more the data, the better the model
    • Push for High-quality and Curated* Open Datasets
    * More on the possible meanings of Curated in the next slide!
    Highly sensitive data: we need to keep data safe from both intentional and accidental leakage
    Data &| Models are kept in silos! → Data accounting for privacy (privacy-preserving data)

  18. Privacy-Preserving Data
    Data Anonymisation Techniques: e.g. k-anonymity
    • (From Wikipedia) In the context of k-anonymization problems, a database is a table with n rows and m columns. Each row of the table represents a record relating to a specific member of a population, and the entries in the various rows need not be unique. The values in the various columns are the values of attributes associated with the members of the population.
    Data Anonymity. Source: https://venturebeat.com/2020/04/07/2020-census-data-may-not-be-as-anonymous-as-expected/
    “We then show how these methods can be used in practice to de-anonymize the Netflix Prize dataset, a 500,000-record public dataset.”
    Linking Attack
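    To make the definition concrete, a small sketch using pandas (not from the deck), with made-up records, that checks the k of a table over a chosen set of quasi-identifiers:

```python
import pandas as pd

# Hypothetical records: (age_band, zip3) are the quasi-identifiers
df = pd.DataFrame({
    "age_band":  ["30-40", "30-40", "30-40", "40-50", "40-50"],
    "zip3":      ["101",   "101",   "101",   "102",   "102"],
    "diagnosis": ["A",     "B",     "A",     "C",     "A"],
})

# A table is k-anonymous if every combination of quasi-identifier
# values occurs at least k times.
k = df.groupby(["age_band", "zip3"]).size().min()
print(f"table is {k}-anonymous on (age_band, zip3)")
```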

  19. Why don’t we allow analysis
    without moving data from their
    silos at all?


  20. Introducing: Federated Learning

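    One canonical scheme (not necessarily the one used later in the workshop, which features SplitNN) is Federated Averaging: each silo trains locally and only model parameters travel. A much-simplified sketch, assuming equally sized clients and an unweighted average:

```python
import copy
import torch

def fedavg_round(global_model, client_loaders, lr=0.01):
    """One round of Federated Averaging: the model travels to the data,
    trains locally in each silo, and only the weights are averaged."""
    states = []
    for loader in client_loaders:
        local = copy.deepcopy(global_model)  # ship the model, not the data
        opt = torch.optim.SGD(local.parameters(), lr=lr)
        criterion = torch.nn.CrossEntropyLoss()
        for x, y in loader:                  # local training pass
            opt.zero_grad()
            criterion(local(x), y).backward()
            opt.step()
        states.append(local.state_dict())
    # aggregate: element-wise mean of the clients' parameters
    avg = {k: torch.stack([s[k].float() for s in states]).mean(0)
           for k in states[0]}
    global_model.load_state_dict(avg)
    return global_model
```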

  21. So that’s it? Federated Learning to rule them all?


  22. Model Vulnerabilities
    Adversarial Examples

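    A classic example is the Fast Gradient Sign Method (FGSM): nudge the input in the direction that increases the loss. A sketch, assuming an image classifier with inputs in [0, 1]:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.1):
    """Fast Gradient Sign Method: perturb x by eps in the direction
    that increases the loss, yielding an adversarial example."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    with torch.no_grad():
        x_adv = x_adv + eps * x_adv.grad.sign()
    return x_adv.clamp(0, 1)  # keep pixels in the valid range
```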

  23. Model Stealing
    Model Inversion Attacks


  24. Federated Learning & Encryption


  25. Federated Learning & Homomorphic Encryption
    https://blog.openmined.org/ckks-homomorphic-encryption-pytorch-pysyft-seal/

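    The linked OpenMined post uses the TenSEAL library for CKKS. A minimal sketch of encrypted arithmetic; the parameter choices follow common TenSEAL examples and may need tuning:

```python
import tenseal as ts  # pip install tenseal

# CKKS: approximate arithmetic over encrypted vectors of reals
ctx = ts.context(ts.SCHEME_TYPE.CKKS,
                 poly_modulus_degree=8192,
                 coeff_mod_bit_sizes=[60, 40, 40, 60])
ctx.global_scale = 2 ** 40
ctx.generate_galois_keys()

enc = ts.ckks_vector(ctx, [1.0, 2.0, 3.0])  # encrypt a vector
result = enc * 2 + [1.0, 1.0, 1.0]          # compute on the ciphertext
print(result.decrypt())                     # ~ [3.0, 5.0, 7.0]
```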

  26. PPML with Differential Privacy
    https://ppml-workshop.github.io
    Differential privacy is a system for publicly sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about individuals in the dataset.
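    The simplest instance is the Laplace mechanism: a count query has sensitivity 1, so adding Laplace(1/ε) noise to the true count makes the released value ε-differentially private. A small illustrative sketch (the data is made up):

```python
import numpy as np

def dp_count(values, predicate, epsilon=1.0):
    """epsilon-DP count via the Laplace mechanism: the sensitivity of
    a count is 1, so Laplace(1/epsilon) noise hides any individual."""
    true_count = sum(predicate(v) for v in values)
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

ages = [34, 51, 29, 44, 61, 38]
print(dp_count(ages, lambda a: a > 40, epsilon=0.5))
```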

  27. Introducing Differential Privacy
    Inspired from: Differential Privacy on PyTorch | PyTorch Developer Day 2020
    youtu.be/l6fbl2CBnq0

  28. (image-only slide)

  29. (image slide) Source: pinterest.com/agirlandaglobe/

  30. (image-only slide)

  31. (image-only slide)

  32. (image-only slide)

  33. Learning from Aggregates with Differential Privacy
    • Aggregate Count on the Data
    • Computing the Mean
    • (Complex) Training an ML model
    Differential Privacy within the ML Pipeline
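    For the "(complex) training an ML model" case, the PyTorch Developer Day talk referenced on slide 27 covers Opacus, which wraps a model/optimizer/loader for DP-SGD. A sketch, assuming the model, optimizer and loader from the PyTorch refresher above; the hyperparameters are illustrative:

```python
from opacus import PrivacyEngine  # pip install opacus

privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    noise_multiplier=1.1,  # scale of the noise added to gradients
    max_grad_norm=1.0,     # per-sample gradient clipping bound
)
# The training loop is unchanged: per-sample gradients are clipped
# and noised before each optimizer step (DP-SGD).
epsilon = privacy_engine.get_epsilon(delta=1e-5)  # spent privacy budget
```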

  34. Let’s switch to code now, so that we can see all of this in action ☺
    github.com/leriomaggio/ppml-pyconde


  35. Agenda
    • Part 1: Model Vulnerabilities and Attacks
      • Adversarial Examples (in Practice)
      • Model Inversion Attack
    • Part 2: Federated Machine Learning
      • Federated Data
      • Federated Learning (SplitNN)
    • Part 3: DL and Differential Privacy (DP)
      • Model Training with DP


  36. Thank you very much for your kind attention
    Valerio Maggio
    [email protected]
    @leriomaggio
    github.com/leriomaggio/ppml-pyconde
    speakerdeck.com/leriomaggio/ppml-pyconde
