
PPML JGI


**Privacy-Preserving Machine Learning**: Machine learning on data you _cannot see_.

Privacy guarantees are among the most crucial requirements when analysing sensitive information. However, data anonymisation techniques alone do not always provide complete privacy protection; moreover, Machine Learning (ML) models can be exploited to _leak_ sensitive data when _attacked_ and no counter-measure is in place.

*Privacy-preserving machine learning* (PPML) methods hold the promise of overcoming these issues, making it possible to train machine learning models with strong privacy guarantees.

This workshop is organised in **two parts**. In the first part, we will explore an example of ML model exploitation (an _inference attack_) that reconstructs original data from a trained model, and we will then see how **differential privacy** can help us protect the privacy of our model with _minimum disruption_ to the original pipeline. In the second part, we will examine a more complicated scenario: training deep learning networks on encrypted data with specialised _distributed federated learning_ strategies.

Valerio Maggio

June 15, 2022

Transcript

  1. Privacy-Preserving Machine Learning
    Machine Learning on Data you’re not allowed to see
    [email protected]
    @leriomaggio
    github.com/leriomaggio/ppml-tutorial
    speakerdeck.com/leriomaggio/ppml-jgi


  2. Aim of this Tutorial
    Provide an overview of the emerging tools (in the ecosystem) for Privacy-Enhancing Technologies (a.k.a. PETs), with a focus on Machine Learning: Privacy-Preserving Machine Learning (PPML)


  3. SSI Fellowship Plans: PPML
    What I would like to do:
    • Privacy-Preserving Machine Learning (PPML) technologies have the huge potential to be the Data Science paradigm of the future
    • Joint effort of the Open Source, ML, and Security communities
    • I wish to disseminate the knowledge about these new methods and technologies among researchers
    • Focus on reproducibility of PPML workflows
    How I would like to do it: gather.town
    Any help or suggestions about use/data cases (or, more generally, case studies), or any contribution to shape the repository will be very much appreciated! Looking forward to collaborations and contributions ☺
    Awarded by JGI Seed-Corn Funding 2021: jeangoldinginstitute.blogs.bristol.ac.uk/2021/01/07/seed-corn-funding-winner-announcement/


  4. PPML Tutorial
    - Approach: Data Scientist
    - Always favour dev/practical aspects (tools & software)
    - Work on the full pipeline
    - Perspective: Researcher
    - References and further readings to learn more
    - Live Coding 🧑💻 (wish me luck! 🤞)
    - Non-live-coding bits will have exercises to play with.
    github.com/leriomaggio/ppml-tutorial
    Let’s switch to code to check that we’re all ready to start.


  5. Warm up
    DL Basics & PyTorch Quick Refresher


  6. Deep Learning Terms
    Everyone on the same page?


  7. Deep Learning Terms: everyone on the same page?
    • Epochs
    • Batches and mini-batch learning
    • Parameters vs hyperparameters (e.g. weights vs layers)
    • Loss & optimiser (e.g. cross-entropy & SGD)
    • Transfer learning
    • Gradient & backward propagation
    • Tensor
    • Generative Adversarial Networks (GANs)
    also ref: bit.ly/nvidia-dl-glossary


  8. Python has its say: Machine Learning & Deep Learning
    “There should be one-- and preferably only one --obvious way to do it.”
    The Zen of Python


  9. Multiple Frameworks?
    Data APIs: Standardization of N-dimensional arrays and dataframes, by Stephannie Jimenez Gacha

    https://2022.pycon.de/program/BMFVFG/


  10. Main features overview
    A review of the basic PyTorch features we will see soon


  11. Tensors, NumPy, Devices
    NumPy-like API: tensor -> ndarray, tensor <- ndarray
    CUDA support: torch.cuda 🙋
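    A minimal sketch of the tensor/NumPy interop and device handling mentioned above (shapes and values are illustrative):

```python
import numpy as np
import torch

# Tensor -> ndarray and back (shares memory when on CPU)
t = torch.ones(3, 2)
a = t.numpy()             # tensor -> ndarray
t2 = torch.from_numpy(a)  # ndarray -> tensor

# Move computation to the GPU when one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
t_gpu = t.to(device)
```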


  12. torch.nn
    Module subclassing
    Definition of layers (i.e. tensors); definition of the graph (i.e. the network) 🙋
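    A minimal nn.Module subclass along these lines (the layer sizes are illustrative):

```python
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # definition of layers (i.e. the learnable tensors)
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        # definition of the graph (i.e. how data flows through the network)
        x = F.relu(self.fc1(x))
        return self.fc2(x)
```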


  13. Loss and Gradients
    Optimiser (torch.optim); criterion & loss (torch.nn); backprop & update 🙋
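    A typical PyTorch training step tying these pieces together; Net and train_loader come from the adjacent sketches:

```python
import torch.nn as nn
import torch.optim as optim

model = Net()                                       # from the previous sketch
criterion = nn.CrossEntropyLoss()                   # criterion & loss (torch.nn)
optimizer = optim.SGD(model.parameters(), lr=0.01)  # optimiser (torch.optim)

for inputs, targets in train_loader:                # DataLoader from the next slide
    inputs = inputs.view(inputs.size(0), -1)        # flatten images for the MLP
    optimizer.zero_grad()                           # reset accumulated gradients
    loss = criterion(model(inputs), targets)
    loss.backward()                                 # backprop: compute gradients
    optimizer.step()                                # update the parameters
```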


  14. Dataset and DataLoader (torch.utils.data)
    Transforms; Dataset; DataLoader 🙋
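    A minimal sketch using the usual torchvision/MNIST combination as an illustration:

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# transforms: preprocessing applied to each sample
transform = transforms.Compose([transforms.ToTensor()])

# Dataset: an indexable collection of (sample, label) pairs
train_set = datasets.MNIST("data", train=True, download=True, transform=transform)

# DataLoader: batching, shuffling, and parallel loading
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
```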


  15. Let’s Introduce Privacy


  16. The Data vs Privacy AI Dilemma
    AI models are data hungry:
    • The more the data, the better the model
    • Push for high-quality and curated* open datasets
    * More on the possible meanings of “curated” in the next slides!
    Highly sensitive data: we need to keep data safe from both intentional and accidental leakage.
    Data and/or models are kept in silos!


  17. The Data vs Privacy AI Dilemma (cont.)
    Data accounting for privacy (privacy-preserving data)

  18. Privacy-Preserving Data
    Data Anonymisation Techniques: e.g. k-anonymity
    • (From Wikipedia) In the context of k-anonymization problems, a database is a table with n rows and m columns. Each row of the table represents a record relating to a specific member of a population and the entries in the various rows need not be unique. The values in the various columns are the values of attributes associated with the members of the population.
    Data Anonymity: Dataset → 🔒 k-Anonymised Dataset (Algorithm #1, Algorithm #2, …, Algorithm #k) → Data Sharing
    https://github.com/leriomaggio/privacy-preserving-data-science
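    Concretely, a release satisfies k-anonymity when every combination of quasi-identifiers appears in at least k records. A minimal pandas check (the column names here are hypothetical):

```python
import pandas as pd

def is_k_anonymous(df: pd.DataFrame, quasi_identifiers: list, k: int) -> bool:
    """True if every quasi-identifier combination occurs in >= k rows."""
    group_sizes = df.groupby(quasi_identifiers).size()
    return bool((group_sizes >= k).all())

df = pd.DataFrame({
    "zip": ["37210", "37210", "37211", "37211"],
    "age_band": ["20-30", "20-30", "30-40", "30-40"],
    "diagnosis": ["A", "B", "A", "C"],  # sensitive attribute
})
print(is_k_anonymous(df, ["zip", "age_band"], k=2))  # True
```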


  19. Privacy-Preserving Data: Data Anonymity Issues
    Source: https://venturebeat.com/2020/04/07/2020-census-data-may-not-be-as-anonymous-as-expected/
    “[…] (we) show how these methods can be used in practice to de-anonymize the Netflix Prize dataset, a 500,000-record public dataset.”
    Linking Attack

  20. Why don’t we allow AI without
    moving data from their silos?


  21. Introducing: Federated Learning
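    Federated learning trains a shared model while the data never leaves each silo: clients train locally and only model updates travel to the server. A minimal sketch of the canonical FedAvg aggregation step (McMahan et al., 2017); the surrounding client loop is illustrative, not the tutorial's exact code:

```python
import copy
import torch

def federated_average(client_states):
    """FedAvg: element-wise mean of the clients' model parameters."""
    avg_state = copy.deepcopy(client_states[0])
    for key in avg_state:
        avg_state[key] = torch.stack(
            [state[key].float() for state in client_states]
        ).mean(dim=0)
    return avg_state

# One communication round (sketch): each client trains locally on its own
# data, then the server averages the resulting weights.
# client_states = [train_locally(copy.deepcopy(global_model), d).state_dict()
#                  for d in client_datasets]
# global_model.load_state_dict(federated_average(client_states))
```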


  22. So that’s it? Federated Learning to rule them all?


  23. Model Vulnerabilities
    Adversarial Examples
    ppml-tutorial/1-fast-gradient-sign-method
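    The fast gradient sign method perturbs an input by ε in the direction of the sign of the loss gradient. A minimal sketch (model, image, and label are assumed given; ε is illustrative):

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.1):
    """Craft an adversarial example with the Fast Gradient Sign Method."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step in the direction that increases the loss
    adv_image = image + epsilon * image.grad.sign()
    return adv_image.clamp(0, 1).detach()  # keep a valid pixel range
```

    Even a tiny ε can flip the model's prediction while the perturbed image looks unchanged to a human.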


  24. Model Stealing
    Model Inversion Attacks
    ppml-tutorial/2-model-inversion-attack
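    In a model-inversion attack the adversary optimises a candidate input so that the model's confidence for a target class is maximised, gradually reconstructing a representative of the training data. A simplified gradient-descent sketch (input shape and step count are illustrative, not the notebook's exact attack):

```python
import torch

def invert_class(model, target_class, shape=(1, 1, 28, 28), steps=500, lr=0.1):
    """Reconstruct a representative input for `target_class` from a trained model."""
    x = torch.zeros(shape, requires_grad=True)
    optimizer = torch.optim.SGD([x], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        # Maximise the model's output for the target class
        loss = -model(x)[0, target_class]
        loss.backward()
        optimizer.step()
        x.data.clamp_(0, 1)  # stay in a valid input range
    return x.detach()
```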


  25. Introducing Differential Privacy
    Inspired from: Differential Privacy on PyTorch | PyTorch Developer Day 2020
    youtu.be/l6fbl2CBnq0


  26.–30. [Image-only slides. Source: pinterest.com/agirlandaglobe/]

  31. PPML with Differential Privacy
    https://ppml-workshop.github.io
    Differential privacy is a system for publicly sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about individuals in the dataset.
    Like k-anonymity, DP is a formal notion of privacy (i.e. it’s possible to prove that a data release has the property). Unlike k-anonymity, however, differential privacy is a property of algorithms, and not a property of data. That is, we can prove that an algorithm satisfies differential privacy; to show that a dataset satisfies differential privacy, we must show that the algorithm which produced it satisfies differential privacy.
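    The standard way for an algorithm to satisfy ε-differential privacy is the Laplace mechanism: perturb the query answer with noise scaled to the query's sensitivity. A minimal sketch for a counting query, whose sensitivity is 1:

```python
import numpy as np

def dp_count(data, predicate, epsilon=1.0):
    """ε-differentially private count via the Laplace mechanism.

    A counting query has sensitivity 1: adding or removing one
    individual changes the true count by at most 1.
    """
    true_count = sum(1 for row in data if predicate(row))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

ages = [23, 45, 31, 67, 52]
print(dp_count(ages, lambda age: age > 40, epsilon=0.5))
```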


  32. Learning from Aggregates with Differential Privacy
    • Aggregate count on the data
    • Computing a mean
    • (Complex) training an ML model
    Differential Privacy within the ML pipeline
    ppml-tutorial/3-differential-privacy
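    For the "train an ML model" case, DP-SGD clips per-sample gradients and adds calibrated Gaussian noise. A hedged sketch using Opacus (the PyTorch DP library behind the Developer Day talk referenced on slide 25); the hyperparameters are illustrative and reflect recent Opacus versions:

```python
import torch
from opacus import PrivacyEngine

model = Net()  # from the earlier sketch
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

privacy_engine = PrivacyEngine()
# Wraps model/optimizer/loader so each step clips per-sample
# gradients (max_grad_norm) and adds Gaussian noise.
model, optimizer, train_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    noise_multiplier=1.1,   # illustrative
    max_grad_norm=1.0,      # illustrative
)
# ...train as usual; the spent privacy budget can then be queried:
# epsilon = privacy_engine.get_epsilon(delta=1e-5)
```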


  33. Going back to: Federated Learning


  34. Federated Learning & Encryption


  35. Federated Learning & Homomorphic Encryption
    https://blog.openmined.org/ckks-homomorphic-encryption-pytorch-pysyft-seal/
    ppml-tutorial/4-federeted-learning
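    The linked post uses the CKKS scheme, which supports approximate arithmetic directly on encrypted vectors. A minimal sketch with TenSEAL, OpenMined's wrapper around Microsoft SEAL (the parameter choices are standard illustrative values):

```python
import tenseal as ts

# CKKS context: scheme parameters trade off precision, depth, and security
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2 ** 40
context.generate_galois_keys()

# Encrypt two vectors and compute on the ciphertexts
enc_a = ts.ckks_vector(context, [1.0, 2.0, 3.0])
enc_b = ts.ckks_vector(context, [4.0, 5.0, 6.0])
enc_sum = enc_a + enc_b      # homomorphic addition
enc_dot = enc_a.dot(enc_b)   # homomorphic dot product

print(enc_sum.decrypt())  # ≈ [5.0, 7.0, 9.0]
print(enc_dot.decrypt())  # ≈ [32.0]
```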


  36. Thank you very much for your kind attention
    Valerio Maggio
    [email protected]
    @leriomaggio
    github.com/leriomaggio/ppml-tutorial
    speakerdeck.com/leriomaggio/ppml-jgi
