
PPML JGI

**Privacy-Preserving Machine Learning**: Machine learning on data you _cannot see_.

Privacy guarantees are among the most crucial requirements when it comes to analysing sensitive information. However, data anonymisation techniques alone do not always provide complete privacy protection; moreover, Machine Learning (ML) models can also be exploited to _leak_ sensitive data when _attacked_, if no counter-measures are in place.

*Privacy-preserving machine learning* (PPML) methods hold the promise of overcoming these issues, allowing machine learning models to be trained with formal privacy guarantees.

This workshop is organised in **two parts**. In the first part, we will explore an example of ML model exploitation (an _inference attack_) that reconstructs original data from a trained model, and we will then see how **differential privacy** can help us protect the privacy of our model with _minimal disruption_ to the original pipeline. In the second part of the workshop, we will examine a more complex scenario: training deep learning networks on encrypted data, with specialised _distributed federated learning_ strategies.

Valerio Maggio

June 15, 2022

Transcript

  1. Privacy-Preserving Machine Learning: Machine Learning on Data you’re not allowed to see. [email protected] | @leriomaggio | github.com/leriomaggio/ppml-tutorial | speakerdeck.com/leriomaggio/ppml-jgi
  2. Aim of this Tutorial: provide an overview of the emerging tools in the ecosystem for Privacy Enhancing Technologies (a.k.a. PETs), with a focus on Machine Learning, i.e. Privacy-Preserving Machine Learning (PPML).
  3. SSI Fellowship Plans: PPML. What I would like to do: • Privacy-Preserving Machine Learning (PPML) technologies have the huge potential to be the Data Science paradigm of the future • joint effort of the Open Source, ML, and Security communities • I wish to disseminate knowledge about these new methods and technologies among researchers • focus on reproducibility of PPML workflows. How I would like to do it: gather.town. Any help or suggestions about use/data cases (or, more generally, case studies), or any contribution to shape the repository, will be very much appreciated! Looking forward to collaborations and contributions ☺ Awarded by JGI Seed-Corn Funding 2021: jeangoldinginstitute.blogs.bristol.ac.uk/2021/01/07/seed-corn-funding-winner-announcement/
  4. PPML Tutorial. Approach: Data Scientist - always favour dev/practical aspects (tools & software) - work on the full pipeline. Perspective: Researcher - references and further readings to know more. Live coding 🧑‍💻 (wish me luck! 🤞); non-live-coding bits will have exercises to play with. github.com/leriomaggio/ppml-tutorial. Let’s switch to code to check that we’re all ready to start.
  5. Deep Learning Terms: everyone on the same page? Epochs; batches and mini-batch learning; parameters vs hyperparameters (e.g. weights vs layers); loss & optimiser (e.g. Cross Entropy & SGD); transfer learning; gradient & backward propagation; tensor; Generative Adversarial Networks (GAN). Also ref: bit.ly/nvidia-dl-glossary
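Several of these terms can be pinned down in a few lines of PyTorch. The snippet below is illustrative only (not from the deck): the data is random and the hyperparameter values are arbitrary.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

# Toy data: the model's weights are *parameters* (learned);
# learning rate, batch size, and epochs are *hyperparameters* (chosen).
X, y = torch.randn(128, 4), torch.randint(0, 3, (128,))
loader = DataLoader(TensorDataset(X, y), batch_size=16)  # mini-batches

model = nn.Linear(4, 3)                            # a single layer
criterion = nn.CrossEntropyLoss()                  # loss
optimizer = optim.SGD(model.parameters(), lr=0.1)  # optimiser

for epoch in range(5):          # an epoch is one full pass over the data
    for xb, yb in loader:       # one mini-batch per optimisation step
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()         # backward propagation computes gradients
        optimizer.step()        # SGD update of the parameters
```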
  6. Python has its say: Machine Learning, Deep Learning. “There should be one-- and preferably only one --obvious way to do it.” The Zen of Python
  7. Multiple frameworks? Data APIs: Standardization of N-dimensional arrays and dataframes, by Stephannie Jimenez Gacha, https://2022.pycon.de/program/BMFVFG/
  8. torch.nn: Module subclassing; definition of layers (i.e. tensors); definition of graph (i.e. network) 🙋
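In code, the pattern on this slide looks roughly as follows; a minimal sketch, where the class name and layer sizes are made up for illustration:

```python
import torch
from torch import nn

class TinyNet(nn.Module):
    """nn.Module subclassing: layers in __init__, graph in forward()."""

    def __init__(self, in_features: int = 4, hidden: int = 8, n_classes: int = 2):
        super().__init__()
        # Definition of layers (i.e. their parameter tensors)
        self.fc1 = nn.Linear(in_features, hidden)
        self.fc2 = nn.Linear(hidden, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Definition of the graph (i.e. the network), built on each call
        return self.fc2(torch.relu(self.fc1(x)))

net = TinyNet()
print(net(torch.randn(3, 4)).shape)  # torch.Size([3, 2])
```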
  9. The Data vs Privacy AI Dilemma. AI models are data-hungry: • the more the data, the better the model • push for high-quality and curated* open datasets (*more on the possible meanings of “curated” in the next slides!). Highly sensitive data: we need to keep data safe from both intentional and accidental leakage. As a result, data and/or models are kept in silos!
  10. The Data vs Privacy AI Dilemma (cont.): the answer to the dilemma is data accounting for privacy (privacy-preserving data).
  11. Privacy-Preserving Data. Data anonymisation techniques, e.g. k-anonymity. (From Wikipedia) In the context of k-anonymization problems, a database is a table with n rows and m columns. Each row of the table represents a record relating to a specific member of a population, and the entries in the various rows need not be unique. The values in the various columns are the values of attributes associated with the members of the population. (Diagram: Dataset → Algorithm #1 … Algorithm #k → 🔒 k-anonymised dataset → data sharing.) https://github.com/leriomaggio/privacy-preserving-data-science
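Concretely, a table is k-anonymous with respect to a set of quasi-identifier columns if every combination of their values occurs in at least k rows. A minimal check with pandas; the helper function and the records are invented for illustration:

```python
import pandas as pd

def is_k_anonymous(df: pd.DataFrame, quasi_identifiers, k: int) -> bool:
    """True iff every combination of quasi-identifier values
    appears in at least k rows of the table."""
    group_sizes = df.groupby(quasi_identifiers).size()
    return bool((group_sizes >= k).all())

records = pd.DataFrame({
    "age_band":  ["30-40", "30-40", "30-40", "20-30"],
    "zip_area":  ["BS8",   "BS8",   "BS8",   "BS1"],
    "diagnosis": ["flu",   "cold",  "flu",   "flu"],
})
# ("20-30", "BS1") occurs only once, so 2-anonymity fails.
print(is_k_anonymous(records, ["age_band", "zip_area"], k=2))  # False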
  12. Privacy-Preserving Data: data anonymity issues. The same setting is vulnerable to a linking attack, in which released records are re-identified by joining them with auxiliary data on shared quasi-identifier attributes. Source: https://venturebeat.com/2020/04/07/2020-census-data-may-not-be-as-anonymous-as-expected/ : “[…] (we) show how these methods can be used in practice to de-anonymize the Netflix Prize dataset, a 500,000-record public dataset.”
  13. Introducing Differential Privacy. Inspired by: Differential Privacy on PyTorch | PyTorch Developer Day 2020, youtu.be/l6fbl2CBnq0
  14. PPML with Differential Privacy (https://ppml-workshop.github.io). Differential privacy is a system for publicly sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about individuals in the dataset. Like k-anonymity, DP is a formal notion of privacy (i.e. it is possible to prove that a data release has the property). Unlike k-anonymity, however, differential privacy is a property of algorithms, not a property of data. That is, we can prove that an algorithm satisfies differential privacy; to show that a dataset satisfies differential privacy, we must show that the algorithm which produced it satisfies differential privacy.
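The classic algorithm with this property is the Laplace mechanism: add noise drawn from a Laplace distribution, scaled to the query's sensitivity divided by the privacy budget ε. A minimal sketch for a counting query (illustrative, not code from the tutorial):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def dp_count(values, predicate, epsilon: float) -> float:
    """Counting query released with the Laplace mechanism.

    A count has sensitivity 1 (adding or removing one individual
    changes the result by at most 1), so Laplace noise with scale
    1 / epsilon gives an epsilon-differentially-private answer.
    """
    true_count = sum(predicate(v) for v in values)
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

ages = [34, 29, 41, 58, 23, 47, 31]
print(dp_count(ages, lambda a: a > 30, epsilon=0.5))
```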
  15. Learning from Aggregates with Differential Privacy: • aggregate count on the data • computing the mean • (complex) train an ML model. Differential privacy within the ML pipeline: ppml-tutorial/3-differential-privacy
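For the "(complex) train an ML model" case, the standard approach is DP-SGD: clip each per-sample gradient and add calibrated noise before the optimiser step. A sketch assuming the Opacus library (which the referenced PyTorch Developer Day talk introduces); the model, data, and hyperparameter values here are invented, so check ppml-tutorial/3-differential-privacy for the actual notebooks:

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine  # pip install opacus

# Toy stand-ins for the tutorial's real data and model.
X, y = torch.randn(256, 20), torch.randint(0, 2, (256,))
train_loader = DataLoader(TensorDataset(X, y), batch_size=32)
model = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 2))
optimizer = optim.SGD(model.parameters(), lr=0.05)
criterion = nn.CrossEntropyLoss()

# DP-SGD: per-sample gradients are clipped to max_grad_norm, and
# Gaussian noise scaled by noise_multiplier is added to each update.
privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    noise_multiplier=1.1,
    max_grad_norm=1.0,
)

for epoch in range(3):
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()

# The accountant tracks the privacy budget spent during training.
print(f"epsilon spent: {privacy_engine.get_epsilon(delta=1e-5):.2f}")
```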
  16. Thank you very much for your kind attention. Valerio Maggio, [email protected], @leriomaggio, github.com/leriomaggio/ppml-tutorial, speakerdeck.com/leriomaggio/ppml-jgi