Slide 1

Slide 1 text

Uncovering the secrets of a machine learning model: a credible threat?

Slide 2

Slide 2 text

Giulia BIANCHI, Data Scientist @PubSapientEng, @Giuliabianchl | Johan JUBLANC, Data Scientist @PubSapientEng, JJublanc

Slide 3

Slide 3 text

CONTENTS: Model attacks | Differential privacy | A model inversion attack: the secret revealer | Are attacks credible and avoidable?

Slide 4

Slide 4 text

Model attacks 4

Slide 5

Slide 5 text

What do you mean, "attacking a model"?

Slide 6

Slide 6 text

A lot of ways to attack a model: 1. Evasion attacks 2. Poisoning attacks 3. Privacy attacks (illustration: a classifier fooled into labelling images as "Toaster")

Slide 7

Slide 7 text

Privacy attacks: model extraction, and attacks on training data (membership inference attacks, inversion attacks)

Slide 8

Slide 8 text

White box vs black box accessibility: in both cases the attacker can submit inputs to the model (Securmax) and observe its outputs; with white box access, the attacker additionally knows: ● the model type ● the model architecture ● the parameters' values

Slide 9

Slide 9 text

HELLO DATAMONS! Streamèche Realticèle DS-Li FastFeu Prototys SciProduce Securti Securalto SciProdaffe Buildy ML-Li AI-Li Productor SciProdank Securmax 9

Slide 10

Slide 10 text

Differential privacy 10

Slide 11

Slide 11 text

When DP is useful: protection against unintended memorisation. The Secret Sharer: Evaluating and Testing Unintended Memorisation in Neural Networks ● A black-box attack on a generative sequence model ● The authors introduced a social security number into the training data (the secret) and were able to retrieve it at prediction time ● Differential privacy solved the problem https://xkcd.com/2169/

Slide 12

Slide 12 text

What DP does: training a (non-private) model. The diagram compares two pipelines: the model trained on n entries and the model trained on n-1 entries, each producing its own output.

Slide 13

Slide 13 text

What DP does: training a differentially private (DP) model. The same comparison: the DP model trained on n entries and the DP model trained on n-1 entries, each producing its own output.

Slide 14

Slide 14 text

What DP is: the (mathematical) theory ● Differential privacy addresses the paradox of learning nothing about an individual while learning useful information about a population ● Roughly, an algorithm is differentially private if an observer seeing its output cannot tell whether a particular individual's information was used in the computation ● Differential privacy provides privacy by process; in particular, it introduces randomness

Slide 15

Slide 15 text

From theory to practice: differential privacy is introduced into deep learning algorithms by adding Gaussian noise to the stochastic gradient descent, yielding differentially private stochastic gradient descent (DP-SGD) and a DP model (a sketch follows)
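To make the mechanism concrete, here is a minimal, framework-free sketch of one DP-SGD update, following the usual recipe of clipping each per-example gradient and adding Gaussian noise before averaging; all function names and default values are illustrative, not taken from the talk.

import numpy as np

def dp_sgd_step(params, per_example_grads, l2_norm_clip=1.0,
                noise_multiplier=1.1, learning_rate=0.1, rng=None):
    """One differentially private SGD update (illustrative sketch)."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        # 1. Clip each per-example gradient to a maximum L2 norm.
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, l2_norm_clip / (norm + 1e-12)))
    # 2. Sum the clipped gradients and add Gaussian noise scaled to the clip norm.
    noisy_sum = np.sum(clipped, axis=0) + rng.normal(
        scale=noise_multiplier * l2_norm_clip, size=params.shape)
    # 3. Average over the batch and take a standard gradient step.
    return params - learning_rate * noisy_sum / len(per_example_grads)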

Slide 16

Slide 16 text

How to use DP (a differentially private SGD implementation): TensorFlow Privacy implements differentially private versions of common optimizers ● SGD → DPGradientDescentGaussianOptimizer ● Adam → DPAdamGaussianOptimizer ● Adagrad → DPAdagradGaussianOptimizer ● RMSProp → DPRMSPropGaussianOptimizer

Slide 17

Slide 17 text

How to use DP

import tensorflow as tf
# DPGradientDescentGaussianOptimizer comes from privacy.optimizers.dp_optimizer
from tensorflow_privacy.privacy.optimizers.dp_optimizer import DPGradientDescentGaussianOptimizer

data = ...   # read data
model = ...  # instantiate model architecture
# l2_norm_clip, noise_multiplier, microbatches and learning_rate are hyperparameters set elsewhere.

if train_with_differential_privacy:
    optimizer = DPGradientDescentGaussianOptimizer(
        l2_norm_clip=l2_norm_clip,
        noise_multiplier=noise_multiplier,
        num_microbatches=microbatches,
        learning_rate=learning_rate)
    # Compute a vector of per-example losses rather than its mean over a minibatch.
    loss = tf.keras.losses.CategoricalCrossentropy(
        from_logits=True,
        reduction=tf.losses.Reduction.NONE)
else:
    # Train without differential privacy: plain SGD (tf.optimizers.SGD).
    optimizer = tf.optimizers.SGD(learning_rate=learning_rate)
    loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)

model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])

Slide 18

Slide 18 text

How to use DP: without DP vs with DP. Same code as the previous slide, with the two branches highlighted: the differentially private branch (DPGradientDescentGaussianOptimizer with a per-example loss) and the classical branch (plain SGD with the default mean-reduced loss).

Slide 19

Slide 19 text

Differential privacy vs model performance: does adding noise degrade performance?

Slide 20

Slide 20 text

Privacy analysis: how to measure the achieved level of protection, epsilon. Whether the DP model is trained on n entries or on n-1 entries, the "difference" between the two output distributions is at most e^ε.

Slide 21

Slide 21 text

Privacy analysis: how to measure the achieved level of protection. Differential privacy is measurable according to its mathematical definition and is expressed by two parameters ● epsilon (ε): an upper bound on how much the probability of a particular model output can vary by adding or removing a single training point, also called the privacy budget ● delta (δ): bounds the probability of the privacy guarantee not holding (the formal definition follows)
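For reference, the standard mathematical definition behind these two parameters: a randomized mechanism M is (ε, δ)-differentially private if, for every pair of datasets D and D' differing in a single individual and every set of outputs S,

Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D') ∈ S] + δ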

Slide 22

Slide 22 text

Privacy analysis: evolution of epsilon, accuracy and training time with noise ● Epsilon (ε) depends on noise_multiplier: the greater the noise, the smaller epsilon and the stronger the guarantee ● A good value for delta (δ) is << 1/n (n = size of the training set) Setup: MNIST classification with 2 convolutional + pooling layers, trained for 15 epochs, TensorFlow v. 1.15, TensorFlow Privacy v. 0.2.2, 8 vCPUs, 30 GB RAM, 1 NVIDIA Tesla K80 GPU. With everything else the same but classical SGD, training took less than 40 seconds. Model architecture and hyperparameters from the official tutorial.
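TensorFlow Privacy also ships a privacy-accounting helper that computes epsilon from the training settings; a minimal sketch of how it is typically called follows. The numeric values are placeholders, and the exact import path is an assumption that may vary across TensorFlow Privacy versions.

from tensorflow_privacy.privacy.analysis.compute_dp_sgd_privacy import compute_dp_sgd_privacy

# Placeholder values: training-set size, batch size, noise multiplier,
# number of epochs, and a target delta well below 1 / training-set size.
eps, opt_order = compute_dp_sgd_privacy(n=60000, batch_size=250,
                                        noise_multiplier=1.1, epochs=15, delta=1e-5)
print(f"epsilon = {eps:.2f} (optimal RDP order: {opt_order})")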

Slide 23

Slide 23 text

A model inversion attack 23

Slide 24

Slide 24 text

The Secret Revealer: context. "The Secret Revealer: Generative Model-Inversion Attacks Against Deep Neural Networks", 2020

Slide 25

Slide 25 text

Target model: a portrait classifier. The goal of the model is to recognize a person given that person's portrait. Example output: Scyprodank 0.1%, Securmax 0.2%, SciProduce 0.2%, ML-Li 0.2%, Streamèche 99.1%, Buildy 0.3%

Slide 26

Slide 26 text

Model inversion attack: the goal of the attack is to reconstruct a portrait given a person's name (Scyprodank, Securmax, SciProduce, ML-Li, Streamèche, Buildy)

Slide 27

Slide 27 text

Information needed to attack: the adversary uses public data, the target model (trained on the private data), and the Datamons' names; the sensitive data under attack are the private training portraits. The original portraits are never accessible, but the objective is to reconstruct them.

Slide 28

Slide 28 text

The secret revealer: attack 28

Slide 29

Slide 29 text

The attack structure, in three big steps: 1. Learn to generate credible portraits 2. Estimate the difference between generated portraits and the portrait of a targeted person 3. Optimise portrait generation

Slide 30

Slide 30 text

1. Generate credible portraits with a generative adversarial network (GAN): a GENERATOR turns a random latent vector into an attempt at a credible portrait, while a CRITIC compares it with true portraits sampled at random from the public data and decides which are credible (a sketch follows)
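A hedged sketch of what this step can look like in code: one generic generator/critic training step on the public portraits. The paper's exact GAN losses differ (it uses a Wasserstein-style formulation), and every name and argument here is illustrative only.

import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def gan_train_step(real_portraits, generator, critic, g_opt, c_opt, latent_dim):
    z = tf.random.normal([tf.shape(real_portraits)[0], latent_dim])
    with tf.GradientTape() as g_tape, tf.GradientTape() as c_tape:
        fake_portraits = generator(z, training=True)
        real_scores = critic(real_portraits, training=True)
        fake_scores = critic(fake_portraits, training=True)
        # The critic learns to separate true public portraits from generated ones...
        c_loss = (bce(tf.ones_like(real_scores), real_scores) +
                  bce(tf.zeros_like(fake_scores), fake_scores))
        # ...while the generator learns to fool it into accepting its portraits as credible.
        g_loss = bce(tf.ones_like(fake_scores), fake_scores)
    c_opt.apply_gradients(zip(c_tape.gradient(c_loss, critic.trainable_variables),
                              critic.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))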

Slide 31

Slide 31 text

2.1 Retrieve the target output vector: the adversary wants to reproduce the portrait of Streamèche, so the target vector is the one-hot encoding [0, 0, 0, 0, 1, 0] over (Scyprodank, Securmax, SciProduce, ML-Li, Streamèche, Buildy)

Slide 32

Slide 32 text

2.2 Compute the proximity between the target model's prediction for the reconstructed portrait and the target vector (both vectors of size m, the number of classes)

Slide 33

Slide 33 text

3. Optimize portrait generation: the weights of every model (generator, critic and target) are frozen ❄ and used to perform gradient descent on the latent vector, minimizing a loss that combines proximity (from the target model) and credibility (from the critic); see the sketch below
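A minimal sketch of this optimization, assuming Python callables generator, critic and target_model for the three frozen networks and assuming the target model outputs class probabilities; every name, value and the exact loss weighting is illustrative, and the paper's formulation is more elaborate.

import tensorflow as tf

# Illustrative values only (not from the talk or the paper).
latent_dim, num_classes, target_class, num_steps, lambda_credibility = 100, 6, 4, 1500, 100.0

z = tf.Variable(tf.random.normal([1, latent_dim]))      # only the latent vector is trainable
optimizer = tf.optimizers.SGD(learning_rate=0.02, momentum=0.9)
target_one_hot = tf.one_hot([target_class], depth=num_classes)

for step in range(num_steps):
    with tf.GradientTape() as tape:
        portrait = generator(z)
        # Proximity: how far the target model's prediction is from the targeted person.
        proximity_loss = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(
            target_one_hot, target_model(portrait)))
        # Credibility: how unrealistic the frozen critic finds the generated portrait.
        credibility_loss = -tf.reduce_mean(critic(portrait))
        loss = proximity_loss + lambda_credibility * credibility_loss
    grads = tape.gradient(loss, [z])    # gradients flow to z only; all weights stay frozen
    optimizer.apply_gradients(zip(grads, [z]))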

Slide 34

Slide 34 text

Does it work? 34

Slide 35

Slide 35 text

Evaluate performance: a global model to make a fair evaluation. An evaluation model is trained on the public and private data and used to judge the reconstructions.

Slide 36

Slide 36 text

Evaluate performance with the evaluation model (does the reconstruction look like Streamèche?); possible only when doing research, not in a real situation:
Target model | Attack accuracy
VGG16 | 28%
ResNet 152 | 44%
face.evolve | 46%

Slide 37

Slide 37 text

Examples of reconstructions from scratch (dataset: CelebA)

Slide 38

Slide 38 text

Use auxiliary information to improve results: ● a blurred portrait ● a square mask ● a T-shaped mask over mouth and eyes

Slide 39

Slide 39 text

Use auxiliary information to improve results: the generator now takes both the random latent vector and the auxiliary data as input when attempting to create a credible portrait

Slide 40

Slide 40 text

Use auxiliary information to improve results:
Target model | Attack accuracy (blurred aux. info) | Attack accuracy (square mask aux. info) | Attack accuracy (T mask aux. info)
VGG16 | 43% | 78% | 58%
ResNet 152 | 50% | 80% | 63%
face.evolve | 51% | 82% | 64%

Slide 41

Slide 41 text

Example of reconstruction with auxiliary information (panels: auxiliary info, original, reconstruction)

Slide 42

Slide 42 text

Another example of reconstruction with auxiliary information (panels: auxiliary info, original, reconstruction)

Slide 43

Slide 43 text

Our experience:
✔ Built an attack on the MNIST dataset
✔ Retrieved a few credible digits
❌ Did not get the same results: not enough details in the paper
We recently got the code; surprise, it is in PyTorch… and we started in TensorFlow
To sum up: building an attack is costly in time, in skills and in money, BUT NOT IMPOSSIBLE

Slide 44

Slide 44 text

Are attacks credible and avoidable? 44

Slide 45

Slide 45 text

Success depends on restrictive conditions, but they are not impossible to gather: 1. A model trained on sensitive data exists 2. Full white box access to this model 3. The attack is not too expensive compared to the expected profit

Slide 46

Slide 46 text

1. Models trained on sensitive data. An example: facial recognition is spreading ● Facial recognition is already developed and deployed in several countries ● Experiments have already been done in France ● A market for software and services of up to $7 billion by 2024 worldwide* [*] Facial Recognition Market by Component (Software Tools (2D Recognition, 3D Recognition, and Facial Analytics) and Services), Application Area (Emotion Recognition, Access Control, and Law Enforcement), Vertical, and Region - Global Forecast to 2024

Slide 47

Slide 47 text

2. Full white box access to the target model ● Several models are fully accessible, for instance those used for transfer learning ● Models are generally not perceived as sensitive data ● Some attacks are designed to retrieve the characteristics of a model (model extraction attacks)

Slide 48

Slide 48 text

3. The attack is not too expensive compared to the expected profit ● Building an attack takes time and is not always successful ● Criminal organisations might have a great interest in retrieving people's identities

Slide 49

Slide 49 text

Some plausible use cases? ● A witness who changed identity and whose portrait has been anonymised ("Securmax is the one that finished the cake") ● A website that offers to broadcast anonymous videos ("Welcome to Jojo's channel")

Slide 50

Slide 50 text

Good practices ● Prevent white box access to models for random users ● For models served through an API, limit the number of calls allowed per user over a given period of time (for instance 10 calls a day; see the sketch below) ● Consider models as sensitive as the data on which they are trained ● Use differential privacy when training models (mainly against membership attacks)
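As a purely illustrative sketch of the rate-limiting advice above (none of these names come from the talk), a minimal in-memory quota check per API key could look like this:

import time
from collections import defaultdict

_calls = defaultdict(list)   # hypothetical in-memory log of call timestamps per API key

def allow_prediction(api_key, max_calls=10, window_seconds=24 * 3600):
    """Return True if this key may still call the model within the current window."""
    now = time.time()
    recent = [t for t in _calls[api_key] if now - t < window_seconds]
    if len(recent) >= max_calls:
        _calls[api_key] = recent
        return False          # quota exhausted: reject the call
    recent.append(now)
    _calls[api_key] = recent
    return True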

Slide 51

Slide 51 text

Conclusion ● Probably not an imminent and critical risk ● But it is still a good idea to adopt good practices to prevent possible future security breaches ● We hope you now have a few new tools to judge for yourself whether your model needs to be protected

Slide 52

Slide 52 text

To go further
● Demystifying Membership Inference Attacks in Machine Learning as a Service, Stacey Truex, Ling Liu, Mehmet Emre Gursoy, Lei Yu, and Wenqi Wei, 2019
● The Secret Revealer: Generative Model-Inversion Attacks Against Deep Neural Networks, Yuheng Zhang, Ruoxi Jia, Hengzhi Pei, Wenxiao Wang, Bo Li, Dawn Song, 2020
● Privacy Attacks on Machine Learning, Ilja Moisejevs, Medium, 2019
● The Algorithmic Foundations of Differential Privacy
● Deep Learning with Differential Privacy
● TF Privacy tutorial
● The Secret Sharer: Evaluating and Testing Unintended Memorisation in Neural Networks
● https://www.cnil.fr/sites/default/files/atoms/files/reconnaissance_faciale.pdf
● https://videos.senat.fr/video.1024595_5c5aeeb301759.audition-de-mme-valerie-pecresse-presidente-de-la-region-ile-de-france-sur-le-grand-paris-et-sur-l?timecode=4712000
● https://www.lemonde.fr/pixels/article/2019/08/28/reconnaissance-faciale-la-cnil-tique-sur-le-bilan-de-l-experience-nicoise_5503769_4408996.html
● https://www.franceculture.fr/emissions/la-bulle-economique/reconnaissance-faciale-quand-les-industriels-poussent-a-son-developpement
● https://www.arte.tv/fr/videos/083310-000-A/tous-surveilles-7-milliards-de-suspects/

Slide 53

Slide 53 text

CONTACT Giulia Bianchi (gbianchi@xebia.fr), Johan Jublanc (jjublanc@xebia.fr). THANK YOU