Evaluating Differentially Private Machine Learning in Practice

David Evans
August 16, 2019

Bargav Jayaraman's talk at the USENIX Security Symposium 2019.
Paper: https://arxiv.org/abs/1902.08874

Transcript

  1. Evaluating Differentially Private
    Machine Learning in Practice
    Bargav Jayaraman and David Evans
    Department of Computer Science
    University of Virginia


  2. Our Objective
    [Diagram: Data → Machine Learning → Model M]
    To evaluate the privacy leakage of private mechanisms.
    Leakage is quantified in terms of inference attacks.

  3. Result Highlights
    [Chart: Accuracy Loss and Privacy Leakage (both 0.00–1.00) vs. Privacy Budget ϵ (0.01–1000) for a Neural Network on CIFAR-100. Curves: Naïve Composition (NC) accuracy loss, NC Leakage, RDP accuracy loss, RDP Leakage, and the Theoretical Guarantee on leakage.]

  4. Rest of the Talk
    1. Background on Applying Differential Privacy to Machine Learning
    2. Experimental Evaluation of Differentially Private Machine Learning Implementations

  5. Applying DP to Machine Learning
    [Pipeline diagram: Data → Define Objective Function → Iterate for T epochs: Calculate Gradients, Update Model → Model M. Noise can be injected at three points: Objective Perturbation, Gradient Perturbation, or Output Perturbation.]
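    A minimal sketch of the gradient perturbation point in this pipeline, in Python with NumPy, for a binary logistic regression loss. The function name and the clip_norm, noise_multiplier, and lr values are illustrative assumptions, not the settings used in the talk:

```python
# Minimal sketch of gradient perturbation (DP-SGD style), assuming a
# binary logistic regression loss. clip_norm, noise_multiplier, and lr
# are illustrative assumptions, not the talk's settings.
import numpy as np

def dp_gradient_step(theta, X, y, lr=0.1, clip_norm=1.0,
                     noise_multiplier=1.0, rng=None):
    """One training step: clip each example's gradient, add Gaussian
    noise calibrated to the clipping norm, then update the model."""
    if rng is None:
        rng = np.random.default_rng()
    preds = 1.0 / (1.0 + np.exp(-X @ theta))        # sigmoid
    per_example_grads = (preds - y)[:, None] * X    # d(loss)/d(theta)
    # Bound each example's influence: scale its gradient to L2 norm <= clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads / np.maximum(1.0, norms / clip_norm)
    # Perturb the summed gradient with Gaussian noise scaled to the clip norm.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=theta.shape)
    noisy_grad = (clipped.sum(axis=0) + noise) / len(X)
    return theta - lr * noisy_grad
```

    Clipping bounds each example's influence on the gradient, which is what lets the added Gaussian noise be calibrated to a fixed sensitivity.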

  6. ERM Algorithms using ϵ ≤ 1
    [Timeline figure, 2006–2018: DP introduced [D06] [DMNS06]; ERM algorithms using objective perturbation or output perturbation ([CM09], [PRR10], [CMS11], [ZZXYW12], [JT13], [JT14], [WFWJN15], [HCB16], [WLKCJN17]) report budgets from ϵ = 0.05 up to ϵ = 1 (e.g., ϵ = 0.1, 0.2, 0.5, 0.8).]

  7. Applying DP to Deep Learning
    [Same pipeline diagram as slide 5: Data → Define Objective Function → Iterate for T epochs: Calculate Gradients, Update Model → Model M, with Objective, Gradient, and Output Perturbation as the noise-injection points.]

  8. Deep Learning requiring high ϵ value
    [Timeline figure, extended: against the ERM works above (all with ϵ ≤ 1), deep learning results report far larger budgets, e.g., [SS15] with ϵ = 100 and [ZZWCWZ18] with ϵ = 369,200.]

  9. Improving Composition
    If each iteration is ϵ-DP, then by naïve composition the model is (Tϵ)-DP.
    With relaxed notions, the model is (O(√T · ϵ), δ)-DP:
    Concentrated DP [Dwork et al. (2016)]
    Zero Concentrated DP [Bun & Steinke (2016)]
    Rényi DP [Mironov (2017)]
    Moments Accountant [Abadi et al. (2016)]
    [Pipeline diagram as before, highlighting Gradient Perturbation.]
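    To see why these relaxed notions matter, compare naïve composition with the classic advanced composition bound of Dwork, Rothblum, and Vadhan; a small sketch with illustrative values of ϵ, T, and δ:

```python
# Sketch contrasting naive composition (T * eps) with advanced
# composition (O(sqrt(T) * eps)); eps, T, and delta are illustrative.
import math

def naive_composition(eps, T):
    # T iterations of an eps-DP mechanism compose to (T * eps)-DP.
    return T * eps

def advanced_composition(eps, T, delta=1e-5):
    # Advanced composition: total budget grows as
    # eps * sqrt(2 T ln(1/delta)) + T * eps * (e^eps - 1),
    # i.e., O(sqrt(T) * eps) for small eps.
    return (eps * math.sqrt(2 * T * math.log(1 / delta))
            + T * eps * (math.exp(eps) - 1))

print(naive_composition(0.01, T=1000))     # 10.0
print(advanced_composition(0.01, T=1000))  # ~1.62
```

    With 1,000 iterations at ϵ = 0.01 each, naïve composition charges a total budget of 10, while advanced composition charges about 1.6; the relaxed notions above tighten this accounting further.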

  10. Lower ϵ value with recent DP notions
    [Timeline figure, extended: with the relaxed notions, deep learning works report much lower budgets, e.g., [ACGMMTZ16] and [PAEGT16] with ϵ = 8, and [GKN17], [BDFKR18], [HCS18], [YLPGT19] with budgets of ϵ = 3, ϵ = 4, and ϵ = 21.5, versus [SS15] (ϵ = 100) and [ZZWCWZ18] (ϵ = 369,200).]
    [Inset chart: Accuracy Loss and Privacy Leakage (both 0.00–1.00) vs. Privacy Budget ϵ (0.01–1000), with RDP and NC accuracy-loss and leakage curves against the Theoretical Guarantee; ϵ = 3 and ϵ = 8 are marked.]

  11. Experiments
    Models: Logistic Regression, Neural Network
    Tasks: 100-class classification on CIFAR-100; 100-class classification on Purchase-100
    Evaluation Metrics: Accuracy Loss, Privacy Leakage
    Code Available: https://github.com/bargavj/EvaluatingDPML

  12. Training and Testing
    [Diagram: Data Set split into a Training Set and a Test Set; Machine Learning on the Training Set produces model M.]
    Accuracy Loss = 1 − (Accuracy of Private Model / Accuracy of Non-Private Model)
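    The accuracy loss metric as a one-liner; the accuracies in the usage example are made up for illustration:

```python
# Accuracy loss as defined on the slide; the accuracies passed in the
# example are made up for illustration.
def accuracy_loss(private_acc, nonprivate_acc):
    """1 - (accuracy of private model / accuracy of non-private model)."""
    return 1.0 - private_acc / nonprivate_acc

print(accuracy_loss(private_acc=0.15, nonprivate_acc=0.30))  # 0.5
```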

  13. [Chart: Accuracy Loss (0.00–1.00) vs. Privacy Budget ϵ (0.01–1000) for Logistic Regression on CIFAR-100, with NC, zCDP, and RDP curves.]
    RDP has 0.10 accuracy loss at ϵ = 10, and NC at ϵ = 500.

  14. Membership Inference Attacks
    Given a model M and a data set, the attacker predicts which records were members of the training set; privacy leakage is quantified as (TPR − FPR).
    Shadow-model attack of Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov (S&P 2017): train shadow models M1, M2, …, Mk and use their outputs to train an attack model A.
    Loss-based attack of Samuel Yeom, Irene Giacomelli, Matt Fredrikson, and Somesh Jha (CSF 2018): predict membership by comparing a record's loss ℓ(dᵢ, θ) to the expected training loss (1/n) ∑ᵢ ℓ(dᵢ, θ) over the n training records.
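    A sketch of the loss-based attack of Yeom et al. together with the (TPR − FPR) leakage metric; the loss arrays are assumed to come from evaluating the model on known members and non-members:

```python
# Sketch of the loss-based membership inference attack of Yeom et al.
# and the (TPR - FPR) leakage metric. The loss arrays are assumed to
# come from evaluating the model on known members / non-members.
import numpy as np

def predict_member(losses, expected_train_loss):
    """Yeom et al.: flag a record as a training-set member when its
    loss is at most the expected training loss (1/n) * sum_i l(d_i, theta)."""
    return np.asarray(losses) <= expected_train_loss

def privacy_leakage(member_losses, nonmember_losses, expected_train_loss):
    # TPR on true members minus FPR on non-members.
    tpr = predict_member(member_losses, expected_train_loss).mean()
    fpr = predict_member(nonmember_losses, expected_train_loss).mean()
    return tpr - fpr
```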

  15. [Chart: Privacy Leakage (0.00–0.12) vs. Privacy Budget ϵ (0.01–1000) for Logistic Regression on CIFAR-100, with NC, zCDP, and RDP curves against the Theoretical Guarantee; a 0.55 PPV point is marked.]
    Non-private model has 0.12 leakage with 0.56 PPV.
    RDP has 0.06 leakage at ϵ = 10, and NC at ϵ = 500.

  16. Neural Networks
    The NN has 103,936 trainable parameters, so it has more capacity to learn the training data.
    [Architecture diagram: Input Layer (50 neurons) → Hidden Layer 1 (256 neurons) → Hidden Layer 2 (256 neurons) → Output Layer (100 neurons).]
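    The 103,936 figure corresponds to the weight matrices of a 50 → 256 → 256 → 100 fully connected network with bias terms excluded; a quick arithmetic check:

```python
# Quick check of the slide's parameter count: the weight matrices of a
# 50 -> 256 -> 256 -> 100 fully connected network (biases excluded).
layers = [50, 256, 256, 100]
weights = sum(a * b for a, b in zip(layers, layers[1:]))
print(weights)  # 50*256 + 256*256 + 256*100 = 103936
```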

  17. [Chart: Accuracy Loss (0.00–1.00) vs. Privacy Budget ϵ (0.01–1000) for a Neural Network on CIFAR-100, with NC, zCDP, and RDP curves.]
    RDP has 0.53 accuracy loss at ϵ = 10, and NC at ϵ = 500.

  18. [Chart: Privacy Leakage (0.00–0.50) vs. Privacy Budget ϵ (0.01–1000) for a Neural Network on CIFAR-100, with NC, zCDP, and RDP curves against the Theoretical Guarantee; PPV values of 0.71, 0.74, and 0.71 are marked on the curves.]
    Non-private model has 0.72 leakage with 0.94 PPV.
    RDP has 0.07 leakage at ϵ = 10, and NC at ϵ = 500.

  19. [Venn diagram of two attack runs: Run 1 has TP = 6150, FP = 2126 (0.74 PPV); Run 2 has TP = 6156, FP = 2157 (0.74 PPV); their intersection has TP = 4370, FP = 1080 (0.80 PPV).]
    New results; see the updated paper on arXiv.
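    PPV here is the attack's precision, TP / (TP + FP); a check against the counts on the slide:

```python
# Positive predictive value (precision) of a membership inference
# attack, checked against the counts on the slide.
def ppv(tp, fp):
    return tp / (tp + fp)

print(round(ppv(6150, 2126), 2))  # Run 1: 0.74
print(round(ppv(6156, 2157), 2))  # Run 2: 0.74
print(round(ppv(4370, 1080), 2))  # Intersection of both runs: 0.80
```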

  20. [Bar chart: Fraction of Data Set (0.00–1.00) of True Members vs. Non-Members, by number of times identified as a member out of 5 runs (≥ 0, ≥ 1, ≥ 2, ≥ 3, ≥ 4, = 5), compared against random, independent predictions. PPV rises with the threshold: 0.500, 0.656, 0.749, 0.797, 0.817, and 0.822.]
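    A sketch of the multi-run aggregation behind this chart: an example is flagged as a member only if at least k of the 5 runs identify it. The random prediction matrix below is a stand-in for real attack outputs:

```python
# Sketch of aggregating membership predictions across runs: an example
# counts as a member only if at least k of the runs flag it. The random
# prediction matrix is a stand-in for real attack outputs.
import numpy as np

rng = np.random.default_rng(0)
predictions = rng.random((5, 1000)) < 0.5   # predictions[r, i]: run r flags example i

counts = predictions.sum(axis=0)            # times flagged, out of 5 runs
for k in range(6):
    print(f">= {k} of 5 runs: fraction flagged = {(counts >= k).mean():.3f}")
```

    With random, independent predictions the fraction flagged follows the binomial tail, which is the baseline the slide compares against.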

  21. Conclusion
    [Chart: Accuracy Loss and Privacy Leakage (both 0.00–1.00) vs. Privacy Budget ϵ (0.01–1000) for Logistic Regression on CIFAR-100, with RDP and NC accuracy-loss and leakage curves against the Theoretical Guarantee; a 0.55 PPV point is marked.]
    Non-private model has 0.12 leakage with 0.56 PPV.
    There is privacy leakage, but it is not considerable, even for the non-private model.

  22. Conclusion
    [Chart: Accuracy Loss and Privacy Leakage (both 0.00–1.00) vs. Privacy Budget ϵ (0.01–1000) for a Neural Network on CIFAR-100, with RDP and NC accuracy-loss and leakage curves against the Theoretical Guarantee; a 0.74 PPV point is marked.]
    Non-private model has 0.72 leakage with 0.94 PPV.
    Privacy doesn't come for free.
    Bridging the gap between the theoretical bound on leakage and the leakage of practical attacks.

  23. [Chart repeated from the conclusion slides: Accuracy Loss and Privacy Leakage vs. Privacy Budget ϵ, with RDP and NC accuracy-loss and leakage curves against the Theoretical Guarantee.]
    Privacy doesn't come for free.
    Bridging the gap between the theoretical bound on leakage and the leakage of practical attacks.
    Questions? Thank You!
    Bargav Jayaraman
    [email protected]
    https://github.com/bargavj/EvaluatingDPML