CIDM 2017 Talk

Gregory Ditzler

November 27, 2017

Transcript

  1. Fine Tuning Lasso in an Adversarial Environment Against Gradient Attacks

     Gregory Ditzler¹ and Ashley Prater²
     ¹The University of Arizona, Department of Electrical & Computer Engineering
     ²Air Force Research Laboratory, Information Directorate
     [email protected], [email protected]
     SSCI: Computational Intelligence in Data Mining, 29 November 2017
  2. Overview: Plan of Attack for the Next 20 Minutes

     1. Overview of Variable Selection
     2. Why is an Adversary a Nuisance?
     3. Looking at New Avenues of VS Objectives
     4. Preliminary Results
     5. Applications & Conclusion
  3–4. Learning in Nonstationary Environments

     Concept drift
     Concept drift can be modeled as a change in a probability distribution, P(X, Ω). The change can be in P(X), P(X|Ω), P(Ω), or jointly in P(Ω|X):

        P(\Omega \mid X) = \frac{P(X \mid \Omega)\, P(\Omega)}{P(X)}

     We generally reserve names for specific types of drift (e.g., real and virtual).
     Drift types: sudden, gradual, incremental, & reoccurring.
     General examples: electricity demand, financial, climate, epidemiological, and spam (to name a few).

     NSE and Traditional Classification Algorithms
     Many traditional learning algorithms (e.g., CART, AdaBoost) assume that data are sampled i.i.d.; however, that is rarely the case in practice.
     Learning in an NSE generally requires that the drift in the stream be "structured", e.g., there is some common information shared between the different data sets in the stream.

     ¹G. Ditzler, M. Roveri, C. Alippi, and R. Polikar, "Adaptive strategies for learning in nonstationary environments: a survey," IEEE Computational Intelligence Magazine, vol. 10, no. 4, pp. 12–25, 2015.
     ²G. Ditzler, G. Rosen, and R. Polikar, "Domain Adaptation Bounds for Multiple Expert Systems Under Concept Drift," International Joint Conference on Neural Networks, 2014. (best student paper award)
  5–8. Feature Selection

     [Figure: samples x₁, …, x_M ∈ R^K pass through feature selection to obtain x′₁, …, x′_M ∈ R^k with k < K, feeding classification y = f(x) and knowledge discovery; each sample is labeled legitimate or malicious. A follow-up panel shows the adversary probing the deployed model f(x) ("I know what you're using") to learn how the prediction ŷ (legitimate? malicious?) is assigned. Stick figures courtesy of xkcd.com.]

     Relevance, weak relevance, and irrelevance; k < K; y = f(x).

     Adversary Abilities
     Poison: the adversary inserts/deletes/manipulates samples at training time to thwart the objective of a learning algorithm.
     Evasion: the adversary manipulates malicious samples at evaluation time to attempt to come off as legitimate.
  9–11. Feature Selection Review

     Wrapper Methods
     Find a subset of features F ⊂ X that provides a minimal loss with a classifier C. Wrappers typically provide a smaller loss on a data set than embedded and filter-based feature selection methods; however, F may vary depending on the choice of the classifier. Examples: SVM-RFE, distributed wrappers. Small loss + high complexity.

     Embedded Methods
     Optimize the parameters of the classifier and the feature selection at the same time. Examples: lasso, elastic-net, streamwise feature selection, online feature selection.

     Filter Methods
     Find a subset of features F ⊂ X that maximizes a function J that is not tied to classification loss (classifier independent). Generally faster than wrapper and embedded methods, but we cannot assume F will produce minimal loss. Examples: RELIEF, conditional likelihood maximization, JMI. (A minimal filter-style sketch follows.)
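A minimal filter-style sketch in Python. It ranks features by univariate mutual information with the label, a simpler stand-in for criteria such as JMI; the synthetic data, the scoring function, and k are illustrative assumptions, not the deck's setup.

```python
# Filter-style feature selection sketch: rank features by mutual information
# with the label and keep the top k (univariate MI, not the deck's JMI).
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def filter_select(X, y, k):
    scores = mutual_info_classif(X, y, random_state=0)  # classifier-independent J
    top_k = np.argsort(scores)[::-1][:k]                # k highest-scoring features
    return top_k, X[:, top_k]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = (X[:, 3] + X[:, 7] > 0).astype(int)  # only features 3 and 7 are relevant
idx, X_sel = filter_select(X, y, k=5)
print(idx)  # features 3 and 7 should rank near the top
```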
  12–14. Generalized Regularization Methods

     LASSO, Elastic Nets and Ridge Regression
     LASSO, elastic nets, and ridge regression fit linear models of the form

        \arg\min_{\theta \in \mathbb{R}^p} \; \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - \theta^T x_i \right)^2 + \lambda_1 \sum_{j=1}^{p} |\theta_j| + \frac{\lambda_2}{2} \sum_{j=1}^{p} \theta_j^2

     Configurations
     Ridge: λ₁ = 0 and λ₂ > 0 (Tikhonov regularization)
     LASSO: λ₁ > 0 and λ₂ = 0
     Elastic Nets: λ₁ > 0 and λ₂ > 0

     LASSO (n ≥ p) and elastic nets (n < p) can be used to perform variable selection. (A short sketch of this objective follows.)
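The objective above translates almost directly into CVXPY (the Python analogue of the CVX toolbox used later in the experiments). A minimal sketch; the problem sizes and regularization weights are placeholders.

```python
# Generalized regularized linear model: squared loss + lam1*L1 + (lam2/2)*L2^2.
# lam1/lam2 settings recover ridge, LASSO, or elastic net as on the slide.
import cvxpy as cp
import numpy as np

def fit_linear(X, y, lam1=0.0, lam2=0.0):
    n, p = X.shape
    theta = cp.Variable(p)
    obj = (cp.sum_squares(y - X @ theta) / (2 * n)
           + lam1 * cp.norm1(theta)
           + (lam2 / 2) * cp.sum_squares(theta))
    cp.Problem(cp.Minimize(obj)).solve()
    return theta.value

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 100))                      # n < p
theta_star = np.zeros(100); theta_star[:5] = 1.0    # sparse ground truth
y = X @ theta_star + 0.1 * rng.normal(size=50)
theta_lasso = fit_linear(X, y, lam1=0.1)            # LASSO
theta_enet = fit_linear(X, y, lam1=0.1, lam2=0.1)   # elastic net
```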
  15. Where does the adversary gain an edge?

     Given new data, a prediction is made by ŷᵢ = θᵀxᵢ. If the true label is yᵢ = 1, the adversary's objective is to modify xᵢ with f such that θᵀf(xᵢ) < 0 < θᵀxᵢ:

        y_i \theta^T f(x_i) = y_i \theta^T (x_i + \eta_i) < 0
        \;\Longleftrightarrow\; y_i \theta^T x_i < -y_i \theta^T \eta_i = -y_i \tau_i

     Therefore,

        \mathrm{err}_{0\text{-}1} = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\!\left[ y_i \theta^T x_i < -y_i \tau_i \right]

     sign(yᵢ)   sign(τᵢ)   sign(θᵀxᵢ)   yᵢθᵀxᵢ < −yᵢτᵢ   Outcome
     +          +          +            + < −            No error
     +          +          −            − < −            Get away with it
     +          −          +            + < +            Adversary advantage
     +          −          −            − < +            Legit error
     −          +          +            − < +            Not (as) interesting
     −          +          −            + < +            Not (as) interesting
     −          −          +            − < −            Not (as) interesting
     −          −          −            + < −            Not (as) interesting

     (A numerical check of the flip condition follows.)
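A quick numerical check of the condition in the table, assuming a random θ and Gaussian perturbations (both stand-ins for illustration): a perturbation η changes the predicted sign exactly when y θᵀx < −y τ with τ = θᵀη.

```python
# Verify: sign(theta^T (x + eta)) differs from y exactly when
# y * theta^T x < -y * tau, where tau = theta^T eta.
import numpy as np

rng = np.random.default_rng(1)
theta = rng.normal(size=10)
X = rng.normal(size=(1000, 10))
y = np.sign(X @ theta)                     # labels from the clean model
eta = 0.5 * rng.normal(size=(1000, 10))    # perturbations
tau = eta @ theta

err01 = np.mean(np.sign((X + eta) @ theta) != y)   # 0-1 error after perturbation
flips = np.mean(y * (X @ theta) < -y * tau)        # the slide's condition
print(err01, flips)                                # the two quantities agree
```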
  16–19. Learning in the Presence of an Adversary

     Is Feature Selection Secure?
     The attacker's goal is to maximally increase the classification error of methods such as LASSO, elastic nets, and ridge regression.
     Xiao et al.¹ provided a framework to essentially "break" LASSO by adding a few new data samples into the training data, D := D ∪ {x_c}, chosen to solve

        \max_{x_c} \; \frac{1}{2m} \sum_{i=1}^{m} \left( y_i - \theta^T x_i \right)^2 + \lambda \sum_{j=1}^{p} |\theta_j|

     The adversary has access to λ and the training data, which could be a somewhat far-reaching assumption. The attack acts as a wrapper around LASSO, finding x_c via a (sub)gradient-ascent algorithm. (A toy sketch of this wrapper follows.)

     Impacts
     LASSO-based feature selection is quite susceptible to a meticulously carried out attack. LASSO can be broken! How can we fix it?

     [Figure: classification error versus fraction of poisoned samples (0–0.2) for p = 150, 200, 250, 300.]

     ¹H. Xiao et al., "Is feature selection secure against training data poisoning?," International Conference on Machine Learning, 2015.
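To make the wrapper idea concrete, here is a toy sketch only: Xiao et al. derive the attack (sub)gradient analytically through the LASSO optimality conditions, whereas this sketch swaps that for a crude finite-difference estimate around a single poison point and scikit-learn's Lasso for the inner retraining. The seed point, step size, and λ are arbitrary.

```python
# Toy poisoning wrapper: ascend the retrained model's validation MSE with
# respect to one poison point x_c (finite differences, not the paper's
# analytic KKT-based gradient).
import numpy as np
from sklearn.linear_model import Lasso

def retrained_loss(X, y, xc, yc, Xval, yval, lam):
    mdl = Lasso(alpha=lam).fit(np.vstack([X, xc]), np.append(y, yc))
    return np.mean((yval - Xval @ mdl.coef_) ** 2)

def poison_point(X, y, Xval, yval, lam=0.05, step=0.1, iters=25, h=1e-3):
    xc, yc = X[0].copy(), -y[0]        # seed: a label-flipped training point
    for _ in range(iters):
        g = np.zeros_like(xc)          # finite-difference "gradient" in x_c
        for j in range(len(xc)):
            e = np.zeros_like(xc); e[j] = h
            g[j] = (retrained_loss(X, y, xc + e, yc, Xval, yval, lam)
                    - retrained_loss(X, y, xc - e, yc, Xval, yval, lam)) / (2 * h)
        xc += step * g                 # ascend: make the retrained model worse
    return xc
```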
  20–23. Adversarial Machine Learning and Feature Selection

     A Wrapper Method
     Previous work has focused on devising adversary-aware classification algorithms to counter evasion attempts. Little work has considered the problem of learning secure feature selection-based classifiers against evasion attacks. Zhang et al.¹ introduced a wrapper for feature selection:

        Input: x: the malicious sample; x^(0): the initial location of the attack
               sample; η: step size; ε: small positive constant; m: max. iterations.
        i = 1
        repeat
            if g(x^(i)) ≥ 0 then
                x^(i) = x^(i−1) − η ∇g(x^(i−1))
            else
                x^(i) = x^(i−1) − η ∇c(x^(i−1), x)
            end if
            i = i + 1
        until c(x^(i), x) − c(x^(i−1), x) < ε or i ≥ m

     (A concrete instance of this loop is sketched below.)

     ¹F. Zhang et al., "Adversarial feature selection against evasion attacks," IEEE Transactions on Cybernetics, 2016.
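A minimal runnable instance of the loop above, assuming a linear discriminant g(x) = θᵀx (malicious when g(x) ≥ 0) and a quadratic modification cost c(x′, x) = ‖x′ − x‖²; both choices and the step size are this sketch's assumptions.

```python
# Gradient-descent evasion: step down the discriminant while flagged as
# malicious, otherwise step back toward the original sample to limit cost.
import numpy as np

def evade(theta, x, eta=0.05, eps=1e-8, max_iter=500):
    xi, prev_cost = x.copy(), 0.0
    for _ in range(max_iter):
        if theta @ xi >= 0:
            xi = xi - eta * theta             # gradient of g(x) = theta^T x
        else:
            xi = xi - eta * 2 * (xi - x)      # gradient of c(x', x) = ||x' - x||^2
        cost = np.sum((xi - x) ** 2)
        if abs(cost - prev_cost) < eps:
            break
        prev_cost = cost
    return xi

theta = np.array([1.0, -0.5, 2.0])
x_mal = np.array([2.0, 0.0, 1.0])             # g(x) = 4: flagged malicious
x_adv = evade(theta, x_mal)
print(theta @ x_mal, theta @ x_adv)           # g is driven down to the boundary
```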
  24. A Game-Theoretic Perspective (Related Work)

     A Non-Zero-Sum Game
     The adversary pays a cost proportional to the size of the "attack". A game-theoretic model formulates the interactions between the data miner (the classifier) and the adversary as a non-zero-sum game:

        Input: training data D = {xᵢ, yᵢ}ⁿᵢ₌₁, minimum budget MB, λ_w, λ_α, and norm p
        w = arg min_w Σ ℓ(wᵀxᵢ, yᵢ) + λ_w ‖w‖_p
        while c < MB do
            α = arg min_α Σ ℓ(wᵀ(xᵢ + α), −1) + λ_α ‖α‖_p
            xᵢ = xᵢ + α                // for (+) data
            w = arg min_w Σ ℓ(wᵀxᵢ, yᵢ) + λ_w ‖w‖_p
            c += ‖α‖₁
        end while

     (A compressed sketch of this loop follows.)
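A compressed sketch of that loop, assuming a squared loss, a single shared shift α per round, and CVXPY for both players; the loss, norms, and parameter values are stand-ins rather than the original formulation.

```python
# Non-zero-sum retraining game: the adversary shifts the (+) data toward the
# label -1 under an L1 cost; the learner retrains; repeat until the budget.
import cvxpy as cp
import numpy as np

def fit_w(X, y, lam):
    w = cp.Variable(X.shape[1])
    cp.Problem(cp.Minimize(cp.sum_squares(X @ w - y) + lam * cp.norm1(w))).solve()
    return w.value

def play_game(X, y, lam_w=0.1, lam_a=0.1, budget=5.0, rounds=10):
    pos = y > 0
    w, spent = fit_w(X, y, lam_w), 0.0
    for _ in range(rounds):
        if spent >= budget:
            break
        a = cp.Variable(X.shape[1])
        # (x_i + a)^T w = x_i^T w + a^T w, pushed toward the target label -1
        cp.Problem(cp.Minimize(
            cp.sum_squares(X[pos] @ w + a @ w + 1) + lam_a * cp.norm1(a))).solve()
        X = X.copy(); X[pos] = X[pos] + a.value   # shift the (+) data
        w = fit_w(X, y, lam_w)                    # learner retrains
        spent += np.sum(np.abs(a.value))          # c += ||a||_1
    return w
```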
  25–26. Preliminaries

     What's going on with the adversary?
     We are provided training data that we can assume are i.i.d. from a source domain D_S. The test data are sampled from a target domain D_T such that KL(D_S ‖ D_T) > 0. The adversary controls the target domain D_T by causing perturbations.

     Divergences and Bounds on Learning
     The H∆H distance measures the maximum difference in expected loss between h, h′ ∈ H on two distributions (Kifer et al.):

        d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}_T, \mathcal{D}_k) = 2 \sup_{h, h' \in \mathcal{H}} \left| \mathbb{E}_T[\ell(h, h')] - \mathbb{E}_k[\ell(h, h')] \right|

     Ben-David et al. bounded the loss of a single hypothesis being evaluated on an unknown distribution:

        E_T(h, f_T) \le E_S(h, f_S) + \lambda + \frac{1}{2} d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{U}_T, \mathcal{U}_S)

     [Figure: VC confidence as a function of the sample size N for ν = 100, 200, 500, 5000.]

     (An empirical stand-in for the divergence term is sketched below.)
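The H∆H term cannot be computed directly from samples. One common empirical stand-in is the "proxy A-distance" of Ben-David et al.: train a classifier to separate source from target samples and map its error ε to 2(1 − 2ε). The logistic model and 5-fold CV below are choices of this sketch.

```python
# Proxy A-distance: a domain classifier that separates D_S from D_T easily
# implies a large divergence between the two distributions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def proxy_a_distance(Xs, Xt):
    X = np.vstack([Xs, Xt])
    d = np.concatenate([np.zeros(len(Xs)), np.ones(len(Xt))])  # domain labels
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, d, cv=5).mean()
    eps = 1.0 - acc                    # domain-classification error
    return 2.0 * (1.0 - 2.0 * eps)

rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, size=(500, 10))
Xt = rng.normal(0.5, 1.0, size=(500, 10))   # adversary-shifted target
print(proxy_a_distance(Xs, Xt))             # grows with the shift
```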
  27–32. Thinking Out Loud

     A Glimpse at the Error
     Ultimately, we are interested in knowing what the error of h will be on the target domain (i.e., legitimate, malicious, evasive-malicious):

        E_T(h, f_T) \le \underbrace{E_S(h, f_S)}_{\text{training error}} + \underbrace{\tfrac{1}{2} d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{U}_T, \mathcal{U}_S)}_{\text{divergence on distributions}} + E_S(h^*, f_S) + E_T(h^*, f_T)

     where h^* = \arg\min_{h \in \mathcal{H}} \left\{ E_S(h, f_S) + E_T(h, f_T) \right\}. The last two terms need to be taken into account, but an adversary doesn't give them up.

     What would we like to do?
     LASSO minimizes the sum-squared error and the L1-norm of θ; however, we want to examine a modified objective that tunes the model with adversarial information:

        \theta^* = \arg\min_{\theta \in \Phi} \; \underbrace{\ell(\theta, \mathcal{D})}_{\text{model loss}} + \underbrace{\lambda \Omega(\theta)}_{\text{regularization}} + \underbrace{\alpha \Lambda(\theta, \mathcal{D})}_{\text{adversary}}

     where λ, α > 0 are regularization parameters that control the weight tied to the complexity and the adversary, respectively.

     What does Λ look like? Ideally convex?
  33–38. A Modified Objective

     What do we know?
     There are "textbook" attack strategies for families of classifiers, and the degree of impact is controlled by the knowledge the adversary has about the classifier.
     The attacker would likely only be interested in evasion at evaluation time, so causing maximum damage is generally not the objective.
     Given positive (+) data, we can generate evasion samples.

     Modifying the Objective Function
     In the spirit of LASSO and the elastic net, add in a new (convex) term that is meant to "fine tune" the classifier and feature selector for possible evasions based on a known malicious sample:

        \theta = \arg\min_{\theta \in \Phi} \; \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - \theta^T x_i \right)^2 + \lambda \sum_{j=1}^{p} |\theta_j| + \frac{\alpha}{2m} \sum_{i=1}^{m} \left( \tilde{y}_i - \theta^T \tilde{x}_i \right)^2

     where the \tilde{x}_i are evasion samples and \tilde{y}_i = +1. (A sketch of this objective follows.)
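A CVXPY sketch of the modified objective; the deck's experiments used CVX in MATLAB, so this is an analogous rendering, with λ and α as placeholder values and X_ev holding the generated evasion samples (label +1). The extra term is a convex quadratic, so the whole problem stays convex.

```python
# LASSO plus an evasion-tuning quadratic: the alpha term penalizes models
# that score known evasion samples (labeled +1) as legitimate.
import cvxpy as cp
import numpy as np

def lasso_rl(X, y, X_ev, lam=0.1, alpha=0.5):
    n, p = X.shape
    m = X_ev.shape[0]
    y_ev = np.ones(m)                 # evasion samples keep their label +1
    theta = cp.Variable(p)
    obj = (cp.sum_squares(y - X @ theta) / (2 * n)
           + lam * cp.norm1(theta)
           + alpha * cp.sum_squares(y_ev - X_ev @ theta) / (2 * m))
    cp.Problem(cp.Minimize(obj)).solve()
    return theta.value
```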
  39. Experimental Setup

     Generating the Data
     A noisy linear system is generated such that

        y_i = \theta_*^T x_i + \eta_i

     where θ_* is the "ground truth" sparse model, x ∈ R^p, and η ∼ N(0, Σ). All data sets generated were from underdetermined linear systems for the purposes of this work. (A data-generation sketch follows.)

     Other Implementation Notes
     The most recent results are benchmarked against GAME, a game-theoretic perspective on adversarial FS. LASSO, LASSO-RL, and GAME are implemented using CVX.

     Generating the Attacks
     Zhang et al. presented an approach for developing attacks on linear models. The adversary has partial knowledge of the system (i.e., a noisy view of θ_*).
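A minimal data-generation sketch matching this description, assuming isotropic noise (Σ = σ²I) and sign labels for the classification experiments; the default sizes follow the p = 300, n = 75, k = 15 configuration reported later.

```python
# Underdetermined noisy linear system with a k-sparse ground-truth model.
import numpy as np

def make_data(n=75, p=300, k=15, noise=0.1, seed=0):
    rng = np.random.default_rng(seed)
    theta_star = np.zeros(p)
    support = rng.choice(p, size=k, replace=False)
    theta_star[support] = rng.normal(size=k)          # k-sparse theta_*
    X = rng.normal(size=(n, p))                       # n < p: underdetermined
    y = X @ theta_star + noise * rng.normal(size=n)   # y_i = theta_*^T x_i + eta_i
    return X, np.sign(y), theta_star                  # sign labels for classification
```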
  40–44. Results

        \mathrm{err} = \frac{1}{m} \sum_{i=1}^{m} \mathbb{1}\!\left[ \mathrm{sign}(y_i) \ne \mathrm{sign}(\theta^T x_i) \right], \qquad \mathrm{err}_{\|\cdot\|_0} = \|\theta\|_0, \qquad \mathrm{err}_{\|\cdot\|_2^2} = \frac{\|\theta_* - \theta\|_2^2}{\|\theta_*\|_2^2}

     Table: Classification error, model error, sparsity, and evaluation time for LASSO, LASSO-RL, and a game-theoretic approach.

                          Original D_S                      Evasion D_T
                    LASSO     LASSO-RL   GAME        LASSO     LASSO-RL   GAME
     err            0.181     0.182      0.248       0.2904    0.22472    0.15962
     err_‖·‖₂²      0.551     0.559      0.716       –         –          –
     err_‖·‖₀       36.48     32.98      48.5        –         –          –
     AUC            0.88      0.87       0.81        –         –          –
     Time (s)       0.38      11.54      23.86       –         –          –

     Take Away
     Examining a worst-case adversary has negative effects on the classification error.
     LASSO-RL provides a middle ground between LASSO and the game-theoretic approach.
     Wrapper-based feature selection approaches are generally computationally burdensome (and this is not a large data set).
     (The metrics above are written out in code below.)
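The three reported metrics as they read from the definitions above, written as a small helper (a sketch; the variable names are mine).

```python
# Classification error, sparsity ||theta||_0, and relative model error.
import numpy as np

def metrics(theta, theta_star, X, y):
    err = np.mean(np.sign(X @ theta) != np.sign(y))   # 0-1 error
    sparsity = np.count_nonzero(theta)                # ||theta||_0
    model_err = (np.linalg.norm(theta_star - theta) ** 2
                 / np.linalg.norm(theta_star) ** 2)   # ||theta_* - theta||^2 / ||theta_*||^2
    return err, sparsity, model_err
```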
  45. Results

     p = 300, n = 25, k = 15
                          Original D_S                      Malicious + Evasion D_T
                    LASSO     LASSO-RL   GAME        LASSO     LASSO-RL   GAME
     err_{0-1}      0.3936    0.408      0.4064      0.32229   0.2784     0.26971
     ‖θ‖₀           23.08     21.64      24.96       –         –          –
     AUC            0.64288   0.63087    0.61449     –         –          –
     err_‖θ‖₂²      1.0054    1.0068     1.0315      –         –          –
     Time (s)       0.54484   8.6093     23.2468     –         –          –

     p = 300, n = 75, k = 15
                    LASSO     LASSO-RL   GAME        LASSO     LASSO-RL   GAME
     err_{0-1}      0.013867  0.033067   0.13227     0.31593   0.17638    0.089747
     ‖θ‖₀           18.8      21.12      68.08       –         –          –
     AUC            0.99874   0.99642    0.94118     –         –          –
     err_‖θ‖₂²      0.10314   0.16197    0.4414      –         –          –
     Time (s)       0.56901   40.0743    40.5725     –         –          –

     p = 300, n = 100, k = 15
                    LASSO     LASSO-RL   GAME        LASSO     LASSO-RL   GAME
     err_{0-1}      0.0112    0.0252     0.0748      0.33636   0.21267    0.014797
     ‖θ‖₀           15.68     18.16      75.48       –         –          –
     AUC            0.99951   0.99753    0.98266     –         –          –
     err_‖θ‖₂²      0.074277  0.11905    0.22257     –         –          –
     Time (s)       0.61795   57.8623    47.562      –         –          –
  46–47. Conclusions and Future Work

     Conclusions
     Adversarial learning is quickly becoming a security concern for classifiers, and off-the-shelf classifiers do not account for an adversary in the environment.
     We presented a straightforward approach to integrating known adversarial data into a linear model, based on a gradient attack.
     Our experimental results show a trade-off between the model's accuracy on the source distribution and the adversarial distribution.

     What else can we work on?
     How can we formulate Λ in a feature selection objective that does not require a wrapper? We have presented one approach, but this model assumes we know our classifier's "faults".
     How can we improve the trade-off between accuracy, complexity, and recall?
     What strategies can we develop to make the classifier more resilient at testing time?
  48. Questions?