Privacy-Preserving Data Mining and Collusion Resistance

Andro Chen
January 03, 2012

Final project for the Data Mining course (資料勘測), 陳銘憲 (Ming-Syan Chen) @ NTUEE

Transcript

  1. 2.

    PPDM
    •  Privacy concern grows
    •  Main idea
        –  Preserve the privacy of data sets while still being able to extract valid data mining results
        –  Allow data mining from a modified version of the data that contains no sensitive information
    •  Centralized model vs. distributed model
        –  Randomization, encryption
        –  Secure Multiparty Computation problem
    NTUEE b97 Chen Chun An
  2. 3.

    PPDM – General Tools [1]
    •  Privacy Preserving Distributed Data Mining
        –  SMC problem
    •  Efficient methods
        –  Secure Sum
        –  Secure Set Union
        –  Secure Size of Set Intersection
        –  Scalar Product
  3. 4.

    Secure Multiparty Computation
    •  Collaboration may bring knowledge
    •  Vertical partitioning of data
        –  Collecting different feature sets
        –  Does using Li-ion batteries lead to brain tumors?
    •  Horizontal partitioning of data
        –  Union of individual databases
    •  Secure Sum Protocol
        –  Let the value of each site's input be masked while the global sum of all inputs is universally known
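The two partitioning schemes can be illustrated with a small sketch. The table, its column meanings, and the function names below are hypothetical and invented for illustration; only the idea (split by columns vs. split by rows) comes from the slide.

```python
# Hypothetical toy table: person id -> (uses_li_ion_battery, has_brain_tumor).
# All values are made up for illustration.
TABLE = {
    1: (True, False),
    2: (False, False),
    3: (True, True),
    4: (False, True),
}

def vertical_partition(table):
    """Each site holds a different feature (column) for the same people."""
    site_a = {pid: row[0] for pid, row in table.items()}  # battery usage only
    site_b = {pid: row[1] for pid, row in table.items()}  # diagnosis only
    return site_a, site_b

def horizontal_partition(table, ids_at_first_site):
    """Each site holds complete rows, but for different people."""
    first = {pid: row for pid, row in table.items() if pid in ids_at_first_site}
    second = {pid: row for pid, row in table.items() if pid not in ids_at_first_site}
    return first, second

site_a, site_b = vertical_partition(TABLE)
first, second = horizontal_partition(TABLE, {1, 2})
```

In the vertical case the full table is the join of the sites' columns on the person id; in the horizontal case it is the union of the sites' rows.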
  4. 5.

    Secure Sum [2]
    •  Mechanism
        –  Values are known to lie in $[0, n]$
        –  Initiator M1 generates a random $R$ and sends $R + V_1$ to the next site
        –  Site $i$ adds its local value $V_i$ and passes the result on
        –  The initiator finally receives $R + \sum_i V_i$ (summed over all sites) and subtracts $R$ to obtain the total sum
        –  Other sites learn nothing
    •  Collusion
        –  Sites $i - 1$ and $i + 1$ can compare the values they send and receive to determine the exact value of $V_i$
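A minimal sketch of the ring protocol and of the neighbour collusion described on this slide. The function names are hypothetical, and doing the arithmetic modulo `n + 1` is an assumption made here so that masked values stay in the stated range; the slide itself does not show the modulus.

```python
import random

def secure_sum(values, n, rng=random):
    """Ring-based secure sum.

    values[0] belongs to the initiator. Returns (total, messages), where
    messages[i] is the masked running sum that site i sends to the next site.
    """
    modulus = n + 1                     # assumption: work modulo n + 1
    r = rng.randrange(modulus)          # initiator's random mask R
    running = (r + values[0]) % modulus
    messages = [running]
    for v in values[1:]:
        running = (running + v) % modulus   # each site adds its local value
        messages.append(running)
    total = (running - r) % modulus     # initiator removes the mask
    return total, messages

def colluding_neighbours(messages, i, n):
    """Sites i-1 and i+1 pool what they sent and received to recover V_i."""
    if i < 1:
        raise ValueError("needs a predecessor that is not the initiator's mask")
    sent_by_previous = messages[i - 1]  # known to site i-1
    received_by_next = messages[i]      # known to site i+1
    return (received_by_next - sent_by_previous) % (n + 1)

total, messages = secure_sum([3, 7, 2, 5], n=100)
leaked = colluding_neighbours(messages, 2, n=100)
```

Here `total` is the true sum of the inputs, while `leaked` equals the private value of the site at index 2, which is exactly the weakness the collusion bullet points out.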
  5. 6.

    Collusion-Resistant PPDM
    •  Requirement
        –  A secret value cannot be determined even if other nodes share information or collude
    •  Measurement Index
        –  Ability to preserve privacy
        –  Ability to resist collusion
        –  Distributed processing ability without needing any third-party sites
    •  Example: $35 X'mas party
        –  Number of people/nodes counts
  6. 8.

    Secure Sum & CRDM [3]
    •  Anonymous communication – SecureSum
        –  The Collusion-Resistant Data Mining (CRDM) algorithm presented in [3] is based on SecureSum
    •  Measurement Index
        –  Security: degree of collusion resistance is $M - 2$
        –  Efficiency: communication cost (number of messages) is $M(M-1)/2$; time requirement is $(M-1)T$
    •  Extending CRDM
        –  Restrain communication cost while maintaining the privacy level
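To make the message count concrete, here is a sketch of a generic pairwise-masking secure sum. It is not the CRDM algorithm of [3], whose details are not given on the slide; it is swapped in only because it also uses one exchange per pair of nodes, i.e. M(M−1)/2 messages, and a node's value stays hidden as long as at least one other node does not collude. The function name and the choice of modulus are assumptions.

```python
import random

def pairwise_masked_sum(values, modulus, rng=random):
    """Every pair of nodes agrees on one random mask; one adds it, the other subtracts it.

    Returns (total, masked, messages): the recovered sum, the masked value each
    node would publish, and the number of pairwise mask exchanges.
    """
    m = len(values)
    masked = [v % modulus for v in values]
    messages = 0
    for i in range(m):
        for j in range(i + 1, m):
            r = rng.randrange(modulus)          # secret known only to nodes i and j
            masked[i] = (masked[i] + r) % modulus
            masked[j] = (masked[j] - r) % modulus
            messages += 1                       # one exchange per pair
    total = sum(masked) % modulus               # all masks cancel in the sum
    return total, masked, messages

total, masked, messages = pairwise_masked_sum([4, 9, 1, 6, 10], modulus=1000)
```

With five nodes this performs ten exchanges. The count covers only the pairwise mask exchanges, not the final publication of the masked values.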
  7. 9.

    Cycle-Partitioned Secure Sum [4]
    •  Count V2 by collusion
        –  Cycle1 = 1-2-3-5-4, Cycle2 = 1-3-4-2-5
        –  N3, N5 on the receiving end; N1, N4 on the sending end
        –  K-collusion resistant
    •  Hamiltonian Cycle
        –  For $C$ edge-disjoint Hamiltonian cycles and $M$ nodes ($M > 4$), CPSS is $K$-collusion resistant with $K = 2C - 1$
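The example on this slide can be traced with a short sketch. It assumes that each node splits its value into one random share per cycle and that an ordinary masked running sum is run along each cycle; that reading of CPSS, the function names, and the modulus are assumptions, while the two cycles are the ones given on the slide.

```python
import random

CYCLES = [[1, 2, 3, 5, 4], [1, 3, 4, 2, 5]]   # the two cycles from the slide

def edges(cycle):
    """Undirected edges of a cycle, including the closing edge."""
    return {frozenset((cycle[k], cycle[(k + 1) % len(cycle)])) for k in range(len(cycle))}

def cpss(values, cycles, modulus, rng=random):
    """values: dict node -> private value. Returns (total, sent).

    sent[c][node] is the running sum that `node` passes on along cycle c.
    """
    shares = {}
    for node, v in values.items():
        parts = [rng.randrange(modulus) for _ in cycles[:-1]]
        parts.append((v - sum(parts)) % modulus)   # shares add up to v (mod modulus)
        shares[node] = parts
    total, sent = 0, []
    for c, cycle in enumerate(cycles):
        r = rng.randrange(modulus)                 # mask chosen by the cycle's first node
        running, log = r, {}
        for node in cycle:
            running = (running + shares[node][c]) % modulus
            log[node] = running
        total = (total + running - r) % modulus
        sent.append(log)
    return total, sent

def collude_on(node, cycles, sent, modulus):
    """All predecessors and successors of `node` pool their messages."""
    value = 0
    for cycle, log in zip(cycles, sent):
        pos = cycle.index(node)
        assert pos > 0, "sketch does not cover the first node of a cycle"
        value = (value + log[node] - log[cycle[pos - 1]]) % modulus
    return value

values = {1: 10, 2: 20, 3: 30, 4: 40, 5: 50}
total, sent = cpss(values, CYCLES, modulus=1000)
leaked_v2 = collude_on(2, CYCLES, sent, modulus=1000)
```

Recovering node 2's value takes all four of its neighbours (N1 and N4 as senders, N3 and N5 as receivers); with only three of them, one share stays masked.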
  8. 10.

    Protocol-based solution [5]
    •  Basic Secure Product of Summation (SPoS) Protocol
        –  [equation not recoverable from transcript]
    •  Conditions
        –  The inputs and the summations of the χs and γs are revealed to no one
  9. 12.

    Protocol-based solution [5]
    •  SPoS Stage 2 – Computing the Result
        –  [equation not recoverable from transcript]
    •  Measurement Index
        –  Security: SPoS is full-private ($(M-1)$-private)
        –  Efficiency: total cost is $M(M-1)T/2$; time cost is $(M-1)T$ when $M$ is even, $MT$ when $M$ is odd
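The even/odd split in the time cost matches what a round-robin schedule of pairwise exchanges gives. The sketch below is not taken from [5]; it assumes that every pair of nodes exchanges once, that one exchange takes time T, and that a node takes part in only one exchange per round. The function name is hypothetical.

```python
def round_robin(m):
    """Schedule all pairs of m nodes into rounds of disjoint pairs (circle method)."""
    nodes = list(range(m))
    if m % 2:
        nodes.append(None)              # dummy partner: whoever gets it idles this round
    k = len(nodes)
    rounds = []
    for _ in range(k - 1):
        pairs = [(nodes[i], nodes[k - 1 - i]) for i in range(k // 2)]
        rounds.append([p for p in pairs if None not in p])
        # keep the first node fixed and rotate the others by one position
        nodes = [nodes[0], nodes[-1]] + nodes[1:-1]
    return rounds

rounds_even = round_robin(6)   # 6 nodes
rounds_odd = round_robin(5)    # 5 nodes
```

For M nodes the schedule contains M(M−1)/2 exchanges in total, packed into M−1 rounds when M is even and M rounds when M is odd, because with an odd M one node has to sit out every round.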
  10. 13.

    Comparison
    •  Security
        –  CRDM: degree of collusion resistance is $M - 2$
        –  CPSS: for $C$ edge-disjoint Hamiltonian cycles, $K = 2C - 1$
        –  SPoS: full-private ($(M-1)$-private)
    •  Efficiency
        –  CRDM: communication cost (number of messages) is $M(M-1)/2$; time requirement is $(M-1)T$
        –  SPoS: total cost is $M(M-1)T/2$; time cost is $(M-1)T$ when $M$ is even, $MT$ when $M$ is odd
  11. 15.

    References
    •  [1] C. Clifton, M. Kantarcioglu, J. Vaidya, X. Lin, M. Y. Zhu. Tools for Privacy Preserving Distributed Data Mining.
    •  [2] J. Vaidya, C. Clifton. Privacy-Preserving Data Mining: Why, How, and When.
    •  [3] S. Urabe, J. Wang, E. Kodama, T. Takata. A High Collusion-Resistant Approach to Distributed Privacy-Preserving Data Mining.
    •  [4] S. Shepard, R. Kresman, L. Dunning. Data Mining and Collusion Resistance.
    •  [5] B. Yang, H. Nakagawa, I. Sato, J. Sakuma. Collusion-Resistant Privacy-Preserving Data Mining.
    •  See more in Privacy-Preserving Data Mining and Collusion Resistance.
  12. 16.

                       

    Thank you