– Preserving privacy of data sets while still being able to extract valid data mining results
– Allow data mining from a modified version of the data that contains no sensitive information
• Centralized model vs. distributed model
  – Randomization, encryption
  – Secure Multiparty Computation problem
NTUEE b97 Chen Chun An
• Vertical partitioning of data
  – Collecting different feature sets
  – e.g., does using a Li-Ion battery lead to brain tumors?
• Horizontal partitioning of data
  – Union of individual databases
• Secure Sum Protocol
  – Each site's input value stays masked, while the global sum of all inputs becomes universally known
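The two partitioning styles can be made concrete with a toy sketch. The records (patient id, battery type, tumor flag) are hypothetical and echo the slide's Li-Ion question, and the function names are illustrative only. Under horizontal partitioning, the global count is simply the sum of the local counts, which is exactly the quantity Secure Sum is meant to compute without revealing each local count.

```python
from typing import List, Tuple

# Hypothetical toy records: (patient_id, battery_type, has_tumor).
Record = Tuple[int, str, bool]
RECORDS: List[Record] = [
    (1, "Li-Ion", True),
    (2, "NiMH", False),
    (3, "Li-Ion", False),
    (4, "Li-Ion", True),
]

def horizontal_split(rows: List[Record], cut: int) -> Tuple[List[Record], List[Record]]:
    """Each site holds complete records for a different subset of patients."""
    return rows[:cut], rows[cut:]

def vertical_split(rows: List[Record]):
    """Each site holds a different feature set for the same patients, keyed by id."""
    site_a = [(pid, battery) for pid, battery, _ in rows]
    site_b = [(pid, tumor) for pid, _, tumor in rows]
    return site_a, site_b

def local_count(rows: List[Record]) -> int:
    """Count records that use a Li-Ion battery and have a tumor."""
    return sum(1 for _, battery, tumor in rows if battery == "Li-Ion" and tumor)

# Horizontal case: the global count is the sum of the local counts.
site_1, site_2 = horizontal_split(RECORDS, 2)
print(local_count(site_1) + local_count(site_2))  # 2

# Vertical case: neither site alone can evaluate the joint condition,
# so the two halves have to be combined on patient_id.
feat_a, feat_b = vertical_split(RECORDS)
tumor_by_id = dict(feat_b)
joined = [(pid, battery, tumor_by_id[pid]) for pid, battery in feat_a]
print(local_count(joined))  # 2
```

Joining the vertical halves in the clear is exactly what privacy-preserving techniques try to avoid. The join here only shows that the needed information is split across sites.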
– … lie in [0…n]
  – Initiator M1 generates a random R and sends $(R + V_1) \bmod n$ to the next site
  – Site i adds its local value $V_i$
  – The initiator receives $\left(R + \sum_{i=1}^{M} V_i\right) \bmod n$ and subtracts R to obtain the total sum
  – Other sites learn nothing
• Collusion
  – Sites i − 1 and i + 1 can compare the values they send to and receive from site i to determine the exact value of $V_i$
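The ring protocol above can be simulated in a few lines. This sketch rests on several assumptions:

- Sites are indexed 0..M−1, with site 0 as the initiator (M1 on the slide).
- All arithmetic is done mod n, and the global sum is below n.
- `secure_sum` and `collude` are hypothetical names.

The simulation also reproduces the collusion attack. The two neighbours of site i recover $V_i$ by subtracting the message site i received from the message it sent on.

```python
import random
from typing import List, Tuple

def secure_sum(values: List[int], n: int, rng: random.Random) -> Tuple[int, List[int]]:
    """Ring-based Secure Sum. Site 0 is the initiator; assumes 0 <= sum(values) < n.

    Returns the recovered total and the messages, where messages[i] is what
    site i forwards to site (i + 1) mod M.
    """
    r = rng.randrange(n)               # initiator's random mask R, uniform in [0, n)
    running = (r + values[0]) % n      # initiator sends (R + V_0) mod n, uniform to the receiver
    messages = [running]
    for v in values[1:]:               # every other site adds its local value and forwards
        running = (running + v) % n
        messages.append(running)
    total = (messages[-1] - r) % n     # last message returns to the initiator, which removes R
    return total, messages

def collude(messages: List[int], i: int, n: int) -> int:
    """Sites i-1 and i+1 pool the message into site i and the message out of it (i >= 1)."""
    return (messages[i] - messages[i - 1]) % n

values = [12, 7, 30, 5]
n = 1000
total, msgs = secure_sum(values, n, random.Random(0))
print(total)                # 54
print(collude(msgs, 2, n))  # 30, site 2's private value
```

The mask R hides every individual message from its single receiver. It does nothing against two neighbours who pool their views, which is the weakness the collusion-resistant schemes target.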
– … determined even if other nodes share information or collude
• Measurement index
  – Ability to preserve privacy
  – Ability to resist collusion
  – Distributed processing ability, without needing any third-party sites
• Example: $35 X'mas party
  – The number of people/nodes counts
• The Collusion-Resistant Data Mining (CRDM) algorithm presented in this paper is based on Secure Sum
• Measurement index
  – Security: degree of collusion resistance is $M - 2$
  – Efficiency: communication cost is $M(M-1)/2$ messages; time requirement is $(M-1)T$
• Extending CRDM
  – Restrain communication cost while maintaining the privacy level
Cycle1 = 1-2-3-5-4, Cycle2 = 1-3-4-2-5
  – Relative to N2: N3 and N5 are on the receiving end, N1 and N4 on the sending end
  – K-collusion resistant
• Hamiltonian cycle
  – For C edge-disjoint Hamiltonian cycles and M nodes (M > 4), CPSS is K-collusion resistant with $K = 2C - 1$
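The five-node example can be checked with a small simulation. The sketch does three things:

1. It confirms that Cycle1 and Cycle2 are edge-disjoint Hamiltonian cycles.
2. It recovers N2's senders and receivers.
3. It runs a cycle-partitioned sum.

Assumption: each node splits its value into C random additive shares (mod n) and sends one share around each cycle with an ordinary Secure Sum. This follows the usual description of CPSS, but the share-splitting details and all helper names are illustrative and not taken from the slide.

```python
import random
from typing import Dict, List, Set, Tuple

Cycle = List[int]

def cycle_edges(cycle: Cycle) -> Set[frozenset]:
    """Undirected edges of a cycle, including the closing edge back to the start."""
    return {frozenset((cycle[k], cycle[(k + 1) % len(cycle)])) for k in range(len(cycle))}

def edge_disjoint_hamiltonian(cycles: List[Cycle], nodes: List[int]) -> bool:
    """True if every cycle visits each node exactly once and no edge is reused."""
    used: Set[frozenset] = set()
    for cyc in cycles:
        if sorted(cyc) != sorted(nodes):
            return False
        edges = cycle_edges(cyc)
        if edges & used:
            return False
        used |= edges
    return True

def senders_and_receivers(cycles: List[Cycle], node: int) -> Tuple[Set[int], Set[int]]:
    """Predecessors (send to `node`) and successors (receive from `node`) over all cycles."""
    senders, receivers = set(), set()
    for cyc in cycles:
        k = cyc.index(node)
        senders.add(cyc[k - 1])                 # k - 1 == -1 wraps to the last node
        receivers.add(cyc[(k + 1) % len(cyc)])
    return senders, receivers

def split(value: int, parts: int, n: int, rng: random.Random) -> List[int]:
    """Additive shares of `value` mod n (assumed share scheme)."""
    shares = [rng.randrange(n) for _ in range(parts - 1)]
    shares.append((value - sum(shares)) % n)
    return shares

def ring_sum(cycle: Cycle, share_of: Dict[int, int], n: int, rng: random.Random) -> int:
    """Plain Secure Sum around one cycle; the first node in `cycle` acts as initiator."""
    r = rng.randrange(n)
    running = r
    for node in cycle:
        running = (running + share_of[node]) % n
    return (running - r) % n

def cpss(values: Dict[int, int], cycles: List[Cycle], n: int, rng: random.Random) -> int:
    """Run one Secure Sum per cycle on the c-th shares, then add the cycle totals."""
    shares = {node: split(v, len(cycles), n, rng) for node, v in values.items()}
    total = 0
    for c, cyc in enumerate(cycles):
        total += ring_sum(cyc, {node: s[c] for node, s in shares.items()}, n, rng)
    return total % n

cycles = [[1, 2, 3, 5, 4], [1, 3, 4, 2, 5]]
print(edge_disjoint_hamiltonian(cycles, [1, 2, 3, 4, 5]))  # True
print(senders_and_receivers(cycles, 2))                     # ({1, 4}, {3, 5})
print(2 * len(cycles) - 1)                                  # K = 3
values = {1: 10, 2: 20, 3: 30, 4: 40, 5: 50}
print(cpss(values, cycles, 1000, random.Random(1)))         # 150
```

Each cycle gives N2 a different predecessor and successor. Recovering both of N2's shares therefore needs N1 and N4 (senders) together with N3 and N5 (receivers). Any three colluders miss at least one share, matching $K = 2C - 1 = 3$.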
• Measurement index
  – Security: SPoS is full-private ($(M-1)$-private)
  – Efficiency: total cost is $M(M-1)T/2$; time cost is $(M-1)T$ when M is even, $MT$ when M is odd
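One way to read the SPoS time cost is as a round-robin schedule of pairwise exchanges. Assume that:

- every pair of nodes exchanges once;
- each exchange takes time T;
- a node can take part in only one exchange per round.

Under these assumptions, the circle method needs M − 1 rounds for even M and M rounds for odd M, because one node idles in each round. This is a sketch of that reading, not necessarily the SPoS authors' own schedule, and `round_robin` is a hypothetical helper.

```python
from typing import List, Optional, Tuple

def round_robin(m: int) -> List[List[Tuple[int, int]]]:
    """Circle-method schedule: each round is a list of disjoint node pairs."""
    players: List[Optional[int]] = list(range(m))
    if m % 2:
        players.append(None)  # dummy slot: whoever is paired with None idles this round
    size = len(players)
    rounds = []
    for _ in range(size - 1):
        pairs = []
        for k in range(size // 2):
            a, b = players[k], players[size - 1 - k]
            if a is not None and b is not None:
                pairs.append((a, b))
        rounds.append(pairs)
        # Keep players[0] fixed and rotate everyone else by one position.
        players = [players[0], players[-1]] + players[1:-1]
    return rounds

for m in (5, 6):
    schedule = round_robin(m)
    exchanges = sum(len(r) for r in schedule)
    print(m, len(schedule), exchanges)  # rounds (time in units of T), total pairwise exchanges
```

Because every pair meets exactly once, the number of exchanges is $M(M-1)/2$. This is consistent with the total cost $M(M-1)T/2$ when each exchange costs T.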
• Security
  – CRDM: degree of collusion resistance is $M - 2$
  – CPSS: for C edge-disjoint Hamiltonian cycles, $K = 2C - 1$
  – SPoS: full-private ($(M-1)$-private)
• Efficiency
  – CRDM: communication cost is $M(M-1)/2$ messages; time requirement is $(M-1)T$
  – SPoS: total cost is $M(M-1)T/2$; time cost is $(M-1)T$ when M is even, $MT$ when M is odd
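The comparison can be turned into a small calculator. A few things to note about the sketch:

- The formulas are transcribed from the slides, reading the SPoS odd-M time cost as MT.
- The function names are hypothetical.
- A complete graph on M nodes has at most ⌊(M − 1)/2⌋ edge-disjoint Hamiltonian cycles. This is a standard graph-theory fact, added here only to bound C.

```python
def crdm_costs(m: int) -> dict:
    """CRDM, per the slides: resists M-2 colluders, M(M-1)/2 messages, (M-1)T time."""
    return {"collusion_resistance": m - 2,
            "messages": m * (m - 1) // 2,
            "time_in_T": m - 1}

def cpss_costs(m: int, c: int) -> dict:
    """CPSS with C edge-disjoint Hamiltonian cycles: K = 2C - 1 (slides assume M > 4)."""
    if m <= 4:
        raise ValueError("the slides state the CPSS bound for M > 4")
    if not 1 <= c <= (m - 1) // 2:
        raise ValueError("K_M has at most (M-1)//2 edge-disjoint Hamiltonian cycles")
    return {"collusion_resistance": 2 * c - 1}

def spos_costs(m: int) -> dict:
    """SPoS: (M-1)-private, total cost M(M-1)T/2, time (M-1)T for even M, MT for odd M."""
    return {"collusion_resistance": m - 1,
            "total_cost_in_T": m * (m - 1) // 2,
            "time_in_T": m - 1 if m % 2 == 0 else m}

for m in (5, 6):
    print(m, crdm_costs(m), cpss_costs(m, (m - 1) // 2), spos_costs(m))
```

Take M = 5 with the maximum C = 2. CPSS then matches CRDM's resistance of 3, while SPoS tolerates 4 colluders at the price of one extra time unit (5T instead of 4T).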
• [1] … Lin, M.Y. Zhu. Tools for Privacy Preserving Distributed Data Mining.
• [2] J. Vaidya, C. Clifton. Privacy-Preserving Data Mining: Why, How, and When.
• [3] S. Urabe, J. Wang, E. Kodama, T. Takata. A High Collusion-Resistant Approach to Distributed Privacy-Preserving Data Mining.
• [4] S. Shepard, R. Kresman, L. Dunning. Data Mining and Collusion Resistance.
• [5] B. Yang, H. Nakagawa, I. Sato, J. Sakuma. Collusion-Resistant Privacy-Preserving Data Mining.
• See more in Privacy-Preserving Data Mining and Collusion Resistance.