When Data Collaboration Meets Privacy: Privacy-enhancing Technologies on AWS

Presented at AWS Community Day Bay Area Security Edition 2024

Richard Fan

March 21, 2024

Transcript

  1. Privacy-enhancing Technologies (PETs)
    • Homomorphic encryption (HE)
    • Trusted Execution Environments (TEE)
    • Secure multi-party computation (SMPC)
    • Differential Privacy (DP)
    • Federated learning (FL)
  2. Trusted Execution Environment (TEE)
    • A secured area of a processor, isolated from the rest of the system
    • Cannot be accessed or modified by system admins or the OS
    • E.g. AWS Nitro Enclaves, Intel SGX, AMD SEV-SNP, Apple Secure Enclave, Arm TrustZone, NVIDIA Confidential Computing
  3. AWS Nitro Enclaves
    • Isolated virtual machine
    • Communication over a secure channel
    • No admin access
    • No persistent storage
    • No external networking
    https://aws.amazon.com/blogs/aws/aws-nitro-enclaves-isolated-ec2-environments-to-process-confidential-data/
  4. Federated Learning
    • Machine learning where the training data stay local
    • A central node aggregates the locally trained models
    https://royalsociety.org/-/media/policy/projects/privacy-enhancing-technologies/From-Privacy-to-Partnership.pdf
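To make "data stays local, a central node aggregates the models" concrete, here is a minimal federated averaging (FedAvg-style) sketch in plain Python. It is not how SageMaker or any of the libraries on slide 7 implement federated learning; the toy linear model, the function names (`local_train`, `federated_average`, `run_rounds`) and the data are all hypothetical.

```python
# FedAvg-style sketch (standard library only). Each client fits y = w*x + b on data
# that never leaves it; the central node only receives (model, sample_count) pairs.
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs held by one client
Model = Tuple[float, float]          # (w, b)


def local_train(model: Model, data: Dataset, lr: float = 0.01, epochs: int = 50) -> Model:
    """Full-batch gradient descent on mean squared error, starting from the global model."""
    w, b = model
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def federated_average(updates: List[Tuple[Model, int]]) -> Model:
    """Central node: average the client models, weighted by each client's sample count."""
    total = sum(n for _, n in updates)
    w = sum(model[0] * n for model, n in updates) / total
    b = sum(model[1] * n for model, n in updates) / total
    return w, b


def run_rounds(clients: List[Dataset], rounds: int = 100) -> Model:
    """Repeat: send the global model out, train on every client locally, aggregate."""
    global_model: Model = (0.0, 0.0)
    for _ in range(rounds):
        updates = [(local_train(global_model, data), len(data)) for data in clients]
        global_model = federated_average(updates)
    return global_model


if __name__ == "__main__":
    # Hypothetical toy data: three clients whose points all lie on y = 2x + 1.
    clients = [
        [(0.0, 1.0), (1.0, 3.0)],
        [(2.0, 5.0), (3.0, 7.0)],
        [(1.5, 4.0), (2.5, 6.0), (0.5, 2.0)],
    ]
    w, b = run_rounds(clients)
    print(f"w={w:.2f}, b={b:.2f}")  # prints w=2.00, b=1.00
```

Only `(model, sample_count)` crosses the client boundary. Weighting by sample count, rather than taking a plain mean, keeps a client with little data from pulling the global model as hard as a client with a lot of data.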
  5. Federated Learning – Use cases
    • Medical ML models
    • COINSTAC – FL tools for neuroimages: estimate brain age from MRIs; accuracy is similar to traditional models
    https://www.biorxiv.org/content/10.1101/2021.05.10.443469v2.full.pdf
  6. Amazon SageMaker – Distributed Training
    • Run training jobs in parallel
    • Scale across instances
    • Make training faster
    https://aws.amazon.com/campaigns/sagemaker/
  7. Amazon SageMaker – Federated Learning
    • Connect accounts via VPC Peering
    • Federated Learning libraries: FedML, OpenFL, Flower, TensorFlow Federated
    https://aws.amazon.com/blogs/machine-learning/machine-learning-with-decentralized-training-data-using-federated-learning-on-amazon-sagemaker/
    https://aws.amazon.com/blogs/machine-learning/part-1-federated-learning-on-aws-with-fedml-health-analytics-without-sharing-sensitive-data/
    https://aws.amazon.com/blogs/machine-learning/enable-data-sharing-through-federated-learning-a-policy-approach-for-chief-digital-officers/
  8. Revealing the Governor's medical records, 1997
    • Medical record: Ethnicity, Visit date, Diagnosis, Procedure, Medication, Total charge
    • Voter registry: Name, Address, Date registered, Affiliation, Date last vote
    • In both datasets: Birth date, Gender, ZIP
    https://epic.org/wp-content/uploads/privacy/reidentification/Sweeney_Article.pdf
  9. Revealing the Governor's medical records, 1997
    • Linking the medical record and the voter registry on Birth date, Gender, ZIP
    • Same Birth date as the Governor: 6 people
    https://epic.org/wp-content/uploads/privacy/reidentification/Sweeney_Article.pdf
  10. Revealing the Governor's medical records, 1997
    • Same Birth date and Gender as the Governor: 3 people
    https://epic.org/wp-content/uploads/privacy/reidentification/Sweeney_Article.pdf
  11. Revealing the Governor's medical records, 1997
    • Same Birth date, Gender, and ZIP as the Governor: 1 person
    https://epic.org/wp-content/uploads/privacy/reidentification/Sweeney_Article.pdf
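The linkage on slides 8 to 11 is just a join on the quasi-identifiers both datasets share. The sketch below replays the 6 → 3 → 1 narrowing on a handful of made-up records; every name, date and ZIP code here is hypothetical, chosen only so the counts match the slides.

```python
# Linkage attack sketch: re-identify a "de-identified" medical record by joining it with
# a public voter registry on quasi-identifiers. All records are hypothetical.
from typing import Dict, List

Record = Dict[str, str]

# Public voter registry: names sit next to birth date, gender and ZIP.
voter_registry: List[Record] = [
    {"name": "Voter A", "birth_date": "1950-01-01", "gender": "M", "zip": "00001"},
    {"name": "Voter B", "birth_date": "1950-01-01", "gender": "M", "zip": "00002"},
    {"name": "Voter C", "birth_date": "1950-01-01", "gender": "M", "zip": "00003"},
    {"name": "Voter D", "birth_date": "1950-01-01", "gender": "F", "zip": "00001"},
    {"name": "Voter E", "birth_date": "1950-01-01", "gender": "F", "zip": "00002"},
    {"name": "Voter F", "birth_date": "1950-01-01", "gender": "F", "zip": "00004"},
    {"name": "Voter G", "birth_date": "1961-06-15", "gender": "M", "zip": "00001"},
]

# "De-identified" medical record: the name is removed, the quasi-identifiers are not.
medical_record: Record = {
    "birth_date": "1950-01-01", "gender": "M", "zip": "00001", "diagnosis": "(sensitive)",
}


def matching_voters(record: Record, keys: List[str]) -> List[str]:
    """Names of voters who agree with `record` on every column in `keys`."""
    return [v["name"] for v in voter_registry if all(v[k] == record[k] for k in keys)]


if __name__ == "__main__":
    for keys in (["birth_date"], ["birth_date", "gender"], ["birth_date", "gender", "zip"]):
        names = matching_voters(medical_record, keys)
        print(f"{' + '.join(keys)}: {len(names)} candidate(s)")
    # prints 6, then 3, then 1 candidate(s)
```

Once a single candidate remains, the "anonymous" diagnosis is attached to a name: removing the name column alone did not protect the record.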
  12. De-anonymization of the Netflix Prize dataset, 2006
    • Anonymized record, Subscriber #12345: Catch Me If You Can, Forrest Gump, Fahrenheit 9/11, Jesus of Nazareth
    • John's known ratings: Catch Me If You Can, Forrest Gump
    https://arxiv.org/pdf/cs/0610105.pdf
  13. De-anonymization of the Netflix Prize dataset, 2006
    • Matching the records: Subscriber #12345 is John
    • John: Catch Me If You Can, Forrest Gump, Fahrenheit 9/11, Jesus of Nazareth
    https://arxiv.org/pdf/cs/0610105.pdf
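Slides 12 and 13 boil down to matching a few known ratings against an "anonymized" table. Below is a deliberately simplified sketch on hypothetical data: it scores candidates by plain overlap count and refuses to answer on a tie. The attack in the linked paper is more robust; it also uses rating values and dates and gives rare movies more weight.

```python
# Simplified record-linkage sketch for a sparse dataset such as movie ratings.
# Data is hypothetical; only *which* movies were rated is used.
from typing import Dict, Optional, Set

# "Anonymized" release: subscriber IDs instead of names.
anonymized_ratings: Dict[str, Set[str]] = {
    "#12345": {"Catch Me If You Can", "Forrest Gump", "Fahrenheit 9/11", "Jesus of Nazareth"},
    "#67890": {"Forrest Gump", "Titanic"},
    "#24680": {"Catch Me If You Can", "The Matrix"},
}

# What an attacker already knows about John from another source.
johns_known_ratings: Set[str] = {"Catch Me If You Can", "Forrest Gump"}


def best_match(known: Set[str]) -> Optional[str]:
    """ID with the largest overlap with `known`, or None if the top score is tied."""
    scores = sorted(
        ((len(known & movies), sid) for sid, movies in anonymized_ratings.items()),
        reverse=True,
    )
    (top_score, top_id), (runner_up_score, _) = scores[0], scores[1]
    return top_id if top_score > runner_up_score else None


if __name__ == "__main__":
    match = best_match(johns_known_ratings)
    if match is not None:
        # Everything in the matched record that the attacker did not already know.
        print(match, sorted(anonymized_ratings[match] - johns_known_ratings))
        # prints: #12345 ['Fahrenheit 9/11', 'Jesus of Nazareth']
```

Two commonplace ratings are enough here because the rest of the table is sparse; the titles the attacker learns are the ones John never chose to make public.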
  14. Differential Privacy (DP)
    • A statistical measure of how much a single individual can affect the results computed from a dataset
    • Important concepts:
      • Privacy Budget (ε)
      • Noise
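For reference, the standard formal definition behind "how much an individual can affect the result" is (pure) ε-differential privacy. A randomized mechanism M satisfies it if, for every pair of datasets D and D′ that differ in one person's record and every set of possible outputs S:

```latex
\Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon} \cdot \Pr[\,M(D') \in S\,]
```

A smaller ε forces the two probabilities closer together, so the output reveals less about whether any one person (such as John on the following slides) is in the data.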
  15. Lower ε (leaks less privacy)
    • Privacy Budget (ε): how well can John hide from my question?
    • Question: How many people live in the US?
    • With John: John (US, Monterey), Alice (US, LA), …, Bob (US, Chicago), Peter (UK, London) → 100
    • Without John: Alice (US, LA), …, Bob (US, Chicago), Peter (UK, London) → 99
  16. Higher ε (leaks more privacy)
    • Privacy Budget (ε): how well can John hide from my question?
    • Question: How many people live in Monterey?
    • With John: John (US, Monterey), Alice (US, LA), …, Bob (US, Chicago), Peter (UK, London) → 1
    • Without John: Alice (US, LA), …, Bob (US, Chicago), Peter (UK, London) → 0
  17. Noise – Helping John hide
    • Data: John (US, Monterey), Alice (US, LA), Bob (US, Chicago), …, Peter (UK, London)
    • Question: How many people live in Monterey?
    • Noisy answer: 0? 1? 2? 10?
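One common way to add the noise on slide 17 is the Laplace mechanism; the deck does not say which mechanism it has in mind, so this is an assumption. For a count, adding or removing one person changes the answer by at most 1, so Laplace noise with scale 1/ε gives ε-differential privacy. The sketch uses only the four named rows from the slide; `noisy_count` is a hypothetical helper, and the Laplace sample is generated as the difference of two exponential samples.

```python
# Laplace mechanism sketch for a counting query (standard library only).
import random
from typing import Callable, Dict, List

Row = Dict[str, str]

# The four named rows from slide 17 (the "…" rows are omitted).
people: List[Row] = [
    {"name": "John", "country": "US", "city": "Monterey"},
    {"name": "Alice", "country": "US", "city": "LA"},
    {"name": "Bob", "country": "US", "city": "Chicago"},
    {"name": "Peter", "country": "UK", "city": "London"},
]


def noisy_count(
    rows: List[Row], predicate: Callable[[Row], bool], epsilon: float, rng: random.Random
) -> float:
    """Return the true count plus Laplace(0, 1/epsilon) noise."""
    true_count = sum(1 for row in rows if predicate(row))
    sensitivity = 1.0  # one person joining or leaving changes a count by at most 1
    scale = sensitivity / epsilon
    # The difference of two independent Exponential(rate=1/scale) draws is Laplace(0, scale).
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_count + noise


def lives_in_monterey(row: Row) -> bool:
    return row["city"] == "Monterey"


if __name__ == "__main__":
    rng = random.Random(0)
    without_john = [row for row in people if row["name"] != "John"]
    for epsilon in (0.1, 1.0, 10.0):
        with_j = noisy_count(people, lives_in_monterey, epsilon, rng)
        without_j = noisy_count(without_john, lives_in_monterey, epsilon, rng)
        print(f"epsilon={epsilon}: with John {with_j:.1f}, without John {without_j:.1f}")
```

With a small ε the two answers are hard to tell apart; with a large ε they cluster around the true counts of 1 and 0, which is the leak slide 16 describes.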
  18. Differential Privacy (DP)
    • Higher noise → Lower ε → More privacy preserved
    • Higher noise → Lower accuracy → Lower usability
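Assuming the Laplace mechanism (the slide does not name one), the trade-off can be written down directly. For a query with sensitivity Δf, the noise scale b and ε are inversely related, and the expected absolute error equals b:

```latex
\text{noise} \sim \mathrm{Laplace}(0,\, b), \qquad
b = \frac{\Delta f}{\varepsilon}, \qquad
\mathbb{E}\big[\,|\text{noise}|\,\big] = b = \frac{\Delta f}{\varepsilon}
```

For a counting query (Δf = 1), ε = 1 gives an expected error of 1 person, while ε = 0.1 gives an expected error of 10 people: more privacy, less accurate answers.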
  19. AWS Clean Rooms
    • Share AWS Glue tables across accounts
    • Analysis rules limit query capability
    • Differential Privacy (in preview)
  20. AWS Clean Rooms – Demo
    Loyalty member database
    Member ID | City          | Gender | Education
    …         | …             | …      | …
    123456    | Charlottetown | Male   | Doctor
    …         | …             | …      | …
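Analysis rules restrict what a collaborator can ask of a shared table. As a rough local illustration of one such restriction (this is not the AWS Clean Rooms API; the function name and parameters are hypothetical), the sketch below only answers aggregate counts and suppresses any group with fewer than a minimum number of distinct members, which blocks small-group questions like "How many people live in Monterey?" from slide 16. Clean Rooms aggregation analysis rules include a comparable minimum-count constraint; its documentation gives the exact semantics.

```python
# Toy "analysis rule": only aggregate counts are allowed, and small groups are suppressed.
from collections import defaultdict
from typing import Dict, List, Set

Row = Dict[str, str]


def count_by_with_threshold(
    rows: List[Row], group_by: str, id_column: str, min_distinct: int
) -> Dict[str, int]:
    """COUNT(DISTINCT id_column) per group_by value, dropping groups below min_distinct."""
    members: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        members[row[group_by]].add(row[id_column])
    return {group: len(ids) for group, ids in members.items() if len(ids) >= min_distinct}


if __name__ == "__main__":
    # Member 123456 in Charlottetown is from the demo slide; the other rows are hypothetical.
    loyalty_members: List[Row] = [
        {"member_id": "123456", "city": "Charlottetown"},
        {"member_id": "200001", "city": "Toronto"},
        {"member_id": "200002", "city": "Toronto"},
        {"member_id": "200003", "city": "Toronto"},
    ]
    print(count_by_with_threshold(loyalty_members, "city", "member_id", min_distinct=2))
    # prints {'Toronto': 3}
```

Counting distinct members rather than rows keeps one person with many rows from pushing a group over the threshold on their own.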
  21. Recap
    • Trusted Execution Environment (TEE) → AWS Nitro Enclaves
    • Federated Learning (FL) → Amazon SageMaker
    • Differential Privacy (DP) → AWS Clean Rooms