Introducing MongoDB Queryable Encryption (Preview)

Presentation at the MongoDB World 2022 conference in New York announcing Public Preview for Queryable Encryption

Kenn White

June 07, 2022


  1. Agenda Security 101 Introducing Queryable Encryption Our journey and roadmap

    What is this magic? CSFLE vs Queryable Encryption How can I try it out? Key Management Enhancements
  2. Moving and storing data: most databases have it covered In-flight,

    over the network TLS Encryption Data is decrypted when it's received on the DB server Reminder: TLS is to protect against network eavesdropping
  3. Moving and storing data: most databases have it covered At-rest,

    on disk Volume Encryption Storage Engine Encryption Network Data is decrypted when the DB starts up Reminder: At-rest encryption is (mostly) to protect non-running databases & backups
  4. But what about data here? Network Disk In-use, in memory

    Very few practical solutions exist to protect data in-use
  5. Data is in plaintext while it's being processed by the

    database Data is vulnerable to insider access and active database breaches: • Authorized and compromised administrators, DBAs & privileged users • RAM scraping • Process inspection • Cloud providers In-use, in memory
  6. Data is in plaintext while it's being processed by the

    database Data is vulnerable to insider access and active database breaches: • Authorized and compromised administrators, DBAs & users • RAM scraping • Process inspection In-use, in memory This is why we built Client-Side Field Level Encryption!
  7. Option #1 – No encryption of data from client side Query

    to find ssn = “901-10-4312” {payer: “Acme Corp”, ssn: “901-10-4312”} {payer: “Jones Inc”, ssn: “901-10-4312”} … {payer: “Baker Co”, ssn: “901-10-4312”} 1 million records total 10 records with ssn = “901-10-4312” 10 records fetched with ssn = “901-10-4312” • Fast querying • But data is not secure in-use
  8. Option #2 – Using popular cloud SDK client-side encryption Query to find

    ssn = “901-10-4312” {payer: “Acme Corp”, ssn: “901-10-4312”} {payer: “Jones Inc”, ssn: “901-10-4312”} … {payer: “Baker Co”, ssn: “901-10-4312”} 1 million records total 10 records with ssn = “901-10-4312” {payer: “Acme Corp”, ssn: “3DwK354xz”} {payer: “Jones Inc”, ssn: “23awW124xz”} … {payer: “Baker Co”, ssn: “75fdwswed”} 1 million records total 10 Randomly encrypted fields 10 records fetched with ssn = “901-10-4312” All 1 million records fetched • Client-side processing & decryption • Filtering of records on the client side (performance hit) Problem: You can't actually directly search encrypted fields. Not feasible for many use cases.
  9. • Encrypt the sensitive data (fields) • Easy development cycle

    • No crypto experience required • Encrypted throughout the data lifecycle • Rich expressive queries • MongoDB is the only platform to implement a fast searchable encryption scheme • Server-side processing of encrypted data • Server does not know anything about the data Queryable Encryption Query to find ssn = “901-10-4312” {payer: “Acme Corp”, ssn: “3DwK354xz”} {payer: “Jones Inc”, ssn: “23awW124xz”} … {payer: “Baker Co”, ssn: “75fdwswed”} 1 million records total 10 Randomly encrypted fields 10 records fetched with ssn = “901-10-4312” MongoDB’s Approach
  10. Let’s look closer

    [Diagram: numbered request flow, steps 1 to 6, between client, driver, key provider, and database]
    Query from an authenticated client: db.billing.find({ ssn: "901-10-4312" })
    MongoDB Driver, backed by a Customer Provisioned Key Provider: Cloud Provider KMS, On-prem HSM/Key Service, Cross-cloud KMS
    Plaintext document (client side): { payer: "Jones Glee", cardNum: "2223-0031-2200-3222", email: "jones-glee@example.net", phone: "+1-212-555-1234", ssn: "901-10-4312" }
    Encrypted document: { payer: "Jones Glee", cardNum: "r6EaUcgZ41Gerrwd", email: "iu233oh35sdso743", phone: "oR72CW4WferrSE3j", ssn: "d76b3ad038c0e0ed" }
    Encrypted search key: "er493grtee4erw"
    Other ciphertext values shown: "6fbbb3f8c3a9f7a", "f72a9a1103d88b6"
    Encrypted fields are always stored, transmitted, processed, and retrieved as ciphertext, including queries
  11. Use Cases Industry: Financial Services Bank application needs to find

    transactions using a range of dates or dollar amounts for fraud detection Industry: Human Resources HR system allows searching for employees by the last 4 digits of their social security number Industry: Health Care Customer support agents need to find patient records by searching for the first few characters of their name
  12. Queryable Encryption – Key Benefits Rich querying on encrypted data

    Run expressive queries like range, equality, prefix, suffix, substring, and more on encrypted data Ground-breaking query technology, standards-based cryptography Based on strong, standards-based cryptographic primitives End-to-end fully randomized encryption Data never exists in the clear outside of the client Dramatically reduces attack surface Faster app development No crypto experience required Intuitive and easy for developers to set up and use Strong technical controls for critical data privacy use cases Meet the strictest data privacy requirements for confidentiality on security critical workloads Reduce institutional risk Confident in storing and processing your sensitive workloads in MongoDB Atlas (Cloud)
  13. Our Journey & Roadmap

    2019: Client-Side Field Level Encryption (CSFLE). Equality search on Deterministic encryption
    2021: Aroki Systems acquisition. Pioneers in Encrypted Search (Tarik Moataz, Seny Kamara)
    May 2022: Formation of Advanced Cryptography Research group. Seny Kamara, Tarik Moataz, and a team of PhD cryptography researchers
    June 2022: Queryable Encryption Preview. Structured Encryption core functionality; Equality search on randomized encryption
    Post 6.0: Queryable Encryption v1.1. Addition of Range query capabilities
    Post 6.0: Queryable Encryption v1.2. Addition of prefix, suffix, substring query capabilities
    Future: New privacy-enhancing cryptography capabilities
  14. What is in 6.0 Public Preview? Foundational work that enables

    equality and future query types Crypto framework Equality comparisons on randomly encrypted data Equality Queryable Encryption Configuration Decryption Compass
  15. What is a Public Preview? • Available with 6.0 RC

    release ◦ Evaluation only ◦ May be breaking changes ◦ Not recommended for production workloads
  16. Security Model Multi-Snapshot-Secure ◦ A snapshot adversary has (possibly successive)

    point-in-time access to the entire memory & disk of the database server ◦ At that instant, adversary can access the entire DB, any keys stored in memory, all CPU state including L1-3 cache, and all logs
  17. Security Model Formal security guarantees ◦ Encrypted fields of user

    documents are verifiably CCA-secure (secure against chosen ciphertext attacks) ▪ Ciphertexts don't reveal information about the plaintext, beyond encrypted document size… ▪ …even to adversaries that can adaptively query an encryption oracle ◦ Encrypted indexes are verifiably adaptively multi-snapshot secure
  18. How does it work? The boring part: ◦ Document content encrypted with standard AEAD

    encryption ▪ Encrypt-then-MAC authenticated encryption ◦ AES-CTR-256 with HMAC-SHA256 ◦ Document encryption/decryption only happens on the client, in the application ◦ Top-level 96-byte composite user key on client comprising: ▪ 256-bit AES-CTR key ▪ 256-bit HMAC key ▪ 256-bit key for encrypted search operations
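The Encrypt-then-MAC pattern and the three-part composite key on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the subkey order simply follows the order listed on the slide, the 16-byte nonce is arbitrary, and because the Python standard library has no AES, the keystream below is an HMAC-SHA-256 counter construction standing in for AES-CTR. It is not the libmongocrypt implementation.

```python
import hashlib
import hmac
import os


def split_composite_key(key: bytes):
    """Split a 96-byte composite key into three 256-bit subkeys.

    Assumed order (taken from the slide's listing): encryption, MAC, search.
    """
    if len(key) != 96:
        raise ValueError("composite key must be 96 bytes")
    return key[:32], key[32:64], key[64:]


def _keystream(enc_key: bytes, nonce: bytes, length: int) -> bytes:
    # Stand-in for AES-CTR: HMAC-SHA-256 over nonce || counter, block by block.
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hmac.new(enc_key, nonce + counter.to_bytes(8, "big"), hashlib.sha256)
        out += block.digest()
        counter += 1
    return bytes(out[:length])


def encrypt_then_mac(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    """Encrypt first, then authenticate nonce || ciphertext with HMAC-SHA-256."""
    nonce = os.urandom(16)
    stream = _keystream(enc_key, nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def verify_then_decrypt(enc_key: bytes, mac_key: bytes, blob: bytes) -> bytes:
    """Check the tag before touching the ciphertext; reject on mismatch."""
    if len(blob) < 48:
        raise ValueError("blob too short")
    nonce, ciphertext, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    stream = _keystream(enc_key, nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))
```

Verifying the tag before decrypting is the point of the pattern: a modified ciphertext is rejected without ever being decrypted.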
  19. How does it work? The boring part: ◦ Key management

    ▪ Envelope encryption protects composite user keys ("field keys") ▪ Database can never access raw key material ▪ Backing vault key management: cloud KMS, KMIP/HSM, Hashicorp Vault, custom key service, local key…
  20. How does it work? The innovation: ◦ New type of

    functional search index is introduced ◦ Based on a novel Structured Encryption construction ▪ client is stateless; database maintains (blinded) state ▪ distributed, highly available, highly scalable by design ▪ scheme is robust to client failures, high contention, dropped sessions ▪ index data structures are Encrypted Multi-Maps (EMMs) ◦ a type of reverse or inverted index of encrypted label/tuple pairs ◦ labels are pseudorandom function (PRF) evaluations ◦ fast, efficient EMM lookups ◦ PRFs here == keyed HMACs (HMAC-SHA-256) Reminder: HMACs are secret key digests/tags; the database does not have the key
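To make the idea of an encrypted multi-map with PRF-derived labels more concrete, here is a deliberately tiny Python sketch. Everything in it is hypothetical (`ToyEncryptedMultiMap`, `search_token`, the label and pad derivations, the counter-probing insert); it is not MongoDB's Structured Encryption construction and ignores deletion, contention, and leakage analysis. It only shows that a store can be keyed by HMAC-SHA-256 outputs so that the party holding the table never sees the indexed value or the key.

```python
import hashlib
import hmac


def prf(key: bytes, message: bytes) -> bytes:
    """Pseudorandom function instantiated as keyed HMAC-SHA-256."""
    return hmac.new(key, message, hashlib.sha256).digest()


class ToyEncryptedMultiMap:
    """Database side: stores pseudorandom labels mapped to masked payloads."""

    def __init__(self):
        self._table = {}

    def put(self, label: bytes, payload: bytes) -> None:
        self._table[label] = payload

    def get(self, label: bytes):
        return self._table.get(label)


def search_token(search_key: bytes, value: str) -> bytes:
    # Derived on the client; the raw search key never leaves it.
    return prf(search_key, value.encode("utf-8"))


def _label(token: bytes, counter: int) -> bytes:
    return prf(token, b"label" + counter.to_bytes(8, "big"))


def _pad(token: bytes, counter: int) -> bytes:
    return prf(token, b"pad" + counter.to_bytes(8, "big"))[:8]


def insert(emm: ToyEncryptedMultiMap, search_key: bytes, value: str, doc_id: int) -> None:
    """Index doc_id under value; the client keeps no state between calls."""
    token = search_token(search_key, value)
    counter = 0
    while emm.get(_label(token, counter)) is not None:  # probe for the next free slot
        counter += 1
    raw = doc_id.to_bytes(8, "big")
    masked = bytes(a ^ b for a, b in zip(raw, _pad(token, counter)))
    emm.put(_label(token, counter), masked)


def lookup(emm: ToyEncryptedMultiMap, token: bytes) -> list:
    """Walk the labels derived from a token and unmask the matching doc ids."""
    results = []
    counter = 0
    while (masked := emm.get(_label(token, counter))) is not None:
        raw = bytes(a ^ b for a, b in zip(masked, _pad(token, counter)))
        results.append(int.from_bytes(raw, "big"))
        counter += 1
    return results
```

Each lookup touches only the entries for the queried value, which is the sense in which an inverted index of this kind avoids scanning every document.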
  21. Both Are Supported CSFLE is NOT being deprecated CSFLE to

    Queryable Encryption migration not (yet) supported CSFLE workloads should stay on CSFLE Queryable Encryption net new only Must be specified at collection creation
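Since Queryable Encryption must be specified at collection creation, the encrypted fields have to be described up front. The sketch below builds such a description as a plain Python dict, following the `encryptedFields` document shape (`fields`, `path`, `bsonType`, `keyId`, `queries`) as I understand it from the 6.0 preview. The helper `build_encrypted_fields` is hypothetical, and the key IDs are random placeholder UUIDs; in a real deployment each `keyId` would be the ID of a data encryption key created in the key vault, and handing the result to a driver is not shown here.

```python
import uuid


def build_encrypted_fields(fields):
    """Build an encryptedFields-style description.

    fields: iterable of (path, bson_type, queryable) tuples.
    Every field gets its own key ID (placeholder UUIDs, for illustration only).
    """
    entries = []
    for path, bson_type, queryable in fields:
        entry = {
            "path": path,
            "bsonType": bson_type,
            "keyId": uuid.uuid4(),  # placeholder, not a real key vault entry
        }
        if queryable:
            # Equality is the only query type in the preview.
            entry["queries"] = {"queryType": "equality"}
        entries.append(entry)
    return {"fields": entries}


# Field names borrowed from the billing example earlier in the deck.
billing_fields = build_encrypted_fields([
    ("ssn", "string", True),       # encrypted and queryable by equality
    ("cardNum", "string", False),  # encrypted, not queryable
])
```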
  22. How are they the same? CSFLE Queryable Encryption Highest Levels

    of Confidentiality and Integrity (client-side encryption) ✔ ✔ Queryable and Non-Queryable Options ✔ ✔ Authenticated AES Encryption*, 256-bit keys ✔ ✔ Common Key Management Features ✔ ✔ Encrypted data is stored in field as BinData ✔ ✔ Shared Library (replaces mongocryptd) (Coming Soon) ✔ *CSFLE uses CBC mode, Queryable Encryption uses CTR mode
  23. CSFLE vs Queryable Encryption

    CSFLE
    • Client-side encryption
    • Server is (largely) unaware
    • Queryability
      ◦ Equality only - Deterministic
        ▪ Data leakage on low entropy fields
    • Flexible key usage
      ◦ unique key per field
      ◦ 1 key for all fields
      ◦ per-document keys
    • No additional data elements

    Queryable Encryption
    • Client-side encryption
    • Server is integral
    • Queryability
      ◦ New functional search index
      ◦ Equality - Fully random
        ▪ No snapshot leakage, even on low entropy fields
      ◦ Range, prefix, suffix and substring
    • Requires a unique key per field
    • Additional data
      ◦ 1 new field per document
        ▪ __safeContent__
      ◦ 3 new system collections: enxcol_.*
      ◦ Do not modify any of these!
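The "data leakage on low entropy fields" point can be illustrated with a short frequency count. In this sketch the two tag functions are hypothetical stand-ins for ciphertexts (they are HMAC outputs and cannot be decrypted); the only property that matters is that the deterministic one maps equal plaintexts to equal outputs, while the randomized one mixes in a fresh nonce each time.

```python
import hashlib
import hmac
import os
from collections import Counter


def deterministic_tag(key: bytes, value: str) -> str:
    """Same plaintext always yields the same output (deterministic scheme)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def randomized_tag(key: bytes, value: str) -> str:
    """A fresh random nonce makes every output distinct (randomized scheme)."""
    nonce = os.urandom(16)
    digest = hmac.new(key, nonce + value.encode("utf-8"), hashlib.sha256).digest()
    return (nonce + digest).hex()


def frequency_profile(ciphertexts) -> list:
    """What a snapshot of the stored column reveals: how often each value repeats."""
    return sorted(Counter(ciphertexts).values(), reverse=True)


key = os.urandom(32)
# A low entropy column: blood types with a skewed distribution.
column = ["A"] * 6 + ["B"] * 3 + ["O"]

det_profile = frequency_profile(deterministic_tag(key, v) for v in column)
rand_profile = frequency_profile(randomized_tag(key, v) for v in column)
```

With deterministic outputs the histogram of the stored column matches the histogram of the plaintexts, so anyone who knows the rough distribution of blood types can guess which ciphertext is which. With randomized outputs every stored value appears exactly once.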
  24. Trade-offs

                          Inserts   Find (equality)   Find (range, prefix, suffix, substring)   Storage Overhead   Frequency Leakage
    FLE                   Fast      Fast              No                                        Minimal            Possibly
    Queryable Encryption  Slower    Fast              Yes                                       Yes                None

    Performance testing is in process
  25. How is this different from other solutions? Queryable Encryption, using

    Structured Encryption • Ideal for real-time database operations that MongoDB customers need • Software implementation, hardware agnostic • Optimized for sublinear searching Fully or Partially Homomorphic Encryption [FHE] • Not natively offered in any major/commercial general purpose database • For encrypted search, FHE is a poor choice due to weak performance – search speed is linear • Queries slow down significantly as the data set grows • Typically incurs a very heavy computational overhead • Better suited for certain types of secure private computation - sums, statistical means, etc Secure Enclaves • Requires specialized hardware, often cloud-provider proprietary • Keys are still managed by the cloud provider - albeit in hardware • Enclaves are not as powerful as general purpose CPUs, security guarantees unclear
  26. Components and licensing • Core Cryptography Library (drivers, server) ◦

    Atlas, Enterprise Advanced, or Community database ◦ Client-side core library (libmongocrypt) & drivers all Apache licensed ◦ Encourages peer review & feedback from research community ◦ No black-box/proprietary crypto • Automatic Encryption ◦ Atlas & Enterprise Advanced ◦ Enables encryption without app changes, vs helper methods ◦ Shared library crypt_shared replaces cryptd (mongocryptd) package ◦ Shared library package available on Enterprise downloads page
  27. What's needed to try the Queryable Encryption Preview? • 6.0

    Server (rc8+) • Queryable Encryption-aware drivers available now: ◦ Node.js, Java (Sync & Async), Go, Python, C, C# .NET, Ruby, PHP ◦ Coming weeks: (C++, Scala) ◦ On roadmap: Rust, Swift • Automatic encryption (via crypt_shared library) ◦ Atlas (including free tier!) ◦ Enterprise Advanced • Explicit encryption (via ClientEncryption object) ◦ Atlas or Enterprise Advanced ◦ Community
  28. Packages ◦ Amazon Linux 2 / Amazon Linux 2 ARM

    64 ◦ Debian 10.0 / 11.0 ◦ macOS / macOS ARM 64 (Mojave 10.14+) ◦ RedHat / CentOS 7.0 ◦ RedHat / CentOS 7.2 s390x ◦ RedHat / CentOS 8.0 ◦ RedHat / CentOS 8.1 ppc64le ◦ RedHat / CentOS 8.2 ARM 64 ◦ RedHat / CentOS 8.3 s390x ◦ SUSE 12 / 15 ◦ Ubuntu 18.04 / 18.04 ARM 64 ◦ Ubuntu 20.04 / 20.04 ARM 64 ◦ Windows 10 / Server 2016 / Server 2019
  29. • Rotate the entire MongoDB key vault • Previously, only

    new data encryption keys were protected by a new CMK • Key Vault rotation replaces all former versions of CMK seamlessly, via a single API call Key Rotation
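Key vault rotation works because of envelope encryption: the stored data is encrypted under data encryption keys (DEKs), and only the wrapped DEKs depend on the customer master key (CMK). The sketch below shows the idea with a toy key-wrapping function. The `wrap`, `unwrap`, and `rewrap_vault` helpers are hypothetical, and the wrapping is an HMAC-based stand-in rather than what a real KMS does. What it demonstrates is that rotating the CMK rewrites the vault entries and leaves the DEKs, and therefore the encrypted data, unchanged.

```python
import hashlib
import hmac
import os


def _pad(cmk: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream derived from the CMK; stand-in for a real key-wrap cipher.
    out = bytearray()
    block = 0
    while len(out) < length:
        out += hmac.new(cmk, b"wrap" + nonce + bytes([block]), hashlib.sha256).digest()
        block += 1
    return bytes(out[:length])


def wrap(cmk: bytes, dek: bytes) -> bytes:
    """Wrap a DEK under a CMK: nonce || masked DEK || authentication tag."""
    nonce = os.urandom(16)
    body = bytes(a ^ b for a, b in zip(dek, _pad(cmk, nonce, len(dek))))
    tag = hmac.new(cmk, b"tag" + nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


def unwrap(cmk: bytes, blob: bytes) -> bytes:
    """Recover the DEK, refusing if the CMK is wrong or the blob was altered."""
    nonce, body, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(cmk, b"tag" + nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong CMK or corrupted key material")
    return bytes(a ^ b for a, b in zip(body, _pad(cmk, nonce, len(body))))


def rewrap_vault(vault: dict, old_cmk: bytes, new_cmk: bytes) -> dict:
    """Re-wrap every DEK in the vault under the new CMK; the data is never touched."""
    return {key_id: wrap(new_cmk, unwrap(old_cmk, blob)) for key_id, blob in vault.items()}
```

Migrating between key providers follows the same shape: unwrap with the old provider's key, wrap with the new one, and leave the documents as they are.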
  30. • Changing from one key provider to another used to

    require decrypting and re-encrypting all of your data • A single API call now seamlessly migrates your keys from any supported key provider to another one ◦ AWS - GCP ◦ Local - Azure ◦ GCP - KMIP • With no impact to your application or data Key Migration
  31. • Customer can provide Data Encryption Keys used • Meets

    strict compliance requirements for key generation in an HSM • API Custom Key Management
  32. It takes a village Andrew Anna Asya Bernie Boris Clyde

    Craig Dave Davi Divjot Dmitry Elizabeth Emily Eric Erwin Esha Ezra Jacob Jeff Jesse Judah Julie Kaitlin Katia Kevin Mark Mat Nathan Naomi Nick Oz Pramod Ravind Rachael Rachelle Ross Sam Sara Sergei Shane Shreyas Spencer Vincent