Agenda Security 101 Introducing Queryable Encryption Our journey and roadmap What is this magic? CSFLE vs Queryable Encryption How can I try it out? Key Management Enhancements
Moving and storing data: most databases have it covered In-flight, over the network TLS Encryption Data is decrypted when it's received on the DB server Reminder: TLS is to protect against network eavesdropping
Moving and storing data: most databases have it covered At-rest, on disk Volume Encryption Storage Engine Encryption Network Data is decrypted when the DB starts up Reminder: At-rest encryption is (mostly) to protect non-running databases & backups
Data is in plaintext while it's being processed by the database Data is vulnerable to insider access and active database breaches: ● Authorized and compromised administrators, DBAs & privileged users ● RAM scraping ● Process inspection ● Cloud providers In-use, in memory
Data is in plaintext while it's being processed by the database Data is vulnerable to insider access and active database breaches: ● Authorized and compromised administrators, DBAs & users ● RAM scraping ● Process inspection In-use, in memory This is why we built Client-Side Field Level Encryption!
Option#1 – No encryption of data from client side Query to find ssn = “901-10-4312” {payer: “Acme Corp”, ssn: “901-10-4312”} {payer: “Jones Inc”, ssn: “901-10-4312”} … {payer: “Baker Co”, ssn: “901-10-4312”} 1 million records total 10 records with ssn = “901-10-4312” 10 records fetched with ssn = “901-10-4312” ● Fast querying ● But data is not secure in-use
Option#2 – Using popular cloud SDK client-side encryption Query to find ssn = “901-10-4312” {payer: “Acme Corp”, ssn: “901-10-4312”} {payer: “Jones Inc”, ssn: “901-10-4312”} … {payer: “Baker Co”, ssn: “901-10-4312”} 1 million records total 10 records with ssn: “901-10-4312” {payer: “Acme Corp”, ssn: “3DwK354xz”} {payer: “Jones Inc”, ssn: “23awW124xz”} … {payer: “Baker Co”, ssn: “75fdwswed”} 1 million records total 10 Randomly encrypted field 10 records fetched with ssn = “901-10-4312” All 1 million records fetched ● Client-side processing & decryption ● Filtering of records on the client side (performance hit) Problem: You can't directly search encrypted fields. Not feasible for many use cases.
● Encrypt the sensitive data (fields) ● Easy development cycle ● No crypto experience required ● Encrypted throughout the data lifecycle ● Rich expressive queries ● MongoDB is the only platform to implement a fast searchable encryption scheme ● Server-side processing of encrypted data ● Server does not know anything about the data Queryable Encryption Query to find ssn = “901-10-4312” {payer: “Acme Corp”, ssn: “3DwK354xz”} {payer: “Jones Inc”, ssn: “23awW124xz”} … {payer: “Baker Co”, ssn: “75fdwswed”} 1 million records total 10 Randomly encrypted fields 10 records fetched with ssn = “901-10-4312” MongoDB’s Approach
Use Cases Industry: Financial Services Bank application needs to find transactions using a range of dates or dollar amounts for fraud detection Industry: Human Resources HR system allows searching for employees by the last 4 digits of their social security number Industry: Health Care Customer support agents need to find patient records by searching for the first few characters of their name
Queryable Encryption – Key Benefits Rich querying on encrypted data Run expressive queries like range, equality, prefix, suffix, substring, and more on encrypted data Ground-breaking query technology, standards-based cryptography Based on strong, standards-based cryptographic primitives End-to-end fully randomized encryption Data never exists in the clear outside of the client Dramatically reduces attack surface Faster app development No crypto experience required Intuitive and easy for developers to set up and use Strong technical controls for critical data privacy use cases Meet the strictest data privacy requirements for confidentiality on security critical workloads Reduce institutional risk Confident in storing and processing your sensitive workloads in MongoDB Atlas (Cloud)
Our Journey & Roadmap June 2022 Aroki Systems acquisition Pioneers in Encrypted Search 2019 Post 6.0 Client-Side Field Level Encryption (CSFLE) Equality search on Deterministic encryption 2021 Queryable Encryption Preview Structured Encryption core functionality; Equality search on randomized encryption Post 6.0 Queryable Encryption v1.1 Addition of Range query capabilities Queryable Encryption v1.2 Addition of prefix, suffix, substring query capabilities Future New privacy-enhancing cryptography capabilities Tarik Moataz Seny Kamara Formation of Advanced Cryptography Research group Seny Kamara, Tarik Moataz, and a team of PhD cryptography researchers May 2022
What is in 6.0 Public Preview? Foundational work that enables equality and future query types Crypto framework Equality comparisons on randomly encrypted data Equality Queryable Encryption Configuration Decryption Compass
Security Model Multi-Snapshot-Secure ○ A snapshot adversary has (possibly successive) point-in-time access to the entire memory & disk of the database server ○ At that instant, adversary can access the entire DB, any keys stored in memory, all CPU state including L1-3 cache, and all logs
Security Model Formal security guarantees ○ Encrypted fields of user documents are verifiably CCA-secure (secure against chosen ciphertext attacks) ■ Ciphertexts don't reveal information about the plaintext, beyond encrypted document size… ■ …even to adversaries that can adaptively query an encryption oracle ○ Encrypted indexes are verifiably adaptively multi-snapshot secure
How does it work? The boring part: ○ Document content encrypted with standard AEAD encryption ■ Encrypt-then-MAC authenticated encryption ○ AES-CTR-256 with HMAC-SHA256 ○ Document encryption/decryption only happens on the client, in the application ○ Top-level 96-byte composite user key on client comprising: ■ 256-bit AES-CTR key ■ 256-bit HMAC key ■ 256-bit key for encrypted search operations
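The key layout and the Encrypt-then-MAC ordering on this slide can be sketched with the Python standard library alone. This is a minimal illustration, not the driver's implementation: the order of the three sub-keys inside the composite key, the 16-byte IV, and the helper names (`split_composite_key`, `mac_ciphertext`, `verify_then_release`) are assumptions, and random bytes stand in for real AES-CTR output because the standard library has no AES.

```python
import hashlib
import hmac
import os

KEY_LEN = 32  # 256 bits per sub-key


def split_composite_key(composite: bytes) -> tuple[bytes, bytes, bytes]:
    """Split a 96-byte composite key into (cipher key, MAC key, search key).

    The ordering of the sub-keys is an assumption made for this sketch.
    """
    if len(composite) != 3 * KEY_LEN:
        raise ValueError("composite key must be exactly 96 bytes")
    return (
        composite[:KEY_LEN],
        composite[KEY_LEN:2 * KEY_LEN],
        composite[2 * KEY_LEN:],
    )


def mac_ciphertext(mac_key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """Encrypt-then-MAC: the tag covers the IV and ciphertext, never the plaintext."""
    return hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()


def verify_then_release(mac_key: bytes, iv: bytes, ciphertext: bytes, tag: bytes) -> bytes:
    """Check the tag before any decryption is attempted; reject on mismatch."""
    expected = mac_ciphertext(mac_key, iv, ciphertext)
    if not hmac.compare_digest(expected, tag):
        raise ValueError("authentication failed")
    return ciphertext  # a real client would AES-CTR decrypt at this point


composite_key = os.urandom(96)
enc_key, mac_key, search_key = split_composite_key(composite_key)
iv = os.urandom(16)
ciphertext = os.urandom(24)  # placeholder for AES-CTR output
tag = mac_ciphertext(mac_key, iv, ciphertext)
```

Verifying the tag first means a tampered ciphertext is rejected before the cipher key is ever used on it.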
How does it work? The boring part: ○ Key management ■ Envelope encryption protects composite user keys ("field keys") ■ Database can never access raw key material ■ Backing vault key management: cloud KMS, KMIP/HSM, HashiCorp Vault, custom key service, local key…
How does it work? The innovation: ○ New type of functional search index is introduced ○ Based on a novel Structured Encryption construction ■ client is stateless; database maintains (blinded) state ■ distributed, highly available, highly scalable by design ■ scheme is robust to client failures, high contention, dropped sessions ■ index data structures are Encrypted Multi-Maps (EMMs) ○ a type of reverse or inverted index of encrypted label/tuple pairs ○ labels are pseudorandom function (PRF) evaluations ○ fast, efficient EMM lookups ○ PRFs here == keyed HMACs (HMAC-SHA-256) Reminder: HMACs are secret key digests/tags; the database does not have the key
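The idea of an inverted index whose labels are PRF evaluations can be sketched in a few lines. This is a textbook-style toy, not MongoDB's actual construction: the names `ToyEMMServer`, `token_for`, and `insert` are hypothetical, document ids are stored in the clear (a real scheme encrypts them), and handing the token to the server at insert time leaks more than a production scheme would allow. It only shows that the server can answer an equality lookup while holding nothing but HMAC outputs.

```python
import hashlib
import hmac
import os


def prf(key: bytes, message: bytes) -> bytes:
    """PRF instantiated as keyed HMAC-SHA-256."""
    return hmac.new(key, message, hashlib.sha256).digest()


class ToyEMMServer:
    """Server side: stores opaque labels only and never sees the search key."""

    def __init__(self) -> None:
        self.index: dict[bytes, str] = {}

    def put(self, label: bytes, doc_id: str) -> None:
        self.index[label] = doc_id

    def lookup(self, token: bytes) -> list[str]:
        # Derive label 0, 1, 2, ... from the token until one is missing.
        results = []
        counter = 0
        while True:
            label = prf(token, counter.to_bytes(8, "big"))
            if label not in self.index:
                return results
            results.append(self.index[label])
            counter += 1


def token_for(search_key: bytes, value: str) -> bytes:
    """Client side: one token per searchable value, derived from the secret key."""
    return prf(search_key, value.encode("utf-8"))


def insert(server: ToyEMMServer, search_key: bytes, value: str, doc_id: str) -> None:
    """Stateless client: ask the server how many entries exist, then append one."""
    token = token_for(search_key, value)
    counter = len(server.lookup(token))
    server.put(prf(token, counter.to_bytes(8, "big")), doc_id)


server = ToyEMMServer()
key = os.urandom(32)
insert(server, key, "901-10-4312", "doc1")
insert(server, key, "123-45-6789", "doc2")
insert(server, key, "901-10-4312", "doc3")
matches = server.lookup(token_for(key, "901-10-4312"))
```

Each lookup touches only the labels for the queried value, which is why the search cost does not grow with the total number of documents.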
Both Are Supported CSFLE is NOT being deprecated CSFLE to Queryable Encryption migration not (yet) supported CSFLE workloads should stay on CSFLE Queryable Encryption net new only Must be specified at collection creation
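Because Queryable Encryption has to be specified at collection creation, the encrypted fields are described up front. The sketch below builds such a description as a plain Python dictionary in the `encryptedFields` shape (`path`, `bsonType`, `keyId`, `queries`). The `notes` field, the random UUIDs standing in for real data key ids, and the `check_unique_keys` helper are assumptions for illustration; a real deployment uses the `_id` values of data keys from the key vault.

```python
import uuid

# Hypothetical placeholders: real key ids come from the key vault collection.
ssn_key_id = uuid.uuid4()
notes_key_id = uuid.uuid4()

encrypted_fields = {
    "fields": [
        {
            # Encrypted and queryable by equality.
            "path": "ssn",
            "bsonType": "string",
            "keyId": ssn_key_id,
            "queries": {"queryType": "equality"},
        },
        {
            # Encrypted but not queryable: no "queries" entry.
            "path": "notes",
            "bsonType": "string",
            "keyId": notes_key_id,
        },
    ]
}


def check_unique_keys(config: dict) -> None:
    """Queryable Encryption requires a distinct data key for every field."""
    key_ids = [field["keyId"] for field in config["fields"]]
    if len(key_ids) != len(set(key_ids)):
        raise ValueError("each encrypted field needs its own data key")


check_unique_keys(encrypted_fields)
```

Running a check like this before the collection is created catches a shared key early, since the configuration cannot be added to an existing collection afterwards.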
How are they the same?

Feature                                                                           CSFLE           Queryable Encryption
Highest Levels of Confidentiality and Integrity (client-side encryption)          ✔               ✔
Queryable and Non-Queryable Options                                               ✔               ✔
Authenticated AES Encryption*, 256-bit keys                                       ✔               ✔
Common Key Management Features                                                    ✔               ✔
Encrypted data is stored in field as BinData                                      ✔               ✔
Shared Library (replaces mongocryptd)                                             (Coming Soon)   ✔

*CSFLE uses CBC mode, Queryable Encryption uses CTR mode
CSFLE
● Client-side encryption
● Server is (largely) unaware
● Queryability
  ○ Equality only - Deterministic
    ■ Data leakage on low entropy fields
● Flexible key usage
  ○ unique key per field
  ○ 1 key for all fields
  ○ per-document keys
● No additional data elements

Queryable Encryption
● Client-side encryption
● Server is integral
● Queryability
  ○ New functional search index
  ○ Equality - Fully random
    ■ No snapshot leakage, even on low entropy fields
  ○ Range, prefix, suffix and substring
● Requires a unique key per field
● Additional data
  ○ 1 new field per document
    ■ __safeContent__
  ○ 3 new system collections: enxcol_.*
  ○ Do not modify any of these!
Trade-offs

                       Inserts   Find (equality)   Find (range, prefix, suffix, substring)   Storage Overhead   Frequency Leakage
FLE                    Fast      Fast              No                                        Minimal            Possibly
Queryable Encryption   Slower    Fast              Yes                                       Yes                None

Performance testing is in process
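The "Frequency Leakage" column can be made concrete with a small experiment. In this sketch, keyed HMACs stand in for ciphertexts: `deterministic_ct` models the equality pattern of deterministic encryption (same value, same output) and `randomized_ct` models randomized encryption (a fresh nonce per call). Neither function is real encryption or anything a driver exposes; both names and the sample blood-type column are assumptions chosen to show what an observer of the stored values can count.

```python
import hashlib
import hmac
import os
from collections import Counter


def deterministic_ct(key: bytes, value: str) -> str:
    """Stand-in for deterministic encryption: equal inputs give equal outputs."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def randomized_ct(key: bytes, value: str) -> str:
    """Stand-in for randomized encryption: a fresh nonce changes every output."""
    nonce = os.urandom(16)
    body = hmac.new(key, nonce + value.encode("utf-8"), hashlib.sha256).digest()
    return (nonce + body).hex()


key = os.urandom(32)
# A low-entropy field: few distinct values, many repeats.
blood_types = ["A+", "O-", "A+", "A+", "B+", "O-"]

det_counts = Counter(deterministic_ct(key, v) for v in blood_types)
rnd_counts = Counter(randomized_ct(key, v) for v in blood_types)

# What someone reading the stored column can learn without the key:
det_histogram = sorted(det_counts.values(), reverse=True)
rnd_histogram = sorted(rnd_counts.values(), reverse=True)
```

With deterministic outputs the histogram mirrors the plaintext distribution, so the most common stored value can be guessed to be the most common blood type; with randomized outputs every stored value is distinct and the histogram is flat.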
How is this different from other solutions?

Queryable Encryption, using Structured Encryption
● Ideal for real-time database operations that MongoDB customers need
● Software implementation, hardware agnostic
● Optimized for sublinear searching

Fully or Partially Homomorphic Encryption [FHE]
● Not natively offered in any major/commercial general purpose database
● For encrypted search, FHE is a poor choice due to weak performance – search speed is linear
● Queries slow down significantly as the data set grows
● Typically incurs a very heavy computational overhead
● Better suited for certain types of secure private computation - sums, statistical means, etc.

Secure Enclaves
● Requires specialized hardware, often cloud-provider proprietary
● Keys are still managed by the cloud provider - albeit in hardware
● Enclaves are not as powerful as general purpose CPUs, security guarantees unclear
Key Rotation • Rotate the entire MongoDB key vault • Previously, only new data encryption keys were protected by a new CMK • Key Vault rotation replaces all former versions of CMK seamlessly, via a single API call
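Why rotating the CMK does not require touching encrypted documents follows from envelope encryption: only the wrapped copies of the data keys change. The sketch below models that with a deliberately simple toy wrap (XOR with an HMAC-derived keystream). The toy cipher is unauthenticated and not suitable for real use, and `wrap`, `unwrap`, and `rewrap_key_vault` are hypothetical names, not driver APIs; a real deployment delegates wrapping to the KMS.

```python
import hashlib
import hmac
import os


def _keystream(cmk: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA-256 in counter mode, keyed by the CMK."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(cmk, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def wrap(cmk: bytes, dek: bytes) -> bytes:
    """Wrap a data key under the CMK (toy cipher, illustration only)."""
    nonce = os.urandom(16)
    stream = _keystream(cmk, nonce, len(dek))
    return nonce + bytes(a ^ b for a, b in zip(dek, stream))


def unwrap(cmk: bytes, wrapped: bytes) -> bytes:
    """Recover the data key from its wrapped form."""
    nonce, body = wrapped[:16], wrapped[16:]
    stream = _keystream(cmk, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


def rewrap_key_vault(vault: dict[str, bytes], old_cmk: bytes, new_cmk: bytes) -> dict[str, bytes]:
    """Rotate the CMK: every wrapped data key is re-protected, no data key changes."""
    return {key_id: wrap(new_cmk, unwrap(old_cmk, w)) for key_id, w in vault.items()}


old_cmk, new_cmk = os.urandom(32), os.urandom(32)
data_keys = {"ssn": os.urandom(96), "notes": os.urandom(96)}  # 96-byte composite keys
vault = {key_id: wrap(old_cmk, dek) for key_id, dek in data_keys.items()}
rotated = rewrap_key_vault(vault, old_cmk, new_cmk)
```

After rotation the data keys themselves are byte-for-byte identical, so every document encrypted under them remains readable without re-encryption.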
Key Migration • Changing from one key provider to another used to require decrypting and re-encrypting all of your data • A single API call now seamlessly migrates your keys from any supported key provider to another one ○ AWS - GCP ○ Local - Azure ○ GCP - KMIP • With no impact to your application or data
It takes a village Andrew Anna Asya Bernie Boris Clyde Craig Dave Davi Divjot Dmitry Elizabeth Emily Eric Erwin Esha Ezra Jacob Jeff Jesse Judah Julie Kaitlin Katia Kevin Mark Mat Nathan Naomi Nick Oz Pramod Ravind Rachael Rachelle Ross Sam Sara Sergei Shane Shreyas Spencer Vincent