Slide 1

Slide 1 text

Product Talk: Queryable Encryption
World 2022
Kenn White, Security Principal
Cynthia Braund, Sr Product Manager

Slide 2

Slide 2 text

Agenda
● Security 101
● Introducing Queryable Encryption
● Our journey and roadmap
● What is this magic?
● CSFLE vs Queryable Encryption
● How can I try it out?
● Key Management Enhancements

Slide 3

Slide 3 text

Security 101

Slide 4

Slide 4 text

Moving and storing data: most databases have it covered
In-flight, over the network
● TLS Encryption
● Data is decrypted when it's received on the DB server
Reminder: TLS is to protect against network eavesdropping

Slide 5

Slide 5 text

Moving and storing data: most databases have it covered
At-rest, on disk
● Volume Encryption
● Storage Engine Encryption
● Data is decrypted when the DB starts up
Reminder: At-rest encryption is (mostly) to protect non-running databases & backups

Slide 6

Slide 6 text

But what about data here?
Network
Disk
In-use, in memory
Very few practical solutions exist to protect data in-use

Slide 7

Slide 7 text

In-use, in memory
Data is in plaintext while it's being processed by the database
Data is vulnerable to insider access and active database breaches:
● Authorized and compromised administrators, DBAs & privileged users
● RAM scraping
● Process inspection
● Cloud providers

Slide 8

Slide 8 text

It's a trust problem

Slide 9

Slide 9 text

In-use, in memory
Data is in plaintext while it's being processed by the database
Data is vulnerable to insider access and active database breaches:
● Authorized and compromised administrators, DBAs & users
● RAM scraping
● Process inspection
This is why we built Client-Side Field Level Encryption!

Slide 10

Slide 10 text

Option #1 – No encryption of data from client side
Query to find ssn = “901-10-4312”
Stored records (1 million records total, 10 records with ssn = “901-10-4312”):
{payer: “Acme Corp”, ssn: “901-10-4312”}
{payer: “Jones Inc”, ssn: “901-10-4312”}
…
{payer: “Baker Co”, ssn: “901-10-4312”}
10 records fetched with ssn = “901-10-4312”
● Fast querying
● But data is not secure in-use

Slide 11

Slide 11 text

Option #2 – Using popular cloud SDK client-side encryption
Query to find ssn = “901-10-4312”
Plaintext records (1 million records total, 10 records with ssn: “901-10-4312”):
{payer: “Acme Corp”, ssn: “901-10-4312”}
{payer: “Jones Inc”, ssn: “901-10-4312”}
…
{payer: “Baker Co”, ssn: “901-10-4312”}
Stored (encrypted) records (1 million records total, 10 randomly encrypted fields):
{payer: “Acme Corp”, ssn: “3DwK354xz”}
{payer: “Jones Inc”, ssn: “23awW124xz”}
…
{payer: “Baker Co”, ssn: “75fdwswed”}
All 1 million records fetched
● Client-side processing & decryption
● Filtering of records on the client side (performance hit)
10 records fetched with ssn = “901-10-4312”
Problem: You can't actually directly search encrypted fields. Not feasible for many use cases.

Slide 12

Slide 12 text

Introducing Queryable Encryption

Slide 13

Slide 13 text

Queryable Encryption: MongoDB’s Approach
● Encrypt the sensitive data (fields)
● Easy development cycle
● No crypto experience required
● Encrypted throughout the data lifecycle
● Rich expressive queries
● MongoDB is the only platform to implement a fast searchable encryption scheme
● Server-side processing of encrypted data
● Server does not know anything about the data
Query to find ssn = “901-10-4312”
Stored records (1 million records total, 10 randomly encrypted fields):
{payer: “Acme Corp”, ssn: “3DwK354xz”}
{payer: “Jones Inc”, ssn: “23awW124xz”}
…
{payer: “Baker Co”, ssn: “75fdwswed”}
10 records fetched with ssn = “901-10-4312”

Slide 14

Slide 14 text

Let’s look closer
Encrypted fields are always stored, transmitted, processed, and retrieved as ciphertext, including queries
[Diagram, steps 1 to 6]
Query from an authenticated client: db.billing.find( { ssn: "901-10-4312" } )
MongoDB Driver
Customer Provisioned Key Provider: Cloud Provider KMS, On-prem HSM/Key Service, Cross-cloud KMS
Encrypted search key: "er493grtee4erw"
Plaintext document: { payer: "Jones Glee", cardNum: "2223-0031-2200-3222", email: "[email protected]", phone: "+1-212-555-1234", ssn: "901-10-4312" }
Encrypted document: { payer: "Jones Glee", cardNum: "r6EaUcgZ41Gerrwd", email: "iu233oh35sdso743", phone: "oR72CW4WferrSE3j", ssn: "d76b3ad038c0e0ed" }
Other ciphertext values shown: "6fbbb3f8c3a9f7a", "f72a9a1103d88b6"

Slide 15

Slide 15 text

Use Cases
Industry: Financial Services
Bank application needs to find transactions using a range of dates or dollar amounts for fraud detection
Industry: Human Resources
HR system allows searching for employees by the last 4 digits of their social security number
Industry: Health Care
Customer support agents need to find patient records by searching for the first few characters of their name

Slide 16

Slide 16 text

Queryable Encryption – Key Benefits
Rich querying on encrypted data
● Run expressive queries like range, equality, prefix, suffix, substring, and more on encrypted data
Ground-breaking query technology, standards-based cryptography
● Based on strong, standards-based cryptographic primitives
End-to-end fully randomized encryption
● Data never exists in the clear outside of the client
● Dramatically reduces attack surface
Faster app development
● No crypto experience required
● Intuitive and easy for developers to set up and use
Strong technical controls for critical data privacy use cases
● Meet the strictest data privacy requirements for confidentiality on security critical workloads
Reduce institutional risk
● Confident in storing and processing your sensitive workloads in MongoDB Atlas (Cloud)

Slide 17

Slide 17 text

Our journey & roadmap

Slide 18

Slide 18 text

Our Journey & Roadmap
● 2019: Client-Side Field Level Encryption (CSFLE). Equality search on Deterministic encryption
● 2021: Aroki Systems acquisition. Pioneers in Encrypted Search (Tarik Moataz, Seny Kamara)
● May 2022: Formation of Advanced Cryptography Research group. Seny Kamara, Tarik Moataz, and a team of PhD cryptography researchers
● June 2022: Queryable Encryption Preview. Structured Encryption core functionality; Equality search on randomized encryption
● Post 6.0: Queryable Encryption v1.1. Addition of Range query capabilities
● Post 6.0: Queryable Encryption v1.2. Addition of prefix, suffix, substring query capabilities
● Future: New privacy-enhancing cryptography capabilities

Slide 19

Slide 19 text

What is in 6.0 Public Preview?
● Crypto framework: Foundational work that enables equality and future query types
● Equality: Equality comparisons on randomly encrypted data
● Compass: Queryable Encryption Configuration, Decryption

Slide 20

Slide 20 text

What is a Public Preview?
● Available with 6.0 RC release
○ Evaluation only
○ May be breaking changes
○ Not recommended for production workloads

Slide 21

Slide 21 text

Post 6.0 - Additional Query Types
● Range
● Prefix/Suffix
● Substring

Slide 22

Slide 22 text

What is this magic?

Slide 23

Slide 23 text

Security Model
Multi-Snapshot-Secure
○ A snapshot adversary has (possibly successive) point-in-time access to the entire memory & disk of the database server
○ At that instant, adversary can access the entire DB, any keys stored in memory, all CPU state including L1-3 cache, and all logs

Slide 24

Slide 24 text

Security Model
Formal security guarantees
○ Encrypted fields of user documents are verifiably CCA-secure (secure against chosen ciphertext attacks)
■ Ciphertexts don't reveal information about the plaintext, beyond encrypted document size…
■ …even to adversaries that can adaptively query an encryption oracle
○ Encrypted indexes are verifiably adaptively multi-snapshot secure

Slide 25

Slide 25 text

How does it work?
The boring part:
○ Document content encrypted with standard AEAD encryption
■ Encrypt-then-MAC authenticated encryption
○ AES-CTR-256 with HMAC-SHA256
○ Document encryption/decryption only happens on the client, in the application
○ Top-level 96 byte composite user key on client comprising:
■ 256-bit AES-CTR key
■ 256-bit HMAC key
■ 256-bit key for encrypted search operations
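The key layout and the Encrypt-then-MAC ordering above can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not MongoDB's implementation: the order of the subkeys inside the 96 bytes, the function names, and the 16-byte IV are all hypothetical, and the AES-CTR step is omitted (the ciphertext is treated as opaque bytes) because the standard library has no AES.

```python
import hashlib
import hmac
import secrets

COMPOSITE_KEY_LEN = 96  # three 256-bit (32-byte) subkeys
SUBKEY_LEN = 32


def split_composite_key(composite_key: bytes):
    """Split a 96-byte composite key into (encryption, MAC, search) subkeys.

    The ordering of the subkeys is an assumption made for this sketch.
    """
    if len(composite_key) != COMPOSITE_KEY_LEN:
        raise ValueError("composite key must be exactly 96 bytes")
    enc_key = composite_key[0:SUBKEY_LEN]
    mac_key = composite_key[SUBKEY_LEN:2 * SUBKEY_LEN]
    search_key = composite_key[2 * SUBKEY_LEN:3 * SUBKEY_LEN]
    return enc_key, mac_key, search_key


def mac_ciphertext(mac_key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """Encrypt-then-MAC: the tag covers the IV and the ciphertext."""
    return hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()


def verify_before_decrypt(mac_key: bytes, iv: bytes, ciphertext: bytes, tag: bytes) -> bool:
    """Check the tag in constant time; decrypt only if this returns True."""
    expected = mac_ciphertext(mac_key, iv, ciphertext)
    return hmac.compare_digest(expected, tag)


composite_key = secrets.token_bytes(COMPOSITE_KEY_LEN)
enc_key, mac_key, search_key = split_composite_key(composite_key)

iv = secrets.token_bytes(16)
ciphertext = secrets.token_bytes(24)  # placeholder standing in for AES-CTR output
tag = mac_ciphertext(mac_key, iv, ciphertext)
```

Because the tag is computed over the ciphertext rather than the plaintext, a tampered ciphertext is rejected before any decryption is attempted.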

Slide 26

Slide 26 text

How does it work?
The boring part:
○ Key management
■ Envelope encryption protects composite user keys ("field keys")
■ Database can never access raw key material
■ Backing vault key management: cloud KMS, KMIP/HSM, Hashicorp Vault, custom key service, local key…

Slide 27

Slide 27 text

How does it work?
The innovation:
○ New type of functional search index is introduced
○ Based on a novel Structured Encryption construction
■ client is stateless; database maintains (blinded) state
■ distributed, highly available, highly scalable by design
■ scheme is robust to client failures, high contention, dropped sessions
■ index data structures are Encrypted Multi-Maps (EMMs)
○ a type of reverse or inverted index of encrypted label/tuple pairs
○ labels are pseudorandom function (PRF) evaluations
○ fast, efficient EMM lookups
○ PRFs here == keyed HMACs (HMAC-SHA-256)
Reminder: HMACs are secret key digests/tags; the database does not have the key
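The idea of an inverted index whose labels are HMAC evaluations can be illustrated with a toy. This is a hedged sketch of a textbook-style encrypted multi-map, not the construction MongoDB ships: the class and function names are hypothetical, document ids are stored unencrypted for brevity, and, unlike the stateless client described on the slide, this toy client keeps a per-value counter locally.

```python
import hashlib
import hmac
import secrets
from collections import defaultdict


def prf(key: bytes, message: bytes) -> bytes:
    """Pseudorandom function instantiated as HMAC-SHA-256."""
    return hmac.new(key, message, hashlib.sha256).digest()


def search_token(search_key: bytes, field: str, value: str) -> bytes:
    """Per-(field, value) token; only a holder of the search key can derive it."""
    return prf(search_key, f"{field}={value}".encode("utf-8"))


class ToyServerIndex:
    """Server side: maps pseudorandom labels to document ids and holds no key."""

    def __init__(self):
        self.entries = {}

    def put(self, label: bytes, doc_id: str) -> None:
        self.entries[label] = doc_id

    def lookup(self, token: bytes):
        """Walk counter-derived labels until the first miss."""
        matches = []
        counter = 0
        while True:
            label = prf(token, counter.to_bytes(8, "big"))
            if label not in self.entries:
                return matches
            matches.append(self.entries[label])
            counter += 1


class ToyClient:
    """Client side. Assumption: counters are kept locally, unlike the real scheme."""

    def __init__(self, search_key: bytes, server: ToyServerIndex):
        self.search_key = search_key
        self.server = server
        self.counters = defaultdict(int)

    def insert(self, field: str, value: str, doc_id: str) -> None:
        token = search_token(self.search_key, field, value)
        count = self.counters[token]
        self.server.put(prf(token, count.to_bytes(8, "big")), doc_id)
        self.counters[token] = count + 1

    def find(self, field: str, value: str):
        return self.server.lookup(search_token(self.search_key, field, value))


server = ToyServerIndex()
client = ToyClient(secrets.token_bytes(32), server)
client.insert("ssn", "901-10-4312", "doc-1")
client.insert("ssn", "901-10-4312", "doc-2")
client.insert("ssn", "123-45-6789", "doc-3")
matches = client.find("ssn", "901-10-4312")
```

In this toy the lookup cost grows with the number of matching entries rather than with the size of the index, and the server only ever handles 32-byte pseudorandom labels.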

Slide 28

Slide 28 text

CSFLE vs Queryable Encryption

Slide 29

Slide 29 text

Both Are Supported
● CSFLE is NOT being deprecated
● CSFLE to Queryable Encryption migration not (yet) supported
● CSFLE workloads should stay on CSFLE
● Queryable Encryption net new only
● Must be specified at collection creation

Slide 30

Slide 30 text

How are they the same?
Feature | CSFLE | Queryable Encryption
Highest Levels of Confidentiality and Integrity (client-side encryption) | ✔ | ✔
Queryable and Non-Queryable Options | ✔ | ✔
Authenticated AES Encryption*, 256-bit keys | ✔ | ✔
Common Key Management Features | ✔ | ✔
Encrypted data is stored in field as BinData | ✔ | ✔
Shared Library (replaces mongocryptd) | (Coming Soon) | ✔
*CSFLE uses CBC mode, Queryable Encryption uses CTR mode

Slide 31

Slide 31 text

CSFLE
● Client-side encryption
● Server is (largely) unaware
● Queryability
○ Equality only - Deterministic
■ Data leakage on low entropy fields
● Flexible key usage
○ unique key per field
○ 1 key for all fields
○ per-document keys
● No additional data elements
Queryable Encryption
● Client-side encryption
● Server is integral
● Queryability
○ New functional search index
○ Equality - Fully random
■ No snapshot leakage, even on low entropy fields
○ Range, prefix, suffix and substring
● Requires a unique key per field
● Additional data
○ 1 new field per document
■ __safeContent__
○ 3 new system collections: enxcol_.*
○ Do not modify any of these!
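The leakage difference between deterministic and fully random equality can be seen with a small experiment. This is only an illustrative sketch: keyed HMAC tags stand in for ciphertexts (they are not encryption and cannot be decrypted), and the sample values and helper name are hypothetical.

```python
import hashlib
import hmac
import secrets
from collections import Counter

key = secrets.token_bytes(32)
ssns = ["901-10-4312"] * 3 + ["123-45-6789"]

# Stand-in for deterministic encryption: equal plaintexts give equal outputs.
deterministic = [
    hmac.new(key, s.encode("utf-8"), hashlib.sha256).hexdigest() for s in ssns
]


def randomized_tag(plaintext: str) -> str:
    """Stand-in for randomized encryption: a fresh nonce per stored value."""
    nonce = secrets.token_bytes(16)
    digest = hmac.new(key, nonce + plaintext.encode("utf-8"), hashlib.sha256)
    return nonce.hex() + digest.hexdigest()


randomized = [randomized_tag(s) for s in ssns]

# What an observer of the stored values can count without any key.
det_histogram = sorted(Counter(deterministic).values())
rand_histogram = sorted(Counter(randomized).values())
```

With deterministic outputs the observer sees that one value occurs three times, which is the frequency leakage on low entropy fields; with per-value randomness every stored value looks distinct.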

Slide 32

Slide 32 text

No content

Slide 33

Slide 33 text

Trade-offs
Trade-off | FLE | Queryable Encryption
Inserts | Fast | Slower
Find (equality) | Fast | Fast
Find (range, prefix, suffix, substring) | No | Yes
Storage Overhead | Minimal | Yes
Frequency Leakage | Possibly | None
Performance testing is in process

Slide 34

Slide 34 text

How is this different from other solutions?
Queryable Encryption, using Structured Encryption
● Ideal for real-time database operations that MongoDB customers need
● Software implementation, hardware agnostic
● Optimized for sublinear searching
Fully or Partially Homomorphic Encryption [FHE]
● Not natively offered in any major/commercial general purpose database
● For encrypted search, FHE is a poor choice due to weak performance – search speed is linear
● Queries slow down significantly as the data set grows
● Typically incurs a very heavy computational overhead
● Better suited for certain types of secure private computation - sums, statistical means, etc
Secure Enclaves
● Requires specialized hardware, often cloud-provider proprietary
● Keys are still managed by the cloud provider - albeit in hardware
● Enclaves are not as powerful as general purpose CPUs, security guarantees unclear

Slide 35

Slide 35 text

How can I try it out?

Slide 36

Slide 36 text

Components and licensing
● Core Cryptography Library (drivers, server)
○ Atlas, Enterprise Advanced, or Community database
○ Client-side core library (libmongocrypt) & drivers all Apache licensed
○ Encourages peer review & feedback from research community
○ No black-box/proprietary crypto
● Automatic Encryption
○ Atlas & Enterprise Advanced
○ Enables encryption without app changes, vs helper methods
○ Shared library crypt_shared replaces cryptd (mongocryptd) package
○ Shared library package available on Enterprise downloads page

Slide 37

Slide 37 text

What's needed to try the Queryable Encryption Preview?
● 6.0 Server (rc8+)
● Queryable Encryption-aware drivers available now:
○ Node.js, Java (Sync & Async), Go, Python, C, C# .NET, Ruby, PHP
○ Coming weeks: C++, Scala
○ On roadmap: Rust, Swift
● Automatic encryption (via crypt_shared library)
○ Atlas (including free tier!)
○ Enterprise Advanced
● Explicit encryption (via ClientEncryption object)
○ Atlas or Enterprise Advanced
○ Community

Slide 38

Slide 38 text

Packages
○ Amazon Linux 2 / Amazon Linux 2 ARM 64
○ Debian 10.0 / 11.0
○ macOS / macOS ARM 64 (Mojave 10.14+)
○ RedHat / CentOS 7.0
○ RedHat / CentOS 7.2 s390x
○ RedHat / CentOS 8.0
○ RedHat / CentOS 8.1 ppc64le
○ RedHat / CentOS 8.2 ARM 64
○ RedHat / CentOS 8.3 s390x
○ SUSE 12 / 15
○ Ubuntu 18.04 / 18.04 ARM 64
○ Ubuntu 20.04 / 20.04 ARM 64
○ Windows 10 / Server 2016 / Server 2019

Slide 39

Slide 39 text

Key Management Enhancements (Coming Soon)

Slide 40

Slide 40 text

Key Rotation
• Rotate the entire MongoDB key vault
• Previously, only new data encryption keys were protected by a new CMK
• Key Vault rotation replaces all former versions of CMK seamlessly, via a single API call

Slide 41

Slide 41 text

Key Migration
• Changing from one key provider to another used to require decrypting and re-encrypting all of your data
• A single API call now seamlessly migrates your keys from any supported key provider to another one
○ AWS - GCP
○ Local - Azure
○ GCP - KMIP
• With no impact to your application or data

Slide 42

Slide 42 text

Custom Key Management
• Customers can provide the Data Encryption Keys used
• Meets strict compliance requirements for key generation in an HSM
• API

Slide 43

Slide 43 text

It takes a village
Andrew, Anna, Asya, Bernie, Boris, Clyde, Craig, Dave, Davi, Divjot, Dmitry, Elizabeth, Emily, Eric, Erwin, Esha, Ezra, Jacob, Jeff, Jesse, Judah, Julie, Kaitlin, Katia, Kevin, Mark, Mat, Nathan, Naomi, Nick, Oz, Pramod, Ravind, Rachael, Rachelle, Ross, Sam, Sara, Sergei, Shane, Shreyas, Spencer, Vincent

Slide 44

Slide 44 text

Thank you!