Slide 1

Slide 1 text

The Edge of Developed Practice in Encrypted Search Enigma Security 2023 Kenn White MongoDB

Slide 2

Slide 2 text

/about work: security principal @mongodb focus: applied cryptography distributed systems high-value target workloads usable security online: [email protected] [email protected]

Slide 3

Slide 3 text

Popular misconceptions around encryption

Slide 4

Slide 4 text

Popular misconceptions around encryption The true security guarantees of... network encryption - TLS/HTTPS, "data in-transit" storage encryption - volume/FDE, "data at-rest" (we rarely are) database encryption - what does that even mean?

Slide 5

Slide 5 text

A few examples Isn't this a solved problem? It absolutely isn't. Hasn't [X] database had transparent encryption for over a decade? Yes, server-side transparent encryption. With delegated keys. What about secure hardware enclaves, don't they take care of this? Popular solutions like Intel SGX have had… challenges. Plaintext exists inside the enclave, possibly entire databases. First principles or Pinky Promise as a Service? Popular misconceptions around encrypted search

Slide 6

Slide 6 text

This is a shared lexicon issue

Slide 7

Slide 7 text

What developers, CIOs, and cryptographers mean by "encrypted search" or "encrypted databases" is often completely different. This is a shared lexicon issue

Slide 8

Slide 8 text

What developers, CIOs, and cryptographers mean by "encrypted search" or "encrypted databases" is often completely different. Here, we use encrypted search in the way it's used by cryptographers: end-to-end encrypted data that can be privately queried. Queries can be performed against a database server holding the encrypted data but which does not have access to the keys. This is a shared lexicon issue

Slide 9

Slide 9 text

The trust problem

Slide 10

Slide 10 text

In a database, what are the true security guarantees against... "authorized" operators? privileged (server-side) users? DBAs? system admins? custodians of backups? The trust problem

Slide 11

Slide 11 text

The trust problem

Slide 12

Slide 12 text

What are the true security guarantees of end-to-end encryption? For secure messaging apps, the claims are (mostly?) understood. But what defines an "end"? What is the threat model and what are its actual guarantees? Where do the data flow? Who holds the keys? The turtles problem: who holds the key to the keys? Who can see the secrets? The trust problem

Slide 13

Slide 13 text

The trust problem “A warrant is not a cryptographic primitive.” -- Matt Blaze

Slide 14

Slide 14 text

In the context of a database… What can an attacker discover? What information does the database leak? Are leaks exploitable, in what context, and over what period of time? Can this leakage be formally described? (spoiler: yes, we believe so) Digging deeper…

Slide 15

Slide 15 text

Preliminaries

Slide 16

Slide 16 text

You probably shouldn't freelance cryptography

Slide 17

Slide 17 text

Security models should be as rigorous as possible

Slide 18

Slide 18 text

DIY + "fail fast" is probably not the best approach

Slide 19

Slide 19 text

Developer usability

Slide 20

Slide 20 text

Then Hacker News weighed in…

Slide 21

Slide 21 text

Dirty secrets of distributed (cloud) databases

Slide 22

Slide 22 text

What we don't talk about much Things break. A lot. Networks are unreliable. Robust replication is hard. Partitions are a problem. Sometimes things just… die. Worse, sometimes client apps only semi-die. Dirty secrets of distributed (cloud) databases

Slide 23

Slide 23 text

Stepping back a little…

Slide 24

Slide 24 text

A simple question: why do we use a database? What are some of the useful things a database does? Searching! At what scale? Horizontal or vertical? How about concurrency? Stepping back a little…

Slide 25

Slide 25 text

What types of searches are we talking about?

Slide 26

Slide 26 text

The baseline: simple matches on encrypted data find all records with: DOB == "1989-Dec-13" What types of searches are we talking about?

Slide 27

Slide 27 text

The baseline: simple matches on encrypted data find all records with: DOB == "1989-Dec-13" The dream: rich, expressive queries on encrypted data find all records with: DOB ≥ 1920 AND < 2000 find all last names starting with "Rodrig" find all records with SSN ending with "7192" find accounts with credit card # ending in "8210" find all complaints containing "slow" What types of searches are we talking about?

Slide 28

Slide 28 text

What types of data are we talking about?

Slide 29

Slide 29 text

What types of data are we talking about? The kinds of data users care about: strings dates numbers decimal data types (e.g., precision financial values) objects documents (JSON)

Slide 30

Slide 30 text

View from the front lines

Slide 31

Slide 31 text

View from the front lines Scale: global managed distributed database service ● 2.5M active clusters ● 200+ data centers around the world ● 8 major cloud providers deployments range from tiny developer test/sandbox instances to PB-scale global multi-cloud sharded clusters 100M+ cluster node certs generated

Slide 32

Slide 32 text

Context: open source NoSQL document database probably not the kind of documents most here think about native JSON documents, including complex sub-documents distributed architecture by design vibrant developer community 260M+ downloads to date 1.5M online University students 7M+ developers globally View from the front lines

Slide 34

Slide 34 text

“What's a ‘JSON document’?” View from the front lines

Slide 35

Slide 35 text

No content

Slide 36

Slide 36 text

Journey from academia to global-scale workloads

Slide 37

Slide 37 text

Journey from academia to global-scale workloads A story about rethinking research models of databases prototypes challenges of large, messy networks removing developer pain & false choices apps in every major modern programming language growing an in-house advanced crypto R&D team formal analysis & proofs of real-world systems

Slide 38

Slide 38 text

Journey from academia to global-scale workloads A story about tradeoffs & compromises real-world performance security properties that are explicitly not guaranteed everyone has Strong Thoughts on key management usability, usability, usability things we got wrong lessons learned

Slide 39

Slide 39 text

The encrypted search problem

Slide 40

Slide 40 text

The encrypted search problem Approaches & cryptographic schemes: Tokenization Property-Preserving Encryption (PPE) Oblivious RAM (ORAM) Fully Homomorphic Encryption (FHE) Functional encryption Garbled RAM Structured Encryption (STE) Are any of these suitable to general purpose databases?

Slide 42

Slide 42 text

In the beginning…

Slide 43

Slide 43 text

Ya’qub al-Kindi and the golden era of Arab cryptanalysis In the beginning…

Slide 44

Slide 44 text

In the beginning… Ya’qub al-Kindi and the golden era of Arab cryptanalysis ■ 9th century scholar born ca. 805 in Iraq, educated in Baghdad ■ Polymath philosopher, scientist, mathematician, musician ■ Authored manuscript Treatise on Decrypting Cryptographic Messages ■ Formal analysis of core "analytic principles" / letter frequencies ■ Methods & attacks developed by al-Kindi 1,100+ years ago still apply to encrypted systems today https://plato.stanford.edu/entries/al-kindi/ https://muslimheritage.com/wp-content/uploads/2018/05/cryptology01.pdf https://membres-ljk.imag.fr/Bernard.Ycart/mel/hm/AlKadi_cryptology.pdf https://archive.is/R2vvu
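The claim that frequency-based attacks still apply can be illustrated with a minimal sketch. It assumes a hypothetical column protected only by deterministic tokens (equal plaintexts give equal tokens) and an attacker who knows nothing but a public ranking of how common each value is. The names `det_token` and `guess_by_frequency`, the key, and the sample data are invented for illustration and are not taken from the talk.

```python
import hashlib
import hmac
from collections import Counter


def det_token(key: bytes, value: str) -> str:
    """Deterministic token: equal plaintexts always map to equal tokens."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:16]


def guess_by_frequency(tokens, public_ranking):
    """Pair the most common tokens with the most common expected values."""
    ranked = [tok for tok, _ in Counter(tokens).most_common()]
    return dict(zip(ranked, public_ranking))


key = b"\x01" * 32  # hypothetical secret; the attacker never sees it
column = ["O+"] * 6 + ["A+"] * 4 + ["B+"] * 2 + ["AB-"]
tokens = [det_token(key, v) for v in column]

# The attacker sees only the tokens plus a public ranking of value frequency.
guesses = guess_by_frequency(tokens, ["O+", "A+", "B+", "AB-"])
recovered = [guesses[t] for t in tokens]
```

In this toy dataset the token counts line up exactly with the public ranking, so every value is recovered without the key. Real data is noisier, but the principle is the same one al-Kindi described for letter frequencies.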

Slide 45

Slide 45 text

(jumping ahead 1,174 years)

Slide 46

Slide 46 text

A modest request…

Slide 47

Slide 47 text

A modest request…

Slide 48

Slide 48 text

A year later…

Slide 49

Slide 49 text

A year later…

Slide 50

Slide 50 text

The next spring, a conversation in 2020 over coffee

Slide 51

Slide 51 text

And thus began the journey of Queryable Encryption

Slide 52

Slide 52 text

Design requirements

Slide 53

Slide 53 text

Design requirements Engineering & app developer needs queries that are actually useful - expressiveness legacy friendly (where "legacy" = 2019-2022) multiple clients concurrently query/writing same resource support existing replication & sharding protocols stateless & no client filtering efficiency; sub-linear for all operations

Slide 54

Slide 54 text

High level flow

Slide 55

Slide 55 text

High level flow (diagram, steps 1-6)
Query from an authenticated client: db.billing.find( { ssn: "901-10-4312" } )
Driver
Provisioned Key Provider: Cloud Provider KMS, On-prem HSM/Key Service, Cross-cloud KMS
Encrypted search tokens: "er493grt4erw..."
Document with encrypted fields: { name: "Jones Glee", cardNum: "r6EaUcgZ41Gerrwd", email: "iu233oh35sdso743", phone: "oR72CW4WferrSE3j", ssn: "d76b3ad038c0e0ed" }
Additional ciphertext values shown: "6fbbb3f8c3a9f7a", "f72a9a1103d88b6"
Plaintext document: { name: "Jones Glee", cardNum: "2223-0031-2200-3222", email: "[email protected]", phone: "+1-212-555-1234", ssn: "901-10-4312" }
Encrypted fields are always stored, transmitted, processed, and retrieved as ciphertext, including queries

Slide 56

Slide 56 text

How does it work?

Slide 57

Slide 57 text

The boring part: Document content encrypted with standard AEAD encryption Encrypt-then-MAC authenticated encryption AES-256 with HMAC-SHA256 document encryption/decryption only happens in the client app top-level 96 byte composite user key on client comprising: 256-bit AES-CTR key 256-bit HMAC key 256-bit key for encrypted search operations How does it work?
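A minimal sketch of the key layout and the Encrypt-then-MAC ordering described on this slide, using only the Python standard library. Several things here are assumptions: the order of the three subkeys inside the 96-byte composite key, the 16-byte IV, and the helper names `split_composite_key`, `seal`, and `open_sealed`. The AES-CTR step itself is left out because the standard library has no AES, so the ciphertext is a placeholder byte string.

```python
import hashlib
import hmac
import secrets

KEY_LEN = 32  # 256 bits
TAG_LEN = 32  # HMAC-SHA256 output size


def split_composite_key(composite: bytes):
    """Split a 96-byte composite key into three 256-bit subkeys.

    The ordering (encryption, MAC, search) is an assumption for illustration.
    """
    if len(composite) != 3 * KEY_LEN:
        raise ValueError("composite key must be exactly 96 bytes")
    return (composite[:KEY_LEN],
            composite[KEY_LEN:2 * KEY_LEN],
            composite[2 * KEY_LEN:])


def seal(mac_key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """Encrypt-then-MAC: the tag covers the IV and ciphertext, not plaintext."""
    tag = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    return iv + ciphertext + tag


def open_sealed(mac_key: bytes, blob: bytes, iv_len: int = 16):
    """Check the tag in constant time before any decryption is attempted."""
    body, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return body[:iv_len], body[iv_len:]


composite = secrets.token_bytes(96)
enc_key, mac_key, search_key = split_composite_key(composite)
iv = secrets.token_bytes(16)
ciphertext = b"opaque AES-CTR output"  # placeholder bytes, not real ciphertext
blob = seal(mac_key, iv, ciphertext)
```

Using separate keys for encryption, authentication, and search means that a search token derived from `search_key` reveals nothing about the keys that protect document contents.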

Slide 58

Slide 58 text

The boring part: Key management envelope encryption protects composite user keys ("field keys") database can never access raw key material backing vault key management: cloud KMS, KMIP/HSM, Hashicorp Vault, custom key service, local key… How does it work?

Slide 59

Slide 59 text

The innovation: introduces a new type of functional search index based on a novel Structured Encryption construction client is stateless; database maintains (blinded) state distributed, highly available, highly scalable by design scheme is robust to client failures, high contention, dropped sessions How does it work?

Slide 60

Slide 60 text

The innovation: based on a novel Structured Encryption construction index data structures are Encrypted Multi-Maps a type of reverse or inverted index of encrypted label/tuple pairs labels are pseudorandom function (PRF) evaluations fast, efficient EMM lookups PRFs here == keyed HMACs (HMAC-SHA-256) How does it work?
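The idea of an encrypted multi-map whose labels are PRF evaluations can be sketched as a toy. This is a textbook-style construction and not the scheme from the talk: the per-value subkey derivation, the counter encoding, the XOR masking of document ids, and all function names are assumptions made for illustration. The only detail taken from the slide is that the PRF is HMAC-SHA-256.

```python
import hashlib
import hmac

ID_LEN = 8  # hypothetical fixed-width document ids


def prf(key: bytes, msg: bytes) -> bytes:
    """PRF instantiated as HMAC-SHA-256, as on the slide."""
    return hmac.new(key, msg, hashlib.sha256).digest()


def xor(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_token(key: bytes, value: str):
    """Client side: derive the per-value label key and mask key."""
    v = value.encode()
    return prf(key, b"label|" + v), prf(key, b"mask|" + v)


def build_emm(key: bytes, index: dict) -> dict:
    """Client side: turn {value: [doc ids]} into {label: masked doc id}."""
    emm = {}
    for value, doc_ids in index.items():
        k_label, k_mask = make_token(key, value)
        for counter, doc_id in enumerate(doc_ids):
            c = counter.to_bytes(4, "big")
            emm[prf(k_label, c)] = xor(doc_id.to_bytes(ID_LEN, "big"),
                                       prf(k_mask, c))
    return emm


def lookup(emm: dict, token) -> list:
    """Server side: walk counters until a label is missing."""
    k_label, k_mask = token
    results, counter = [], 0
    while True:
        c = counter.to_bytes(4, "big")
        masked = emm.get(prf(k_label, c))
        if masked is None:
            return results
        results.append(int.from_bytes(xor(masked, prf(k_mask, c)), "big"))
        counter += 1


key = b"k" * 32  # hypothetical search key held only by the client
emm = build_emm(key, {"1989-Dec-13": [7, 42], "1990-Jan-01": [9]})
matches = lookup(emm, make_token(key, "1989-Dec-13"))
```

Each lookup costs one dictionary probe per matching document, and the server holding `emm` sees only pseudorandom labels until it is handed a token. In this toy the token also lets the server unmask the matching ids, which is one concrete example of the kind of leakage the talk says should be formally described.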

Slide 61

Slide 61 text

Queryable Encryption core scheme: OST-1

Slide 62

Slide 62 text

Queryable Encryption core scheme: OST-1

Slide 63

Slide 63 text

About those JSON documents…

Slide 64

Slide 64 text

No content

Slide 65

Slide 65 text

No content

Slide 66

Slide 66 text

Trade-Offs

Slide 67

Slide 67 text

Storage Impact - preliminary* metrics new auxiliary collections for every user collection still evaluating the storage:performance impact collection sizes can be decreased via compaction One additional field per document, negligible size Test example: 3.4M documents → 6.3GB user collection Punchline: Expect at least 2-4X additional storage cost vs unencrypted Trade-Offs: Storage *performance analysis/optimization for production-sized test workloads is ongoing

Slide 68

Slide 68 text

Performance Impact - preliminary* metrics Reads - very fast find() queries: millisecond results on 5M+ document test DBs Writes - mixed low-volume (<100) insert() & update(): negligible impact high volume/batch (1K+) inserts(): still analyzing throughput bulk writes via insertMany() in test Trade-Offs: Performance *performance analysis/optimization for production-sized test workloads is ongoing

Slide 69

Slide 69 text

No content

Slide 70

Slide 70 text

QE Public Preview

Slide 71

Slide 71 text

Goals beta-level stability & behavior not for production use (literally on every page of docs & tutorials) there will be breaking changes (ibid) experimental protocols / data format changes will be frequent primary focus is usability & features expect architectural changes user experience: where can we reduce or eliminate choices? constant march towards more opinionated UX & safer defaults QE Public Preview

Slide 72

Slide 72 text

Open source Apache licensed: Core cryptography library, libmongocrypt All drivers [Node.js, Java (Sync & Async), Go, Python, C/C++, C# .NET, Ruby, PHP, Rust, Scala] Compass & mongosh All commits are public QE support in driver (client) releases is beta / experimental; releases can be 5+ months behind nightly branches QE Public Preview

Slide 73

Slide 73 text

Packages Amazon Linux 2 x86, ARM 64 Debian 10.0 / 11.0 macOS / macOS ARM 64 (Mojave 10.14+) RedHat / CentOS 8.0 - 8.3 x86, ppc64le, ARM 64, s390x Z-Series mainframe SUSE 12 / 15 x86 Ubuntu 18.04 / 20.04 x86, ARM 64 Windows 10 / Server 2016 / Server 2019

Slide 74

Slide 74 text

Seny Kamara Tarik Moataz Cynthia Braund Core Eng & Server Security Drivers team Query team Docs & University tutorials teams (we ❤ you!) Special thanks

Slide 75

Slide 75 text

It takes a village Andrew Anna Asya Bernie Boris Clyde Craig Colby Cynthia Dave Davi Divjot Dmitry Dmitry Duran Elizabeth Emily Eric Erwin Esha Ezra Jacob Jeff Jeremy Jesse Judah Julie Julias Kaitlin Katia Kevin Mark Mat Matt Nathan Naomi Neal Nick Oleg Oz Pramod Preston Ravind Rachael Rachelle Roberto Ross Sam Sara Sergei Shane Shreyas Spencer Vincent

Slide 76

Slide 76 text

Takeaways

Slide 77

Slide 77 text

Recap on the current landscape in encrypted search what's practical today promising developments looking forward the measures of our (collective) success Takeaways

Slide 78

Slide 78 text

The future of practical encrypted search is bright! Kenn White MongoDB Enigma Security 2023