
The Edge of Developed Practice in Encrypted Search

Kenn White
February 04, 2023

Talk from Enigma 2023



Transcript

  1. Popular misconceptions around encryption. The true security guarantees of...

    network encryption - TLS/HTTPS, "data in-transit"; storage encryption - volume/FDE, "data at-rest" (we rarely are); database encryption - what does that even mean?
  2. A few examples Isn't this a solved problem? It

    absolutely isn't. Hasn't [X] database had transparent encryption for over a decade? Yes, server-side transparent encryption. With delegated keys. What about secure hardware enclaves, don't they take care of this? Popular solutions like Intel SGX have had… challenges. Plaintext exists inside the enclave, possibly entire databases. First principles or Pinky Promise as a Service? Popular misconceptions around encrypted search
  3. What developers, CIOs, and cryptographers mean by "encrypted search" or

    "encrypted databases" is often completely different. This is a shared lexicon issue
  4. What developers, CIOs, and cryptographers mean by "encrypted search" or

    "encrypted databases" is often completely different. Here, we use encrypted search in the way it's used by cryptographers: end-to-end encrypted data that can be privately queried. Queries can be performed against a database server holding the encrypted data but which does not have access to the keys. This is a shared lexicon issue
  5. In a database, what are the true security guarantees against...

    "authorized" operators? privileged (server-side) users? DBAs? system admins? custodians of backups? The trust problem
  6. What are the true security guarantees of end-to-end encryption? For

    secure messaging apps, the claims are (mostly?) understood. But what defines an "end"? What is the threat model and what are its actual guarantees? Where do the data flow? Who holds the keys? The turtles problem: who holds the key to the keys? Who can see the secrets? The trust problem
  7. In the context of a database… What can an attacker

    discover? What information does the database leak? Are leaks exploitable, in what context, and over what period of time? Can this leakage be formally described? (spoiler: yes, we believe so) Digging deeper…
  8. What we don't talk about much Things break. A lot.

    Networks are unreliable. Robust replication is hard. Partitions are a problem. Sometimes things just… die. Worse, sometimes client apps only semi-die. Dirty secrets of distributed (cloud) databases
  9. A simple question: why do we use a database? What

    are some of the useful things a database does? Searching! At what scale? Horizontal or vertical? How about concurrency? Stepping back a little…
  10. The baseline: simple matches on encrypted data find all records

    with: DOB == "1989-Dec-13" What types of searches are we talking about?
  11. The baseline: simple matches on encrypted data find all records

    with: DOB == "1989-Dec-13". The dream: rich, expressive queries on encrypted data: find all records with DOB ≥ 1920 AND < 2000; find all last names starting with "Rodrig"; find all records with SSN ending with "7192"; find accounts with credit card # ending in "8210"; find all complaints containing "slow". What types of searches are we talking about?
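The query shapes named on this slide (equality, range, prefix, suffix, substring) can be pinned down by writing them as ordinary plaintext predicates. The sketch below does only that: the records, field names, and helper functions are hypothetical and are not taken from the talk. The encrypted search problem is to get these same answers from a server that never sees the field values.

```python
from datetime import date

# Hypothetical plaintext records; field names are illustrative only.
RECORDS = [
    {"last": "Rodriguez", "dob": date(1989, 12, 13), "ssn": "901-10-7192", "complaint": "app is slow"},
    {"last": "Rodrigues", "dob": date(1915, 3, 2), "ssn": "123-45-6789", "complaint": "billing error"},
    {"last": "Smith", "dob": date(2001, 7, 9), "ssn": "555-00-7192", "complaint": "slow checkout"},
]

def equality(records, dob):
    """The baseline: exact match on a single field."""
    return [r for r in records if r["dob"] == dob]

def year_range(records, low, high):
    """Range query: low <= birth year < high."""
    return [r for r in records if low <= r["dob"].year < high]

def prefix(records, field, text):
    """Prefix query: field value starts with text."""
    return [r for r in records if r[field].startswith(text)]

def suffix(records, field, text):
    """Suffix query: field value ends with text."""
    return [r for r in records if r[field].endswith(text)]

def substring(records, field, text):
    """Substring query: text appears anywhere in the field value."""
    return [r for r in records if text in r[field]]
```

Each richer predicate needs the server to learn more structure about the data than a bare equality test does, which is why the baseline and the dream are listed separately.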
  12. What types of data are we talking about? The kinds

    of data users care about: strings; dates; numbers; decimal data types (e.g., precision financial values); objects; documents (JSON)
  13. View from the front lines Scale: global managed distributed database

    service • 2.5M active clusters • 200+ data centers around the world • 8 major cloud providers deployments range from tiny developer test/sandbox instances to PB-scale global multi-cloud sharded clusters 100M+ cluster node certs generated
  14. Context: open source NoSQL document database probably not the kind

    of documents most here think about native JSON documents, including complex sub-documents distributed architecture by design vibrant developer community 260M+ downloads to date 1.5M online University students 7M+ developers globally View from the front lines
  16. Journey from academia to global-scale workloads A story about rethinking

    research models of databases prototypes challenges of large, messy networks removing developer pain & false choices apps in every major modern programming language growing an in-house advanced crypto R&D team formal analysis & proofs of real-world systems
  17. Journey from academia to global-scale workloads A story about tradeoffs

    & compromises real-world performance security properties that are explicitly not guaranteed everyone has Strong Thoughts on key management usability, usability, usability things we got wrong lessons learned
  18. The encrypted search problem. Approaches & cryptographic schemes:

    Tokenization; Property-Preserving Encryption (PPE); Oblivious RAM (ORAM); Fully Homomorphic Encryption (FHE); Functional encryption; Garbled RAM; Structured Encryption (STE). Are any of these suitable for general-purpose databases?
  20. In the beginning… Ya’qub al-Kindi and the golden era of

    Arab cryptanalysis ▪ 9th century scholar born ca. 805 in Iraq, educated in Baghdad ▪ Polymath philosopher, scientist, mathematician, musician ▪ Authored manuscript Treatise on Decrypting Cryptographic Messages ▪ Formal analysis of core "analytic principles" / letter frequencies ▪ Methods & attacks developed by al-Kindi 1,100+ years ago still apply to encrypted systems today https://plato.stanford.edu/entries/al-kindi/ https://muslimheritage.com/wp-content/uploads/2018/05/cryptology01.pdf https://membres-ljk.imag.fr/Bernard.Ycart/mel/hm/AlKadi_cryptology.pdf https://archive.is/R2vvu
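To illustrate why letter-frequency methods still matter for encrypted databases, here is a minimal sketch of frequency analysis against a column encrypted deterministically (equal plaintexts produce equal ciphertexts). Everything in it is an assumption made for illustration: the `det_token` stand-in, the blood-type column, and the counts are made up, and none of it describes the system in this talk.

```python
import hashlib
import hmac
from collections import Counter

def det_token(key: bytes, value: str) -> str:
    """Stand-in for deterministic encryption: equal inputs give equal tokens."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:16]

def frequency_attack(tokens, public_ranking):
    """Guess plaintexts by matching token frequency rank to a public ranking.

    The attacker needs no key, only the ciphertext column and auxiliary
    knowledge of which values are most common.
    """
    ranked_tokens = [t for t, _ in Counter(tokens).most_common()]
    return dict(zip(ranked_tokens, public_ranking))

# Made-up column with a skewed distribution (counts are illustrative).
KEY = b"k" * 32
COLUMN = ["O+"] * 38 + ["A+"] * 34 + ["B+"] * 9 + ["AB-"] * 1
TOKENS = [det_token(KEY, v) for v in COLUMN]

# The attacker only sees TOKENS plus a public "most common first" list.
GUESS = frequency_attack(TOKENS, ["O+", "A+", "B+", "AB-"])
```

In this toy case every token is mapped back to its plaintext without touching the key, which is the kind of leakage a formal analysis has to account for.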
  21. Design requirements: engineering & app developer needs

    Queries that are actually useful (expressiveness); legacy friendly (where "legacy" = 2019-2022); multiple clients concurrently querying/writing the same resource; support for existing replication & sharding protocols; stateless & no client filtering; efficiency: sub-linear for all operations.
  22. High level flow

    Encrypted fields are always stored, transmitted, processed, and retrieved as ciphertext, including queries. [Diagram, steps numbered 1-6.] Query from an authenticated client: db.billing.find( { ssn: "901-10-4312" } ). Driver. Provisioned Key Provider: Cloud Provider KMS, On-prem HSM/Key Service, Cross-cloud KMS. Hex strings shown: "6fbbb3f8c3a9f7a", "f72a9a1103d88b6"; encrypted search tokens: "er493grt4erw...". Document shown with encrypted fields: { name: "Jones Glee", cardNum: "r6EaUcgZ41Gerrwd", email: "iu233oh35sdso743", phone: "oR72CW4WferrSE3j", ssn: "d76b3ad038c0e0ed" }. Document shown in plaintext: { name: "Jones Glee", cardNum: "2223-0031-2200-3222", email: "[email protected]", phone: "+1-212-555-1234", ssn: "901-10-4312" }.
  23. How does it work? The boring part: document content encrypted with standard AEAD encryption

    Encrypt-then-MAC authenticated encryption: AES-256 with HMAC-SHA256. Document encryption/decryption happens only in the client app. Top-level 96-byte composite user key on the client, comprising: a 256-bit AES-CTR key; a 256-bit HMAC key; a 256-bit key for encrypted search operations.
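The composite key layout and the Encrypt-then-MAC pattern can be sketched with the Python standard library alone. Several things here are assumptions made for illustration: the order of the three 32-byte subkeys follows the order on the slide and may not match the real format, the function names are hypothetical, and because the standard library has no AES, an HMAC-based keystream stands in for AES-256-CTR. It is not suitable for real use.

```python
import hashlib
import hmac
import secrets

def split_composite_key(key: bytes):
    """Split a 96-byte key into three 256-bit subkeys (assumed order:
    encryption key, MAC key, search key)."""
    if len(key) != 96:
        raise ValueError("expected a 96-byte composite key")
    return key[:32], key[32:64], key[64:]

def _keystream(enc_key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy counter-mode keystream. ASSUMPTION: stands in for AES-256-CTR."""
    blocks = [
        hmac.new(enc_key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        for counter in range((length + 31) // 32)
    ]
    return b"".join(blocks)[:length]

def encrypt_then_mac(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    """Encrypt first, then MAC the nonce and ciphertext with a separate key."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(enc_key, nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag

def verify_then_decrypt(enc_key: bytes, mac_key: bytes, blob: bytes) -> bytes:
    """Check the tag in constant time before touching the ciphertext."""
    if len(blob) < 48:
        raise ValueError("blob too short")
    nonce, ciphertext, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    stream = _keystream(enc_key, nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))
```

The point of the ordering is that a modified ciphertext is rejected by the MAC check before any decryption happens.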
  24. How does it work? The boring part: key management

    Envelope encryption protects composite user keys ("field keys"). The database can never access raw key material. Backing vault key management: cloud KMS, KMIP/HSM, HashiCorp Vault, custom key service, local key…
  25. How does it work? The innovation: introduces a new type of functional search index

    Based on a novel Structured Encryption construction. Client is stateless; database maintains (blinded) state. Distributed, highly available, highly scalable by design. Scheme is robust to client failures, high contention, dropped sessions.
  26. How does it work? The innovation: based on a novel Structured Encryption construction

    Index data structures are Encrypted Multi-Maps (EMMs), a type of reverse or inverted index of encrypted label/tuple pairs. Labels are pseudorandom function (PRF) evaluations. Fast, efficient EMM lookups. PRFs here == keyed HMACs (HMAC-SHA-256).
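To make the idea of an inverted index keyed by PRF labels concrete, here is a toy encrypted multi-map using HMAC-SHA-256 as the PRF. This is a sketch under stated assumptions, not the construction from the talk: `ToyServer`, `insert`, `find`, the counter-probing strategy, and the XOR masking of document ids are all hypothetical. Probing makes inserts linear in the number of existing matches, which a real scheme would avoid.

```python
import hashlib
import hmac

def prf(key: bytes, *parts: bytes) -> bytes:
    """HMAC-SHA-256 over length-prefixed parts (prefixes avoid ambiguous joins)."""
    message = b"".join(len(p).to_bytes(4, "big") + p for p in parts)
    return hmac.new(key, message, hashlib.sha256).digest()

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

class ToyServer:
    """Holds only pseudorandom labels and masked values; never sees the key."""
    def __init__(self):
        self.emm = {}

def _token(search_key: bytes, field: str, value: str) -> bytes:
    """Per-(field, value) search token derived from the client's search key."""
    return prf(search_key, field.encode(), value.encode())

def _slot(token: bytes, counter: int):
    """Label and 8-byte pad for the counter-th match of a token."""
    c = counter.to_bytes(8, "big")
    return prf(token, b"label", c), prf(token, b"pad", c)[:8]

def insert(server: ToyServer, search_key: bytes, field: str, value: str, doc_id: int):
    token = _token(search_key, field, value)
    counter = 0
    while _slot(token, counter)[0] in server.emm:  # probe for the next free slot
        counter += 1
    label, pad = _slot(token, counter)
    server.emm[label] = _xor(doc_id.to_bytes(8, "big"), pad)

def find(server: ToyServer, search_key: bytes, field: str, value: str):
    token = _token(search_key, field, value)
    doc_ids, counter = [], 0
    while True:
        label, pad = _slot(token, counter)
        masked = server.emm.get(label)
        if masked is None:
            return doc_ids
        doc_ids.append(int.from_bytes(_xor(masked, pad), "big"))
        counter += 1
```

A lookup is a series of dictionary gets on labels that look random to the server, so no client-side state is needed between calls in this toy version.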
  27. Trade-Offs: Storage. Storage impact - preliminary* metrics

    New auxiliary collections for every user collection; still evaluating the storage:performance impact; collection sizes can be decreased via compaction. One additional field per document, negligible size. Test example: 3.4M documents → 6.3GB user collection. Punchline: expect at least 2-4X additional storage cost vs. unencrypted. (*Performance analysis/optimization for production-sized test workloads is ongoing.)
  28. Trade-Offs: Performance. Performance impact - preliminary* metrics

    Reads - very fast. find() queries: millisecond results on 5M+ document test DBs. Writes - mixed. Low-volume (<100) insert() & update(): negligible impact; high-volume/batch (1K+) inserts: still analyzing throughput; bulk writes via insertMany() in test. (*Performance analysis/optimization for production-sized test workloads is ongoing.)
  29. Queryable Encryption (QE) Public Preview: goals

    Beta-level stability & behavior. Not for production use (literally on every page of docs & tutorials). There will be breaking changes (ibid). Experimental protocols / data format changes will be frequent. Primary focus is usability & features. Expect architectural changes. User experience: where can we reduce or eliminate choices? Constant march towards more opinionated UX & safer defaults.
  30. QE Public Preview: open source

    Apache licensed: core cryptography library (libmongocrypt); all drivers [Node.js, Java (Sync & Async), Go, Python, C/C++, C# .NET, Ruby, PHP, Rust, Scala]; Compass & mongosh. All commits are public. QE support in driver (client) releases is beta / experimental. Releases can be 5+ months behind nightly branches.
  31. Packages Amazon Linux 2 x86, ARM 64 Debian 10.0 /

    11.0 macOS / macOS ARM 64 (Mojave 10.14+) RedHat / CentOS 8.0 - 8.3 x86, ppc64le, ARM 64, s390x Z-Series mainframe SUSE 12 / 15 x86 Ubuntu 18.04 / 20.04 x86, ARM 64 Windows 10 / Server 2016 / Server 2019
  32. Seny Kamara Tarik Moataz Cynthia Braund Core Eng & Server

    Security Drivers team Query team Docs & University tutorials teams (we ❤ you!) Special thanks
  33. It takes a village Andrew Anna Asya Bernie Boris Clyde

    Craig Colby Cynthia Dave Davi Divjot Dmitry Dmitry Duran Elizabeth Emily Eric Erwin Esha Ezra Jacob Jeff Jeremy Jesse Judah Julie Julias Kaitlin Katia Kevin Mark Mat Matt Nathan Naomi Neal Nick Oleg Oz Pramod Preston Ravind Rachael Rachelle Roberto Ross Sam Sara Sergei Shane Shreyas Spencer Vincent
  34. Recap on the current landscape in encrypted search what's practical

    today promising developments looking forward the measures of our (collective) success Takeaways