The Edge of Developed Practice in Encrypted Search

Kenn White
February 04, 2023

Talk from Enigma 2023

Transcript

  1. The Edge of Developed Practice in Encrypted Search
    Enigma Security 2023
    Kenn White
    MongoDB

  2. /about
    work:
    security principal @mongodb
    focus:
    applied cryptography
    distributed systems
    high-value target workloads
    usable security
    online:
    [email protected] [email protected]

  3. Popular misconceptions around encryption

  4. Popular misconceptions around encryption
    The true security guarantees of...
    network encryption - TLS/HTTPS, "data in-transit"
    storage encryption - volume/FDE, "data at-rest" (we rarely are)
    database encryption - what does that even mean?

  5. Popular misconceptions around encrypted search
    A few examples
    Isn't this a solved problem?
    It absolutely isn't.
    Hasn't [X] database had transparent encryption for over a decade?
    Yes, server-side transparent encryption. With delegated keys.
    What about secure hardware enclaves, don't they take care of this?
    Popular solutions like Intel SGX have had… challenges.
    Plaintext exists inside the enclave, possibly entire databases.
    First principles or Pinky Promise as a Service?

  6. This is a shared lexicon issue

  7. This is a shared lexicon issue
    What developers, CIOs, and cryptographers mean by "encrypted
    search" or "encrypted databases" is often completely different.

  8. This is a shared lexicon issue
    What developers, CIOs, and cryptographers mean by "encrypted
    search" or "encrypted databases" is often completely different.
    Here, we use encrypted search in the way it's used by cryptographers:
    end-to-end encrypted data that can be privately queried.
    Queries can be performed against a database server holding the
    encrypted data but which does not have access to the keys.

  9. The trust problem

  10. The trust problem
    In a database, what are the true security guarantees against...
    "authorized" operators?
    privileged (server-side) users?
    DBAs?
    system admins?
    custodians of backups?

  11. The trust problem

  12. The trust problem
    What are the true security guarantees of end-to-end encryption?
    For secure messaging apps, the claims are (mostly?) understood.
    But what defines an "end"?
    What is the threat model and what are its actual guarantees?
    Where do the data flow?
    Who holds the keys?
    The turtles problem: who holds the key to the keys?
    Who can see the secrets?

  13. The trust problem
    “A warrant is not a cryptographic primitive.”
    -- Matt Blaze

  14. Digging deeper…
    In the context of a database…
    What can an attacker discover?
    What information does the database leak?
    Are leaks exploitable, in what context, and over what period of
    time?
    Can this leakage be formally described? (spoiler: yes, we believe so)

  15. Preliminaries

  16. You probably shouldn't freelance cryptography

  17. Security models should be as rigorous as possible

  18. DIY + "fail fast" is probably not the best approach

  19. Developer usability

  20. Then Hacker News weighed in…

  21. Dirty secrets of distributed (cloud) databases

  22. Dirty secrets of distributed (cloud) databases
    What we don't talk about much
    Things break. A lot.
    Networks are unreliable.
    Robust replication is hard.
    Partitions are a problem.
    Sometimes things just… die.
    Worse, sometimes client apps only semi-die.

  23. Stepping back a little…

  24. Stepping back a little…
    A simple question: why do we use a database?
    What are some of the useful things a database does?
    Searching!
    At what scale? Horizontal or vertical?
    How about concurrency?

  25. What types of searches are we talking about?

  26. What types of searches are we talking about?
    The baseline: simple matches on encrypted data
    find all records with: DOB == "1989-Dec-13"

  27. What types of searches are we talking about?
    The baseline: simple matches on encrypted data
    find all records with: DOB == "1989-Dec-13"
    The dream: rich, expressive queries on encrypted data
    find all records with: DOB ≥ 1920 AND < 2000
    find all last names starting with "Rodrig"
    find all records with SSN ending with "7192"
    find accounts with credit card # ending in "8210"
    find all complaints containing "slow"

  28. What types of data are we talking about?

  29. What types of data are we talking about?
    The kinds of data users care about:
    strings
    dates
    numbers
    decimal data types (e.g., precision financial values)
    objects
    documents (JSON)

  30. View from the front lines

  31. View from the front lines
    Scale:
    global managed distributed database service
    ● 2.5M active clusters
    ● 200+ data centers around the world
    ● 8 major cloud providers
    deployments range from tiny developer test/sandbox instances to
    PB-scale global multi-cloud sharded clusters
    100M+ cluster node certs generated

  32. View from the front lines
    Context:
    open source NoSQL document database
    probably not the kind of documents most here think about
    native JSON documents, including complex sub-documents
    distributed architecture by design
    vibrant developer community
    260M+ downloads to date
    1.5M online University students
    7M+ developers globally

  33. View from the front lines
    Context:
    open source NoSQL document database
    probably not the kind of documents most here think about
    native JSON documents, including complex sub-documents
    distributed architecture by design
    vibrant developer community
    260M+ downloads to date
    1.5M online University students
    7M+ developers globally

  34. View from the front lines
    “What's a ‘JSON document’?”

  35. View Slide

  36. Journey from academia to global-scale workloads

  37. Journey from academia to global-scale workloads
    A story about
    rethinking research models of databases
    prototypes
    challenges of large, messy networks
    removing developer pain & false choices
    apps in every major modern programming language
    growing an in-house advanced crypto R&D team
    formal analysis & proofs of real-world systems

  38. Journey from academia to global-scale workloads
    A story about
    tradeoffs & compromises
    real-world performance
    security properties that are explicitly not guaranteed
    everyone has Strong Thoughts on key management
    usability, usability, usability
    things we got wrong
    lessons learned

  39. The encrypted search problem

  40. The encrypted search problem
    Approaches & cryptographic schemes:
    Tokenization
    Property-Preserving Encryption (PPE)
    Oblivious RAM (ORAM)
    Fully Homomorphic Encryption (FHE)
    Functional encryption
    Garbled RAM
    Structured Encryption (STE)
    Are any of these suitable to general purpose databases?

  41. The encrypted search problem
    Approaches & cryptographic schemes:
    Tokenization
    Property-Preserving Encryption (PPE)
    Oblivious RAM (ORAM)
    Fully Homomorphic Encryption (FHE)
    Functional encryption
    Garbled RAM
    Structured Encryption (STE)
    Are any of these suitable to general purpose databases?

  42. In the beginning…

  43. In the beginning…
    Ya’qub al-Kindi and the golden era of Arab cryptanalysis

  44. In the beginning…
    Ya’qub al-Kindi and the golden era of Arab cryptanalysis

    9th century scholar born ca. 805 in Iraq, educated in Baghdad

    Polymath philosopher, scientist, mathematician, musician

    Authored manuscript Treatise on Decrypting Cryptographic Messages

    Formal analysis of core "analytic principles" / letter frequencies

    Methods & attacks developed by al-Kindi 1,100+ years ago still apply to
    encrypted systems today
    https://plato.stanford.edu/entries/al-kindi/
    https://muslimheritage.com/wp-content/uploads/2018/05/cryptology01.pdf
    https://membres-ljk.imag.fr/Bernard.Ycart/mel/hm/AlKadi_cryptology.pdf
    https://archive.is/R2vvu
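
To make the point about letter frequencies concrete: when equal plaintexts always produce equal ciphertexts, an observer who knows roughly how common each value is can line up the two rankings. The Python sketch below is only an illustration under stated assumptions. The deterministic "encryption" is simulated with a keyed HMAC, and the column contents, `det_token`, and `rank_match` are hypothetical names and data, not anything from the talk.

```python
import hashlib
import hmac
from collections import Counter

def det_token(key: bytes, value: str) -> str:
    """Stand-in for deterministic encryption: equal inputs give equal outputs."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:16]

def rank_match(ciphertexts, known_ranking):
    """Guess plaintexts by aligning ciphertext frequency rank with a public ranking."""
    by_freq = [ct for ct, _ in Counter(ciphertexts).most_common()]
    return dict(zip(by_freq, known_ranking))

key = b"secret key the attacker never sees"

# Hypothetical column with a skewed value distribution.
column = ["O+"] * 38 + ["A+"] * 34 + ["B+"] * 9 + ["AB-"] * 1
ciphertexts = [det_token(key, v) for v in column]

# The attacker knows only the public ranking of values, most common first.
guesses = rank_match(ciphertexts, ["O+", "A+", "B+", "AB-"])
recovered = [guesses[ct] for ct in ciphertexts]
print(recovered == column)
```

The key is never used by the attacker here; the frequency pattern alone is enough once the ciphertexts repeat.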

  45. (jumping ahead 1,174 years)

  46. A modest request…

  47. A modest request…

  48. A year later…

  49. A year later…

  50. The next spring, a conversation in 2020 over coffee

  51. And thus began the journey of Queryable Encryption

  52. Design requirements

  53. Design requirements
    Engineering & app developer needs
    queries that are actually useful - expressiveness
    legacy friendly (where "legacy" = 2019-2022)
    multiple clients concurrently querying/writing the same resource
    support existing replication & sharding protocols
    stateless & no client filtering
    efficiency; sub-linear for all operations

  54. High level flow

  55. High level flow
    Encrypted fields are always stored, transmitted, processed, and retrieved as ciphertext, including queries
    [Diagram with steps numbered 1 to 6; labels recovered from the slide:]
    Query from an authenticated client:
      db.billing.find({ ssn: "901-10-4312" })
    Driver
    Provisioned Key Provider: Cloud Provider KMS, On-prem HSM/Key Service, Cross-cloud KMS
    encrypted search tokens: "er493grt4erw..."
    "6fbbb3f8c3a9f7a", "f72a9a1103d88b6"
    Document in plaintext:
      { name: "Jones Glee", cardNum: "2223-0031-2200-3222", email: "[email protected]t", phone: "+1-212-555-1234", ssn: "901-10-4312" }
    Document with encrypted fields:
      { name: "Jones Glee", cardNum: "r6EaUcgZ41Gerrwd", email: "iu233oh35sdso743", phone: "oR72CW4WferrSE3j", ssn: "d76b3ad038c0e0ed" }

  56. How does it work?

  57. How does it work?
    The boring part: Document content
    encrypted with standard AEAD encryption
    Encrypt-then-MAC authenticated encryption
    AES-256 with HMAC-SHA256
    document encryption/decryption only happens in the client app
    top-level 96 byte composite user key on client comprising:
    256-bit AES-CTR key
    256-bit HMAC key
    256-bit key for encrypted search operations
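
The composite key and the Encrypt-then-MAC pattern described on this slide can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, so the cipher is a toy HMAC-based counter-mode keystream standing in for AES-256-CTR. The byte order of the three subkeys, the nonce size, the blob layout, and all function names are assumptions made for the example, not the actual wire format.

```python
import hashlib
import hmac
import secrets

def split_composite_key(key: bytes):
    """Split a 96-byte composite key into three 256-bit subkeys (order assumed)."""
    if len(key) != 96:
        raise ValueError("expected a 96-byte composite key")
    return key[:32], key[32:64], key[64:]  # encryption, MAC, search

def keystream(enc_key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy counter-mode keystream from HMAC-SHA256 (stand-in for AES-256-CTR)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(enc_key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]

def encrypt_then_mac(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    """Encrypt first, then authenticate nonce and ciphertext with HMAC-SHA256."""
    nonce = secrets.token_bytes(16)
    ct = bytes(p ^ k for p, k in zip(plaintext, keystream(enc_key, nonce, len(plaintext))))
    tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag

def verify_then_decrypt(enc_key: bytes, mac_key: bytes, blob: bytes) -> bytes:
    """Check the tag in constant time before touching the ciphertext."""
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return bytes(c ^ k for c, k in zip(ct, keystream(enc_key, nonce, len(ct))))

composite = secrets.token_bytes(96)
enc_key, mac_key, search_key = split_composite_key(composite)
blob = encrypt_then_mac(enc_key, mac_key, b"901-10-4312")
roundtrip = verify_then_decrypt(enc_key, mac_key, blob)
```

Verifying the tag before decrypting is the point of Encrypt-then-MAC: a modified blob is rejected without the decryption path ever running on attacker-controlled bytes.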

  58. How does it work?
    The boring part: Key management
    envelope encryption protects composite user keys ("field keys")
    database can never access raw key material
    backing vault key management:
    cloud KMS, KMIP/HSM, HashiCorp Vault, custom key service,
    local key…
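
Envelope encryption, as named on this slide, means the field key is itself encrypted ("wrapped") under a master key held by the key provider, so only the wrapped blob ever sits next to the data. The sketch below is a toy under stated assumptions: `ToyKeyProvider` is a hypothetical in-process stand-in for a KMS or HSM, and the wrapping cipher is an HMAC-based keystream rather than whatever a real provider uses.

```python
import hashlib
import hmac
import secrets

def _xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy keystream cipher (HMAC-SHA256 in counter mode); not a real KMS cipher."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))

class ToyKeyProvider:
    """Hypothetical key provider: the master key never leaves this object."""

    def __init__(self):
        master = secrets.token_bytes(32)
        # Derive separate encryption and MAC keys from the master key.
        self._enc = hmac.new(master, b"enc", hashlib.sha256).digest()
        self._mac = hmac.new(master, b"mac", hashlib.sha256).digest()

    def wrap(self, field_key: bytes) -> bytes:
        nonce = secrets.token_bytes(16)
        body = nonce + _xor_stream(self._enc, nonce, field_key)
        return body + hmac.new(self._mac, body, hashlib.sha256).digest()

    def unwrap(self, wrapped: bytes) -> bytes:
        body, tag = wrapped[:-32], wrapped[-32:]
        expected = hmac.new(self._mac, body, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("wrapped key failed authentication")
        return _xor_stream(self._enc, body[:16], body[16:])

provider = ToyKeyProvider()
field_key = secrets.token_bytes(96)    # composite user key, generated on the client
wrapped = provider.wrap(field_key)     # only this blob would be stored with the data
unwrapped = provider.unwrap(wrapped)   # performed for the client, never by the database
```

In this arrangement the database can hold `wrapped` indefinitely without being able to recover `field_key`, which is the property the slide summarizes as never accessing raw key material.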

  59. How does it work?
    The innovation:
    introduces a new type of functional search index
    based on a novel Structured Encryption construction
    client is stateless; database maintains (blinded) state
    distributed, highly available, highly scalable by design
    scheme is robust to client failures, high contention, dropped
    sessions

  60. How does it work?
    The innovation:
    based on a novel Structured Encryption construction
    index data structures are Encrypted Multi-Maps
    a type of reverse or inverted index of encrypted label/tuple pairs
    labels are pseudorandom function (PRF) evaluations
    fast, efficient EMM lookups
    PRFs here == keyed HMACs (HMAC-SHA-256)
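
To give a feel for what an encrypted multi-map with PRF labels looks like, here is a simplified, textbook-style sketch in Python. It is explicitly not the OST-1 construction from the talk: the label and value derivations, the probing shortcut used on insert, and every name in it (`ToyServer`, `make_token`, `client_insert`) are assumptions chosen to keep the example short. The only detail taken from the slide is that the PRF is a keyed HMAC-SHA-256.

```python
import hashlib
import hmac
import secrets

def prf(key: bytes, message: bytes) -> bytes:
    """PRF instantiated as keyed HMAC-SHA-256."""
    return hmac.new(key, message, hashlib.sha256).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

class ToyServer:
    """Holds the encrypted multi-map; never sees the search key or field values."""

    def __init__(self):
        self.emm = {}

    def search(self, token: bytes):
        """Walk labels PRF(token, 'label' || i) until one is missing."""
        results, i = [], 0
        while True:
            ctr = i.to_bytes(8, "big")
            masked = self.emm.get(prf(token, b"label" + ctr))
            if masked is None:
                return results
            results.append(xor(masked, prf(token, b"value" + ctr)))
            i += 1

def make_token(search_key: bytes, field: str, value: str) -> bytes:
    """Per-value search token derived from the client's search key."""
    return prf(search_key, f"{field}={value}".encode())

def client_insert(server: ToyServer, search_key: bytes, field: str, value: str, doc_id: bytes):
    """Store a masked doc_id (at most 32 bytes) under the next pseudorandom label."""
    token = make_token(search_key, field, value)
    i = len(server.search(token))  # toy shortcut: probe for the next free counter
    ctr = i.to_bytes(8, "big")
    server.emm[prf(token, b"label" + ctr)] = xor(doc_id, prf(token, b"value" + ctr))

search_key = secrets.token_bytes(32)
server = ToyServer()
client_insert(server, search_key, "dob", "1989-Dec-13", b"doc-001")
client_insert(server, search_key, "dob", "1989-Dec-13", b"doc-007")
client_insert(server, search_key, "dob", "1990-Jan-02", b"doc-002")
hits = server.search(make_token(search_key, "dob", "1989-Dec-13"))
```

Without a token the server sees only 32-byte pseudorandom labels and masked values. Given a token, it can find and unmask exactly the entries for that one value, which is the kind of query-time leakage a formal analysis has to account for.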

  61. Queryable Encryption core scheme: OST-1

  62. Queryable Encryption core scheme: OST-1

  63. About those JSON documents…

  64. View Slide

  65. View Slide

  66. Trade-Offs

  67. Trade-Offs: Storage
    Storage Impact - preliminary* metrics
    new auxiliary collections for every user collection
    still evaluating the storage:performance impact
    collection sizes can be decreased via compaction
    One additional field per document, negligible size
    Test example: 3.4M documents → 6.3GB user collection
    Punchline:
    Expect at least 2-4X additional storage cost vs unencrypted
    *performance analysis/optimization for production-sized test workloads is ongoing

  68. Trade-Offs: Performance
    Performance Impact - preliminary* metrics
    Reads - very fast
    find() queries: millisecond results on 5M+ document test DBs
    Writes - mixed
    low-volume (<100) insert() & update(): negligible impact
    high-volume/batch (1K+) insert(): still analyzing throughput
    bulk writes via insertMany() in test
    *performance analysis/optimization for production-sized test workloads is ongoing

  69. View Slide

  70. QE Public Preview

  71. QE Public Preview
    Goals
    beta-level stability & behavior
    not for production use (literally on every page of docs & tutorials)
    there will be breaking changes (ibid)
    experimental protocols / data format changes will be frequent
    primary focus is usability & features
    expect architectural changes
    user experience: where can we reduce or eliminate choices?
    constant march towards more opinionated UX & safer defaults

  72. QE Public Preview
    Open source
    Apache licensed:
    Core cryptography library, libmongocrypt
    All drivers [Node.js, Java (Sync & Async), Go, Python, C/C++, C# .NET,
    Ruby, PHP, Rust, Scala]
    Compass & mongosh
    All commits are public
    QE support in driver (client) releases is beta / experimental
    releases can be 5+ months behind nightly branches

  73. Packages
    Amazon Linux 2
    x86, ARM 64
    Debian 10.0 / 11.0
    macOS / macOS ARM 64 (Mojave 10.14+)
    Red Hat / CentOS 8.0 - 8.3
    x86, ppc64le, ARM 64, s390x Z-Series mainframe
    SUSE 12 / 15 x86
    Ubuntu 18.04 / 20.04
    x86, ARM 64
    Windows 10 / Server 2016 / Server 2019

  74. Special thanks
    Seny Kamara
    Tarik Moataz
    Cynthia Braund
    Core Eng & Server Security
    Drivers team
    Query team
    Docs & University tutorials teams (we ❤ you!)

  75. It takes a village
    Andrew Anna Asya Bernie Boris Clyde
    Craig Colby Cynthia Dave Davi Divjot
    Dmitry Dmitry Duran Elizabeth Emily Eric
    Erwin Esha Ezra Jacob Jeff Jeremy
    Jesse Judah Julie Julias Kaitlin Katia
    Kevin Mark Mat Matt Nathan Naomi
    Neal Nick Oleg Oz Pramod Preston
    Ravind Rachael Rachelle Roberto Ross Sam
    Sara Sergei Shane Shreyas Spencer Vincent

  76. Takeaways

  77. Takeaways
    Recap on the current landscape in encrypted search
    what's practical today
    promising developments
    looking forward
    the measures of our (collective) success

  78. The future of practical encrypted search is bright!
    Kenn White
    MongoDB
    Enigma Security 2023
