Introducing MongoDB Queryable Encryption (Preview)

Presentation at the MongoDB World 2022 conference in New York announcing Public Preview for Queryable Encryption

Kenn White

June 07, 2022

Transcript

  1. Product Talk
    Queryable Encryption
    World 2022
    Kenn White, Security Principal
    Cynthia Braund, Sr Product Manager

  2. Agenda
    Security 101
    Introducing Queryable Encryption
    Our journey and roadmap
    What is this magic?
    CSFLE vs Queryable Encryption
    How can I try it out?
    Key Management Enhancements

  3. Security 101

  4. Moving and storing data: most databases have it covered
    In-flight, over the network
    TLS Encryption
    Data is decrypted when it's received on the DB server
    Reminder: TLS is to protect against network eavesdropping

  5. Moving and storing data: most databases have it covered
    At-rest, on disk
    Volume Encryption
    Storage Engine Encryption
    Network
    Data is decrypted when the DB starts up
    Reminder: At-rest encryption is (mostly) to protect non-running databases & backups

  6. But what about data here?
    Network
    Disk
    In-use, in memory
    Very few practical solutions exist to protect data in-use

  7. Data is in plaintext while it's being processed by the database
    Data is vulnerable to insider access and active database breaches:
    ● Authorized and compromised administrators, DBAs & privileged users
    ● RAM scraping
    ● Process inspection
    ● Cloud providers
    In-use, in memory

  8. It's a trust problem

  9. Data is in plaintext while it's being processed by the database
    Data is vulnerable to insider access and active database breaches:
    ● Authorized and compromised administrators, DBAs & users
    ● RAM scraping
    ● Process inspection
    In-use, in memory
    This is why we built Client-Side Field Level Encryption!

  10. Option #1 – No encryption of data from client side
    Query to find ssn = “901-10-4312”
    {payer: “Acme Corp”, ssn: “901-10-4312”}
    {payer: “Jones Inc”, ssn: “901-10-4312”}

    {payer: “Baker Co”, ssn: “901-10-4312”}
    1 million records total
    10 records with ssn = “901-10-4312”
    10 records fetched with ssn = “901-10-4312”
    ● Fast querying
    ● But data is not secure in-use

  11. Option #2 – Using popular cloud SDK client-side encryption
    Query to find ssn = “901-10-4312”
    {payer: “Acme Corp”, ssn: “901-10-4312”}
    {payer: “Jones Inc”, ssn: “901-10-4312”}

    {payer: “Baker Co”, ssn: “901-10-4312”}
    1 million records total
    10 records with ssn: “901-10-4312”
    {payer: “Acme Corp”, ssn: “3DwK354xz”}
    {payer: “Jones Inc”, ssn: “23awW124xz”}

    {payer: “Baker Co”, ssn: “75fdwswed”}
    1 million records total
    10 Randomly encrypted field
    10 records fetched with ssn = “901-10-4312”
    All 1 million records fetched
    ● Client-side processing & decryption
    ● Filtering of records on the client side (performance hit)
    Problem: You can't directly search encrypted fields. Not feasible for many use cases.
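The problem on this slide can be sketched in a few lines of Python. This is a minimal illustration, not any SDK's actual API: the `encrypt`/`decrypt` helpers use a toy HMAC-based keystream as a stand-in for a real cipher such as AES-CTR, and the records are hypothetical. It shows that a freshly encrypted query value never equals a stored ciphertext, so the server cannot filter and the client has to fetch, decrypt, and filter everything itself.

```python
import hashlib
import hmac
import secrets


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy counter-mode keystream built from HMAC-SHA256 (stand-in for AES-CTR)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt(key: bytes, plaintext: str) -> bytes:
    """Randomized encryption: a fresh nonce makes every ciphertext different."""
    nonce = secrets.token_bytes(16)
    data = plaintext.encode()
    body = bytes(a ^ b for a, b in zip(data, keystream(key, nonce, len(data))))
    return nonce + body


def decrypt(key: bytes, ciphertext: bytes) -> str:
    """Reverse of encrypt(): split off the nonce and XOR the keystream back out."""
    nonce, body = ciphertext[:16], ciphertext[16:]
    return bytes(a ^ b for a, b in zip(body, keystream(key, nonce, len(body)))).decode()


key = secrets.token_bytes(32)

# Hypothetical records; the server only ever sees the encrypted ssn values.
server_records = [
    {"payer": "Acme Corp", "ssn": encrypt(key, "901-10-4312")},
    {"payer": "Jones Inc", "ssn": encrypt(key, "555-00-1234")},
    {"payer": "Baker Co", "ssn": encrypt(key, "901-10-4312")},
]

# Server-side equality on ciphertext finds nothing: the query ciphertext is fresh.
query_ciphertext = encrypt(key, "901-10-4312")
server_matches = [r for r in server_records if r["ssn"] == query_ciphertext]

# So the client must fetch every record, decrypt it, and filter locally.
client_matches = [
    r["payer"] for r in server_records if decrypt(key, r["ssn"]) == "901-10-4312"
]
```

With one million records, the last step is the "all 1 million records fetched" cost the slide describes.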

  12. Introducing Queryable Encryption

  13. Queryable Encryption
    MongoDB’s Approach
    ● Encrypt the sensitive data (fields)
    ● Easy development cycle
    ● No crypto experience required
    ● Encrypted throughout the data lifecycle
    ● Rich expressive queries
    ● MongoDB is the only platform to implement a fast searchable encryption scheme
    ● Server-side processing of encrypted data
    ● Server does not know anything about the data
    Query to find ssn = “901-10-4312”
    {payer: “Acme Corp”, ssn: “3DwK354xz”}
    {payer: “Jones Inc”, ssn: “23awW124xz”}

    {payer: “Baker Co”, ssn: “75fdwswed”}
    1 million records total
    10 Randomly encrypted fields
    10 records fetched with ssn = “901-10-4312”

  14. Let’s look closer
    Encrypted fields are always stored, transmitted, processed, and retrieved as ciphertext, including queries
    Query from an authenticated client:
    db.billing.find({ ssn: "901-10-4312" })
    Document as seen by the client:
    {
      payer: "Jones Glee",
      cardNum: "2223-0031-2200-3222",
      email: "[email protected]",
      phone: "+1-212-555-1234",
      ssn: "901-10-4312"
    }
    MongoDB Driver
    Customer Provisioned Key Provider: Cloud Provider KMS, On-prem HSM/Key Service, Cross-cloud KMS
    encrypted search key: "er493grtee4erw"
    Document as seen by the database:
    {
      payer: "Jones Glee",
      cardNum: "r6EaUcgZ41Gerrwd",
      email: "iu233oh35sdso743",
      phone: "oR72CW4WferrSE3j",
      ssn: "d76b3ad038c0e0ed"
    }
    Other ciphertext values shown in the diagram: "6fbbb3f8c3a9f7a", "f72a9a1103d88b6"
    (Diagram callouts 1–6 mark the steps of this flow)

  15. Use Cases
    Industry: Financial Services
    Bank application needs to find transactions using a range of dates or dollar amounts for fraud detection
    Industry: Human Resources
    HR system allows searching for employees by the last 4 digits of their social security number
    Industry: Health Care
    Customer support agents need to find patient records by searching for the first few characters of their name

  16. Queryable Encryption – Key Benefits
    Rich querying on encrypted data
    ● Run expressive queries like range, equality, prefix, suffix, substring, and more on encrypted data
    Ground-breaking query technology, standards-based cryptography
    ● Based on strong, standards-based cryptographic primitives
    End-to-end fully randomized encryption
    ● Data never exists in the clear outside of the client
    ● Dramatically reduces attack surface
    Faster app development
    ● No crypto experience required
    ● Intuitive and easy for developers to set up and use
    Strong technical controls for critical data privacy use cases
    ● Meet the strictest data privacy requirements for confidentiality on security critical workloads
    Reduce institutional risk
    ● Confident in storing and processing your sensitive workloads in MongoDB Atlas (Cloud)

  17. Our journey & roadmap

  18. Our Journey & Roadmap
    2019: Client-Side Field Level Encryption (CSFLE)
    ● Equality search on Deterministic encryption
    2021: Aroki Systems acquisition
    ● Pioneers in Encrypted Search
    May 2022: Formation of Advanced Cryptography Research group
    ● Seny Kamara, Tarik Moataz, and a team of PhD cryptography researchers
    June 2022: Queryable Encryption Preview
    ● Structured Encryption core functionality; Equality search on randomized encryption
    Post 6.0: Queryable Encryption v1.1
    ● Addition of Range query capabilities
    Post 6.0: Queryable Encryption v1.2
    ● Addition of prefix, suffix, substring query capabilities
    Future: New privacy-enhancing cryptography capabilities

  19. What is in 6.0 Public Preview?
    Crypto framework
    ● Foundational work that enables equality and future query types
    Equality
    ● Equality comparisons on randomly encrypted data
    Compass
    ● Queryable Encryption
    ● Configuration
    ● Decryption

  20. What is a Public Preview?
    ● Available with 6.0 RC release
    ○ Evaluation only
    ○ There may be breaking changes
    ○ Not recommended for production workloads

  21. Post 6.0 - Additional Query Types
    Range Prefix/Suffix Substring

  22. What is this magic?

  23. Security Model
    Multi-Snapshot-Secure
    ○ A snapshot adversary has (possibly successive) point-in-time access to the entire memory & disk of the database server
    ○ At that instant, adversary can access the entire DB, any keys stored in memory, all CPU state including L1-3 cache, and all logs

  24. Security Model
    Formal security guarantees
    ○ Encrypted fields of user documents are verifiably CCA-secure (secure against chosen ciphertext attacks)
    ■ Ciphertexts don't reveal information about the plaintext, beyond encrypted document size…
    ■ …even to adversaries that can adaptively query an encryption oracle
    ○ Encrypted indexes are verifiably adaptively multi-snapshot secure

  25. How does it work?
    The boring part:
    ○ Document content encrypted with standard AEAD encryption
    ■ Encrypt-then-MAC authenticated encryption
    ○ AES-CTR-256 with HMAC-SHA256
    ○ Document encryption/decryption only happens on the client, in the application
    ○ Top-level 96-byte composite user key on client comprising:
    ■ 256-bit AES-CTR key
    ■ 256-bit HMAC key
    ■ 256-bit key for encrypted search operations
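The key arithmetic and the Encrypt-then-MAC ordering on this slide can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the order of the three sub-keys inside the 96 bytes is assumed, the helper names are hypothetical, and random bytes stand in for the AES-CTR ciphertext because Python's standard library has no AES.

```python
import hashlib
import hmac
import secrets

KEY_LEN = 32  # 256 bits per sub-key; 3 * 32 = 96 bytes in total


def split_composite_key(composite: bytes):
    """Split a 96-byte composite key into (cipher, MAC, search) sub-keys.

    The byte order of the sub-keys is an assumption made for this sketch.
    """
    if len(composite) != 3 * KEY_LEN:
        raise ValueError("expected a 96-byte composite key")
    return (
        composite[:KEY_LEN],
        composite[KEY_LEN:2 * KEY_LEN],
        composite[2 * KEY_LEN:],
    )


def add_tag(mac_key: bytes, ciphertext: bytes) -> bytes:
    """Encrypt-then-MAC: the tag is computed over the ciphertext, not the plaintext."""
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag


def check_tag(mac_key: bytes, sealed: bytes) -> bytes:
    """Verify the tag before any decryption is attempted; return the ciphertext."""
    ciphertext, tag = sealed[:-32], sealed[-32:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return ciphertext


composite_key = secrets.token_bytes(96)
cipher_key, mac_key, search_key = split_composite_key(composite_key)

ciphertext = secrets.token_bytes(24)  # stands in for AES-CTR output
sealed = add_tag(mac_key, ciphertext)
tampered = bytes([sealed[0] ^ 1]) + sealed[1:]  # flip one bit
```

Calling `check_tag(mac_key, tampered)` raises `ValueError`, which is the point of authenticating the ciphertext: modified data is rejected before it is ever decrypted.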

  26. How does it work?
    The boring part:
    ○ Key management
    ■ Envelope encryption protects composite user keys ("field keys")
    ■ Database can never access raw key material
    ■ Backing vault key management: cloud KMS, KMIP/HSM, HashiCorp Vault, custom key service, local key…
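Envelope encryption, as described here, can be sketched like this. It is a toy model under loud assumptions: `wrap_key` and `unwrap_key` are hypothetical helpers, the HMAC-based keystream is a stand-in for whatever wrapping algorithm the key provider really uses, the wrap is unauthenticated for brevity, and the key vault document is simplified.

```python
import hashlib
import hmac
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy counter-mode keystream from HMAC-SHA256 (stand-in for a real cipher)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def wrap_key(master_key: bytes, key_material: bytes) -> bytes:
    """Encrypt a field key under the master key; done by the key provider or client."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(master_key, nonce, len(key_material))
    return nonce + bytes(a ^ b for a, b in zip(key_material, stream))


def unwrap_key(master_key: bytes, wrapped: bytes) -> bytes:
    """Recover the field key; requires the master key, which the database never has."""
    nonce, body = wrapped[:16], wrapped[16:]
    stream = _keystream(master_key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


# Held by the customer's key provider (KMS, HSM, ...), never by the database.
customer_master_key = secrets.token_bytes(32)

# The 96-byte composite "field key" that actually protects the data.
field_key = secrets.token_bytes(96)

# Simplified, hypothetical key vault entry: the database stores only wrapped bytes.
key_vault_entry = {"keyMaterial": wrap_key(customer_master_key, field_key)}

recovered = unwrap_key(customer_master_key, key_vault_entry["keyMaterial"])
wrong = unwrap_key(secrets.token_bytes(32), key_vault_entry["keyMaterial"])
```

Because only the wrapped form is stored server-side, a copy of the key vault collection is useless without access to the customer's key provider.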

  27. How does it work?
    The innovation:
    ○ New type of functional search index is introduced
    ○ Based on a novel Structured Encryption construction
    ■ client is stateless; database maintains (blinded) state
    ■ distributed, highly available, highly scalable by design
    ■ scheme is robust to client failures, high contention, dropped sessions
    ■ index data structures are Encrypted Multi-Maps (EMMs)
    ○ a type of reverse or inverted index of encrypted label/tuple pairs
    ○ labels are pseudorandom function (PRF) evaluations
    ○ fast, efficient EMM lookups
    ○ PRFs here == keyed HMACs (HMAC-SHA-256)
    Reminder: HMACs are secret key digests/tags; the database does not have the key
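To make the idea of PRF labels in an encrypted multi-map concrete, here is a heavily simplified sketch. It is not MongoDB's construction: the class and function names are hypothetical, document ids are stored in the clear, and the per-value counter is just one common way to give every entry a distinct label. What it does show is the property on the slide: the server stores and looks up opaque HMAC-SHA-256 outputs and never holds the key.

```python
import hashlib
import hmac
import secrets


def label(search_key: bytes, value: str, counter: int) -> bytes:
    """PRF evaluation (HMAC-SHA-256) over the value and a per-value counter."""
    message = value.encode() + b"|" + counter.to_bytes(8, "big")
    return hmac.new(search_key, message, hashlib.sha256).digest()


class ToyServerIndex:
    """Server side: a map from opaque labels to document ids. Holds no key."""

    def __init__(self):
        self.entries = {}

    def put(self, lbl: bytes, doc_id: int) -> None:
        self.entries[lbl] = doc_id

    def get(self, lbl: bytes):
        return self.entries.get(lbl)


def client_insert(index: ToyServerIndex, key: bytes, value: str, doc_id: int) -> None:
    """Stateless client: probe the server for the first unused counter, then insert."""
    counter = 0
    while index.get(label(key, value, counter)) is not None:
        counter += 1
    index.put(label(key, value, counter), doc_id)


def client_find(index: ToyServerIndex, key: bytes, value: str) -> list:
    """Walk the counters for a value until the server reports a miss."""
    results = []
    counter = 0
    while (doc_id := index.get(label(key, value, counter))) is not None:
        results.append(doc_id)
        counter += 1
    return results


search_key = secrets.token_bytes(32)
index = ToyServerIndex()
for doc_id, ssn in [(1, "901-10-4312"), (2, "555-00-1234"), (3, "901-10-4312")]:
    client_insert(index, search_key, ssn, doc_id)

matches = client_find(index, search_key, "901-10-4312")
```

Each lookup is a hash-map probe rather than a scan, and because every stored label is distinct, a snapshot of `index.entries` does not show which entries share a value.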

  28. CSFLE vs Queryable Encryption

  29. Both Are Supported
    CSFLE is NOT being deprecated
    CSFLE to Queryable Encryption migration not (yet) supported
    CSFLE workloads should stay on CSFLE
    Queryable Encryption net new only
    Must be specified at collection creation
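Since the encrypted fields must be declared when the collection is created, a configuration along these lines is what gets supplied at that point. This is a sketch of the `encryptedFields` document shape as the author of this example understands it from MongoDB's documentation, so check the field names against the docs for your driver and server version. The key ids are placeholders: a real `keyId` is the UUID of a data encryption key that already exists in the key vault.

```python
import uuid

# Placeholder ids for illustration only. In practice each one refers to an
# existing data encryption key, and each encrypted field gets its own key.
ssn_key_id = uuid.uuid4()
card_key_id = uuid.uuid4()

encrypted_fields = {
    "fields": [
        {
            # Encrypted and queryable by equality.
            "keyId": ssn_key_id,
            "path": "ssn",
            "bsonType": "string",
            "queries": {"queryType": "equality"},
        },
        {
            # Encrypted but not queryable: no "queries" entry.
            "keyId": card_key_id,
            "path": "cardNum",
            "bsonType": "string",
        },
    ]
}

queryable_paths = [f["path"] for f in encrypted_fields["fields"] if "queries" in f]
```

Passing this document to the driver at collection creation is left out here on purpose, because the exact call depends on the driver and version in use.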

  30. How are they the same?
    Feature | CSFLE | Queryable Encryption
    Highest Levels of Confidentiality and Integrity (client-side encryption) | ✔ | ✔
    Queryable and Non-Queryable Options | ✔ | ✔
    Authenticated AES Encryption*, 256-bit keys | ✔ | ✔
    Common Key Management Features | ✔ | ✔
    Encrypted data is stored in field as BinData | ✔ | ✔
    Shared Library (replaces mongocryptd) | (Coming Soon) | ✔
    *CSFLE uses CBC mode, Queryable Encryption uses CTR mode

  31. CSFLE vs Queryable Encryption
    CSFLE
    ● Client-side encryption
    ● Server is (largely) unaware
    ● Queryability
    ○ Equality only - Deterministic
    ■ Data leakage on low entropy fields
    ● Flexible key usage
    ○ unique key per field
    ○ 1 key for all fields
    ○ per-document keys
    ● No additional data elements
    Queryable Encryption
    ● Client-side encryption
    ● Server is integral
    ● Queryability
    ○ New functional search index
    ○ Equality - Fully random
    ■ No snapshot leakage, even on low entropy fields
    ○ Range, prefix, suffix and substring
    ● Requires a unique key per field
    ● Additional data
    ○ 1 new field per document
    ■ __safeContent__
    ○ 3 new system collections: enxcol_.*
    ○ Do not modify any of these!
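The "data leakage on low entropy fields" point can be illustrated with a short sketch. The tokens below are stand-ins for ciphertexts (they model only whether equal plaintexts produce equal outputs, and cannot be decrypted), and the column is a hypothetical low-entropy field such as blood type.

```python
import hashlib
import hmac
import secrets
from collections import Counter

key = secrets.token_bytes(32)


def deterministic_token(value: str) -> str:
    """Deterministic scheme: the same plaintext always maps to the same output."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()


def randomized_token(value: str) -> str:
    """Randomized scheme: a fresh nonce makes every output unique."""
    nonce = secrets.token_bytes(16)
    tag = hmac.new(key, nonce + value.encode(), hashlib.sha256).digest()
    return (nonce + tag).hex()


# Hypothetical low-entropy column, e.g. a blood type field.
column = ["O+", "O+", "A+", "O+", "B-", "A+"]

# What someone holding a copy of the stored data can count, without any key.
det_histogram = Counter(deterministic_token(v) for v in column)
rand_histogram = Counter(randomized_token(v) for v in column)

det_counts = sorted(det_histogram.values(), reverse=True)
rand_counts = sorted(rand_histogram.values(), reverse=True)
```

Here `det_counts` is `[3, 2, 1]`, which mirrors the plaintext distribution and lets an observer guess the most common value, while `rand_counts` is all ones and reveals nothing about frequency.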

  33. Trade-offs
    Scheme | Inserts | Find (equality) | Find (range, prefix, suffix, substring) | Storage Overhead | Frequency Leakage
    FLE | Fast | Fast | No | Minimal | Possibly
    Queryable Encryption | Slower | Fast | Yes | Yes | None
    Performance testing is in process

  34. How is this different from other solutions?
    Queryable Encryption, using Structured Encryption
    ● Ideal for real-time database operations that MongoDB customers need
    ● Software implementation, hardware agnostic
    ● Optimized for sublinear searching
    Fully or Partially Homomorphic Encryption (FHE)
    ● Not natively offered in any major/commercial general purpose database
    ● For encrypted search, FHE is a poor choice due to weak performance – search speed is linear
    ● Queries slow down significantly as the data set grows
    ● Typically incurs a very heavy computational overhead
    ● Better suited for certain types of secure private computation - sums, statistical means, etc
    Secure Enclaves
    ● Requires specialized hardware, often cloud-provider proprietary
    ● Keys are still managed by the cloud provider - albeit in hardware
    ● Enclaves are not as powerful as general purpose CPUs, security guarantees unclear

  35. How can I try it out?

  36. Components and licensing
    ● Core Cryptography Library (drivers, server)
    ○ Atlas, Enterprise Advanced, or Community database
    ○ Client-side core library (libmongocrypt) & drivers all Apache licensed
    ○ Encourages peer review & feedback from research community
    ○ No black-box/proprietary crypto
    ● Automatic Encryption
    ○ Atlas & Enterprise Advanced
    ○ Enables encryption without app changes, vs helper methods
    ○ Shared library crypt_shared replaces cryptd (mongocryptd) package
    ○ Shared library package available on Enterprise downloads page

  37. What's needed to try the Queryable Encryption Preview?
    ● 6.0 Server (rc8+)
    ● Queryable Encryption-aware drivers available now:
    ○ Node.js, Java (Sync & Async), Go, Python, C, C# .NET, Ruby, PHP
    ○ Coming weeks: (C++, Scala)
    ○ On roadmap: Rust, Swift
    ● Automatic encryption (via crypt_shared library)
    ○ Atlas (including free tier!)
    ○ Enterprise Advanced
    ● Explicit encryption (via ClientEncryption object)
    ○ Atlas or Enterprise Advanced
    ○ Community

  38. Packages
    ○ Amazon Linux 2 / Amazon Linux 2 ARM 64
    ○ Debian 10.0 / 11.0
    ○ macOS / macOS ARM 64 (Mojave 10.14+)
    ○ RedHat / CentOS 7.0
    ○ RedHat / CentOS 7.2 s390x
    ○ RedHat / CentOS 8.0
    ○ RedHat / CentOS 8.1 ppc64le
    ○ RedHat / CentOS 8.2 ARM 64
    ○ RedHat / CentOS 8.3 s390x
    ○ SUSE 12 / 15
    ○ Ubuntu 18.04 / 18.04 ARM 64
    ○ Ubuntu 20.04 / 20.04 ARM 64
    ○ Windows 10 / Server 2016 / Server 2019

  39. Key Management Enhancements (Coming Soon)

  40. Key Rotation
    • Rotate the entire MongoDB key vault
    • Previously, only new data encryption keys were protected by a new CMK
    • Key Vault rotation replaces all former versions of CMK seamlessly, via a single API call

  41. Key Migration
    • Changing from one key provider to another used to require decrypting and re-encrypting all of your data
    • A single API call now seamlessly migrates your keys from any supported key provider to another one
    ○ AWS - GCP
    ○ Local - Azure
    ○ GCP - KMIP
    • With no impact to your application or data

  42. Custom Key Management
    • Customer can provide Data Encryption Keys used
    • Meets strict compliance requirements for key generation in an HSM
    • API

  43. It takes a village
    Andrew Anna Asya Bernie Boris Clyde
    Craig Dave Davi Divjot Dmitry Elizabeth
    Emily Eric Erwin Esha Ezra Jacob
    Jeff Jesse Judah Julie Kaitlin Katia
    Kevin Mark Mat Nathan Naomi Nick
    Oz Pramod Ravind Rachael Rachelle Ross
    Sam Sara Sergei Shane Shreyas Spencer
    Vincent

  44. Thank you!
