Search over encrypted records: from academic dreams to production-ready tool

Search over encrypted data is a modern cryptographic engineering problem. We will talk about existing approaches (both well-known and modern) and concentrate on a practical solution based on the blind index technique for searching data in databases. What’s inside: cryptographic and functional schemes, implementation details, practical security evaluation (risk modelling and potential attacks). We will show how theoretical models turn into real, usable, maintainable security tools.

Artem Storozhuk

May 17, 2019

Transcript

  1. Search over encrypted records:
    from academic dreams to
    production-ready tool

  2. Artem Storozhuk
    Security Software Engineer
    at Cossack Labs
    [email protected]

  3. Database as a Service (DBaaS)

  4. DBaaS security drawbacks
    non-sensitive data
    sensitive data

  5. DBaaS security drawbacks
    non-sensitive data
    sensitive data
    1. Untrusted DBA.
    2. Hacker with root access.
    3. Change of storage provider ownership.

  6. Encryption is a solution
    1. Whole database encryption

  7. Encryption is a solution
    1. Whole database encryption
    2. Searchable encryption

  8. Searchable encryption techniques

  9. Searchable encryption techniques
    SWP, Goh, CM-I, CM-II, CGK+-I, CGK+-II, ABO, LSD-I, LSD-II, CK, KO, KPR, KP, GSW-I, GSW-II, BKM, RT,
    WWP-III, CJJ+, PKL+, ABC+, SSW, LWW+, BTH+, KIK, BC, RVB+, YLW, BCO+, ABC++, BSS-I, CS, Khader, BSS-II,
    RPS+-I, TC, ZI, RPS+-II, INH+, PKL, PCL, HL, BW, SBC+, BCK, BBO, DRD-I, DRD-II, BDD+, HLm, WWP-I,
    WWP-II, WWP-IIIm, WWP-IV.

  10. Index-based searchable encryption
    I - secure index (pointer to the encrypted message);
    T - trapdoor (allows the server to identify the encrypted message without revealing its plaintext).
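
A minimal sketch of the I / T idea above, assuming the simplest keyed-hash instantiation, in which the secure index entry and the trapdoor are the same HMAC value. The names `token` and `Server` are hypothetical and not taken from any scheme listed in the deck; other schemes derive I and T differently.

```python
import hashlib
import hmac
from collections import defaultdict


def token(key: bytes, keyword: str) -> bytes:
    """Keyed, deterministic token: serves as secure index entry (I) and as trapdoor (T)."""
    return hmac.new(key, keyword.encode("utf-8"), hashlib.sha256).digest()


class Server:
    """Untrusted storage: it only ever sees tokens and opaque ciphertexts."""

    def __init__(self) -> None:
        self.index = defaultdict(list)  # I: token -> record ids
        self.records = {}               # record id -> ciphertext

    def store(self, record_id: int, ciphertext: bytes, index_token: bytes) -> None:
        self.records[record_id] = ciphertext
        self.index[index_token].append(record_id)

    def search(self, trapdoor: bytes) -> list:
        # The server matches the trapdoor against the index without the key.
        return [self.records[i] for i in self.index.get(trapdoor, [])]


key = b"k" * 32  # demo key only (assumption); real keys come from a KMS
server = Server()
server.store(1, b"<ciphertext 1>", token(key, "alice"))
server.store(2, b"<ciphertext 2>", token(key, "bob"))
server.store(3, b"<ciphertext 3>", token(key, "alice"))
matches = server.search(token(key, "alice"))
```

The client keeps the key; the server can answer equality queries but learns which records share a keyword.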

  11. Searchable encryption tradeoff
    SECURITY (ability to resist cryptanalytic attacks)
    EFFICIENCY (query latency)
    QUERY EXPRESSIVENESS (equality, conjunction, comparison, subset, range, wildcard)
    ARCHITECTURE (outsourcing / sharing)

  12. Searchable encryption security
    Information about objects that may be leaked:
    1) Order
    2) Equalities
    3) Predicates
    4) Identifiers
    5) Structure

  13. Searchable encryption security
    Information about objects that may be leaked:
    1) Order
    2) Equalities
    3) Predicates
    4) Identifiers
    5) Structure
    Groups of leakage:
    1) Secure index metadata
    2) Search pattern
    3) Access pattern

  14. Searchable encryption security
    Model of untrusted storage provider:
    1) Honest-but-curious
    2) Malicious
    Information about objects that may be leaked:
    1) Order
    2) Equalities
    3) Predicates
    4) Identifiers
    5) Structure
    Groups of leakage:
    1) Secure index metadata
    2) Search pattern
    3) Access pattern

  15. Searchable encryption security
    Model of untrusted storage provider:
    1) Honest-but-curious
    2) Malicious
    Information about objects that may be leaked:
    1) Order
    2) Equalities
    3) Predicates
    4) Identifiers
    5) Structure
    Groups of leakage:
    1) Secure index metadata
    2) Search pattern
    3) Access pattern
    Strongest security definition (Curtmola et al., 2006) [schemes exist only in theory]:
    Nothing should be leaked.
    Full security definition (Shen et al., 2009) [implemented schemes exist but are inefficient in production]:
    Nothing should be leaked except the access pattern.
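
To make the three groups of leakage concrete, here is a rough sketch of what an honest-but-curious server could record while answering queries correctly. The class `CuriousServer` and its method names are hypothetical, and the index is assumed to be a plain token-to-record-ids mapping.

```python
from collections import Counter


class CuriousServer:
    """Follows the protocol, but logs everything it is allowed to see."""

    def __init__(self, index: dict) -> None:
        self.index = index      # opaque token -> list of record ids
        self.query_log = []     # every trapdoor received, in order
        self.access_log = []    # record ids returned for each query

    def search(self, trapdoor: bytes) -> list:
        ids = self.index.get(trapdoor, [])
        self.query_log.append(trapdoor)
        self.access_log.append(tuple(ids))
        return ids

    def index_metadata(self) -> dict:
        """Secure index metadata: how many records share each token (equalities)."""
        return {tok: len(ids) for tok, ids in self.index.items()}

    def search_pattern(self) -> Counter:
        """Search pattern: which (opaque) queries repeat, and how often."""
        return Counter(self.query_log)

    def access_pattern(self) -> dict:
        """Access pattern: which records each distinct query touched."""
        return dict(zip(self.query_log, self.access_log))


server = CuriousServer({b"t1": [1, 3], b"t2": [2]})
for trapdoor in (b"t1", b"t2", b"t1"):
    server.search(trapdoor)
```

None of the logged values is plaintext, yet together they are the input to the inference attacks discussed next.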

  16. Leakage inference attacks

  17. Leakage inference attacks
    Count Attack – 40% keyword recovery rate with 80% of the
    dataset known to the attacker.
    Works well if the keyword universe size is at most 5000.

  18. Leakage inference attacks
    Count Attack – 40% keyword recovery rate with 80% of the
    dataset known to the attacker.
    Works well if the keyword universe size is at most 5000.
    Hierarchical-Search Attack – extension of the Count Attack,
    40% keyword recovery rate under the condition that at least
    40% of the data leaks.
    The attacker could inject a set of constructed records.
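
A simplified sketch of the first step of a count-style attack, assuming the attacker knows the dataset well enough that keyword result counts match what the server observes. Only keywords with a unique count are recovered here; the published attack additionally uses co-occurrence counts to resolve ties. The function name and the sample data are hypothetical.

```python
from collections import Counter


def count_attack(known_counts: dict, observed_counts: dict) -> dict:
    """Map opaque tokens to keywords when a result count is unique on both sides.

    known_counts:    keyword -> number of records containing it (attacker's knowledge)
    observed_counts: token   -> number of records returned for it (server's view)
    """
    keyword_freq = Counter(known_counts.values())
    token_freq = Counter(observed_counts.values())
    # Keep only counts that identify exactly one keyword.
    unique_keyword = {c: kw for kw, c in known_counts.items() if keyword_freq[c] == 1}
    return {
        tok: unique_keyword[c]
        for tok, c in observed_counts.items()
        if token_freq[c] == 1 and c in unique_keyword
    }


known = {"alice": 5, "bob": 3, "carol": 3, "dave": 1}
observed = {b"t1": 5, b"t2": 3, b"t3": 3, b"t4": 1}
recovered = count_attack(known, observed)
# "bob" and "carol" share a count, so b"t2" and b"t3" stay unresolved.
```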

  19. How did we select an SE scheme?
    1. open source
    2. strong & proven
    3. fast & reliable
    4. without security design flaws

  20. Available SE solutions
    CryptDB [2011]:
    - https://css.csail.mit.edu/cryptdb/
    - http://people.csail.mit.edu/nickolai/papers/raluca-cryptdb.pdf
    - https://eprint.iacr.org/2015/979.pdf
    - https://github.com/CryptDB/cryptdb
    Mylar [2013]:
    - https://css.csail.mit.edu/mylar/
    - https://css.csail.mit.edu/mylar/mylar.pdf
    - https://github.com/strikeout/mylar
    CipherSweet [2018]:
    - https://paragonie.com/blog/2019/01/ciphersweet-searchable-encryption-doesn-t-have-be-bitter
    - https://github.com/paragonie/ciphersweet

  21. CryptDB

  22. CryptDB (onion cryptography)
    Strengths: query expressiveness, efficiency
    Weakness: security

  23. Mylar

  24. CipherSweet

  25. CipherSweet
    1) INSERT:
    INSERT INTO test_table(IndexFieldA, FieldA, FieldB) VALUES (MAC(dataA),Encrypt(dataA),dataB)
    2) SELECT:
    rows = select FieldA, FieldB from test_table where IndexFieldA=MAC(dataA)
    Decrypt(rows.FieldA)
    IndexFieldA | FieldA    | FieldB
    MAC         | ENCRYPTED | dataB
    ...         | ...       | ...
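
The INSERT / SELECT flow above can be sketched end to end with Python's standard library. This is an illustration under stated assumptions, not CipherSweet's implementation: the keys are hard-coded demo values, SQLite stands in for the database, and `encrypt` / `decrypt` are a toy HMAC-keystream construction used only so the example runs without third-party packages (use real authenticated encryption in practice).

```python
import hashlib
import hmac
import os
import sqlite3

INDEX_KEY = b"\x01" * 32  # hypothetical demo keys; never hard-code real keys
DATA_KEY = b"\x02" * 32


def mac(data: bytes) -> bytes:
    """Blind index: deterministic keyed MAC of the plaintext."""
    return hmac.new(INDEX_KEY, data, hashlib.sha256).digest()


def _keystream(nonce: bytes, length: int) -> bytes:
    out, counter = b"", 0
    while len(out) < length:
        block = nonce + counter.to_bytes(4, "big")
        out += hmac.new(DATA_KEY, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt(data: bytes) -> bytes:
    """Toy randomized encryption (NOT for production): nonce || data XOR keystream."""
    nonce = os.urandom(16)
    return nonce + bytes(a ^ b for a, b in zip(data, _keystream(nonce, len(data))))


def decrypt(blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    return bytes(a ^ b for a, b in zip(body, _keystream(nonce, len(body))))


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE test_table (IndexFieldA BLOB, FieldA BLOB, FieldB TEXT)")


def insert(data_a: bytes, data_b: str) -> None:
    db.execute(
        "INSERT INTO test_table (IndexFieldA, FieldA, FieldB) VALUES (?, ?, ?)",
        (mac(data_a), encrypt(data_a), data_b),
    )


def select(data_a: bytes) -> list:
    rows = db.execute(
        "SELECT FieldA, FieldB FROM test_table WHERE IndexFieldA = ?",
        (mac(data_a),),
    ).fetchall()
    return [(decrypt(field_a), field_b) for field_a, field_b in rows]


insert(b"alice@example.com", "row 1")
insert(b"bob@example.com", "row 2")
found = select(b"alice@example.com")
```

The database compares only MAC values; plaintext and both keys stay on the application side.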

  26. CipherSweet
    IndexFieldA | FieldA    | FieldB
    MAC [<32]   | ENCRYPTED | dataB
    ...         | ...       | ...
    IndexFieldA | FieldA    | FieldB
    MAC [32]    | ENCRYPTED | dataB
    ...         | ...       | ...

  27. CipherSweet
    MAC length <==> Probability of index collision <==> Probability of “false positives” in SELECT response
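
A back-of-the-envelope sketch of that relationship, assuming the MAC output behaves like a uniformly random value, so that a non-matching plaintext shares a b-bit truncated index with the query with probability 2^-b. Function names are hypothetical; lengths are in bits here rather than bytes.

```python
import hashlib
import hmac


def truncated_index(key: bytes, data: bytes, index_bits: int) -> int:
    """Keep only the top index_bits bits (1..256) of an HMAC-SHA256 value."""
    digest = hmac.new(key, data, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") >> (256 - index_bits)


def false_positive_probability(index_bits: int) -> float:
    """Chance that one non-matching value collides with the queried index."""
    return 2.0 ** -index_bits


def expected_false_positives(distinct_values: int, index_bits: int) -> float:
    """Expected number of extra rows' worth of colliding values per equality query."""
    return (distinct_values - 1) * false_positive_probability(index_bits)


# One million distinct values:
short_index = expected_false_positives(1_000_000, 16)  # roughly 15 colliding values
long_index = expected_false_positives(1_000_000, 64)   # practically zero
```

A shorter index hides more (many plaintexts map to one index value) at the cost of rows the application must decrypt and discard.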

  28. CipherSweet
    MAC length <==> Probability of index collision <==> Probability of “false positives” in SELECT response
    Application Database
    FieldA FieldB
    ENCRYPTED ...
    ENCRYPTED ...
    FieldA FieldB
    0x0123456 ...
    0x0125676 ...

  29. CipherSweet
    Application Database
    FieldA FieldB
    ENCRYPTED ...
    ENCRYPTED ...
    FieldA FieldB
    0x0123456 ...
    0x0125676 ...
    select * from test_table where FieldA=0x0123456

  30. github.com/cossacklabs/acra
    www.cossacklabs.com/acra/

  31. Acra – database encryption proxy
    AcraSE
    - Data encryption (separate keys per app, per user)
    - Authentication (transport, access control lists for application compartmentalization)
    - Query policy (a separate SQL firewall module)
    - Intrusion detection (poison records)
    - Key management (key rotation utility)
    - Monitoring and observability (logging, metrics, tracing)

  32. AcraSE cryptographic design

  33. AcraSE cryptographic design
                                   | Application | AcraServer | Database
    Able to encrypt Data           | +/-         | +          | -
    Able to decrypt Data           | -           | +          | -
    Able to calculate Secure Index | -           | +          | -

  34. AcraSE cryptographic design
    INSERT query transparent mode
    insert into test_table(A, B) values (, )
    changed to
    insert into test_table(A, B) values (, )
    INSERT query standard mode
    insert into test_table(A, B) values (, )
    changed to
    insert into test_table(A, B) values (, )

  35. AcraSE cryptographic design
    SELECT query
    select * from test_table where A=
    changed to
    select * from test_table where substring("A" from 1 for MAC_BYTE_LEN)=
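
A rough sketch of the prefix comparison in the rewritten SELECT. It assumes, as the substring(...) expression suggests, that column A stores the MAC followed by the ciphertext; that storage layout is an assumption, not a statement of Acra's actual format. SQLite's `substr` stands in for the `substring(... from ... for ...)` syntax, the key is a demo value, and random bytes stand in for real ciphertext.

```python
import hashlib
import hmac
import os
import sqlite3

MAC_BYTE_LEN = 32
INDEX_KEY = b"\x01" * 32  # hypothetical demo key


def mac(data: bytes) -> bytes:
    return hmac.new(INDEX_KEY, data, hashlib.sha256).digest()


def stored_value(data: bytes) -> bytes:
    """MAC prefix followed by a stand-in ciphertext (random bytes, not real encryption)."""
    return mac(data) + os.urandom(len(data) + 16)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE test_table (A BLOB, B TEXT)")
for value, label in ((b"alice", "row 1"), (b"bob", "row 2"), (b"alice", "row 3")):
    db.execute("INSERT INTO test_table (A, B) VALUES (?, ?)", (stored_value(value), label))


def find(data: bytes) -> list:
    # Compare only the first MAC_BYTE_LEN bytes of the stored blob with the query MAC.
    rows = db.execute(
        "SELECT B FROM test_table WHERE substr(A, 1, ?) = ? ORDER BY B",
        (MAC_BYTE_LEN, mac(data)),
    ).fetchall()
    return [b for (b,) in rows]


hits = find(b"alice")
```

Because the proxy performs the rewrite, the application keeps issuing ordinary equality queries.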

  36. AcraSE configuration
    Main configuration (YAML) Encryption configuration

  37. AcraSE proxy design benefit

  38. Future work
    1) Secure Index truncation and false positives filtering.
    2) Performance evaluation.
    3) Extension of query expressiveness.
    4) Data entropy learning.
    github.com/cossacklabs/acra

  39. Conclusions
    1) Searchable encryption is a modern field and is not yet
    fully stable.
    2) There is a lack of existing SQL solutions.
    3) The secure (blind) indexing approach is one of the reliable
    techniques for building secure SE schemes.

  40. Reading list
    http://cs.brown.edu/~seny/
    https://www.usenix.org/system/files/conference/osdi16/osdi16-papadimitriou.pdf
    https://subs.emis.de/LNI/Proceedings/Proceedings228/115.pd
    https://inst.eecs.berkeley.edu/~cs261/fa17/scribe/08_28_encdata.pdf

  41. Thank you! Any questions?
    Artem Storozhuk
    [email protected]
