Levelling up database security by thinking in APIs

2020 saw an escalation in the volume, intensity, and tempo of cyber attacks against critical information systems. In Australia, data breaches cost $3.9m on average. Globally, ransomware cost $20B+.

One contributing factor is how we build systems to handle data about our users. No matter if you're using SQL or NoSQL — you're likely still using many of the same techniques from the advent of the web to read and write data.

The last five years have seen big leaps in how developers are designing and building APIs. What if we apply those same techniques to databases? What sort of security improvements can we unlock?

Lindsay Holmwood

September 15, 2021

Transcript

  1. Levelling up
    database security
    by thinking in APIs
    Lindsay Holmwood
    @auxesis
    Chief Product Officer @ CipherStash

  2. The problem

  3. Techniques for building secure APIs have
    improved tremendously over the last decade.
    Database security is mostly unchanged.

  4. Average breach costs
    $4.24m USD
    10% increase in
    average total cost of breach
    between 2020 and 2021

  5. The landscape is changing
    ○ Compliance requirements (e.g.,
    GDPR, CCPA) are becoming
    more stringent
    ○ Ransomware cost $20B
    globally in 2020
    ○ Attackers are becoming more
    sophisticated (exploiting supply
    chains, brokering access) and
    are moving faster
    Notable breaches
    2015 Anthem Health
    80 million health records
    2020 Nintendo
    160,000 user accounts exposed
    2020 BigFooty.com
    132GB sensitive data in Elastic
    2020 Antheus Tecnologia
    81.5 million personal records
    2019 CapitalOne
    100m personal records

  6. Case study: Vastaamo
    In 2020, over 300,000 patient records (including detailed
    consult notes) were leaked and used to extort users.
    Vastaamo’s system violated one of the “first principles of
    cybersecurity”: It didn’t anonymize the records. It didn’t even
    encrypt them. The only thing protecting patients’ confessions and
    confidences were a couple of firewalls and a server login screen.
    Mikael Koivukangas, OneSys Medical

  7. The techniques

  8. Techniques sorted by breach
    Source: IBM Cost of a Data Breach Report 2021
    Compromised credentials

  9. Compromised credentials
    Attackers use stolen credentials to gain access
    to a target.
    Credentials can come from:
    ● Public data breaches
    ● Version control
    ● BEC & phishing
    ● Password stores
    Average time to discovery:
    250 days
    Source: IBM Cost of a Data Breach report 2021
    Source: MITRE ATT&CK

  10. Cloud misconfiguration
    Types of misconfiguration:
    ● Default
    ● Unused features
    ● Untested
    Can be used to:
    ● Expose information
    ● Gain access
    Source: IBM Cost of a Data Breach report 2021
    Source: OWASP Top Ten
    Average time to discovery:
    186 days

  11. SQL injection
    Malicious user input used in SQL queries.
    Can be used to:
    ● Exfil data
    ● Tamper with data
    ● Escalate privileges
    Average time to discovery:
    154 days
    Source: IBM Cost of a Data Breach report 2021
    Source: OWASP Top Ten
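
To make the SQL injection slide concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `users` table and the function names are hypothetical, chosen only for illustration. It contrasts splicing user input into the SQL text with sending it as a bound parameter.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    # Hypothetical in-memory table used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Vulnerable: the input becomes part of the SQL text itself,
    # so a crafted value can change the structure of the query.
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return [row[0] for row in conn.execute(query)]


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Safer: the value is passed separately as a bound parameter
    # and is only ever treated as data.
    rows = conn.execute("SELECT name FROM users WHERE name = ?", (name,))
    return [row[0] for row in rows]
```

With the input `nobody' OR '1'='1`, the unsafe version matches every row in the table, while the parameterised version matches none.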

  12. Person in the Middle
    Observer can:
    ○ view data in transit
    ○ manipulate data in request/response
    Source: OWASP Top Ten

  13. Denial of Service
    Make the service unavailable for legitimate users
    Resource exhaustion (network, CPU, memory, storage, IO)
    Can be used as cover for remote code execution and data exfil
    Source: OWASP Top Ten

  14. What are the big API security
    advances in the last decade?

  15. What can we
    learn from APIs
    and apply to databases?

  16. 1. Standardised
    serialisation
    formats

  17. Serialisation formats
    Strongly typed communication for:
    ● Network transport
    ● Storage
    Reduces attack surface, to mitigate attacks like
    ● SQL injection

  18. Example: Protocol Buffers
    Binary representation of data
    structures:
    1. Describe data structure using
    built in types
    2. Compile bindings for languages
    3. Encode/decode data structure in
    efficient binary format
    Supports basic backwards
    compatibility via tags.
    syntax = "proto2";  // the "required"/"optional" labels below are proto2 syntax

    // RPC service that takes a SearchRequest and returns a SearchResponse
    service SearchService {
      rpc Search(SearchRequest) returns (SearchResponse);
    }
    message SearchRequest {
      required string query = 1;
      optional int32 page_number = 2;
      optional int32 result_per_page = 3;
    }
    message SearchResponse {
      repeated Result results = 1;
    }
    message Result {
      optional string url = 1;
      optional string title = 2;
      repeated string snippets = 3;
    }
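
As a rough illustration of how tags support backwards compatibility, the sketch below hand-encodes two Protocol Buffers wire types (varint and length-delimited) in plain Python. It is a simplified teaching aid, not the generated bindings a real project would use. The helper names are hypothetical, and it assumes non-negative integers and well-formed input.

```python
def encode_varint(n: int) -> bytes:
    # Base-128 varint: 7 bits per byte, high bit set on all but the last byte.
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int):
    # Returns (value, position after the varint).
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_int(field: int, value: int) -> bytes:
    # Key = (field number << 3) | wire type; wire type 0 is varint.
    return encode_varint((field << 3) | 0) + encode_varint(value)


def encode_string(field: int, value: str) -> bytes:
    # Wire type 2 is length-delimited: key, byte length, then the bytes.
    data = value.encode("utf-8")
    return encode_varint((field << 3) | 2) + encode_varint(len(data)) + data


def decode_known(buf: bytes, known: set) -> dict:
    # Decode only the field numbers this reader knows about and skip the
    # rest, which is what lets old readers accept messages with new fields.
    fields, pos = {}, 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        field, wire_type = key >> 3, key & 0x7
        if wire_type == 0:
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:
            length, pos = decode_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if field in known:
            fields[field] = value
    return fields
```

A reader that only knows fields 1 and 2 can still decode a message that also carries field 3; the unknown field is skipped rather than treated as an error.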

  19. Example: BSON
    Lightweight binary representation of
    data structures.
    Binary encoding of JSON-like data
    (includes field names in encoded
    data).
    Handle marshal/unmarshal in each
    language.
    {"hello": "world"} →
    \x16\x00\x00\x00 // total document size
    \x02 // 0x02 = type String
    hello\x00 // field name
    \x06\x00\x00\x00world\x00 // field value
    \x00 // 0x00 = type EOO
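
The byte layout above can be reproduced with a few lines of Python. This is a minimal sketch that handles only string keys and string values (BSON element type 0x02); the function name is hypothetical, and a real application would use a maintained BSON library instead.

```python
import struct


def bson_encode_strings(doc: dict) -> bytes:
    # Encodes a flat dict of str -> str as a BSON document.
    body = b""
    for key, value in doc.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError("this sketch only handles string keys and values")
        data = value.encode("utf-8") + b"\x00"           # string plus terminator
        body += b"\x02" + key.encode("utf-8") + b"\x00"  # type byte + field name
        body += struct.pack("<i", len(data)) + data      # int32 length + value
    body += b"\x00"                                      # end-of-object marker
    # The leading int32 is the total size, including its own four bytes.
    return struct.pack("<i", len(body) + 4) + body
```

Encoding `{"hello": "world"}` with this function yields the same 22 bytes shown on the slide.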

  20. For databases?

  21. Serialisation formats for databases
    Build secure clients, faster:
    ● Automatically generate clients for different languages
    ● Automatically generate documentation
    ● Backwards compatibility baked in

  22. Serialisation formats — defend against:
    Deserialization attacks:
    ● Injection — data injection, only support primitive data types
    ● Privilege escalation — gaining RCE through object deserialisation
    Denial of Service attacks:
    ● Resource exhaustion — drop and log bad deserialisations
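
One way to read the points on this slide is as a strict decoder: accept only primitive types, cap the payload size, and drop and log anything that fails. The sketch below uses JSON from the standard library as a stand-in for whichever format is in use. The schema, the size limit, and the function name are all hypothetical.

```python
import json
import logging

logger = logging.getLogger("deserialise")

MAX_BYTES = 1024                              # hypothetical size limit
SCHEMA = {"query": str, "page_number": int}   # hypothetical request schema


def parse_request(raw: bytes):
    """Return a validated dict, or None if the payload was dropped."""
    if len(raw) > MAX_BYTES:
        logger.warning("dropped payload: %d bytes exceeds limit", len(raw))
        return None
    try:
        data = json.loads(raw)
    except (ValueError, RecursionError) as exc:
        # Covers invalid JSON, invalid UTF-8, and pathological nesting.
        logger.warning("dropped payload: %s", exc)
        return None
    if not isinstance(data, dict) or set(data) != set(SCHEMA):
        logger.warning("dropped payload: unexpected fields")
        return None
    for key, expected in SCHEMA.items():
        # Exact type match on primitives only; bool is not accepted as int.
        if type(data[key]) is not expected:
            logger.warning("dropped payload: bad type for %r", key)
            return None
    return data
```

Because the decoder only ever produces strings and integers, there is no path from input bytes to arbitrary object construction, which is the property that blocks deserialisation-based code execution.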

  23. Serialisation formats — but also consider:
    Defence in depth:
    ● Use strongly typed languages to stop injection attacks
    propagating from client to server
    “New” attacks like request smuggling

  24. 2. RPC

  25. RPC — before
    Single Request/Response APIs:
    ● CORBA
    ● SOAP (HTTP, XML)
    ● XML-RPC
    ● REST (HTTP, URI, JSON, XML)
    Databases:
    ● Unique wire protocols

  26. RPC — now
    Use code generation to handle:
    ● Routes
    ● Serialisation
    ● HTTP methods, request/response headers
    ● Errors

  27. Example: gRPC
    From Google
    Uses protobufs
    Requires HTTP/2
    Bidirectional streaming

  28. Example: Twirp
    From Twitch
    Supports binary and JSON payloads
    HTTP 1.1 only
    No bidirectional streaming

  29. Example: GraphQL
    “Query language for APIs”
    Single API endpoint.
    Clients request the data and the
    structure.
    New fields and types can be added
    without affecting existing queries.
    Query:
    {
      person {
        name
        height
      }
    }
    Response:
    {
      "person": {
        "name": "Ada Lovelace",
        "height": 166
      }
    }

  30. For databases?

  31. RPC for databases
    Ensure protocol compatibility between client and server
    ● Force clients to upgrade to latest versions
    Reduce attack surface
    ● To only what the endpoint explicitly exposes
    ● Stop enumeration
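
A minimal sketch of the "only what the endpoint explicitly exposes" idea: a dispatcher with an explicit allow-list of methods that returns the same error for anything else, so a caller cannot tell an internal function from a name that does not exist. All method and function names here are hypothetical.

```python
class RpcError(Exception):
    """Generic error returned to callers; deliberately uninformative."""


def get_record(record_id: int) -> dict:
    # Hypothetical handler that is meant to be callable remotely.
    return {"id": record_id}


def _drop_all_records() -> None:
    # Hypothetical internal helper that must never be reachable over RPC.
    raise RuntimeError("internal only")


# Explicit allow-list: only names registered here can be invoked.
EXPOSED = {"GetRecord": get_record}


def dispatch(method: str, **params):
    handler = EXPOSED.get(method)
    if handler is None:
        # Same response for internal names and non-existent names,
        # which gives an attacker nothing to enumerate.
        raise RpcError("unknown method")
    return handler(**params)
```

Generated RPC stubs give a similar property for free, since the service definition is the only source of callable routes.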

  32. RPC — defend against:
    Broken authentication
    ● Session timeouts to limit foothold, through short lived tokens
    Broken access controls
    ● Privilege escalation, through scoped credentials
    Denial of service
    ● Strict encoding and deserialization
    ● Logging of deserialization failures

  33. RPC — but also consider:
    gRPC reflection
    ● Enumerates gRPC services
    ● Exposes protobufs in human readable format (arguments, fields)
    You can use this now!
    ● ProfaneDB defines schema in protobufs and talks gRPC

  34. 3. Auth

  35. Auth — before
    Authentication:
    ● Challenge–Response authentication
    ● Secure Remote Password protocol
    ● Client certificate authentication

  36. Auth — now
    Authentication:
    ● OAuth2 / JWT
    ● SAML
    ● Self managed identity via G Suite, O365
    Proliferation of third party IDP
    ● Auth0
    ● Ping
    ● Okta

  37. For databases?

  38. Auth for databases
    Don’t roll your own auth — use third party identity provider
    Untrusted clients, trusted servers:
    ● Client authenticates to IDP
    ● IDP sets up session with database
    ● Database is ignorant of users — only knows if IDP gives an OK
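
The flow above can be sketched with a signed, short-lived, scoped token. The example below builds and checks an HS256 JWT using only the Python standard library. It assumes a secret shared between the IDP and the database, which is a simplification: real identity providers typically sign with asymmetric keys, and a production system would use a maintained JWT library. The function names and the `scope` claim layout are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, signing_input: str) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(secret: bytes, subject: str, scope: str, ttl: int, now=None) -> str:
    # Stands in for the IDP: issues a token that expires after `ttl` seconds.
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": subject, "scope": scope, "exp": now + ttl}
    parts = [
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    ]
    signing_input = ".".join(parts)
    return signing_input + "." + _b64url_encode(_sign(secret, signing_input))


def verify_token(secret: bytes, token: str, required_scope: str, now=None) -> dict:
    # Stands in for the database: it holds no user table and only checks
    # that the IDP vouched for this request, recently, with the right scope.
    now = int(time.time()) if now is None else now
    try:
        header_b64, claims_b64, sig_b64 = token.split(".")
        signature = _b64url_decode(sig_b64)
        expected = _sign(secret, header_b64 + "." + claims_b64)
        if not hmac.compare_digest(signature, expected):
            raise PermissionError("bad signature")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(claims_b64))
    except ValueError as exc:
        raise PermissionError("malformed token") from exc
    if not isinstance(header, dict) or not isinstance(claims, dict):
        raise PermissionError("malformed token")
    if header.get("alg") != "HS256":
        raise PermissionError("unexpected algorithm")
    exp = claims.get("exp")
    if not isinstance(exp, int) or exp <= now:
        raise PermissionError("token expired")
    if required_scope not in str(claims.get("scope", "")).split():
        raise PermissionError("missing scope")
    return claims
```

The short expiry limits how long a stolen token is useful, and the scope check limits what it can do while it is valid.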

  39. Auth for databases
    Benefits:
    ● Less code, lower ongoing costs
    ● Database is integrated with broader organisational IAM controls
    You can use this now!
    ● MongoDB, OpenSearch, CouchDB all support JWT authentication

  40. Auth — defend against:
    Broken authentication
    ● Limit impact of compromised credentials and account takeovers
    ⬆ involved in 20% of all breaches
    Broken access controls
    ● Privilege escalation, through strictly scoped credentials

  41. 4. TLS everywhere

  42. TLS — before
    Certs were costly!
    Economise by not using TLS everywhere:
    ● TLS termination at your load balancers
    ● Unencrypted from load balancers onwards
    Poor automation for managing cert lifecycle
    Poor visibility into certificate supply chain

  43. TLS — now
    Certificates are basically free
    Proliferation of end-to-end TLS
    Better developer experience for the entire lifecycle:
    ○ Let’s Encrypt — automates nearly the entire cert lifecycle
    ○ mkcert — can use certs in local dev
    Certificate Transparency logs create supply chain visibility

  44. For databases?

  45. TLS for databases
    Terminate TLS in the database server itself
    Handle the cert lifecycle in the database server itself
    Use well-automated PKI infrastructure
    Strictly use Forward Secrecy ciphers (ECDHE, DHE)
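
As one possible illustration of terminating TLS in the server process with forward-secret ciphers only, the sketch below configures a server-side context with Python's `ssl` module. The cipher string and the function name are assumptions made for this example, and the certificate paths are placeholders to be supplied by whatever PKI automation is in use.

```python
import ssl

# OpenSSL cipher string limited to ephemeral (EC)DHE key exchange.
# The exact selection is an assumption for this sketch.
FORWARD_SECRET_CIPHERS = "ECDHE+AESGCM:ECDHE+CHACHA20:DHE+AESGCM"


def make_server_context(certfile=None, keyfile=None) -> ssl.SSLContext:
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Refuse anything older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Restrict TLS 1.2 suites to forward-secret ones. TLS 1.3 suites are
    # configured separately by OpenSSL and always use ephemeral key exchange.
    ctx.set_ciphers(FORWARD_SECRET_CIPHERS)
    if certfile:
        # Certificate and key would come from the automated PKI.
        ctx.load_cert_chain(certfile, keyfile)
    return ctx
```

The resulting context can be passed to a listening socket via `ctx.wrap_socket(sock, server_side=True)` once a certificate has been loaded.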

  46. TLS — defend against:
    Sensitive data exposure:
    ● Observer can view data in transit (PITM)
    Injection attacks:
    ● Attacker can inject data into request/response (PITM)
    Replay attacks (with TLS 1.2):
    ● Attacker can perform operations repeatedly
    Impersonation:
    ● Monitor cert transparency logs for compromised CAs

  47. TLS — but also consider:
    Easier passive asset discovery:
    ● Cert transparency logs fast-track some asset discovery
    $ subfinder -silent -d cipherstash.com
    discuss.cipherstash.com
    landing.cipherstash.com
    docs.cipherstash.com
    dev.cipherstash.com

  48. Zero trust

  49. “never trust, always verify”
    Build all your systems like they are connected to the public internet
    All input is untrusted — sanitise everything
    Expose database to the network?

  50. Thank you!
    🙋 What questions do you have?
    💖 the talk? Let @auxesis know.

  51. Appendix: Data Serialization Formats
    ● Protocol Buffers [developers.google.com]
    ● BSON [bsonspec.org]
    ● Apache Avro [avro.apache.org]

  52. Appendix: JWT-based database authentication
    ● Custom JWT Authentication [docs.mongodb.com]
    ● Use JSON Web Tokens (JWTs) to Authenticate in Open Distro for
    Elasticsearch and Kibana [aws.amazon.com]
    ● Authentication — Apache CouchDB [docs.couchdb.org]

  53. Appendix: Attack Techniques
    ● HTTP Request Smuggling [portswigger.net]
    ● Credential Access techniques [attack.mitre.org]

  54. Other security advances I didn’t have time to cover
    ● Web Application Firewalls
    ● Infracode static analysis
    ○ Semgrep
    ● Reproducible builds
    ○ Bazel
