Levelling up database security by thinking in APIs

2020 saw an escalation in the volume, intensity, and tempo of cyber attacks against critical information systems. In Australia, data breaches cost $3.9m on average. Globally, ransomware cost $20B+.

One contributing factor is how we build systems to handle data about our users. Whether you're using SQL or NoSQL, you're likely still using many of the same techniques from the early days of the web to read and write data.

The last five years have seen big leaps in how developers are designing and building APIs. What if we apply those same techniques to databases? What sort of security improvements can we unlock?


Lindsay Holmwood

September 15, 2021


  1. Levelling up database security by thinking in APIs

    Lindsay Holmwood (@auxesis), Chief Product Officer @ CipherStash
  2. The problem

  3. Techniques for building secure APIs have improved tremendously over the last decade. Database security is mostly unchanged.
  4. The average breach costs $4.24m USD

    A 10% increase in the average total cost of a breach between 2020 and 2021
  5. The landscape is changing

    ◦ Compliance requirements (e.g., GDPR, CCPA) are becoming more stringent
    ◦ Ransomware cost $20B globally in 2020
    ◦ Attackers are becoming more sophisticated (exploiting supply chains, brokering access) and are moving faster
    Notable breaches:
    • 2015, Anthem Health: 80 million health records
    • 2020, Nintendo: 160,000 user accounts exposed
    • 2020, BigFooty.com: 132GB of sensitive data in Elastic
    • 2020, Antheus Tecnologia: 81.5 million personal records
    • 2019, Capital One: 100m personal records
  6. Case study: Vastaamo

    In 2020, over 300,000 patient records (including detailed consult notes) were leaked and used to extort users.
    “Vastaamo’s system violated one of the ‘first principles of cybersecurity’: It didn’t anonymize the records. It didn’t even encrypt them. The only thing protecting patients’ confessions and confidences were a couple of firewalls and a server login screen.” (Mikael Koivukangas, OneSys Medical)
  7. The techniques

  8. Techniques sorted by breach

    [Chart; label shown: Compromised credentials]
    Source: IBM Cost of a Data Breach Report 2021
  9. Compromised credentials

    Attackers use stolen credentials to gain access to a target. Credentials can come from:
    • Public data breaches
    • Version control
    • Business email compromise (BEC) & phishing
    • Password stores
    Average time to discovery: 250 days
    Sources: IBM Cost of a Data Breach Report 2021; MITRE ATT&CK
  10. Cloud misconfiguration

    Types of misconfiguration:
    • Default
    • Unused features
    • Untested
    Can be used to:
    • Expose information
    • Gain access
    Average time to discovery: 186 days
    Sources: IBM Cost of a Data Breach Report 2021; OWASP Top Ten
  11. SQL injection

    Malicious user input used in SQL queries. Can be used to:
    • Exfiltrate data
    • Tamper with data
    • Escalate privileges
    Average time to discovery: 154 days
    Sources: IBM Cost of a Data Breach Report 2021; OWASP Top Ten
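The SQL injection attack can be reproduced in a few lines of Python with the standard-library `sqlite3` module. The table, data and function names here are hypothetical. The two lookups differ in one way only: the first splices user input into the SQL text, and the second sends it as a bound parameter, so the input can never change the query's structure.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    # In-memory database with a hypothetical users table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "alice@example.com"), ("bob", "bob@example.com")],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # VULNERABLE: quotes inside `name` become part of the SQL statement.
    query = f"SELECT email FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterised: the driver passes `name` as a typed value, never as SQL.
    return conn.execute("SELECT email FROM users WHERE name = ?", (name,)).fetchall()


if __name__ == "__main__":
    conn = setup()
    payload = "' OR '1'='1"
    print(find_user_unsafe(conn, payload))  # returns every row in the table
    print(find_user_safe(conn, payload))    # no user has that literal name
```

The parameterised version is what "strongly typed communication" buys at the smallest scale: the query and the data travel separately.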
  12. Person in the Middle

    Observer can:
    ◦ view data in transit
    ◦ manipulate data in request/response
    Source: OWASP Top Ten
  13. Denial of Service

    Make the service unavailable for legitimate users through resource exhaustion (network, CPU, memory, storage, IO).
    Can be used as cover for remote code execution and data exfiltration.
    Source: OWASP Top Ten
  14. What are the big API security advances in the last

  15. What can we learn from APIs and apply to databases?

  16. 1. Standardised serialisation formats

  17. Serialisation formats

    Strongly typed communication for:
    • Network transport
    • Storage
    Reduces attack surface, to mitigate attacks like:
    • SQL injection
  18. Example: Protocol Buffers

    Binary representation of data structures:
    1. Describe the data structure using built-in types
    2. Compile bindings for each language
    3. Encode/decode the data structure in an efficient binary format
    Supports basic backwards compatibility via tags (field numbers).

        syntax = "proto3";

        service SearchService {
          rpc Search(SearchRequest) returns (SearchResponse);
        }

        message SearchRequest {
          string query = 1;
          int32 page_number = 2;
          int32 result_per_page = 3;
        }

        message SearchResponse {
          repeated Result results = 1;
        }

        message Result {
          string url = 1;
          string title = 2;
          repeated string snippets = 3;
        }
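"Backwards compatibility via tags" can be made concrete with a hand-rolled sketch of the protobuf wire format for the SearchRequest fields. Each field is prefixed with a tag that combines its field number and wire type, so a reader can skip field numbers it doesn't know. This is for illustration only. It is not the protobuf library's API; real code would use the bindings generated by `protoc`. The helper names are hypothetical, and only varint and length-delimited (string) fields are handled.

```python
VARINT, LEN = 0, 2  # the two protobuf wire types this sketch supports


def encode_varint(value: int) -> bytes:
    # Base-128 varint: 7 bits per byte, least-significant group first, with the
    # high bit set on every byte except the last. Negative ints are out of scope.
    if value < 0:
        raise ValueError("sketch only encodes non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    # Returns (value, position just after the varint).
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_field(number: int, value) -> bytes:
    # Every field starts with a tag: (field number << 3) | wire type.
    if isinstance(value, int):
        return encode_varint(number << 3 | VARINT) + encode_varint(value)
    data = value.encode("utf-8")
    return encode_varint(number << 3 | LEN) + encode_varint(len(data)) + data


def decode(buf: bytes, known: dict[int, str]) -> dict[str, object]:
    # The wire type says how to skip a field, so a reader can step over field
    # numbers it has never heard of: old readers tolerate newer writers.
    fields, pos = {}, 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x7
        if wire_type == VARINT:
            value, pos = decode_varint(buf, pos)
        elif wire_type == LEN:
            # Length-delimited payloads are treated as UTF-8 strings here.
            length, pos = decode_varint(buf, pos)
            value = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if number in known:
            fields[known[number]] = value
    return fields
```

Take the message `encode_field(1, "grpc") + encode_field(2, 3) + encode_field(3, 10)`. A reader that knows only fields 1 and 2 (`{1: "query", 2: "page_number"}`) decodes it to `{"query": "grpc", "page_number": 3}` and skips `result_per_page` without error.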
  19. Example: BSON

    Lightweight binary representation of data structures. Binary encoding of JSON-like data (includes field names in encoded data). Handle marshal/unmarshal in each language.

        {"hello": "world"} →
        \x16\x00\x00\x00           // total document size
        \x02                       // 0x02 = type String
        hello\x00                  // field name
        \x06\x00\x00\x00world\x00  // field value
        \x00                       // 0x00 = type EOO
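The `{"hello": "world"}` byte layout can be reproduced with a short encoder/decoder pair built on Python's `struct` module. This sketch handles string values (type 0x02) only and rejects every other element type on decode, which also illustrates "only support primitive data types". It is not the API of any BSON library, and the function names are hypothetical.

```python
import struct


def encode_bson_strings(doc: dict[str, str]) -> bytes:
    # Encode a flat document whose values are all strings (BSON type 0x02).
    body = bytearray()
    for key, value in doc.items():
        if "\x00" in key:
            raise ValueError("field names are C strings and cannot contain NUL")
        encoded = value.encode("utf-8") + b"\x00"
        body += b"\x02"                          # element type: String
        body += key.encode("utf-8") + b"\x00"    # field name
        body += struct.pack("<i", len(encoded))  # int32 length, includes the NUL
        body += encoded                          # field value
    body += b"\x00"                              # end of document
    # The leading int32 size counts itself, hence the extra 4 bytes.
    return struct.pack("<i", len(body) + 4) + bytes(body)


def decode_bson_strings(data: bytes) -> dict[str, str]:
    # Strict decoder: anything other than string elements is refused.
    (size,) = struct.unpack_from("<i", data, 0)
    if size != len(data) or data[-1] != 0:
        raise ValueError("bad document size or terminator")
    doc, pos = {}, 4
    while data[pos] != 0:
        if data[pos] != 0x02:
            raise ValueError(f"unsupported element type {data[pos]:#x}")
        end = data.index(b"\x00", pos + 1)
        key = data[pos + 1:end].decode("utf-8")
        (length,) = struct.unpack_from("<i", data, end + 1)
        start = end + 5
        doc[key] = data[start:start + length - 1].decode("utf-8")
        pos = start + length
    return doc
```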
  20. For databases?

  21. Serialisation formats for databases

    Build secure clients, faster:
    • Automatically generate clients for different languages
    • Automatically generate documentation
    • Backwards compatibility baked in
  22. Serialisation formats: defend against

    Deserialisation attacks:
    • Injection (data injection): only support primitive data types
    • Privilege escalation (gaining RCE through object deserialisation)
    Denial of Service attacks:
    • Resource exhaustion: drop and log bad deserialisations
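For a JSON request body, "only support primitive data types" and "drop and log bad deserialisations" could look like the sketch below. The decoder enforces a size cap and accepts only an object whose fields are on an allowlist with exact primitive types. It logs and drops everything else. The schema, size limit and logger name are assumptions made for illustration.

```python
import json
import logging

logger = logging.getLogger("db.ingress")  # hypothetical logger name

# Hypothetical request schema: field name -> the one primitive type allowed.
SCHEMA = {"query": str, "page_number": int, "result_per_page": int}
MAX_BYTES = 4096  # assumed size cap, so oversized payloads are never parsed


def decode_request(raw: bytes):
    """Return a dict of primitive values, or None if the payload was dropped."""
    if len(raw) > MAX_BYTES:
        logger.warning("dropped request: %d bytes exceeds limit", len(raw))
        return None
    try:
        data = json.loads(raw)
    except (ValueError, RecursionError) as exc:
        # ValueError covers malformed JSON and bad UTF-8; RecursionError
        # covers pathologically nested input.
        logger.warning("dropped request: undecodable (%s)", type(exc).__name__)
        return None
    if not isinstance(data, dict) or set(data) - set(SCHEMA):
        logger.warning("dropped request: unexpected shape or fields")
        return None
    for key, value in data.items():
        # Exact type match: nested objects such as {"$ne": null} and booleans
        # posing as ints are rejected rather than coerced.
        if type(value) is not SCHEMA[key]:
            logger.warning("dropped request: field %r has type %s", key, type(value).__name__)
            return None
    return data
```

A schema-driven format like Protocol Buffers gives you much of this from generated code. The hand-written version shows what those checks amount to.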
  23. Serialisation formats: but also consider

    Defence in depth:
    • Use strongly typed languages to stop injection attacks propagating from client to server
    “New” attacks like request smuggling
  24. 2. RPC

  25. RPC  before Single Request/Response APIs: • CORBA • SOAP

    HTTP, XML • XMLRPC • REST HTTP, URI, JSON, XML Databases: • Unique wire protocols
  26. RPC: now

    Use code generation to handle:
    • Routes
    • Serialisation
    • HTTP methods, request/response headers
    • Errors
  27. Example: gRPC

    • From Google
    • Uses protobufs
    • Requires HTTP/2
    • Bidirectional streaming

  28. Example: Twirp

    • From Twitch
    • Supports binary and JSON payloads
    • HTTP 1.1 only
    • No bidirectional streaming
  29. Example: GraphQL

    “Query language for APIs”
    Single API endpoint. Clients request the data and the structure. New fields and types can be added without affecting existing queries.

        Query:
        { person { name height } }

        Response:
        { "person": { "name": "Ada Lovelace", "height": 166 } }
  30. For databases?

  31. RPC for databases

    Ensure protocol compatibility between client and server:
    • Force clients to upgrade to latest versions
    Reduce attack surface:
    • To only what the endpoint explicitly exposes
    • Stop enumeration
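The "reduce attack surface" points can be sketched as a dispatcher with three properties: it routes only explicitly registered methods, it answers every unknown name with the same opaque error, and it turns away clients below a minimum protocol version. This is a hypothetical illustration with no transport. The method names, `MIN_PROTOCOL_VERSION` and the `Response` type are made up. Generated RPC servers such as gRPC's behave similarly in spirit, routing only the methods declared in the service definition.

```python
from dataclasses import dataclass
from typing import Callable

MIN_PROTOCOL_VERSION = 3  # hypothetical: older clients must upgrade


@dataclass
class Response:
    ok: bool
    body: object


def get_user(args: dict) -> object:
    # Placeholder handler standing in for a real query.
    return {"id": args.get("id"), "name": "example"}


# The only operations the endpoint exposes; nothing else is reachable.
EXPOSED: dict[str, Callable[[dict], object]] = {
    "GetUser": get_user,
}


def dispatch(method: str, args: dict, client_version: int) -> Response:
    if client_version < MIN_PROTOCOL_VERSION:
        return Response(False, "upgrade required")
    handler = EXPOSED.get(method)
    if handler is None:
        # Same opaque answer for every unknown name, so probing reveals
        # nothing about internal tables, procedures or admin operations.
        return Response(False, "unknown method")
    return Response(True, handler(args))
```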
  32. RPC: defend against

    Broken authentication:
    • Session timeouts to limit foothold, through short-lived tokens
    Broken access controls:
    • Limit privilege escalation through scoped credentials
    Denial of service:
    • Strict encoding and deserialisation
    • Logging of deserialisation failures
  33. RPC: but also consider

    gRPC reflection:
    • Enumerates gRPC services
    • Exposes protobufs in human-readable format (arguments, fields)
    You can use this now!
    • ProfaneDB defines its schema in protobufs and talks gRPC
  34. 3. Auth

  35. Auth: before

    Authentication:
    • Challenge–response authentication
    • Secure Remote Password protocol
    • Client certificate authentication
  36. Auth: now

    Authentication:
    • OAuth2 / JWT
    • SAML
    • Self-managed identity via G Suite, O365
    Proliferation of third-party IDPs:
    • Auth0
    • Ping
    • Okta
  37. For databases?

  38. Auth for databases

    Don’t roll your own auth: use a third-party identity provider (IDP)
    Untrusted clients, trusted servers:
    • Client authenticates to IDP
    • IDP sets up session with database
    • Database is ignorant of users: it only knows whether the IDP gives an OK
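From the database's side, this flow can be sketched as follows. The database keeps no user table, only a key for checking the IDP's signature. It accepts a request if the token is correctly signed, unexpired, and scoped for the operation. This is a minimal sketch under stated assumptions:

- It uses HS256 (a shared HMAC secret) only so it runs on Python's standard library. IDPs commonly sign with asymmetric keys (RS256/ES256) instead, so that a database can verify tokens but not mint them.
- `issue_token`, `verify_token` and the `db:read` scope are hypothetical names. They are not MongoDB's, OpenSearch's or CouchDB's actual configuration.
- A production system should use a vetted JWT library.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(data: bytes) -> str:
    # JWT segments are unpadded base64url.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # Restore the padding that JWTs strip before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _sign(signing_input: str, key: bytes) -> bytes:
    return hmac.new(key, signing_input.encode("utf-8"), hashlib.sha256).digest()


def issue_token(claims: dict, key: bytes) -> str:
    # Stand-in for the IDP, only so the example can run end to end.
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode("utf-8"))
    payload = _b64url_encode(json.dumps(claims).encode("utf-8"))
    return f"{header}.{payload}.{_b64url_encode(_sign(f'{header}.{payload}', key))}"


def verify_token(token: str, key: bytes, required_scope: str,
                 now: Optional[float] = None) -> dict:
    """Database side: accept the request only if the IDP vouches for it."""
    parts = token.split(".")
    if len(parts) != 3:
        raise PermissionError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    try:
        header = json.loads(_b64url_decode(header_b64))
        signature = _b64url_decode(signature_b64)
    except ValueError:
        raise PermissionError("malformed token") from None
    # Pin the algorithm rather than trusting the header (rejects "none").
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise PermissionError("unexpected algorithm")
    if not hmac.compare_digest(_sign(f"{header_b64}.{payload_b64}", key), signature):
        raise PermissionError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    # Short-lived tokens: an expired token is useless even if it was stolen.
    if claims.get("exp", 0) <= (time.time() if now is None else now):
        raise PermissionError("token expired")
    # Scoped credentials: the token must explicitly grant this operation.
    if required_scope not in str(claims.get("scope", "")).split():
        raise PermissionError("missing scope")
    return claims
```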
  39. Auth for databases

    Benefits:
    • Less code, lower ongoing costs
    • Database is integrated with broader organisational IAM controls
    You can use this now!
    • MongoDB, OpenSearch, CouchDB all support JWT authentication
  40. Auth: defend against

    Broken authentication:
    • Limit impact of compromised credentials and account takeovers (compromised credentials are involved in 20% of all breaches)
    Broken access controls:
    • Limit privilege escalation through strictly scoped credentials
  41. 4. TLS everywhere

  42. TLS: before

    Certs were costly! Economise by not using TLS everywhere:
    • TLS termination at your load balancers
    • Unencrypted from load balancers onwards
    Poor automation for managing the cert lifecycle
    Poor visibility into the certificate supply chain
  43. TLS: now

    Certificates are basically free
    Proliferation of end-to-end TLS
    Better developer experience for the entire lifecycle:
    ◦ Let’s Encrypt: automates nearly the entire cert lifecycle
    ◦ mkcert: lets you use certs in local dev
    Certificate Transparency logs create supply chain visibility
  44. For databases?

  45. TLS for databases

    Terminate TLS in the database server itself
    Handle the cert lifecycle in the database server itself
    Use well-automated PKI
    Strictly use forward secrecy ciphers (ECDHE, DHE)
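Here is a minimal sketch of terminating TLS in the database server itself with Python's standard `ssl` module, assuming a Python process that accepts raw sockets. The context refuses anything older than TLS 1.2 and limits TLS 1.2 suites to ephemeral (EC)DHE key exchange. TLS 1.3 suites are always forward secret and are not affected by `set_ciphers()`. The `make_server_context` name and the certificate paths are hypothetical. Renewing the files through automated PKI is left to the deployment.

```python
import ssl
from typing import Optional


def make_server_context(certfile: Optional[str] = None,
                        keyfile: Optional[str] = None) -> ssl.SSLContext:
    # Server-side context for a database listener that terminates TLS itself.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Refuse TLS 1.0/1.1 and anything older.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # TLS 1.2: only ephemeral (EC)DHE key exchange with AEAD ciphers, so each
    # session key is forward secret. TLS 1.3 suites are configured separately
    # by OpenSSL and are all forward secret.
    ctx.set_ciphers("ECDHE+AESGCM:ECDHE+CHACHA20:DHE+AESGCM")
    if certfile:
        # Hypothetical deployment-specific paths, e.g. rotated by an ACME client.
        ctx.load_cert_chain(certfile, keyfile)
    return ctx
```

A listener would then call `ctx.wrap_socket(sock, server_side=True)` on each accepted connection. TLS then ends inside the database process rather than at a load balancer.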
  46. TLS: defend against

    Sensitive data exposure:
    • Observer can view data in transit (PITM)
    Injection attacks:
    • Attacker can inject data into request/response (PITM)
    Replay attacks (with TLS 1.2):
    • Attacker can perform operations repeatedly
    Impersonation:
    • Monitor cert transparency logs for compromised CAs
  47. TLS: but also consider

        $ subfinder -silent -d cipherstash.com
        discuss.cipherstash.com
        landing.cipherstash.com
        docs.cipherstash.com
        dev.cipherstash.com

    Easier passive asset discovery:
    • Cert transparency logs fast-track some asset discovery
  48. Zero trust

  49. “never trust, always verify”

    Build all your systems like they are connected to the public internet
    All input is untrusted: sanitise everything
    Expose database to the network?
  50. Thank you! 🙋

    What questions do you have?
    💖 the talk? Let @auxesis know.
  51. Appendix: Data Serialization Formats

    • Protocol Buffers [developers.google.com]
    • BSON [bsonspec.org]
    • Apache Avro [avro.apache.org]
  52. Appendix: JWT-based database authentication

    • Custom JWT Authentication [docs.mongodb.com]
    • Use JSON Web Tokens (JWTs) to Authenticate in Open Distro for Elasticsearch and Kibana [aws.amazon.com]
    • Authentication — Apache CouchDB [docs.couchdb.org]
  53. Appendix: Attack Techniques

    • HTTP Request Smuggling [portswigger.net]
    • Credential Access techniques [attack.mitre.org]
  54. Other security advances I didn’t have time to cover

    • Web Application Firewalls
    • Infracode static analysis
      ◦ Semgrep
    • Reproducible builds
      ◦ Bazel