Levelling up database security by thinking in APIs

Lindsay Holmwood (@auxesis), Chief Product Officer @ CipherStash

The problem

Techniques for building secure APIs have improved tremendously over the
last decade. Database security is mostly unchanged.

The average breach costs USD 4.24m, a 10% increase in the average total cost of a breach between 2020 and 2021.

The landscape is changing
◦ Compliance requirements (e.g., GDPR, CCPA) are becoming more stringent
◦ Ransomware cost $20B globally in 2020
◦ Attackers are becoming more sophisticated (exploiting supply chains, brokering access) and are moving faster

Notable breaches
• 2015 Anthem Health: 80 million health records
• 2020 Nintendo: 160,000 user accounts exposed
• 2020 BigFooty.com: 132GB of sensitive data in Elastic
• 2020 Antheus Tecnologia: 81.5 million personal records
• 2019 CapitalOne: 100m personal records

Case study: Vastaamo

In 2020, over 300,000 patient records (including detailed consult notes) were leaked and used to extort patients.

“Vastaamo’s system violated one of the ‘first principles of cybersecurity’: It didn’t anonymize the records. It didn’t even encrypt them. The only thing protecting patients’ confessions and confidences were a couple of firewalls and a server login screen.”
Mikael Koivukangas, OneSys Medical

The techniques

Techniques sorted by breach
Source: IBM Cost of a Data Breach Report 2021

Compromised credentials

Attackers use stolen credentials to gain access to a target.
Credentials can come from:
• Public data breaches
• Version control
• Business email compromise (BEC) and phishing
• Password stores
Average time to discovery: 250 days
Sources: IBM Cost of a Data Breach Report 2021; MITRE ATT&CK

Cloud misconfiguration
Types of misconfiguration:
• Default
• Unused features
• Untested
Can be used to:
• Expose information
• Gain access
Average time to discovery: 186 days
Sources: IBM Cost of a Data Breach Report 2021; OWASP Top Ten

SQL injection
Malicious user input used in SQL queries. Can be used to:
• Exfiltrate data
• Tamper with data
• Escalate privileges
Average time to discovery: 154 days
Sources: IBM Cost of a Data Breach Report 2021; OWASP Top Ten
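
SQL injection is the attack that the typed, structured protocols later in this talk aim to rule out. As a minimal sketch using Python's built-in sqlite3 module (the users table, the payload and the function names are invented for illustration), this shows the difference between building SQL from user input and passing that input as a bound parameter:

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # UNSAFE: the input is spliced into the SQL text, so it can change the query's structure.
    return conn.execute(f"SELECT id, name FROM users WHERE name = '{name}'").fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # SAFER: the "?" placeholder sends the value separately from the SQL text,
    # so the driver always treats it as data, never as SQL.
    return conn.execute("SELECT id, name FROM users WHERE name = ?", (name,)).fetchall()


def demo() -> tuple:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    payload = "nobody' OR '1'='1"  # classic injection payload
    return find_user_unsafe(conn, payload), find_user_safe(conn, payload)


if __name__ == "__main__":
    leaked, safe = demo()
    print("string-built query returned:", leaked)  # every row in the table
    print("parameterised query returned:", safe)   # no rows
```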

Person in the Middle
Observer can:
◦ View data in transit
◦ Manipulate data in request/response
Source: OWASP Top Ten

Denial of Service
Make the service unavailable for legitimate users through resource exhaustion (network, CPU, memory, storage, IO).
Can be used as cover for remote code execution and data exfiltration.
Source: OWASP Top Ten

What are the big API security advances in the last decade?

What can we learn from APIs and apply to databases?

1. Standardised serialisation formats

Serialisation formats
Strongly typed communication for:
• Network transport
• Storage
Reduces attack surface, to mitigate attacks like:
• SQL injection

Example: Protocol Buffers
Binary representation of data structures:
1. Describe data structure using built-in types
2. Compile bindings for languages
3. Encode/decode data structure in efficient binary format
Supports basic backwards compatibility via field tags.

    syntax = "proto2";

    service SearchService {
      rpc Search(SearchRequest) returns (SearchResponse);
    }

    message SearchRequest {
      required string query = 1;
      optional int32 page_number = 2;
      optional int32 result_per_page = 3;
    }

    message SearchResponse {
      repeated Result results = 1;
    }

    message Result {
      optional string url = 1;
      optional string title = 2;
      repeated string snippets = 3;
    }
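
The "backwards compatibility via tags" point is easiest to see at the wire level. The sketch below hand-rolls a tiny subset of the Protocol Buffers wire format (varint and length-delimited fields only) in Python. It is illustrative, not a replacement for the bindings protoc generates, and the function names are made up. Because every field is prefixed with its tag number, a reader built against an older schema can skip fields it doesn't recognise:

```python
def encode_varint(value: int) -> bytes:
    # Base-128 varint: 7 bits per byte, high bit set on every byte except the last.
    if value < 0:
        raise ValueError("this sketch only encodes non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int) -> tuple:
    # Returns (value, position just after the varint).
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_field(number: int, value) -> bytes:
    # Every field starts with a key: (field_number << 3) | wire_type.
    if isinstance(value, int):
        return encode_varint(number << 3 | 0) + encode_varint(value)  # wire type 0: varint
    raw = value.encode("utf-8")
    return encode_varint(number << 3 | 2) + encode_varint(len(raw)) + raw  # wire type 2: length-delimited


def decode(data: bytes, known: dict) -> dict:
    # `known` maps tag numbers to field names. Unknown tags are skipped, which is
    # what lets an older reader accept messages from a newer writer.
    fields, pos = {}, 0
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:
            value, pos = decode_varint(data, pos)
        elif wire_type == 2:
            length, pos = decode_varint(data, pos)
            value, pos = data[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if number in known:
            fields[known[number]] = value.decode("utf-8") if isinstance(value, bytes) else value
    return fields


if __name__ == "__main__":
    # A newer client sends all three SearchRequest fields...
    msg = encode_field(1, "databases") + encode_field(2, 2) + encode_field(3, 10)
    # ...and an older server that only knows tags 1 and 2 still decodes what it understands.
    print(decode(msg, {1: "query", 2: "page_number"}))
```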

Example: BSON
Lightweight binary representation of data structures. Binary encoding of JSON-like data (includes field names in the encoded data). Handle marshal/unmarshal in each language.

    {"hello": "world"} →
    \x16\x00\x00\x00            // total document size
    \x02                        // 0x02 = type String
    hello\x00                   // field name
    \x06\x00\x00\x00world\x00   // field value
    \x00                        // 0x00 = type EOO
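
To make that byte layout concrete, here is a minimal Python encoder that handles only flat documents of string values (element type 0x02). The function name is hypothetical, and a real application would use a maintained BSON library rather than this sketch. It reproduces the example bytes exactly:

```python
import struct


def encode_bson_strings(doc: dict) -> bytes:
    """Encode a flat dict of str -> str as a BSON document (string elements only)."""
    body = bytearray()
    for key, value in doc.items():
        if "\x00" in key:
            raise ValueError("BSON field names cannot contain NUL")
        raw = value.encode("utf-8") + b"\x00"       # string value, NUL-terminated
        body += b"\x02"                            # element type 0x02 = UTF-8 string
        body += key.encode("utf-8") + b"\x00"      # field name as a NUL-terminated cstring
        body += struct.pack("<i", len(raw)) + raw  # int32 length (including the NUL), then bytes
    body += b"\x00"                                # 0x00 terminates the document
    # The leading int32 is the total document size, counting its own 4 bytes.
    return struct.pack("<i", len(body) + 4) + bytes(body)


if __name__ == "__main__":
    print(encode_bson_strings({"hello": "world"}))
```

Note that the field name travels inside every encoded document, which is the main size trade-off against Protocol Buffers' numeric tags.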

For databases?

Serialisation formats for databases
Build secure clients, faster:
• Automatically generate clients for different languages
• Automatically generate documentation
• Backwards compatibility baked in

Serialisation formats: defend against
Deserialisation attacks:
• Injection: limit data injection by only supporting primitive data types
• Privilege escalation: attackers gaining RCE through object deserialisation
Denial of Service attacks:
• Resource exhaustion: drop and log bad deserialisations
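
A minimal sketch of "only support primitive data types" plus "drop and log bad deserialisations", using JSON from the Python standard library as a stand-in wire format. The size limit, logger name and schema shape are assumptions for illustration; a real server would derive the schema from its IDL (e.g. a .proto file):

```python
import json
import logging

logger = logging.getLogger("db.deserialise")  # hypothetical logger name

MAX_REQUEST_BYTES = 4096  # assumed limit, tune per endpoint
PRIMITIVES = (str, int, float, bool)


class RejectedRequest(Exception):
    """Raised when a request fails strict deserialisation."""


def _reject(reason: str):
    # Drop and log: callers see a generic error, the detail only goes to the log.
    logger.warning("rejected request: %s", reason)
    raise RejectedRequest("bad request")


def deserialise_request(raw: bytes, schema: dict) -> dict:
    """Parse `raw` as a flat JSON object whose fields exactly match `schema`.

    `schema` maps field names to one of the primitive types in PRIMITIVES.
    """
    if len(raw) > MAX_REQUEST_BYTES:
        _reject(f"{len(raw)} bytes exceeds {MAX_REQUEST_BYTES}")
    try:
        data = json.loads(raw)
    except (ValueError, RecursionError) as exc:
        _reject(f"malformed payload ({exc.__class__.__name__})")
    if not isinstance(data, dict):
        _reject("top-level value is not an object")
    if set(data) != set(schema):
        _reject(f"unexpected or missing fields: {sorted(set(data) ^ set(schema))}")
    for field, expected in schema.items():
        if expected not in PRIMITIVES:
            raise TypeError(f"schema type for {field!r} is not a primitive")
        # Exact type check rather than isinstance, so True is not accepted as an int
        # and nested objects (e.g. {"$gt": ""}) never reach the query layer.
        if type(data[field]) is not expected:
            _reject(f"field {field!r} is not {expected.__name__}")
    return data
```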

Serialisation formats: but also consider
Defence in depth:
• Use strongly typed languages to stop injection attacks propagating from client to server
“New” attacks, like request smuggling

2. RPC

RPC: before
Single Request/Response APIs:
• CORBA
• SOAP (HTTP, XML)
• XML-RPC
• REST (HTTP, URI, JSON, XML)
Databases:
• Unique wire protocols

RPC: now
Use code generation to handle:
• Routes
• Serialisation
• HTTP methods, request/response headers
• Errors

Example: gRPC
• From Google
• Uses protobufs
• Requires HTTP/2
• Bidirectional streaming

Example: Twirp
• From Twitch
• Supports binary and JSON payloads
• HTTP 1.1 only
• No bidirectional streaming

Example: GraphQL
“Query language for APIs”
• Single API endpoint
• Clients request the data and the structure
• New fields and types can be added without affecting existing queries

Query:

    {
      person {
        name
        height
      }
    }

Response:

    {
      "person": {
        "name": "Ada Lovelace",
        "height": 166
      }
    }

For databases?

RPC for databases
Ensure protocol compatibility between client and server:
• Force clients to upgrade to the latest versions
Reduce attack surface:
• To only what the endpoint explicitly exposes
• Stop enumeration
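
A minimal sketch of those ideas in plain Python, with no RPC framework: the server dispatches only explicitly exposed operations, answers every other name identically, and refuses clients below a minimum protocol version. DatabaseEndpoint, GetUserName and the version number are hypothetical; with gRPC or Twirp the exposed surface would instead come from the service definition and the generated code:

```python
MIN_PROTOCOL_VERSION = 3  # hypothetical: oldest client protocol the server still accepts


class RpcError(Exception):
    """Single error type returned to clients."""


class DatabaseEndpoint:
    """Dispatches only operations that were explicitly exposed."""

    def __init__(self) -> None:
        self._methods = {}

    def expose(self, name: str):
        # Decorator that adds a handler to the allowlist of callable operations.
        def register(fn):
            self._methods[name] = fn
            return fn
        return register

    def call(self, protocol_version: int, method: str, params: dict):
        if protocol_version < MIN_PROTOCOL_VERSION:
            # Force outdated clients to upgrade instead of negotiating down to an old protocol.
            raise RpcError(f"protocol {protocol_version} unsupported, upgrade to >= {MIN_PROTOCOL_VERSION}")
        handler = self._methods.get(method)
        if handler is None:
            # Same answer for every unexposed name, and no "list methods" call,
            # so clients cannot enumerate tables or internal operations.
            raise RpcError("unknown method")
        return handler(**params)


endpoint = DatabaseEndpoint()
_USERS = {1: "Ada Lovelace"}  # stand-in for real storage


@endpoint.expose("GetUserName")
def get_user_name(user_id: int) -> str:
    return _USERS[user_id]


if __name__ == "__main__":
    print(endpoint.call(3, "GetUserName", {"user_id": 1}))
    for version, method in [(2, "GetUserName"), (3, "DropTable")]:
        try:
            endpoint.call(version, method, {})
        except RpcError as exc:
            print("rejected:", exc)
```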

RPC: defend against
Broken authentication:
• Short-lived tokens enforce session timeouts, limiting an attacker’s foothold
Broken access controls:
• Limit privilege escalation through scoped credentials
Denial of Service:
• Strict encoding and deserialisation
• Logging of deserialisation failures
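
One way to get short-lived, scoped credentials is sketched below: an in-memory issuer of random opaque tokens, each carrying an expiry and a set of scopes. The class name, the five-minute TTL and the scope strings are assumptions for illustration; a production system would persist or sign tokens and would usually delegate issuance to an identity provider:

```python
import secrets
import time

TOKEN_TTL_SECONDS = 300  # assumed five-minute lifetime


class TokenStore:
    """In-memory issuer/checker for opaque, short-lived, scoped tokens (sketch only)."""

    def __init__(self, ttl: float = TOKEN_TTL_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock   # injectable so expiry can be tested without sleeping
        self._tokens = {}     # token -> (expiry time, frozenset of scopes)

    def issue(self, scopes) -> str:
        token = secrets.token_urlsafe(32)  # unguessable random token
        self._tokens[token] = (self._clock() + self._ttl, frozenset(scopes))
        return token

    def authorise(self, token: str, required_scope: str) -> bool:
        entry = self._tokens.get(token)
        if entry is None:
            return False
        expiry, scopes = entry
        if self._clock() >= expiry:
            del self._tokens[token]  # expired: a stolen token stops working here
            return False
        # A token only grants what it was scoped for, limiting privilege escalation.
        return required_scope in scopes
```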

RPC: but also consider
gRPC reflection:
• Enumerates gRPC services
• Exposes protobufs in human-readable format (arguments, fields)
You can use this now!
• ProfaneDB defines its schema in protobufs and talks gRPC

3. Auth

Auth: before
Authentication:
• Challenge–response authentication
• Secure Remote Password protocol
• Client certificate authentication

Auth: now
Authentication:
• OAuth2 and JWT
• SAML
• Self-managed identity via G Suite, O365
Proliferation of third-party identity providers (IdPs):
• Auth0
• Ping
• Okta

For databases?

Auth for databases
Don’t roll your own auth: use a third-party identity provider.
Untrusted clients, trusted servers:
• Client authenticates to the IdP
• IdP sets up a session with the database
• Database is ignorant of users; it only knows whether the IdP gives an OK
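
As a sketch of "the database only knows if the IdP gives an OK": the server holds a verification key, checks the token's signature, expiry and audience, and never sees user passwords. To stay dependency-free this uses HS256 (a shared HMAC secret) from the Python standard library; real IdPs usually sign with RS256 or ES256 and publish their keys, and in practice a maintained JWT library should do this parsing. The function name and the audience value are hypothetical:

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are unpadded base64url; restore the padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def verify_jwt_hs256(token: str, secret: bytes, audience: str, now: Optional[float] = None) -> dict:
    """Return the token's claims if it is valid for this database, else raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("token must have three segments")
    header_b64, payload_b64, signature_b64 = parts
    header = json.loads(_b64url_decode(header_b64))
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        # The server decides the algorithm; never accept "none" or whatever the token claims.
        raise ValueError("unexpected algorithm")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if not isinstance(claims, dict):
        raise ValueError("claims must be a JSON object")
    now = time.time() if now is None else now
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or now >= exp:
        # Requiring a short expiry limits how long a stolen token is useful.
        raise ValueError("token missing exp or expired")
    if claims.get("aud") != audience:
        raise ValueError("token was not issued for this database")
    return claims
```

The returned claims (for example a subject or scope) can then drive authorisation, while user management stays entirely with the IdP.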

Auth for databases
Benefits:
• Less code, lower ongoing costs
• Database is integrated with broader organisational IAM controls
You can use this now!
• MongoDB, OpenSearch, and CouchDB all support JWT authentication

Auth: defend against
Broken authentication:
• Limit the impact of compromised credentials and account takeovers (compromised credentials were involved in 20% of all breaches)
Broken access controls:
• Limit privilege escalation through strictly scoped credentials

4. TLS everywhere

TLS: before
Certs were costly! Economise by not using TLS everywhere:
• TLS termination at your load balancers
• Unencrypted from load balancers onwards
Poor automation for managing cert lifecycle
Poor visibility into certificate supply chain

TLS: now
Certificates are basically free.
Proliferation of end-to-end TLS.
Better developer experience for the entire lifecycle:
◦ Let’s Encrypt: automates nearly the entire cert lifecycle
◦ mkcert: lets you use certs in local development
Certificate Transparency logs create supply chain visibility.

For databases?

TLS for databases
• Terminate TLS in the database server itself
• Handle the cert lifecycle in the database server itself
• Use well-automated PKI
• Strictly use forward secrecy ciphers (ECDHE, DHE)
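
A sketch of what "terminate TLS in the database server itself" with forward-secret ciphers might look like, using Python's ssl module for the server's listener. The cipher string and the optional cert/key arguments are assumptions; in a real deployment the files would be issued and renewed by an automated PKI (for example an ACME client):

```python
import ssl
from typing import Optional

# OpenSSL cipher string for TLS 1.2: only ephemeral (EC)DHE key exchange with AEAD ciphers.
# TLS 1.3 suites are always forward secret and are not affected by set_ciphers().
FORWARD_SECRET_CIPHERS = "ECDHE+AESGCM:ECDHE+CHACHA20:DHE+AESGCM"


def make_server_context(certfile: Optional[str] = None, keyfile: Optional[str] = None) -> ssl.SSLContext:
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse TLS 1.0 and 1.1
    ctx.set_ciphers(FORWARD_SECRET_CIPHERS)
    if certfile:
        # Hypothetical paths, supplied by whatever automates the cert lifecycle.
        ctx.load_cert_chain(certfile, keyfile)
    return ctx


if __name__ == "__main__":
    for cipher in make_server_context().get_ciphers():
        print(cipher["protocol"], cipher["name"], cipher["kea"])
```

The database's listening socket would then be wrapped with ctx.wrap_socket(sock, server_side=True), so traffic stays encrypted all the way to the process that holds the data rather than stopping at a load balancer.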

TLS: defend against
Sensitive data exposure:
• Observer can view data in transit (PITM)
Injection attacks:
• Attacker can inject data into request/response (PITM)
Replay attacks (with TLS 1.2):
• Attacker can perform operations repeatedly
Impersonation:
• Monitor Certificate Transparency logs for compromised CAs

TLS: but also consider
Easier passive asset discovery:
• Certificate Transparency logs fast-track some asset discovery

    $ subfinder -silent -d cipherstash.com
    discuss.cipherstash.com
    landing.cipherstash.com
    docs.cipherstash.com
    dev.cipherstash.com

Zero trust

“Never trust, always verify”
• Build all your systems like they are connected to the public internet
• All input is untrusted: sanitise everything
• Expose the database to the network?

Thank you! 🙋 What questions do you have?
💖 the talk? Let @auxesis know.

Appendix: Data Serialisation Formats
• Protocol Buffers [developers.google.com]
• BSON [bsonspec.org]
• Apache Avro [avro.apache.org]

Appendix: JWT-based database authentication
• Custom JWT Authentication [docs.mongodb.com]
• Use JSON Web Tokens (JWTs) to Authenticate in Open Distro for Elasticsearch and Kibana [aws.amazon.com]
• Authentication, Apache CouchDB [docs.couchdb.org]

Appendix: Attack Techniques
• HTTP Request Smuggling [portswigger.net]
• Credential Access techniques [attack.mitre.org]

Other security advances I didn’t have time to cover
• Web Application Firewalls
• Infracode static analysis
◦ Semgrep
• Reproducible builds
◦ Bazel
