Levelling up database security by thinking in APIs

Slide 1

Slide 1 text

Levelling up database security by thinking in APIs Lindsay Holmwood @auxesis Chief Product Officer @ CipherStash

Slide 2

Slide 2 text

The problem

Slide 3

Slide 3 text

Techniques for building secure APIs have improved tremendously over the last decade. Database security is mostly unchanged.

Slide 4

Slide 4 text

Average breach costs $4.24m USD
10% increase in average total cost of a breach between 2020 and 2021

Slide 5

Slide 5 text

The landscape is changing
○ Compliance requirements (e.g., GDPR, CCPA) are becoming more stringent
○ Ransomware cost $20B globally in 2020
○ Attackers are becoming more sophisticated (exploiting supply chains, brokering access) and are moving faster
Notable breaches
2015 Anthem Health: 80 million health records
2019 CapitalOne: 100m personal records
2020 Nintendo: 160,000 user accounts exposed
2020 BigFooty.com: 132GB sensitive data in Elastic
2020 Antheus Tecnologia: 81.5 million personal records

Slide 6

Slide 6 text

Case study: Vastaamo
In 2020, over 300,000 patient records (including detailed consult notes) were leaked and used to extort users.
Vastaamo’s system violated one of the “first principles of cybersecurity”: It didn’t anonymize the records. It didn’t even encrypt them. The only thing protecting patients’ confessions and confidences were a couple of firewalls and a server login screen.
Mikael Koivukangas, OneSys Medical

Slide 7

Slide 7 text

The techniques

Slide 8

Slide 8 text

Techniques sorted by breach Source: IBM Cost of a Data Breach Report 2021 Compromised credentials

Slide 9

Slide 9 text

Compromised credentials
Attackers use stolen credentials to gain access to a target.
Credentials can come from:
● Public data breaches
● Version control
● BEC & phishing
● Password stores
Average time to discovery: 250 days
Source: IBM Cost of a Data Breach report 2021
Source: MITRE ATT&CK

Slide 10

Slide 10 text

Cloud misconfiguration
Types of misconfiguration:
● Default
● Unused features
● Untested
Can be used to:
● Expose information
● Gain access
Average time to discovery: 186 days
Source: IBM Cost of a Data Breach report 2021
Source: OWASP Top Ten

Slide 11

Slide 11 text

SQL injection
Malicious user input used in SQL queries.
Can be used to:
● Exfiltrate data
● Tamper with data
● Escalate privileges
Average time to discovery: 154 days
Source: IBM Cost of a Data Breach report 2021
Source: OWASP Top Ten
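To make the SQL injection slide concrete, here is a minimal sketch using Python's standard `sqlite3` module. The `users` table, its rows, and the helper names `find_unsafe` and `find_safe` are invented for illustration and are not from the talk. It contrasts splicing user input into the SQL text with binding it as a parameter.

```python
import sqlite3

# Hypothetical in-memory table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 0), ("bob", 1)])

# Input crafted so that the WHERE clause becomes always true.
malicious = "nobody' OR '1'='1"


def find_unsafe(name):
    # Vulnerable: the input becomes part of the SQL text itself.
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_safe(name):
    # Safer: the input is bound as a value and never parsed as SQL.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


print(len(find_unsafe(malicious)))  # every row leaks
print(find_safe(malicious))         # no row has that literal name
```

The unsafe variant returns every row in the table, while the parameterised variant treats the whole string as a name that matches nothing.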

Slide 12

Slide 12 text

Person in the Middle
Observer can:
○ view data in transit
○ manipulate data in request/response
Source: OWASP Top Ten

Slide 13

Slide 13 text

Denial of Service
Make the service unavailable for legitimate users
Resource exhaustion (network, CPU, memory, storage, IO)
Can be used as cover for remote code execution and data exfiltration
Source: OWASP Top Ten

Slide 14

Slide 14 text

What are the big API security advances in the last decade?

Slide 15

Slide 15 text

What can we learn from APIs and apply to databases?

Slide 16

Slide 16 text

1. Standardised serialisation formats

Slide 17

Slide 17 text

Serialisation formats
Strongly typed communication for:
● Network transport
● Storage
Reduces attack surface, to mitigate attacks like:
● SQL injection

Slide 18

Slide 18 text

Example: Protocol Buffers
Binary representation of data structures:
1. Describe data structure using built in types
2. Compile bindings for languages
3. Encode/decode data structure in efficient binary format
Supports basic backwards compatibility via tags.

    // proto2 syntax, so every non-repeated field carries an explicit label
    syntax = "proto2";

    service SearchService {
      rpc Search(SearchRequest) returns (SearchResponse);
    }

    message SearchRequest {
      required string query = 1;
      optional int32 page_number = 2;
      optional int32 result_per_page = 3;
    }

    message SearchResponse {
      repeated Result results = 1;
    }

    message Result {
      optional string url = 1;
      optional string title = 2;
      repeated string snippets = 3;
    }

Slide 19

Slide 19 text

Example: BSON
Lightweight binary representation of data structures.
Binary encoding of JSON-like data (includes field names in encoded data).
Handle marshal/unmarshal in each language.

    {"hello": "world"} →
    \x16\x00\x00\x00            // total document size
    \x02                        // 0x02 = type String
    hello\x00                   // field name
    \x06\x00\x00\x00world\x00   // field value
    \x00                        // 0x00 = type EOO
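The byte layout on the BSON slide can be reproduced by hand. The sketch below is a deliberately tiny encoder that handles string-valued fields only, written with Python's `struct` module. The function name `encode_bson_strings` is an assumption made for illustration. It is not a BSON library, and real code should use a maintained implementation.

```python
import struct


def encode_bson_strings(fields):
    """Encode a flat dict of str -> str as a BSON document (strings only)."""
    body = b""
    for name, value in fields.items():
        raw = value.encode("utf-8") + b"\x00"      # string bytes plus trailing NUL
        body += b"\x02"                            # element type 0x02 = string
        body += name.encode("utf-8") + b"\x00"     # field name as a C string
        body += struct.pack("<i", len(raw))        # little-endian int32 length, NUL included
        body += raw
    body += b"\x00"                                # 0x00 = end of object
    # The leading int32 is the total document size, counting its own four bytes.
    return struct.pack("<i", len(body) + 4) + body


print(encode_bson_strings({"hello": "world"}))
```

For `{"hello": "world"}` this yields the same 22 bytes (`0x16`) shown on the slide.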

Slide 20

Slide 20 text

For databases?

Slide 21

Slide 21 text

Serialisation formats for databases
Build secure clients, faster:
● Automatically generate clients for different languages
● Automatically generate documentation
● Backwards compatibility baked in

Slide 22

Slide 22 text

Serialisation formats: defend against
Deserialisation attacks:
● Injection: data injection, only support primitive data types
● Privilege escalation: gaining RCE through object deserialisation
Denial of Service attacks:
● Resource exhaustion: drop and log bad deserialisations
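One way to read the defences on this slide is as a strict decoder: cap the message size, accept primitive types only, and drop and log anything that fails. The sketch below uses JSON from the standard library purely to stay self-contained. The `SCHEMA`, the `MAX_BYTES` limit, and the `decode_request` function are assumptions for illustration, not something the talk specifies.

```python
import json
import logging

logger = logging.getLogger("strict_decode")

MAX_BYTES = 4096                                  # assumed cap, against resource exhaustion
SCHEMA = {"query": str, "page_number": int}       # hypothetical message shape, primitives only


def decode_request(raw):
    """Return the decoded message, or None if it was dropped (and logged)."""
    if len(raw) > MAX_BYTES:
        logger.warning("dropped oversized message (%d bytes)", len(raw))
        return None
    try:
        data = json.loads(raw)
    except (ValueError, RecursionError):
        logger.warning("dropped undecodable message")
        return None
    if not isinstance(data, dict) or set(data) != set(SCHEMA):
        logger.warning("dropped message with unexpected fields")
        return None
    for field, expected in SCHEMA.items():
        # Exact type match: rejects nested objects, and bool passed off as int.
        if type(data[field]) is not expected:
            logger.warning("dropped message with bad type for %r", field)
            return None
    return data


print(decode_request(b'{"query": "cats", "page_number": 2}'))
print(decode_request(b'{"query": "cats", "page_number": "2"}'))
```

Because only `str` and `int` values ever leave the decoder, there is no path from message content to object construction, which is the property the privilege escalation bullet relies on.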

Slide 23

Slide 23 text

Serialisation formats: but also consider
Defence in depth:
● Use strongly typed languages to stop injection attacks propagating from client to server
“New” attacks like request smuggling

Slide 24

Slide 24 text

2. RPC

Slide 25

Slide 25 text

RPC: before
Single Request/Response
APIs:
● CORBA
● SOAP (HTTP, XML)
● XML-RPC
● REST (HTTP, URI, JSON, XML)
Databases:
● Unique wire protocols

Slide 26

Slide 26 text

RPC: now
Use code generation to handle:
● Routes
● Serialisation
● HTTP methods, request/response headers
● Errors

Slide 27

Slide 27 text

Example: gRPC
From Google
Uses protobufs
Requires HTTP/2
Bidirectional streaming

Slide 28

Slide 28 text

Example: Twirp
From Twitch
Supports binary and JSON payloads
HTTP 1.1 only
No bidirectional streaming

Slide 29

Slide 29 text

Example: GraphQL
“Query language for APIs”
Single API endpoint.
Clients request the data and the structure.
New fields and types can be added without affecting existing queries.
Query:
    { person { name height } }
Response:
    { "person": { "name": "Ada Lovelace", "height": 166 } }

Slide 30

Slide 30 text

For databases?

Slide 31

Slide 31 text

RPC for databases
Ensure protocol compatibility between client and server
● Force clients to upgrade to latest versions
Reduce attack surface
● To only what the endpoint explicitly exposes
● Stop enumeration
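As a rough illustration of these points, the sketch below shows a dispatcher that rejects clients on the wrong protocol version, serves only explicitly registered endpoints, and answers every other request with the same generic error so that probing reveals nothing. All names (`PROTOCOL_VERSION`, `ENDPOINTS`, `GetRecord`, `dispatch`) are hypothetical. In a real RPC framework this layer would come from generated code rather than being hand-written.

```python
PROTOCOL_VERSION = 2  # hypothetical current wire-protocol version


class RpcError(Exception):
    """Deliberately generic error returned to clients."""


def get_record(record_id):
    # Stand-in for a real database lookup.
    return {"id": record_id, "status": "ok"}


# The only operations reachable over the wire, with their exact parameter names.
ENDPOINTS = {"GetRecord": (get_record, {"record_id"})}


def dispatch(request):
    if not isinstance(request, dict) or request.get("version") != PROTOCOL_VERSION:
        # Old or unknown clients must upgrade before they can do anything.
        raise RpcError("unsupported protocol version")
    method = request.get("method")
    params = request.get("params")
    entry = ENDPOINTS.get(method) if isinstance(method, str) else None
    if entry is None or not isinstance(params, dict) or set(params) != entry[1]:
        # Same message for unknown methods and bad parameters, to hinder enumeration.
        raise RpcError("invalid request")
    handler, _ = entry
    return handler(**params)


print(dispatch({"version": 2, "method": "GetRecord", "params": {"record_id": 7}}))
```

Anything not listed in `ENDPOINTS` simply does not exist from the client's point of view, which is the reduced attack surface the slide describes.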

Slide 32

Slide 32 text

RPC: defend against
Broken authentication
● Session timeouts to limit foothold, through short lived tokens
Broken access controls
● Privilege escalation, through scoped credentials
Denial of service
● Strict encoding and deserialisation
● Logging of deserialisation failures

Slide 33

Slide 33 text

RPC: but also consider
gRPC reflection
● Enumerates gRPC services
● Exposes protobufs in human readable format (arguments, fields)
You can use this now!
● ProfaneDB defines schema in protobufs and talks gRPC

Slide 34

Slide 34 text

3. Auth

Slide 35

Slide 35 text

Auth: before
Authentication:
● Challenge–Response authentication
● Secure Remote Password protocol
● Client certificate authentication

Slide 36

Slide 36 text

Auth: now
Authentication:
● OAuth2 and JWT
● SAML
● Self managed identity via G Suite, O365
Proliferation of third party IDP
● Auth0
● Ping
● Okta

Slide 37

Slide 37 text

For databases?

Slide 38

Slide 38 text

Auth for databases
Don’t roll your own auth: use a third party identity provider
Untrusted clients, trusted servers:
● Client authenticates to IDP
● IDP sets up session with database
● Database is ignorant of users: it only knows if the IDP gives an OK
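The flow above can be sketched with a short-lived, scoped token that the database checks without knowing anything about users. The example below hand-rolls an HS256 JWT with the standard library only so that it stays self-contained. That choice is an assumption for illustration: production IDPs typically sign with asymmetric keys, and real systems should use a vetted JWT library. The function names, the `scope` claim layout, and the shared secret are all hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text):
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret, signing_input):
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(secret, subject, scope, ttl_seconds, now=None):
    """Hypothetical IDP side: mint a short-lived, scoped HS256 token."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": subject, "scope": scope, "exp": now + ttl_seconds}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part).encode("utf-8")) for part in (header, claims)
    )
    return signing_input + "." + _b64url_encode(_sign(secret, signing_input))


def verify_token(secret, token, required_scope, now=None):
    """Hypothetical database side: return the claims, or None if rejected."""
    now = int(time.time()) if now is None else now
    try:
        header_b64, claims_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        signature = _b64url_decode(sig_b64)
        expected = _sign(secret, header_b64 + "." + claims_b64)
    except ValueError:
        return None
    # Pin the algorithm and check the signature before trusting any claim.
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        return None
    if not hmac.compare_digest(expected, signature):
        return None
    try:
        claims = json.loads(_b64url_decode(claims_b64))
    except ValueError:
        return None
    if not isinstance(claims, dict):
        return None
    exp = claims.get("exp")
    if not isinstance(exp, int) or exp <= now:
        return None  # expired: short lifetimes limit an attacker's foothold
    scope = claims.get("scope")
    if not isinstance(scope, str) or required_scope not in scope.split():
        return None  # token is valid but not scoped for this operation
    return claims


token = issue_token(b"demo-secret", "alice", "records:read", ttl_seconds=60)
print(verify_token(b"demo-secret", token, "records:read"))
```

The database side holds no user table here. It only checks that the token was signed by the issuer it trusts, has not expired, and carries the scope the operation needs.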

Slide 39

Slide 39 text

Auth for databases
Benefits:
● Less code, lower ongoing costs
● Database is integrated with broader organisational IAM controls
You can use this now!
● MongoDB, OpenSearch, CouchDB all support JWT authentication

Slide 40

Slide 40 text

Auth: defend against
Broken authentication
● Limit impact of compromised credentials and account takeovers
⬆ involved in 20% of all breaches
Broken access controls
● Privilege escalation, through strictly scoped credentials

Slide 41

Slide 41 text

4. TLS everywhere

Slide 42

Slide 42 text

TLS: before
Certs were costly!
Economise by not using TLS everywhere:
● TLS termination at your load balancers
● Unencrypted from load balancers onwards
Poor automation for managing cert lifecycle
Poor visibility into certificate supply chain

Slide 43

Slide 43 text

TLS: now
Certificates are basically free
Proliferation of end-to-end TLS
Better developer experience for the entire lifecycle:
○ Let’s Encrypt: automates nearly the entire cert lifecycle
○ mkcert: can use certs in local dev
Certificate Transparency logs create supply chain visibility

Slide 44

Slide 44 text

For databases?

Slide 45

Slide 45 text

TLS for databases
Terminate TLS in the database server itself
Handle the cert lifecycle in the database server itself
Use well-automated PKI infrastructure
Strictly use Forward Secrecy ciphers (ECDHE, DHE)
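For the forward secrecy point, a minimal sketch with Python's standard `ssl` module might look like the following. The function name `make_server_context` and the exact cipher string are assumptions for illustration, and certificate loading is left optional so the snippet runs without key material. TLS 1.3 suites, which always use ephemeral key exchange, are not governed by `set_ciphers` and stay enabled.

```python
import ssl


def make_server_context(cert_file=None, key_file=None):
    """Build a server-side TLS context limited to forward-secret key exchange."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # For TLS 1.2, allow only ephemeral (EC)DH key exchange with AES-GCM.
    ctx.set_ciphers("ECDHE+AESGCM:DHE+AESGCM")
    if cert_file:
        # In a real deployment, an automated PKI would issue and rotate these files.
        ctx.load_cert_chain(cert_file, key_file)
    return ctx


ctx = make_server_context()
for cipher in ctx.get_ciphers():
    print(cipher["protocol"], cipher["name"])
```

Listing the enabled suites, as the loop does, is a quick check that no static RSA key exchange remains available.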

Slide 46

Slide 46 text

TLS: defend against
Sensitive data exposure:
● Observer can view data in transit (PITM)
Injection attacks:
● Attacker can inject data into request/response (PITM)
Replay attacks (with TLS 1.2):
● Attacker can perform operations repeatedly
Impersonation:
● Monitor cert transparency logs for compromised CAs

Slide 47

Slide 47 text

TLS: but also consider
Easier passive asset discovery:
● Cert transparency logs fast-track some asset discovery

    $ subfinder -silent -d cipherstash.com
    discuss.cipherstash.com
    landing.cipherstash.com
    docs.cipherstash.com
    dev.cipherstash.com

Slide 48

Slide 48 text

Zero trust

Slide 49

Slide 49 text

“never trust, always verify”
Build all your systems like they are connected to the public internet
All input is untrusted: sanitise everything
Expose database to the network?

Slide 50

Slide 50 text

Thank you! 🙋 What questions do you have? 💖 the talk? Let @auxesis know.

Slide 51

Slide 51 text

Appendix: Data Serialization Formats
● Protocol Buffers [developers.google.com]
● BSON [bsonspec.org]
● Apache Avro [avro.apache.org]

Slide 52

Slide 52 text

Appendix: JWT-based database authentication
● Custom JWT Authentication [docs.mongodb.com]
● Use JSON Web Tokens (JWTs) to Authenticate in Open Distro for Elasticsearch and Kibana [aws.amazon.com]
● Authentication: Apache CouchDB [docs.couchdb.org]

Slide 53

Slide 53 text

Appendix: Attack Techniques
● HTTP Request Smuggling [portswigger.net]
● Credential Access techniques [attack.mitre.org]

Slide 54

Slide 54 text

Other security advances I didn’t have time to cover
● Web Application Firewalls
● Infracode static analysis
  ○ Semgrep
● Reproducible builds
  ○ Bazel