Data is a new security boundary

Data is a new security boundary @vixentael

@vixentael: Head of customer solutions, security software engineer at Cossack Labs. I’m focused on data security and applied cryptography, building e2ee protocols, and security controls around crypto. Core maintainer of Themis cryptolib. cossacklabs.com

Things we won’t talk about
• Exact ciphers, symmetric vs asymmetric encryption. TLS.
• Typical cryptographic mistakes developers make.
• Privacy, and evil corporations.
• Recent data incidents and breaches. FUD.
• What library / tool to use to encrypt your data the best way.

What we will talk about
• Data security 101: encryption, OWASP, regulations.
• Cases: observations of real apps that combine encryption with supporting security controls.
• Encryption ways: ALE, FLE, E2EE, ZKA/ZT.

Data security 101

Your app’s data is everywhere (apps, public clouds, databases, backups, 3rd-party APIs, analytics, etc.). No perimeters, no trusted zones anymore.
Data security measures become the security boundary for data. It's not about "protect the data where it's stored". It’s “protect the data whenever it exists”.

Data security depends on a data flow
• gathering
• secure processing
• secure storage and backups
• secure disclosure
• data removal
• data migration
Diagram stages: Gathering, Processing, Output, plus logging and analytics.
Risks marked on the diagram: gathering without consent; leakage, loss, disclosure; never removed, disclosure, loss.

Data security 101
1. Identify sensitive data, understand sensitive data lifecycle, classify data.
2. Identify risks to data.
3. Build trust models, understand risk impact.
4. Prioritize threat vectors.
5. Select and implement proper security controls for exploitable high risk vectors (to prevent risks and to identify leaks).

Encryption

Encryption is the ultimate data security measure
1. When data is properly encrypted, it can’t be suddenly, unnoticeably decrypted.
2. Even if leaked, data stays encrypted. Encryption is the best access control.
3. Protection against insiders and outsiders.
4. Properly configured encryption tolerates mistakes in other security controls.
5. Regulations, compliance.

Regulations, compliance
media.defense.gov/2018/Apr/22/2001906836/-1/-1/0/DEFENSEINNOVATIONBOARD_TEN_COMMANDMENTS_OF_SOFTWARE_2018.04.20.PDF

Regulations, compliance
gdpr-info.eu/issues/encryption/
• GDPR art 32/35: responsibly store and process data according to risks.
• GDPR art 33/34: detect data leakage and alert users and the controller.

Regulations, compliance
cossacklabs.com/blog/what-we-need-to-encrypt-cheatsheet.html
Also: HIPAA, FISMA, FIPS, FedRAMP, CCPA, PCI DSS, FERPA, and many more.

OWASP Top10 2021
• A01:2021-Broken Access Control.
• A02:2021-Cryptographic Failures.
• A03:2021-Injection.
• A04:2021-Insecure Design.
• A05:2021-Security Misconfiguration.
• A06:2021-Vulnerable and Outdated Components.
• A07:2021-Identification and Authentication Failures.
• A08:2021-Software and Data Integrity Failures.
• A09:2021-Security Logging and Monitoring Failures.
• A10:2021-Server-Side Request Forgery.
owasp.org/Top10/

A02:2021-Cryptographic Failures
owasp.org/Top10/A02_2021-Cryptographic_Failures/
Focused mostly on crypto usage and implementation.
• Bad ciphers: old and insecure? Wrong AES modes? AES-CBC instead of AES-GCM? Asymmetric encryption where symmetric should be used?
• Bad keys: short, low-entropy keys? Math.random? User passwords used as encryption keys without a proper KDF? Bad KDF choice / params?
• Unsuitable choice of crypto primitives: MD5 instead of Argon2id, AES-OFB instead of GCM, SHA-256 instead of HMAC-SHA256.
• Home-brewed crypto.
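A minimal sketch of the "bad keys" and "unsuitable primitives" points, using only the Python standard library. It is an illustration, not the deck's own code: the scrypt cost parameters, the sample password and the function names are assumptions chosen for the example, and scrypt stands in here for a memory-hard KDF such as Argon2id, which the standard library does not ship.

```python
import hashlib
import hmac
import secrets


def derive_key_from_password(password: str, salt: bytes) -> bytes:
    """Stretch a user password into a 32-byte key with scrypt, a memory-hard KDF."""
    # n, r and p are illustrative cost parameters, not a tuned recommendation.
    return hashlib.scrypt(password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1, dklen=32)


def authenticate(message: bytes, mac_key: bytes) -> bytes:
    """Keyed integrity tag: HMAC-SHA256 rather than a bare SHA-256 digest."""
    return hmac.new(mac_key, message, hashlib.sha256).digest()


salt = secrets.token_bytes(16)        # from the OS CSPRNG, not a general-purpose random()
random_key = secrets.token_bytes(32)  # high-entropy 256-bit key
password_key = derive_key_from_password("correct horse battery staple", salt)
tag = authenticate(b"record-42", random_key)
```

The raw password never serves as a key directly, and the integrity tag depends on a secret, so an attacker who can modify the message cannot simply recompute it.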

A04:2021-Insecure Design
owasp.org/Top10/A04_2021-Insecure_Design/
Focused on design, missing or wrong security controls.
• Bad key management: storing encryption keys in plaintext together with data. Lack of rotation, revocation, expiration?
• Components are trusted when they shouldn’t be? One encryption key for everything? Lack of PKI?
• Lack of encryption for sensitive assets.
• Home-brewed encryption protocols.
• Encryption is not supported by authN, access control, logging & monitoring.

OWASP ASVS
github.com/OWASP/ASVS
V6: Stored Cryptography, V8: Data Protection, V9: Communications

OWASP MASVS
github.com/OWASP/owasp-masvs/
V3: Cryptography, V2: Data Storage, V5: Network Communication

*ASVS

Encryption models and ways

Encryption
• Data stored encrypted locally: data-at-rest encryption; also FS/OS encryption, database encryption.
• Transport layer encryption: data-in-transit encryption (TLS, IPsec, SSH).
Diagram: host OS / server app on each side of the connection.

Application-level encryption (ALE)
• The encryption process happens within the application context, triggered by the application.
• ALE can work together with data-at-rest encryption and data-in-transit encryption.
• ALE can be client-side, server-side, end-to-end, etc.
Encryption is easy, key management is hard.
infoq.com/articles/ale-software-architects/

Diagram: TLS (in transit) compared with application-level encryption. In both panels Alice, Carol and Bob exchange data through servers 1, 2 and 3; the panels mark where the data is encrypted.
infoq.com/articles/ale-software-architects/

Application-level encryption
• Data encrypted by any app: application-level encryption (ALE).
• ALE happens on the client side: client-side encryption.
• ALE happens on the server side: server-side encryption.
• ALE happens on a proxy: proxy-side encryption.
infoq.com/articles/ale-software-architects/

Field-level encryption
Only some data fields are encrypted: field-level encryption (FLE).
{
  "name": base64_str(encrypted_name),
  "phone": base64_str(encrypted_phone),
  "passport": base64_str(encrypted_passport),
  "ID": user_ID,
  "last_activity_date": timestamp,
  ...
}
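The record layout above can be produced by a small helper that encrypts only the selected fields and leaves the rest untouched. This is a sketch under stated assumptions: `encrypt_fields` and its parameters are hypothetical names, and `encrypt` is a placeholder for an authenticated cipher supplied by a vetted library (the deck does not prescribe one).

```python
import base64
from typing import Any, Callable, Dict, Iterable


def encrypt_fields(
    record: Dict[str, Any],
    sensitive: Iterable[str],
    encrypt: Callable[[bytes], bytes],
) -> Dict[str, Any]:
    """Return a copy of `record` in which only the `sensitive` fields are encrypted.

    `encrypt` is injected by the caller and is assumed to be an AEAD encryption
    function bound to the right key; this helper only decides which fields it
    is applied to and how the ciphertext is encoded.
    """
    out = dict(record)  # shallow copy, the input record is not modified
    for name in sensitive:
        if name in out:
            ciphertext = encrypt(str(out[name]).encode("utf-8"))
            out[name] = base64.b64encode(ciphertext).decode("ascii")
    return out
```

Fields such as "ID" and "last_activity_date" stay in plaintext, so the database can still index and filter on them, while "name", "phone" and "passport" are opaque to it.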

End-to-end encryption
App-side encryption where no keys, secrets or data are available to the intermediate infrastructure between Alice and Bob: end-to-end encryption.
• Encryption should work on all selected platforms.
• Key management is tricky: the backend should work only as a key discovery service, without access to private/secret keys.
• Complicated to design, easy to maintain, hard to debug.
speakerdeck.com/vixentael/e2ee-equals-security-equals-privacy

TLS and ALE have different threat models; it’s unfair to compare them, but we will :)

encryption controls / events | transit (TLS) | disk / FS | TDE / DB encryption | ALE | E2EE
physical access to servers | ⛔ | ✅ | ✅ | ✅ | ✅
MitM | ✅ | ⛔ | ⛔ | ✅ | ✅
privileged DB access | ⛔ | ⛔ | Depends | ✅ | ✅
privileged system access | ⛔ | ⛔ | ⛔ | Depends | ✅
backups, logs, snapshots | ⛔ | ⛔ | Depends | ✅ | ✅
infoq.com/articles/ale-software-architects/

If E2EE is so great, why don’t we use it everywhere?
Diagram: security efforts and tradeoffs grow along the scale TLS, FS/OS encryption and TDE, custom data-at-rest encryption, ALE, E2EE.
The efforts: key storage, key rotation, key revocation, data re-encryption, consistency, backups, tying keys to identity, search in encrypted data, logging, monitoring, and all of NIST SP 800-57 and 800-53.

Zero Trust / Zero Trust Architecture: assumes there is no implicit trust granted to assets or user accounts based solely on their physical or network location. No asset is inherently trusted.
ZT is more about access control and authN than encryption.
nist.gov/publications/zero-trust-architecture

Zero Knowledge Architecture (ZKA): a system where no one has access to unencrypted data, except the user (node, service, person). Also known as “No Knowledge” systems.
Typically built on E2EE + strong authN + privacy-respectful design.
See also: ZKP, SMP, PAKE, OPAQUE; FHE, searchable encryption.
cossacklabs.com/solutions/e2ee-zero-trust/

Searchable encryption
Perform queries on encrypted data without decryption.
Different schemes are possible: SSE, PEKS, blind index, (F)HE, etc.
Most realistic: keyword search (blind index).
cossacklabs.com/blog/secure-search-over-encrypted-data-acra-se/
eprint.iacr.org/2019/806.pdf
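One common way to build a blind index is to store a keyed hash of the normalized plaintext next to the ciphertext and query by that hash. The sketch below assumes HMAC-SHA256 and a dedicated index key; the function names, the normalization rule and the row layout are hypothetical, and this is not a description of how any particular product implements it.

```python
import hashlib
import hmac
from typing import Dict, List


def blind_index(value: str, index_key: bytes) -> str:
    """Keyed hash of the normalized value; equal inputs yield equal index entries."""
    normalized = value.strip().lower()
    return hmac.new(index_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def find_by_email(rows: List[Dict[str, str]], email: str, index_key: bytes) -> List[Dict[str, str]]:
    """Exact-match lookup over the index column, without decrypting any row."""
    wanted = blind_index(email, index_key)
    return [row for row in rows if hmac.compare_digest(row["email_idx"], wanted)]


# Demo key only; a real index key would come from a KMS and differ from the data key.
index_key = bytes(range(32))
rows = [
    {"email_ciphertext": "<ciphertext 1>", "email_idx": blind_index("alice@example.com", index_key)},
    {"email_ciphertext": "<ciphertext 2>", "email_idx": blind_index("bob@example.com", index_key)},
]
matches = find_by_email(rows, " Alice@Example.com ", index_key)
```

The trade-off is that a blind index reveals which rows share the same value, and it supports only exact matches on whatever normalization was applied.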

*ASVS, ALE, E2EE

Other exciting crypto terms
• Privacy-enhancing cryptography: SMPC, PSI, PIR, FHE, PAKE, OPAQUE.
• ZK: ZKP, zk-SNARKs, zk-STARKs, zk-SNORKs :)
• Crypto-reinforced guarantees in data structures: blockchain, Merkle tree.
• PQC.
nist.gov/blogs/cybersecurity-insights/privacy-enhancing-cryptography-complement-differential-privacy
hackernoon.com/eli5-zero-knowledge-proof-78a276db9eff
cossacklabs.com/blog/crypto-signed-audit-logs.html
aumasson.jp/data/talks/quantum-poc-2021.pdf
blog.cloudflare.com/opaque-oblivious-passwords/
blog.cloudflare.com/the-tls-post-quantum-experiment/

SMPC, PIR, zk-SNARKs, PQC

Crypto is more useful when integrated with traditional security controls.

Diagram: data encryption; access control, authN; transport encryption; access logging; honeypots, SIEMs.

AAA, WAF, honey pots, IDS, infra mngmt, compartmentalization, access logging, jailbans, monitoring, data firewall, SIEM, HIDS, DAST, SAST, HSM, PKI, TPM, honey tokens, dependency mngmt, UEBA, IAM, TLS, TDE, API protection, obfuscation, anti-RE.
RTFM: csrc.nist.gov/publications/detail/sp/800-53/rev-5/final

Security controls to support crypto
1. Use encryption to protect sensitive data globally during the whole data flow.
2. Whatever the attack vector is, there is a defense layer.
3. For the most popular attack vectors, set up as many independent and overlapping defenses as possible.

*ASVS, ALE, E2EE, crypto + security controls

Let’s see some real-world cases

Field-level data encryption for SaaS platforms

Who we are and what we want
• Huge B2B SaaS platforms.
• Protect from insiders, provide transparency, detect malicious users.
• Encrypt data per customer, using different keys, BYOK.
• Minimize the lifecycle of plaintext data: encrypt as close as possible to the point where data is generated.

Client-side field-level encryption (MongoDB)
Diagram: the application, through the MongoDB SDK, performs encryption / decryption with keys from a key vault / KMS. It writes and reads records with encrypted fields over TLS, and MongoDB stores records with encrypted fields.
docs.mongodb.com/drivers/security/client-side-field-level-encryption-guide

Key hierarchy
• MongoDB: field, encrypted by DEK (millions).
• Key Vault: DEK, encrypted by CMK (dozens).
• KMS: CMK (1).
docs.mongodb.com/drivers/security/client-side-field-level-encryption-guide

Pros & Cons
• Extremely useful when you have MongoDB.
• Client apps share responsibility for en/decryption and key management because they are exposed to key material.
• Deterministic and non-deterministic encryption available.
• Support for different encryption keys (DEK).
docs.mongodb.com/drivers/security/client-side-field-level-encryption-guide

Proxy-side field-level encryption (Acra)
Diagram: the application talks over TLS to the Acra proxy, which performs encryption / decryption with keys from a key vault / KMS. The proxy writes and reads records with encrypted fields over TLS, and the database stores records with encrypted fields.
github.com/cossacklabs/acra

Key hierarchy
• Database: field encrypted by DEK, DEK encrypted by KEK (millions).
• Key Vault: KEK, encrypted by CMK (dozens).
• KMS: CMK (1).
github.com/cossacklabs/acra

Pros & Cons
• Neither the database nor the application knows that the data is encrypted. The proxy app is responsible for en/decryption.
• Easy to scale and build DAO-based architectures.
• Non-deterministic and searchable encryption available.
• Support for different encryption keys, BYOK, key rotation, revocation, etc.
• Easy to armor a single encryption layer with specific controls: DLP, anomaly detection, firewalling, anonymisation, etc.
github.com/cossacklabs/acra

ALE for NoCode platform
Diagram: inside the NoCode platform, each customer’s pod has its own API, frontend, database, DAO, encryption and integration components (two pods shown, more implied). Tech logs and analytics are shown alongside.

ALE for NoCode platform
Diagram: the frontend and API exchange data with the DAO + ALE encryption module (fields are plaintext + TLS). The module uses a key vault / KMS and talks to MongoDB (fields are encrypted + TLS).

Crypto + supporting controls
Crypto:
1. Key management, separate key per customer (BYOK).
2. Full compartmentalization: each customer’s data is located in a different DB, encrypted by a different key, and each app uses its own DAO.
3. Full transparency: the platform doesn’t have access to customers’ data.
Supporting controls:
1. Logging, monitoring.
2. Alerting on suspicious activity, firewalling.
3. ASVS: API protection, throttling, anti-fraud, access control …

ALE for fintech platform
Diagram: Service1, Service2, ... ServiceN and BI Analytics reach the databases (PostgreSQL, MySQL, ...) through load-balanced DAO1 ... DAON and an encryption / decryption layer backed by a key vault / KMS. On the service side of the layer, fields are plaintext + TLS; on the database side, fields are encrypted + TLS.

Crypto + supporting controls
Crypto:
1. Key management, separate keys per domain.
2. Decryption & anonymisation of data for BI software.
3. Isolation of sensitive data from non-sensitive.
Supporting controls:
1. Logging, monitoring.
2. PCI DSS audit logging + crypto-verifiable logging.
3. Alerting on suspicious activity.
4. AppSec measures for DAO.

Crypto-based ML models protection

Who we are and what we want
• AI/ML-driven product with unique IP. Paid feature.
• Server side generates ML models, mobile side executes them.
• ML models are unique per user, per app, per request (IML). Protecting them is crucial.
• Losing 1 IML is not a problem, losing many IMLs is.
Risks:
• Leakage of IP: loss of IP, competitor advantage, investments into updating the ML model.
• Broken apps, cloned apps, API fraud: abuse of infrastructure, revenue loss, abuse of IP, competitor advantage, reputation risks.

IML data flow
Diagram components: native iOS app, native Android app, API, user data, main ML infra (training servers), GCE worker (TF) generating IMLs, GCP storing IMLs.

Encryption layer
Diagram: the GCE worker (TF) encrypts each IML, GCP stores the IMLs encrypted, and the native iOS and Android apps, reached through the API, each decrypt their IML.

Encryption scheme
Diagram stages, starting at the GCE worker (TF): generation, encryption, storage, transfer + TLS, decryption, re-encryption & storage, execution. The model is a plaintext IML only at generation, after decryption and at execution; everywhere else it is an encryptedIML.
IMLs are encrypted after generation using ALE, with a unique key for each encryption. They are transmitted using TLS, then re-encrypted on the device.
github.com/cossacklabs/themis

IML encryption & decryption
1. Generate keypair. Send app.publicKey to backend.
2. Generate keypair. Use server.privateKey and app.publicKey to derive sharedKey (ECDH).
3. Generate random DEK.
4. Encrypt IML using DEK, AES-256-GCM.
5. Encrypt DEK using sharedKey, AES-256-GCM.
6. Send { encryptedIML, encryptedDEK, server.publicKey }.
7. Receive. Use app.privateKey and server.publicKey to derive sharedKey.
8. Decrypt DEK, decrypt IML.
Steps 1, 7 and 8 run in the app; steps 2 to 6 run on the backend (GCE worker, TF).
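The eight steps can be sketched with the third-party `cryptography` package. This is an illustration under stated assumptions, not the product's implementation (the deck points to Themis for that): X25519 is assumed as the ECDH curve, an HKDF step is added to turn the raw ECDH output into an AES key, the 12-byte nonce is prepended to each ciphertext, and all function and dictionary key names are hypothetical.

```python
import os
from typing import Any, Dict

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey, X25519PublicKey
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF


def _shared_key(private_key: X25519PrivateKey, peer_public_key: X25519PublicKey) -> bytes:
    """ECDH, then HKDF (an assumption of this sketch) to get a 256-bit AES key."""
    secret = private_key.exchange(peer_public_key)
    return HKDF(algorithm=hashes.SHA256(), length=32, salt=None, info=b"iml-demo").derive(secret)


def _seal(key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(12)  # fresh nonce per encryption, stored in front of the ciphertext
    return nonce + AESGCM(key).encrypt(nonce, plaintext, None)


def _open(key: bytes, blob: bytes) -> bytes:
    return AESGCM(key).decrypt(blob[:12], blob[12:], None)


def server_encrypt(iml: bytes, app_public_key: X25519PublicKey) -> Dict[str, Any]:
    server_private = X25519PrivateKey.generate()           # step 2: server keypair
    shared = _shared_key(server_private, app_public_key)   # step 2: sharedKey via ECDH
    dek = AESGCM.generate_key(bit_length=256)              # step 3: random DEK
    return {                                               # steps 4 to 6
        "encryptedIML": _seal(dek, iml),
        "encryptedDEK": _seal(shared, dek),
        "serverPublicKey": server_private.public_key(),
    }


def app_decrypt(package: Dict[str, Any], app_private_key: X25519PrivateKey) -> bytes:
    shared = _shared_key(app_private_key, package["serverPublicKey"])  # step 7
    dek = _open(shared, package["encryptedDEK"])                       # step 8: decrypt DEK
    return _open(dek, package["encryptedIML"])                         # step 8: decrypt IML


app_private = X25519PrivateKey.generate()                  # step 1: app keypair
package = server_encrypt(b"model weights", app_private.public_key())
recovered = app_decrypt(package, app_private)
```

Because the server keypair and the DEK are generated per call, each package uses unique keys, which matches the "unique keys per each encryption" note on the previous slide.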

IML format
{
  "data": base64_str(encrypted_IML),
  "key": base64_str(encrypted_DEK),
  "public_key": server_ephemeral_public_key,
  "version": MODEL_VERSION,
  "layers": {
    // additional ML layers encryption
  }
}
speakerdeck.com/vixentael/cryptographic-protection-of-ml-models

Hybrid Public Key Encryption (HPKE)
datatracker.ietf.org/doc/draft-irtf-cfrg-hpke/
• Encrypt data with a symmetric key using AEAD.
• Encapsulate the symmetric key with a public key scheme.
The RFC describes an approach that was already in use and moves it toward standardization.

Supporting security controls
Diagram: the same architecture (GCE worker with TF, GCP storing IMLs, API, native iOS and Android apps) annotated with controls: AuthN, TTL, ACL, crypto, anti-RE, AppSec, API sec, anti-fraud, logging, monitoring.

Cloud storage security 101
1. IMLs are stored for a minimal time: apps are expected to grab their IML quickly.
2. URL TTL (expires after minutes).
3. URL authentication & access control.
4. Clean up IML files (every hour).
5. Do not back up IMLs.
6. URLs are not logged.
7. Monitoring of access errors.
(also see OWASP WSTG-CONF-11)
Controls: AuthN, TTL, ACL.
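Items 2 and 3 (URL TTL and URL authentication) are often implemented as signed, expiring URLs. Cloud storage services usually provide this natively; the standalone sketch below only illustrates the idea with an HMAC over the path and expiry time. The URL layout, parameter names and functions are assumptions made for the example.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign_url(path: str, secret: bytes, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Append an expiry timestamp and an HMAC-SHA256 signature to `path`."""
    expires = int((time.time() if now is None else now) + ttl_seconds)
    payload = f"{path}?expires={expires}"
    signature = hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{payload}&sig={signature}"


def verify_url(url: str, secret: bytes, now: Optional[float] = None) -> bool:
    """Accept the URL only if the signature matches and the TTL has not passed."""
    base, sep, signature = url.rpartition("&sig=")
    if not sep:
        return False
    expected = hmac.new(secret, base.encode("utf-8"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8")):
        return False
    try:
        expires = int(base.rsplit("?expires=", 1)[1])
    except (IndexError, ValueError):
        return False
    current = time.time() if now is None else now
    return current <= expires
```

For example, `sign_url("/iml/abc", secret, ttl_seconds=300)` yields a link that `verify_url` rejects after five minutes or after any change to the path or expiry value.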

API protection 101
1. User authN: IMLs are available only after successful authN.
2. API limits, request throttling, firewalling.
3. IML request limits: after N model requests, the server returns an error.
(also see OWASP ASVS :) )
Controls: API, AuthN, AppSec, API sec.
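Items 1 and 3 can be sketched as a small gate in front of the IML endpoint. The class, the handler and the HTTP status codes are assumptions for illustration; the slide only says that the server returns an error after N requests.

```python
from collections import defaultdict
from typing import DefaultDict


class ImlRequestLimiter:
    """Counts IML requests per user and refuses once the limit N is reached."""

    def __init__(self, max_requests: int) -> None:
        self.max_requests = max_requests
        self._counts: DefaultDict[str, int] = defaultdict(int)

    def allow(self, user_id: str) -> bool:
        if self._counts[user_id] >= self.max_requests:
            return False
        self._counts[user_id] += 1
        return True


def handle_iml_request(user_id: str, authenticated: bool, limiter: ImlRequestLimiter) -> int:
    """Return a hypothetical HTTP status for an IML download request."""
    if not authenticated:
        return 401  # item 1: no IML without successful authN
    if not limiter.allow(user_id):
        return 429  # item 3: limit reached, the server returns an error
    return 200


limiter = ImlRequestLimiter(max_requests=3)
statuses = [handle_iml_request("user-1", True, limiter) for _ in range(4)]
```

A production limiter would also need persistence and a reset policy; both are left out here to keep the gate itself visible.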

Anti-fraud system 201
1. Limit access to IML based on user behaviour.
2. Gather events from mobile apps and from the server side.
3. Calculate user scoring based on events (“stop-factors”, rules).
4. User scoring: OK, suspicious, malicious.
5. Block malicious, limit suspicious.
Controls: anti-fraud, API.

Anti-fraud system 201
🛑 Stop factors: JB detected; same public key, different device; invalid app signature; remote device attestation failed.
🤨 Implicative rules: URL download failure; app reinstall; too many requests; keychain not accessible.
Other events shown: wrong API version, …, honey token deviceID, …
Resulting score: malicious, suspicious, OK.
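The scoring idea can be sketched as follows: any stop factor marks the user as malicious right away, while implicative rules add weight until a threshold makes the user suspicious. The event identifiers, weights and threshold are hypothetical values picked for the example; the slides name the events but not how they are weighted.

```python
from typing import Iterable

# Events that immediately mark a user as malicious (from the slide's stop factors).
STOP_FACTORS = {
    "jailbreak_detected",
    "same_public_key_different_device",
    "invalid_app_signature",
    "remote_attestation_failed",
}

# Implicative rules with illustrative weights (the weights are assumptions).
RULE_WEIGHTS = {
    "url_download_failure": 1,
    "app_reinstall": 1,
    "too_many_requests": 2,
    "keychain_not_accessible": 2,
}

SUSPICIOUS_THRESHOLD = 3  # assumed cut-off between "OK" and "suspicious"


def score_user(events: Iterable[str]) -> str:
    """Map a user's observed events to one of: "OK", "suspicious", "malicious"."""
    observed = list(events)
    if any(event in STOP_FACTORS for event in observed):
        return "malicious"
    weight = sum(RULE_WEIGHTS.get(event, 0) for event in observed)
    return "suspicious" if weight >= SUSPICIOUS_THRESHOLD else "OK"


def access_policy(score: str) -> str:
    """Block malicious users, limit suspicious ones, allow the rest."""
    return {"OK": "allow", "suspicious": "limit", "malicious": "block"}[score]
```

Events come from both the mobile apps and the server side, so in practice `score_user` would be fed from a merged event stream per user.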

Remote device attestation
Apple DeviceCheck: developer.apple.com/documentation/devicecheck
Android SafetyNet: developer.android.com/training/safetynet/attestation
1. Use as part of user authN.
2. Use as a source for the anti-fraud system.
3. Block apps that were not installed from the stores.

Read the full story! Cryptographic protection of ML models
speakerdeck.com/vixentael/cryptographic-protection-of-ml-models

CRDT & E2EE

Who we are and what we want
• CRDT-based mobile-first product.
• Users create shared spaces and collaborate on visuals and texts together.
• Encrypt users’ data but allow them to collaborate.
Diagram: operation logs 1 2 3 4 and 1 2 5 merging into 1 2 3 4 5.
Martin Kleppmann discussed other approaches: speakerdeck.com/ept/adapting-secure-group-messaging-for-encrypted-crdts

CRDT log encryption strategy
1. The main problem: how to reduce the problem to a typical one.
2. We selected document-based encryption, not chat-based.
3. Encrypt payload or action + payload. Trade-off: the more the server knows, the faster the merges; the less the server knows, the better the security.
4. Use the same encryption key for all entries of the document.
5. Tricky part: “invite” and “revoke” users:
• give users access to the Document Key (“invite”) by encrypting it for each user.
• define a key rotation period.
• pre-keying, double ratchet: overkill.

Diagram: log entries protection (e2ee), surrounded by supporting controls: passphrase encryption, hint encryption, zeroing secrets, secure key sharing, auto-locking timer, failed attempts counter, encrypted user settings, obfuscation, anti-RE & anti-debugging, good TLS, authZ / authN.

*ASVS, ALE, E2EE, crypto + security controls

Encryption is not that hard. Key management is a bit harder. Crypto + key management + data flow control + security controls… Welcome to the real world :)

@vixentael: Don’t hesitate to talk to me if you have questions about data security and cryptography. Esp. E2EE. vixentael.dev cossacklabs.com
