Secure foundations for healthcare startups

Goce Bonev, CTO, ThinkWeb
2025, redacted public version

What I am going to talk about
• A real app for a healthcare startup
• Challenges and dangers
• Domain discovery
• Infrastructure
• Secure coding practices
• Encryption, types, key management, real applications and challenges
• Securing files
• Logs and append only records

PHI / Digital health records

High level requirements
• Secure infrastructure to run the application
• Reliable, scalable storage that can store many GB of data
• Data encryption at rest and in transit
• Secure secret storage
• Access controls
• Backups
• Monitoring
• Audit logs

Healthcare data regulations
• HIPAA (Health Insurance Portability and Accountability Act)
• GDPR (General Data Protection Regulation)
• Health Data Hosting (HDS) certification for French patients
• HITECH (Health Information Technology for Economic and Clinical Health Act)
• PIPEDA (Personal Information Protection and Electronic Documents Act)

MVSP - Minimum Viable Secure Product
• A minimum security checklist for enterprise-ready products and services
• Not a standard
• Application security, not enterprise security
• Simple to implement
• Provides a good foundation
• https://mvsp.dev/

OWASP Application Security Verification Standard
• Detailed security architecture guidance providing secure coding checklists
• ASVS Level 1 - low assurance levels
• ASVS Level 2 - minimum for applications that contain any sensitive data, recommended level for most apps
• ASVS Level 3 - critical applications that perform high value transactions, contain sensitive medical data, or any application that requires the highest level of trust

Data / Process discovery and mapping
• What data flows in the system
• Sensitive data mapping
• Define processes
• What data each process needs
• Need to know basis
• Data minimization
• Data ownership

Domain discovery

Bounded contexts

Mindset
Not “if we will be hacked”, but “when we will be hacked”

Secure by design
• Security first vs security last
• Security as a core business requirement
• Identify potential threats and vulnerabilities early in the development process

Threat modeling
The activity of identifying threats and figuring out how to mitigate them.
PHI / Business risk
OWASP Threat Dragon
Image: Continuous Architecture in Practice: Software Architecture in the Age of Agility and DevOps, Murat Erder, Pierre Pureur, and Eoin Woods

Threat modeling example (STRIDE)
• Unauthorized access to the database: An attacker could access the database.
  ◦ Type: Information disclosure
  ◦ Mitigated: Yes
  ◦ Risk: High
  ◦ Mitigation: Valid user/password is required to access the database.
• Database credential theft: An attacker could obtain the credentials and use them to make unauthorized calls to the database.
  ◦ Type: Information disclosure
  ◦ Mitigated: Yes
  ◦ Risk: High
  ◦ Mitigation: The database is placed behind a firewall and is only accessible from inside the ECS cluster. No access from public internet.
• Database MITM attacks: An attacker could intercept the database queries and obtain sensitive information.
  ◦ Type: Information disclosure
  ◦ Mitigated: No
  ◦ Risk: High
  ◦ Mitigation: All connections to the database require SSL.
• Database data theft: An attacker or an insider can steal the database data.
  ◦ Type: Information disclosure
  ◦ Mitigated: Yes
  ◦ Risk: High
  ◦ Mitigation: Personal and all sensitive data in the database is encrypted using envelope encryption, different DEKs per row approach. The DEK can only be decrypted by the KEK stored in AWS KMS…
• Message tampering / Fake messages could be placed on the queue: Fake/tampered messages could be placed on the queue, resulting in incorrect processing by the service.
  ◦ Type: Tampering / Spoofing
  ◦ Mitigated: Yes
  ◦ Risk: High
  ◦ Mitigation: All messages are signed by the service that sent them, and the receiving service checks the signature using the sending service's public key.
• An attacker could upload malicious code: An attacker could upload malicious code on the server.
  ◦ Type: Tampering
  ◦ Mitigated: Yes
  ◦ Risk: High
  ◦ Mitigation: File uploads are disabled on application services. Container filesystems are read-only.
• MITM attacks: An attacker can intercept requests between the user and the application or between the applications.
  ◦ Type: Information disclosure
  ◦ Mitigated: Yes
  ◦ Risk: High
  ◦ Mitigation: SSL is used on all endpoints. Valid HSTS policy is present on the website. Microservice communication is secured with mTLS.

Infrastructure
• On premise
• Cloud

AWS Shared responsibility model

AWS and Compliance
✔ Business Associate Addendum (BAA)
✔ Only HIPAA-Eligible services
✔ AWS Well-Architected Framework
✔ Architecting for HIPAA Security and Compliance on AWS Whitepaper
✔ https://aws.amazon.com/compliance/hipaa-compliance/
✔ https://aws.amazon.com/health/providers/
✔ https://aws.amazon.com/compliance/programs/

RDS / SQL
• Enable storage encryption (at rest, KMS key)
• Transport encryption
  ◦ CA (e.g. rds-ca-ecc384-g1 / rds-ca-rsa4096-g1)
  ◦ REQUIRE SSL for all users
• Enable backups, retention period (30-35 days)
• No public access
• Enable deletion protection

What can go wrong?
• Leak patient information publicly
• Leaked / stolen credentials
• Allow doctors to view information for patients who are not theirs (isolation of resources); BOLA (Broken Object Level Authorization)
• Create / modify / delete cases of other doctors; BFLA (Broken Function Level Authorization)
• Someone can see or modify an object property that they should not have access to; BOPLA (Broken Object Property Level Authorization = Excessive data exposure / Mass assignment)
• Mix up patient records or prescriptions, or expose sensitive health data in wellness apps
• Complexity
• So many others…

Treating doctors vs Backoffice
Client portal
• Access to his/her own cases only
• Limited access and functions
• Scoped to his/her account
• Limited impact
• …
Backoffice
• Access to all cases for all doctors
• Accounting
• Administrative features
• Depending on the role
• For all accounts
• Huge impact
• …

Backoffice security
• Network access
  ◦ IP Restriction / VPN
  ◦ mTLS
• MFA - Mandatory
• Roles (Principle of least privilege)

Broken Object Level Authorization / Broken Function Level Authorization
• Permission checker in command / query handlers
• Security context for commands (type of logged user / meta)
• Teammate vs Client
• Unit testing
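
A minimal sketch of a permission checker that runs inside a command handler. The `SecurityContext`, `Case`, `UpdateCaseNotes` and `case_editor` names are hypothetical illustrations, not taken from the talk; the point is only that the handler receives the security context and checks both the function (BFLA) and the object (BOLA) before changing state.

```python
from dataclasses import dataclass
from enum import Enum


class UserType(Enum):
    CLIENT = "client"      # treating doctor using the client portal
    TEAMMATE = "teammate"  # backoffice user


@dataclass(frozen=True)
class SecurityContext:
    """Who is issuing the command (hypothetical shape)."""
    user_id: str
    user_type: UserType
    roles: frozenset = frozenset()


@dataclass
class Case:
    case_id: str
    owner_id: str
    notes: str = ""


@dataclass(frozen=True)
class UpdateCaseNotes:
    case_id: str
    notes: str


class PermissionDenied(Exception):
    pass


def assert_can_update(ctx: SecurityContext, case: Case) -> None:
    # Function-level check (BFLA): teammates need an explicit role.
    if ctx.user_type is UserType.TEAMMATE:
        if "case_editor" not in ctx.roles:
            raise PermissionDenied("missing role: case_editor")
        return
    # Object-level check (BOLA): clients may only touch their own cases.
    if case.owner_id != ctx.user_id:
        raise PermissionDenied("case belongs to another doctor")


def handle_update_case_notes(ctx: SecurityContext, command: UpdateCaseNotes, cases: dict) -> Case:
    case = cases[command.case_id]
    assert_can_update(ctx, case)  # runs before any state change
    case.notes = command.notes
    return case
```

Because the check is a plain function of the context and the case, each allowed and denied combination can be covered by a small unit test.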

Broken Object Property Level Authorization
• Query handlers return view models (DTO)
• View models are reviewed and approved
• Different view models for teammates / clients
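
A short sketch of separate view models for clients and teammates. The record fields and class names are assumptions made for illustration; the idea is that query handlers never return the internal record and that each view model lists its fields explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseRecord:
    """Internal entity (hypothetical fields); never returned directly."""
    case_id: str
    patient_name: str
    status: str
    internal_notes: str
    invoice_total_cents: int


@dataclass(frozen=True)
class ClientCaseView:
    case_id: str
    patient_name: str
    status: str


@dataclass(frozen=True)
class TeammateCaseView:
    case_id: str
    patient_name: str
    status: str
    internal_notes: str
    invoice_total_cents: int


def to_client_view(record: CaseRecord) -> ClientCaseView:
    # Fields are copied explicitly, so a new attribute on CaseRecord is not
    # exposed to clients until someone adds it to the reviewed view model.
    return ClientCaseView(record.case_id, record.patient_name, record.status)


def to_teammate_view(record: CaseRecord) -> TeammateCaseView:
    return TeammateCaseView(
        record.case_id,
        record.patient_name,
        record.status,
        record.internal_notes,
        record.invoice_total_cents,
    )
```

Keeping the mapping explicit makes a change to what a client can see show up as a diff in the view model, which is what gets reviewed and approved.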

Encryption
✔ Encryption in transit
✔ Encryption at rest
✔ Key rotation
✔ Encryption as a service
✔ Crypto-shredding
Battle-tested libraries: https://github.com/paragonie/halite

Key storage
Do not store the keys and data on the same system!
▪ Version control
▪ Container images
▪ In the database*
▪ On the application server (env/memory)
▪ Encryption as a service
Now add the following app vulnerabilities into the mix:
▪ SQL Injection
▪ Upload and execute code on the server

Envelope encryption
Encrypt the plaintext data with a data key and then encrypt the data key with another key.

Envelope encryption

Encryption threat model
1. The attacker can see the encrypted data and the blind indexes (access to a full SQL dump)
2. The application server has access to / can use the encryption service and blind index keys (blind / data keys stored on the server encrypted with KEK)
3. The database server stores the encrypted data and blind indexes (no keys are stored on the database server)
4. The attacker cannot upload or execute malicious code on the application server

AWS KMS
• Fully managed service
• Symmetric / asymmetric encryption
• Envelope encryption
• Key rotation / management
• Compliance with regulations
• Cost
  ◦ $0.03 per 10,000 requests
  ◦ $0.10 per 10,000 generated data keys
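
To get a feel for how these prices scale with request volume, here is a small estimator that uses only the two figures from the slide. Treating requests and generated data keys as independent line items is an assumption, and `kms_cost` is a hypothetical helper name.

```python
from decimal import Decimal

PRICE_PER_10K_REQUESTS = Decimal("0.03")   # figure from the slide
PRICE_PER_10K_DATA_KEYS = Decimal("0.10")  # figure from the slide


def kms_cost(requests: int, generated_data_keys: int) -> Decimal:
    """Estimated cost in USD, rounded to cents."""
    request_cost = Decimal(requests) / 10_000 * PRICE_PER_10K_REQUESTS
    key_cost = Decimal(generated_data_keys) / 10_000 * PRICE_PER_10K_DATA_KEYS
    return (request_cost + key_cost).quantize(Decimal("0.01"))


# 5,000,000 requests and 200,000 generated data keys:
# 500 * 0.03 + 20 * 0.10 = 15.00 + 2.00
print(kms_cost(5_000_000, 200_000))  # 17.00
```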

Implementation

Security, performance and cost
✔ Encryption granularity that fits your risk model
✔ One DEK per client (for all bounded contexts)
✔ One DEK per aggregate or bounded context / service
✔ One DEK per projection / stream / use case
✔ One DEK per database row

Image: DALL·E

Write model != read model
Write / command model
• Data model is built for ACID writes
• Patient data is encrypted with a different DEK per database row
• Encrypted blobs contain a lot of different data elements
• Optimized for reading one case / case element at a time
• Inefficient encryption model for building listings
Read / query / view model
• Listings containing personal or sensitive information
• Only some of the elements from each entity are needed
• Listings need to display information from different bounded contexts (case progress, notifications, payments)
• Cost effective encryption model that supports listings

Projections
• Read only data model used for queries
• Creatable / rebuildable from primary data models
• Specifically created for a problem / question at hand
• Eventually consistent or not (same transaction, async… implementation detail)
• Stored in the same persistence as primary data or another (SQL, NoSQL, Elasticsearch, Neo4j, etc.)
• Disposable

Searchable encryption / blind indexes
✔ A blind index is created by applying hash functions and/or key-stretching algorithms to the plaintext, using a secret key
✔ Blind indexes for exact match search
✔ Blind index size: smaller has more false positives, bigger is vulnerable to leakage attacks
✔ Normalize the query (uppercase / lowercase or something else)
✔ Calculate the blind index from the normalized query
✔ Do an exact match on the blind index
✔ Database returns X results
✔ Decrypt the results to find the one you are looking for
Battle-tested libraries: https://github.com/paragonie/ciphersweet
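
The lookup steps above can be sketched as follows. This is a simplified stand-in that uses a truncated HMAC-SHA256 with no key stretching; it is not CipherSweet's construction. The row shape (`name_index`, `name_encrypted`) and the `decrypt` callable are assumptions for illustration.

```python
import hashlib
import hmac
import unicodedata


def normalize(value: str) -> str:
    # The same normalization must be applied when indexing and when querying.
    return unicodedata.normalize("NFKC", value).strip().casefold()


def blind_index(value: str, key: bytes, bits: int = 32) -> str:
    """Keyed, truncated hash of the normalized plaintext, hex encoded."""
    if bits % 8 != 0 or not 8 <= bits <= 256:
        raise ValueError("bits must be a multiple of 8 between 8 and 256")
    digest = hmac.new(key, normalize(value).encode("utf-8"), hashlib.sha256).digest()
    return digest[: bits // 8].hex()


def find_patient(rows, query: str, key: bytes, decrypt):
    """rows: dicts with 'name_index' and 'name_encrypted' (hypothetical schema)."""
    wanted = normalize(query)
    index = blind_index(query, key)
    # In a real system this exact match is the WHERE clause of the DB query.
    candidates = [row for row in rows if row["name_index"] == index]
    # A truncated index can collide, so decrypt the candidates to confirm.
    return [row for row in candidates if normalize(decrypt(row["name_encrypted"])) == wanted]
```

The `bits` parameter is the size trade-off from the list: fewer bits means more candidates to decrypt and discard, more bits means the index reveals more about which rows share a value.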

Searching and sorting encrypted data
Search/sort by patient name in all cases belonging to the same doctor (example)
• Extract and group data by tenant / doctor and create a projection / helper view (e.g. get the patient names and case IDs for all active cases grouped by doctor)
• Normalize data if necessary (encoding, case, etc.)
• Store data in a single encrypted blob
• Event driven updates
• When searching, decrypt the blob
• Perform the search/sorting in memory and return the list of case IDs
• Find the matching cases by ID from the DB or combine with other filters by unencrypted fields
• Cursor pagination
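
A minimal sketch of the in-memory search, sort and cursor pagination step. It assumes the per-doctor blob has already been decrypted into a list of entries with `case_id` and `patient_name`; decryption is left out, and the entry shape, substring matching and cursor format are illustrative choices rather than the talk's implementation.

```python
def search_cases(entries, query: str, limit: int, cursor=None):
    """Return (case_ids, next_cursor) for one page of matches.

    entries: decrypted projection rows, e.g. {"case_id": "c1", "patient_name": "Anna"}.
    cursor: sort key of the last item of the previous page, or None for the first page.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")

    def sort_key(entry):
        # case_id breaks ties so the ordering (and the cursor) is stable.
        return (entry["patient_name"].casefold(), entry["case_id"])

    needle = query.casefold()
    matches = sorted(
        (e for e in entries if needle in e["patient_name"].casefold()),
        key=sort_key,
    )
    if cursor is not None:
        matches = [e for e in matches if sort_key(e) > cursor]

    page = matches[:limit]
    next_cursor = sort_key(page[-1]) if len(matches) > limit else None
    return [e["case_id"] for e in page], next_cursor
```

The returned case IDs are then used to load the full cases from the database, optionally combined with filters on unencrypted fields.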

Where do we encrypt and decrypt data?

File uploads
• Treating doctors can add photos, x-rays, scans, medical imaging and other health-related files
• Orthodontists can do the same
• Externally connected systems too
• File size up to 4GB
• Minimum retention period of case files (project specific)
  ◦ 3 months for unsubmitted cases
  ◦ 10 years for submitted cases

File upload risks
• Execution of the uploaded file (Script injection / Directory traversal attacks)
• Resource exhaustion (CPU/Network)
• Storage space exhaustion (DoS)
• Malware uploads
• Many more…

File names and metadata
• Client provided names, e.g. Goce_Bonev_Maxilla_20230217_1135.stl
• Never include sensitive data such as names or SSN in the file name (bad: Goce_Bonev_Sofia_Hayduska_Gora_37_SSN_123.jpg)
• Don’t use sequential or predictable names such as 123.jpg (enumeration attacks)
• Standardize filenames with filename patterns: (new CaseFilenamePattern(CaseId $caseId))->originalFile()
• Use file IDs that are not sequential (UUID4, ULID) + checksum / signature, e.g. 4iOgG1pA41D5maKytUqWe-RJf0XP.jpg
• Always store the original file + checksum (data integrity)
• Encrypt the extracted EXIF metadata (In transit / At rest)
• Disable directory indexing
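
One way to combine a non-sequential file ID with a signature is sketched below, following the `{file_id}_{signature}.{extension}` shape that appears later in the deck. The HMAC-SHA256 construction, the truncation to 32 hex characters and the extension allow-list are assumptions for illustration, not the project's actual scheme.

```python
import hashlib
import hmac
import uuid

ALLOWED_EXTENSIONS = {"jpg", "png", "stl"}  # hypothetical allow-list


def _sign(key: bytes, file_id: str, extension: str) -> str:
    message = f"{file_id}.{extension}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()[:32]


def signed_file_name(key: bytes, extension: str) -> str:
    """Build '{file_id}_{signature}.{extension}' with a random UUID4 file ID."""
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"extension not allowed: {extension}")
    file_id = uuid.uuid4().hex  # random, so names cannot be enumerated
    return f"{file_id}_{_sign(key, file_id, extension)}.{extension}"


def verify_file_name(key: bytes, name: str) -> bool:
    """Check that the name was issued by someone holding the key."""
    stem, dot, extension = name.rpartition(".")
    file_id, sep, signature = stem.partition("_")
    if not dot or not sep:
        return False
    expected = _sign(key, file_id, extension)
    # Constant-time comparison; bytes avoid errors on non-ASCII input.
    return hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8"))
```

Because the extension is part of the signed message, renaming an issued `.stl` name to another extension makes verification fail.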

AWS S3
• Scalability, durability, and high availability
• Security features (access control, versioning, logs)
  ◦ Use only HTTPS endpoints (encryption in transit)
  ◦ Enable bucket encryption (at rest)
  ◦ Versioning
• Presigned URLs
• Object Lock, WORM (Write once, read many)
• Scalable
• Compliance with regulations
• Cost

Standard upload flow
• Limited by the resources of the web app, max file size we can process
• We need to validate the file on the web app (resource usage)
• Temporary upload file is stored on the local filesystem of the web app
• File uploads are enabled on the server
• Bottleneck
• Scalability issues

Pros
• Supports large file uploads
• Autoscaling
• Low resource usage of the main app
• Final destination bucket events (file overwrite lambda, deletions, etc.; CloudWatch alarms)
• Versioning on TB allows checking if the token has been reused: TFPL checks if the file has more than one version and, if it does, triggers an exception
• TFPL lambda can validate the filename (signature provided by the app) and the file contents: {file_id}_{signature}.{extension}
• The real file name is overridden when uploading to the bucket; all files are uploaded as file.extension, so the bucket does not need to know the real file name

Standard download flow
• High data transfer through the app server
• Bottleneck
• Scalability issues

Presigned URLs

Monolith vs microservices
• Compartmentalization on service and database level
• Per service encryption keys
• Reduced information exposure if one service is breached (no access to data from other services, depending on the system design)
• PoLP - Each service has only the permissions that it needs (billing has no access to the patient file bucket)
• red
• Complexity + new attack vectors

Non repudiation
• Non repudiation - I did not do that!
• Append only records
• Immutable – once written, records cannot be altered
• Audit trail
• Compliance requirement
• WORM Storage
• MySQL/MariaDB: allow only INSERT and SELECT, and disable UPDATE, REPLACE and DELETE for the app database user
• MySQL/MariaDB: use the ARCHIVE storage engine
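
The MySQL/MariaDB privilege setup cannot be shown in a self-contained script, so the sketch below swaps in SQLite triggers to illustrate the same append-only behaviour: inserts and reads succeed, while updates, deletes and replaces are rejected. The `audit_log` table and its columns are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE audit_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    actor TEXT NOT NULL,
    action TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TRIGGER audit_log_no_update BEFORE UPDATE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'audit_log is append-only');
END;
CREATE TRIGGER audit_log_no_delete BEFORE DELETE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'audit_log is append-only');
END;
"""


def open_audit_log(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # With recursive triggers on, INSERT OR REPLACE also fires the delete trigger.
    conn.execute("PRAGMA recursive_triggers = ON")
    conn.executescript(SCHEMA)
    return conn


def record(conn: sqlite3.Connection, actor: str, action: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO audit_log (actor, action) VALUES (?, ?)",
            (actor, action),
        )
```

Triggers can be dropped by anyone with schema access, so this is weaker than removing the privileges from the app database user as the slide suggests.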

Logs / APM / Monitoring
• Logs and logging solutions
• Error tracking and performance monitoring
• Proper data scrubbing
• Proper data anonymization
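
As one possible shape for data scrubbing, the sketch below attaches a filter to a log handler and masks values that look like e-mail addresses or US SSNs before they are written. The patterns and the `ScrubbingFilter` name are illustrative assumptions; pattern matching is best effort and will not catch free-text PHI such as patient names.

```python
import logging
import re

PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[email]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[ssn]"),
]


def scrub(text: str) -> str:
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text


class ScrubbingFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message first so values passed as args are scrubbed too.
        record.msg = scrub(record.getMessage())
        record.args = ()
        return True  # keep the record, only its content changes


def build_handler(stream) -> logging.Handler:
    handler = logging.StreamHandler(stream)
    # On the handler, the filter applies to every record the handler emits.
    handler.addFilter(ScrubbingFilter())
    return handler
```

The same `scrub` function could be reused for payloads sent to error tracking or APM tools, which often capture request data by default.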

Data Anonymization for Testing Environments
• Production data is highly sensitive
• Production data is sometimes needed to ensure realistic testing conditions
• Export function: extracts data, strips it of PII, and replaces it with fictional but realistic data
• Comply with HIPAA and GDPR for data handling and anonymization processes
• Limited access to the export function, audit logs
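
A rough sketch of the export step, assuming a flat record with hypothetical `patient_name`, `email` and `phone` fields. It drops the PII fields and fills them with randomly chosen fictional values; a real export would need a reviewed field inventory, and this sketch alone does not establish HIPAA or GDPR compliance.

```python
import secrets

FIRST_NAMES = ["Alex", "Maria", "Ivan", "Elena", "Peter", "Nina"]
LAST_NAMES = ["Petrov", "Ivanova", "Georgiev", "Dimitrova", "Kolev", "Todorova"]
PII_FIELDS = {"patient_name", "email", "phone"}  # hypothetical field inventory


def anonymize_record(record: dict) -> dict:
    """Return a copy with PII fields replaced by fictional values."""
    # Start from the non-PII fields only, so no original PII is carried over.
    out = {key: value for key, value in record.items() if key not in PII_FIELDS}
    first = secrets.choice(FIRST_NAMES)
    last = secrets.choice(LAST_NAMES)
    out["patient_name"] = f"{first} {last}"
    out["email"] = f"{first}.{last}@example.test".lower()
    out["phone"] = "+000000000000"
    return out
```

Replacement values are drawn at random rather than derived from the original, so the output cannot be mapped back to a patient through the fake name.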

Continuous improvement

Let’s connect! gocebonev https://www.linkedin.com/in/gocebonev/
