Case Studies: Modern Development Practices In Highly Regulated Environments

Charity Majors
September 28, 2023

I gave a talk at Fintech Devcon in August 2023 on why "Modern Development Best Practices Are Not Incompatible With Highly Regulated Environments." Since then I have been compiling a list of case studies of companies in regulated domains (health care, security, etc.) that are practicing Continuous Deployment, observability-driven development, and other modern practices that combine to radically accelerate delivery via fast feedback loops. If you would like to add your company to the list, please get in touch!! Contact info is on the last slide.

Transcript

  1. Modern software development practices
    1. Engineers owning their own code in production
    2. Practicing observability-driven development
    3. Testing in production
    4. Separating deploys from releases using feature flags
    5. Continuous deployment (or at least delivery)
  2. Modern software development practices are ✨ALL✨ about FAST FEEDBACK LOOPS. Getting your code into production as fast as possible after writing it.
  3. “Explain it to me like I’m five”:
    Regulations: you are subject to these if you operate under their domain, e.g. GDPR, CCPA, HIPAA, PCI/DSS, etc.
    ✨Security✨ Frameworks: you may be audited to ensure you conform to these, e.g. SOC2, ISO 27001, NIST, FedRAMP, etc.
    Your security team has written policies for compliance with these, and your legal team signs contracts with customers.
  4. Frameworks & regulations are not prescriptive. None of them forbid any modern development practices. However, these practices MAY conflict with your own written policies. They might also conflict with terms in your own customer contracts.
  5. Policies are living documents. They should be subject to regular review and reconsideration. But! Do your security and legal teams know when to push back or loop you in? Contracts should be negotiated, not just signed. Engineering should have a say.
  6. How many times have you heard: “We’re a regulated industry. Therefore…”
    ❌ We can’t let developers deploy their own code due to segregation of duties
    ❌ All changes must be approved by a Change Advisory Board
    ❌ Trunk based development is not allowed
    ❌ No testing in production, or developer access to production
    ❌ You cannot log anything
    ❌ You must log everything, and cannot delete anything
    ❌ You are not allowed to use any SaaS, or multi tenant databases or compute
    ❌ You are not permitted to refactor your code
    ❌ Manual testing must occur before each deploy
    ❌ Auto-deploying your code is not permissible
    ❌ Auto-deploying is mandatory
    For more, see this thread ➡ ➡ ➡ https://twitter.com/mipsytipsy/status/1694163770753601887
  7. All of that is ✨Bullshit.✨ Stand by for proof: a long list of case studies of companies who are auto-deploying, developing off trunk, getting code into production in a matter of minutes, etc. All of them are subject to the same regulations you are. Some of them may be your competitors.
  8. How Etsy did it (in 2013!!):
    • Decouple the cardholder data and PCI/DSS regulations from the rest of the system
    • The systems that form the cardholder data environment (CDE) are separated from the rest of Etsy’s environments at the physical, network, source code, and logical infra levels
    • The CDE is built and operated by an xfn team that is solely responsible for the CDE. Again, this limits the scope of the PCI DSS regulations to just this team.
    https://queue.acm.org/detail.cfm?id=3190610
  9. How Honeycomb does it:
    • Subject to privacy laws such as GDPR, CCPA, HIPAA (BAA)
    • Security framework adapted to SOC2 trust services criteria (confidentiality and security)
    • Auto-deploys once an hour off trunk via a cron job. Extensive investment into tests. Takes about an hour for code to go live.
    • Practices trunk-based dev, short-lived branches, code reviews
    • Access Management policy based on least privilege model. Access to PII/prod data is limited to those with a business need.
  10. How Branch Insurance does it:
    • Regulated by 36 states and DC, annual SOC2s
    • Production data and envs mostly isolated from most engineers; only TLs can analyze production telemetry for PII purposes (despite masking and filtering and tokenizing)
    • Every developer has their own AWS account, massive investment in testing. Trunk-based development.
    • Uses serverless extensively; pushes to trunk many times/day, pushes to prod many times/week, in under an hour end to end.
  11. How Stytch does it:
    • Certified ISO27001 and SOC2 Type 2, subject to GDPR, CCPA
    • Auto-deploys on PR merge with an average of 13 min before code goes live, approximately 30 times/week
    • Trunk-based development with optional on-demand preview environments for PRs. Extensive integration testing before merge!
    • Data access granted to people who need it for their jobs, with data auditing and masking to further ensure user privacy
  12. How Entrata does it:
    • Subject to A LOT of compliance audits, including PCI-DSS
    • Keeps PCI environment isolated on a separate private network, AWS account, GitHub org, etc. PCI codebase has no external deps, can be tested in isolation. Owned by a single eng team.
    • Can deploy a line of PCI-compliant code to production in 15 min
    • Code review before merging to main, then test on staging, cut a release to production branch, deploy to prod. Access to db, app servers is extremely limited.
    • 20 year old company; code originally written w/o unit tests
  13. How Ocado Technology does it:
    • Certified SOC1, SOC2, PCI/DSS; also subject to GDPR
    • Hundreds of apps in production, owned by ~200 teams
    • On average, code gets deployed to production every 3 minutes
    • Takes ~1 hour for code to get to production after a merge. Practices canary + rolling deploys over the course of 4-5 days.
    • Data access granted to people who need it for their jobs, with data auditing and masking to further ensure user privacy
    https://handbook.ocado.tech/#/sw-development/technical-standards?id=encryption-of-personal-data
    https://handbook.ocado.tech/#/sw-development/hallmarks
    https://handbook.ocado.tech/#/sw-development/maturity-model
  14. How ClarityAI does it:
    • Certified ISO27001 and SOC2 Type 2, practiced a joint audit strategy to streamline time and resources
    • Some teams practice Continuous Deployment and deploy several times per day using trunk-based development, TDD, and pairing
    • Other teams deploy at least once per day using short-lived branches
    https://medium.com/clarityai-engineering/iso27001-and-soc2-type-ii-from-greenfield-to-success-24ca99decb26
  15. How Bankwest (Perth) does it:
    • Deploys to production within a few hours
    • Worked to get rid of the Change Advisory Board for most uses. First defined some types of changes as lower risk to avoid Change Approval processes, then worked hard to make almost every change fit those lower risk definitions.
    • Feature flags, separating deploys/releases, backwards compatible changes, API expand/rollout/contract, small releases deployed often, observability in production
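As a rough illustration of "separating deploys/releases" with feature flags, here is a minimal sketch. It is not any listed company's implementation: the `FLAGS` store, the `new_checkout` flag, and the `is_enabled` and `checkout` functions are all hypothetical names made up for this example. The idea is that both code paths ship in the same deploy, and a flag that lives outside the code decides who actually gets the new behavior.

```python
import hashlib

# Hypothetical in-memory flag store (an assumption for this sketch). A real
# setup would read flags from a flag service or config that can change
# without a deploy.
FLAGS = {"new_checkout": {"enabled": True, "rollout_percent": 25}}


def is_enabled(flag_name: str, user_id: str) -> bool:
    """Return True if the flag is released to this user."""
    flag = FLAGS.get(flag_name)
    if flag is None or not flag["enabled"]:
        return False
    # Hash flag name + user id into a stable bucket in [0, 100), so a given
    # user sees the same behavior on every request.
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < flag["rollout_percent"]


def checkout(user_id: str) -> str:
    # Both code paths are deployed; the flag decides which one is released.
    if is_enabled("new_checkout", user_id):
        return "new checkout flow"
    return "old checkout flow"
```

Setting `rollout_percent` to 0 (or `enabled` to `False`) turns the new path off without a rollback deploy, which is what keeps each individual deploy small and low risk.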
  16. How Cabify does it:
    • Practices Continuous Delivery, deploys 1-6 times a day, lead time for changes is 35 min
    • Certified PCI/DSS on the payments side, financial audit for the entire company
    • Feature flags, separating deploys/releases, backwards compatible changes, API expand/rollout/contract, small releases deployed often, observability in production
  17. How Ping Identity does it:
    • Certified ISO27001, SOC2 Type 2
    • Took about an hour to deploy
    • Auditors cared about what the pipeline did, what gates there were, what controls we had.
    • Merge requests required approval from someone not the author, tests needed to run and pass, someone needed to approve before deployment
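The three gates on the Ping Identity slide (approval by someone other than the author, passing tests, an approval before deployment) are the kind of control that can be checked by the pipeline itself. The sketch below is only an illustration under assumptions: the `MergeRequest` shape and the `gate_failures` function are hypothetical and do not come from Ping Identity or from any CI product's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class MergeRequest:
    # Hypothetical record of what the pipeline knows about a change.
    author: str
    approvers: Set[str] = field(default_factory=set)
    tests_passed: bool = False
    deploy_approved_by: Optional[str] = None


def gate_failures(mr: MergeRequest) -> List[str]:
    """Return the list of gates this change has not yet satisfied."""
    failures = []
    # Gate 1: at least one approver who is not the author.
    if not (mr.approvers - {mr.author}):
        failures.append("needs approval from someone other than the author")
    # Gate 2: the test suite ran and passed.
    if not mr.tests_passed:
        failures.append("tests must run and pass")
    # Gate 3: someone approved the deployment.
    if mr.deploy_approved_by is None:
        failures.append("deployment needs an approver")
    return failures
```

A deploy step would proceed only when `gate_failures(mr)` is empty, and the returned list doubles as a plain record of which control blocked a change.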
  18. How SALTO does it:
    • Certified ISO27001, working on SOC2 Type 2
    • Deploys several times a day
    • No one has access to raw data. If something must be checked against databases, it must be 1) requested, 2) approved by a manager, 3) run through a system that anonymizes data
    • Practices GitOps (TF, Flux2, k8s) to avoid manually writing to prod
    • Oncall and a few other people have read access to prod
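The request, approve, anonymize flow on the SALTO slide can be sketched roughly as follows. Everything here is an assumption made for illustration: the `AccessRequest` type, the `SENSITIVE_FIELDS` set, and the `anonymize` and `run_request` functions are hypothetical, and the rows are passed in directly as a stand-in for a real database query. Truncated hashing is also only a simple form of pseudonymization, used here as a placeholder for whatever the real anonymizing system does.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumed set of columns that count as sensitive in this sketch.
SENSITIVE_FIELDS = {"email", "name"}


@dataclass
class AccessRequest:
    requester: str
    query: str
    approved_by: Optional[str] = None  # set once a manager approves


def anonymize(row: Dict[str, object]) -> Dict[str, object]:
    """Return a copy of the row with sensitive values replaced by short hashes."""
    out = {}
    for key, value in row.items():
        if key in SENSITIVE_FIELDS:
            out[key] = hashlib.sha256(str(value).encode()).hexdigest()[:12]
        else:
            out[key] = value
    return out


def run_request(request: AccessRequest, rows: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Release rows only for approved requests, and only after anonymizing."""
    if request.approved_by is None:
        raise PermissionError("request has not been approved by a manager")
    return [anonymize(row) for row in rows]
```

The point of the shape is that the requester never touches raw rows: the only path to the data goes through the approval check and then the anonymizer.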
  19. How Duffel does it:
    • PCI L1 compliant
    • Can get a line of code into prod in 30 min (!!!)
    • Deploys from trunk, runs static analysis as part of CI/CD
    • Mandatory PR review approvals from an accepted PCI group, which turns into a merge commit after approval.
    • Merge commit SHA is the source of a container image
    • Uses a lot of Security Command Center premium features for threat detection, vulnerabilities, time to resolution.
  20. How toplyne.io does it:
    • Certified SOC2 Type 2, subject to GDPR, HIPAA, and CCPA, working on ISO27001:2022
    • Trunk-based deployment, manual PR reviews via GitStream
    • All teams deploy multiple times a day, and can deploy one line of code in <15 min
    • Platform engineering owns Security and Compliance
    • Multiple tests run for SAST and DAST in CI and during deploys
  21. How AudioStack does it:
    • Certified SOC2 Type 1, subject to GDPR; working on ISO27001 and SOC2 Type 2
    • All security checks run automatically with GitHub and other tools as part of CI/CD architecture
    • Deploy takes about 30 minutes
    • Deploys to prod at least daily, after tests pass and a merge request has been reviewed and approved
    • Restricts access to data, least privilege access
  22. How Jack Henry does it:
    • Certified SOC1, SOC2, PCI/DSS; subject to FBA, state banking regs
    • 300+ different applications in K8s. 100+ deploys per day.
    • Column-level encryption on DBs allows devs to have read access to prod DBs (✨cool!!✨)
    • Code review before merging to main. Release gets cut and runs through user acceptance testing; approvals sent to stakeholders, deploy to production kicks off once approved
    • Takes about 30 min for code to get to production after a merge. All changes are canaried.
  23. How up.com.au does it:
    • Certified SOC1, SOC2, PCI/DSS; subject to Aussie banking regs
    • Deploys to production around hourly
    • Massively parallel, fully automated test suite spins up a replica of production in seconds, uses Rspec and Appium to run thousands of tests on every change
    • Takes around 20 min to run the full test suite, then decommissions replica. Can turn around changes to prod in minutes
    • Two-speed architecture lets us deploy changes constantly on the customer-facing side, and deliberately on the banking side.
  24. Stop blaming regulations and frameworks. This is all about how we decide to interpret the standards. ✨This is not their fault.✨
  25. We are all on the same side. ❤ This is about better security, too.
  26. We need engineers & leaders who understand the existential urgency of a short cycle time, and will fight for it. Not just once or twice. Every day.
  27. Hey, you! ✨Hi!✨ Do YOU work at a company that is subject to regulations and standards, but uses modern development best practices (continuous deployment, observability-driven development, fast feedback loops, auto-deploys, etc)? Would you like to be on a slide? ☺ DM me on twitter @mipsytipsy or email me at [email protected] and let’s do this! 🥰 You don’t have to be “perfect” (no one is). Let’s show the world just how doable this is!! ❤🔥 P.S. This is also GREAT for recruiting…just sayin’.
  28. For more, see my slides on “Why Compliance And Regulatory Standards Are Not Incompatible With Modern Development Best Practices”
    https://speakerdeck.com/charity/compliance-and-regulatory-standards-are-not-incompatible-with-modern-development-best-practices
    or just go to: https://speakerdeck.com/charity