Compliance & Regulatory Standards Are NOT Incompatible With Modern Development Best Practices

@mipsytipsy engineer/cofounder/CTO https://charity.wtf

“The Sociotechnical Path to High-Performing Teams” “It Is Time To
Fulfill The Promise Of Continuous Delivery” “Debugging Is A Team Sport” “On Call Does Not Have To Suck” “Testing in Production”

“Okay, could you talk about that stuff, but also explain
how and why we can do these things in a heavily regulated environment?” YES I can!

Modern software development practices
1. Engineers owning their own code in production
2. Practicing observability-driven development
3. Testing in production
4. Separating deploys from releases using feature flags
5. Continuous deployment (or at least delivery)

Modern software development practices are ✨ALL✨ about FAST FEEDBACK LOOPS: getting your code into production as fast as possible after writing it.

These practices, which have gone mainstream just in the last five years, aren’t about being trendy or showing off on Twitter. They represent thousands of person-years of research and experimentation into how to build better software. How well your team performs can make the difference between loving your job and hating it; between an exciting career and stagnation; between happy users and angry users; even between the success and failure of your company.

Practice #1: Engineers owning their code in production
• No dev/ops divide
• You write it, you are on call for it
• You kick off your own deploys
• Systems are becoming too complex for anyone to operate systems they didn’t write, or write systems they don’t also operate.

Practice #2: Observability-driven development
• Instrument your code as you go
• After you deploy it, you go and look at it in production
• Is it doing what you expected?
• Does anything else look…weird?
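
One way to read "instrument your code as you go" is to emit one structured event per unit of work, so there is something to look at after the deploy. The sketch below is only an illustration: the `instrumented` helper, the event field names, and the `build_id` value are all hypothetical, and a real team would send events to its observability tool rather than to `print`.

```python
import json
import time
from contextlib import contextmanager


@contextmanager
def instrumented(name, emit=print, **fields):
    """Emit one structured event describing a unit of work (hypothetical helper)."""
    event = {"name": name, **fields}
    start = time.perf_counter()
    try:
        yield event  # the caller can attach more fields as it goes
        event["ok"] = True
    except Exception as exc:
        # Record the failure, then let the exception propagate unchanged.
        event["ok"] = False
        event["error"] = type(exc).__name__
        raise
    finally:
        event["duration_ms"] = round((time.perf_counter() - start) * 1000, 3)
        emit(json.dumps(event))


# Usage: wrap the code path you just wrote, deploy it, then go look at the events.
with instrumented("resize_image", build_id="abc123") as ev:
    ev["width"] = 640  # context you will want when asking "does anything look weird?"
```

Tagging each event with a build identifier is what makes it possible to compare behavior before and after a given deploy.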

Practice #3: Testing in production
• Everybody tests in production…
• …but only some of us admit it.
• Instrument your code. Get used to looking at it.
• And not just when things are broken. Know what good looks like.
• Close the loop by looking at your code after you deploy it, every time.

Practice #4: Separating deploys from releases using feature flags
• The key to reliable software is shipping smaller diffs, more frequently.
• Using feature flags is how you do this.
• Deploy continuously and flexibly. Roll changes out to users gradually, by groups, opt-in, etc.
• Get your diffs out swiftly, while honoring scheduled release dates for product features.
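
To make the deploy/release split concrete, here is a minimal sketch of a flag check that supports the rollout styles mentioned above (opt-in, by group, gradual percentage). It assumes a flag is just a dictionary held in memory; the flag name, group name, and user IDs are made up, and a real system would normally use a flag service or library instead.

```python
import hashlib


def rollout_bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag: dict, user_id: str, groups: frozenset = frozenset()) -> bool:
    """Decide whether already-deployed code is released to this user."""
    if user_id in flag.get("opt_in_users", ()):
        return True  # explicit opt-in
    if groups & set(flag.get("groups", ())):
        return True  # released to a whole group, e.g. internal staff
    # Gradual rollout: the same user always lands in the same bucket.
    return rollout_bucket(flag["name"], user_id) < flag.get("percent", 0)


# Hypothetical flag: the code is in production, but released to few users.
NEW_CHECKOUT = {
    "name": "new-checkout",
    "percent": 10,
    "groups": {"staff"},
    "opt_in_users": {"user-42"},
}

if is_enabled(NEW_CHECKOUT, "user-42"):
    pass  # serve the new code path here
```

Raising `percent` over time releases the feature without another deploy, and setting it to 0 turns it off again without a rollback.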

Practice #5: Continuous Delivery (or even better, Continuous Deployment)
• NO manual QA, Change Advisory Board, or approval gates
• We have an ocean of evidence that these do nothing to make software better, and in fact make software worse.
• Deploy as fast as possible,
• As automated as possible.
• If you haven’t read it, read it: -->

Security: “Explain it to me like I’m five” (ELI5)
Confidentiality, Integrity, Availability
• “You must protect customer data”
• “You must protect your code”
You must demonstrate that you have policies, procedures, and safeguards in place to protect customer data, and supply evidence you are actually following those policies, procedures, and safeguards.

ELI5
✅ Regulations: GDPR, CCPA, HIPAA, PCI/DSS, state banking regulations, etc.
✅ Frameworks: SOC2, ISO 27001, NIST, FedRAMP, etc.
✅ Written policies for how you are going to comply with regulations (security team)
✅ Contractual terms/DPAs for big customers (legal team)
❌ We are NOT fucking around with FedRAMP or state banking regulations in this talk.

Frameworks are typically very loose on the specifics. Like, “You need to be scanning your code for known vulnerabilities before it goes live.” Or, “People should not be able to see private data unless they have a business need to do so” (but the definition of “business need” is left up to us).
None of them expressly forbid any modern development practices. However, those practices may conflict with your own written policies, the ones that are being used to demonstrate compliance. They may also conflict with terms in your own customer contracts.

But!
• Frameworks can be used to achieve compliance with regulations.
• Policies are living documents. They should be subject to regular review and reconsideration.
• Contracts should be negotiated, not blindly signed. Is your security team reviewing contracts before signing them? Are YOU? Are you giving your teams guidance on where to push back?

Compliance standards exist for a reason. Our goal here is
NOT to avoid or evade them. The problem is that elaborate security theater makes us slower and less competitive, while also making us no more (or even LESS!) secure. Always honor the spirit of the control, when devising a solution. As engineers, we may be best positioned to find the solution that is actually secure, not only theatrically secure.

“We can’t have continuous delivery because …” (borrowed from a Jez Humble slide circa 2017 👇)
Stated Reasons:
1. We’re regulated
2. We’re not building websites
3. We have too much legacy
4. Our people are too stupid
Actual Reasons:
• Our culture sucks
• Our architecture sucks
• We haven’t tried
• We don’t care enough
Jez Humble, “Continuous Delivery Sounds Great But It Won’t Work Here”, DevOpsDays Seattle 2017

1. We’re regulated 2. We’re not building websites 3. We have too much legacy 4. Our people are too stupid
But this is a solved problem. This was a solved problem a decade ago!
• Etsy, since 2013
• Amazon
• Stripe
• HP firmware
• Branch Insurance
• Jack Henry
• Moov
• Honeycomb
• US gov (!!)
• Some of your competitors
You can be, too.

How Etsy did it (in 2013!):
• Decouple the cardholder data and PCI/DSS regulations from the rest of the system
• The systems that form the cardholder data environment (CDE) are separated from the rest of Etsy’s environments at the physical, network, source code, and logical infra levels
• The CDE is built and operated by a cross-functional (xfn) team that is solely responsible for the CDE. Again, this limits the scope of the PCI DSS regulations to just this team.
https://queue.acm.org/detail.cfm?id=3190610

How Branch Insurance does it:
• Regulated by 36 states and DC, annual SOC2s
• Production data and envs mostly isolated from most engineers; only tech leads (TLs) can analyze production telemetry for PII purposes (despite masking and filtering and tokenizing)
• Every developer has their own AWS account, massive investment in testing. Trunk-based development.
• Uses serverless extensively; pushes to trunk many times/day, pushes to prod many times/week, in under an hour end to end.
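
As an illustration of the "masking and filtering and tokenizing" idea, the sketch below replaces PII values with keyed tokens before an event leaves the service. This is not Branch's actual implementation; the field names, the 16-character token length, and the key handling are all assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical set of fields that must never reach telemetry in the clear.
PII_FIELDS = {"email", "ssn", "phone"}


def tokenize(value: str, key: bytes) -> str:
    """Replace a value with a keyed, non-reversible token (HMAC-SHA256, truncated)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def scrub_event(event: dict, key: bytes) -> dict:
    """Return a copy of the event that is safe to send to telemetry."""
    return {
        field: tokenize(str(value), key) if field in PII_FIELDS else value
        for field, value in event.items()
    }


# Usage: in real use the key would come from a secrets manager, not source code.
safe = scrub_event(
    {"email": "pat@example.com", "status": 200, "route": "/quote"},
    key=b"example-key-not-for-production",
)
```

Because the same input always produces the same token, engineers can still group and count by user in production telemetry without seeing who the user is.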

How Honeycomb does it:
• Certified SOC2 Type 2. Subject to GDPR, HIPAA, CCPA, state regs
• Auto-deploys once an hour off trunk via a cron job. Extensive investment into tests. Takes about an hour for code to go live.
• Practices trunk-based development, short-lived branches, code reviews
• Access Management policy based on least privilege model. Access to PII/production data is limited to those who have a business need for it, i.e. need it to do their jobs.

Stop blaming regulations and frameworks. It’s all about how we
choose to interpret the standards.

Every situation is ✨unique✨
Interpretations vary based on risk tolerance. Far too often, the paperwork seems to matter more than the actual security of the implementation. ☹
The difficulty here is that every product, company, and architecture is sui generis, so we can’t apply cookie-cutter solutions; we need to actually understand each use case before we can negotiate a solution. Also, we are terrible about sharing the solutions we do find.

Architecture
The biggest architectural obstacle to continuous delivery is when you want to ship a single line of code, but you have to deploy the whole world. Can you deploy the service you’re working on without having to deploy all the dependencies? Can you test the service you’re working on on your laptop, without needing an integrated environment?

Architectural considerations:
• Use a well-designed PaaS, if you can
• Design for testability and deployability
• Invest heavily in your test suite
• If you need to unbundle a monolith, do not rip and replace; redesign iteratively into services.
• Make sure services have their own databases!
• Bring security into the discussion from day one.

In general, engineers shouldn’t need to be constantly thinking about
compliance. Mostly just when setting up a new thing, or when gathering PII — does this matter, and where should I put it? Engineering performance and productivity, on the other hand, should ALWAYS be on our minds. Entropy is constantly eating away at our efficiency.

One thing is exceptionally clear: If you want category-defining, competition-crushing engineering excellence, your engineering leadership will have to engage with security and legal as partners.

We need engineering leaders who understand the existential urgency of
a short cycle time, and will fight for it. Not just once or twice. Every day.

“How well does your team perform?” != “how good are
you at engineering”

High-performing teams get to spend the majority of their time
solving interesting, novel problems that move the business materially forward. Lower-performing teams spend almost all their time firefighting, waiting on code review, context switching, rolling back, rolling forwards, reproducing tricky bugs, solving problems they thought were fixed, responding to customer complaints, fixing flaky tests, running deploys by hand, fighting with their infrastructure, fighting with their tools, fighting with each other, debugging merge conflicts, triaging failed deploys, debugging and reproducing problems for each other when the rest of the team can’t use the debugging tools adequately, waiting on CI/CD to complete, waiting on tests to run, waiting on the queue to deploy, re-running tests because they aren’t sure if the one that failed is a real failure or not, paging in a different project to work on while your other project is stalled… basically everything BUT making progress on core business problems.

How high-performing is YOUR team?
🔥1: How frequently do you deploy?
🔥2: How long does it take for code to go live?
🔥3: How many of your deploys fail?
🔥4: How long does it take to recover from an outage?
🔥5: How often are you paged outside work hours?
DORA metrics: https://dora.dev
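
The first three questions can be answered from a plain log of deploys. The sketch below assumes a hypothetical `Deploy` record with a merge time, a go-live time, and a failure marker; it leaves out recovery time and out-of-hours pages, which would need incident and paging data as well.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deploy:
    merged_at: datetime   # when the change landed on main
    live_at: datetime     # when it was serving production traffic
    failed: bool = False  # did this deploy need a rollback or hotfix?


def delivery_stats(deploys: list[Deploy], period_days: int) -> dict:
    """Summarize deploy frequency, lead time, and failure rate for a period."""
    if not deploys:
        raise ValueError("need at least one deploy to compute stats")
    lead_times = [d.live_at - d.merged_at for d in deploys]
    return {
        "deploys_per_day": len(deploys) / period_days,
        "median_lead_time": median(lead_times),
        "failure_rate": sum(d.failed for d in deploys) / len(deploys),
    }


# Usage with made-up data: four deploys over two days, one of which failed.
start = datetime(2024, 1, 1, 9, 0)
history = [
    Deploy(start, start + timedelta(minutes=20)),
    Deploy(start, start + timedelta(minutes=30)),
    Deploy(start, start + timedelta(minutes=40), failed=True),
    Deploy(start, start + timedelta(minutes=50)),
]
stats = delivery_stats(history, period_days=2)
```

Tracking these over time matters more than any single value; the trend shows whether the feedback loop is tightening or slackening.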

It really, really, really, really, really pays off to be on a high performing team. Like REALLY.
[Charts: 2019 numbers and 2021 numbers]

How do we build high-performing teams?
“Hire the smartest people you can find. Recruit from the best schools. Aggressively poach as much talent from FAANG as you can.”

Who is going to be a better engineer in two years?
• An engineer on an “Elite” team: 3000 deploys/year, 9 outages/year, 6 hours firefighting
• An engineer on a “Medium” team: 5 deploys/year, 65 outages/year, firefighting: constant

Q: What happens when an engineer from the “elite” yellow bubble joins a medium-performing team in the blue bubble?
A: Your productivity tends to rise (or fall) to match that of the team you join.

Great teams make great engineers. ❤

Your ability to ship code swiftly and safely has less to do with your personal knowledge of algorithms and data structures, and more to do with the sociotechnical system you participate in.
sociotechnical (n): “Technology is the sum of ways in which social groups construct the material objects of their civilizations. The things made are socially constructed just as much as technically constructed. The merging of these two things, construction and insight, is sociotechnology” (Wikipedia)

Technical leadership should focus intensely on constructing and tightening the
feedback loops at the heart of their system. The smallest unit of software delivery is the team.

which brings us to… ✨CI / CD✨ 💜 Shipping is
the heartbeat of your company. 💜 Shipping new code should be as small, as common, as regular, as boring, as unremarkable as a heartbeat. and CI/CD is how we get there. Right? So … do YOU do CI/CD?!??

“YES! We do CI/CD.” …but do you really? “Well, we have a CircleCI account?”

Most people are doing *CI*… sorta … But CI is only the prelude to the main course. The ENTIRE POINT of CI is to prepare the path for you to do CD. Continuous DELIVERY? At least. Better yet, Continuous Deployment.

If you aren’t going to hook CI up to production, honestly, why even bother with CI? Just run your tests continuously in a shell loop from your laptop. Same deal, less hassle. ¯\_(ツ)_/¯
Once you merge your code to main, it should be automatically deployed by default. No manual gates. ✨One hour or less✨
Continuous Deployment is what will change your life.

P.S.: Fear of deploys is the single largest source of
technical debt in most organizations.

The “You Had One Job” of engineering leadership is tuning the feedback loops of our sociotechnical systems. The speed, coverage, and cadence of your CI/CD pipeline will set the high water mark for your team’s performance. It can’t get any better or faster than that, but it can definitely get slower and worse downstream.

That precious interval of time between when you wrote the code and when the code has been deployed is everything. This is the cornerstone of high performing teams.
[Diagram: wrote the code → deployed the code]

At that moment when you finish solving a problem, your
mental state holds everything: your original intent, motivation, implementation details tried and tossed, tradeoffs, variable names, etc. This lasts for … minutes? hours? 😬 Until you move on to the next problem, maybe.

Which is why engineers can find upwards of 80% of all bugs in that magical, fleeting interval, so long as they 1) have good observability tooling, 2) instrument their code, and 3) go and look at it. Ask yourself: 🌟 is it doing what I expected it to? 🌟 and does anything else look … weird? A predictable interval of a few minutes lets you hook into the body’s own intrinsic reward systems. Muscle memory. Dopamine hits! 🥰

https://deepsource.io/blog/exponential-cost-of-fixing-bugs/ The cost of finding and fixing bugs goes up
exponentially with time elapsed since development.

If it takes you hours (or even days!) to get a single line of code out, welcome to the software development death spiral.

a longer interval between when code is written & deployed
leads to … larger diffs … longer turnaround time for code review … multiple changes getting batched up and deployed at once … makes it hard to identify whose code is at fault … which severs ownership of changes … and soon requires specialists to deploy, run, monitor, and debug … more and more engineering cycles are spent waiting on each other … now we need to hire more engineers, managers, TPMs, project managers … more people and teams incur more coordination costs … more time spent paging state in and out of your brain … which all costs MORE TIME …😱

large diffs, long review turnaround, batched up changes in a
single deploy, complicated outage recovery processes, bloated org, coordination costs, tool proliferation, too many teams, burnout, boredom, boilerplate, unhappy customers, competitive losses, too little time spent on core business problems… You can spend your life chasing symptoms and pathologies … Or you can fix it at the source. 60 minutes or bust.

A fast cycle time is an enormous competitive advantage. It
is worth taking up this fight. ☺ I have never known a company where engineers were happy and customers were unhappy, or vice versa. Users’ and engineers’ happiness tends to rise and fall in tandem.

“We can’t do this because of regulations…” Bullshit. Engineers can
be overly literal. You are interpreters between security, legal, and tech…not transcriptionists. YOU are the experts in your code. YOU are the experts in software development. YOU are responsible for resolving conflicting requirements from security, legal and dev.

Again: there is NO LAW or regulatory framework preventing you
from following modern software development best practices. None. Zero. Zip.

We are all on the same side. This is about
better security, not worse. Documentation is a HUGE part of what matters, so use this to your advantage. Document what you’re going to do up front, do what you say you’re going to do, then document that you did it.

How to drive change in your org:
Start by… building relationships. Get to know your peers in security and legal. Understand the constraints they are working under. They are probably held responsible for a pile of nightmares that you have no idea even exists. ☠ Come to understand their pain, develop empathy for them. Then help them understand your pain and develop some empathy for you.
Start small. Look for ways to demonstrate what you’re talking about with small wins that benefit everyone.
This will take time…possibly years, at calcified organizations. And you won’t progress much without SOME cover from the top. Get anyone and everyone you can to read “Accelerate”.

P.S. Learn this phrase: “Compensating Controls” “I’m not following the
letter of the law, but I have this other system that proves I’m following the spirit of the law”

Instrument for observability. Engineers shouldn’t need full production access; you
should be able to understand your software with just commit access and observability. Observability is what gives us the confidence to move swiftly, not blindly.

Good SLOs actually check multiple boxes for us. Executive visibility
into important numbers, monitoring, alerts, etc … instead of needing a different system for each one, SLOs cover many.
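
A rough sketch of why one SLO can serve several audiences: the same calculation yields a number for executives (availability), a yes/no for compliance reporting (was the target met?), and a signal for alerting (how much error budget is left). The function name, the 99.9% target, and the request counts below are made up for illustration.

```python
def slo_report(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize one availability SLO over a reporting window."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    availability = (total_requests - failed_requests) / total_requests
    allowed_failures = (1.0 - slo_target) * total_requests
    # Fraction of the error budget still unspent; negative once the budget is blown.
    if allowed_failures > 0:
        budget_remaining = 1.0 - failed_requests / allowed_failures
    else:
        budget_remaining = 0.0
    return {
        "availability": availability,          # executive visibility
        "slo_met": availability >= slo_target,  # reporting
        "budget_remaining": budget_remaining,   # alerting signal
    }


# Usage with hypothetical numbers: 99.9% target, 1,000,000 requests, 250 failures.
report = slo_report(0.999, 1_000_000, 250)
if report["budget_remaining"] < 0.25:
    pass  # page someone or slow down risky releases
```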

“How well does your team perform?” Your team’s performance is
defined by your sociotechnical systems, and especially by the speed of your feedback loops. It isn’t just about the security or economic arguments…

High-performing teams spend the majority of their time solving interesting,
novel problems that move the business materially forward. Everybody wants to be on teams like these. ❤

This is a quality of life issue. This is an
ethical issue. We must build high-performing teams that are low in toil and high in Autonomy, Mastery, and Meaning. This begins with keeping your intervals low and your feedback loops tight.

The End ☺

Charity Majors @mipsytipsy
