
Write Code In Production (With Notes)

Teams building data-oriented applications against sensitive or large-scale information often face the problem of how to conduct discovery and work iteratively during development. Developers are often cut off from production data sets for security and operational reasons. But this doesn't have to be the case. You can build your system in such a way that analysts can work securely with production data, isolated from customers, while sharing their work and performing quality control against real, live production data. Save on operational overhead and complexity while maintaining security by moving your development environment to production, instead of moving your production environment to your laptop.

This version includes speaker's notes.

Sam Wilson

August 22, 2019

Transcript

  1. WRITE CODE IN PRODUCTION (SAM WILSON) Today, I'd like to start a conversation about writing code in production, why I think it is important, and ways we can, as an industry, rediscover our roots—without sacrificing trust or adding unnecessary risk.
  2. WRITE CODE IN PRODUCTION (SAM WILSON) So, let's be clear about something: I'm not completely out of my mind, so if any of you came armed with a straitjacket or pitchforks, I ask you to please hear me out. As with anything, there are obvious limits, constraints, and nuance.
  3. whoami • CTO @ Bainbridge Health • Optimizing Medication Safety and Stewardship for Hospitals • Payments, Education, Finance, Marketing Automation, E-Commerce, Media • @numbsafari A little bit more about me… I work for a spin-out of the Children's Hospital of Philadelphia, Bainbridge Health. We help hospitals use their data to improve medication safety and stewardship. I've also worked in a range of other industries, with varying levels of regulatory oversight. Very often, my work has focused on the regulatory-adjacent aspects of the software engineering stack.
  4. "BORING" PROBLEMS • Getting paid, paying bills, compliance, security • "Boring" problems often become the truly differentiating and innovative part of a business. So, in my various roles, I've often worked on what is considered to be the "boring" part of most tech teams. You do all this work to build an amazing website or app, you launch it to the world, and people love it. But eventually you have to get paid, pay your bills, and comply with contracts, standards, and laws. It helps if you also don't get hacked or inadvertently publish everyone's secrets. What I've come to find, in my own personal experience, is that an organization's ability to innovate on the boring parts is usually what makes the most difference in its eventual outcomes. If you can't get paid, lower costs, or deliver your services more efficiently, you will ultimately fall behind.
  5. https://en.wikipedia.org/wiki/Configuration_management#/media/File:Configuration_activity_model.png Lately, I've become really interested in Change Management. It's a really cool topic when you are talking about things like Continuous Delivery, DevOps, and GitOps. It's a super boring topic when you are talking about AICPA SOC auditing standards or ISO 27001. So, how can we innovate on Change Management to create competitive advantage?
  6. • Industry survey and market data analysis • Speed + Stability • DevOps How many of you have heard of this book? It came out in 2018, and it is chock-full of cool stuff. Basically, folks at Puppet Labs organized the "State of DevOps" surveys and coordinated with researchers to turn out what might be the most important quantitative work in software engineering, ever. If you haven't had a chance to read it, I highly recommend that you do. Over successive years of surveying a growing number of industry practitioners, and combining that with market data, they were able to find that certain kinds of management and technical practices created competitive advantages.
  7. HIGH PERFORMERS • Greater commercial success • Market cap growth, profitability, and market share • Greater non-commercial success • Effectiveness, efficiency, customer satisfaction The authors identified, in a fairly robust fashion with clustering analysis, three groups of organizations: high, medium, and low performers. The high-performing organizations exhibited greater commercial and non-commercial success, as measured across a range of metrics, such as financial growth and customer satisfaction.
  8. SPEED + STABILITY • Speed: Higher frequency of deploys • Speed: Lower lead time to delivery (commit to deploy) • Stability: Lower MTTR • Stability: Lower change fail rate The high-performing organizations also excelled at certain management and engineering metrics. Most important to today's discussion, they were found to be both faster and more stable. They deliver value in a more timely and frequent manner, without sacrificing reliability and availability. In fact, their speed and stability work hand-in-hand to create a virtuous cycle that, ahem, accelerates their businesses.
  9. HOW? DEVOPS • Paraphrasing Wikipedia: integrating development and operational practices to shorten the SDLC and deliver change faster and more often, in close alignment with organizational objectives • Paraphrasing Twitter: automate all the things! How do they achieve this? Well, basically through what has come to be known as "DevOps". They don't all call what they do "DevOps", but it's a useful term here for a high-level conversation. The Twitter version of DevOps is really focused on the technical aspects. But it's important to remember the management component: lean principles. So, keep that in the back of your mind while we focus a bit on this technical call to arms.
  10. CONTINUOUS DELIVERY Continuous Delivery is the ability to get changes of all types—including new features, configuration changes, bug fixes and experiments—into production, or into the hands of users, safely and quickly in a sustainable way. http://continuousdelivery.com One important part of "automating all the things" is bringing greater automation to your technical delivery processes. "Continuous Delivery" has become a popular goal in technical organizations. How many of you would say your organization does "continuous delivery"? And for those of you who didn't raise your hands, how many of you are currently working to implement it? Now, there's more to continuous delivery than simply automating your build process and having scripted deployments. But automating as much of the delivery pipeline as possible is critical to making this work.
  11. CONTINUOUS DELIVERY https://upload.wikimedia.org/wikipedia/commons/c/c3/Continuous_Delivery_process_diagram.svg The goal of continuous delivery is to get your changes in front of users as quickly and safely as possible. Continuous delivery doesn't necessarily mean every commit magically rides into production, hand-delivered by unicorns and robots. We still need to do testing. We still need checks and balances and user acceptance. We still need staged deployments.
  12. DTAP Development Testing Acceptance Production Each stage in the delivery pipeline has a different owner and a different purpose. I've mostly worked in organizations that have dev, test, and production environments. But I've also worked with organizations that had environments in the double digits—mostly due to training and auditing requirements—and an overabundance of middle management. There's an acronym out there, DTAP, that covers the general idea and the bulk of what you want to accomplish. You want an environment where you can do development work: be creative and experimental, noncommittal and uncontrolled. Of course, you want a test environment, where you can try to identify any bugs before they get in front of customers. Maybe a little less common, but eventually this comes up for most of us: some kind of demo or acceptance environment, where you can show some subset of your users what you are about to release to production, so you—and they—know what to expect. And, of course, you need a production environment. The real deal. Where the rubber hits the road.
  13. WHY? • To maintain stability. • To maintain security. • To maintain secrecy. • To maintain control. There are some obvious reasons. We want to avoid introducing defects by having rigorous testing and peer review. Most development environments are expected to have little or no availability. Ever tried giving a demo from a developer environment? It's a recipe for disaster. Developer environments are typically devoid of access controls. And, besides, if your job is to edit the code, you can just modify the access control logic, right? So we want a wall between developers and sensitive resources. Does everyone remember the plot of the movie Office Space? The developers of some banking software added code that was supposed to steal fractions of a penny from every transaction. In the real world, we want to avoid this. Other reasons… If your team is working on some secret new feature, you probably don't want that exposed to users before it's entirely ready. Or maybe you want to be able to introduce a new feature in a series of changes to help transition users to a new, less familiar paradigm. So you need a way to stage changes. Of course, this all boils down to control. Organizations want, and need, to exert control over changes.
  14. MANDATED • Industry and Legal Frameworks • Explicit and Implicit Of course, besides being common sense, we are also mandated, through various industrial and legal frameworks, to have change management practices that include separate environments. I say common sense because most of these frameworks don't explicitly require this. Instead, they are written with the existence of separate environments as a foregone conclusion.
  15. IMPLICIT (DSI-05) Production data shall not be replicated or used in non-production environments. Any use of customer data in non-production environments requires explicit, documented approval from all customers whose data is affected, and must comply with all legal and regulatory requirements for scrubbing of sensitive data elements. (IVS-08) Production and non-production environments shall be separated to prevent unauthorized access or changes to information assets. Separation of the environments may include: stateful inspection firewalls, domain/realm authentication sources, and clear segregation of duties for personnel accessing these environments as part of their job duties. Cloud Security Alliance, Cloud Controls Matrix Here are some requirements from the Cloud Security Alliance's Cloud Controls Matrix, a decent security standard if you are building Software as a Service or cloud-based applications. It doesn't tell you that you need a production environment, or any other environment. It just tells you that if you've got anything other than a production environment, you need to keep your production stuff separate.
  16. IMPLICIT (10.i) Test data shall be selected carefully, and protected and controlled in non-production environments. HiTRUST CSF 9.2 And here's a requirement from the HiTRUST Common Security Framework, version 9.2. This one is interesting because, even though it's targeted at the healthcare industry, which is probably the most heavily regulated industry outside of defense when it comes to computer security in the United States, it actually has a pretty lax requirement about the nature of test data. Not only is it implicitly telling you that you are supposed to have non-production environments, it's also implicitly acknowledging that your test data may be sensitive in nature.
  17. IMPLICIT (6.4.1) Separate development/test environments from production environments, and enforce the separation with access controls. PCI-DSS 3.2 Another implicit example, from the Payment Card Industry Data Security Standards.
  18. EXPLICIT (BAI07.04) Establish a test environment. Define and establish a secure test environment representative of the planned business process and IT operations environment, performance and capacity, security, internal controls, operational practices, data quality and privacy requirements, and workloads. COBIT5 On the explicit side of things, here's a requirement from COBIT5. This one is pretty clear: you must have a test environment.
  19. GOLDEN RULE If we go back and review all of those standards, more important than the existence of these different environments is the idea that the data from production shouldn't be copied around, and that there should be separate ownership and access controls for each environment. How many of you have a process that copies production data to your test environment so you can run load tests, verify that your backup process works, or test your data migrations? Well, we fill out a lot of security questionnaires, for customers and investors, and every single one of them wants to know if we are doing this. Very often, this question is in bold and can't be skipped. They care because they've all seen Office Space. They care because they have to follow HIPAA and GDPR. They care because they know that this golden rule is one of the first things developers throw out the window when they get started.
  20. PHOTO BY ULVI SAFARI ON UNSPLASH Wait… what? I thought we all agreed a few minutes ago that having all these environments, with all these controls, was common sense. And yet, time and again, developers' first instinct is to throw away all that control?
  21. https://www.applemust.com/apple-continues-to-invest-in-singapore-indonesia-coding-talent/ I think one reason for this is that, when we learn how to write software, we don't have a staged release environment. We don't learn how to write software in the context of controls. We typically learn how to code in a tightly iterative fashion. We learn using REPLs, or tools like Swift Playgrounds.
  22. https://twitter.com/dosnostalgic/status/808106124138479616/photo/1 If you're a little older, like me, then maybe you learned how to program using a BASIC interpreter.
  23. If you're into "data science", then perhaps your first experience learning to code happened with tools like JupyterLab, notebooks, or maybe RStudio.
  24. Developer User In school, our experience is something like this. We write some code, we show it to our "user", they give us a grade.
  25. Developer User But when we get into the real world, things start to look more like this. We write some code, and it goes through layers and layers of teams, environments, and processes before it ever finds its way to a user. And user feedback… well… it follows some kind of whisper-down-the-lane process until it shows up in a JIRA ticket or GitHub issue.
  26. BOOTSTRAPPING How can I build a report without any sample data? The requirements are in the data. Another common reason why developers immediately drop this golden rule is the bootstrapping problem. If we are supposed to build a report, how do we do that without having some kind of sample data? Very often, nobody knows what kind of report to build. We actually need to dive into the data to discover the requirements. How do you do this without real data?
  27. SAMPLING • Subsets (horizontal sampling) • De-identification (vertical sampling) One approach, a common approach, is to sample the data. As the HiTRUST framework guides us, we pick our test data carefully. We filter out PHI and PII. Or we hash it. And maybe we don't copy all the data, but some subset of it. This approach still has its drawbacks. What if PHI or PII leak out in unexpected ways, say, through unstructured notes or unexpected schema changes? And what if our subset doesn't include enough data to address calendar issues, like end-of-year or leap-year concerns?
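To make the two sampling axes concrete, here is a minimal sketch (not from the talk; the field names are hypothetical) of horizontal sampling by subsetting rows, combined with vertical sampling by dropping free-text fields and hashing direct identifiers:

```python
import hashlib
import random

def deidentify(rows, id_fields, drop_fields, sample_rate=0.1, seed=42):
    """Subsample a list of dict records and scrub identifying columns."""
    rng = random.Random(seed)
    out = []
    for row in rows:
        if rng.random() >= sample_rate:   # horizontal: keep only a subset
            continue
        clean = {k: v for k, v in row.items() if k not in drop_fields}
        for k in id_fields:               # vertical: hash direct identifiers
            if k in clean:
                clean[k] = hashlib.sha256(str(clean[k]).encode()).hexdigest()[:12]
        out.append(clean)
    return out
```

Note that this is exactly where the drawbacks above bite: the hash hides the identifier, but anything leaking through an unlisted column survives untouched.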
  28. SCALING • Laptops aren't datacenters • Why would we want data all over everyone's laptops? • What surprises are missing from the sample? Another problem with this approach is that developer laptops just aren't the same thing as production. A more subtle problem is that, when we are doing exploratory work, we will make decisions and judgements based on the subset we are examining. This may lead us to make product or design decisions that fall apart when they encounter the full production data set.
  29. Why is the report treated the same as the rest of the application? Why can't you do proper exploration using your production application? I think, fundamentally, the problem we create for ourselves is that we conflate the core application and the report, or business logic, embedded within it. We fail to build our application such that it is capable of supporting our exploration and analysis in a flexible and comprehensive manner. So we are forced to revert to our development tools on our local hardware.
  30. What if we took you to production, instead of bringing production to your laptop? So… what if we flipped this arrangement? What if, instead of copying data out of production, we instead embedded a development environment in our production application?
  31. CUSTOMIZABILITY • Macros • Plugins • Extensions • "Internal Reprogrammability" • Web Pages, Emacs, ERPs, EHRs Lots of programs and tools work this way. Fire up your IDE: you can add various plugins and extensions to expand its capabilities. You can embed macros in Office and G Suite documents; you don't need access to Google's or Microsoft's build and delivery pipelines in order to create extensions. Those facilities are built in and user accessible. Go to any web app, and you can open the console and start interacting with the JavaScript, the DOM, the whole local state of the application. Most ERP and EHR systems are delivered in source form, so enterprises can directly customize the source. Emacs is like this with LISP. Martin Fowler calls this "internal reprogrammability": the core of an application is written in such a manner that it is exposed for modification by the user. Within the shell of the application, we can access and edit all of the business logic, while using that same business logic.
  32. "DEVELOP MODE" Most web app boilerplates include an "admin" console. Perhaps they should also include a "develop" console. When thinking about web applications, lots of frameworks have some kind of "boilerplate" or template. Generally, they include some kind of "admin" console that you can log into to perform special tasks, like adding a new product, looking up an order, or granting a permission. But what if we included a "develop" console that had a small IDE, or a full IDE, or just exposed a git repository you could edit locally and push to?
  33. SANDBOXES (diagram: an APPLICATION SHELL containing IAM, UI, and DATA; DEFAULT BUSINESS LOGIC; and SANDBOXES holding EXPERIMENT 1 and EXPERIMENT 2) Of course, you can't just let someone modify the whole application. That would defeat the purpose and we'd get into all sorts of trouble. So we need to divide our application into the parts you can and cannot change. Unless we have a lot of resources, we probably can't just let you upload any old code you want. We'll need to somehow constrain what that code has access to, and what it can do with that. So what if you could create a branch, add some code changes to it, push those up—all in production—and then have a toggle to switch your experience to that branch, or a URL parameter so you could share that branch with a customer?
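As a sketch of that toggle-or-URL-parameter idea (the names here are hypothetical, not from the talk), the application shell could resolve which sandbox branch serves each request, with anything unregistered falling back to the default business logic:

```python
# Branches registered through the hypothetical "develop" console.
DEFAULT_BRANCH = "main"
REGISTERED_SANDBOXES = {"main", "experiment-1", "experiment-2"}

def resolve_branch(session, query_params):
    """A URL parameter wins (so a branch can be shared with a customer),
    then the user's session toggle, then the default business logic.
    Unknown branch names fall back to the default for safety."""
    requested = query_params.get("branch") or session.get("branch") or DEFAULT_BRANCH
    return requested if requested in REGISTERED_SANDBOXES else DEFAULT_BRANCH
```

The fallback on unknown names matters: a mistyped or deleted branch should degrade to the default experience, never to an error in production.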
  34. CHALLENGES • Maintaining Security and Integrity • DoS / Resource Exhaustion • Bugs • What about … change management? Here are some of the challenges we would need to address with this approach. Obviously, whatever code you write can't be allowed to circumvent the built-in access controls. You can't be allowed to introduce a back door or otherwise exfiltrate data. We need to make sure that you can't accidentally take down production through resource exhaustion, like an infinite loop or leaked file descriptors. Of course, we want to keep you from creating unexpected bugs that could crash the whole site or app. And… what about change management? If you are editing live in production, won't that look really funny to end users?
  35. STRATEGIES • Functional Core, Imperative Shell • Declarative Programming • Immutable Data • Traffic Routing and Service Meshes So, here are a few interesting strategies I think one could employ to work around these issues. Let's quickly tackle them one by one.
  36. FUNCTIONAL CORE, IMPERATIVE SHELL • Phrase coined by Gary Bernhardt of Destroy All Software • https://www.destroyallsoftware.com/talks/boundaries • The application core is implemented in purely functional logic. It makes no mutations. • The application shell cannot be modified in production, so the dangerous parts are isolated. Like I said before, we have to divide the application into two parts: the part you can change and the part you cannot. If the part you can change also has the constraint that it is written in a purely functional manner, then it is easy for us to enforce controls on what it can and cannot do.
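A tiny illustration of the split, using a hypothetical reporting feature (not from the talk): the pure core is the only part a sandbox branch may replace, while the shell keeps I/O and access control out of reach.

```python
# Functional core: pure logic, no I/O, no mutation; the swappable part.
def monthly_summary(transactions):
    """Turn (month, amount) pairs into {month: total}, with no side effects."""
    totals = {}
    for month, amount in transactions:
        totals[month] = totals.get(month, 0) + amount
    return totals

# Imperative shell: fixed in production; it owns data access and output.
def run_report(load_transactions, render, core=monthly_summary):
    rows = load_transactions()    # controlled, audited data access
    return render(core(rows))     # controlled presentation
```

Because the core never touches a socket, file, or database, constraining what sandboxed code "has access to" reduces to reviewing one pure function.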
  37. DECLARATIVE PROGRAMMING • SQL • Configuration files • Business Rules • Tell the shell what to do, not how it should do it. Another useful strategy is to use declarative programming. Instead of writing the editable core of your application in, say, Python, it could instead just be a bunch of SQL, or configuration files and business rules.
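For instance (a hypothetical sketch, not the speaker's implementation), the editable part could be pure rule data, while a fixed interpreter in the shell decides how to evaluate it:

```python
# Editable, declarative part: rules say WHAT should happen, not how.
RULES = [
    {"if": {"field": "order_total", "op": ">=", "value": 100}, "then": {"discount": 0.10}},
    {"if": {"field": "order_total", "op": ">=", "value": 50},  "then": {"discount": 0.05}},
]

# Fixed interpreter in the shell: the only code that can act on a rule.
OPS = {">=": lambda a, b: a >= b, "<": lambda a, b: a < b, "==": lambda a, b: a == b}

def apply_rules(record, rules=RULES):
    """Return the action of the first matching rule, or nothing."""
    for rule in rules:
        cond = rule["if"]
        if OPS[cond["op"]](record[cond["field"]], cond["value"]):
            return rule["then"]
    return {}
```

Because the editable surface is plain data, the worst a bad rule can do is match the wrong records; it can't open files, loop forever, or call out to the network.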
  38. IMMUTABLE DATA • Make data pipelines non-destructive • Leverage tools like Pachyderm (pachyderm.io) • Also allows you to quickly promote changes to "full production", because they will have been built on a branch and verified first • Also helps with data provenance If your data is immutable, that is, from the perspective of the editable core of your application, then you can keep it in an isolated lineage and not interfere with other branches or "full production".
  39. ALIASING RAW → DAILY / DAILY′ → DASHBOARD? An interesting challenge arises with this approach, which is that you end up having an aliasing problem. For example, if I have a simple pipeline from a raw data table into a daily summary table of precomputed metrics, but I customize that transformation on a special branch, then I'll end up with two versions of that daily table. Depending on which branch the user is viewing, dependent objects, like dashboards, notebooks, or downstream transforms, will need to know which one to look at. So you may need a templating layer on top of your SQL, and you may need to add custom client libraries to things like Jupyter notebooks, so that they will pull from the right data source for the authenticated user.
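One possible shape for that templating layer (the naming scheme here is hypothetical; the talk doesn't prescribe an implementation) is to resolve logical table names to branch-specific physical tables just before a query runs:

```python
import re

def resolve_table(logical_name, branch):
    """Map a logical table to its branch copy; the main branch maps through."""
    if branch in (None, "main"):
        return logical_name
    return f"{logical_name}__{branch.replace('-', '_')}"

def render_sql(template, branch):
    """Expand {{table}} placeholders for the authenticated user's branch,
    so dashboards and notebooks never hard-code a physical table name."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: resolve_table(m.group(1), branch), template)
```

A custom notebook client library would call the same resolver, so DAILY and DAILY′ stay consistent everywhere a given user looks.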
  40. TRAFFIC ROUTING • Associate the selected branch with the user's authenticated session • Route their requests based on that attribute • If using a service-oriented architecture, a service mesh may be helpful to ensure full isolation Of course, if we are going to offer our users the ability to modify the environment, we still need a way for them to switch between different branches. In a simple monolithic application, we can accomplish this with a pretty simple header used to look up different configurations. But in a more complex service-oriented architecture, we will need the ability to ensure that the request context is maintained between services. In addition, if we are truly concerned about isolating these customizations, we may want to spin up dedicated service instances.
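In a monolith, that header-based lookup could be as small as a WSGI wrapper like this (a sketch; the header name and app registry are hypothetical, and the branch is assumed to have been attached to the authenticated session upstream):

```python
# Hypothetical middleware: pin each request to the branch named in a header.
class BranchRouter:
    def __init__(self, apps, default="main"):
        self.apps = apps        # branch name -> WSGI app (or configuration)
        self.default = default

    def __call__(self, environ, start_response):
        # WSGI exposes an "X-Sandbox-Branch" header as HTTP_X_SANDBOX_BRANCH.
        branch = environ.get("HTTP_X_SANDBOX_BRANCH", self.default)
        app = self.apps.get(branch, self.apps[self.default])
        return app(environ, start_response)
```

In a service mesh, the same decision would instead be expressed as a routing rule keyed on that header, propagated with the request context between services.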
  41. PRIOR ART • Looker (BI) • CMS (WordPress, Squarespace) • Marketing Automation (Optimizely, Monetate) Of course, I didn't come up with this on my own, and it's not like this hasn't been done before. You probably interact with systems like this every day without thinking about it. I've had a lot of success using the Looker BI tool, which very much works this way. And many CMS tools have similar capabilities for staging content changes. Of course, I think the best example of this is marketing automation tools, like Optimizely, or one that I'm familiar with, Monetate.
  42. PAAS? • Shares a lot of similarities • You (probably) shouldn't build your own PaaS • That doesn't mean you can't have extensibility in your own application. How is this different from a Platform as a Service offering? Well… they're similar in spirit. But PaaS offerings are much more general. Typically you can deploy any kind of code, and most PaaS providers don't generally let you switch between versions of a deployed app directly. The approach I'm recommending is more targeted to your application.
  43. START SMALL • Feature Flags • Business Rules • Marketing Automation So, if you want to experiment with this approach, with writing some code in production, how can you get started? I recommend looking at how you configure your feature flags and business rules. Do you have a way to stage the rollout of feature flag or business rule changes? Of course, you can also throw Optimizely or Google Tag Manager on your site and create targeted "tests" to roll out changes.
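As a sketch of what staging a feature flag rollout can mean (the flag name and percentage here are made up), a flag can carry a rollout percentage and hash on the user id, so each user gets a stable answer as the percentage ramps up:

```python
import hashlib

# Editable flag data: ramp rollout_pct from 0 to 100 to stage the change.
FLAGS = {"new_checkout": {"enabled": True, "rollout_pct": 25}}

def flag_enabled(flag_name, user_id, flags=FLAGS):
    """Deterministically bucket a user into 0..99 and compare to the rollout."""
    flag = flags.get(flag_name)
    if not flag or not flag["enabled"]:
        return False
    bucket = int(hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest(), 16) % 100
    return bucket < flag["rollout_pct"]
```

Hashing on flag name plus user id means a given user never flickers between experiences, and different flags get independent 25% populations.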
  44. CONFIGURATION IS CODE • Lots of apps have a database table with "configuration" • Make it configuration in your existing app and pipeline • Move it to its own git repository • Then put that in production! Once you've played around with this idea a little bit at small scale, I think the next logical step is to look at how you are storing "configuration" in your application. I bet many of you have a database table in your application that contains "configuration": the largely static, non-transactional parts of your data model. Go through a thought exercise around moving this data out of the database and into a git repository. What could that look like? You can probably start by moving this data into configuration files in your existing deployment pipeline. Then move that configuration to its own git repository. … And then put that in production. And now you're coding in production.
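As a sketch of that thought exercise (the file layout is hypothetical), the application could read its configuration from a git checkout, with per-branch directories falling back to main, so a sandbox branch can override just the files it changes:

```python
import json
import pathlib

def load_config(repo_dir, name, branch="main"):
    """Read <repo>/<branch>/<name>.json, falling back to the main branch.
    The repo_dir is assumed to be a working checkout of the config repository."""
    repo = pathlib.Path(repo_dir)
    for candidate in (repo / branch / f"{name}.json", repo / "main" / f"{name}.json"):
        if candidate.exists():
            return json.loads(candidate.read_text())
    raise KeyError(name)
```

From here, "putting it in production" is mostly a deployment question: pull the repository on a schedule or on a webhook, and the branch toggle from earlier picks which directory each user reads.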