ComplianceOps: Containers in regulated environments

Container Days Boston 2016, an overview of using container technologies in regulated environments. Discusses secure configuration practices, mental models for navigating regulations, and practices to enable secure delivery of polyglot software.

Elliot Murphy

May 24, 2016

Transcript

  1. CONTAINERS IN REGULATED ENVIRONMENTS • Elliot Murphy @sstatik elliot@kindlyops.com

    The goal is an overview: a launch point to help you be successful using some very nice tools in a regulated environment. Delighted to see Jeff talking in detail about Vault, and the talk about Clair.
    * concepts and mental models that are helpful when trying to survive a regulated environment
    * containers as a virtualization technology
    * containers as a packaging and deployment technology
  2. CAN I USE CONTAINERS? • Docker or Rkt? • Alpine or RancherOS? • Kubernetes or ECS? • AWS or Google Cloud or Azure?

    People asking this are usually asking a more specific question.
  3. THE ANSWER IS ALWAYS YES • containers? • mobile phones? • pencil and paper? • database technology from the 1960s?

    The answer is always yes: you can use any given technology in any given regulatory environment. The question is how X affects your controls. Does it strengthen them? Weaken them? Call for completely new controls?
  4. CONTROLS • Technical controls • Administrative controls

    Examples of technical controls: you must use encryption when transmitting data, you must authenticate users before exposing any data, you must have roles that define which data can be accessed. Examples of administrative controls: your firewall logs must be reviewed daily, you must screen personnel with background checks prior to hiring, you must monitor your service providers and vendors for their compliance status. Given a set of controls, some technology is a good fit and some is a bad fit. Containers are getting there, and they are here to stay. As soon as you start talking about controls, you run smack into people problems.
  5. IGNORANCE • Not knowing what the rules are, or not knowing how the technology works

    This one is sad, and it happens at all levels: legislators, managers, regulators, auditors, technologists. I regularly talk to and hear about companies breaking the rules out of pure ignorance. One company was processing healthcare data using AWS RDS PostgreSQL and didn’t know that PostgreSQL is currently excluded from the AWS BAA. One doctor doing drug research was not following the rules in 21 CFR Part 11 about the accuracy, reliability, integrity, and authenticity of records, and a bunch of drug research was thrown out.
  6. OPTIMISM BIAS • The person who will do my audit doesn’t understand technology as much as I do • This regulation is wasteful and enforcement is lax • Assuming that insiders (developers, managers, system administrators) are honest

    Sometimes a set of regulations doesn’t specifically address or account for a new technology. This happened recently with PCI 2 and the invention of JavaScript-based credit card processor integration with client-side encryption. When PCI 3 came out, it specifically addressed sloppy practices by customers of Recurly and Stripe. Sure, regulatory requirements have a cost. What you get for that cost varies, but the requirements are generally intended as additional resilience against failures, and running load balancers and clusters is also “wasteful.” Optimism bias is a nice trait in people, probably crucial for the survival of humanity, but it can interfere with sound reasoning about a regulated system.
  7. LACK OF SYSTEMS THINKING • Reliability of a component vs the safety of a system • Safety as an emergent property • Group cognition • The right way to work must be the EASIEST way to work

    If you design a bunch of reliable components, surely they will be safe when assembled into a system? Nope. Safety is a property that emerges from interactions in a specific context. It is entirely possible for a system to be safe but unreliable, and safety and reliability often conflict! Very commonly held ideas about causality and root causes are totally wrong. Is safety part of the mission, or merely a constraint? Accidents are complex processes involving the entire socio-technical system, and mental models often contribute to human error. It gets even more complex when you consider that a decision is not made by just one human: multiple humans are networked together with multiple cognitive devices (other brains, other computers), and often no one person can comprehend the entire system as it runs. An example of group cognition is the navigation team for a ship, or the engineering team for any modern application that you might want to…run in containers.
  8. INVENTION, REGULATION, ENFORCEMENT CYCLE • Time delay between technology availability and update of regulations • Sometimes the laws stay the same but the interpretation and enforcement change • Eventually technology is refined to make compliance easier • Castle, D., Kumagai, K., Berard, C., Cloutier, M., & Gold, R. (2009). A model of regulatory burden in technology diffusion: The case of plant-derived vaccines. http://www.agbioforum.org/v12n1/v12n1a10-castle.htm

    There are many examples in our careers of technology leapfrogging regulations: the introduction of networks, the explosion of the web, the explosion of mobile phones. In 2011 the Joint Commission ruled that it is not acceptable for doctors to text orders for patient care, services, or treatment. In May 2016 the Joint Commission revised its position to allow secure texting for the transmission of orders, and defined the characteristics of a secure texting platform (based on a review of industry-developed technology). PCI DSS 3.0 was updated in 2014; look at SAQ A for card-not-present merchants with all cardholder data functions fully outsourced. PCI DSS 3.0 section 2.2.1 specifically talks about virtualization. It requires one primary function per server to prevent functions that require different security levels from co-existing on the same server (web, DB, and DNS on different servers). The Castle et al. paper is an interesting example of trying to model out different approaches: it discusses 3 models for vaccine development, production, and distribution with varying regulatory burdens, and tries to model the impact on disease in a given population under each approach.
  9. DO A GOOD JOB • Why do regulations exist? Safety, harm reduction, risk management • Don’t lose sight of the big picture

    Some developers have a selective allergy to cost. Don’t do that. Be willing to invest the same energy that goes into optimizing, debugging, and inventing cool things. Don’t become so obsessed with checking off the boxes that you lose sight of the big picture. For example, sometimes folks working on HIPAA become so focused on the obligation to protect privacy that they forget about the patient’s right to disclose. Sadly, some corporations misapply regulations in an attempt to justify anti-competitive behavior and obstruct data sharing that would result in better, cheaper, safer patient care.
  10. READING LIST • Nancy Leveson, Engineering a Safer World: https://mitpress.mit.edu/books/engineering-safer-world • Edwin Hutchins, Cognition in the Wild: https://mitpress.mit.edu/books/cognition-wild • Malcolm Sparrow, The Character of Harms: https://www.hks.harvard.edu/fs/msparrow/Publications--Books--Character%20of%20Harms.html • Venkatesh Rao, Tempo: http://www.tempobook.com/

    Nancy Leveson discusses safety, causality, and a model for safety, with a fascinating analysis of accidents and the entire socio-technical systems involved. Edwin Hutchins’ work on group cognition is amazing. It is a case study of a Navy team operating a ship, and of how computations are performed in the group as a weird sort of distributed system. The Character of Harms is interesting. It takes a rather adversarial approach focused on mitigating bad things, and it is worth reading to understand the mindset of an auditor or regulator and to temporarily snap you out of optimism bias. Tempo is about narrative-driven decision making, and it is incredibly helpful when deciding how to engage and interact with the various authorities. I have to imagine some of this was going on with the vendors that spearheaded the work to get the ruling on secure texting reversed.
  11. CONTAINERS AS VIRTUALIZATION TECHNOLOGY • Distinguish between infrastructure or execution concerns and application management and configuration concerns • http://cloudscaling.com/blog/cloud-computing/will-containers-replace-hypervisors-almost-certainly/

    The linked overview from Randy Bias is excellent, covering VT-x, hypervisor security, and paravirtualization. Another way to put it is running the containers vs building the containers.
  12. UNDERSTANDING AND HARDENING LINUX CONTAINERS • Important paper from NCC Group published in April 2016: https://www.nccgroup.trust/globalassets/our-research/us/whitepapers/2016/april/ncc_group_understanding_hardening_linux_containers-10pdf/ • Covers Docker, LXC, and Rkt with specific hardening recommendations • Managing security artifacts such as secrets, keys, passwords

    I will hit a few of the key areas to think about from the paper, but it’s far too detailed to cover in a single talk. There are specific hardening recommendations for each of these three container engines. The paper also talks about managing security artifacts: don’t put passwords and keys in your source tree! Don’t put passwords in your Dockerfiles! Environment variables still carry a level of risk. Use Vault.
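
    The “no passwords in your Dockerfiles” rule is easy to enforce mechanically in a build pipeline. Below is a minimal Python sketch of such a lint check. The list of secret-looking variable names and the function name `find_baked_secrets` are assumptions made for illustration, and a real pipeline would more likely use a dedicated secret scanner. The sketch flags `ARG` as well as `ENV`, because build-time variable values are visible to anyone who runs `docker history` on the image.

```python
import re

# Heuristic (an assumption, not a standard): variable names that usually hold secrets.
SECRET_NAME = re.compile(r"PASSWORD|PASSWD|SECRET|TOKEN|API_KEY|PRIVATE_KEY", re.IGNORECASE)
# Variable names written as NAME=value (ENV and ARG both accept this form).
ASSIGNED_NAME = re.compile(r"(?:^|\s)([A-Za-z_][A-Za-z0-9_]*)(?==)")


def find_baked_secrets(dockerfile_text):
    """Return (line_number, line) for ENV/ARG instructions whose variable names look like secrets."""
    findings = []
    for lineno, raw in enumerate(dockerfile_text.splitlines(), start=1):
        parts = raw.strip().split(None, 1)
        if len(parts) < 2 or parts[0].upper() not in ("ENV", "ARG"):
            continue
        rest = parts[1]
        names = ASSIGNED_NAME.findall(rest)
        if not names:
            # Legacy "ENV NAME value" form, or a bare "ARG NAME".
            names = [rest.split()[0]]
        if any(SECRET_NAME.search(name) for name in names):
            findings.append((lineno, raw.strip()))
    return findings


if __name__ == "__main__":
    example = "FROM alpine:3.3\nENV APP_ENV=prod\nENV DB_PASSWORD=hunter2\nARG GITHUB_TOKEN\n"
    for lineno, line in find_baked_secrets(example):
        print(f"line {lineno}: {line}")
```

    Failing the build on any finding pushes secrets toward a runtime source such as Vault rather than into image layers.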
  13. CONSIDER THE RULES FOR YOUR ENVIRONMENT • Isolation from different types of containers? • Isolation from other tenants? • Updates of host systems? • Use a host distro that was designed with containers in mind: CoreOS, RancherOS, Atomic Host

    As you select and configure your orchestration layer, do you have specific requirements to separate different types of containers from each other? Does your scheduling layer allow you to express those constraints and then enforce them (e.g., a DB container can’t run on the same host as the web app containers)? Does your environment provide sufficient isolation from other tenants? For example, to be HIPAA compliant in AWS you have to use Dedicated Instances. You can run containers on those, but you can’t use Elastic Beanstalk. How will you update the host systems? If you are using a pre-cloud distro, how will you handle rebooting the container hosts when needed (kernel updates)? How about hardening the host? I recommend using a distro that was developed with container hosting in mind: CoreOS, RancherOS, Atomic Host.
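
    Whatever scheduler you pick, it helps to be able to prove after the fact that a separation rule actually held. Here is a minimal Python sketch of that kind of audit check. It assumes you can export placement data as (container, workload type, host) tuples. That data shape, the workload labels, and `FORBIDDEN_PAIRS` are all hypothetical, and none of them is any orchestrator’s API.

```python
from collections import defaultdict

# Hypothetical rule set: pairs of workload types that must never share a host
# (e.g. "a DB container can't run on the same host as the web app containers").
FORBIDDEN_PAIRS = {frozenset({"db", "web"})}


def colocation_violations(placements, forbidden_pairs=FORBIDDEN_PAIRS):
    """placements: iterable of (container_name, workload_type, host) tuples.

    Returns a sorted list of (host, type_a, type_b) for every forbidden pair
    of workload types found running on the same host.
    """
    types_by_host = defaultdict(set)
    for _name, workload_type, host in placements:
        types_by_host[host].add(workload_type)

    violations = []
    for host, types in types_by_host.items():
        for pair in forbidden_pairs:
            if pair <= types:  # both members of the pair are present on this host
                type_a, type_b = sorted(pair)
                violations.append((host, type_a, type_b))
    return sorted(violations)
```

    If you run this periodically against live placement data, the scheduling constraint also becomes evidence you can hand to an auditor, rather than something you simply trust the scheduler to enforce.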
  14. SECURE TRAFFIC TO/FROM CONTAINERS • Encrypted from load balancer to container? • Encrypted from container to database? • Encrypted from container to message queue? • Message queue durable storage on encrypted disks? • Other services? • Who will run your internal CA?

    This is an area that has been poorly documented for a long time. Vault can be your internal CA, and you should use it!
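
    On the client side, “encrypted from container to database” mostly comes down to refusing any connection whose certificate was not issued by your internal CA. Below is a minimal sketch using Python’s standard `ssl` module. The function name and the `ca_file` path are hypothetical. `ca_file` stands for the PEM certificate of whatever internal CA you run, for example one managed by Vault.

```python
import ssl


def internal_tls_client_context(ca_file=None):
    """Client-side TLS context for service-to-service traffic inside the cluster.

    ca_file is a hypothetical path to your internal CA certificate (PEM);
    None falls back to the system trust store.
    """
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    # The defaults already require a verified certificate and a matching hostname;
    # setting them explicitly documents the control for whoever reads this next.
    ctx.check_hostname = True
    ctx.verify_mode = ssl.CERT_REQUIRED
    # Refuse protocol versions older than TLS 1.2 (ssl.TLSVersion needs Python 3.7+).
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

    A client would then wrap its connection with `ctx.wrap_socket(sock, server_hostname="db.internal")` (hostname hypothetical). The handshake fails if the server presents a certificate that does not chain to the trusted CA or does not match that name.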
  15. LOGGING • Do logs contain protected info? https://fpf.org/2016/04/25/a-visual-guide-to-practical-data-de-identification/ • Do you need to make logging or audit trails tamper resistant? • Does your logging system support automated detection of and alerting on key events, to reduce your administrative burden?

    Sumo Logic stands out as a particularly useful vendor (they will sign a BAA). Many competing options are available for collecting application logs.
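
    Two of the questions on this slide can be sketched with Python’s standard library. The first is redacting protected values before they are written. The second is making an audit trail tamper-evident by hash-chaining its entries. The SSN regex is a single illustrative pattern and not a de-identification strategy (the FPF guide covers what that really involves). `RedactingFilter` and `HashChainedLog` are hypothetical names.

```python
import hashlib
import logging
import re

# Hypothetical pattern for one kind of protected identifier (US SSN, 123-45-6789).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


class RedactingFilter(logging.Filter):
    """Masks SSN-shaped substrings before the record reaches any handler."""

    def filter(self, record):
        record.msg = SSN_PATTERN.sub("[REDACTED]", record.getMessage())
        record.args = None  # the message is now fully formatted
        return True


GENESIS = "0" * 64


class HashChainedLog:
    """Tamper-evident (not tamper-proof) log: each entry's digest covers the previous digest."""

    def __init__(self):
        self.entries = []  # list of (message, hex_digest)

    @staticmethod
    def _digest(prev_hex, message):
        # prev_hex is always 64 hex chars, so plain concatenation is unambiguous.
        return hashlib.sha256((prev_hex + message).encode("utf-8")).hexdigest()

    def append(self, message):
        prev = self.entries[-1][1] if self.entries else GENESIS
        self.entries.append((message, self._digest(prev, message)))

    def verify(self):
        prev = GENESIS
        for message, digest in self.entries:
            if self._digest(prev, message) != digest:
                return False
            prev = digest
        return True
```

    Attach the filter to a handler with `handler.addFilter(RedactingFilter())` so every record passing through that handler gets masked. A hash chain only reveals edits if the latest digest is also stored somewhere an attacker cannot rewrite, such as a separate logging service. Otherwise the whole chain can simply be recomputed.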
  16. INTRUSION DETECTION & MONITORING • Time for the security team to learn some new tricks: OSSEC and auditd don’t fully support container environments and are harder to configure • threatstack.com • http://sysdig.org/falco • datadoghq.com • Sysdig Cloud

    Just like with the distros, don’t use a tool that was designed before containers. Better options are available that will greatly simplify your life. Threat Stack is great: it is container aware, it gives you compliance reports tied back to chapter and verse of particular regulations, and it alerts on specific types of activity. falco is a new “Behavioral Activity Monitor with Container Support” that describes itself as an easy-to-use combination of Snort, OSSEC, and strace.
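
    Container-aware monitors work by matching rules against a stream of system events that carry container metadata. Here is a toy Python version of one classic rule, “a shell was started inside a container.” It assumes a hypothetical event format (plain dicts with `type`, `container_id`, `proc_name`, `parent`). This is not falco’s rule language or any vendor’s API.

```python
# Hypothetical event shape; real monitors (falco, Sysdig, Threat Stack) have their
# own event formats and rule syntax.
SHELLS = {"sh", "ash", "bash", "dash", "zsh"}


def shell_in_container_alerts(events):
    """Return alert strings for process executions that start a shell inside a container."""
    alerts = []
    for event in events:
        if event.get("type") != "execve":
            continue
        container_id = event.get("container_id")
        proc = event.get("proc_name")
        # Host processes have no container_id, so they are ignored by this rule.
        if container_id and proc in SHELLS:
            parent = event.get("parent", "unknown")
            alerts.append(f"shell {proc} started in container {container_id} (parent: {parent})")
    return alerts
```

    The rule is useful because well-built application containers rarely need an interactive shell. A shell appearing there is worth a human look.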
  17. MALWARE DEFENSE • Traditional antivirus is widely mocked as ineffective or actively harmful • AWS reference architecture for PCI-DSS 3.0 completely ignores requirement 5: “Use and regularly update anti-virus software or programs” (on servers) • strongarm.io

    This is an area where we get eye-rolls and derision instead of careful thought. The question is how to make a responsible choice in line with the spirit of the requirement, and how best to address this risk in the new environment. A Boston startup over in Wakefield that I like (and work with) uses DNS as a control point. It interrogates the malware and alerts on infection. The DNS server settings can be injected via VPC DHCP options.
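
    DNS works as a control point because almost all malware has to resolve a name before it can reach its command-and-control server. Below is a minimal Python sketch of the core policy decision a filtering resolver makes. The blocklist contents are hypothetical, and this is not how strongarm or any other vendor implements it. A blocked query would typically be answered with an error or a sinkhole address and logged as a possible infection.

```python
# Hypothetical blocklist; a real DNS control point would pull threat-intel feeds.
BLOCKLIST = {"malware-c2.example", "bad.example.net"}


def is_blocked(query_name, blocklist=BLOCKLIST):
    """True if the queried name, or any parent domain of it, is on the blocklist."""
    labels = query_name.rstrip(".").lower().split(".")
    # Check "a.b.c", then "b.c", then "c" so subdomains of a blocked domain also match.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in blocklist:
            return True
    return False
```

    Matching on whole labels rather than substrings avoids blocking look-alike names such as `notmalware-c2.example`, which merely contain a blocked domain as text.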
  18. CONTAINERS AS PACKAGING AND DEPLOYMENT TOOL • http://rhelblog.redhat.com/2016/05/18/architecting-containers-part-5-building-a-secure-and-manageable-container-software-supply-chain/ • https://blog.acolyer.org/2016/03/30/diplomat-using-delegations-to-protect-community-repositories/ • CPAN, PyPI, RubyGems, PHP PEAR, NPM, go get, Gradle

    Consider the provenance of all the source code in your container. Scott McCarty offers an interesting perspective on modeling your container contents as a supply chain. Other interesting work is being done on Diplomat, which looks at how to protect community repositories.
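
    One small, concrete provenance control is to pin the SHA-256 digest of every third-party artifact your image build downloads, and to fail the build when the content changes. Here is a minimal Python sketch using `hashlib`. The function names are hypothetical, and this is a simpler technique than the delegation model Diplomat studies.

```python
import hashlib


def sha256_hex(fileobj, chunk_size=65536):
    """Stream a binary file object through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_pinned(fileobj, expected_hex):
    """Raise ValueError unless the content matches the digest pinned in source control."""
    actual = sha256_hex(fileobj)
    if actual != expected_hex.lower():
        raise ValueError(f"digest mismatch: expected {expected_hex}, got {actual}")
    return actual
```

    In a build step this might look like `with open(path, "rb") as f: verify_pinned(f, PINNED_DIGEST)`, where `path` and `PINNED_DIGEST` are hypothetical and the digest lives in version control next to the Dockerfile.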
  19. CONTAINER SCANNING • https://github.com/OpenSCAP/container-compliance • Red Hat atomic scan: http://developers.redhat.com/blog/2016/05/02/introducing-atomic-scan-container-vulnerability-detection/ • Docker Cloud security scanning: https://docs.docker.com/docker-cloud/builds/image-scan/ • Docker best practices checking: https://github.com/docker/docker-bench-security • CoreOS Clair static vulnerability analysis: https://github.com/coreos/clair • https://www.twistlock.com/

    atomic scan defaults to OpenSCAP but can add other scanners. Docker Cloud offers image scanning. There will be a talk on Clair tomorrow. Twistlock is another offering, with specific support for achieving compliance. I saw Aqua in the hall; they also help with image assurance.
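
    Scanners only help if their findings can actually stop a bad image. Below is a minimal Python sketch of a pipeline gate. It assumes scan results have already been normalized into a hypothetical list of `{"id", "severity"}` dicts. Each scanner mentioned above has its own report schema, and the severity names here are an assumption. An explicit `accepted` set gives reviewed, signed-off risks a way through.

```python
# Hypothetical, simplified finding format and severity scale; map each scanner's
# real report schema onto this before gating.
SEVERITY_ORDER = ["Negligible", "Low", "Medium", "High", "Critical"]


def blocking_findings(findings, fail_at="High", accepted=frozenset()):
    """Findings at or above `fail_at` that have not been explicitly accepted.

    `accepted` holds IDs of risks someone has reviewed and signed off on; the
    rationale for each belongs in your documentation, not just in this set.
    An unknown severity raises ValueError rather than silently passing.
    """
    threshold = SEVERITY_ORDER.index(fail_at)
    return [
        f for f in findings
        if SEVERITY_ORDER.index(f["severity"]) >= threshold and f["id"] not in accepted
    ]


def build_should_fail(findings, **kwargs):
    return bool(blocking_findings(findings, **kwargs))
```

    Raising on an unrecognized severity is deliberate. A gate that quietly passes data it does not understand is the kind of failure an auditor will ask about.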
  20. CONTAINER REGISTRY • AWS ECS Registry • Google Compute Engine Container Registry • quay.io • Docker Cloud • VMware Harbor

    quay.io from the CoreOS folks integrates Clair. Docker Cloud has security scanning (I think it costs extra). VMware Harbor has RBAC and auditing of image changes, but no scanning. There are many other registries. The ideal is to use something like quay.io for your base images. Consider your attestation and signing requirements, and where those will be enforced. The cloud-specific registries are appealing because they are convenient.
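
    Wherever attestation is enforced, it helps if deployments refer to images by content digest (`repo@sha256:...`) rather than by a mutable tag such as `:latest`, which can be repointed after review. Here is a minimal Python sketch that lists image references lacking a digest. The repository names in the usage are hypothetical. Pinning complements image signing but does not replace it.

```python
import re

# A reference pinned to immutable content looks like "registry/repo@sha256:<64 hex chars>".
PINNED = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_images(image_refs):
    """Return image references that rely on a mutable tag (or no tag at all)."""
    return [ref for ref in image_refs if not PINNED.search(ref)]
```

    For example, `unpinned_images(["quay.io/example/base:latest", "alpine"])` returns both references, since neither is pinned to a digest.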
  21. SCANNING YOUR APP CODE • Add static analysis of your app code to your build pipeline: https://codeclimate.com/engines https://www.blackducksoftware.com/products/hub • Run additional checks after build and before production: http://gauntlt.org

    Code Climate checks for code smells and security vulnerabilities (some security issues can be found via static analysis), and it is extensible with additional rules. Black Duck maps known vulnerabilities, with some overlap with the container scanners we just talked about. Gauntlt can run interactive checks against your running software during the acceptance test phase. Even if you were implementing in a language for which you didn’t have a static analysis scanner, you could run Gauntlt against your running app. It can find SQL injection holes and check HTTP headers (based on the Mozilla secure coding standard). Containers make setting up this kind of infrastructure much, much easier, even when dealing with polyglot applications.
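
    The HTTP header check mentioned above is simple enough to sketch directly. Here is a minimal Python version that reports which expected security headers a response is missing. The required-header list is a hypothetical baseline: take the real list from whichever guideline your organization adopts. This is not how Gauntlt itself is implemented.

```python
# Hypothetical baseline; substitute the headers your adopted guideline requires.
REQUIRED_HEADERS = {
    "strict-transport-security",
    "x-content-type-options",
    "x-frame-options",
    "content-security-policy",
}


def missing_security_headers(response_headers, required=REQUIRED_HEADERS):
    """Return required header names (lowercased, sorted) absent from the response.

    response_headers can be any iterable of header names, e.g. a dict or the
    `headers` object of a urllib response; HTTP header names are case-insensitive.
    """
    present = {name.lower() for name in response_headers}
    return sorted(required - present)
```

    Against a running container in the acceptance stage, you could pass `urllib.request.urlopen(url).headers`, since iterating that object yields header names. The check works the same whatever language the app is written in.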
  22. DOCUMENTATION • Recording of configuration changes is more likely to happen in a container environment because of version control • Don’t forget to record the rationale for the changes! • https://aws.amazon.com/about-aws/whats-new/2016/05/pci-dss-standardized-architecture-on-the-aws-cloud-quick-start-reference-deployment/

    Don’t make your commit message something stupid like “compliance LOL”. Create a document mapping specific design decisions back to a particular regulation. This will empower future developers working on the project to change, improve, or drop obsolete configurations as the context changes. A great starting point is the spreadsheet that Amazon published on Monday, May 23 to accompany their reference PCI DSS 3.0 architecture. It maps each requirement back to a specific design feature. Notice that a bunch of rows are blank: those requirements are not accounted for in the infrastructure-level design, and you should fill them in with your application-level choices.
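
    If the mapping from requirements to design decisions lives in version control as data, finding the blank rows becomes a query instead of a spreadsheet review. Here is a minimal Python sketch. The `Decision` record, the function names, and the sample requirement IDs are hypothetical. The sketch assumes purely numeric dotted IDs such as `2.2.1`.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    """One design decision and why it exists (the rationale is the part people forget)."""
    summary: str
    rationale: str
    layer: str  # e.g. "infrastructure" or "application"


def requirement_sort_key(req_id):
    # Compare "2.2.1" and "10" numerically rather than as strings.
    return tuple(int(part) for part in req_id.split("."))


def uncovered(requirement_map):
    """Return requirement IDs that have no design decision mapped to them."""
    return sorted(
        (req for req, decisions in requirement_map.items() if not decisions),
        key=requirement_sort_key,
    )
```

    Running `uncovered` in CI keeps the application-level gaps visible, and the `rationale` field carries the “why” that a commit message like “compliance LOL” leaves out.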
  23. GOOD STARTING POINTS • Amazon ECS running on Dedicated Instances • Mesosphere DC/OS on Microsoft Azure • Tectonic: Kubernetes, Rocket, CoreOS on AWS or packet.net • Google Cloud Platform: Compute Engine, Cloud SQL, Kubernetes images

    We’ve covered a LOT of ground, so where do you get started? Taking HIPAA compliance as one example, here are a few totally reasonable options that will not paint you into a corner with compliance. ECS: you can’t use Elastic Beanstalk. Microsoft: you can’t currently use Azure Container Service. Kudos to Microsoft for actually publishing their BAA to the public, unlike Amazon and Google. packet.net is bare metal, with a full TPM story down to firmware attestation. Google Cloud: it’s unclear to me whether Container Engine is covered because it runs on Compute Engine.
  24. QUESTIONS? • Elliot Murphy @sstatik elliot@kindlyops.com

    If you are working in a regulated environment, I would love to talk to you afterwards. Please come say hello.