Securing Containers the Netflix Way

Bryan Payne
January 25, 2017

This talk was presented at the Container Security Summit 2017.

Security has been a first principle at Netflix as we created our internal container service, known as Titus. Along the way we found many places where our security needs aligned with those of others in the community. But, perhaps most interestingly, we also learned where we differed. This talk will take you on a tour of how we have handled container security at Netflix, from threat modeling to container isolation and identity. You’ll see how we built containers to integrate seamlessly into our development and production ecosystem. Finally, we discuss what we see as the open security challenges for large-scale container deployments.

Transcript

  1. Securing Containers the Netflix Way • Bryan D. Payne, Engineering Manager, Product & Application Security
  2. None
  3. Netflix by the Numbers • 93M+ members • 1000+ developers • 190+ countries • > 1/3 NA internet download traffic • 500+ microservices • 100,000+ virtual machines • 3+ AWS regions
  4. Containers at Netflix

  5. 2008 2016 2012 AWS EC2

  6. Why Containers? Velocity! • Improved developer experience • Improved system resource management • Improved deployment time
    Speeding Up Deployments (from "Monitoring Microservice and Containers: A Challenge" by Adrian Cockcroft @ Gluecon 2015): Measuring CPU usage once a minute makes no sense for containers… Coping with rate of change is a big challenge for monitoring tools.
    Datacenter Snowflakes: deploy in months, live for years • Virtualized and Cloud: deploy in minutes, live for weeks • Container Deployments: deploy in seconds, live for minutes/hours • AWS Lambda Events: respond in milliseconds, live for seconds
  7. Container Use Cases @ Netflix • Media Encoding - R&D Cycle • Using VMs: 1 month • Using Containers: 1 week • Continuous Integration • Build all Netflix codebase in hours • Saves countless hours of debugging
  8. Batch Jobs

  9. Batch Jobs Services

  10. Services Add Complexity • Resizing & Lifecycle Duration • Autoscaling • Harder to upgrade underlying host • Have More State • Want to use existing tool chains
  11. None
  12. None
  13. Titus / Spinnaker Integration

  14. None
  15. None
  16. AWS ECS

  17. Security at Netflix

  18. Netflix Culture -> Netflix Security • Values are what we Value • High Performance • Freedom & Responsibility • Context, not Control • Highly Aligned, Loosely Coupled • http://www.slideshare.net/reed2001/culture-1798664
  19. Scumblr https://github.com/Netflix/Scumblr • Sync from data sources • GitHub • Route53 DNS • Manual upload • Analysis on these results • Search • Curl • Static analysis
  20. None
  21. Stethoscope • Help employees stay safe with BYOD • Education • Self service • Personalized • Actionable • User Focused Security • No forced updates • No company wide emails • No information overload
  22. None
  23. Titus Security

  24. Titus Security = Container Security

  25. Titus Security = Container Security • Understanding and Hardening Linux Containers by Aaron Grattafiori (NCC Group) https://www.nccgroup.trust/us/our-research/understanding-and-hardening-linux-containers/
  26. Titus Security = Container Security ++

  27. Titus Security = Container App Security • API Security • Identity • AWS Security • Monitoring • Registry Security • Control Plane Security
  28. None
  29. Container App Security • Fresh base containers • Reference container app • Apps run as non-root
  30. AWS Security • Metadata proxy service • IAM roles per container • Security groups per container • Dedicated AWS account • Limit IAM perms for host
  31. Identity

  32. API Security • mTLS for all API access • Security assessment for all components • Threat modeling • AppSec assessment
  33. Monitoring • Insight for security patches • Check human access using Stethoscope • Forensic audit trail
  34. Registry Security • Container signing • Container revocation • AuthN to publish • Registry scanning
  35. Control Plane Security • TLS + authN for all UI access • Container isolation • Namespaces • Capabilities • MAC • Etc • General host OS hardening
  36. Questions? bryanp@netflix.com https://bryanpayne.org