A DevOps State of Mind: Continuous Security with Kubernetes

At Bay Area Cyber Security Meetup - Mountain View

Chris Van Tuin

July 26, 2018

Transcript

  1. A DevOps State of Mind:
    Continuous Security with Kubernetes
    Chris Van Tuin
    Chief Technologist, NA West / Silicon Valley
    [email protected]

  2. SECURITY IS AN AFTERTHOUGHT
    DEV QA OPS | SECURITY |
    “Patch? The servers are behind the firewall.”
    - Anonymous (far too many to name), 2005 - …

  3. “Only the paranoid survive”
    - Andy Grove, 1996

  4. THE DISRUPTERS EMBRACING DEVOPS
    Empowered organization
    Speed up innovation (chart axes: Time, Change)
    Move fast, break things
    Culture of experimentation (A/B: 20% vs. 25%)
    Shorten the feedback loop
    Real-time, data-driven intelligence & personalization
    AI / ML
    Data, data, data

  5. BARE METAL VIRTUAL PRIVATE CLOUD
    OFF-PREMISE
    ON-PREMISE
    PUBLIC CLOUD
    DATA
    DATA
    DISTRIBUTED SERVICES

  6. ANY COMBINATION, WHETHER TRADITIONAL OR CONTAINERIZED
    LEGACY APPS
    (1,000+)
    BARE METAL
    PRIVATE CLOUD PUBLIC CLOUD
    VIRTUAL
    PRODUCTION DEV/TEST
    HYBRID CLOUD ENVIRONMENTS

  7. DEVSECOPS
    Security across DEV, QA, OPS
    Culture + Process + Technology
    Linux + Containers
    IaaS
    Orchestration
    CI/CD
    Source Control Management
    Collaboration
    Build and Artifact Management
    Testing
    Frameworks
    Cloud Native Applications
    Hybrid Cloud
    Open Source

  8. DEVSECOPS
    Continuous
    Security
    Improvement
    Process
    Optimization
    Security
    Automation
    Dev QA Prod
    Reduce Risks, Lower Costs, Speed Delivery, Speed Reaction

  9. CONTAINERS - Build Once, Deploy Anywhere
    Reducing Risk and Improving Security with Improved Consistency
    The same container (application + OS dependencies) runs on Linux across:
    LAPTOP (in a guest VM)
    BARE METAL
    VIRTUALIZATION (in a virtual machine)
    PRIVATE CLOUD (in a virtual machine)
    PUBLIC CLOUD (in a virtual machine)

  10. BARE METAL VIRTUAL PRIVATE CLOUD PUBLIC CLOUD
    Security Platform

  11. THE CONTAINERS NEWS YOU DON’T WANT
    Weight Watchers
    ● No security on K8s dashboard
    ● IT infrastructure credentials exposed
    ● Enabled access to a large part of Weight Watchers' network
    Tesla
    ● K8s dashboard exposed
    ● AWS environment with telemetry data compromised
    ● Tesla’s infrastructure was used for crypto mining
    Docker Hub
    ● 17 tainted crypto-mining containers on Docker Hub
    ● Remained for ~1 year, with 5 million pulls
    ● Harvested ~90k in cryptocurrency

  12. AUTOMATION

  13. ORCHESTRATION
    Deployment, Declarative
    Web: replicas=2, role=web
    Database: replicas=1, role=db
    Running pods: role=web, role=db, role=web
    Pods, Nodes, Services
    Controller Manager & Data Store (etcd)
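
The declarative model on this slide can be illustrated with a small reconciliation sketch in Python. It is not Kubernetes code; the function name `reconcile` and the data shapes are assumptions made for illustration. The declared state maps each role to a replica count, and the function reports which pods would need to be started or stopped to match it.

```python
from collections import Counter

def reconcile(desired, running):
    """Return (to_start, to_stop) so that `running` would match `desired`.

    desired: dict mapping role -> replica count (the declared state)
    running: list of role labels, one per pod currently running
    """
    actual = Counter(running)
    to_start, to_stop = [], []
    for role in sorted(set(desired) | set(actual)):
        diff = desired.get(role, 0) - actual.get(role, 0)
        if diff > 0:
            to_start.extend([role] * diff)    # too few pods for this role
        elif diff < 0:
            to_stop.extend([role] * -diff)    # too many pods for this role
    return to_start, to_stop

# Declared state from the slide: two web replicas and one db replica.
desired_state = {"web": 2, "db": 1}
# Suppose one web pod has failed and only two pods are left.
print(reconcile(desired_state, ["web", "db"]))  # (['web'], [])
```

A real controller runs this kind of comparison continuously against the state held in the data store, which is why a failed pod is replaced without anyone issuing a command.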

  14. HEALTH CHECK
    Web: replicas=2, role=web
    Database: replicas=1, role=db
    Running pods: role=web, role=db, role=web
    Pods, Nodes, Services
    Controller Manager & Data Store (etcd)

  15. AUTO-SCALE
    50% CPU
    Web: replicas=3, role=web
    Database: replicas=1, role=db
    Running pods: role=web, role=db, role=web, role=web
    Pods, Nodes, Services
    Controller Manager & Data Store (etcd)
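
The scale-out from two to three web replicas can be reproduced with the ratio rule that the Kubernetes Horizontal Pod Autoscaler documentation describes: desired = ceil(current replicas x current metric / target metric). The Python below is a simplified sketch of that rule only. The function name, the 75% utilization figure and the min/max bounds are assumptions, and the real autoscaler also applies a tolerance and stabilization behavior that is not modeled here.

```python
import math

def desired_replicas(current, cpu_percent, target_percent,
                     min_replicas=1, max_replicas=10):
    """Replica count suggested by the ratio of observed to target CPU utilization."""
    wanted = math.ceil(current * cpu_percent / target_percent)
    # Clamp to the configured bounds (hypothetical defaults).
    return max(min_replicas, min(max_replicas, wanted))

# Two web pods averaging 75% CPU against the slide's 50% target.
print(desired_replicas(2, 75, 50))  # 3
```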

  16. SECURING YOUR CONTAINER ENVIRONMENT
    Builds
    Images
    Registry
    Container host
    CI/CD
    Network isolation
    Monitoring & Logging
    Storage
    API & Platform access
    Federated clusters

  17. CONTAINER BUILDS

  18. docker.io
    Registry
    Private
    Registry
    FROM fedora:1.0
    CMD echo "Hello"
    Build file
    Physical, Virtual, Cloud
    Image Container
    Build Run
    Ship
    CONTAINER BUILDS

  19. CONTENT: EACH LAYER MATTERS
    CONTAINER: OS, RUNTIME, APPLICATION (JAR)
    ● Are there known vulnerabilities in the application layer?
    ● Are the runtime and OS layers up to date?
    ● How frequently will the container be updated, and how will I know when it’s updated?

  20. A CONVERGED SOFTWARE SUPPLY CHAIN

  21. CONTAINER BUILDS
    Best Practices
    • Treat the build file as a blueprint
    • Specify a user; the default is root
    • Don’t log in to build/configure
    • Version control the build file
    • Be explicit with versions; don’t use latest
    • Each RUN creates a new layer
    Build file:
    FROM fedora:1.0
    CMD echo "Hello"
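
Two of these practices, pinning explicit versions and specifying a user, are easy to check automatically. The following Python is a minimal sketch of such a check, not an existing linter; the function name and the warning texts are assumptions. It only looks at `FROM` and `USER` instructions, so it will also flag images such as `scratch` that legitimately have no tag.

```python
def lint_build_file(text):
    """Return a list of warnings for a Dockerfile-style build file."""
    warnings = []
    has_user = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        instruction, _, rest = line.partition(" ")
        instruction = instruction.upper()
        if instruction == "FROM":
            # Skip flags such as --platform=...; the first remaining token is the image.
            args = [t for t in rest.split() if not t.startswith("--")]
            image = args[0] if args else ""
            if "@" not in image:                       # not pinned by digest
                name = image.rsplit("/", 1)[-1]        # ignore a registry host:port prefix
                tag = name.partition(":")[2]
                if tag in ("", "latest"):
                    warnings.append(f"{image}: pin an explicit version instead of latest")
        elif instruction == "USER":
            has_user = True
    if not has_user:
        warnings.append("no USER instruction: the container will run as root")
    return warnings

# The build file from the slide pins a version but never sets a user.
print(lint_build_file('FROM fedora:1.0\nCMD echo "Hello"'))
```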

  22. CONTAINER IMAGE SECURITY

  23. TREAT CONTAINERS AS IMMUTABLE
    code: container image
    config: Kubernetes configmaps, secrets
    data: traditional data services, Kubernetes persistent volumes

  24. IMAGE SIGNING
    Validate which images and versions are running

  25. CONTAINER REGISTRY SECURITY

  26. WHAT’S INSIDE THE CONTAINER MATTERS
    64% of official images in Docker Hub contain high priority security vulnerabilities
    examples:
    ShellShock (bash)
    Heartbleed (OpenSSL)
    Poodle (OpenSSL)
    Source: Over 30% of Official Images in Docker Hub Contain High Priority Security Vulnerabilities, Jayanth Gummaraju, Tarun Desikan, and Yoshio Turner, BanyanOps, May 2015 (http://www.banyanops.com/pdf/BanyanOps-AnalyzingDockerHub-WhitePaper.pdf)

  27. PRIVATE REGISTRY

  28. CONTAINER HOST SECURITY

  29. RUNNING CONTAINER RUNTIME IN READ-ONLY MODE
    Improve Security, Avoid data loss, Enforce quota
    Development container, Read/Write (default):
    /volumes
    tmpfs (memory)
    rootfs (copy-on-write)
    Production container, CRI-O Read Only Mode:
    /volumes
    tmpfs (/tmp, /var/tmp, /dev/shm, /run)
    rootfs (/)

  30. CONTAINER HOST SECURITY
    Best Practices
    • Don’t run as root
    • Limit SSH access
    • Use namespaces
    • Define resource quotas
    • Enable logging
    • Apply security errata
    • Apply security context and seccomp filters
    • Run production unprivileged containers as read-only
    http://blog.kubernetes.io/2016/08/security-best-practices-kubernetes-deployment.html
    Host stack: Hardware (Intel, AMD) or Virtual Machine, Kernel, Containers
    Kernel features: Cgroups, Namespaces, SELinux, Drivers, seccomp, Read Only mounts
    Tooling: SYSTEMD (Unit File), Docker (Image, Container CLI)
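
Several of these practices map onto fields of a Kubernetes pod spec (`runAsNonRoot`, `privileged`, `readOnlyRootFilesystem`, `resources.limits`), so they can be checked before a pod is admitted. The Python below is a minimal sketch of such an audit over a manifest loaded as a plain dict. The function name `audit_pod` and the finding messages are assumptions, and the check covers only the four fields named here.

```python
def audit_pod(pod):
    """Return a list of findings for a Pod manifest given as a plain dict."""
    findings = []
    spec = pod.get("spec", {})
    pod_ctx = spec.get("securityContext", {})
    for container in spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        ctx = container.get("securityContext", {})
        # runAsNonRoot may be set on the container or inherited from the pod.
        if not ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False)):
            findings.append(f"{name}: runAsNonRoot is not set")
        if ctx.get("privileged", False):
            findings.append(f"{name}: runs privileged")
        if not ctx.get("readOnlyRootFilesystem", False):
            findings.append(f"{name}: root filesystem is writable")
        if not container.get("resources", {}).get("limits"):
            findings.append(f"{name}: no resource limits")
    return findings

hardened = {"spec": {
    "securityContext": {"runAsNonRoot": True},
    "containers": [{
        "name": "web",
        "securityContext": {"readOnlyRootFilesystem": True},
        "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}},
    }],
}}
print(audit_pod(hardened))  # []
```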

  31. CGROUPS - RESOURCE ISOLATION

  32. NAMESPACES - PROCESS ISOLATION

  33. SELINUX - MANDATORY ACCESS CONTROLS
    Discretionary Access Controls (file permissions):
    Attacker, Web Server, Password Files, Firewall Rules, Internal Network
    Mandatory Access Controls (selinux):
    Web Server under selinux policy, Password Files, Firewall Rules, Internal Network

  34. SECCOMP - DROPPING PRIVILEGES

  35. READ ONLY MOUNTS

  36. CONTINUOUS INTEGRATION
    WITH CONTAINERS

  37. SECURITY IMPLICATIONS
    What’s inside matters…

  38. CONTINUOUS INTEGRATION + SECURITY

  39. CONTINUOUS INTEGRATION WITH SECURITY SCAN
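
One way to wire a security scan into continuous integration is a gate step that fails the build when the scanner reports anything at or above a chosen severity. The Python below is a sketch of that gate only. The findings format (a list of dicts with a `severity` key), the severity names and the identifiers are all hypothetical, since the slide does not name a scanner.

```python
SEVERITY_ORDER = ["low", "medium", "high", "critical"]

def gate(findings, fail_at="high"):
    """Return True if the build may proceed, False if any finding is at or above `fail_at`."""
    threshold = SEVERITY_ORDER.index(fail_at)
    return all(SEVERITY_ORDER.index(f["severity"]) < threshold for f in findings)

# Hypothetical scan results for one image.
scan = [
    {"id": "EXAMPLE-1", "severity": "medium"},
    {"id": "EXAMPLE-2", "severity": "critical"},
]
print(gate(scan))  # False: the critical finding blocks the build
```

In a pipeline, a False result would typically translate into a non-zero exit code so the image is never pushed to the registry.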

  40. CONTINUOUS DELIVERY
    WITH CONTAINERS

  41. CONTINUOUS DELIVERY WITH CONTAINERS

  42. CONTINUOUS DELIVERY + SECURITY

  43. CONTINUOUS DELIVERY:
    DEPLOYMENT STRATEGIES

  44. CONTINUOUS DELIVERY: DEPLOYMENT STRATEGIES
    • Recreate
    • Rolling updates
    • Blue / Green deployment

  45. Recreate

  46. Version 1 Version 1
    Version 1
    Version 1.2
    Tests / CI
    RECREATE WITH DOWNTIME

  47. Version 1 Version 1
    Version 1
    Version 1.2
    Tests / CI
    RECREATE WITH DOWNTIME

  48. Version 1.2 Version 1.2
    Version 1.2
    RECREATE WITH DOWNTIME
    Use Case
    • Non-mission critical services
    Cons
    • Downtime
    Pros
    • Simple, clean
    • No Schema incompatibilities
    • No API versioning

  49. Rolling Updates

  50. Version 1 Version 1
    Version 1
    Version 1.2
    Tests / CI
    ROLLING UPDATES with ZERO DOWNTIME

  51. Deploy new version and wait until it’s ready…
    Version 1 Version 1 V1.2
    Health Check:
    readiness probe
    e.g. tcp, http, script
    V1

  52. Each container/pod is updated one by one
    Version 1.2
    50%
    Version 1 V1 V1.2

  53. Each container/pod is updated one by one
    Version 1.2
    Version 1.2
    Version 1.2
    100%
    Use Case
    • Horizontally scaled
    • Backward compatible API/data
    • Microservices
    Cons
    • Require backward compatible APIs/data
    • Resource overhead
    Pros
    • Zero downtime
    • Reduced risk, gradual rollout w/health checks
    • Ready for rollback
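
The rolling update shown across the last few slides can be sketched as a loop that replaces one pod at a time and only retires an old pod once its replacement passes a readiness check. The Python below is a simulation under stated assumptions, not Kubernetes code: `rolling_update` and the `is_ready` callback (standing in for a tcp, http or script readiness probe) are hypothetical names.

```python
def rolling_update(pods, new_version, is_ready):
    """Replace pods one at a time; stop if a new pod fails its readiness check.

    pods: list of version strings, one per running pod
    is_ready: callable(version) -> bool, standing in for a readiness probe
    Returns (pods_after, completed).
    """
    pods = list(pods)
    for i in range(len(pods)):
        if pods[i] == new_version:
            continue
        if not is_ready(new_version):
            return pods, False       # remaining old pods keep serving traffic
        pods[i] = new_version        # replacement is ready, retire the old pod
    return pods, True

print(rolling_update(["1", "1", "1"], "1.2", lambda version: True))
# (['1.2', '1.2', '1.2'], True)
```

Because the rollout stops at the first failed check, the service keeps running on a mix of versions, which is why the slide lists backward compatible APIs and data as a requirement.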

  54. Blue / Green Deployment

  55. Version 1
    BLUE / GREEN DEPLOYMENT
    Route
    BLUE

  56. Version 1
    BLUE / GREEN DEPLOYMENT
    Version 1.2
    BLUE GREEN

  57. Version 1 Tests / CI
    BLUE / GREEN DEPLOYMENT
    Version 1.2
    BLUE GREEN

  58. Version 1 Version 1.2
    BLUE / GREEN DEPLOYMENT
    Route
    Version 1.2
    BLUE GREEN

  59. Version 1
    BLUE / GREEN DEPLOYMENT
    Rollback
    Route
    Version 1.2
    BLUE GREEN
    Use Case
    • Self-contained microservices (data)
    Cons
    • Resource overhead
    • Data synchronization
    Pros
    • Low risk, never change production
    • No downtime
    • Production like testing
    • Rollback

  60. RAPID INNOVATION &
    EXPERIMENTATION

  61. MICROSERVICES
    RAPID INNOVATION & EXPERIMENTATION
    “only about 1/3 of ideas improve the metrics they were designed to improve.”
    Ronny Kohavi, Microsoft (Amazon)

  62. CONTINUOUS FEEDBACK LOOP

  63. A/B TESTING USING CANARY DEPLOYMENTS

  64. Version 1.2
    Version 1
    100%
    Tests / CI
    Version 1.2
    Route
    25% Conversion Rate ?! Conversion Rate
    CANARY DEPLOYMENTS

  65. 50% 50%
    Version 1.2
    Version 1
    Route
    Version 1.2
    25% Conversion Rate 30% Conversion Rate
    CANARY DEPLOYMENTS

  66. 25% Conversion Rate
    100%
    Version 1 Version 1.2
    Route
    Version 1.2
    30% Conversion Rate
    CANARY DEPLOYMENTS

  67. Version 1.2
    Version 1
    100%
    Route
    Rollback
    25% Conversion Rate 20% Conversion Rate
    CANARY DEPLOYMENTS
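
The canary sequence on these slides comes down to two steps: split traffic between the versions by weight, then compare conversion rates to decide whether to promote the new version or roll back. The Python below sketches both steps. The function names are assumptions, and the decision rule is deliberately naive: it compares the two rates directly, whereas a real A/B test would also check that the difference is statistically significant.

```python
import random

def pick_version(canary_weight, rng=random.random):
    """Route one request: the canary receives `canary_weight` (0.0 to 1.0) of traffic."""
    return "canary" if rng() < canary_weight else "stable"

def decide(stable_rate, canary_rate):
    """Promote the canary if it converts better than the stable version, else roll back."""
    return "promote" if canary_rate > stable_rate else "rollback"

# The two outcomes shown on the slides, with Version 1 converting at 25%.
print(decide(0.25, 0.30))  # promote
print(decide(0.25, 0.20))  # rollback
```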

  68. SECURING YOUR CONTAINER ENVIRONMENT
    Builds
    Images
    Registry
    Container host
    CI/CD
    Network isolation
    Monitoring & Logging
    Storage
    API & Platform access
    Federated clusters

  69. NETWORK SECURITY

  70. NETWORK SECURITY
    Kubernetes Logical Network Model
    • Kubernetes uses a flat SDN model
    • All pods get IP from same CIDR
    • And live on same logical network
    • Assumes all nodes communicate
    Traditional Physical Network Model
    • Each layer represents a Zone with increased trust - DMZ > App > DB, interzone flow generally one direction
    • Intrazone traffic generally unrestricted

  71. NETWORK ISOLATION
    Network Namespace provides resource isolation
    Multi-Environment, Multi-Tenant

  72. NETWORK POLICY
    example: “all pods in namespace ‘project-a’ allow traffic from any other pods in the same namespace.”
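
The example policy can be written as a Kubernetes NetworkPolicy manifest. The Python below builds that manifest as a dict and prints it as JSON, which `kubectl` accepts alongside YAML. The helper name and the policy name `allow-same-namespace` are assumptions; the `networking.k8s.io/v1` fields are the standard ones. An empty `podSelector` applies the policy to every pod in the namespace, and a `from` entry holding only a `podSelector` matches pods in that same namespace.

```python
import json

def allow_same_namespace(namespace):
    """Build a NetworkPolicy that admits ingress only from pods in `namespace`."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-same-namespace", "namespace": namespace},
        "spec": {
            "podSelector": {},                             # applies to every pod in the namespace
            "policyTypes": ["Ingress"],
            "ingress": [{"from": [{"podSelector": {}}]}],  # any pod in the same namespace
        },
    }

print(json.dumps(allow_same_namespace("project-a"), indent=2))
```

Enforcement depends on the cluster's network plugin supporting network policies; without that support the object is accepted but has no effect.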

  73. NETWORK SECURITY MODELS
    Co-Existence Approaches
    One Cluster, Multiple Zones
    Kubernetes Cluster
    Physical Compute isolation based on Network Zones
    Kubernetes Cluster
    One Cluster Per Zone
    Kubernetes Cluster A
    Kubernetes Cluster B
    C
    D
    https://blog.openshift.com/openshift-and-network-security-zones-coexistence-approaches/

  74. MONITORING & LOGGING

  75. KUBERNETES MONITORING CONSIDERATIONS
    Stack | Metrics | Tool
    Kubernetes | Cluster services, services, pods, deployments metrics | kube-state-metrics, probes
    Application | Distributed applications: traditional app metrics, service discovery, distributed tracing | prometheus + grafana, jaeger tracing, istio
    Container | Container native metrics | kubelet: cAdvisor
    Host | Traditional resource metrics: cpu, memory, network, storage | node-exporter

  76. MONITORING
    Aggregate platform and application monitoring access via Prometheus + Grafana

  77. LOGGING
    Aggregate platform and application log access via Elasticsearch + Fluentd + Kibana (EFK)
    https://www.slideshare.net/JosefKarsek/logsmetrics-gathering-with-openshift-efk-stack

  78. STORAGE SECURITY

  79. STORAGE SECURITY
    Local Storage Quota, Security Context Constraints
    Sometimes we can also have storage isolation requirements: pods in a network zone must use different storage endpoints than pods in other network zones.
    We can create one storage class per storage endpoint and then control which storage class(es) a project can use.
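
One way to control which storage classes a project can use is a per-namespace ResourceQuota: Kubernetes supports quota keys of the form `<storage-class-name>.storageclass.storage.k8s.io/persistentvolumeclaims`, and setting one to zero blocks new claims against that class. The Python below is a sketch that generates such a quota. The helper name, the quota name and the storage class names are hypothetical, and the slide does not say this is the mechanism the author had in mind.

```python
def storage_class_quota(namespace, all_classes, allowed_classes):
    """Build a ResourceQuota with a zero PVC quota for every storage class not allowed."""
    hard = {
        f"{sc}.storageclass.storage.k8s.io/persistentvolumeclaims": "0"
        for sc in all_classes
        if sc not in allowed_classes
    }
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "storage-class-access", "namespace": namespace},
        "spec": {"hard": hard},
    }

# Hypothetical classes, one per storage endpoint; project-a may only use the DMZ endpoint.
quota = storage_class_quota("project-a", ["dmz-nfs", "internal-nfs"], ["dmz-nfs"])
print(quota["spec"]["hard"])
```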

  80. API & PLATFORM ACCESS

  81. API & PLATFORM ACCESS
    Authentication via OAuth tokens and SSL certificate
    Authorization via Policy Engine checks User/Group Defined Roles

  82. FEDERATION

  83. Amazon East OpenStack
    FEDERATED CLUSTERS
    Roles & access management (in-dev)

  84. MICROSERVICES

  85. SERVICE MESH
    Monitoring & Metrics
    - prometheus (metrics)
    - grafana (visual)
    Access Control & usage policies
    - mixer (policy decisions)
    Encryption & Auth
    - citadel
    - service 2 service
    - user auth
    Traffic routing
    - pilot
    - circuit breaker
    - a/b testing
    - traffic mirroring
    Fault injections
    - envoy
    - corner cases: abort & delays

  86. OPERATORS

  87. CRI-O v1.10
    Feature(s): CRI-O v1.10
    Description: CRI-O is an OCI compliant
    implementation of the Kubernetes Container Runtime
    Interface. By design it provides only the runtime
    capabilities needed by the kubelet. CRI-O is designed
    to be part of Kubernetes and evolve in lock-step with
    the platform.
    CRI-O brings:
    ● A minimal and secure architecture
    ● Excellent scale and performance
    ● Ability to run any OCI / Docker image
    ● Familiar operational tooling and commands
    Improvements include:
    ● crictl CLI for debugging and troubleshooting
    ● Podman for image tagging & management
    ● Installer integration & fresh install time decision:
    openshift_use_crio=True
    ○ Not available for existing cluster upgrades
    Kubelet Storage Image
    RunC
    CNI Networking

  88. DEVSECOPS METRICS
    Deployment Frequency
    Lead Time
    Deployment Failure Rate
    Mean Time to Recover
    99.999 Service Availability
    Compliance Score
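
A few of these metrics reduce to simple arithmetic, sketched below in Python. The function names and the sample figures are assumptions made for illustration. The availability helper shows what a 99.999 target implies: roughly 5.26 minutes of downtime over a 365-day year.

```python
def allowed_downtime_minutes(availability_percent, days=365):
    """Minutes of downtime per period permitted by an availability target."""
    return (1 - availability_percent / 100) * days * 24 * 60

def failure_rate(deployments):
    """Share of deployments that failed; `deployments` is a list of booleans (True = failed)."""
    return sum(deployments) / len(deployments) if deployments else 0.0

def mean_time_to_recover(outage_minutes):
    """Average outage duration in minutes."""
    return sum(outage_minutes) / len(outage_minutes) if outage_minutes else 0.0

print(round(allowed_downtime_minutes(99.999), 2))    # 5.26
print(failure_rate([False, False, True, False]))     # 0.25
print(mean_time_to_recover([30, 10, 20]))            # 20.0
```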

  89. THANK YOU
    linkedin: Chris Van Tuin
    email: [email protected]
    twitter: @chrisvantuin
