Key Management in AWS: How Netflix Secures Sensitive Data Without Its Own Data Center

Bryan Payne
October 21, 2015

Netflix has embraced the cloud paradigm by moving its entire product into AWS. Fulfilling this vision meant finding a solution for every piece of our ecosystem in the cloud, including storage of our most sensitive cryptographic keys. We accomplished this using AWS CloudHSM, combined with a custom solution that handles cross-region key replication, failover, and more. This talk will discuss Netflix’s Cryptex service and how it balances security and agility in ways that bring HSM-level security into the toolkit of modern cloud application deployments.

This talk was given at JPL in October 2015.

Thanks to Jay Zarfoss for creating many of the slides!

Transcript

  1. Key Management in AWS
    How Netflix Secures Sensitive Data Without Its Own Data Center
    Bryan D. Payne, Ph.D. Engineering Manager, Platform Security @ Netflix

  2. Platform Security @ Netflix
    Case Study: AWS Authentication
    Cryptex Motivation
    Cryptex Architecture
    Looking Ahead

  3. Platform Security Overview
    [Diagram: Device or Browser; Netflix Open Connect Appliance; Microservices in the Cloud]
    - AWS Mgmt
    - Security Tools
    - Code Review
    - Forensics / IR
    - IT Security
    - Content Protection
    - Device Security
    Platform Security
    - Foundational Security Services
    - Security in Common Platform
    - Security by Default in base AMI

  4. Classic Security via AWS
    Trusted Services (AWS): CloudHSM, Instance Metadata Signature, Identity & Access Management
    Great Unknown: Hypervisor, Hardware Platform, Physical Security, Malicious Insider, Key Management, Supply Chain, Firmware, Side Channel Leaks
    Trusted Services (Netflix): Secret Deployment Service, Self-Service CA, Crypto / Key Management Service

  5. Ubiquitous Security
    • Partner with other teams
    • Make security transparent (or easy)
    • Focus on common components
    • Also focus on strategic risks
    [Diagram labels: Platform Security; Review; Implement; Deploy; Report; Service Creation; Service Maintenance; Security Audit; IR / Forensics; Plan Security Improvements; Security Services; Security Defaults]

  6. Everyone Needs Crypto
    Any large-scale AWS deployment needs cryptography that relies on keys
    • Authentication to the AWS services themselves
    • Business logic use cases (if nothing else, PCI/HIPAA/whatever requires encryption)
    [Diagram: Device or Browser; Netflix Open Connect Appliance (ISP or Netflix hosted at IXs); Microservices in the Cloud (Amazon Web Services (AWS))]

  7. Case Study: AWS Authentication

  8. Background: Authenticating yourself to AWS
    http://docs.aws.amazon.com/AmazonS3/latest/dev/RESTAuthentication.html
    DELETE /johnsmith/photos/puppy.jpg HTTP/1.1
    User-Agent: dotnet
    Host: s3.amazonaws.com
    Date: Tue, 27 Mar 2007 21:20:27 +0000
    x-amz-date: Tue, 27 Mar 2007 21:20:26 +0000
    Authorization: AWS AKIAIOSFODNN7EXAMPLE:lx3byBScXR6KzyMaifNkardMwNk=

  9. AWS HMAC Generation
    Not real secret keys, sorry.
    The lifecycle of the AccessKeyID and SecretKey is of utmost interest here.
    Customer Request:
    DELETE\n
    \n
    \n
    Tue, 27 Mar 2007 21:20:26 +0000\n
    /johnsmith/photos/puppy.jpg
    AccessKeyID and SecretKey:
    AKIAIOSFODNN7EXAMPLE:iXKQe8qXbhnN0jUe7JGVqFNXMmTxP5pI6example
    HMAC-SHA-1
    lx3byBScXR6KzyMaifNkardMwNk
    Digest Verified by AWS
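
The signing step on this slide can be sketched in plain Java with the JDK's own `javax.crypto.Mac`. This is a minimal illustration, not the SDK's implementation: the class name `HmacSigningSketch` and the `sign` helper are hypothetical, and the secret used in `main` is the placeholder from the slide, so no particular digest is claimed for it.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Illustrative sketch of an HMAC-SHA-1 request signature, Base64-encoded. */
final class HmacSigningSketch {

    /** Returns Base64(HMAC-SHA1(secretKey, stringToSign)). */
    static String sign(String secretKey, String stringToSign) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
            byte[] digest = mac.doFinal(stringToSign.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA1 unavailable or key rejected", e);
        }
    }

    public static void main(String[] args) {
        // Placeholder values copied from the slides; these are not real credentials.
        String accessKeyId = "AKIAIOSFODNN7EXAMPLE";
        String secretKey = "iXKQe8qXbhnN0jUe7JGVqFNXMmTxP5pI6example";
        String stringToSign =
                "DELETE\n\n\nTue, 27 Mar 2007 21:20:26 +0000\n/johnsmith/photos/puppy.jpg";
        System.out.println("Authorization: AWS " + accessKeyId + ":" + sign(secretKey, stringToSign));
    }
}
```

Only the holder of the secret key can produce a matching digest, which is why where that key lives matters so much in the slides that follow.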

  10. SDKs make this easier
    // fortunately, AWS provides helper objects that do most of the work
    BasicAWSCredentials cred =
        new BasicAWSCredentials(accessKeyID, secretKeyID);
    AmazonSimpleDBClient client = new AmazonSimpleDBClient(cred);
    // ugly HMAC-generating code safely tucked away in here somewhere
    client.listDomains();
    This still does not answer the question of where accessKeyID and
    secretKeyID are stored in our ecosystem.

  11. Where we put AccessKeyID and SecretKey

    First Attempt - Just stick them in a System Property in the AMI
    // if it makes you feel better, let’s pretend the properties
    // are obfuscated and don’t show up in an obvious grep
    BasicAWSCredentials cred =
        new BasicAWSCredentials(
            System.getProperty("AccessKeyID"),
            System.getProperty("SecretKeyID"));
    Obvious deficiencies:
    • Key exposed? Rebake thousands of AMIs and redeploy
    • Rotating in a managed fashion is a relatively big effort

  12. Where to Put AccessKeyID and SecretKey

    Second Attempt - Deliver via our own simple RESTful Key Server
    GET server/getAWSKey

    AKIAIOSFODNN7EXAMPLE
    iXKQe8qXbhnN0jUe7JGVqFNXMmTxP5pI6example

    Improvement!
    • In theory gives direct control over key delivered to machines
    • Add some access control and TLS because security
    • In practice still has problems and you’ll need to reboot VMs

  13. Unfortunate Key Stickiness
    RESTfulObj AWSKey = RESTService.get("server/getAWSKey");
    BasicAWSCredentials cred =
        new BasicAWSCredentials(AWSKey.getAccessID(), AWSKey.getSecretKey());
    AmazonSimpleDBClient client = new AmazonSimpleDBClient(cred);
    client.listDomains();
    In practice in the standard Java (and other) SDKs, keys stay
    lodged in the client object in the above code example.

  14. AWS SDKs Introduce the Provider Paradigm

    2012 seems like a different lifetime
    // provider paradigm dynamically asks for keys every time
    AWSCredentialsProvider prov = new AWSCredentialsProvider() {
        public AWSCredentials getCredentials() {
            RESTfulObj AWSKey = RESTService.get("server/getAWSKey");
            return new BasicAWSCredentials(
                AWSKey.getAccessID(), AWSKey.getSecretKey());
        }
        // the interface also declares refresh(); nothing is cached here, so it is a no-op
        public void refresh() {}
    };
    AmazonSimpleDBClient client = new AmazonSimpleDBClient(prov);
    client.listDomains();
    The client object in the above code example no longer caches keys.

  15. Systematically Enforce Refresh
    GET server/getAWSKey

    AKIAIOSFODNN7EXAMPLE
    iXKQe8qXbhnN0jUe7JGVqFNXMmTxP5pI6example

    1352083995

    Huge wins with this approach: temporary mishandling by clients
    (e.g. writing keys to log files) no longer forces a mass key rotation,
    and rotation now happens automatically.
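
One way a client might honor a server-supplied expiry is sketched below. It assumes the third value in the response (1352083995) is an expiry time in epoch seconds, which the slide does not state explicitly. `ExpiringCredentialCache`, `Credential`, and the injected `fetcher` (standing in for `GET server/getAWSKey`) are hypothetical names for illustration, not Netflix's or AWS's API.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Sketch: hold a fetched credential only until its server-supplied expiry. */
final class ExpiringCredentialCache {

    /** Hypothetical value object: a key pair plus its expiry in epoch seconds. */
    static final class Credential {
        final String accessKeyId;
        final String secretKey;
        final long expiresAtEpochSeconds;

        Credential(String accessKeyId, String secretKey, long expiresAtEpochSeconds) {
            this.accessKeyId = accessKeyId;
            this.secretKey = secretKey;
            this.expiresAtEpochSeconds = expiresAtEpochSeconds;
        }
    }

    private final Supplier<Credential> fetcher;  // stands in for the key server call
    private final LongSupplier nowEpochSeconds;  // injectable clock, useful for tests
    private Credential cached;

    ExpiringCredentialCache(Supplier<Credential> fetcher, LongSupplier nowEpochSeconds) {
        this.fetcher = fetcher;
        this.nowEpochSeconds = nowEpochSeconds;
    }

    /** Returns the cached credential, refetching once the expiry has been reached. */
    synchronized Credential get() {
        if (cached == null || nowEpochSeconds.getAsLong() >= cached.expiresAtEpochSeconds) {
            cached = fetcher.get();
        }
        return cached;
    }

    public static void main(String[] args) {
        ExpiringCredentialCache cache = new ExpiringCredentialCache(
                () -> new Credential("AKIAIOSFODNN7EXAMPLE", "placeholder-secret",
                        System.currentTimeMillis() / 1000 + 3600),
                () -> System.currentTimeMillis() / 1000);
        System.out.println("Using access key " + cache.get().accessKeyId);
    }
}
```

Because a leaked key stops working at its expiry, a client that logs one by accident exposes it only for a bounded window.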

  16. On Instance Credentials
    …Or Finally just let AWS Manage The Key Server
    $ curl http://169.254.169.254/latest/meta-data/iam/security-credentials/role
    {
    "Code" : "Success",
    "LastUpdated" : "2015-09-17T01:29:49Z",
    "Type" : "AWS-HMAC",
    "AccessKeyId" : "ASIAIL6IJJCXLEXAMPLE",
    "SecretAccessKey" : "iXKQe8qXbhnN0jUe7JGVqFNXMmTxP5pI6example",
    "Token" : "...",
    "Expiration" : "2015-09-17T07:47:45Z"
    }
    The preferred approach today, though still not perfect.
    • Key is valid everywhere (not just on the machine it was issued to)
    • Developers are still out of luck, so we maintain the legacy server too*
    *Arguably there are better approaches for mimicking this, like Hologram: https://github.com/AdRoll/hologram

  17. Cryptex Motivation

  18. Business Use Cases For Cryptographic Keys
    Now that we’ve (mostly) outsourced AWS Keys to AWS,
    we’re left with everything else we need to run our business.
    • Certificate Authority Private Keys
    • Keys to Encrypt Sensitive Data
    • HMAC API Keys to Sign REST requests to Partners
    • Long term, Short term
    • Symmetric, Asymmetric

  19. Can you really trust a VM?
    A seemingly limitless pool of virtual hardware offers unprecedented
    benefits of scale, throughput, and availability. But…
    Plenty of research suggests that keys are not safe
    in a VM that shares resources with an untrusted party.
    Some light reading as an example:
    Cross-VM Side Channels and Their Use to Extract Private Keys
    Available at: http://www.cs.unc.edu/~reiter/papers/2012/CCS.pdf

  20. Use Case of a Key Implies Handling Requirements
    TLS Session Key - Fast, handled in a dynamic environment
    • But easy to have a reasonable policy if we lose it
    Certificate Authority Private Key - Maybe not used so much
    • Probably far more important that you just don’t lose it

  21. Simple Framework for Key Handling
    Sensitivity          Throughput   Protection   It’s Exposed!           It lives…
    Low Sensitivity      High         Low          No biggie               In lots of VMs
    Medium Sensitivity   Medium       Medium       It’ll be a long week.   In very few VMs
    High Sensitivity     Low          High         No. Just. No.           In Special Hardware

  22. Cryptex Architecture

  23. Cryptex - Our Framework for Key Handling
    [Diagram: Netflix Business Application (Web Server Logic, Cryptex Client Library, Netflix IPC Components (Ribbon/Hystrix/etc)); Eureka Server(s); Cryptex Server(s); Cloud HSMs - Dedicated Hardware. Annotations: "Many of these", "Not Many of these"]

  24. “Low” Key Handling
    Cryptex Client Library
    Netflix Business Application
    Cryptex Server
    GetKey(ID=123)
    Resp(Value=iXKQ…)
    Client Auth TLS
    Encrypt/Decrypt
    Key Exported Out to Every Client
    • Extremely High Throughput
    • Client Library Attempts to be Mindful of Key Handling

  25. “Medium” Key Handling
    Every Operation is a REST Call
    • Luckily we don’t have many bulk encrypt use cases for these
    • Cryptex servers not publicly facing; ostensibly harder to get onto
    Cryptex Client Library
    Netflix Business Application
    Cryptex Server
    GetKey(ID=456)
    Resp(Value=null)
    Client Auth TLS
    Encrypt(ID=456,PT=…)
    Resp(CT=5pI6…)

  26. “High” Key Handling
    Cryptex Server
    Cryptex Client Library
    Netflix Business Application
    GetKey(ID=789)
    Resp(Value=null)
    Client Auth TLS
    Encrypt(ID=789,PT=…)
    Resp(CT=JGVqF…)
    HSM API
    Encrypt(ID=789,PT=…)
    Resp(CT=JGVqF…)
    Every Operation is a call to specialized hardware
    • HSM API challenging relative to REST calls (only Cryptex does it)
    • Very constrained throughput; VM side channel attacks negated
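
The three handling tiers can be sketched as one in-memory server that exports key material only for "low" keys and performs the operation itself for the others. Everything here is an assumption made for illustration: the class and method names are hypothetical, AES-GCM is chosen only because the deck says AES is supported, and the HIGH tier is handled in-process as a stand-in for a call to the HSM API, which a real deployment would make instead.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Sketch of tiered key handling; not the Cryptex implementation. */
final class TieredKeyServerSketch {
    enum Tier { LOW, MEDIUM, HIGH }

    private static final int IV_BYTES = 12;
    private final Map<Integer, Tier> tiers = new HashMap<>();
    private final Map<Integer, byte[]> keys = new HashMap<>();
    private final SecureRandom random = new SecureRandom();

    /** Registers a fresh 128-bit AES key under the given ID and tier. */
    void register(int id, Tier tier) {
        byte[] key = new byte[16];
        random.nextBytes(key);
        tiers.put(id, tier);
        keys.put(id, key);
    }

    /** GetKey: only LOW keys leave the server; every other request gets null. */
    byte[] getKey(int id) {
        return tiers.get(id) == Tier.LOW ? keys.get(id).clone() : null;
    }

    /** Server-side encrypt; returns IV followed by ciphertext and tag. */
    byte[] encrypt(int id, byte[] plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            random.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(keys.get(id), "AES"),
                    new GCMParameterSpec(128, iv));
            byte[] ct = cipher.doFinal(plaintext);
            byte[] out = new byte[iv.length + ct.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(ct, 0, out, iv.length, ct.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Server-side decrypt of the IV-prefixed output produced by encrypt. */
    byte[] decrypt(int id, byte[] ivAndCiphertext) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(keys.get(id), "AES"),
                    new GCMParameterSpec(128, ivAndCiphertext, 0, IV_BYTES));
            return cipher.doFinal(ivAndCiphertext, IV_BYTES, ivAndCiphertext.length - IV_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        TieredKeyServerSketch server = new TieredKeyServerSketch();
        server.register(123, Tier.LOW);
        server.register(456, Tier.MEDIUM);
        System.out.println("Key 123 exported: " + (server.getKey(123) != null));
        System.out.println("Key 456 exported: " + (server.getKey(456) != null));
    }
}
```

A client library built on this would try `getKey` first and fall back to remote `encrypt`/`decrypt` calls when it receives null, trading throughput for protection as the sensitivity rises.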

  27. “Asymmetric” Key Handling
    Cryptex Client Library
    Netflix Business Application
    Cryptex Server
    GetKey(ID=111)
    Resp(PubValue=iXKQ…)
    Client Auth TLS
    Verify
    We support the basics: AES, HMAC-SHA, RSA
    • Optimize RSA verify/encrypt by pushing public key to edge
    • At scale, the computational intensity of RSA is quite apparent
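
Pushing the public key to the edge means a client can verify signatures locally while the private key stays on the server side. The sketch below uses only standard JDK classes; the class name `EdgeVerifySketch`, the 2048-bit key size, and the `SHA256withRSA` scheme are assumptions for illustration, since the deck names RSA but not the parameters.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

/** Sketch: sign where the private key lives, verify anywhere with the public key. */
final class EdgeVerifySketch {

    /** Generates a throwaway RSA key pair for the demonstration. */
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            return gen.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Server side: the private key never leaves this method's caller. */
    static byte[] sign(PrivateKey privateKey, byte[] message) {
        try {
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(privateKey);
            signer.update(message);
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Edge side: needs only the exported public key. */
    static boolean verify(PublicKey publicKey, byte[] message, byte[] signature) {
        try {
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(publicKey);
            verifier.update(message);
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair pair = newKeyPair();
        byte[] message = "payload".getBytes(StandardCharsets.UTF_8);
        byte[] signature = sign(pair.getPrivate(), message);
        System.out.println("Verified at edge: " + verify(pair.getPublic(), message, signature));
    }
}
```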

  28. The ongoing struggle with integration
    // legacy software assumes long running and managed machines
    // with long lasting hand deployed filesystems, not dynamic key loading
    -Djavax.net.ssl.keyStore=
    Many OSS products provide only this mechanism for key loading
    • Filesystem-like integration with these is current/future work
    • When making your OSS API, consider a stream instead of a file
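
The "stream instead of a file" advice can be illustrated with the JDK's own `KeyStore.load(InputStream, char[])`, which accepts bytes from any source, including ones fetched at runtime from a key service. This is a sketch under assumptions: `StreamKeyStoreSketch` and its helpers are hypothetical names, the JCEKS store type is chosen only so the demo can hold a symmetric key, and the all-zero AES key is a placeholder.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.crypto.spec.SecretKeySpec;

/** Sketch: load a keystore from a stream so no key file has to exist on disk. */
final class StreamKeyStoreSketch {

    /** Loads a keystore from any InputStream (file, socket, or in-memory bytes). */
    static KeyStore load(InputStream in, char[] password) {
        try {
            KeyStore ks = KeyStore.getInstance("JCEKS");
            ks.load(in, password);
            return ks;
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds demo keystore bytes in memory; the zero-filled key is a placeholder. */
    static byte[] demoKeyStoreBytes(char[] password) {
        try {
            KeyStore ks = KeyStore.getInstance("JCEKS");
            ks.load(null, password); // a null stream initializes an empty store
            ks.setKeyEntry("demo", new SecretKeySpec(new byte[16], "AES"), password, null);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ks.store(out, password);
            return out.toByteArray();
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Convenience wrapper so callers need not handle KeyStoreException. */
    static boolean hasAlias(KeyStore ks, String alias) {
        try {
            return ks.containsAlias(alias);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        char[] password = "changeit".toCharArray();
        byte[] bytes = demoKeyStoreBytes(password); // imagine these arrived over the network
        KeyStore ks = load(new ByteArrayInputStream(bytes), password);
        System.out.println("Has demo alias: " + hasAlias(ks, "demo"));
    }
}
```

A library that accepts a `KeyStore` or an `InputStream` in its API, rather than only a path such as the `javax.net.ssl.keyStore` property, lets callers supply keys that were never written to the filesystem.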

  29. Upcoming and Future
    We are eagerly awaiting new and better features to leverage
    • AWS KMS*: great potential; not ready for us yet
      - AES symmetric only
      - Not cross-regional
      - Extremely constrained throughput (hundreds of ops/sec per zone)
      - We’re hopeful that someday it will make our own products obsolete
    *https://aws.amazon.com/kms/
    We’d love to OSS more (wanna help?)

  30. Questions?
    bryanp@netflix.com
    https://www.linkedin.com/in/bdpayne
