
Bryan Payne
October 21, 2015

Key Management in AWS: How Netflix Secures Sensitive Data Without Its Own Data Center

Netflix has embraced the cloud paradigm by moving its entire product into AWS. Fulfilling this vision meant finding a solution for every piece of our ecosystem in the cloud, including storage of our most sensitive cryptographic keys. We accomplished this using AWS CloudHSM, combined with a custom solution that handles cross-region key replication, failover, and more. This talk will discuss Netflix’s Cryptex service and how it balances security and agility in ways that bring HSM-level security into the toolkit of modern cloud application deployments.

This talk was given at JPL in October 2015.

Thanks to Jay Zarfoss for creating many of the slides!


Transcript

  1. Key Management in AWS: How Netflix Secures Sensitive Data Without Its Own Data Center

    Bryan D. Payne, Ph.D., Engineering Manager, Platform Security @ Netflix
  2. Platform Security Overview

    Diagram: Device or Browser; Netflix Open Connect Appliance; Microservices in the Cloud
    Security areas shown: AWS Mgmt, Security Tools, Code Review, Forensics / IR, IT Security, Content Protection, Device Security
    Platform Security:
    - Foundational Security Services
    - Security in Common Platform
    - Security by Default in base AMI
  3. Classic Security

    Trusted Services (AWS): AWS CloudHSM, Instance Metadata Signature, Identity & Access Management
    Great Unknown: Hypervisor, Hardware Platform, Physical Security, Malicious Insider, Key Management, Supply Chain, Firmware, Side Channel Leaks
    Trusted Services (Netflix): Secret Deployment Service, Self-Service CA, Crypto / Key Management Service
  4. Ubiquitous Security
    • Partner with other teams
    • Make security transparent (or easy)
    • Focus on common components
    • Also focus on strategic risks

    Diagram labels: Platform Security Review, Implement, Deploy, Report, Service Creation, Service Maintenance, Security Audit, IR / Forensics, Plan Security Improvements, Security Services, Security Defaults
  5. Any large-scale AWS deployment needs key-leveraging cryptography
    • Authentication to AWS services themselves
    • Business logic use cases (if nothing else, PCI/HIPAA/whatever requires encryption)

    Diagram, "Everyone Needs Crypto": Device or Browser; Netflix Open Connect Appliance (hosted in ISPs, or by Netflix at IXs); Microservices in the Cloud on Amazon Web Services (AWS)
  6. Background: Authenticating yourself to AWS

    Source: http://docs.aws.amazon.com/AmazonS3/latest/dev/RESTAuthentication.html

        DELETE /johnsmith/photos/puppy.jpg HTTP/1.1
        User-Agent: dotnet
        Host: s3.amazonaws.com
        Date: Tue, 27 Mar 2007 21:20:27 +0000
        x-amz-date: Tue, 27 Mar 2007 21:20:26 +0000
        Authorization: AWS AKIAIOSFODNN7EXAMPLE:lx3byBScXR6KzyMaifNkardMwNk=
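  7. AWS HMAC Generation

    Not real secret keys, sorry. The lifecycle of the AccessKeyID and SecretKey is of utmost interest here.

    Customer request (string to sign):
        DELETE\n\n\nTue, 27 Mar 2007 21:20:26 +0000\n/johnsmith/photos/puppy.jpg
    AccessKeyID and SecretKey:
        AKIAIOSFODNN7EXAMPLE:iXKQe8qXbhnN0jUe7JGVqFNXMmTxP5pI6example
    HMAC-SHA-1 digest, verified by AWS:
        lx3byBScXR6KzyMaifNkardMwNk=

    The SecretKey keys the HMAC over the string to sign; the AccessKeyID travels in the clear in the Authorization header so that AWS knows which secret to verify against.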
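To make slides 6 and 7 concrete, here is a minimal Java sketch of that legacy S3 signing scheme (Signature Version 2) using only `javax.crypto`. It assumes the request carries no Content-MD5, Content-Type, or extra `x-amz-*` headers to canonicalize, so those lines of the string to sign stay empty as in the slide's layout; the class and method names are hypothetical, and since the slide's keys are not real, the sketch makes no claim to reproduce the slide's digest.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper sketching S3 Signature Version 2 (HMAC-SHA1) request signing. */
final class S3V2Signer {
    private S3V2Signer() {}

    /** Builds the newline-separated string to sign. */
    static String stringToSign(String verb, String date, String resource) {
        return verb + "\n"
                + "\n"          // Content-MD5 (assumed absent)
                + "\n"          // Content-Type (assumed absent)
                + date + "\n"
                + resource;     // CanonicalizedResource; no x-amz-* headers assumed
    }

    /** Base64(HMAC-SHA1(secretKey, UTF-8 bytes of stringToSign)). */
    static String sign(String secretKey, String stringToSign) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
            byte[] digest = mac.doFinal(stringToSign.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (GeneralSecurityException e) {
            // HmacSHA1 ships with every JDK, so this indicates a broken runtime.
            throw new IllegalStateException(e);
        }
    }

    /** The AccessKeyID is sent in the clear; only the SecretKey keys the HMAC. */
    static String authorizationHeader(String accessKeyId, String signature) {
        return "AWS " + accessKeyId + ":" + signature;
    }
}
```

An HMAC-SHA1 digest is 20 bytes, so the Base64 signature is always 28 characters ending in `=`. AWS recomputes the same HMAC with its own copy of the secret and compares the two values.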
  8. SDKs make this easier

        import com.amazonaws.auth.BasicAWSCredentials;
        import com.amazonaws.services.simpledb.AmazonSimpleDBClient;

        // fortunately, AWS provides helper objects that do most of the work
        BasicAWSCredentials cred = new BasicAWSCredentials(accessKeyID, secretKeyID);
        AmazonSimpleDBClient client = new AmazonSimpleDBClient(cred);
        // ugly HMAC-generating code safely tucked away in here somewhere
        client.listDomains();

    This still does not answer the question of where accessKeyID and secretKeyID are stored in our ecosystem.
  9. Where we put AccessKeyID and SecretKey
    First attempt: just stick them in a System Property in the AMI

        // if it makes you feel better, let's pretend the properties
        // are obfuscated and don't show up in an obvious grep
        BasicAWSCredentials cred = new BasicAWSCredentials(
            System.getProperty("AccessKeyID"), System.getProperty("SecretKeyID"));

    Obvious deficiencies:
    • Key exposed? Rebake thousands of AMIs and redeploy
    • Rotating the keys in a managed fashion is a relatively big effort
  10. Where to Put AccessKeyID and SecretKey
    Second attempt: deliver them via our own simple RESTful Key Server

        GET server/getAWSKey

        <AWSKEY>
          <accessKeyID>AKIAIOSFODNN7EXAMPLE</accessKeyID>
          <secretKey>iXKQe8qXbhnN0jUe7JGVqFNXMmTxP5pI6example</secretKey>
        </AWSKEY>

    Improvement!
    • In theory, this gives direct control over the key delivered to machines
    • Add some access control and TLS, because security
    • In practice it still has problems, and you'll need to reboot VMs
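A client of such a key server has to turn the `<AWSKEY>` document into credentials. Here is a minimal sketch with the JDK's DOM parser, assuming the response shape shown on the slide (the class name is hypothetical). It also refuses DOCTYPE declarations, a common hardening step against XML external entity attacks, and keeps the secret out of `toString()` so it does not end up in logs.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Hypothetical parsed form of the key server's <AWSKEY> response. */
final class KeyServerResponse {
    final String accessKeyId;
    final String secretKey;

    private KeyServerResponse(String accessKeyId, String secretKey) {
        this.accessKeyId = accessKeyId;
        this.secretKey = secretKey;
    }

    static KeyServerResponse parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Reject DOCTYPE declarations so a tampered response cannot pull in external entities.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return new KeyServerResponse(text(doc, "accessKeyID"), text(doc, "secretKey"));
        } catch (Exception e) {
            throw new IllegalArgumentException("Malformed key server response", e);
        }
    }

    /** Returns the trimmed text of the single element with the given tag name. */
    private static String text(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        if (nodes.getLength() != 1) {
            throw new IllegalArgumentException("Expected exactly one <" + tag + "> element");
        }
        return nodes.item(0).getTextContent().trim();
    }

    /** Never print the secret, e.g. when the object is logged. */
    @Override
    public String toString() {
        return "KeyServerResponse{accessKeyId=" + accessKeyId + ", secretKey=<redacted>}";
    }
}
```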
  11. Unfortunate Key Stickiness

        // RESTService and RESTfulObj are placeholder types for the key server client
        RESTfulObj awsKey = RESTService.get("server/getAWSKey");
        BasicAWSCredentials cred =
            new BasicAWSCredentials(awsKey.getAccessID(), awsKey.getSecretKey());
        AmazonSimpleDBClient client = new AmazonSimpleDBClient(cred);
        client.listDomains();

    In practice, with the standard Java (and other) SDKs, the keys in this example stay lodged in the client object.
  12. AWS SDKs Introduce the Provider Paradigm
    2012 seems like a different lifetime

        // the provider paradigm dynamically asks for keys every time
        AWSCredentialsProvider prov = new AWSCredentialsProvider() {
            @Override
            public AWSCredentials getCredentials() {
                RESTfulObj awsKey = RESTService.get("server/getAWSKey");
                return new BasicAWSCredentials(awsKey.getAccessID(), awsKey.getSecretKey());
            }

            @Override
            public void refresh() {
                // nothing is cached here, so there is nothing to refresh
            }
        };
        AmazonSimpleDBClient client = new AmazonSimpleDBClient(prov);
        client.listDomains();

    The client object in this example no longer caches keys.
  13. Systematically Enforce Refresh

        GET server/getAWSKey

        <AWSKEY>
          <accessKeyID>AKIAIOSFODNN7EXAMPLE</accessKeyID>
          <secretKey>iXKQe8qXbhnN0jUe7JGVqFNXMmTxP5pI6example</secretKey>
          <expirationEpoch>1352083995</expirationEpoch>
        </AWSKEY>

    This approach is a huge win: temporary mishandling by clients (such as writing keys to log files) no longer forces a mass key rotation, and rotation now happens automatically.
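The `expirationEpoch` field lets a client library cache credentials and still pick up rotated keys on schedule. Below is a minimal sketch of such a provider, assuming a caller-supplied fetch function standing in for `RESTService.get("server/getAWSKey")` and a refresh margin of the caller's choosing. All names are hypothetical, and the sketch deliberately avoids the AWS SDK types so it runs on a bare JDK.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

/** Hypothetical provider that caches key-server credentials until shortly before they expire. */
final class ExpiringCredentialsProvider {

    /** Mirrors the slide's <AWSKEY> payload, including expirationEpoch (seconds). */
    static final class Credentials {
        final String accessKeyId;
        final String secretKey;
        final Instant expiration;

        Credentials(String accessKeyId, String secretKey, long expirationEpochSeconds) {
            this.accessKeyId = accessKeyId;
            this.secretKey = secretKey;
            this.expiration = Instant.ofEpochSecond(expirationEpochSeconds);
        }
    }

    private final Supplier<Credentials> fetcher;  // e.g. a call to server/getAWSKey
    private final Supplier<Instant> clock;        // Instant::now in production
    private final Duration refreshMargin;         // refresh this long before expiry
    private Credentials cached;

    ExpiringCredentialsProvider(Supplier<Credentials> fetcher, Supplier<Instant> clock,
                                Duration refreshMargin) {
        this.fetcher = fetcher;
        this.clock = clock;
        this.refreshMargin = refreshMargin;
    }

    /** Returns cached credentials, fetching new ones when none are cached or expiry is near. */
    synchronized Credentials getCredentials() {
        Instant now = clock.get();
        if (cached == null || !now.isBefore(cached.expiration.minus(refreshMargin))) {
            cached = fetcher.get();
        }
        return cached;
    }
}
```

Injecting the clock keeps the refresh logic testable. Because the server sets the expiration, a leaked key stops working on the server's schedule rather than whenever someone notices the leak.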
  14. On-Instance Credentials
    …or finally, just let AWS manage the key server

        $ curl http://169.254.169.254/latest/meta-data/iam/security-credentials/role
        {
          "Code" : "Success",
          "LastUpdated" : "2015-09-17T01:29:49Z",
          "Type" : "AWS-HMAC",
          "AccessKeyId" : "ASIAIL6IJJCXLEXAMPLE",
          "SecretAccessKey" : "iXKQe8qXbhnN0jUe7JGVqFNXMmTxP5pI6example",
          "Token" : "...",
          "Expiration" : "2015-09-17T07:47:45Z"
        }

    The preferred approach today, though still not perfect:
    • The key is valid everywhere, not just on the machine it was sourced from
    • Developers are still out of luck (their workstations have no instance metadata endpoint), so we maintain the legacy server too*

    *There are arguably better approaches for mimicking this, such as Hologram: https://github.com/AdRoll/hologram
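The `Expiration` field is what makes on-instance credentials self-rotating: a client should re-read the endpoint before that time. Here is a minimal sketch of that check, assuming a refresh margin chosen by the caller. It pulls fields out with a regular expression purely to stay dependency-free; real code would use a JSON parser, or simply let the AWS SDK's instance profile credentials provider handle refresh. The class name is hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for deciding when to re-read role credentials from instance metadata. */
final class InstanceCredentialExpiry {
    private InstanceCredentialExpiry() {}

    /** Extracts a top-level string field; a regex stands in for a real JSON parser here. */
    static String field(String json, String name) {
        Matcher m = Pattern.compile("\"" + Pattern.quote(name) + "\"\\s*:\\s*\"([^\"]*)\"")
                .matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("Missing field: " + name);
        }
        return m.group(1);
    }

    /** True once we are within {@code margin} of the credentials' Expiration timestamp. */
    static boolean needsRefresh(String json, Instant now, Duration margin) {
        Instant expiration = Instant.parse(field(json, "Expiration"));
        return !now.isBefore(expiration.minus(margin));
    }
}
```

With the sample document above, the credentials were last updated at 01:29:49Z and expire at 07:47:45Z, a remaining lifetime of 6 h 17 min 56 s. With a five-minute margin, `needsRefresh` turns true at 07:42:45Z.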
  15. Business Use Cases for Cryptographic Keys
    Now that we've (mostly) outsourced AWS keys to AWS, we're left with everything else we need to run our business:
    • Certificate Authority private keys
    • Keys to encrypt sensitive data
    • HMAC API keys to sign REST requests to partners
    • Long-term and short-term
    • Symmetric and asymmetric
  16. Can you really trust a VM?
    A seemingly limitless pool of virtual hardware offers unprecedented benefits of scale, throughput, and availability. But…
    Plenty of research implies that keys are not safe in a VM that shares resources with an untrusted party. Some light reading, as an example:
    Cross-VM Side Channels and Their Use to Extract Private Keys (ACM CCS 2012)
    Available at: http://www.cs.unc.edu/~reiter/papers/2012/CCS.pdf
  17. Use Case of a Key Implies Handling Requirements
    TLS session key: fast, handled in a dynamic environment
    • But it's easy to have a reasonable policy if we lose it
    Certificate Authority private key: maybe not used so much
    • Probably far more important that you just don't lose it
  18. Simple Framework for Key Handling

        Sensitivity | Throughput | Protection | It's exposed!        | It lives…
        ------------+------------+------------+----------------------+--------------------
        Low         | High       | Low        | No biggie            | In lots of VMs
        Medium      | Medium     | Medium     | It'll be a long week | In very few VMs
        High        | Low        | High       | No. Just. No.        | In special hardware
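One way to make the table actionable is to encode it where a client library can consult it. The enum below is a hypothetical sketch, not Cryptex code. It copies the table's columns and adds where each tier's operations run, as described on the "Low", "Medium", and "High" key handling slides.

```java
/** Hypothetical encoding of the three-tier key handling framework. */
enum KeySensitivity {
    LOW("High", "Low", "In lots of VMs", Site.CLIENT_VM),
    MEDIUM("Medium", "Medium", "In very few VMs", Site.CRYPTEX_SERVER),
    HIGH("Low", "High", "In special hardware", Site.HSM);

    /** Where the cryptographic operation runs for keys of a given tier. */
    enum Site { CLIENT_VM, CRYPTEX_SERVER, HSM }

    final String throughput;
    final String protection;
    final String keyLivesIn;
    final Site operationSite;

    KeySensitivity(String throughput, String protection, String keyLivesIn, Site operationSite) {
        this.throughput = throughput;
        this.protection = protection;
        this.keyLivesIn = keyLivesIn;
        this.operationSite = operationSite;
    }

    /** Only low-sensitivity key material is exported to application VMs. */
    boolean exportedToClients() {
        return operationSite == Site.CLIENT_VM;
    }
}
```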
  19. Cryptex: Our Framework for Key Handling

    Diagram components:
    • Netflix Business Application with the Cryptex Client Library and Netflix IPC Components (Ribbon/Hystrix/etc): many of these
    • Cryptex Server(s) running the web server logic: not many of these
    • Eureka Server(s)
    • Cloud HSMs: dedicated hardware
  20. “Low” Key Handling
    Flow: Netflix Business Application → Cryptex Client Library → (client-auth TLS) → Cryptex Server: GetKey(ID=123) → Resp(Value=iXKQ…); the client then encrypts/decrypts locally.
    • Key exported out to every client
    • Extremely high throughput
    • Client library attempts to be mindful of key handling
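For "low" keys the client library receives the key itself and encrypts in-process, which is where the extremely high throughput comes from. The talk does not say which AES mode Cryptex uses. This sketch assumes AES-GCM with a random 12-byte IV prepended to the ciphertext, using only `javax.crypto`; the class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical local encryption with an exported ("low") AES key. */
final class LocalAesGcm {
    private static final int IV_BYTES = 12;   // recommended GCM nonce size
    private static final int TAG_BITS = 128;  // full-length authentication tag
    private static final SecureRandom RNG = new SecureRandom();

    private LocalAesGcm() {}

    /** Returns IV || ciphertext || tag. A fresh random IV is drawn per call. */
    static byte[] encrypt(byte[] key, byte[] plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RNG.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(plaintext);
            return ByteBuffer.allocate(IV_BYTES + ct.length).put(iv).put(ct).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Splits off the IV and decrypts; tampering surfaces as an exception. */
    static byte[] decrypt(byte[] key, byte[] ivAndCiphertext) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, ivAndCiphertext, 0, IV_BYTES));
            return cipher.doFinal(ivAndCiphertext, IV_BYTES, ivAndCiphertext.length - IV_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Decryption failed", e);
        }
    }
}
```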
  21. “Medium” Key Handling
    Flow: Cryptex Client Library → (client-auth TLS) → Cryptex Server: GetKey(ID=456) → Resp(Value=null); Encrypt(ID=456, PT=…) → Resp(CT=5pI6…)
    • Every operation is a REST call
    • Luckily, we don't have many bulk-encrypt use cases for these
    • Cryptex servers are not publicly facing, so they are ostensibly harder to get onto
  22. “High” Key Handling
    Flow: Cryptex Client Library → (client-auth TLS) → Cryptex Server: GetKey(ID=789) → Resp(Value=null); Encrypt(ID=789, PT=…) → Cryptex Server → HSM API Encrypt(ID=789, PT=…) → Resp(CT=JGVqF…), returned to the client
    • Every operation is a call to specialized hardware
    • The HSM API is challenging relative to REST calls (only Cryptex does it)
    • Very constrained throughput; VM side-channel attacks negated
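Across the three tiers, the client library follows the same pattern: call GetKey first, then either encrypt locally (the key came back) or send Encrypt to the server (the value was null). Here is a sketch of that dispatch, assuming a hypothetical `CryptexServer` interface (Cryptex's real API is not shown in the talk) and a pluggable local cipher. Caching the GetKey answer per key ID is also an assumption of this sketch.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

/** Hypothetical view of the Cryptex server API as drawn on the slides. */
interface CryptexServer {
    /** Returns the key bytes for exportable ("low") keys, or null otherwise. */
    byte[] getKey(int keyId);

    /** Encrypts server-side (and, for "high" keys, inside the HSM). */
    byte[] encrypt(int keyId, byte[] plaintext);
}

/** Hypothetical client library that routes each operation by key handling tier. */
final class CryptexClient {
    private final CryptexServer server;
    private final BiFunction<byte[], byte[], byte[]> localEncrypt;  // (key, plaintext) -> ciphertext
    private final Map<Integer, Optional<byte[]>> keyCache = new ConcurrentHashMap<>();

    CryptexClient(CryptexServer server, BiFunction<byte[], byte[], byte[]> localEncrypt) {
        this.server = server;
        this.localEncrypt = localEncrypt;
    }

    byte[] encrypt(int keyId, byte[] plaintext) {
        // Ask once per key ID whether the key material is exported to clients.
        Optional<byte[]> key =
                keyCache.computeIfAbsent(keyId, id -> Optional.ofNullable(server.getKey(id)));
        return key.isPresent()
                ? localEncrypt.apply(key.get(), plaintext)  // "low": no network hop per operation
                : server.encrypt(keyId, plaintext);         // "medium"/"high": every operation is remote
    }
}
```

Because the decision rests only on whether GetKey returned a value, moving a key between tiers is a server-side policy change; the application code calling `encrypt` stays the same.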
  23. “Asymmetric” Key Handling
    Flow: Cryptex Client Library → (client-auth TLS) → Cryptex Server: GetKey(ID=111) → Resp(PubValue=iXKQ…); the client then verifies locally.
    We support the basics: AES, HMAC-SHA, RSA
    • Optimize RSA verify/encrypt by pushing the public key to the edge
    • At scale, the computational intensity of RSA is quite apparent
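Pushing the public key to the edge means a client can verify signatures without calling Cryptex at all. Here is a minimal sketch using `java.security`, assuming the public key is delivered in its X.509 (SubjectPublicKeyInfo) encoding and that signatures use SHA256withRSA. The slide says only "RSA", so the hash and padding are assumptions, and the class name is hypothetical.

```java
import java.security.GeneralSecurityException;
import java.security.KeyFactory;
import java.security.PublicKey;
import java.security.Signature;
import java.security.spec.X509EncodedKeySpec;

/** Hypothetical edge-side verifier holding only the public half of an RSA key. */
final class EdgeVerifier {
    private final PublicKey publicKey;

    /** Decodes the X.509-encoded public key handed out by the server (e.g. GetKey's PubValue). */
    EdgeVerifier(byte[] x509EncodedPublicKey) {
        try {
            publicKey = KeyFactory.getInstance("RSA")
                    .generatePublic(new X509EncodedKeySpec(x509EncodedPublicKey));
        } catch (GeneralSecurityException e) {
            throw new IllegalArgumentException("Not an RSA public key", e);
        }
    }

    /** Verifies locally; malformed signatures are treated as invalid rather than thrown. */
    boolean verify(byte[] message, byte[] signature) {
        try {
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(publicKey);
            verifier.update(message);
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            return false;
        }
    }
}
```

Verification uses the small public exponent and is far cheaper than signing, but at Netflix's request volume even that cost is noticeable. The slide's point is that it is still cheaper than a network round trip per message.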
  24. The Ongoing Struggle with Integration

        // legacy software assumes long-running, managed machines
        // with long-lasting, hand-deployed filesystems, not dynamic key loading
        -Djavax.net.ssl.keyStore=<file pointer to key>

    • Many OSS products provide only this mechanism for key loading
    • Filesystem-like integration with these is current/future work
    • When designing your OSS API, consider accepting a stream instead of a file
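The advice to accept a stream instead of a file maps directly onto the JDK: `KeyStore.load` takes an `InputStream`, so key material fetched from a service (or decrypted in memory) never has to touch disk. Here is a minimal sketch, assuming a PKCS#12 keystore; the helper name is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;

/** Hypothetical helper: build a TLS context from key material delivered as a stream. */
final class StreamKeyLoading {
    private StreamKeyLoading() {}

    static SSLContext sslContextFrom(InputStream pkcs12Stream, char[] password)
            throws GeneralSecurityException, IOException {
        KeyStore keyStore = KeyStore.getInstance("PKCS12");
        keyStore.load(pkcs12Stream, password);  // any stream: network, memory, decrypted blob
        KeyManagerFactory kmf =
                KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(keyStore, password);
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(kmf.getKeyManagers(), null, null);  // default trust managers and RNG
        return context;
    }
}
```

A library that accepts an `SSLContext` (or its `getSocketFactory()` / `createSSLEngine()` products) instead of a `-Djavax.net.ssl.keyStore` path lets callers plug in keys from wherever they actually live.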
  25. Upcoming and Future
    We are eagerly awaiting better new features to leverage:
    • AWS KMS*: great potential, but not ready for us yet
      • AES symmetric only
      • Not cross-regional
      • Extremely constrained throughput (hundreds of ops/sec per zone)
      • We're hopeful that someday it will make our own products obsolete
    We'd love to OSS more (wanna help?)

    *https://aws.amazon.com/kms/