
Not screwing up Encryption as a Developer

Strict transport security, certificate pinning, perfect forward secrecy, initialization vectors, DKIM, HMAC, TLS, AES, CA, OMG. What on earth are all those convoluted encryption terms and does it really matter to me as a developer?

This talk will present in plain English some of the most important and most misunderstood cryptographic concepts in our industry and dive into the areas most critical for you as a developer.

Keeping data secure through cryptography is an integral part of our lives as developers, and it turns out understanding encryption isn't as cryptic as you'd expect!

Simon Sturmer

July 19, 2019

Transcript

  1. I'm Simon. I'm a developer. I like building stuff and

    learning stuff. Find me on Twitter: @sstur_
  2. Encryption Simply put, encryption is the process of encoding information

    in such a way that only authorized parties can access it. Encryption is part of the broader topic of cryptography. This talk is about how cryptography relates to you as a developer.
  3. Why would a developer need to understand this? Software is

    about processing information. People and businesses trust your software with their information. In many ways, it falls on you as a software developer to understand how to keep information safe.
  4. Uses of Cryptography • Communicating without letting unauthorized parties read the messages. • Verifying

    the integrity of a message or file, being sure it has not been modified. • Storing data in a way that only you can later read it. • Validating that the party you are communicating with is indeed who they claim to be. • Saving passwords in a way that cannot be reversed to see the original password. • Generating unique IDs, such as in Git (...)
  5. Security Fail There are many ways, as developers, we fail

    at security, such as: • forgetting to password protect a database • exposing private data through a web interface or API • writing software with vulnerabilities that allow an attacker to run code on your server • publishing sensitive information to a public repository
  6. Crypto Fail One category of ways developers fail at security

    is around cryptography. • Sending information unencrypted, through insecure channels • Using weak encryption • Not verifying authenticity of a remote party • Storing sensitive info like passwords in insecure ways
  7. Let's talk In order to understand how to do this

    correctly as a developer, we should first talk about: 1. What problems we are trying to solve 2. The different kinds of encryption 3. How they work together to solve such problems
  8. Cryptography can help us • Securely communicate with someone over

    an insecure channel • Securely store data so that only we (or someone we trust) can retrieve it • Trust: Verify that an unknown party is who they claim to be • Data integrity: Verify that data sent from that party has not been modified by someone else • Key derivation: Create an encryption key from text such as a password
  9. Types of cryptography There are 3 important types of cryptography

    we’ll explore today: • Hashing • Symmetric Encryption • Asymmetric Encryption, including: ◦ key exchange ◦ public key cryptography
  10. Hashing A hash is a sort of one-way message digest. It takes some

    data of arbitrary length as input (e.g. a password, or an entire file) and generates a fixed-length output (the digest or hash) that “represents” the input in a way that follows two principles: • Deterministic: The same input will always generate the same output. • One way: The input cannot be determined from the output. Information is lost in the process, making it impossible to reconstruct the original.
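To see both principles in action, here is a minimal sketch using Python's built-in `hashlib` module with SHA-256 (the inputs are arbitrary examples):

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()

short_digest = sha256_hex(b"hunter2")
big_digest = sha256_hex(b"x" * 1_000_000)  # a 1 MB input

# Deterministic: hashing the same input again gives the same digest.
assert sha256_hex(b"hunter2") == short_digest

# Fixed length: 32 bytes (64 hex characters) no matter how big the input is.
print(len(short_digest), len(big_digest))  # 64 64
```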
  11. Collisions Since the output (hash) is a fixed size, which

    may be fewer bytes than the input, it cannot represent as much information: there are fewer possible hashes than there are possible inputs. Thus, two different inputs can result in the same hash value. This is called a collision.
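Collisions are easy to produce if the output is small enough. This sketch uses a hypothetical `tiny_hash` that keeps only the first 2 bytes of SHA-256, so there are only 65,536 possible outputs. By the pigeonhole principle a collision is guaranteed within 65,537 distinct inputs, and thanks to the birthday paradox one usually turns up far sooner. Real SHA-256 keeps all 256 bits, which is what makes the search hopeless.

```python
import hashlib

def tiny_hash(data: bytes) -> bytes:
    """Hypothetical toy hash: SHA-256 truncated to 16 bits. Never use this for real."""
    return hashlib.sha256(data).digest()[:2]

def find_collision():
    """Hash "0", "1", "2", ... until two different inputs share a digest."""
    seen = {}  # maps digest -> first input that produced it
    i = 0
    while True:
        data = str(i).encode()
        digest = tiny_hash(data)
        if digest in seen:
            return seen[digest], data
        seen[digest] = data
        i += 1

a, b = find_collision()
print(a, b, tiny_hash(a).hex(), tiny_hash(b).hex())  # two inputs, one digest
```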
  12. Avoiding Collisions A good hash algorithm minimizes the probability of

    collisions occurring in the wild. Take the SHA-256 hashing algorithm, for example (the one used in Bitcoin mining). No two inputs have ever been discovered that produce the same output, and the probability of finding one in our lifetime is incredibly small.
  13. Uses of Hashing Hashing is often used as a “checksum” which can verify

    the integrity of some data, for example to make sure a network error didn’t cause data corruption during transmission.
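A checksum check can be sketched like this, assuming the publisher lists a SHA-256 digest next to a download (the download bytes here are made up). Reading in chunks means a large file never has to fit in memory; for a real file you would pass `open(path, "rb")` instead of the in-memory stream.

```python
import hashlib
import io

def sha256_of_stream(stream, chunk_size: int = 65536) -> str:
    """Hash a binary file-like object chunk by chunk."""
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()

# Hypothetical download and the checksum its publisher lists next to it.
download = b"example release contents"
published = hashlib.sha256(download).hexdigest()

print(sha256_of_stream(io.BytesIO(download)) == published)               # True
print(sha256_of_stream(io.BytesIO(download[:-1] + b"!")) == published)   # False: one byte changed
```

This catches accidental corruption, but anyone who can modify the file can also recompute its hash, so on its own it does not stop deliberate tampering.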
  14. Uses of Hashing Another use case is to sign a

    message using a secret that only you know. If you later receive that message back (e.g. a session cookie), then you know that it was not modified. One way to do this is using HMAC (Hash-based Message Authentication Code).
  15. HMAC If you and I both know the secret, then

    I can send you a signed message and you can verify that it was indeed from me, and not modified by someone intercepting the message. But why is it hashed twice?
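Here is a minimal sketch of a signed session cookie using Python's `hmac` module, which implements the HMAC construction for us. The `sign`/`verify` helper names, the `value.tag` cookie format and the randomly generated key are assumptions for illustration.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; it is never sent to the client.
SECRET_KEY = secrets.token_bytes(32)

def sign(value: str) -> str:
    """Append an HMAC-SHA256 tag to `value`."""
    tag = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    return f"{value}.{tag}"

def verify(signed: str):
    """Return the original value if its tag is valid, otherwise None."""
    value, _, tag = signed.rpartition(".")
    expected = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    # compare_digest is designed to resist timing attacks on the comparison.
    return value if hmac.compare_digest(tag, expected) else None

cookie = sign("user_id=42")
tampered = "user_id=1." + cookie.rpartition(".")[2]  # new value, old tag

print(verify(cookie))    # user_id=42
print(verify(tampered))  # None
```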
  16. Length Extension Attack There’s a weakness of some hashes that allows a length

    extension attack. When a Merkle–Damgård based hash (such as MD5, SHA-1 or SHA-256) is misused as a message authentication code with the construction H(secret + message), and the message and the length of the secret are known, a length extension attack allows anyone to append extra information to the end of the message and produce a valid hash without knowing the secret. HMAC’s nested, two-pass construction is not vulnerable to this, which is why it hashes twice.
  17. But regardless of good hashing practice, this does not hide

    the contents of the message from anyone. For that we need to use encryption.
  18. Symmetric Encryption Throughout history, people have needed to

    encode messages so that only the intended recipient can understand them. In the very old days, the "method of encryption" was the secret. In modern encryption, the method of encryption (the cipher) is public, but the key is secret.
  19. Substitution Cipher The simplest form of using a key to

    encrypt a message is a "substitution cipher" in which each letter is replaced by a different letter. In such a system, the "key" is the list of substitutions, so the recipient can swap back in the original letters.
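A substitution cipher fits in a few lines of Python. In this sketch the key is a shuffled alphabet (drawn from the `secrets` module's system RNG); only uppercase letters are substituted and everything else passes through unchanged.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase

def make_key() -> str:
    """The key is a shuffled alphabet: ALPHABET[i] is replaced by key[i]."""
    letters = list(ALPHABET)
    secrets.SystemRandom().shuffle(letters)
    return "".join(letters)

def encrypt(plaintext: str, key: str) -> str:
    return plaintext.upper().translate(str.maketrans(ALPHABET, key))

def decrypt(ciphertext: str, key: str) -> str:
    return ciphertext.translate(str.maketrans(key, ALPHABET))

key = make_key()
ciphertext = encrypt("ATTACK AT DAWN", key)
print(ciphertext)
print(decrypt(ciphertext, key))  # ATTACK AT DAWN
```

Notice that both A's in "ATTACK" always come out as the same letter. That is exactly the weakness letter-frequency analysis exploits.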
  20. A Stronger Substitution Cipher One way to make that stronger

    is to change the character mapping after each character processed. It would be changed according to some predetermined method.
  21. Enigma This is how the Enigma machine worked in WW2. The

    way in which the mapping changed each keypress was effectively the key.
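Here is a toy version of a mapping that changes after every character: a single hypothetical "rotor" (a shuffled alphabet) that steps forward one position per keypress. It is far simpler than the real Enigma, which combined several rotors, a reflector and a plugboard. The key is the rotor wiring plus its starting position, and the input is assumed to be uppercase letters only.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase

def make_rotor() -> str:
    """A random permutation of the alphabet, like the wiring inside one rotor."""
    letters = list(ALPHABET)
    secrets.SystemRandom().shuffle(letters)
    return "".join(letters)

def encrypt(plaintext: str, rotor: str, start: int = 0) -> str:
    out = []
    for i, ch in enumerate(plaintext):
        step = (start + i) % 26  # the rotor advances one position per keypress
        out.append(rotor[(ALPHABET.index(ch) + step) % 26])
    return "".join(out)

def decrypt(ciphertext: str, rotor: str, start: int = 0) -> str:
    out = []
    for i, ch in enumerate(ciphertext):
        step = (start + i) % 26
        out.append(ALPHABET[(rotor.index(ch) - step) % 26])
    return "".join(out)

rotor, start = make_rotor(), 7  # together, these form the key
print(encrypt("AAAA", rotor, start))  # four different letters
```

Unlike the plain substitution cipher, repeated plaintext letters no longer produce repeated ciphertext letters.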
  22. Distributing the Key These keys were distributed manually, on paper,

    and the machine needed to be reconfigured with each new key.
  23. Key Exchange In symmetric ciphers, like the Enigma, or even

    modern-day ones such as AES, both parties need to have a secret key. Getting that key from one person to another was historically done physically. This can present a huge challenge; more on this later.
  24. Blocks of Data Modern symmetric algorithms, including AES, encrypt data in blocks. Data

    is divided up into equal-size blocks (the block size of the algorithm), padded if necessary, and each block is encrypted individually.
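One common padding scheme is PKCS#7 (an assumption here; the talk does not name one): append n bytes, each with the value n, so the receiver knows exactly how much to strip. A minimal sketch with AES's 16-byte block size:

```python
BLOCK_SIZE = 16  # AES uses 128-bit (16-byte) blocks

def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    n = block_size - len(data) % block_size  # always between 1 and block_size
    return data + bytes([n]) * n

def pkcs7_unpad(data: bytes) -> bytes:
    n = data[-1]
    if not 1 <= n <= len(data) or data[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return data[:-n]

def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

padded = pkcs7_pad(b"YELLOW SUBMARINE!")  # 17 bytes -> 32 bytes
blocks = split_blocks(padded)
print(len(blocks), blocks[1])  # 2 blocks; the second is b"!" plus 15 bytes of 0x0f
```

PKCS#7 always adds padding, even a whole extra block when the data is already a multiple of the block size, so the last byte can always be read unambiguously.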
  25. Block Mode: ECB If you simply use the key to encrypt each block,

    that simple approach is called “ECB” (Electronic Codebook) mode, one of several block modes. The problem is that encryption is deterministic for a given key, meaning that two identical input blocks will result in identical output blocks; over many blocks, a pattern can emerge.
  26. Block Mode: CBC CBC (Cipher Block Chaining) mode, for example, uses a randomly

    generated IV (initialization vector) as a sort of salt: the IV is XORed into the first block before encryption. Each subsequent block then gets its “salt” from the encrypted output of the previous block, which is XORed into it before encryption.
  27. Block Mode: CBC Thus, two messages with the same content

    will result in different output, assuming the IV is different each time. The IV itself is not secret; in fact, it must be sent along with the encrypted data, since it is needed for decryption.
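The difference between the two modes shows up clearly in code. Python's standard library has no AES, so this sketch swaps in a hypothetical toy block cipher (a 4-round Feistel network using HMAC-SHA256 as its round function) purely to show how ECB and CBC treat blocks. Do not use it to protect real data; use AES from a vetted library. Messages are assumed to be padded to a multiple of 16 bytes already.

```python
import hashlib
import hmac
import secrets

BLOCK = 16
HALF = BLOCK // 2

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _round(key: bytes, i: int, half: bytes) -> bytes:
    """Toy round function: HMAC-SHA256 of the round number and half-block, truncated."""
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:HALF]

def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right

def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right

def ecb_encrypt(key: bytes, data: bytes) -> bytes:
    """ECB: every block is encrypted independently with the same key."""
    return b"".join(toy_encrypt_block(key, data[i:i + BLOCK]) for i in range(0, len(data), BLOCK))

def cbc_encrypt(key: bytes, data: bytes) -> bytes:
    """CBC: XOR each block with the previous ciphertext block (the IV for the first)."""
    iv = secrets.token_bytes(BLOCK)  # fresh random IV per message, sent in the clear
    prev, out = iv, []
    for i in range(0, len(data), BLOCK):
        prev = toy_encrypt_block(key, _xor(data[i:i + BLOCK], prev))
        out.append(prev)
    return iv + b"".join(out)

def cbc_decrypt(key: bytes, data: bytes) -> bytes:
    iv, body = data[:BLOCK], data[BLOCK:]
    prev, out = iv, []
    for i in range(0, len(body), BLOCK):
        block = body[i:i + BLOCK]
        out.append(_xor(toy_decrypt_block(key, block), prev))
        prev = block
    return b"".join(out)

key = secrets.token_bytes(32)
message = b"SAME 16B BLOCK!!" * 2  # two identical 16-byte blocks

ecb = ecb_encrypt(key, message)
cbc = cbc_encrypt(key, message)
print(ecb[:16] == ecb[16:32])    # True: ECB leaks the repetition
print(cbc[16:32] == cbc[32:48])  # False (barring a 1-in-2**128 fluke)
print(cbc_decrypt(key, cbc) == message)  # True
```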
  28. Asymmetric Encryption Prior to the 1970s the only way to

    use encryption to communicate with someone was if you both had a shared secret; some encryption key that you both know but no one else knows. Securely getting a shared secret to someone presented many challenges because it could not be done electronically.
  29. In 1976, Whitfield Diffie and Martin Hellman published a method

    for negotiating a secret key over an insecure channel. This fascinating method is known as Diffie Hellman key exchange and is still used today.
  30. Ron Rivest, Adi Shamir, and Leonard Adleman at MIT came

    up with what is now known as RSA, a method of asymmetric encryption involving a public key and a private key. This is used to this day for many things including TLS (HTTPS).
  31. Diffie Hellman Let's start with Diffie Hellman, a method for

    secure key exchange. The idea is that two parties can communicate in public, and yet still end up with a shared secret that no one else can guess, even if someone is listening in to the whole exchange.
  32. Diffie Hellman To extend the paint analogy a little more,

    it’s based on the principle that it’s easy to mix two colors together, but difficult to determine which two colors went into the mixture. This is a form of "trapdoor" function: a mathematical problem that is hard to solve but, given a proposed solution, easy to verify, much like a combination lock.
  33. Asymmetric encryption relies on a trapdoor function, such as prime number factorization

    (used by RSA) or the discrete logarithm problem (used by Diffie Hellman). With factorization, it’s easy to multiply two large primes together, but hard to determine which two primes were multiplied together, given only the product.
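The asymmetry is visible even at toy scale. This sketch multiplies two small primes (1,000,003 and 1,000,033, chosen so the demo finishes quickly) and then undoes the multiplication with naive trial division. Real RSA primes are hundreds of digits long, and real factoring algorithms are far smarter than trial division, yet still infeasible at that size.

```python
import time

def smallest_factor(n: int) -> int:
    """Trial division: the naive way to undo a multiplication."""
    if n % 2 == 0:
        return 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return d
        d += 2
    return n

p, q = 1_000_003, 1_000_033  # two (small) primes
n = p * q                    # the easy direction: a single multiplication

start = time.perf_counter()
f = smallest_factor(n)       # the hard direction: about half a million divisions
elapsed = time.perf_counter() - start
print(f, n // f, f"{elapsed:.3f}s")
```

Every extra digit in the primes multiplies the work for trial division, while the multiplication stays instantaneous.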
  34. Use case: Diffie Hellman For example, the Diffie Hellman algorithm

    is used by SSH. At the beginning of the connection, the two computers establish a shared secret using DH and then use that to derive the key used in symmetric encryption for the rest of the session.
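The exchange itself is just modular exponentiation. This sketch assumes toy public parameters (the Mersenne prime 2**127 - 1 as the modulus and 3 as the base); real deployments use standardized groups of 2048 bits or more, or elliptic-curve variants such as X25519. The final step hashes the shared secret into a symmetric session key, in the spirit of SSH, which also derives its session keys by hashing the shared secret together with other values.

```python
import hashlib
import secrets

# Toy public parameters (assumption): anyone may know these.
P = 2**127 - 1
G = 3

def dh_keypair():
    private = secrets.randbelow(P - 3) + 2  # secret exponent, never sent
    public = pow(G, private, P)             # safe to send over the wire
    return private, public

def dh_shared(my_private: int, their_public: int) -> int:
    return pow(their_public, my_private, P)

alice_priv, alice_pub = dh_keypair()
bob_priv, bob_pub = dh_keypair()

# Each side combines its own secret with the other side's public value.
alice_secret = dh_shared(alice_priv, bob_pub)
bob_secret = dh_shared(bob_priv, alice_pub)
print(alice_secret == bob_secret)  # True

# Derive a 256-bit symmetric session key from the shared secret.
session_key = hashlib.sha256(alice_secret.to_bytes(16, "big")).digest()
```

An eavesdropper sees `P`, `G`, `alice_pub` and `bob_pub`, but recovering either private exponent from them is the discrete logarithm problem.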
  35. Public Key Encryption RSA is the oldest and most commonly

    used form of public key encryption, a type of asymmetric encryption that uses a public + private key pair. Unlike symmetric encryption, there are two keys. Anything encrypted with the public key can only be decrypted with the private key, and vice versa.
  36. Use Cases This allows you to send a message that

    only the receiver can decrypt. There are several interesting uses; we'll focus on two: • Establishing a shared secret • Signing a message for the public
  37. 1. Key Exchange Establishing a shared secret between two parties,

    over a public communication channel, just like Diffie Hellman. For example, if I know your public key, I can generate a random secret, encrypt it with your public key and send it to you. You can then decrypt it and we have a shared secret. The advantage is that it doesn’t require the sort of chatty, back-and-forth communication that Diffie Hellman requires.
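Here is that exchange as textbook RSA. The key pair is a toy built from two well-known Mersenne primes (2**127 - 1 and 2**521 - 1), so anyone could factor `n`; real keys use large, randomly chosen primes, and real systems add a padding scheme such as OAEP rather than encrypting raw numbers.

```python
import secrets

# Toy key pair (assumption): well-known primes, for illustration only.
p = 2**127 - 1
q = 2**521 - 1
n = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent (needs Python 3.8+)

# Sender: only needs the public key (n, e).
secret = secrets.randbits(256)     # e.g. a random 256-bit session key
ciphertext = pow(secret, e, n)

# Receiver: uses the private exponent d to recover the secret.
recovered = pow(ciphertext, d, n)
print(recovered == secret)  # True
```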
  38. The disadvantage of this is that it does not provide

    forward secrecy, something we’ll talk about more soon.
  39. Signing a message with RSA Imagine, I want to send

    you a file, and I want to include a way for you to know that it hasn’t been modified. I can hash the file and provide that hash. But that alone doesn’t really prevent intentional modification by a third party, because the adversary can generate a hash too.
  40. 2. Signing a message I would actually encrypt that hash

    value using my private key. Assuming you know my public key, you can decrypt my hash, compute your own hash of the file and verify the two hashes match. This effectively proves that the file came from me (or someone with my private key) and has not been modified.
  41. The hash encrypted with my private key is effectively the

    signature. But you need to know my public key for this to work.
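The same toy key pair can sign. This textbook sketch "encrypts" a SHA-256 hash with the private exponent and checks it with the public one, exactly as described above. Real signature schemes (such as RSA-PSS or PKCS#1 v1.5) add padding, and the well-known primes here make the key trivially breakable.

```python
import hashlib

# Toy key pair (assumption): well-known primes, for illustration only.
p = 2**127 - 1
q = 2**521 - 1
n, e = p * q, 65537
d = pow(e, -1, (p - 1) * (q - 1))

def _hash_int(message: bytes) -> int:
    return int.from_bytes(hashlib.sha256(message).digest(), "big")

def sign(message: bytes) -> int:
    return pow(_hash_int(message), d, n)  # "encrypt" the hash with the private key

def verify(message: bytes, signature: int) -> bool:
    # "Decrypt" with the public key and compare against our own hash of the message.
    return pow(signature, e, n) == _hash_int(message)

doc = b"contents of the file I am sending you"
sig = sign(doc)
print(verify(doc, sig))         # True
print(verify(doc + b"!", sig))  # False: the file was modified
```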
  42. Getting the Public Key You could get it separately, from

    a trusted third party, but then we have a new problem of securely distributing everyone’s public key. This isn’t really feasible at web scale.
  43. The way that it's actually done is that I send

    you the public key along with the file and the signature.
  44. Anyone could send you a file with a signature and

    a public key, but only I can send you a document with a signature that was generated from my key pair.
  45. You need some way to trust that the sender is who

    they say they are. This is essentially a new problem, one of identity and trust.
  46. Certificate Pinning If the two computers are both controlled by

    you, such as servers in different data centers, and you want to be sure no one intercepts the communication, you can use “certificate pinning” which is just pre-loading the “certificate” of the other machine on each machine.
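A minimal pinning check compares the SHA-256 fingerprint of the certificate the peer presents with one recorded in advance. In this sketch the pinned bytes are a placeholder, and `fetch_der_cert` (defined but not called here) shows how Python's `ssl` module returns the peer certificate in binary DER form. Many real pinning setups pin a hash of the public key rather than the whole certificate, so the pin survives certificate renewal.

```python
import hashlib
import socket
import ssl

def fetch_der_cert(host: str, port: int = 443) -> bytes:
    """Connect over TLS and return the server's certificate in DER (binary) form."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port)) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert(binary_form=True)

def matches_pin(der_cert: bytes, pinned: set) -> bool:
    """True if the certificate's SHA-256 fingerprint is one we pre-loaded."""
    return hashlib.sha256(der_cert).hexdigest() in pinned

# Hypothetical pin, recorded when the other machine's certificate was installed.
expected_cert = b"...DER bytes of the other server's certificate..."
PINNED = {hashlib.sha256(expected_cert).hexdigest()}

print(matches_pin(expected_cert, PINNED))       # True
print(matches_pin(b"some other cert", PINNED))  # False
```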
  47. Of course, that doesn’t work for the public, who don’t

    have a database of all the valid certificates of every server on the internet.
  48. 3. Chain of trust The most common approach to this

    is to use PKI — public key infrastructure — to establish a chain of trust. This is an important part of SSL (more accurately TLS) used in HTTPS.
  49. Public Key Infrastructure We establish a chain of trust back

    to a well-known trusted authority, a CA (certificate authority). In the case of the web, there is a set of trusted root CAs. These are entities that are globally recognized and widely trusted, such as letsencrypt.org. The certificate (including public key) of each root CA is pre-installed in your browser or operating system.
  50. Example of this process For this example, let’s say you are communicating with a

    server on the internet, one which claims to be that of mybank.com. The server will send you its public key along with some information (such as the company name), collectively its certificate, which will be “signed” by a root certificate authority.
  51. If the CA that signed the server’s certificate is in

    your list of root CAs on your computer, then you have the public key on file and you can validate that signature, effectively validating that the server is who it claims to be.
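In practice your TLS library does this validation for you, as long as you don't switch it off. In Python, `ssl.create_default_context()` loads the system's default trusted CA certificates and turns on both chain verification and host name checking. The `who_is` helper is a hypothetical example and is defined but not called here.

```python
import socket
import ssl

# Loads the system's default trusted root CA certificates.
ctx = ssl.create_default_context()

# Reject certificates that don't chain back to one of those root CAs...
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # True
# ...and certificates that weren't issued for the host name we asked for.
print(ctx.check_hostname)                    # True

def who_is(host: str):
    """Connect to host:443; raises ssl.SSLCertVerificationError if validation fails."""
    with socket.create_connection((host, 443)) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["subject"]
```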
  52. RSA vs Diffie Hellman So as you can see above,

    RSA is used to solve the problem of identity, something that Diffie Hellman cannot do. However, DH can do something important that RSA also cannot do, and that brings us to Forward Secrecy.
  53. The Problem When you go to a website that uses

    HTTPS, your browser will receive and validate the server’s public key (part of their Certificate) and then generate a session key. It will encrypt that key with the server’s public key, using RSA, and send it to the server. That’s how the two computers do key exchange. The rest of the session uses standard symmetric encryption based on that session key.
  54. Now remember, an adversary can listen to and record this

    entire communication, because the internet is a public network, but they can’t decrypt the conversation, so it’s meaningless, right?
  55. What if, some time later, the adversary is able to

    breach the server and gain access to the private key? There are many ways this could happen.
  56. Now, every session that was recorded, from every previous communication,

    can be completely decrypted. It immediately unlocks all past secrets.
  57. The Solution Remember back to Diffie Hellman. If the two

    parties had been using that method of key exchange, with fresh key pairs for each session that are discarded afterwards, then the secret key NEVER goes across the wire, and it’s impossible to determine later.
  58. Perfect Forward Secrecy This is the principle behind PFS —

    perfect forward secrecy. We still use RSA to verify the SSL certificates including public key, but we use DH for the key exchange. This essentially provides the best of both worlds.
  59. Speaking of things you need to enable on your server,

    let's talk about HTTP Strict Transport Security — HSTS.
  60. The Problem This is based on the fact that a

    large portion of your users are going to type www.mybank.com into their browser’s address bar, without explicitly typing “https://”. The browser will default to insecure “http:”, and then, hopefully, you’ve set up your web server to notice this and immediately issue a redirect to send the browser to the secure version.
  61. The Problem However, what if that initial insecure request was

    intercepted by a malicious party? They can send their own spoofed response, directing the user to https://not-my-bank.com which might trick the user into entering their password. Can we use encryption to solve this problem?
  62. Strict Transport Security Well actually this solution just uses a

    simple HTTP header “Strict-Transport-Security” which tells the web browser to never take the user to the insecure version and always go directly to the “https:” version of the site. This is a simple thing you can do as a developer or server admin to make everyone more secure.
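As a sketch, assuming a Python WSGI application, a small middleware can add the header to every response. The one-year `max-age` and `includeSubDomains` are common choices rather than requirements, and browsers only honor the header when it arrives over HTTPS.

```python
def hsts_middleware(app, max_age: int = 31536000):
    """Wrap a WSGI app so every response carries a Strict-Transport-Security header."""
    header = ("Strict-Transport-Security", f"max-age={max_age}; includeSubDomains")

    def wrapped(environ, start_response):
        def start_with_hsts(status, headers, exc_info=None):
            return start_response(status, list(headers) + [header], exc_info)
        return app(environ, start_with_hsts)

    return wrapped

def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]

app = hsts_middleware(hello_app)

# Call the app directly with a fake start_response to inspect the headers.
captured = {}
def fake_start_response(status, headers, exc_info=None):
    captured["headers"] = headers

app({}, fake_start_response)
print(dict(captured["headers"])["Strict-Transport-Security"])
# max-age=31536000; includeSubDomains
```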
  63. DNS DNS requests go over completely unencrypted channels. Any network-level

    attacker can spoof a name request and send back an IP address to a malicious server instead. Since DNS is so insecure, this is also used to censor the internet. In the UK for example, ISPs are required to block certain websites at the DNS level.
  64. DNS over HTTPS There is a protocol called DNS over

    HTTPS or DoH which will make sure that all name lookups happen across a secure channel. The good folks at Mozilla, Cloudflare and others are working to bring this to you this year.
  65. It first appeared in Firefox nightly and is expected to

    land in Firefox stable soon. Other browser makers, including Chrome, are putting it on their roadmap too.
  66. But we still have so much left to encrypt. One

    example is email, and importantly, verifying the sender of an email.
  67. DKIM: DomainKeys Identified Mail This is a way to publish

    a public key in the DNS for your domain, saying that any email claiming to be sent from an address at your domain needs to be signed with the private key that matches this public key.
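Concretely, the public key lives in a DNS TXT record named `<selector>._domainkey.<domain>`, where the receiver learns the selector from the `s=` tag and the domain from the `d=` tag of the email's DKIM-Signature header. This sketch builds that name and splits a record into its tags; the selector, domain and key value are placeholders.

```python
def dkim_dns_name(selector: str, domain: str) -> str:
    """Where receivers look up the public key: <selector>._domainkey.<domain>."""
    return f"{selector}._domainkey.{domain}"

def parse_dkim_record(txt: str) -> dict:
    """Split a DKIM TXT record ("tag=value; tag=value; ...") into a dict."""
    tags = {}
    for part in txt.split(";"):
        name, sep, value = part.partition("=")
        if sep:
            tags[name.strip()] = value.strip()
    return tags

# Hypothetical selector, domain and (elided) key, for illustration only.
print(dkim_dns_name("mail", "mybank.com"))  # mail._domainkey.mybank.com
record = parse_dkim_record("v=DKIM1; k=rsa; p=<base64-encoded public key>")
print(record["k"])  # rsa
```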
  68. This is something you can and should do today. Almost

    every major email provider supports this.
  69. Importantly, DKIM does not encrypt the body of the message,

    but it does verify that the message really came from the sender’s domain and was not modified along the way, and that’s a good start to getting strong encryption everywhere.
  70. Common Mistakes • Sending information unencrypted, through insecure channels • Passwords saved

    with reversible encryption • Passwords hashed without using salt • Using weak ciphers (encryption algorithms) or hashes ◦ In 2006, using SHA-1 was perfectly acceptable; today, practical collision attacks against it have been demonstrated. • Using a strong cipher but with a leaky “block mode”
  71. Common Mistakes • Storing the key with the data • Putting the

    key in the source code or config file • Not verifying authenticity of a remote party • Poor random number generation • Creating a hash signature that is vulnerable to a length extension attack • "I only have a simple blog site" • 1024-bit RSA keys
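Several of these mistakes (unsalted hashes, reversible storage, weak randomness) are avoided by a deliberately slow, salted password hash. A minimal sketch with the standard library's PBKDF2; the iteration count is an assumption to tune so that hashing takes a noticeable fraction of a second on your hardware, and bcrypt, scrypt or Argon2 are equally reasonable choices.

```python
import hashlib
import hmac
import secrets

# Work factor (assumption): tune for your hardware; higher slows attackers down too.
ITERATIONS = 600_000

def hash_password(password: str) -> tuple:
    """Return (salt, digest) to store instead of the password itself."""
    salt = secrets.token_bytes(16)  # unique per password, from a cryptographically secure RNG
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def check_password(password: str, salt: bytes, stored: bytes) -> bool:
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(digest, stored)

salt, stored = hash_password("hunter2")
print(check_password("hunter2", salt, stored))  # True
print(check_password("hunter3", salt, stored))  # False

# The same password hashed again gets a new salt, so the stored digests differ.
salt2, stored2 = hash_password("hunter2")
print(stored2 != stored)  # True
```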
  72. There's a lot you can do. Understand the underpinnings of

    encryption. Stay up to date on what is considered weak or strong in cryptography. Think like an attacker: where is the weak link in your encryption?