Not screwing up Encryption as a Developer

Slide 1

Slide 1 text

Slide 2

Slide 2 text

I'm Simon. I'm a developer. I like building stuff and learning stuff. Find me on Twitter: @sstur_

Slide 3

Slide 3 text

Encryption
Simply put, encryption is the process of encoding information in such a way that only authorized parties can access it. Encryption is part of the broader topic of cryptography. This talk is about how cryptography relates to you as a developer.

Slide 4

Slide 4 text

Why would a developer need to understand this?
Software is about processing information. People and businesses trust your software with their information. In many ways, it falls on you as a software developer to understand how to keep information safe.

Slide 5

Slide 5 text

Encryption is one important way to keep information safe from attackers and unauthorized parties.

Slide 6

Slide 6 text

Uses of Cryptography
● Communicating without letting unauthorized parties read the message.
● Verifying the integrity of a message or file: being sure it has not been modified.
● Storing data in a way that only you can later read it.
● Validating that the party you are communicating with is indeed who they claim to be.
● Saving passwords in a way that cannot be reversed to see the original password.
● Generating unique IDs, such as in Git (...)

Slide 7

Slide 7 text

Security Fail
There are many ways we, as developers, fail at security, such as:
● forgetting to password protect a database
● exposing private data through a web interface or API
● writing software with vulnerabilities that allow an attacker to run code on your server
● publishing sensitive information to a public repository

Slide 8

Slide 8 text

Crypto Fail
One category of ways developers fail at security is around cryptography.
● Sending information unencrypted, through insecure channels
● Using weak encryption
● Not verifying authenticity of a remote party
● Storing sensitive info like passwords in insecure ways

Slide 9

Slide 9 text

Let's talk
In order to understand how to do this correctly as a developer, we should first talk about:
1. What problems we are trying to solve
2. The different kinds of encryption
3. How they work together to solve such problems

Slide 10

Slide 10 text

Cryptography can help us
● Securely communicate with someone over an insecure channel
● Securely store data so that only we (or someone we trust) can retrieve it
● Trust: Verify that an unknown party is who they claim to be
● Data integrity: Verify that some data sent from that party has not been modified by someone
● Key derivation: Create an encryption key from a text such as a password

Slide 11

Slide 11 text

Types of cryptography
There are 3 important types of cryptography we'll explore today:
● Hashing
● Symmetric Encryption
● Asymmetric Encryption, including:
  ○ key exchange
  ○ public key cryptography

Slide 12

Slide 12 text

Hashing

Slide 13

Slide 13 text

Hashing
A hash is a sort of one-way message digest. It takes some data of arbitrary length as input (e.g. a password, or an entire file) and generates a fixed-length output (the digest or hash) that "represents" the input in a way that follows these two principles:
● Deterministic: The same input will always generate the same output.
● One way: The input cannot be determined from the output. Information is lost in the process, making it impossible to reconstruct the original.
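
To make those two principles concrete, here is a minimal sketch using Python's standard hashlib module. The helper name sha256_hex and the sample inputs are only illustrative assumptions, not something from the talk.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


short_digest = sha256_hex(b"password123")     # a short input
long_digest = sha256_hex(b"x" * 1_000_000)    # a one-megabyte input

# Deterministic: hashing the same input again gives the same digest.
same_again = sha256_hex(b"password123") == short_digest

# Fixed length: 256 bits = 64 hex characters, whatever the input size.
digest_lengths = (len(short_digest), len(long_digest))
```

Nothing in the digest lets you walk back to the input; the only generic way to "reverse" it is to guess inputs and hash them.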

Slide 14

Slide 14 text

Collisions
Since the output (hash) is a fixed size, which may be fewer bytes than the input, it mathematically cannot represent as much data. That means there are fewer possible hashes than there are different inputs. Thus, it's possible that two different inputs result in the same hash value. This is called a collision.
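
The counting argument can be sketched with a deliberately tiny hash. The tiny_hash function below is a made-up toy (the first byte of SHA-256), chosen only so that a collision is quick to find; it is an assumption for illustration and says nothing about the strength of full SHA-256.

```python
import hashlib


def tiny_hash(data: bytes) -> int:
    """Toy 8-bit hash: the first byte of SHA-256, so only 256 possible outputs."""
    return hashlib.sha256(data).digest()[0]


def find_collision():
    """Hash 257 distinct inputs; with only 256 outputs, two of them must collide."""
    seen = {}
    for i in range(257):
        message = str(i).encode()
        value = tiny_hash(message)
        if value in seen:
            return seen[value], message
        seen[value] = message
    raise AssertionError("unreachable: 257 inputs cannot fit into 256 distinct outputs")


first, second = find_collision()
```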

Slide 15

Slide 15 text

Avoiding Collisions
A good hash algorithm minimizes the probability of collisions occurring in the wild. Take the SHA-256 hashing algorithm, for example (the one used in Bitcoin mining). No two inputs have ever been discovered that produce the same output, and the probability of finding such a pair in our lifetime is incredibly small.

Slide 16

Slide 16 text

Uses of Hashing
Hashing is often used as a "checksum" which can verify the integrity of some data, for example to make sure a network error didn't cause data corruption during transmission.

Slide 17

Slide 17 text

Uses of Hashing
Another use case is to sign a message using a secret that only you know. If you later receive that message back (e.g. a session cookie), then you know that it was not modified. One way to do this is using HMAC: Hash-based Message Authentication Code.
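
As a rough sketch of the session-cookie case, Python's standard hmac module can sign and verify a value. The SECRET_KEY variable, the cookie contents and the helper names are hypothetical; a real application would load the key from secure storage rather than generate it at import time.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; generated here only so the sketch runs.
SECRET_KEY = secrets.token_bytes(32)


def sign(message: bytes) -> str:
    """Return an HMAC-SHA256 tag for message, as hex."""
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def verify(message: bytes, tag: str) -> bool:
    """Check a tag using a constant-time comparison."""
    return hmac.compare_digest(sign(message), tag)


cookie = b"user_id=42"
tag = sign(cookie)
```

If the cookie comes back as user_id=1 with the old tag, verify returns False, because producing a matching tag requires the secret.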

Slide 18

Slide 18 text

HMAC

Slide 19

Slide 19 text

HMAC
If you and I both know the secret, then I can send you a signed message and you can verify that it was indeed from me, and not modified by someone intercepting the message. But why is it hashed twice?

Slide 20

Slide 20 text

Length Extension Attack
There's a weakness in some hashes that allows a length extension attack. When a Merkle–Damgård based hash is misused as a message authentication code with the construction H(secret + message), and the message and the length of the secret are known, a length extension attack allows anyone to append extra information to the end of the message and produce a valid hash without knowing the secret.
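
To show where the two hash passes come from, here is a sketch of the HMAC construction written out by hand for SHA-256 and checked against Python's hmac module. The function name hmac_sha256_manual is made up for illustration; in practice you would simply call hmac.new.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 processes input in 64-byte blocks


def hmac_sha256_manual(key: bytes, message: bytes) -> bytes:
    """HMAC written out explicitly: H(outer_pad + H(inner_pad + message))."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()   # long keys are hashed first
    key = key.ljust(BLOCK_SIZE, b"\x00")     # then padded to the block size
    inner_pad = bytes(b ^ 0x36 for b in key)
    outer_pad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(inner_pad + message).digest()   # first pass
    return hashlib.sha256(outer_pad + inner).digest()      # second pass


key = b"shared secret"
message = b"amount=100&to=alice"
manual_tag = hmac_sha256_manual(key, message)
library_tag = hmac.new(key, message, hashlib.sha256).digest()
```

Because the published tag is the output of the outer pass, an attacker never sees the result of the inner hash, which is what a length extension would need to continue from.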

Slide 21

Slide 21 text

But regardless of good hashing practice, this does not hide the contents of the message from anyone. For that we need to use encryption.

Slide 22

Slide 22 text

Symmetric Encryption

Slide 23

Slide 23 text

Symmetric Encryption
Going far back in history, people have needed to encode messages so that only the intended recipient can understand them. In the very old days, the "method of encryption" was the secret. In modern encryption, the method of encryption (cipher) is public but the key is secret.

Slide 24

Slide 24 text

Substitution Cipher
The simplest form of using a key to encrypt a message is a "substitution cipher" in which each letter is replaced by a different letter. In such a system, the "key" is the list of substitutions, so the recipient can swap back in the original letters.
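
A minimal sketch of such a cipher, where the key is literally a table of letter substitutions. The make_key helper and its seed parameter are assumptions for illustration (a seeded shuffle keeps the example repeatable, and a letter may occasionally map to itself); none of this is secure.

```python
import random
import string


def make_key(seed: int) -> dict:
    """Build a substitution table by shuffling the alphabet (toy key generation)."""
    letters = list(string.ascii_lowercase)
    shuffled = letters[:]
    random.Random(seed).shuffle(shuffled)
    return dict(zip(letters, shuffled))


def encrypt(text: str, key: dict) -> str:
    """Replace each lowercase letter using the table; leave other characters alone."""
    return "".join(key.get(ch, ch) for ch in text)


def decrypt(text: str, key: dict) -> str:
    """Apply the reversed table to swap the original letters back in."""
    reverse = {cipher: plain for plain, cipher in key.items()}
    return "".join(reverse.get(ch, ch) for ch in text)


key = make_key(seed=2024)
secret = encrypt("attack at dawn", key)
restored = decrypt(secret, key)
```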

Slide 25

Slide 25 text

Substitution Cipher

Slide 26

Slide 26 text

Substitution Cipher
However, this is easily broken, even without a computer. Anyone know how?

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

A Stronger Substitution Cipher
One way to make that stronger is to change the character mapping after each character is processed. The mapping would change according to some predetermined method.
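
One very simple way to picture a mapping that changes per character is a shift that advances at every position. This rolling_shift function and its start/step parameters are a made-up toy, far simpler than any historical machine, and only meant to show the idea.

```python
import string


def rolling_shift(text: str, start: int, step: int, decrypt: bool = False) -> str:
    """Shift each lowercase letter by an amount that changes with its position."""
    out = []
    for position, ch in enumerate(text):
        if ch in string.ascii_lowercase:
            shift = (start + position * step) % 26   # the mapping changes every character
            if decrypt:
                shift = -shift
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)


# The same plaintext letter no longer maps to the same ciphertext letter.
encrypted = rolling_shift("aaaa", start=3, step=1)
decrypted = rolling_shift(encrypted, start=3, step=1, decrypt=True)
```

Here the pair (start, step) plays the role of the key: both sides need it to stay in sync.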

Slide 29

Slide 29 text

Enigma
This is how the Enigma machine worked in WW2. The way in which the mapping changed with each keypress was effectively the key.

Slide 30

Slide 30 text

Distributing the Key
These keys were distributed manually, on paper, and the machine needed to be reconfigured with each new key.

Slide 31

Slide 31 text

This brings us to the next important part of encryption, key exchange.

Slide 32

Slide 32 text

Key Exchange
In symmetric ciphers, like the Enigma or modern ones such as AES, both parties need to have the same secret key. Getting that key from one person to another was historically done physically. This can present a huge challenge. More on this later.

Slide 33

Slide 33 text

Blocks of Data
Modern symmetric algorithms, including AES, encrypt data in blocks. Data is divided up into equal-size blocks (the block size of the algorithm), padded if necessary, and each block is encrypted individually.

Slide 34

Slide 34 text

Block Mode: ECB
If you simply use the key to encrypt each block independently, that is "ECB" mode; the name refers to the block mode of the algorithm. The problem is that encryption is deterministic for a given key, meaning that two identical input blocks will result in identical output blocks. Over many blocks, a pattern can emerge.
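
The leak can be sketched without a real cipher. Below, toy_encrypt_block is a stand-in assumption for AES (it just XORs the block with the key, which is not secure); the point is only that ECB feeds each block through the same deterministic function.

```python
import secrets

BLOCK_SIZE = 16  # bytes; the same block size AES uses


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """Toy stand-in for a real block cipher such as AES. NOT secure."""
    return bytes(k ^ b for k, b in zip(key, block))


def ecb_encrypt(key: bytes, plaintext: bytes) -> list:
    """Encrypt every block independently (ECB). Assumes plaintext is already padded."""
    if len(plaintext) % BLOCK_SIZE != 0:
        raise ValueError("plaintext must be a multiple of the block size")
    blocks = [plaintext[i:i + BLOCK_SIZE] for i in range(0, len(plaintext), BLOCK_SIZE)]
    return [toy_encrypt_block(key, block) for block in blocks]


key = secrets.token_bytes(BLOCK_SIZE)
plaintext = b"ATTACK AT DAWN!!" * 2 + b"RETREAT AT DUSK!"
ciphertext_blocks = ecb_encrypt(key, plaintext)

# The first two plaintext blocks are identical, and so are their ciphertext blocks.
repeated_block_leaks = ciphertext_blocks[0] == ciphertext_blocks[1]
```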

Slide 35

Slide 35 text

No content

Slide 36

Slide 36 text

The solution is to use a block mode other than the naïve “ECB” mode.

Slide 37

Slide 37 text

Block Mode: CBC
CBC mode, for example, uses a randomly generated IV (initialization vector) as a sort of salt to obfuscate the first block before encryption. Then each subsequent block derives its "salt" from the encrypted representation of the previous block.

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

Block Mode: CBC
Thus, two messages with the same content will result in different output, assuming the IV is different each time. The IV itself is not secret; in fact, it must be provided along with the encrypted data, since it is needed for decryption.
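
The chaining can be sketched as follows. As before, the XOR-based toy_encrypt_block is an assumed stand-in for a real cipher such as AES, and with such a weak toy the chaining itself offers no real protection; the sketch only shows the data flow and why the IV has to travel with the ciphertext.

```python
import secrets

BLOCK_SIZE = 16


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """Toy stand-in for a real block cipher. NOT secure."""
    return xor_bytes(key, block)


def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    return xor_bytes(key, block)


def cbc_encrypt(key: bytes, plaintext: bytes, iv: bytes = None):
    """Mix each block with the previous ciphertext block (or the IV) before encrypting."""
    if len(plaintext) % BLOCK_SIZE != 0:
        raise ValueError("plaintext must be a multiple of the block size")
    if iv is None:
        iv = secrets.token_bytes(BLOCK_SIZE)   # fresh random IV per message
    previous, out = iv, []
    for i in range(0, len(plaintext), BLOCK_SIZE):
        mixed = xor_bytes(plaintext[i:i + BLOCK_SIZE], previous)
        previous = toy_encrypt_block(key, mixed)
        out.append(previous)
    return iv, b"".join(out)                   # the IV is returned alongside the data


def cbc_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    previous, out = iv, []
    for i in range(0, len(ciphertext), BLOCK_SIZE):
        block = ciphertext[i:i + BLOCK_SIZE]
        out.append(xor_bytes(toy_decrypt_block(key, block), previous))
        previous = block
    return b"".join(out)


key = bytes(range(BLOCK_SIZE))
message = b"ATTACK AT DAWN!!" * 2
iv_a, ciphertext_a = cbc_encrypt(key, message, iv=bytes(BLOCK_SIZE))
iv_b, ciphertext_b = cbc_encrypt(key, message, iv=bytes([1]) * BLOCK_SIZE)
```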

Slide 40

Slide 40 text

No content

Slide 41

Slide 41 text

Asymmetric Encryption

Slide 42

Slide 42 text

Asymmetric Encryption
Prior to the 1970s, the only way to use encryption to communicate with someone was if you both had a shared secret: some encryption key that you both know but no one else knows. Securely getting a shared secret to someone presented many challenges because it could not be done electronically.

Slide 43

Slide 43 text

But then, in the 1970s, we got Public Key Cryptography in two important forms.

Slide 44

Slide 44 text

In 1976, Whitfield Diffie and Martin Hellman published a concept for negotiating a secret key over an insecure channel. This is a fascinating method known as Diffie-Hellman key exchange and is used to this day.

Slide 45

Slide 45 text

Ron Rivest, Adi Shamir, and Leonard Adleman at MIT came up with what is now known as RSA, a method of asymmetric encryption involving a public key and a private key. This is used to this day for many things including TLS (HTTPS).

Slide 46

Slide 46 text

Diffie-Hellman
Let's start with Diffie-Hellman, a method for secure key exchange. The idea is that two parties can communicate in public, and yet still end up with a shared secret that no one else can guess, even if someone is listening in to the whole exchange.
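
The exchange can be sketched with modular exponentiation. The parameters P and G below are toy assumptions picked so the example runs instantly (P is a known prime but far too small for real use), and the names alice and bob are just labels; real systems use standardized groups or elliptic curves from a vetted library.

```python
import secrets

P = 2**127 - 1   # a known prime; toy-sized, NOT suitable for real use
G = 3            # public base, also a toy choice


def make_keypair():
    """Pick a random private exponent and compute the public value G^private mod P."""
    private = secrets.randbelow(P - 2) + 1
    public = pow(G, private, P)
    return private, public


def shared_secret(my_private: int, their_public: int) -> int:
    """Combine my private value with the other side's public value."""
    return pow(their_public, my_private, P)


alice_private, alice_public = make_keypair()
bob_private, bob_public = make_keypair()

# Only the public values cross the wire; each side computes the secret locally.
alice_secret = shared_secret(alice_private, bob_public)
bob_secret = shared_secret(bob_private, alice_public)
```

Both sides end up with G^(a*b) mod P, while an eavesdropper who saw only the two public values would have to recover one of the private exponents to compute it.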

Slide 47

Slide 47 text

No content

Slide 48

Slide 48 text

Diffie-Hellman
To extend the paint analogy a little more, it's based on the principle that it's easy to mix two colors together, but difficult to determine which two colors went into the mixture. This is a form of one-way function (loosely called a "trapdoor" function here): a mathematical problem that is hard to solve but, given a proposed solution, is easy to verify, much like a combination lock.

Slide 49

Slide 49 text

Asymmetric encryption relies on mathematical problems of this kind. RSA, for example, relies on the difficulty of prime factorization: it's easy to multiply two large prime numbers together, but hard to determine which two primes were multiplied, given only the product. Diffie-Hellman relies on a related hard problem, the discrete logarithm.
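
The asymmetry can be sketched with small numbers. The primes below are tiny illustrative choices and trial division is the most naive factoring method, so this only hints at the gap; with primes hundreds of digits long, multiplying stays fast while no known method factors the product in practical time.

```python
def factor_semiprime(n: int):
    """Recover p and q from n = p * q by trial division (the slow direction)."""
    if n % 2 == 0:
        return 2, n // 2
    candidate = 3
    while candidate * candidate <= n:
        if n % candidate == 0:
            return candidate, n // candidate
        candidate += 2
    raise ValueError("n has no non-trivial factors")


p, q = 10007, 10009          # two small primes, chosen only for illustration
product = p * q              # the easy direction: a single multiplication
recovered = factor_semiprime(product)   # the hard direction: thousands of trial divisions
```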

Slide 50

Slide 50 text

Use case: Diffie-Hellman
For example, the Diffie-Hellman algorithm is used by SSH. At the beginning of the connection, the two computers establish a shared secret using DH and then use that to derive the key used in symmetric encryption for the rest of the session.

Slide 51

Slide 51 text

Public Key Encryption
RSA is one of the oldest and most commonly used forms of public key encryption, a type of asymmetric encryption that uses a public + private key pair. Unlike symmetric encryption, there are two keys. Anything encrypted by the public key can only be decrypted by the private key, and vice versa.
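
Here is a textbook-sized sketch of the two-key idea. The primes 61 and 53 and the exponent 17 are toy assumptions; this "raw" RSA omits padding and uses keys that are trivially breakable, so it is only a picture of the math, not something to deploy. It needs Python 3.8+ for the modular inverse via pow.

```python
# Toy RSA key generation with tiny primes (illustration only, NOT secure).
P, Q = 61, 53
N = P * Q                      # public modulus
PHI = (P - 1) * (Q - 1)
E = 17                         # public exponent
D = pow(E, -1, PHI)            # private exponent: modular inverse of E (Python 3.8+)


def apply_key(value: int, exponent: int) -> int:
    """Encrypting and decrypting are the same operation with different exponents."""
    return pow(value, exponent, N)


message = 65                   # a message must be a number smaller than N

# Encrypt with the public key, decrypt with the private key...
ciphertext = apply_key(message, E)
recovered = apply_key(ciphertext, D)

# ...and vice versa: transform with the private key, undo with the public key.
transformed = apply_key(message, D)
undone = apply_key(transformed, E)
```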

Slide 52

Slide 52 text

Use Cases
This allows you to send a message that only the receiver can decrypt. There are several interesting uses; we'll focus on two:
● Establishing a shared secret
● Signing a message for the public

Slide 53

Slide 53 text

1. Key Exchange
Establishing a shared secret between two parties, over a public communication channel, just like Diffie-Hellman. For example, if I know your public key, I can generate a random secret, encrypt it with your public key and send it to you. You can then decrypt it and we have a shared secret. The advantage is that it doesn't require the sort of chatty, back-and-forth communication that Diffie-Hellman requires.

Slide 54

Slide 54 text

The disadvantage of this is that it does not provide forward secrecy, something we’ll talk about more soon.

Slide 55

Slide 55 text

Signing a message with RSA
Imagine I want to send you a file, and I want to include a way for you to know that it hasn't been modified. I can hash the file and provide that hash. But that alone doesn't really prevent intentional modification by a third party, because the adversary can generate a hash too.

Slide 56

Slide 56 text

2. Signing a message
I would actually encrypt that hash value using my private key. Assuming you know my public key, you can decrypt my hash, compute your own hash of the file and verify that the two hashes match. This effectively proves that the file came from me (or someone with my private key) and has not been modified.

Slide 57

Slide 57 text

The hash encrypted with my private key is effectively the signature. But you need to know my public key for this to work.
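
The hash-then-sign flow can be sketched with raw RSA arithmetic. The two Mersenne primes, the exponent 65537 and the helper names are assumptions for illustration; the key is far too small and the scheme lacks the padding real signature schemes use, so treat this as a diagram in code rather than an implementation. It needs Python 3.8+ for the modular inverse via pow.

```python
import hashlib

# Toy key pair built from two known Mersenne primes (illustration only, NOT secure).
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537                                # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))        # private exponent (Python 3.8+)


def file_hash(data: bytes) -> int:
    """Hash the file and reduce it to a number smaller than the modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """'Encrypt' the hash with the private key; the result is the signature."""
    return pow(file_hash(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """'Decrypt' the signature with the public key and compare it to our own hash."""
    return pow(signature, E, N) == file_hash(data)


document = b"pay alice 100"
signature = sign(document)
```

Anyone holding (N, E) can run verify, but only the holder of D can produce a signature that passes.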

Slide 58

Slide 58 text

Getting the Public Key
You could get it separately, from a trusted third party, but then we have a new problem of securely distributing everyone's public key. This isn't really feasible at web scale.

Slide 59

Slide 59 text

The way that it's actually done is that I send you the public key along with the file and the signature.

Slide 60

Slide 60 text

Anyone could send you a file with a signature and a public key, but only I can send you a document with a signature that was generated from my key pair.

Slide 61

Slide 61 text

So how do you know it’s actually my public key?

Slide 62

Slide 62 text

You need some way to trust that the sender is who they say they are. This is essentially a new problem, one of identity and trust.

Slide 63

Slide 63 text

Certificate Pinning
If the two computers are both controlled by you, such as servers in different data centers, and you want to be sure no one intercepts the communication, you can use "certificate pinning", which is just pre-loading the "certificate" of the other machine on each machine.

Slide 64

Slide 64 text

Of course, that doesn't work for the public, who don't have a database of all the valid certificates of every server on the internet.

Slide 65

Slide 65 text

3. Chain of trust
The most common approach to this is to use PKI (public key infrastructure) to establish a chain of trust. This is an important part of SSL (more accurately TLS) used in HTTPS.

Slide 66

Slide 66 text

No content

Slide 67

Slide 67 text

Public Key Infrastructure
We establish a chain of trust back to a well-known trusted authority, a CA (certificate authority). In the case of the web, there is a set of trusted root CAs. These are entities that are globally recognized and widely trusted, such as letsencrypt.org. The certificate (including public key) of each root CA is pre-installed in your browser or operating system.

Slide 68

Slide 68 text

No content

Slide 69

Slide 69 text

Example of this process
For this example, let's say you are communicating with a server on the internet that claims to be mybank.com. The server will send you its public key along with some information (such as the company name). Together these form its certificate, which will be "signed" by a root certificate authority.

Slide 70

Slide 70 text

If the CA that signed the server’s certificate is in your list of root CAs on your computer, then you have the public key on file and you can validate that signature, effectively validating that the server is who it claims to be.

Slide 71

Slide 71 text

RSA vs Diffie-Hellman
So as you can see above, RSA is used to solve the problem of identity, something that Diffie-Hellman cannot do. However, DH can do something important that RSA cannot do, and that brings us to Forward Secrecy.

Slide 72

Slide 72 text

The Problem
When you go to a website that uses HTTPS, your browser will receive and validate the server's public key (part of its certificate) and then generate a session key. It will encrypt that key with the server's public key, using RSA, and send it to the server. That's how the two computers do key exchange. The rest of the session uses standard symmetric encryption based on that session key.

Slide 73

Slide 73 text

Now remember, an adversary can listen to and record this entire communication, because the internet is a public network, but they can’t decrypt the conversation, so it’s meaningless, right?

Slide 74

Slide 74 text

What if, some time later, the adversary is able to breach the server and gain access to the private key? There are many ways this could happen.

Slide 75

Slide 75 text

Now every session that was recorded, from every previous communication, can be completely decrypted. It immediately unlocks all past secrets.

Slide 76

Slide 76 text

The Solution
Remember back to Diffie-Hellman. If the two parties had been using that method of key exchange, then the secret key NEVER goes across the wire, so it cannot be recovered later from the recorded traffic.

Slide 77

Slide 77 text

Perfect Forward Secrecy
This is the principle behind PFS (perfect forward secrecy). We still use RSA to verify the server's certificate, including its public key, but we use DH for the key exchange. This essentially provides the best of both worlds.

Slide 78

Slide 78 text

But you need to enable PFS on your server.

Slide 79

Slide 79 text

Speaking of things you need to enable on your server, let's talk about HTTP Strict Transport Security — HSTS.

Slide 80

Slide 80 text

The Problem
This is based on the fact that a large portion of your users are going to type www.mybank.com into their browser's address bar, without explicitly typing "https://". The browser will default to insecure "http:" and then, hopefully, you've set up your web server to notice this and immediately issue a redirect to send the browser to the secure version.

Slide 81

Slide 81 text

The Problem
However, what if that initial insecure request was intercepted by a malicious party? They can send their own spoofed response, directing the user to https://not-my-bank.com, which might trick the user into entering their password. Can we use encryption to solve this problem?

Slide 82

Slide 82 text

Strict Transport Security
Well, actually, this solution just uses a simple HTTP header, "Strict-Transport-Security", which tells the web browser to never take the user to the insecure version and always go directly to the "https:" version of the site. This is a simple thing you can do as a developer or server admin to make everyone more secure.
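
As a small sketch, the header is just a name and a value sent on HTTPS responses. The hsts_header helper, the one-year max-age and the includeSubDomains choice are illustrative assumptions; how you attach the header depends on your web server or framework.

```python
def hsts_header(max_age_seconds: int = 31536000, include_subdomains: bool = True):
    """Build the Strict-Transport-Security header as a (name, value) pair."""
    value = f"max-age={max_age_seconds}"        # how long the browser should remember
    if include_subdomains:
        value += "; includeSubDomains"          # also apply the rule to subdomains
    return ("Strict-Transport-Security", value)


name, value = hsts_header()
```

Once a browser has seen this header over HTTPS, it upgrades later visits to "https:" by itself for the stated period, so the insecure first hop no longer happens.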

Slide 83

Slide 83 text

OK, but what about the DNS request? That's still the weak link in the chain, right?

Slide 84

Slide 84 text

DNS
DNS requests go over completely unencrypted channels. Any network-level attacker can spoof the response to a name request and send back the IP address of a malicious server instead. Since DNS is so insecure, this is also used to censor the internet. In the UK, for example, ISPs are required to block certain websites at the DNS level.

Slide 85

Slide 85 text

The way to solve this, of course, is with encryption!

Slide 86

Slide 86 text

DNS over HTTPS
There is a protocol called DNS over HTTPS, or DoH, which makes sure that all name lookups happen across a secure channel. The good folks at Mozilla, Cloudflare and others are working to bring this to you this year.

Slide 87

Slide 87 text

It first appeared in Firefox nightly and is expected to land in Firefox stable soon. Other browser makers, including Chrome, are putting it on their roadmap too.

Slide 88

Slide 88 text

But we still have so much left to encrypt. One example is email and, importantly, verifying the sender of an email.

Slide 89

Slide 89 text

DKIM: DomainKeys Identified Mail
This is a way to publish a public key in the DNS for your domain, saying that any email claiming to be sent from an address at your domain needs to be signed with the private key that matches this public key.

Slide 90

Slide 90 text

This is something you can and should do today. Almost every major email provider supports this.

Slide 91

Slide 91 text

Importantly, DKIM does not encrypt the body of the message, but it does verify the authenticity of the sender, and that's a good start to getting strong encryption everywhere.

Slide 92

Slide 92 text

So what are common mistakes that developers make?

Slide 93

Slide 93 text

Common Mistakes
● Sending information unencrypted, through insecure channels
● Passwords saved with reversible encryption
● Passwords hashed without using salt
● Using weak ciphers (encryption algorithms) or hashes
  ○ In 2006, using SHA-1 was perfectly acceptable; today, practical collision attacks against it exist.
● Using a strong cipher but with a leaky "block mode"
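
For the password items in that list, a sketch of salted, deliberately slow hashing with the standard library's PBKDF2 might look like this. The iteration count, salt size and function names are assumptions for illustration; check current guidance for your platform before settling on parameters.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000   # assumed work factor for this sketch; tune for your hardware


def hash_password(password: str, salt: bytes = None):
    """Derive a one-way hash from the password with a per-user random salt."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest            # store both; the salt is not secret


def check_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the hash with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


salt_a, hash_a = hash_password("correct horse")
salt_b, hash_b = hash_password("correct horse")
```

Because each call draws a new salt, two users with the same password end up with different stored hashes, which defeats precomputed lookup tables.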

Slide 94

Slide 94 text

Common Mistakes
● Storing the key with the data
● Putting the key in the source code or config file
● Not verifying authenticity of a remote party
● Poor random number generation
● Creating a hash signature that is vulnerable to a length extension attack
● "I only have a simple blog site"
● 1024-bit RSA keys
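
On the random number point, a brief sketch: Python's secrets module draws from the operating system's cryptographically secure source, whereas the random module is designed for simulations and is predictable. The variable names below are only illustrative.

```python
import secrets

# Suitable for keys, tokens and IVs: backed by the OS's secure random source.
encryption_key = secrets.token_bytes(32)     # 256 bits of key material
session_id = secrets.token_urlsafe(32)       # URL-safe text token from 32 random bytes

# By contrast, random.random() and friends are predictable by design
# and should not be used for anything security-sensitive.
```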

Slide 95

Slide 95 text

There's a lot you can do. Understand the underpinnings of encryption. Stay up to date on what is considered weak or strong in terms of cryptography. Think like an attacker: where is the weak link in your encryption?