Cloud Security, for Real This Time
Homomorphic Encryption and the Future of Data Privacy
Will explain 1) Definition 2) Why important 3) Real world 4) Related work. Also: CS research is a light into the future, not just ivory-tower abstraction.

I grew up in a kind of naive era. We thought strong encryption would save the world from tyranny. PGP! Crypto wars! Clipper chips! Encryption as fashion choice!

Freedom Is Hard; Let’s Go Shopping! Remember 1995? Strong crypto didn’t end tyranny, but it did make shopping easier. Define SSL/TLS? Changed everything.

[Diagram labels: Browser, Server, Application]
TLS: Safe (mostly!), but must decrypt to do business
TLS gives you 1) some assurance you’re connecting to the right server (unless you own a Lenovo), 2) some protection from MITM. Good enough for shopping?

Crypto Didn’t Fix the World
• Data stolen from merchants
• Password reuse
• Metadata
• Social engineering
• Side channels
• Implementation errors
• GPG too hard to use
We tried. But strong crypto didn’t fix all our problems. Better than nothing, but can we do better?

My New Business
New business idea. Maybe you want to invest? Ask for income, SSNs of your children, what you spend on health care, bank account passwords, etc., give you (click) pretty charts. Or what if I want to prepare your taxes? I’ll need all your info, and I can ask for real money from state government on your behalf. Good idea?

Uh Oh. Is it even possible to build this kind of business? Home Depot did a lot wrong, sure, but banks who ran pretty clean shops have also suffered major data exfiltration. Need a way out.

Symmetry
Consumer: Protect PII, Zero Install
Cloud Service Provider: Nothing to Steal, Frequent Site Visits
Look at what the customer wants and what you want. Note symmetry. Customer desires may be a bit contradictory, but line up nicely with service provider desires. Symmetry in software = Opportunity!

What if? How can I prepare your taxes without asking for your data, at least not in readable form? You could encrypt and not give me the key, but then how do I perform useful computations?

Homomorphic Encryption In a Nutshell
[Diagram labels: Client, Server, Data Plaintext, Data Cyphertext, Computation, Result Cyphertext, Result Plaintext]
Define plaintext, cyphertext, computation. Cyphertext should be indistinguishable from random bits (hand waving). Secure! No key exchange (hard)! Keys stay on client. Can anyone spot the problem with this solution? Considered maybe impossible for a long time. Changed in 2009. How? Stop me now if terms don’t make sense.

Awesoma Powa! Plaintext top row. Cyphertext middle. Note symmetries. I can do a homomorphic concatenation on cyphertext! Homomorphic operations don’t have to be the same as corresponding non-homomorphic operation, but in this case it is. We’ll look at stronger choices later, but first…

Let’s launch a startup! concatenatr Join us! New business: Cloud-based, privacy preserving concatenation of strings. Get the VC $$$$, foosball table… But there’s a problem with this idea. (Not because it’s insecure. Nobody cares about that, or they wouldn’t use SnapChat.) Why won’t this work? You’ll never guess…

(Using Goldwasser and Micali’s algorithm developed 20 years earlier) Stupidly enough, it’s patented (by SAP). Cryptographers have been working on HE for a long time. Goldwasser and Micali won Turing award, but for semantic security, not HE. Chose concat example as simple/joke, found the patent later. Security industry may or may not have noticed HE, but patent lawyers have!
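As a rough illustration of why concatenation comes almost for free with a bit-by-bit scheme, here is a toy sketch in the style of Goldwasser-Micali. The primes, the helper names (`encrypt_text`, `homomorphic_concat`, and so on), and the byte encoding are all assumptions made for this example; the key is absurdly small, and nothing here is taken from the patent or the slides.

```python
import math
import random

# Toy key material (assumed values; both primes are 3 mod 4, so N - 1 is a
# quadratic non-residue modulo each of them). Far too small for real use.
P, Q = 499, 547
N = P * Q
X = N - 1


def encrypt_bit(bit, rng):
    """Encrypt one bit as y^2 * X^bit mod N for a random y coprime to N."""
    while True:
        y = rng.randrange(2, N)
        if math.gcd(y, N) == 1:
            break
    return (y * y * pow(X, bit, N)) % N


def decrypt_bit(c):
    """Bit is 0 if c is a quadratic residue mod P (Euler's criterion), else 1."""
    return 0 if pow(c, (P - 1) // 2, P) == 1 else 1


def encrypt_text(text, rng):
    """Encrypt a string bit by bit; the ciphertext is a list of integers."""
    bits = [(byte >> shift) & 1
            for byte in text.encode("utf-8")
            for shift in range(7, -1, -1)]
    return [encrypt_bit(b, rng) for b in bits]


def decrypt_text(cipher):
    """Decrypt a list of per-bit ciphertexts back into a string."""
    bits = [decrypt_bit(c) for c in cipher]
    data = bytes(int("".join(str(b) for b in bits[i:i + 8]), 2)
                 for i in range(0, len(bits), 8))
    return data.decode("utf-8")


def homomorphic_concat(c1, c2):
    """The server-side operation: it never sees a key or a plaintext."""
    return c1 + c2
```

Because every bit is encrypted independently, the server-side "homomorphic" operation is plain list concatenation, which is the same operation you would use on the plaintext.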

Unpadded RSA
Back to drawing board. Need a different algorithm. NB: Unpadded RSA is insecure! Simple, but insecure. Cryptosystem security is an end to end pipeline, not a single algorithm. Feel free to ignore the algebra; the point is that multiplying two cyphertexts gives a cyphertext of the product.
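The multiplicative property of unpadded RSA can be sketched in a few lines. The primes 61 and 53, the exponent 17, and the function names are assumed toy values chosen for this example, not anything from the slides, and the modular-inverse form of `pow` needs Python 3.8 or later.

```python
# Toy textbook RSA (assumed parameters; insecure by design and by size).
P, Q = 61, 53
N = P * Q                          # public modulus
E = 17                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)


def encrypt(m):
    """Unpadded RSA encryption: m^E mod N."""
    return pow(m, E, N)


def decrypt(c):
    """Unpadded RSA decryption: c^D mod N."""
    return pow(c, D, N)


def multiply_encrypted(c1, c2):
    """Server-side product: (a^E * b^E) mod N equals (a*b)^E mod N."""
    return (c1 * c2) % N
```

The server only ever calls `multiply_encrypted`; the client keeps `D`. Note that `encrypt` is deterministic, so equal plaintexts give equal cyphertexts, which is one reason the unpadded scheme is insecure. The product is also only correct while it stays below `N`.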

Pivot! multiplir We make products Awesome! Now add. Uhhh…. Cloud-based, privacy preserving multiplication. Get the VC $$$, front page of Hacker News, then… Click Click. Can we do better? What do we really need?

Fully Homomorphic Encryption
• Multiply
• Add, subtract, exponents, etc.
• Doesn’t have to be (quite) Turing complete
• Conditional branching and loops, but…
• Cannot perform conditional jumps based on (encrypted) user input
What are the operations I really need? Must be able to write any program, but not necessarily execute arbitrary programs. Customer and service provider agree on service in advance. Can do taxes, but not your homework. What operations give me all of the above? (Cannot perform conditional…) => Branch prediction won’t work!

Functional Completeness and Universal Gates
• NAND
• NOR
• AND and NOT
• XOR and AND
Need a new kind of computer. Want to compute anything, not just *! Let’s start from the basics. Logic gates! If we have homomorphic logic gates we can do what we need. Homomorphic * insufficient. What gates do I need to perform any computation? Homomorphic NAND would be OK. Define NOR. NOR via NANDs. De Morgan’s Laws. What does any of this mean?
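To make "NAND would be OK" concrete, here is a small sketch that builds the other gates out of NAND alone, on plain (unencrypted) bits. The function names are made up for this example; with a homomorphic NAND the same wiring would apply to cyphertexts.

```python
def nand(a, b):
    """The only primitive gate: 0 when both inputs are 1, otherwise 1."""
    return 1 - (a & b)


def not_(a):
    return nand(a, a)


def and_(a, b):
    return not_(nand(a, b))


def or_(a, b):
    # De Morgan: a OR b == NOT(NOT a AND NOT b) == NAND(NOT a, NOT b)
    return nand(not_(a), not_(b))


def nor(a, b):
    return not_(or_(a, b))


def xor(a, b):
    # Classic four-NAND construction of XOR.
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))
```

Every function here is just a fixed arrangement of `nand` calls, which is what functional completeness means in practice.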

Addition, Multiplication Over GF(2)
+ | 0 1
0 | 0 1
1 | 1 0

* | 0 1
0 | 0 0
1 | 0 1
Adding + multiplying a bit very simple. So are computers. Need building blocks which can work homomorphically but be built into anything we need. Start with bits. + looks like XOR. * looks like AND. All I need! Can grow from there.
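A quick sketch of why addition and multiplication over GF(2) are "all I need": together with the constant 1 they give NOT, NAND, and OR. The function names are assumptions for this example, and it works on plain bits rather than cyphertexts.

```python
def gf2_add(a, b):
    """Addition mod 2, which behaves like XOR."""
    return (a + b) % 2


def gf2_mul(a, b):
    """Multiplication mod 2, which behaves like AND."""
    return (a * b) % 2


def gf2_not(a):
    """NOT a == a + 1 over GF(2)."""
    return gf2_add(a, 1)


def gf2_nand(a, b):
    """NAND(a, b) == a*b + 1 over GF(2)."""
    return gf2_add(gf2_mul(a, b), 1)


def gf2_or(a, b):
    """OR(a, b) == a + b + a*b over GF(2)."""
    return gf2_add(gf2_add(a, b), gf2_mul(a, b))
```

Since NAND is universal, a scheme that supports homomorphic + and * on encrypted bits can in principle evaluate any fixed circuit.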

> def choose(first, second, choose_first):
..     return first if choose_first else second
..
> choose(True, False, True)
=> True
> choose(True, False, False)
=> False
[Circuit inputs: first, choose_first, second]
Branching hard, but: Here’s a program I wrote. Normal computers eval condition, execute selected path. …so if I have a homomorphic and, or, and not… or just nand, now I can write logic. Branching becomes a truth table. click. As a circuit. Circuits easy.
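One way to see branching become a truth table is to write `choose` with arithmetic only, so that no instruction depends on the condition. This is a sketch on plain bits (0 and 1 rather than True and False); the names `choose_circuit` and `truth_table` are invented for the example.

```python
from itertools import product


def choose_circuit(first, second, choose_first):
    """Branch-free select over GF(2): c*first + (1 + c)*second (mod 2)."""
    return (choose_first * first + (1 + choose_first) * second) % 2


def choose_branching(first, second, choose_first):
    """The ordinary version, which branches on the condition."""
    return first if choose_first else second


def truth_table():
    """All eight input rows with the circuit's output appended."""
    return [(f, s, c, choose_circuit(f, s, c))
            for f, s, c in product((0, 1), repeat=3)]
```

Both inputs are always computed and the condition only weights them, which is why the evaluator learns nothing about which path was "taken".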

> def my_factorial(n):
..     result = 1
..     while n > 1:
..         result *= n
..         n -= 1
..     return result
..
> def my_factorial_less_than_20(n):
..     result = 1
..     for i in range(2, 20):
..         result *= 1 if i > n else i
..     return result
..
> my_factorial_less_than_20(4)
=> 24
> my_factorial_less_than_20(100)
=> 121645100408832000L
> my_factorial_less_than_20(1000)
=> 121645100408832000L
Here’s another program I wrote. Explain factorial. Click. Here’s a really strange version. Why? Note n Program has interesting properties. Bounded loops are decidable! Security vs. efficiency.

[Diagram labels: Input Data Cyphertext, Add (Lossless), Multiply (Lossy), Bootstrappable Reencryption, Result Cyphertext, Multiply (Lossy)]
Found strong encryption scheme. Not perfect; has homomorphic + and lossy homomorphic *. Too many *s and can’t decrypt. We will look at bootstrapping in more detail on next slide. Explain lossy multiplication here.
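To give a feel for "lossy" multiplication, here is a toy sketch in the style of the integer-based somewhat homomorphic schemes. It is an assumption for illustration and not necessarily the scheme the slide refers to: the secret `SECRET_P`, the noise bound, and all function names are made up, and the parameters are far too small to be secure.

```python
import random

SECRET_P = 1_000_003   # secret odd integer (assumed toy value)
NOISE_BOUND = 5        # fresh noise r is drawn from 1..NOISE_BOUND


def encrypt(p, bit, rng):
    """c = p*q + 2*r + bit: the bit hides in the parity of the small noise."""
    q = rng.randrange(1, 2 ** 20)
    r = rng.randrange(1, NOISE_BOUND + 1)
    return p * q + 2 * r + bit


def noise(p, c):
    """Centered remainder of c modulo p (requires the secret key)."""
    r = c % p
    return r - p if r > p // 2 else r


def decrypt(p, c):
    """Correct only while the accumulated noise stays below p/2."""
    return noise(p, c) % 2


def add(c1, c2):
    """Homomorphic XOR: the noise terms add."""
    return c1 + c2


def multiply(c1, c2):
    """Homomorphic AND: the noise terms multiply, so they grow quickly."""
    return c1 * c2
```

Fresh noise here is at most 11. One addition at most doubles it, but each multiplication multiplies the noise terms together, so a chain of a few multiplications pushes the noise past `SECRET_P / 2`, and after that decryption is no longer reliable.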

Bootstrappable Encryption
E(E(E(plaintext), key), key2), key3
E(E(plaintext), key), key2
E(plaintext)
Plaintext
Every time you decrypt, you “reset” errors. Only a student with a thesis deadline could have thought of this. Works, but inefficient in time and space. Maybe work around? PKE is slow, but combine with SE for performance.

CryptDB
❖ Query-based encryption
❖ Requires no changes to DB server
❖ Tested on phpBB, OpenEMR, TPC-C, etc.
❖ Only 14-26% slower than unmodified apps.
http://css.csail.mit.edu/cryptdb/
Practical?

Zero Knowledge Proof Image: Wikimedia Commons / User:Dake Applications! I want to talk about 2 party secure computation, but… It’s often the case you want to talk about f(alice_value, bob_value) without revealing either arg. ZKPs do exist, but can be tricky.

2 Party Secure Computation
Alice: Sends c = E(x) to Bob
Bob: Computes and sends c’ = E(f(x,y)), ZKP of c’ correctness to Alice
Alice: Decrypt c’, compute ZKP of valid decryption, and return both to Bob
Want to compute f(aliceData, bobData). How does Alice know Bob used correct input? How does Bob know Alice didn’t lie about result?

http://www.internetsociety.org/sites/default/files/04_1_2.pdf
Read this just today. Most work in ML is making model. Also, server and client both have private data. 500× faster than “generic” two party computation tools.

Limitations
! Server doesn’t have data to, e.g. hand off to third parties
! All “new” cryptosystems are relatively untested and security not proven.
! Space issues
! Often computationally expensive
! Client complexity and deployment
! Not always clear when to choose fully homomorphic algorithms.
! Not a cure-all. Metadata and side-channels still a problem
! Moving target!
! Patent encumbered
“New” -> (Both in terms of algorithms and implementation.)

Patent Encumbrance
• “Nevertheless, the authors of this method to concede that making this scheme practical remains an open problem.”
• “There exist well known solutions for secure computation of any function… It seems hard to apply these methods to complete continuous functions or represent Real numbers, since the methods inherently work over finite fields.”
• “An encryption scheme with these two properties is called a homomorphic encryption scheme. The Paillier system is one homomorphic encryption scheme, but more ones [sic] exist.”
Hand-waving which wouldn’t be allowed in a freshman term paper

Gratitude
• Computing Arbitrary Functions of Encrypted Data, by Craig Gentry. Communications of the ACM, Vol. 53, No. 3
• Building the Swiss Army Knife, by Boaz Barak and Zvika Brakerski
• HElib (source code)
• CryptDB: Processing Queries on an Encrypted Database, by Raluca Ada Popa, Catherine M.S. Redfield, Nickolai Zeldovich, and Hari Balakrishnan