David Evans
June 25, 2016

# Memory for Data-Oblivious Computation

## Transcript

3. ### Theory and Practice in Computing

Quotes from Maurice Wilkes’s Turing Award Lecture (1967)

Alan Turing
4. ### Secure Two-Party Computation

Alice holds a and Bob holds b. They run a cryptographic protocol, and each obtains r = f(a, b). Alice learns nothing about b, and Bob learns nothing about a.
5. ### FOCS 1982, FOCS 1986

Andrew Yao

Note: neither paper actually describes Yao’s protocol.
6. ### Yao’s Protocol: Garbled Circuits

- Function expressed as a Boolean circuit
- Garbled evaluation: no information leaked
- Ridiculously expensive (but 10¹²× cheaper than 10 years ago)

[Diagram labels: Garble, Encode, Evaluate, Decode; function f, garbled circuit F, inputs a and b, output Y; Generator, Evaluator]
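To make "garbled evaluation" concrete, here is a toy sketch of garbling a single AND gate. It is an illustration under stated assumptions, not the protocol as presented in the talk: the SHA-256 based pad, the all-zero tag used to recognise the correct row, and the names `garble_and` and `evaluate` are all hypothetical choices, and the sketch omits oblivious transfer and every standard optimisation.

```python
import hashlib
import secrets

LABEL_LEN = 16          # bytes per wire label (assumed size)
TAG = bytes(LABEL_LEN)  # all-zero tag marks a successful decryption (toy choice)

def pad(key_a: bytes, key_b: bytes) -> bytes:
    """Derive a 32-byte one-time pad from two input-wire labels."""
    return hashlib.sha256(key_a + key_b).digest()

def xor(x: bytes, y: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(x, y))

def garble_and():
    """Generator: pick two random labels per wire and encrypt the AND truth table."""
    wires = {w: (secrets.token_bytes(LABEL_LEN), secrets.token_bytes(LABEL_LEN))
             for w in ("a", "b", "out")}
    table = []
    for bit_a in (0, 1):
        for bit_b in (0, 1):
            out_label = wires["out"][bit_a & bit_b]
            row = xor(pad(wires["a"][bit_a], wires["b"][bit_b]), out_label + TAG)
            table.append(row)
    secrets.SystemRandom().shuffle(table)  # hide which row encodes which inputs
    return wires, table

def evaluate(table, label_a, label_b):
    """Evaluator: only the row matching the two held labels decrypts to a valid tag."""
    for row in table:
        plain = xor(pad(label_a, label_b), row)
        if plain[LABEL_LEN:] == TAG:
            return plain[:LABEL_LEN]
    raise ValueError("no row decrypted")
```

The evaluator holds one label per input wire and learns one output label. Without the generator's decoding information, that label does not reveal whether it stands for 0 or 1.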

8. ### University Rankings and Student Preferences

[Figure: Alice, Bob, and Colleen, with preference lists A C B, A B C, and C B A]
9. ### Stable Matching

Alice: ACB, Bob: ABC, Colleen: CBA

M = {(s₁, r₁), (s₂, r₂), …} is a stable matching if there is no pair (sᵢ, rⱼ) where both sᵢ and rⱼ prefer each other over their partners in M.
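The stability condition can be checked directly by searching for a blocking pair. The sketch below is a plain, non-secure check; the function names and the example preference lists (schools `X`, `Y`, `Z`) are hypothetical and not taken from the slides.

```python
def blocking_pairs(matching, student_prefs, school_prefs):
    """Return every (student, school) pair that prefers each other to their match.

    matching: dict student -> school (assumed one-to-one and complete)
    student_prefs / school_prefs: dict name -> list, most preferred first
    """
    matched_student = {school: student for student, school in matching.items()}
    pairs = []
    for s, prefs in student_prefs.items():
        for r in prefs:
            if r == matching[s]:
                break  # every later school is less preferred than s's match
            current = matched_student[r]
            if school_prefs[r].index(s) < school_prefs[r].index(current):
                pairs.append((s, r))
    return pairs

def is_stable(matching, student_prefs, school_prefs):
    return not blocking_pairs(matching, student_prefs, school_prefs)

# Hypothetical example data.
student_prefs = {"Alice": ["X", "Z", "Y"], "Bob": ["X", "Y", "Z"], "Colleen": ["Z", "Y", "X"]}
school_prefs = {"X": ["Bob", "Alice", "Colleen"],
                "Y": ["Alice", "Bob", "Colleen"],
                "Z": ["Colleen", "Alice", "Bob"]}
stable = {"Alice": "Y", "Bob": "X", "Colleen": "Z"}
unstable = {"Alice": "X", "Bob": "Y", "Colleen": "Z"}
```

In the `unstable` matching, Bob prefers X to Y and X prefers Bob to Alice, so (Bob, X) is a blocking pair.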

12. ### Stable Matching Applications

- Public schools in New York, Boston
- Singapore university admissions
- Medical residents in US, Canada, others (35,000 applicants)

Use a trusted third party to run the matching algorithm:

- Receives all private rankings and keeps them confidential
- Produces correct result
- Uncorrupted
13. ### Secure Two-Party Stable Matching Protocol

Each group trusts one representative

[Diagram labels: Doug, Tsinghua]
14. ### Secure Two-Party Stable Matching Protocol

Each group trusts one representative

XOR-share to 2 non-colluding parties

[Diagram labels: Doug, S, T, Tsinghua]
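XOR-sharing a private value between two non-colluding parties can be sketched as follows. The function names and the 32-bit default width are assumptions for illustration; the slides do not specify an encoding.

```python
import secrets

def xor_share(value: int, bits: int = 32):
    """Split value into two shares; each share on its own is uniformly random."""
    if not 0 <= value < (1 << bits):
        raise ValueError("value does not fit in the given bit width")
    share_s = secrets.randbits(bits)   # given to the first party
    share_t = value ^ share_s          # given to the second party
    return share_s, share_t

def reconstruct(share_s: int, share_t: int) -> int:
    """Only someone holding both shares can recover the value."""
    return share_s ^ share_t
```

As long as the two parties do not collude, neither learns anything about the submitted rankings, yet together they hold enough to run the matching protocol on the shared inputs.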

17. ### Data-dependent lookup in size-n array

- Oblivious conditionals: need to always execute all paths
- Data-dependent updates to size-n² array
19. ### Circuit for Array Update

Linear scan: need to touch every array element to hide which one is real.

[Circuit: for each j = 0, 1, 2, 3, …, the inputs i == j, a[j], and x produce a'[j]]
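The per-element selection in this circuit can be mimicked in plain Python. This is a sketch of the data flow only: in the real protocol `i`, `x`, and the array are secret values inside the garbled circuit, and the name `oblivious_write` is hypothetical.

```python
def oblivious_write(a, i, x):
    """Return a' with a'[i] = x, performing the same work for every element."""
    out = []
    for j, old in enumerate(a):
        hit = int(i == j)                      # comparison: 1 only at the target index
        out.append(hit * x + (1 - hit) * old)  # multiplexer: pick x or keep the old value
    return out
```

Every position goes through the same comparison and multiplexer, so the sequence of operations does not depend on `i`. The cost grows linearly with the array length.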
20. ### Linear Scan Doesn’t Scale

- Writing a single 32-bit integer: 32 logic gates
- Raw Yao’s performance ≈ 3M gates per second
- Write speed ≈ 100,000 elements per second (not hiding access pattern)
- For hiding access pattern, N = 2¹⁷ elements requires > 1 second per access
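The figures on this slide follow from simple arithmetic, reproduced below. The two constants come from the slide; the function names are hypothetical.

```python
GATES_PER_WRITE = 32          # gates to write one 32-bit integer (from the slide)
GATES_PER_SECOND = 3_000_000  # raw Yao throughput (from the slide)

def direct_writes_per_second():
    """Writes per second when the access pattern is not hidden."""
    return GATES_PER_SECOND / GATES_PER_WRITE

def linear_scan_seconds(n_elements):
    """Time for one hidden access: every element is rewritten."""
    return n_elements * GATES_PER_WRITE / GATES_PER_SECOND

print(direct_writes_per_second())   # 93750.0, i.e. roughly 100,000 per second
print(linear_scan_seconds(2**17))   # about 1.4 seconds
```

With 2¹⁷ = 131,072 elements, one hidden access costs about 4.2 million gates, which exceeds one second of work at 3M gates per second.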
21. ### Traditional ORAM

[Goldreich 1987]

[Diagram labels: Client, Untrusted Server, Initialize, Access]

- Security property: all initialization and access sequences of the same length are indistinguishable to the server.
- Sublinear client-side state
- Linear server-side encrypted state
22. ### RAM-SC

[Gordon, Katz, Kolesnikov, Krell, Malkin, Raykova, Vahlis 2012]

[Diagram labels: Alice, Bob, MPC Protocol, Public ORAM state (on each side), Oblivious ORAM state, Encrypted ORAM Data, Initialize, Access, Encrypted Results]
23. ### Circuit ORAM

Xiao Wang, Hubert Chan, and Elaine Shi. Circuit ORAM: On Tightness of the Goldreich-Ostrovsky Lower Bound. In ACM CCS 2015.

State-of-the-ORAM-Art in 2015

[Plot: access time, Circuit ORAM at Θ(log³ N) versus linear scan]

25. ### Problems with SQ-ORAM Design

- Requires a PRF for each ORAM access
  - Pseudo-random function: a big circuit in MPC
- Initialization requires PRF evaluations
- Requires oblivious sort twice:
  - Shuffling memory according to PRF
  - Removing dummy blocks

Solution strategy: use random permutation instead of PRF

28. ### 4-Block ORAM

Cost: 5B + B + 2B + 3B + … = 11B every 3 accesses
29. ### Linear scan cost: 4B = 12B/3

Our scheme cost: 11B/3

Less expensive than linear scan for 4 blocks (8 with overhead)

39. ### Second Access

read a[9]

Randomly select unused element

43. ### Position map

[Figure: two example permutations of the indices 0 1 2 3, namely 3 0 2 1 and 1 3 0 2]

46. ### Inverse permutation

Alice picks a random masking permutation. The composed permutation is revealed to Bob.

[The equations on this slide were lost in extraction]
47. ### Inverse permutation

Bob computes the inverse of the permutation revealed to him.

[The equations on this slide were lost in extraction]
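One way to invert a secret permutation with a random mask is sketched below. The equations on these slides did not survive extraction, so the composition order used here (mask applied first) and all names are assumptions. The sketch also runs in the clear, whereas in the protocol the masking and unmasking steps would be carried out obliviously on shared values.

```python
import random

def compose(p, q):
    """Return p∘q, the permutation i -> p[q[i]]."""
    return [p[q_i] for q_i in q]

def invert(p):
    inv = [0] * len(p)
    for i, p_i in enumerate(p):
        inv[p_i] = i
    return inv

def masked_inverse(secret_perm, rng=random):
    n = len(secret_perm)
    mask = list(range(n))
    rng.shuffle(mask)                      # Alice: random masking permutation
    revealed = compose(secret_perm, mask)  # composed permutation, revealed to Bob
    revealed_inv = invert(revealed)        # Bob: inverts what he was shown
    # Unmask: mask ∘ (secret ∘ mask)^-1 = mask ∘ mask^-1 ∘ secret^-1 = secret^-1
    return compose(mask, revealed_inv)
```

Because the mask is uniformly random, the composed permutation Bob sees is itself uniformly random and reveals nothing about the secret permutation.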
48. ### Scheme

1. Shuffle elements
2. Recreate position map
3. Service T = √(N log N) accesses

Amortized cost: Θ(√(N log³ N)) per access
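The shuffle, position-map, and service cycle, together with the "randomly select unused element" rule from the access slides, can be simulated in plaintext as below. This is a simplified sketch with no cryptography or MPC. The class name, the `trace` attribute, the choice to move the dummy block into the stash, and the default period are all assumptions rather than details confirmed by the slides.

```python
import math
import random

class SqrtOramSim:
    """Plaintext simulation of a shuffle / position-map / stash cycle."""

    def __init__(self, data, period=None, rng=None):
        self.rng = rng or random.Random()
        self.n = len(data)
        # Assumed default period; the caller may pass any T up to n.
        self.period = min(self.n, period or max(1, math.isqrt(self.n)))
        self.trace = []   # physical slots touched: what an observer would see
        self._rebuild(list(data))

    def _rebuild(self, logical):
        # Steps 1 and 2: shuffle the elements and recreate the position map.
        self.pos = list(range(self.n))
        self.rng.shuffle(self.pos)
        self.store = [None] * self.n
        for i, value in enumerate(logical):
            self.store[self.pos[i]] = (i, value)
        self.used = [False] * self.n
        self.stash = []   # blocks already fetched during this period
        self.count = 0

    def _fetch(self, slot):
        self.trace.append(slot)
        self.used[slot] = True
        self.stash.append(list(self.store[slot]))

    def access(self, index, new_value=None):
        # Step 3: check the stash, then touch exactly one unused slot.
        if any(entry[0] == index for entry in self.stash):
            unused = [s for s in range(self.n) if not self.used[s]]
            self._fetch(self.rng.choice(unused))  # randomly select unused element
        else:
            self._fetch(self.pos[index])
        entry = next(e for e in self.stash if e[0] == index)
        result = entry[1]
        if new_value is not None:
            entry[1] = new_value
        self.count += 1
        if self.count == self.period:
            self._flush()
        return result

    def _flush(self):
        logical = [None] * self.n
        for i, value in self.store:
            logical[i] = value
        for i, value in self.stash:
            logical[i] = value   # the stash holds the freshest copies
        self._rebuild(logical)
```

Within one period every access touches a distinct, previously unused slot, whether or not the requested element was already in the stash. That is why repeated reads of the same index look no different from reads of fresh indices.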

50. ### Per-Access Cost (not counting initialization)

[Plot: per-access cost for 16-byte blocks and 32-byte blocks]

Good enough for National Residency Match?
51. ### Cost of Matching

- Best previous result: 128x128 pairs in > 1000 hours [Keller & Scholl 2014]
- Using Square-Root ORAM: 512x512 pairs in 33 hours
- Scale needed for national residency match: 35,000
- Need 1000x improvement…
52. ### Scaling to National Match

- Roth-Peranson: asymmetric matchings; the algorithm that is actually used for NRMP, school matchings, etc.
- Initialize state by permuting and interleaving
- Take advantage of data-independent memory patterns: locality, batching, partitioning

54. ### Simulated 2016 US National Medical Residency Match

35,476 prospective residents matching with 4836 programs with 30,750 total slots

| Phase | Time | Non-Free Gates | Gates/second |
|---|---|---|---|
| Initialization | 2.07 hours | 34 B | 4.57 M |
| Bidding | 15.01 hours | 173 B | 3.19 M |
| Total | 17.08 hours | 207 B | 3.36 M |

Running between 2 EC2.c4xlarge nodes in same region (1 Gbps)