Memory for Data-Oblivious Computation
David Evans
University of Virginia
oblivc.org
Slide 2
Slide 2 text
Memory for
Data-Oblivious Computation
David Evans
University of Virginia
www.cs.virginia.edu/evans
Slide 3
Slide 3 text
Theory and Practice in Computing
Quotes from Maurice Wilkes’s Turing Award Lecture (1967)
Alan Turing
Slide 4
Slide 4 text
Secure Two-Party Computation
Alice (input a) and Bob (input b) run a cryptographic protocol; both learn r = f(a, b)
Alice learns nothing about b; Bob learns nothing about a
Slide 5
Slide 5 text
FOCS 1982
FOCS 1986
Note: neither paper actually describes Yao’s protocol.
Andrew Yao
Slide 6
Slide 6 text
Yao’s Protocol: Garbled Circuits
Function expressed as a Boolean circuit
Garbled evaluation: no information leaked
Ridiculously expensive (but 10^12 times cheaper than 10 years ago)
Diagram: Garble (f → garbled circuit F), Encode (inputs a, b), Evaluate (→ Y), Decode; parties: Generator, Evaluator
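To make the Garble / Evaluate / Decode steps concrete, here is a minimal Python sketch of a single garbled AND gate in the textbook style: each wire gets two random 16-byte labels, each table row encrypts the output label under a hash of the two input labels, and zero padding lets the Evaluator recognize the one row it can open. The SHA-256 key derivation, the padding check, and the names `garble_and` and `evaluate` are assumptions for illustration; practical systems add optimizations such as free-XOR and half-gates, and the Evaluator obtains its own input label through oblivious transfer, which is not shown.

```python
import hashlib
import itertools
import secrets

LABEL_BYTES = 16
TAG = bytes(16)  # zero padding: lets the Evaluator recognize the row it can decrypt

def H(k1: bytes, k2: bytes) -> bytes:
    return hashlib.sha256(k1 + k2).digest()  # 32 bytes = label + tag

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def garble_and():
    """Generator side: random labels for wires a, b (inputs) and c (output)."""
    labels = {w: (secrets.token_bytes(LABEL_BYTES), secrets.token_bytes(LABEL_BYTES))
              for w in "abc"}
    table = []
    for va, vb in itertools.product((0, 1), repeat=2):
        out = labels["c"][va & vb]
        table.append(xor(H(labels["a"][va], labels["b"][vb]), out + TAG))
    secrets.SystemRandom().shuffle(table)  # hide which row belongs to which inputs
    return labels, table

def evaluate(table, ka: bytes, kb: bytes) -> bytes:
    """Evaluator side: holds one label per input wire, learns one output label."""
    for row in table:
        plain = xor(H(ka, kb), row)
        if plain[LABEL_BYTES:] == TAG:
            return plain[:LABEL_BYTES]
    raise ValueError("no row decrypted")
```

The Evaluator ends up with a single output label, which reveals nothing until it is decoded, here by comparing it against the Generator's `labels["c"]`.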
Slide 7
Slide 7 text
Motivating Application:
Secure Stable Matching
Slide 8
Slide 8 text
Student Preferences
Alice: A C B
Bob: A B C
Colleen: C B A
University Rankings
Slide 9
Slide 9 text
Stable Matching
Alice: A C B
Bob: A B C
Colleen: C B A
$M = \{(s_1, r_1), (s_2, r_2), \dots\}$ is a stable matching if there is no pair $(s_i, r_j)$ where both $s_i$ and $r_j$ prefer each other over their partners in $M$
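The stability definition can be checked directly by searching for a blocking pair. This is a minimal sketch, assuming a complete one-to-one matching and preference lists ordered from most to least preferred; the university rankings below are hypothetical, since the slide only gives the students' preferences, and `find_blocking_pair` is an illustrative name.

```python
def find_blocking_pair(matching, student_prefs, school_prefs):
    """Return a pair (s, r) that blocks `matching`, or None if it is stable.

    matching maps each student to one school (one-to-one, everyone matched).
    Preference lists go from most to least preferred.
    """
    student_at = {r: s for s, r in matching.items()}
    for s, prefs in student_prefs.items():
        for r in prefs:
            if r == matching[s]:
                break                       # every later school is worse for s
            ranking = school_prefs[r]
            if ranking.index(s) < ranking.index(student_at[r]):
                return (s, r)               # s and r prefer each other to their partners
    return None

STUDENTS = {"Alice": ["A", "C", "B"], "Bob": ["A", "B", "C"], "Colleen": ["C", "B", "A"]}
# Hypothetical university rankings (not given on the slide).
SCHOOLS = {"A": ["Colleen", "Bob", "Alice"],
           "B": ["Alice", "Bob", "Colleen"],
           "C": ["Bob", "Alice", "Colleen"]}

print(find_blocking_pair({"Alice": "A", "Bob": "B", "Colleen": "C"}, STUDENTS, SCHOOLS))  # ('Bob', 'A')
print(find_blocking_pair({"Alice": "C", "Bob": "A", "Colleen": "B"}, STUDENTS, SCHOOLS))  # None
```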
Slide 10
Slide 10 text
Gale-Shapley Algorithm
Lloyd Shapley (1923-2016)
accepting the Nobel Prize (2012)
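For reference, a plain (non-oblivious) sketch of student-proposing Gale-Shapley deferred acceptance follows. It assumes equal numbers of students and universities with complete preference lists, and reuses the hypothetical university rankings from the stability example. Note that the lookups `held[r]` and `rank[r][s]` are indexed by private data, which is exactly what makes a secure version expensive.

```python
from collections import deque

def gale_shapley(student_prefs, school_prefs):
    """Student-proposing deferred acceptance; returns {student: school}.

    Assumes equal numbers of students and schools with complete preference lists.
    """
    rank = {r: {s: k for k, s in enumerate(prefs)} for r, prefs in school_prefs.items()}
    next_choice = {s: 0 for s in student_prefs}   # next school each student proposes to
    held = {}                                     # school -> tentatively accepted student
    free = deque(student_prefs)
    while free:
        s = free.popleft()
        r = student_prefs[s][next_choice[s]]
        next_choice[s] += 1
        current = held.get(r)
        if current is None:
            held[r] = s                           # school was free: accept tentatively
        elif rank[r][s] < rank[r][current]:
            held[r] = s                           # school trades up; old student is free again
            free.append(current)
        else:
            free.append(s)                        # rejected: s proposes again later
    return {s: r for r, s in held.items()}

STUDENTS = {"Alice": ["A", "C", "B"], "Bob": ["A", "B", "C"], "Colleen": ["C", "B", "A"]}
# Hypothetical university rankings (not given on the slides).
SCHOOLS = {"A": ["Colleen", "Bob", "Alice"],
           "B": ["Alice", "Bob", "Colleen"],
           "C": ["Bob", "Alice", "Colleen"]}

print(gale_shapley(STUDENTS, SCHOOLS))  # pairs Alice-C, Bob-A, Colleen-B
```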
Slide 11
Slide 11 text
Stable Matching Applications
Public schools in New York, Boston
Singapore University Admissions
Medical residents in US, Canada, others
35,000 applicants
Slide 12
Slide 12 text
Stable Matching Applications
Public schools in New York, Boston
Singapore University Admissions
Medical residents in US, Canada, others
35,000 applicants
Use Trusted Third Party to run matching algorithm:
- Receives all private rankings and keeps them confidential
- Produces correct, uncorrupted result
Slide 13
Slide 13 text
Secure Two-Party Stable Matching Protocol
Each group trusts one representative
Doug Tsinghua
Slide 14
Slide 14 text
Secure Two-Party Stable Matching Protocol
Each group trusts one representative
XOR-share to 2 non-colluding parties
Doug
S
T
Tsinghua
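XOR-sharing a private input between the two non-colluding parties S and T can be sketched as below. Each share on its own is uniformly random, so neither party learns anything unless they combine shares. Encoding a ranking as a byte string and the function names are assumptions for illustration.

```python
import secrets

def xor_bytes(x: bytes, y: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(x, y))

def share(secret: bytes) -> tuple[bytes, bytes]:
    """Split a secret into two shares; either share alone is uniformly random."""
    share_s = secrets.token_bytes(len(secret))  # sent to party S
    share_t = xor_bytes(secret, share_s)        # sent to party T
    return share_s, share_t

def reconstruct(share_s: bytes, share_t: bytes) -> bytes:
    return xor_bytes(share_s, share_t)

ranking = b"ACB"  # e.g. Alice's preference list encoded as bytes (hypothetical encoding)
s, t = share(ranking)
print(reconstruct(s, t))  # b'ACB'
```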
Slide 15
Slide 15 text
No content
Slide 16
Slide 16 text
Data-dependent
lookup in size-n array
Slide 17
Slide 17 text
Data-dependent
lookup in size-n array
Data-dependent
updates to size-n² array
Slide 18
Slide 18 text
Data-dependent
lookup in size-n array
Oblivious conditionals:
need to always execute
all paths
Data-dependent
updates to size-n² array
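An oblivious conditional computes every branch and lets the secret condition bit drive only a multiplexer, so the sequence of operations is the same whichever branch is "taken". The sketch below illustrates the circuit semantics with Python integers; the function names are hypothetical, and plain Python gives no constant-time guarantees.

```python
def mux(cond: int, if_true: int, if_false: int) -> int:
    """Branch-free select; cond must be 0 or 1."""
    mask = -cond                                # 0 -> all zero bits, 1 -> all one bits
    return (if_true & mask) | (if_false & ~mask)

def oblivious_abs_diff(x: int, y: int) -> int:
    """Computes `x - y if x > y else y - x` by evaluating both paths."""
    then_path = x - y                           # always executed
    else_path = y - x                           # always executed
    return mux(int(x > y), then_path, else_path)  # secret bit only drives the select

print(oblivious_abs_diff(3, 10), oblivious_abs_diff(10, 3))  # 7 7
```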
Slide 19
Slide 19 text
Data-Oblivious Array Access
a[i] = x
The index i depends on private data
Slide 20
Slide 20 text
Circuit for Array Update
a'[0] = (i == 0) ? x : a[0]
a'[1] = (i == 1) ? x : a[1]
a'[2] = (i == 2) ? x : a[2]
a'[3] = (i == 3) ? x : a[3]
…
Linear Scan: need to touch every array element to hide which one is real
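The linear-scan update can be sketched with the multiplexer written as a'[j] = a[j] ⊕ ((i == j) ∧ (x ⊕ a[j])). Assuming free-XOR, only the AND is a costly gate, one per bit, which is one way to arrive at 32 gates per 32-bit element (the i == j comparator is extra). The function name and the `bits` parameter are illustrative.

```python
def oblivious_write(a: list[int], i: int, x: int, bits: int = 32) -> list[int]:
    """Return a copy of `a` with a[i] = x, touching every element."""
    ones = (1 << bits) - 1
    result = []
    for j, aj in enumerate(a):
        # In a circuit, (i == j) is a comparator whose output bit is fanned out to all bits.
        sel = ones if i == j else 0
        result.append(aj ^ (sel & (x ^ aj)))  # equals x when selected, aj otherwise
    return result

print(oblivious_write([1, 2, 3, 4], 2, 9))  # [1, 2, 9, 4]
```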
Slide 21
Slide 21 text
Linear Scan Doesn’t Scale
Writing a single 32-bit integer: 32 logic gates
Raw Yao’s performance ≈ 3M gates per second
Write speed ≈ 100,000 elements per second (not hiding access pattern)
Hiding the access pattern with a linear scan: N = 2^17 elements requires > 1 second per access
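The numbers on this slide follow from two figures it states (32 gates per 32-bit write, about 3M gates per second). The arithmetic, worked out:

```python
GATES_PER_WRITE = 32          # gates to write one 32-bit element (from the slide)
GATES_PER_SECOND = 3_000_000  # raw Yao throughput (from the slide)

writes_per_second = GATES_PER_SECOND / GATES_PER_WRITE   # 93,750, i.e. about 100,000
n = 2 ** 17                                              # 131,072 elements
seconds_per_hidden_access = n * GATES_PER_WRITE / GATES_PER_SECOND  # linear scan touches all n

print(f"{writes_per_second:,.0f} writes/s; {seconds_per_hidden_access:.2f} s per hidden access")
# 93,750 writes/s; 1.40 s per hidden access
```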
Slide 22
Slide 22 text
Traditional ORAM
[Goldreich 1987]
Client (sublinear client-side state) ↔ Untrusted Server (linear server-side encrypted state); operations: Initialize, Access
Security property: all initialization and access sequences of the same length are indistinguishable to the server.
Slide 23
Slide 23 text
RAM-SC
[Gordon, Katz, Kolesnikov, Krell, Malkin, Raykova, Vahlis 2012]
Diagram: Alice and Bob, each holding public ORAM state, run an MPC protocol that maintains the oblivious ORAM state and performs Initialize and Access over encrypted ORAM data, producing encrypted results
Slide 24
Slide 24 text
Circuit ORAM
Xiao Wang, Hubert Chan, and Elaine Shi. Circuit ORAM: On Tightness of the Goldreich-Ostrovsky Lower Bound. In ACM CCS 2015.
State-of-the-Art ORAM in 2015
Plot: access time of Circuit ORAM (Θ(log³ N)) compared with linear scan
Slide 25
Slide 25 text
Classical Square-Root ORAM
[Ostrovsky and Goldreich, 1992]
Slide 26
Slide 26 text
Problems with SQ-ORAM Design
Requires a PRF for each ORAM access
Pseudo-random function: a big circuit in MPC
Initialization requires PRF evaluations
Requires oblivious sort twice:
Shuffling memory according to PRF
Removing dummy blocks
Solution strategy: use random permutation instead of PRF
Slide 27
Slide 27 text
Shuffling Network [Waksman 1968]
Cost per shuffle: 5B
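The 5B figure matches the switch count of a Waksman network on n = 4 inputs: n·log₂n − n + 1 = 5 conditional swaps, each moving one block of size B. Below is a minimal sketch, assuming a hard-coded 4-input topology (a Beneš network with one output switch fixed straight) and a routing table found by brute force over the 32 switch settings; real implementations compute settings with the looping algorithm, and in MPC the settings would stay secret. All names are hypothetical.

```python
import itertools
import secrets

def cond_swap(v, i, j, bit):
    """One switch: swap positions i and j when bit == 1."""
    if bit:
        v[i], v[j] = v[j], v[i]

def waksman4(items, bits):
    s0, s1, m0, m1, o0 = bits
    v = list(items)
    cond_swap(v, 0, 1, s0)  # input switches
    cond_swap(v, 2, 3, s1)
    cond_swap(v, 0, 2, m0)  # upper 2x2 subnetwork (top outputs of the input switches)
    cond_swap(v, 1, 3, m1)  # lower 2x2 subnetwork (bottom outputs)
    cond_swap(v, 0, 1, o0)  # output switch; the (2, 3) output switch is fixed straight
    return v

# Routing table: output arrangement -> one switch setting that produces it.
ROUTES = {}
for bits in itertools.product((0, 1), repeat=5):
    ROUTES.setdefault(tuple(waksman4(range(4), bits)), bits)

def oblivious_shuffle4(items):
    perm = list(range(4))
    secrets.SystemRandom().shuffle(perm)  # uniformly random arrangement, kept secret
    return waksman4(items, ROUTES[tuple(perm)])

print(len(ROUTES))  # 24: every permutation of 4 items is reachable with 5 switches
```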
Slide 28
Slide 28 text
4-Block ORAM
Slide 29
Slide 29 text
4-Block ORAM
Cost: 5B + B + 2B + 3B + … = 11B every 3 accesses
Slide 30
Slide 30 text
Linear scan cost: 4B = 12B/3
Our scheme cost: 11B/3
Less expensive than linear scan for 4 blocks (8 with overhead)
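The comparison can be reproduced with a small cost model read off the 4-block example: one Waksman shuffle per period, and the j-th access of a period costing j blocks. That reading, and the helper names, are assumptions; the model ignores position-map cost, so it should not be extrapolated to larger n.

```python
from fractions import Fraction

def waksman_switches(n: int) -> int:
    """Conditional swaps in a Waksman network on n = 2^k inputs: n*k - n + 1."""
    k = n.bit_length() - 1
    assert n == 1 << k, "n must be a power of two"
    return n * k - n + 1

def amortized_cost(n: int, period: int) -> Fraction:
    """Cost per access in units of B under the 4-block model (assumed):
    one shuffle per period, and the j-th access of the period costs j blocks."""
    per_period = waksman_switches(n) + sum(range(1, period + 1))
    return Fraction(per_period, period)

print(amortized_cost(4, 3))  # 11/3
print(Fraction(4))           # linear scan touches all 4 blocks: 4 = 12/3
```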
Slide 31
Slide 31 text
No content
Slide 32
Slide 32 text
Logical index/4
Logical index/2
Slide 33
Slide 33 text
Logical index/4
Logical index/2
read a[8]
First Access
Slide 38
Slide 38 text
After First Access
Used (Public)
Slide 39
Slide 39 text
Second Access
read a[9]
Slide 40
Slide 40 text
Second Access
read a[9]
Randomly select unused element
Slide 41
Slide 41 text
Second Access
read a[9]
Randomly select unused element
Randomly select unused element
Slide 44
Slide 44 text
After Second Access
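The access walkthrough above can be sketched as a single-level square-root ORAM. If the requested element has not been fetched this period, the access touches its real (secretly permuted) slot; if it is already in the stash, the access touches a randomly selected unused slot instead, so the public trace is a sequence of distinct, uniformly random slots either way. Assumptions of this sketch: the position map and stash are plain Python objects (in the real scheme they are secret-shared and the position map is itself a smaller, recursive ORAM), the block at every touched slot joins the stash, and Fisher-Yates stands in for an oblivious shuffle. `SqrtOramSketch` and `period` are hypothetical names.

```python
import secrets

class SqrtOramSketch:
    """Single-level square-root ORAM sketch (hypothetical; not the authors' code)."""

    def __init__(self, values, period):
        assert 1 <= period <= len(values)
        self.values = list(values)   # logical contents, written back at each reshuffle
        self.period = period         # accesses between reshuffles
        self.trace = []              # physical slots touched: the only public information
        self._reshuffle()

    def _reshuffle(self):
        n = len(self.values)
        perm = list(range(n))
        for k in range(n - 1, 0, -1):            # Fisher-Yates, standing in for
            j = secrets.randbelow(k + 1)         # an oblivious shuffling network
            perm[k], perm[j] = perm[j], perm[k]
        self.pos = perm                          # secret: logical index -> physical slot
        self.slot_owner = [0] * n                # physical slot -> logical index
        for logical, slot in enumerate(perm):
            self.slot_owner[slot] = logical
        self.unused = set(range(n))              # public: slots not yet touched
        self.stash = {}                          # logical index -> current value
        self.count = 0

    def access(self, i, new_value=None):
        """Read logical element i (optionally overwriting it); returns the old value."""
        if i in self.stash:
            slot = secrets.choice(sorted(self.unused))  # already fetched: random unused slot
        else:
            slot = self.pos[i]                          # not yet fetched: its real slot
        self.unused.remove(slot)
        self.trace.append(slot)
        owner = self.slot_owner[slot]
        self.stash.setdefault(owner, self.values[owner])  # fetched block joins the stash
        old = self.stash[i]
        if new_value is not None:
            self.stash[i] = new_value
        self.count += 1
        if self.count == self.period:                   # write back and reshuffle
            for logical, value in self.stash.items():
                self.values[logical] = value
            self._reshuffle()
        return old

oram = SqrtOramSketch([10, 20, 30, 40, 50, 60, 70, 80, 90], period=3)
print(oram.access(8), oram.access(8))  # 90 90: the second read touches a random unused slot
```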
Slide 45
Slide 45 text
Position map
3 0 2 1
0 1 2 3
1 3 0 2
0 1 2 3
Slide 46
Slide 46 text
Creating position map
Slide 48
Slide 48 text
Inverse permutation
Alice picks a random masking permutation
Composed permutation revealed to Bob
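One standard way to realize the masking idea: if σ is a uniformly random permutation, then π ∘ σ is uniformly random and independent of π, so it can be opened to Bob, who inverts it in the clear, and composing with σ recovers π⁻¹ = σ ∘ (π ∘ σ)⁻¹. This sketch assumes that identity and shows only the algebra in the clear; in the protocol the compositions would be computed obliviously on secret-shared data. Function names are hypothetical.

```python
import secrets

def random_permutation(n):
    p = list(range(n))
    secrets.SystemRandom().shuffle(p)
    return p

def compose(p, q):
    """(p ∘ q)[x] = p[q[x]]."""
    return [p[q[x]] for x in range(len(q))]

def invert(p):
    inv = [0] * len(p)
    for x, y in enumerate(p):
        inv[y] = x
    return inv

def masked_inverse(pi):
    sigma = random_permutation(len(pi))  # Alice's secret masking permutation
    opened = compose(pi, sigma)          # π ∘ σ: uniformly random, so safe to reveal to Bob
    opened_inv = invert(opened)          # Bob inverts in the clear: σ⁻¹ ∘ π⁻¹
    return compose(sigma, opened_inv)    # σ ∘ σ⁻¹ ∘ π⁻¹ = π⁻¹

print(masked_inverse([3, 0, 2, 1]))  # [1, 3, 2, 0]
```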
Scheme
1. Shuffle elements
2. Recreate position map
3. Service =
log accesses
Amortized cost: Θ log7 per access
Slide 51
Slide 51 text
Initialization cost
Slide 52
Slide 52 text
Plot: per-access cost (not counting initialization) for 16-byte and 32-byte blocks
Good enough for National Residency Match?
Slide 53
Slide 53 text
Cost of Matching
Best previous result: 128×128 pairs in > 1000 hours [Keller & Scholl 2014]
Using Square-Root ORAM: 512×512 pairs in 33 hours
Scale needed for national residency match: 35,000 applicants
Need 1000× improvement…
Slide 54
Slide 54 text
Scaling to National Match
Roth-Peranson: asymmetric matchings
– Algorithm that is actually used for NRMP, school matchings, etc.
Initialize state by permuting and interleaving
Take advantage of data-independent memory patterns: locality, batching, partitioning
Slide 55
Slide 55 text
Oblivious Multilist
Slide 56
Slide 56 text
Phase           Time          Non-Free Gates (billions)   Gates/second (millions)
Initialization   2.07 hours    34                          4.57
Bidding         15.01 hours   173                          3.19
Total           17.08 hours   207                          3.36
Simulated 2016 US National Medical Residency Match:
35,476 prospective residents matching with 4836 programs with 30,750 total slots
Running between 2 EC2.c4xlarge nodes in same region (1 Gbps)
Slide 57
Slide 57 text
University of Virginia
Charlottesville, Virginia
Jack Doerner
Samee Zahur