Memory for Data-Oblivious Computation

David Evans

June 25, 2016

Transcript

  1. Secure Two-Party Computation
    Alice holds input a; Bob holds input b.
    A cryptographic protocol gives both parties r = f(a, b).
    Alice learns nothing about b; Bob learns nothing about a.
  2. Yao’s Protocol: Garbled Circuits
    Function expressed as a Boolean circuit.
    Garbled evaluation: no information leaked.
    Ridiculously expensive (but 10^12 times cheaper than 10 years ago).
    Diagram: Garble, Encode, Evaluate, Decode; function f, garbled circuit F, output Y, inputs a and b; roles Generator and Evaluator.
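
To make the Garble and Evaluate steps concrete, here is a minimal sketch of a single garbled AND gate. It is a textbook-style toy, not the optimized garbling used in the talk's measurements; the names `garble_and_gate` and `evaluate`, the SHA-256 pad, and the all-zero tag used to recognise the correct row are all assumptions made for illustration.

```python
import hashlib
import secrets

LABEL_BYTES = 16
TAG = bytes(LABEL_BYTES)  # all-zero suffix that marks a correctly decrypted row


def _pad(key_a: bytes, key_b: bytes) -> bytes:
    """Derive a 32-byte one-time pad from the two input wire labels."""
    return hashlib.sha256(key_a + key_b).digest()


def _xor(x: bytes, y: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(x, y))


def garble_and_gate():
    """Generator: pick two random labels per wire and encrypt the truth table."""
    labels = {
        wire: (secrets.token_bytes(LABEL_BYTES), secrets.token_bytes(LABEL_BYTES))
        for wire in ("a", "b", "out")
    }
    table = []
    for bit_a in (0, 1):
        for bit_b in (0, 1):
            out_label = labels["out"][bit_a & bit_b]
            pad = _pad(labels["a"][bit_a], labels["b"][bit_b])
            table.append(_xor(pad, out_label + TAG))
    # Shuffle so that row position says nothing about the input bits.
    secrets.SystemRandom().shuffle(table)
    return labels, table


def evaluate(table, key_a: bytes, key_b: bytes) -> bytes:
    """Evaluator: holds one label per input wire and recovers one output label."""
    pad = _pad(key_a, key_b)
    for row in table:
        plain = _xor(pad, row)
        if plain[LABEL_BYTES:] == TAG:
            return plain[:LABEL_BYTES]
    raise ValueError("no row decrypted; labels do not belong to this gate")
```

The evaluator sees only random-looking labels, so it learns the output label without learning which bits the input labels stand for. Decoding the output label back to a bit needs the generator's label table.
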
  3. Stable Matching
    Example (diagram): Alice, Bob, Colleen; preference lists ACB, ABC, CBA.
    M = { (s1, r1), (s2, r2), … } is a stable matching if there is no pair (si, rj) in which both si and rj prefer each other over their assigned partners in M.
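
The stability condition can be checked directly by searching for such a pair (often called a blocking pair). The sketch below is illustrative only: the function names, the suitor names s1 to s3, and the reviewer preference lists are hypothetical, and assigning the slide's three strings ACB, ABC, CBA to the suitors is an assumption.

```python
def blocking_pairs(matching, suitor_prefs, reviewer_prefs):
    """Return every (suitor, reviewer) pair that prefers each other to their partners.

    matching maps each suitor to a reviewer; preference lists are ordered
    from most preferred to least preferred.
    """
    partner_of = {reviewer: suitor for suitor, reviewer in matching.items()}
    pairs = []
    for suitor, prefs in suitor_prefs.items():
        for reviewer in prefs:
            if reviewer == matching[suitor]:
                break  # everything after this point is less preferred by the suitor
            ranking = reviewer_prefs[reviewer]
            # The suitor prefers this reviewer; check whether the feeling is mutual.
            if ranking.index(suitor) < ranking.index(partner_of[reviewer]):
                pairs.append((suitor, reviewer))
    return pairs


def is_stable(matching, suitor_prefs, reviewer_prefs):
    return not blocking_pairs(matching, suitor_prefs, reviewer_prefs)


# Hypothetical instance: suitor lists reuse the strings ACB, ABC, CBA.
suitor_prefs = {"s1": list("ACB"), "s2": list("ABC"), "s3": list("CBA")}
reviewer_prefs = {
    "A": ["s1", "s2", "s3"],
    "B": ["s2", "s1", "s3"],
    "C": ["s3", "s1", "s2"],
}
stable = {"s1": "A", "s2": "B", "s3": "C"}
unstable = {"s1": "C", "s2": "B", "s3": "A"}
```

In this instance `stable` has no blocking pair, while `unstable` has three, for example s1 and A both prefer each other to their assigned partners.
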
  4. Stable Matching Applications
    Public schools in New York, Boston; Singapore; university admissions.
    Medical residents in US, Canada, others (35,000 applicants).
  5. Stable Matching Applications
    Public schools in New York, Boston; Singapore; university admissions.
    Medical residents in US, Canada, others (35,000 applicants).
    Use Trusted Third Party to run matching algorithm:
    - Receives all private rankings and keeps them confidential
    - Produces the correct result, uncorrupted
  6. Secure Two-Party Stable Matching Protocol
    Each group trusts one representative.
    XOR-share to 2 non-colluding parties.
    Diagram labels: Doug, S, T, Tsinghua.
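
XOR-sharing splits a private value into two shares that are each uniformly random on their own and only reveal the value when combined. A minimal sketch, assuming 32-bit values and hypothetical helper names `xor_share` and `reconstruct`:

```python
import secrets


def xor_share(value: int, bits: int = 32):
    """Split value into two shares; each share alone is a uniformly random number."""
    if not 0 <= value < (1 << bits):
        raise ValueError("value does not fit in the requested bit width")
    share_s = secrets.randbits(bits)   # sent to the first non-colluding party
    share_t = value ^ share_s          # sent to the second non-colluding party
    return share_s, share_t


def reconstruct(share_s: int, share_t: int) -> int:
    """Combining both shares recovers the original value."""
    return share_s ^ share_t
```

A participant would send one share of each ranking entry to each of the two parties, which then run the two-party protocol on the shared inputs.
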
  7. Data-dependent lookup in size-n array.
    Oblivious conditionals: need to always execute all paths.
    Data-dependent updates to size-n^2 array.
  8. Circuit for Array Update
    Linear scan: need to touch every array element to hide which one is real.
    Diagram: for each position k = 0, 1, 2, 3, …, the inputs i == k, a[k], and x produce a'[k].
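
The linear-scan update can be sketched in ordinary code as a branch-free select applied at every position. This runs in the clear and only mirrors the circuit's shape; in the real protocol the comparison and the select would be gates over secret values. The names `mux` and `oblivious_write` are illustrative.

```python
def mux(select_bit: int, if_one: int, if_zero: int) -> int:
    """Branch-free select: returns if_one when select_bit is 1, otherwise if_zero."""
    mask = -select_bit  # 0 stays 0, 1 becomes an all-ones mask
    return if_zero ^ ((if_one ^ if_zero) & mask)


def oblivious_write(array, index, value):
    """Return the updated array a' after a[index] = value, touching every element.

    The same sequence of operations runs for every index, so the access
    pattern does not depend on which element is really written.
    """
    return [mux(int(k == index), value, old) for k, old in enumerate(array)]
```

For example, `oblivious_write([10, 20, 30, 40], 2, 99)` performs four selects and changes only position 2.
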
  9. Linear Scan Doesn’t Scale
    Writing a single 32-bit integer: 32 logic gates.
    Raw Yao’s performance ≈ 3M gates per second.
    Write speed ≈ 100,000 elements per second (not hiding access pattern).
    For hiding access pattern, N = 2^17 elements requires > 1 second per access.
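
The figures on this slide follow from two numbers, 32 gates per element and roughly 3 million gates per second. A quick check, assuming the cost of a scan is simply 32 gates for each element touched:

```python
GATES_PER_SECOND = 3_000_000   # raw Yao throughput quoted on the slide
GATES_PER_ELEMENT = 32         # cost of writing one 32-bit integer


def elements_per_second() -> float:
    """Write speed when only the target element is touched."""
    return GATES_PER_SECOND / GATES_PER_ELEMENT


def linear_scan_seconds(n_elements: int) -> float:
    """Time for one access when every element must be touched."""
    return n_elements * GATES_PER_ELEMENT / GATES_PER_SECOND


write_speed = elements_per_second()          # 93,750, i.e. about 100,000
scan_time = linear_scan_seconds(2 ** 17)     # about 1.4 seconds
```
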
  10. Traditional ORAM [Goldreich 1987]
    Client and untrusted server.
    Security property: all initialization and access sequences of the same length are indistinguishable to server.
    Sublinear client-side state.
    Linear server-side encrypted state.
    Operations: Initialize, Access.
  11. RAM-SC [Gordon, Katz, Kolesnikov, Krell, Malkin, Raykova, Vahlis 2012]
    Diagram: Alice and Bob run an MPC protocol; each holds public ORAM state; labels Oblivious ORAM state, Encrypted ORAM Data, Encrypted Results; operations Initialize, Access.
  12. Circuit ORAM
    Xiao Wang, Hubert Chan, and Elaine Shi. Circuit ORAM: On Tightness of the Goldreich-Ostrovsky Lower Bound. In ACM CCS 2015.
    State-of-the-ORAM-Art in 2015.
    Plot: access time, Θ(log^3 N) for Circuit ORAM versus linear scan.
  13. Problems with SQ-ORAM Design
    Requires a PRF for each ORAM access.
    Pseudo-random function: a big circuit in MPC.
    Initialization requires PRF evaluations.
    Requires oblivious sort twice:
    - Shuffling memory according to PRF
    - Removing dummy blocks
    Solution strategy: use random permutation instead of PRF.
  14. 4-Block ORAM
    Cost: 5B + B + 2B + 3B + … = 11B every 3 accesses.
  15. Linear scan cost: 4B = 12B/3.
    Our scheme cost: 11B/3.
    Less expensive than linear scan for 4 blocks (8 with overhead).
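
The comparison is a matter of amortizing each scheme's cost over the accesses it covers. A small check of the arithmetic, measuring cost in units of B and taking the terms 5, 1, 2, 3 exactly as they are listed for the 4-block scheme (what each term pays for is not stated in the transcript):

```python
from fractions import Fraction


def amortized_cost(terms_in_units_of_b, accesses):
    """Average cost per access, in units of B."""
    return Fraction(sum(terms_in_units_of_b), accesses)


ours = amortized_cost([5, 1, 2, 3], accesses=3)   # 11B over 3 accesses
linear_scan = amortized_cost([4], accesses=1)     # 4B per access, i.e. 12B/3
```

Here `ours` is 11/3 and `linear_scan` is 12/3, so the 4-block scheme is cheaper by B/3 per access before overheads.
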
  16. Inverse permutation
    Alice picks a random masking permutation.
    Composed permutation revealed to Bob.
    [Permutation formulas not recoverable from the transcript.]
  17. Inverse permutation
    Bob computes [formulas not recoverable from the transcript].
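
Since the slide formulas did not survive extraction, the following is only a plausible reconstruction of the masking idea from the surviving prose: Alice's random permutation hides the secret one, the composition is revealed to Bob, Bob inverts it locally, and the mask is then removed. The composition order, the function names, and the split of work between the parties are assumptions; the code runs in the clear and checks only the algebra.

```python
import random


def compose(outer, inner):
    """Permutation that applies inner first, then outer."""
    return [outer[inner[i]] for i in range(len(inner))]


def invert(perm):
    """Inverse permutation: inv[perm[i]] == i for every i."""
    inv = [0] * len(perm)
    for i, p in enumerate(perm):
        inv[p] = i
    return inv


def masked_inverse(secret_perm, rng=None):
    """Invert secret_perm while only ever revealing a randomly masked version."""
    rng = rng or random.SystemRandom()
    mask = list(range(len(secret_perm)))
    rng.shuffle(mask)                       # Alice's random masking permutation
    revealed = compose(mask, secret_perm)   # composed permutation, revealed to Bob
    revealed_inv = invert(revealed)         # Bob inverts it locally, in the clear
    # revealed_inv equals inverse(secret_perm) composed with inverse(mask),
    # so composing with mask on the right cancels the mask.
    return compose(revealed_inv, mask)
```

Because `mask` is uniformly random, the revealed composition is itself a uniformly random permutation and tells Bob nothing about `secret_perm`. In a real protocol the two `compose` steps would run inside the secure computation on shared values.
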
  18. Scheme
    1. Shuffle elements
    2. Recreate position map
    3. Service = log accesses
    Amortized cost: Θ log7 per access
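
The three steps can be illustrated with a plaintext toy that tracks which physical slots are touched. This is not the talk's construction: the stash, the dummy accesses, the class name `ToySqrtORAM`, and the default period of about √n accesses are assumptions borrowed from the general square-root ORAM idea, and nothing here is encrypted or run inside a secure computation.

```python
import math
import random


class ToySqrtORAM:
    """Plaintext simulation of a square-root-style ORAM access pattern."""

    def __init__(self, data, period=None, rng=None):
        self.rng = rng or random.Random()
        self.n = len(data)
        # Accesses served between reshuffles; never more than the number of slots.
        self.period = min(self.n, period or max(1, math.isqrt(self.n)))
        self.periods = []  # one list of touched physical slots per period
        self._rebuild(list(data))

    def _rebuild(self, logical):
        # Step 1: shuffle elements under a fresh random permutation.
        self.position = list(range(self.n))
        self.rng.shuffle(self.position)
        # Step 2: recreate the position map (logical index -> physical slot).
        self.store = [None] * self.n
        self.owner = [None] * self.n  # physical slot -> logical index
        for index, slot in enumerate(self.position):
            self.store[slot] = logical[index]
            self.owner[slot] = index
        self.stash = {}
        self.unused = set(range(self.n))
        self.periods.append([])

    def access(self, index, new_value=None):
        # Step 3: service one access, touching exactly one fresh physical slot.
        if index in self.stash:
            slot = self.rng.choice(sorted(self.unused))  # dummy access
        else:
            slot = self.position[index]
        self.unused.remove(slot)
        self.stash[self.owner[slot]] = self.store[slot]
        self.periods[-1].append(slot)
        if new_value is not None:
            self.stash[index] = new_value
        result = self.stash[index]
        if len(self.periods[-1]) == self.period:
            current = [self.stash.get(i, self.store[self.position[i]])
                       for i in range(self.n)]
            self._rebuild(current)
        return result
```

Within a period no physical slot is touched twice, and each touched slot is a fresh position under a random permutation, so the sequence of slots does not reveal which logical indices were requested. The price is the periodic reshuffle, which is what the amortized cost accounts for.
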
  19. Cost of Matching
    Best previous result: 128x128 pairs in > 1000 hours [Keller & Scholl 2014].
    Using Square-Root ORAM: 512x512 pairs in 33 hours.
    Scale needed for national residency match: 35,000.
    Need 1000x improvement…
  20. Scaling to National Match
    Roth-Peranson (asymmetric matchings): the algorithm that is actually used for NRMP, school matchings, etc.
    Initialize state by permuting and interleaving.
    Take advantage of data-independent memory patterns: locality, batching, partitioning.
  21. Phase            Time          Non-Free Gates   Gates/second
      Initialization   2.07 hours    34 B             4.57 M
      Bidding          15.01 hours   173 B            3.19 M
      Total            17.08 hours   207 B            3.36 M
    Simulated 2016 US National Medical Residency Match: 35,476 prospective residents matching with 4,836 programs with 30,750 total slots.
    Running between 2 EC2.c4xlarge nodes in same region (1 Gbps).