Memory for Data-Oblivious Computation

David Evans

June 25, 2016

Transcript

  1. Memory for Data-Oblivious Computation David Evans University of Virginia oblivc.org

  2. Memory for Data-Oblivious Computation David Evans University of Virginia www.cs.virginia.edu/evans

  3. Theory and Practice in Computing: quotes from Maurice Wilkes’s Turing Award Lecture (1967). Alan Turing.
  4. Secure Two-Party Computation: Alice holds a and Bob holds b. A cryptographic protocol gives both parties r = f(a, b); Alice learns nothing about b, and Bob learns nothing about a.
  5. Andrew Yao: FOCS 1982, FOCS 1986. Note: neither paper actually describes Yao’s protocol.
  6. Yao’s Protocol: Garbled Circuits. Function expressed as a Boolean circuit. Garbled evaluation: no information leaked. Ridiculously expensive (but 10^12 times cheaper than 10 years ago). Diagram labels: Garble, Encode, Evaluate, Decode; f, garbled circuit F, Y; a, b; Generator, Evaluator.
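
To make the garble/evaluate/decode steps concrete, here is a minimal sketch of a single garbled AND gate. It is a toy under stated assumptions: the names (`garble_and_gate`, `evaluate`, `decode`), the SHA-256 based pad, and the all-zero tag used to recognise the correct row are illustrative choices, not the construction from the talk, and the oblivious transfer that delivers the evaluator's input label is omitted.

```python
import hashlib
import secrets

LABEL_LEN = 16          # bytes per wire label (assumed size)
TAG = bytes(LABEL_LEN)  # all-zero padding used to recognise the correct row


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def pad(key_a: bytes, key_b: bytes) -> bytes:
    """Derive a 32-byte one-time pad from two input labels."""
    return hashlib.sha256(key_a + key_b).digest()


def garble_and_gate():
    """Generator side: pick two random labels per wire, build four encrypted rows."""
    wires = {name: (secrets.token_bytes(LABEL_LEN), secrets.token_bytes(LABEL_LEN))
             for name in ("a", "b", "out")}
    table = []
    for bit_a in (0, 1):
        for bit_b in (0, 1):
            plaintext = wires["out"][bit_a & bit_b] + TAG
            table.append(xor(plaintext, pad(wires["a"][bit_a], wires["b"][bit_b])))
    secrets.SystemRandom().shuffle(table)  # hide which row belongs to which inputs
    return wires, table


def evaluate(table, label_a: bytes, label_b: bytes) -> bytes:
    """Evaluator side: only the row matching the held labels yields the tag."""
    for row in table:
        candidate = xor(row, pad(label_a, label_b))
        if candidate[LABEL_LEN:] == TAG:
            return candidate[:LABEL_LEN]
    raise ValueError("no row decrypted")


def decode(wires, out_label: bytes) -> int:
    """Map the output label back to a bit (needs the generator's label table)."""
    return wires["out"].index(out_label)
```

The evaluator holds exactly one label per input wire, so it can open one row and learns an output label that looks random without the decoding table.
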
  7. Motivating Application: Secure Stable Matching

  8. Alice, Bob, Colleen. University Rankings; Student Preferences. Rankings shown: ACB, ABC, CBA.
  9. Stable Matching. Example: Alice, Bob, Colleen; ACB, ABC, CBA. M = { (s1, r1), (s2, r2), … } is a stable matching if there is no pair (si, rj) in which both si and rj prefer each other over their partners in M.
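
The stability condition can be checked directly by searching for such a pair (often called a blocking pair). The sketch below assumes a one-to-one matching with complete, strict preference lists; the function names and the 3x3 instance are hypothetical and are not the example from the slide.

```python
def blocking_pairs(matching, student_prefs, program_prefs):
    """Return every (student, program) pair that would rather be together.

    matching: dict student -> program (one-to-one)
    student_prefs / program_prefs: dict name -> list, most preferred first
    """
    student_of = {p: s for s, p in matching.items()}
    pairs = []
    for s, prefs in student_prefs.items():
        current = matching[s]
        # Programs that s strictly prefers to its current assignment.
        for p in prefs[:prefs.index(current)]:
            ranking = program_prefs[p]
            if ranking.index(s) < ranking.index(student_of[p]):
                pairs.append((s, p))
    return pairs


def is_stable(matching, student_prefs, program_prefs):
    return not blocking_pairs(matching, student_prefs, program_prefs)


# Hypothetical instance, for illustration only.
student_prefs = {"s1": ["r1", "r2", "r3"], "s2": ["r1", "r3", "r2"], "s3": ["r2", "r1", "r3"]}
program_prefs = {"r1": ["s2", "s1", "s3"], "r2": ["s1", "s3", "s2"], "r3": ["s3", "s2", "s1"]}

stable = {"s1": "r2", "s2": "r1", "s3": "r3"}
unstable = {"s1": "r1", "s2": "r2", "s3": "r3"}
```

Here `is_stable(stable, ...)` holds, while `blocking_pairs(unstable, ...)` returns `[("s2", "r1"), ("s3", "r2")]`: in each of those pairs both sides prefer each other to their assigned partners.
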
  10. Gale-Shapley Algorithm Lloyd Shapley (1923-2016) accepting Nobel Prize (2012)
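
For reference, a minimal plaintext sketch of the Gale-Shapley deferred-acceptance algorithm follows. It assumes equal-sized sides with complete, strict preference lists; the function name and the example preferences are hypothetical, and nothing here is data-oblivious.

```python
from collections import deque


def gale_shapley(proposer_prefs, reviewer_prefs):
    """Proposer-optimal stable matching; returns dict proposer -> reviewer."""
    # rank[r][p] = position of proposer p in reviewer r's list (lower is better)
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in reviewer_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # index of next reviewer to try
    engaged_to = {}                               # reviewer -> proposer
    free = deque(proposer_prefs)

    while free:
        p = free.popleft()
        r = proposer_prefs[p][next_choice[p]]     # data-dependent lookup
        next_choice[p] += 1
        current = engaged_to.get(r)               # data-dependent lookup
        if current is None:
            engaged_to[r] = p
        elif rank[r][p] < rank[r][current]:
            engaged_to[r] = p                     # reviewer trades up
            free.append(current)
        else:
            free.append(p)                        # proposal rejected
    return {p: r for r, p in engaged_to.items()}


# Hypothetical instance, for illustration only.
proposers = {"s1": ["r1", "r2", "r3"], "s2": ["r1", "r3", "r2"], "s3": ["r2", "r1", "r3"]}
reviewers = {"r1": ["s2", "s1", "s3"], "r2": ["s1", "s3", "s2"], "r3": ["s3", "s2", "s1"]}
result = gale_shapley(proposers, reviewers)
```

On this instance `result` is `{"s2": "r1", "s1": "r2", "s3": "r3"}` (as a set of pairs). The array indices in the loop depend on private preferences, and the branch taken depends on private comparisons, which is what makes a secure version expensive.
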

  11. Stable Matching Applications: public schools in New York, Boston; Singapore University Admissions; medical residents in US, Canada, others (35,000 applicants).
  12. Stable Matching Applications: public schools in New York, Boston; Singapore University Admissions; medical residents in US, Canada, others (35,000 applicants). Use a trusted third party to run the matching algorithm: it receives all private rankings and keeps them confidential; it produces the correct result (uncorrupted).
  13. Secure Two-Party Stable Matching Protocol: each group trusts one representative. Labels: Doug, Tsinghua.
  14. Secure Two-Party Stable Matching Protocol: each group trusts one representative; XOR-share to 2 non-colluding parties. Labels: Doug, S, T, Tsinghua.
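
XOR-sharing a value between two non-colluding parties can be sketched as follows. The party names S and T come from the slide; the function names and the 32-bit width are assumptions for illustration.

```python
import secrets


def xor_share(value: int, bits: int = 32):
    """Split value into two shares; each share alone is uniformly random."""
    if not 0 <= value < (1 << bits):
        raise ValueError("value does not fit in the given width")
    share_s = secrets.randbits(bits)  # sent to party S
    share_t = value ^ share_s         # sent to party T
    return share_s, share_t


def reconstruct(share_s: int, share_t: int) -> int:
    """Only someone holding both shares can recover the value."""
    return share_s ^ share_t
```

Because `share_s` is uniform and independent of `value`, `share_t` is uniform as well, so neither party learns anything unless the two collude.
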
  15. None
  16. Data-dependent lookup in size-n array

  17. Data-dependent lookup in size-n array. Data-dependent updates to size-n^2 array.

  18. Data-dependent lookup in size-n array. Data-dependent updates to size-n^2 array. Oblivious conditionals: need to always execute all paths.
  19. Data-Oblivious Array Access: a[i] = x. Depends on private data.
  20. Circuit for Array Update. Linear scan: need to touch every array element to hide which one is real. Diagram: for each k = 0, 1, 2, 3, …, inputs (i == k), a[k], and x produce a'[k].
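
A plaintext model of the linear-scan circuit: every element passes through the same select step, so the sequence of memory touches is independent of i. This is a sketch only; `mux`, `oblivious_write`, and `oblivious_read` are hypothetical names, and Python itself gives no constant-time guarantee. In a real protocol the comparison and the select are evaluated as gates on encrypted values.

```python
def mux(select: int, if_one: int, if_zero: int) -> int:
    """Branch-free select for select in {0, 1}."""
    return if_zero ^ (-select & (if_one ^ if_zero))


def oblivious_write(a, i, x):
    """Return a' with a'[i] = x, touching every element in a fixed order."""
    return [mux(int(k == i), x, a[k]) for k in range(len(a))]


def oblivious_read(a, i):
    """Return a[i], again scanning the whole array."""
    result = 0
    for k in range(len(a)):
        result = mux(int(k == i), a[k], result)
    return result
```

For example, `oblivious_write([10, 20, 30, 40], 2, 99)` gives `[10, 20, 99, 40]`, at a cost proportional to the array length for a single update.
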
  21. Linear Scan Doesn’t Scale. Writing a single 32-bit integer: 32 logic gates. Raw Yao’s performance ≈ 3M gates per second. Write speed ≈ 100,000 elements per second (not hiding access pattern). For hiding access pattern, N = 2^17 elements requires > 1 second per access.
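
The figures on this slide follow from simple arithmetic, reproduced below. The three constants are taken from the slide; the variable names are illustrative.

```python
GATES_PER_ELEMENT = 32        # one 32-bit write
GATES_PER_SECOND = 3_000_000  # raw Yao throughput
N = 2 ** 17                   # array size

# Without hiding the access pattern, one write touches one element.
elements_per_second = GATES_PER_SECOND / GATES_PER_ELEMENT

# With a linear scan, one access touches all N elements.
seconds_per_oblivious_access = N * GATES_PER_ELEMENT / GATES_PER_SECOND

print(f"{elements_per_second:,.0f} elements/s")            # 93,750 elements/s
print(f"{seconds_per_oblivious_access:.2f} s per access")  # 1.40 s per access
```
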
  22. Traditional ORAM [Goldreich 1987]: Client, Untrusted Server. Security property: all initialization and access sequences of the same length are indistinguishable to the server. Sublinear client-side state; linear server-side encrypted state. Operations: Initialize, Access.
  23. RAM-SC [Gordon, Katz, Kolesnikov, Krell, Malkin, Raykova, Vahlis 2012]. Diagram labels: Alice, Bob; MPC Protocol; Public ORAM state (twice); Oblivious ORAM state; Encrypted ORAM Data; Encrypted Results; Initialize, Access.
  24. Circuit ORAM: State-of-the-ORAM-Art in 2015. Xiao Wang, Hubert Chan, and Elaine Shi. Circuit ORAM: On Tightness of the Goldreich-Ostrovsky Lower Bound. In ACM CCS 2015. Plot of access time: Θ(log^3 N) vs. linear scan.
  25. Classical Square-Root ORAM [Ostrovsky and Goldreich, 1992]

  26. Problems with SQ-ORAM Design. Requires a PRF for each ORAM access (pseudo-random function: a big circuit in MPC). Initialization requires PRF evaluations. Requires oblivious sort twice: shuffling memory according to PRF, and removing dummy blocks. Solution strategy: use random permutation instead of PRF.
  27. Shuffling Network [Waksman 1968] Cost per shuffle: 5B

  28. 4-Block ORAM

  29. 4-Block ORAM. Cost: 5B + B + 2B + 3B + … = 11B every 3 accesses.
  30. Linear scan cost: 4B = 12B/3. Our scheme cost: 11B/3. Less expensive than linear scan for 4 blocks (8 with overhead).
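
The comparison can be reproduced with exact fractions, in units of B. The terms 5B, B, 2B, 3B and the 3-access period are taken from slide 29. The closed form for the number of switches in a Waksman network on n = 2^k inputs, n·log2(n) − n + 1, is a standard result that I am assuming is where the 5B comes from; the slide does not say so.

```python
from fractions import Fraction


def waksman_switches(n: int) -> int:
    """Switch count of a Waksman network for n a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    log_n = n.bit_length() - 1
    return n * log_n - n + 1


blocks = 4
shuffle = waksman_switches(blocks)     # 5, matching "Cost per shuffle: 5B"
period_cost = shuffle + 1 + 2 + 3      # 5B + B + 2B + 3B = 11B
accesses_per_period = 3

ours = Fraction(period_cost, accesses_per_period)  # 11/3 B per access
linear = Fraction(blocks)                          # 4 B per access = 12/3
```

Since 11/3 < 12/3, the amortized cost is already below a linear scan at 4 blocks, before the overhead that the slide says moves the break-even point to 8.
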
  31. None
  32. Logical index/4 Logical index/2

  33. First Access: read a[8]. Labels: Logical index/4, Logical index/2.

  38. After First Access Used (Public)

  39. Second Access read a[9]

  40. Second Access read a[9] Randomly select unused element

  41. Second Access: read a[9]. Randomly select unused element (at two places in the diagram).
  44. After Second Access

  45. Position map. Diagram rows: 3 0 2 1 | 0 1 2 3 | 1 3 0 2 | 0 1 2 3.
  46. Creating position map

  48. Inverse permutation: Alice picks a random masking permutation. Composed permutation revealed to Bob.
  49. Inverse permutation: Bob computes the inverse of the composed permutation. [Equations garbled in extraction.]
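
One way the masking idea on these two slides can work is sketched below, entirely in the clear. The composition order and the function names are my assumptions, since the slide's equations did not survive extraction. In a real protocol the two compositions would run obliviously on shared data and only the masked permutation would be revealed to Bob.

```python
import random


def compose(p, q):
    """(p ∘ q)[i] = p[q[i]]: apply q first, then p."""
    return [p[q[i]] for i in range(len(q))]


def invert(p):
    inv = [0] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return inv


def masked_inverse(pi, rng=random):
    """Compute pi^-1 while only a masked permutation is inverted in the clear."""
    mask = list(range(len(pi)))
    rng.shuffle(mask)                  # Alice's random masking permutation
    revealed = compose(mask, pi)       # composed permutation, revealed to Bob
    bob_inverse = invert(revealed)     # Bob inverts locally: pi^-1 ∘ mask^-1
    return compose(bob_inverse, mask)  # (pi^-1 ∘ mask^-1) ∘ mask = pi^-1
```

Because `mask` is uniformly random, `revealed` is a uniformly random permutation and tells Bob nothing about `pi`. Note that `random` is not a cryptographic generator; it is used here only to keep the sketch short.
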
  50. Scheme: 1. Shuffle elements. 2. Recreate position map. 3. Service T = √(N log N) accesses. Amortized cost: Θ(√(N log^3 N)) per access.
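
The three steps, together with the access walkthrough on slides 33 to 44, can be modelled in plaintext as below. This is a toy under several assumptions: nothing is encrypted or shared, the position map is a flat dictionary rather than the recursive structure suggested by the "Logical index/4, Logical index/2" labels, the stash lookup is a plain dictionary lookup instead of an oblivious scan, and the class name and the `period` parameter (accesses served between reshuffles) are hypothetical.

```python
import random


class ToySqrtORAM:
    """Plaintext model of the physical access pattern only."""

    def __init__(self, data, period, rng=None):
        if not 1 <= period <= len(data):
            raise ValueError("period must be between 1 and len(data)")
        self.rng = rng or random.Random()
        self.n = len(data)
        self.period = period
        self.logical = list(data)
        self.trace = []  # physical positions an observer would see
        self._reshuffle()

    def _reshuffle(self):
        order = list(range(self.n))
        self.rng.shuffle(order)                                    # 1. shuffle elements
        self.physical = [(idx, self.logical[idx]) for idx in order]
        self.posmap = {idx: pos for pos, idx in enumerate(order)}  # 2. recreate position map
        self.used = [False] * self.n                               # public "used" flags
        self.stash = {}
        self.served = 0

    def access(self, index, new_value=None):
        if index in self.stash:
            # Already fetched this period: touch a random unused element instead.
            pos = self.rng.choice([p for p in range(self.n) if not self.used[p]])
        else:
            pos = self.posmap[index]
        self.used[pos] = True
        self.trace.append(pos)
        idx, val = self.physical[pos]
        self.stash.setdefault(idx, val)   # whatever was touched moves to the stash
        if new_value is not None:
            self.stash[index] = new_value
        value = self.stash[index]
        self.served += 1
        if self.served == self.period:    # 3. after `period` accesses, start over
            for idx, val in self.stash.items():
                self.logical[idx] = val
            self._reshuffle()
        return value
```

Within one period every entry of `trace` is a distinct physical position, whether or not the same logical index was requested twice, which is the property the public "used" flags are there to maintain.
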
  51. Initialization cost

  52. Per-Access Cost (not counting initialization). Plots of per-access cost for 16-byte blocks and 32-byte blocks. Good enough for National Residency Match?
  53. Cost of Matching. Best previous result: 128x128 pairs in > 1000 hours [Keller & Scholl 2014]. Using Square-Root ORAM: 512x512 pairs in 33 hours. Scale needed for national residency match: 35,000. Need 1000x improvement…
  54. Scaling to National Match. Roth-Peranson: asymmetric matchings (algorithm that is actually used for NRMP, school matchings, etc.). Initialize state by permuting and interleaving. Take advantage of data-independent memory patterns: locality, batching, partitioning.
  55. Oblivious Multilist

  56. Simulated 2016 US National Medical Residency Match: 35,476 prospective residents matching with 4836 programs with 30,750 total slots. Running between 2 EC2.c4xlarge nodes in same region (1 Gbps).

      Phase           Time         Non-Free Gates  Gates/second
      Initialization   2.07 hours   34 B           4.57 M
      Bidding         15.01 hours  173 B           3.19 M
      Total           17.08 hours  207 B           3.36 M
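
As a consistency check, the gates-per-second column can be recomputed from the time and gate-count columns. The inputs below are the rounded values from the slide, so the recomputed rates agree with the reported ones only to within that rounding; the variable names are illustrative.

```python
# name: (hours, non-free gates, reported M gates/second), copied from the slide
phases = {
    "Initialization": (2.07, 34e9, 4.57),
    "Bidding": (15.01, 173e9, 3.19),
    "Total": (17.08, 207e9, 3.36),
}

recomputed = {}
for name, (hours, gates, reported) in phases.items():
    recomputed[name] = gates / (hours * 3600) / 1e6  # millions of gates per second
    print(f"{name:<15}{recomputed[name]:6.2f} M/s (reported {reported:.2f} M/s)")
```

The phase rows also add up: 2.07 + 15.01 = 17.08 hours and 34 B + 173 B = 207 B gates.
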
  57. University of Virginia, Charlottesville, Virginia. Jack Doerner, Samee Zahur.

  58. David Evans evans@virginia.edu www.cs.virginia.edu/evans oblivc.org