
Oblivious Data Abstractions: Circuit Structures and Square-Root ORAM

Tutorial at Summer School on Secure and Oblivious Computation and Outsourcing
Notre Dame University, 10 May 2016

David Evans

Transcript

  1. Plan: Memory for Data-Oblivious Computation

    • Special-purpose data structures: circuit structures
    • Aside: programming secure computations with Obliv-C
    • General-purpose memory: Square-Root ORAM
  2. Crazy Things in Typical Code

    a[i] = x
    Samee Zahur and David Evans. Circuit Structures for Improving Efficiency of Security & Privacy Tools. IEEE Security and Privacy (Oakland) 2013.
  3. Circuit for Array Update

    a[i] = x
    Figure: one multiplexer per element. For each j (shown for j = 0, 1, 2, …), the test i == j selects between x and a[j] to produce a'[j].
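The per-element multiplexer circuit for a[i] = x can be modeled in plain C as a scan that rewrites every slot. This is only a sketch of the circuit's shape: `mux` and `oblivious_write` are hypothetical names, and an ordinary C compiler gives no constant-time guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branch-free select: returns x when cond is 1 and old when cond is 0.
   Stands in for one multiplexer of the circuit. */
static uint32_t mux(uint32_t cond, uint32_t x, uint32_t old) {
    uint32_t mask = (uint32_t)0 - cond;   /* all ones if cond == 1, else zero */
    return (x & mask) | (old & ~mask);
}

/* Hypothetical helper: performs a[i] = x by rewriting all n elements,
   so the sequence of memory accesses does not depend on i. */
void oblivious_write(uint32_t *a, size_t n, size_t i, uint32_t x) {
    assert(a != NULL || n == 0);
    for (size_t j = 0; j < n; j++) {
        uint32_t hit = (uint32_t)(j == i);   /* the "i == j" comparison */
        a[j] = mux(hit, x, a[j]);            /* a'[j] = hit ? x : a[j] */
    }
}
```

Calling `oblivious_write(a, n, 2, 99)` changes only `a[2]`, but the cost is n comparisons and n multiplexers for a single update.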
  4. Easy (and Common) Case

    for (i = 0; i < n; i++) a[i] += 1
    Figure: a[0], a[1], a[2], …, a[n-1], each feeding its own +1 gate.
  5. Locality: Stacks and Queues

    Original code:
      if (x != 0)
        a[i] += 1
        if (a[i] > 10)
          i += 1
        a[i] = 5
    Data-oblivious code (no branching allowed):
      t := a.top() + 1
      a.cond_update(x != 0, t)
      a.cond_push(x != 0 && t > 10, *)
      a.cond_update(x != 0, 5)
  6. Naïve Conditional Push

    Figure: inputs p, x and a[0], a[1], a[2], …; outputs a'[0], a'[1], a'[2], ….
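One plausible reading of the naïve conditional push is that every slot is rewritten on every call, with the condition p choosing between "shift down" and "stay". The plain-C sketch below assumes that reading, a fixed capacity, and a[0] as the top of the stack; `mux` and `naive_cond_push` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branch-free select: returns x when cond is 1 and old when cond is 0. */
static uint32_t mux(uint32_t cond, uint32_t x, uint32_t old) {
    uint32_t mask = (uint32_t)0 - cond;
    return (x & mask) | (old & ~mask);
}

/* a[0] is the top of a stack with n slots. When p == 1 every element moves
   down one slot and x becomes the new top; when p == 0 nothing changes.
   All n slots are rewritten in both cases. If the stack is already full,
   the bottom element is dropped (an assumption of this sketch). */
void naive_cond_push(uint32_t *a, size_t n, uint32_t p, uint32_t x) {
    assert(p <= 1);
    for (size_t j = n; j > 1; j--) {
        a[j - 1] = mux(p, a[j - 2], a[j - 1]);   /* a'[j-1] = p ? a[j-2] : a[j-1] */
    }
    if (n > 0) {
        a[0] = mux(p, x, a[0]);                  /* a'[0] = p ? x : a[0] */
    }
}
```

Each push costs n multiplexers regardless of p, which is the linear cost that a more efficient stack design aims to avoid.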
  7. More Efficient Stack

    Level 0: 2 9 3 (t = 3)
    Level 1: 4 7 5 4 (t = 2)
    Level 2: 8 8 2 3 8 6 … (t = 3)
    Block size = 2^level
    Each level has 5 blocks, at least 2 full and 2 empty
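  9. Figure: conditional pushes on the hierarchical stack

    Start: Level 0: 2 9 3 (t = 3); Level 1: 4 7 5 4 (t = 2); Level 2: t = 3
    Conditional push (True, 7): Level 0: 7 2 9 3 (t = 4); Level 1: 4 7 5 4 (t = 2); Level 2: t = 3
    Conditional push (True, 8): Level 0: 8 7 2 9 3 (t = 5); Level 1: 4 7 5 4 (t = 2); Level 2: t = 3
    Shift: Level 0: 8 2 7 (t = 3); Level 1: 4 7 5 4 9 3 (t = 3); Level 2: t = 3
    Amortized Θ(log n) gates per operation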
  10. Batching Array Accesses

    Execution trace (indexes and values are private values):
    m[0] = 'A', m[2] = 'U', m[9] = 'M', m[7] = 'R', m[0] = 'D', m[9] = 'Y', m[9] = 'K', m[7] = 'C'
    Figure: array m with indexes 0, 2, 7, 9 and the values 'A', 'U', 'M', 'R', 'D', 'Y', 'K', 'C'.
  11. Batching Updates

    Trace: m[0] = 'A', m[2] = 'U', m[9] = 'M', m[7] = 'R', m[0] = 'D', m[9] = 'Y', m[9] = 'K', m[7] = 'C'
    Sort by key (stable sort!): m[0] = 'A', m[0] = 'D', m[2] = 'U', m[7] = 'R', m[7] = 'C', m[9] = 'M', m[9] = 'Y', m[9] = 'K'
  13. Batching Updates

    Trace: m[0] = 'A', m[2] = 'U', m[9] = 'M', m[7] = 'R', m[0] = 'D', m[9] = 'Y', m[9] = 'K', m[7] = 'C'
    Sort by key (stable sort!): m[0] = 'A', m[0] = 'D', m[2] = 'U', m[7] = 'R', m[7] = 'C', m[9] = 'M', m[9] = 'Y', m[9] = 'K'
    Compare adjacent (over the sorted list): m[0] = 'A', m[0] = 'D', m[2] = 'U', m[7] = 'R', m[7] = 'C', m[9] = 'M', m[9] = 'Y', m[9] = 'K'
  16. Batching Updates

    Trace: m[0] = 'A', m[2] = 'U', m[9] = 'M', m[7] = 'R', m[0] = 'D', m[9] = 'Y', m[9] = 'K', m[7] = 'C'
    Sort by key (stable sort!): m[0] = 'A', m[0] = 'D', m[2] = 'U', m[7] = 'R', m[7] = 'C', m[9] = 'M', m[9] = 'Y', m[9] = 'K'
    Compare adjacent (over the sorted list): m[0] = 'A', m[0] = 'D', m[2] = 'U', m[7] = 'R', m[7] = 'C', m[9] = 'M', m[9] = 'Y', m[9] = 'K'
    Sort by liveness:
    • Output wires: m[0] = 'D', m[2] = 'U', m[7] = 'C', m[9] = 'K'
    • Discarded: m[0] = 'A', m[7] = 'R', m[9] = 'M', m[9] = 'Y'
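The three batching steps (stable sort by key, compare adjacent, sort by liveness) can be traced in plain C. In this sketch an insertion sort stands in for the data-oblivious sorting network a real circuit would use, and the `Write` struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int key; char val; int live; } Write;

/* Stable insertion sort. An entry moves left only past entries it must
   strictly precede, so equal entries keep their original order. */
static void stable_sort(Write *w, size_t n,
                        int (*before)(const Write *, const Write *)) {
    for (size_t i = 1; i < n; i++) {
        Write cur = w[i];
        size_t j = i;
        while (j > 0 && before(&cur, &w[j - 1])) {
            w[j] = w[j - 1];
            j--;
        }
        w[j] = cur;
    }
}

static int by_key(const Write *a, const Write *b) { return a->key < b->key; }
static int by_liveness(const Write *a, const Write *b) { return a->live > b->live; }

/* Reorders w so the surviving (last) write for each key comes first,
   and returns how many writes survive. */
size_t batch_writes(Write *w, size_t n) {
    assert(w != NULL || n == 0);
    stable_sort(w, n, by_key);                    /* step 1: sort by key */
    size_t live = 0;
    for (size_t i = 0; i < n; i++) {              /* step 2: compare adjacent */
        w[i].live = (i + 1 == n) || (w[i].key != w[i + 1].key);
        live += (size_t)w[i].live;
    }
    stable_sort(w, n, by_liveness);               /* step 3: sort by liveness */
    return live;
}
```

On the eight-write trace from the slides, the four surviving writes end up first, in the order 'D', 'U', 'C', 'K'. The first sort has to be stable so that the last write to a key is still last within its run of equal keys.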
  17. Example Application: DBScan

    Density-based clustering: depth-first search to find dense clusters
    Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu. KDD 1996.
    Figure: Alice’s Data, Bob’s Data, Joint Clusters
  18. Private input: P – array of points (combines private points from both parties)

    Public inputs: minpts, radius
    Output: cluster number for each point
    Conditional Push! Array update!
  19. Figure: execution time (seconds) against data size (60, 120, 240, 480)

    Series: Optimized Structures, Normal Data Structures. Annotations: 9.7 hours, 55 minutes.
  20. Data-Oblivious Memory

    • Specialized memory access: circuit structures, protocol agnostic (stacks, queues, batched map operations)
    • General random access: Oblivious RAM
    But first… Obliv-C
  22. Tools for Building Secure Computations

    • Library-based frameworks: circuit-level programs, full control, low-level programming, little type safety
    • High-level languages: little control, high-level programming, strong type safety
    PICCO
  23. Tools for Building Secure Computations

    • Library-based frameworks: circuit-level programs, full control, low-level programming, little type safety
    • High-level languages: little control, high-level programming, strong type safety
    • High-level programming, low-level customizability, helpful and escapable type checking
  24. Obliv-C

    #include <stdio.h>
    #include <million.h>

    int main(int argc, char *argv[]) {
      ProtocolDesc pd;
      ProtocolIO io;
      int p = (argv[1][0] == '1' ? 1 : 2);   // party number from the first argument
      sscanf(argv[2], "%d", &io.myinput);    // this party's private input
      // ... set up TCP connections
      setCurrentParty(&pd, p);
      execYaoProtocol(&pd, millionaire, &io);
      printf("Result: %d\n", io.cmp);
      // ... cleanup
    }
  25. Escaping

    ~obliv(var) { … }
    • Code inside ~obliv always executes, regardless of the oblivious condition
    • var is Boolean: the oblivious condition
    Programmer has control! But this is not a security risk: all private data is still encrypted.
  26. Why Mercury?

    Speed of sound:
    • Air: 343 m/s
    • Mercury: 1450 m/s (40 °C)
    • Water: 1500 m/s (25 °C)
  30. Linear Scan Doesn’t Scale

    • Writing a single 32-bit integer: 32 logic gates
    • Raw Yao’s performance ≈ 1 million gates per second
    • Write speed ≈ 31,250 elements per second (not hiding access pattern)
    • For hiding access pattern, 2^16 elements per write: around 2 seconds per access
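The linear-scan figures follow from the slide's own numbers, taking the element count per hidden write as $2^{16}$ (read from the superscript lost in the transcript) and the gate rate as exactly $10^{6}$ per second:

```latex
\begin{align*}
\text{write rate (pattern visible)} &\approx \frac{10^{6}\ \text{gates/s}}{32\ \text{gates/element}} = 31{,}250\ \text{elements/s} \\
\text{gates per hidden access} &= 2^{16} \times 32 = 2{,}097{,}152 \\
\text{time per hidden access} &\approx \frac{2{,}097{,}152\ \text{gates}}{10^{6}\ \text{gates/s}} \approx 2.1\ \text{s}
\end{align*}
```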
  31. SC-ORAM in Practice

    • Only faster than linear scan for large memories (> 2^14 blocks)
    • Expensive to initialize:
      – Repeated writes
      – 2 weeks!
    From Yan’s talk this morning
  32. Circuit ORAM

    Figure: access time
    Xiao Wang, Hubert Chan, and Elaine Shi. Circuit ORAM: On Tightness of the Goldreich-Ostrovsky Lower Bound. In ACM CCS 2015.
  33. SC-ORAM in Practice

    • Only faster than linear scan for large memories (> 2^14 blocks)
    • Expensive to initialize:
      – Repeated writes
      – 2 weeks!
    From Yan’s talk this morning
    Labels added on this slide: Circuit ORAM, Square-Root ORAM
  35. Problems with GO’s SQ-ORAM

    • Requires a PRF for each ORAM access
      – PRF is a big circuit in MPC
    • Initialization requires PRF evaluations
    • Requires oblivious sort twice:
      – Shuffling memory according to PRF
      – Removing dummy blocks
    Solution strategy: use random permutation instead of PRF
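To illustrate the permutation-based strategy, here is a heavily simplified, single-machine sketch of a square-root ORAM access loop in plain C. It is not the construction from the paper. The names, the sizes `N` and `T`, the use of `rand()`, and the rule for choosing a substitute block are all assumptions, and the secret-dependent branches and the plain `pos` lookup would have to be replaced by oblivious operations in a real MPC protocol.

```c
#include <assert.h>
#include <stdlib.h>

enum { N = 16, T = 4 };   /* N blocks; reshuffle every T accesses (T about sqrt(N)) */

typedef struct {
    int data[N];       /* data[p]: value stored at physical position p */
    int pos[N];        /* pos[i]: physical position of logical block i */
    int in_stash[N];   /* in_stash[i]: block i was fetched this period */
    int stash_idx[T];
    int stash_val[T];
    int count;         /* accesses so far in this period */
    int last_phys;     /* physical position touched by the latest access */
} SqrtOram;

/* Write stashed blocks back, then draw a fresh permutation (Fisher-Yates).
   rand() is a placeholder for a secret, jointly generated permutation. */
static void reshuffle(SqrtOram *o) {
    int val[N];
    for (int i = 0; i < N; i++) val[i] = o->data[o->pos[i]];
    for (int s = 0; s < o->count; s++) val[o->stash_idx[s]] = o->stash_val[s];
    for (int i = N - 1; i > 0; i--) {
        int j = rand() % (i + 1);
        int tmp = o->pos[i]; o->pos[i] = o->pos[j]; o->pos[j] = tmp;
    }
    for (int i = 0; i < N; i++) { o->data[o->pos[i]] = val[i]; o->in_stash[i] = 0; }
    o->count = 0;
}

void oram_init(SqrtOram *o, const int *initial) {
    for (int i = 0; i < N; i++) { o->pos[i] = i; o->data[i] = initial[i]; }
    o->count = 0;
    reshuffle(o);
}

/* Read logical block i, overwriting it first when do_write is nonzero. */
int oram_access(SqrtOram *o, int i, int do_write, int new_val) {
    assert(i >= 0 && i < N);
    /* Fetch i if it is not stashed yet; otherwise fetch another unstashed
       block, so each access touches one fresh physical position. */
    int fetch = i;
    if (o->in_stash[i]) {
        fetch = 0;
        while (o->in_stash[fetch]) fetch++;
    }
    o->last_phys = o->pos[fetch];
    o->stash_idx[o->count] = fetch;
    o->stash_val[o->count] = o->data[o->last_phys];
    o->in_stash[fetch] = 1;
    o->count++;

    int result = 0;
    for (int s = 0; s < o->count; s++) {   /* scan the stash for block i */
        if (o->stash_idx[s] == i) {
            if (do_write) o->stash_val[s] = new_val;
            result = o->stash_val[s];
        }
    }
    if (o->count == T) reshuffle(o);       /* end of period: write back and reshuffle */
    return result;
}
```

Within one period every access touches a physical position that has not been touched since the last reshuffle, so repeated requests for the same logical block do not repeat a physical address. No PRF evaluation and no dummy blocks are needed in this sketch.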
  36. Linear scan cost: 4B = 12B/3

    Our scheme cost: 11B/3
    Less expensive than linear scan for 4 blocks (8 with overhead)
  37. Figure: wall-clock time in seconds for full protocol between two EC2 C4.2xlarge nodes (1.03 Gbps)

    Annotation: ∼32 minutes, 55,000x standard execution
  38. Figure: wall-clock time in seconds for full protocol between two EC2 C4.2xlarge nodes (1.03 Gbps)

    Annotation: ∼33 hours (“wikipedia” version), improved to ∼1 hour with custom structures
  39. Open Problems

    • Scalability: poly-logarithmic hierarchical ORAM design
    • Automatic optimization: using custom data structures when memory access is predictable
    • Stronger security models: active security
      – All results are in the semi-honest model
    • Establishing Meaningful Trust
    64 KB memory, 1 s access (∼2000x improvement)
  40. Collaborators

    • Samee Zahur (UVA)
    • Jack Doerner (UVA)
    • Adrià Gascón (U of Edinburgh)
    • Jonathan Katz (U Maryland)
    • Mariana Raykova (SRI, Yale)
    • Xiao Wang (U Maryland)
    Paper: Revisiting Square-Root ORAM: Efficient Random Access in Multi-Party Computation. IEEE Symposium on Security and Privacy (Oakland), May 2016. http://oblivc.org/docs/sqoram.pdf
    Code: http://oblivc.org