
From Mercury Delay Lines to Magnetic Core Memories: Progress in Oblivious Memories

Talk at Theory and Practice of Secure Multi-Party Computation in Aarhus, Denmark
1 June 2016

David Evans

July 03, 2016

Transcript

  1. From Mercury Delay Lines to Magnetic Core Memories:
    Progress in Oblivious Memories
    David Evans
    University of Virginia
    www.cs.virginia.edu/evans
    oblivc.org
    Theory and Practice of Secure Multiparty Computation 2016
    Aarhus University
    1 June 2016

  2. Building MPC Applications
    Application-Specific: Custom Protocols; Custom Data Structures; Data-Oblivious Algorithms
    General Purpose: Generic Protocols (e.g., Yao’s); Library Data Structures; General-Purpose ORAM; Standard Algorithms

  3. Setting
    Semi-honest model
    Two-party computation
    Mostly standard assumptions
    (although implementation uses Free-XOR)

  4. Oblivious Data Structures
    Samee Zahur and David Evans. Circuit Structures
    for Improving Efficiency of Security & Privacy Tools.
    IEEE Security and Privacy (Oakland) 2013.

  5. Crazy Things in Typical Code
    a[i] = x

  6. Circuit for Array Update
    a'[0] = (i == 0) ? x : a[0]
    a'[1] = (i == 1) ? x : a[1]
    a'[2] = (i == 2) ? x : a[2]
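
One way to read the circuit for `a[i] = x` is as one multiplexer per array slot, so a write with a secret index costs work linear in the array length no matter which slot is hit. The following is a minimal plaintext sketch of that idea. The `mux` and `oblivious_write` names are hypothetical, and in a real protocol `i` and `x` would be secret wires rather than Python values.

```python
def mux(select, if_true, if_false):
    """Stand-in for a multiplexer gate; a circuit evaluates both inputs."""
    return if_true if select else if_false


def oblivious_write(a, i, x):
    """Return a' where a'[k] = x if k == i, else a[k], touching every slot."""
    return [mux(k == i, x, a[k]) for k in range(len(a))]


updated = oblivious_write([10, 20, 30], 1, 99)
```

Every slot is rewritten on every call, which is what keeps the access pattern independent of `i`.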

  7. Easy (and Common) Case
    for (i = 0; i < n; i++)
        a[i] += 1
    [Figure: a[0], a[1], a[2], …, a[n-1], each feeding its own +1 circuit]

  8. Locality: Stacks and Queues
    if (x != 0)
        a[i] += 1
        if (a[i] > 10)
            i += 1
        a[i] = 5
    Data-oblivious code (no branching allowed):
    t := a.top() + 1
    a.cond_update(x != 0, t)
    a.cond_push(x != 0 && t > 10, *)
    a.cond_update(x != 0, 5)
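
The translation from branching code to conditional stack operations can be checked on plaintext values. This is a sketch under stated assumptions: `ScanStack`, `mux`, `branching_version`, and `oblivious_version` are hypothetical names, the stack is a fixed-capacity list updated by a full scan, and the don't-care push value (`*` on the slide) is taken to be 0.

```python
def mux(cond, if_true, if_false):
    """Stand-in for a multiplexer gate."""
    return if_true if cond else if_false


class ScanStack:
    """Fixed-capacity stack; every operation visits every slot."""

    def __init__(self, items, capacity):
        self.slots = list(items) + [0] * (capacity - len(items))
        self.size = len(items)  # would be a secret value in a real protocol

    def top(self):
        result = 0
        for k in range(len(self.slots)):
            result = mux(k == self.size - 1, self.slots[k], result)
        return result

    def cond_update(self, cond, value):
        for k in range(len(self.slots)):
            self.slots[k] = mux(cond and k == self.size - 1, value, self.slots[k])

    def cond_push(self, cond, value):
        for k in range(len(self.slots)):
            self.slots[k] = mux(cond and k == self.size, value, self.slots[k])
        self.size += mux(cond, 1, 0)


def branching_version(items, x):
    a = list(items) + [0]  # one spare slot for the possible push
    i = len(items) - 1
    if x != 0:
        a[i] += 1
        if a[i] > 10:
            i += 1
        a[i] = 5
    return a[:i + 1]


def oblivious_version(items, x):
    a = ScanStack(items, len(items) + 1)
    t = a.top() + 1
    a.cond_update(x != 0, t)
    a.cond_push(x != 0 and t > 10, 0)  # pushed value is a don't-care
    a.cond_update(x != 0, 5)
    return a.slots[:a.size]
```

Both versions agree on the final stack contents, but only the second performs the same sequence of operations for every input.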

  9. Naïve Conditional Push
    p
    x a[0] a[1] a[2] …
    a’[0] a’[1] a’[2] …

  10. Naïve Conditional Push
    p = True
    x, a: 7 2 9 3 …
    a’: 7 2 9 …
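
The naive conditional push can be written as one multiplexer per slot: slot 0 chooses between x and its old value, and every other slot chooses between its left neighbour and its old value. This is a minimal plaintext sketch. The function name is hypothetical, and the list is assumed to have fixed capacity with a spare slot at the end, so the last element is dropped on a push.

```python
def naive_cond_push(p, x, a):
    """a'[0] = p ? x : a[0];  a'[k] = p ? a[k-1] : a[k] for k > 0."""
    shifted = [x] + a[:-1]  # what every slot would hold after a push
    # One multiplexer per slot, so the cost is linear in the capacity.
    return [new if p else old for new, old in zip(shifted, a)]


pushed = naive_cond_push(True, 7, [2, 9, 3, 0])
skipped = naive_cond_push(False, 7, [2, 9, 3, 0])
```

With `p` true, the values match the slide's example: 7 enters at the front and 2, 9, 3 each shift one slot to the right.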

  11. More Efficient Stack
    Level 0: 2 9 3 (t = 3)
    Level 1: 4 7 | 5 4 (t = 2)
    Level 2: 8 8 2 3 8 6 (t = 3)
    Block size = 2^level
    Each level has 5 blocks, at least 2 full and 2 empty
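
A rough cost argument for the hierarchical layout, under assumptions the slide does not state: level k (5 blocks of 2^k elements each) only needs to be restructured once every 2^k operations, and restructuring costs a hypothetical constant c per element. Summing the per-operation share of each level over roughly log₂ n levels gives:

```latex
\[
\text{amortized cost per operation}
  \;=\; \sum_{k=0}^{\lceil \log_2 n \rceil} \frac{c \cdot 5 \cdot 2^{k}}{2^{k}}
  \;=\; 5c \,\bigl(\lceil \log_2 n \rceil + 1\bigr)
  \;=\; \Theta(\log n)
\]
```

Each level contributes a constant share, so the total grows with the number of levels rather than with the stack size. The naive push, by contrast, is linear in the stack size.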

  12. Efficient queue operations

  13. Spatial Locality
    Not just for stacks and queues
    Access cost: Θ(log n)

  14. Temporal
    Batching
    Θ(n log² n)
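
The slide does not say how batching is implemented. One standard data-oblivious building block whose comparator count is Θ(n log² n) is a sorting network such as bitonic sort, and the sketch below assumes that this is where the bound comes from. The function name is hypothetical, and the input length must be a power of two.

```python
def bitonic_sort(values):
    """Sort with a fixed comparator schedule; returns (sorted list, comparator count)."""
    a = list(values)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    comparators = 0
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # distance between compared positions
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    lo, hi = min(a[i], a[partner]), max(a[i], a[partner])
                    a[i], a[partner] = (lo, hi) if ascending else (hi, lo)
                    comparators += 1
            j //= 2
        k *= 2
    return a, comparators


result, count = bitonic_sort([7, 2, 9, 3, 8, 6, 5, 4])
```

The positions compared depend only on n, never on the data, which is what makes the network usable inside a circuit. For n = 8 the schedule has 6 stages of 4 comparators each.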

  15. Example Application: DBScan
    Density-based clustering:
    depth-first search to find dense clusters
    Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu. KDD 1996
    [Figure panels: Alice’s Data, Bob’s Data, Joint Clusters]

  16. Private Input: P – array of points
    (combines private points from both parties)
    Public inputs: minpts, radius
    Output: cluster number for each point
    Conditional Push!
    Array update!

  17. [Chart: Execution Time (seconds), axis 0 to 40000, vs. Data Size (60, 120, 240, 480)]
    Series: Optimized Structures, Normal Data Structures
    Annotations: 9.7 hours, 55 minutes

  18. Data-Oblivious Memory
    Specialized memory access
    Circuit structures, protocol agnostic
    Stacks, queues, batched map operations
    General random access
    Oblivious RAM
    But first… Obliv-C

  19. Tools for Building Secure Computations
    Library-based frameworks: Circuit-level programs; Full control; Low-level programming; Little type safety
    High-level languages: Little control; High-level programming; Strong type safety

  20. Tools for Building Secure Computations
    Library-based frameworks: Circuit-level programs; Full control; Low-level programming; Little type safety
    High-level languages: Little control; High-level programming; Strong type safety
    High-level programming; Low-level customizability; Helpful, escapable type checking

  21. Obliv-C

  22. Obliv-C
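    #include <stdio.h>
    #include <obliv.h>
    // plus the header that declares ProtocolIO and millionaire (not shown on the slide)
    int main(int argc, char *argv[]) {
      ProtocolDesc pd;
      ProtocolIO io;
      // argv[1] is a string, so compare its first character
      int p = (argv[1][0] == '1' ? 1 : 2);
      sscanf(argv[2], "%d", &io.myinput);
      // ... set up TCP connections
      setCurrentParty(&pd, p);
      execYaoProtocol(&pd, millionaire, &io);
      // io is a struct, not a pointer, so use io.cmp
      printf("Result: %d\n", io.cmp);
      // ... cleanup
    }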

  23. Oblivious Conditionals
    from obinary_search

  24. Actual code…with all the ugly parts

  25. Escaping
    ~obliv(var) {

    }
    Code inside ~obliv always executes,
    regardless of the oblivious condition.
    var is a Boolean holding the enclosing oblivious condition.
    The programmer has control, but this is not a security risk: all private data is still encrypted.

  26. Implementing Oblivious Queue

  27. (no text on slide)

  28. http://oblivc.org/

  29. Historical Excursion
    Journal of the ACM, January 1968

  30. (In the same January 1968 JACM issue as the Waksman network!)

  31. Delay Lines

  32. Mercury Delay Lines
    0/1

  33. Why Mercury?
    Speed of Sound
    Air      343 m/s
    Mercury  1450 m/s (40 °C)
    Water    1500 m/s (25 °C)

  34. (no text on slide)

  35. (no text on slide)

  36. Magnetic Core Memory
    MIT Project Whirlwind, 1951
    2K 16-bit words with “no waiting”!

  37. Oblivious RAM

  38. Linear Scan Doesn’t Scale
    Writing a single 32-bit integer: 32 logic gates
    Raw Yao’s performance ≈ 3M gates per second
    Write speed ≈ 100,000 elements per second
    (not hiding access pattern)
    To hide the access pattern, a linear scan over N = 2^17 elements
    requires > 1 second per access
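
The figures on this slide follow from two numbers it gives: 32 gates per 32-bit write and roughly 3M gates per second. A short sketch of the arithmetic, with hypothetical function names and the assumption that a linear scan performs one conditional write per element:

```python
GATES_PER_WRITE = 32          # from the slide: one 32-bit integer write
GATES_PER_SECOND = 3_000_000  # from the slide: raw Yao's throughput


def writes_per_second():
    """Writes per second when the location is public (no access-pattern hiding)."""
    return GATES_PER_SECOND / GATES_PER_WRITE


def linear_scan_seconds_per_access(n_elements):
    """Seconds per access when every element is conditionally rewritten."""
    return n_elements * GATES_PER_WRITE / GATES_PER_SECOND


rate = writes_per_second()
scan_time = linear_scan_seconds_per_access(2 ** 17)
```

The rate comes out just under the slide's rounded 100,000 elements per second, and a scan over 2^17 elements needs about 4.2M gates, which is more than one second of work at 3M gates per second.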

  39. Traditional ORAM
    [Goldreich 1987]
    Client: sublinear client-side state
    Untrusted Server: linear server-side encrypted state
    Operations: Initialize, Access
    Security property: all initialization and access sequences
    of the same length are indistinguishable to the server.

  40. RAM-SC
    [Gordon, Katz, Kolesnikov, Krell, Malkin, Raykova, Vahlis 2012]
    [Diagram labels: Alice, Bob, MPC Protocol, Public ORAM state (shown twice), Oblivious ORAM state, Encrypted Results]
    Operations: Initialize, Access

  41. Circuit ORAM Access time
    Xiao Wang, Hubert Chan, and Elaine Shi. Circuit ORAM: On Tightness of the
    Goldreich-Ostrovsky Lower Bound. In ACM CCS 2015.
    State-of-the-ORAM-Art in 2015
    [Chart labels: Θ(log³ N), Linear scan]

  42. Results Summary
    (Not including initialization cost)
    + ~ 1 week to initialize

  43. Classical Square-Root ORAM

  44. Problems with SQ-ORAM Design
    • Requires a PRF for each ORAM access
    – PRF is a big circuit in MPC
    • Initialization requires PRF evaluations
    • Requires oblivious sort twice:
    – Shuffling memory according to PRF
    – Removing dummy blocks
    Solution strategy: use random permutation instead of PRF

  45. Shuffling Network [Waksman 1968]
    Cost per shuffle: 5B

  46. 4-Block ORAM

  47. 4-Block ORAM
    Cost: 5B + B + 2B + 3B + …
    = 11B every 3 accesses

  48. Linear scan
    Cost: 4B = 12B/3
    Our scheme
    Cost: 11B/3
    Less expensive than linear scan for 4 blocks (8 with overhead)
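
The 11B versus 12B comparison can be reproduced with a few lines of arithmetic. This sketch measures cost in units of B and rests on two assumptions that are a reading of the slides, not something they state outright: the shuffle cost of 5B corresponds to the switch count of a Waksman network on 4 inputs (n·log₂n − n + 1), and the k-th access in a period costs k·B. The function names are hypothetical.

```python
def waksman_switches(n):
    """Switch count of a Waksman network on n = 2^k inputs: n*log2(n) - n + 1."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    log2_n = n.bit_length() - 1
    return n * log2_n - n + 1


def scheme_cost_per_period(n_blocks, accesses):
    """Shuffle once, then pay 1, 2, ..., accesses units for successive accesses."""
    shuffle = waksman_switches(n_blocks)
    access = sum(range(1, accesses + 1))
    return shuffle + access


def linear_scan_cost(n_blocks, accesses):
    """Every access touches every block."""
    return n_blocks * accesses


ours = scheme_cost_per_period(4, 3)
scan = linear_scan_cost(4, 3)
```

For 4 blocks and 3 accesses this gives 5 + 6 = 11 units against 12 for the linear scan, matching the slide.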

  49. (no text on slide)

  50. Logical index/4
    Logical index/2

  51-55. First Access (five slides with identical text)
    Logical index/4
    Logical index/2
    read a[8]

  56. After First Access
    Used (Public)

  57. Second Access
    read a[9]

  58. Second Access
    read a[9]
    Randomly select unused element

  59-61. Second Access (three slides with identical text)
    read a[9]
    Randomly select unused element
    Randomly select unused element

  62. After Second Access

  63. Position map
    3 0 2 1
    0 1 2 3
    1 3 0 2
    0 1 2 3

  64-65. Creating position map

  66. Inverse permutation
    [Equation symbols lost in extraction]
    Alice picks a random masking permutation
    Composed permutation revealed to Bob

  67. Inverse permutation
    Bob computes … [equation symbols lost in extraction]
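
The equations on these two slides did not survive extraction, so the following is only one plausible reading of the masking idea, run on plaintext lists. It assumes the secret permutation π is composed with Alice's random mask ρ as π∘ρ, that Bob inverts the revealed value in the clear, and that ρ is then applied again inside the protocol to cancel the mask. All function names are hypothetical.

```python
import random


def compose(f, g):
    """(f ∘ g)[i] = f[g[i]] for permutations stored as lists."""
    return [f[g[i]] for i in range(len(g))]


def invert(p):
    """Plain inverse: inv[p[i]] = i."""
    inv = [0] * len(p)
    for i, target in enumerate(p):
        inv[target] = i
    return inv


def masked_inverse(pi, rng):
    rho = list(range(len(pi)))
    rng.shuffle(rho)                   # Alice's random masking permutation
    revealed = compose(pi, rho)        # π∘ρ: the only value Bob sees
    bob_inverse = invert(revealed)     # ρ⁻¹∘π⁻¹, computed by Bob in the clear
    return compose(rho, bob_inverse)   # ρ∘ρ⁻¹∘π⁻¹ = π⁻¹


inverse = masked_inverse([3, 0, 2, 1], random.Random(1))
```

Because ρ is uniformly random and unknown to Bob, the revealed composition is itself a uniformly random permutation and tells him nothing about π.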

  68. Scheme
    1. Shuffle elements
    2. Recreate position map
    3. Service T = √(n log n) accesses
    Amortized cost: Θ(√(n log³ n)) per access
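
The three steps can be modeled in plaintext to see why each physical slot is touched at most once between shuffles. This is a sketch with several simplifying assumptions, not the published construction: there is no cryptography, the position map is an ordinary dictionary rather than a recursive ORAM, the period is ⌊√n⌋ for readability, and the class and attribute names (including `stash`) are hypothetical.

```python
import math
import random


class SqrtOramSketch:
    """Plaintext model of a dummy-free square-root ORAM."""

    def __init__(self, data, seed=0):
        self.n = len(data)
        self.period = max(1, math.isqrt(self.n))  # assumed period length
        self.rng = random.Random(seed)
        self.blocks = dict(enumerate(data))       # logical index -> value
        self.trace = []                           # physical slots touched
        self._reshuffle()

    def _reshuffle(self):
        # Steps 1 and 2: shuffle the elements and recreate the position map.
        order = list(range(self.n))
        self.rng.shuffle(order)
        self.position = {logical: slot for slot, logical in enumerate(order)}
        self.shuffled = [(logical, self.blocks[logical]) for logical in order]
        self.used = [False] * self.n
        self.stash = {}
        self.served = 0

    def access(self, index, new_value=None):
        if index in self.stash:
            # Already fetched this period: touch a random unused slot instead.
            unused = [s for s in range(self.n) if not self.used[s]]
            slot = self.rng.choice(unused)
        else:
            slot = self.position[index]
        self.used[slot] = True
        self.trace.append(slot)
        logical, value = self.shuffled[slot]
        self.stash[logical] = value               # the block moves to the stash
        if new_value is not None:
            self.stash[index] = new_value
        result = self.stash[index]
        self.served += 1
        if self.served == self.period:            # step 3 finished: start over
            self.blocks.update(self.stash)
            self._reshuffle()
        return result
```

Repeated reads of the same logical index produce distinct physical accesses within a period, which mirrors the "randomly select unused element" step on the earlier slides.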

  69. Initialization cost

  70. Per-Access Cost (not counting initialization)
    [Chart series: 16-byte blocks, 32-byte blocks]
    Have we reached the magnetic core memory era yet?

  71. 16-byte blocks
    32-byte blocks
    Whirlwind I (1951)
    30 s, 2048 x 16-bit words

  72. 16-byte blocks
    32-byte blocks
    Z3 (1941)
    Whirlwind I (1951)
    30 s, 2048 x 16-bit words

  73. Wall-clock time in seconds for full protocol between two EC2 C4.2xlarge nodes (1.03 Gbps)

  74. ∼32 minutes
    55,000x standard execution

  75. ∼33 hours (“wikipedia” version)
    Improved to ∼1 hour with custom structures

  76. Open Problems
    • Scalability: poly-logarithmic hierarchical ORAM design
    • Automatic optimization: using custom data structures when memory access is predictable
    • Stronger security models: active security
    – All results are in the semi-honest model
    • Establishing Meaningful Trust
    64 KB memory
    1 s access
    (∼2000x improvement)

  77. Collaborators
    Samee Zahur
    Jack Doerner
    David Evans
    Xiao Wang
    Jonathan Katz
    Mariana Raykova
    Adrià Gascón
    Code and Paper: oblivc.org/sqoram

  78. David Evans
    www.cs.virginia.edu/evans
    OblivC.org
    mightBeEvil.org
