Realization and Evaluation of Measurement Feedback Coherent Ising Machines for Combinatorial Optimization Problems

Slide 1

Slide 1 text

Realization and Evaluation of Measurement Feedback Coherent Ising Machines for Combinatorial Optimization Problems
Yoshitaka Haribara
Advisor: Prof. Kazuyuki Aihara
January 23, 2018
Department of Mathematical Informatics, University of Tokyo

Slide 39

Slide 39 text

Single trial: N = 16 cubic graph (Möbius ladder) and N = 100 random graph (Erdős–Rényi).

[Slide plots: single trial on the N = 16 Möbius ladder; OPO in-phase amplitudes and graph cut size vs. round-trip number]
… between qubits remains a major challenge (15), with important implications for the efficiency of AQC/QA systems (16). Networks of coupled optical parametric oscillators (OPOs) are an alternative physical system, with an unconventional operating mechanism (17–20), for solving the Ising problem (21, 22) and by extension many other combinatorial optimization problems (23).
Formally, the N-spin Ising problem is to find the configuration of spins s_i ∈ {−1, +1} (i = 1, ..., N) that minimizes the energy function

$$H = -\sum_{1 \le i < j \le N} J_{ij} s_i s_j - \sum_{1 \le i \le N} h_i s_i,$$

where the particular problem instance being solved is specified by the N × N matrix J (with elements J_ij) and the length-N vector h (with elements h_i).

We have realized a system with a scalable architecture that uses measurement feedback in place of a network of optical delay lines [which were used in initial, low-connectivity, nonreprogrammable demonstrations of the concept (18, 24, 25)]. Our 100-spin Ising machine allows connections between any spin and any other spin and is fully programmable. We show that measurement-feedback–based OPO Ising machines can solve many different Ising problems, and in cases in which exact solutions are not easy to obtain, we can find good approximate solutions. The schematic of our experimental setup (Fig. 1) shows that our Ising machine is formed by the combination of time-division–multiplexed OPOs (18) in a single fiber-ring cavity, with measurement and feedback (injection) stages that act to couple the pulses in the cavity such that the Ising Hamiltonian is realized. Details are provided in the supplementary materials (26).

Fig. 1. Experimental schematic of a measurement-feedback–based coherent Ising machine. A time-division–multiplexed pulsed degenerate optical parametric oscillator is formed by a nonlinear crystal [periodically poled lithium niobate (PPLN)] in a fiber ring cavity containing 160 pulses. A fraction of each pulse is measured and used to compute a feedback signal that effectively couples the otherwise-independent pulses in the cavity. IM, intensity modulator; PM, phase modulator; LO, local oscillator; SHG, second-harmonic generation; FPGA, field-programmable gate array.
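The energy function above can be checked directly on the N = 16 Möbius ladder used on this slide. The sketch below is ours (all function names are hypothetical, not from the paper): it encodes MAX-CUT as an Ising problem with J_ij = −1 on every edge and h = 0, so that each cut edge contributes −1 to H and each uncut edge +1, giving cut = (|E| − H)/2. It then finds the ground state by exhaustive search, which is only practical for small N.

```python
from itertools import product

def mobius_ladder(n):
    """Edges of the Möbius ladder on n vertices (n even): an n-cycle plus a
    chord joining each vertex i to the opposite vertex i + n/2."""
    if n % 2:
        raise ValueError("n must be even")
    half = n // 2
    ring = [(i, (i + 1) % n) for i in range(n)]
    chords = [(i, i + half) for i in range(half)]
    return ring + chords

def ising_energy(spins, J, h=None):
    """H = -sum_{i<j} J_ij s_i s_j - sum_i h_i s_i, with J = {(i, j): J_ij}
    listing each interacting pair exactly once (key order is irrelevant)."""
    energy = -sum(jij * spins[i] * spins[j] for (i, j), jij in J.items())
    if h is not None:
        energy -= sum(hi * si for hi, si in zip(h, spins))
    return energy

def cut_size(spins, edges):
    """Number of unit-weight edges whose endpoints carry opposite spins."""
    return sum(1 for i, j in edges if spins[i] != spins[j])

def brute_force_ground_state(n, J):
    """Exhaustive search over 2^(n-1) configurations; s_0 is fixed to +1 because
    H with h = 0 is invariant under flipping all spins. Small n only."""
    best_energy, best_spins = None, None
    for rest in product((-1, 1), repeat=n - 1):
        spins = (1,) + rest
        energy = ising_energy(spins, J)
        if best_energy is None or energy < best_energy:
            best_energy, best_spins = energy, spins
    return best_energy, best_spins

# MAX-CUT on the N = 16 Möbius ladder as an Ising problem: J_ij = -1 on edges, h = 0.
edges = mobius_ladder(16)
J = {edge: -1.0 for edge in edges}
energy, spins = brute_force_ground_state(16, J)
cut = cut_size(spins, edges)
print(len(edges), energy, cut)            # 24 -20.0 22
assert cut == (len(edges) - energy) / 2   # cut = (|E| - H) / 2 for this encoding
```

For N = 16 (8 chords) the graph is not bipartite: every single edge lies outside some 9-edge odd cycle, so at least two edges stay uncut. This gives a maximum cut of 22 of the 24 edges and a ground-state energy of −20.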
[Fig. 2 panels (16-vertex cubic graphs A, B, and C): in-phase amplitudes of OPO pulses 1 to 16, graph cut size, and Ising energy vs. roundtrip number and computation time (µs); number of runs reaching the MAX-CUT / ground-state energy (the panels show 100%, 58%, and 38%, compared with the exact solution); number of problem instances over all 4060 16-vertex cubic graphs]

Fig. 3. Results with various-size Möbius ladder graphs. (A) Observed probability of obtaining a ground state of the Möbius ladder graph in a single run, as a function of the size N of the graph. Multiple 100-run batches were performed for each graph size to obtain the standard deviations, which are shown as error bars. (B to D) Histograms of obtained solutions (Ising energy) in 100 runs for the graphs shown in the insets.

Fig. 4. Results with various-size and various-density random graphs. (A) Observed probability of obtaining a solution whose cut size is at least x% of the global optimum (maximum cut), as a function of graph size N, for random cubic graph instances. Error bars indicate 1 SD, which is dominated by the difference in difficulty between the various problem instances. (B) The runtime that would be required to obtain a solution of a particular accuracy with 99% probability. (C) The evolution of the in-phase components c_i of the N = 100 OPO pulses as a function of the computation time, for a single run with the graph shown in the (D) inset. (D) The graph cut size achieved as a function of the computation time. (Inset) The graph being solved. (E) Observed success probability of obtaining a solution with a particular accuracy as a function of the density of edges in the graph. Experiments were performed on randomly generated N = 100-vertex graphs with fixed numbers of edges. Error bars indicate 1 SD.

[Fig. 4 panels: success probability (%) vs. graph size N = |V| for solution accuracies from 90% to 100% (random cubic graphs); runtime (s) to obtain 99% success probability vs. N]

Science, 4 November 2016, Vol. 354, Issue 6312, p. 616.
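Panel (B) of Fig. 4 turns a single-run success probability into a runtime. The usual conversion repeats independent runs until at least one succeeds with 99% probability: with per-run success probability p and run time T, it takes R = ln(1 − 0.99) / ln(1 − p) runs, for a total of R·T. The caption does not state the formula, so treat this as an assumption. A minimal sketch follows; the function name and the example numbers are hypothetical.

```python
import math

def time_to_target(p_success, time_per_run, target=0.99):
    """Total time for R independent runs, R = ln(1 - target) / ln(1 - p_success),
    so that at least one run succeeds with probability `target`.
    If a single run already meets the target, R is taken as 1 (a convention)."""
    if not 0.0 < p_success <= 1.0:
        raise ValueError("p_success must lie in (0, 1]")
    if p_success >= target:
        return time_per_run
    runs = math.log(1.0 - target) / math.log(1.0 - p_success)
    return runs * time_per_run

# Hypothetical example: 50% single-run success, 1 ms per run.
print(time_to_target(0.5, 1e-3))  # about 6.64e-3 s, i.e. about 6.64 runs
```

Some authors round R up to an integer number of runs. The fractional form is kept here because it varies smoothly with p.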
[Slide plots: single trial on the N = 100 random graph (Graph D: |V| = 100, |E| = 495); OPO in-phase amplitudes and graph cut size vs. computation time (µs) and round-trip number, simulation and experiment]

[P. L. McMahon*, A. Marandi*, Y. Haribara, et al., Science 354, 614 (2016)]
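The feedback loop of Fig. 1 can be caricatured with a noisy classical amplitude model. The loop measures a fraction of each pulse, computes the coupling signal on the FPGA, and injects it back. This is the flavor of "simulation" curve compared with experiment on this slide. The sketch below is a toy under our own assumptions and is not the model or parameter set used in the thesis. The update rule (linear gain pump − 1, cubic saturation, injected feedback), the additive Gaussian measurement noise, the function name `simulate_mfb_cim`, and every default value are all hypothetical.

```python
import random

def simulate_mfb_cim(J, n, steps=200, pump=1.0, feedback=0.2, dt=0.1,
                     meas_noise=0.01, seed=0):
    """Toy round-trip model of a measurement-feedback coherent Ising machine.

    J maps vertex pairs (i, j) to J_ij; each unordered pair appears once and is
    applied symmetrically. Returns the final spins sign(c_i) and the amplitude
    history (one list of n amplitudes per round trip).
    """
    rng = random.Random(seed)
    neighbours = [[] for _ in range(n)]
    for (i, j), jij in J.items():
        neighbours[i].append((j, jij))
        neighbours[j].append((i, jij))

    c = [rng.gauss(0.0, 0.01) for _ in range(n)]  # amplitudes start near zero
    history = [list(c)]
    for _ in range(steps):
        # 1) Measure every pulse; the readout is modelled as c_i plus Gaussian noise.
        measured = [ci + rng.gauss(0.0, meas_noise) for ci in c]
        # 2) Feedback signal feedback * sum_j J_ij * measured_j (the FPGA's role).
        fb = [feedback * sum(jij * measured[j] for j, jij in neighbours[i])
              for i in range(n)]
        # 3) One round trip: linear gain/loss, cubic gain saturation, injection.
        c = [ci + dt * ((pump - 1.0) * ci - ci ** 3 + fi)
             for ci, fi in zip(c, fb)]
        history.append(list(c))

    spins = [1 if ci >= 0 else -1 for ci in c]
    return spins, history

# Usage: N = 16 Möbius ladder, MAX-CUT encoded as J_ij = -1 on every edge.
n = 16
edges = [(i, (i + 1) % n) for i in range(n)] + [(i, i + n // 2) for i in range(n // 2)]
J = {e: -1.0 for e in edges}
spins, history = simulate_mfb_cim(J, n)
print("cut size:", sum(1 for i, j in edges if spins[i] != spins[j]))
```

The feedback pushes each amplitude toward the sign of sum_j J_ij c_j, which lowers H. Spins are read out as the signs of the final amplitudes. Whether a given run reaches the maximum cut (22 edges for this graph) depends on the seed and the parameters.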

Slide 46

Slide 46 text

Benchmark instances (2000 nodes each):
• G22 (random): 19990 edges, edge weights w ∈ {0, +1}
• G39 (scale-free): 11778 edges, edge weights w ∈ {−1, 0, +1}
• K2000 (complete): 1999000 edges, edge weights w ∈ {−1, +1}

Table 1 Performance of solution quality (cut values over 100 trials)

                          G22      G39      K2000
Best known                13359    2408     −
SA 50 ms (max in 100)     13336    2384     32781
SA 50 ms (ave. in 100)    13298    2359     32314
SA 5 ms (max in 100)      13245    2339     30118
SA 5 ms (ave. in 100)     13183    2297     29308
CIM 5 ms (max in 100)     13313    2361     33191
CIM 5 ms (ave. in 100)    13248    2328     32457
GW-SDP                    12992    2200     29619

[Figures: node degree distribution of each graph; histograms of cut values from SA, CIM, and GW-SDP over 100 trials]

SA and SDP: Intel Xeon X5650 @ 2.67 GHz, Westmere (2010)
[T. Inagaki, Y. Haribara, et al., Science 354, 603 (2016)]
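Table 1 compares cut values from simulated annealing (SA), the CIM, and the Goemans-Williamson semidefinite-programming relaxation (GW-SDP). The slide does not describe the SA baseline beyond its runtime. As an illustration only, here is a generic single-spin-flip SA for weighted MAX-CUT that handles the w ∈ {−1, 0, +1} weights of G39. It is not the benchmarked implementation. The geometric schedule, the parameter values, and the function names are our assumptions.

```python
import math
import random

def cut_value(edges, spins):
    """Weighted cut: sum of w_ij over edges (i, j, w_ij) whose endpoints differ."""
    return sum(w for i, j, w in edges if spins[i] != spins[j])

def anneal_max_cut(n, edges, sweeps=2000, t_start=2.0, t_end=0.01, seed=0):
    """Single-spin-flip simulated annealing for weighted MAX-CUT.

    Uses a geometric temperature schedule from t_start to t_end with one sweep
    (n flip attempts) per temperature; returns the best cut seen and its spins.
    """
    rng = random.Random(seed)
    adj = [[] for _ in range(n)]
    for i, j, w in edges:
        adj[i].append((j, w))
        adj[j].append((i, w))
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    cut = cut_value(edges, spins)
    best_cut, best_spins = cut, list(spins)
    ratio = (t_end / t_start) ** (1.0 / max(sweeps - 1, 1))
    t = t_start
    for _ in range(sweeps):
        for i in range(n):
            # Flipping s_i toggles each incident edge: an uncut edge (same side)
            # becomes cut (+w), a cut edge becomes uncut (-w).
            delta = sum(w if spins[i] == spins[j] else -w for j, w in adj[i])
            if delta >= 0 or rng.random() < math.exp(delta / t):
                spins[i] = -spins[i]
                cut += delta
                if cut > best_cut:
                    best_cut, best_spins = cut, list(spins)
        t *= ratio
    return best_cut, best_spins

# Usage on K4 with unit weights (its maximum cut is 4).
k4 = [(i, j, 1) for i in range(4) for j in range(i + 1, 4)]
best, spins = anneal_max_cut(4, k4)
print(best, spins)
```

The incremental `delta` avoids recomputing the whole cut after each flip. This matters for dense instances such as K2000, where each vertex touches 1999 edges.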

Slide 74

Slide 74 text

Parallelization of SA on GPUs
• Function optimization: normalized Schwefel function
• NVIDIA GeForce GTX 470
• Parallelization with synchronous updates reduces the error
• Computation time: 76× speedup
[A. M. Ferreiro, et al., J. Global Optim. 57, 863 (2012)]

Excerpt from J Glob Optim (2013) 57:863–890:

Table 1 Error of the solution obtained by the algorithm, both in the value of the function at the minimum (columns |fa − fr|, where fa is the objective function value found by the algorithm and fr is the exact function value in the real minimum) and in the minimum (columns "Rel. error", relative error measured in the 2-norm ‖·‖₂)

       V0 (serial)             V1 (asynchronous)       V2 (synchronized every step)
n      |fa − fr|   Rel. error  |fa − fr|   Rel. error  |fa − fr|   Rel. error
8      1.3190e-1   2.4283e-3   1.2891e-2   7.4675e-4   1.7000e-5   4.1656e-5
16     2.3712e-1   3.2557e-3   7.4586e-2   1.8240e-3   1.9000e-6   5.0686e-5
32     3.3774e-1   3.8852e-3   2.8171e-1   3.5468e-3   1.5730e-4   6.0577e-5
64     7.9651e-1   5.9664e-3   9.7831e-1   6.6126e-3   3.1880e-4   1.2132e-4
128    1.9198      9.2648e-3   3.0461      1.1674e-2   1.2225e-4   1.5304e-4
256    3.6230      1.2733e-2   9.5765      8.0283e-2   1.4953e-2   8.2214e-4
512    7.3054      1.8097e-2   26.2282     4.0424e-1   4.6350e-1   4.5503e-3

Table 2 Performance of CUDA version versus sequential version with one CPU core for different number of parameters

n      V0 time      V1 time    V1 speedup   V2 time    V2 speedup
8      1,493.7686   5.5436     269.4595     5.6859     262.7121
16     2,529.3072   15.3942    164.3027     15.5889    162.2502
32     4,618.5820   56.9808    81.0550      60.1882    76.7356
64     8,773.0560   106.6075   82.2930      110.2702   79.5596
128    17,169.0000  210.9499   81.3890      215.5416   79.6552
256    34,251.9240  455.4910   75.1978      462.8035   74.0096
512    68,134.5760  871.7434   78.1589      893.7668   76.2330

The numerical experiments have been performed with the following hardware and software configurations: a GPU Nvidia GeForce GTX470, a recent CPU Xeon E5620 clocked at 2.4 GHz with 16 GB of RAM, CentOS Linux, NVIDIA CUDA SDK 3.2 and GNU C++ compiler 4.1.2.
In what follows, we denote by V0 the sequential implementation, by V1 the parallel asynchronous version and by V2 the parallel synchronous one.

4.1 Analysis of a sample test problem: normalized Schwefel function

A typical benchmark for testing optimization techniques is the normalized Schwefel function (see [31], for example):

$$f(\mathbf{x}) = -\frac{1}{n} \sum_{i=1}^{n} x_i \sin\left(\sqrt{|x_i|}\right), \quad -512 \le x_i \le 512, \quad \mathbf{x} = (x_1, \ldots, x_n). \tag{6}$$

For any dimension n, the global minimum is achieved at the point x*, whose coordinates are x*_i = 420.968746, i = 1, ..., n, and f(x*) = −418.982887. Table 1 illustrates the accuracy of the three versions of the SA algorithm: sequential (V0), asynchronous (V1) and synchronous (V2). For all three versions we use the following configuration: T0 = 1000, Tmin = 0.01, N = 100 and ρ = 0.99. For the parallel versions we use b = 256 threads per block (block size) and g = 64 blocks per grid (grid size), so that the number of Markov chains is 16,384. With this configuration, the algorithm performs 1.8776 × 10^9 function evaluations in all cases.
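Two of the numbers quoted above can be reproduced in a few lines: f(x*) from Eq. (6) and the evaluation count. The count relies on our reading of the schedule, which the excerpt does not spell out, so treat it as an assumption. Temperature levels T_k = T0 ρ^k are kept while T_k ≥ Tmin, each level runs N = 100 proposals per chain, and there are b × g = 16,384 chains. Under that reading the count is 1146 × 100 × 16,384 = 1,877,606,400 ≈ 1.8776 × 10^9. The function names are hypothetical.

```python
import math

def normalized_schwefel(x):
    """Eq. (6): f(x) = -(1/n) * sum_i x_i * sin(sqrt(|x_i|)), -512 <= x_i <= 512."""
    return -sum(xi * math.sin(math.sqrt(abs(xi))) for xi in x) / len(x)

def function_evaluations(t0, t_min, rho, chain_length, n_chains):
    """Evaluations for a geometric schedule T_k = t0 * rho**k, counting levels with
    T_k >= t_min, chain_length proposals per level, and n_chains parallel chains."""
    levels = 0
    t = t0
    while t >= t_min:
        levels += 1
        t *= rho
    return levels * chain_length * n_chains

x_star = [420.968746] * 8
print(normalized_schwefel(x_star))                            # about -418.982887
print(function_evaluations(1000, 0.01, 0.99, 100, 256 * 64))  # 1877606400
```

Because of the 1/n normalization, f(x*) is the same for every dimension n, which is why a single value is quoted.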
[Figure: normalized Schwefel function, x ∈ R^n. Image source: [http://www-optima.amp.i.kyoto-u.ac.jp/member/student/hedar/Hedar_files/TestGO_files/Page2530.htm]]
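The slide contrasts asynchronous chains (V1) with chains synchronized at every temperature step (V2), which gave the smaller errors in Table 1. The following serial Python sketch emulates both schemes on the normalized Schwefel function. Our assumed synchronization rule is that, after each temperature level, every chain restarts from the best current point among all chains; the excerpt does not give the exact rule. The proposal step size, the default chain counts, and the function names are also hypothetical. On a GPU, each chain would map to one thread, with b × g chains in total.

```python
import math
import random

def schwefel(x):
    """Normalized Schwefel function f(x) = -(1/n) sum_i x_i sin(sqrt(|x_i|))."""
    return -sum(xi * math.sin(math.sqrt(abs(xi))) for xi in x) / len(x)

def multi_chain_sa(n_dim, n_chains=16, t0=1000.0, t_min=0.01, rho=0.99,
                   chain_length=10, step=10.0, synchronous=False, seed=0):
    """Serial emulation of n_chains SA Markov chains on [-512, 512]^n_dim.

    synchronous=False: chains evolve independently; the best end point wins.
    synchronous=True: after every temperature level, all chains restart from
    the best current point among them (assumed synchronization rule).
    """
    rng = random.Random(seed)
    lo, hi = -512.0, 512.0
    chains = [[rng.uniform(lo, hi) for _ in range(n_dim)] for _ in range(n_chains)]
    values = [schwefel(x) for x in chains]
    t = t0
    while t >= t_min:
        for c in range(n_chains):  # on a GPU, each chain would be one thread
            x, fx = chains[c], values[c]
            for _ in range(chain_length):
                # Propose a move in one random coordinate, clipped to the box.
                y = list(x)
                k = rng.randrange(n_dim)
                y[k] = min(hi, max(lo, y[k] + rng.uniform(-step, step)))
                fy = schwefel(y)
                # Metropolis acceptance for minimization.
                if fy <= fx or rng.random() < math.exp((fx - fy) / t):
                    x, fx = y, fy
            chains[c], values[c] = x, fx
        if synchronous:
            best = min(range(n_chains), key=values.__getitem__)
            chains = [list(chains[best]) for _ in range(n_chains)]
            values = [values[best]] * n_chains
        t *= rho  # geometric cooling T <- rho * T
    best = min(range(n_chains), key=values.__getitem__)
    return values[best], chains[best]

for sync in (False, True):
    f_best, x_best = multi_chain_sa(4, n_chains=8, synchronous=sync)
    print("synchronous" if sync else "asynchronous", round(f_best, 4))
```

Synchronizing concentrates every chain on the most promising region at each level. The trade-off is a loss of diversity across chains, so the outcome on any single run depends on the seed.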
