Slide 1

Slide 1 text

ON THE EDGE: STATE TOMOGRAPHY, BOUNDARIES, AND MODEL SELECTION
Travis L. Scholten, Center for Computing Research, Sandia National Laboratories
CQuIC Talk, University of New Mexico, December 2, 2015
Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

Slide 2

Slide 2 text

I am interested in tomography, the characterization of quantum systems.

Slide 3

Slide 3 text

Characterizing a system means estimating/inferring something about it. Tomography is a statistical inference problem: an experiment with a POVM $\{E_j\}$ produces data, and inference turns that data into estimates $\hat{\rho}$ via an estimator (estimates are the "things with hats").

Slide 4

Slide 4 text

The “best” estimator is very accurate. We seek high accuracy relative to an unknown truth, and there are many measures of accuracy: quantum fidelity, relative entropy, trace distance, Hilbert-Schmidt distance.

Slide 5

Slide 5 text

The “ideal” estimator would be very accurate… and would not fit noise in the data. “Ideal” is impossible to achieve!

Slide 6

Slide 6 text

We do not have the truth, only data. How can we be accurate and fit well? Model selection: find the best model, then fit the parameters in that model.

Slide 7

Slide 7 text

People already use model selection in quantum information… but are they justified in doing so?
Schwarz & van Enk (2013), "Error models in quantum computation: an application of model selection"
Guta et al. (2012), "Rank-based model selection for multiple ions quantum tomography"
van Enk & Blume-Kohout (2013), "When quantum tomography goes wrong: drift of quantum sources and other errors"

Slide 8

Slide 8 text

Model selection techniques currently used in quantum tomography, such as loglikelihood ratio tests or Akaike’s AIC, may have problems.

Slide 9

Slide 9 text

Quantum information makes connections to statistical inference in many ways. A model is a parametrized family of probability distributions; a hypothesis is a point in the model. Probabilities come from the Born rule, $\Pr(E) = \mathrm{tr}(\rho E)$, so a hypothesis corresponds to a state, $H \leftrightarrow \rho$, and a model to a set of states, $M \leftrightarrow \{\rho_1, \rho_2, \cdots\}$.

Slide 10

Slide 10 text

Given some data, the plausibility of a model or hypothesis is quantified by its likelihood: what probability does it assign to the data actually seen? For a hypothesis, just compute it: $\mathcal{L}(H) = \Pr(\mathrm{Data}\,|\,H)$. For a model, just maximize: $\mathcal{L}(M) = \max_{H \in M} \Pr(\mathrm{Data}\,|\,H)$. We use likelihoods to compare models/hypotheses and to make estimates.
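As a toy illustration of "just compute it" versus "just maximize", here is a minimal sketch; the binomial experiment, the counts, and the names are hypothetical, not from the talk.

```python
# Minimal sketch: likelihood of a hypothesis vs. a model, for a single
# two-outcome measurement repeated N times (hypothetical numbers).
import numpy as np
from scipy.optimize import minimize_scalar

n_clicks, n_shots = 430, 1000          # observed data (illustrative)

def log_likelihood(p):
    """log Pr(Data | outcome probability p), binomial model."""
    return n_clicks * np.log(p) + (n_shots - n_clicks) * np.log(1 - p)

# Hypothesis = a single point: just compute it.
L_H = log_likelihood(0.5)

# Model = the whole family p in (0, 1): just maximize.
res = minimize_scalar(lambda p: -log_likelihood(p),
                      bounds=(1e-9, 1 - 1e-9), method="bounded")
L_M = -res.fun

print(f"log L(H) = {L_H:.2f}, log L(M) = {L_M:.2f}")
```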

Slide 11

Slide 11 text

Quantum information makes connections to statistical inference in many ways. State discrimination is an instance of simple hypothesis testing: which state is it, $\rho$ or $\sigma$? Form the LLRS $\lambda(\rho, \sigma) = -2\log\left(\mathcal{L}(\rho)/\mathcal{L}(\sigma)\right)$ and choose the highest likelihood: $\sigma$ if $\lambda \geq 0$, $\rho$ if $\lambda < 0$. The Neyman-Pearson lemma tells us this is the most powerful test.
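A minimal sketch of this decision rule, assuming two fixed candidate qubit states and hypothetical counts from a single two-outcome POVM $\{E, I - E\}$; the states and numbers are illustrative only, with probabilities given by the Born rule $\Pr(E) = \mathrm{tr}(\rho E)$.

```python
# Sketch of simple hypothesis testing between two known qubit states.
import numpy as np

E = np.array([[1, 0], [0, 0]], dtype=complex)          # projector onto |0>
rho   = np.array([[0.9, 0.0], [0.0, 0.1]], dtype=complex)
sigma = np.array([[0.6, 0.0], [0.0, 0.4]], dtype=complex)

counts = {"E": 880, "not_E": 120}                       # hypothetical data

def log_L(state):
    p = np.real(np.trace(state @ E))                    # Born rule Pr(E) = tr(rho E)
    return counts["E"] * np.log(p) + counts["not_E"] * np.log(1 - p)

lam = -2 * (log_L(rho) - log_L(sigma))                  # lambda(rho, sigma)
print("choose rho" if lam < 0 else "choose sigma")      # highest likelihood wins
```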

Slide 12

Slide 12 text

Quantum information makes connections to statistical inference in many ways. Entanglement verification is an instance of composite hypothesis testing: which region is the state in, separable ($H_A$) or entangled ($H_B$)? Form $\lambda(H_A, H_B) = -2\log\left(\mathcal{L}(H_A)/\mathcal{L}(H_B)\right)$ and choose the highest likelihood: $H_B$ if $\lambda \geq 0$, $H_A$ if $\lambda < 0$.

Slide 13

Slide 13 text

Quantum information makes connections to statistical inference in many ways. State tomography is an instance of model fitting: which parameters are best? Maximum likelihood estimation chooses the highest likelihood: $\hat{\rho} = \mathrm{argmax}_{\rho}\, \mathcal{L}(\rho)$.
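A minimal MLE sketch, assuming a single qubit measured in the X, Y, and Z bases with hypothetical counts. The Bloch-vector parametrization and the SLSQP solver are illustrative choices, not the method used in the study described later in the talk.

```python
# Minimal MLE sketch: a qubit measured in the X, Y, Z bases (made-up counts),
# parametrized by its Bloch vector r with the positivity constraint |r| <= 1.
import numpy as np
from scipy.optimize import minimize

counts = {"X": (480, 520), "Y": (300, 700), "Z": (900, 100)}   # (+1, -1) outcomes

def neg_log_likelihood(r):
    total = 0.0
    for r_i, (n_plus, n_minus) in zip(r, [counts["X"], counts["Y"], counts["Z"]]):
        p = np.clip((1 + r_i) / 2, 1e-12, 1 - 1e-12)           # Born-rule probability
        total -= n_plus * np.log(p) + n_minus * np.log(1 - p)
    return total

positivity = {"type": "ineq", "fun": lambda r: 1 - np.dot(r, r)}   # rho >= 0
result = minimize(neg_log_likelihood, x0=np.zeros(3),
                  constraints=[positivity], method="SLSQP")
r_hat = result.x

pauli = [np.array([[0, 1], [1, 0]]), np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]])]
rho_hat = 0.5 * (np.eye(2) + sum(r_i * s for r_i, s in zip(r_hat, pauli)))
print(np.round(rho_hat, 3))
```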

Slide 14

Slide 14 text

Quantum information makes connections to statistical inference in many ways. Testing "Z-diagonal state vs. not" is an instance of (nested) model selection: is the true state on the line of Z-diagonal states ($M_A$), or anywhere in the full state space ($M_B \supset M_A$)? Form $\lambda(M_A, M_B) = -2\log\left(\mathcal{L}(M_A)/\mathcal{L}(M_B)\right)$ and choose the highest likelihood… but for nested models the LLRS is never negative, so $\lambda \geq 0$ always points to $M_B$???

Slide 15

Slide 15 text

How will we investigate
 model selection
 in state tomography?

Slide 16

Slide 16 text

I am studying a paradigmatic problem: tomography of continuous-variable systems, i.e., optical modes of light, represented as Wigner functions or as density matrices $\rho$.

Slide 17

Slide 17 text

The models I consider are subspaces of an infinite-dimensional Hilbert space: $\mathcal{H}_d = \mathrm{Span}(|0\rangle, |1\rangle, \cdots, |d-1\rangle)$, with $M_d = \{\rho \mid \rho \in \mathcal{B}(\mathcal{H}_d),\ \mathrm{Tr}(\rho) = 1,\ \rho \geq 0\}$. These models are based on a low-energy approximation and on smoothness of the Wigner function. Other models are possible (e.g., by rank).
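A small sketch of what membership in $M_d$ means in practice: supported on the first $d$ Fock states, unit trace, and positive semidefinite. The helper name and tolerance are illustrative assumptions.

```python
# Sketch of the model M_d: density matrices on the span of the first d Fock
# states, with trace 1 and rho >= 0. Here we just test membership.
import numpy as np

def in_model(rho, d, tol=1e-9):
    """Is rho a valid state in M_d = {rho in B(H_d) : tr(rho) = 1, rho >= 0}?"""
    rho = np.asarray(rho)
    if rho.shape != (d, d):
        return False
    trace_ok  = abs(np.trace(rho) - 1) < tol
    hermitian = np.allclose(rho, rho.conj().T, atol=tol)
    positive  = np.min(np.linalg.eigvalsh((rho + rho.conj().T) / 2)) > -tol
    return trace_ok and hermitian and positive

print(in_model(np.diag([0.7, 0.3]), d=2))       # True
print(in_model(np.diag([1.2, -0.2]), d=2))      # False: negative eigenvalue
```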

Slide 18

Slide 18 text

The models I consider are nested inside one another: $M_d \subset M_{d+1}$. How can we use likelihoods to compare them? We have to tackle nested model selection.

Slide 19

Slide 19 text

We rethink using the LLRS for nested model selection, based on its null expected value, where $\lambda(M_d, M_{d'}) = -2\log\left(\mathcal{L}(M_d)/\mathcal{L}(M_{d'})\right)$ for nested models $M_d \subset M_{d'}$. Case One: the smaller model is false. (Figure: $\langle\lambda\rangle$ versus the number of samples $N$.)

Slide 20

Slide 20 text

We rethink using the LLRS for nested model selection, based on its null expected value, $\lambda(M_d, M_{d'}) = -2\log\left(\mathcal{L}(M_d)/\mathcal{L}(M_{d'})\right)$. Case Two: both models are true (the null case). (Figure: $\langle\lambda\rangle$ versus the number of samples $N$.)

Slide 21

Slide 21 text

We rethink using the LLRS for nested model selection, based on its null expected value. Devise a threshold to compare the observed value against and rule out the smaller model: $\lambda_{\mathrm{thresh}} \gtrsim \langle\lambda\rangle$.

Slide 22

Slide 22 text

We rethink using the LLRS for nested model selection, based on its null expected value. We compare the observed LLRS, $\lambda(M_d, M_{d'}) = -2\log\left(\mathcal{L}(M_d)/\mathcal{L}(M_{d'})\right)$, to its threshold value: choose $M_{d'}$ if $\lambda \geq \lambda_{\mathrm{thresh}}$, and keep $M_d$ otherwise.
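A sketch of how such a threshold test might be wired up, assuming we can simulate the null case by Monte Carlo. The 95th-percentile cutoff, the stand-in null distribution, and the observed value are purely illustrative assumptions; the talk's point is precisely that the real null distribution must come from simulating quantum fits, not from a textbook assumption.

```python
# Sketch of a threshold test for the LLRS built from simulated null-case samples.
import numpy as np

rng = np.random.default_rng(0)

def simulate_null_llrs(n_trials):
    # Placeholder: in practice, simulate data from a state in M_d, fit both
    # models, and record the LLRS. The chi-squared draw below is only a
    # stand-in so this snippet runs.
    return rng.chisquare(df=3, size=n_trials)

null_samples = simulate_null_llrs(10_000)
threshold = np.percentile(null_samples, 95)     # illustrative cutoff choice

lambda_observed = 11.2                          # hypothetical observed LLRS
reject_smaller_model = lambda_observed > threshold
print(f"threshold = {threshold:.2f}, reject M_d: {reject_smaller_model}")
```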

Slide 23

Slide 23 text

Asymptotic convergence of the LLRS is a consequence of the Wilks Theorem. In 1938, Wilks gave the distribution of the LLRS: $\lambda(M_d, M_{d'}) \sim \chi^2_{p_{d'} - p_d}$. The theorem rests on a Gaussian distribution of estimates, so one fluctuating parameter contributes one unit of LLRS.
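For reference, the Wilks prediction is easy to compute. This sketch assumes $p_d = d^2 - 1$ real parameters for a $d$-dimensional density matrix; the function name and the 95% level are illustrative choices.

```python
# What the Wilks Theorem would predict for nested state-tomography models.
from scipy.stats import chi2

def wilks_prediction(d_small, d_large, alpha=0.05):
    dof = (d_large**2 - 1) - (d_small**2 - 1)        # p_d' - p_d
    return {"dof": dof,
            "expected_llrs": dof,                    # mean of chi-squared_dof
            "threshold": chi2.ppf(1 - alpha, dof)}   # e.g. 95% cutoff

print(wilks_prediction(d_small=2, d_large=3))        # dof = 5
```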

Slide 24

Slide 24 text

Another model selection technique relies on this result. Information criteria trade off between fitting the data well and having high accuracy. Use of Akaike's AIC is common; it relies on the Wilks Theorem (through the bias of an estimator of the KL divergence). My work helps us create a quantum information criterion.
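A sketch of AIC-based selection over the nested models, assuming hypothetical maximized log-likelihood values and $p_d = d^2 - 1$ parameters per model; the `aic` helper and the numbers are illustrative, not output from this study.

```python
# Sketch of model selection with Akaike's AIC over nested models M_d.
import numpy as np

def aic(max_log_L, n_params):
    return -2 * max_log_L + 2 * n_params

# Hypothetical maximized log-likelihoods for M_2 ... M_5:
log_likelihoods = {2: -1523.4, 3: -1502.1, 4: -1500.8, 5: -1500.2}

scores = {d: aic(logL, d**2 - 1) for d, logL in log_likelihoods.items()}
best_d = min(scores, key=scores.get)
print(scores, "-> choose M_%d" % best_d)
```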

Slide 25

Slide 25 text

We now have potential tools for nested model selection
 in tomography.
 How do they perform?

Slide 26

Slide 26 text

I performed a Monte Carlo study of the LLRS and its behavior. Studied: - 17 true states (supported on low-energy subspace)

Slide 27

Slide 27 text

I performed a Monte Carlo study of the LLRS and its behavior. Studied: - 17 true states - 100 random datasets for each state (coherent state POVM), $\mathrm{Data} = \{\alpha_j \mid \alpha_j \in \mathbb{C},\ \Pr(\alpha_j) = \langle\alpha_j|\rho_{\mathrm{true}}|\alpha_j\rangle/\pi\}$

Slide 28

Slide 28 text

I performed a Monte Carlo study of the LLRS and its behavior. Studied: - 17 true states - 100 random datasets for each state (coherent state POVM) - 10K to 100K samples for each dataset

Slide 29

Slide 29 text

I performed a Monte Carlo study of the LLRS and its behavior. Studied: - 17 true states - 100 random datasets for each state (coherent state POVM) - 10K to 100K samples for each dataset - MLE over {2…10}-dimensional Hilbert spaces, $\{M_2, M_3, \cdots, M_{10}\}$. Lots of supercomputer time!
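A sketch of how such datasets could be generated: rejection sampling from the Husimi Q function $\Pr(\alpha) = \langle\alpha|\rho_{\mathrm{true}}|\alpha\rangle/\pi$. The disk radius, the 3-dimensional example state, and the helper names are assumptions, and truncating proposals to a disk neglects the exponentially small tails of Q; this is not the code used in the study.

```python
# Sketch: draw samples alpha_j from Pr(alpha) = <alpha|rho|alpha>/pi for a
# true state supported on the first d Fock states, via rejection sampling.
import numpy as np
from math import factorial

rng = np.random.default_rng(42)

def husimi_overlap(rho, alpha):
    """<alpha|rho|alpha> for rho supported on the first d Fock states."""
    d = rho.shape[0]
    fock_amps = np.array([np.exp(-abs(alpha)**2 / 2) * alpha**n / np.sqrt(factorial(n))
                          for n in range(d)])            # <n|alpha>
    return np.real(fock_amps.conj() @ rho @ fock_amps)

def sample_q(rho, n_samples, R=6.0):
    samples = []
    while len(samples) < n_samples:
        # propose alpha uniformly on a disk of radius R
        r, theta = R * np.sqrt(rng.uniform()), rng.uniform(0, 2 * np.pi)
        alpha = r * np.exp(1j * theta)
        if rng.uniform() < husimi_overlap(rho, alpha):    # accept w.p. <a|rho|a>
            samples.append(alpha)
    return np.array(samples)

rho_true = np.diag([0.6, 0.3, 0.1])                       # hypothetical true state
data = sample_q(rho_true, n_samples=1000)
print(len(data), "heterodyne-style samples")
```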

Slide 30

Slide 30 text

The results were puzzling.

Slide 31

Slide 31 text

I checked four predictions of the Wilks Theorem on the behavior of the LLRS. Only one matched. Predictions: asymptotic convergence; distribution independent of the truth; a particular expected value; distribution depends on the reconstruction dimension.

Slide 32

Slide 32 text

When the truth is in the smaller model, we observe asymptotic convergence. Wilks: expected value asymptotes. Reality: expected value asymptotes.

Slide 33

Slide 33 text

Monte Carlo averages and the Wilks expectation values do not agree at all. Wilks: expected value increases with reconstruction dimension. Reality: expected value is essentially constant.

Slide 34

Slide 34 text

Wilks theorem predictions for the distribution of the LLRS do not agree with simulation. Wilks: distribution independent of the true state. Reality: distribution depends on the true state.

Slide 35

Slide 35 text

Wilks theorem predictions for the distribution of the LLRS do not agree with simulation. Wilks: distribution depends strongly on dimension. Reality: distribution depends weakly on dimension.

Slide 36

Slide 36 text

Theorems are not “wrong”,
 only “not applicable”. Why does the Wilks Theorem not apply?

Slide 37

Slide 37 text

State tomography is
 on the edge. Let’s see why.

Slide 38

Slide 38 text

The first edge is the positivity constraint, $\hat{\rho} \geq 0$. This shows up a lot in quantum information. (Figure: estimates satisfying $\hat{\rho} \geq 0$ versus estimates with $\hat{\rho} \not\geq 0$.)

Slide 39

Slide 39 text

The first edge is the positivity constraint, $\hat{\rho} \geq 0$. This shows up a lot in quantum information. Positivity “piles up” estimates on the boundary → fluctuations normal to the boundary are diminished → the estimator is biased.

Slide 40

Slide 40 text

The second edge is that the models I use nest on the boundary of one another: a state of $M_d$, written as $\hat{\rho} = \begin{pmatrix} \rho_{00} & \rho_{01} & \rho_{02} & \cdots \\ \rho_{10} & \rho_{11} & \rho_{12} & \cdots \\ \rho_{20} & \rho_{21} & \rho_{22} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}$ inside $M_{d+1}$, has its added row and column equal to zero, so it is rank-deficient and sits on the boundary of the larger model.

Slide 41

Slide 41 text

When the true state is mixed, you avoid the first edge, but still run right into the second: boundaries are unavoidable in state tomography.

Slide 42

Slide 42 text

The Wilks Theorem cannot be applied on boundaries: they introduce constraints. (Wilks would predict $\lambda \sim \chi^2_1$ here.)

Slide 43

Slide 43 text

The Wilks Theorem cannot be applied on boundaries: they introduce constraints. Boundaries change the distribution of MLEs, which changes the distribution of the LLRS: instead of $\chi^2_1$, we get something like $\tfrac{1}{2}\chi^2_1$.
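A purely classical toy model (not the quantum simulation from the talk) that reproduces this boundary effect: one extra parameter $\mu$ constrained to $\mu \geq 0$, with the truth sitting on the boundary at $\mu = 0$. Wilks would predict $\chi^2_1$ (mean 1); the constraint gives a point mass at zero mixed with $\chi^2_1$ (mean 1/2).

```python
# Classical toy of a boundary: LLRS between {mu = 0} and {mu >= 0}, Gaussian
# data with known unit variance, truth on the boundary at mu = 0.
import numpy as np

rng = np.random.default_rng(1)
n, trials = 1_000, 20_000
llrs = np.empty(trials)

for t in range(trials):
    x = rng.normal(loc=0.0, scale=1.0, size=n)        # truth: mu = 0
    xbar = x.mean()
    mu_hat = max(xbar, 0.0)                           # MLE under the constraint mu >= 0
    llrs[t] = n * (xbar**2 - (xbar - mu_hat)**2)      # 2*(logL(mu_hat) - logL(0))

print("mean LLRS:", llrs.mean())                      # ~0.5, not the Wilks value of 1
print("fraction exactly 0:", np.mean(llrs == 0.0))    # ~0.5: estimates piled on the boundary
```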

Slide 44

Slide 44 text

State tomography is
 on the edge. So must our
 model selection be.

Slide 45

Slide 45 text

Proving a “qWilks theorem” would be hard, in general: the distribution of the LLRS depends on the true state, it depends on the dimension, and quantum state space is hard to reason about.

Slide 46

Slide 46 text

Can we find a replacement for the Wilks theorem which respects boundaries? Let’s talk about work in progress.

Slide 47

Slide 47 text

Can we find a replacement for the Wilks theorem which respects boundaries? The Wilks Theorem says: Gaussian distribution of estimates; one fluctuating parameter = one unit of LLRS. We model the LLRS as: quantum states = unitary degrees of freedom + classical simplexes.

Slide 48

Slide 48 text

Can we find a replacement for the Wilks theorem which respects boundaries? Quantum states = unitary degrees of freedom + classical simplexes. The LLRS depends on the rank of the true state: the unitary degrees of freedom contribute 2 units of LLRS per unit of rank.

Slide 49

Slide 49 text

Can we find a replacement for the Wilks theorem which respects boundaries? Quantum states = unitary degrees of freedom + classical simplexes. The LLRS also depends on spectral fluctuations, which requires Monte Carlo to simulate the effect of boundaries: $\langle\lambda(M_d, M_{d+1})\rangle = 2\,\mathrm{rank}(\rho_{\mathrm{true}}) + (\text{simplex result})$.

Slide 50

Slide 50 text

How well does this replacement work? Not as accurate as we expected… what is going on?

Slide 51

Slide 51 text

How does the LLRS behave when the truth and the estimates are close? Wilks and our model both rely on the existence of a second-order Taylor series expansion. Is a Taylor expansion a good predictor of the LLRS? Helpful trick: $\lambda(M_d, M_{d'}) = \lambda(\rho_{\mathrm{true}}, M_{d'}) - \lambda(\rho_{\mathrm{true}}, M_d)$, so expand the LLRS as a function of the true state to second order.

Slide 52

Slide 52 text

"Curvature error" addresses the question: "If the estimate is close to the truth, what does the LLRS do?" The Taylor series allows us to calculate an approximate expected value:
$\lambda(\rho_{\mathrm{true}}, M_d) = \lambda(\rho_{\mathrm{true}}, \hat{\rho}) \approx 0 + \left.\frac{\partial \lambda}{\partial \rho}\right|_{\hat{\rho}} (\rho_{\mathrm{true}} - \hat{\rho}) + \frac{1}{2}\left.\frac{\partial^2 \lambda}{\partial \rho^2}\right|_{\hat{\rho}} (\rho_{\mathrm{true}} - \hat{\rho})^2$
$\langle \lambda(\rho_{\mathrm{true}}, \hat{\rho}) \rangle \approx \mathrm{tr}\left( \left\langle \left.\frac{\partial^2 L}{\partial \rho^2}\right|_{\hat{\rho}} |\rho_{\mathrm{true}} - \hat{\rho}\rangle\rangle\langle\langle \rho_{\mathrm{true}} - \hat{\rho}| \right\rangle \right)$

Slide 53

Slide 53 text

In the absence of any boundaries, the expected value equals that of Wilks:
$\langle \lambda(\rho_0, \hat{\rho}) \rangle \approx \mathrm{tr}\left( \left\langle \left.\frac{\partial^2 L}{\partial \rho^2}\right|_{\hat{\rho}} |\rho_0 - \hat{\rho}\rangle\rangle\langle\langle \rho_0 - \hat{\rho}| \right\rangle \right) \approx \mathrm{tr}\left( \hat{I}(\hat{\rho})\,\mathrm{Cov}(\hat{\rho}) \right) \approx d^2 - 1$
No boundaries = no bias in the MLE. It does not predict the distribution, however.
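A classical sanity check of this boundary-free statement (again, not the quantum calculation): for a $k$-outcome multinomial with an interior true distribution, no constraints are active, so the Monte Carlo average of $\lambda(\text{truth}, \text{MLE})$ should approach the parameter count $k - 1$, the analogue of $d^2 - 1$. The true distribution and sample sizes below are arbitrary choices.

```python
# Boundary-free check: <lambda(truth, MLE)> -> number of free parameters.
import numpy as np

rng = np.random.default_rng(7)
p_true = np.array([0.4, 0.3, 0.2, 0.1])     # k = 4 outcomes, interior point
n, trials = 10_000, 5_000

llrs = np.empty(trials)
for t in range(trials):
    counts = rng.multinomial(n, p_true)
    p_hat = counts / n                                       # unconstrained MLE
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(counts > 0, counts * np.log(p_hat / p_true), 0.0)
    llrs[t] = 2 * terms.sum()                                # LLRS vs. the truth

print("mean LLRS:", llrs.mean(), " (k - 1 =", len(p_true) - 1, ")")
```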

Slide 54

Slide 54 text

Taylor series seems inaccurate… is something wrong? Most likely. Asymptotic limit? Error in code? We shall see!

Slide 55

Slide 55 text

What was the point of
 all that math?

Slide 56

Slide 56 text

We are going to have a way to do nested model selection in quantum tomography! Models have boundaries, so we cannot use Wilks (its prediction is too high). We devise a quantum replacement for Wilks: unitary degrees of freedom + simplex (still too high). We construct a Taylor series (which reduces to Wilks without boundaries) and use it to determine when our model will work.

Slide 57

Slide 57 text

What is next?

Slide 58

Slide 58 text

There are many ways forward: use these results to create an estimator of the expected value; make a quantum information criterion; a model selection rule for displaced/squeezed states; apply active subspace methods to speed up optimization; and what’s with compressed sensing and model selection?

Slide 59

Slide 59 text

A year ago, I thought
 model selection using
 the LLRS was easy…
 Today, I am certain it is
 vastly harder than I
 (and others!) thought.

Slide 60

Slide 60 text

Model selection in
 quantum state tomography
 is hard because we
 have to deal with boundaries.