Ami Wiesel (Hebrew University of Jerusalem)

S³ Seminar
October 17, 2022

Title — Deep learning solutions to estimation and detection

Abstract — In this talk, we will discuss the use of deep learning in statistical signal processing. We will address settings in which the classical solutions are intractable and will propose modern approaches based on neural networks. We will begin with parameter estimation and focus on learning non-linear minimum variance unbiased estimators (MVUE). Next, we will switch to detection theory and focus on learning classifiers with constant false alarm rates (CFAR). In both settings, we provide deep learning methods that achieve these goals in practice, as well as theory that highlights the relations to the classical likelihood-based solutions.

References
Learning to estimate without bias https://arxiv.org/pdf/2110.12403.pdf

CFARnet: deep learning for target detection with constant false alarm rate https://arxiv.org/pdf/2208.02474.pdf

Biography — Ami Wiesel received the B.Sc. and M.Sc. degrees in electrical engineering from Tel-Aviv University, Tel-Aviv, Israel, in 2000 and 2002, respectively, and the Ph.D. degree in electrical engineering from the Technion - Israel Institute of Technology, Haifa, Israel, in 2007. He was a postdoctoral fellow at the University of Michigan, Ann Arbor, USA, during 2007–2009. He is currently an Associate Professor in the Rachel and Selim Benin School of Computer Science and Engineering, Hebrew University of Jerusalem, Israel.


Transcript

  1. Deep learning solutions to estimation and detection
    Ami Wiesel
    The Hebrew University of Jerusalem (HUJI)
    October 17, 2022

  2. Thanks
    ▶ Tzvi Diskin
    ▶ Yiftach Beer
    ▶ Yoav Wald
    ▶ Uri Okun
    ▶ Yonina Eldar
    ▶ Google

  3. 2002 vs 2022
    2002 — Model:
    ▶ Parameter estimation
    ▶ Hypothesis testing
    ▶ Algorithms
    2022 — (synthetic) Data:
    ▶ Regression
    ▶ Classification
    ▶ Neural networks

  4. Learning without bias
    Detection with constant false alarm rate

  5. Outline
    Learning without bias
    Detection with constant false alarm rate

  6. Parameter estimation: 2002 vs 2022
    2002 — Model:
    ▶ Maximum Likelihood
    ▶ Inference was slow
    ▶ Asymptotically unbiased
    ▶ Cramer Rao Bound for all parameter values
    2022 — (synthetic) Data:
    ▶ Regression
    ▶ Inference is fast
    ▶ Fitted on a training set
    ▶ Best if train = test

  7. Estimation metrics
    ▶ Classical metrics:
      $\mathrm{BIAS}_{\hat{y}}(y) = E[\hat{y}(x) \mid y] - y$
      $\mathrm{VAR}_{\hat{y}}(y) = E\left[\|\hat{y}(x) - E[\hat{y}(x) \mid y]\|^2 \mid y\right]$
      $\mathrm{MSE}_{\hat{y}}(y) = E\left[\|\hat{y}(x) - y\|^2 \mid y\right] = \mathrm{VAR}_{\hat{y}}(y) + \|\mathrm{BIAS}_{\hat{y}}(y)\|^2$
    ▶ Bayesian metric:
      $\mathrm{BMSE}_{\hat{y}} = E\left[\mathrm{MSE}_{\hat{y}}(y)\right]$
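    These metrics are easy to check numerically. Below is a minimal Monte Carlo sketch (my own toy example, not from the talk): a deliberately biased shrinkage estimator of a Gaussian mean, verifying the MSE = VAR + BIAS² decomposition.

```python
import numpy as np

# Toy check of the metrics above (illustrative example, not from the talk):
# estimate y from M samples x ~ N(y, 1) with the biased shrunk mean
# y_hat(x) = 0.9 * mean(x), and verify MSE = VAR + BIAS^2.
rng = np.random.default_rng(0)
y, M, trials = 2.0, 10, 200_000

x = rng.normal(y, 1.0, size=(trials, M))
y_hat = 0.9 * x.mean(axis=1)

bias = y_hat.mean() - y            # BIAS_yhat(y) = E[y_hat(x)|y] - y
var = y_hat.var()                  # VAR_yhat(y)
mse = np.mean((y_hat - y) ** 2)    # MSE_yhat(y)
print(mse, var + bias**2)          # equal up to Monte Carlo error
```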

  8. Parameter estimation
    ▶ Classical approaches:
      The naive objective $\min_{\hat{y}(\cdot)} \mathrm{MSE}_{\hat{y}}(y)$ is crossed out on the slide: it cannot be minimized simultaneously for all $y$.
      ▶ Minimum Variance Unbiased Estimation (MVUE): minimize the variance subject to $\mathrm{BIAS}_{\hat{y}}(y) = 0 \ \forall y$
      ▶ Maximum Likelihood is asymptotically MVUE
    ▶ Bayesian approach:
      ▶ Minimize the BMSE: $\min_{\hat{y}(\cdot)} E\left[\mathrm{MSE}_{\hat{y}}(y)\right]$
    Learning is Bayesian with respect to the training set.

  9. Bias Constrained Estimation (BCE)
    ▶ Standard learning:
      $D_N = \{y_i, x_i\}_{i=1}^N$
      $\min_{\hat{y} \in \mathcal{H}} \ \hat{E}_N \|\hat{y}(x) - y\|^2$
    ▶ BCE: penalize the average squared bias (a code sketch follows):
      $D_{NM} = \left\{y_i, \{x_{ij}\}_{j=1}^M\right\}_{i=1}^N$
      $\min_{\hat{y} \in \mathcal{H}} \ \hat{E}_{NM} \|\hat{y}(x) - y\|^2 + \lambda \, \hat{E}_N \left\|\hat{E}_M\left[\hat{y}(x) - y \mid y\right]\right\|^2$
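    A minimal PyTorch sketch of this objective, assuming a batch of N labels with M measurements each (the shapes and names are my own, not from the talk):

```python
import torch

def bce_objective(y_hat, y, lam):
    """Sketch of the BCE objective above (shapes are my assumption).

    y_hat: (N, M, d) estimates for the M measurements x_ij that share y_i.
    y:     (N, d) true parameters; lam: the bias-penalty weight lambda.
    """
    err = y_hat - y[:, None, :]        # residuals y_hat(x_ij) - y_i
    mse = err.pow(2).sum(-1).mean()    # E_NM || y_hat(x) - y ||^2
    bias = err.mean(dim=1)             # E_M [ y_hat(x) - y | y_i ], per i
    return mse + lam * bias.pow(2).sum(-1).mean()
```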

  10. Collecting a BCE dataset $D_{NM}$
    Two sources: synthetic data and data augmentation (a generation sketch follows the references).
    ▶ Choose a fictitious prior $p_{fake}(y)$
    ▶ Generate $\{y_i\}_{i=1}^N$
    ▶ For each $y_i$ generate $\{x_j(y_i)\}_{j=1}^M$
    References: Khobahi, Gabrielli, Naimipour, Dreifuerst, ...
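    For concreteness, a sketch of the synthetic-data route under an assumed toy model (the prior and likelihood here are my placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 1_000, 16

y = rng.uniform(0.5, 2.0, size=N)         # y_i ~ p_fake(y), a fictitious prior
x = rng.normal(y[:, None], 1.0, (N, M))   # x_ij = x_j(y_i), M draws per y_i
# D_NM = {y_i, {x_ij}_{j=1}^M}_{i=1}^N feeds the BCE objective above.
```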

  11. Minimum Variance Unbiased Estimator (MVUE)
    Theorem
    Under technical conditions, BCE is asymptotically MVUE.
    ▶ Maximum Likelihood is also asymptotically MVUE.
    ▶ BCE approximates it using deep learning.
    ▶ Asymptotically in everything!
    ▶ Note that we penalize the average bias (rather than the max).
    ▶ Asymptotically, achieves the Cramer Rao bound for any value of y.

  12. BCE with linear architecture
    Theorem
    $\hat{y} = Ax$ with
    $A = \hat{E}_{NM}\left[yx^T\right]\left[\frac{1}{\lambda+1}\hat{E}_{NM}\left[xx^T\right] + \left(1 - \frac{1}{\lambda+1}\right)R\right]^{-1}$
    $R = \hat{E}_N\left[\hat{E}_M\left[x \mid y\right]\hat{E}_M\left[x^T \mid y\right]\right]$
    Compare to the Bayesian linear MMSE (linear regression):
    $A = E_{NM}\left[yx^T\right]\left(E_{NM}\left[xx^T\right]\right)^{-1}$
    (A code sketch follows.)
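    A NumPy sketch of this closed form (the array shapes are my assumption); note that lam = 0 recovers the linear MMSE above.

```python
import numpy as np

def bce_linear(x, y, lam):
    """Closed-form linear BCE from the theorem above (a sketch).

    x: (N, M, p) measurements, y: (N, q) parameters shared within each row.
    Returns A such that y_hat = A @ x; lam = 0 recovers the linear MMSE.
    """
    N, M, p = x.shape
    xf, yf = x.reshape(N * M, p), np.repeat(y, M, axis=0)
    Eyx = yf.T @ xf / (N * M)       # E_NM [ y x^T ]
    Exx = xf.T @ xf / (N * M)       # E_NM [ x x^T ]
    xbar = x.mean(axis=1)           # E_M [ x | y_i ]
    R = xbar.T @ xbar / N           # E_N [ E_M[x|y] E_M[x^T|y] ]
    w = 1.0 / (lam + 1.0)
    return Eyx @ np.linalg.inv(w * Exx + (1.0 - w) * R)
```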

  13. BCE with linear architecture and linear model
    Theorem
    For the linear model $x = Hy + n$, the linear BCE is $\hat{y} = Ax$ with
    $A = \left(H^T\hat{\Sigma}_x^{-1}H + \frac{1}{\lambda+1}\hat{\Sigma}_y^{-1}\right)^{-1}H^T\hat{\Sigma}_x^{-1}$
    Compare to the Weighted Least Squares estimator (= MVUE, by the Gauss-Markov theorem):
    $A = \left(H^T\hat{\Sigma}_x^{-1}H\right)^{-1}H^T\hat{\Sigma}_x^{-1}$
    (A numerical check follows.)
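    A quick numerical check (my own, reusing bce_linear from the sketch after slide 12): under a linear Gaussian model, the BCE matrix should approach the WLS matrix as lambda grows, up to finite-sample error.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, N, M = 6, 3, 5_000, 64
H = rng.normal(size=(p, q))

y = rng.normal(size=(N, q))                               # fictitious prior, Sigma_y = I
x = (y @ H.T)[:, None, :] + rng.normal(size=(N, M, p))    # x = Hy + n, Sigma_x = I

A_wls = np.linalg.solve(H.T @ H, H.T)    # WLS (= MVUE) with Sigma_x = I
A_bce = bce_linear(x, y, lam=1e3)        # bce_linear from the earlier sketch
print(np.abs(A_bce - A_wls).max())       # small, up to finite N and M
```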

  14. Experiment: SNR estimation
    My MSc with Messer in 2002.
    $x_i = a_i h + n_i, \quad a_i = \pm 1 \text{ w.p. } \tfrac{1}{2}, \quad n_i \sim \mathcal{N}(0, \sigma^2), \quad \rho = \frac{h^2}{\sigma^2}$
    MMSE is best on the training distribution; BCE is always near the MLE (EM).

  15. Experiment: covariance estimation
    Structured covariance [Chaudhuri]: $x \sim \mathcal{N}(0, \Sigma(y))$ with
    $\Sigma(y) = \begin{bmatrix}
      1+y_1 & 0 & 0 & \tfrac{1}{2}y_6 & 0 \\
      0 & 1+y_2 & 0 & \tfrac{1}{2}y_7 & 0 \\
      0 & 0 & 1+y_3 & 0 & \tfrac{1}{2}y_8 \\
      \tfrac{1}{2}y_6 & \tfrac{1}{2}y_7 & 0 & 1+y_4 & \tfrac{1}{2}y_9 \\
      0 & 0 & \tfrac{1}{2}y_8 & \tfrac{1}{2}y_9 & 1+y_5
    \end{bmatrix}$
    EMMSE is best on the training distribution; BCE is always near the MVUE.
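    A small helper reconstructing this $\Sigma(y)$ (zero-based indexing; a sketch, with the entry map read off the matrix above):

```python
import numpy as np

def sigma_of_y(y):
    """Build the 5x5 structured covariance above from y in R^9 (a sketch)."""
    s = np.diag(1.0 + y[:5])                      # diagonal: 1 + y_1, ..., 1 + y_5
    off = {(0, 3): 5, (1, 3): 6, (2, 4): 7, (3, 4): 8}
    for (i, j), k in off.items():                 # symmetric entries y_6/2, ..., y_9/2
        s[i, j] = s[j, i] = 0.5 * y[k]
    return s
```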

  16. BCE for averaging in test time
    ▶ Example: sensor networks.
    ▶ Example: test-time augmentation.
    Averaging
    Unbiasedness is necessary for consistent averaging.
    BCE is asymptotically unbiased.

  17. Experiment: Augmentation in test time
    ▶ CIFAR10
    ▶ Random cropping and flipping
    ▶ Soft labels via distillation
    ▶ Both in train and in test
    ▶ BCE outperforms MMSE
    [Krizhevsky, Simonyan, Han,...]

  18. Fairness literature
    ▶ Related to the "fairness" and "out-of-distribution generalization" literatures.
    ▶ Invariant Risk Minimization (IRM) by Arjovsky
    ▶ Calibration and OOD by Wald, and more...
    ▶ We protect the labels themselves rather than the environments.

  19. Outline
    Learning without bias
    Detection with constant false alarm rate

  20. Detection: 2002 vs 2022
    2002 — Model:
    ▶ Likelihood Ratio Test
    ▶ Neyman Pearson
    ▶ Constant false alarm rate
    2022 — (synthetic) Data:
    ▶ Classification
    ▶ Minimum probability of error
    ▶ Works well if train = test

  21. Simple Hypothesis Testing
    Goal: $x \sim p(x; y)$, $y \in \{0, 1\}$.
    Design a detector $T(x) \gtrless \gamma$ that maximizes
    $P_{TPR} = P(T(x) > \gamma;\ y = 1)$
    subject to a false alarm constraint on
    $P_{FPR} = P(T(x) > \gamma;\ y = 0)$.
    LRT: the optimal detector is the likelihood ratio test
    $T_{LRT}(x) = 2 \log \frac{p(x; y = 1)}{p(x; y = 0)}$
    which is easy to learn as a Bayes optimal classifier.
    Can also optimize AUC-ROC, e.g., Herschtal, Brefeld, etc.
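    To make the classifier connection concrete, here is a toy simple test (my example, not from the talk): a Gaussian mean shift, where the Bayes-optimal classifier's logit equals the log-likelihood ratio plus the log prior odds, so learning the classifier recovers T_LRT up to a monotone transform.

```python
import numpy as np

# Toy simple test (my example): x ~ N(0, 1) under y=0, x ~ N(mu, 1) under y=1.
mu = 1.5

def t_lrt(x):
    # 2 log [ p(x; y=1) / p(x; y=0) ] for the Gaussian shift model
    return 2.0 * (mu * x - 0.5 * mu**2)

print(t_lrt(np.linspace(-3, 3, 7)))   # monotone in x: thresholding T_LRT
                                      # is thresholding the matched filter x
```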

  22. Composite Hypothesis Testing
    $x \sim p(x; z)$, where
    $y = 0:\ z \in \mathcal{Z}_0$ (noise only)
    $y = 1:\ z \in \mathcal{Z}_1$ (target)
    (Ill-posed) Goal: design a detector $T(x) \gtrless \gamma$ that maximizes
    $P_{TPR}(z) = P(T(x) > \gamma;\ z \in \mathcal{Z}_1)$
    subject to a constant false alarm rate (CFAR) constraint on
    $P_{FPR}(z) = P(T(x) > \gamma;\ z \in \mathcal{Z}_0)$ for all $z \in \mathcal{Z}_0$.

  23. Generalized Likelihood Ratio Test (GLRT)
    ▶ GLRT is the standard approach:
      $T_{GLRT}(x) = 2 \log \frac{\max_{z \in \mathcal{Z}_1} p(x; z)}{\max_{z \in \mathcal{Z}_0} p(x; z)}$
    ▶ Pros: under regular asymptotic conditions,
      $T_{GLRT}(x) \overset{asymp}{\sim} \begin{cases} \chi^2_r(0) & y = 0 \\ \chi^2_r(\lambda) & y = 1 \end{cases}$
      and it has a constant false alarm rate (CFAR).
    ▶ Cons: requires a likelihood, inner maximizations, and asymptotic conditions.

  24. Learning to detect targets
    Learning detectors (see the training sketch below):
    ▶ Choose $p_{fake}(y)$ and $p_{fake}(z; y)$.
    ▶ For each $i = 1, \cdots, N$:
      generate $y_i$;
      generate $z_i$ given $y_i$;
      generate $x_i$ given $z_i$.
    ▶ Solve $\min_{\hat{T} \in \mathcal{T}} \frac{1}{N} \sum_{i=1}^N L(\hat{T}(x_i), y_i)$.
    References: Ziemann, Kucer and Theiler, Girard, De La Mata-Moya and many more...
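    A minimal PyTorch training sketch of this recipe; the sampler and the architecture are my placeholders, not those of the talk.

```python
import torch
from torch import nn

def sample_batch(n, dim=8):
    # y_i ~ p_fake(y): target present w.p. 1/2; z_i ~ p_fake(z; y): amplitude
    y = torch.randint(0, 2, (n, 1)).float()
    z = 0.5 + torch.rand(n, 1)
    x = y * z + torch.randn(n, dim)    # x_i given z_i: target plus noise
    return x, y

T_hat = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(T_hat.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()       # the classification loss L

for step in range(1000):
    x, y = sample_batch(256)
    loss = loss_fn(T_hat(x), y)        # (1/N) sum_i L(T_hat(x_i), y_i)
    opt.zero_grad(); loss.backward(); opt.step()
```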

  25. Learning to detect targets is easy
    ▶ Also in composite hypotheses (unlike estimation).
    ▶ Target detection in Gaussian noise with unknown variance:
      $x_i = A + \sigma n_i, \quad i = 1, \cdots, N$
    ▶ $A \ne 0$ and $\sigma$ are deterministic and unknown.

  26. But... learned classifiers are not CFAR!

  27. Learning CFAR detectors
    CFAR-NET
    ▶ Choose $p_{fake}(y)$ and $p_{fake}(z; y)$.
    ▶ For each $i = 1, \cdots, N$:
      generate $y_i$; generate $z_i$ given $y_i$; generate $x_i$ given $z_i$.
    ▶ Solve
      $\min_{\hat{T} \in \mathcal{T}} \frac{1}{N} \sum_{i=1}^N L(\hat{T}(x_i), y_i) + \alpha \hat{R}(\hat{T})$
      $\hat{R}(\hat{T}) = \sum_{i, \tilde{i} \text{ under } y = 0} d\left(\{\hat{T}(x_{ij})\}_{j=1}^M ;\ \{\hat{T}(x_{\tilde{i}j})\}_{j=1}^M\right)$
    Ensures that $\hat{T}$ has the same distribution under all $z_i$.

  28. Learning CFAR detectors II
    CFAR penalty
    $\hat{R}(\hat{T}) = \sum_{i, \tilde{i} \text{ under } y = 0} d\left(\{\hat{T}(x_{ij})\}_{j=1}^M ;\ \{\hat{T}(x_{\tilde{i}j})\}_{j=1}^M\right)$
    ▶ $d$ is a differentiable distance between distributions.
    ▶ We use the MMD by Gretton et al. (sketched below):
      $d_{MMD} = \frac{1}{N^2} \sum_{i,j} k(X_i, X_j) + \frac{1}{N^2} \sum_{i,j} k(Y_i, Y_j) - \frac{2}{N^2} \sum_{i,j} k(X_i, Y_j)$
    ▶ Can also use a GAN-like loss.
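    A sketch of the MMD-based penalty in PyTorch; the Gaussian-kernel bandwidth and the pairwise-sum structure are my reading of the formula above.

```python
import torch

def mmd2(a, b, bw=1.0):
    """Squared MMD with a Gaussian kernel (after Gretton et al.), biased estimate.

    a, b: (M,) 1-D batches of detector outputs T_hat(x_ij), T_hat(x_~ij).
    """
    k = lambda u, v: torch.exp(-(u[:, None] - v[None, :]) ** 2 / (2 * bw**2))
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()

def cfar_penalty(t_groups):
    """Sketch of R_hat(T_hat): sum of pairwise MMDs between the output
    batches of different noise-only parameters z_i (my implementation).
    t_groups: list of (M,) tensors, one per z_i under y = 0."""
    pen = 0.0
    for i in range(len(t_groups)):
        for j in range(i + 1, len(t_groups)):
            pen = pen + mmd2(t_groups[i], t_groups[j])
    return pen
```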

  29. Detection in i.i.d. noise with unknown variance
    ▶ Gaussian noise: [figure]
    ▶ non-Gaussian noise: [figure]

  30. Detection in correlated noise
    ▶ Gaussian noise covariance estimated using secondary data.
    ▶ Adaptive Matched Filter (AMF):
      $x = As + w_0, \qquad x_i = w_i, \quad i = 1, \cdots, n, \qquad w_0, w_i \sim \mathcal{N}(0, \Sigma)$
      $T_{AMF}(x) = \frac{\left|s^T \hat{\Sigma}^{-1} x\right|^2}{s^T \hat{\Sigma}^{-1} s}, \qquad \hat{\Sigma} = \frac{1}{n} \sum_{i=1}^n w_i w_i^T$
    ▶ Diagonally loaded (LAMF) for regularization: $\hat{\Sigma} + \lambda I$.
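    A NumPy sketch of the AMF statistic for the real-valued case (function and variable names are mine; the commented line indicates the LAMF diagonal-loading variant):

```python
import numpy as np

def t_amf(x, s, secondary):
    """Adaptive Matched Filter statistic from the slide (a sketch).

    x: (p,) test snapshot, s: (p,) known signature,
    secondary: (n, p) noise-only snapshots w_i for covariance estimation.
    """
    n = secondary.shape[0]
    sigma = secondary.T @ secondary / n    # Sigma_hat = (1/n) sum_i w_i w_i^T
    # sigma += lam * np.eye(len(s))        # diagonal loading (LAMF variant)
    si = np.linalg.solve(sigma, s)         # Sigma_hat^{-1} s
    return (si @ x) ** 2 / (si @ s)        # |s^T S^-1 x|^2 / (s^T S^-1 s)
```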

  31. CFAR-NET in correlated noise
    ▶ LAMF, NET and CFARnet are better than AMF.
    ▶ Unlike CFARnet, the LAMF and NET are highly non-CFAR.

  32. Real Hyperspectral data
    ▶ Pavia University dataset.
    ▶ 10 labeled materials.
    ▶ Partial AUC over the (0, 0.05) false alarm range:

    material   | net  | CFARnet
    -----------|------|--------
    unlabeled  | 0.49 | 0.47
    1          | 0.31 | 0.38
    2          | 0.74 | 0.77
    3          | 0.33 | 0.35
    4          | 0.69 | 0.73
    5          | 0.27 | 0.34
    6          | 0.49 | 0.53
    7          | 0.47 | 0.72
    8          | 0.41 | 0.49
    9          | 0.88 | 0.90

  33. How is this related to the classics?
    Roughly speaking:
    ▶ Simple tests: LRT = Bayes optimal classifier.
    ▶ Composite tests: GLRT = Bayes + CFAR.
    BayesCFAR:
    $\min_{\hat{T}, \gamma} \Pr\left(1_{\hat{T} \ge \gamma} \ne y\right) \quad \text{s.t.} \quad \hat{T} \text{ is CFAR}$
    Exact equivalence requires assumptions...

  34. GLRT solves Bayes CFAR
    BayesCFAR:
    $\min_{\hat{T}, \gamma} \Pr\left(1_{\hat{T} \ge \gamma} \ne y\right) \quad \text{s.t.} \quad \hat{T} \text{ is CFAR}$
    Theorem
    Consider an asymptotic linear Gaussian model with a large enough $\sigma_r^2$; then there exists a threshold $\gamma$ such that the GLRT solves BayesCFAR.
    ▶ Linear model $x = Hz_r + n$.
    ▶ Noise covariance is parameterized arbitrarily by $z_n$.
    ▶ CFAR-NET approximates it using deep learning.

  35. Fairness literature
    ▶ CFAR-NET is very similar to methods from the fairness literature.
    ▶ The setting is slightly different.
    ▶ CFAR-NET is non-symmetric.
    ▶ CFAR-NET is cheaper in our settings (1-D MMD).

  36. Conclusions
    ▶ Everyone is switching to deep learning.
    ▶ But don’t forget the classics.
    ▶ To make a regressor closer to MLE/MVUE, add a bias penalty.
    ▶ To make a classifier closer to GLRT, add a CFAR penalty.
    ▶ Thank you!
