Ami Wiesel (Hebrew University of Jerusalem)

S³ Seminar
October 17, 2022

Title — Deep learning solutions to estimation and detection

Abstract — In this talk, we will discuss the use of deep learning in statistical signal processing. We will address settings in which the classical solutions are intractable and will propose modern approaches based on neural networks. We will begin with parameter estimation and focus on learning non-linear minimum variance unbiased estimators (MVUE). Next, we will switch to detection theory and focus on learning classifiers with constant false alarm rates (CFAR). In both settings, we provide deep learning methods that achieve these goals in practice, as well as theory that highlights the relations to the classical likelihood-based solutions.

References
Learning to estimate without bias https://arxiv.org/pdf/2110.12403.pdf

CFARnet: deep learning for target detection with constant false alarm rate https://arxiv.org/pdf/2208.02474.pdf

Biography — Ami Wiesel received the B.Sc. and M.Sc. degrees in electrical engineering from Tel-Aviv University, Tel-Aviv, Israel, in 2000 and 2002, respectively, and the Ph.D. degree in electrical engineering from the Technion - Israel Institute of Technology, Haifa, Israel, in 2007. He was a postdoctoral fellow at the University of Michigan, Ann Arbor, USA, during 2007–2009. He is currently an Associate Professor in the Rachel and Selim Benin School of Computer Science and Engineering, Hebrew University of Jerusalem, Israel.


Transcript

  1. Deep learning solutions to estimation and detection
    Ami Wiesel
    The Hebrew University of Jerusalem (HUJI)
    October 17, 2022

  2. Thanks
    ▶ Tzvi Diskin
    ▶ Yiftach Beer
    ▶ Yoav Wald
    ▶ Uri Okun
    ▶ Yonina Eldar
    ▶ Google

  3. 2002 vs 2022
    2002 — Model:
    ▶ Parameter estimation
    ▶ Hypothesis testing
    ▶ Algorithms
    2022 — (synthetic) Data:
    ▶ Regression
    ▶ Classification
    ▶ Neural networks

  4. Learning without bias
    Detection with constant false alarm rate

  5. Outline
    Learning without bias
    Detection with constant false alarm rate

  6. Parameter estimation: 2002 vs 2022
    2002 — Model:
    ▶ Maximum Likelihood
    ▶ Inference was slow
    ▶ Asymptotically unbiased
    ▶ Cramer Rao Bound for all parameter values
    2022 — (synthetic) Data:
    ▶ Regression
    ▶ Inference is fast
    ▶ Fitted on a training set
    ▶ Best if train = test

  7. Estimation metrics
    ▶ Classical metrics:
      $\mathrm{BIAS}_{\hat{y}}(y) = E[\hat{y}(x) \mid y] - y$
      $\mathrm{VAR}_{\hat{y}}(y) = E\left[\|\hat{y}(x) - E[\hat{y}(x) \mid y]\|^2 \mid y\right]$
      $\mathrm{MSE}_{\hat{y}}(y) = E\left[\|\hat{y}(x) - y\|^2 \mid y\right] = \mathrm{VAR}_{\hat{y}}(y) + \|\mathrm{BIAS}_{\hat{y}}(y)\|^2$
    ▶ Bayesian metric:
      $\mathrm{BMSE}_{\hat{y}} = E\left[\mathrm{MSE}_{\hat{y}}(y)\right]$
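    These metrics are easy to check numerically. Below is a minimal Monte Carlo sketch (my own toy example, not from the talk): a deliberately biased shrinkage estimator of a Gaussian mean, verifying the MSE = VAR + BIAS² decomposition.

```python
import numpy as np

# Toy check of the metrics above (illustrative example, not from the talk):
# estimate y from M samples x ~ N(y, 1) with the biased shrunk mean
# y_hat(x) = 0.9 * mean(x), and verify MSE = VAR + BIAS^2.
rng = np.random.default_rng(0)
y, M, trials = 2.0, 10, 200_000

x = rng.normal(y, 1.0, size=(trials, M))
y_hat = 0.9 * x.mean(axis=1)

bias = y_hat.mean() - y            # BIAS_yhat(y) = E[y_hat(x)|y] - y
var = y_hat.var()                  # VAR_yhat(y)
mse = np.mean((y_hat - y) ** 2)    # MSE_yhat(y)
print(mse, var + bias**2)          # equal up to Monte Carlo error
```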

  8. Parameter estimation
    ▶ Classical approaches:
      The naive objective $\min_{\hat{y}(\cdot)} \mathrm{MSE}_{\hat{y}}(y)$ is crossed out on the slide: it cannot be minimized simultaneously for all $y$.
      ▶ Minimum Variance Unbiased Estimation (MVUE): minimize the variance subject to $\mathrm{BIAS}_{\hat{y}}(y) = 0 \ \forall y$
      ▶ Maximum Likelihood is asymptotically MVUE
    ▶ Bayesian approach:
      ▶ Minimize the BMSE: $\min_{\hat{y}(\cdot)} E\left[\mathrm{MSE}_{\hat{y}}(y)\right]$
    Learning is Bayesian with respect to the training set.

  9. Bias Constrained Estimation (BCE)
    ▶ Standard learning:
      $D_N = \{y_i, x_i\}_{i=1}^N$
      $\min_{\hat{y} \in \mathcal{H}} \ \hat{E}_N \|\hat{y}(x) - y\|^2$
    ▶ BCE: penalize the average squared bias (a code sketch follows):
      $D_{NM} = \left\{y_i, \{x_{ij}\}_{j=1}^M\right\}_{i=1}^N$
      $\min_{\hat{y} \in \mathcal{H}} \ \hat{E}_{NM} \|\hat{y}(x) - y\|^2 + \lambda \, \hat{E}_N \left\|\hat{E}_M\left[\hat{y}(x) - y \mid y\right]\right\|^2$
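    A minimal PyTorch sketch of this objective, assuming a batch of N labels with M measurements each (the shapes and names are my own, not from the talk):

```python
import torch

def bce_objective(y_hat, y, lam):
    """Sketch of the BCE objective above (shapes are my assumption).

    y_hat: (N, M, d) estimates for the M measurements x_ij that share y_i.
    y:     (N, d) true parameters; lam: the bias-penalty weight lambda.
    """
    err = y_hat - y[:, None, :]        # residuals y_hat(x_ij) - y_i
    mse = err.pow(2).sum(-1).mean()    # E_NM || y_hat(x) - y ||^2
    bias = err.mean(dim=1)             # E_M [ y_hat(x) - y | y_i ], per i
    return mse + lam * bias.pow(2).sum(-1).mean()
```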

  10. Collecting a BCE dataset $D_{NM}$
    Two sources: synthetic data and data augmentation (a generation sketch follows the references).
    ▶ Choose a fictitious prior $p_{fake}(y)$
    ▶ Generate $\{y_i\}_{i=1}^N$
    ▶ For each $y_i$ generate $\{x_j(y_i)\}_{j=1}^M$
    References: Khobahi, Gabrielli, Naimipour, Dreifuerst, ...
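    For concreteness, a sketch of the synthetic-data route under an assumed toy model (the prior and likelihood here are my placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 1_000, 16

y = rng.uniform(0.5, 2.0, size=N)         # y_i ~ p_fake(y), a fictitious prior
x = rng.normal(y[:, None], 1.0, (N, M))   # x_ij = x_j(y_i), M draws per y_i
# D_NM = {y_i, {x_ij}_{j=1}^M}_{i=1}^N feeds the BCE objective above.
```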

  11. Minimum Variance Unbiased Estimator (MVUE)
    Theorem
    Under technical conditions, BCE is asymptotically MVUE.
    ▶ Maximum Likelihood is also asymptotically MVUE.
    ▶ BCE approximates it using deep learning.
    ▶ Asymptotically in everything!
    ▶ Note that we penalize the average bias (rather than the max).
    ▶ Asymptotically, achieves the Cramer Rao bound for any value of y.

  12. BCE with linear architecture
    Theorem
    $\hat{y} = Ax$ with
    $A = \hat{E}_{NM}\left[yx^T\right]\left[\frac{1}{\lambda+1}\hat{E}_{NM}\left[xx^T\right] + \left(1 - \frac{1}{\lambda+1}\right)R\right]^{-1}$
    $R = \hat{E}_N\left[\hat{E}_M\left[x \mid y\right]\hat{E}_M\left[x^T \mid y\right]\right]$
    Compare to the Bayesian linear MMSE (linear regression):
    $A = E_{NM}\left[yx^T\right]\left(E_{NM}\left[xx^T\right]\right)^{-1}$
    (A code sketch follows.)
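    A NumPy sketch of this closed form (the array shapes are my assumption); note that lam = 0 recovers the linear MMSE above.

```python
import numpy as np

def bce_linear(x, y, lam):
    """Closed-form linear BCE from the theorem above (a sketch).

    x: (N, M, p) measurements, y: (N, q) parameters shared within each row.
    Returns A such that y_hat = A @ x; lam = 0 recovers the linear MMSE.
    """
    N, M, p = x.shape
    xf, yf = x.reshape(N * M, p), np.repeat(y, M, axis=0)
    Eyx = yf.T @ xf / (N * M)       # E_NM [ y x^T ]
    Exx = xf.T @ xf / (N * M)       # E_NM [ x x^T ]
    xbar = x.mean(axis=1)           # E_M [ x | y_i ]
    R = xbar.T @ xbar / N           # E_N [ E_M[x|y] E_M[x^T|y] ]
    w = 1.0 / (lam + 1.0)
    return Eyx @ np.linalg.inv(w * Exx + (1.0 - w) * R)
```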

  13. BCE with linear architecture and linear model
    Theorem
    For the linear model $x = Hy + n$, the linear BCE is $\hat{y} = Ax$ with
    $A = \left(H^T\hat{\Sigma}_x^{-1}H + \frac{1}{\lambda+1}\hat{\Sigma}_y^{-1}\right)^{-1}H^T\hat{\Sigma}_x^{-1}$
    Compare to the Weighted Least Squares estimator (= MVUE, by the Gauss-Markov theorem):
    $A = \left(H^T\hat{\Sigma}_x^{-1}H\right)^{-1}H^T\hat{\Sigma}_x^{-1}$
    (A numerical check follows.)
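    A quick numerical check (my own, reusing bce_linear from the sketch after slide 12): under a linear Gaussian model, the BCE matrix should approach the WLS matrix as lambda grows, up to finite-sample error.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, N, M = 6, 3, 5_000, 64
H = rng.normal(size=(p, q))

y = rng.normal(size=(N, q))                               # fictitious prior, Sigma_y = I
x = (y @ H.T)[:, None, :] + rng.normal(size=(N, M, p))    # x = Hy + n, Sigma_x = I

A_wls = np.linalg.solve(H.T @ H, H.T)    # WLS (= MVUE) with Sigma_x = I
A_bce = bce_linear(x, y, lam=1e3)        # bce_linear from the earlier sketch
print(np.abs(A_bce - A_wls).max())       # small, up to finite N and M
```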

  14. Experiment: SNR estimation
    My MSc with Messer in 2002.
    $x_i = a_i h + n_i, \quad a_i = \pm 1 \text{ w.p. } \tfrac{1}{2}, \quad n_i \sim \mathcal{N}(0, \sigma^2), \quad \rho = \frac{h^2}{\sigma^2}$
    MMSE is best on the training distribution; BCE is always near the MLE (EM).

  15. Experiment: covariance estimation
    Structured covariance [Chaudhuri]: $x \sim \mathcal{N}(0, \Sigma(y))$ with
    $\Sigma(y) = \begin{bmatrix}
      1+y_1 & 0 & 0 & \tfrac{1}{2}y_6 & 0 \\
      0 & 1+y_2 & 0 & \tfrac{1}{2}y_7 & 0 \\
      0 & 0 & 1+y_3 & 0 & \tfrac{1}{2}y_8 \\
      \tfrac{1}{2}y_6 & \tfrac{1}{2}y_7 & 0 & 1+y_4 & \tfrac{1}{2}y_9 \\
      0 & 0 & \tfrac{1}{2}y_8 & \tfrac{1}{2}y_9 & 1+y_5
    \end{bmatrix}$
    EMMSE is best on the training distribution; BCE is always near the MVUE.
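    A small helper reconstructing this $\Sigma(y)$ (zero-based indexing; a sketch, with the entry map read off the matrix above):

```python
import numpy as np

def sigma_of_y(y):
    """Build the 5x5 structured covariance above from y in R^9 (a sketch)."""
    s = np.diag(1.0 + y[:5])                      # diagonal: 1 + y_1, ..., 1 + y_5
    off = {(0, 3): 5, (1, 3): 6, (2, 4): 7, (3, 4): 8}
    for (i, j), k in off.items():                 # symmetric entries y_6/2, ..., y_9/2
        s[i, j] = s[j, i] = 0.5 * y[k]
    return s
```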

  16. BCE for averaging in test time
    ▶ Example: sensor networks.
    ▶ Example: test-time augmentation.
    Averaging
    Unbiasedness is necessary for consistent averaging.
    BCE is asymptotically unbiased.

  17. Experiment: Augmentation in test time
    ▶ CIFAR10
    ▶ Random cropping and flipping
    ▶ Soft labels via distillation
    ▶ Both in train and in test
    ▶ BCE outperforms MMSE
    [Krizhevsky, Simonyan, Han,...]

  18. Fairness literature
    ▶ Related to the "fairness" and "out-of-distribution generalization" literatures.
    ▶ Invariant Risk Minimization (IRM) by Arjovsky
    ▶ Calibration and OOD by Wald, and more...
    ▶ We protect the labels themselves rather than the environments.

  19. Outline
    Learning without bias
    Detection with constant false alarm rate

  20. Detection: 2002 vs 2022
    2002 — Model:
    ▶ Likelihood Ratio Test
    ▶ Neyman Pearson
    ▶ Constant false alarm rate
    2022 — (synthetic) Data:
    ▶ Classification
    ▶ Minimum probability of error
    ▶ Works well if train = test

  21. Simple Hypothesis Testing
    Goal: $x \sim p(x; y)$, $y \in \{0, 1\}$.
    Design a detector $T(x) \gtrless \gamma$ that maximizes
    $P_{TPR} = P(T(x) > \gamma;\ y = 1)$
    subject to a false alarm constraint on
    $P_{FPR} = P(T(x) > \gamma;\ y = 0)$.
    LRT: the optimal detector is the likelihood ratio test
    $T_{LRT}(x) = 2 \log \frac{p(x; y = 1)}{p(x; y = 0)}$
    which is easy to learn as a Bayes optimal classifier.
    Can also optimize AUC-ROC, e.g., Herschtal, Brefeld, etc.
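    To make the classifier connection concrete, here is a toy simple test (my example, not from the talk): a Gaussian mean shift, where the Bayes-optimal classifier's logit equals the log-likelihood ratio plus the log prior odds, so learning the classifier recovers T_LRT up to a monotone transform.

```python
import numpy as np

# Toy simple test (my example): x ~ N(0, 1) under y=0, x ~ N(mu, 1) under y=1.
mu = 1.5

def t_lrt(x):
    # 2 log [ p(x; y=1) / p(x; y=0) ] for the Gaussian shift model
    return 2.0 * (mu * x - 0.5 * mu**2)

print(t_lrt(np.linspace(-3, 3, 7)))   # monotone in x: thresholding T_LRT
                                      # is thresholding the matched filter x
```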

  22. Composite Hypothesis Testing
    $x \sim p(x; z)$, where
    $y = 0:\ z \in \mathcal{Z}_0$ (noise only)
    $y = 1:\ z \in \mathcal{Z}_1$ (target)
    (Ill-posed) Goal: design a detector $T(x) \gtrless \gamma$ that maximizes
    $P_{TPR}(z) = P(T(x) > \gamma;\ z \in \mathcal{Z}_1)$
    subject to a constant false alarm rate (CFAR) constraint on
    $P_{FPR}(z) = P(T(x) > \gamma;\ z \in \mathcal{Z}_0)$ for all $z \in \mathcal{Z}_0$.

  23. Generalized Likelihood Ratio Test (GLRT)
    ▶ GLRT is the standard approach:
      $T_{GLRT}(x) = 2 \log \frac{\max_{z \in \mathcal{Z}_1} p(x; z)}{\max_{z \in \mathcal{Z}_0} p(x; z)}$
    ▶ Pros: under regular asymptotic conditions,
      $T_{GLRT}(x) \overset{asymp}{\sim} \begin{cases} \chi^2_r(0) & y = 0 \\ \chi^2_r(\lambda) & y = 1 \end{cases}$
      and it has a constant false alarm rate (CFAR).
    ▶ Cons: requires a likelihood, inner maximizations, and asymptotic conditions.

  24. Learning to detect targets
    Learning detectors (see the training sketch below):
    ▶ Choose $p_{fake}(y)$ and $p_{fake}(z; y)$.
    ▶ For each $i = 1, \cdots, N$:
      generate $y_i$;
      generate $z_i$ given $y_i$;
      generate $x_i$ given $z_i$.
    ▶ Solve $\min_{\hat{T} \in \mathcal{T}} \frac{1}{N} \sum_{i=1}^N L(\hat{T}(x_i), y_i)$.
    References: Ziemann, Kucer and Theiler, Girard, De La Mata-Moya and many more...
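    A minimal PyTorch training sketch of this recipe; the sampler and the architecture are my placeholders, not those of the talk.

```python
import torch
from torch import nn

def sample_batch(n, dim=8):
    # y_i ~ p_fake(y): target present w.p. 1/2; z_i ~ p_fake(z; y): amplitude
    y = torch.randint(0, 2, (n, 1)).float()
    z = 0.5 + torch.rand(n, 1)
    x = y * z + torch.randn(n, dim)    # x_i given z_i: target plus noise
    return x, y

T_hat = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(T_hat.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()       # the classification loss L

for step in range(1000):
    x, y = sample_batch(256)
    loss = loss_fn(T_hat(x), y)        # (1/N) sum_i L(T_hat(x_i), y_i)
    opt.zero_grad(); loss.backward(); opt.step()
```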

  25. Learning to detect targets is easy
    ▶ Also in composite hypotheses (unlike estimation).
    ▶ Target detection in Gaussian noise with unknown variance:
      $x_i = A + \sigma n_i, \quad i = 1, \cdots, N$
    ▶ $A \ne 0$ and $\sigma$ are deterministic and unknown.

  26. But... learned classifiers are not CFAR!

  27. Learning CFAR detectors
    CFAR-NET
    ▶ Choose $p_{fake}(y)$ and $p_{fake}(z; y)$.
    ▶ For each $i = 1, \cdots, N$:
      generate $y_i$; generate $z_i$ given $y_i$; generate $x_i$ given $z_i$.
    ▶ Solve
      $\min_{\hat{T} \in \mathcal{T}} \frac{1}{N} \sum_{i=1}^N L(\hat{T}(x_i), y_i) + \alpha \hat{R}(\hat{T})$
      $\hat{R}(\hat{T}) = \sum_{i, \tilde{i} \text{ under } y = 0} d\left(\{\hat{T}(x_{ij})\}_{j=1}^M ;\ \{\hat{T}(x_{\tilde{i}j})\}_{j=1}^M\right)$
    Ensures that $\hat{T}$ has the same distribution under all $z_i$.

  28. Learning CFAR detectors II
    CFAR penalty
    $\hat{R}(\hat{T}) = \sum_{i, \tilde{i} \text{ under } y = 0} d\left(\{\hat{T}(x_{ij})\}_{j=1}^M ;\ \{\hat{T}(x_{\tilde{i}j})\}_{j=1}^M\right)$
    ▶ $d$ is a differentiable distance between distributions.
    ▶ We use the MMD by Gretton et al. (sketched below):
      $d_{MMD} = \frac{1}{N^2} \sum_{i,j} k(X_i, X_j) + \frac{1}{N^2} \sum_{i,j} k(Y_i, Y_j) - \frac{2}{N^2} \sum_{i,j} k(X_i, Y_j)$
    ▶ Can also use a GAN-like loss.
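    A sketch of the MMD-based penalty in PyTorch; the Gaussian-kernel bandwidth and the pairwise-sum structure are my reading of the formula above.

```python
import torch

def mmd2(a, b, bw=1.0):
    """Squared MMD with a Gaussian kernel (after Gretton et al.), biased estimate.

    a, b: (M,) 1-D batches of detector outputs T_hat(x_ij), T_hat(x_~ij).
    """
    k = lambda u, v: torch.exp(-(u[:, None] - v[None, :]) ** 2 / (2 * bw**2))
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()

def cfar_penalty(t_groups):
    """Sketch of R_hat(T_hat): sum of pairwise MMDs between the output
    batches of different noise-only parameters z_i (my implementation).
    t_groups: list of (M,) tensors, one per z_i under y = 0."""
    pen = 0.0
    for i in range(len(t_groups)):
        for j in range(i + 1, len(t_groups)):
            pen = pen + mmd2(t_groups[i], t_groups[j])
    return pen
```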

  29. Detection in i.i.d. noise with unknown variance
    ▶ Gaussian noise: [figure]
    ▶ non-Gaussian noise: [figure]

  30. Detection in correlated noise
    ▶ Gaussian noise covariance estimated using secondary data.
    ▶ Adaptive Matched Filter (AMF):
      $x = As + w_0, \qquad x_i = w_i, \quad i = 1, \cdots, n, \qquad w_0, w_i \sim \mathcal{N}(0, \Sigma)$
      $T_{AMF}(x) = \frac{\left|s^T \hat{\Sigma}^{-1} x\right|^2}{s^T \hat{\Sigma}^{-1} s}, \qquad \hat{\Sigma} = \frac{1}{n} \sum_{i=1}^n w_i w_i^T$
    ▶ Diagonally loaded (LAMF) for regularization: $\hat{\Sigma} + \lambda I$.
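    A NumPy sketch of the AMF statistic for the real-valued case (function and variable names are mine; the commented line indicates the LAMF diagonal-loading variant):

```python
import numpy as np

def t_amf(x, s, secondary):
    """Adaptive Matched Filter statistic from the slide (a sketch).

    x: (p,) test snapshot, s: (p,) known signature,
    secondary: (n, p) noise-only snapshots w_i for covariance estimation.
    """
    n = secondary.shape[0]
    sigma = secondary.T @ secondary / n    # Sigma_hat = (1/n) sum_i w_i w_i^T
    # sigma += lam * np.eye(len(s))        # diagonal loading (LAMF variant)
    si = np.linalg.solve(sigma, s)         # Sigma_hat^{-1} s
    return (si @ x) ** 2 / (si @ s)        # |s^T S^-1 x|^2 / (s^T S^-1 s)
```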

  31. CFAR-NET in correlated noise
    ▶ LAMF, NET and CFARnet are better than AMF.
    ▶ Unlike CFARnet, the LAMF and NET are highly non-CFAR.

  32. Real Hyperspectral data
    ▶ Pavia University dataset.
    ▶ 10 labeled materials.
    ▶ Partial AUC over the (0, 0.05) false alarm range:

    material   | net  | CFARnet
    -----------|------|--------
    unlabeled  | 0.49 | 0.47
    1          | 0.31 | 0.38
    2          | 0.74 | 0.77
    3          | 0.33 | 0.35
    4          | 0.69 | 0.73
    5          | 0.27 | 0.34
    6          | 0.49 | 0.53
    7          | 0.47 | 0.72
    8          | 0.41 | 0.49
    9          | 0.88 | 0.90

  33. How is this related to the classics?
    Roughly speaking:
    ▶ Simple tests: LRT = Bayes optimal classifier.
    ▶ Composite tests: GLRT = Bayes + CFAR.
    BayesCFAR:
    $\min_{\hat{T}, \gamma} \Pr\left(1_{\hat{T} \ge \gamma} \ne y\right) \quad \text{s.t.} \quad \hat{T} \text{ is CFAR}$
    Exact equivalence requires assumptions...

  34. GLRT solves Bayes CFAR
    BayesCFAR:
    $\min_{\hat{T}, \gamma} \Pr\left(1_{\hat{T} \ge \gamma} \ne y\right) \quad \text{s.t.} \quad \hat{T} \text{ is CFAR}$
    Theorem
    Consider an asymptotic linear Gaussian model with a large enough $\sigma_r^2$; then there exists a threshold $\gamma$ such that the GLRT solves BayesCFAR.
    ▶ Linear model $x = Hz_r + n$.
    ▶ Noise covariance is parameterized arbitrarily by $z_n$.
    ▶ CFAR-NET approximates it using deep learning.

  35. Fairness literature
    ▶ CFAR-NET is very similar to methods from the fairness literature.
    ▶ The setting is slightly different.
    ▶ CFAR-NET is non-symmetric.
    ▶ CFAR-NET is cheaper in our settings (1-D MMD).

  36. Conclusions
    ▶ Everyone is switching to deep learning.
    ▶ But don’t forget the classics.
    ▶ To make a regressor closer to MLE/MVUE, add a bias penalty.
    ▶ To make a classifier closer to GLRT, add a CFAR penalty.
    ▶ Thank you!
