
Susovan Pal (EPITA)
S³ Seminar, December 10, 2021
https://s3-seminar.github.io/seminars/susovan-pal/

Title — A sufficient condition for convergence of Mean Shift algorithm in any dimension, with radially symmetric, strictly positive definite kernels

Abstract — The mean shift (MS) is a non-parametric, density-based, iterative algorithm used to find the modes of an estimated probability density function (pdf). Although the MS algorithm has been widely used in many applications, such as clustering, image segmentation, and object tracking, a rigorous proof of its convergence in a fully general setting is still missing. Two significant steps in this direction were taken by Ghassabeh, who proved convergence for Gaussian kernels in any dimension, and, in another paper, convergence in one dimension for kernels with differentiable, strictly decreasing, convex profiles. As of now, we are not aware of any proof of convergence of the MS algorithm for fully general kernels. This talk aims to give a sufficient condition for convergence in any dimension, for any strictly positive definite, smooth kernel. Some open questions for further research will also be raised, for which no rigorous mathematical details are known to the author.


Transcript

  1. Convergence of Mean Shift Algorithm with
    radially symmetric kernels with sufficiently large
    bandwidth
    Susovan PAL
    December 10, 2021

  2. Introduction to the Mean Shift (MS) algorithm
    Input: discrete data generated by a random variable with
    unknown probability density function (PDF)
    Goal: locate the local maxima (modes) of the probability
    density from the discrete data

  3. Mean Shift: uses
    Mean Shift is used for clustering
    Image segmentation requires clustering, so Mean Shift is
    useful here; in the slide's figure, (a) is the original image
    and (b), (c) are segmented versions
    Multivalued regression, e.g. for covariates/features following
    a Gaussian mixture model

  4. High-level description of the algorithm
    Start with a data point y_1 and fix a region of interest
    centered at y_1
    Calculate the (weighted) sample mean in that region, weighted
    by the kernel depending on the distance: far-away points get
    lower weights, nearby ones higher weights
    The sample mean should be close to where points are relatively
    dense, i.e. where the PDF is high
    Iteratively shift the center of the region of interest to this
    sample mean, hence "mean shift" (see the sketch below)
    The idea is that these sample means should approach the modes
    of the PDF
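A minimal sketch of one such shift in Python, assuming a Gaussian profile g(r) = exp(−r/2); the function and variable names are illustrative, not from the talk:

```python
import numpy as np

def mean_shift_step(y, X, h):
    """One mean shift update: the kernel-weighted sample mean of the
    data X around the current estimate y (Gaussian profile)."""
    r = np.sum(((y - X) / h) ** 2, axis=1)  # squared scaled distances
    w = np.exp(-r / 2)                      # far points get small weights
    return (w[:, None] * X).sum(axis=0) / w.sum()
```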

  5. Mean shift: visualization of the previous algorithm

  6. Mean Shift: setup
    Let {x_1, x_2, ..., x_n} be a set of n independent data points
    in R^d drawn from some unknown distribution.
    A radially symmetric kernel is a function K : R^d → [0, ∞)
    such that:
    K ∈ L^1(R^d), with ∫_{R^d} K = 1
    K(x) = k(||x||^2) for some suitable non-increasing
    k : [0, ∞) → [0, ∞)
    Note that for the popular Gaussian kernel, k(r) := C exp(−r/2),
    where C is a constant making the total integral 1.

  7. Mean Shift: description of the algorithm I
    Since the goal is to find the modes of the KDE of the unknown
    PDF, we first define the KDE as:

    f_{h,k}(x) ≡ f(x) := (c_N / n) Σ_{i=1}^n k(||(x − x_i)/h||^2)   (1)

    Above, h is referred to as the "bandwidth"; it serves the
    purpose of a radius, but isn't the radius itself.
    A small bandwidth means the weights decrease fast, so
    essentially only points within a small distance of the center
    of the region of interest contribute to the sample mean.
    A large bandwidth means farther points also contribute to the
    sample mean.
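As a concrete reading of (1), here is a small sketch; the Gaussian profile and the choice c_N = 1 are placeholder assumptions:

```python
import numpy as np

def kde(x, X, h, k=lambda r: np.exp(-r / 2), c_N=1.0):
    """Kernel density estimate f_{h,k}(x) from (1), up to normalization."""
    r = np.sum(((x - X) / h) ** 2, axis=1)  # ||(x - x_i)/h||^2 for each i
    return c_N * np.mean(k(r))
```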

  8. Mean Shift: description of the algorithm II
    Assuming that k is differentiable with derivative k', taking
    the gradient of (1) w.r.t. x yields (with g := −k'):

    ∇f_{h,k}(x) = (2c_N / (n h^2)) [ Σ_{i=1}^n g(||(x − x_i)/h||^2) ]
        × [ ( Σ_{i=1}^n x_i g(||(x − x_i)/h||^2) ) / ( Σ_{i=1}^n g(||(x − x_i)/h||^2) ) − x ]   (2)

    The second term is called the mean shift (MS) vector m_{h,g}(x),
    and hence (2) can be written in the form:

    ∇f_{h,k}(x) = (2c_N / (h^2 c_N')) f_{h,g}(x) m_{h,g}(x)   (3)

    where f_{h,g} is the KDE built from the profile g, with
    normalizing constant c_N'.
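A quick finite-difference check of identity (2) in the Gaussian case; everything below (the random data, step sizes, names) is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X, h = rng.normal(size=(50, 2)), 1.5
g = lambda r: 0.5 * np.exp(-r / 2)  # g = -k' for k(r) = exp(-r/2)

def ms_vector(x):
    """The mean shift vector m_{h,g}(x): weighted mean minus x."""
    w = g(np.sum(((x - X) / h) ** 2, axis=1))
    return (w[:, None] * X).sum(axis=0) / w.sum() - x

def grad_f(x, eps=1e-6):
    """Central-difference gradient of the KDE f_{h,k}."""
    f = lambda z: np.mean(np.exp(-np.sum(((z - X) / h) ** 2, axis=1) / 2))
    return np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                     for e in np.eye(2)])

x = np.array([0.3, -0.2])
gf, m = grad_f(x), ms_vector(x)
print(gf[0] * m[1] - gf[1] * m[0])  # ~0: grad f is parallel to m_{h,g}
print(gf / m)                       # equal, positive componentwise ratios
```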

  9. Mean Shift: description of the algorithm III
    The modes of the estimated density function f_{h,k} are located
    at the zeros of the gradient, i.e. where ∇f_{h,k} = 0. Equating
    (2) to zero reveals that the modes of the estimated pdf are
    fixed points of the following map:

    m_{h,g}(x) + x = Σ_{i=1}^n x_i g(||(x − x_i)/h||^2) / Σ_{i=1}^n g(||(x − x_i)/h||^2)   (4)

    The MS algorithm initializes the mode estimate sequence at one
    of the observed data points. The mode estimate y_j in the j-th
    iteration is updated as:

    y_{j+1} = m_{h,g}(y_j) + y_j = Σ_{i=1}^n x_i g(||(y_j − x_i)/h||^2) / Σ_{i=1}^n g(||(y_j − x_i)/h||^2)   (5)

  10. Convergence of Mean Shift: brief history
    The MS algorithm iterates until the norm of the difference
    between two consecutive mode estimates becomes less than some
    predefined threshold, i.e. until ||m_{h,g}(y_j)|| is small.
    A proof of convergence of {y_j} is unknown for general kernels.
    A proof in one dimension for general kernels is known
    (Ghassabeh).
    A proof in arbitrary dimension for the Gaussian kernel is
    known only for sufficiently large bandwidth (Ghassabeh).
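Combining update (5) with this stopping rule gives the full loop; a sketch, again with the Gaussian profile and illustrative names:

```python
import numpy as np

def mean_shift(y0, X, h, tol=1e-8, max_iter=10_000):
    """Iterate (5) until ||y_{j+1} - y_j|| < tol (mode estimate)."""
    y = np.asarray(y0, dtype=float)
    for _ in range(max_iter):
        w = np.exp(-np.sum(((y - X) / h) ** 2, axis=1) / 2)  # g-weights
        y_next = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(y_next - y) < tol:
            break
        y = y_next
    return y_next
```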

  11. Convergence of Mean Shift: strategy of proof 1
    Some facts already known about the proof of convergence:
    Theorem 1
    There exists h_0 > 0 so that, for each h ≥ h_0, the Hessian of
    f_{h,k}, denoted Hess(f_{h,k}), is nonsingular at all the
    stationary points of f_{h,k}, i.e. where ∇f_{h,k} vanishes. In
    particular, one can take h_0 := max_{1≤i≤n} ||x_i||.
    Theorem 2
    If the Hessian matrix of a C^2 function is nonsingular at its
    stationary points, then the stationary points are isolated.
    (Proof: inverse function theorem applied to ∇f_{h,k}.)
    Theorem 3
    Let x_i ∈ R^d, i = 1, 2, ..., n. Assume that the stationary
    points of the estimated pdf are isolated. Then the mode
    estimate sequence {y_j} in (5) converges.
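Theorems 1 and 2 can be probed numerically: at a mode y* returned by the loop above, the finite-difference Hessian of the KDE should be nonsingular. A sketch (the step size and names are illustrative):

```python
import numpy as np

def hessian_kde(x, X, h, eps=1e-4):
    """Central-difference Hessian of the Gaussian KDE at x."""
    f = lambda z: np.mean(np.exp(-np.sum(((z - X) / h) ** 2, axis=1) / 2))
    d = len(x)
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            ei, ej = eps * np.eye(d)[i], eps * np.eye(d)[j]
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * eps**2)
    return H

# Theorem 1 predicts det(hessian_kde(y_star, X, h)) != 0 for h >= h_0.
```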

  12. Convergence of Mean Shift: strategy of proof 2
    Ghassabeh proved Theorem 1 for Gaussian kernels: ∃ h_0 > 0
    such that ∀ h ≥ h_0, Hess(f_{h,k}) is nonsingular for Gaussian
    kernels with k(r) = C e^{−r/2}. This made the subsequent
    Theorem 2 and Theorem 3 go through.
    This means we just need to generalize Theorem 1 in order to
    ensure that {y_j} converges. This is precisely what we do.

  13. Convergence of Mean Shift: new theorem
    Below is a slightly weaker version of the generalization of
    Theorem 1:
    Theorem 4
    (P.) Assume that K(·) = k(||·||^2) is strictly positive
    definite, with the first two derivatives of k finite at 0.
    Then ∃ h_0 > 0 so that ∀ h ≥ h_0, Hess(f_{h,k}) is
    nonsingular; therefore, by Theorem 2 and Theorem 3, the mode
    estimate sequence {y_i = y_i(h)} corresponding to K_{h,k}
    converges ∀ h ≥ h_0.
    Remark 1
    Note that this theorem is slightly weaker (as of now) in the
    sense that it doesn't explicitly find h_0, unlike Ghassabeh's
    result, where h_0 := max_{1≤i≤n} ||x_i||.

  14. Proof of convergence of mean shift for general kernels
    Our proof will require an alternative characterization of
    positive definite kernel matrices, so we state some
    definitions below:
    Definition 5
    (Completely monotone functions) A function k : [0, ∞) → R is
    called completely monotone if:
    (1) k ∈ C^0([0, ∞)) ∩ C^∞((0, ∞))
    (2) (−1)^l k^{(l)}(r) ≥ 0 for all r > 0 and l = 0, 1, 2, ...
    Examples: k(r) = r^s, s ≤ 0; e^{−sr}, s ≥ 0; ln(1 + 1/r);
    e^{1/r}. If f, g are completely monotone and c, d > 0, then
    cf + dg and fg are also completely monotone!
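A symbolic sanity check of Definition 5 for the example k(r) = e^{−sr}, s ≥ 0 (a sketch using SymPy):

```python
import sympy as sp

r, s = sp.symbols('r s', positive=True)
k = sp.exp(-s * r)  # one of the examples above

# Complete monotonicity: (-1)^l k^(l)(r) >= 0 for every order l
for l in range(5):
    print(sp.simplify((-1) ** l * sp.diff(k, r, l)))  # s**l*exp(-r*s) >= 0
```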

  15. Alternative characterizations of completely monotone
    functions I
    Next, we state two alternative characterizations of completely
    monotone functions, one in each slide.
    Theorem 6
    (Hausdorff-Bernstein-Widder theorem: Laplace transform
    characterization of completely monotone functions)
    A function k : [0, ∞) → R is completely monotone if and only
    if it is the Laplace transform of a finite non-negative Borel
    measure µ on [0, ∞), i.e. k is of the form:

    k(r) = Lµ(r) = ∫_0^∞ e^{−rt} dµ(t)

  16. Alternative characterizations of completely monotone
    functions II
    First recall that a function K is called non-negative definite
    if the corresponding kernel matrix K := [K(x_i − x_j)]_{1≤i,j≤n}
    is non-negative definite for every choice of points. This
    amounts to the matrix K := [k(||x_i − x_j||^2)]_{1≤i,j≤n}
    being non-negative definite.
    Theorem 7
    (Non-negative definite kernel matrices are constructed from
    completely monotone functions)
    A function k is completely monotone on [0, ∞) if and only if
    K(x) := k(||x||^2) is non-negative definite and radial on R^d
    ∀ d ∈ N, that is, the matrix K with K_{ij} := K(x_i − x_j) =
    k(||x_i − x_j||^2) is non-negative definite.
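A numeric illustration of Theorem 7: kernel matrices built from the completely monotone profile k(r) = e^{−r} come out positive (semi-)definite. A sketch with random points (illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
# K_ij = k(||x_i - x_j||^2) with k(r) = exp(-r)
D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-D2)
print(np.linalg.eigvalsh(K).min())  # > 0: eigenvalues are all positive
```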

  17. Connection to our problem
    Thanks to the previous two theorems, one can write any
    non-negative definite radially symmetric kernel matrix
    K = (K_{ij}) as:

    K_{ij} = ∫_0^∞ e^{−t ||x_i − x_j||^2} dµ(t)

    for some finite Borel measure µ. This should be useful!
    Something more is true:
    Theorem 8
    A non-constant function k is completely monotone on [0, ∞) if
    and only if Φ(x) := k(||x||^2) is strictly positive definite
    and radial on R^d ∀ d ∈ N.

  18. Connection to our problem
    And finally, something more:
    Theorem 9
    A non-constant function k : [0, ∞) → R is completely monotone
    if and only if it is the Laplace transform of a finite
    non-negative Borel measure µ on [0, ∞) not of the form c·δ_0,
    c > 0, i.e. k is of the form:

    k(r) = Lµ(r) = ∫_0^∞ e^{−rt} dµ(t)   (6)

    where µ is not of the form c·δ_0, c > 0.
    Note that for the Gaussian kernel k(r) = e^{−r}, µ := δ_1.
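For instance, evaluating (6) at the point mass µ = δ_1 recovers the Gaussian profile quoted above:

```latex
k(r) = \mathcal{L}\delta_1(r) = \int_0^\infty e^{-rt}\, d\delta_1(t) = e^{-r}
```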

  19. Details of the proof of Theorem 4 (slide 13) - Part I
    Start by computing the gradient and Hessian of the KDE; we
    abbreviate f_{h,k} by f̂, and absorb the factor 1/n into c_N.
    Recall:

    f̂(x) = c_N Σ_{i=1}^n k(||(x − x_i)/h||^2)

    Taking the gradient w.r.t. x and using the representation (6):

    ∇f̂(x) = c_N Σ_{i=1}^n ((x_i − x)/h^2) ∫_0^∞ 2t exp(−t ||(x − x_i)/h||^2) dµ(t)

  20. Details of the proof of Theorem 4 (slide 13) - Part II

    Hess f̂(x) = c_N Σ_{i=1}^n ∫_0^∞ (2t/h^2) [ −I_{d×d} +
        (2t/h^2)(x − x_i)(x − x_i)^T ] exp(−t ||(x − x_i)/h||^2) dµ(t)

    Define the following two functions, respectively:
    C : R^d → R by:

    C(x) := ∫_0^∞ 2t Σ_{i=1}^n exp(−t ||(x − x_i)/h||^2) dµ(t)

    A : R^d → R^{d×d} by:

    A(x) := Σ_{i=1}^n ∫_0^∞ 4t^2 (x − x_i)(x − x_i)^T exp(−t ||(x − x_i)/h||^2) dµ(t)

  21. Details of the proof of Theorem 4 (slide 13) - Part III
    For motivation, note that the second part of the Hessian
    H(x) := Hess f̂(x) is nothing but c_N A(x)/h^4, and the first
    part is nothing but −c_N C(x)/h^2, i.e.:

    H(x) = −c_N (C(x)/h^2) I_{d×d} + c_N A(x)/h^4

    Next we argue by contradiction.

  22. Details of the proof of Theorem 4 (slide 13) - Part IV
    If possible, assume that H is singular (not of full rank).
    Then H has a zero eigenvalue, with an eigenvector v ∈ R^d such
    that H(x)v = 0.
    Using the above expression for H:

    H(x) = −c_N (C(x)/h^2) I_{d×d} + c_N A(x)/h^4

    this yields:

    [ −c_N (C(x)/h^2) I_{d×d} + c_N A(x)/h^4 ] v = 0
    ⟹ A(x)v = h^2 C(x)v
    ⟹ [ Σ_{i=1}^n (x − x_i)(x − x_i)^T ∫_0^∞ 4t^2 exp(−t ||(x − x_i)/h||^2) dµ(t) ] v
        = h^2 [ Σ_{i=1}^n ∫_0^∞ 2t exp(−t ||(x − x_i)/h||^2) dµ(t) ] v

  23. Details of the proof of Theorem 4 (slide 13) - Part V
    Let us abbreviate notation and write the above as:

    Σ_{i=1}^n (x − x_i)(x − x_i)^T J_i(h) v = h^2 Σ_{i=1}^n I_i(h) v

    where:

    I_i(h) := ∫_0^∞ 2t exp(−t ||(x − x_i)/h||^2) dµ(t)
    J_i(h) := ∫_0^∞ 4t^2 exp(−t ||(x − x_i)/h||^2) dµ(t)

  24. Details of the proof of Theorem 4 (slide 13) - Part VI
    Note that as h → ∞, each I_i(h) → ∫_0^∞ 2t dµ(t) = −2k'(0)
    and each J_i(h) → ∫_0^∞ 4t^2 dµ(t) = 4k''(0), so the above
    equation tends to:

    4k''(0) Σ_{i=1}^n (x − x_i)(x − x_i)^T v = −2n h^2 k'(0) v

    This gives a contradiction as h → ∞: the first two derivatives
    of K at 0 being finite implies that k'(0), k''(0) are finite,
    so the left-hand side stays bounded, while the right-hand side
    grows like h^2 (note k'(0) < 0, since µ is not supported at 0
    alone).
    So there exists a sufficiently large h_0 > 0 such that
    ∀ h ≥ h_0 the above equation cannot hold, meaning that the
    Hessian H(x) has no zero eigenvalue.
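A tiny numeric illustration of this limit in the Gaussian case µ = δ_1 (so k(r) = e^{−r} and −2k'(0) = 2; the values below are illustrative):

```python
import numpy as np

# I_i(h) = 2*exp(-||x - x_i||^2 / h^2) when mu = delta_1
dist = 3.0  # a fixed ||x - x_i||
for h in [1, 10, 100, 1000]:
    print(h, 2 * np.exp(-(dist / h) ** 2))  # tends to 2 = -2k'(0)
```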

  25. Practical usefulness of the result
    Choosing a large bandwidth h amounts to using too many points
    to calculate the sample mean, so the mode estimate sequence
    may fail to converge to a genuinely local concentration of
    density
    So clusters may not be found correctly
    On the other hand, choosing too small an h won't help us
    locate the local sample mean (the local centroid of density)
    correctly either
    Hence the result, although it guarantees convergence, may be
    of limited practical use

  26. Possible future work
    Finding a necessary and sufficient criterion for convergence.
    It has been shown that ||y_{j+1} − y_j|| → 0, but this doesn't
    mean {y_j} is a Cauchy sequence: e.g. u_j := sin(√j) is a
    bounded, non-Cauchy, non-convergent sequence with
    |u_{j+1} − u_j| → 0 (see the numeric check below). We can
    still try to leverage this property.
    Convergence of mean shift on manifolds should be a
    mathematically interesting theory to look into, as KDE on
    manifolds with nice properties is an active area of research.
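The counterexample u_j = sin(√j) is easy to see numerically (sketch):

```python
import numpy as np

j = np.arange(1, 2_000_001)
u = np.sin(np.sqrt(j))
print(np.abs(np.diff(u))[-3:])  # consecutive differences -> 0
print(u[::400_000])             # yet the sequence keeps oscillating
```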

  27. Thanks
    THANK YOU!!!