Department of Applied Mathematics
Center for Interdisciplinary Scientific Computation
Office of Research
Illinois Institute of Technology
hickernell@iit.edu
mypages.iit.edu/~hickernell

Thanks to the Illinois Tech SIAM Student Chapter for the invitation
Thanks to many students and collaborators for teaching me
Slides available at speakerdeck.com/riesz-representation-theorem
Please interrupt and ask questions
October 19, 2021

Thm for Hilbert Spaces Reproducing Kernels References Deleted Scenes

Main Idea

What seems obvious for two-dimensional vectors becomes a power tool for numerical analysis.

Obvious. If $\mathrm{LINEAR} : \mathbb{R}^2 \to \mathbb{R}$ is any linear, real-valued function, meaning
\[ \mathrm{LINEAR}(cf + h) = c\,\mathrm{LINEAR}(f) + \mathrm{LINEAR}(h) \quad \forall f, h \in \mathbb{R}^2,\ c \in \mathbb{R}, \]
then $\mathrm{LINEAR}(f)$ can be represented as an inner product: there exists a coefficient vector $g \in \mathbb{R}^2$ such that
\[ \mathrm{LINEAR}(f) = g_1 f_1 + g_2 f_2 = g^T f =: \langle g, f \rangle \equiv g \cdot f \quad \forall f \in \mathbb{R}^2. \]

Power Tool. The generalization, the Riesz Representation Theorem, gives error bounds for numerical algorithms, e.g.,
\[ \Biggl| \underbrace{\int_{[0,1]^d} f(t)\,\mathrm{d}t}_{\mathrm{LINEAR}(f)\text{: average of } f} - \underbrace{\frac{1}{n}\sum_{i=1}^{n} f(x_i)}_{\text{average of } f \text{ values}} \Biggr| \le \mathrm{BAD}(x_1, \dots, x_n)\,\mathrm{BAD}(f). \]
The integral is, e.g., an option price, and the sample mean averages, e.g., payoffs under various scenarios.
Concentrate on choosing the $x_i$ well, so that $\mathrm{BAD}(x_1, \dots, x_n)$ is small.

Why Is this My Favorite Theorem?

\[ \Biggl| \int_{[0,1]^d} f(t)\,\mathrm{d}t - \frac{1}{n}\sum_{i=1}^{n} f(x_i) \Biggr| \le \mathrm{BAD}(x_1, \dots, x_n)\,\mathrm{BAD}(f) \]

My most cited paper [1] according to Google Scholar is a simple application of the Riesz Representation Theorem.

Riesz Representation Theorem for $\mathbb{R}^2$

Theorem. If $\mathrm{LINEAR} : \mathbb{R}^2 \to \mathbb{R}$ is any linear, real-valued function, and $\langle h, f \rangle := h_1 f_1 + h_2 f_2 = h^T f$, then there exists a unique representer $g \in \mathbb{R}^2$, dependent on $\mathrm{LINEAR}$, for which $\mathrm{LINEAR}(f) = \langle g, f \rangle$ for all $f \in \mathbb{R}^2$.

Proof. Existence. Let $e_1 = (1, 0)^T$ and $e_2 = (0, 1)^T$. Then for all $f = (f_1, f_2)^T$,
\begin{align*}
\mathrm{LINEAR}(f) &= \mathrm{LINEAR}(e_1 f_1 + e_2 f_2) \\
&= \mathrm{LINEAR}(e_1) f_1 + \mathrm{LINEAR}(e_2) f_2 \quad \text{by linearity} \\
&= \langle g, f \rangle, \quad \text{where } g = \bigl(\mathrm{LINEAR}(e_1), \mathrm{LINEAR}(e_2)\bigr)^T.
\end{align*}

Example. If $\mathrm{LINEAR}(e_1) = -3$ and $\mathrm{LINEAR}(e_2) = 2$, then $\mathrm{LINEAR}(f) = -3 f_1 + 2 f_2 = \langle (-3, 2)^T, f \rangle$.

Uniqueness. If $g$ and $\tilde{g}$ are both representers, i.e., $\mathrm{LINEAR}(f) = \langle g, f \rangle = \langle \tilde{g}, f \rangle$ for all $f \in \mathbb{R}^2$, then
\[ 0 = \mathrm{LINEAR}(g - \tilde{g}) - \mathrm{LINEAR}(g - \tilde{g}) = \langle g, g - \tilde{g} \rangle - \langle \tilde{g}, g - \tilde{g} \rangle = \langle g - \tilde{g}, g - \tilde{g} \rangle = \lVert g - \tilde{g} \rVert^2, \]
so $g - \tilde{g} = 0$ and $g = \tilde{g}$.
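
The existence argument is constructive, so it can be checked numerically. The following is a minimal sketch, assuming NumPy; the helper name `representer` and the test vector are illustrative and not from the slides. It builds $g$ by evaluating the functional on the standard basis, using the slide's example values $-3$ and $2$.

```python
import numpy as np

def representer(linear, dim=2):
    """Build g with linear(f) == g @ f by evaluating linear on e_1, ..., e_dim."""
    basis = np.eye(dim)
    return np.array([linear(basis[i]) for i in range(dim)])

def linear(f):
    """The example functional: LINEAR(e1) = -3, LINEAR(e2) = 2."""
    return -3.0 * f[0] + 2.0 * f[1]

g = representer(linear)          # the representer (-3, 2)
f = np.array([0.5, 4.0])         # arbitrary test vector (illustrative)
print(g, linear(f), g @ f)       # the last two numbers agree
```

Any other linear functional on $\mathbb{R}^2$ can be passed in the same way; only its values on $e_1$ and $e_2$ matter.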

Riesz Representation Theorem for a Hilbert Space $(V, \langle \cdot, \cdot \rangle)$

Theorem (Riesz Representation Theorem for Hilbert Spaces). If $\mathrm{LINEAR} : V \to \mathbb{R}$ is any bounded linear real-valued function on the Hilbert space $(V, \langle \cdot, \cdot \rangle)$, then there exists a unique $g \in V$, called the representer of $\mathrm{LINEAR}$, for which $\mathrm{LINEAR}(f) = \langle g, f \rangle$ for all $f \in V$.

Hilbert space = a vector space (vectors can be added and multiplied by scalars) with an inner product that is complete under $\lVert \cdot \rVert$ (sequences that should converge do); e.g., $\mathbb{R}$ is complete, $\mathbb{Q}$ is not.
It may be infinite dimensional, e.g., all $f : [0, 1] \to \mathbb{R}$ with a first derivative.
Bounded means $\sup_{f \in V,\, f \ne 0} |\mathrm{LINEAR}(f)| / \lVert f \rVert < \infty$ (automatic for finite dimensional $V$; for an unbounded example see the Deleted Scenes).
Linear + bounded $\implies$ continuous, and linear + continuous $\implies$ bounded.
Can we prove this theorem without referring to a basis for $V$?

Proof. Existence. Define $\ker(\mathrm{LINEAR}) = \{v \in V : \mathrm{LINEAR}(v) = 0\}$ as the subspace of $V$ that $\mathrm{LINEAR}$ maps into $0$. If $\ker(\mathrm{LINEAR}) = V$, then all vectors in $V$ are mapped to $0$ and $g = 0$.

Otherwise, pick any nonzero $g_\perp \in \{u \in V : \langle u, v \rangle = 0 \ \forall v \in \ker(\mathrm{LINEAR})\}$, i.e., $g_\perp$ is orthogonal to all vectors in $\ker(\mathrm{LINEAR})$. (How? See the Deleted Scenes.)

For any $f \in V$, let $h = \mathrm{LINEAR}(f)\, g_\perp - \mathrm{LINEAR}(g_\perp)\, f$, and note that
\[ \mathrm{LINEAR}(h) = \mathrm{LINEAR}\bigl(\mathrm{LINEAR}(f)\, g_\perp - \mathrm{LINEAR}(g_\perp)\, f\bigr) = \mathrm{LINEAR}(f)\,\mathrm{LINEAR}(g_\perp) - \mathrm{LINEAR}(g_\perp)\,\mathrm{LINEAR}(f) = 0, \]
so $h \in \ker(\mathrm{LINEAR})$. The choice of $g_\perp$ implies that
\begin{gather*}
0 = \langle g_\perp, h \rangle = \mathrm{LINEAR}(f)\, \langle g_\perp, g_\perp \rangle - \mathrm{LINEAR}(g_\perp)\, \langle g_\perp, f \rangle, \\
\mathrm{LINEAR}(f) = \frac{\mathrm{LINEAR}(g_\perp)\, \langle g_\perp, f \rangle}{\langle g_\perp, g_\perp \rangle} = \langle g, f \rangle \quad \text{for } g := \frac{\mathrm{LINEAR}(g_\perp)\, g_\perp}{\lVert g_\perp \rVert^2}.
\end{gather*}

Uniqueness. Same proof as before.
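
The basis-free construction can be tried out in a finite-dimensional Hilbert space. The sketch below is an illustration under assumptions that are not in the slides: $V = \mathbb{R}^4$ with the inner product $\langle u, v \rangle = u^T A v$ for a randomly generated symmetric positive definite $A$, and $\mathrm{LINEAR}(f) = c^T f$ for a random $c$. It picks a $g_\perp$ orthogonal to the kernel and forms $g = \mathrm{LINEAR}(g_\perp)\, g_\perp / \lVert g_\perp \rVert^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
M = rng.standard_normal((d, d))
A = M @ M.T + d * np.eye(d)      # symmetric positive definite (assumed inner product)
c = rng.standard_normal(d)       # assumed functional: LINEAR(f) = c @ f

def inner(u, v):
    return u @ A @ v

def linear(f):
    return c @ f

# u is orthogonal (in this inner product) to ker(LINEAR) exactly when A u is
# a multiple of c. The factor 7 shows that the scaling of g_perp does not matter.
g_perp = 7.0 * np.linalg.solve(A, c)

# A vector in ker(LINEAR), built independently of g_perp, to check orthogonality.
w = rng.standard_normal(d)
v = w - (c @ w) / (c @ c) * c

# The representer from the proof.
g = linear(g_perp) * g_perp / inner(g_perp, g_perp)

f = rng.standard_normal(d)
print(inner(g_perp, v))          # approximately 0
print(linear(f), inner(g, f))    # these agree up to rounding
```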

What Can We Do with the Riesz Representation Theorem?

Theorem (Preliminary Error Bound for Numerical Integration). Let $(V, \langle \cdot, \cdot \rangle)$ be a Hilbert space of functions on $[0, 1]^d$. Suppose that integration and function evaluation are both bounded, linear real-valued functions on $V$. Then the error of approximating the integral of a function in $V$ by the sample mean is
\[ \int_{[0,1]^d} f(t)\,\mathrm{d}t - \frac{1}{n}\sum_{i=1}^{n} f(x_i) = \langle \eta, f \rangle = \cos\measuredangle(\eta, f)\, \lVert \eta \rVert\, \lVert f \rVert, \qquad |\langle \eta, f \rangle| \le \lVert \eta \rVert\, \lVert f \rVert \quad \forall f \in V, \]
for some representer $\eta \in V$ that depends on $x_1, \dots, x_n$, but not on $f$.

Proof. Note that $\mathrm{INT} : f \mapsto \int_{[0,1]^d} f(t)\,\mathrm{d}t$ and $\mathrm{AVG} : f \mapsto \frac{1}{n}\sum_{i=1}^{n} f(x_i)$ are bounded, linear real-valued functions. Thus, so is $\mathrm{ERR} = \mathrm{INT} - \mathrm{AVG}$. By the Riesz Representation Theorem, there exists a representer $\eta \in V$ such that
\[ \int_{[0,1]^d} f(t)\,\mathrm{d}t - \frac{1}{n}\sum_{i=1}^{n} f(x_i) = \mathrm{ERR}(f) = \langle \eta, f \rangle. \]
The bound is the Cauchy-Schwarz inequality (see the Deleted Scenes).

Significance
The error bound separates the $f$-dependent part from the algorithm-dependent part ($\eta$).
Algorithm developers can concentrate on making $\lVert \eta \rVert$ small.
It is providential if $\eta$ is nearly orthogonal to $f$ [2], but don't count on it.

How to find $\eta$?

Reproducing Kernels [3]

Suppose that $(V, \langle \cdot, \cdot \rangle)$ is a Hilbert space of functions on $\Omega$ for which function evaluation is a bounded, linear functional. Then there exists $K : \Omega \times \Omega \to \mathbb{R}$, called a reproducing kernel, for which
\[ \underbrace{K(t, x) = K(x, t)}_{\text{symmetry}}, \qquad \underbrace{K(\cdot, x) \in V}_{\text{belonging}}, \qquad \underbrace{f(x) = \langle K(\cdot, x), f \rangle}_{\text{reproduction}} \qquad \forall t, x \in \Omega,\ f \in V. \]
What do reproducing kernels look like for $V = \mathbb{R}^d$? See the Deleted Scenes.

Combining with the Riesz Representation Theorem:
\[ \mathrm{ERR}(f) := \int_{[0,1]^d} f(t)\,\mathrm{d}t - \frac{1}{n}\sum_{i=1}^{n} f(x_i) = \langle \eta, f \rangle, \qquad \text{representer } \eta = \text{?} \]
\begin{align*}
\eta(x) &= \langle K(\cdot, x), \eta \rangle && \text{(reproduction)} \\
&= \langle \eta, K(\cdot, x) \rangle && \text{(symmetry)} \\
&= \mathrm{ERR}\bigl(K(\cdot, x)\bigr) && \text{(representer)} \\
&= \int_{[0,1]^d} K(t, x)\,\mathrm{d}t - \frac{1}{n}\sum_{i=1}^{n} K(x_i, x), \\
\lVert \eta \rVert^2 &= \langle \eta, \eta \rangle = \mathrm{ERR}(\eta) && \text{(representer)} \\
&= \int_{[0,1]^{2d}} K(t, x)\,\mathrm{d}t\,\mathrm{d}x - \frac{2}{n}\sum_{i=1}^{n} \int_{[0,1]^d} K(x_i, x)\,\mathrm{d}x + \frac{1}{n^2}\sum_{i,j=1}^{n} K(x_i, x_j).
\end{align*}
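
The identity $\lVert \eta \rVert^2 = \mathrm{ERR}(\eta)$ can be checked numerically once a concrete kernel is chosen. The sketch below assumes, purely for illustration, $d = 1$ and the kernel $K(t, x) = 2 - \max(t, x)$, which reproduces under the inner product $\langle f, h \rangle = f(1)h(1) + \int_0^1 f'(t)h'(t)\,\mathrm{d}t$; this choice, the node set, and the function names are not from the slides. For this kernel $\int_0^1 K(t, x)\,\mathrm{d}t = (3 - x^2)/2$ and $\int_0^1\!\int_0^1 K = 4/3$.

```python
import numpy as np

def kernel(t, x):
    """Assumed reproducing kernel on [0, 1]: K(t, x) = 2 - max(t, x)."""
    return 2.0 - np.maximum(t, x)

def eta(x, nodes):
    """Representer of ERR: eta(x) = int_0^1 K(t, x) dt - (1/n) sum_i K(x_i, x)."""
    x = np.asarray(x, dtype=float)
    integral_of_kernel = (3.0 - x**2) / 2.0
    average_of_kernel = kernel(nodes[:, None], x[None, :]).mean(axis=0)
    return integral_of_kernel - average_of_kernel

def err(func, nodes, m=200_000):
    """ERR(func) = integral - sample mean, with the integral approximated
    by a midpoint rule on m equal cells."""
    grid = (np.arange(m) + 0.5) / m
    return func(grid).mean() - func(nodes).mean()

nodes = np.array([0.125, 0.375, 0.625, 0.875])
n = len(nodes)

# Closed-form ||eta||^2 from the three-term formula.
norm_sq_formula = (
    4.0 / 3.0
    - (2.0 / n) * np.sum((3.0 - nodes**2) / 2.0)
    + kernel(nodes[:, None], nodes[None, :]).sum() / n**2
)

# The same quantity obtained by applying ERR to eta itself.
norm_sq_via_err = err(lambda x: eta(x, nodes), nodes)

print(norm_sq_formula, norm_sq_via_err)   # both close to 1/192
```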

Putting It Together

Theorem (Error Bound for Numerical Integration). Let $(V, \langle \cdot, \cdot \rangle)$ be a Hilbert space of functions on $[0, 1]^d$ with reproducing kernel $K$. Suppose that integration and function evaluation are both bounded, linear real-valued functions on $V$. Then the error of approximating the integral of a function in $V$ by the sample mean satisfies
\[ \Biggl| \int_{[0,1]^d} f(t)\,\mathrm{d}t - \frac{1}{n}\sum_{i=1}^{n} f(x_i) \Biggr| \le \mathrm{BAD}(x_1, \dots, x_n)\,\mathrm{BAD}(f), \]
where
\begin{align*}
\mathrm{BAD}^2(x_1, \dots, x_n) &= \int_{[0,1]^{2d}} K(t, x)\,\mathrm{d}t\,\mathrm{d}x - \frac{2}{n}\sum_{i=1}^{n} \int_{[0,1]^d} K(x_i, x)\,\mathrm{d}x + \frac{1}{n^2}\sum_{i,j=1}^{n} K(x_i, x_j), \\
\mathrm{BAD}(f) &= \lVert f \rVert.
\end{align*}
For an explicit example of a $K$ and $\mathrm{BAD}(x_1, \dots, x_n)$, follow the link in the slides.
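
A sketch of how the bound might be evaluated in practice follows. It rests on assumptions that are not in the slides: the product kernel $K(t, x) = \prod_{j=1}^{d} \bigl(2 - \max(t_j, x_j)\bigr)$, for which $\int K(x_i, x)\,\mathrm{d}x = \prod_j (3 - x_{ij}^2)/2$ and $\int\!\!\int K = (4/3)^d$, and, for the check in $d = 1$, the test function $f(t) = t^2$ with $\lVert f \rVert^2 = f(1)^2 + \int_0^1 f'(t)^2\,\mathrm{d}t = 7/3$. The function name `bad_points` is hypothetical.

```python
import numpy as np

def bad_points(x):
    """BAD(x_1, ..., x_n) for the assumed product kernel; x has shape (n, d)."""
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    double_integral = (4.0 / 3.0) ** d
    single_integrals = np.prod((3.0 - x**2) / 2.0, axis=1)          # shape (n,)
    pairwise = np.prod(2.0 - np.maximum(x[:, None, :], x[None, :, :]), axis=2)
    bad_sq = double_integral - (2.0 / n) * single_integrals.sum() + pairwise.sum() / n**2
    return np.sqrt(max(bad_sq, 0.0))   # guard against tiny negative rounding

# Check the bound in d = 1 for f(t) = t^2, whose exact integral is 1/3.
def f(t):
    return t[:, 0] ** 2

f_norm = np.sqrt(7.0 / 3.0)            # BAD(f) = ||f|| under the assumed inner product

rng = np.random.default_rng(0)
x = rng.random((16, 1))                # 16 IID uniform points (illustrative)
error = 1.0 / 3.0 - f(x).mean()
bound = bad_points(x) * f_norm
print(abs(error), bound)               # the first never exceeds the second
```

The same `bad_points` accepts arrays with $d > 1$ columns, so different point sets can be compared without reference to any particular integrand.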

Why Do Others Cite This Paper?

\[ \Biggl| \int_{[0,1]^d} f(t)\,\mathrm{d}t - \frac{1}{n}\sum_{i=1}^{n} f(x_i) \Biggr| \le \mathrm{BAD}(x_1, \dots, x_n)\,\mathrm{BAD}(f) \]

You can pick the reproducing kernel $K$ well and analyze
Convergence: how fast $\mathrm{BAD}(x_1, \dots, x_n) \to 0$ with $n$ for clever $x_1, x_2, \dots$
Tractability: how this convergence depends on $d$
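
To make the convergence question concrete, the sketch below compares two point sets in $d = 1$ as $n$ grows. As an assumption not taken from the slides, it uses the kernel $K(t, x) = 2 - \max(t, x)$; the function name `bad_points_1d` and the choice of midpoints versus IID uniform points are illustrative.

```python
import numpy as np

def bad_points_1d(x):
    """BAD(x_1, ..., x_n) for the assumed kernel K(t, x) = 2 - max(t, x) on [0, 1]."""
    x = np.asarray(x, dtype=float)
    double_integral = 4.0 / 3.0
    single_integrals = (3.0 - x**2) / 2.0                   # int_0^1 K(x_i, x) dx
    pairwise = 2.0 - np.maximum(x[:, None], x[None, :])     # K(x_i, x_j)
    bad_sq = double_integral - 2.0 * single_integrals.mean() + pairwise.mean()
    return np.sqrt(max(bad_sq, 0.0))

rng = np.random.default_rng(0)
sizes = [16, 64, 256]
bad_grid = [bad_points_1d((np.arange(n) + 0.5) / n) for n in sizes]   # midpoints
bad_iid = [bad_points_1d(rng.random(n)) for n in sizes]               # IID uniform

for n, g, r in zip(sizes, bad_grid, bad_iid):
    print(f"n = {n:4d}   midpoints: {g:.5f}   IID uniform: {r:.5f}")
```

Under this assumed kernel the midpoint values equal $1/(\sqrt{12}\, n)$, while IID points typically decay only like $n^{-1/2}$.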

References

1. H., F. J. A Generalized Discrepancy and Quadrature Error Bound. Math. Comp. 67, 299–322 (1998).
2. H., F. J. The Trio Identity for Quasi-Monte Carlo Error Analysis. In Monte Carlo and Quasi-Monte Carlo Methods: MCQMC, Stanford, USA, August 2016 (eds Glynn, P. & Owen, A.) (Springer-Verlag, Berlin, 2018), 3–27.
3. Aronszajn, N. Theory of Reproducing Kernels. Trans. Amer. Math. Soc. 68, 337–404 (1950).

How Can a Linear Function on a Hilbert Space be Unbounded?

Short answer: $V$ must be infinite dimensional.

Consider a vector space of real-valued sequences with the typical inner product:
\[ V := \{f = (f_1, f_2, \dots)^T : f_i \in \mathbb{R},\ \lVert f \rVert < \infty\}, \qquad \langle f, h \rangle := f_1 h_1 + f_2 h_2 + f_3 h_3 + \cdots \]
Define the linear real-valued function (for those $f$ where the series converges, e.g., $f$ with finitely many nonzero entries)
\[ \mathrm{LINEAR}(f) = f_1 + 2 f_2 + 3 f_3 + 4 f_4 + \cdots \]
Let $e_i := (0, \dots, 0, 1, 0, \dots)^T$ with the $1$ in the $i$th position. For any $\varepsilon, \delta > 0$, choose $i > \varepsilon/\delta$. Then $\lVert \delta e_i \rVert = \delta$, but $|\mathrm{LINEAR}(\delta e_i)| = i\delta > \varepsilon$, and
\[ \sup_{f \ne 0} \frac{|\mathrm{LINEAR}(f)|}{\lVert f \rVert} \ge \sup_{i = 1, 2, \dots} \frac{|\mathrm{LINEAR}(\delta e_i)|}{\lVert \delta e_i \rVert} = \sup_{i = 1, 2, \dots} \frac{i\delta}{\delta} = \sup_{i = 1, 2, \dots} i = \infty. \]
We cannot guarantee that $|\mathrm{LINEAR}(f)|$ is small, no matter how small we make $\lVert f \rVert$. This $\mathrm{LINEAR}$ is unbounded.
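
The growth of the ratio $|\mathrm{LINEAR}(f)| / \lVert f \rVert$ can be seen directly on truncated sequences. This is a small sketch assuming NumPy and sequences stored as finite arrays (everything past the array length is taken to be zero); the helper names are illustrative.

```python
import numpy as np

def linear(f):
    """LINEAR(f) = f_1 + 2 f_2 + 3 f_3 + ... for a finitely supported sequence f."""
    weights = np.arange(1, len(f) + 1)
    return float(weights @ f)

def scaled_basis(i, delta, length):
    """delta * e_i, stored as an array of the given length (i is 1-based)."""
    e = np.zeros(length)
    e[i - 1] = delta
    return e

delta = 1e-3                     # every test vector has norm delta
ratios = []
for i in (1, 10, 100, 1000):
    f = scaled_basis(i, delta, length=1000)
    ratios.append(abs(linear(f)) / np.linalg.norm(f))

print(ratios)                    # approximately 1, 10, 100, 1000: no finite bound
```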

Proof of the Cauchy-Schwarz Inequality

Theorem (Cauchy-Schwarz). Let $(V, \langle \cdot, \cdot \rangle)$ be a real inner product space. Then
\[ |\langle f, h \rangle| \le \lVert f \rVert\, \lVert h \rVert \quad \forall f, h \in V, \tag{C-S} \]
with equality iff $c_1 f + c_2 h = 0$ for some nonzero $(c_1, c_2)$.

Proof of Inequality. If $f$ or $h$ is zero, the inequality becomes an equality by direct calculation. For any nonzero $f, h \in V$, define the quadratic polynomial $p$ as follows:
\[ p(t) := \lVert tf + h \rVert^2 = \langle tf + h, tf + h \rangle = t^2 \langle f, f \rangle + 2t \langle f, h \rangle + \langle h, h \rangle = t^2 \lVert f \rVert^2 + 2t \langle f, h \rangle + \lVert h \rVert^2. \]
Since $p(t) \ge 0$ by definition, $p$ cannot have two distinct real roots, which means that its discriminant (divided by 4), $\langle f, h \rangle^2 - \lVert f \rVert^2 \lVert h \rVert^2$, must be non-positive. This implies the inequality for $|\langle f, h \rangle|$.

Proof of Equality. For nonzero $f$ and $h$, recall that $p(t) := \lVert tf + h \rVert^2 = t^2 \lVert f \rVert^2 + 2t \langle f, h \rangle + \lVert h \rVert^2$. Equality in (C-S) happens iff $|\langle f, h \rangle| = \lVert f \rVert\, \lVert h \rVert$, i.e., iff the discriminant vanishes and $p$ has a single root, $t_0$. Then $0 = p(t_0) = \lVert t_0 f + h \rVert^2$, which is true iff $t_0 f + h = 0$ by the definition of a norm.
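
The discriminant argument can be illustrated numerically. The sketch below assumes $V = \mathbb{R}^5$ with the ordinary dot product and randomly drawn vectors, none of which comes from the slides. It evaluates $\langle f, h \rangle^2 - \lVert f \rVert^2 \lVert h \rVert^2$ for a generic pair and for a linearly dependent pair, and recovers the root $t_0$ in the dependent case.

```python
import numpy as np

def discriminant(f, h):
    """<f, h>^2 - ||f||^2 ||h||^2, a quarter of the discriminant of
    p(t) = t^2 ||f||^2 + 2 t <f, h> + ||h||^2."""
    return (f @ h) ** 2 - (f @ f) * (h @ h)

rng = np.random.default_rng(1)
f = rng.standard_normal(5)
h = rng.standard_normal(5)

disc_generic = discriminant(f, h)           # negative: strict inequality
h_dependent = -2.5 * f                      # linearly dependent on f
disc_dependent = discriminant(f, h_dependent)

# In the dependent case p has the single root t0 = -<f, h> / ||f||^2.
t0 = -(f @ h_dependent) / (f @ f)
residual = np.linalg.norm(t0 * f + h_dependent)

print(disc_generic, disc_dependent, t0, residual)
```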
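
There Exists a Nonzero $g_\perp$ Orthogonal to All of $\ker(\mathrm{LINEAR})$

Lemma. If $\mathrm{LINEAR} : V \to \mathbb{R}$ is any bounded, linear functional on the Hilbert space $(V, \langle \cdot, \cdot \rangle)$, and
\[ \ker(\mathrm{LINEAR}) := \mathrm{LINEAR}^{-1}(\{0\}) = \{f \in V : \mathrm{LINEAR}(f) = 0\} \ne V, \]
then there exists a nonzero $g_\perp \in \{h \in V : \langle h, f \rangle = 0 \ \forall f \in \ker(\mathrm{LINEAR})\}$.

Proof. Pick any $h \notin \ker(\mathrm{LINEAR})$ and let $k$ be the element of $\ker(\mathrm{LINEAR})$ closest to $h$; such a $k$ exists because $\ker(\mathrm{LINEAR})$ is a closed subspace ($\mathrm{LINEAR}$ is continuous) of the complete space $V$. Let $g_\perp = h - k$. Since $h \notin \ker(\mathrm{LINEAR})$ and $k \in \ker(\mathrm{LINEAR})$, then $h \ne k$ and $\lVert g_\perp \rVert = \lVert h - k \rVert = \operatorname{dist}\bigl(h, \ker(\mathrm{LINEAR})\bigr) \ne 0$. Pick any $f \in \ker(\mathrm{LINEAR})$, and note that $k + tf \in \ker(\mathrm{LINEAR})$ for all $t \in \mathbb{R}$. Thus,
\begin{gather*}
\lVert g_\perp \rVert^2 \le \lVert h - (k + tf) \rVert^2 = \lVert g_\perp - tf \rVert^2 = \lVert g_\perp \rVert^2 - 2t \langle g_\perp, f \rangle + t^2 \lVert f \rVert^2, \\
0 \le t \bigl(-2 \langle g_\perp, f \rangle + t \lVert f \rVert^2\bigr) \quad \forall t \in \mathbb{R}.
\end{gather*}
The only way to ensure this inequality for all $t$ is for $\langle g_\perp, f \rangle = 0$.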

What Does a Reproducing Kernel Look Like for $V = \mathbb{R}^d$?

The functions have domain $\Omega := \{1, \dots, d\}$, and are represented as vectors or matrices. Pick a symmetric, positive definite (all eigenvalues are positive) matrix $A \in \mathbb{R}^{d \times d}$ to define the inner product
\[ \langle f, h \rangle := f^T A h, \quad \text{where } f = \bigl(f(i)\bigr)_{i=1}^{d}. \]
Then the reproducing kernel, $K$, is defined by
\[ \bigl(K(i, j)\bigr)_{i,j=1}^{d} = \mathsf{K} := A^{-1}. \]
Note that
$K(i, j) = K(j, i)$ because $A$ is symmetric and thus so is $\mathsf{K}$;
$K(\cdot, j) = j$th column of $\mathsf{K} =: \mathsf{K}_j \in \mathbb{R}^d = V$;
$\langle K(\cdot, j), f \rangle = \mathsf{K}_j^T A f = e_j^T f = f(j)$ since $\mathsf{K} := A^{-1}$, where $e_j := (0, \dots, 0, 1, 0, \dots)^T$ with the $1$ in the $j$th position.
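
The three kernel properties for $V = \mathbb{R}^d$ are easy to confirm numerically. This is a minimal sketch assuming NumPy and a randomly generated symmetric positive definite $A$; the specific matrix and vector are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 5
M = rng.standard_normal((d, d))
A = M @ M.T + d * np.eye(d)      # symmetric positive definite matrix defining <f, h> = f^T A h
K = np.linalg.inv(A)             # reproducing kernel as a matrix: K(i, j) = K[i, j]

f = rng.standard_normal(d)       # an arbitrary "function" on {1, ..., d}

# Reproduction: <K(., j), f> = K_j^T A f should return f(j) for every j.
reproduced = np.array([K[:, j] @ A @ f for j in range(d)])

print(np.allclose(K, K.T))       # symmetry
print(np.allclose(reproduced, f))  # reproduction
```

With $A$ equal to the identity, the kernel is the identity matrix and reproduction reduces to picking out the $j$th coordinate with $e_j$.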