
# Reproducing Kernel Tutorial

## Fred J. Hickernell

November 15, 2021

## Transcript

1. Reproducing Kernels
Fred J. Hickernell
Department of Applied Mathematics Center for Interdisciplinary Scientific Computation
Office of Research
Illinois Institute of Technology [email protected] mypages.iit.edu/~hickernell
Thanks to Mac Hyman for the invitation
Thanks to many students and collaborators
Slides available at speakerdeck.com/fjhickernell/reproducing-kernel-tutorial
November 15, 2021

2. Background Rep Ker & Riesz Rep Thm Kernel Ex Assoc Measures Error Bds References
What Can We Do with Reproducing Kernel Hilbert Spaces?
Use the Riesz Representation Theorem to derive error bounds for algorithms for linear problems, such as integration, function approximation, and solving linear differential equations, BUT
You must be able to solve the problem for your reproducing kernel
You must pick a kernel that matches your input function
You may need to tune the kernel parameters
Derive optimal algorithms, BUT it takes $O(n^3)$ operations to compute the weights
Determine how fast the error bounds decay to zero as the computational effort increases, and even whether convergence depends significantly on the number of variables
Include trends, BUT I have not prepared that for today
Derive a parallel analysis using Gaussian processes, where the reproducing kernel is interpreted as a covariance kernel, BUT I have not prepared that for today

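6. Reproducing Kernels for Functions on $\{1, \ldots, d\}$, aka Vectors
Let $\mathcal{F}$ := all functions on $\{1, \ldots, d\}$ “=” $\mathbb{R}^d$
Pick a symmetric, positive definite (positive eigenvalues) matrix $\mathsf{W} \in \mathbb{R}^{d \times d}$ to define an inner product
$$\langle f, h \rangle := \boldsymbol{f}^T \mathsf{W} \boldsymbol{h}, \quad \forall f, h \in \mathcal{F}, \quad \text{where } \boldsymbol{f} = \bigl(f(t)\bigr)_{t=1}^d$$
Reproducing kernel, $K$, is defined by $\bigl(K(t, x)\bigr)_{t,x=1}^d = \mathsf{K} := \mathsf{W}^{-1}$, and has the properties
Symmetry: $K(t, x) = K(x, t)$ because $\mathsf{W}$ is symmetric and thus so is $\mathsf{K}$
Positive Definiteness: $\bigl(K(x_i, x_j)\bigr)_{i,j=1}^n$ is positive definite for any distinct $x_1, \ldots, x_n \in \{1, \ldots, d\}$
Belonging: $K(\cdot, x)$ = $x$th column of $\mathsf{K}$ =: $\boldsymbol{K}_x \in \mathcal{F}$
Reproduction: $\langle K(\cdot, x), f \rangle = \boldsymbol{K}_x^T \mathsf{W} \boldsymbol{f} = \boldsymbol{e}_x^T \boldsymbol{f} = f(x)$ since $\mathsf{K} := \mathsf{W}^{-1}$; here $\boldsymbol{e}_x := (0, \ldots, 0, 1, 0, \ldots)^T$ with the 1 in the $x$th position
Riesz Representation Theorem says that for any linear functional, LINEAR, there is a representer $g$ such that $\text{LINEAR}(f) = \langle g, f \rangle = \boldsymbol{g}^T \mathsf{W} \boldsymbol{f}$. Note
$$\begin{pmatrix} g(1) \\ \vdots \\ g(d) \end{pmatrix} = \boldsymbol{g} = \mathsf{K} \mathsf{W} \boldsymbol{g} = \begin{pmatrix} \boldsymbol{K}_1^T \mathsf{W} \boldsymbol{g} \\ \vdots \\ \boldsymbol{K}_d^T \mathsf{W} \boldsymbol{g} \end{pmatrix} = \begin{pmatrix} \langle K(\cdot, 1), g \rangle \\ \vdots \\ \langle K(\cdot, d), g \rangle \end{pmatrix} = \begin{pmatrix} \text{LINEAR}(K(\cdot, 1)) \\ \vdots \\ \text{LINEAR}(K(\cdot, d)) \end{pmatrix}$$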
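The finite-dimensional picture above can be checked numerically. The following is a minimal sketch, assuming a randomly generated symmetric positive definite `W` and a hypothetical functional `linear` (the sum of the entries); neither comes from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5

# Hypothetical example: any symmetric positive definite W would do.
B = rng.standard_normal((d, d))
W = B @ B.T + d * np.eye(d)
K = np.linalg.inv(W)            # reproducing kernel matrix K = W^{-1}

def inner(f, h):
    """Inner product <f, h> = f^T W h."""
    return f @ W @ h

f = rng.standard_normal(d)

# Reproduction: <K(., x), f> = f(x) for every x (x is a 0-based index here).
reproduced = np.array([inner(K[:, x], f) for x in range(d)])

def linear(f):
    """A hypothetical linear functional: the sum of the entries of f."""
    return f.sum()

# Riesz representer: g(x) = LINEAR(K(., x)).
g = np.array([linear(K[:, x]) for x in range(d)])

print(np.allclose(reproduced, f))
print(np.isclose(inner(g, f), linear(f)))
```

Both checks use only `K` and `W`; replacing `linear` by any other linear map on vectors should leave the second identity intact.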

7. Reproducing Kernels for Functions on $\{1, \ldots, d\}$, aka Vectors
Let $\mathcal{F}$ be a vector space of functions
Define an inner product $\langle \cdot, \cdot \rangle$; the matrix $\mathsf{W}$ is gone
Reproducing kernel, $K$, has the properties
Symmetry: $K(t, x) = K(x, t)$
Positive Definiteness: $\bigl(K(x_i, x_j)\bigr)_{i,j=1}^n$ is positive definite for any distinct $x_1, \ldots, x_n \in \{1, \ldots, d\}$
Belonging: $K(\cdot, x) \in \mathcal{F}$
Reproduction: $\langle K(\cdot, x), f \rangle = f(x)$
Riesz Representation Theorem says that $\text{LINEAR}(f) = \langle g, f \rangle$, where
$$\begin{pmatrix} g(1) \\ \vdots \\ g(d) \end{pmatrix} = \begin{pmatrix} \text{LINEAR}(K(\cdot, 1)) \\ \vdots \\ \text{LINEAR}(K(\cdot, d)) \end{pmatrix}$$

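11. Reproducing Kernels for Functions on General Domains
Suppose that $(\mathcal{F}, \langle \cdot, \cdot \rangle)$ is a Hilbert space of functions on $\Omega$ for which function evaluation is bounded. Then there exists a unique reproducing kernel $K : \Omega \times \Omega \to \mathbb{R}$ for which
$$\underbrace{K(t, x) = K(x, t)}_{\text{symmetry}}, \qquad \underbrace{K(\cdot, x) \in \mathcal{F}}_{\text{belonging}}, \qquad \underbrace{f(x) = \langle K(\cdot, x), f \rangle}_{\text{reproduction}} \qquad \forall t, x \in \Omega,\ f \in \mathcal{F}$$
$K(X, X) = \bigl(K(x_i, x_j)\bigr)_{i,j=1}^n$ is positive definite for any $n \times d$ array $X$ with distinct rows lying in $\Omega$
$\mathcal{F}$ is the completion of $\{c_1 K(\cdot, x_1) + \cdots + c_n K(\cdot, x_n) : n \in \mathbb{N},\ c \in \mathbb{R}^n\}$; conversely, any $K$ satisfying the above determines $\mathcal{F}$
Riesz Representation Theorem says that for any bounded $\text{LINEAR} : \mathcal{F} \to \mathbb{R}$ there exists a representer $g \in \mathcal{F}$ such that $\text{LINEAR}(f) = \langle g, f \rangle$ for all $f \in \mathcal{F}$. What is $g$?
$$g(x) \underset{\text{reproduction}}{=} \langle K(\cdot, x), g \rangle \underset{\text{symmetry}}{=} \langle g, K(\cdot, x) \rangle \underset{\text{representer}}{=} \text{LINEAR}\bigl(K(\cdot, x)\bigr) \qquad \forall x \in \Omega$$
$$\|g\|^2 = \langle g, g \rangle \underset{\text{representer}}{=} \text{LINEAR}(g) = \text{LINEAR}^{\cdot\cdot}\bigl(\text{LINEAR}^{\cdot}(K(\cdot, \cdot\cdot))\bigr)$$
Do not need the definition of $\langle \cdot, \cdot \rangle$ to compute $g$ and $\|g\|$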
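As a small worked instance of the last two formulas, take a hypothetical functional that is a weighted sum of two function values. The weights $c_1, c_2$ and points $z_1, z_2 \in \Omega$ are introduced here only for illustration; the functional is bounded because function evaluation is bounded.

```latex
\begin{aligned}
\text{LINEAR}(f) &= c_1 f(z_1) + c_2 f(z_2), \\
g(x) &= \text{LINEAR}\bigl(K(\cdot, x)\bigr) = c_1 K(z_1, x) + c_2 K(z_2, x), \\
\|g\|^2 &= \text{LINEAR}^{\cdot\cdot}\bigl(\text{LINEAR}^{\cdot}(K(\cdot, \cdot\cdot))\bigr)
         = c_1^2 K(z_1, z_1) + 2 c_1 c_2 K(z_1, z_2) + c_2^2 K(z_2, z_2).
\end{aligned}
```

Only kernel values appear on the right-hand sides, which is the point of the closing remark on this slide.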

12. Squared Exponential Kernel on $\mathbb{R}$
The squared exponential (aka Gaussian) kernel for univariate functions takes the form
$$K(t, x) = A \exp\bigl(-\gamma^2 |t - x|^2\bigr), \qquad t, x \in \mathbb{R}$$
It corresponds to the Hilbert space of functions with norm [2, (6.18)]
$$\|f\|^2 = \frac{A}{\sqrt{\pi}} \sum_{m=0}^{\infty} \int_{\mathbb{R}} \frac{\bigl[f^{(m)}(x)\bigr]^2}{m!\, 4^m \gamma^{2m+1}} \, dx$$
which means that functions in this space have all derivatives square integrable.

13. Squared Exponential Kernel on $\mathbb{R}^d$
The squared exponential kernel for $d$-variate functions takes the form
$$K(t, x) = A \exp\bigl(-\gamma_1^2 |t_1 - x_1|^2 - \cdots - \gamma_d^2 |t_d - x_d|^2\bigr), \qquad t, x \in \mathbb{R}^d$$
It corresponds to the Hilbert space of functions with norm
$$\|D^m f\|_2^2 := \int_{\mathbb{R}^d} \left| \frac{\partial^{\|m\|_1} f(x)}{\partial x_1^{m_1} \cdots \partial x_d^{m_d}} \right|^2 dx$$
$$\|f\|^2 = \frac{A}{\sqrt{\pi}} \sum_{m \in \mathbb{N}_0^d} \frac{\|D^m f\|_2^2}{\|m\|_1!\, 4^{\|m\|_1} \prod_{k=1}^d \gamma_k^{2 m_k}}$$
which means that functions in this space have all derivatives square integrable. This kernel is stationary. It is isotropic if $\gamma_1 = \cdots = \gamma_d$.
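A minimal NumPy sketch of this kernel follows, assuming data sites stored as rows of an array. The function name `sq_exp_kernel`, the sample sites, and the shape parameters are illustrative choices, not part of the slides.

```python
import numpy as np

def sq_exp_kernel(T, X, gamma, A=1.0):
    """K(t, x) = A exp(-sum_k gamma_k^2 (t_k - x_k)^2) for rows t of T and x of X."""
    T = np.atleast_2d(T)
    X = np.atleast_2d(X)
    gamma = np.broadcast_to(np.asarray(gamma, dtype=float), (T.shape[1],))
    diff = (T[:, None, :] - X[None, :, :]) * gamma     # shape (n_T, n_X, d)
    return A * np.exp(-np.sum(diff**2, axis=2))

rng = np.random.default_rng(1)
X = rng.random((6, 3))                  # six hypothetical data sites in [0, 1]^3
gamma = [2.0, 3.0, 1.5]                 # unequal gamma_k, so not isotropic
K_XX = sq_exp_kernel(X, X, gamma)
eigenvalues = np.linalg.eigvalsh(K_XX)  # all positive for distinct sites

# Stationarity: shifting both arguments by the same vector leaves K unchanged.
shift = np.array([0.3, -0.2, 0.1])
K_shifted = sq_exp_kernel(X + shift, X + shift, gamma)
```

Passing a scalar `gamma` gives the isotropic case mentioned on the slide.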

14. Matérn Kernels
A popular family of kernels with a range of smoothness depending on $r$, with an associated norm that is not simple to write down:
$$K_r(t, x) = A \, \|t - x\|_2^r \, \mathrm{ModBesselSec}_r\bigl(\gamma \|t - x\|_2\bigr)$$
where $\mathrm{ModBesselSec}_r$ is the modified Bessel function of the second kind of order $r$. Two special cases:
$K_{1/2}(t, x) = A_{1/2} \exp(-\gamma \|t - x\|_2)$, not very smooth
$K_{3/2}(t, x) = A_{3/2} (1 + \gamma \|t - x\|_2) \exp(-\gamma \|t - x\|_2)$, somewhat smoother
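The two closed forms can be compared directly. The sketch below assumes the constants $A_{1/2} = A_{3/2} = 1$ and implements only these two cases; `matern_kernel` is a hypothetical helper name.

```python
import numpy as np

def matern_kernel(T, X, gamma=1.0, r=0.5, A=1.0):
    """Matérn kernel for r = 1/2 or r = 3/2, using the closed forms above."""
    T = np.atleast_2d(T)
    X = np.atleast_2d(X)
    dist = np.linalg.norm(T[:, None, :] - X[None, :, :], axis=2)
    if r == 0.5:
        return A * np.exp(-gamma * dist)
    if r == 1.5:
        return A * (1.0 + gamma * dist) * np.exp(-gamma * dist)
    raise ValueError("only r = 1/2 and r = 3/2 are implemented in this sketch")

# How fast does K fall away from its peak at t = x?
origin = np.zeros((1, 2))
nearby = np.array([[1e-3, 0.0]])
drop_half = 1.0 - matern_kernel(origin, nearby, r=0.5)[0, 0]
drop_three_halves = 1.0 - matern_kernel(origin, nearby, r=1.5)[0, 0]
print(drop_half, drop_three_halves)
```

The drop is linear in the distance for $r = 1/2$ (a kink at $t = x$) and quadratic for $r = 3/2$, which is one way to see the difference in smoothness.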

15. The Centered Discrepancy Kernel
A reproducing kernel used to analyze cubatures, which gives the weighted centered discrepancy, takes the form
$$K(t, x) := \prod_{k=1}^d \left[ 1 + \frac{\gamma_k}{2} \bigl( |t_k - 1/2| + |x_k - 1/2| - |t_k - x_k| \bigr) \right], \qquad t, x \in [0, 1]^d$$
which corresponds to the Hilbert space for functions defined on $[0, 1]^d$ with the following norm:
$$\|D^m f\|_2^2 := \int_{[0,1]^{\|m\|_1}} \left| \frac{\partial^{\|m\|_1} f(x)}{\partial x_1^{m_1} \cdots \partial x_d^{m_d}} \right|^2_{x_j = 1/2 \text{ for } m_j = 0} \ \prod_{j \text{ s.t. } m_j > 0} dx_j$$
$$\|f\|^2 := A \sum_{\|m\|_\infty \le 1} \frac{\|D^m f\|_2^2}{\gamma_k}$$
Mixed partial derivatives of up to order one in each coordinate must be square integrable.

16. The Delta Kernel
A reproducing kernel whose Hilbert space has an uncountable basis is
$$K(t, x) := \begin{cases} 1 + \gamma, & t = x, \\ 1, & \text{otherwise}, \end{cases} \qquad t, x \in [0, 1]^d$$
which corresponds to the Hilbert space of functions that are constant everywhere except possibly at a countable number of points.
$$I(f) = \int_{[0,1]^d} f(x) \, dx$$
$$\|f\|^2 := |I(f)|^2 + \sum_{x \in [0,1]^d} \frac{|f(x) - I(f)|^2}{\gamma}$$
This Hilbert space has an uncountable basis.

17. Hilbert Spaces of Signed Measures
Let $\mathcal{M}$ be the Hilbert space of measures on $\Omega$ that is the completion of $\{c_1 \delta_{x_1} + \cdots + c_n \delta_{x_n} : n \in \mathbb{N},\ c \in \mathbb{R}^n\}$ under the norm induced by
$$\langle \mu, \nu \rangle_{\mathcal{M}} := \int_{\Omega \times \Omega} K(t, x) \, (\mu \times \nu)(dt \times dx)$$
and $\delta_x$ is the Dirac measure, i.e., $\int_\Omega f(t) \, \delta_x(dt) = f(x)$. There exists a one-to-one and onto, isometric (I think) mapping $T : \mathcal{M} \to \mathcal{F}$ defined as
$$T(\mu)(x) := \int_\Omega K(t, x) \, \mu(dt) \qquad \forall x \in \Omega,\ \mu \in \mathcal{M}$$
such that $\langle T(\nu), f \rangle = \int_\Omega f(x) \, \nu(dx)$.
If $T(\nu_x)$ is the representer for the solution of a differential equation at $x$, is $\nu_x$ the Green’s function?
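For finite combinations of Dirac measures the isometry can be checked by hand. In the sketch below, the coefficients $b_j$ and points $z_j$ are names introduced only for this illustration, and the second line uses the reproduction and symmetry properties of $K$.

```latex
\begin{aligned}
\mu = \sum_{i=1}^{n} c_i \delta_{x_i}, \quad \nu = \sum_{j=1}^{p} b_j \delta_{z_j}
  \quad &\Longrightarrow \quad
  \langle \mu, \nu \rangle_{\mathcal{M}} = \sum_{i=1}^{n} \sum_{j=1}^{p} c_i b_j K(x_i, z_j), \\
T(\mu) = \sum_{i=1}^{n} c_i K(x_i, \cdot), \quad T(\nu) = \sum_{j=1}^{p} b_j K(z_j, \cdot)
  \quad &\Longrightarrow \quad
  \langle T(\mu), T(\nu) \rangle
  = \sum_{i=1}^{n} \sum_{j=1}^{p} c_i b_j \langle K(x_i, \cdot), K(z_j, \cdot) \rangle
  = \langle \mu, \nu \rangle_{\mathcal{M}}.
\end{aligned}
```

So $T$ preserves inner products at least on this dense subset of $\mathcal{M}$; whether that extends to the completion as claimed is the part the slide flags as unverified.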

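18. Can We Find Something Like the $\mathsf{W}$ for Functions on $\{1, \ldots, d\}$?
Let $\mathcal{M}$ be the Hilbert space of measures on $\Omega$ that is the completion of $\{c_1 \delta_{x_1} + \cdots + c_n \delta_{x_n} : n \in \mathbb{N},\ c \in \mathbb{R}^n\}$ under the norm induced by
$$\langle \mu, \nu \rangle_{\mathcal{M}} := \int_{\Omega \times \Omega} K(t, x) \, (\mu \times \nu)(dt \times dx)$$
and $\delta_x$ is the Dirac measure, i.e., $\int_\Omega f(t) \, \delta_x(dt) = f(x)$.
Is there a measure $\omega$ on $\Omega \times \Omega$ such that
$$\int_{\Omega \times \Omega} K(t, s) K(u, x) \, \omega(ds \times du) = K(t, x) \qquad \forall t, x \in \Omega?$$
This would be like the $\mathsf{W}$ for functions on $\{1, \ldots, d\}$, where $\mathsf{K} \mathsf{W} \mathsf{K} = \mathsf{K}$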

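19. Separable Hilbert Spaces, i.e., Those with Countable Bases
Hilbert spaces of functions on $\Omega$ with countable bases can be written in terms of an $L^2(\Omega)$ basis
$$f(x) = \sum_k \hat{f}(k) \varphi_k(x), \qquad \int_\Omega \varphi_k(x) \varphi_l(x) \, dx = \delta_{k,l}$$
and the reproducing kernel is
$$K(t, x) = \sum_k \lambda_k \varphi_k(t) \varphi_k(x), \qquad \text{note that } \int_\Omega K(x, x) \, dx = \sum_k \lambda_k < \infty, \text{ so } \lambda_k \to 0$$
$$\langle f, g \rangle := \sum_k \frac{\hat{f}(k) \hat{g}(k)}{\lambda_k}, \quad \text{since this implies } \langle K(\cdot, x), f \rangle = \sum_k \frac{\lambda_k \varphi_k(x) \hat{f}(k)}{\lambda_k} = \sum_k \varphi_k(x) \hat{f}(k) = f(x)$$
If we formally define the distribution $m(x) = \sum_k \hat{f}(k) \varphi_k(x) / \lambda_k$, then
$$\int_\Omega K(t, x) m(t) \, dt = \int_\Omega \sum_{k,l} \lambda_k \varphi_k(t) \varphi_k(x) \frac{\hat{f}(l) \varphi_l(t)}{\lambda_l} \, dt = \sum_k \varphi_k(x) \hat{f}(k) = f(x)$$
This $m(x) \, dx$ seems to be the $\mu(dx)$ that gets mapped into $f$
It would seem that $W(t, x) = \sum_k \varphi_k(t) \varphi_k(x) / \lambda_k$, which is not a convergent series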

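20. Error for Approximating Linear Functionals
Suppose that
Linear $\text{SOL} : \mathcal{F} \to \mathbb{R}$ is the desired solution (integral, derivative at a point, etc.)
$X = (x_1, \ldots, x_n)^T$ is the array of data sites
$\text{APP}_{X,\alpha}(f) = \alpha_1 f(x_1) + \cdots + \alpha_n f(x_n) = \alpha^T f(X)$ is the approximation
Then the approximation error has a tight upper bound of
$$\underbrace{\text{SOL}(f) - \text{APP}_{X,\alpha}(f)}_{\text{linear, bounded}} = \langle g, f \rangle \le \|g\| \, \|f\|, \qquad \text{where } g(x) = \bigl(\text{SOL} - \text{APP}_{X,\alpha}\bigr)\bigl(K(\cdot, x)\bigr)$$
$$\begin{aligned} \|g\|^2 &= \bigl(\text{SOL} - \text{APP}_{X,\alpha}\bigr)^{\cdot\cdot} \Bigl( \bigl(\text{SOL} - \text{APP}_{X,\alpha}\bigr)^{\cdot} \bigl(K(\cdot, \cdot\cdot)\bigr) \Bigr) \\ &= \text{SOL}^{\cdot\cdot}\bigl(\text{SOL}^{\cdot}(K(\cdot, \cdot\cdot))\bigr) - 2 \alpha^T \text{SOL}\bigl(K(X, \cdot)\bigr) + \alpha^T K(X, X) \alpha \end{aligned}$$
The badness of $\text{APP}_X$, $\|g\|$, does not require $\langle \cdot, \cdot \rangle$, but only $K$. Here $K(X, \cdot) = \bigl(K(x_i, \cdot)\bigr)_{i=1}^n$ and $K(X, X) = \bigl(K(x_i, x_j)\bigr)_{i,j=1}^n$
Optimal weights are $\alpha = K(X, X)^{-1} \, \text{SOL}\bigl(K(X, \cdot)\bigr)$; finding optimal data sites, $X$, is a hard nonlinear optimization problem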
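The weight formula can be tried on a one-dimensional integration problem. This is a minimal sketch, assuming SOL is the integral over $[0, 1]$ and the kernel is the centered discrepancy kernel with $d = 1$ and $\gamma = 1$, for which the integrals have closed forms; the data sites and helper names are hypothetical.

```python
import numpy as np

def k1(t, x):
    """Centered discrepancy kernel on [0, 1] with gamma = 1 (d = 1)."""
    return 1.0 + 0.5 * (np.abs(t - 0.5) + np.abs(x - 0.5) - np.abs(t - x))

def k1_integral(x):
    """SOL applied to K(., x): the integral of k1(t, x) over t in [0, 1]."""
    return 1.0 + 0.5 * (np.abs(x - 0.5) - (x - 0.5) ** 2)

DOUBLE_INTEGRAL = 13.0 / 12.0      # SOL applied to both arguments of K

def squared_error_norm(X, alpha):
    """||g||^2 = SOL SOL K - 2 alpha^T SOL K(X, .) + alpha^T K(X, X) alpha."""
    K_XX = k1(X[:, None], X[None, :])
    return DOUBLE_INTEGRAL - 2.0 * alpha @ k1_integral(X) + alpha @ K_XX @ alpha

X = np.array([0.1, 0.35, 0.6, 0.9])              # hypothetical data sites
K_XX = k1(X[:, None], X[None, :])
alpha_opt = np.linalg.solve(K_XX, k1_integral(X))  # K(X, X)^{-1} SOL K(X, .)
alpha_equal = np.full(X.size, 1.0 / X.size)

err_opt = squared_error_norm(X, alpha_opt)
err_equal = squared_error_norm(X, alpha_equal)
print(err_opt, err_equal)
```

Solving the linear system instead of forming the inverse is a design choice for numerical stability; for $n$ sites it is the $O(n^3)$ step mentioned earlier in the talk.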

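21. An Example of the Cubature Error Bound
For the problem and non-optimal approximation
$$\text{SOL}(f) = \int_{[0,1]^d} f(x) \, dx, \qquad \text{APP}_X(f) = \frac{1}{n} \sum_{i=1}^n f(x_i),$$
and the reproducing kernel
$$K(t, x) := \prod_{k=1}^d \left[ 1 + \frac{\gamma_k}{2} \bigl( |t_k - 1/2| + |x_k - 1/2| - |t_k - x_k| \bigr) \right], \qquad t, x \in [0, 1]^d$$
the error bound is $|\text{SOL}(f) - \text{APP}_X(f)| \le \text{BAD}(x_1, \ldots, x_n) \, \text{BAD}(f)$, where
$$\begin{aligned} \text{BAD}^2(x_1, \ldots, x_n) &= \|\text{representer of the error}\|^2 \\ &= \prod_{k=1}^d \left( 1 + \frac{\gamma_k}{12} \right) - \frac{2}{n} \sum_{i=1}^n \prod_{k=1}^d \left[ 1 + \frac{\gamma_k}{2} \bigl( |x_{ik} - 1/2| - |x_{ik} - 1/2|^2 \bigr) \right] \\ &\quad + \frac{1}{n^2} \sum_{i,j=1}^n \prod_{k=1}^d \left[ 1 + \frac{\gamma_k}{2} \bigl( |x_{ik} - 1/2| + |x_{jk} - 1/2| - |x_{ik} - x_{jk}| \bigr) \right] \end{aligned}$$
The first term equals $(13/12)^d$ when $\gamma_1 = \cdots = \gamma_d = 1$
Requires $O(dn^2)$ operations to compute
$\text{BAD}(f) = \|f - f(1/2, \ldots, 1/2)\|$, which is impractical to compute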
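The squared norm of the error representer for this kernel and the equal-weight rule can be evaluated in $O(dn^2)$ operations. The sketch below uses the constant term $\prod_k (1 + \gamma_k/12)$, the double integral of $K$, which reduces to $(13/12)^d$ when every $\gamma_k = 1$; `squared_discrepancy` is a hypothetical name.

```python
import numpy as np

def squared_discrepancy(X, gamma):
    """Squared norm of the error representer for equal-weight cubature on [0, 1]^d."""
    X = np.atleast_2d(X)
    n, d = X.shape
    gamma = np.broadcast_to(np.asarray(gamma, dtype=float), (d,))
    c = np.abs(X - 0.5)                                    # |x_ik - 1/2|
    term1 = np.prod(1.0 + gamma / 12.0)                    # double integral of K
    term2 = np.prod(1.0 + 0.5 * gamma * (c - c**2), axis=1).sum()
    pair = c[:, None, :] + c[None, :, :] - np.abs(X[:, None, :] - X[None, :, :])
    term3 = np.prod(1.0 + 0.5 * gamma * pair, axis=2).sum()  # sum of K(x_i, x_j)
    return term1 - 2.0 * term2 / n + term3 / n**2

# Evenly spread sites versus clustered sites in one dimension.
midpoints = ((2 * np.arange(1, 5) - 1) / 8.0)[:, None]
clustered = np.array([[0.10], [0.11], [0.12], [0.13]])
print(squared_discrepancy(midpoints, 1.0), squared_discrepancy(clustered, 1.0))
```

The pairwise array `pair` has shape `(n, n, d)`, so memory also grows like $dn^2$; a loop over `i` would trade speed for memory.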

22. Optimal Function Approximation
Consider function evaluation at $x$, i.e., $\text{SOL}_x(f) = f(x)$. In this case,
$$\text{SOL}_x^{\cdot\cdot}\bigl(\text{SOL}_x^{\cdot}(K(\cdot, \cdot\cdot))\bigr) = K(x, x), \qquad \text{SOL}_x\bigl(K(X, \cdot)\bigr) = K(X, x)$$
and the optimal algorithm is
$$\text{APP}_{x,X,\text{opt}}(f) = K(x, X) K(X, X)^{-1} f(X)$$
$$\bigl| f(x) - K(x, X) K(X, X)^{-1} f(X) \bigr|^2 \le \bigl[ K(x, x) - K(x, X) K(X, X)^{-1} K(X, x) \bigr] \, \|f\|^2$$
and, more sharply,
$$\bigl| f(x) - K(x, X) K(X, X)^{-1} f(X) \bigr|^2 \le \underbrace{\bigl[ K(x, x) - K(x, X) K(X, X)^{-1} K(X, x) \bigr]}_{\text{only depends on } X} \, \bigl\| f - \underbrace{K(\cdot, X) K(X, X)^{-1} f(X)}_{\text{best approximation to } f} \bigr\|^2$$
$\text{APP}_{\cdot,X,\text{opt}}(f)$ is in the Hilbert space, and even in the span of $K(\cdot, x_1), \ldots, K(\cdot, x_n)$, so
$$\underbrace{K(\cdot, X) K(X, X)^{-1} f(X)}_{\text{best approximation}} \perp \underbrace{f - K(\cdot, X) K(X, X)^{-1} f(X)}_{\text{error of approximation}}$$
The optimal linear approximation for an arbitrary linear functional is just the linear functional applied to the optimal function approximation
$K(X, X)$ can be ill-conditioned for smooth kernels and lots of data
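A short sketch of the optimal approximation and of the factor that depends only on $X$ follows. It assumes the univariate squared exponential kernel with $A = 1$; the shape parameter, test function, and data sites are hypothetical.

```python
import numpy as np

def kernel(t, x, gamma=3.0):
    """Squared exponential kernel on R with A = 1 (gamma is a hypothetical choice)."""
    return np.exp(-(gamma * (t[:, None] - x[None, :])) ** 2)

def f(x):
    return np.sin(2.0 * np.pi * x)      # hypothetical test function

X = np.linspace(0.0, 1.0, 7)            # data sites
x_new = np.linspace(0.0, 1.0, 101)      # evaluation points

K_XX = kernel(X, X)
K_xX = kernel(x_new, X)

# APP_{x,X,opt}(f) = K(x, X) K(X, X)^{-1} f(X)
approx = K_xX @ np.linalg.solve(K_XX, f(X))

# K(x, x) - K(x, X) K(X, X)^{-1} K(X, x); K(x, x) = 1 for this kernel
power = 1.0 - np.einsum("ij,ji->i", K_xX, np.linalg.solve(K_XX, K_xX.T))

print(np.max(np.abs(approx - f(x_new))), power.max())
```

The factor `power` vanishes at the data sites and can be computed before any function values are observed, which is why it is useful for comparing designs $X$.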

23. Cardinal Functions
To visualize the effect of an additional data point on the function approximation, we plot
$\text{APP}_{\cdot,X,\text{opt}}(f) = K(\cdot, X) K(X, X)^{-1} f(X)$ for $f(X) = e_i$ (all data are zero but one)
The cardinal functions of the smoother kernel are more oscillatory

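24. Why Is the Optimal Approximation Linear?
Fix $x \in \Omega$. Let
$$\mathcal{B}_{X,f(X),R} = \bigl\{ g \in \mathcal{F} : \|g\|^2 \le R^2 + \|\text{APP}_{\cdot,X,\text{opt}}(f)\|^2,\ g(X) = f(X) \bigr\} \qquad \text{functions that look like } f$$
$$\begin{aligned} \mathcal{B}_{X,\perp,R} &= \{ h \in \mathcal{F} : \|h\| \le R,\ h(X) = 0 \} \qquad \text{functions that vanish at the data sites} \\ &= \{ h \in \mathcal{F} : \|h\| \le R,\ \langle h, K(\cdot, x_1) \rangle = \cdots = \langle h, K(\cdot, x_n) \rangle = 0 \} \end{aligned}$$
Any $g \in \mathcal{B}_{X,f(X),R}$ may be written as $g = \text{APP}_{\cdot,X,\text{opt}}(f) + g_\perp$ with $g_\perp \in \mathcal{B}_{X,\perp,R}$
$$y_{\text{opt}} := \operatorname*{argmin}_{y \in \mathbb{R}} \text{ERR}(y)$$
$$\text{ERR}(y) := \max_{g \in \mathcal{B}_{X,f(X),R}} |g(x) - y| = \max_{g_\perp \in \mathcal{B}_{X,\perp,R}} \bigl| \text{APP}_{x,X,\text{opt}}(f) + g_\perp(x) - y \bigr|$$
Since for every $g_\perp \in \mathcal{B}_{X,\perp,R}$ it is also true that $-g_\perp \in \mathcal{B}_{X,\perp,R}$, the optimal choice of $y$ is $\text{APP}_{x,X,\text{opt}}(f)$.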

25. Tuning the Kernel Parameters
Virtually all reproducing kernels have parameters, $\theta$, that govern smoothness and shape. To ensure that your function is typical for the reproducing kernel Hilbert space, $\mathcal{F}$, one should likely tune these parameters from the function data. Here is a proposal:
$$\theta_{\text{opt}} = \operatorname*{argmin}_{\theta} \left[ \log \Bigl( \underbrace{f(X)^T K_\theta(X, X)^{-1} f(X)}_{\text{squared norm of the minimum norm interpolant}} \Bigr) + \frac{1}{n} \log\bigl( \det(K_\theta(X, X)) \bigr) \right]$$
This corresponds to choosing $\theta$ to minimize the volume of the ellipsoidal solid in $\mathbb{R}^n$ consisting of all possible function data whose minimum-norm interpolants have an $\mathcal{F}_\theta$-norm no greater than that of the observed interpolant.
It also corresponds to using empirical Bayes when working in the Gaussian process setting with covariance kernels, $K_\theta$
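The proposal can be tried with a simple grid search. This is a rough sketch, assuming the univariate squared exponential kernel with $\theta = \gamma$, a hypothetical data set, and a small diagonal jitter added for numerical stability (the jitter is an assumption of this sketch, not part of the proposal).

```python
import numpy as np

def objective(gamma, X, y):
    """log(y^T K^{-1} y) + (1/n) log det K for the squared exponential kernel, A = 1."""
    n = X.size
    K = np.exp(-(gamma * (X[:, None] - X[None, :])) ** 2)
    K = K + 1e-8 * np.eye(n)             # jitter: an assumption, for stability only
    sign, logdet = np.linalg.slogdet(K)
    return np.log(y @ np.linalg.solve(K, y)) + logdet / n

X = np.linspace(0.0, 1.0, 8)             # hypothetical data sites
y = np.sin(2.0 * np.pi * X)              # hypothetical function data f(X)
gammas = np.logspace(0.0, 1.5, 31)       # candidate shape parameters
values = np.array([objective(g, X, y) for g in gammas])
gamma_opt = gammas[np.argmin(values)]
print(gamma_opt)
```

Without the jitter the objective is unchanged when $K_\theta$ is multiplied by a constant, so the scale $A$ can be fixed at 1 during the search. A gradient-based optimizer could replace the grid.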

26. Thank you
These slides are available at
speakerdeck.com/fjhickernell/reproducing-kernel-tutorial

27. References
1. Aronszajn, N. Theory of Reproducing Kernels. Trans. Amer. Math. Soc. 68, 337–404 (1950).
2. Rasmussen, C. E. & Williams, C. Gaussian Processes for Machine Learning (MIT Press, Cambridge, Massachusetts, 2006). Online version at http://www.gaussianprocess.org/gpml/.
3. Hickernell, F. J. A Generalized Discrepancy and Quadrature Error Bound. Math. Comp. 67, 299–322 (1998).
4. Hickernell, F. J. Goodness-of-Fit Statistics, Discrepancies and Robust Designs. Statist. Probab. Lett. 44, 73–78 (1999).