
Geometry of Pre-contrast Functions (プレ・コントラスト関数の幾何学)

DeepFlow, Inc.
April 28, 2022


A contrast function is an asymmetric squared-distance-like function on a manifold; on statistical models this notion includes divergences (also called relative entropies). A contrast function induces a (pseudo-)Riemannian metric and a pair of mutually dual torsion-free affine connections. Pre-contrast functions were introduced with the aim of constructing geometric theories for quantum information geometry and for the statistics of non-conservative systems; intuitively, a pre-contrast function is a direction-dependent asymmetric distance-like function. In this talk, after reviewing contrast functions and the theory of affine immersions, which is useful for constructing them, we explain the geometry of pre-contrast functions and the theory of affine distributions.

https://connpass.com/event/241136/

Transcript

  1. Geometry of Pre-contrast Functions Hiroshi Matsuzoe Nagoya Institute of Technology

    1 Statistical manifolds and contrast functions 2 Affine immersions 3 Centroaffine immersions of codimension two 4 Quasi statistical manifolds and pre-contrast functions 5 Affine distributions 6 Generalized Pythagorean theorem. 量子と古典の物理と幾何 2022 (軽量版) (Physics and Geometry of the Quantum and the Classical 2022, light version)
  2. Geometric pre-divergences 2 Half a Century of Information Geometry (IG)

    (40 years of duality in IG) • Information geometry is an interdisciplinary field that applies the techniques of differential geometry to study probability theory and statistics. It studies statistical manifolds, which are Riemannian manifolds whose points correspond to probability distributions. (Wikipedia (English Ver., 2022 Jan. 27)) • In a narrow sense, it refers to the differential geometry of dual (conjugate) affine connections. (Fujiwara (2015), Wikipedia (Japanese Ver., 2021 Oct. 4)) • (I cannot read the Spanish version, but the explanation seems to focus more on the differential geometry of estimation.)
  3. Geometric pre-divergences 3 Half a Century of Information Geometry (IG)

    (40 years of duality in IG) Geometry of statistical models • Hotelling (1930), Rao (1945): The Fisher information matrix is a Riemannian metric. Conjugate connections • Blaschke (around 1920): Affine surface theory • Norden (1937, 1945), Sen (1944–1946): Parallel translations w.r.t. a pair of affine connections preserve a Riemannian metric. Chentsov (1972) (in Russian) "Statistical Decision Rules and Optimal Inference": Geometry of statistical models — Invariance criterion — Non-Levi-Civita connections — Category of statistical decisions. Western statisticians did not know his results.
  4. Geometric pre-divergences 4 Half a Century of Information Geometry (IG)

    (40 years of duality in IG) Geometry of statistical models Conjugate connections Chentsov (1972) (in Russian) "Statistical Decision Rules and Optimal Inference" — Invariance criterion — Non-Levi-Civita connections — Category of statistical decisions. • Efron (1975) pointed out the importance of curvature in statistical inference, and Dawid pointed out that the connection involved is not the Levi-Civita one. Nagaoka-Amari (1982) (unpublished technical report) "Differential geometry of smooth families of probability distributions" — re-discovered dual (conjugate) affine connections — applied dual affine connections to the geometry of statistical models. After the (re-)discovery of dual affine connections, IG has been applied to various fields of mathematical sciences.
  5. Geometric pre-divergences 5 Generalized Pythagorean theorem (cf. Nagaoka-Amari (1982), Chentsov

    (1968, 1972), Csiszár (1975)) (M, g, ∇, ∇∗) : a dually flat space. D : M × M → [0, ∞) : a ∇-divergence on M (a non-symmetric squared-distance-like function). γ1 : the ∇-geodesic connecting p and q, γ2 : the ∇∗-geodesic connecting q and r (p, q, r ∈ M). γ1 ⊥ γ2 at q with respect to g =⇒ D(p, r) = D(p, q) + D(q, r). [Figure: points p, q, r with the ∇-geodesic p–q and the ∇∗-geodesic q–r meeting orthogonally at q]
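
As a quick numerical illustration (my own sketch, not from the slides): in dually flat coordinates the canonical divergence is D(p, q) = ψ(θ_p) + φ(η_q) − ⟨θ_p, η_q⟩, ∇-geodesics are straight lines in θ, ∇∗-geodesics are straight lines in η, and orthogonality at q with respect to g reduces to ⟨θ_p − θ_q, η_r − η_q⟩ = 0. The potential ψ(θ) = Σ_i exp(θ_i) below is an arbitrary assumed choice.

```python
import numpy as np

def psi(th):                      # assumed potential; any strictly convex function works
    return np.sum(np.exp(th))

def eta_of(th):                   # Legendre transform: eta_i = d psi / d theta_i
    return np.exp(th)

def phi(et):                      # dual potential for this choice of psi
    return np.sum(et * (np.log(et) - 1.0))

def D(th_p, th_q):
    """Canonical divergence D(p, q) = psi(theta_p) + phi(eta_q) - <theta_p, eta_q>."""
    et_q = eta_of(th_q)
    return psi(th_p) + phi(et_q) - th_p @ et_q

rng = np.random.default_rng(0)
th_q = rng.normal(size=3)                 # the "corner" point q
u = rng.normal(size=3)                    # theta-direction of the nabla-geodesic at q
th_p = th_q + 0.7 * u                     # p lies on that nabla-geodesic

# Orthogonality of the nabla-geodesic p-q and the nabla*-geodesic q-r at q
# reduces, in these coordinates, to <theta_p - theta_q, eta_r - eta_q> = 0.
v = rng.normal(size=3)
v -= (v @ u) / (u @ u) * u                # eta-direction orthogonal to u
et_q = eta_of(th_q)
eps = 0.2 * et_q.min() / (np.abs(v).max() + 1e-12)
et_r = et_q + eps * v                     # r reached along the nabla*-geodesic
th_r = np.log(et_r)

print(D(th_p, th_r), D(th_p, th_q) + D(th_q, th_r))   # the two numbers agree
```
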
  6. Geometric pre-divergences 6 Generalized projection theorem (cf. Nagaoka-Amari (1982), Chentsov

    (1968, 1972), Csiszár (1975)) (M, g, ∇, ∇∗) : a dually flat space. D : M × M → [0, ∞) : a ∇-divergence on M. S ⊂ M : a ∇∗-autoparallel (∇∗-totally geodesic) submanifold of M. γ : the ∇-geodesic connecting p and q (p ∈ M\S, q ∈ S). Then γ ⊥ S at q with respect to g ⇐⇒ D(p, q) = min_{r∈S} D(p, r)
  7. Geometric pre-divergences 7 1 Statistical manifolds and contrast functions We

    assume that all objects such as manifolds, Riemannian metrics, etc. are smooth in this talk. 1.1 Statistical manifolds M : a manifold (an open domain in Rn), h : a (semi-)Riemannian metric on M, ∇ : an affine connection on M.   Definition 1.1 ∇∗ : the dual (or conjugate) connection of ∇ with respect to h def ⇐⇒ Xh(Y, Z) = h(∇X Y, Z) + h(Y, ∇∗X Z) for X, Y, Z ∈ X(M).   (1) (∇∗)∗ = ∇ (2) ∇(0) := (∇ + ∇∗)/2 =⇒ ∇(0)h = 0
  8. Geometric pre-divergences 8 The torsion tensor T ∇ of ∇:

    T∇(X, Y ) = ∇X Y − ∇Y X − [X, Y ]   Proposition 1.2 Consider the following four conditions: (1) ∇ is torsion-free. (2) ∇∗ is torsion-free. (3) C := ∇h is totally symmetric. (4) ∇(0) := (∇ + ∇∗)/2 is the Levi-Civita connection w.r.t. h. Any two of these conditions imply the remaining ones.     Definition 1.3 (Kurose (1994)) Let (M, h) be a semi-Riemannian manifold with an affine connection ∇ on M. (M, ∇, h) : a statistical manifold def ⇐⇒ (1) ∇ is torsion-free, (2) (∇X h)(Y, Z) = (∇Y h)(X, Z). C(X, Y, Z) := (∇X h)(Y, Z) is the cubic form of (M, ∇, h)  
  9. Geometric pre-divergences 9   Proposition 1.4 (M, h) :

    a semi-Riemannian manifold, ∇(0) : the Levi-Civita connection with respect to h, C : a totally symmetric (0, 3)-tensor field. h(∇X Y, Z) := h(∇(0)X Y, Z) − (1/2)C(X, Y, Z), h(∇∗X Y, Z) := h(∇(0)X Y, Z) + (1/2)C(X, Y, Z) =⇒ (1) ∇ and ∇∗ are affine connections, torsion-free and mutually dual with respect to h. (2) ∇h and ∇∗h are totally symmetric. (3) (M, ∇, h) and (M, ∇∗, h) are statistical manifolds.     Remark 1.5 (Original definition by S. L. Lauritzen) (M, g) : a Riemannian manifold, C : a totally symmetric (0, 3)-tensor field. We call the triplet (M, g, C) a statistical manifold.  
  10. Geometric pre-divergences 10   Remark 1.6 (Kurose (1990) (cf.

    Fujiwara (2015))) (M, g) : a Riemannian manifold ∇, ∇∗ : a pair of conjugate connections We call the quadruplet (M, g, ∇, ∇∗) a statistical manifold.   The curvature tensor R∇ of ∇: R∇(X, Y )Z = ∇X ∇Y Z − ∇Y ∇X Z − ∇[X,Y ] Z   h(R∇(X, Y )Z, W ) + h(Z, R∇∗ (X, Y )W ) = 0   This equation implies that R∇ = 0 ⇐⇒ R∇∗ = 0.   Definition 1.7 Let (M, ∇, h) be a statistical manifold. (M, ∇, h) is a Hessian manifold def ⇐⇒ R∇ = 0 and T ∇ = 0, i.e., ∇ is flat. ⇐⇒ (M, h, ∇, ∇∗) is a dually flat space.  
  11. Geometric pre-divergences 11 Example 1.8 (Normal distributions) For x ∈

    R, set pξ(x) = (1/√(2π(ξ²)²)) exp[−(x − ξ¹)²/(2(ξ²)²)] = (1/√(2πσ²)) exp[−(x − µ)²/(2σ²)], S = {pξ | ξ = (ξ¹, ξ²) = (µ, σ) ∈ R × (0, ∞)}. Define the evaluation map by ex : S → R, ex(p) := p(x). Set l := log ◦ ex. gF_p(X, Y ) := ∫_{−∞}^{∞} Xp(log ◦ ex) Yp(log ◦ ex) p(x) dx = Ep[(Xp l)(Yp l)], (gF_ij(pξ)) = (1/(ξ²)²) diag(1, 2) : the Fisher metric. Cp(X, Y, Z) := Ep[(Xp l)(Yp l)(Zp l)] : the cubic form (Amari-Chentsov tensor field). Denote by ∇(0) the Levi-Civita connection with respect to gF. We define the α-connection (α ∈ R) on S by gF(∇(α)X Y, Z) := gF(∇(0)X Y, Z) − (α/2)C(X, Y, Z). (S, ∇(α), gF) is a statistical manifold. In particular, (S, ∇(1), gF) and (S, ∇(−1), gF) are Hessian manifolds, and thus (S, gF, ∇(1), ∇(−1)) is a dually flat space.
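
A small numerical cross-check of the Fisher metric above (my own sketch, not part of the slides): the entries g^F_ij = E_p[(∂_i l)(∂_j l)] can be computed by quadrature and compared with diag(1/σ², 2/σ²).

```python
import numpy as np
from scipy import integrate, stats

mu, sigma = 0.3, 1.7

def score(x):
    # Partial derivatives of log p(x; mu, sigma) w.r.t. mu and sigma.
    d_mu = (x - mu) / sigma**2
    d_sigma = ((x - mu)**2 - sigma**2) / sigma**3
    return np.array([d_mu, d_sigma])

def g(i, j):
    # E_p[(d_i log p)(d_j log p)] by numerical quadrature.
    integrand = lambda x: score(x)[i] * score(x)[j] * stats.norm.pdf(x, mu, sigma)
    val, _ = integrate.quad(integrand, -np.inf, np.inf)
    return val

G = np.array([[g(i, j) for j in range(2)] for i in range(2)])
print(G)                                        # approx [[1/sigma^2, 0], [0, 2/sigma^2]]
print(np.diag([1/sigma**2, 2/sigma**2]))
```
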
  12. Geometric pre-divergences 12 A statistical model Se is an exponential

    family def ⇐⇒ Se = {pθ | θ ∈ Θ ⊂ Rn (an open subset)}, where pθ(x) = exp[Z(x) + Σ_{i=1}^{n} θ^i Fi(x) − ψ(θ)], Z, F1, ..., Fn : functions on X, ψ : a function on the parameter space Θ. The coordinate system [θ^i] is called the natural parameters.   Proposition 1.9 For an exponential family Se, (1) ∇(e) := ∇(1) and ∇(m) := ∇(−1) are flat. (2) [θ^i] is a ∇(e)-affine coordinate system, i.e., Γ(e)^k_ij ≡ 0. (3) Set ηi(p) = Ep[Fi]. Then [ηi] is a ∇(m)-affine coordinate system, Γ(m)^k_ij ≡ 0.   gF_ij(θ) = E[(∂i(log ◦ ex))(∂j(log ◦ ex))] = ∂i∂jψ(θ) : the Fisher metric. CF_ijk(θ) = E[(∂i(log ◦ ex))(∂j(log ◦ ex))(∂k(log ◦ ex))] = ∂i∂j∂kψ(θ) : the cubic form (Amari-Chentsov tensor field). The triplets (Se, ∇(e), gF) and (Se, ∇(m), gF) are Hessian manifolds, and thus (Se, gF, ∇(e), ∇(m)) is a dually flat space.
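
For a concrete exponential family, the identities g^F_ij = ∂_i∂_jψ and C_ijk = ∂_i∂_j∂_kψ can be checked symbolically. The sketch below (my own toy example) uses the Bernoulli family p_θ(x) = exp[θx − ψ(θ)], x ∈ {0, 1}, with ψ(θ) = log(1 + e^θ).

```python
import sympy as sp

theta = sp.symbols('theta', real=True)
psi = sp.log(1 + sp.exp(theta))      # log-partition function of the Bernoulli family

eta = sp.diff(psi, theta)            # expectation parameter E[x] = sigmoid(theta)
g = sp.diff(psi, theta, 2)           # Fisher metric
C = sp.diff(psi, theta, 3)           # Amari-Chentsov (cubic) tensor

# Compare with the moment expressions E[(x - eta)^2] and E[(x - eta)^3].
p1 = sp.exp(theta - psi)             # P(x = 1)
p0 = 1 - p1                          # P(x = 0)
var = sp.simplify(p1*(1 - eta)**2 + p0*(0 - eta)**2)
third = sp.simplify(p1*(1 - eta)**3 + p0*(0 - eta)**3)

print(sp.simplify(g - var))          # 0
print(sp.simplify(C - third))        # 0
```
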
  13. Geometric pre-divergences 13 1.2 Conformal-projective geometry   Definition 1.10

    Two statistical manifolds (M, ∇, h), (M, ¯∇, ¯h) are (1) conformally-projectively equivalent def ⇐⇒ there exist two functions ϕ and ψ on M such that ¯h(X, Y ) = e^{ϕ+ψ} h(X, Y ), ¯∇X Y = ∇X Y − h(X, Y ) gradh ψ + dϕ(Y )X + dϕ(X)Y ; (2) 1-conformally equivalent def ⇐⇒ there exists a function ψ on M such that ¯h(X, Y ) = e^{ψ} h(X, Y ), ¯∇X Y = ∇X Y − h(X, Y ) gradh ψ, where gradh ψ is defined by h(X, gradh ψ) = dψ(X). In this case the two connections ∇ and ¯∇ are dual-projectively equivalent. (3) (−1)-conformally equivalent def ⇐⇒ there exists a function ϕ on M such that ¯h(X, Y ) = e^{ϕ} h(X, Y ), ¯∇X Y = ∇X Y + dϕ(Y )X + dϕ(X)Y. In this case the two connections ∇ and ¯∇ are projectively equivalent.  
  14. Geometric pre-divergences 14 Remark 1.11 (In the case ϕ =

    ψ =⇒ 0-conformally equivalent) (M, g), (M, ¯g) : Riemannian manifolds, ∇(0), ¯∇(0) : their Levi-Civita connections (with ¯g = e^{2ϕ} g) =⇒ ¯∇(0)X Y = ∇(0)X Y − g(X, Y ) gradg ϕ + dϕ(Y )X + dϕ(X)Y. Proposition 1.12 (M, ∇, h) and (M, ¯∇, ¯h) are 1-conformally equivalent ⇐⇒ (M, ∇∗, h) and (M, ¯∇∗, ¯h) are (−1)-conformally equivalent.   Definition 1.13 (M, ∇, h) is conformally-projectively flat (resp. 1-conformally flat, (−1)-conformally flat) def ⇐⇒ (M, ∇, h) is locally conformally-projectively equivalent (resp. 1-conformally equivalent, (−1)-conformally equivalent) to some Hessian manifold. That is, for each point in M, ∃U ⊂ M : a neighborhood, ∃(U, ¯∇, ¯h) : a Hessian manifold such that (U, ∇|U, h|U) and (U, ¯∇, ¯h) are conformally-projectively equivalent.  
  15. Geometric pre-divergences 15 Conformal-projective geometry and umbilical points (M, ∇,

    h) : a statistical manifold, n ≥ 3. N : a submanifold of M, ν : the unit normal vector field along N, ∇′ : the induced connection, h′ : the induced metric. (N, ∇′, h′) is a statistical submanifold. ∇X Y = ∇′X Y + α(X, Y )ν, ∇X ν = −β#(X) + τ(X)ν. Set β(X, Y ) = h′(β#(X), Y ).   Definition 1.14 For p ∈ N: p is a tangentially umbilical point of N in (M, ∇, h) def ⇐⇒ ∃c : αp = c h′p; p is a normally umbilical point of N in (M, ∇, h) def ⇐⇒ ∃c : βp = c h′p.  
  16. Geometric pre-divergences 16   Theorem 1.15 (Kurose ’02) (M,

    ∇, h) and (M, ¯∇, ¯h) : simply connected statistical manifolds, dim M = n ≥ 3. (M, ∇, h) and (M, ¯∇, ¯h) are conformally-projectively equivalent ⇐⇒ (1) Ric(X, Y ) − Ric(Y, X) = ¯Ric(X, Y ) − ¯Ric(Y, X), and (2) the correspondence (∇, h) → (¯∇, ¯h) preserves the tangentially umbilical points and the normally umbilical points of any hypersurface of M.  
  17. Geometric pre-divergences 17 1.3 Contrast functions M : a manifold

    D : a function on M × M. D[X1 · · · Xi |Y1 · · · Yj ] : a function on M defined by D[X1 · · · Xi |Y1 · · · Yj ](r) := (X1)p · · · (Xi)p (Y1)q · · · (Yj)q D(p, q)|p=r, q=r. For example, D[X| ](r) = Xp D(p, q)|p=r, q=r, D[X|Y ](r) = Xp Yq D(p, q)|p=r, q=r, D[XY |Z](r) = Xp Yp Zq D(p, q)|p=r, q=r, ...   Definition 1.16 D : M × M → R is a contrast function of M def ⇐⇒ (1) D(p, p) = 0, (2) D[X| ] = D[ |X] = 0, (3) h(X, Y ) := −D[X|Y ] is a semi-Riemannian metric on M.  
  18. Geometric pre-divergences 18   Proposition 1.17 For a contrast

    function D on M, we define ∇ and ∇∗ by h(∇X Y, Z) := −D[XY |Z], h(Y, ∇∗X Z) := −D[Y |XZ] =⇒ (1) ∇, ∇∗ are torsion-free affine connections on M, mutually dual with respect to h. (2) ∇h, ∇∗h are totally symmetric (0, 3)-tensor fields. (3) (M, ∇, h), (M, ∇∗, h) are statistical manifolds.   Example 1.18 For M = Rn, set D(p, q) := (1/2)||p − q||², (p, q) ∈ Rn × Rn. Then D is a contrast function on Rn. The contrast function D induces the standard Euclidean metric gE and the standard flat affine connection ∇E on Rn. That is, (Rn, ∇E, gE) is a Hessian manifold.
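
The construction h(X, Y ) = −D[X|Y ] is easy to probe numerically: the mixed second derivative of D(p, q) in p and q, evaluated at p = q = r, gives minus the metric. The sketch below is my own illustration with an assumed Bregman-type contrast function built from ψ(θ) = Σ_i e^{θ_i}; the induced metric should be the Hessian of ψ.

```python
import numpy as np

psi = lambda th: np.sum(np.exp(th))
grad_psi = lambda th: np.exp(th)

def D(p, q):
    # Bregman divergence D(p, q) = psi(p) - psi(q) - <grad psi(q), p - q>
    return psi(p) - psi(q) - grad_psi(q) @ (p - q)

def induced_metric(r, eps=1e-4):
    n = len(r)
    h = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = np.eye(n)[i], np.eye(n)[j]
            # central differences in p (direction i) and q (direction j)
            h[i, j] = -(D(r + eps*ei, r + eps*ej) - D(r + eps*ei, r - eps*ej)
                        - D(r - eps*ei, r + eps*ej) + D(r - eps*ei, r - eps*ej)) / (4*eps**2)
    return h

r = np.array([0.2, -0.5, 1.0])
print(induced_metric(r))          # approx the Hessian of psi at r: diag(exp(r))
print(np.diag(np.exp(r)))
```
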
  19. Geometric pre-divergences 19 Example 1.19 S = {p(x; θ)}: a

    statistical model. Kullback-Leibler divergence (relative entropy):   DKL(p(x; θ), p(x; θ′)) = ∫ p(x; θ) log [p(x; θ)/p(x; θ′)] dx = Eθ[log p(x; θ) − log p(x; θ′)].   DKL[∂i |∂j ] = −∫ ∂i p(θ) ∂′j log p(θ′) dx |θ=θ′ (∂i = ∂/∂θi, ∂′j = ∂/∂θ′j) = −∫ ∂i log p(θ) ∂′j log p(θ′) p(θ) dx |θ=θ′ = −gF_ij : the Fisher metric. DKL[∂i ∂j |∂k ] = −∫ (∂i ∂j l(θ) ∂′k l(θ′) + ∂i l(θ) ∂j l(θ) ∂′k l(θ′)) p(θ) dx |θ=θ′ = −Γ(m)_ij,k : the mixture connection. The KL divergence induces the invariant statistical manifold (S, ∇(m), gF).
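
A numerical counterpart of the first computation above (my own sketch): for the normal family in coordinates θ = (µ, σ), finite differences of the closed-form KL divergence recover −DKL[∂i|∂j] = g^F_ij = diag(1/σ², 2/σ²).

```python
import numpy as np

def kl_normal(t1, t2):
    # Closed-form KL divergence between N(mu1, sigma1^2) and N(mu2, sigma2^2).
    (m1, s1), (m2, s2) = t1, t2
    return np.log(s2/s1) + (s1**2 + (m1 - m2)**2) / (2*s2**2) - 0.5

def induced_metric(theta, eps=1e-4):
    theta = np.asarray(theta, dtype=float)
    h = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            ei, ej = np.eye(2)[i]*eps, np.eye(2)[j]*eps
            # h_ij = -D_KL[d_i | d_j]: mixed second derivative in the two slots
            h[i, j] = -(kl_normal(theta + ei, theta + ej) - kl_normal(theta + ei, theta - ej)
                        - kl_normal(theta - ei, theta + ej) + kl_normal(theta - ei, theta - ej)) / (4*eps**2)
    return h

theta = (0.3, 1.7)                      # (mu, sigma)
print(induced_metric(theta))            # approx [[1/sigma^2, 0], [0, 2/sigma^2]]
print(np.diag([1/1.7**2, 2/1.7**2]))
```
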
  20. Geometric pre-divergences 20   Definition 1.20 For a contrast

    function D on M, denote by h h : the induced semi-Riemannian metric, ∇, ∇∗ : the induced connections. We define (0, 4)-tensor fields B and B∗ by h(B(X, Y )Z, V ) := −D[XY Z|V ] + D[∇X ∇Y Z|V ], h(V, B∗(X, Y )Z) := −D[V |XY Z] + D[V |∇∗ X ∇∗ Y Z]. We call B the Bartlett tensor field on M, and B∗ the dual Bartlett tensor field on M.     Proposition 1.21 (Eguchi 1993) R, R∗ : the curvature tensors of ∇, ∇∗, respectively. =⇒ R(X, Y )Z = B(Y, X)Z − B(X, Y )Z, R∗(X, Y )Z = B∗(Y, X)Z − B∗(X, Y )Z.  
  21. Geometric pre-divergences 21   Proposition 1.22 D, ˜ D

    : contrast functions on M (M, ∇, h), (M, ˜ ∇, ˜ h) : induced statistical manifolds ϕ, ψ : functions on M. (1) ˜ D(p, q) = eψ(p)+ϕ(q)D(p, q) =⇒ (M, ∇, h) and (M, ˜ ∇, ˜ h) are conformally-projectively equivalent. (2) ˜ D(p, q) = eψ(q)D(p, q) =⇒ (M, ∇, h) and (M, ˜ ∇, ˜ h) are 1-conformally equivalent. (3) ˜ D(p, q) = eϕ(p)D(p, q) =⇒ (M, ∇, h) and (M, ˜ ∇, ˜ h) are (−1)-conformally equivalent.  
  22. Geometric pre-divergences 22 2 Affine immersions f : M →

    Rn+1: an immersion (dim M = n ≥ 2) ξ: a local vector field along f   Definition 2.1 {f, ξ} : M → Rn+1 is an affine immersion def ⇐⇒ For an arbitrary point p ∈ M, Tf(p) Rn+1 = f∗ (Tp M) ⊕ R{ξp } ξ: a transversal vector field   D: the standard flat affine connection on Rn+1 DX f∗ Y = f∗ (∇X Y ) + h(X, Y )ξ, DX ξ = −f∗ (SX) + τ(X)ξ.   Proposition 2.2 {f, ξ}, { ¯ f, ¯ ξ} : affine immersions Suppose ∇ = ¯ ∇, h = ¯ h, S = ¯ S, τ = ¯ τ =⇒ ∃A ∈ GL(n + 1, R) and ∃b ∈ Rn+1 s.t. ¯ f = Af + b, ¯ ξ = Aξ  
  23. Geometric pre-divergences 23 D: the standard flat affine connection on

    Rn+1. DX f∗Y = f∗(∇X Y ) + h(X, Y )ξ, DX ξ = −f∗(SX) + τ(X)ξ. ∇ : the induced connection, h : the affine fundamental form, S : the affine shape operator, τ : the transversal connection form.   f : non-degenerate def ⇐⇒ h : non-degenerate. {f, ξ} : equiaffine def ⇐⇒ τ = 0.   w : the induced volume element (n-form) with respect to {f, ξ} def ⇐⇒ w(X1, . . . , Xn) := det(f∗X1, . . . , f∗Xn, ξ), where “det” is the standard volume element on Rn+1.   ∇, τ, w : induced objects from {f, ξ} =⇒ (∇Y w)(X1, . . . , Xn) = τ(Y )w(X1, . . . , Xn).   τ = 0 ⇐⇒ w is parallel with respect to ∇.
  24. Geometric pre-divergences 24 Fundamental structural equations for affine immersions Gauss:

    R(X, Y )Z = h(Y, Z)SX − h(X, Z)SY. Codazzi: (∇X h)(Y, Z) + τ(X)h(Y, Z) = (∇Y h)(X, Z) + τ(Y )h(X, Z), (∇X S)(Y ) − τ(X)SY = (∇Y S)(X) − τ(Y )SX. Ricci: h(X, SY ) − h(Y, SX) = (∇X τ)(Y ) − (∇Y τ)(X).   Proposition 2.3 M : simply connected. ∇ : a torsion-free affine connection, S : a (1, 1)-tensor field, h : a semi-Riemannian metric, τ : a 1-form. If ∇, h, S and τ satisfy the above equations =⇒ there exists an affine immersion {f, ξ} which induces the given ∇, h, S and τ.     Proposition 2.4 {f, ξ} : non-degenerate, equiaffine =⇒ (M, ∇, h) is a 1-conformally flat statistical manifold. If M is simply connected, the converse also holds.  
  25. Geometric pre-divergences 25 U : an open domain of Rn

    ψ : a function on U.   We say that {f, ξ} : U → Rn+1 is a graph immersion def ⇐⇒ f : (θ1, . . . , θn) ↦ (θ1, . . . , θn, ψ(θ)), ξ = (0, . . . , 0, 1).   Set ∂i = ∂/∂θi, ψij = ∂²ψ/∂θi∂θj. Then we have D∂i f∗∂j = ψij ξ. This implies that ∇ is flat and {θi} is a ∇-affine coordinate system.   Proposition 2.5 (U, ∇, h) : a Hessian manifold, ψ : a potential function of h =⇒ (U, ∇, h) can be realized in Rn+1 by a graph immersion with potential ψ.  
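
The identity D∂i f∗∂j = ψij ξ can be confirmed symbolically: the second partial derivatives of the graph map f = (θ, ψ(θ)) have no tangential part. The sketch below is my own check with an assumed potential ψ = log(e^{t1} + e^{t2} + 1).

```python
import sympy as sp

t1, t2 = sp.symbols('t1 t2', real=True)
psi = sp.log(sp.exp(t1) + sp.exp(t2) + 1)   # assumed convex potential

f = sp.Matrix([t1, t2, psi])                # the graph immersion into R^3
xi = sp.Matrix([0, 0, 1])                   # the transversal vector field

for i, u in enumerate((t1, t2)):
    for j, v in enumerate((t1, t2)):
        second = sp.diff(f, u, v)           # D_{d_i} f_* d_j
        h_ij = sp.diff(psi, u, v)           # Hessian entry psi_ij
        # The difference vanishes: the second derivative is purely transversal,
        # so the induced connection is flat and h_ij is the affine fundamental form.
        print(i, j, sp.simplify(second - h_ij * xi).T)
```
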
  26. Geometric pre-divergences 26 {f, ξ} : nondegenerate, equiaffine Rn+1 :

    the dual space of Rn+1 (≃ T∗f(p)Rn+1 (∀p ∈ M)), ⟨ , ⟩ : the canonical pairing of Rn+1 and its dual.   v : M → Rn+1∗ is the conormal map of {f, ξ} def ⇐⇒ ⟨v(p), ξp⟩ = 1, ⟨v(p), f∗Xp⟩ = 0.   Geometric divergence   DG : M × M → R : the geometric divergence of (M, ∇, h) def ⇐⇒ DG(p, q) = ⟨v(q), f(p) − f(q)⟩.   cf. affine support function DA : Rn+1 × M → R : DA(x, q) = ⟨v(q), x − f(q)⟩. • DG is independent of the realization of (M, ∇, h). • DG is a contrast function on M. Remark 2.6 The geometric divergence is globally defined, whereas most divergences are defined only locally.
  27. Geometric pre-divergences 27 (U, ∇, h) : a simply connected

    Hessian manifold, U ⊂ Rn : an open domain (=⇒ (U, h, ∇, ∇∗) is a dually flat space) =⇒ ∃ψ : a function on U (potential function) such that ∂²ψ/∂θi∂θj = gij =⇒ {f, ξ} : an affine immersion (graph immersion), f : (θ1, . . . , θn) ↦ (θ1, . . . , θn, ψ(θ)), ξ = (0, . . . , 0, 1). v : the conormal map of {f, ξ}, v = (−η1, . . . , −ηn, 1), ηi = ∂ψ/∂θi. By setting ϕ(q) = Σ ηi(q)θi(q) − ψ(q), we have DG(p, q) = ⟨v(q), f(p) − f(q)⟩ = −Σ ηi(q)θi(p) + ψ(p) + Σ ηi(q)θi(q) − ψ(q) = ψ(p) + ϕ(q) − Σ ηi(q)θi(p) = D(p, q)
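
The computation above can be replayed numerically (my own sketch): with an assumed potential ψ(θ) = Σ_i e^{θ_i}, the pairing ⟨v(q), f(p) − f(q)⟩ for the graph immersion coincides with the canonical (Bregman) divergence ψ(p) + ϕ(q) − Σ ηi(q)θi(p).

```python
import numpy as np

psi = lambda th: np.sum(np.exp(th))
eta = lambda th: np.exp(th)                        # eta_i = d psi / d theta_i
phi = lambda th: eta(th) @ th - psi(th)            # dual potential evaluated at the same point

def geometric_divergence(th_p, th_q):
    f = lambda th: np.append(th, psi(th))          # graph immersion into R^{n+1}
    v = np.append(-eta(th_q), 1.0)                 # conormal map at q: (-eta_1, ..., -eta_n, 1)
    return v @ (f(th_p) - f(th_q))

def canonical_divergence(th_p, th_q):
    return psi(th_p) + phi(th_q) - eta(th_q) @ th_p

rng = np.random.default_rng(1)
p, q = rng.normal(size=4), rng.normal(size=4)
print(geometric_divergence(p, q), canonical_divergence(p, q))   # the two values agree
```
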
  28. Geometric pre-divergences 28 3 Centroaffine immersions of codimension two M:

    an n-dimensional manifold, f : M → Rn+2 : an immersion, ξ : a local vector field along f.   Definition 3.1 {f, ξ} : M → Rn+2 is a centroaffine immersion of codimension two def ⇐⇒ For an arbitrary point p ∈ M, Tf(p)Rn+2 = f∗(TpM) ⊕ R{ξp} ⊕ R{f(p)}, ξ : a transversal vector field.   D : the standard flat affine connection on Rn+2. DX f∗Y = f∗(∇X Y ) + h(X, Y )ξ + k(X, Y )f, DX ξ = −f∗(SX) + τ(X)ξ + υ(X)f.
  29. Geometric pre-divergences 29 ∇ : the induced connection h :

    the affine fundamental form τ : the transversal connection form S : the affine shape operator w(X1 , · · · , Xn ) := det(f∗ X1 , · · · , f∗ Xn , ξ, f) the induced volume element   Proposition 3.2 ∇X w = τ(X)w     Definition 3.3 f : non-degenerate def ⇐⇒ h : non-degenerate {f, ξ} : equiaffine def ⇐⇒ τ = 0     Proposition 3.4 {f, ξ} : M → Rn+2 : non-degenerate, equiaffine =⇒ (M, ∇, h) is a statistical manifold, conformally-projectively flat  
  30. Geometric pre-divergences 30 Duality of affine immersions Rn+2 : the

    dual vector space of Rn+2 (≃ T∗f(p)Rn+2 (∀p ∈ M)), ⟨ , ⟩ : the pairing of Rn+2 and its dual.   Definition 3.5 v, ξ∗ : M → Rn+2∗ def ⇐⇒ ⟨v(p), ξp⟩ = 1, ⟨v(p), f(p)⟩ = 0, ⟨v(p), f∗Xp⟩ = 0; ⟨ξ∗(p), ξp⟩ = 0, ⟨ξ∗(p), f(p)⟩ = 1, ⟨ξ∗(p), f∗Xp⟩ = 0.   We call v the conormal map of {f, ξ}. If h is non-degenerate =⇒ {v, ξ∗} : M → Rn+2∗ is a centroaffine immersion of codimension two. We call {v, ξ∗} the dual map of {f, ξ}.   Proposition 3.6 {f, ξ} induces (M, ∇, h) ⇐⇒ {v, ξ∗} induces (M, ∇∗, h).  
  31. Geometric pre-divergences 31   Definition 3.5 v, ξ∗ :

    M → Rn+2∗ : the dual map of {f, ξ} def ⇐⇒ ⟨v(p), ξp⟩ = 1, ⟨v(p), f(p)⟩ = 0, ⟨v(p), f∗Xp⟩ = 0; ⟨ξ∗(p), ξp⟩ = 0, ⟨ξ∗(p), f(p)⟩ = 1, ⟨ξ∗(p), f∗Xp⟩ = 0.     Definition 3.7 DG : M × M → R : the geometric divergence def ⇐⇒ DG(p, q) = ⟨v(q), f(p) − f(q)⟩.   The geometric divergence DG is a contrast function, and it is a special form of an affine support function.
  32. Geometric pre-divergences 32 Legendre transformation Proposition 3.8 (M, g, ∇,

    ∇∗) : a dually flat space, {θi} : a ∇-affine coordinate system, {ηi} : a ∇∗-affine coordinate system =⇒ ∂ψ/∂θi = ηi, ∂ϕ/∂ηi = θi, ∂²ψ/∂θi∂θj = gij, ∂²ϕ/∂ηi∂ηj = g^{ij}, g(∂/∂θi, ∂/∂ηj) = 1 (i = j), 0 (i ≠ j), ψ(p) + ϕ(p) − Σ_{i=1}^{n} θi(p)ηi(p) = 0.   (M, ∇, g) and (M, ∇∗, g) are flat statistical manifolds.  
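
A quick numerical illustration of this Legendre duality (my own sketch, with the assumed potential ψ(θ) = log(1 + Σ_i e^{θ_i})): compute η = ∂ψ/∂θ, obtain ϕ(η) as the supremum sup_t(⟨t, η⟩ − ψ(t)), and check that ∂ϕ/∂η returns θ and that ψ + ϕ − ⟨θ, η⟩ vanishes.

```python
import numpy as np
from scipy.optimize import minimize

def psi(th):
    return np.log(1.0 + np.sum(np.exp(th)))

def grad_psi(th):
    e = np.exp(th)
    return e / (1.0 + e.sum())

def phi(et):
    # Legendre transform computed numerically: sup_t <t, et> - psi(t)
    res = minimize(lambda t: psi(t) - t @ et, np.zeros_like(et))
    return -res.fun

theta = np.array([0.4, -0.3, 1.1])
eta = grad_psi(theta)

print(psi(theta) + phi(eta) - theta @ eta)                 # approx 0

eps = 1e-5
grad_phi = np.array([(phi(eta + eps*np.eye(3)[i]) - phi(eta - eps*np.eye(3)[i])) / (2*eps)
                     for i in range(3)])
print(grad_phi, theta)                                     # approx equal
```
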
  33. Geometric pre-divergences 33 f = (θ1, . . . , θn, ψ, 1)ᵀ, ξ = (0, . . . , 0, 1, 0)ᵀ

    =⇒ f∗(∂/∂θn) = ∂f/∂θn = (0, . . . , 1, ∂ψ/∂θn, 0)ᵀ. v = (−η1, . . . , −ηn, 1, ϕ), ξ∗ = (0, . . . , 0, 0, 1).   ⟨v(p), f∗(∂/∂θi)⟩ = 0 ⇐⇒ −ηi(p) + (∂ψ/∂θi)(p) = 0. ⟨v(p), f(p)⟩ = 0 ⇐⇒ ψ(p) + ϕ(p) − Σ_{i=1}^{n} θi(p)ηi(p) = 0. DG(p, q) = ⟨v(q), f(p) − f(q)⟩ = ψ(p) + ϕ(q) − Σ_{i=1}^{n} θi(p)ηi(q)  
  34. Geometric pre-divergences 34 4 Quasi statistical manifolds and pre-contrast functions

    M : a manifold (an open domain in Rn), h : a nondegenerate (0, 2)-tensor field on M (h(X, Y ) = 0 for ∀Y ∈ X(M) =⇒ X = 0), ∇ : an affine connection on M.   Definition 4.1 ∇∗ : the quasi dual connection (or left conjugate connection) of ∇ with respect to h def ⇐⇒ Xh(Y, Z) = h(∇∗X Y, Z) + h(Y, ∇X Z) for X, Y, Z ∈ X(M).   We remark that (∇∗)∗ ≠ ∇ in general.   Proposition 4.2 If h is symmetric (h(X, Y ) = h(Y, X)) or skew-symmetric (h(X, Y ) = −h(Y, X)) =⇒ (∇∗)∗ = ∇.   If h is symmetric, then ∇(0) := (∇ + ∇∗)/2 satisfies ∇(0)h = 0.
  35. Geometric pre-divergences 35 The torsion tensor T ∇ of ∇:

    T ∇(X, Y ) = ∇X Y − ∇Y X − [X, Y ] The curvature tensor R∇ of ∇: R∇(X, Y )Z = ∇X ∇Y Z − ∇Y ∇X Z − ∇[X,Y ] Z   Proposition 4.3 (∇X h)(Y, Z) − (∇Y h)(X, Z) = h(T ∇∗ (X, Y ) − T ∇(X, Y ), Z) h(R∇∗ (X, Y )Z, W ) + h(Z, R∇(X, Y )W ) = 0   Definition 4.4 (M, ∇, h): a quasi statistical manifold def ⇐⇒ (∇X h)(Y, Z) − (∇Y h)(X, Z) = −h(T ∇(X, Y ), Z) In addition, if h is a semi-Riemannian metric, we say that (M, ∇, h) is a statistical manifold admitting torsion (SMAT).   Proposition 4.5 (M, ∇, h) : a quasi statistical manifold =⇒ The quasi dual connection ∇∗ of ∇ is torsion free.  
  36. Geometric pre-divergences 36   Suppose that (M, ∇, h)

    is a quasi statistical manifold. (1) (M, ∇, h) is a Hessian manifold def ⇐⇒ h is symmetric, R∇ = 0 and T∇ = 0 ⇐⇒ (M, h, ∇, ∇∗) is a dually flat space. (2) (M, ∇, h) is a partially flat space (or a space of distant parallelism) def ⇐⇒ R∇ = 0 (then R∇∗ = 0 and T∇∗ = 0, but T∇ ≠ 0 in general) ⇐⇒ (M, h, ∇, ∇∗) is a partially flat space.   Remark 4.6 For a given affine connection, we defined another affine connection. Therefore it is necessary to assume nondegeneracy of the (0, 2)-tensor field. If we could instead define a pair of mutually dual affine connections directly, we would not need to assume nondegeneracy of the (0, 2)-tensor field.
  37. Geometric pre-divergences 37   Definition 4.7 Two quasi statistical

    manifolds (M, ∇, h) and (M, ¯∇, ¯h) are 1-conformally equivalent def ⇐⇒ there exists a function ψ on M such that ¯h(X, Y ) = e^{ψ} h(X, Y ), ¯∇X Y = ∇X Y − h(X, Y ) gradh ψ, where gradh ψ is the right gradient vector field defined by h(X, gradh ψ) = dψ(X).     Definition 4.8 (M, ∇, h) is 1-conformally partially flat def ⇐⇒ (M, ∇, h) is locally 1-conformally equivalent to some partially flat quasi statistical manifold.  
  38. Geometric pre-divergences 38 4.2 Pre-contrast functions M : a manifold

    ρ : a function on M × TM. ρ[X1 · · · Xi |Y1 · · · Yj Z] : a function on M defined by ρ[X1 · · · Xi |Y1 · · · Yj Z](r) := (X1)p · · · (Xi)p (Y1)q · · · (Yj)q ρ(p, Zq)|p=r, q=r. For example, ρ[ |XY ](r) = Xq ρ(p, Yq)|p=r, q=r, ρ[XZ|Y ](r) = Xp Zp ρ(p, Yq)|p=r, q=r, ρ[XY |ZV ](r) = Xp Yp Zq ρ(p, Vq)|p=r, q=r, ...   Definition 4.9 For X, X1, X2, Y ∈ X(M) and f1, f2 ∈ C∞(M), ρ : M × TM → R is a pre-contrast function on M def ⇐⇒ (1) ρ(p, (f1X1 + f2X2)q) = f1(q)ρ(p, (X1)q) + f2(q)ρ(p, (X2)q), (2) ρ[ |X] = 0 (i.e. ∀r ∈ M, ρ(r, Xr) = 0), (3) h(X, Y ) := −ρ[X|Y ] is non-degenerate.   D(p, q) : a contrast function =⇒ Xq D(p, q) : a pre-contrast function
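
The last line suggests a direct numerical experiment (my own sketch): start from a contrast function D, define ρ(p, Xq) := Xq D(p, q) by a directional derivative, and recover the metric as h(X, Y ) = −ρ[X|Y ]. The assumed D below is the Bregman divergence of ψ(θ) = Σ_i θ_i⁴/12, chosen so that the Hessian is not constant.

```python
import numpy as np

psi = lambda th: np.sum(th**4) / 12.0
grad_psi = lambda th: th**3 / 3.0

def D(p, q):
    # Bregman-type contrast function D(p, q) = psi(p) - psi(q) - <grad psi(q), p - q>
    return psi(p) - psi(q) - grad_psi(q) @ (p - q)

def rho(p, q, X, eps=1e-5):
    # rho(p, X_q): derivative of D(p, .) at q in the direction X
    return (D(p, q + eps*X) - D(p, q - eps*X)) / (2*eps)

def h(r, X, Y, eps=1e-4):
    # h(X, Y)(r) = -rho[X|Y](r): differentiate rho(p, Y_q) in p along X at p = q = r
    return -(rho(r + eps*X, r, Y) - rho(r - eps*X, r, Y)) / (2*eps)

r = np.array([0.7, -1.2])
e0, e1 = np.eye(2)
H = np.array([[h(r, e0, e0), h(r, e0, e1)], [h(r, e1, e0), h(r, e1, e1)]])
print(H)                      # approx the Hessian of psi at r: diag(r**2)
print(np.diag(r**2))
```
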
  39. Geometric pre-divergences 39   Proposition 4.10 We can define

    affine connections ∇ and ∇∗ by h(∇∗ X Y, Z) = −ρ[XY |Z], h(Y, ∇X Z) = −ρ[Y |XZ]. Moreover, ∇, ∇∗ : mutually dual with respect to h. ∇∗ : torsion-free   (Proof) Xh(Y, Z) = −Xρ[Y |Z] = −ρ[XY |Z] − ρ[Y |XZ] = h(∇∗ X Y, Z) + h(Y, ∇X Z) h(∇∗ X Y − ∇∗ Y X, Z) = −ρ[XY |Z] + ρ[Y X|Z] = −ρ[[X, Y ]|Z] = h([X, Y ], Z) Lemma 4.11 ρ(p, Xq ) : a pre-contrast function on M =⇒ (M, ∇, h) is a quasi-statistical manifold In particular, if h is a semi-Riemannian metric =⇒ (M, ∇, h) is a statistical manifold admitting torsion.
  40. Geometric pre-divergences 40 5 Affine distributions ω : T M

    → Rn+1 : an Rn+1-valued 1-form, ξ : M → Rn+1 : an Rn+1-valued function.   Definition 5.1 {ω, ξ} is an affine distribution def ⇐⇒ For an arbitrary point p ∈ M, Rn+1 = Image ωp ⊕ R{ξp}, ξ : a transversal vector field.     {f, ξ} : an affine immersion =⇒ {df, ξ} : an affine distribution.   Xω(Y ) = ω(∇X Y ) + h(X, Y )ξ, Xξ = −ω(SX) + τ(X)ξ. ∇ : an affine connection (T∇(X, Y ) ≠ 0 in general), h : a (0, 2)-tensor field (h(X, Y ) ≠ h(Y, X) in general), S : a (1, 1)-tensor field, τ : a 1-form
  41. Geometric pre-divergences 41 Xω(Y ) = ω(∇X Y ) +

    h(X, Y )ξ, Xξ = −ω(SX) + τ(X)ξ.   ω : symmetric def ⇐⇒ h : symmetric ω : nondegenerate def ⇐⇒ h : nondegenerate {ω, ξ} : equiaffine def ⇐⇒ τ = 0   Symmetry and nondegeneracy of ω are independent of ξ   Proposition 5.2 Set ˜ ξ := ω(V ) + ϕξ. Then the induced objects change as follows: ∇X Y = ˜ ∇X Y + ˜ h(X, Y )V, h(X, Y ) = ϕ˜ h(X, Y ), ˜ SX − ˜ τ(X)V = ϕSX − ∇X V, ϕ˜ τ(X) = h(X, V ) + dϕ(X) + ϕτ(X).     Proposition 5.3 Image (dω)p ⊂ Image ωp ⇐⇒ h: symmetric Image (dξ)p ⊂ Image ωp ⇐⇒ τ = 0  
  42. Geometric pre-divergences 42 Fundamental structural equations for affine distributions: Gauss

    equation: R(X, Y )Z = h(Y, Z)SX − h(X, Z)SY, Codazzi equations: (∇X h)(Y, Z) + h(Y, Z)τ(X) −(∇Y h)(X, Z) + h(X, Z)τ(Y ) = −h(T ∇(X, Y ), Z), (∇X S)(Y ) + τ(Y )SX − (∇Y S)(X) − τ(X)SY = −S(T ∇(X, Y )), Ricci equation: h(X, SY ) − (∇X τ)(Y ) − h(Y, SX) + (∇Y τ)(X) = τ(T ∇(X, Y )).   Proposition 5.4 (Haba (2020)) M : simply connected. ∇ : an affine connection, S : a (1, 1)-tensor field h : a (0, 2)-tensor field, τ : a 1-form ∇, h, S and τ satisfy fundamental equations =⇒ ∃{ω, ξ} : an affine distribution which induces ∇, h, S and τ.     Theorem 5.5 (Haba (2020)) (1) {ω, ξ} : nondegenerate, equiaffine =⇒ (M, ∇, h) : 1-conformally partially flat quasi statistical manifold. (2) {ω, ξ} : symmetric, nondegenerate, equiaffine =⇒ (M, ∇, h) : 1-conformally partially flat SMAT. If M is simply connected, the converses also hold  
  43. Geometric pre-divergences 43 SMAT with the SLD Fisher metric (Kurose

    2007) Herm(d) : the set of all Hermitian matrices of degree d. S = {P ∈ Herm(d) | P > 0, trace P = 1}, TP S ≅ A0, A0 = {X ∈ Herm(d) | trace X = 0}. We denote by ¯X the corresponding vector field of X.   For P ∈ S, X ∈ A0, define ωP(¯X) (∈ Herm(d)) and ξ by X = (1/2)(P ωP(¯X) + ωP(¯X)P), ξ = −Id. Then {ω, ξ} is an equiaffine distribution.   The induced quantities are given by hP(¯X, ¯Y ) = (1/2) trace(P(ωP(¯X)ωP(¯Y ) + ωP(¯Y )ωP(¯X))) (= gP(¯X, ¯Y ), the SLD Fisher metric), (∇¯X ¯Y )P = hP(¯X, ¯Y )P − (1/2)(X ωP(¯Y ) + ωP(¯Y )X). (R = R∗ = 0, T∗ = 0, but T ≠ 0)
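
The defining equation X = (P ωP(X) + ωP(X)P)/2 is a Lyapunov-type matrix equation, so ωP(X) (the symmetric logarithmic derivative) and the SLD metric can be computed numerically. The sketch below is my own illustration with a real 2×2 example (a general Hermitian case would use complex matrices).

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def sld(P, X):
    # solve_continuous_lyapunov solves A Z + Z A^T = Q; with A = P/2 this is
    # the SLD equation X = (P Z + Z P)/2.
    return solve_continuous_lyapunov(P / 2.0, X)

def sld_metric(P, X, Y):
    LX, LY = sld(P, X), sld(P, Y)
    return 0.5 * np.trace(P @ (LX @ LY + LY @ LX))

# A real qubit-like example: P = (I + 0.3*sigma_x + 0.5*sigma_z)/2.
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
sz = np.array([[1.0, 0.0], [0.0, -1.0]])
P = 0.5 * (np.eye(2) + 0.3 * sx + 0.5 * sz)

X, Y = sx / 2.0, sz / 2.0              # traceless symmetric tangent directions
L = sld(P, X)
print(np.allclose(0.5 * (P @ L + L @ P), X))         # the SLD equation holds
print(sld_metric(P, X, X), sld_metric(P, X, Y), sld_metric(P, Y, Y))
```
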
  44. Geometric pre-divergences 44 5.2 Conormal maps and geometric pre-divergence {ω,

    ξ} : nondegenerate, equiaffine. Rn+1∗ : the dual space of Rn+1, ⟨ , ⟩ : the canonical pairing of Rn+1 and its dual.   v : M → Rn+1∗ is the conormal map of {ω, ξ} def ⇐⇒ ⟨v(p), ξp⟩ = 1, ⟨v(p), ω(Xp)⟩ = 0.     We define a function on M × TM by ρ(q, X) = ⟨v(q), ω(X)⟩. ρ is called the geometric pre-divergence on M.     Theorem 1 (M, ∇, h) : a simply connected 1-conformally partially flat quasi statistical manifold =⇒ there exists a pre-contrast function which induces (M, ∇, h).  
  45. Geometric pre-divergences 45 6 Generalized projection theorems Theorem 2 {ω,

    ξ} : an affine distribution into Rn+1, (M, ∇, h) : a quasi statistical manifold induced from {ω, ξ}, with the quasi-dual connection ∇∗, ρ : the geometric quasi-divergence on (M, ∇, h), N ⊂ M : a submanifold of M, p ∈ M\N, q ∈ N, γ : the ∇∗-geodesic connecting p and q. Then γ ⊥ N at q (i.e. h(γ̇(0), V ) = 0, ∀V ∈ TqN) ⇐⇒ ρ(V, γ(t)) = 0. Remark 6.1 h(V, γ̇(0)) ≠ 0 in general
  46. Geometric pre-divergences 46 References [1] Haba, K., 1-Conformal geometry of

    quasi statistical manifolds, Inf. Geom., (2020). [2] Haba, K. and Matsuzoe, H., Complex affine distributions, Differential Geom. Appl., 75 (2021), 101734. [3] Henmi, M. and Matsuzoe, H., Statistical manifolds admitting torsion and partially flat spaces, Geometric Structures of Information, Signals Commun. Technol., Springer, Cham, 2019, 37–50. [4] Kurose, T., Dual connections and affine geometry, Math. Z., 203 (1990), no. 1, 115–121. [5] Kurose, T., On the divergences of 1-conformally flat statistical manifolds, Tohoku Math. J., 46 (1994), no. 3, 427–433. [6] Kurose, T., Statistical manifolds admitting torsion, Geometry and Something, 2007, in Japanese.
  47. Geometric pre-divergences 47 [7] Blaschke, W., Vorlesungen über Differentialgeometrie

    II, Affine Differentialgeometrie, Springer, Berlin, 1923. [8] Hotelling, H., Spaces of statistical parameters, Bull. Am. Math. Soc. (AMS), 36 (1930), 191. [9] Norden, A. P., Über Paare konjugierter Parallelübertragungen (On pairs of conjugate parallel transports, German), Trudy Semin. Vekt. Tenzorn. Anal., 4 (1937), 205–255. [10] Norden, A. P., On pairs of conjugate parallel translations in n-dimensional spaces, C.R. (Doklady) Acad. Sci. URSS (N.S.), 49 (1945), 625–628. [11] Rao, C. R., Information and the accuracy attainable in the estimation of statistical parameters, Bull. Calcutta Math. Soc., 37 (1945), 81–91. [12] Sen, R. N., On parallelism in Riemannian space, Bull. Calcutta Math. Soc., 36 (1944), 102–107. [13] Sen, R. N., On parallelism in Riemannian space. II, Bull. Calcutta Math. Soc., 37 (1945), 153–159.
  48. Geometric pre-divergences 48 [14] Sen, R. N., On parallelism in

    Riemannian space. III, Bull. Calcutta Math. Soc., 38 (1946), 161–167. [15] Čencov, N. N., A nonsymmetric distance between probability distributions, and entropy and the theorem of Pythagoras (Russian), Mat. Zametki, 4 (1968), 323–332. [16] Čencov, N. N., Statistical decision rules and optimal inference (Russian), Izdat. “Nauka”, Moscow, 1972, 520 pp. [17] Čencov, N. N., Statistical decision rules and optimal inference, translation from the Russian edited by Lev J. Leifman, Translations of Mathematical Monographs, 53, American Mathematical Society, Providence, R.I., 1982, viii+499 pp. [18] Csiszár, I., I-divergence geometry of probability distributions and minimization problems, Ann. Probability, 3 (1975), 146–158. [19] Nagaoka, H. and Amari, S., Differential geometry of smooth families of probability distributions, Technical Report METR 82-7, University of Tokyo, 1982.