Slide 1

Shrunk subspace via operator Sinkhorn iteration

Cole Franks (MIT/Gridspace), Tasuku Soma (MIT), Michel Goemans (MIT)

Slide 2

1 Overview

Slide 3

Edmonds' problem

Given: A = x_1 A_1 + · · · + x_p A_p (x_i: indeterminates, A_i ∈ C^{n×n}: matrices)
Determine: Is det(A) the zero polynomial?

• If we can use randomness, it is easy (Schwartz–Zippel lemma). Can we do it deterministically? → polynomial identity testing
• Deep connections to combinatorial optimization and complexity theory [Edmonds, 1967; Lovász, 1989; Murota, 2009]
• Computing rank A is difficult, but computing the non-commutative rank of A is in P [Gurvits, 2004; Garg et al., 2020]

Slide 4

Non-commutative rank

Amitsur (1966) and Cohn (2003):

    nc-rank A = min{ n − dim U + dim(∑_i A_i U) : U subspace of C^n }.

Slide 5

Non-commutative rank

Amitsur (1966) and Cohn (2003):

    nc-rank A = min{ n − dim U + dim(∑_i A_i U) : U subspace of C^n }.

Properties
• "rank" of A where x_i x_j ≠ x_j x_i; formally defined over the free skew field [Amitsur, 1966]
• rank A ≤ nc-rank A ≤ 2 rank A
• Many interesting special cases (e.g. bipartite matching, linear matroid intersection) satisfy det A ≡ 0 ⟺ nc-rank A < n

Shrunk subspace: an optimal subspace U on the RHS (a dual certificate of nc-rank).
Applications: separation over the Brascamp–Lieb polytope, fractional linear matroid matching, one-parameter subgroups for null-cone membership, etc.

Slide 6

Non-commutative rank

Amitsur (1966) and Cohn (2003):

    nc-rank A = min{ n − dim U + dim(∑_i A_i U) : U subspace of C^n }.

Algorithms for nc-rank
• Operator scaling [Garg et al., 2020; Allen-Zhu et al., 2018]
• Algebraic augmenting paths [Ivanyos, Qiao, and Subrahmanyam, 2018]
• Discrete convex analysis on CAT(0) spaces [Hamada and Hirai, 2021]

Slide 7

Non-commutative rank

Amitsur (1966) and Cohn (2003):

    nc-rank A = min{ n − dim U + dim(∑_i A_i U) : U subspace of C^n }.

Algorithms for nc-rank
• Operator scaling (aka operator Sinkhorn) [Garg et al., 2020; Allen-Zhu et al., 2018]: simple, but yields only the optimal value
• Algebraic augmenting paths [Ivanyos, Qiao, and Subrahmanyam, 2018]: can find U, but very complicated
• Discrete convex analysis on CAT(0) spaces [Hamada and Hirai, 2021]: can find U, but has a bit-complexity issue

Slide 8

Non-commutative rank

Amitsur (1966) and Cohn (2003):

    nc-rank A = min{ n − dim U + dim(∑_i A_i U) : U subspace of C^n }.

Algorithms for nc-rank
• Operator scaling (aka operator Sinkhorn) [Garg et al., 2020; Allen-Zhu et al., 2018]: simple, but yields only the optimal value
• Algebraic augmenting paths [Ivanyos, Qiao, and Subrahmanyam, 2018]: can find U, but very complicated
• Discrete convex analysis on CAT(0) spaces [Hamada and Hirai, 2021]: can find U, but has a bit-complexity issue

Can we find a shrunk subspace by a Sinkhorn-style algorithm?

Slide 9

Our result

Theorem (Franks, S., Goemans, 2023). There is a Sinkhorn-style algorithm that finds the minimum shrunk subspace in deterministic polynomial time.

• The RHS is submodular minimization on a modular lattice, f(U) + f(V) ≥ f(U + V) + f(U ∩ V), so a minimum minimizer exists.
• Applications: fractional linear matroid matching and separation over the rank-2 BL polytope
• Time complexity (n: matrix size, p: # of indeterminates):

    IQS18:      at least O(p n^18)
    HH21:       at least O(p n^17)
    this work:  Õ(n^12 (n + p))

Slide 10

In what follows...

Instead of nc-rank/operator scaling,

    nc-rank A = min{ n − dim U + dim(∑_i A_i U) : U subspace of C^n },

we'll focus on the special case of maximum matching/matrix scaling:

    max_{M: matching} |M| = min{ n − |X| + |Γ(X)| : X column subset }.

Slide 11

2 Algorithm

Slide 12

Matrix scaling

Input: nonnegative matrix A ∈ R_+^{n×n}, ε > 0
Output: positive diagonal matrices L, R s.t. ‖(LAR)1 − 1‖ ≤ ε and ‖(LAR)^⊤ 1 − 1‖ ≤ ε

A is scalable ⟺(def) for every ε > 0 there is a solution.

Theorem (Sinkhorn and Knopp (1967)). A is scalable ⟺ ∃ perfect matching in the support graph of A ⟺ |Γ(X)| ≥ |X| for every column subset X.

Example:

    ⎡ .3  .2  .5 ⎤
    ⎢  0  .8  .5 ⎥
    ⎣ .7   0   0 ⎦

[Figure: support graph of the example, with a column subset X and its neighborhood Γ(X)]
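The Sinkhorn–Knopp criterion is easy to test computationally; a small helper of my own, using SciPy's assignment solver to look for a perfect matching in the support graph:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def is_scalable(A):
    """A is scalable iff the support graph of A has a perfect matching
    (Sinkhorn-Knopp).  We search for one with an assignment solver."""
    rows, cols = linear_sum_assignment(-(A > 0).astype(float))
    return bool(A[rows, cols].min() > 0)  # all matched entries in the support

A = np.array([[.3, .2, .5],
              [0., .8, .5],
              [.7, 0., 0.]])
print(is_scalable(A))  # True: entries (1,3), (2,2), (3,1) form a matching
```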

Slide 13

Sinkhorn algorithm

For t = 0, 1, 2, . . .

    A^(2t+1) = Diag(A^(2t) 1)^{−1} A^(2t)              (scale rows to sum to 1)
    A^(2t+2) = A^(2t+1) Diag((A^(2t+1))^⊤ 1)^{−1}      (scale columns to sum to 1)
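A minimal NumPy sketch of this iteration (the tolerance and iteration cap are arbitrary choices of mine), run on the example matrix from the previous slide:

```python
import numpy as np

def sinkhorn(A, eps=1e-9, max_iter=10_000):
    """Alternately rescale rows and columns of A to sum to 1; returns the
    scaled matrix LAR and the diagonals of L and R."""
    A = np.asarray(A, dtype=float)
    l = np.ones(A.shape[0])                  # diagonal of L
    r = np.ones(A.shape[1])                  # diagonal of R
    for _ in range(max_iter):
        B = l[:, None] * A * r[None, :]
        if (np.abs(B.sum(axis=1) - 1).max() < eps and
                np.abs(B.sum(axis=0) - 1).max() < eps):
            break
        l /= (A * r[None, :]).sum(axis=1)    # row step: Diag(A 1)^{-1} A
        r /= (l[:, None] * A).sum(axis=0)    # column step
    return l[:, None] * A * r[None, :], l, r

A = np.array([[.3, .2, .5],
              [0., .8, .5],
              [.7, 0., 0.]])
B, l, r = sinkhorn(A)
print(B.sum(axis=1), B.sum(axis=0))  # both approach (1, 1, 1)
```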

Slide 14

Sinkhorn algorithm

For t = 0, 1, 2, . . .

    A^(2t+1) = Diag(A^(2t) 1)^{−1} A^(2t)              (scale rows to sum to 1)
    A^(2t+2) = A^(2t+1) Diag((A^(2t+1))^⊤ 1)^{−1}      (scale columns to sum to 1)

As convex optimization:

    f(x, y) = ∑_{i,j} a_ij e^{x_i + y_j} − ∑_i x_i − ∑_j y_j

• ‖∇f(x, y)‖ ≤ ε ⟺ (L, R) = (Diag(e^x), Diag(e^y)) is a solution
• Sinkhorn algorithm = alternating minimization for min_{x,y ∈ R^n} f(x, y)
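The equivalence with alternating minimization is a quick check (routine, and not spelled out on the slide): fixing y and minimizing f over x_i gives

```latex
\frac{\partial f}{\partial x_i}
  = \sum_{j} a_{ij}\, e^{x_i + y_j} - 1 = 0
  \quad\Longleftrightarrow\quad
  e^{x_i} = \frac{1}{(A e^{y})_i},
```

i.e. the exact minimizer in x rescales every row of Diag(e^x) A Diag(e^y) to sum to 1, which is precisely the update A^(2t+1) = Diag(A^(2t) 1)^{−1} A^(2t); the column step is symmetric.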

Slide 15

Dominant independent set

    max_{M: matching} |M| = min_{(S,T): independent set} [ 2n − |S| − |T| ]

• Maximum independent sets form a distributive lattice: (S, T), (S′, T′) independent ⟹ (S ∪ S′, T ∩ T′) and (S ∩ S′, T ∪ T′) independent
• Dominant independent set: the unique maximum independent set (S∗, T∗) with T∗ inclusion-wise minimum

Q. Can we find the dominant independent set with Sinkhorn? → new scaling problem: "matrix (k, r)-scaling"
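On a toy instance the lattice and the dominant pair can be computed by brute force; this is only an illustration of the definitions (exponential time, hypothetical 3×3 example), not the algorithm of the talk:

```python
from itertools import combinations
import numpy as np

def dominant_independent_set(A):
    """Among all (S, T) with A[i, j] == 0 for i in S, j in T, keep those
    maximizing |S| + |T|; the lattice property guarantees a unique one
    with inclusion-wise minimum T, which we return."""
    n = A.shape[0]
    maximizers, best = [], -1
    for size in range(n + 1):
        for T in map(set, combinations(range(n), size)):
            # The largest compatible S is forced: rows with no edge into T.
            S = {i for i in range(n) if all(A[i, j] == 0 for j in T)}
            if len(S) + len(T) > best:
                maximizers, best = [(S, T)], len(S) + len(T)
            elif len(S) + len(T) == best:
                maximizers.append((S, T))
    return min(maximizers, key=lambda p: (len(p[1]), sorted(p[1])))

A = np.array([[1, 1, 0],
              [0, 1, 0],
              [0, 0, 0]])
print(dominant_independent_set(A))  # ({0, 1, 2}, {2}): T* = {2} is minimum
```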

Slide 16

Matrix (k, r)-scaling

Input: nonnegative A ∈ R_+^{n×n}, ε > 0, k > 0, r ∈ Z_+
Find: positive diagonal L, R and z ≥ 0 s.t. Ã = e^z L A R satisfies

    Ã1 ≤ 1 (row sums),   Ã^⊤ 1 ≺_w α_r (column sums, majorization),   1^⊤ Ã 1 ≥ k − ε (total sum),

where α_r = (1, . . . , 1, 1 − 1/n, . . . , 1 − 1/n) consists of n − r ones followed by r copies of 1 − 1/n.

(Weak) majorization: for α, β ∈ R^n,

    α ≺_w β ⟺(def) α↓_1 + · · · + α↓_i ≤ β↓_1 + · · · + β↓_i   (i = 1, . . . , n),

where α↓_i is the i-th largest entry of α.

Examples: (0.5, 0.5, 0.5) ≺_w (1.0, 0.4, 0.1) and (0.9, 0.6, 0.0) ≺_w (1.5, 0.0, 0.0).
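Weak majorization is straightforward to check numerically; a tiny helper of my own mirroring the definition, verified on the slide's two examples:

```python
import numpy as np

def weakly_majorized(alpha, beta, tol=1e-12):
    """alpha <_w beta iff each prefix sum of the entries of alpha sorted
    in decreasing order is at most the corresponding prefix sum for beta."""
    a = np.cumsum(np.sort(alpha)[::-1])
    b = np.cumsum(np.sort(beta)[::-1])
    return bool(np.all(a <= b + tol))

print(weakly_majorized([0.5, 0.5, 0.5], [1.0, 0.4, 0.1]))  # True
print(weakly_majorized([0.9, 0.6, 0.0], [1.5, 0.0, 0.0]))  # True
```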

Slide 17

Intuition from the DM decomposition

Let k = k∗ (the size of a maximum matching) and let (S∗, T∗) be the dominant independent set. The Dulmage–Mendelsohn decomposition says:

    T∗  = {vertices exposed by some maximum matching}
    T̄∗  = {vertices contained in all maximum matchings}   (the complement of T∗)

    r ≤ |T∗| ⟺ (maximum matching polytope) ∩ {col sum ≺_w α_r} ≠ ∅

[Figure: DM decomposition of the bipartite graph, with the blocks T∗ and S∗ highlighted]

Slide 18

Sinkhorn for matrix (k, r)-scaling

For t = 0, 2, 4, . . .

1. If the total sum of A^(t) is < k − ε, scale up all entries with z so that the total sum is k − ε; set A^(t+1) ← e^z A^(t).

Slide 19

Sinkhorn for matrix (k, r)-scaling

For t = 0, 2, 4, . . .

1. If the total sum of A^(t) is < k − ε, scale up all entries with z so that the total sum is k − ε; set A^(t+1) ← e^z A^(t).
2. If A^(t+1) violates a row/column-sum constraint, update L or R to satisfy the most violated constraint; set A^(t+2) ← L A^(t+1) R.

Slide 20

Sinkhorn for matrix (k, r)-scaling

For t = 0, 2, 4, . . .

1. If the total sum of A^(t) is < k − ε, scale up all entries with z so that the total sum is k − ε; set A^(t+1) ← e^z A^(t).
2. If A^(t+1) violates a row/column-sum constraint, update L or R to satisfy the most violated constraint; set A^(t+2) ← L A^(t+1) R.
   • Update of L: scale down each row to sum ≤ 1.
   • Update of R: Kullback–Leibler projection of the column-sum vector of A^(t+1) onto the down-closure of the permutahedron of α_r (O(n^2) time [Suehiro et al., 2012]).
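For general r the R-update needs the projection algorithm of [Suehiro et al., 2012], but for r = 0 (so α_0 = (1, . . . , 1)) the down-closure of the permutahedron is just the box [0, 1]^n, and the KL projection reduces to capping each column sum at 1. A sketch of the iteration in that special case, with my own simplification of updating z, all rows, and all columns every pass instead of only the most violated constraint:

```python
import numpy as np

def kr_sinkhorn_r0(A, k, eps=1e-6, max_iter=10_000):
    """Matrix (k, 0)-scaling sketch: (1) scale the total sum up to k - eps,
    (2) scale rows down to sum <= 1, (3) cap column sums at 1 (the KL
    projection onto the box, which is what alpha_0 = (1,...,1) gives)."""
    B = np.asarray(A, dtype=float).copy()
    for _ in range(max_iter):
        if B.sum() < k - eps:
            B *= (k - eps) / B.sum()                     # z-step
        rows = B.sum(axis=1)
        B[rows > 1] /= rows[rows > 1][:, None]           # L-step
        cols = B.sum(axis=0)
        B[:, cols > 1] /= cols[cols > 1][None, :]        # R-step
        if B.sum() >= k - 2 * eps and B.sum(axis=1).max() <= 1 + eps:
            break
    return B

A = np.array([[.3, .2, .5],
              [0., .8, .5],
              [.7, 0., 0.]])
B = kr_sinkhorn_r0(A, k=3)  # k* = 3 here, so (3, 0)-scaling is feasible
print(B.sum(), B.sum(axis=1), B.sum(axis=0))
```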

Slide 21

Matrix (k, r)-scaling as convex optimization

    f_{k,r}(x, y, z) = ∑_{i,j} a_ij e^{−x_i − y_j + z} + ∑_i x_i + α_r · y↓ − kz

• For x, y ≥ 0, z ≥ 0: KKT point ⟺ (L, R, z) = (Diag(e^{−x}), Diag(e^{−y}), z) is a solution
• f is convex
• Sinkhorn algorithm = block coordinate descent for inf_{x,y≥0, z≥0} f(x, y, z) ⟹ O(1/√t) convergence
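The z-block of the coordinate descent recovers step 1 of the algorithm; a quick consistency check (mine, not on the slide):

```latex
\frac{\partial f_{k,r}}{\partial z}
  = \sum_{i,j} a_{ij}\, e^{-x_i - y_j + z} - k
  = \bigl(\text{total sum of } \tilde{A}\bigr) - k,
```

so the exact minimizer over z (subject to z ≥ 0, i.e. only scaling up) drives the total sum of the scaled matrix to k; the algorithm targets k − ε to retain slack.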

Slide 22

Matrix (k, r)-scaling

Theorem. The following conditions are equivalent:
1. inf_{x,y≥0, z≥0} f_{k,r}(x, y, z) > −∞
2. A is (k, r)-scalable
3. k ≤ k∗ and r ≤ r∗ := |T∗|

Further, one can check the above conditions with polynomially many Sinkhorn iterations. For (k, r) = (k∗, r∗ + 1), the dominant independent set can be found via polynomially many Sinkhorn iterations.

(Recall f_{k,r}(x, y, z) = ∑_{i,j} a_ij e^{−x_i − y_j + z} + ∑_i x_i + α_r · y↓ − kz.)

Slide 23

Algorithm for dominant independent set

1. Compute k∗ := size of a maximum matching (standard Sinkhorn iteration).
2. Compute r∗ := |T∗| by (k∗, r)-scaling (Sinkhorn iteration for (k, r)-scaling).
3. For (k, r) = (k∗, r∗ + 1), find (x, y, z) with f_{k,r}(x, y, z) small (Sinkhorn iteration for (k, r)-scaling).
4. Round (x, y, z) to the dominant independent set (S∗, T∗) (combinatorial operations).

Slide 24

Extension to shrunk subspace

(S, T) is an independent subspace pair ⟺(def) tr(Π_S A_i Π_T) = 0 for every i,
i.e., in suitable bases each A_i has the block form

    A_i ∼ ⎡ ∗  ∗ ⎤
          ⎣ ∗  O ⎦

with the zero block O having rows indexed by S and columns by T.

    nc-rank A = min{ 2n − dim S − dim T : (S, T) independent subspace pair }.

Slide 25

Extension to shrunk subspace

(S, T) is an independent subspace pair ⟺(def) tr(Π_S A_i Π_T) = 0 for every i,
i.e., A_i ∼ [ ∗ ∗ ; ∗ O ] with the zero block indexed by (S, T).

    nc-rank A = min{ 2n − dim S − dim T : (S, T) independent subspace pair }.

Operator (k, r)-scaling
Input: A_1, . . . , A_p ∈ C^{n×n}, ε > 0, k > 0, r ∈ Z_+
Find: nonsingular L, R and z ≥ 0 s.t. Ã_i = e^z L A_i R satisfies

    ∑_{i=1}^p Ã_i Ã_i† ⪯ I,    λ(∑_{i=1}^p Ã_i† Ã_i) ≺_w α_r,    tr ∑_{i=1}^p Ã_i Ã_i† ≥ k − ε

(λ(·): vector of eigenvalues).

• (k, r)-scalable ⟺ k ≤ nc-rank A and r ≤ dim T∗
• We can find the dominant independent subspace (S∗, T∗) via operator Sinkhorn iteration. But rounding becomes intricate due to the continuous freedom of subspaces! (a bit-complexity bound and stability are needed)
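For comparison, plain operator Sinkhorn (the basic normalization of Gurvits underlying these variants, not the (k, r) version) is only a few lines; computing the inverse square roots via eigendecomposition is an implementation choice of this sketch:

```python
import numpy as np

def inv_sqrt(P):
    """Inverse square root of a Hermitian positive definite matrix."""
    w, V = np.linalg.eigh(P)
    return (V / np.sqrt(w)) @ V.conj().T

def operator_sinkhorn(mats, eps=1e-9, max_iter=1000):
    """Alternately enforce sum_i Ai Ai^dag = I and sum_i Ai^dag Ai = I;
    the iteration converges iff the tuple is scalable."""
    As = [np.asarray(A, dtype=complex) for A in mats]
    n = As[0].shape[0]
    for _ in range(max_iter):
        L = inv_sqrt(sum(A @ A.conj().T for A in As))   # left normalization
        As = [L @ A for A in As]
        R = inv_sqrt(sum(A.conj().T @ A for A in As))   # right normalization
        As = [A @ R for A in As]
        if np.linalg.norm(sum(A @ A.conj().T for A in As) - np.eye(n)) < eps:
            break
    return As

rng = np.random.default_rng(1)
As = operator_sinkhorn([rng.standard_normal((2, 2)) for _ in range(2)])
print(np.round(sum(A @ A.conj().T for A in As), 6))  # close to I
print(np.round(sum(A.conj().T @ A for A in As), 6))  # close to I
```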

Slide 26

3 Conclusion

Slide 27

Conclusion

Summary
• A simpler algorithm for finding a shrunk subspace, based on Sinkhorn iteration
• New operator scaling problems: (k, r)-scaling and majorization scaling
• Applications: fractional linear matroid matching and the rank-2 BL polytope

Outlook
• Applications of the new scaling problems to other problems?

Slide 28

References I

Allen-Zhu, Z., A. Garg, Y. Li, R. Oliveira, and A. Wigderson (2018). "Operator Scaling via Geodesically Convex Optimization, Invariant Theory and Polynomial Identity Testing". In: Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing (STOC), pp. 172–181.
Amitsur, S. (1966). "Rational identities and applications to algebra and geometry". In: Journal of Algebra 3.3, pp. 304–359.
Cohn, P. M. (2003). "Skew fields". In: Further Algebra and Applications. Springer, pp. 343–370.
Edmonds, J. (1967). "Systems of distinct representatives and linear algebra". In: Journal of Research of the National Bureau of Standards 71B, pp. 241–245.
Garg, A., L. Gurvits, R. Oliveira, and A. Wigderson (2020). "Operator Scaling: Theory and Applications". In: Foundations of Computational Mathematics, pp. 223–290.
Gurvits, L. (2004). "Classical complexity and quantum entanglement". In: Journal of Computer and System Sciences 69.3, pp. 448–484.
Hamada, M. and H. Hirai (2021). "Computing the nc-rank via discrete convex optimization on CAT(0) spaces". In: SIAM Journal on Applied Algebra and Geometry 5.3, pp. 455–478.
Ivanyos, G., Y. Qiao, and K. V. Subrahmanyam (2017). "Non-commutative Edmonds' problem and matrix semi-invariants". In: computational complexity 26.3, pp. 717–763.
— (2018). "Constructive non-commutative rank computation is in deterministic polynomial time". In: computational complexity 27.4, pp. 561–593.
Lovász, L. (1989). "Singular spaces of matrices and their application in combinatorics". In: Bulletin of the Brazilian Mathematical Society 20, pp. 87–99.
Murota, K. (2009). Matrices and Matroids for System Analysis. 2nd ed. Springer-Verlag, Berlin.

Slide 29

References II

Sinkhorn, R. and P. Knopp (1967). "Concerning nonnegative matrices and doubly stochastic matrices". In: Pacific Journal of Mathematics 21, pp. 343–348.
Suehiro, D., K. Hatano, S. Kijima, E. Takimoto, and K. Nagano (2012). "Online prediction under submodular constraints". In: Proceedings of the 23rd International Conference on Algorithmic Learning Theory (ALT), pp. 260–274.

Slide 30

Dominant shrunk subspace

    A = ∑_i x_i A_i   ⟷   Φ(X) = ∑_i A_i X A_i†

S, T ≤ C^n is an independent subspace pair ⟺(def) Π_S • Φ(Π_T) = 0 (Π_S := orthogonal projector onto S)

    ⟺ A_i ∼ ⎡ ∗  ∗ ⎤   (i = 1, . . . , p), with the zero block indexed by (S, T)
            ⎣ ∗  O ⎦

    nc-rank A = min{ 2n − dim S − dim T : (S, T) independent subspace pair }.

Slide 31

Dominant shrunk subspace

    A = ∑_i x_i A_i   ⟷   Φ(X) = ∑_i A_i X A_i†

S, T ≤ C^n is an independent subspace pair ⟺(def) Π_S • Φ(Π_T) = 0 (Π_S := orthogonal projector onto S)

    ⟺ A_i ∼ ⎡ ∗  ∗ ⎤   (i = 1, . . . , p), with the zero block indexed by (S, T)
            ⎣ ∗  O ⎦

    nc-rank A = min{ 2n − dim S − dim T : (S, T) independent subspace pair }.

• Maximum independent subspace pairs form a modular lattice: (S, T), (S′, T′) independent ⟹ (S ∩ S′, T + T′) and (S + S′, T ∩ T′) independent
• ∴ ∃ maximum independent subspace pair (S∗, T∗) with T∗ minimum (the dominant independent subspace)

Idea: find the dominant independent subspace by "operator (k, r)-scaling"

Slide 32

Algorithm for dominant shrunk subspace

1. Compute k∗ := nc-rank A by k-operator scaling.
2. Compute r∗ := dim T∗ by (k∗, r)-operator scaling.
3. For (k, r) = (k∗, r∗ + 1), find (X, Y, z) with f_{k,r}(X, Y, z) small via Sinkhorn iteration.
4. Apply the rounding for a nonnegative matrix constructed from (X, Y, z) to obtain an ε-independent subspace pair (S, T).
5. Compute the orthogonal projector Π_T onto T.
6. Round the entries of Π_T to the nearest poly-bit rationals to obtain Π_{T∗}.

Rounding becomes intricate due to the continuous freedom of unitary matrices.

Slide 33

Rounding (for matrix scaling)

Observation. If f_k(x, y, z) is sufficiently small, then (x̃, ỹ) = (x/z, y/z) is a fractional vertex cover of size ≤ k.

• Sort the rows and columns of A in nonincreasing order of x̃ and ỹ, respectively, and let I := {(i, j) : i + j = ⌈k⌉ + 1}.
• (1/|I|) ∑_{(i,j)∈I} (x̃_i + ỹ_j) < 1
• Hence ∃(i, j) ∈ I with x̃_i + ỹ_j < 1, so a_{ij} = 0; by the sorting, the same holds for every (i′, j′) with i′ ≥ i and j′ ≥ j.
• Therefore (S, T) = ({i, . . . , n}, {j, . . . , n}) is independent and has size > 2n − k.

[Figure: sorted matrix with the antidiagonal I and the zero block indexed by (S, T)]

For operator scaling we only obtain (S, T) s.t. tr(Π_S A_i Π_T) ≤ ε for every i.
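A sketch of this rounding step (my rendering of the observation; it assumes a strict inequality holds somewhere on the antidiagonal, which the slack in f_k is meant to provide, and that (x, y) is an exact cover):

```python
import numpy as np

def round_to_independent_set(A, x, y, k):
    """Given a fractional vertex cover (x, y) of the support of A
    (x_i + y_j >= 1 whenever a_ij > 0) of size <= k, extract an
    independent pair (S, T) with |S| + |T| = 2n - ceil(k) + 1."""
    n = A.shape[0]
    row_ord = np.argsort(-x)        # rows sorted by nonincreasing x
    col_ord = np.argsort(-y)        # columns sorted by nonincreasing y
    xs, ys = x[row_ord], y[col_ord]
    m = int(np.ceil(k)) + 1         # antidiagonal: i + j = m (1-indexed)
    for i in range(max(0, m - 1 - n), min(n, m - 1)):   # 0-indexed row i
        j = m - 2 - i                                    # 0-indexed column j
        if xs[i] + ys[j] < 1:
            # By sortedness, xs[i'] + ys[j'] < 1 for all i' >= i, j' >= j,
            # so A vanishes on S x T and (S, T) is independent.
            return set(row_ord[i:].tolist()), set(col_ord[j:].tolist())
    return None   # no strict pair on the antidiagonal (no slack)
```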

Slide 34

Rounding for operator scaling

Lemma (stability lemma). If subspaces S, T satisfy Π_S • Φ(Π_T) ≤ ε, dim S + dim T = 2n − k∗, and dim T ≤ r∗, then

    ‖Π_S − Π_{S∗}‖_2, ‖Π_T − Π_{T∗}‖_2 ≤ e^{Õ(n^3)} ε.

Lemma (bit complexity). Each entry of Π_{T∗} is a rational number with bit length M_1 = Õ(n^5).
Proof idea: the minimum shrunk subspace is the limit of a Wong sequence (cf. [Ivanyos, Qiao, and Subrahmanyam, 2017]).

For ε := e^{−Õ(n^5)}, one can round each entry of Π_T to the nearest rational with bit length ≤ M_1 to obtain Π_{T∗}.

Slide 35

Majorization operator scaling

Input: CP map Φ: X ↦ ∑_i A_i X A_i†, nonnegative vectors p, q ∈ R_+^n, ε, k > 0
Find: nonsingular L, R and z ≥ 0 s.t. Φ̃ = e^z Φ_{L,R} satisfies

    Φ̃(I) ≺_w p,    Φ̃∗(I) ≺_w q,    tr Φ̃(I) ≥ k

(p, q) = (1, α_r) recovers (k, r)-scaling.

As g-convex optimization:

    f(X, Y, z) = e^z X^{−1} • Φ(Y^{−1}) + α^⊤ λ(log X) + β^⊤ λ(log Y) − kz

A similar theorem and Sinkhorn algorithm hold as for (k, r)-scaling.