
Shrunk subspace via operator Sinkhorn iteration


SODA23

Tasuku Soma

January 23, 2023



Transcript

  1. Shrunk subspace via operator Sinkhorn iteration
     Cole Franks (MIT/Gridspace), Tasuku Soma (MIT), Michel Goemans (MIT)
  2. 1 Overview

  3. Edmonds’ problem
     Given: A = x_1 A_1 + ⋯ + x_p A_p  (x_i: indeterminates, A_i ∈ C^{n×n}: matrices)
     Determine: Is det(A) the zero polynomial?
     • With randomness this is easy (Schwartz–Zippel lemma). Can we do it
       deterministically? −→ polynomial identity testing
     • Deep connections to combinatorial optimization and complexity theory
       [Edmonds, 1967; Lovász, 1989; Murota, 2009]
     • Computing rank A is difficult, but computing the non-commutative rank of A
       is in P [Gurvits, 2004; Garg et al., 2020]
  4. Non-commutative rank
     Amitsur (1966) and Cohn (2003):
       nc-rank A = min{ n − dim U + dim(∑_i A_i U) : U subspace of C^n }.
  5. Non-commutative rank
       nc-rank A = min{ n − dim U + dim(∑_i A_i U) : U subspace of C^n }.
     Properties
     • The “rank” of A when x_i x_j ≠ x_j x_i; formally defined over the free
       skew field [Amitsur, 1966]
     • rank A ≤ nc-rank A ≤ 2 rank A
     • Many interesting special cases (e.g. bipartite matching, linear matroid
       intersection) satisfy det A ≡ 0 ⟺ nc-rank A < n
     Shrunk subspace: an optimal subspace U in the RHS (a dual certificate of
     nc-rank). Applications: separation over the Brascamp–Lieb polytope,
     fractional linear matroid matching, one-parameter subgroups for null-cone
     membership, etc.
  6. Non-commutative rank
       nc-rank A = min{ n − dim U + dim(∑_i A_i U) : U subspace of C^n }.
     Algorithms for nc-rank
     • Operator scaling [Garg et al., 2020; Allen-Zhu et al., 2018]
     • Algebraic augmenting paths [Ivanyos, Qiao, and Subrahmanyam, 2018]
     • Discrete convex analysis on CAT(0) spaces [Hamada and Hirai, 2021]
  7. Non-commutative rank
       nc-rank A = min{ n − dim U + dim(∑_i A_i U) : U subspace of C^n }.
     Algorithms for nc-rank
     • Operator scaling (aka operator Sinkhorn) [Garg et al., 2020; Allen-Zhu
       et al., 2018]: simple, but yields only the optimal value
     • Algebraic augmenting paths [Ivanyos, Qiao, and Subrahmanyam, 2018]:
       can find U, but very complicated
     • Discrete convex analysis on CAT(0) spaces [Hamada and Hirai, 2021]:
       can find U, but has bit-complexity issues
  8. Non-commutative rank
       nc-rank A = min{ n − dim U + dim(∑_i A_i U) : U subspace of C^n }.
     Algorithms for nc-rank
     • Operator scaling (aka operator Sinkhorn) [Garg et al., 2020; Allen-Zhu
       et al., 2018]: simple, but yields only the optimal value
     • Algebraic augmenting paths [Ivanyos, Qiao, and Subrahmanyam, 2018]:
       can find U, but very complicated
     • Discrete convex analysis on CAT(0) spaces [Hamada and Hirai, 2021]:
       can find U, but has bit-complexity issues
     Can we find the shrunk subspace by a Sinkhorn-style algorithm?
  9. Our result
     Theorem (Franks, S., Goemans, 2023). There is a Sinkhorn-style algorithm
     that finds the minimum shrunk subspace in deterministic polynomial time.
     • The RHS is submodular minimization on a modular lattice:
       f(U) + f(V) ≥ f(U + V) + f(U ∩ V), so a minimum minimizer exists.
     • Applications: fractional linear matroid matching and separation over the
       rank-2 BL polytope
     • Time complexity (n: matrix size, p: # of indeterminates):
       IQS18: at least O(p n^18);  HH21: at least O(p n^17);
       this work: Õ(n^12 (n + p))
 10. In what follows...
     Instead of nc-rank / operator scaling,
       nc-rank A = min{ n − dim U + dim(∑_i A_i U) : U subspace of C^n },
     we’ll focus on the special case of maximum matching / matrix scaling:
       max_{M: matching} |M| = min{ n − |X| + |Γ(X)| : X column subset }.
 11. 2 Algorithm

 12. Matrix scaling
     Input: nonnegative matrix A ∈ R_+^{n×n}, ε > 0
     Output: positive diagonal matrices L, R s.t.
       ∥(LAR)1 − 1∥ ≤ ε  and  ∥(LAR)^⊤ 1 − 1∥ ≤ ε
     A is scalable (def) ⟺ for every ε > 0 there is a solution.
     Theorem (Sinkhorn and Knopp (1967)). A is scalable
       ⟺ the support graph of A has a perfect matching
       ⟺ |Γ(X)| ≥ |X| for every column subset X
     Example:
       A = [ .3  .2  .5
              0  .8  .5
             .7   0   0 ]
     (figure: bipartite support graph with a column subset X and its
     neighborhood Γ(X))
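The Sinkhorn–Knopp condition |Γ(X)| ≥ |X| can be checked by brute force for small instances like the 3×3 example above. A minimal sketch in Python/NumPy (function name is my own, not from the talk):

```python
from itertools import combinations

import numpy as np


def is_scalable(A):
    """Check Hall's condition |Γ(X)| >= |X| for every column subset X
    of the support graph of A (brute force; only for small matrices)."""
    n = A.shape[1]
    support = A > 0
    for size in range(1, n + 1):
        for X in combinations(range(n), size):
            # Γ(X): rows with a nonzero entry in some column of X
            gamma = np.flatnonzero(support[:, list(X)].any(axis=1))
            if len(gamma) < len(X):
                return False
    return True
```

On the slide’s example matrix this returns True, since its support graph has a perfect matching; zeroing out an entire row would break Hall’s condition.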
 13. Sinkhorn algorithm
     For t = 0, 1, 2, ...
       A^(2t+1) = Diag(A^(2t) 1)^{−1} A^(2t)              (scale rows to sum to 1)
       A^(2t+2) = A^(2t+1) Diag((A^(2t+1))^⊤ 1)^{−1}      (scale columns to sum to 1)
 14. Sinkhorn algorithm
     For t = 0, 1, 2, ...
       A^(2t+1) = Diag(A^(2t) 1)^{−1} A^(2t)              (scale rows to sum to 1)
       A^(2t+2) = A^(2t+1) Diag((A^(2t+1))^⊤ 1)^{−1}      (scale columns to sum to 1)
     As convex optimization:
       f(x, y) = ∑_{i,j} a_ij e^{x_i + y_j} − ∑_i x_i − ∑_j y_j
     • ∥∇f(x, y)∥ ≤ ε ⟺ (L, R) = (Diag(e^x), Diag(e^y)) is a solution
     • Sinkhorn algorithm = alternating minimization for min_{x,y ∈ R^n} f(x, y)
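The two alternating updates above can be sketched in a few lines of NumPy (my own function names, not the paper’s code):

```python
import numpy as np


def sinkhorn(A, eps=1e-9, max_iter=10_000):
    """Alternate row and column normalization of a nonnegative matrix A.

    Returns diagonals (l, r) of L, R such that LAR is approximately
    doubly stochastic when A is scalable."""
    n = A.shape[0]
    r = np.ones(n)
    for _ in range(max_iter):
        l = 1.0 / (A @ r)        # rows of diag(l) A diag(r) now sum to 1
        r = 1.0 / (A.T @ l)      # then columns sum to 1
        B = l[:, None] * A * r[None, :]
        if (np.linalg.norm(B.sum(axis=1) - 1) <= eps
                and np.linalg.norm(B.sum(axis=0) - 1) <= eps):
            break
    return l, r
```

On strictly positive matrices the iteration converges quickly; for matrices that are only approximately scalable (like the slide’s example, whose (1,1) entry lies in no perfect matching of the support) convergence can be much slower.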
 15. Dominant independent set
       max_{M: matching} |M| = min_{(S,T): independent set} [2n − |S| − |T|]
     • Maximum independent sets form a distributive lattice:
       (S, T), (S′, T′) independent ⟹ (S ∪ S′, T ∩ T′), (S ∩ S′, T ∪ T′)
       independent
     • Dominant independent set: the unique maximum independent set (S∗, T∗)
       with T∗ inclusion-wise minimum
     Q. Can we find the dominant independent set with Sinkhorn?
     −→ new scaling problem: “matrix (k, r)-scaling”
 16. Matrix (k, r)-scaling
     Input: nonnegative A ∈ R_+^{n×n}, ε > 0, k > 0, r ∈ Z_+
     Find: positive diagonal L, R and z ≥ 0 s.t. Ã = e^z LAR satisfies
       Ã1 ≤ 1 (row sums),  Ã^⊤ 1 ≺_w α_r (column-sum majorization),
       1^⊤ Ã 1 ≥ k − ε (total sum),
     where α_r = (1, ..., 1, 1 − 1/n, ..., 1 − 1/n): n − r ones followed by
     r copies of 1 − 1/n.
     (Weak) majorization: for α, β ∈ R^n,
       α ≺_w β (def) ⟺ α↓_1 + ⋯ + α↓_i ≤ β↓_1 + ⋯ + β↓_i  (i = 1, ..., n),
     where α↓_i denotes the i-th largest entry of α.
     Examples: (0.5, 0.5, 0.5) ≺_w (1.0, 0.4, 0.1);
               (0.9, 0.6, 0.0) ≺_w (1.5, 0.0, 0.0)
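Weak majorization and the target vector α_r are easy to state in code. A small sketch (names are my own):

```python
import numpy as np


def weakly_majorized(alpha, beta, tol=1e-12):
    """alpha ≺_w beta: each prefix sum of alpha sorted in nonincreasing
    order is at most the corresponding prefix sum of beta."""
    a = np.cumsum(np.sort(alpha)[::-1])
    b = np.cumsum(np.sort(beta)[::-1])
    return bool(np.all(a <= b + tol))


def alpha_r(n, r):
    """The target vector α_r: n − r ones followed by r copies of 1 − 1/n."""
    return np.concatenate([np.ones(n - r), np.full(r, 1 - 1 / n)])
```

Both examples from the slide check out: each prefix sum of (0.5, 0.5, 0.5) is dominated by the corresponding prefix sum of (1.0, 0.4, 0.1), and similarly for the second pair.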
 17. Intuition from the DM decomposition
     Let k = k∗ (the size of a maximum matching) and let (S∗, T∗) be the
     dominant independent set. The Dulmage–Mendelsohn decomposition says:
       T∗ = {vertices exposed by some maximum matching},
       and its complement consists of the vertices covered by every maximum
       matching. Moreover,
       r ≤ |T∗| ⟺ maximum-matching polytope ∩ {col sum ≺_w α_r} ≠ ∅.
     (figure: DM decomposition with the blocks S∗ and T∗ highlighted)
 18. Sinkhorn for matrix (k, r)-scaling
     For t = 0, 2, 4, ...
     1. If the total sum of A^(t) is < k − ε, scale all entries up (via z) so
        that the total is k − ε; set A^(t+1) ← e^z A^(t).
 19. Sinkhorn for matrix (k, r)-scaling
     For t = 0, 2, 4, ...
     1. If the total sum of A^(t) is < k − ε, scale all entries up (via z) so
        that the total is k − ε; set A^(t+1) ← e^z A^(t).
     2. If A^(t+1) violates a row/column-sum constraint, update L or R to
        satisfy the most violated constraint; set A^(t+2) ← L A^(t+1) R.
 20. Sinkhorn for matrix (k, r)-scaling
     For t = 0, 2, 4, ...
     1. If the total sum of A^(t) is < k − ε, scale all entries up (via z) so
        that the total is k − ε; set A^(t+1) ← e^z A^(t).
     2. If A^(t+1) violates a row/column-sum constraint, update L or R to
        satisfy the most violated constraint; set A^(t+2) ← L A^(t+1) R.
        Update of L: scale down each row to sum ≤ 1.
        Update of R: Kullback–Leibler projection of the column-sum vector
        (A^(t+1))^⊤ 1 onto the down-closure of the permutahedron of α_r
        (O(n^2) time [Suehiro et al., 2012]).
 21. Matrix (k, r)-scaling as convex optimization
       f_{k,r}(x, y, z) = ∑_{i,j} a_ij e^{−x_i − y_j + z} + ∑_i x_i + α_r · y↓ − kz
       over x, y ≥ 0, z ≥ 0
     • KKT point ⟺ (L, R, z) = (Diag(e^x), Diag(e^y), z) is a solution
     • f is convex
     • Sinkhorn algorithm = block coordinate descent for
       inf_{x,y≥0, z≥0} f_{k,r}(x, y, z) ⟹ O(1/√t) convergence
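To make the block-coordinate-descent claim concrete, here is a sketch (my own code, not the paper’s) of the objective f_{k,r} and the exact x-block minimization, which is precisely the “scale each row down to sum ≤ 1” step from the previous slide:

```python
import numpy as np


def f_kr(A, x, y, z, alpha, k):
    """f_{k,r}(x, y, z) = sum_ij a_ij e^{-x_i - y_j + z}
    + sum_i x_i + alpha · y↓ - k z."""
    return (np.sum(A * np.exp(-x[:, None] - y[None, :] + z))
            + x.sum() + alpha @ np.sort(y)[::-1] - k * z)


def x_block_update(A, y, z):
    """Exact minimization of f over x >= 0 with (y, z) fixed.

    Coordinate-wise, d/dx_i [c_i e^{-x_i} + x_i] = 0 gives x_i = log c_i,
    clipped to x_i >= 0; e^{-x_i} is then the row-scaling factor."""
    c = (A * np.exp(-y[None, :] + z)).sum(axis=1)
    return np.maximum(0.0, np.log(c))
```

Since each coordinate problem is convex, clipping the unconstrained minimizer at 0 is the correct projection, so the update can never increase f.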
 22. Matrix (k, r)-scaling
     Theorem. The following conditions are equivalent:
     1. inf_{x,y≥0, z≥0} f_{k,r}(x, y, z) > −∞
     2. A is (k, r)-scalable
     3. k ≤ k∗ and r ≤ r∗ := |T∗|
     Further, these conditions can be checked with polynomially many Sinkhorn
     iterations. For (k, r) = (k∗, r∗ + 1), the dominant independent set can be
     found via polynomially many Sinkhorn iterations.
 23. Algorithm for the dominant independent set
     1. Compute k∗ (the size of a maximum matching) by standard Sinkhorn
        iteration.
     2. Compute r∗ := |T∗| by (k∗, r)-scaling (Sinkhorn iteration for
        (k, r)-scaling).
     3. For (k, r) = (k∗, r∗ + 1), find (x, y, z) with f_{k,r}(x, y, z) small.
     4. Round (x, y, z) to the dominant independent set (S∗, T∗)
        (combinatorial operations).
 24. Extension to shrunk subspaces
     (S, T) is an independent subspace (def) ⟺ tr(Π_S A_i Π_T) = 0 for all i.
       nc-rank A = min{ 2n − dim S − dim T : (S, T) independent subspace }.
     (figure: each A_i is equivalent to a block matrix whose block indexed by
     S and T is O)
 25. Extension to shrunk subspaces
     (S, T) is an independent subspace (def) ⟺ tr(Π_S A_i Π_T) = 0 for all i.
       nc-rank A = min{ 2n − dim S − dim T : (S, T) independent subspace }.
     Operator (k, r)-scaling
     Input: A_1, ..., A_p ∈ C^{n×n}, ε > 0, k > 0, r ∈ Z_+
     Find: nonsingular L, R and z ≥ 0 s.t. Ã_i = e^z L A_i R satisfies
       ∑_{i=1}^p Ã_i Ã_i† ⪯ I,   λ(∑_{i=1}^p Ã_i† Ã_i) ≺_w α_r,
       tr ∑_{i=1}^p Ã_i Ã_i† ≥ k − ε.
     • (k, r)-scalable ⟺ k ≤ nc-rank A and r ≤ dim T∗
     • We can find the dominant independent subspace (S∗, T∗) via operator
       Sinkhorn iteration. But rounding becomes intricate due to the continuous
       freedom in choosing subspaces (bit-complexity bounds and stability are
       needed).
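For context, one full round of the standard operator Sinkhorn iteration (without the (k, r) modification) looks as follows. This is a generic sketch under my own naming, not the paper’s implementation:

```python
import numpy as np


def inv_sqrt(M):
    """Inverse principal square root of a positive definite Hermitian matrix."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.conj().T


def operator_sinkhorn_step(As):
    """Left-normalize so that sum_i A_i A_i† = I, then right-normalize so
    that sum_i A_i† A_i = I (one round of operator Sinkhorn iteration)."""
    L = inv_sqrt(sum(A @ A.conj().T for A in As))
    As = [L @ A for A in As]
    R = inv_sqrt(sum(A.conj().T @ A for A in As))
    return [A @ R for A in As]
```

After a step, the right marginal ∑_i Ã_i† Ã_i is exactly the identity, while the left marginal is only approximately normalized; alternating the two marginals is the operator analogue of row/column normalization.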
 26. 3 Conclusion

 27. Conclusion
     Summary
     • A simpler algorithm for finding the shrunk subspace, based on Sinkhorn
       iteration
     • New operator scaling problems: (k, r)-scaling and majorization scaling
     • Applications: fractional linear matroid matching and the rank-2 BL
       polytope
     Outlook
     • Applications of the new scaling problems to other problems?
 28. References I
     Allen-Zhu, Z., A. Garg, Y. Li, R. Oliveira, and A. Wigderson (2018). “Operator Scaling via Geodesically Convex Optimization, Invariant Theory and Polynomial Identity Testing”. In: Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing (STOC), pp. 172–181.
     Amitsur, S. (1966). “Rational identities and applications to algebra and geometry”. In: Journal of Algebra 3.3, pp. 304–359.
     Cohn, P. M. (2003). “Skew fields”. In: Further Algebra and Applications. Springer, pp. 343–370.
     Edmonds, J. (1967). “Systems of distinct representatives and linear algebra”. In: Journal of Research of the National Bureau of Standards B71, pp. 241–245.
     Garg, A., L. Gurvits, R. Oliveira, and A. Wigderson (2020). “Operator Scaling: Theory and Applications”. In: Foundations of Computational Mathematics, pp. 223–290.
     Gurvits, L. (2004). “Classical complexity and quantum entanglement”. In: Journal of Computer and System Sciences 69.3, pp. 448–484.
     Hamada, M. and H. Hirai (2021). “Computing the nc-rank via discrete convex optimization on CAT(0) spaces”. In: SIAM Journal on Applied Algebra and Geometry 5.3, pp. 455–478. doi: 10.1137/20M138836X.
     Ivanyos, G., Y. Qiao, and K. V. Subrahmanyam (2017). “Non-commutative Edmonds’ problem and matrix semi-invariants”. In: Computational Complexity 26.3, pp. 717–763.
     Ivanyos, G., Y. Qiao, and K. V. Subrahmanyam (2018). “Constructive non-commutative rank computation is in deterministic polynomial time”. In: Computational Complexity 27.4, pp. 561–593.
     Lovász, L. (1989). “Singular spaces of matrices and their application in combinatorics”. In: Bulletin of the Brazilian Mathematical Society 20, pp. 87–99.
     Murota, K. (2009). Matrices and Matroids for System Analysis. 2nd ed. Springer-Verlag, Berlin.
 29. References II
     Sinkhorn, R. and P. Knopp (1967). “Concerning nonnegative matrices and doubly stochastic matrices”. In: Pacific Journal of Mathematics 21, pp. 343–348.
     Suehiro, D., K. Hatano, S. Kijima, E. Takimoto, and K. Nagano (2012). “Online prediction under submodular constraints”. In: Proceedings of the 23rd International Conference on Algorithmic Learning Theory (ALT), pp. 260–274.
 30. Dominant shrunk subspace
       A = ∑_i x_i A_i  ←→  Φ(X) = ∑_i A_i X A_i†
     S, T ≤ C^n is an independent subspace (def) ⟺ Π_S • Φ(Π_T) = 0
     (Π_S: the orthogonal projector onto S)
     ⟺ each A_i is equivalent to a block matrix whose block indexed by S and T
     is O (i = 1, ..., p).
       nc-rank A = min{ 2n − dim S − dim T : (S, T) independent subspace }.
 31. Dominant shrunk subspace
       A = ∑_i x_i A_i  ←→  Φ(X) = ∑_i A_i X A_i†
     S, T ≤ C^n is an independent subspace (def) ⟺ Π_S • Φ(Π_T) = 0
     (Π_S: the orthogonal projector onto S)
     ⟺ each A_i is equivalent to a block matrix whose block indexed by S and T
     is O (i = 1, ..., p).
       nc-rank A = min{ 2n − dim S − dim T : (S, T) independent subspace }.
     • Maximum independent subspaces form a modular lattice:
       (S, T), (S′, T′) independent ⟹ (S ∩ S′, T + T′), (S + S′, T ∩ T′)
       independent
     • Hence there is a maximum independent subspace (S, T) with T minimum
       (the dominant independent subspace).
     Idea: find the dominant independent subspace by “operator (k, r)-scaling”.
 32. Algorithm for the dominant shrunk subspace
     1. Compute k∗ := nc-rank A by k-operator scaling.
     2. Compute r∗ := dim T∗ by (k∗, r)-operator scaling.
     3. For (k, r) = (k∗, r∗ + 1), find (X, Y, z) with f_{k,r}(X, Y, z) small
        via Sinkhorn iteration.
     4. Apply the rounding for a nonnegative matrix constructed from (X, Y, z)
        to obtain an ε-independent subspace (S, T).
     5. Compute the orthogonal projector Π_T onto T.
     6. Round the entries of Π_T to the nearest poly-bit rationals to obtain
        Π_{T∗}.
     Rounding becomes intricate due to the continuous freedom of unitary
     matrices.
 33. Rounding (for matrix scaling)
     Observation: if f_k(x, y, z) is sufficiently small, then
     (x̃, ỹ) = (x/z, y/z) is a fractional vertex cover of size ≤ k.
     Sort the rows and columns of A in nonincreasing order of x̃ and ỹ,
     respectively, and let I := {(i, j) : i + j = ⌈k⌉ + 1}.
     • (1/|I|) ∑_{(i,j)∈I} (x̃_i + ỹ_j) < 1
     • Hence ∃(i, j) ∈ I with x̃_i + ỹ_j < 1, so a_ij = 0 (and, by the sorting,
       so is every entry below and to the right of (i, j)).
     • (S, T) = ({i, ..., n}, {j, ..., n}) has size > 2n − k.
     (figure: sorted matrix with the zero block O indexed by S and T)
     Operator scaling: we only obtain (S, T) with tr(Π_S A_i Π_T) ≤ ε for
     all i.
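The rounding argument above can be sketched as follows (hypothetical function name; indices are 1-based as on the slide):

```python
import math

import numpy as np


def round_to_independent_set(x_tilde, y_tilde, k):
    """Given a fractional vertex cover (x̃, ỹ) of total size <= k, sort both
    vectors in nonincreasing order, scan the antidiagonal i + j = ceil(k) + 1
    for a pair with x̃_i + ỹ_j < 1, and return the index sets
    (S, T) = ({i, ..., n}, {j, ..., n})."""
    n = len(x_tilde)
    xs = np.sort(x_tilde)[::-1]
    ys = np.sort(y_tilde)[::-1]
    c = math.ceil(k) + 1
    for i in range(1, n + 1):
        j = c - i
        if 1 <= j <= n and xs[i - 1] + ys[j - 1] < 1:
            return list(range(i, n + 1)), list(range(j, n + 1))
    return None
```

For instance, with n = 3, k = 2 and the fractional cover x̃ = (0.9, 0.9, 0), ỹ = (0, 0, 0), the scan finds the pair (i, j) = (1, 2) and returns S = {1, 2, 3}, T = {2, 3}, of total size 5 > 2n − k = 4.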
 34. Rounding for operator scaling
     Lemma (stability). If subspaces S, T satisfy Π_S • Φ(Π_T) ≤ ε,
     dim S + dim T = 2n − k∗, and dim T ≤ r∗, then
       ∥Π_S − Π_{S∗}∥₂, ∥Π_T − Π_{T∗}∥₂ ≤ e^{Õ(n^3)} ε.
     Lemma (bit complexity). Each entry of Π_{T∗} is a rational number with
     bit length M₁ = Õ(n^5).
     Proof idea: the minimum shrunk subspace is the limit of a Wong sequence
     (cf. [Ivanyos, Qiao, and Subrahmanyam, 2017]).
     For ε := e^{−Õ(n^5)}, one can round each entry of Π_T to the nearest
     rational with bit length ≤ M₁ to obtain Π_{T∗}.
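The final rounding step (nearest rational with bounded bit length) can be illustrated with Python’s fractions module; the bound 2**bits on the denominator below is a stand-in for the Õ(n^5) bit-length bound:

```python
from fractions import Fraction


def round_to_rational(v, bits):
    """Nearest rational to v with denominator at most 2**bits.

    Fraction.limit_denominator performs the continued-fraction search for
    the best rational approximation under the denominator bound."""
    return Fraction(v).limit_denominator(2 ** bits)
```

For example, an entry computed numerically as 0.333333333333 rounds back to the exact rational 1/3 once the approximation error is far below the spacing between candidate rationals.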
 35. Majorization operator scaling
     Input: CP map Φ : X ↦ ∑_i A_i X A_i†, nonnegative vectors p, q ∈ R_+^n,
     ε, k > 0
     Find: nonsingular L, R and z ≥ 0 s.t. Φ̃ = e^z Φ_{L,R} satisfies
       Φ̃(I) ≺_w p,  Φ̃∗(I) ≺_w q,  tr Φ̃(I) ≥ k.
     (p, q) = (1, α_r) recovers (k, r)-scaling.
     As geodesically convex optimization:
       f(X, Y, z) = e^z X^{−1} • Φ(Y^{−1}) + α^⊤ λ(log X) + β^⊤ λ(log Y) − kz
     A similar theorem and Sinkhorn algorithm hold as for (k, r)-scaling.