Yuki Takezawa
November 08, 2023

Beyond Exponential Graph: Communication Efficient Topology for Decentralized Learning via Finite-time Convergence

Beyond Exponential Graph: Communication Efficient Topologies for Decentralized Learning via Finite-time Convergence (NeurIPS 2023)
https://arxiv.org/abs/2305.11420

Transcript

  1. KYOTO UNIVERSITY. Beyond Exponential Graph: Communication Efficient Topology for Decentralized Learning via Finite-time Convergence (NeurIPS 2023). Yuki Takezawa (1, 2), Ryoma Sato (1, 2), Han Bao (1, 2), Kenta Niwa (3), Makoto Yamada (2). Affiliations: (1) Kyoto Univ., (2) OIST, (3) NTT CS Lab.
  2. Background: Decentralized Learning. Training a neural network in parallel on multiple nodes (e.g., servers, GPUs). ◼ Large-scale machine learning. ◼ Privacy. Figure legend: each node is a server that has its own (private) training dataset.
  3. Background: Decentralized Learning. Figure legend: each node (server) has a training dataset; connected nodes can transmit parameters to each other. $f_i(\boldsymbol{x}_i)$ is the loss function of node $i$, where $\boldsymbol{x}_i$ is the neural network parameter of node $i$. Goal of decentralized learning: $\inf_{\boldsymbol{x}} \frac{1}{n} \sum_{i=1}^{n} f_i(\boldsymbol{x})$, with $\boldsymbol{x}_1 = \boldsymbol{x}_2 = \cdots$. ※ $n$ is the number of nodes.
  4. Background: Decentralized SGD (DSGD). Let $n$ be the number of nodes and let $\boldsymbol{W}$ be an adjacency matrix. $W_{ij}$ is an edge weight and is positive iff there exists an edge $(i, j)$ or $i = j$. Update rule of DSGD: node $i$ updates its neural network parameter $\boldsymbol{x}_i$ as follows: ◼ $\boldsymbol{x}_i^{(r+\frac{1}{2})} = \boldsymbol{x}_i^{(r)} - \eta \nabla f_i(\boldsymbol{x}_i^{(r)})$. ◼ Exchange parameters $\boldsymbol{x}_i^{(r+\frac{1}{2})}$ with neighbors. ◼ $\boldsymbol{x}_i^{(r+1)} = \sum_{j=1}^{n} W_{ij} \boldsymbol{x}_j^{(r+\frac{1}{2})}$.
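The DSGD update rule can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the scalar parameters, the quadratic losses $f_i(x) = \frac{1}{2}(x - c_i)^2$, the ring weights of 1/3, and the helper name `dsgd_step` are all chosen only for the example.

```python
def dsgd_step(x, W, grads, eta):
    """One DSGD round for scalar parameters; x[i] is node i's parameter."""
    n = len(x)
    # Local gradient step: x_i^(r+1/2) = x_i^(r) - eta * grad f_i(x_i^(r)).
    x_half = [x[i] - eta * grads[i](x[i]) for i in range(n)]
    # Exchange with neighbors and average: x_i^(r+1) = sum_j W_ij * x_j^(r+1/2).
    return [sum(W[i][j] * x_half[j] for j in range(n)) for i in range(n)]

# Hypothetical setup: 4 nodes on a ring; each node weights itself and its
# two neighbors by 1/3, so W_ij > 0 iff (i, j) is an edge or i == j.
n = 4
W = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in (i - 1, i, i + 1):
        W[i][j % n] = 1.0 / 3.0

# Assumed local losses f_i(x) = 0.5 * (x - c_i)^2, so grad f_i(x) = x - c_i.
c = [1.0, 2.0, 3.0, 6.0]
grads = [lambda x, ci=ci: x - ci for ci in c]

x = [0.0] * n
for r in range(300):
    x = dsgd_step(x, W, grads, eta=0.1)

x_bar = sum(x) / n  # node average; approaches mean(c), the minimizer of the average loss
```

In this toy problem the node average `x_bar` approaches the mean of the `c` values, while the individual entries of `x` may stay slightly apart because the step size is constant.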
  5. Background: What is a “good” network structure? ◼ Communication efficiency (i.e., training speed). ◼ Accuracy. Figure: ring, complete, and grid topologies.
  6. Background: Communication Efficiency. Communication is the main bottleneck of distributed learning. ◼ The communication cost is determined by the maximum degree. Maximum degrees: Ring: 2; Complete: $n - 1$; Grid: 4.
  8. Background: Consensus Rate. ◼ How “well-connected” the topology is matters. ◼ How fast information spreads across the nodes matters. Figure: ring, complete, and grid topologies.
  9. Background: Consensus Rate. Problem: ◼ There are $n$ nodes, and node $i$ has parameters $\boldsymbol{x}_i$. ◼ Let $W_{ij}$ be the edge weight ($W_{ij} > 0$ iff there exists an edge $(i, j)$ or $i = j$). ◼ Node $i$ updates $\boldsymbol{x}_i$ as $\boldsymbol{x}_i \leftarrow \sum_{j=1}^{n} W_{ij} \boldsymbol{x}_j$. Question: ◼ How fast does $\boldsymbol{x}_i$ reach $\bar{\boldsymbol{x}} := \frac{1}{n} \sum_{j=1}^{n} \boldsymbol{x}_j$?
  10. Background: Consensus Rate with $n$ Nodes. Consensus rate $\beta \in [0, 1)$ (↓) by topology: Ring: $1 - O(1/n^2)$; Torus: $1 - O(1/n)$; Exp. Graph: $1 - O(1/\log_2 n)$; Complete: $0$. Figure: consensus error $\frac{1}{n} \sum_{i=1}^{n} \| \boldsymbol{x}_i^{(r)} - \bar{\boldsymbol{x}} \|^2$ for each topology. ※ The number in brackets is the maximum degree.
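The averaging process $\boldsymbol{x}_i \leftarrow \sum_j W_{ij} \boldsymbol{x}_j$ and the consensus error $\frac{1}{n} \sum_i \| \boldsymbol{x}_i^{(r)} - \bar{\boldsymbol{x}} \|^2$ can be simulated with a short script. The sketch below compares a ring with a complete graph; the uniform edge weights, the scalar parameters, and all function names are assumptions made for illustration.

```python
def gossip_round(x, W):
    """x_i <- sum_j W_ij x_j for every node i."""
    n = len(x)
    return [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]

def consensus_error(x):
    """(1/n) * sum_i (x_i - x_bar)^2 for scalar parameters."""
    x_bar = sum(x) / len(x)
    return sum((xi - x_bar) ** 2 for xi in x) / len(x)

def ring(n):
    # Assumed weights: 1/3 for the node itself and for each of its two neighbors.
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] += 1.0 / 3.0
    return W

def complete(n):
    # Every node averages over all n nodes, so one round gives exact consensus.
    return [[1.0 / n] * n for _ in range(n)]

n = 16
x0 = [float(i) for i in range(n)]
errors = {}
for name, W in (("ring", ring(n)), ("complete", complete(n))):
    x = list(x0)
    errors[name] = []
    for r in range(10):
        x = gossip_round(x, W)
        errors[name].append(consensus_error(x))
```

The complete graph reaches consensus after a single round at the price of maximum degree $n - 1$, whereas the ring (maximum degree 2) reduces the error only gradually.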
  11. Background: Consensus Rate with $n$ Nodes. A fast consensus rate (i.e., small $\beta \in [0, 1)$) enables Decentralized SGD to achieve high accuracy and a fast convergence rate. Theorem (Convergence Rate): the parameters $\boldsymbol{x}_i$ generated by Decentralized SGD satisfy $\frac{1}{R+1} \sum_{r=0}^{R} \| \nabla f(\bar{\boldsymbol{x}}^{(r)}) \|^2 \le \epsilon$ after $R = O\left( \frac{1}{n \epsilon^2} + \frac{1}{(1 - \beta) \epsilon^{3/2}} \right)$ iterations, where $\bar{\boldsymbol{x}} := \frac{1}{n} \sum_i \boldsymbol{x}_i$.
  12. Background: Consensus Rate. ◼ High communication efficiency: small maximum degree. ◼ High accuracy/fast convergence rate: fast consensus rate. Figure: ring, complete, and grid topologies; axis labels: “High accuracy/Fast convergence rate” and “High communication efficiency”.
  13. Background: Contribution. We propose the Base-(k+1) Graph, which enables Decentralized SGD to achieve a reasonable balance between communication efficiency and accuracy/convergence rate. ※ The number in brackets is the maximum degree.
  14. Proposed Method. Core Idea: Finite-Time Convergence. Existing topologies converge only asymptotically. The proposed topology, the Base-(k+1) Graph, is finite-time convergent. Figure: consensus error $\frac{1}{n} \sum_{i=1}^{n} \| \boldsymbol{x}_i^{(r)} - \bar{\boldsymbol{x}} \|^2$. ※ The number in brackets is the maximum degree.
  15. Proposed Method: Existing Finite-time Convergent Topologies. Topology (max degree; number of nodes $n$): 1-peer Hypercube (1; a power of 2); 1-peer Exp. Graph (1; a power of 2); Base-(k+1) Graph ($k$; arbitrary number of nodes). ◼ The 1-peer Hypercube cannot be constructed when $n$ is not a power of 2. ◼ The 1-peer Exp. Graph is not finite-time convergent when $n$ is not a power of 2.
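The claim about the 1-peer Exp. Graph can be checked with exact rational arithmetic. The sketch below assumes the common definition of the 1-peer exponential graph (in round $t$, node $i$ averages its value with that of node $(i + 2^{t \bmod \tau}) \bmod n$, where $\tau = \lceil \log_2 n \rceil$); this definition and the function names are assumptions, since the slides do not spell them out.

```python
from fractions import Fraction

def one_peer_exp_round(x, t):
    """Round t: node i averages itself with node (i + 2^(t mod tau)) mod n."""
    n = len(x)
    tau = max(1, (n - 1).bit_length())  # ceil(log2(n)) without floating point
    offset = 2 ** (t % tau)
    return [(x[i] + x[(i + offset) % n]) / 2 for i in range(n)]

def reaches_average(n, rounds):
    """True if every node holds the exact average within the given rounds."""
    x = [Fraction(i) for i in range(n)]
    avg = sum(x) / n
    for t in range(rounds):
        x = one_peer_exp_round(x, t)
        if all(v == avg for v in x):
            return True
    return False

power_of_two = reaches_average(8, 3)    # exact consensus after log2(8) = 3 rounds
not_power_of_two = reaches_average(6, 60)  # no exact consensus within 60 rounds
```

Under this assumed definition, $n = 8$ reaches the exact average after three rounds, while $n = 6$ does not reach it within the rounds simulated here.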
  16. Existing Finite-Time Convergent Topologies: 1-peer Hypercube, $n = 2$. ◼ All edge weights are 0.5. Initial parameters: Node 1: $x_1$; Node 2: $x_2$. After one round, both nodes hold $\frac{x_1 + x_2}{2}$.
  17. Existing Finite-Time Convergent Topologies: 1-peer Hypercube, $n = 4$. ◼ All edge weights are 0.5. Values held by Nodes 1, 2, 3, 4: initial values $x_1, x_2, x_3, x_4$; after round 1: $\frac{x_1 + x_2}{2}, \frac{x_1 + x_2}{2}, \frac{x_3 + x_4}{2}, \frac{x_3 + x_4}{2}$; after round 2: $\frac{x_1 + x_2 + x_3 + x_4}{4}$ at every node.
  20. Existing Finite-Time Convergent Topologies: 1-peer Hypercube, $n = 8$. ◼ All edge weights are 0.5. The 1-peer Hypercube is finite-time convergent when $n$ is a power of 2, but it cannot be constructed when $n$ is not a power of 2.
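The finite-time property of the 1-peer Hypercube can be verified exactly. The following sketch assumes the standard hypercube pairing (with 0-based indices, node $i$ pairs with node $i \oplus 2^t$ in round $t$), which is consistent with the $n = 4$ table; the function names are hypothetical.

```python
from fractions import Fraction

def one_peer_hypercube(n):
    """Mixing matrices for n = 2^d nodes; in round t node i pairs with i XOR 2^t."""
    d = n.bit_length() - 1
    if n < 1 or n != 1 << d:
        raise ValueError("the 1-peer Hypercube needs n to be a power of 2")
    half = Fraction(1, 2)
    matrices = []
    for t in range(d):
        W = [[Fraction(0)] * n for _ in range(n)]
        for i in range(n):
            W[i][i] = half             # self-loop weight 0.5
            W[i][i ^ (1 << t)] = half  # the single peer, edge weight 0.5
        matrices.append(W)
    return matrices

def apply_rounds(matrices, x):
    """Run x_i <- sum_j W_ij x_j once for each matrix, in order."""
    for W in matrices:
        x = [sum(w * xj for w, xj in zip(row, x)) for row in W]
    return x

x0 = [Fraction(v) for v in (3, 1, 4, 1, 5, 9, 2, 6)]
x_final = apply_rounds(one_peer_hypercube(8), x0)
average = sum(x0) / len(x0)  # every entry of x_final equals this value
```

Each node talks to exactly one peer per round, and after $\log_2 n$ rounds every node holds the exact average.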
  21. Proposed Method: Base-2 Graph. Next, we propose the Base-2 Graph: ◼ It is finite-time convergent for any $n$. ◼ Its maximum degree is 1. Topology (max degree; number of nodes $n$): 1-peer Hypercube (1; a power of 2); 1-peer Exp. Graph (1; a power of 2); Base-(k+1) Graph ($k$; arbitrary number of nodes).
  22. Proposed Method: Core Idea of Simple Base-2 Graph. The core idea is to split the set of nodes into disjoint subsets to which the 1-peer Hypercube is applicable. Examples: $n = 3 = 2 + 1$; $n = 5 = 2^2 + 1$; $n = 7 = 2^2 + 2 + 1$.
  24. Proposed Method: Simple Base-2 Graph with $n = 3$. Values held by Nodes 1, 2, 3: initial values $x_1, x_2, x_3$; after round 1: $\frac{x_1 + x_2}{2}, \frac{x_1 + x_2}{2}, x_3$; after round 2: $\frac{x_1 + x_2}{2}, \frac{x_1 + x_2 + 4 x_3}{6}, \frac{x_1 + x_2 + x_3}{3}$; after round 3: $\frac{x_1 + x_2 + x_3}{3}$ at every node. The average is $\frac{x_1 + x_2 + x_3}{3}$. ※ Edge weight 0.5 is omitted.
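The $n = 3$ table can be checked by multiplying the three mixing matrices exactly. The matrices below are inferred from the table values (nodes 1 and 2 average with weight 1/2, then nodes 2 and 3 exchange with weight 2/3, then nodes 1 and 2 average again); which pairs communicate in each round is therefore an assumption derived from the table, and `matmul` is a helper written for this example.

```python
from fractions import Fraction as F

# Rows and columns are nodes 1, 2, 3; entry W[i][j] is the weight node i gives node j.
W1 = [[F(1, 2), F(1, 2), 0], [F(1, 2), F(1, 2), 0], [0, 0, 1]]  # nodes 1-2, weight 1/2
W2 = [[1, 0, 0], [0, F(1, 3), F(2, 3)], [0, F(2, 3), F(1, 3)]]  # nodes 2-3, weight 2/3
W3 = W1                                                         # nodes 1-2 again

def matmul(A, B):
    """Plain matrix product of two lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

# Round 1 applies W1 first, so the overall map is W3 @ W2 @ W1.
after_two_rounds = matmul(W2, W1)
product = matmul(W3, after_two_rounds)
```

The second row of `after_two_rounds` is $(\frac{1}{6}, \frac{1}{6}, \frac{2}{3})$, matching $\frac{x_1 + x_2 + 4 x_3}{6}$ in the table, and every entry of `product` is $\frac{1}{3}$, so three rounds yield the exact average.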
  30. Proposed Method: Simple Base-2 Graph with $n = 5 = 2^2 + 1$. Figure: sequence of five graphs on nodes 1 to 5. The average is $\frac{x_1 + x_2 + x_3 + x_4 + x_5}{5}$. ※ Edge weight 0.5 is omitted.
  34. Proposed Method: Brief Summary. We propose the Simple Base-2 Graph: ◼ Its maximum degree is only 1. ◼ It is finite-time convergent for any $n$. Next, we propose the Simple Base-(k+1) Graph: ◼ Its maximum degree is $k$. ◼ It is finite-time convergent for any $n$.
  35. Proposed Method: Review: Core Idea of Simple Base-2 Graph. The core idea is to split the set of nodes into disjoint subsets to which the 1-peer Hypercube is applicable. Examples: $n = 3 = 2 + 1$; $n = 5 = 2^2 + 1$; $n = 7 = 2^2 + 2 + 1$. We need to extend the 1-peer Hypercube to the $k$-peer setting.
  36. Proposed Method: k-peer Hyper-hypercube. The 1-peer Hypercube can be constructed when $n$ is a power of 2. ◼ $n$ is a power of 2 $\Leftrightarrow$ the prime factors of $n$ are not larger than 2. We propose the $k$-peer Hyper-hypercube, which can be constructed when the prime factors of $n$ are not larger than $k + 1$.
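One way to build a topology that satisfies the condition above is to average over one prime factor of $n$ per round. The sketch below is an illustration under assumptions, not necessarily the authors' exact construction: nodes are indexed in mixed radix over the prime factors, and in each round the nodes that differ only in one digit form a group of size $p \le k + 1$ and average with weight $1/p$. The names `prime_factors` and `hyper_hypercube` are hypothetical.

```python
from fractions import Fraction

def prime_factors(n):
    """Prime factorization of n as a sorted list, e.g. 12 -> [2, 2, 3]."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors

def hyper_hypercube(n, k):
    """One mixing matrix per prime factor of n; requires every factor <= k + 1."""
    factors = prime_factors(n)
    if any(p > k + 1 for p in factors):
        raise ValueError("n has a prime factor larger than k + 1")
    matrices, stride = [], 1
    for p in factors:
        W = [[Fraction(0)] * n for _ in range(n)]
        for i in range(n):
            digit = (i // stride) % p   # digit of node i for this factor
            base = i - digit * stride
            for q in range(p):          # nodes differing only in this digit
                W[i][base + q * stride] = Fraction(1, p)
        matrices.append(W)
        stride *= p
    return matrices

# Example: k = 2 and n = 6 = 2 x 3 (weights 1/2 in one round, 1/3 in the other).
x = [Fraction(v) for v in range(6)]
for W in hyper_hypercube(6, k=2):
    x = [sum(w * xj for w, xj in zip(row, x)) for row in W]
# every entry of x is now Fraction(5, 2), the average of 0..5
```

Each node has at most $p - 1 \le k$ peers per round, and the number of rounds equals the number of prime factors of $n$.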
  37. Proposed Method: k-peer Hyper-hypercube. ◼ Case with $k = 2$ and $n = 6 = 2 \times 3$. ◼ Case with $k = 2$ and $n = 9 = 3 \times 3$. (Self-loops are omitted.) Edge-weight labels in the figure: $\frac{1}{2}$, $\frac{1}{3}$, $\frac{1}{3}$.
  38. Proposed Method: Review: Core Idea of Simple Base-2 Graph. The core idea is to split the set of nodes into disjoint subsets to which the 1-peer Hypercube is applicable. Examples: $n = 3 = 2 + 1$; $n = 5 = 2^2 + 1$; $n = 7 = 2^2 + 2 + 1$. These decompositions follow the binary representation (base-2 number) of $n$.
  39. Proposed Method: Core Idea of Simple Base-3 Graph. The core idea is to split the set of nodes into disjoint subsets to which the 2-peer Hyper-hypercube is applicable. Examples: $n = 3$; $n = 5 = 3 + 2$; $n = 7 = 2 \times 3 + 1$. These decompositions follow the base-3 number of $n$.
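The subset sizes in these examples can be read off the base-$(k+1)$ digits of $n$. The short function below is a sketch of that decomposition only (not of the full graph construction); the name `split_sizes` is hypothetical.

```python
def split_sizes(n, k):
    """Sizes a_j * (k+1)^j from the base-(k+1) digits a_j of n, largest first."""
    sizes, power = [], 1
    while n > 0:
        n, digit = divmod(n, k + 1)
        if digit:
            sizes.append(digit * power)  # one subset per nonzero digit
        power *= k + 1
    return sizes[::-1]

base2_examples = [split_sizes(n, 1) for n in (3, 5, 7)]  # [[2, 1], [4, 1], [4, 2, 1]]
base3_examples = [split_sizes(n, 2) for n in (3, 5, 7)]  # [[3], [3, 2], [6, 1]]
```

Because every digit is at most $k$, each subset size $a_j (k+1)^j$ has no prime factor larger than $k + 1$, which is exactly the condition under which the $k$-peer Hyper-hypercube can be constructed.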
  40. Proposed Method: Simple Base-3 Graph with $n = 5$. Figure: sequence of three graphs on nodes 1 to 5. The average is $\frac{x_1 + x_2 + x_3 + x_4 + x_5}{5}$. ※ Self-loops are omitted.
  44. Proposed Method: Simple Base-(k+1) Graph vs Base-(k+1) Graph. ◼ Using an additional technique, we can reduce the length of the Simple Base-(k+1) Graph. ※ The number in brackets is the maximum degree.
  45. Proposed Method: Experiments. Figure: consensus error $\frac{1}{n} \sum_{i=1}^{n} \| \boldsymbol{x}_i^{(r)} - \bar{\boldsymbol{x}} \|^2$. ※ The number in brackets is the maximum degree.
  46. Proposed Method: Decentralized SGD on Base-(k+1) Graph. Let $\boldsymbol{W}^{(1)}, \cdots, \boldsymbol{W}^{(m)}$ be the adjacency matrices of the Base-(k+1) Graph. Node $i$ updates its parameter $\boldsymbol{x}_i$ as follows: $\boldsymbol{x}_i^{(r+1)} = \sum_{j=1}^{n} W_{ij}^{(1 + \mathrm{mod}(r, m))} \left( \boldsymbol{x}_j^{(r)} - \eta \nabla f_j(\boldsymbol{x}_j^{(r)}) \right)$.
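The time-varying update can be sketched by cycling through the matrices $\boldsymbol{W}^{(1)}, \cdots, \boldsymbol{W}^{(m)}$ with index $1 + \mathrm{mod}(r, m)$. In the example below, the three matrices for $n = 3$ are inferred from the Simple Base-2 Graph table, and the scalar quadratic losses and the function name `dsgd_time_varying` are assumptions made for illustration.

```python
def dsgd_time_varying(x, matrices, grads, eta, rounds):
    """x_i^(r+1) = sum_j W^(1 + mod(r, m))_ij * (x_j^(r) - eta * grad f_j(x_j^(r)))."""
    n, m = len(x), len(matrices)
    for r in range(rounds):
        W = matrices[r % m]  # 0-based position of W^(1 + mod(r, m))
        x_half = [x[j] - eta * grads[j](x[j]) for j in range(n)]
        x = [sum(W[i][j] * x_half[j] for j in range(n)) for i in range(n)]
    return x

# Sequence for n = 3, inferred from the Simple Base-2 Graph table (an assumption).
W_a = [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
W_b = [[1.0, 0.0, 0.0], [0.0, 1 / 3, 2 / 3], [0.0, 2 / 3, 1 / 3]]
matrices = [W_a, W_b, W_a]

# Assumed local losses f_i(x) = 0.5 * (x - c_i)^2 with scalar parameters.
c = [0.0, 3.0, 6.0]
grads = [lambda x, ci=ci: x - ci for ci in c]

x = dsgd_time_varying([0.0, 0.0, 0.0], matrices, grads, eta=0.05, rounds=600)
x_bar = sum(x) / len(x)  # approaches mean(c), the minimizer of the average loss
```

Each node communicates with at most one peer per round, yet the sequence as a whole mixes the parameters across all nodes every $m$ rounds.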
  47. Proposed Method: Decentralized SGD on Base-(k+1) Graph. DSGD satisfies $\frac{1}{R+1} \sum_{r=0}^{R} \| \nabla f(\bar{\boldsymbol{x}}^{(r)}) \|^2 \le \epsilon$ after $R$ iterations. Topology (max degree ↓; order of $R$ ↓): Ring ($2$; $O\left( \frac{1}{n \epsilon^2} + \frac{n^2}{\epsilon^{3/2}} \right)$); Torus ($4$; $O\left( \frac{1}{n \epsilon^2} + \frac{n}{\epsilon^{3/2}} \right)$); Exp. Graph ($\log_2 n$; $O\left( \frac{1}{n \epsilon^2} + \frac{\log_2 n}{\epsilon^{3/2}} \right)$); Base-(k+1) Graph (ours) ($k$; $O\left( \frac{1}{n \epsilon^2} + \frac{\log_{k+1} n}{\epsilon^{3/2}} \right)$).
  48. Proposed Method: Decentralized SGD on Base-(k+1) Graph. Topology (consensus rate; max degree ↓; convergence rate ↓): Exponential Graph ($1 - O(1/\log_2 n)$; $\log_2 n$; $O\left( \frac{1}{n \epsilon^2} + \frac{\log_2 n}{\epsilon^{3/2}} \right)$); Base-(k+1) Graph (ours) (N/A; $k$; $O\left( \frac{1}{n \epsilon^2} + \frac{\log_{k+1} n}{\epsilon^{3/2}} \right)$). Exp. Graph vs Base-2 Graph: ◼ Same convergence rate and better communication efficiency. Exp. Graph vs Base-(k+1) Graph with $2 \le k < \log_2 n$: ◼ Faster convergence rate and better communication efficiency.
  49. Experiments. Model: VGG. Datasets: Fashion MNIST, CIFAR-10, CIFAR-100. #Nodes: 25. We conduct experiments in both i.i.d. and non-i.i.d. settings. Figure panels: ◼ Non-i.i.d. setting. ◼ i.i.d. setting.
  50. Experiments: Results on non-i.i.d. setting. ◼ $n = 25$. ◼ The number in brackets is the maximum degree.
  51. Experiments: Results on i.i.d. setting. ◼ $n = 25$. ◼ The number in brackets is the maximum degree.
  52. Experiments: Results of CIFAR-10 with non-i.i.d. Setting. ◼ The Base-2 Graph outperforms the 1-peer Exp. Graph. ◼ The Base-{3,4,5} Graphs outperform the 1-peer Exp. Graph and the Exp. Graph.
  53. Experiments: Results with Other Decentralized Learning Methods. ◼ The Base-2 Graph is comparable to the 1-peer exponential graph. ◼ The Base-5 Graph outperforms the exponential graph.
  54. Conclusion. We propose the Base-(k+1) Graph: ◼ Finite-time convergence for any $n$ and $k$. ◼ Theoretically: faster convergence rate and lower communication costs than the exponential graph. ◼ Experimentally: reasonable balance between accuracy and communication efficiency.