Beyond Exponential Graph: Communication Efficient Topologies for Decentralized Learning via Finite-time Convergence (NeurIPS 2023)
https://arxiv.org/abs/2305.11420
Decentralized Learning
Decentralized learning trains a model on multiple nodes (e.g., servers, GPUs). It is motivated by:
◼ Large-scale machine learning.
◼ Privacy: each node is a server that holds its own (private) training dataset.
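Each node has a training dataset, and nodes connected by an edge can transmit parameters. $f_i(\theta_i)$ is the loss function of node $i$, where $\theta_i$ is the NN's parameter of node $i$.
Goal of decentralized learning:
$$\inf_{\theta} \frac{1}{n} \sum_{i=1}^{n} f_i(\theta),$$
i.e., all nodes should agree on a common parameter ($\theta_1 = \theta_2 = \cdots = \theta_n$).
※ $n$ is the number of nodes.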
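Background: Decentralized SGD (DSGD)
Let $n$ be the number of nodes and let $W$ be an adjacency matrix. $W_{ij}$ is an edge weight and is positive iff there exists an edge $(i, j)$ or $i = j$. In each round $r$, node $i$ updates its NN's parameter $\theta_i$ as follows ($\eta$ is the step size):
◼ $\theta_i^{(r+\frac{1}{2})} = \theta_i^{(r)} - \eta \nabla f_i(\theta_i^{(r)})$
◼ Exchange parameters $\theta_i^{(r+\frac{1}{2})}$ with neighbors.
◼ $\theta_i^{(r+1)} = \sum_{j=1}^{n} W_{ij} \theta_j^{(r+\frac{1}{2})}$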
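A minimal NumPy sketch of the three DSGD steps above, under assumptions that are not in the slides: exact gradients instead of stochastic ones, toy quadratic losses $f_i(\theta) = \frac{1}{2}\|\theta - c_i\|^2$, and a ring mixing matrix with weight 1/3 on each node and its two neighbors. The helper names `dsgd` and `ring_matrix` are hypothetical.

```python
import numpy as np


def ring_matrix(n):
    # Hypothetical helper: each node mixes itself and its two ring neighbors with weight 1/3.
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] += 1.0 / 3.0
    return W


def dsgd(W, grads, theta0, lr=0.1, rounds=200):
    """Decentralized SGD with a fixed mixing matrix W (exact gradients for simplicity).

    W:      (n, n) array, W[i, j] > 0 only if i == j or (i, j) is an edge.
    grads:  list of n callables, grads[i](theta) = gradient of f_i at theta.
    theta0: (n, d) array, row i is node i's initial parameter.
    """
    theta = theta0.copy()
    n = theta.shape[0]
    for _ in range(rounds):
        # Local step: theta_i^(r+1/2) = theta_i^(r) - lr * grad f_i(theta_i^(r)).
        half = np.stack([theta[i] - lr * grads[i](theta[i]) for i in range(n)])
        # Exchange + mix: theta_i^(r+1) = sum_j W_ij * theta_j^(r+1/2).
        theta = W @ half
    return theta


# Toy problem: f_i(theta) = 0.5 * ||theta - c_i||^2, so grad f_i(theta) = theta - c_i.
rng = np.random.default_rng(0)
n, d = 8, 2
c = rng.normal(size=(n, d))
grads = [lambda th, ci=ci: th - ci for ci in c]
theta = dsgd(ring_matrix(n), grads, np.zeros((n, d)))
print(theta.mean(axis=0), c.mean(axis=0))
```

Because this ring matrix is doubly stochastic, the node average follows plain gradient descent on $\frac{1}{n}\sum_i f_i$, so `theta.mean(axis=0)` approaches `c.mean(axis=0)`; the individual rows still differ, and that gap is the consensus error the topology controls.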
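The topology governs a trade-off:
◼ High communication efficiency: small maximum degree.
◼ High accuracy/fast convergence rate: fast consensus rate.
Figure: Ring, Grid, and Complete graphs positioned between high communication efficiency and high accuracy/fast convergence rate. For example, the ring has maximum degree 2 but a slow consensus rate, while the complete graph reaches consensus in a single round at the cost of maximum degree $n - 1$.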
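To make the trade-off concrete, the sketch below computes the maximum degree of a mixing matrix and $|\lambda_2|$, the second-largest eigenvalue magnitude, used here as an assumed proxy for the consensus rate (smaller means faster). Ring weights of 1/3 and complete-graph weights of $1/n$ are illustrative choices, the helper names are hypothetical, and the grid is omitted for brevity.

```python
import numpy as np


def ring_matrix(n):
    # Each node averages itself and its two ring neighbors with weight 1/3.
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] += 1.0 / 3.0
    return W


def complete_matrix(n):
    # Every node averages all n nodes with weight 1/n.
    return np.full((n, n), 1.0 / n)


def max_degree(W):
    # Count non-zero off-diagonal entries per row and take the maximum over rows.
    off_diag = (np.abs(W) > 1e-12) & ~np.eye(len(W), dtype=bool)
    return int(off_diag.sum(axis=1).max())


def second_eigenvalue(W):
    # |lambda_2| of a symmetric W: the smaller it is, the faster W^r approaches (1/n) 1 1^T.
    eig = np.sort(np.abs(np.linalg.eigvalsh(W)))[::-1]
    return float(eig[1])


n = 16
for name, W in [("ring", ring_matrix(n)), ("complete", complete_matrix(n))]:
    print(f"{name:8s} max degree = {max_degree(W):2d}, |lambda_2| = {second_eigenvalue(W):.3f}")
```

For $n = 16$, the ring has maximum degree 2 but $|\lambda_2| = (1 + 2\cos(\pi/8))/3 \approx 0.949$, while the complete graph has $|\lambda_2| = 0$ at maximum degree 15.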
The aim is a topology that enables decentralized SGD to achieve a reasonable balance between communication efficiency and accuracy/convergence rate.
※ The number in the bracket is the maximum degree.
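Existing topologies reach consensus only asymptotically, whereas the proposed topology, the Base-(k+1) Graph, reaches exact consensus in a finite number of rounds.
Figure: consensus error $\frac{1}{n}\sum_{i=1}^{n} \|\theta_i - \bar{\theta}\|^2$ of each topology, where $\bar{\theta} = \frac{1}{n}\sum_{i=1}^{n} \theta_i$.
※ The number in the bracket is the maximum degree.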
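The consensus error plotted in the figure can be computed as below. This sketch (hypothetical helper names, ring weights of 1/3 assumed) runs pure gossip $\theta \leftarrow W\theta$ on a static ring: the error shrinks every round but stays positive, which is what asymptotic convergence means here.

```python
import numpy as np


def consensus_error(theta):
    # (1/n) * sum_i ||theta_i - theta_bar||^2, where theta_bar is the node average.
    theta_bar = theta.mean(axis=0)
    return float(np.mean(np.sum((theta - theta_bar) ** 2, axis=1)))


def ring_matrix(n):
    # Each node averages itself and its two ring neighbors with weight 1/3.
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] += 1.0 / 3.0
    return W


rng = np.random.default_rng(0)
n = 8
theta = rng.normal(size=(n, 4))
W = ring_matrix(n)
errors = []
for r in range(30):
    theta = W @ theta  # pure gossip round, no gradient step
    errors.append(consensus_error(theta))
print(errors[0], errors[-1])
```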
| Topology | Max Degree | #Nodes $n$ |
|---|---|---|
| 1-peer Hypercube | 1 | A power of 2 |
| 1-peer Exp. Graph | 1 | A power of 2 |
| Base-(k+1) Graph | $k$ | Arbitrary number of nodes |
◼ The 1-peer Hypercube cannot be constructed when $n$ is not a power of 2.
◼ The 1-peer Exp. Graph does not converge in finite time when $n$ is not a power of 2.
Example: the 1-peer Hypercube with $n = 8$, where every edge weight is 0.5. The 1-peer Hypercube converges in finite time when $n$ is a power of 2, but it cannot be constructed when $n$ is not a power of 2.
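The 1-peer Hypercube can be sketched as follows, assuming the standard construction in which round $t$ pairs node $i$ with node $i \oplus 2^t$ (bitwise XOR) and both use weight 0.5. After $\log_2 n$ rounds every node holds the exact average; when $n$ is not a power of 2, some partner indices fall outside $0, \dots, n-1$, so this hypothetical `one_peer_hypercube` helper refuses to build it.

```python
import numpy as np


def one_peer_hypercube(n):
    """Mixing matrices W^(0), ..., W^(log2 n - 1) of the 1-peer Hypercube.

    In round t, node i pairs with node i XOR 2^t and both average with weight 0.5.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("the 1-peer Hypercube needs n to be a power of 2")
    Ws = []
    for t in range(n.bit_length() - 1):
        W = np.zeros((n, n))
        for i in range(n):
            j = i ^ (1 << t)  # partner differs from i in bit t only
            W[i, i] = 0.5
            W[i, j] = 0.5
        Ws.append(W)
    return Ws


Ws = one_peer_hypercube(8)
theta = np.arange(8, dtype=float)
for W in Ws:
    theta = W @ theta
print(theta)  # every entry equals 3.5, the average of 0..7
```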
The Base-2 Graph:
◼ It converges in finite time for any $n$.
◼ Its maximum degree is 1.
Simple Base-2 Graph:
◼ Its maximum degree is only 1.
◼ It converges in finite time for any $n$.
Next, we propose the Simple Base-(k+1) Graph:
◼ Its maximum degree is $k$.
◼ It converges in finite time for any $n$.
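The slides do not spell out how the Simple Base-2 Graph is built, so the sketch below is a hand-built toy, not the paper's construction: it only illustrates that a schedule in which every round uses a single edge (maximum degree 1) can still reach the exact average in finite time when $n = 3$, which is not a power of 2. The matrices `W1`, `W2`, `W3` and their weights were chosen by hand for this example.

```python
import numpy as np

# Hypothetical toy schedule for n = 3 (NOT the paper's Simple Base-2 Graph):
# every round uses a single edge, so the maximum degree is 1.
W1 = np.array([[0.5, 0.5, 0.0],   # round 1: nodes 0 and 1 average
               [0.5, 0.5, 0.0],
               [0.0, 0.0, 1.0]])
W2 = np.array([[1.0, 0.0, 0.0],   # round 2: nodes 1 and 2 both take
               [0.0, 2/3, 1/3],   # (2/3) * node 1 + (1/3) * node 2,
               [0.0, 2/3, 1/3]])  # which equals the global average
W3 = np.array([[0.0, 1.0, 0.0],   # round 3: node 0 copies node 1
               [0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0]])

# The product of the three rounds equals the averaging matrix (1/3) * ones.
P = W3 @ W2 @ W1

theta = np.array([1.0, 4.0, 10.0])  # average is 5
for W in (W1, W2, W3):
    theta = W @ theta
print(theta)
```

Unlike the 1-peer Hypercube, these matrices are row-stochastic but not symmetric (node 0 in round 3 simply copies node 1), so the toy demonstrates exact averaging of the initial values only.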
The 1-peer Hypercube can be constructed only when $n$ is a power of 2.
◼ $n$ is a power of 2 $\iff$ every prime factor of $n$ is at most 2.
We propose the k-peer Hyper-hypercube, which can be constructed when every prime factor of $n$ is at most $k + 1$.
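A sketch of how such a topology could work, assuming it generalizes the 1-peer Hypercube by writing node indices in mixed radix, one digit per prime factor of $n$: in round $l$, the nodes that differ only in digit $l$ form a clique of size $p_l \le k + 1$ and average with weight $1/p_l$, so each node has at most $k$ neighbors per round. The function names `prime_factors` and `hyper_hypercube` are hypothetical, and the paper's exact construction may differ.

```python
import numpy as np


def prime_factors(n):
    # Trial division; returns the prime factors of n with multiplicity, ascending.
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors


def hyper_hypercube(n, k):
    """Sketch: one mixing matrix per prime factor of n (all must be <= k + 1).

    Node i is written in mixed radix with one digit per factor. In round l, nodes
    that differ only in digit l form a clique of size factors[l] and average
    with weight 1 / factors[l].
    """
    factors = prime_factors(n)
    if any(p > k + 1 for p in factors):
        raise ValueError(f"n = {n} has a prime factor larger than k + 1 = {k + 1}")
    Ws, stride = [], 1
    for p in factors:
        W = np.zeros((n, n))
        for i in range(n):
            digit = (i // stride) % p
            base = i - digit * stride  # node i with digit l set to 0
            for m in range(p):
                W[i, base + m * stride] = 1.0 / p
        Ws.append(W)
        stride *= p
    return Ws


Ws = hyper_hypercube(6, 2)  # 6 = 2 * 3: not a power of 2, but all factors <= 3
theta = np.arange(6, dtype=float)
for W in Ws:
    theta = W @ theta
print(theta)  # every entry is (up to rounding) 2.5, the average of 0..5
```

For example, $n = 12 = 2 \cdot 2 \cdot 3$ works with $k = 2$ in three rounds, while $n = 10$ needs $k \ge 4$ because of the factor 5. Using prime factors keeps the code short; grouping factors whose product is still at most $k + 1$ would need fewer rounds.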
The Base-(k+1) Graph converges in finite time for any $n$ and $k$.
◼ Theoretically: a faster convergence rate and lower communication cost than the exponential graph.
◼ Experimentally: a reasonable balance between accuracy and communication efficiency.