Beyond Exponential Graph: Communication Efficient Topology for Decentralized Learning via Finite-time Convergence

Yuki Takezawa
November 08, 2023

Beyond Exponential Graph: Communication Efficient Topologies for Decentralized Learning via Finite-time Convergence (NeurIPS 2023)
https://arxiv.org/abs/2305.11420

Transcript

  1. KYOTO UNIVERSITY

    Beyond Exponential Graph: Communication Efficient Topology for Decentralized Learning via Finite-time Convergence (NeurIPS 2023)
    Yuki Takezawa (1, 2), Ryoma Sato (1, 2), Han Bao (1, 2), Kenta Niwa (3), Makoto Yamada (2)
    (1) Kyoto Univ., (2) OIST, (3) NTT CS Lab.
  2. Background: Decentralized Learning

    Training a neural network (NN) in parallel on multiple nodes (e.g., servers, GPUs).
    ◼ Large-scale machine learning.
    ◼ Privacy.
    [Figure: decentralized learning. Each node is a server that has its own (private) training dataset.]
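  3. Background: Decentralized Learning

    [Figure: five nodes with loss functions $f_1, \dots, f_5$. Each node is a server that has a training dataset; an edge means the two nodes can transmit parameters.]
    $f_i(\boldsymbol{x}_i)$: loss function of node $i$, where $\boldsymbol{x}_i$ is the NN parameter of node $i$.
    Goal of decentralized learning:
    $$\inf_{\boldsymbol{x}} \frac{1}{n} \sum_{i=1}^{n} f_i(\boldsymbol{x}), \qquad \boldsymbol{x}_1 = \boldsymbol{x}_2 = \cdots$$
    ※ $n$ is the number of nodes.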
  4. Background: Decentralized SGD (DSGD)

    Let $n$ be the number of nodes and let $\boldsymbol{W}$ be an adjacency matrix. $W_{ij}$ is an edge weight and is positive iff there exists an edge $(i, j)$ or $i = j$.
    Update rule of DSGD: node $i$ updates its NN parameter $\boldsymbol{x}_i$ as follows:
    ◼ $\boldsymbol{x}_i^{(r+\frac{1}{2})} = \boldsymbol{x}_i^{(r)} - \eta \nabla f_i(\boldsymbol{x}_i^{(r)})$
    ◼ Exchange parameters $\boldsymbol{x}_i^{(r+\frac{1}{2})}$ with neighbors.
    ◼ $\boldsymbol{x}_i^{(r+1)} = \sum_{j=1}^{n} W_{ij} \boldsymbol{x}_j^{(r+\frac{1}{2})}$
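
The DSGD update rule can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the scalar quadratic losses, the ring weights of 1/3, the step size, and the names `dsgd_round` and `ring_weights` are all assumptions made for the example.

```python
def dsgd_round(x, grad_fns, W, eta):
    """One DSGD round: local gradient step, then weighted averaging with neighbors."""
    n = len(x)
    # Local step: x_i^(r+1/2) = x_i^(r) - eta * grad f_i(x_i^(r))
    half = [x[i] - eta * grad_fns[i](x[i]) for i in range(n)]
    # Mixing step: x_i^(r+1) = sum_j W_ij * x_j^(r+1/2)
    return [sum(W[i][j] * half[j] for j in range(n)) for i in range(n)]


def ring_weights(n):
    """Ring with self-loops; every nonzero weight is 1/3 (an illustrative choice)."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] = 1.0 / 3.0
    return W


# Toy losses f_i(x) = 0.5 * (x - b_i)^2, so grad f_i(x) = x - b_i.
targets = [1.0, 2.0, 3.0, 4.0, 10.0]
grad_fns = [lambda x, b=b: x - b for b in targets]
W = ring_weights(len(targets))

x = [0.0] * len(targets)
for _ in range(300):
    x = dsgd_round(x, grad_fns, W, eta=0.1)

mean_x = sum(x) / len(x)
optimum = sum(targets) / len(targets)  # minimizer of the average loss
print(mean_x, optimum)
```

With these toy losses the average of the node parameters approaches the minimizer of the average loss, while the individual nodes only stay close to it because the step size is constant.
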
  5. Background: What is a “good” network structure?

    ◼ Communication efficiency (i.e., training speed)
    ◼ Accuracy
    [Figure: ring, complete, and grid topologies.]
  6. Background: Communication Efficiency

    Communication is the main bottleneck of distributed learning.
    ◼ The communication cost is determined by the maximum degree.
    [Figure: ring (2), complete ($n - 1$), and grid (4); the number in brackets is the maximum degree.]
  8. Background: Consensus Rate

    ◼ How well connected the topology is matters.
    ◼ How fast information spreads across the nodes matters.
    [Figure: ring, complete, and grid topologies.]
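  9. Background: Consensus Rate

    Problem:
    ◼ There are $n$ nodes, and node $i$ has parameters $\boldsymbol{x}_i$.
    ◼ Let $W_{ij}$ be the edge weight ($W_{ij} > 0$ iff there exists an edge $(i, j)$ or $i = j$).
    ◼ Node $i$ updates $\boldsymbol{x}_i$ as $\boldsymbol{x}_i \leftarrow \sum_{j=1}^{n} W_{ij} \boldsymbol{x}_j$.
    Question:
    ◼ How fast does $\boldsymbol{x}_i$ reach $\bar{\boldsymbol{x}} := \frac{1}{n} \sum_{j=1}^{n} \boldsymbol{x}_j$?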
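  10. Background: Consensus Rate with $n$ Nodes

    | Topology   | Consensus rate $\beta \in [0, 1)$ (↓) |
    |------------|----------------------------------------|
    | Ring       | $1 - O(1/n^2)$                         |
    | Torus      | $1 - O(1/n)$                           |
    | Exp. Graph | $1 - O(1/\log_2 n)$                    |
    | Complete   | $0$                                    |

    [Figure: consensus error $\frac{1}{n} \sum_{i=1}^{n} \| \boldsymbol{x}_i^{(r)} - \bar{\boldsymbol{x}} \|^2$ for each topology.]
    ※ The number in the bracket is the maximum degree.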
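
To make the consensus question concrete, the sketch below repeats the averaging update on a ring and on a complete graph and tracks the consensus error $\frac{1}{n} \sum_i (x_i - \bar{x})^2$. It is only an illustration: scalar parameters, uniform weights (1/3 on the ring, $1/n$ on the complete graph), $n = 16$, and all function names are assumptions of this example.

```python
def mixing_round(x, W):
    """One averaging round: x_i <- sum_j W_ij * x_j."""
    n = len(x)
    return [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]


def consensus_error(x):
    """(1/n) * sum_i (x_i - mean)^2 for scalar parameters."""
    mean = sum(x) / len(x)
    return sum((xi - mean) ** 2 for xi in x) / len(x)


def ring_weights(n):
    """Ring with self-loops, all nonzero weights equal to 1/3."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] = 1.0 / 3.0
    return W


def complete_weights(n):
    """Complete graph with self-loops, all weights equal to 1/n."""
    return [[1.0 / n] * n for _ in range(n)]


n = 16
x0 = [float(i) for i in range(n)]
errors = {}
for name, W in (("ring", ring_weights(n)), ("complete", complete_weights(n))):
    x = list(x0)
    history = [consensus_error(x)]
    for _ in range(10):
        x = mixing_round(x, W)
        history.append(consensus_error(x))
    errors[name] = history
    print(name, history[1], history[10])
```

The complete graph reaches the average after a single round, at the price of a maximum degree of $n - 1$; the ring keeps the degree at 2 but its error shrinks only gradually.
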
  11. Background: Consensus Rate with $n$ Nodes

    A fast consensus rate (i.e., a small $\beta \in [0, 1)$) enables Decentralized SGD to achieve high accuracy and a fast convergence rate.
    Theorem (convergence rate): the parameters $\boldsymbol{x}_i$ generated by Decentralized SGD satisfy
    $$\frac{1}{R + 1} \sum_{r=0}^{R} \| \nabla f(\bar{\boldsymbol{x}}^{(r)}) \|^2 \le \epsilon$$
    after
    $$R = O\left( \frac{1}{n \epsilon^2} + \frac{1}{(1 - \beta)\, \epsilon^{3/2}} \right)$$
    iterations, where $\bar{\boldsymbol{x}} := \frac{1}{n} \sum_i \boldsymbol{x}_i$.
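
As a quick check of how the consensus rate enters this bound, one can substitute the consensus rates listed for each topology, treating $1 - \beta$ as being of the stated order. This is a sketch of the substitution only, not a derivation taken from the deck.

```latex
\begin{align*}
\text{Ring: } 1 - \beta \sim \tfrac{1}{n^2}
  &\;\Rightarrow\; R = O\!\left( \frac{1}{n\epsilon^2} + \frac{n^2}{\epsilon^{3/2}} \right) \\
\text{Torus: } 1 - \beta \sim \tfrac{1}{n}
  &\;\Rightarrow\; R = O\!\left( \frac{1}{n\epsilon^2} + \frac{n}{\epsilon^{3/2}} \right) \\
\text{Exp. Graph: } 1 - \beta \sim \tfrac{1}{\log_2 n}
  &\;\Rightarrow\; R = O\!\left( \frac{1}{n\epsilon^2} + \frac{\log_2 n}{\epsilon^{3/2}} \right)
\end{align*}
```

These orders agree with the ones tabulated later in the deck for the ring, torus, and exponential graph.
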
  12. Background: Consensus Rate

    ◼ High communication efficiency: small maximum degree.
    ◼ High accuracy / fast convergence rate: fast consensus rate.
    [Figure: ring, complete, and grid topologies, with the labels "High accuracy / Fast convergence rate" and "High communication efficiency".]
  13. Background: Contribution

    We propose the Base-(k+1) Graph, which enables Decentralized SGD to achieve a reasonable balance between communication efficiency and accuracy/convergence rate.
    ※ The number in the bracket is the maximum degree.
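  14. Proposed Method: Core Idea (Finite-Time Convergence)

    The existing topologies reach consensus asymptotically. The proposed topology, the Base-(k+1) Graph, reaches exact consensus in a finite number of rounds (finite-time convergence).
    [Figure: consensus error $\frac{1}{n} \sum_{i=1}^{n} \| \boldsymbol{x}_i^{(r)} - \bar{\boldsymbol{x}} \|^2$ for each topology.]
    ※ The number in the bracket is the maximum degree.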
  15. Proposed Method: Existing Finite-time Convergent Topologies

    | Topology          | Max Degree | #Nodes $n$                |
    |-------------------|------------|---------------------------|
    | 1-peer Hypercube  | 1          | A power of 2              |
    | 1-peer Exp. Graph | 1          | A power of 2              |
    | Base-(k+1) Graph  | $k$        | Arbitrary number of nodes |

    ◼ The 1-peer Hypercube cannot be constructed when $n$ is not a power of 2.
    ◼ The 1-peer Exp. Graph does not converge in finite time when $n$ is not a power of 2.
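  16. Existing Finite-Time Convergent Topologies: 1-peer Hypercube ($n = 2$)

    ◼ All edge weights are 0.5.

    |                   | Node 1                | Node 2                |
    |-------------------|-----------------------|-----------------------|
    | Initial parameter | $x_1$                 | $x_2$                 |
    | After round 1     | $\frac{x_1 + x_2}{2}$ | $\frac{x_1 + x_2}{2}$ |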
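  17. Existing Finite-Time Convergent Topologies: 1-peer Hypercube ($n = 4$)

    ◼ All edge weights are 0.5.

    |               | Node 1                            | Node 2                            | Node 3                            | Node 4                            |
    |---------------|-----------------------------------|-----------------------------------|-----------------------------------|-----------------------------------|
    | Init. value   | $x_1$                             | $x_2$                             | $x_3$                             | $x_4$                             |
    | After round 1 | $\frac{x_1 + x_2}{2}$             | $\frac{x_1 + x_2}{2}$             | $\frac{x_3 + x_4}{2}$             | $\frac{x_3 + x_4}{2}$             |
    | After round 2 | $\frac{x_1 + x_2 + x_3 + x_4}{4}$ | $\frac{x_1 + x_2 + x_3 + x_4}{4}$ | $\frac{x_1 + x_2 + x_3 + x_4}{4}$ | $\frac{x_1 + x_2 + x_3 + x_4}{4}$ |

    [Figure: two graphs on nodes 1 to 4, one per round.]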
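
The two-round averaging in the table can be reproduced with exact rational arithmetic. The sketch below assumes the usual hypercube pairing, where in round $t$ each node exchanges with the node whose 0-based index differs in bit $t$; the deck's figure may label the pairs differently, and the function names are hypothetical.

```python
from fractions import Fraction


def one_peer_hypercube_rounds(n):
    """Yield, for each round, a list mapping node index -> peer index (0-based)."""
    if n < 1 or n & (n - 1) != 0:
        raise ValueError("1-peer Hypercube needs n to be a power of 2")
    bit = 1
    while bit < n:
        # Pairing rule (an assumption of this sketch): flip one bit of the index.
        yield [i ^ bit for i in range(n)]
        bit <<= 1


def run(values):
    """Apply every round once and return the final value held by each node."""
    x = [Fraction(v) for v in values]
    for peers in one_peer_hypercube_rounds(len(x)):
        # Each node averages with its single peer; both weights are 0.5.
        x = [(x[i] + x[peers[i]]) / 2 for i in range(len(x))]
    return x


result = run([1, 2, 3, 6])
print([str(v) for v in result])
```

After $\log_2 n$ rounds every node holds the exact average, and each node talks to only one peer per round.
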
  20. Existing Finite-Time Convergent Topologies: 1-peer Hypercube ($n = 8$)

    ◼ All edge weights are 0.5.
    The 1-peer Hypercube converges in finite time when $n$ is a power of 2, but it cannot be constructed when $n$ is not a power of 2.
  21. Proposed Method: Base-2 Graph

    Next, we propose the Base-2 Graph:
    ◼ It converges in finite time for any $n$.
    ◼ Its maximum degree is 1.

    | Topology          | Max Degree | #Nodes $n$                |
    |-------------------|------------|---------------------------|
    | 1-peer Hypercube  | 1          | A power of 2              |
    | 1-peer Exp. Graph | 1          | A power of 2              |
    | Base-(k+1) Graph  | $k$        | Arbitrary number of nodes |
  22. Proposed Method: Core Idea of Simple Base-2 Graph

    The core idea is to split the set of nodes into disjoint subsets to which the 1-peer Hypercube is applicable.
    ◼ $n = 3 = 2 + 1$
    ◼ $n = 5 = 2^2 + 1$
    ◼ $n = 7 = 2^2 + 2 + 1$
    [Figure: the node sets for $n = 3$, $n = 5$, and $n = 7$, grouped into these subsets.]
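  24. Proposed Method: Simple Base-2 Graph with $n = 3$

    |               | Node 1                      | Node 2                       | Node 3                      |
    |---------------|-----------------------------|------------------------------|-----------------------------|
    | Init. value   | $x_1$                       | $x_2$                        | $x_3$                       |
    | After round 1 | $\frac{x_1 + x_2}{2}$       | $\frac{x_1 + x_2}{2}$        | $x_3$                       |
    | After round 2 | $\frac{x_1 + x_2}{2}$       | $\frac{x_1 + x_2 + 4x_3}{6}$ | $\frac{x_1 + x_2 + x_3}{3}$ |
    | After round 3 | $\frac{x_1 + x_2 + x_3}{3}$ | $\frac{x_1 + x_2 + x_3}{3}$  | $\frac{x_1 + x_2 + x_3}{3}$ |

    [Figure: three graphs on nodes 1 to 3, one per round; the edge-weight labels read 2/3, 1/3, and 1/3.]
    ※ Edge weights of 0.5 are omitted.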
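
The three rounds for $n = 3$ can be checked with exact fractions. In this sketch the mixing matrices are read off the tabulated values rather than from the authors' code: rounds 1 and 3 average nodes 1 and 2 with weight 1/2, and round 2 lets nodes 2 and 3 exchange with weights 2/3 (edge) and 1/3 (self-loop), which is the only choice consistent with the table. The initial values are arbitrary.

```python
from fractions import Fraction as F

# Mixing matrices for the three rounds (0-based rows and columns).
# The 2/3 and 1/3 weights in round 2 are inferred from the tabulated values.
half = F(1, 2)
W1 = [[half, half, 0], [half, half, 0], [0, 0, 1]]              # nodes 1 and 2 average
W2 = [[1, 0, 0], [0, F(1, 3), F(2, 3)], [0, F(2, 3), F(1, 3)]]  # nodes 2 and 3 exchange
W3 = W1                                                          # nodes 1 and 2 average again


def mix(W, x):
    """One round: x_i <- sum_j W_ij * x_j."""
    return [sum(W[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]


x = [F(3), F(9), F(30)]  # arbitrary initial values; their average is 14
history = [x]
for W in (W1, W2, W3):
    x = mix(W, x)
    history.append(x)

for r, row in enumerate(history):
    print(r, [str(v) for v in row])
```

Every matrix is doubly stochastic and every node has at most one neighbor per round, so the maximum degree stays at 1 while the exact average is reached after three rounds.
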
  28. Proposed Method: Simple Base-2 Graph with $n = 3$

    Same rounds and values as in item 24. After the third round, every node holds the average $\frac{x_1 + x_2 + x_3}{3}$.
  30. Proposed Method: Simple Base-2 Graph with $n = 5$

    [Figure: a sequence of graphs on nodes 1 to 5; the edge-weight labels read 4/5, 1/5, and 1/5.]
    ※ Edge weights of 0.5 are omitted.
  31. Proposed Method: Simple Base-2 Graph with $n = 5 = 2^2 + 1$

    [Figure: a sequence of graphs on nodes 1 to 5; the edge-weight labels read 4/5, 1/5, and 1/5.]
    ※ Edge weights of 0.5 are omitted.
  32. Proposed Method: Simple Base-2 Graph with $n = 5 = 2^2 + 1$

    [Figure: a sequence of graphs on nodes 1 to 5; the edge-weight labels read 4/5, 1/5, and 1/5.]
    After the last round, every node holds the average $\frac{x_1 + x_2 + x_3 + x_4 + x_5}{5}$.
  34. Proposed Method: Brief Summary

    We propose the Simple Base-2 Graph:
    ◼ Its maximum degree is only 1.
    ◼ It converges in finite time for any $n$.
    Next, we propose the Simple Base-(k+1) Graph:
    ◼ Its maximum degree is $k$.
    ◼ It converges in finite time for any $n$.
  35. Proposed Method: Review of the Core Idea of Simple Base-2 Graph

    The core idea is to split the set of nodes into disjoint subsets to which the 1-peer Hypercube is applicable.
    ◼ $n = 3 = 2 + 1$
    ◼ $n = 5 = 2^2 + 1$
    ◼ $n = 7 = 2^2 + 2 + 1$
    We need to extend the 1-peer Hypercube to the k-peer setting.
  36. Proposed Method: k-peer Hyper-hypercube

    The 1-peer Hypercube can be constructed when $n$ is a power of 2.
    ◼ $n$ is a power of 2 ⇔ no prime factor of $n$ is larger than 2.
    We propose the k-peer Hyper-hypercube, which can be constructed when no prime factor of $n$ is larger than $k + 1$.
    [Figure: the two graphs of the 1-peer Hypercube on nodes 1 to 4.]
  37. Proposed Method: k-peer Hyper-hypercube

    ◼ Case with $k = 2$ and $n = 6 = 2 \times 3$.
    ◼ Case with $k = 2$ and $n = 9 = 3 \times 3$.
    [Figure: the graphs for each case (self-loops are omitted); the edge-weight labels read 1/2, 1/3, and 1/3.]
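
One way to read the k-peer Hyper-hypercube is as a mixed-radix version of the hypercube: factor $n$ into factors of at most $k + 1$, and in each round let groups of $p$ nodes (for the current factor $p$) average uniformly with weight $1/p$. The sketch below follows that reading; the index-based grouping, the use of prime factors only, and the names `factorize` and `hyper_hypercube` are assumptions of this example and are not taken from the deck.

```python
from fractions import Fraction


def factorize(n, k):
    """Split n into prime factors, each at most k + 1; fail if a larger one is needed."""
    factors, p = [], 2
    while n > 1 and p <= k + 1:
        if n % p == 0:
            factors.append(p)
            n //= p
        else:
            p += 1
    if n != 1:
        raise ValueError("n has a prime factor larger than k + 1")
    return factors


def hyper_hypercube(values, k):
    """Average within groups of size p, one factor p per round (edge weight 1/p)."""
    x = [Fraction(v) for v in values]
    n, stride = len(x), 1
    for p in factorize(n, k):
        new_x = list(x)
        for i in range(n):
            digit = (i // stride) % p
            base = i - digit * stride
            # Group: the p nodes whose mixed-radix index differs from i only in this digit.
            group = [base + d * stride for d in range(p)]
            new_x[i] = sum(x[j] for j in group) / p
        x, stride = new_x, stride * p
    return x


six = hyper_hypercube([1, 2, 3, 4, 5, 9], k=2)   # n = 6 = 2 x 3
nine = hyper_hypercube(range(9), k=2)            # n = 9 = 3 x 3
print([str(v) for v in six], [str(v) for v in nine])
```

Each node has at most $p - 1 \le k$ neighbors per round, and after one round per factor all nodes hold the exact average. Using composite factors such as 4 when $k \ge 3$ would need fewer rounds; this sketch keeps to prime factors for simplicity.
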
  38. Proposed Method: Review of the Core Idea of Simple Base-2 Graph

    The core idea is to split the set of nodes into disjoint subsets to which the 1-peer Hypercube is applicable.
    ◼ $n = 3 = 2 + 1$
    ◼ $n = 5 = 2^2 + 1$
    ◼ $n = 7 = 2^2 + 2 + 1$
    These decompositions correspond to the binary representation (base-2 number) of $n$.
  39. Proposed Method: Core Idea of Simple Base-3 Graph

    The core idea is to split the set of nodes into disjoint subsets to which the 2-peer Hyper-hypercube is applicable.
    ◼ $n = 3$
    ◼ $n = 5 = 3 + 2$
    ◼ $n = 7 = 2 \times 3 + 1$
    These decompositions correspond to the base-3 representation of $n$.
    [Figure: the node sets for $n = 3$, $n = 5$, and $n = 7$, grouped into these subsets.]
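
The subset sizes in these examples are the nonzero terms of $n$ written in base $k + 1$. The helper below (a hypothetical name, written for this illustration) computes them for any $n$ and $k$; with $k = 1$ it gives the binary decomposition used by the Simple Base-2 Graph.

```python
def split_sizes(n, k):
    """Sizes of the disjoint subsets: the nonzero terms a * (k+1)**p of n in base k+1."""
    base, power, sizes = k + 1, 1, []
    while n > 0:
        digit = n % base
        if digit:
            sizes.append(digit * power)
        n //= base
        power *= base
    return sizes[::-1]  # largest subset first


for n in (3, 5, 7):
    print(n, split_sizes(n, k=1), split_sizes(n, k=2))
```

Each size has the form $a (k+1)^p$ with $a \le k$, so none of its prime factors exceeds $k + 1$, which is the condition under which the k-peer Hyper-hypercube can be constructed on that subset.
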
  40. Proposed Method: Simple Base-3 Graph with $n = 5$

    [Figure: a sequence of graphs on nodes 1 to 5.]
    ※ Self-loops are omitted.
  42. Proposed Method: Simple Base-3 Graph with $n = 5$

    [Figure: a sequence of graphs on nodes 1 to 5.]
    ※ Self-loops are omitted.
    After the last round, every node holds the average $\frac{x_1 + x_2 + x_3 + x_4 + x_5}{5}$.
  44. Proposed Method: Simple Base-(k+1) Graph vs Base-(k+1) Graph

    ◼ Using an additional technique, we can reduce the length (the number of graphs in the sequence) of the Simple Base-(k+1) Graph.
    ※ The number in the bracket is the maximum degree.
  45. Proposed Method: Experiments

    [Figure: consensus error $\frac{1}{n} \sum_{i=1}^{n} \| \boldsymbol{x}_i^{(r)} - \bar{\boldsymbol{x}} \|^2$ for each topology.]
    ※ The number in the bracket is the maximum degree.
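  46. Proposed Method: Decentralized SGD on Base-(k+1) Graph

    Let $\boldsymbol{W}^{(1)}, \cdots, \boldsymbol{W}^{(m)}$ be the adjacency matrices of the Base-(k+1) Graph. Node $i$ updates its parameter $\boldsymbol{x}_i$ as follows:
    $$\boldsymbol{x}_i^{(r+1)} = \sum_{j=1}^{n} W_{ij}^{(1 + \mathrm{mod}(r, m))} \left( \boldsymbol{x}_j^{(r)} - \eta \nabla f_j(\boldsymbol{x}_j^{(r)}) \right)$$
    [Figure: a sequence of three graphs on nodes 1 to 3.]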
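
The only change from DSGD on a fixed topology is that the mixing matrix cycles through the sequence, with round $r$ using matrix number $1 + \mathrm{mod}(r, m)$. A minimal sketch follows, assuming scalar quadratic losses and a three-matrix cycle whose weights match the $n = 3$ example earlier in the deck; the function name, step size, and loss choice are illustrative assumptions.

```python
def dsgd_time_varying(x, grad_fns, matrices, eta, rounds):
    """DSGD where round r mixes with matrices[r % m] (W^(1 + mod(r, m)) in 1-based terms)."""
    n, m = len(x), len(matrices)
    for r in range(rounds):
        W = matrices[r % m]
        # Gradient step on every node, then mixing with this round's matrix.
        stepped = [x[j] - eta * grad_fns[j](x[j]) for j in range(n)]
        x = [sum(W[i][j] * stepped[j] for j in range(n)) for i in range(n)]
    return x


# Three-node cycle with weights matching the n = 3 example (an assumption of this sketch).
W1 = [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
W2 = [[1.0, 0.0, 0.0], [0.0, 1 / 3, 2 / 3], [0.0, 2 / 3, 1 / 3]]
matrices = [W1, W2, W1]

# Toy losses f_j(x) = 0.5 * (x - b_j)^2 with gradient x - b_j.
targets = [0.0, 3.0, 9.0]
grad_fns = [lambda x, b=b: x - b for b in targets]

x = dsgd_time_varying([0.0, 0.0, 0.0], grad_fns, matrices, eta=0.05, rounds=600)
mean_x = sum(x) / len(x)
print(x, mean_x)
```

Because every matrix in the cycle is doubly stochastic, the average of the node parameters follows plain gradient descent on the average loss and approaches its minimizer, here the mean of the targets.
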
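  47. Proposed Method: Decentralized SGD on Base-(k+1) Graph

    | Topology                | Max Degree (↓) | Order of $R$ (↓)                                                           |
    |-------------------------|----------------|----------------------------------------------------------------------------|
    | Ring                    | 2              | $O\left(\frac{1}{n\epsilon^2} + \frac{n^2}{\epsilon^{3/2}}\right)$         |
    | Torus                   | 4              | $O\left(\frac{1}{n\epsilon^2} + \frac{n}{\epsilon^{3/2}}\right)$           |
    | Exp. Graph              | $\log_2 n$     | $O\left(\frac{1}{n\epsilon^2} + \frac{\log_2 n}{\epsilon^{3/2}}\right)$    |
    | Base-(k+1) Graph (ours) | $k$            | $O\left(\frac{1}{n\epsilon^2} + \frac{\log_{k+1} n}{\epsilon^{3/2}}\right)$ |

    DSGD satisfies $\frac{1}{R+1} \sum_{r=0}^{R} \| \nabla f(\bar{\boldsymbol{x}}^{(r)}) \|^2 \le \epsilon$ after $R$ iterations.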
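  48. Proposed Method: Decentralized SGD on Base-(k+1) Graph

    | Topology                  | Consensus Rate (↑)  | Max Degree (↓) | Convergence Rate (↓)                                                        |
    |---------------------------|---------------------|----------------|-----------------------------------------------------------------------------|
    | Exponential Graph         | $1 - O(1/\log_2 n)$ | $\log_2 n$     | $O\left(\frac{1}{n\epsilon^2} + \frac{\log_2 n}{\epsilon^{3/2}}\right)$     |
    | Base-$(k+1)$ Graph (ours) | N/A                 | $k$            | $O\left(\frac{1}{n\epsilon^2} + \frac{\log_{k+1} n}{\epsilon^{3/2}}\right)$ |

    Exp. Graph vs Base-2 Graph
    ◼ Same convergence rate and better communication efficiency.
    Exp. Graph vs Base-$(k+1)$ Graph with $2 \le k < \log_2 n$
    ◼ Faster convergence rate and better communication efficiency.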
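
For a sense of scale, the topology-dependent factors in these bounds can be evaluated for the $n = 25$ used in the deck's experiments. The snippet is only indicative: the $O(\cdot)$ notation hides constants, the exponential graph's degree is taken as $\log_2 n$ rounded up (an assumption), and `topology_terms` is a hypothetical helper.

```python
import math


def topology_terms(n, ks):
    """Return {k: log_{k+1}(n)}, the topology-dependent factor in the bound on R."""
    return {k: math.log(n) / math.log(k + 1) for k in ks}


n = 25  # number of nodes in the deck's experiments
exp_graph_degree = math.ceil(math.log2(n))  # assumption: log2(n) rounded up
terms = topology_terms(n, ks=(1, 2, 3, 4))

print(f"Exp. Graph: max degree {exp_graph_degree}, factor log2(n) = {math.log2(n):.2f}")
for k, term in terms.items():
    print(f"Base-{k + 1} Graph: max degree {k}, factor = {term:.2f}")
```

For $k = 1$ the factor equals $\log_2 n$, matching the exponential graph with a smaller degree, and for larger $k$ below $\log_2 n$ both the factor and the degree are smaller.
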
  49. Experiments

    Model: VGG
    Datasets: Fashion MNIST, CIFAR-10, CIFAR-100
    #Nodes: 25
    We conduct experiments in both i.i.d. and non-i.i.d. settings.
    ◼ Non-i.i.d. setting
    ◼ i.i.d. setting
  50. Experiments: Results in the non-i.i.d. setting

    ◼ $n = 25$
    ◼ The number in the bracket is the maximum degree.
  51. Experiments: Results in the i.i.d. setting

    ◼ $n = 25$
    ◼ The number in the bracket is the maximum degree.
  52. Experiments: Results on CIFAR-10 in the non-i.i.d. setting

    ◼ The Base-2 Graph outperforms the 1-peer Exp. Graph.
    ◼ The Base-{3,4,5} Graphs outperform the 1-peer Exp. Graph and the Exp. Graph.
  53. Experiments: Results with Other Decentralized Learning Methods

    ◼ The Base-2 Graph is comparable to the 1-peer exponential graph.
    ◼ The Base-5 Graph outperforms the exponential graph.
  54. Conclusion

    We propose the Base-(k+1) Graph:
    ◼ Finite-time convergence for any $n$ and $k$.
    ◼ Theoretically: faster convergence rate and lower communication cost than the exp. graph.
    ◼ Experimentally: a reasonable balance between accuracy and communication efficiency.