Heart of the SwarmKit: Topology Management

Heart of the SwarmKit: Topology Management Docker Distributed Systems Summit
10.07.2016 Andrea Luzzardi / [email protected] / @aluzzardi Docker Inc.

Push vs Pull Model

3 Push vs Pull Push Pull Manager Worker ZooKeeper 3
- Payload 1 - Register 2 - Discover Manager Worker Registration & Payload

4 Push vs Pull Push • Pros: Provides better control
over communication rate − Managers decide when to contact Workers • Cons: Requires a discovery mechanism − More failure scenarios − Harder to troubleshoot Pull • Pros: Simpler to operate − Workers connect to Managers and don’t need to bind − Can easily traverse networks − Easier to secure − Less moving parts • Cons: Workers must maintain connection to Managers at all times

5 Push vs Pull • SwarmKit adopted the Pull model
• Favored operational simplicity • Engineered solutions to provide rate control in pull mode

Rate Control Controlling communication rate in a Pull model

7 Rate Control: Heartbeats • Manager dictates heartbeat rate to
Workers • Rate is Configurable • Managers agree on same Rate by Consensus (Raft) • Managers add jitter so pings are spread over time (avoid bursts) Manager Worker Ping? Pong! Ping me back in 5.2 seconds

8 Rate Control: Workloads • Worker opens a gRPC stream
to receive workloads • Manager can send data whenever it wants to • Manager will send data in batches • Changes are buffered and sent in batches of 100 or every 100 ms, whichever occurs first • Adds little delay (at most 100ms) but drastically reduces amount of communication Manager Worker Give me work to do 100ms - [Batch of 12 ] 200ms - [Batch of 26 ] 300ms - [Batch of 32 ] 340ms - [Batch of 100] 360ms - [Batch of 100] 460ms - [Batch of 42 ] 560ms - [Batch of 23 ]

Replication Running multiple managers for high availability

10 Replication Manager Manager Manager Worker Leader Follower Follower •
Worker can connect to any Manager • Followers will forward traffic to the Leader

Followers multiplex all workers to the Leader using a single connection • Backed by gRPC channels (HTTP/2 streams) • Reduces Leader networking load by spreading the connections evenly Worker Worker Example: On a cluster with 10,000 workers and 5 managers, each will only have to handle about 2,000 connections. Each follower will forward its 2,000 workers using a single socket to the leader.

Upon Leader failure, a new one is elected • All managers start redirecting worker traffic to the new one • Transparent to workers Worker Worker

13 Replication Manager Manager Manager Worker Follower Follower Leader •
Upon Leader failure, a new one is elected • All managers start redirecting worker traffic to the new one • Transparent to workers Worker Worker

14 Replication Manager 3 Manager 1 Manager 2 Worker Leader
Follower Follower • Manager sends list of all managers’ addresses to Workers • When a new manager joins, all workers are notified • Upon manager failure, workers will reconnect to a different manager - Manager 1 Addr - Manager 2 Addr - Manager 3 Addr

Follower Follower • Manager sends list of all managers’ addresses to Workers • When a new manager joins, all workers are notified • Upon manager failure, workers will reconnect to a different manager

Follower Follower • Manager sends list of all managers’ addresses to Workers • When a new manager joins, all workers are notified • Upon manager failure, workers will reconnect to a different manager Reconnect to random manager

17 Replication • gRPC handles connection management − Exponential backoff,
reconnection jitter, … − Avoids flooding managers on failover − Connections evenly spread across Managers • Manager Weights − Allows Manager prioritization / de-prioritization − Gracefully remove Manager from rotation

Presence Scalable presence in a distributed environment

19 Presence • Leader commits Worker state (Up vs Down)
into Raft − Propagates to all managers − Recoverable in case of leader re-election • Heartbeat TTLs kept in Leader memory − Too expensive to store “last ping time” in Raft • Every ping would result in a quorum write − Leader keeps worker<->TTL in a heap (time.AfterFunc) − Upon leader failover workers are given a grace period to reconnect • Workers considered Unknown until they reconnect • If they do they move back to Up • If they don’t they move to Down

Andrea Luzzardi [email protected] / @aluzzardi

Heart of the SwarmKit: Topology Management

Heart of the SwarmKit: Topology Management

Andrea Luzzardi

Other Decks in Technology

Featured

Transcript