flopez
December 19, 2016

Gossip Protocols. Failure Detection, sharing state and other common tasks in Distributed systems.

In large distributed systems, knowing the state of the whole system is difficult, and it becomes harder as the number of nodes grows. There are too many nodes to communicate with, and the cost of many algorithms that solve the problem grows linearly with the number of nodes. The underlying network is a problem too: we can’t rely on hardware solutions such as multicast, as they usually aren’t available in the cloud. It’s also complex to maintain an up-to-date graph of nodes, and even to store the graph itself, in large systems.

Many distributed systems nowadays rely on Gossip protocols to share the state of the system among the nodes because they avoid these problems.

A Gossip protocol is a communication protocol, a way of multicasting messages inspired by epidemics, human gossip and social networks.

In this talk we’ll give an introduction to them and we'll show a live demo using flopezluis.github.io/gossip-simulator/

Transcript

  1. INTRODUCTION TO GOSSIP
    ABOUT @FLOPEZLUIS
    • Director of Engineering @ShuttleCloudEng
    • Co-organizer of PapersWeLoveMad and Distributed System Mad
    • Author of Mastering Python Regular Expressions
  2. THE PROBLEM
    ‣ Each node knows every other node
    ‣ Traditional master and slave
    ‣ Paxos or other consensus-based algorithms
    ‣ BitTorrent-based protocol, also a P2P approach
    ‣ Multicast
  3. WHAT’S GOSSIP USED FOR?
    ‣ Database replication
    ‣ Information dissemination
    ‣ Cluster membership
    ‣ Failure detectors
    ‣ Overlay networks
    ‣ Aggregations
  4. WHAT DO THEY HAVE IN COMMON?
    ‣ RIAK
    ‣ CASSANDRA
    ‣ DYNAMO
    ‣ CONSUL
    ‣ Amazon S3
    ‣ Docker Swarm
    ‣ ElasticSearch
    ‣ Hazelcast
    ‣ Redis Cluster
    ‣ AKKA
    ‣ Flume (Cloudera)
    ‣ Bitcoin
    ‣ Dynomite
    ‣ Tribler
    ‣ Comcast
    ‣ …
  5. 8 FALLACIES OF DISTRIBUTED COMPUTING
    ‣ 1. The network is reliable.
    ‣ 2. Latency is zero.
    ‣ 3. Bandwidth is infinite.
    ‣ 4. The network is secure.
    ‣ 5. Topology doesn't change.
    ‣ 6. There is one administrator.
    ‣ 7. Transport cost is zero.
    ‣ 8. The network is homogeneous.
  6. BROADCAST PROTOCOL
    A primary use of gossip is for information diffusion: some event occurs, and our goal is to spread the word [3]
  7. GOSSIP AND EPIDEMICS
    “Trying to squash a rumor is like trying to unring a bell.” ~ Shana Alexander
    “Anyone can start a rumor, but none can stop one.” ~ American proverb

  8. STRENGTHS OF GOSSIP
    ▸ Scalable
    ▸ Fault-tolerant
    ▸ Robust
    ▸ Convergent consistency
    ▸ Extremely decentralized form of information discovery
    ▸ Little code and complexity
  9. STRENGTHS OF GOSSIP
    ▸ Ability to operate in networks with irregular and unknown connectivity [3]
    ▸ Robust
    ▸ Convergent consistency. O(log(N))
    ▸ Gossip offers an extremely decentralized form of information discovery, and its latencies are often acceptable if the information won’t actually be used immediately. [3]
    ▸ Little code and complexity
  10. FORMAL DEFINITION
    There have been many attempts to formally define gossip, but there is no standard definition [13]
  11. FORMAL DEFINITION
    In general, gossip protocols have these properties [4] [13]:
    ‣ Node selection must be random, or at least guarantee enough peer diversity
    ‣ Only local information is available at all nodes
    ‣ Communication is round-based (periodic)
    ‣ Transmission and processing capacity per round is limited
    ‣ All nodes run the same protocol
  12. EPIDEMIC ALGORITHMS FOR REPLICATED DATABASE MAINTENANCE
    The paper “Epidemic Algorithms for Replicated Database Maintenance” [1] (1987) is considered to be seminal.
    See also: “On disseminating information reliably without broadcasting”, Proceedings of the International Conference on Distributed Computing Systems (1987), pp. 74–81 [14]
  13. EPIDEMIC ALGORITHMS FOR REPLICATED DATABASE MAINTENANCE
    ‣ They were trying to build a directory, a lookup database.
    ‣ The network was unreliable.
    ‣ The database was replicated at 300 nodes (or more).
    ‣ All servers accept updates.
    ‣ Each update is injected at a single site and propagated to all sites, or superseded by a later update.
    ‣ Replicas become consistent once no new updates arrive.
  14. EPIDEMIC ALGORITHMS FOR REPLICATED DATABASE MAINTENANCE
    The gossip protocol literature has adopted some terms from the epidemiology literature [1]:
    ▸ Infective. A node with an update it is willing to share.
    ▸ Susceptible. A node that has not received the update yet (it is not infected).
    ▸ Removed. A node that has already received the update but is not willing to share it.
  15. EPIDEMIC ALGORITHMS FOR REPLICATED DATABASE MAINTENANCE
    They analysed 3 methods for spreading the updates:
    ‣ Direct mail
    ‣ Anti-entropy
    ‣ Rumor mongering
  16. TYPES OF GOSSIP
    ▸ Anti-entropy (SI model). Simple epidemics. A node is always susceptible or infective.
    ▸ Rumor mongering (SIR model). Complex epidemics. A node can be susceptible, infective or removed.
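The SIR behaviour of rumor mongering can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the algorithm from [1] verbatim: rounds are synchronous, peers are chosen uniformly at random, and an infective node becomes removed with probability 1/k each time it contacts a peer that already has the update (the "feedback with a coin" variant). The function name `rumor_mongering` and its parameters are hypothetical.

```python
import random

SUSCEPTIBLE, INFECTIVE, REMOVED = 0, 1, 2

def rumor_mongering(n, k, seed=0):
    """Simulate push rumor mongering among n nodes.

    Returns the fraction of nodes still susceptible once no node is infective.
    Assumes k >= 1; an infective node loses interest with probability 1/k
    after each contact with a peer that already has the update.
    """
    rng = random.Random(seed)
    state = [SUSCEPTIBLE] * n
    state[0] = INFECTIVE              # the update is injected at a single node
    infective = [0]
    while infective:
        next_infective = []
        for node in infective:
            # Pick a uniformly random peer other than the node itself.
            peer = rng.randrange(n - 1)
            if peer >= node:
                peer += 1
            if state[peer] == SUSCEPTIBLE:
                state[peer] = INFECTIVE
                next_infective.append(peer)
                next_infective.append(node)
            elif rng.random() < 1.0 / k:
                state[node] = REMOVED      # unnecessary contact: lose interest
            else:
                next_infective.append(node)
        infective = next_infective
    return state.count(SUSCEPTIBLE) / n

if __name__ == "__main__":
    for k in (1, 2, 5):
        print(f"k={k}: residual susceptible fraction = {rumor_mongering(2000, k):.4f}")
```

Because nodes eventually stop sharing, some nodes may never receive the update; a larger k trades extra redundant messages for a smaller residue. Anti-entropy (SI) has no removed state, so it keeps exchanging until every node is infected.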
  17. MODELLING RUMOR SPREADING
    ‣ s: proportion of nodes that remain susceptible when gossip stops.
    ‣ k: average number of times a node sends the update to a peer that already has it.
  18. MODELLING RUMOR SPREADING
    $s = e^{-(k+1)(1-s)}$ [1]
    At k = 1 this formula suggests that 20% of the nodes will miss the update; at k = 2 only 6% will miss it; for k = 5, 0.25%.
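The percentages quoted for each k can be checked numerically. The sketch below assumes the relation $s = e^{-(k+1)(1-s)}$ from the rumor-mongering analysis in [1] and solves it by fixed-point iteration starting from s = 0, which converges to the non-trivial root (s = 1 is always a solution too). The function name `residual_susceptible` is a hypothetical helper.

```python
import math

def residual_susceptible(k, tolerance=1e-12, max_iterations=10_000):
    """Solve s = exp(-(k + 1) * (1 - s)) for the root below 1."""
    s = 0.0
    for _ in range(max_iterations):
        next_s = math.exp(-(k + 1) * (1.0 - s))
        if abs(next_s - s) < tolerance:
            return next_s
        s = next_s
    return s

for k in (1, 2, 5):
    print(f"k={k}: s = {residual_susceptible(k):.4f}")
```

The values come out at roughly 20%, 6% and 0.25% of nodes missing the update for k = 1, 2 and 5: the residue shrinks roughly exponentially as k grows.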
  19. STRATEGIES FOR SPREADING THE GOSSIP
    ▸ PUSH
      ▸ Infective nodes are the ones infecting susceptible nodes.
      ▸ Very efficient when there are few updates.
    ▸ PULL
      ▸ All nodes actively pull for updates.
      ▸ Very efficient when there are many updates.
    ▸ PUSH-PULL
      ▸ The node and the selected peer exchange their information.
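The three strategies can be compared with a toy simulation that counts how many rounds a single update needs to reach every node. It is only a sketch under simplifying assumptions: synchronous rounds, each node contacting one uniformly random peer per round, a reliable network and no node failures. The function `gossip_rounds` and its `mode` parameter are hypothetical names.

```python
import random

def gossip_rounds(n, mode="push", seed=0):
    """Count rounds until all n nodes have the update.

    mode is "push", "pull" or "push-pull".
    """
    rng = random.Random(seed)
    has_update = [False] * n
    has_update[0] = True                  # the update starts at one node
    rounds = 0
    while not all(has_update):
        snapshot = has_update[:]          # state at the start of the round
        for node in range(n):
            # Pick a uniformly random peer other than the node itself.
            peer = rng.randrange(n - 1)
            if peer >= node:
                peer += 1
            if mode in ("push", "push-pull") and snapshot[node]:
                has_update[peer] = True   # node pushes its update to the peer
            if mode in ("pull", "push-pull") and snapshot[peer]:
                has_update[node] = True   # node pulls the update from the peer
        rounds += 1
    return rounds

if __name__ == "__main__":
    for mode in ("push", "pull", "push-pull"):
        print(mode, gossip_rounds(10_000, mode))
```

In this model the number of rounds grows roughly with log(N) for all three strategies. With pure push, the infected set can at most double per round, so at least log2(N) rounds are always needed.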

  20. CAVEATS
    ▸ Not very efficient: a message can arrive at the same node several times.
    ▸ Too much bandwidth.
    ▸ Latency.
    ▸ The randomness inherent in many gossip protocols can make it hard to reproduce and debug unexpected problems that arise at runtime.
    ▸ Gossip protocols don’t scale well in some situations.
  21. REFERENCES
    ▸ [1] A. Demers, D. Greene, C. Hauser, W. Irish, J. Larson, S. Shenker, H. Sturgis, D. Swinehart, and D. Terry. “Epidemic Algorithms for Replicated Database Maintenance.” In Proc. Sixth Symp. on Principles of Distributed Computing, pp. 1–12, Aug. 1987. ACM.
    ▸ [2] Kermack, W. O.; McKendrick, A. G. (1927). "A Contribution to the Mathematical Theory of Epidemics". Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 115 (772)
    ▸ [3] Ken Birman. The Promise, and Limitations, of Gossip Protocols. SIGOPS Oper. Syst. Rev., 41(5):8–13, October 2007
    ▸ [4] Gossip-based Protocols for Large-scale Distributed Systems. Márk Jelasity, 2013
    ▸ [5] J. Leitão, J. Pereira, and L. Rodrigues. Epidemic broadcast trees. In Huai, J. and Baldoni, R. and Yen, I., editor, IEEE International Symposium On Reliable Distributed Systems, pages 301–310. IEEE Computer Society, 2007
    ▸ [6] Ali Saidi and Mojdeh Mohtashemi. Minimum-cost first-push-then-pull gossip algorithm. IEEE Wireless Communications and Networking Conference, WCNC, pages 2554–2559, 2012

  22. REFERENCES
    ▸ [7] Jelasity, M., Guerraoui, R., Kermarrec, A.-M., and van Steen, M. 2004. The peer sampling service: Experimental evaluation of unstructured gossip-based implementations. In Middleware 2004, H.-A. Jacobsen, Ed. Lecture Notes in Computer Science, vol. 3231. Springer-Verlag, 79–98.
    ▸ [8] http://status.aws.amazon.com/s3-20080720.html
    ▸ [9] http://docs.datastax.com/en/cassandra/3.0/cassandra/architecture/archGossipAbout.html
    ▸ [10] https://www.consul.io/docs/internals/gossip.html
    ▸ [11] A Gossip-Style Failure Detection Service: Robbert van Renesse, Yaron Minsky, and Mark Hayden; Dept. of Computer Science, Cornell University; 4118 Upson Hall, Ithaca, NY 14853
    ▸ [12] Gupta, Indranil, Chandra, Tushar D., and Goldszmidt, Germán S. On scalable and efficient distributed failure detectors. In Proceedings of the Twentieth Annual ACM Symposium on Principles of Distributed Computing, PODC ’01, pp. 170–179, New York, NY, USA, 2001. ACM. ISBN 1-58113-383-9. doi: 10.1145/383962.384010. URL http://doi.acm.org/10.1145/383962.384010
  23. REFERENCES
    ▸ [13] Montresor, A.: Intelligent Gossip. In: Studies on Computational Intelligence, Intelligent Distributed Computing, Systems and Applications, Springer, Heidelberg (2008)
    ▸ [14] On disseminating information reliably without broadcasting. Proceedings of the International Conference on Distributed Computing Systems (1987), pp. 74–81
    ▸ [15] Brenda Baker and Robert Shostak. Gossips and telephones. Discrete Mathematics, 2(3):191–193, June 1972.
    ▸ [16] http://www.inf.u-szeged.hu/~jelasity/ddm/gossip.pdf
    ▸ [17] Kermarrec, Anne-Marie, and Steen, Maarten Van, “Gossiping in distributed systems”, ACM SIGOPS Operating Systems Review, Volume 41, Issue 5, Pages: 2–7, 2007.
    ▸ [18] S. Voulgaris, M. Jelasity, M. van Steen, A Robust and Scalable Peer-to-Peer Gossiping Protocol, Lecture Notes in Computer Science (LNCS), vol. 2872 (Springer, Berlin/Heidelberg, 2004), pp. 47–58. doi:10.1007/b104265