Axon Server went RAFTing

The RAFT protocol is a well-known protocol for achieving consensus in distributed systems. Want to learn how consensus is achieved in a system holding as much data as Axon Server’s Event Store? Join this talk to hear the specifics of data replication in a highly available Event Store!

Axon is a free and open-source Java framework for building applications that follow DDD, event sourcing, and CQRS principles. While especially useful in a microservices context, Axon also provides great value in building structured monoliths that can be broken down into microservices when needed.

Axon Server is a messaging platform built specifically to support distributed Axon applications. One of its key benefits is storing the events published by Axon applications, and it is not unusual for these events to number in the millions or even billions. The availability of Axon Server plays a significant role in the product portfolio. To keep event replication reliable, we chose the RAFT protocol as the consensus implementation for our clustering features.

In short, consensus involves multiple servers agreeing on values. Once they reach a decision on a value, that decision is final. Typical consensus algorithms make progress when any majority of their servers is available; for example, a cluster of 5 servers can continue to operate even if 2 servers fail. If more servers fail, they stop making progress (but will never return an incorrect result).
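As a minimal sketch of the majority rule above (the class and method names are hypothetical and not taken from Axon Server), the quorum size and the number of tolerated failures follow directly from the cluster size:

```java
public final class Quorum {

    private Quorum() {
    }

    /** Smallest number of servers that forms a majority of a cluster of the given size. */
    static int majority(int clusterSize) {
        if (clusterSize < 1) {
            throw new IllegalArgumentException("cluster size must be positive");
        }
        return clusterSize / 2 + 1;
    }

    /** Number of servers that may fail while a majority is still available. */
    static int toleratedFailures(int clusterSize) {
        return clusterSize - majority(clusterSize);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 6; n++) {
            System.out.printf("cluster=%d majority=%d tolerated failures=%d%n",
                    n, majority(n), toleratedFailures(n));
        }
    }
}
```

With five servers the majority is three, so two servers may fail, matching the example above. A six-server cluster still tolerates only two failures, which is one reason odd cluster sizes are common.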

Join this talk to learn why we chose RAFT, what we found during the design, implementation, and testing phases, and what it means to replicate an event store holding billions of events!

m1l4n54v1c

May 17, 2019

Transcript

  1. Event Store • An Event Store stores the published events so that they can be retrieved both by consumers and by the publishing component itself. @MilanSavic14
  2. Election Safety (1/5) • At most one leader can be elected in a given term
  3. Leader Append-Only (2/5) • A leader never overwrites or deletes entries in its log; it only appends new entries
  4. Log Matching (3/5) • If two logs contain an entry with the same index and term, then the logs are identical in all entries up through the given index
  5. Leader Completeness (4/5) • If a log entry is committed in a given term, then that entry will be present in the logs of the leaders for all higher-numbered terms
  6. State Machine Safety (5/5) • If a server has applied a log entry at a given index to its state machine, no other server will ever apply a different log entry for the same index
  7. Testing • Environments: development, build, cloud • Test types: unit tests, integration tests, E2E tests, performance tests, chaos experiments
  8. Lessons Learned • Start as simple as possible • Don’t deviate from the RAFT paper • Timing is important • Large state makes Install Snapshot more difficult • Test, test, test
  9. Resources • https://raft.github.io/ • https://axoniq.io/ • https://bit.ly/2syZ1f1 • Diego Ongaro and John Ousterhout, "In Search of an Understandable Consensus Algorithm (Extended Version)", Stanford University • Diego Ongaro, "Consensus: Bridging Theory and Practice" • http://gousios.org/courses/bigdata/book/introduction-to-distributed-systems.html
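Slide 1 describes the Event Store only conceptually. A minimal, in-memory Java sketch of that idea might look as follows; the class name, the String event type, and the use of list positions as read tokens are all assumptions for illustration, not how Axon Server stores or replicates events:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public final class InMemoryEventStore {

    // Events are kept in publication order; the list index acts as the event's position.
    private final List<String> events = new ArrayList<>();

    /** Appends an event and returns the position it was stored at. */
    public synchronized long append(String event) {
        events.add(event);
        return events.size() - 1;
    }

    /** Returns a snapshot of all events from the given position (inclusive) onward. */
    public synchronized List<String> readFrom(long position) {
        if (position < 0 || position > events.size()) {
            throw new IndexOutOfBoundsException("position " + position);
        }
        List<String> copy = new ArrayList<>(events.subList((int) position, events.size()));
        return Collections.unmodifiableList(copy);
    }

    public static void main(String[] args) {
        InMemoryEventStore store = new InMemoryEventStore();
        store.append("OrderPlaced");
        store.append("OrderShipped");
        // A consumer that has already processed position 0 continues from position 1.
        System.out.println(store.readFrom(1)); // prints [OrderShipped]
    }
}
```

The same read path serves both an external consumer and the publishing component replaying its own events; only the starting position differs.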
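Election Safety (slide 2) follows from each server granting at most one vote per term: a leader needs votes from a majority, and any two majorities share at least one server. The RAFT paper also lets a server refuse its vote to a candidate whose log is less up-to-date than its own, which is the election restriction behind Leader Completeness (slide 5). A minimal Java sketch of the RequestVote receiver, with hypothetical names and with persistence omitted (a real implementation must durably store currentTerm and votedFor before replying):

```java
public final class VoteHandler {

    private long currentTerm;
    private String votedFor; // candidate voted for in currentTerm, or null if no vote was cast
    private final long lastLogIndex;
    private final long lastLogTerm;

    public VoteHandler(long currentTerm, long lastLogIndex, long lastLogTerm) {
        this.currentTerm = currentTerm;
        this.lastLogIndex = lastLogIndex;
        this.lastLogTerm = lastLogTerm;
    }

    /** Returns true if this server grants its vote to the candidate. */
    public synchronized boolean handleRequestVote(long candidateTerm, String candidateId,
                                                  long candidateLastLogIndex, long candidateLastLogTerm) {
        if (candidateTerm < currentTerm) {
            return false; // candidate is from a stale term
        }
        if (candidateTerm > currentTerm) {
            currentTerm = candidateTerm; // newer term: the vote from the old term no longer counts
            votedFor = null;
        }
        boolean canVote = votedFor == null || votedFor.equals(candidateId);
        if (canVote && isAtLeastAsUpToDate(candidateLastLogIndex, candidateLastLogTerm)) {
            votedFor = candidateId; // at most one candidate per term
            return true;
        }
        return false;
    }

    /** Compare last terms first; only if they are equal, the longer log wins. */
    private boolean isAtLeastAsUpToDate(long otherLastIndex, long otherLastTerm) {
        if (otherLastTerm != lastLogTerm) {
            return otherLastTerm > lastLogTerm;
        }
        return otherLastIndex >= lastLogIndex;
    }

    public static void main(String[] args) {
        VoteHandler server = new VoteHandler(3, 10, 3);
        System.out.println(server.handleRequestVote(4, "A", 10, 3)); // true
        System.out.println(server.handleRequestVote(4, "B", 12, 3)); // false: already voted for A in term 4
        System.out.println(server.handleRequestVote(5, "C", 9, 3));  // false: C's log is shorter
    }
}
```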
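Log Matching (slide 4) is enforced by the consistency check in AppendEntries: a follower accepts new entries only if its log contains an entry at prevLogIndex with term prevLogTerm, and it deletes a conflicting entry together with everything after it. Only followers truncate; per Leader Append-Only (slide 3), a leader never rewrites its own log. The sketch below is a hypothetical, in-memory follower log (1-based indices, String commands) and is not Axon Server code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public final class FollowerLog {

    /** A log entry; its index is implied by its position in the log. */
    public static final class Entry {
        final long term;
        final String command;

        public Entry(long term, String command) {
            this.term = term;
            this.command = command;
        }
    }

    // RAFT indices start at 1: entries.get(i - 1) holds index i; index 0 is an empty sentinel.
    private final List<Entry> entries = new ArrayList<>();

    public int lastIndex() {
        return entries.size();
    }

    public long termAt(int index) {
        return index == 0 ? 0 : entries.get(index - 1).term;
    }

    /** Follower side of AppendEntries. Returns false if the consistency check fails. */
    public boolean appendEntries(int prevLogIndex, long prevLogTerm, List<Entry> newEntries) {
        if (prevLogIndex > lastIndex() || termAt(prevLogIndex) != prevLogTerm) {
            return false; // the leader retries with an earlier prevLogIndex
        }
        int index = prevLogIndex;
        for (Entry entry : newEntries) {
            index++;
            if (index <= lastIndex()) {
                if (termAt(index) == entry.term) {
                    continue; // same index and term: entry is already present
                }
                // Conflict: delete this entry and every entry that follows it.
                entries.subList(index - 1, entries.size()).clear();
            }
            entries.add(entry);
        }
        return true;
    }

    public static void main(String[] args) {
        FollowerLog log = new FollowerLog();
        log.appendEntries(0, 0, Arrays.asList(new Entry(1, "x=1"), new Entry(1, "x=2")));
        log.appendEntries(2, 1, Arrays.asList(new Entry(2, "x=3"))); // uncommitted entry from a term-2 leader
        boolean ok = log.appendEntries(2, 1, Arrays.asList(new Entry(3, "x=4"))); // term-3 leader replaces it
        System.out.println(ok + " " + log.lastIndex() + " " + log.termAt(3)); // true 3 3
        System.out.println(log.appendEntries(5, 3, Collections.emptyList())); // false: no entry at index 5
    }
}
```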
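State Machine Safety (slide 6) relies on servers applying only committed entries, strictly in index order. In RAFT, a leader counts an entry as committed once it is stored on a majority of servers and belongs to the leader's current term; earlier entries are then committed indirectly. The following hypothetical Java sketch computes a leader's new commit index from its matchIndex values and applies the newly committed entries; the array layouts and names are assumptions:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public final class CommitIndexCalculator {

    private CommitIndexCalculator() {
    }

    /**
     * @param matchIndex  highest replicated index per server, including the leader's own last index
     * @param logTerms    logTerms[i] is the term of the entry at index i (logTerms[0] is unused)
     */
    static int advanceCommitIndex(int[] matchIndex, long[] logTerms, int commitIndex, long currentTerm) {
        int[] sorted = matchIndex.clone();
        Arrays.sort(sorted);
        // After an ascending sort, this value is reached or exceeded by a majority of servers.
        int majorityMatch = sorted[(sorted.length - 1) / 2];
        // Commit by counting replicas only for entries from the leader's current term.
        for (int n = majorityMatch; n > commitIndex; n--) {
            if (logTerms[n] == currentTerm) {
                return n;
            }
        }
        return commitIndex;
    }

    /** Applies entries lastApplied+1..commitIndex in index order; returns the new lastApplied. */
    static int applyUpTo(String[] commands, int lastApplied, int commitIndex, List<String> stateMachine) {
        for (int i = lastApplied + 1; i <= commitIndex; i++) {
            stateMachine.add(commands[i]);
        }
        return Math.max(lastApplied, commitIndex);
    }

    public static void main(String[] args) {
        long[] terms = {0, 1, 1, 2, 2};  // entries 1..4
        int[] match = {4, 4, 3, 1, 0};   // leader first, then four followers
        System.out.println(advanceCommitIndex(match, terms, 1, 2)); // 3
        System.out.println(advanceCommitIndex(match, terms, 1, 3)); // 1: nothing from term 3 yet
        List<String> applied = new ArrayList<>();
        applyUpTo(new String[] {null, "a", "b", "c", "d"}, 1, 3, applied);
        System.out.println(applied); // [b, c]
    }
}
```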
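One concrete reason timing matters (slide 8) is leader election: the RAFT paper uses randomized election timeouts so that followers rarely time out at the same moment and split the vote, and it gives 150-300 ms as an example range. A minimal sketch of drawing such a timeout; the bounds are that example range, not Axon Server's configuration:

```java
import java.util.concurrent.ThreadLocalRandom;

public final class ElectionTimeout {

    // Example range from the RAFT paper; real deployments tune it to their network latency.
    static final long MIN_MILLIS = 150;
    static final long MAX_MILLIS = 300;

    private ElectionTimeout() {
    }

    /** Draws a fresh timeout uniformly from [MIN_MILLIS, MAX_MILLIS]; call again on every reset. */
    static long next() {
        return ThreadLocalRandom.current().nextLong(MIN_MILLIS, MAX_MILLIS + 1);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            System.out.println("election timeout: " + next() + " ms");
        }
    }
}
```

Drawing a new value on every reset, rather than once at startup, keeps two servers from repeatedly colliding if they happen to pick similar timeouts.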