
Central Dogma - Highly-available version-controlled service configuration repository

This session introduces Central Dogma's multi-master architecture and disaster recovery strategy, along with a brief overview of its core features.

Ikhoon Eom

March 23, 2022

Transcript

  1. Central Dogma is… • Repository service for textual configuration ◦ Primarily JSON ◦ YAML, XML, INI, JavaScript, … • Highly available • Version controlled • Advanced query mechanism • Change notification • Fine-grained access control • Mirroring from an external Git repository
  9. High availability (HA) is a characteristic of a system which

    aims to ensure an agreed level of operational performance, usually uptime, for a higher than normal period. https://en.wikipedia.org/wiki/High_availability
  10. Single point of failure • A single point can fail someday due to: hardware, network, power, human mistake, …
    [Diagram: clients connected to a single Master node, the single point of failure]
  11. Master-slave architecture • Write to the master • Read from the slaves
    [Diagram: a Master replicating to Slave1, Slave2 and Slave3; clients write to and read from the master, and read from the slaves]
  12. Single point of failure - the master node
    [Diagram: the same master-slave topology; the master node is the single point of failure]
  13. Single point of failure - the master node
    [Diagram: the master node breaks down 💥]
    If the master breaks down, write operations cannot be performed.
  14. Promote a slave into master
    [Diagram: the stopped master is replaced by Slave1, which is promoted to the new master; Slave2 and Slave3 now replicate from it]
    Clients must wait for a new master to be promoted - and which node is the master now?
  15. Multi-master architecture • All nodes in a Central Dogma cluster are masters. • Clients can write data to any node. • Clients can read data from any node.
    [Diagram: Master 1-5 replicating with one another; clients write/read against any of them]
  16. No single point of failure • Even if a master node breaks down, clients can continue read and write operations with zero downtime.
    [Diagram: Master 1-5; one master is down 󰢂, clients keep writing/reading against the remaining masters]
  17. Challenging issues in multi-master • Data added to one master must be applied to all other master nodes. • What if the same file is updated on different servers at the same time with different contents?
    [Diagram 1: a client updates A to A' on Master 1, and replication applies A → A' on Master 2 and Master 3]
    [Diagram 2: one client writes A → B on Master 1 while another writes A → C on Master 2; replication produces a conflict 💥, which must be resolved by a consensus algorithm]
  18. Central Dogma ❤ ZooKeeper • Apache ZooKeeper is an open-source server for distributed coordination. • Apache Curator provides a variety of recipes on top of ZooKeeper that Central Dogma needs. • ZooKeeper is a server, so we would have to build a separate cluster. • However, an external component could become another single point of failure. • Central Dogma should guarantee high availability.
  19. Embedded ZooKeeper • Launch ZooKeeper inside Central Dogma. • The embedded ZooKeeper instances create an ensemble.
    [Diagram: multiple Central Dogma nodes, each running an embedded ZooKeeper, forming one ensemble]
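    A minimal sketch, assuming a standard ZooKeeper setup, of what launching an embedded ZooKeeper quorum member inside the same JVM can look like; QuorumPeerMain is ZooKeeper's regular entry point, and the configuration path and thread handling here are illustrative assumptions, not Central Dogma's actual bootstrap code.

      import org.apache.zookeeper.server.quorum.QuorumPeerMain;

      public class EmbeddedZooKeeperSketch {
          public static void main(String[] args) {
              // Run the ZooKeeper quorum peer in a background thread of the same JVM.
              // "conf/zoo.cfg" (hypothetical path) lists this node and its peers so that
              // the embedded instances form an ensemble.
              Thread zk = new Thread(
                      () -> QuorumPeerMain.main(new String[] { "conf/zoo.cfg" }),
                      "embedded-zookeeper");
              zk.setDaemon(true);
              zk.start();

              // ... start the Central Dogma server itself here ...
          }
      }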
  20. Synchronize new changes with ZooKeeper • Store replication logs to a znode of ZooKeeper. • Other nodes read the new replication logs from ZooKeeper and apply them to the old data in their local storage.
    [Diagram: each Central Dogma node has an embedded ZooKeeper and local storage; replication logs written by one node flow through ZooKeeper to the others]
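    A minimal sketch of moving a replication log entry through a znode with Apache Curator; the znode path and payload are hypothetical and only illustrate the mechanism described above.

      import org.apache.curator.framework.CuratorFramework;
      import org.apache.curator.framework.CuratorFrameworkFactory;
      import org.apache.curator.retry.ExponentialBackoffRetry;
      import java.nio.charset.StandardCharsets;

      public class ReplicationLogSketch {
          public static void main(String[] args) throws Exception {
              CuratorFramework client = CuratorFrameworkFactory.newClient(
                      "zk1:2181,zk2:2181,zk3:2181", new ExponentialBackoffRetry(1000, 3));
              client.start();

              // Writer side: append a replication log entry under a (hypothetical) log path.
              byte[] log = "{\"op\":\"push\",\"repo\":\"my_repository\"}"
                      .getBytes(StandardCharsets.UTF_8);
              client.create().creatingParentsIfNeeded().forPath("/dogma/logs/0000000001", log);

              // Reader side: another node fetches the new entry and applies it to its local storage.
              byte[] fetched = client.getData().forPath("/dogma/logs/0000000001");
              System.out.println(new String(fetched, StandardCharsets.UTF_8));

              client.close();
          }
      }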
  21. Distributed locks • Acquire a global lock before committing data to avoid concurrent modification.
    [Diagram: 1. A client acquires a lock for file A and updates A to B. 2. A concurrent request on another master waits to acquire the lock for A. 3. Once it gets the lock, the late request is aborted due to the conflict and an error is returned 💥]
    https://curator.apache.org/curator-recipes/shared-reentrant-lock.html
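    A minimal sketch using the Curator shared reentrant lock recipe linked above; the lock path and the commented-out commit step are hypothetical placeholders, not Central Dogma's internal code.

      import org.apache.curator.framework.CuratorFramework;
      import org.apache.curator.framework.recipes.locks.InterProcessMutex;

      class LockedCommitSketch {
          // 'client' is a started CuratorFramework (see the previous sketch).
          static void commitWithLock(CuratorFramework client) throws Exception {
              // One lock path per file keeps concurrent commits to the same file mutually exclusive.
              InterProcessMutex lock = new InterProcessMutex(client, "/dogma/locks/settings.json");
              lock.acquire();
              try {
                  // Hypothetical commit step: verify the base revision and apply the change;
                  // if the file already changed underneath us, abort and return a conflict error.
                  // applyChange("settings.json", change);
              } finally {
                  lock.release();
              }
          }
      }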
  22. Disaster recovery • Consensus requires the vote of a majority. • Writes stop when a majority of replicas are stopped; for example, a 5-node cluster needs 3 votes, so it survives 2 failed nodes but not 3. • An entire cluster in a data center could break down.
    [Diagram: a 5-master cluster with replication between all nodes]
  23. Multi-data center consistency
    [Diagram: three 5-master clusters in the Tokyo, Osaka and Taiwan data centers, all replicating with one another]
    Consensus slows down as the number of replicas increases: a flat quorum now requires 8 votes out of 15 replicas.
  24. Hierarchical quorum
    [Diagram: the same 15 masters, grouped by data center - Tokyo, Osaka and Taiwan]
    A hierarchical quorum requires only 6 votes for 15 replicas: a majority of groups (2 of 3 data centers), each contributing a majority of its own members (3 of 5).
    https://zookeeper.apache.org/doc/r3.6.0/zookeeperHierarchicalQuorums.html
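    A sketch of the hierarchical quorum configuration described in the linked ZooKeeper document, applied to this topology; the server IDs 1-15 and their mapping to data centers are assumptions for illustration.

      # zoo.cfg (excerpt): one voting group per data center
      # group.1 = Tokyo DC, group.2 = Osaka DC, group.3 = Taiwan DC
      group.1=1:2:3:4:5
      group.2=6:7:8:9:10
      group.3=11:12:13:14:15

      # every server carries the same voting weight
      weight.1=1
      weight.2=1
      # ... and so on through ...
      weight.15=1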
  25. Disaster recovery
    [Diagram: one data center's cluster breaks down; clients continue to write and read against the masters in the remaining data centers]
  26. Storage • All data MUST be stored safely. • External storage could be a single point of failure and would make the overall system hard to orchestrate in an urgent situation.
  27. JGit as a storage engine • All data are stored in Git. ◦ History - diffs and authors ◦ Bigger than RAM ◦ Version-controlled
    [Diagram: Central Dogma storing its Git repositories through JGit]
    https://www.eclipse.org/jgit/
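    A minimal sketch of committing a configuration file with the JGit API linked above; the repository path, file name and author are hypothetical, shown only to illustrate how a commit-per-change storage engine can work.

      import org.eclipse.jgit.api.Git;
      import java.nio.file.Files;
      import java.nio.file.Path;

      public class JGitStorageSketch {
          public static void main(String[] args) throws Exception {
              Path repoDir = Path.of("/tmp/my_repository");  // hypothetical location
              Files.createDirectories(repoDir);

              try (Git git = Git.init().setDirectory(repoDir.toFile()).call()) {
                  // Every change to a configuration file becomes a Git commit,
                  // so history (diffs and authors) comes for free.
                  Files.writeString(repoDir.resolve("settings.json"), "{ \"foo\": \"bar\" }");
                  git.add().addFilepattern("settings.json").call();
                  git.commit()
                     .setAuthor("ikhoon", "ikhoon@example.com")
                     .setMessage("Update settings.json")
                     .call();
              }
          }
      }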
  28. Focus on simplicity • Integer revision numbers ◦ Easy to tell how revisions relate to each other • Linear history - no branches
  29. Advanced query mechanism • … thanks to the first-class JSON support • JSON path
      $.store.book[*].author
      $.store.book[?(@.price < 10)]
      $..book[?(@.author =~ /.*RESS/i)]
    • JSON patch - RFC 6902
      [{ "op": "remove",  "path": "/a/b/c" },
       { "op": "add",     "path": "/a/b/c", "value": ["foo", "bar"] },
       { "op": "replace", "path": "/a/b/c", "value": 42 }]
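    The JSON path expressions above follow standard JSONPath semantics; a tiny standalone illustration of what they select, using the Jayway json-path library (an assumption made only for this example, not necessarily what Central Dogma uses internally):

      import com.jayway.jsonpath.JsonPath;
      import java.util.List;

      public class JsonPathSketch {
          public static void main(String[] args) {
              String json = "{ \"store\": { \"book\": ["
                      + "{ \"author\": \"Nigel Rees\",   \"price\": 8.95 },"
                      + "{ \"author\": \"Evelyn Waugh\", \"price\": 12.99 } ] } }";

              // All authors in the store.
              List<String> authors = JsonPath.read(json, "$.store.book[*].author");

              // Only the books cheaper than 10.
              List<Object> cheapBooks = JsonPath.read(json, "$.store.book[?(@.price < 10)]");

              System.out.println(authors);     // -> both authors
              System.out.println(cheapBooks);  // -> only the first book
          }
      }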
  30. Configuration should be dynamically changeable • What’s fetched at start-time

    ◦ Application parameters ◦ Bean properties • What’s updated at run-time ◦ API rate limit ◦ Scheduled maintenance notice ◦ Roll-out & A/B experiment parameters
  31. Change notification - clients get notified on a new commit.

      CentralDogma dogma = new ArmeriaCentralDogmaBuilder().host("example.com").build();
      CentralDogmaRepository repo = dogma.forRepo("my_project", "my_repository");
      Watcher<JsonNode> watcher = repo.watcher(Query.ofJsonPath("settings.json", "$.foo"))
                                      .start();
      watcher.watch((revision, json) -> {
          System.out.println("Foo has been updated to " + json + " at revision " + revision);
      });

    https://line.github.io/centraldogma/client-java.html
  32. Fine-grained access control • Apache Shiro as the authentication layer • Four roles ◦ Administrator, Owner, Member, Guest • In a repository, read and write permissions can be set based on: ◦ roles, users and tokens • Application token ◦ Represents a virtual user
  33. Mirroring from an external Git repository • Keep your settings

    in a GitHub / GitLab repository • Send a pull request to modify the configuration • Get it reviewed and merged • Let your services read from Central Dogma ◦ Queryable ◦ Watchable ◦ Highly-available ◦ Accessible from the same network
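    A sketch, under the assumption that mirror settings live in a /mirrors.json file of the project's meta repository, of what a remote-to-local mirror entry can look like; the repository names, remote URI and cron schedule are hypothetical, and the authoritative format is documented at https://line.github.io/centraldogma/mirroring.html.

      [
        {
          "type": "single",
          "direction": "REMOTE_TO_LOCAL",
          "localRepo": "my_repository",
          "localPath": "/",
          "remoteUri": "git+ssh://git.example.com/my-settings.git#main",
          "schedule": "0 * * * * ?"
        }
      ]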