High Availability at Backlog Git

@ Backlog Meetup in Hanoi!

Backlog:https://backlog.com/ja/
GitHub:https://github.com/vvatanabe
X:https://twitter.com/vvvatanabe

vvatanabe

March 27, 2024

Transcript

  1. High Availability at Backlog Git

    Yuichi Watanabe, Nulab Inc., Service Development Division, Backlog Section
    Backlog Meetup in Hanoi
    Copyright Nulab Inc. All Rights Reserved.
  2. 渡邉 祐一 Yuichi Watanabe (Nulab Inc.)

    I'm a big fan of the Go language and a passionate advocate for open-source culture. Professionally, I develop and operate Git hosting services. When I'm not working, I love crafting small, useful libraries and command-line tools. One of the things I truly enjoy is collaborating with developers from around the globe on GitHub to create unique pieces of software.
    https://github.com/vvatanabe
  3. Backlog Git manages code and commit history linked to projects and issues.

    ・Setting Up Private Repositories: Allows for easy sharing of work with the team by linking source code to projects.
    ・Commenting on Pull Requests: Notify team members engaged in pair programming by commenting on differences between branches directly on the source code.
    ・Source Code Review: Facilitates team source code reviews. Differences between files are displayed in different colors, clearly indicating who made changes where, for swift merge decisions.
    ・Checking Commit History: Commit history can be reviewed by branch. Browsing history through the file browser allows viewing of only the history related to a file.
  4. What is a Git repository?

    A datastore that includes all data of a project along with its change history. Composed primarily of four types of objects:
    ・Blob object: Stores the content of a file, but not the file name or directory structure.
    ・Tree object: Represents directories and the files or other directories within them, holding pointers to Blobs and child Trees to express the directory structure.
    ・Commit object: A snapshot of the project at a certain point in time, including pointers to the Tree at the time of the commit, parent commits, commit message, and information about the author and committer.
    ・Tag object: Labels used to reference specific commits, mainly for indicating release versions.
    [Diagram: a commit object pointing to tree objects and blob objects, referenced by a branch, a tag, and a pull request.]
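    To make the blob description concrete, here is a minimal Go sketch of how Git derives a blob's object ID. It assumes a repository using the default SHA-1 object format; the function name `blobID` is ours, not something from the talk. The ID is computed from a short header plus the file content only, which is why a blob carries no file name or directory information.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// blobID returns the object ID Git assigns to a blob in a SHA-1 repository:
// the SHA-1 of the header "blob <size>\x00" followed by the raw content.
// The file name is not part of the input, so two files with identical
// content share one blob.
func blobID(content []byte) string {
	h := sha1.New()
	fmt.Fprintf(h, "blob %d\x00", len(content))
	h.Write(content)
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	// Should match the output of `git hash-object` on a file with this content.
	fmt.Println(blobID([]byte("package main\n")))
}
```

    Tree and commit objects are hashed the same way with their own headers, and they refer to blobs and trees by these IDs.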
  5. Overview of Git Hosting Architecture: Using AWS as cloud computing platform

    Amazon EKS
    ・Backlog Web, API and Git LFS
    Amazon ECS (Graviton)
    ・Git HTTPS, SSH and Proxy
    ・Git Replication Worker and Git GC Worker
    Amazon EC2 with EBS
    ・Git RPC
    And more...
    ・Amazon SQS, S3...
    Note: For simplification in the diagram, elements such as Availability Zones, VPCs, subnets, and load balancers are omitted. In practice, each service is deployed across multiple Availability Zones.
    [Architecture diagram (AWS Cloud): [EKS] WEB, [EKS] API, [EKS] Git LFS, [ECS] Git HTTPS, [ECS] Git SSH, [ECS] Git PROXY, [ECS] Replication Worker with [SQS] and [S3], [ECS] GC Worker with [SQS] and EventBridge Scheduler, [EC2] GIT RPC with [EBS] Repository (Primary), [EC2] GIT RPC with [EBS] Repository (Replica).]
  6. Overview of Git Hosting Architecture: The application is unified in Go

    ・The Git hosting application is primarily implemented in the Go language.
    ・It standardizes foundational code, monitoring, and CI/CD processes, enhancing development and operational efficiency.
    ・Logging rules can be unified more easily, improving traceability across services.
    ・Mechanisms for metric collection can be more readily standardized.
    ・Good practices specific to the language that are newly discovered can easily be propagated to other services.
  7. Overview of Git Hosting Architecture: Stateless Frontend

    Backlog's Git hosting categorizes requests into five types, each processed by a dedicated service:
    ・Backlog Web: For web browser requests.
    ・Backlog API: For Backlog API requests.
    ・Git LFS and Git HTTPS: For Git LFS and Git command requests over HTTPS.
    ・Git SSH: For SSH Git command requests.
    These services are stateless and hold no storage. They authenticate and authorize requests, then use RPC to connect to backend services for data operations.
  8. Overview of Git Hosting Architecture: Stateful Backend

    In the system diagram, only the Git RPC service in the backend has storage. It receives RPCs from the frontend to read from and write to Git repositories, akin to database middleware specialized in Git repository operations. It runs on Amazon EC2 and mounts Amazon EBS. The configuration is an Active/Active Primary/Replica setup, with the Replica being an exact copy of the Primary. All write-oriented RPCs are processed by the Primary, while read-oriented RPCs can be handled by either the Primary or the Replica.
  9. Overview of Git Hosting Architecture: Amazon EBS for storing Git repositories

    1. Amazon EFS
    Network file system that can be mounted from multiple servers, making it easy to scale servers. I/O can become a performance bottleneck depending on the number and size of repository commits, potentially 10 times worse than with Amazon EBS.
    2. Amazon S3
    Highly durable cloud storage with 99.999999999% (11 nines) data durability. Requires FUSE to mount S3 buckets as a filesystem. Communication costs increase linearly with the amount of data read/written, potentially becoming a performance bottleneck.
    3. Amazon EBS
    Stable performance. Cannot be mounted from multiple servers. Requires a unique mechanism for redundancy.
  10. Overview of Git Hosting Architecture: RPC Proxy Connecting Frontend and Backend

    The Git Proxy, positioned in the center of the diagram, receives all RPCs from the frontend and relays them to the backend. It serves as the central service in replication.
  11. Service communication using gRPC: Why Use gRPC?

    All services communicate using gRPC. The primary reason for choosing gRPC is that it suits the diverse communication patterns unique to Git. gRPC supports four types of communication methods, which map efficiently onto Git workflows.
  12. Service communication using gRPC

    ・Server Streaming RPC: The server sends a large amount of data as a stream of responses to a single request. e.g., git clone, git pull, git fetch
    ・Client Streaming RPC: The client sends a large amount of data as a stream of requests and receives a single response. e.g., git push
    ・Unary RPC: Small amounts of data, one request and one response. e.g., get commits, branches, tags
  13. Replication Mechanisms for High Availability: Replication Approaches

    1. Synchronous Replication (Strong Consistency)
    ・Writing to multiple storages simultaneously.
    ・High data consistency with a very low risk of data loss.
    ・May result in increased latency and reduced write performance.
    ・Requires distributed transactions over the network using algorithms like three-phase commit (3PC).
    2. Asynchronous Replication (Eventual Consistency)
    ・Upon being written to the primary storage, the information is added to a queue.
    ・Notifies that the write operation has completed before it is actually written to the secondary storage.
    ・There is a time lag until the data is synchronized, but it is less affected by latency.
    [Diagrams: numbered write sequence for each approach, from a client to two [EC2] GIT servers with [EBS] repositories.]
  14. Replication Mechanisms for High Availability: Dynamic gRPC Proxies at the core of replication

    The Git Proxy relays all RPCs from the frontend to the appropriate backend and creates a replication log during write operations.
    Workflow of the Git Proxy:
    1. Read the attributes of the RPC.
    2. Determine whether it is a write or a read operation.
    ・write: relay to the primary, send the replication log to S3, and send a message to SQS.
    ・read: determine whether replication is in progress.
      ・Complete: relay to the primary or the replica.
      ・Incomplete: relay to the primary.
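    The proxy's routing decision can be sketched in Go as follows. This is an illustration under assumptions, not Backlog's implementation: the `Router` type, the in-memory `pending` counter (standing in for replication logs held in S3), and the simple alternation between primary and replica for reads are all ours.

```go
package main

import "fmt"

// Backend identifies which Git RPC server a call is relayed to.
type Backend string

const (
	Primary Backend = "primary"
	Replica Backend = "replica"
)

// Call is a hypothetical summary of the attributes the proxy reads from an RPC.
type Call struct {
	Repo    string
	IsWrite bool
}

// Router keeps the state needed to pick a backend.
type Router struct {
	pending map[string]int // replication logs not yet applied, per repository
	reads   int            // counter used to alternate reads (assumed policy)
}

func NewRouter() *Router { return &Router{pending: map[string]int{}} }

// Route sends writes to the primary and records a pending replication.
// Reads go to the primary while the repository still has pending
// replication; otherwise they alternate between primary and replica.
func (r *Router) Route(c Call) Backend {
	if c.IsWrite {
		r.pending[c.Repo]++ // stands in for "log to S3, message to SQS"
		return Primary
	}
	if r.pending[c.Repo] > 0 {
		return Primary
	}
	r.reads++
	if r.reads%2 == 0 {
		return Replica
	}
	return Primary
}

// MarkReplicated records that one replication log for repo has been applied.
func (r *Router) MarkReplicated(repo string) {
	if r.pending[repo] > 0 {
		r.pending[repo]--
	}
}

func main() {
	r := NewRouter()
	fmt.Println(r.Route(Call{Repo: "proj/app", IsWrite: true})) // write
	fmt.Println(r.Route(Call{Repo: "proj/app"}))                // read during replication
	r.MarkReplicated("proj/app")
	fmt.Println(r.Route(Call{Repo: "proj/app"})) // read after replication
	fmt.Println(r.Route(Call{Repo: "proj/app"}))
}
```

    Because the pending state is tracked per repository, a write to one repository does not force reads of other repositories onto the primary.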
  15. Replication Mechanisms for High Availability: Replication logs stored in S3 with strong consistency

    Replication logs are critical data referenced by both the Git Proxy and Replication Workers. It's essential to have access to the most up-to-date replication logs at all times.
    Reasons for choosing S3:
    ・SQS does not support referencing messages by ID.
    ・The actual logs need to be maintained outside of SQS.
    ・S3 supports Strong Consistency.
    ・S3 ensures log consistency by always returning the latest data.
  16. Replication Mechanisms for High Availability: Ensuring delivery order using SQS FIFO and message grouping

    ・Replication logs are issued per repository.
    ・The type of log is categorized into multiple events according to the write attributes.
    ・It is necessary to maintain the execution order of replication. For example:
      1. Creation of the repository
      2. Writing to the repository
      3. Renaming the repository
      4. Deleting the repository
    ・The order is guaranteed using SQS's FIFO and message grouping feature.
    ・Assigning a group ID to messages being enqueued ensures the delivery order of messages with the same group ID.
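    The ordering guarantee can be illustrated with a toy in-memory model in Go. It does not call SQS; `FIFOQueue` and its methods are hypothetical names, and using the repository as the group ID is our reading of the slide. The model mimics the FIFO behavior relied on here: while a message of a group is in flight, later messages of the same group are withheld, while other groups keep flowing.

```go
package main

import "fmt"

// Message models a queue entry; GroupID plays the role of the message group ID.
type Message struct {
	GroupID string // assumed to be the repository identifier
	Event   string
}

// FIFOQueue is a toy model of a FIFO queue with message groups.
type FIFOQueue struct {
	msgs     []Message
	inFlight map[string]bool // groups that have an unacknowledged message
}

func NewFIFOQueue() *FIFOQueue { return &FIFOQueue{inFlight: map[string]bool{}} }

// Send appends a message to the queue.
func (q *FIFOQueue) Send(m Message) { q.msgs = append(q.msgs, m) }

// Receive returns the oldest message whose group has nothing in flight.
func (q *FIFOQueue) Receive() (Message, bool) {
	for i, m := range q.msgs {
		if q.inFlight[m.GroupID] {
			continue
		}
		q.msgs = append(q.msgs[:i], q.msgs[i+1:]...)
		q.inFlight[m.GroupID] = true
		return m, true
	}
	return Message{}, false
}

// Delete acknowledges the in-flight message of a group, unblocking the next one.
func (q *FIFOQueue) Delete(groupID string) { delete(q.inFlight, groupID) }

func main() {
	q := NewFIFOQueue()
	q.Send(Message{GroupID: "repo-a", Event: "create"})
	q.Send(Message{GroupID: "repo-a", Event: "write"})
	q.Send(Message{GroupID: "repo-b", Event: "create"})

	for {
		m, ok := q.Receive()
		if !ok {
			break
		}
		fmt.Println(m.GroupID, m.Event)
		q.Delete(m.GroupID) // acknowledge before the group's next event is delivered
	}
}
```

    In this model, "write" for repo-a can never be handed out before "create" for repo-a has been acknowledged, yet repo-b is not held up by repo-a.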
  17. Replication Mechanisms for High Availability: Controlling Replication with Replication Workers

    ・The Replication Worker polls SQS.
    ・Replication Workers are made redundant on ECS Fargate, allowing a single ECS task to process multiple messages concurrently.
    ・It retrieves the actual replication log from S3 using the keys contained in the fetched messages.
    ・Based on the replication log's content, it identifies the target Git repository and the type of replication, then executes the appropriate RPC for replication on the replica Git server.
    ・Upon successful replication, it deletes the replication log in S3 and the message in SQS. If replication fails, it updates the SQS visibility timeout for a retry.
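    The per-message handling described above can be sketched in Go as follows. The `Worker` type, its function fields, and `ErrReplicaUnavailable` are hypothetical; the fields stand in for S3, SQS, and the replication RPC so that the sketch runs without AWS.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrReplicaUnavailable is a sample failure used to demonstrate the retry path.
var ErrReplicaUnavailable = errors.New("replica unavailable")

// Message is what the worker receives from the queue.
type Message struct {
	LogKey        string // key of the replication log in object storage
	ReceiptHandle string // handle used to delete or delay the message
}

// ReplicationLog names the target repository and the kind of replication.
type ReplicationLog struct {
	Repo string
	Kind string
}

// Worker wires together the operations the slide lists.
type Worker struct {
	GetLog           func(key string) (ReplicationLog, error)
	Replicate        func(l ReplicationLog) error
	DeleteLog        func(key string) error
	DeleteMessage    func(receipt string) error
	ChangeVisibility func(receipt string, seconds int) error
	RetryAfter       int // seconds before a failed message becomes visible again
}

// Handle processes one message: fetch the log, replicate, then clean up.
// On failure it leaves the log and the message in place and only adjusts
// the visibility timeout so the message is retried later.
func (w *Worker) Handle(m Message) error {
	l, err := w.GetLog(m.LogKey)
	if err == nil {
		err = w.Replicate(l)
	}
	if err != nil {
		if verr := w.ChangeVisibility(m.ReceiptHandle, w.RetryAfter); verr != nil {
			return fmt.Errorf("replication failed: %v; visibility update failed: %v", err, verr)
		}
		return fmt.Errorf("replication failed, will retry: %w", err)
	}
	if err := w.DeleteLog(m.LogKey); err != nil {
		return err
	}
	return w.DeleteMessage(m.ReceiptHandle)
}

func main() {
	logs := map[string]ReplicationLog{"logs/1": {Repo: "proj/app", Kind: "write"}}
	w := &Worker{
		GetLog:           func(key string) (ReplicationLog, error) { return logs[key], nil },
		Replicate:        func(l ReplicationLog) error { return ErrReplicaUnavailable },
		DeleteLog:        func(key string) error { delete(logs, key); return nil },
		DeleteMessage:    func(receipt string) error { return nil },
		ChangeVisibility: func(receipt string, seconds int) error { return nil },
		RetryAfter:       30,
	}
	fmt.Println(w.Handle(Message{LogKey: "logs/1", ReceiptHandle: "r-1"}))
	fmt.Println("logs still stored:", len(logs))
}
```

    Retrying this way is only safe because the replication RPCs are idempotent: the same log may be applied more than once.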
  18. Replication Mechanisms for High Availability: Idempotent Replication RPCs

    ・Replication RPCs provided by Git RPC are available for each type of replication.
    ・All are designed to be idempotent, meaning they can be retried and executed multiple times without issues.
    ・Examples include RPCs for duplicating Git objects like blobs, commits, and trees, and references such as branches, tags, and pull requests.
    ・These RPCs replicate by running the git fetch subcommand on the replica against the primary.
    ・The transaction of git fetch ensures data consistency even in the event of unexpected errors. Since git fetch is idempotent, it processes only the differences even if executed multiple times.
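    The idempotence property can be illustrated with a small in-memory model in Go. This is not how git fetch is implemented, and `Repo` and `Fetch` are hypothetical names; the model only shows why a fetch-style sync is safe to retry: it copies what the replica lacks, updates references afterwards, and a second run finds nothing left to do.

```go
package main

import "fmt"

// Repo is a toy repository: an object store plus named references.
type Repo struct {
	Objects map[string]string // object ID -> content
	Refs    map[string]string // ref name -> object ID
}

func NewRepo() *Repo {
	return &Repo{Objects: map[string]string{}, Refs: map[string]string{}}
}

// Fetch makes replica match primary and returns how many objects were copied.
// Objects are copied before references are updated, so a run that stops
// midway never leaves a reference pointing at a missing object.
func Fetch(replica, primary *Repo) int {
	copied := 0
	for id, data := range primary.Objects {
		if _, ok := replica.Objects[id]; !ok {
			replica.Objects[id] = data
			copied++
		}
	}
	for name, id := range primary.Refs {
		replica.Refs[name] = id
	}
	for name := range replica.Refs {
		if _, ok := primary.Refs[name]; !ok {
			delete(replica.Refs, name) // reference was removed on the primary
		}
	}
	return copied
}

func main() {
	primary, replica := NewRepo(), NewRepo()
	primary.Objects["c1"] = "first commit"
	primary.Refs["refs/heads/main"] = "c1"

	fmt.Println("copied:", Fetch(replica, primary))
	fmt.Println("copied on retry:", Fetch(replica, primary))
}
```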
  19. Replication Mechanisms for High Availability: Batch Processing for Git GC Using ECS Task and EventBridge Scheduler (Performance and Cost Optimization)

    ・Git GC is expensive to run, and it was being triggered frequently, consuming a significant share of EC2 compute resources.
    ・To address this, the execution of Git GC is hooked: repository information targeted for GC is temporarily stored in SQS, and the processing is scheduled for the low-access late-night hours.
    ・AWS Lambda was initially considered for this scheduled batch process but was ruled out due to the long execution time of Git GC.
    ・To accommodate the processing time, EventBridge Scheduler is used to periodically execute tasks on ECS Fargate based on the date and time.
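    The deferral idea can be sketched in Go with an in-memory stand-in for the queue. `GCQueue` is a hypothetical name, and collapsing repeated requests for the same repository into one entry is our assumption; the slide only states that GC targets are stored in SQS and processed later by a scheduled task.

```go
package main

import "fmt"

// GCQueue collects repositories that asked for GC so the work can run later.
type GCQueue struct {
	pending map[string]bool
	order   []string
}

func NewGCQueue() *GCQueue { return &GCQueue{pending: map[string]bool{}} }

// Request is called where GC would otherwise start immediately.
// Repeated requests for the same repository are collapsed (assumed behavior).
func (q *GCQueue) Request(repo string) {
	if q.pending[repo] {
		return
	}
	q.pending[repo] = true
	q.order = append(q.order, repo)
}

// Drain is what the scheduled batch task would do: run gc once per queued
// repository in request order. Repositories whose GC fails stay queued.
// It returns the number of repositories processed successfully.
func (q *GCQueue) Drain(gc func(repo string) error) int {
	var failed []string
	done := 0
	for _, repo := range q.order {
		if err := gc(repo); err != nil {
			failed = append(failed, repo)
			continue
		}
		delete(q.pending, repo)
		done++
	}
	q.order = failed
	return done
}

func main() {
	q := NewGCQueue()
	q.Request("proj/app")
	q.Request("proj/app") // a second push to the same repository
	q.Request("proj/docs")

	n := q.Drain(func(repo string) error {
		fmt.Println("running git gc for", repo)
		return nil
	})
	fmt.Println("repositories processed:", n)
}
```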
  20. Performance and Cost Optimization: Conclusion

    We introduced the high availability mechanisms of Backlog's Git hosting based on its actual architecture.
    In Backlog's Git hosting,
    ・The object database as a repository is stored on Amazon EBS.
    ・Services are divided for processing based on the characteristics of the request.
    ・Services with storage are centralized in the backend, made redundant in an Active/Active Primary/Replica configuration.
    ・All services are connected via gRPC, choosing the optimal communication method according to the request characteristics.
    ・A core dynamic reverse proxy for gRPC manages replication and distribution.
    ・Primary and Replica are asynchronously replicated on a per-repository basis.