
How we scaled GitLab for a 30k-employee company

GitLab, the open source alternative to GitHub written in Rails, does not scale out of the box: it stores its git repositories on a single filesystem, which makes storage capacity hard to expand. Rather than attaching a NAS server, we decided to replace the filesystem with cloud-based object storage (such as S3). This required changes to both the Ruby layer and the deeper C layers. In this talk, we will show the audience how we made the change and overcame the performance loss introduced by network I/O.

https://www.youtube.com/watch?v=byZcOH92CiY

Minqi Pan

May 06, 2016


Transcript

  1. Linux Virtual Server (IP Virtual Server)
    • transport-layer load balancing inside the kernel
    • layer-4 switching, unlike nginx (layer-7)
    • can: IP weighting, IP blocking, health checking
    • can’t: HTTP 200 health checking, URL rewriting
  2. Complications
    • SSH host key synchronisation: do it once
    • SSH client key synchronisation: do it every time
    • synchronised via redis pub-sub
  3. GitLab Geo
    • introduced in GitLab 8.5 EE
    • 1 master, N slave replication
    • achieves A-P in the C-A-P theorem
    • no disaster recovery
    • no sharing
  4. (diagram) HTTP 80/443 → nginx; SSH 22 → ssh2http
    • routing via key namespace/repo_name
    • three shards, each a GitLab shard paired with an FS shard
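The routing step in the diagram above can be sketched as follows. This is a minimal sketch, not the talk's implementation: the slides only say that the key is namespace/repo_name, so the stable-hash mapping, the shard names and the helper names `routing_key` and `pick_shard` are all assumptions made for illustration.

```python
import hashlib

# Hypothetical shard names; the slide only shows three GitLab/FS shard pairs.
SHARDS = ["shard-0", "shard-1", "shard-2"]


def routing_key(path: str) -> str:
    """Extract "namespace/repo_name" from a path such as
    "/namespace/repo_name.git/info/refs" or "namespace/repo_name.git"."""
    parts = [p for p in path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"no namespace/repo_name in {path!r}")
    namespace, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[:-4]
    return f"{namespace}/{repo}"


def pick_shard(path: str, shards=SHARDS) -> str:
    """Map the routing key to a shard with a stable hash (an assumption;
    a lookup table from key to shard would work equally well)."""
    digest = hashlib.sha1(routing_key(path).encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]
```

Because the HTTP path and the SSH repository argument reduce to the same key, both entry points land on the same shard.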
  5. GitLab Sharding
    • Introduces Sidekiq sharing as well
    • Introduces many changes to the application layer as well
      - need to have super user authentication
      - need to eliminate every page with requests across shards (e.g. admin page of repo sizes)
    • Tedious changes on the application level.
  6. How to deal with FS?
    • Hardware Network-Attached Storage?
    • Software Network-Attached Storage?
    • Remote Procedure Calls to FS shards?
    • Kill it?
  7.
    • Hard-NAS: Alibaba has non-IOE policies.
    • Soft-NAS: Alibaba does not have it yet.
    • RPC: GitRPC? Good. GitHub does that.
    • Kill FS: Use the cloud. Try something new!
  8. by “cloud” we mean…
    • Amazon S3: Amazon Simple Storage Service
    • Alibaba OSS: Alibaba Object Storage Service
  9. (diagram: gitlab-rails, libgit2, git, grit)
    grit
    • used in wikis
    • via gollum-lib
    • via gollum-grit_adapter
    • eliminate-able via gollum-rugged_adapter
  10. (diagram: gitlab-rails, libgit2, git, grit)
    libgit2
    • via gitlab_git
    • via rugged
    • backend replace-able
    git
    • via gitlab-shell
    • via gitlab-workhorse
    • via popen
    • backend hard-to-replace (FS)
  11. odb’s (diagram with hi-priority / lo-priority labels): loose OSS store, packed OSS store
    refdb
    • stored via OSS
    • locked via redis
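The prioritised odb backends on this slide can be illustrated with a small sketch. It only mimics the idea of libgit2's `git_odb_add_backend(odb, backend, priority)`; the classes `DictStore` and `Odb` are hypothetical in-memory stand-ins, and which store gets the higher priority is left to the caller because the transcript does not make that unambiguous.

```python
class DictStore:
    """In-memory stand-in for an object store such as an OSS bucket."""

    def __init__(self, objects=None):
        self.objects = dict(objects or {})
        self.reads = 0

    def read(self, oid):
        self.reads += 1
        return self.objects.get(oid)  # None means "not found here"


class Odb:
    """Tries backends from highest to lowest priority until one has the object."""

    def __init__(self):
        self._backends = []

    def add_backend(self, backend, priority):
        self._backends.append((priority, backend))
        # Stable sort keeps insertion order among equal priorities.
        self._backends.sort(key=lambda pb: pb[0], reverse=True)

    def read(self, oid):
        for _, backend in self._backends:
            data = backend.read(oid)
            if data is not None:
                return data
        raise KeyError(oid)
```

A lookup that hits in the higher-priority store never touches the lower-priority one, which matters when every miss is a network round trip.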
  12. Example: Read 9fcf811e00fa469688943a9152c16d4ee90fb9a9
    • First byte of the name is 0x9f
    • IDX[8 + (0x9f - 1) * 4] == 0x0403 == 1027
    • IDX[8 + 0x9f * 4] == 0x0405 == 1029
    • Object No. 1027 ~ 1029
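The fanout lookup on this slide can be sketched as below, assuming a version-2 .idx file (8-byte header followed by 256 big-endian 4-byte cumulative counts). The function name `fanout_range` is made up for illustration.

```python
import struct

HEADER_SIZE = 8  # 4-byte magic + 4-byte version in a version-2 .idx


def fanout_range(idx: bytes, first_byte: int):
    """Return (lo, hi): objects whose name starts with first_byte occupy
    positions lo .. hi-1 in the sorted name table."""

    def entry(i):
        return struct.unpack_from(">I", idx, HEADER_SIZE + 4 * i)[0]

    lo = entry(first_byte - 1) if first_byte > 0 else 0
    hi = entry(first_byte)
    return lo, hi
```

With the values from the slide, `fanout_range(idx, 0x9f)` would give `(1027, 1029)`, so at most two names have to be compared.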
  13. Example: Read 9fcf811e00fa469688943a9152c16d4ee90fb9a9
    • Binary search 1027 ~ 1029
    • Found at 8 + 4 * 256 + 1027 * 20 == 21572
    • Skip the rest (the 20-byte name table and the 4-byte CRC table): total_num*(20+4) == 1628*24
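The binary search over the name table can be sketched like this, again assuming the version-2 .idx layout with 20-byte SHA-1 names stored in sorted order right after the fanout table. `find_position` is a hypothetical helper name.

```python
NAME_TABLE_START = 8 + 4 * 256  # header + fanout table
OID_SIZE = 20                   # SHA-1 names are 20 bytes


def find_position(idx: bytes, oid: bytes, lo: int, hi: int):
    """Binary-search positions lo .. hi-1 of the sorted name table.
    Returns the object's position, or None if it is absent."""
    while lo < hi:
        mid = (lo + hi) // 2
        start = NAME_TABLE_START + mid * OID_SIZE
        candidate = idx[start:start + OID_SIZE]
        if candidate == oid:
            return mid
        if candidate < oid:
            lo = mid + 1
        else:
            hi = mid
    return None
```

For position 1027 the name starts at byte `NAME_TABLE_START + 1027 * OID_SIZE`, which is the 21572 on the slide.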
  14. Example: Read 9fcf811e00fa469688943a9152c16d4ee90fb9a9
    • IDX[8 + 4 * 256 + 1628*24 + 4 * 1027]
    • PACK[0x0004482D] == PACK[280621]
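The offset-table arithmetic on this slide can be written out as a short sketch, assuming a version-2 .idx with 4-byte offsets. The function names are illustrative, and the 8-byte offset table used by packs larger than 2 GiB is deliberately not handled.

```python
import struct


def offset_entry_position(total: int, position: int) -> int:
    """Byte position inside a version-2 .idx of the 4-byte pack offset
    for the object at `position` in the sorted name table."""
    header_and_fanout = 8 + 4 * 256
    names_and_crcs = total * (20 + 4)  # 20-byte name + 4-byte CRC32 each
    return header_and_fanout + names_and_crcs + 4 * position


def pack_offset(idx: bytes, total: int, position: int) -> int:
    """Read the big-endian offset into the .pack file for one object."""
    (value,) = struct.unpack_from(">I", idx, offset_entry_position(total, position))
    if value & 0x80000000:
        # MSB set means the value indexes the 8-byte offset table instead.
        raise NotImplementedError("8-byte offsets are not handled in this sketch")
    return value
```

With `total == 1628` and `position == 1027`, the entry read is the one that holds 0x0004482D in the example.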
  15. Example: Read 9fcf811e00fa469688943a9152c16d4ee90fb9a9
    • E3 == 11100011
    • 1_______ => MSB 1, continue
    • _110____ => type == 6 == OFS_DELTA
    • ____0011 => length == 3
    • 3-bit type, (n-1)*7+4-bit length
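The header decoding on this slide can be sketched as follows. The function name `parse_object_header` is made up; the layout (3-bit type, 4 size bits in the first byte, 7 more size bits per continuation byte) is the one the slide describes.

```python
def parse_object_header(data: bytes, pos: int = 0):
    """Return (type, size, next_pos) for the pack entry header at `pos`."""
    byte = data[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x07  # bits 6..4 carry the type
    size = byte & 0x0F             # low 4 bits of the size
    shift = 4
    while byte & 0x80:             # MSB set: another size byte follows
        byte = data[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return obj_type, size, pos
```

The slide only shows the first byte, E3. If the following byte were 0x01 (an assumption, not shown in the slides), the decoded size would be 3 + (1 << 4) == 19.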
  16. Example: Read 9fcf811e00fa469688943a9152c16d4ee90fb9a9
    • offset == 5572
    • push 0x0004482D into stack
    • deal with (0x0004482D - 5572)
    • push (0x0004482D - 5572) into stack
    • …
    • root base
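How an OFS_DELTA entry encodes the distance to its base can be sketched as below. The decoder follows the variable-length encoding of the git pack format; the byte pair 0xAA 0x44 used in the note underneath is worked out here for illustration and does not come from the slides.

```python
def decode_ofs_delta_offset(data: bytes, pos: int = 0):
    """Decode the base-offset field that follows an OFS_DELTA header.
    Returns (distance, next_pos); the base object lives at
    `this_object_offset - distance` in the same pack."""
    byte = data[pos]
    pos += 1
    offset = byte & 0x7F
    while byte & 0x80:  # MSB set: another byte follows
        byte = data[pos]
        pos += 1
        # The "+ 1" removes redundant encodings of the same number.
        offset = ((offset + 1) << 7) | (byte & 0x7F)
    return offset, pos
```

Decoding `b"\xaa\x44"` yields 5572, so the base of the object at 0x0004482D sits at 280621 - 5572 == 275049.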
  17. Example
    SHA1                                      type  size   size-pack  offset-pack  depth  base
    9fcf811e00fa469688943a9152c16d4ee90fb9a9  blob  19     32         280621       4      6110c89446f2281e5db9b798a0fa020fad6e63e1
    6110c89446f2281e5db9b798a0fa020fad6e63e1  blob  52     45         275049       3      3bbeff3fc22b75c1a26f4ab9b64449b33002aea5
    3bbeff3fc22b75c1a26f4ab9b64449b33002aea5  blob  2935   1263       273786       2      a399208309046656ecc01f7653c5d5b8905fc16e
    a399208309046656ecc01f7653c5d5b8905fc16e  blob  4686   1540       272246       1      e4e56117de8b3bd0bd899701da4712caee27c7d6
    e4e56117de8b3bd0bd899701da4712caee27c7d6  blob  12635  3279       115703       0      -
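The stack-based walk down to the root base can be sketched with the pack offsets from the table. `delta_chain` and its `base_of` callback are hypothetical names; in a real reader, `base_of` would parse the entry at the given offset rather than consult a dictionary.

```python
def delta_chain(start_offset, base_of):
    """Collect pack offsets from a delta object down to its root base.

    `base_of(offset)` returns the offset of the base object, or None when
    the object at `offset` is not a delta (the root base)."""
    stack = []
    offset = start_offset
    while offset is not None:
        stack.append(offset)
        offset = base_of(offset)
    return stack  # pop from the end to apply deltas root-first


# Offsets taken from the table above.
BASES = {
    280621: 275049,
    275049: 273786,
    273786: 272246,
    272246: 115703,
    115703: None,
}
```

`delta_chain(280621, BASES.get)` visits five entries, which agrees with the depth of 4 listed for the object being read. Each step is a separate read, which is cheap on a local filesystem and costly when every read is a network request.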
  18. git fetch / clone
    • git upload-pack --advertise-refs (rewritten via libgit2)
    • git upload-pack (untouched)
    • git pack-objects (rewritten via libgit2 pack builder)
  19. git push (small data)
    • git receive-pack --advertise-refs (rewritten via libgit2)
    • git receive-pack (untouched)
    • ntohl(hdr.hdr_entries) < unpack_limit
    • git unpack-objects (modified via libgit2, writing to loose OSS store)
  20. git push (big data)
    • git receive-pack --advertise-refs (rewritten via libgit2)
    • git receive-pack (untouched)
    • ntohl(hdr.hdr_entries) >= unpack_limit
    • git index-pack (modified via libgit2, writing to packed OSS store)
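The small-push / big-push split on the two slides above comes down to one comparison on the pack header. A minimal sketch, assuming the standard 12-byte pack header ("PACK", version, object count, all big-endian) and git's documented default unpack limit of 100; `choose_unpacker` is a hypothetical name.

```python
import struct


def choose_unpacker(pack_header: bytes, unpack_limit: int = 100) -> str:
    """Decide how an incoming pack is stored, based on its object count."""
    signature, version, entries = struct.unpack(">4sII", pack_header[:12])
    if signature != b"PACK":
        raise ValueError("not a pack stream")
    if entries < unpack_limit:
        return "unpack-objects"  # explode into loose objects
    return "index-pack"          # keep the pack and build an index for it
```

In the setup described in the talk, the first branch ends in the loose OSS store and the second in the packed OSS store.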
  21. git push
    • FS-based: 6.27s user 1.72s system 14% cpu 53.299 total
    • Cloud-based: 6.13s user 1.29s system 13% cpu 54.697 total
  22. git push (delta)
    • FS-based: 0.09s user 0.07s system 5% cpu 3.059 total
    • Cloud-based: 0.04s user 0.05s system 3% cpu 2.845 total
  23. git clone
    • FS-based: 6.89s user 8.99s system 33% cpu 47.096 total
    • Cloud-based: 7.08s user 8.12s system 20% cpu 1:14.12 total
  24. git fetch (delta)
    • FS-based: 0.14s user 0.13s system 33% cpu 0.806 total
    • Cloud-based: 0.09s user 0.10s system 1% cpu 16.019 total
  25. GET /namespace/repo/tree/master
    • FS-based: Executing action: show - 74.5 ms
    • Cloud-based: Executing action: show - 5877.7 ms
  26. GET /namespace/repo/tree/master/builds
    • FS-based: Executing action: show - 50.0 ms
    • Cloud-based: Executing action: show - 4547.0 ms
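To compare the measurements on the slides above, the cloud-to-FS ratio can be computed directly from the reported totals. The helper names are illustrative; the numbers are copied from the slides.

```python
def to_seconds(total: str) -> float:
    """Convert a shell `time` total such as "47.096" or "1:14.12" to seconds."""
    seconds = 0.0
    for part in total.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def slowdown(fs_value: float, cloud_value: float) -> float:
    """How many times slower the cloud-based run is than the FS-based one."""
    return cloud_value / fs_value


ratios = {
    "git push": slowdown(to_seconds("53.299"), to_seconds("54.697")),
    "git push (delta)": slowdown(to_seconds("3.059"), to_seconds("2.845")),
    "git clone": slowdown(to_seconds("47.096"), to_seconds("1:14.12")),
    "git fetch (delta)": slowdown(to_seconds("0.806"), to_seconds("16.019")),
    "tree/master (ms)": slowdown(74.5, 5877.7),
    "tree/master/builds (ms)": slowdown(50.0, 4547.0),
}
```

Pushes come out roughly even, while delta fetches and the web pages are slower by more than an order of magnitude. Those are the read-heavy paths.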
  27. odb hamburger (diagram with hi-priority / lo-priority labels): loose OSS store, packed OSS store, loose FS cache, packed FS cache
    refdb
    • cached via redis
  28. loose FS cache
    • cache written when ntohl(hdr.hdr_entries) < unpack_limit in git-unpack-objects
    • cache written when reading via loose OSS store
  29. packed FS cache
    • cache written when ntohl(hdr.hdr_entries) >= unpack_limit in git-index-pack
    • cache written in git-pack-objects
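The FS caches on the slides above sit in front of the OSS stores. A minimal sketch of that idea, assuming object keys are safe file names (for example hex object ids); `DictRemote` is an in-memory stand-in for the remote store, and `CachedStore` is a hypothetical name, not the talk's code.

```python
import os
import tempfile


class DictRemote:
    """In-memory stand-in for a slow remote object store."""

    def __init__(self):
        self.objects = {}
        self.gets = 0

    def get(self, key):
        self.gets += 1
        return self.objects[key]

    def put(self, key, data):
        self.objects[key] = data


class CachedStore:
    """Local files in front of a remote store: the cache is filled on
    writes and on read misses."""

    def __init__(self, remote, cache_dir):
        self.remote = remote
        self.cache_dir = cache_dir

    def _fill(self, key, data):
        # Write to a temp file, then rename, so readers never see partial data.
        fd, tmp = tempfile.mkstemp(dir=self.cache_dir)
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, os.path.join(self.cache_dir, key))

    def put(self, key, data):
        self.remote.put(key, data)
        self._fill(key, data)

    def get(self, key):
        try:
            with open(os.path.join(self.cache_dir, key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            data = self.remote.get(key)  # slow path: network round trip
            self._fill(key, data)
            return data
```

Git objects are immutable and content-addressed, so a cache like this needs no invalidation. That property is what makes a plain local file cache viable for the odb, unlike the refdb.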
  30. redis refdb cache
    • cache written when read and cache-miss
    • cache expired when refdb got updated, e.g. git-receive-pack
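The two rules on this slide (fill on a read miss, expire on update) can be sketched as follows. Plain dictionaries stand in for both redis and the OSS-backed refdb, and `RefdbCache` is a hypothetical name.

```python
class RefdbCache:
    """Cache-aside lookup for refs, with expiry on every update."""

    def __init__(self, refdb, cache):
        self.refdb = refdb  # authoritative store: ref name -> object id
        self.cache = cache  # stand-in for redis: any dict-like object

    def lookup(self, name):
        if name in self.cache:
            return self.cache[name]
        oid = self.refdb[name]      # slow path, e.g. a round trip to OSS
        self.cache[name] = oid      # written on read + cache miss
        return oid

    def update(self, name, oid):
        self.refdb[name] = oid
        self.cache.pop(name, None)  # expire rather than overwrite
```

Expiring instead of overwriting keeps the update path simple: the next reader repopulates the entry from the authoritative store.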
  31.
    • develop libgit2 backends for AWS S3
    • gitlab: favour libgit2, eliminate direct calls to git
    • gitlab: add settings to choose backends
    • gollum: use rugged as the default
    • libgit2: improve performance, e.g. pack builder