How we scaled GitLab for a 30k-employee company

GitLab, the open source alternative to GitHub written in Rails, does not scale out of the box: it stores its git repositories on a single filesystem, which makes storage capacity hard to expand. Rather than attaching a NAS server, we decided to replace the filesystem with cloud-based object storage (such as S3). This required changes to both the Ruby layer and the deeper C layers. In this talk, we show how we made the change and how we overcame the performance loss introduced by network I/O.

https://www.youtube.com/watch?v=byZcOH92CiY

Minqi Pan

May 06, 2016

Transcript

  1. How we scaled GitLab for a 30k-employee company Minqi Pan

  2. Hello, I’m Minqi Pan github.com/pmq20 twitter @psvr

  3. What’s GitLab?

  4. GitLab: a git-box installed on-premises

  5. GitLab HTTP 80/443 SSH 22

  6. GitLab HTTP 80/443 SSH 22

  7. GitLab Redis MySQL File System

  8. What’s inside? GitLab

  9. NGINX OpenSSH Server Unicorn gitlab-shell Gitlab Workhorse git gitlab_git rails

    sidekiq rugged libgit2
  10. Works great for small teams

  11. However

  12. to make it easy to do business anywhere

  13. Let’s scale it!

  14. GitLab HTTP 80/443 SSH 22

  15. HTTP 80/443 SSH 22 unicorn unicorn unicorn …

  16. HTTP 80/443 SSH 22 unicorn unicorn unicorn … nginx ?

  17. HTTP 80/443 SSH 22 unicorn unicorn unicorn … nginx ssh2http

    https://github.com/pmq20/ssh2http
  18. unicorn unicorn unicorn … LVS (IPVS) HTTP 80/443 SSH 22

  19. Linux Virtual Server (IP Virtual Server) • transport-layer load balancing inside the kernel • layer-4 switching, unlike nginx (layer-7) • can: IP weighting, IP blocking, health checking • can’t: HTTP 200 health checking, URL rewriting
  20. Complications • SSH Host Key Synchronisation: do it once • SSH Client Key Synchronisation: do it every time • synchronised via redis pub-sub
  21. Does it scale in the backend?

  22. IV. Backing services Treat backing services as attached resources

  23. None
  24. Redis MySQL File System GitLab * git repositories * user

    generated attachments / avatars
  25. None
  26. GitLab Geo • introduced in GitLab 8.5 EE • 1 Master N Slave Replication • achieves A-P in C-A-P theorem • no disaster recovery • no sharing
  27. HTTP 80/443 SSH 22 nginx ssh2http routing via key namespace/repo_name

    GitLab shard FS shard GitLab shard FS shard GitLab shard FS shard
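The routing on slide 27 can be sketched as a stable hash of the `namespace/repo_name` key mapped onto a fixed list of shards. The talk does not say how keys are assigned to shards, so the shard names and the SHA-1-modulo scheme below are assumptions made purely for illustration.

```python
import hashlib

# Hypothetical shard names; the talk does not list the real ones.
SHARDS = ["shard-0", "shard-1", "shard-2"]


def shard_for(namespace: str, repo_name: str, shards=SHARDS) -> str:
    """Pick a shard from a stable hash of the routing key namespace/repo_name."""
    key = f"{namespace}/{repo_name}".encode("utf-8")
    digest = hashlib.sha1(key).digest()
    # Use the first 8 bytes of the digest as an integer and reduce it modulo
    # the shard count, so the same key always lands on the same shard.
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


print(shard_for("gitlab-org", "gitlab-ce"))
```

A plain modulo scheme reshuffles most keys when the shard count changes, which is one reason a real deployment might prefer a lookup table or consistent hashing.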
  28. GitLab Sharding • Introduces Sidekiq sharding as well • Introduces many changes to the application layer as well
    - need to have super user authentication
    - need to eliminate every page with requests across shards (e.g. admin page of repo sizes)
    • Tedious changes on the application level.
  29. How to deal with FS? • Hardware Network-Attached Storage? • Software Network-Attached Storage? • Remote Procedure Calls to FS shards? • Kill it?
  30. • Hard-NAS: Alibaba has non-IOE policies. • Soft-NAS: Alibaba does not have it yet. • RPC: GitRPC? Good. GitHub does that. • Kill FS: Use the cloud. Try something new!
  31. by “cloud” we mean… • Amazon S3: Amazon Simple Storage Service • Alibaba OSS: Alibaba Object Storage Service
  32. libgit2 git grit • used in wiki’s • via gollum-lib

    • via gollum-grit_adapter • eliminate-able via
 gollum-rugged_adapter gitlab-rails
  33. gitlab-rails libgit2 git • via gitlab_git • via rugged •

    backend
 replace-able • via gitlab-shell • via gitlab-workhorse • via popen • backend
 hard-to-replace (FS) grit
  34. Basic Idea

  35. gitlab-workhorse gitlab-rails gitlab-shell git libgit2 Cloud Based Backend grit
  36. Cloud Based Backend

  37. odb’s refdb • stored via OSS • locked via redis

    hi-priority lo-priority loose OSS store packed OSS store
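The two-tier object database on slide 37 can be sketched as a list of backends consulted in priority order. This is only an illustration: `DictStore` and `Odb` are hypothetical names, the dictionaries stand in for OSS, and the pairing of hi-priority with the loose store and lo-priority with the packed store is a reading of the flattened slide text rather than something the transcript confirms.

```python
class DictStore:
    """Stand-in for an object store; a real backend would talk to OSS."""

    def __init__(self, name):
        self.name = name
        self.objects = {}

    def read(self, oid):
        return self.objects.get(oid)


class Odb:
    """Object database that asks its backends in descending priority order."""

    def __init__(self):
        self._backends = []

    def add_backend(self, backend, priority):
        self._backends.append((priority, backend))
        # Highest priority first; sorting on the number only keeps it stable.
        self._backends.sort(key=lambda pair: pair[0], reverse=True)

    def read(self, oid):
        for _, backend in self._backends:
            data = backend.read(oid)
            if data is not None:
                return backend.name, data
        raise KeyError(oid)


loose = DictStore("loose OSS store")
packed = DictStore("packed OSS store")
odb = Odb()
odb.add_backend(packed, priority=1)  # assumed lo-priority
odb.add_backend(loose, priority=2)   # assumed hi-priority
```

An object present in both stores is served by the higher-priority one; an object present only in the packed store still resolves, just after a miss on the loose store.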
  38. OSS refdb (read)

  39. OSS refdb (write)

  40. loose OSS store (write)

  41. loose OSS store (read)

  42. packed OSS store (write)

  43. packed OSS store (read) via HTTP “Range” header
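Reading one object out of a remote pack file only needs a slice of that file, which is what the HTTP `Range` header provides. The helper below is a minimal sketch of building such a header; `range_header` is a hypothetical name, and the offset and length in the usage line are taken from the example object on slide 53 (pack offset 280621, packed size 32).

```python
def range_header(offset: int, length: int) -> dict:
    """Build an HTTP Range header for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends, hence the -1.
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


print(range_header(280621, 32))
```

The resulting dictionary can be passed as request headers to whatever HTTP client the backend uses; the server is expected to answer with status 206 and only the requested bytes.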

  44. packed OSS store (read)

  45. Example: Read 9fcf811e00fa469688943a9152c16d4ee90fb9a9 • First byte of the name is 0x9f • IDX[8 + (0x9f - 1) * 4] == 0x0403 == 1027 • IDX[8 + 0x9f * 4] == 0x0405 == 1029 • Object No. 1027 ~ 1029
  46. Example: Read 9fcf811e00fa469688943a9152c16d4ee90fb9a9 • Binary search 1027 ~ 1029 • Found at 8 + 4 * 256 + 1027 * 20 == 21572 • Skip the rest total_num*(20+4) == 1628*24
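The binary search on slide 46 runs over the sorted table of 20-byte names that follows the fanout table. Here is a minimal sketch over synthetic data; the 16 fake names are generated only to have something sorted to search, and `find_oid` is a hypothetical helper name.

```python
import hashlib

SHA_TABLE_START = 8 + 4 * 256  # header + fanout table
OID_SIZE = 20


def find_oid(idx: bytes, oid: bytes, lo: int, hi: int) -> int:
    """Binary-search the sorted name table in [lo, hi); return the slot or -1."""
    while lo < hi:
        mid = (lo + hi) // 2
        start = SHA_TABLE_START + mid * OID_SIZE
        candidate = idx[start:start + OID_SIZE]
        if candidate == oid:
            return mid
        if candidate < oid:
            lo = mid + 1
        else:
            hi = mid
    return -1


# Synthetic index: zeroed header and fanout, then 16 sorted fake names.
names = sorted(hashlib.sha1(bytes([i])).digest() for i in range(16))
idx = bytes(SHA_TABLE_START) + b"".join(names)

print(find_oid(idx, names[5], 0, len(names)))
```

With the slide's numbers, slot 1027 sits at byte `SHA_TABLE_START + 1027 * OID_SIZE`, which is the 21572 quoted above.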
  47. Example: Read 9fcf811e00fa469688943a9152c16d4ee90fb9a9 • IDX[8 + 4 * 256 + 1628*24 + 4 * 1027] • PACK[0x0004482D] == PACK[280621]
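The position arithmetic on slide 47 skips the header, the fanout table, and the name and CRC tables (20 + 4 bytes per object) to reach the 4-byte pack offset of a given slot. The sketch below assumes the version 2 `.idx` layout; both function names are hypothetical, and offsets with the top bit set (which point into a separate 8-byte table) are deliberately not handled.

```python
import struct


def pack_offset_position(total: int, index: int) -> int:
    """Byte position in a v2 .idx of the 4-byte pack offset for slot `index`."""
    header = 8
    fanout = 4 * 256
    names_and_crcs = total * (20 + 4)
    return header + fanout + names_and_crcs + 4 * index


def read_pack_offset(idx: bytes, total: int, index: int) -> int:
    """Read the big-endian pack offset stored for slot `index`."""
    value = struct.unpack_from(">I", idx, pack_offset_position(total, index))[0]
    if value & 0x80000000:
        raise NotImplementedError("large offsets live in a separate 8-byte table")
    return value


# Synthetic 4-object index with the slide's offset stored in slot 2.
idx = bytearray(pack_offset_position(4, 4))
struct.pack_into(">I", idx, pack_offset_position(4, 2), 0x0004482D)

print(read_pack_offset(bytes(idx), 4, 2))
```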
  48. Example: Read 9fcf811e00fa469688943a9152c16d4ee90fb9a9 • E3 == 11100011 • 1_______ => MSB 1, continue • _110____ => type == 6 == OFS_DELTA • ____0011 => length == 3 • 3-bit type, (n-1)*7+4-bit length
  49. Example: Read 9fcf811e00fa469688943a9152c16d4ee90fb9a9 • PACK[0x0004482D] • 01 == 00000001 • 0_______ => MSB 0, break • _0000001 => length += (1 << 4) • final length == 19
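Slides 48 and 49 decode the object header byte by byte. The same steps can be sketched as a small parser: the first byte carries a continuation bit, a 3-bit type and the low 4 bits of the size, and each following byte adds 7 more size bits. `parse_object_header` is a hypothetical name; the input bytes E3 01 are the ones shown on the slides.

```python
def parse_object_header(data: bytes, pos: int = 0):
    """Decode type and inflated size at data[pos:]; return (type, size, next_pos)."""
    byte = data[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x07  # bits 6..4
    size = byte & 0x0F             # low 4 bits of the size
    shift = 4
    while byte & 0x80:             # MSB set: another size byte follows
        byte = data[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return obj_type, size, pos


OFS_DELTA = 6
print(parse_object_header(bytes([0xE3, 0x01])))
```

For E3 01 this yields type 6 (OFS_DELTA) and size 3 + (1 << 4) = 19, matching the slides.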
  50. Example: Read 9fcf811e00fa469688943a9152c16d4ee90fb9a9 • PACK[0x0004482D] • AA == 10101010 • 1_______ => MSB 1, continue • _0101010 => base offset == 42
  51. Example: Read 9fcf811e00fa469688943a9152c16d4ee90fb9a9 • PACK[0x0004482D] • 44 == 01000100 • 0_______ => MSB 0, break • _1000100 => offset == ((42+1)<<7)+68 == 5572
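The base-offset bytes on slides 50 and 51 use a slightly different variable-length encoding from the size field: before each continuation byte is folded in, the accumulated value is incremented by one. A minimal sketch, with `parse_ofs_delta_offset` as a hypothetical name and AA 44 as the input from the slides:

```python
def parse_ofs_delta_offset(data: bytes, pos: int = 0):
    """Decode the negative base offset of an OFS_DELTA; return (offset, next_pos)."""
    byte = data[pos]
    pos += 1
    offset = byte & 0x7F
    while byte & 0x80:  # MSB set: another offset byte follows
        byte = data[pos]
        pos += 1
        # The +1 is part of the encoding and removes redundant representations.
        offset = ((offset + 1) << 7) | (byte & 0x7F)
    return offset, pos


offset, _ = parse_ofs_delta_offset(bytes([0xAA, 0x44]))
print(offset, 0x0004482D - offset)
```

The decoded value is a distance backwards from the delta's own position, so the base object starts at 0x0004482D - 5572 = 275049.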
  52. Example: Read 9fcf811e00fa469688943a9152c16d4ee90fb9a9 • offset == 5572 • push 0x0004482D into stack • deal with (0x0004482D - 5572) • push (0x0004482D - 5572) into stack • … • root base
  53. Example
    SHA1                                      type  size   size-pack  offset-pack  depth  base
    9fcf811e00fa469688943a9152c16d4ee90fb9a9  blob  19     32         280621       4      6110c89446f2281e5db9b798a0fa020fad6e63e1
    6110c89446f2281e5db9b798a0fa020fad6e63e1  blob  52     45         275049       3      3bbeff3fc22b75c1a26f4ab9b64449b33002aea5
    3bbeff3fc22b75c1a26f4ab9b64449b33002aea5  blob  2935   1263       273786       2      a399208309046656ecc01f7653c5d5b8905fc16e
    a399208309046656ecc01f7653c5d5b8905fc16e  blob  4686   1540       272246       1      e4e56117de8b3bd0bd899701da4712caee27c7d6
    e4e56117de8b3bd0bd899701da4712caee27c7d6  blob  12635  3279       115703       0      -
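The stack-based walk from slide 52, applied to the chain listed on slide 53, can be sketched as follows. The `BASE_OF` mapping is copied from the offset-pack column of that listing; in a real reader each base offset would come from decoding the object's OFS_DELTA header rather than from a table, and both function names are hypothetical.

```python
# Pack offset -> pack offset of its delta base (None marks the root base).
BASE_OF = {
    280621: 275049,
    275049: 273786,
    273786: 272246,
    272246: 115703,
    115703: None,
}


def delta_chain(offset, base_of=BASE_OF):
    """Push offsets onto a stack while following bases down to the root."""
    stack = []
    while offset is not None:
        stack.append(offset)
        offset = base_of[offset]
    return stack


def resolution_order(offset, base_of=BASE_OF):
    """Pop the stack: the root is materialised first, then each delta on top."""
    stack = delta_chain(offset, base_of)
    order = []
    while stack:
        order.append(stack.pop())
    return order


print(resolution_order(280621))
```

The chain has five entries, which corresponds to depth 4 for the requested object. Over object storage each entry costs a separate ranged read unless the pack is cached locally.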
  54. git → libgit2

  55. git fetch / clone • git upload-pack --advertise-refs (rewritten via libgit2) • git upload-pack (untouched) • git pack-objects (rewritten via libgit2 pack builder)
  56. git push (small data) • git receive-pack --advertise-refs (rewritten via libgit2) • git receive-pack (untouched) • ntohl(hdr.hdr_entries) < unpack_limit • git unpack-objects (modified via libgit2, writing to loose OSS store)
  57. git push (big data) • git receive-pack --advertise-refs (rewritten via libgit2) • git receive-pack (untouched) • ntohl(hdr.hdr_entries) >= unpack_limit • git index-pack (modified via libgit2, writing to packed OSS store)
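The size check on slides 56 and 57 reads the object count from the 12-byte pack header (signature, version, entry count, all in network byte order) and compares it with a limit. The sketch below mirrors that decision; `choose_unpacker` is a hypothetical name, and the limit of 100 is an assumption taken from git's documented default for `receive.unpackLimit`, not from the talk.

```python
import struct

UNPACK_LIMIT = 100  # assumed: git's documented default, not stated in the talk


def choose_unpacker(pack_header: bytes, unpack_limit: int = UNPACK_LIMIT) -> str:
    """Decide which command handles an incoming pack, based on its entry count."""
    # ">4sII" reads big-endian fields, the Python counterpart of ntohl().
    signature, version, entries = struct.unpack(">4sII", pack_header[:12])
    if signature != b"PACK":
        raise ValueError("not a pack stream")
    # Few objects: explode into loose objects. Many: keep the pack and index it.
    return "unpack-objects" if entries < unpack_limit else "index-pack"


small = b"PACK" + struct.pack(">II", 2, 3)
big = b"PACK" + struct.pack(">II", 2, 200000)
print(choose_unpacker(small), choose_unpacker(big))
```

In the design described here, the first branch ends up writing to the loose OSS store and the second to the packed OSS store.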
  58. Naked Benchmark
 (no cache)

  59. Fixture • Repository: gitlab-ce • https://gitlab.com/gitlab-org/gitlab-ce.git • More than 200k objects • More than 100MB when packed
  60. git push • FS-based: 6.27s user 1.72s system 14% cpu 53.299 total • Cloud-based: 6.13s user 1.29s system 13% cpu 54.697 total

  61. git push (delta) • FS-based: 0.09s user 0.07s system 5% cpu 3.059 total • Cloud-based: 0.04s user 0.05s system 3% cpu 2.845 total

  62. git clone • FS-based: 6.89s user 8.99s system 33% cpu 47.096 total • Cloud-based: 7.08s user 8.12s system 20% cpu 1:14.12 total

  63. git fetch (delta) • FS-based: 0.14s user 0.13s system 33% cpu 0.806 total • Cloud-based: 0.09s user 0.10s system 1% cpu 16.019 total

  64. GET /namespace/repo/tree/master • FS-based: Executing action: show - 74.5 ms • Cloud-based: Executing action: show - 5877.7 ms

  65. GET /namespace/repo/tree/master/builds • FS-based: Executing action: show - 50.0 ms • Cloud-based: Executing action: show - 4547.0 ms
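To make the gap in the uncached numbers easier to compare, the short calculation below turns the figures from slides 63 to 65 into slowdown factors. It adds no new measurements; the values are copied from the slides and `slowdown` is a hypothetical helper.

```python
# (FS-based, cloud-based) timings copied from slides 63-65.
measurements = {
    "git fetch (delta), seconds": (0.806, 16.019),
    "GET tree/master, ms": (74.5, 5877.7),
    "GET tree/master/builds, ms": (50.0, 4547.0),
}


def slowdown(fs: float, cloud: float) -> float:
    """How many times slower the cloud-based run is than the FS-based run."""
    return cloud / fs


for label, (fs, cloud) in measurements.items():
    print(f"{label}: {slowdown(fs, cloud):.1f}x slower without cache")
```

Small, latency-bound operations suffer most, while the bulk push on slide 60 is nearly unchanged, which is the motivation for the caching layer that follows.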
  66. Cache

  67. odb hamburger refdb • cached via redis hi-priority lo-priority loose

    OSS store packed OSS store loose FS cache packed FS cache
  68. loose FS cache • cache written when ntohl(hdr.hdr_entries) < unpack_limit in git-unpack-objects • cache written when reading via loose OSS store
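The loose FS cache can be pictured as a write-through, read-through layer in front of the loose OSS store: objects are stored locally when they are written and also the first time they are fetched. The sketch below is an illustration under stated assumptions, not the talk's implementation; `LooseFsCache` is a hypothetical class, a dictionary stands in for OSS, and the two-character fan-out directory mimics git's loose object layout.

```python
import os
import tempfile


class LooseFsCache:
    """Local-disk cache in front of a remote loose-object store."""

    def __init__(self, cache_dir, remote):
        self.cache_dir = cache_dir
        self.remote = remote  # dict stand-in for the loose OSS store
        self.remote_reads = 0

    def _path(self, oid):
        return os.path.join(self.cache_dir, oid[:2], oid[2:])

    def _store(self, path, data):
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as fh:
            fh.write(data)

    def write(self, oid, data):
        """Write-through: update the remote store and the local cache."""
        self.remote[oid] = data
        self._store(self._path(oid), data)

    def read(self, oid):
        """Serve from disk if cached; otherwise fetch remotely and cache."""
        path = self._path(oid)
        if os.path.exists(path):
            with open(path, "rb") as fh:
                return fh.read()
        self.remote_reads += 1
        data = self.remote[oid]
        self._store(path, data)
        return data


with tempfile.TemporaryDirectory() as cache_dir:
    oid = "9fcf811e00fa469688943a9152c16d4ee90fb9a9"
    cache = LooseFsCache(cache_dir, {oid: b"example"})
    cache.read(oid)
    cache.read(oid)
    print(cache.remote_reads)
```

Because git objects are content-addressed and immutable, a cached object never needs invalidating; only eviction for disk space has to be considered.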
  69. packed FS cache • cache written when ntohl(hdr.hdr_entries) >= unpack_limit in git-index-pack • cache written in git-pack-objects
  70. redis refdb cache • cache written on read when there is a cache miss • cache expired when the refdb is updated, e.g. by git-receive-pack
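The refdb cache policy on slide 70 is a read-through cache with invalidation on write. A minimal in-memory sketch follows; `CachedRefdb` is a hypothetical class, and plain dictionaries stand in for both redis and the OSS-backed refdb, so no redis client API is implied.

```python
class CachedRefdb:
    """Reference lookups with a read-through cache that expires on update."""

    def __init__(self, backend: dict):
        self.backend = backend  # stand-in for the OSS refdb
        self.cache = {}         # stand-in for redis
        self.backend_reads = 0

    def lookup(self, name):
        if name in self.cache:
            return self.cache[name]
        # Cache miss: read from the backend and populate the cache.
        self.backend_reads += 1
        value = self.backend[name]
        self.cache[name] = value
        return value

    def update(self, name, oid):
        # A ref update (for example after a push) drops the cached entry.
        self.backend[name] = oid
        self.cache.pop(name, None)


refdb = CachedRefdb({"refs/heads/master": "9fcf811e00fa469688943a9152c16d4ee90fb9a9"})
refdb.lookup("refs/heads/master")
refdb.lookup("refs/heads/master")
print(refdb.backend_reads)
```

Expiring rather than overwriting the entry on update keeps the write path simple; the next reader repopulates the cache.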
  71. Future Work

  72. • develop libgit2 backends for AWS S3 • gitlab: favour libgit2, eliminate direct calls to git • gitlab: add settings to choose backends • gollum: use rugged as the default • libgit2: improve performance, e.g. pack builder
  73. https://github.com/pmq20