How we scaled GitLab for a 30k-employee company

GitLab, the open source alternative to GitHub written in Rails, does not scale automatically out of the box: it stores its git repositories on a single filesystem, which makes storage capacity hard to expand. Rather than attaching a NAS server, we decided to replace the filesystem with cloud-based object storage (such as S3). This required changes to both the Ruby layer and the deeper C layers. In this talk, we show how we made the change and overcame the performance loss introduced by network I/O.

https://www.youtube.com/watch?v=byZcOH92CiY

Minqi Pan

May 06, 2016
Transcript

  1. How we scaled GitLab
    for a 30k-employee company
    Minqi Pan

  2. Hello, I’m Minqi Pan
    github.com/pmq20
    twitter
    @psvr

  3. What’s GitLab?

  4. GitLab
    a git-box

    installed on-premises

  5. GitLab
    HTTP
    80/443
    SSH
    22

  6. GitLab
    HTTP
    80/443
    SSH
    22

  7. GitLab
    Redis
    MySQL
    File System

  8. What’s inside?
    GitLab

  9. NGINX OpenSSH Server
    Unicorn gitlab-shell
    GitLab Workhorse
    git
    gitlab_git
    rails sidekiq rugged
    libgit2

  10. Works great
    for small teams

  11. to make it easy to do business anywhere

  12. Let’s scale it!

  13. GitLab
    HTTP
    80/443
    SSH
    22

  14. HTTP
    80/443
    SSH
    22
    unicorn unicorn
    unicorn …

  15. HTTP
    80/443
    SSH
    22
    unicorn unicorn
    unicorn …
    nginx ?

  16. HTTP
    80/443
    SSH
    22
    unicorn unicorn
    unicorn …
    nginx
    ssh2http
    https://github.com/pmq20/ssh2http

  17. unicorn unicorn
    unicorn …
    LVS (IPVS)
    HTTP
    80/443
    SSH
    22

  18. Linux Virtual Server

    (IP Virtual Server)
    • transport-layer load balancing inside kernel
    • layer-4 switching, unlike nginx (layer-7)
    • can: IP weighting, IP blocking, health checking
    • can’t: HTTP 200 Health Checking, URL rewriting

  19. Complications
    • SSH Host Key Synchronisation: do it once
    • SSH Client Key Synchronisation: do it every time
    • synchronised via redis pub-sub

  20. Does it scale
    in the backend?

  21. IV. Backing services
    Treat backing services as attached resources


  22. Redis

    MySQL

    File System
    GitLab
    * git repositories
    * user generated
    attachments / avatars

  23. GitLab Geo
    • introduced in GitLab 8.5 EE
    • 1 Master N Slave Replication
    • achieves A-P in C-A-P theorem
    • no disaster recovery
    • no sharding

  24. HTTP
    80/443
    SSH
    22
    nginx
    ssh2http
    routing via key
    namespace/repo_name
    GitLab shard
    FS shard
    GitLab shard
    FS shard
    GitLab shard
    FS shard
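
    A minimal sketch of "routing via key namespace/repo_name". The slide does not say how a key is mapped to a shard, so the stable hash, the shard names and the helpers `repo_key` and `pick_shard` are illustrative assumptions, not the deployed routing logic.

```python
import hashlib

# Hypothetical shard names; the deck does not name its shards.
SHARDS = ["shard-0", "shard-1", "shard-2"]


def repo_key(path: str) -> str:
    """Extract 'namespace/repo_name' from a request path such as
    '/namespace/repo_name.git/info/refs'."""
    parts = [p for p in path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"no namespace/repo_name in {path!r}")
    repo = parts[1][:-4] if parts[1].endswith(".git") else parts[1]
    return f"{parts[0]}/{repo}"


def pick_shard(key: str, shards=SHARDS) -> str:
    """Map a key to a shard with a stable hash (an assumption, see lead-in)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]
```

    Because the key is the same for HTTP and SSH-over-HTTP traffic, every request for one repository lands on the shard that holds its files.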

  25. GitLab Sharding
    • Introduces Sidekiq sharding as well
    • Introduces many changes to the application layer as well
      - need to have super user authentication
      - need to eliminate every page with requests across shards (e.g. admin page of repo sizes)
    • Tedious changes on the application level.

  26. How to deal with FS?

    Hardware Network-Attached Storage?

    Software Network-Attached Storage?

    Remote Procedure Calls to FS shards?

    Kill it?

  27. • Hard-NAS: Alibaba has non-IOE policies.
    • Soft-NAS: Alibaba does not have it yet.
    • RPC: GitRPC? Good. GitHub does that.
    • Kill FS: Use the cloud. Try something new!

  28. by “cloud” we mean…
    • Amazon S3: Amazon Simple Storage Service
    • Alibaba OSS: Alibaba Object Storage Service

  29. libgit2 git grit
    • used in wikis
    • via gollum-lib
    • via gollum-grit_adapter
    • eliminate-able via gollum-rugged_adapter
    gitlab-rails

  30. gitlab-rails
    libgit2 git
    • via gitlab_git
    • via rugged
    • backend

    replace-able
    • via gitlab-shell
    • via gitlab-workhorse
    • via popen
    • backend

    hard-to-replace (FS)
    grit

  31. gitlab-workhorse
    gitlab-rails gitlab-shell
    git
    libgit2
    Cloud Based Backend


    grit

  32. Cloud Based Backend

  33. odb’s refdb
    • stored via OSS
    • locked via redis
    hi-priority
    lo-priority
    loose OSS store
    packed OSS store
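
    A rough sketch of the prioritised object database on this slide: the loose store is asked first (hi-priority) and the packed store second (lo-priority). The classes `DictStore` and `Odb` are hypothetical Python stand-ins with in-memory dicts in place of OSS; they only mirror the idea of priority-ordered backends and are not libgit2's API.

```python
class DictStore:
    """In-memory stand-in for one object-store backend (an assumption)."""

    def __init__(self):
        self.objects = {}

    def read(self, oid):
        return self.objects.get(oid)

    def write(self, oid, data):
        self.objects[oid] = data


class Odb:
    """Object database that asks its backends in descending priority."""

    def __init__(self):
        self._backends = []  # list of (priority, backend)

    def add_backend(self, backend, priority):
        self._backends.append((priority, backend))
        self._backends.sort(key=lambda pb: pb[0], reverse=True)

    def read(self, oid):
        for _, backend in self._backends:
            data = backend.read(oid)
            if data is not None:
                return data
        raise KeyError(oid)


loose, packed = DictStore(), DictStore()
odb = Odb()
odb.add_backend(packed, priority=1)  # lo-priority
odb.add_backend(loose, priority=2)   # hi-priority
```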

  34. OSS refdb (read)

  35. OSS refdb (write)

  36. loose OSS store (write)

  37. loose OSS store (read)

  38. packed OSS store (write)

  39. packed OSS store (read)
    via HTTP “Range” header
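
    Reading part of a pack from object storage comes down to an HTTP request with a "Range" header. The sketch below only builds such a request and does not send it; the URL and the helper names are hypothetical, and a real OSS or S3 request would also need authentication headers.

```python
import urllib.request


def range_header(offset: int, length: int) -> dict:
    """HTTP byte ranges are inclusive on both ends."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length > 0")
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def build_request(url: str, offset: int, length: int) -> urllib.request.Request:
    """Prepare (but do not send) a ranged GET for `length` bytes at `offset`."""
    return urllib.request.Request(url, headers=range_header(offset, length))


# Hypothetical object URL: ask for the 12-byte pack header only.
req = build_request("https://oss.example.invalid/repo/objects/pack/p.pack", 0, 12)
```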

  40. packed OSS store (read)

  41. Example
    • First byte of the name is 0x9f
    • IDX[8 + (0x9f - 1) * 4] == 0x0403 == 1027
    • IDX[8 + 0x9f * 4] == 0x0405 == 1029
    • Object No. 1027 ~ 1029
    Read 9fcf811e00fa469688943a9152c16d4ee90fb9a9

  42. Example
    • Binary search 1027 ~ 1029
    • Found at 8 + 4 * 256 + 1027 * 20 == 21572
    • Skip the rest total_num*(20+4) == 1628*24
    Read 9fcf811e00fa469688943a9152c16d4ee90fb9a9

  43. Example
    • IDX[8 + 4 * 256 + 1628*24 + 4 * 1027]
    Read 9fcf811e00fa469688943a9152c16d4ee90fb9a9
    • PACK[0x0004482D] == PACK[280621]
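
    The index lookup walked through on the last three slides can be sketched as follows, assuming a version-2 pack index (8-byte header, 256-entry fanout, 20-byte names, CRC32 table, 4-byte offsets). The function names are hypothetical, `build_idx` exists only to make a tiny index for trying the lookup, and offsets that spill into the 64-bit table are ignored.

```python
import struct

HEADER = 8          # magic "\377tOc" + version number
FANOUT = 256 * 4    # 256 cumulative counts, 4 bytes each


def name_position(n: int) -> int:
    """Byte position of the n-th object name in the index."""
    return HEADER + FANOUT + n * 20


def offset_entry_position(total: int, n: int) -> int:
    """Byte position of the n-th pack offset: skip all names and CRC32s."""
    return HEADER + FANOUT + total * (20 + 4) + n * 4


def _u32(buf: bytes, pos: int) -> int:
    return struct.unpack_from(">I", buf, pos)[0]


def find_offset(idx: bytes, oid: bytes):
    """Return the pack offset of `oid`, or None if it is not in the index."""
    total = _u32(idx, HEADER + 255 * 4)
    first = oid[0]
    lo = _u32(idx, HEADER + (first - 1) * 4) if first else 0
    hi = _u32(idx, HEADER + first * 4)
    while lo < hi:  # binary search inside the fanout bucket
        mid = (lo + hi) // 2
        pos = name_position(mid)
        name = idx[pos:pos + 20]
        if name == oid:
            return _u32(idx, offset_entry_position(total, mid))
        if name < oid:
            lo = mid + 1
        else:
            hi = mid
    return None


def build_idx(entries: dict) -> bytes:
    """Build a minimal v2-style index from {oid: offset} (fixture only)."""
    oids = sorted(entries)
    fanout, count = [], 0
    for b in range(256):
        count += sum(1 for o in oids if o[0] == b)
        fanout.append(count)
    out = b"\xfftOc" + struct.pack(">I", 2)
    out += struct.pack(">256I", *fanout)
    out += b"".join(oids)
    out += b"\x00" * (4 * len(oids))  # CRC32 placeholders
    out += b"".join(struct.pack(">I", entries[o]) for o in oids)
    return out
```

    With the slide's numbers, `name_position(1027)` gives 21572, matching the position found by the binary search.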

  44. Example
    Read 9fcf811e00fa469688943a9152c16d4ee90fb9a9
    E3 11100011
    1_______ => MSB 1 continue
    _110____ => type == 6 == OFS_DELTA
    ____0011 => length == 3
    3-bit type, (n-1)*7+4-bit length

  45. Example
    Read 9fcf811e00fa469688943a9152c16d4ee90fb9a9
    • PACK[0x0004482D + 1]
    01 00000001
    0_______ => MSB 0 break
    _0000001 => length += (1 << 4)
    final length == 19
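
    The two header bytes decoded above (E3, then 01) follow the pack object header layout: a 3-bit type and a 4-bit size in the first byte, then 7 more size bits per continuation byte. A small sketch, with `parse_object_header` as a hypothetical helper name:

```python
TYPE_NAMES = {1: "commit", 2: "tree", 3: "blob", 4: "tag",
              6: "OFS_DELTA", 7: "REF_DELTA"}


def parse_object_header(buf: bytes, pos: int = 0):
    """Return (type, size, position of the first byte after the header)."""
    byte = buf[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x7   # bits 6..4
    size = byte & 0x0F             # low 4 bits
    shift = 4
    while byte & 0x80:             # MSB set: another size byte follows
        byte = buf[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return obj_type, size, pos


obj_type, size, next_pos = parse_object_header(bytes([0xE3, 0x01]))
```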

  46. Example
    Read 9fcf811e00fa469688943a9152c16d4ee90fb9a9
    • PACK[0x0004482D + 2]
    AA 10101010
    1_______ MSB 1 continue
    _0101010 base offset == 42

  47. Example
    Read 9fcf811e00fa469688943a9152c16d4ee90fb9a9
    • PACK[0x0004482D + 3]
    44 01000100
    0_______ MSB 0 break
    _1000100 offset == ((42+1)<<7)+68
    == 5572
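
    The base offset of an OFS_DELTA object uses a different variable-length encoding from the size field: each continuation adds 1 before shifting. A sketch that reproduces the AA 44 example, with `parse_ofs_delta_offset` as a hypothetical helper name:

```python
def parse_ofs_delta_offset(buf: bytes, pos: int = 0):
    """Return (distance back to the base object, position after the field)."""
    byte = buf[pos]
    pos += 1
    offset = byte & 0x7F
    while byte & 0x80:  # MSB set: another byte follows
        byte = buf[pos]
        pos += 1
        offset = ((offset + 1) << 7) | (byte & 0x7F)
    return offset, pos


distance, next_pos = parse_ofs_delta_offset(bytes([0xAA, 0x44]))
base_position = 0x0004482D - distance  # where the base object starts in the pack
```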

  48. Example
    Read 9fcf811e00fa469688943a9152c16d4ee90fb9a9
    offset == 5572
    push 0x0004482D into stack
    deal with (0x0004482D - 5572)
    push (0x0004482D - 5572) into stack

    root base

  49. Example
    SHA1                                      type  size   size-pack  offset-pack  depth  base
    9fcf811e00fa469688943a9152c16d4ee90fb9a9  blob  19     32         280621       4      6110c89446f2281e5db9b798a0fa020fad6e63e1
    6110c89446f2281e5db9b798a0fa020fad6e63e1  blob  52     45         275049       3      3bbeff3fc22b75c1a26f4ab9b64449b33002aea5
    3bbeff3fc22b75c1a26f4ab9b64449b33002aea5  blob  2935   1263       273786       2      a399208309046656ecc01f7653c5d5b8905fc16e
    a399208309046656ecc01f7653c5d5b8905fc16e  blob  4686   1540       272246       1      e4e56117de8b3bd0bd899701da4712caee27c7d6
    e4e56117de8b3bd0bd899701da4712caee27c7d6  blob  12635  3279       115703       0      -
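
    The stack of pack offsets built while chasing delta bases can be sketched from the table above. The data is copied from this slide; `delta_stack` is a hypothetical helper, and applying the deltas themselves is left out.

```python
# sha -> (offset in pack, base sha or None), copied from the table above
ENTRIES = {
    "9fcf811e00fa469688943a9152c16d4ee90fb9a9":
        (280621, "6110c89446f2281e5db9b798a0fa020fad6e63e1"),
    "6110c89446f2281e5db9b798a0fa020fad6e63e1":
        (275049, "3bbeff3fc22b75c1a26f4ab9b64449b33002aea5"),
    "3bbeff3fc22b75c1a26f4ab9b64449b33002aea5":
        (273786, "a399208309046656ecc01f7653c5d5b8905fc16e"),
    "a399208309046656ecc01f7653c5d5b8905fc16e":
        (272246, "e4e56117de8b3bd0bd899701da4712caee27c7d6"),
    "e4e56117de8b3bd0bd899701da4712caee27c7d6":
        (115703, None),
}


def delta_stack(oid: str, entries=ENTRIES) -> list:
    """Push pack offsets from the requested object down to the root base."""
    stack = []
    while oid is not None:
        offset, base = entries[oid]
        stack.append(offset)
        oid = base
    return stack


stack = delta_stack("9fcf811e00fa469688943a9152c16d4ee90fb9a9")
```

    Popping the stack yields the root base first, so each delta can be applied on top of the object reconstructed before it.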

  50. git → libgit2

  51. git fetch / clone
    • git upload-pack --advertise-refs

    (rewritten via libgit2)
    • git upload-pack

    (untouched)
    • git pack-objects

    (rewritten via libgit2 pack builder)

  52. git push (small data)
    • git receive-pack --advertise-refs

    (rewritten via libgit2)
    • git receive-pack

    (untouched)
    • ntohl(hdr.hdr_entries) < unpack_limit
    • git unpack-objects

    (modified via libgit2, writing to loose OSS store)

  53. git push (big data)
    • git receive-pack --advertise-refs

    (rewritten via libgit2)
    • git receive-pack

    (untouched)
    • ntohl(hdr.hdr_entries) >= unpack_limit
    • git index-pack

    (modified via libgit2, writing to packed OSS store)
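
    The small-push versus big-push split on the two push slides depends on the object count in the 12-byte pack header. A sketch of that decision, assuming git's default `receive.unpackLimit` of 100 (the deck does not state the value in use); `pack_entry_count` and `choose_store` are hypothetical names.

```python
import struct

UNPACK_LIMIT = 100  # git's default receive.unpackLimit; an assumption here


def pack_entry_count(header: bytes) -> int:
    """Read the object count from a pack header ('PACK', version, count)."""
    signature, _version, entries = struct.unpack(">4sII", header[:12])
    if signature != b"PACK":
        raise ValueError("not a pack stream")
    return entries  # struct's '>' does the ntohl() conversion


def choose_store(header: bytes, unpack_limit: int = UNPACK_LIMIT) -> str:
    """Small pushes are exploded into loose objects, big ones kept as a pack."""
    if pack_entry_count(header) < unpack_limit:
        return "loose OSS store"    # git unpack-objects path
    return "packed OSS store"       # git index-pack path
```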

  54. Naked Benchmark

    (no cache)

  55. Fixture
    • Repository: gitlab-ce
    • https://gitlab.com/gitlab-org/gitlab-ce.git
    • More than 200k objects
    • More than 100MB when packed

  56. git push
    • FS-based:

    6.27s user 1.72s system 14% cpu 53.299 total
    • Cloud-based:

    6.13s user 1.29s system 13% cpu 54.697 total

  57. git push (delta)
    • FS-based:

    0.09s user 0.07s system 5% cpu 3.059 total
    • Cloud-based:

    0.04s user 0.05s system 3% cpu 2.845 total

  58. git clone
    • FS-based:

    6.89s user 8.99s system 33% cpu 47.096 total
    • Cloud-based:

    7.08s user 8.12s system 20% cpu 1:14.12 total

  59. git fetch (delta)
    • FS-based:

    0.14s user 0.13s system 33% cpu 0.806 total
    • Cloud-based:

    0.09s user 0.10s system 1% cpu 16.019 total

  60. GET /namespace/repo/tree/master
    • FS-based:

    Executing action: show - 74.5 ms
    • Cloud-based:

    Executing action: show - 5877.7 ms

  61. GET /namespace/repo/tree/master/builds
    • FS-based:

    Executing action: show - 50.0 ms
    • Cloud-based:

    Executing action: show - 4547.0 ms

  62. odb hamburger refdb
    • cached via redis
    hi-priority
    lo-priority
    loose OSS store
    packed OSS store
    loose FS cache
    packed FS cache

  63. loose FS cache
    • cache written when

    ntohl(hdr.hdr_entries) < unpack_limit

    in git-unpack-objects
    • cache written when reading via loose OSS store
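
    A sketch of a loose FS cache that is filled on both paths named on this slide: when a small push writes objects, and when a read has to go to the loose OSS store. `LooseFsCache` is a hypothetical class, a dict stands in for OSS, and the two-character directory fan-out is borrowed from git's loose object layout as an assumption.

```python
from pathlib import Path


class LooseFsCache:
    """Local-disk cache in front of a loose object store (dict stand-in)."""

    def __init__(self, cache_dir, oss: dict):
        self.dir = Path(cache_dir)
        self.oss = oss
        self.oss_reads = 0  # counts round trips to the remote store

    def _path(self, oid: str) -> Path:
        return self.dir / oid[:2] / oid[2:]

    def _fill(self, oid: str, data: bytes) -> None:
        path = self._path(oid)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def write(self, oid: str, data: bytes) -> None:
        """Push path: store remotely and warm the cache at the same time."""
        self.oss[oid] = data
        self._fill(oid, data)

    def read(self, oid: str) -> bytes:
        """Read path: serve from disk, else fetch remotely and cache."""
        path = self._path(oid)
        if path.exists():
            return path.read_bytes()
        self.oss_reads += 1
        data = self.oss[oid]
        self._fill(oid, data)
        return data
```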

  64. packed FS cache
    • cache written when

    ntohl(hdr.hdr_entries) >= unpack_limit

    in git-index-pack
    • cache written in git-pack-objects

  65. redis refdb cache
    • cache written when read and cache-miss
    • cache expired when refdb got updated

    e.g. git-receive-pack
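
    The two cache rules on this slide (fill on a read miss, expire on a ref update) can be sketched as below. `CachedRefdb` is a hypothetical class and plain dicts stand in for both redis and the OSS-backed refdb; locking is omitted.

```python
class CachedRefdb:
    """Read-through ref cache with expiry on update (dict stand-ins)."""

    def __init__(self, store: dict, cache: dict):
        self.store = store      # stand-in for the OSS refdb
        self.cache = cache      # stand-in for redis
        self.store_reads = 0    # counts reads that reached the store

    def lookup(self, name: str):
        if name in self.cache:
            return self.cache[name]
        self.store_reads += 1
        target = self.store.get(name)
        if target is not None:
            self.cache[name] = target  # written on read and cache miss
        return target

    def update(self, name: str, target: str) -> None:
        """Called when a push moves a ref, e.g. from git-receive-pack."""
        self.store[name] = target
        self.cache.pop(name, None)     # expire rather than overwrite
```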

  66. • develop libgit2 backends for AWS S3
    • gitlab: favour libgit2, eliminate direct calls to git
    • gitlab: add settings to choose backends
    • gollum: use rugged as the default
    • libgit2: improve performance, e.g. pack builder

  67. https://github.com/pmq20
