
How we scaled GitLab for a 30k-employee company


GitLab, the open source alternative to GitHub written in Rails, does not scale automatically out of the box, because it stores its git repositories on a single filesystem, which makes storage capacity hard to expand. Rather than attaching a NAS server, we decided to replace the filesystem with cloud-based object storage (such as S3). This required changes to both the Ruby layer and the deeper C layers. In this talk, we will show how we made the change and overcame the performance loss introduced by network I/O.

https://www.youtube.com/watch?v=byZcOH92CiY

Minqi Pan

May 06, 2016

Transcript

  1. How we scaled GitLab
    for a 30k-employee company
    Minqi Pan


  2. Hello, I’m Minqi Pan
    github.com/pmq20
    twitter
    @psvr


  3. What’s GitLab?


  4. GitLab
    a git-box

    installed on-premises


  5. GitLab
    HTTP
    80/443
    SSH
    22


  6. GitLab
    HTTP
    80/443
    SSH
    22


  7. GitLab
    Redis
    MySQL
    File System


  8. What’s inside?
    GitLab


  9. NGINX OpenSSH Server
    Unicorn gitlab-shell
    GitLab Workhorse
    git
    gitlab_git
    rails sidekiq rugged
    libgit2


  10. Works great
    for small teams


  11. However


  12. to make it easy to do business anywhere


  13. Let’s scale it!


  14. GitLab
    HTTP
    80/443
    SSH
    22


  15. HTTP
    80/443
    SSH
    22
    unicorn unicorn
    unicorn …


  16. HTTP
    80/443
    SSH
    22
    unicorn unicorn
    unicorn …
    nginx ?


  17. HTTP
    80/443
    SSH
    22
    unicorn unicorn
    unicorn …
    nginx
    ssh2http
    https://github.com/pmq20/ssh2http


  18. unicorn unicorn
    unicorn …
    LVS (IPVS)
    HTTP
    80/443
    SSH
    22


  19. Linux Virtual Server

    (IP Virtual Server)
    • transport-layer load balancing inside kernel
    • layer-4 switching, unlike nginx (layer-7)
    • can: IP weighting, IP blocking, health checking
    • can’t: HTTP 200 Health Checking, URL rewriting


  20. Complications
    • SSH Host Key Synchronisation: do it once
    • SSH Client Key Synchronisation: do it every time
    • synchronised via redis pub-sub


  21. Does it scale
    in the backend?


  22. IV. Backing services
    Treat backing services as attached resources




  24. Redis

    MySQL

    File System
    GitLab
    * git repositories
    * user generated
    attachments / avatars



  26. GitLab Geo
    • introduced in GitLab 8.5 EE
    • 1 Master N Slave Replication
    • achieves A-P in C-A-P theorem
    • no disaster recovery
    • no sharding


  27. HTTP
    80/443
    SSH
    22
    nginx
    ssh2http
    routing via key
    namespace/repo_name
    GitLab shard
    FS shard
    GitLab shard
    FS shard
    GitLab shard
    FS shard
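
The routing step on this slide can be sketched as a deterministic hash over the `namespace/repo_name` key. This is only an illustration: the shard names, the shard count, and the choice of SHA-1 modulo the shard count are assumptions, not the scheme used in the talk.

```python
import hashlib

# Hypothetical shard identifiers; the talk does not name its shards.
SHARDS = ["shard-0", "shard-1", "shard-2"]


def route(namespace: str, repo_name: str, shards=SHARDS) -> str:
    """Map a repository key to one shard, always the same one for the same key."""
    key = f"{namespace}/{repo_name}".encode("utf-8")
    digest = hashlib.sha1(key).digest()
    # Use the first 8 bytes of the digest as an integer and pick a shard.
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]
```

A plain modulo reshuffles most keys when the shard count changes, so a real deployment would more likely use a lookup table or consistent hashing.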


  28. GitLab Sharding
    • Introduces Sidekiq sharding as well
    • Introduces many changes to the application layer

    - need to have super user authentication

    - need to eliminate every page with requests
    across shards (e.g. admin page of repo sizes)
    • Tedious changes on the application level.


  29. How to deal with FS?

    Hardware Network-Attached Storage?

    Software Network-Attached Storage?

    Remote Procedure Calls to FS shards?

    Kill it?


  30. • Hard-NAS: Alibaba has non-IOE policies.
    • Soft-NAS: Alibaba does not have it yet.
    • RPC: GitRPC? Good. GitHub does that.
    • Kill FS: Use the cloud. Try something new!


  31. by “cloud” we mean…
    • Amazon S3: Amazon Simple Storage Service
    • Alibaba OSS: Alibaba Object Storage Service


  32. libgit2 git grit
    • used in wikis
    • via gollum-lib
    • via gollum-grit_adapter
    • eliminate-able via

    gollum-rugged_adapter
    gitlab-rails


  33. gitlab-rails
    libgit2 git
    • via gitlab_git
    • via rugged
    • backend

    replace-able
    • via gitlab-shell
    • via gitlab-workhorse
    • via popen
    • backend

    hard-to-replace (FS)
    grit


  34. Basic Idea


  35. gitlab-workhorse
    gitlab-rails gitlab-shell
    git
    libgit2
    Cloud Based Backend


    grit


  36. Cloud Based Backend


  37. odb’s refdb
    • stored via OSS
    • locked via redis
    hi-priority
    lo-priority
    loose OSS store
    packed OSS store


  38. OSS refdb (read)


  39. OSS refdb (write)


  40. loose OSS store (write)


  41. loose OSS store (read)


  42. packed OSS store (write)


  43. packed OSS store (read)
    via HTTP “Range” header
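
Reading one object out of a remote pack file only needs a byte range, which is what the HTTP `Range` header expresses. A minimal sketch of building that header follows; the helper name is hypothetical, and the actual request code used in the talk is not shown in the slides.

```python
def range_header(offset: int, length: int) -> dict:
    """Build an HTTP Range header covering `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # Byte ranges are inclusive on both ends, hence the "- 1".
    return {"Range": f"bytes={offset}-{offset + length - 1}"}
```

For example, an object stored at pack offset 280621 that occupies 32 bytes in the pack would be requested with `bytes=280621-280652`.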


  44. packed OSS store (read)


  45. Example
    • First byte of the name is 0x9f
    • IDX[8 + (0x9f - 1) * 4] == 0x0403 == 1027
    • IDX[8 + 0x9f * 4] == 0x0405 == 1029
    • Object No. 1027 ~ 1029
    Read 9fcf811e00fa469688943a9152c16d4ee90fb9a9
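
The fanout lookup on this slide can be sketched as follows, assuming a version-2 `.idx` layout (8-byte header, then 256 big-endian 4-byte cumulative counts). The index bytes here are synthetic and built only to reproduce the slide's numbers; they are not data from the talk.

```python
import struct

IDX_HEADER_SIZE = 8  # 4-byte magic + 4-byte version in a version-2 .idx file


def fanout_range(idx: bytes, first_byte: int) -> tuple[int, int]:
    """Return (lo, hi): names starting with first_byte occupy indexes [lo, hi)."""

    def entry(i: int) -> int:
        # Each fanout entry is a 4-byte big-endian cumulative object count.
        return struct.unpack_from(">I", idx, IDX_HEADER_SIZE + 4 * i)[0]

    lo = entry(first_byte - 1) if first_byte > 0 else 0
    hi = entry(first_byte)
    return lo, hi


# Synthetic fanout table (an assumption) that reproduces the slide's numbers.
counts = [1027 if b < 0x9F else 1029 if b == 0x9F else 1628 for b in range(256)]
fake_idx = b"\xfftOc" + struct.pack(">I", 2) + struct.pack(">256I", *counts)
lo, hi = fanout_range(fake_idx, 0x9F)
```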


  46. Example
    • Binary search 1027 ~ 1029
    • Found at 8 + 4 * 256 + 1027 * 20 == 21572
    • Skip the rest total_num*(20+4) == 1628*24
    Read 9fcf811e00fa469688943a9152c16d4ee90fb9a9
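
The binary search over the sorted name table can be sketched like this, again assuming the version-2 `.idx` layout where 20-byte SHA-1 names start right after the header and fanout table. The function name is hypothetical.

```python
SHA_TABLE_START = 8 + 4 * 256  # 8-byte header plus 256 fanout entries
SHA_SIZE = 20                  # bytes per SHA-1 object name


def find_object(idx: bytes, sha: bytes, lo: int, hi: int) -> int:
    """Binary-search the sorted name table in [lo, hi); return the index or -1."""
    while lo < hi:
        mid = (lo + hi) // 2
        pos = SHA_TABLE_START + SHA_SIZE * mid
        name = idx[pos:pos + SHA_SIZE]
        if name == sha:
            return mid
        if name < sha:
            lo = mid + 1
        else:
            hi = mid
    return -1
```

With the slide's numbers, entry 1027 sits at byte `SHA_TABLE_START + SHA_SIZE * 1027`, which is 21572.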


  47. Example
    • IDX[8 + 4 * 256 + 1628*24 + 4 * 1027]
    Read 9fcf811e00fa469688943a9152c16d4ee90fb9a9
    • PACK[0x0004482D] == PACK[280621]
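
The position arithmetic for the offset table can be written out as a small sketch. It assumes the version-2 `.idx` layout (names, then 4-byte CRCs, then 4-byte offsets) and does not handle the optional 64-bit offset table used for very large packs; the function names are hypothetical.

```python
import struct


def offset_entry_pos(total: int, index: int) -> int:
    """Byte position in the .idx of the 4-byte pack offset for object `index`."""
    header_and_fanout = 8 + 4 * 256
    names_and_crcs = total * (20 + 4)  # skip every 20-byte name and 4-byte CRC
    return header_and_fanout + names_and_crcs + 4 * index


def pack_offset(idx: bytes, total: int, index: int) -> int:
    """Read the big-endian pack offset stored for object `index`."""
    (value,) = struct.unpack_from(">I", idx, offset_entry_pos(total, index))
    if value & 0x80000000:
        # The high bit marks an entry in the 64-bit offset table (not sketched).
        raise NotImplementedError("64-bit pack offsets are not handled here")
    return value
```

For the slide's pack of 1628 objects, object 1027 has its offset stored at byte 44212 of the index.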


  48. Example
    Read 9fcf811e00fa469688943a9152c16d4ee90fb9a9
    E3 11100011
    1_______ => MSB 1 continue
    _110____ => type == 6 == OFS_DELTA
    ____0011 => length == 3
    3-bit type, (n-1)*7+4-bit length


  49. Example
    Read 9fcf811e00fa469688943a9152c16d4ee90fb9a9
    • PACK[0x0004482D]
    01 00000001
    0_______ => MSB 0 break
    _0000001 => length += (1 << 4)
    final length == 19
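
The header decoding walked through on this slide and the previous one can be sketched as below: the first byte carries a continuation bit, a 3-bit type and the low 4 bits of the length, and each following byte contributes 7 more bits. The function name is hypothetical.

```python
def parse_object_header(data: bytes, pos: int = 0) -> tuple[int, int, int]:
    """Decode a pack object header; return (type, size, position after header)."""
    byte = data[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x7  # bits 6..4 hold the object type
    size = byte & 0x0F            # bits 3..0 hold the low 4 bits of the size
    shift = 4
    while byte & 0x80:            # MSB set means another size byte follows
        byte = data[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return obj_type, size, pos
```

Feeding it the two bytes from the slides, `E3 01`, gives type 6 (OFS_DELTA) and size 19.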


  50. Example
    Read 9fcf811e00fa469688943a9152c16d4ee90fb9a9
    • PACK[0x0004482D]
    AA 10101010
    1_______ MSB 1 continue
    _0101010 base offset == 42


  51. Example
    Read 9fcf811e00fa469688943a9152c16d4ee90fb9a9
    • PACK[0x0004482D]
    44 01000100
    0_______ MSB 0 break
    _1000100 offset == ((42+1)<<7)+68
    == 5572
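
The base-offset decoding shown on this slide and the previous one can be sketched as follows. Unlike the size field, each continuation step adds 1 before shifting, which is the `(42+1)<<7` on the slide. The function name is hypothetical.

```python
def parse_ofs_delta_offset(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode the OFS_DELTA base distance; return (offset, position after it)."""
    byte = data[pos]
    pos += 1
    offset = byte & 0x7F
    while byte & 0x80:  # MSB set means another byte follows
        byte = data[pos]
        pos += 1
        offset = ((offset + 1) << 7) | (byte & 0x7F)
    return offset, pos
```

The result is a distance to subtract from the delta object's own pack offset in order to find its base.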


  52. Example
    Read 9fcf811e00fa469688943a9152c16d4ee90fb9a9
    offset == 5572
    push 0x0004482D into stack
    deal with (0x0004482D - 5572)
    push (0x0004482D - 5572) into stack

    root base
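
The stack-based walk to the root base can be sketched as below. The `base_of` mapping is a stand-in for decoding each object's OFS_DELTA header; its values are the pack offsets listed for this object's delta chain in the table on slide 53.

```python
def delta_chain(start: int, base_of: dict) -> list:
    """Push pack offsets onto a stack until an object with no base is reached."""
    stack = [start]
    while base_of.get(stack[-1]) is not None:
        stack.append(base_of[stack[-1]])
    return stack


# Pack offset -> base pack offset (None marks the non-delta root object).
base_of = {
    280621: 275049,
    275049: 273786,
    273786: 272246,
    272246: 115703,
    115703: None,
}
chain = delta_chain(280621, base_of)
```

Popping the stack then yields the root first, so each delta can be applied on top of the already reconstructed base.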


  53. Example
    SHA1 type size size-pack offset-pack depth base
    9fcf811e00fa469688943a9152c16d4ee90fb9a9 blob 19 32 280621 4 6110c89446f2281e5db9b798a0fa020fad6e63e1
    6110c89446f2281e5db9b798a0fa020fad6e63e1 blob 52 45 275049 3 3bbeff3fc22b75c1a26f4ab9b64449b33002aea5
    3bbeff3fc22b75c1a26f4ab9b64449b33002aea5 blob 2935 1263 273786 2 a399208309046656ecc01f7653c5d5b8905fc16e
    a399208309046656ecc01f7653c5d5b8905fc16e blob 4686 1540 272246 1 e4e56117de8b3bd0bd899701da4712caee27c7d6
    e4e56117de8b3bd0bd899701da4712caee27c7d6 blob 12635 3279 115703 0 -


  54. git → libgit2


  55. git fetch / clone
    • git upload-pack --advertise-refs

    (rewritten via libgit2)
    • git upload-pack

    (untouched)
    • git pack-objects

    (rewritten via libgit2 pack builder)


  56. git push (small data)
    • git receive-pack --advertise-refs

    (rewritten via libgit2)
    • git receive-pack

    (untouched)
    • ntohl(hdr.hdr_entries) < unpack_limit
    • git unpack-objects

    (modified via libgit2, writing to loose OSS store)


  57. git push (big data)
    • git receive-pack --advertise-refs

    (rewritten via libgit2)
    • git receive-pack

    (untouched)
    • ntohl(hdr.hdr_entries) >= unpack_limit
    • git index-pack

    (modified via libgit2, writing to packed OSS store)
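
The small-push versus big-push decision on these two slides depends on the entry count in the 12-byte pack header. A sketch of that check follows; the return labels and function name are hypothetical, and the limit of 100 is git's documented default for `receive.unpackLimit`, not a value stated in the talk.

```python
import struct

DEFAULT_UNPACK_LIMIT = 100  # git's documented default; an assumption here


def choose_store(pack_header: bytes, unpack_limit: int = DEFAULT_UNPACK_LIMIT) -> str:
    """Return "loose" or "packed" depending on how many objects the pack holds."""
    # The header is "PACK", a 4-byte version, then a 4-byte entry count,
    # all big-endian (network order), which is why the C code calls ntohl().
    signature, _version, entries = struct.unpack(">4sII", pack_header[:12])
    if signature != b"PACK":
        raise ValueError("not a pack stream")
    return "loose" if entries < unpack_limit else "packed"
```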


  58. Naked Benchmark

    (no cache)


  59. Fixture
    • Repository: gitlab-ce
    • https://gitlab.com/gitlab-org/gitlab-ce.git
    • More than 200k objects
    • More than 100MB when packed


  60. git push
    • FS-based:

    6.27s user 1.72s system 14% cpu 53.299 total
    • Cloud-based:

    6.13s user 1.29s system 13% cpu 54.697 total


  61. git push (delta)
    • FS-based:

    0.09s user 0.07s system 5% cpu 3.059 total
    • Cloud-based:

    0.04s user 0.05s system 3% cpu 2.845 total


  62. git clone
    • FS-based:

    6.89s user 8.99s system 33% cpu 47.096 total
    • Cloud-based:

    7.08s user 8.12s system 20% cpu 1:14.12 total


  63. git fetch (delta)
    • FS-based:

    0.14s user 0.13s system 33% cpu 0.806 total
    • Cloud-based:

    0.09s user 0.10s system 1% cpu 16.019 total


  64. GET /namespace/repo/tree/master
    • FS-based:

    Executing action: show - 74.5 ms
    • Cloud-based:

    Executing action: show - 5877.7 ms


  65. GET /namespace/repo/tree/master/builds
    • FS-based:

    Executing action: show - 50.0 ms
    • Cloud-based:

    Executing action: show - 4547.0 ms


  66. Cache


  67. odb hamburger refdb
    • cached via redis
    hi-priority
    lo-priority
    loose OSS store
    packed OSS store
    loose FS cache
    packed FS cache


  68. loose FS cache
    • cache written when

    ntohl(hdr.hdr_entries) < unpack_limit

    in git-unpack-objects
    • when reading via loose OSS store
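
The read path implied here, a local FS cache in front of the remote OSS store, can be sketched with two in-memory stand-ins. The class and function names are hypothetical; the real backends in the talk are libgit2 odb backends written in C.

```python
class DictBackend:
    """In-memory stand-in for an object store (FS cache or OSS store)."""

    def __init__(self, name, objects=None):
        self.name = name
        self.objects = dict(objects or {})

    def read(self, oid):
        return self.objects.get(oid)

    def write(self, oid, data):
        self.objects[oid] = data


def read_object(oid, cache, store):
    """Try the cache first; on a miss, read from the store and fill the cache."""
    data = cache.read(oid)
    if data is not None:
        return data, cache.name
    data = store.read(oid)
    if data is None:
        raise KeyError(oid)
    cache.write(oid, data)  # later reads are served locally
    return data, store.name
```

The first read of an object pays the network round trip; later reads are served from the local cache.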


  69. packed FS cache
    • cache written when

    ntohl(hdr.hdr_entries) >= unpack_limit

    in git-index-pack
    • cache written in git-pack-objects


  70. redis refdb cache
    • cache written on a read that misses the cache
    • cache expired when the refdb is updated

    e.g. git-receive-pack
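
The two cache rules on this slide amount to a read-through cache with invalidation on write. A minimal in-memory sketch follows; plain dicts stand in for both redis and the OSS-backed refdb, and the class name and `store_reads` counter are hypothetical.

```python
class CachedRefdb:
    """Read-through cache in front of a slower reference store."""

    def __init__(self, store: dict):
        self.store = store    # stand-in for the OSS-backed refdb
        self.cache = {}       # stand-in for redis
        self.store_reads = 0  # counts round trips to the slow store

    def read(self, ref: str) -> str:
        if ref in self.cache:
            return self.cache[ref]
        self.store_reads += 1
        value = self.store[ref]
        self.cache[ref] = value  # written on a cache miss
        return value

    def update(self, ref: str, oid: str) -> None:
        self.store[ref] = oid
        self.cache.pop(ref, None)  # expire the entry when the refdb changes
```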


  71. Future Work


  72. • develop libgit2 backends for AWS S3
    • gitlab: favour libgit2, eliminate direct calls to git
    • gitlab: add settings to choose backends
    • gollum: use rugged as the default
    • libgit2: improve performance, e.g. pack builder


  73. https://github.com/pmq20
