Slide 1

Slide 1 text

Going Planet-Scale with Cloud Native Technologies: A GitLab Story
Abubakar Siddiq Ango, GitLab, @sarki247
Lagos

Slide 2

Slide 2 text

About Me
- Based in Bauchi, Nigeria
- Support Engineering at GitLab BV
- Executive Director, Uplift Nigeria (uplift.ng)
- Lead Organizer, GDG Bauchi & Google Cloud Developer Community Bauchi
- CTO, GladePay.com

Slide 3

Slide 3 text

In 2015
- 20,000+ users
- 100,000+ hosted repositories
- 2 servers: 1 active, 1 backup
- Server model: HP DL180 G6 (reconditioned; this model was introduced in 2009)
- Processors: 2x Intel Xeon X5690 (12 cores / 24 threads in total)
- 32 GB RAM
- 12x 2 TB HDDs (2 for the root volume in RAID 1, 10 for storage in RAID 10, ext4 filesystem)

Slide 4

Slide 4 text

In 2016, on Azure...
Running GitLab.com as an application:
- 5 HAProxy load balancers handling GitLab.com HTTP, HTTPS, and SSH
- 2 HAProxy load balancers handling "alternative SSH" (altssh.GitLab.com), forwarding port 443 to SSH on port 22
- 2 HAProxy load balancers handling https://pages.gitlab.io HTTP and HTTPS
- 20 workers running the GitLab EE application stack (NGINX, Workhorse, Unicorn + Rails, Redis + Sidekiq)
- 2 NFS servers for storage
- 2 Redis servers
- 2 PostgreSQL servers
- 3 Elasticsearch servers
- 6 of Azure's "Availability Sets": 3 for load balancers, 1 for Redis HA, 1 for PostgreSQL HA, and 1 for Elasticsearch HA
- 3 servers for GitLab Runners in autoscale mode
See: https://about.gitlab.com/2016/04/29/look-into-gitlab-infrastructure/
With build hosts for Shared Runners, between 60 and 200 servers are running at any given time.
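The "alternative SSH" load balancers above let users behind restrictive firewalls reach Git over SSH on port 443. A hypothetical HAProxy sketch of that idea (hostnames and addresses are illustrative, not GitLab's actual configuration): a TCP-mode frontend accepts connections on 443 and passes them straight through to sshd on port 22 of the workers.

```
# Illustrative altssh frontend: plain TCP passthrough, 443 -> 22
frontend altssh
    bind :443
    mode tcp
    default_backend ssh_workers

backend ssh_workers
    mode tcp
    balance roundrobin
    server worker1 10.0.0.11:22 check
    server worker2 10.0.0.12:22 check
```

Because the frontend runs in TCP mode, HAProxy never inspects the traffic as TLS; it simply relays the SSH protocol on a port that most firewalls leave open.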

Slide 5

Slide 5 text

In 2016
With over 2,000 new repos being created during peak hours, and CI runners requesting new builds 3,000,000 times per hour, we built a CephFS cluster to tackle both the capacity and performance issues of using NFS appliances.
https://about.gitlab.com/2016/09/26/infrastructure-update/

Slide 6

Slide 6 text

Late 2016: we almost went bare metal...
Facing latency issues with CephFS, we considered going bare metal:
https://about.gitlab.com/2016/11/10/why-choose-bare-metal/
We came up with a server purchase proposal:
https://about.gitlab.com/2016/12/11/proposed-server-purchase-for-gitlab-com/
And shared it on YC Hacker News:
https://news.ycombinator.com/item?id=13153031
64 nodes with 1 TB of memory each (using 128 GB DIMMs) and 20 Gbps of bandwidth, providing 1.4 PB of raw storage; at a replication factor of 3, that is about 480 TB of usable storage with CephFS.

Slide 7

Slide 7 text

...then we took a step back after listening to the community:
https://about.gitlab.com/2017/03/02/why-we-are-not-leaving-the-cloud/
We decided to do 2 things:
● We spread all our storage across multiple NFS shards and dropped CephFS from our stack.
● We created Gitaly so that we could stop relying on NFS for horizontal scaling and speed up Git access through caching.

Slide 8

Slide 8 text

“We want to scale intelligently and build great software; we don’t want to be an infrastructure company. We are embracing and are excited about solving the challenge of scaling GitLab.com on the cloud, because solving it for us also solved it for the largest enterprises in the world using GitLab on premise.”

Slide 9

Slide 9 text

In 2018
We decided to move from Azure to GCP, mainly to improve the performance and reliability of GitLab.com, and because we believe Kubernetes is the future.
https://about.gitlab.com/2018/06/25/moving-to-gcp/

Slide 10

Slide 10 text

GitLab.com on GCP
https://about.gitlab.com/handbook/engineering/infrastructure/production-architecture/

Slide 11

Slide 11 text

...but we need more

Slide 12

Slide 12 text

https://docs.gitlab.com/ee/development/architecture.html

Slide 13

Slide 13 text

Cloud Native
“Cloud native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure, and declarative APIs exemplify this approach.”
(CNCF Cloud Native Definition)

Slide 14

Slide 14 text

Why Cloud Native
● Right-sized Capacity
● Speed
● Reliability
● Collaboration
● Continuous Delivery
● Automatic Scalability
● Rapid Recovery

Slide 15

Slide 15 text

Key components of Cloud Native
● Microservices
● CI/CD Toolset
● Containers, e.g. Docker, containerd
● Orchestrators, e.g. Kubernetes
● Service Meshes, e.g. Istio
● And others

Slide 16

Slide 16 text

Microservices
Structuring an application as a collection of loosely coupled services that implement business capabilities.

Slide 17

Slide 17 text

Continuous Integration / Deployment
Test, build, and deploy your microservices.
● GitLab CI
● CircleCI
● Jenkins
● TeamCity
● And so on.
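As a concrete illustration of the test/build/deploy flow above, here is a minimal, hypothetical `.gitlab-ci.yml` sketch (the image names, registry URL, and deployment name are placeholders, not from the talk):

```yaml
# Illustrative pipeline: one job per stage.
stages:
  - test
  - build
  - deploy

test:
  stage: test
  image: golang:1.21          # placeholder language image
  script:
    - go test ./...

build:
  stage: build
  script:
    - docker build -t registry.example.com/app:$CI_COMMIT_SHA .
    - docker push registry.example.com/app:$CI_COMMIT_SHA

deploy:
  stage: deploy
  script:
    - kubectl set image deployment/app app=registry.example.com/app:$CI_COMMIT_SHA
```

Each push triggers the pipeline; jobs in a stage run in parallel, and a stage starts only after the previous one succeeds.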

Slide 18

Slide 18 text

Containers
A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another. Containers enable microservices.
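To make "code and all its dependencies" concrete, a hypothetical multi-stage Dockerfile sketch for a small Go service (paths and names are illustrative):

```dockerfile
# Build stage: compile a static binary with all dependencies.
FROM golang:1.21 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app ./cmd/server

# Final stage: ship only the binary, nothing else.
FROM scratch
COPY --from=build /app /app
ENTRYPOINT ["/app"]
```

The resulting image carries everything the service needs, so it runs identically on a laptop, a CI runner, or a production cluster.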

Slide 19

Slide 19 text

Orchestrators A system for automating deployment, scaling, and management of containerized applications. E.g. Kubernetes.
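The "automating deployment, scaling, and management" above is expressed declaratively in Kubernetes. A minimal, hypothetical Deployment sketch (names and image are placeholders): you state the desired number of replicas, and the orchestrator keeps reality matching that state.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 3                 # Kubernetes keeps 3 pods running at all times
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: app
          image: registry.example.com/app:latest   # placeholder image
          ports:
            - containerPort: 8080
```

If a node dies and a pod with it, the controller notices the replica count dropped below 3 and schedules a replacement automatically.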

Slide 20

Slide 20 text

Service Mesh A service mesh is a configurable infrastructure layer for a microservices application. It makes communication between service instances flexible, reliable, and fast. The mesh provides service discovery, load balancing, encryption, authentication and authorization, support for the circuit breaker pattern, and other capabilities. https://www.nginx.com/blog/what-is-a-service-mesh/
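As one example of the traffic-management capabilities listed above, a hypothetical Istio VirtualService sketch (service name and subsets are placeholders) that splits traffic between two versions of a service, a common canary-release pattern:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: app
spec:
  hosts:
    - app
  http:
    - route:
        - destination:
            host: app
            subset: v1
          weight: 90          # 90% of traffic to the stable version
        - destination:
            host: app
            subset: v2
          weight: 10          # 10% to the canary
```

The mesh's sidecar proxies enforce this split without any change to the application code.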

Slide 21

Slide 21 text

At GitLab See: https://gitlab.com/charts/gitlab/blob/master/doc/architecture/architecture.md

Slide 22

Slide 22 text

Q&A

Slide 23

Slide 23 text

Thank you!
Slides at http://bit.ly/devfestlagos18-cloudnative
Abubakar Siddiq Ango, GitLab, @sarki247
Lagos