Slide 1

Slide 1 text

LINE’s Journey; Road to 4 Million Cores in the Private Cloud
#ossummit
Masahito Muroi / Senior SWE, Manager / LINE
Mitsuhiro Tanino / Senior SWE / LINE

Slide 2

Slide 2 text

About us
Masahito Muroi
• Senior Software Engineer / Manager
• Working for LINE for over 3 years
Mitsuhiro Tanino
• Senior Software Engineer
• Working for LINE for over 3 years

Slide 3

Slide 3 text

Verda Private Cloud
• Verda is the private cloud platform for LINE
Diagram labels: NAT, Load Balancer, VM / Baremetal, DNS, App engine (like heroku), Image Repo, Shared Filesystem, Elasticsearch, Container, And More…

Slide 4

Slide 4 text

Verda & LINE infra scale (as of Sep. 2022)
• Virtual Machine: 100,000+
• Baremetal server: 46,000+
• Hypervisor: 7,600+
• All Physical Servers: 70,000+
• Peak of User Traffic: 3Tbps+

Slide 5

Slide 5 text

Service stack in Verda
• Layers: PaaS, Managed Service, IaaS
• Services shown: Identity, VM/Baremetal, Network, Image, DNS, Block Storage, Object Storage, Shared FS, Kubernetes, MySQL, Redis, Function, Elasticsearch, Kafka, CI/CD PIPE, Load Balancer, NAT

Slide 6

Slide 6 text

Start up (2016-2019) Expansion (2019-2021) New infra (2022-)

Slide 7

Slide 7 text

Start up period
• LINE infra problems
  – Infra provisioning took too long
    • ~2 weeks to provision 1 VM
• What Verda solved
  – Opened the private cloud to LINE developers with a minimum API set
    • VM, Baremetal, Block Storage
Before Verda (App developers and Infra team)
1. Apply infra request workflow (WF)
2. Consult the request details
3. Set up the infra
4. Serve the configured infra (VMs, storage)
5. Start to set up apps
After Verda
App developers
1. Create a resource by API/GUI
2. Automatic provisioning (VMs, storage)
3. Start to set up apps
Infra team
a. Bulk resource management (hypervisors)
b. Administer the Verda cloud

Slide 8

Slide 8 text

Start up period
• OpenStack challenges
  – Start serving the common OpenStack API to the developers ASAP
  – Focused on opening the API to the developers
    • Minimum API set from OpenStack
    • Developed many LINE-original APIs and components
      – Baremetal API
      – API filters
• Culture changes
  – Verda changed infra resources from a facility into an on-demand, API-manageable resource.
    • App team view: less communication with the Infra team to convey infra demands
    • Infra team view: bulk facility management

Slide 9

Slide 9 text

Expansion period
• Problems
  – LINE developers had to install the common middleware set by themselves.
    • Infra preparation: 2 weeks → 10 mins
    • Middleware preparation: no change
• What Verda solved
  – Opened managed middleware service APIs
    • Kubernetes, MySQL, Redis, etc.
Diagram (DB on self-managed VMs; actors: App developers, DBA)
1. Create infra resources by API/GUI
2. Automatic provisioning (VMs, storage)
3. Request DB setup
4. Install DB middleware
5. Start monitoring and DB administration
6. Serve the DB
Diagram (managed DB service; actors: App developers, DBA)
1. Create DB resource by API/GUI
2. Automatic DB resource provisioning (VMs, MySQL cluster)
a. Start monitoring and DB administration

Slide 10

Slide 10 text

Expansion period
• OpenStack challenges
  – Opening managed services triggered rapid growth of the OpenStack scale.
    • 1,400 hypervisors to 6,000 hypervisors in 2 years
    • The rapid growth required changes to the OpenStack deployment topology and tooling
  – Some open source OpenStack API plugins were not mature enough for a large-scale cluster.
    • Kubernetes cinder csi plugin
    • Ansible Keystone user management plugin
• Culture changes
  – LINE developers can focus on developing service applications.

Slide 11

Slide 11 text

New infra period
• Problems
  – Infra management skill depended heavily on each development team’s Verda knowledge
    • Standard infra management tools could not use some Verda APIs
    • Some teams could apply sophisticated infra management tools to Verda
    • Others relied on traditional manual operation
Diagram labels: App developers; Resource Provisioning tools; Baremetal API; Verda (VMs, MySQL cluster, Kubernetes cluster, PMs)
1. Develop modules for the Verda-original API set in the tool
2. Develop application infra information

Slide 12

Slide 12 text

New infra period
• What Verda solved
  – Straighten the API stack in Verda to follow the de facto standard API set
Diagram labels: App developers; Resource Provisioning tools; OpenStack standard API; libvirt driver; baremetal driver; Verda (VMs, MySQL cluster, Kubernetes cluster, PMs)
1. Develop application infra information

Slide 13

Slide 13 text

New infra period
• OpenStack challenges
  – Revisit the OpenStack API philosophy
    • A unified API to manage multiple types of backend resources
    • Renovated some API implementations
• Culture change
  – Resolve tool silos across the application development teams

Slide 14

Slide 14 text

• Start up (2016-2019)
  – Verda realized: change of infra communication style
  – OpenStack challenge: open the IaaS API
• Expansion (2019-2021)
  – Verda realized: change of middleware management style
  – OpenStack challenge: support 500% rapid growth in 3 years
• New infra (2022-)
  – Verda realized: change of team knowledge gap
  – OpenStack challenge: straighten the API stack

Slide 15

Slide 15 text

Lessons learned from the 7-year journey
• Culture change made drastic improvements
• Technical bottlenecks depend on the infra scale
• The open source ecosystem is powerful

Slide 16

Slide 16 text

Introduction of Baremetal server management in Verda

Slide 17

Slide 17 text

Background
• Verda provides VMs and Baremetal servers for application developers to host their applications
  – VM: we supported an OpenStack-based IaaS management system
  – Baremetal: we supported an in-house server management system
Diagram labels: App developers; Resource Provisioning tools; OpenStack standard API; libvirt driver; Baremetal API; baremetal management; Verda

Slide 18

Slide 18 text

Background
• However, because we provided two different management systems:
  – Developers needed to understand two completely different types of API to automate VM and Baremetal operations
  – Verda operators always needed to develop the same functionality for both OpenStack and the in-house management system. This increased our operation cost.
• We started a new project in 2020 to improve baremetal server management.

Slide 19

Slide 19 text

Requirements for baremetal server management
• We had multiple requirements for the developers and Verda operators
• For application developers
  – To provide unified APIs for multiple resources
  – To provide the same functionalities for both VM and Baremetal server
  – To provide a private stock management system

Slide 20

Slide 20 text

Requirements for baremetal server management
• For Verda operators
  – To reduce development, maintenance, and management cost as much as possible
  – To re-use existing, proven hardware-layer management systems
    • We already had hardware management systems for IPMI operation and OS installation, distributed across multiple data centers in multiple regions

Slide 21

Slide 21 text

What was done to achieve the requirements
• Developed a Nova compute driver for baremetal server management
  – We decided to develop our own Nova compute driver rather than use OpenStack Ironic
• Implemented baremetal server stock management mechanisms for Nova
• Provided a feature to distribute baremetal servers for HA purposes
• Prepared a CI/CD pipeline to deploy nova-compute services

Slide 22

Slide 22 text

Introduction to baremetal compute driver
• What is the baremetal driver
• Architecture
• Deep dive into features

Slide 23

Slide 23 text

What is baremetal compute driver
• An OpenStack Nova compute driver developed by LINE
• The driver communicates with the physical server management system to build baremetal servers
• Basic operations such as create, delete, and rebuild are supported
• Allows Verda users to create a new baremetal server from their pre-assigned stock
Diagram labels: App developers; Resource Provisioning tools; OpenStack standard API; libvirt driver; baremetal driver; Verda (VMs, MySQL cluster, Kubernetes cluster, PMs)

Slide 24

Slide 24 text

New architecture
Summary of baremetal instance creation flow
1. A user requests a new baremetal instance via the dashboard
2. Nova API receives the request
3. Nova-scheduler picks a nova-compute on which to launch the instance
4. Baremetal nova-compute asks IPMI management to PXE-boot the baremetal server
5. Nova-compute creates an OS install task and waits until it completes
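Steps 4 and 5 of the flow above amount to "trigger a PXE boot, start an OS install task, then poll until it finishes". The following Python sketch illustrates that loop only. The `ipmi` and `installer` objects, their method names (`request_pxe_boot`, `create_task`, `get_state`), and the state strings are hypothetical stand-ins for LINE's in-house systems, which the slides do not describe; this is not the actual driver code.

```python
import time


class InstallTimeout(Exception):
    """Raised when the OS install task does not finish in time."""


def spawn_baremetal(host, image, ipmi, installer,
                    poll_interval=5.0, timeout=1800.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Boot `host` over PXE and wait for the OS install task to complete.

    `ipmi` and `installer` are hypothetical clients (assumptions, not a
    real API): ipmi.request_pxe_boot(host), installer.create_task(host,
    image) -> task id, installer.get_state(task_id) -> "running" /
    "done" / "failed".
    """
    # Step 4: ask the IPMI management system to PXE-boot the server.
    ipmi.request_pxe_boot(host)

    # Step 5: create the OS install task, then poll until it completes.
    task_id = installer.create_task(host, image)
    deadline = clock() + timeout
    while True:
        state = installer.get_state(task_id)
        if state == "done":
            return task_id
        if state == "failed":
            raise RuntimeError(f"OS install failed on {host}")
        if clock() >= deadline:
            raise InstallTimeout(f"OS install timed out on {host}")
        sleep(poll_interval)
```

Passing `sleep` and `clock` as parameters is a design choice for this sketch: it lets the loop be exercised without real waiting.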

Slide 25

Slide 25 text

Verda Dashboard - VM / Baremetal server management
• "Instance" management view for developers

Slide 26

Slide 26 text

Deep dive into features
• Stock management
• HA group support
• Deployment procedure of baremetal driver

Slide 27

Slide 27 text

Stock management
• Stock management with host aggregates
  – Public and private
  – Stocks are registered to Nova’s host aggregates

Slide 28

Slide 28 text

Stock management
• Private Stock example
$ openstack aggregate show TEST_N3.small.metal.uuid12345
+-------------------+--------------------------------------------------------------------------------------+
| Field             | Value                                                                                |
+-------------------+--------------------------------------------------------------------------------------+
| availability_zone | None                                                                                 |
| created_at        | 2022-04-11T01:55:52.000000                                                           |
| deleted           | False                                                                                |
| deleted_at        | None                                                                                 |
| hosts             | server_zzzz1, server_zzzz2, server_zzzz3, server_zzzz4, server_zzzz5, server_zzzz6 … |
| id                | 317                                                                                  |
| name              | TEST_N3.small.metal.uuid12345                                                        |
| properties        | TEST_N3.small.metal='true', baremetal='true', filter_flavor_id='1234567'             |
| updated_at        | None                                                                                 |
+-------------------+--------------------------------------------------------------------------------------+
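To make the role of the aggregate properties concrete, here is a small Python sketch that parses a `properties` value like the one above and selects the hosts belonging to a flavor. It assumes that `filter_flavor_id` ties a private stock to one flavor and that `baremetal='true'` marks baremetal stocks; the slides show these keys but do not define their semantics, and both function names are hypothetical.

```python
import re
from typing import Dict, List


def parse_properties(text: str) -> Dict[str, str]:
    """Parse "key='value', key='value'" as printed by the CLI into a dict."""
    # Slide exports sometimes turn straight quotes into typographic ones.
    text = text.replace("\u2018", "'").replace("\u2019", "'")
    return dict(re.findall(r"([\w.\-]+)='([^']*)'", text))


def hosts_for_flavor(aggregates: List[dict], flavor_id: str) -> List[str]:
    """Collect hosts from baremetal aggregates whose filter_flavor_id matches.

    Assumption: each aggregate is a dict with "hosts" (list of host names)
    and "properties" (dict as returned by parse_properties).
    """
    hosts: List[str] = []
    for agg in aggregates:
        props = agg["properties"]
        if (props.get("baremetal") == "true"
                and props.get("filter_flavor_id") == flavor_id):
            hosts.extend(agg["hosts"])
    return hosts


props = parse_properties(
    "TEST_N3.small.metal='true', baremetal='true', filter_flavor_id='1234567'"
)
stock = [{
    "name": "TEST_N3.small.metal.uuid12345",
    "hosts": ["server_zzzz1", "server_zzzz2"],
    "properties": props,
}]
print(hosts_for_flavor(stock, "1234567"))
```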

Slide 29

Slide 29 text

Stock management
• Workflow system for private stock management
• Developers create a workflow (WF) request, then a Verda operator checks whether it is suitable

Slide 30

Slide 30 text

Deep dive into features
• Stock management
• HA group support
• Deployment procedure of baremetal driver

Slide 31

Slide 31 text

HA Group support with failure domain
• Production VMs and Baremetal servers require high availability based on location
• Verda supports
  – Multi-Regions
  – Multi-Availability Zones
  – Server rack level failure domain
Diagram: Region1, Region2; AZ1, AZ2, AZ3; Rack1, Rack2, Rack3 under each AZ. Labels: "Available via HA Group", "Request from stock WF"
https://logmi.jp/tech/articles/327491

Slide 32

Slide 32 text

HA Group support with failure domain
• HA Group
  – Allows users to deploy baremetal servers with HA based on failure domains
  – Supported policies
    Policy | Summary of the policy
    Hard   | Multiple servers must be distributed to multiple failure domains
    Soft   | Multiple servers will be distributed to multiple failure domains as much as possible
    None   | Skip HA group
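The three policies can be illustrated with a short Python sketch that picks servers from a stock grouped by rack. This is an interpretation under stated assumptions, not Verda's scheduler: "hard" is read as "every server in a different rack, otherwise fail", "soft" as "spread as evenly as the stock allows", and the function name and data layout are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def place_servers(count: int, free_by_rack: Dict[str, List[str]],
                  policy: str) -> List[str]:
    """Pick `count` free servers according to an HA Group policy."""
    if policy not in {"hard", "soft", "none"}:
        raise ValueError(f"unknown policy: {policy}")

    # Work on a copy so the caller's stock is not modified.
    pools = {rack: list(hosts) for rack, hosts in free_by_rack.items() if hosts}
    if sum(len(hosts) for hosts in pools.values()) < count:
        raise RuntimeError("not enough servers in stock")

    if policy == "none":
        # No HA constraint: take servers in rack order.
        flat = [h for rack in sorted(pools) for h in pools[rack]]
        return flat[:count]

    if policy == "hard" and len(pools) < count:
        # Assumed meaning of "hard": one failure domain per server.
        raise RuntimeError("not enough failure domains for hard policy")

    chosen: List[str] = []
    used: Counter = Counter()
    while len(chosen) < count:
        # Prefer the rack that has received the fewest servers so far.
        rack = min((r for r in pools if pools[r]), key=lambda r: (used[r], r))
        chosen.append(pools[rack].pop(0))
        used[rack] += 1
    return chosen
```

With a stock of `{"rack1": ["s1", "s2"], "rack2": ["s3"], "rack3": ["s4"]}`, a "hard" request for three servers takes one from each rack, while a "hard" request for four fails because only three racks exist; the same request under "soft" succeeds and places the fourth server in a rack that is already used.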

Slide 33

Slide 33 text

HA Group support with failure domain
• Verda users can select an HA Group policy based on the requirements of their services

Slide 34

Slide 34 text

HA Group support with failure domain
• With the HA Group “Hard” policy, 5 baremetal servers were distributed to different failure domains (= server racks)

Slide 35

Slide 35 text

Deep dive into features
• Stock management
• HA group support
• Deployment procedure of baremetal driver

Slide 36

Slide 36 text

Deployment procedure
Deployment flow
1. An operator registers new server information in git
2. Argo CD watches the repository and syncs the new change to Verda Kubernetes
3. During the deployment job, the Ansible k8s module deploys the ConfigMap and Deployment
4. The Ansible OpenStack module registers the servers to the host aggregate via nova-api
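The last step, registering servers to host aggregates, can be thought of as reconciling the desired state in git against what Nova currently has. The Python sketch below computes only that difference. The data layout (aggregate name mapped to a set of host names) and the function name are assumptions for illustration; the slides do not show the repository format or the Ansible playbooks.

```python
from typing import Dict, List, Set


def plan_registration(desired: Dict[str, Set[str]],
                      current: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return, per aggregate, the hosts that still need to be registered.

    `desired` is the state declared in git, `current` the hosts already in
    each host aggregate (both hypothetical inputs for this sketch).
    """
    plan: Dict[str, List[str]] = {}
    for aggregate, hosts in desired.items():
        missing = sorted(hosts - current.get(aggregate, set()))
        if missing:
            plan[aggregate] = missing
    return plan


desired = {"stock_a": {"server1", "server2", "server3"}, "stock_b": {"server9"}}
current = {"stock_a": {"server1"}, "stock_b": {"server9"}}
for aggregate, hosts in plan_registration(desired, current).items():
    for host in hosts:
        print(f"register {host} to {aggregate}")
```

Each planned pair corresponds to one registration call against nova-api, for example `openstack aggregate add host <aggregate> <host>` when done by hand. Computing the difference first keeps repeated deployment runs idempotent.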
