platform for LINE
• NAT, Load Balancer, VM / Baremetal, DNS, App engine (like Heroku), Image Repo, Shared Filesystem, Elasticsearch, Container, and more…
provisioning took too long
• ~2 weeks to provision 1 VM
• Verda's solution
– Opened the private cloud to LINE developers with a minimum API set
• VM, Baremetal, Block Storage
Start up (2016-2019) Expansion (2019-2021) New infra (2022-)
Figure: Before Verda vs. After Verda
– Before Verda (App developers, Infra team): 1. Apply infra request WF 2. Consult the request details 3. Set up the infra (VM, VM, Storage) 4. Serve the configured infra 5. Start to set up apps
– After Verda: App developers 1. Create a resource by API/GUI 2. Automatic provisioning 3. Start to set up apps; Infra team (HV): a. Bulk resource management b. Administrate the Verda cloud
serve common OpenStack APIs to the developers ASAP
– Focused on opening APIs to the developers
• Minimum API set from OpenStack
• Developed many LINE original APIs and components
– Baremetal API
– API filters
• Culture changes
– Verda changed infra resources from a facility into on-demand, API-manageable resources.
• App team view: less communication with the Infra team to convey infra demands
• Infra team view: bulk facility management
install a common middleware set by themselves.
• Infra preparation: 2 weeks → 10 mins
• Middleware preparation: no change
• Verda's solution
– Opened managed middleware service APIs
• Kubernetes, MySQL, Redis, etc.
Figure: DB provisioning flow
– Before managed middleware: App developers 1. Create infra resources by API/GUI (VM, VM, Storage) 2. Automatic provisioning 3. Request DB setup from the DBA; DBA 4. Install DB middleware 5. Start monitoring and DB administration 6. Serve the DB
– After managed middleware: App developers 1. Create a DB resource by API/GUI 2. Automatic DB resource provisioning (MySQL cluster on VMs); DBA a. Start monitoring and DB administration
triggered rapid growth in OpenStack scale.
• 1,400 hypervisors to 6,000 hypervisors in 2 years
• The rapid growth required changes to the OpenStack deployment topology and tooling
– Some open-source OpenStack API plugins were not mature enough for a large-scale cluster:
• Kubernetes Cinder CSI plugin
• Ansible Keystone user management plugin
• Culture changes
– LINE developers can focus on developing service applications.
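As a rough sense of what this growth means per year, assuming steady compound growth (an assumption; the slide only gives the start and end counts):

```latex
\frac{6000}{1400} \approx 4.29, \qquad \sqrt{4.29} \approx 2.07
```

That is, the hypervisor count grew about 4.3x over 2 years, roughly doubling every year.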
really rely on the development team's Verda knowledge
• Standard infra management tools can't use some Verda APIs
• Some teams can use sophisticated infra management tools with Verda
• Others rely on traditional manual operation.
Figure: App developers' resource provisioning tools manage MySQL clusters and Kubernetes clusters (VMs) and physical machines (PM) on Verda, partly through the Baremetal API. Steps: 1. Develop modules for Verda's original API set in the tool 2. Develop application infra information
stack in Verda to follow the de facto standard API set
Figure: App developers' resource provisioning tools call the OpenStack standard API, backed by a libvirt driver and a baremetal driver, to manage MySQL clusters and Kubernetes clusters (VMs) and physical machines (PM). Step: 1. Develop application infra information
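The idea of one standard API backed by interchangeable drivers can be sketched as follows. This is a minimal illustration, not Nova's actual driver interface: `ServerRequest`, `ComputeDriver`, `spawn`, and routing on a `bm.` flavor prefix are hypothetical. In real Nova the scheduler sends the request to a nova-compute host, and each host runs a single driver; the prefix check here stands in for that step.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class ServerRequest:
    name: str
    flavor: str  # hypothetical convention: flavors starting with "bm." mean baremetal


class ComputeDriver(ABC):
    """Hypothetical, heavily simplified driver interface."""

    @abstractmethod
    def spawn(self, req: ServerRequest) -> str:
        """Build the server and return an identifier for it."""


class LibvirtDriver(ComputeDriver):
    def spawn(self, req: ServerRequest) -> str:
        # A real driver would define and start a libvirt domain here.
        return f"vm:{req.name}"


class BaremetalDriver(ComputeDriver):
    def spawn(self, req: ServerRequest) -> str:
        # A real driver would call the physical server management system here.
        return f"pm:{req.name}"


def create_server(req: ServerRequest) -> str:
    """One entry point for callers; the backend is chosen behind the API."""
    driver: ComputeDriver = (
        BaremetalDriver() if req.flavor.startswith("bm.") else LibvirtDriver()
    )
    return driver.spawn(req)


print(create_server(ServerRequest("web1", "m1.small")))  # vm:web1
print(create_server(ServerRequest("db1", "bm.large")))   # pm:db1
```

Because callers only see `create_server`, a provisioning tool written against the standard API works unchanged for both VMs and baremetal servers.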
OpenStack API philosophy
• Unified API to manage several types of backend resources
• Renovated some API implementations
• Culture change
– Solved tool silos in the application development teams
realized
• Change the infra communication style
• Change the middleware management style
• Close team knowledge gaps
OpenStack challenges
• Open the IaaS API
• Support 500% rapid growth in 3 years
• Straighten the API stack
application developers to host their applications
– VM: we supported the OpenStack-based IaaS management system
– Baremetal: we supported an in-house server management system
Figure: App developers' resource provisioning tools use the OpenStack standard API (libvirt driver) for VMs and a separate Baremetal API (in-house baremetal management) for physical servers.
systems,
– Developers needed to understand two completely different types of API to automate VM and Baremetal operations
– Verda operators always had to develop the same functionality for both OpenStack and the in-house management system, which increased our operational cost.
• In 2020, we started a new project to improve baremetal server management.
requirements for the developers and Verda operators
• For application developers
– To provide unified APIs for multiple resources
– To provide the same functionality for both VMs and Baremetal servers
– To provide a private stock management system
– To reduce development, maintenance, and management costs as much as possible
– To reuse existing, robust hardware-layer management systems
• We already had hardware management systems for IPMI operations and OS installation, deployed across multiple data centers in multiple regions
Nova compute driver for baremetal server management
– We decided to develop our own Nova compute driver rather than use OpenStack Ironic
• Implemented baremetal server stock management mechanisms for Nova
• Provided a feature to distribute baremetal servers for HA purposes
• Prepared a CI/CD pipeline to deploy nova-compute services
driver developed by LINE
• The driver communicates with the physical server management system to build baremetal servers
• Basic operations such as create, delete, and rebuild are supported
• Allows Verda users to create a new baremetal server from their pre-assigned stock
Figure: App developers' resource provisioning tools call the OpenStack standard API, backed by the libvirt driver and the baremetal driver, to manage MySQL clusters and Kubernetes clusters (VMs) and physical machines (PM).
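A per-project stock of pre-assigned baremetal servers could be modeled roughly as below. This is a hypothetical sketch: `StockPool`, its fields, and the first-in-first-out allocation order are assumptions for illustration, not the data model of LINE's driver.

```python
from dataclasses import dataclass, field


@dataclass
class StockPool:
    """Hypothetical per-project stock of pre-assigned baremetal servers."""

    # project name -> IDs of that project's unused physical servers
    available: dict[str, list[str]] = field(default_factory=dict)
    # instance name -> (project, server ID) currently backing it
    in_use: dict[str, tuple[str, str]] = field(default_factory=dict)

    def allocate(self, project: str, instance: str) -> str:
        """Take the oldest unused server from the project's own stock."""
        servers = self.available.get(project, [])
        if not servers:
            raise LookupError(f"no baremetal stock left for project {project!r}")
        server_id = servers.pop(0)
        self.in_use[instance] = (project, server_id)
        return server_id

    def release(self, instance: str) -> None:
        """On delete, return the server to the stock of the project that owns it."""
        project, server_id = self.in_use.pop(instance)
        self.available.setdefault(project, []).append(server_id)


pool = StockPool(available={"app-a": ["pm-01", "pm-02"]})
print(pool.allocate("app-a", "web1"))  # pm-01
pool.release("web1")
print(pool.available["app-a"])  # ['pm-02', 'pm-01']
```

Keeping stock per project means one team's create requests can never consume servers reserved for another team.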
baremetal instance via dashboard
2. Nova API receives the request
3. Nova-scheduler picks a nova-compute to launch the instance
4. Baremetal nova-compute asks the IPMI management system to PXE-boot the baremetal server
5. Nova-compute creates an OS install task and waits until it completes
Summary of baremetal instance creation flow
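Steps 4 and 5 can be sketched as follows, with fake clients standing in for the in-house IPMI management and OS installation systems. All class and method names (`request_pxe_boot`, `create_install_task`, `task_status`) and the timeout and polling values are hypothetical; the step order simply follows the flow above.

```python
import time
from typing import Callable


class FakeIPMIManager:
    """Stand-in for the IPMI management system (hypothetical API)."""

    def __init__(self) -> None:
        self.pxe_requests: list[str] = []

    def request_pxe_boot(self, server_id: str) -> None:
        self.pxe_requests.append(server_id)


class FakeInstaller:
    """Stand-in for the OS installation system; reports "done" on the Nth poll."""

    def __init__(self, polls_until_done: int) -> None:
        self.remaining = polls_until_done

    def create_install_task(self, server_id: str, image: str) -> str:
        return f"install-{server_id}-{image}"

    def task_status(self, task_id: str) -> str:
        self.remaining -= 1
        return "done" if self.remaining <= 0 else "running"


def spawn_baremetal(
    server_id: str,
    image: str,
    ipmi: FakeIPMIManager,
    installer: FakeInstaller,
    timeout: float = 1800.0,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    # Step 4: ask the IPMI management system to PXE-boot the server.
    ipmi.request_pxe_boot(server_id)
    # Step 5: create an OS install task, then poll until it finishes or times out.
    task_id = installer.create_install_task(server_id, image)
    deadline = time.monotonic() + timeout
    while True:
        status = installer.task_status(task_id)
        if status == "done":
            return task_id
        if status == "failed":  # a real installer could report failure
            raise RuntimeError(f"OS install task {task_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"OS install task {task_id} did not finish in {timeout}s")
        sleep(interval)


ipmi = FakeIPMIManager()
print(spawn_baremetal("pm-01", "ubuntu", ipmi, FakeInstaller(3), sleep=lambda s: None))
# install-pm-01-ubuntu
```

Injecting `sleep` keeps the polling loop testable without real waits; the deadline uses `time.monotonic()` so wall-clock adjustments cannot cut the wait short or extend it.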
and Baremetal servers require high availability based on location
• Verda supports
– Multiple regions
– Multiple availability zones
– Server rack-level failure domains
Figure: failure-domain hierarchy of regions (Region1, Region2), availability zones (AZ1, AZ2, AZ3), and racks (Rack1, Rack2, Rack3); labels: "Available via HA Group", "Request from stock WF"
https://logmi.jp/tech/articles/327491
– Allow users to deploy baremetal servers with HA based on failure domains
– Supported policies:
• Hard: Multiple servers must be distributed across multiple failure domains
• Soft: Multiple servers will be distributed across multiple failure domains as much as possible
• None: Skip HA group
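One way the three policies could behave when picking servers from stock is sketched below: a greedy choice that always takes a server from the least-used failure domain. The `Server` type, treating a rack as the failure domain, and the greedy strategy are assumptions for illustration; the slides do not describe Verda's actual placement algorithm.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    server_id: str
    domain: str  # failure domain, e.g. a rack (could equally be an AZ or region)


def place(stock: list[Server], count: int, policy: str) -> list[Server]:
    """Pick `count` servers from stock under a "hard", "soft", or "none" HA policy."""
    if policy not in ("hard", "soft", "none"):
        raise ValueError(f"unknown policy: {policy!r}")
    if count > len(stock):
        raise ValueError("not enough servers in stock")
    if policy == "none":
        return stock[:count]  # no HA group: ignore failure domains
    chosen: list[Server] = []
    used: Counter = Counter()  # failure domain -> servers already chosen there
    remaining = list(stock)
    for _ in range(count):
        # Greedy step: take a server from the least-used failure domain.
        best = min(remaining, key=lambda s: used[s.domain])
        if policy == "hard" and used[best.domain] > 0:
            # Every remaining domain is already used: "hard" refuses to share one.
            raise ValueError("hard policy: not enough distinct failure domains")
        chosen.append(best)
        used[best.domain] += 1
        remaining.remove(best)
    return chosen


stock = [Server("a1", "rack1"), Server("a2", "rack1"), Server("b1", "rack2"), Server("c1", "rack3")]
print([s.server_id for s in place(stock, 3, "hard")])  # ['a1', 'b1', 'c1']
print([s.server_id for s in place(stock, 4, "soft")])  # ['a1', 'b1', 'c1', 'a2']
```

With four servers requested from three racks, "hard" fails while "soft" accepts one shared rack, which matches the difference between the two policies described above.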
git
2. Argo CD watches the repository and syncs new changes to Verda Kubernetes
3. During the deployment job, the Ansible k8s module deploys the ConfigMap and Deployment
4. The Ansible OpenStack module registers servers to a host aggregate via nova-api
Deployment flow
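Ansible modules are typically declarative: they are told the desired state and only make the changes needed, so re-running a deployment job is safe. For step 4 that idea can be sketched as a set difference between the hosts that should be in the aggregate and the hosts already in it. `plan_aggregate_changes`, the host names, and the choice to also remove unlisted hosts are hypothetical; the real module talks to nova-api rather than returning a plan.

```python
def plan_aggregate_changes(desired: set[str], current: set[str]) -> tuple[list[str], list[str]]:
    """Return (hosts_to_add, hosts_to_remove), sorted for stable output.

    Assumption: hosts in the aggregate but not in `desired` should be removed.
    """
    to_add = sorted(desired - current)
    to_remove = sorted(current - desired)
    return to_add, to_remove


desired = {"bm-compute-01", "bm-compute-02"}
current = {"bm-compute-01", "old-compute-09"}
print(plan_aggregate_changes(desired, current))  # (['bm-compute-02'], ['old-compute-09'])
print(plan_aggregate_changes(desired, desired))  # ([], []): a re-run changes nothing
```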