Verda is the private cloud platform for LINE
• This platform hosts many of LINE's services
• With a high level of abstraction of compute and network resources, we are continuously improving to enable LINE developers to use a reliable service infrastructure at a low cost
• 99,000+ Virtual Machines
• 45,000+ Baremetals
• 7,600+ Hypervisors
of Load Balancing
• Layer4 Load Balancer (L4LB)
• Layer7 Load Balancer (L7LB), a.k.a. Reverse Proxy
[Diagram: requests (Req) distributed to servers by each type of load balancer]
ref. Software engineering that supports LINE-original LBaaS https://speakerdeck.com/line_devday2019/software-engineering-that-supports-line-original-lbaas
[Diagram: Developer, Operator, Security team, Certification Authority, Certs Management System, L7LB API]
1. request to purchase
2. purchase
3. sync certificates
4. create LB
• Developers can request the purchase of certificates and create an HTTPS L7LB after that
• Security team audits certificates on Certs Management System
👎 This is the only flow, even when DV certificates would be sufficient for developers.
👎 LB Admin has to perform the following procedure manually to update existing L7LBs.
• LB Admin edits the config of each LB
• LB Admin notifies the users (developers)
• Users deploy the new config
• or LB Admin force-deploys it, one LB at a time
[Diagram: LB Admin, L7LB API, Developers, LB for Developer A, LB for Developer B, LB for Developer C; 1. create config, 2. notify, 3. deploy config / 3. force deploy]
Verda Kubernetes Service (VKS)
on the user Project as VM Instance.
• Based on Rancher
• And some custom features (Etcd-backup, cluster upgrade, …)
• Supported cloud-controller-manager for Verda
• Users can use “type LoadBalancer Service” and so on
• Possible to pre-install some applications as “add-on”
[Diagram: Verda ccm, L4LB, Node Port, Workers, Pod for Ingress, App Pod; label: create]
👎 Users need to prepare L7LB in the user cluster
👎 Users need to manage certificates for TLS
L7LB does not exist.
• Develop Ingress Controller
Certificate activation/renewal flow is painful.
• Develop a mechanism to activate/renew certificates automatically
Verda L7LB is not L7-aware.
• Extend the L7LB API & DB schema
System & L7LB certificates API with the user role must be prohibited
• Even though we have to register certificates to them
• We do not want to make users manage certificates
• Because unnecessary downloads can cause leaks
github.com/cert-manager/cert-manager
(on each user cluster) and server (on shared environment)
• Controllers are responsible for reconciling external services to the desired state
• Server is responsible for
  • requesting certificate issuance (DNS-01 challenge on ACME)
  • registering to Certs Management System
  • registering to L7LB certificate API (Admin API)
• Don’t place certificates on the user clusters
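The reconciliation responsibility described above can be sketched as a pure function that compares desired and actual state. This is only an illustration under stated assumptions: `reconcile` and the dictionaries standing in for Kubernetes objects and the external service are hypothetical, not Verda's code.

```python
from typing import Dict, List, Tuple


def reconcile(desired: Dict[str, str], actual: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return the actions needed to make `actual` match `desired`.

    Keys are resource names and values are opaque specs. Both dicts are
    hypothetical stand-ins for cluster objects and an external service.
    """
    actions: List[Tuple[str, str]] = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))   # missing externally
        elif actual[name] != spec:
            actions.append(("update", name))   # drifted from the desired spec
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))   # no longer desired
    return actions
```

Running the function again after the actions have been applied should return an empty list, which is what makes a reconciliation loop safe to repeat.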
Sequence for creating resources
3. RecordSet controller creates an A record with a dummy address on Verda DNS in the Developer’s Project
• example.com. IN A 192.0.2.1
[Diagram: Developer, Project (Workers, Controller, App), Ingress, ManagedCertificate, RecordSet, Verda DNS; labels: watch, create A record]
4. ManagedCertificate controller requests CertificateServer to get a certificate
[Diagram: Certificate Server on the C-plane side; labels: watch, get example.com]
5. Authentication & Authorization of requests
• authenticate: request to Identity API
• authorize: check the existence of A records corresponding to the requested FQDN
[Diagram labels: get example.com, get A record]
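The authorization rule in step 5 can be sketched as a lookup of the requested FQDN among the A records of the caller's Project. The function names and the list-of-names input are assumptions made for illustration. The actual check is performed against Verda DNS.

```python
from typing import Iterable


def normalize_fqdn(name: str) -> str:
    """Lower-case the name and make sure it ends with exactly one dot."""
    return name.strip().lower().rstrip(".") + "."


def is_authorized(requested_fqdn: str, project_a_records: Iterable[str]) -> bool:
    """Allow the request only if the Project owns an A record for the FQDN.

    `project_a_records` is a hypothetical list of record names already
    fetched from the DNS service for the authenticated Project.
    """
    wanted = normalize_fqdn(requested_fqdn)
    return any(normalize_fqdn(record) == wanted for record in project_a_records)
```

This is why the dummy A record is created first in step 3: it gives the server a way to tie a certificate request to a Project before the load balancer exists.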
6. Check the existence of the certificate (go to step 11 if a certificate exists and hasn’t been revoked)
[Diagram: Certs Management System; labels: get example.com, get, get (User API)]
7. ManagedCertificate controller requests CertificateServer to issue a certificate (CertificateServer does authn/authz too)
[Diagram labels: watch, request challenge (async API), get A record]
8. ACME DNS-01 Challenge
• Get token from Let’s Encrypt
• POST TXT record to Verda DNS
• Get the certificate issued by Let’s Encrypt
• _acme-challenge.example.com. IN TXT <snip>
[Diagram labels: create TXT record, verify]
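For reference, the value of the TXT record in a DNS-01 challenge is defined by RFC 8555: the base64url-encoded (unpadded) SHA-256 digest of the key authorization, which is the token joined with the account key thumbprint by a dot. The sketch below computes that value and the record name. The token and thumbprint passed in are placeholders, and the code is not taken from the Certificate Server.

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """base64url encoding without trailing '=' padding, as ACME requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_txt_value(token: str, account_key_thumbprint: str) -> str:
    """TXT record content for a DNS-01 challenge (RFC 8555, section 8.4)."""
    key_authorization = f"{token}.{account_key_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return b64url(digest)


def challenge_record_name(fqdn: str) -> str:
    """Name of the TXT record that the ACME server validates."""
    return "_acme-challenge." + fqdn.rstrip(".") + "."
```

Because only DNS is involved, the challenge can be completed from the shared environment without any listener on the user cluster.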
9. Register to some services
• Certs Management System
• L7LB certificates API
[Diagram: Certs Management System; labels: register, register (Admin API)]
10. ManagedCertificate controller periodically checks whether a certificate exists. When it detects that the challenge has completed, it updates the status of the ManagedCertificate object
[Diagram: Certs Management System; labels: get, update status]
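The periodic check in step 10 can be sketched as a bounded polling loop. The status strings `"Ready"` and `"Pending"`, the function name, and the injected `cert_exists` callable are assumptions for illustration. A real controller would typically requeue the object rather than block.

```python
import time
from typing import Callable


def wait_for_certificate(
    cert_exists: Callable[[], bool],
    interval_s: float,
    max_checks: int,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the certificate exists or the check budget is used up.

    Returns a hypothetical status value to write to the
    ManagedCertificate object.
    """
    for attempt in range(max_checks):
        if cert_exists():
            return "Ready"
        if attempt < max_checks - 1:
            sleep(interval_s)  # wait before the next check
    return "Pending"
```

Injecting `sleep` keeps the loop testable without real delays.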
11. Ingress controller detects the completion of the certificate challenge from the status of the ManagedCertificate object. Then the Ingress controller creates the L7LB.
[Diagram labels: watch, watch, create L7LB (User API)]
13. RecordSet controller detects the status update of the Ingress object and updates the A record on Verda DNS. End users are then able to access the App.
• example.com. IN A 198.51.100.1
[Diagram labels: watch, Address: 198.51.100.1, update A record; End users to https://example.com/]
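The behavior of the RecordSet controller across steps 3 and 13 can be summarized as: publish a dummy address until the Ingress reports a load balancer address, then publish that address. The sketch below uses the addresses shown on the slides. The function name and the zone-file style output are assumptions for illustration.

```python
from typing import Optional

# Dummy address shown on the slides (a documentation-range address).
DUMMY_ADDRESS = "192.0.2.1"


def desired_a_record(fqdn: str, ingress_address: Optional[str]) -> str:
    """Return the A record that should exist for the given Ingress state."""
    address = ingress_address or DUMMY_ADDRESS
    return f"{fqdn.rstrip('.')}. IN A {address}"
```

For example, `desired_a_record("example.com", None)` gives the record from step 3, and passing `"198.51.100.1"` gives the record from step 13.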
Server as shared API from controllers
• Grant administrator role to only Certificate Server
• Possible to manage rate-limit for Let’s Encrypt in one place
• Don’t place certificates on the user clusters
• Users never have to manage certificates manually
• Separate controllers by each responsibility
  • For testability: since integration tests using envtest are performed for each controller, separating the controllers by responsibility simplifies testing.
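Managing the rate limit in one place means a single component decides whether another issuance request may be sent. A minimal sliding-window limiter is sketched below. The class name and the limits used in the test are illustrative assumptions and do not reflect Let’s Encrypt's actual quotas or the Certificate Server's implementation.

```python
from collections import deque
from typing import Deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` within any window of `window_s` seconds."""

    def __init__(self, max_requests: int, window_s: float) -> None:
        self.max_requests = max_requests
        self.window_s = window_s
        self._stamps: Deque[float] = deque()

    def allow(self, now: float) -> bool:
        # Forget requests that have fallen out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) < self.max_requests:
            self._stamps.append(now)
            return True
        return False
```

If each user cluster talked to the CA directly, no single component would see the total request count, which is the point of routing issuance through the shared server.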
[Timeline: Oct., Nov., Apr.; 2022 Start Project; achieved by 2 Project members]
• Milestones: I joined, Shimizu-san joined, Release to Verda Dev
• Phases: Development, Release Preparation, QA
• Development items: Build VT, extend L7LB, task force, task force, remaining implementation, Offline Meeting
• task force: to focus on development tasks during 1 sprint
• Build VT: build an environment where a set of Verda runs as a VM on one machine
• Release Preparation: deployment preparation, monitoring preparation, etc.
tasks
• Minimize interruptions
  • Task force members should attend only the weekly meeting
• How the task force proceeds
  • Define the MVP to be completed in 1 sprint (2 weeks) in advance
  • Report the progress at the mid-sprint weekly meeting
  • Make course corrections based on comments
  • Demonstrate the MVP at the end-of-sprint weekly meeting
Implement “Ingress controller” to create HTTP LB by requesting L7LB API
• Implement “Certificate Server” to retrieve certificates and register them to Certs Management System
• Implement “ManagedCertificate controller” for UI on Certificate Server
• 2nd task force
• Implement all controllers that do everything from retrieving certificates to creating HTTPS LBs
👍 Continuous increments allowed a detailed review of the system architecture each time
to prepare the Verda Component which a development target depends on.
• Mocks aren’t enough to confirm behavior.
• A shared staging environment can be used by only 1 person at a time.
• Virtual Testbed: Build all Verda services as VMs on 1 physical machine
• Possible to use as a private development environment.
• Possible to build by executing scripts (around 1 hour)
• Anyone can build one at any time
we needed to build VT for the following reasons
• Ingress Controller requests L7LB API
• We must make L7LB allow L7-aware routing
👍 Ready for development in about two weeks by using the VT
shared environment.
• Where to communicate with User VM and Admin APIs
• How to deploy Ingress controller to user cluster?
» VKS supports an add-on feature realized by “vks-addon-controller”.
vks-addon-controller
based on addon-controller’s Reconciliation Loop
• Possible to update addons by VKS Team
• addon-controller generates manifests per environment (e.g. K8s version)
• Base-manifests are written in jsonnet
[Diagram: addon-controller (C-plane side), base-manifests, manifests, and addon A, addon B, addon C, … in User Projects]
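Generating manifests per environment can be pictured as a function from a base manifest and a Kubernetes version to a concrete manifest. The sketch below is in Python rather than jsonnet and is purely illustrative. The function names are hypothetical. The one fact it relies on is that the `networking.k8s.io/v1` Ingress API is available from Kubernetes 1.19.

```python
from typing import Tuple


def parse_minor(version: str) -> Tuple[int, int]:
    """Turn 'v1.25.3' or '1.25' into (1, 25)."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def render_manifest(base: dict, k8s_version: str) -> dict:
    """Return a copy of `base` with an apiVersion suited to the cluster."""
    manifest = dict(base)  # shallow copy so the base manifest stays untouched
    if parse_minor(k8s_version) >= (1, 19):
        manifest["apiVersion"] = "networking.k8s.io/v1"
    else:
        manifest["apiVersion"] = "networking.k8s.io/v1beta1"
    return manifest
```

Keeping the base manifest unchanged and deriving per-environment output from it mirrors the base-manifests/manifests split named on the slide.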
Support Validation for applied manifests
• CEL became Beta on K8s v1.25
• Support to use Secret on User Cluster as certificates
• Support GatewayAPI
• Support Pod-native Load Balancing
[Labels: For user experience, For monitoring, For specific service requirements]
Future works: Pod-native Load Balancing
Requirements on specific services
packets are DNAT-ed by proxy (e.g. iptables) on Worker and forwarded to Pod on some Worker.
• Especially, ad-technology is strict on latency
• Need to be balanced to Pods directly by L7LB
• cf. GKE - Container-native load balancing
[Diagram: User Projects; App, Node Port, Workers; labels: to Worker address, to Pod address by DNAT]
What’s needed to resolve
Develop CNI Plugin if needed
• Add feature on Ingress Controller
• Watch EndpointSlice to register Pod addresses to L7LB
• Update Pod status for ReadinessGate
• Prepare the approval workflow
• Because Pod addresses are limited
[Diagram: User Projects; App, Workers; labels: to Pod address directly, BGP Speaker, advertise to DC network]
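Watching EndpointSlice to register Pod addresses amounts to comparing the ready endpoint addresses with the members currently registered on the load balancer. The sketch below assumes an EndpointSlice represented as a plain dict with the standard `endpoints[].addresses` and `endpoints[].conditions.ready` fields. The function names and the member list are hypothetical.

```python
from typing import Iterable, List, Tuple


def ready_addresses(endpoint_slice: dict) -> List[str]:
    """Collect addresses of endpoints whose `ready` condition is true."""
    addresses: List[str] = []
    for endpoint in endpoint_slice.get("endpoints", []):
        if endpoint.get("conditions", {}).get("ready"):
            addresses.extend(endpoint.get("addresses", []))
    return addresses


def plan_member_changes(
    ready_pod_ips: Iterable[str], registered_ips: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (addresses to add, addresses to remove) for the load balancer."""
    ready, registered = set(ready_pod_ips), set(registered_ips)
    return sorted(ready - registered), sorted(registered - ready)
```

Only the difference is sent to the load balancer, so repeated watch events for an unchanged EndpointSlice produce no API calls.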
Automated to activate/renew certificates
• Supported L7-aware load balancing
• Released in about 5 months
• Some future works
• Let us know if you’re interested!