…to communicate
3. Difficult to scale provisioning
   a. For 2,000+ engineers

Diagram: Dev Team 1 / Dev Team 2 → "Give us a server" → Infrastructure Department → Buy / Register / Setup → "Provide a server" (takes ~3 months).

Problems/Concerns:
1. Need to ask the Infrastructure Department for everything
   - Infrastructure changes
   - More servers
2. Need to predict demand (difficult)
   - How many servers
   - When the servers are needed
3. Tend to keep an unnecessary number of servers
Improved:
1. Get infrastructure resources without human interaction
   - Automated resource allocation
   - Automated resource deallocation
   - No need to predict demand
   - No unnecessary resources
2. No communication cost
3. No provisioning cost
4. Optimized resource usage
But:
1. Needs software development

Diagram: Dev Team 1 → "Give us a server" → Private Cloud (communicate by API) → "Provide a server" in just a few seconds. The Infrastructure Department maintains the Private Cloud.
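The "communicate by API, get a server in seconds" flow above can be sketched as a toy allocation service. Everything here (class name, flavor strings) is illustrative, not LINE's actual API; in a real OpenStack cloud the equivalent call would go through the Nova API, e.g. via openstacksdk's `conn.compute.create_server(...)`.

```python
import uuid

class PrivateCloudAPI:
    """Toy stand-in for the private cloud's API endpoint (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity   # physical pool the infra team maintains
        self.servers = {}

    def allocate(self, flavor):
        """Automated allocation: returns in seconds, no ticket, no human."""
        if len(self.servers) >= self.capacity:
            raise RuntimeError("pool exhausted")
        server_id = str(uuid.uuid4())
        self.servers[server_id] = flavor
        return server_id

    def deallocate(self, server_id):
        """Automated deallocation: capacity goes straight back to the pool."""
        del self.servers[server_id]

cloud = PrivateCloudAPI(capacity=100)
sid = cloud.allocate("4cpu-8gb")   # "Give us a server" -> just a few sec
cloud.deallocate(sid)              # no unnecessary servers left running
```

Because allocation and deallocation are both API calls, dev teams no longer need to predict demand months in advance; they acquire and release capacity on the spot.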
Diagram: Datacenter rack network — Core → Aggregation → ToR switches, racks of hypervisors, plus OpenStack API and OpenStack database nodes.

• Knowledge of networking
  ◦ Design/plan the whole DC network
• Knowledge of operating a large product
  ◦ Build operation tools that are not specific to one piece of software
  ◦ Consider user support
• Knowledge of server kitting
  ◦ Communicate with the procurement department
• Knowledge of OpenStack software
  ◦ Design the OpenStack deployment
  ◦ Deploy OpenStack
  ◦ Customize OpenStack
  ◦ Troubleshooting
    ▪ OpenStack components
    ▪ Related software
What we have done:
1. Legacy system integration
2. Bring a new network architecture into OpenStack networking
3. Maintain customizations to OSS while keeping up with upstream

What we will do:
1. Scale emulation environment
2. Visualize/tune internal communication
3. Containerize OpenStack
4. Event Hub as a platform
We have many company-wide systems, each for its own purpose:
- CMDB: register spec, OS, location, …
- IPDB: register IP address and hostname
- Monitoring System: register the server as a monitoring target
- Server Login Authority Management: register which users may log in to the server

Diagram: Dev asks Infra for a new server; Infra sets it up and registers it in each system.
In the private cloud, "Server Creation" completes without any Infrastructure Department intervention. Thus the private cloud itself should register the new server.

Diagram: Dev creates a new server → Private Cloud Configuration Management registers it to CMDB, IPDB, the Monitoring System, and Server Login Authority Management.
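The fan-out that configuration management performs after "Create new server" can be sketched as a single event handler. The four registries below are in-memory stand-ins for the real company-wide systems, and every field name is hypothetical; a real implementation would call each system's own API.

```python
# In-memory stand-ins for CMDB, IPDB, monitoring, and login authority.
cmdb, ipdb, monitoring, login_auth = {}, {}, set(), {}

def on_server_created(server):
    """Private cloud reacts to the creation event; no human in the loop."""
    cmdb[server["name"]] = {            # spec, OS, location -> CMDB
        "spec": server["spec"],
        "os": server["os"],
        "location": server["location"],
    }
    ipdb[server["name"]] = server["ip"]            # IP/hostname -> IPDB
    monitoring.add(server["name"])                 # becomes a monitoring target
    login_auth[server["name"]] = server["owners"]  # who may log in

on_server_created({
    "name": "app01", "spec": "4cpu-8gb", "os": "ubuntu-22.04",
    "location": "dc1-rack12", "ip": "10.0.0.5", "owners": ["dev-team-1"],
})
```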
The OSS implementation (neutron-server, neutron-dhcp-agent, neutron-linuxbridge-agent, neutron-metadata-agent) expects VMs to share an L2 network. We want all VMs not to share an L2 network, so we replaced neutron-linuxbridge-agent with a new neutron-custom-agent.
We customize many OpenStack components: VM (Nova), Image Store (Glance), Network Controller (Neutron), Identity (Keystone), DNS Controller (Designate).

Previously we forked a specific upstream version into a LINE version and just customized it again and again (customize commit for A, for B, for C, …). It is difficult for us to take a specific patch back out of such a customized OpenStack.
New approach:
• Don't fork / stop forking
• Maintain only the patch files (patch for A, patch for B, patch for C) plus the base commit ID in git, and apply them on top of the specific upstream version
=> Much easier to take a single patch out than before
Day 1 (so far)
• Develop user-facing features
  ◦ Keep the same experience as before (legacy system)
  ◦ Support the new architecture
• Daily operation
  ◦ Predictable
  ◦ Unpredictable (trouble-driven)

Day 2 (from now)
• Enhance operation
• Optimize development
• Reduce daily operation
  ◦ Predictable
  ◦ Unpredictable
Future Challenge 1: Scale Emulation Environment

Reproduce load from the control plane's point of view without preparing the same number of hypervisors:
• Database access
• RPC over RabbitMQ
These are control-plane-specific loads, so we can use this environment for tuning the OpenStack control plane.
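One common upstream way to emulate many hypervisors on few machines is Nova's fake virt driver: an emulated compute node still performs the real control-plane work (database access, RPC over RabbitMQ) but never boots actual VMs. A minimal sketch of the relevant setting, assuming an otherwise standard nova.conf:

```ini
# nova.conf on an emulated compute node: the fake driver generates genuine
# control-plane traffic without consuming hypervisor resources, so one box
# can stand in for many hypervisors.
[DEFAULT]
compute_driver = fake.FakeDriver
```

Running many such fake compute services against one control plane reproduces the scale of API, database, and RabbitMQ load that tuning work needs.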
OpenStack is a set of microservices; communication between each piece of software works two ways:
• RESTful API (between components, e.g. Authentication (Keystone) ↔ VM (Nova) ↔ Network (Neutron))
• RPC over a messaging bus (inside a component, e.g. neutron-server ↔ neutron-agent)
Any of this communication can break at any time:
- because of scale
- because of improper configuration
Errors sometimes propagate from one component to another.

1. This kind of issue is very difficult to troubleshoot because
   - errors propagate from one component to another
   - logs do not always carry enough information
   - logs only appear after something has already happened
2. Sometimes the problem can be predicted from metrics
   - how many RPCs were received
   - how many RPCs are waiting for a reply
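The two metrics above can be collected by wrapping RPC dispatch with counters. This is a self-contained sketch with a hypothetical dispatcher; a real deployment would hook the same counters into the oslo.messaging layer and export them to a monitoring system.

```python
class RpcMetrics:
    """Counts RPCs received and RPCs still waiting for a reply."""

    def __init__(self):
        self.received = 0    # total RPCs handled so far
        self.in_flight = 0   # RPCs currently waiting for a reply

    def dispatch(self, handler, *args):
        """Wrap an RPC handler call, updating both counters."""
        self.received += 1
        self.in_flight += 1
        try:
            return handler(*args)
        finally:
            self.in_flight -= 1   # reply sent (or handler failed)

metrics = RpcMetrics()
result = metrics.dispatch(lambda x: x * 2, 21)
# A persistently high in_flight value predicts trouble before users see it.
```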
• Overhead of packaging tools like RPM
  ◦ Dependencies between packages
  ◦ Configuration for new files
  => We need to build an RPM every time we change the code
• Impossible to run different versions of OpenStack on the same server
  ◦ OpenStack components depend on shared common libraries
  => We actually deployed many more control plane servers than we need
• Lack of observability for all the software running on the control plane
  ◦ In deployment scripts (Ansible, Chef, …) there is no way to tell which part installs dependent libraries and which part installs our software
  ◦ Deployment scripts do not track the software after it is deployed
  ◦ We cannot notice when a developer runs a temporary script
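Containerizing addresses each pain point above: the image pins its own dependencies (so two OpenStack versions coexist on one host), rebuilding on a code change is just an image build, and the dependency layer is clearly separated from our software layer. A hypothetical sketch, with all paths and the launch command illustrative:

```dockerfile
# Hypothetical per-service image; not an actual LINE Dockerfile.
FROM python:3.10-slim
COPY requirements.txt .
RUN pip install -r requirements.txt   # dependency layer, cached separately
COPY our-service/ /opt/our-service/
RUN pip install /opt/our-service/     # our software layer, clearly separated
CMD ["python", "-m", "our_service"]   # the only process the container runs
```

Because the container runs exactly one declared process, anything else running on the host (e.g. a developer's temporary script) is immediately visible.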
…Image Store (Glance), Network Controller (Neutron), Identity (Keystone), DNS Controller (Designate), Load Balancer (L4LB, L7LB), Kubernetes (Rancher), Storage (Block Storage and Object Storage on Ceph), Database (RDBMS (MySQL), KVS (Redis)), Search/Analytics Engine (Elasticsearch), Messaging (Kafka), Function (Knative), Baremetal, Operation Tools.

These components depend on each other: some component or operation script wants to do something
• when a user (actually a project) in Keystone is deleted
• when a VM is created
• when a real server is added to a load balancer
Each component publishes the important events of its own domain to a messaging bus (RabbitMQ); other components subscribe only to the events they are interested in.
• The network component, for example, can do something when an interesting event happens
• A component does not have to know which other components it needs to work with
This mechanism allows us to extend the private cloud (microservices) in the future without changing existing code.
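The decoupling described above can be shown with a minimal in-process event hub. All names here are illustrative (the real bus is RabbitMQ): the publisher never learns who consumes its events, and subscribers register only for the event types they care about.

```python
from collections import defaultdict

class EventHub:
    """Minimal publish/subscribe bus (toy model of the messaging bus)."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        """Register interest in one event type only."""
        self.subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        """Fan out to whoever is interested; publisher doesn't know them."""
        for handler in self.subscribers[event_type]:
            handler(payload)

hub = EventHub()
cleaned_up = []
# The network component reacts to Keystone's event, without Keystone
# knowing the network component exists:
hub.subscribe("keystone.project.deleted",
              lambda p: cleaned_up.append(p["project_id"]))
hub.publish("keystone.project.deleted", {"project_id": "p-123"})
```

Adding a new consumer is one more `subscribe` call; no existing code changes, which is exactly the extensibility property the slide claims.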
This notification logic is already implemented in OpenStack, but...

Diagram: Authentication Component (Keystone) and VM Component (Nova) publish events to the messaging bus (RabbitMQ); Operation Script A, Operation Script B, L7LB, and Kubernetes subscribe. Every subscriber contains both "logic for accessing RabbitMQ" and its business logic.

• Sometimes the RabbitMQ access logic grows bigger than the actual business logic
• Every component/script has to implement that access logic first
So we are developing a new component that lets us register a program together with the events it is interested in. This will make it much easier to cooperate with other components: publishers (Keystone, Nova) still publish events to RabbitMQ, but subscribers (operation scripts, L7LB, Kubernetes) write only business logic; the new Function-as-a-Service component owns all the RabbitMQ access logic.
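The developer experience of that new component can be sketched as a registration decorator: users attach business logic to an event type, and the platform side (which would own the actual RabbitMQ consumption) delivers matching events. The decorator, event name, and `deliver` helper are all hypothetical.

```python
registry = {}

def on_event(event_type):
    """Register a function to run whenever event_type is published."""
    def wrap(fn):
        registry.setdefault(event_type, []).append(fn)
        return fn
    return wrap

# User side: pure business logic, zero RabbitMQ code.
@on_event("nova.instance.create.end")
def register_to_cmdb(event):
    return f"registered {event['instance_id']} to CMDB"

# Platform side (hidden from users): consume from the bus, then dispatch.
def deliver(event_type, event):
    return [fn(event) for fn in registry.get(event_type, [])]

results = deliver("nova.instance.create.end", {"instance_id": "vm-42"})
```

Compared with the previous slide, the "logic for accessing RabbitMQ" now lives in exactly one place, so it can never outgrow any individual script's business logic.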