TOPICS TO TALK ABOUT
• LINE's Cloud & First Task: Overview of LINE's private cloud, VERDA, and our first task to play with it.
• Our Work: Brief details about our work, including nova features and l2isolate containerization.
• Challenges: Challenges faced as beginners working with OpenStack and a large production cloud.
• Current Project: Concise details about the OpenStack upgrade project we are working on.
WHY CLOUD? WHAT IS CLOUD?
➢ Cloud computing is running and managing workloads within clouds.
➢ Clouds are environments that abstract, pool, and share scalable resources (memory, network, storage, etc.) across the internet.
➢ Before cloud: gather hardware requirements, spend cost and time, and only then work on the idea.
➢ With cloud: instant, cost-efficient, scalable, reliable, and secure.
WHY OPENSTACK?
➢ OpenStack is a cloud operating system that controls large pools of compute, storage, and networking resources, all managed and provisioned through APIs.
➢ What runs on OpenStack? Workloads across many sectors, from industry to energy.
TASK
• Beginners → need for an OpenStack playground.
• Multiple options already present in the market:
  ➢ DevStack:
    • Quick setup of an OpenStack environment
    • Abstracts details from the user
    • Not suitable for testing custom features
• What we do at LINE (Personal Verda):
  • Use Ansible to create personal OpenStack clusters (see the sketch below)
  • An automated script uses the dev OpenStack cluster to create smaller clusters:
    dev cluster hypervisors -> dev cluster VMs -> (configured) personal cluster hypervisors -> personal cluster VMs
• Uses of Personal Verda:
  • Beginner's playground
  • Personal test environment before staging or dev/production
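A minimal sketch of how such a personal cluster might be created with Ansible; the playbook name, inventory path, and variables here are hypothetical, not LINE's actual automation:

  # Hypothetical playbook/inventory names and variables
  $ ansible-playbook -i inventories/personal-verda create-personal-cluster.yml \
      --extra-vars "owner=user123 hypervisor_count=3"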
OUR WORK
A. user-script feature for PM (Physical Machine)
  • LINE already had a user-script feature for VMs:
    - nova-metadata accesses the script from the nova DB
    - cloud-init runs this script on VM boot-up
  • Added the same feature for PMs:
    - in LINE's original nova-baremetal driver
    - using LINE's custom APIs
B. Schedule a VM on a specified aggregate
  • Specify the target aggregate in the OpenStack VM create command (aggregate setup sketched below)
  • Adds a new nova scheduling filter:
    - it filters the hosts in the specified aggregate
    - it matches the aggregate key/value pair based on the hint

A > $ openstack server create --image "CentOS 7.9" --user-data userscript.sh testvm
B > $ openstack server create --image "CentOS 7.9" --availability-zone nova --hint aggregate_id=23 testvm
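How the aggregate might be prepared before using hint B; the aggregate name and property below are illustrative assumptions (only the --hint usage itself comes from the slide):

  $ openstack aggregate create --zone nova my-agg          # returns an aggregate ID, e.g. 23
  $ openstack aggregate set --property pinned=true my-agg  # key/value pair the filter matches
  $ openstack aggregate add host my-agg compute-01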
OUR WORK
C. Containerized neutron agent
  • Multiple agents/services run on the same hypervisor
  • Run them as containerized processes (using docker or podman): avoids any package dependencies or clashes among them
  • In LINE, we containerized our neutron agent (l2isolate agent) using podman
  • Enables us to use different package versions independently (see the sketch below)
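An illustrative podman invocation for such a containerized agent; the image name and mounts are assumptions, not LINE's real setup:

  $ podman run -d --name l2isolate-agent \
      --net host --privileged \
      -v /etc/neutron:/etc/neutron:ro \
      -v /var/log/neutron:/var/log/neutron \
      registry.example.com/neutron/l2isolate-agent:latest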
CHALLENGES
Many environments and regions:
• No. of test environments: 5
• No. of production environments: 4
• No. of development environments: 1
• Multiple openrc (authentication URL) files to authenticate against the keystone regions of each environment
• Need to quickly source different regions' openrc files

Some tricks and solutions (see the sketch below):
• Some shell (bash/zsh) config setup and tricks can help:
  - Change the prompt color and display the env/region name
  - Add the line "source ~/.bashrc" to the end of each region's openrc file
  - Enable shell autocomplete and autosuggestions by adding:
    bind 'set show-all-if-ambiguous on'
    bind 'TAB:menu-complete'
• Alternatively, third-party tools can manage multiple openrc files and provide a user-friendly interface for fast context switching, e.g., rally.
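A sketch of the ~/.bashrc additions described above; the prompt format is an assumption, and OS_REGION_NAME is set by whichever openrc file was last sourced:

  # Show the current region in a colored prompt
  export PS1='\[\e[1;31m\][${OS_REGION_NAME:-no-region}]\[\e[0m\] \u@\h:\w\$ '
  bind 'set show-all-if-ambiguous on'
  bind 'TAB:menu-complete'

  # Each region's openrc ends with "source ~/.bashrc", so switching is just:
  $ source ~/openrc/region1-openrc.sh    # prompt now shows [region1]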
CHALLENGES
Deployment at scale: 10k hypervisors
• It takes a few hours to deploy to all compute nodes (for nova-compute or neutron-agent deployments)
• Use smaller playbooks/handlers instead of running the entire task file in Ansible
  - For example: if you only need to restart the neutron agents, run only the playbook restart-agent.yml
• The hypervisors are divided into host subgroups in the Ansible hosts file
  - This enables us to deploy the same task in parallel to the hypervisor groups (see the sketch below)
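A sketch of both tricks combined; the inventory group names are hypothetical (only restart-agent.yml comes from the slide):

  # hosts file with hypervisor subgroups:
  #   [hypervisors:children]
  #   hv-group-a
  #   hv-group-b
  $ ansible-playbook -i hosts restart-agent.yml --limit hv-group-a --forks 50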
"failed to allocate network" • Get the tap-interface ID ( tapXXXX ) of the VM using the command: • neutron port-list --device-id=<UUID> • First 10 characters of the port id becomes the XXXX in above tapXXXX. • Check tap device on the compute node: ip addr | grep <tapXXXX> CHALLENGES 15 • tap interface failed to create on the compute node: - following checks: 1) check if the neutron-agent is running or not - systemctl status <neutron-agent> 2) check if too many interfaces already present. some hardwares impose upper constraint 3) check if iproute, iptables and ipset packages are installed - yum list | grep <package> ( in RHEL os ) 4) check logs of the neutron-agent and search for ERROR msgs • nova-api failed to receive any vif attachment info from the neutron-server : • Check the messaging driver being used: messages being moved or not from neutron- server to nova-api
CHALLENGES
Error/Issue: IP is assigned but not pingable
1) Check the DHCP service (like dnsmasq):
  - check whether the dnsmasq/dhcp-agent process is running and assigning the IP
  - use tcpdump to check the communication through the tap interface: tcpdump -ni <tap-id>
2) Check whether security group rules allow inbound traffic on the compute node:
  - if not, create a new SG rule for inbound traffic to the VMs: openstack security group rule create <options> <sg-name> (see the sketch below)
• Some useful tools and commands:
  - ip addr | grep <tap-device>
  - route -n
  - iptables -L / iptables -nL
  - cat /etc/sysconfig/iptables
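A sketch for the SG rule and DHCP checks; the group name "default" and the ICMP/SSH choice are illustrative assumptions:

  $ openstack security group rule create --ingress --protocol icmp default
  $ openstack security group rule create --ingress --protocol tcp --dst-port 22 default
  $ tcpdump -ni <tap-id> port 67 or port 68        # watch DHCP requests/replies on the tap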
CURRENT PROJECT: UPGRADE OPENSTACK
Current version: old, reached EOL (2020), compatible with Python 2.7
Target version (Zed): released Oct 2022, maintained, compatible with Python 3

HOW TO ACHIEVE?
➢ Test setup of the Zed version for a comfortable switch
➢ Install and set up an e2e multinode OpenStack environment with the Zed version (one possible approach is sketched below)
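One possible way to stand up that multinode Zed test environment; the deck does not name a deployment tool, so kolla-ansible here is purely an illustrative assumption:

  $ pip install 'kolla-ansible>=15,<16'     # the 15.x series deploys Zed
  $ kolla-ansible install-deps
  # list the target nodes in a multinode inventory and set
  # openstack_release: "zed" in /etc/kolla/globals.yml, then:
  $ kolla-ansible -i multinode bootstrap-servers
  $ kolla-ansible -i multinode prechecks
  $ kolla-ansible -i multinode deploy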
CURRENT PROJECT: UPGRADE OPENSTACK
• Changed/removed options in config:
  - rabbit_hosts, rpc_backend, allow_overlapping_ips, dnsmasq_dns_server, secure_proxy_ssl_header
• Required networking package installations, like:
  - dnsmasq-utils
  - ipset
  - conntrack
• Linux kernel version (compatibility with the network):
  - Some kernel parameters, like nf_hooks_lwtunnel, were only introduced in Linux kernel v5.15 or later
  - The nf_conntrack module is needed on the compute nodes to enable network tunneling (libvirtd loads this module automatically)
A pre-upgrade check sketch follows.
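A sketch of pre-upgrade checks on a compute node, following the points above; the command forms are generic, not LINE's actual tooling:

  $ grep -nE 'rabbit_hosts|rpc_backend|allow_overlapping_ips|dnsmasq_dns_server|secure_proxy_ssl_header' \
        /etc/nova/nova.conf /etc/neutron/neutron.conf
  $ yum install -y dnsmasq-utils ipset conntrack-tools   # conntrack-tools provides conntrack on RHEL
  $ uname -r                                             # kernel >= 5.15 for nf_hooks_lwtunnel
  $ sysctl net.netfilter.nf_hooks_lwtunnel               # present only on newer kernels
  $ lsmod | grep nf_conntrack                            # loaded automatically by libvirtd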