
Introduction of Private Cloud in LINE

LINE Developers
February 15, 2019


Transcript

  1. Introduction of
    Private Cloud in LINE
    Yuki Nishiwaki


  2. Agenda
    1. Introduction/Background of Private Cloud
    2. OpenStack in LINE
    3. Challenges of OpenStack


  3. Who are we?
    Responsibility
    - Develop/maintain common/fundamental functions for the Private Cloud (IaaS)
    - Consider optimization of the whole Private Cloud
    (Diagram labels: Network, Service Operation, Platform, Storage)
    Software
    - IaaS (OpenStack + α)
    - Kubernetes
    Knowledge
    - Software
    - Network, Virtualization, Linux


  4. Before Private Cloud
    Problems/Concerns:
    1. Many manual procedures
    2. Communication cost
    3. Difficult to scale provisioning
    a. for 2000+ engineers
    Problems/Concerns:
    1. Need to ask the Infrastructure department
    - to change infrastructure
    - for more servers
    2. Need to predict (difficult)
    - how many servers are needed
    - when they are needed
    3. Tend to keep an unnecessary number of servers
    (Diagram: Dev Teams ask the Infrastructure department "Give us a server"; buying,
    registering, and setting up the server takes about 3 months before "Provide a server".)


  5. After Private Cloud
    Improved
    1. Automated many operations
    2. No communication cost
    3. No provisioning cost
    4. Optimized resource usage
    But
    1. Requires software development
    Improved
    1. Get infrastructure resources without human interaction
    - Capability of
    - automated resource allocation
    - automated resource deallocation
    - No need for prediction
    - No need for unnecessary resources
    (Diagram: a Dev Team asks the Private Cloud "Give us a server" via API and gets
    "Provide a server" in just a few seconds; the Infrastructure department maintains
    the Private Cloud.)


  6. Private Cloud
    OpenStack: VM (Nova), Image Store (Glance), Network Controller (Neutron),
    Identity (Keystone), DNS Controller (Designate)
    Loadbalancer: L4LB, L7LB
    Kubernetes (Rancher)
    Storage: Block Storage (Ceph), Object Storage (Ceph)
    Database: Search/Analytics Engine (Elasticsearch), RDBMS (MySQL), KVS (Redis)
    Messaging (Kafka)
    Function (Knative)
    Baremetal
    Operation Tools
    (Diagram layers: Platform, Service, Network, Storage, Operation)


  7. Today’s Topic
    (Same component diagram as the previous slide; today's topic is the OpenStack part.)


  8. OpenStack
    ● Open source software for building a private cloud (similar to AWS)
    ● Microservice architecture
    ○ Use only the components you need
    ○ Scale out only the components that need it


  9. Microservice Architecture of OpenStack


  10. Assemble your own cloud: Private Cloud


  11. OpenStack in LINE
    Introduced: 2016
    Version: Mitaka + customization
    Number of clusters: 4
    Number of hypervisors: 1100+
    ● Dev Cluster: 400
    ● Prod Cluster: 600 (region 1)
    ● Prod Cluster: 76 (region 2)
    ● Prod Cluster: 80 (region 3)
    Number of VMs: 26,000+
    ● Dev Cluster: 15,503
    ● Prod Cluster: 8,870 (region 1)
    ● Prod Cluster: 335 (region 2)
    ● Prod Cluster: 229 (region 3)


  12. Difficulty of building OpenStack Cloud
    (Diagram: datacenter network of Core, Aggregation, and ToR switches, with racks of
    hypervisors and OpenStack API/database servers.)
    ● Knowledge of networking
    ○ Design/plan the whole DC network
    ● Knowledge of operating a large product
    ○ Build operation tools that are not tied to a specific software
    ○ Consider user support
    ● Knowledge of server kitting
    ○ Communicate with the procurement department
    ● Knowledge of the OpenStack software
    ○ Design the OpenStack deployment
    ○ Deploy OpenStack
    ○ Customize OpenStack
    ○ Troubleshooting
    ■ OpenStack components
    ■ Related software


  13. Building OpenStack is not completed by one team
    (Three teams: Network, Operation, Platform)
    ● Maintain
    ○ Golden VM image
    ○ Elasticsearch for logging
    ○ Prometheus for alerting
    ● Develop operation tools
    ● User support
    ● Buy new servers
    ● Design/planning
    ○ DC network
    ○ Inter-DC network
    ● Implement network orchestrator
    (outside OpenStack)
    ● Design the OpenStack deployment
    ● Deploy OpenStack
    ● Customize OpenStack
    ● Troubleshooting
    Members: 3+, 4+, 4+


  14. Challenges of OpenStack
    Basically, we are trying to make OpenStack (IaaS) stable.
    What we have done
    1. Legacy System Integration
    2. Bring New Network Architecture into OpenStack Network
    3. Maintain Customizations for OSS while keeping up with upstream
    What we will do
    1. Scale Emulation Environment
    2. Internal Communication Visualization/Tuning
    3. Containerize OpenStack
    4. Event Hub as a Platform


  15. Challenges of OpenStack
    Basically, we are trying to make OpenStack (IaaS) stable.
    What we have done
    1. Legacy System Integration
    2. Bring New Network Architecture into OpenStack Network
    3. Maintain Customizations for OSS while keeping up with upstream
    What we will do
    1. Scale Emulation Environment
    2. Internal Communication Visualization/Tuning
    3. Containerize OpenStack
    4. Event Hub as a Platform


  16. Configuration Management
    Challenge 1: Integration with Legacy System
    Even before the cloud, we had many company-wide systems for various purposes.
    (Diagram: when a Dev asks for a new server, Infra sets it up and registers it in
    the legacy systems:
    - CMDB: register spec, OS, location, ...
    - IPDB: register IP address and hostname
    - Monitoring System: register the server as a monitoring target
    - Server Login Authority Management: register the users allowed to log in)


  17. Challenge 1: Integration with Legacy System
    After the private cloud, "server creation" completes without the Infrastructure
    department's involvement. Thus the Private Cloud itself has to register new servers.
    (Diagram: a Dev asks the Private Cloud to create a new server; the cloud's
    Configuration Management component registers it in CMDB, IPDB, the Monitoring
    System, and Server Login Authority Management.)
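
    A minimal sketch of what this automatic registration could look like, assuming
    hypothetical REST endpoints for the legacy systems (the talk does not show the
    real internal APIs):

    # Hypothetical sketch: register a newly created server in the legacy systems.
    # Endpoint URLs and payload fields are assumptions, not LINE's real internal APIs.
    import requests

    LEGACY = {
        "cmdb":       "https://cmdb.example.test/api/servers",
        "ipdb":       "https://ipdb.example.test/api/records",
        "monitoring": "https://monitoring.example.test/api/targets",
        "login":      "https://login-auth.example.test/api/acl",
    }

    def register_new_server(hostname, ip, spec, owners):
        """Called by the private cloud right after a server (VM) is created."""
        requests.post(LEGACY["cmdb"], json={"hostname": hostname, **spec}, timeout=10).raise_for_status()
        requests.post(LEGACY["ipdb"], json={"hostname": hostname, "ip": ip}, timeout=10).raise_for_status()
        requests.post(LEGACY["monitoring"], json={"target": hostname}, timeout=10).raise_for_status()
        requests.post(LEGACY["login"], json={"hostname": hostname, "users": owners}, timeout=10).raise_for_status()

    if __name__ == "__main__":
        register_new_server("vm-0001", "10.0.0.10", {"os": "CentOS 7", "vcpus": 4}, ["dev-team-1"])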


  18. Challenge 2: New Network Architecture in our DC
    For scalability and operability, we introduced a Clos network architecture and
    terminate L3 on the hypervisor.
    (Diagram: previous vs. new network topology)


  19. Challenge 2: Support new architecture in OpenStack
    Network Controller (Neutron)
    OSS implementation: neutron-server, neutron-dhcp-agent, neutron-linuxbridge-agent,
    neutron-metadata-agent
    The OSS agents expect VMs to share an L2 network, but we want VMs not to share an
    L2 network.
    => Replace with a new neutron-custom-agent


  20. Challenge 3: Improve Customization for OSS
    ● We have customized many OpenStack components
    (VM (Nova), Image Store (Glance), Network Controller (Neutron), Identity (Keystone),
    DNS Controller (Designate))
    ● Previously we just customized again and again on top of earlier customizations
    (Diagram: the LINE version of VM (Nova) is forked from a specific upstream version
    and carries "customize commit for A", "customize commit for B", "customize commit
    for C", ... stacked on top of each other.)
    It's difficult for us to take a specific patch out of our customized OpenStack.


  21. Challenge 3: Improve Customization for OSS
    (Diagram: instead of a forked LINE version of VM (Nova) with stacked "customize
    commit" changes on a specific upstream version, we keep only "patch for A",
    "patch for B", "patch for C" and the base commit ID, maintained in git.)
    ● Don't fork / stop forking
    ● Maintain only the patch files in git
    => easier to take a patch out than before
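
    A minimal sketch of the patch-file workflow, assuming the patches live in a
    patches/ directory and are applied on top of a pinned upstream base commit
    (the repository URL, tag, and layout here are illustrative):

    # Illustrative sketch: rebuild the "LINE version" as upstream base commit + patch files.
    # The upstream URL, base tag, and patches/ layout are assumptions for illustration.
    import pathlib
    import subprocess

    UPSTREAM = "https://opendev.org/openstack/nova"
    BASE = "13.0.0"                            # e.g. a Mitaka-era tag; the pinned base commit
    PATCH_DIR = pathlib.Path("patches/nova")   # patch-for-A.patch, patch-for-B.patch, ...

    def build_patched_tree(workdir="nova"):
        subprocess.run(["git", "clone", UPSTREAM, workdir], check=True)
        subprocess.run(["git", "checkout", BASE], cwd=workdir, check=True)
        for patch in sorted(PATCH_DIR.glob("*.patch")):
            # Each customization stays an independent patch file, so taking one out
            # later means deleting that file and rebuilding.
            subprocess.run(["git", "apply", str(patch.resolve())], cwd=workdir, check=True)

    if __name__ == "__main__":
        build_patched_tree()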


  22. Challenges will be different from Day 1 to Day 2
    Day 1 (so far)
    ● Develop user-facing features
    ○ Keep the same experience as before
    (legacy systems)
    ○ Support the new architecture
    ● Daily operation
    ○ Predictable
    ○ Unpredictable (caused by trouble)
    Day 2 (from now)
    ● Enhance operation
    ● Optimize development
    ● Reduce daily operation
    ○ Predictable
    ○ Unpredictable


  23. Challenges of OpenStack
    Basically, we are trying to make OpenStack (IaaS) stable.
    What we have done
    1. Legacy System Integration
    2. Bring New Network Architecture into OpenStack Network
    3. Maintain Customizations for OSS while keeping up with upstream
    What we will do
    1. Scale Emulation Environment
    2. Internal Communication Visualization/Tuning
    3. Containerize OpenStack
    4. Event Hub as a Platform


  24. Future Challenge 1: Scale Emulation Environment
    Introduced: 2016
    Version: Mitaka + customization
    Number of clusters: 4+1 (WIP: Semi Public Cloud)
    Number of hypervisors: 1100+
    ● Dev Cluster: 400
    ● Prod Cluster: 600 (region 1)
    ● Prod Cluster: 76 (region 2)
    ● Prod Cluster: 80 (region 3)
    Number of VMs: 26,000+
    ● Dev Cluster: 15,503
    ● Prod Cluster: 8,870 (region 1)
    ● Prod Cluster: 335 (region 2)
    ● Prod Cluster: 229 (region 3)
    The number of hypervisors is continuously increasing.
    We have faced:
    - timing/scale-related errors
    - some operations taking a long time


  25. Future Challenge 1: Scale Emulation Environment
    We need an environment to simulate scale from the following points of view, without
    preparing the same number of hypervisors:
    ● Database access
    ● RPC over RabbitMQ
    These are control-plane-specific loads.
    We can use this environment for tuning the OpenStack control plane.


  26. Future Challenge 1: Scale Emulation Environment
    ● Implement fake agents (nova-compute, neutron-agent)
    ● Use containers instead of actual HVs
    ● Use the same control plane
    (Diagram: Real environment: the control plane orchestrates/manages 600 hypervisors,
    each running nova-compute and neutron-agent. Scale environment: the same control
    plane manages 600 fake HVs, i.e. docker containers on a server, each running
    nova-compute and neutron-agent.)


  27. Future Challenge 1: Scale Emulation Environment
    ● Implement fake agents (nova-compute, neutron-agent)
    ● Use containers instead of actual HVs
    ● Use the same control plane
    (Diagram: Real environment: the control plane orchestrates/manages 600 hypervisors,
    each running nova-compute and neutron-agent. Scale environment: the same control
    plane manages 600 fake HVs, i.e. docker containers on a server, each running
    nova-compute and neutron-agent.)
    Easy to add new fake HVs
    => we can emulate any scale
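
    A minimal sketch of spawning fake hypervisors as containers; the image name and
    environment variables are hypothetical (Nova itself ships a fake virt driver,
    nova.virt.fake.FakeDriver, that a fake nova-compute could use):

    # Illustrative sketch: spawn N "fake hypervisor" containers against one control plane.
    # The image name and environment variables are assumptions, not LINE's real tooling.
    import subprocess

    # Hypothetical image running a fake nova-compute and a fake neutron-agent.
    IMAGE = "registry.example.test/fake-hypervisor:latest"

    def spawn_fake_hypervisors(count, controller):
        for i in range(count):
            name = "fake-hv-{:04d}".format(i)
            subprocess.run(
                ["docker", "run", "-d", "--name", name,
                 "-e", "HOSTNAME_OVERRIDE={}".format(name),    # each fake HV reports a unique host
                 "-e", "CONTROLLER_HOST={}".format(controller), # where RabbitMQ / the APIs live
                 IMAGE],
                check=True)

    if __name__ == "__main__":
        # 600 containers emulate the RPC and database load of 600 real hypervisors.
        spawn_fake_hypervisors(600, "controlplane.example.test")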


  28. Future Challenge 2: Communication Visualizing
    There are 2 types of communication between the OpenStack components (microservices):
    ● RESTful API (between components)
    ● RPC over the messaging bus (inside a component)
    (Diagram: Authentication (Keystone), VM (Nova), and Network (Neutron) talk to each
    other over RESTful APIs; inside Neutron, neutron-server and neutron-agent talk
    over RPC.)


  29. Future Challenge 2: Communication Visualizing
    (Same diagram as the previous slide.)
    Any of these links can break at any time; communication can fail
    - because of scale
    - because of improper configuration
    Errors sometimes propagate from one component to another.


  30. Future Challenge 2: Communication Visualizing
    (Same diagram as the previous slide.)
    1. This kind of issue is very difficult to troubleshoot because
    - errors propagate from one component to another
    - logs do not always contain enough information
    - logs only appear when something happens
    2. Sometimes problems can be predicted from metrics such as
    - how many RPCs were received
    - how many RPCs are waiting for a reply
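
    A minimal sketch of exposing such metrics for a monitoring tool, using
    prometheus_client; the wrapper below is a hypothetical hook, not an existing
    OpenStack interface:

    # Illustrative sketch: count received RPCs and RPCs still waiting for a reply,
    # and expose them for a monitoring tool to scrape. The hook names are hypothetical.
    from prometheus_client import Counter, Gauge, start_http_server

    RPC_RECEIVED = Counter("openstack_rpc_received_total", "RPC messages received", ["topic"])
    RPC_PENDING_REPLY = Gauge("openstack_rpc_pending_replies", "RPC calls waiting for a reply", ["topic"])

    def handle_rpc(topic, dispatch):
        """Wrap a component's RPC dispatch so communication can be visualized."""
        RPC_RECEIVED.labels(topic=topic).inc()
        RPC_PENDING_REPLY.labels(topic=topic).inc()
        try:
            return dispatch()          # the component's real business logic
        finally:
            RPC_PENDING_REPLY.labels(topic=topic).dec()

    if __name__ == "__main__":
        start_http_server(9100)        # metrics endpoint for the monitoring tool
        handle_rpc("neutron-agent", lambda: "ok")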


  31. Future Challenge 2: Communication Visualizing
    (Diagram: a monitoring tool watches the RESTful API links between Keystone, Nova,
    and Neutron and the RPC link between neutron-server and neutron-agent, and collects
    communication-related metrics.)


  32. Future Challenge 3: Containerize OpenStack
    Motivation / current pain points
    ● Complexity of packaging tools like RPM
    ○ dependencies between packages
    ○ configuration for new files
    => we need to rebuild the RPM every time we change the code
    ● Impossible to run different versions of OpenStack on the same server
    ○ dependency on common OpenStack libraries
    => we actually deployed many more control plane servers than we really need
    ● Lack of observability for all software running on the control plane
    ○ no way to tell which part of the deployment script (Ansible, Chef, ...) installs
    dependent libraries and which part installs our software
    ○ the deployment script does not take care of the software after it is deployed
    ○ we cannot notice if some developer runs a temporary script


  33. Future Challenge 3: Containerize OpenStack
    (Diagram, current: an Ansible playbook per server installs libraries, installs the
    software, and starts it; nova-api, neutron-server, and the common library are
    delivered as RPMs onto each server.
    Future: nova-api and neutron-server container images, each bundling its own copy of
    the common library, are pulled from a Docker registry and started on the servers
    via K8s manifests.)


  34. Future Challenge 4: EventHub as a Platform
    (Component diagram as on slide 6: OpenStack, Loadbalancer, Kubernetes, Storage,
    Database, Messaging, Function, Baremetal, Operation Tools.)


  35. Future Challenge 4: EventHub as a Platform
    (Same component diagram, with arrows showing components depending on others.)
    Some components/operation scripts want to do something
    - when a user (actually a project) in Keystone is deleted
    - when a VM is created
    - when a real server is added to a loadbalancer


  36. Pub/Sub Concept in Microservice Architecture
    (Diagram: Authentication, VM, and Network components connected by a messaging bus
    (RabbitMQ).)
    ● Each component publishes the important events of its own component
    ● Each component subscribes only to the events it is interested in
    ● A component can do something when an interesting event happens
    ● A component does not have to care which other components it needs to work with


  37. Pub/Sub Concept in Microservice Architecture
    (Same diagram as the previous slide.)
    This mechanism allows us to extend the Private Cloud (microservices) in the future
    without changing existing code.
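
    A minimal pub/sub sketch over RabbitMQ using pika; the exchange name and event
    types are chosen for illustration (OpenStack's own notifications are emitted
    through oslo.messaging rather than raw pika):

    # Illustrative pub/sub sketch over RabbitMQ (pika); the exchange and event names
    # are assumptions, not the real OpenStack notification format.
    import json
    import pika

    def publish_event(event_type, payload):
        conn = pika.BlockingConnection(pika.ConnectionParameters("rabbitmq.example.test"))
        ch = conn.channel()
        ch.exchange_declare(exchange="cloud-events", exchange_type="topic")
        ch.basic_publish(exchange="cloud-events", routing_key=event_type,
                         body=json.dumps(payload))
        conn.close()

    def subscribe(pattern, callback):
        conn = pika.BlockingConnection(pika.ConnectionParameters("rabbitmq.example.test"))
        ch = conn.channel()
        ch.exchange_declare(exchange="cloud-events", exchange_type="topic")
        queue = ch.queue_declare(queue="", exclusive=True).method.queue
        ch.queue_bind(exchange="cloud-events", queue=queue, routing_key=pattern)
        ch.basic_consume(queue=queue,
                         on_message_callback=lambda c, m, p, body: callback(json.loads(body)),
                         auto_ack=True)
        ch.start_consuming()

    # Publisher side:  publish_event("keystone.project.deleted", {"project_id": "..."})
    # Subscriber side: subscribe("keystone.#", lambda event: print("clean up", event))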


  38. Future Challenge 4: EventHub as a Platform
    This kind of notification logic is already implemented in OpenStack, but...
    (Diagram: the Authentication component (Keystone) and VM component (Nova) publish
    events to the messaging bus (RabbitMQ); Operation Script A, Operation Script B, the
    L7LB, and Kubernetes subscribe to them. Every publisher and subscriber carries its
    own "logic for accessing RabbitMQ" in addition to its business logic.)


  39. Future Challenge 4: EventHub as a Platform
    This kind of notification logic is already implemented in OpenStack, but...
    (Same diagram as the previous slide.)
    ● Sometimes the RabbitMQ-access code grows bigger than the actual business logic
    ● Every component/script has to implement that access logic first


  40. Future Challenge 4: EventHub as a Platform
    We are currently developing a new component that lets us register a program together
    with the events it is interested in. It will make it much easier to cooperate with
    other components.
    (Diagram: Keystone and Nova still publish events to RabbitMQ, but a new
    Function-as-a-Service component now owns the RabbitMQ-access logic and the event
    subscriptions; Operation Scripts A and B, the L7LB, and Kubernetes register only
    their business logic with it.)
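
    Since the component is still being developed, the registration interface below is
    purely hypothetical; it only illustrates the idea that users write business logic
    while the event hub owns all RabbitMQ access:

    # Purely hypothetical sketch of the planned event hub: users register only business
    # logic; the hub handles RabbitMQ access and invokes the function on matching events.
    def on_project_deleted(event):
        """Business logic only: e.g. clean up LB members, DNS records, and VMs of the project."""
        print("cleaning up resources for project", event["project_id"])

    # eventhub.register(...) is an assumed API of the new component, not existing code:
    # eventhub.register(event_type="identity.project.deleted", handler=on_project_deleted)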


  41. Further in the future: IaaS to PaaS, CaaS, ...
    We are currently trying to introduce an additional abstraction layer on top of the IaaS.
    ● https://engineering.linecorp.com/ja/blog/japan-container-days-v18-12-report/
    ● https://www.slideshare.net/linecorp/lines-private-cloud-meet-cloud-native-world
