Extensibility and Possibility of Kubernetes Custom Resource @ Open Source Summit Japan 2019 - Masaya Aoyama, CyberAgent, Inc
------------------------------------------------------------------------------------------------------------------
Kubernetes ships with a set of built-in resources (e.g., Deployment, Service, and ConfigMap).
These basic resources are general-purpose, but each domain also needs resources specific to it.
For example, if developers build a MySQL cluster from basic resources alone, the startup scripts become very complex. A MySQL custom resource with a simple manifest, however, can provide a MySQL cluster much like a managed database service.

Kubernetes is not only a container platform; it is also a framework for building applications with tools such as kubebuilder and Operator SDK. At CyberAgent, we implement GPUaaS resources and several application cluster resources for sophisticated application management.

In this talk, attendees will learn how to implement their own custom resources and controllers, and I will discuss the possibility and extensibility of Kubernetes.

------------------------------------------------------------------------------------------------------------------
Masaya Aoyama is an Infrastructure Engineer at CyberAgent, Inc.

He is also a technical advisor at CREATIONLINE, inc. and a visiting researcher at SAKURA Internet, Inc.

He is a co-chair of Cloud Native Days Tokyo and has published books on Kubernetes.
He also organizes Cloud Native Meetup Tokyo, Kubernetes Meetup Tokyo, and the Japanese meetup held at the KubeCon venue.
He holds Kubernetes certification #2 in the world (CKAD #2).

Involved with Kubernetes for more than a few years, he contributes to Kubernetes as a reviewer.
He has been working on a production private cloud, which includes an OpenStack-based Kubernetes as a Service platform.

Masaya Aoyama (@amsy810)

July 18, 2019

Transcript

  1. Extensibility and Possibility of Kubernetes Custom Resource @ Open Source Summit Japan 2019
     Masaya Aoyama (@amsy810), CyberAgent - adtech studio
  2. Masaya Aoyama (@amsy810), Infrastructure Engineer
     Publicity (excerpt)
     •  Books: 『Kubernetes Perfect Guide』 『Docker/K8s for everyone』
     •  Keynotes: 『Japan Container Days v18.04』 『Google Cloud K8s Day』
     •  Invitations: 『IPSJ Computer System Symposium』 『AWS Dev Day Tokyo』 『IBM Think Japan』 『JEITA Committee』
     •  Sessions: 『KubeCon + CloudNativeCon China 2019』 and so on
     •  Certifications: 『CKAD #2』 『CKA #138』
     Community
     •  Co-chair: 『Cloud Native Days Tokyo (formerly Japan Container Days)』
     •  Organizer: 『Cloud Native Meetup Tokyo』 『Kubernetes Meetup Tokyo』 『KubeCon Japanese exchange meeting』
     •  Contributor to OpenStack and Kubernetes
     Main job
     •  Implementing K8s as a Service
     •  Architect for K8s-related systems
     •  CREATIONLINE: Technology Advisor
     •  SAKURA Internet Research Center: Visiting Researcher
  3. Agenda
     1.  What is Kubernetes?
     2.  Extend your own CustomResource with kubebuilder
     3.  GPUaaS abstraction plan at CyberAgent
  4. What is Kubernetes?
     •  As a “Container / Application Execution Platform”
     •  As a “X as a Service Platform”
     •  As a “Framework for Distributed System”
  5. As a “Container / Application Execution Platform”
     Sophisticated platform based on Google Borg
     •  Self-healing
     •  Integration with load balancers
     •  Auto scaling
     •  Data management
     •  Adaptation to several workload types
  6. As a “X as a Service Platform”
     Platform for Platform
     •  Database as a Service on Kubernetes
     •  Queue as a Service on Kubernetes
     •  Serverless as a Service on Kubernetes
     •  ML as a Service on Kubernetes
     oracle/mysql-operator: automates “cluster repair at failure” and “operations such as backup”, like a managed service
  7. As a “X as a Service Platform”
     Kubernetes is like a “small public cloud”
     oracle/mysql-operator: automates “cluster repair at failure” and “operations such as backup”, like a managed service
     [Diagram: a Developer consuming Relational DB, Key Value Store, Document DB, and Queue services on Kubernetes]
  8. As a “Framework for distributed systems”
     “Declarative API” and “Framework for distributed system”
     [Diagram: a resource is registered via API request; the Controller watches the cluster state, runs reconcile() { … }, and creates and deletes pods]
     ※ Strictly speaking, the controller also creates and deletes pods/containers via the API.
  9. As a “Framework for distributed systems”
     The reconcile loop converges to the desired state = not only launching the replicas, but also keeping them running
     •  Observe: observe the actual state (cluster state)
     •  Diff: calculate the difference from the desired state
     •  Act: take action to fill the difference
     [Diagram: the Controller runs reconcile() { … } as an Observe → Diff → Act cycle between actual state and desired state]
  10. As a “Framework for distributed systems”
      e.g.) when only 2 pods are running in the cluster
      Observe: desired pods=3, actual pods=2
      [Diagram: Observe → Diff → Act cycle between actual state and desired state]
  11. As a “Framework for distributed systems”
      e.g.) when only 2 pods are running in the cluster
      Diff: 1 pod is missing
      [Diagram: Observe → Diff → Act cycle between actual state and desired state]
  12. As a “Framework for distributed systems”
      e.g.) when only 2 pods are running in the cluster
      Act: launch 1 pod that includes an nginx:1.12 container
      [Diagram: Observe → Diff → Act cycle between actual state and desired state]
  13. As a “Framework for distributed systems”
      The reconcile loop converges to the desired state = not only launching the replicas, but also keeping them running
      Operational knowledge is converted into a program for automation. This program is called a “Controller (Operator)”.
      The ReplicaSet Controller works on the actual replicas:
      •  if some are missing, it launches new pods
      •  if there are too many, it stops some pods
      [Diagram: Observe (observe actual state) → Diff (calculate difference) → Act (action for filling difference)]
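
The Observe → Diff → Act cycle from slides 9 to 13 can be sketched in a few lines of Python. This is only an illustration, not the ReplicaSet Controller's actual code: the `Cluster` class is a hypothetical in-memory stand-in for the real cluster state, and `reconcile` is an assumed name borrowed from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical in-memory stand-in for the actual cluster state."""
    pods: list = field(default_factory=list)
    next_id: int = 0

    def launch_pod(self, image: str) -> None:
        self.pods.append(f"{image}-{self.next_id}")
        self.next_id += 1

    def stop_pod(self) -> None:
        self.pods.pop()


def reconcile(cluster: Cluster, desired_replicas: int, image: str) -> int:
    """Run one Observe -> Diff -> Act pass and return the observed difference."""
    actual = len(cluster.pods)          # Observe: read the actual state
    diff = desired_replicas - actual    # Diff: compare it with the desired state
    for _ in range(max(diff, 0)):       # Act: launch the missing pods
        cluster.launch_pod(image)
    for _ in range(max(-diff, 0)):      # Act: stop the surplus pods
        cluster.stop_pod()
    return diff


cluster = Cluster()
cluster.launch_pod("nginx:1.12")
cluster.launch_pod("nginx:1.12")

# desired pods=3, actual pods=2: one pod is missing, so one pod is launched.
first_diff = reconcile(cluster, desired_replicas=3, image="nginx:1.12")
# A second pass finds nothing to do: the loop has converged.
second_diff = reconcile(cluster, desired_replicas=3, image="nginx:1.12")
print(first_diff, second_diff, len(cluster.pods))
```

Because every pass starts from the observed state rather than from a remembered one, calling `reconcile` again after a pod disappears repairs the cluster without any extra logic.
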
  14. As a “Framework for distributed systems”
      A lot of controllers are running on Kubernetes
      •  ReplicaSet Controller
      •  Deployment Controller
      •  Endpoints Controller
      •  Cloud Controller
      •  etc.
      There are many controllers that work asynchronously, and they make Kubernetes a distributed system.
      [Diagram: several Controllers, each running reconcile() { … }, watch the actual state and the desired state]
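
As a rough illustration of how independent controllers cooperate, the following Python sketch runs two toy controllers against one shared state until neither has anything left to do. The `state` dictionary and both function bodies are simplifying assumptions; the real Deployment and ReplicaSet controllers are far more involved and run concurrently rather than in a loop.

```python
def deployment_controller(state: dict) -> bool:
    """Ensure each Deployment has a matching ReplicaSet; return True if it acted."""
    changed = False
    for name, replicas in state["deployments"].items():
        if state["replicasets"].get(name) != replicas:
            state["replicasets"][name] = replicas
            changed = True
    return changed


def replicaset_controller(state: dict) -> bool:
    """Ensure each ReplicaSet has the desired pod count; return True if it acted."""
    changed = False
    for name, replicas in state["replicasets"].items():
        if state["pods"].get(name, 0) != replicas:
            state["pods"][name] = replicas
            changed = True
    return changed


# Only the top-level desired state is declared; everything else is derived.
state = {"deployments": {"web": 3}, "replicasets": {}, "pods": {}}

# The controllers are deliberately listed "out of order" to show that
# neither one needs to know about the other.
controllers = [replicaset_controller, deployment_controller]

passes = 0
# Build a list first so that every controller runs in every pass.
while any([controller(state) for controller in controllers]):
    passes += 1

print(passes, state["pods"])
```

Each controller only watches the part of the state it cares about, yet the system as a whole still converges, which is the property the slide attributes to Kubernetes as a distributed system.
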
  15. As a “Framework for distributed systems”
      “Declarative API” and “Framework for distributed system”
      [Diagram: a resource is registered via API request; the Controller watches the cluster state, runs reconcile() { … }, and creates and deletes pods]
      ※ Strictly speaking, the controller also creates and deletes pods/containers via the API.
  16. Extending Kubernetes Resource using CustomResource
      CustomResource (extended resource)
      What to do for a Custom Resource: write a controller to make operations easy (operational knowledge becomes a program)
      [Diagram: a CustomResource is registered via API request; the Controller watches it, runs reconcile() { … }, and manages a MySQL cluster]
      ※ Strictly speaking, the controller also creates and deletes pods/containers via the API.
  17. How to create your Kubernetes-native app?
      We need to create a CustomResource:
      1.  Scheme (definition)
      2.  Controller
      [Diagram: a CustomResource is registered via API request; the Controller watches it, runs reconcile() { … }, and manages a MySQL cluster]
      ※ Strictly speaking, the controller also creates and deletes pods/containers via the API.
  18. How to create your Kubernetes-native app?
      What to do for a Custom Resource: write a controller to make operations easy (operational knowledge becomes a program)
      [Diagram: a CustomResource is registered via API request; the Controller watches it, runs reconcile() { … }, and manages a MySQL cluster]
      ※ Strictly speaking, the controller also creates and deletes pods/containers via the API.
  19. Example, WebServer Custom Resource
      We need to create a CustomResource:
      1.  Scheme (definition)
      2.  Controller
      [Diagram: a WebServer resource is registered via API request; the WebServer Controller watches it, runs reconcile() { … }, and manages simple web server hosting; three web servers each serve “hello, ossummit!”]
      ※ Strictly speaking, the controller also creates and deletes pods/containers via the API.
  20. kubebuilder (v1)
      https://github.com/kubernetes-sigs/kubebuilder

      # Initialize project
      $ kubebuilder init \
          --domain example.com \
          --license apache2 \
          --owner "amsy810"

      # Create CustomResource scheme and controller skeleton
      $ kubebuilder create api \
          --group servers \
          --version v1beta1 \
          --kind WebServer

      apiVersion: servers.example.com/v1beta1
      kind: WebServer
      metadata:
        name: webserver-sample
      spec:
        content: "hello, ossummit!"
        replicas: 3
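
The `apiVersion` in the sample manifest is composed from the `--group`, `--domain`, and `--version` values passed on the command line. The Python sketch below only illustrates that relationship and rebuilds the manifest from the slide as a dictionary; the helper names `api_version` and `webserver_manifest` are hypothetical and are not part of kubebuilder.

```python
def api_version(group: str, domain: str, version: str) -> str:
    """Compose the apiVersion string as <group>.<domain>/<version>."""
    return f"{group}.{domain}/{version}"


def webserver_manifest(name: str, content: str, replicas: int) -> dict:
    """Build the WebServer manifest from the slide as a plain dictionary."""
    return {
        "apiVersion": api_version("servers", "example.com", "v1beta1"),
        "kind": "WebServer",
        "metadata": {"name": name},
        "spec": {"content": content, "replicas": replicas},
    }


manifest = webserver_manifest("webserver-sample", "hello, ossummit!", 3)
print(manifest["apiVersion"])  # servers.example.com/v1beta1
```
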
  21. Directory structure
      1.  API Scheme part (pkg/apis)
      2.  Controller logic part (pkg/controller)

      $ tree ./pkg/
      pkg
      ├─ apis
      │  ├─ addtoscheme_servers_v1beta1.go
      │  ├─ apis.go
      │  └─ servers
      │     ├─ group.go
      │     └─ v1beta1
      │        ├─ doc.go
      │        ├─ register.go
      │        ├─ v1beta1_suite_test.go
      │        ├─ webserver_types.go
      │        ├─ webserver_types_test.go
      │        └─ zz_generated.deepcopy.go
      ├─ controller
      │  ├─ add_webserver.go
      │  ├─ controller.go
      │  └─ webserver
      │     ├─ webserver_controller.go
      │     ├─ webserver_controller_suite_test.go
      │     └─ webserver_controller_test.go
      └─ webhook
         └─ webhook.go

      apiVersion: servers.example.com/v1beta1
      kind: WebServer
      metadata:
        name: webserver-sample
      spec:
        content: "hello, ossummit!"
        replicas: 3
  22. Update CustomResource scheme (webserver_types.go)
      Insert additional fields

      kind: WebServer
      metadata:
        name: webserver-sample
      spec:
        content: "hello, ossummit!"
        replicas: 3
  23. Update CustomResource scheme (webserver_types.go)
      Of course, we can insert a nested map struct

      kind: WebServer
      metadata:
        name: webserver-sample
      spec:
        content: "hello, ossummit!"
        replicas: 3
        software:
          name: nginx
          version: 1.12
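
The contents of webserver_types.go are not included in the transcript. As a rough model of what such a typed spec with a nested struct looks like, here is a Python sketch using dataclasses. The class names `WebServerSpec` and `Software` and the `parse_spec` helper are assumptions for illustration; the field names follow the manifest above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Software:
    """Nested struct under spec.software."""
    name: str
    version: str


@dataclass
class WebServerSpec:
    """Typed view of the spec section of a WebServer resource."""
    content: str
    replicas: int
    software: Optional[Software] = None


def parse_spec(raw: dict) -> WebServerSpec:
    """Convert the untyped spec mapping into typed fields."""
    software = raw.get("software")
    return WebServerSpec(
        content=str(raw["content"]),
        replicas=int(raw["replicas"]),
        software=Software(str(software["name"]), str(software["version"]))
        if software
        else None,
    )


# A YAML parser reads the unquoted `version: 1.12` as the float 1.12.
spec = parse_spec(
    {
        "content": "hello, ossummit!",
        "replicas": 3,
        "software": {"name": "nginx", "version": 1.12},
    }
)
print(spec.software)
```

One design note: because an unquoted `1.12` is a float in YAML, a value such as `1.10` would lose its trailing zero before it reaches the spec, so quoting version strings in the manifest is the safer choice.
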
  24. Update controller logic (webserver_controller.go)
      When the CustomResource or its related resources change, the Reconcile() method is called
  25. Update controller logic (webserver_controller.go)
      By default, the controller contains a small piece of logic that creates an nginx Deployment
  26. Update controller logic
      By default, the controller contains a small piece of logic that creates an nginx Deployment

      kind: WebServer
      metadata:
        name: webserver-sample
      spec:
        content: "hello, ossummit!"
        replicas: 3
  27. Update controller logic
      The “instance” variable is the WebServer resource

      kind: WebServer
      metadata:
        name: webserver-sample
      spec:
        content: "hello, ossummit!"
        replicas: 3
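
The controller code itself is not reproduced in the transcript. The following Python sketch shows the general shape of such logic under stated assumptions: `instance` is the WebServer resource as a dictionary, `cluster` is a hypothetical in-memory store of Deployments keyed by name, and the `-deployment` name suffix, the `app` label, and the `CONTENT` environment variable are illustrative choices rather than details taken from the slides.

```python
def desired_deployment(instance: dict) -> dict:
    """Derive the nginx Deployment that a WebServer resource asks for."""
    name = instance["metadata"]["name"]
    spec = instance["spec"]
    labels = {"app": name}
    container = {
        "name": "nginx",
        "image": "nginx",
        # Assumption: the page content is handed to the container as an env var.
        "env": [{"name": "CONTENT", "value": spec["content"]}],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": f"{name}-deployment", "labels": labels},
        "spec": {
            "replicas": spec["replicas"],
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


def reconcile(instance: dict, cluster: dict) -> str:
    """Create the Deployment if it is missing, update it if its spec drifted."""
    desired = desired_deployment(instance)
    key = desired["metadata"]["name"]
    found = cluster.get(key)
    if found is None:
        cluster[key] = desired
        return "created"
    if found["spec"] != desired["spec"]:
        found["spec"] = desired["spec"]
        return "updated"
    return "unchanged"


instance = {
    "kind": "WebServer",
    "metadata": {"name": "webserver-sample"},
    "spec": {"content": "hello, ossummit!", "replicas": 3},
}
cluster = {}
print(reconcile(instance, cluster))
print(reconcile(instance, cluster))
```

Calling `reconcile` repeatedly is safe: the first call creates the Deployment, later calls change nothing until the WebServer spec itself changes.
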
  28. Example, WebServer CRD
      We need to create a CustomResource:
      1.  Scheme
      2.  Controller
      [Diagram: a WebServer resource is registered via API request; the WebServer Controller watches it, runs reconcile() { … }, and manages simple web server hosting; three web servers each serve “hello, ossummit!”]
      ※ Strictly speaking, the controller also creates and deletes pods/containers via the API.
  29. Example, WebServer CRD
      [Diagram: a WebServer resource is registered via API request; the WebServer Controller watches it, runs reconcile() { … }, and manages a Deployment; three web servers each serve “hello, ossummit!”]
  30. MySQL operator
      [Diagram: a MySQL resource is registered via API request; the MySQL Operator watches it, runs reconcile() { … }, and manages a StatefulSet]
  31. GPU environment
      [Diagram: a pool of GPUs with nvidia-docker; GPUs are allocated to many developers]
  32. Complex YAML manifest
      [Diagram: a pool of GPUs with nvidia-docker; GPUs are allocated to many developers]
  33. MLTask CustomResource for GPUaaS abstraction
      Abstract GPUaaS for ML engineers
      It is difficult for most ML engineers to understand a complex manifest
      Simple settings
      •  Which image to use
      •  Which training data to use
      •  Where to place the calculated models
      •  Where to place the temporary shared files
      •  How many tasks to run in parallel
      •  How many GPU resources to use
  34. MLTask CustomResource for GPUaaS abstraction
      [Diagram: an MLTask resource is registered via API request; the MLTask Controller watches it, runs reconcile() { … }, and manages a StatefulSet that uses GPUs]
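
The MLTask schema is not shown in the transcript, so the following Python sketch is purely hypothetical. It maps the six simple settings from slide 33 to assumed field names and expands them into a StatefulSet-shaped dictionary, which is the kind of translation a controller like this would perform. The environment variable names, the `trainer` container name, and the per-worker reading of the GPU count are all assumptions; `nvidia.com/gpu` is the resource name exposed by the NVIDIA device plugin.

```python
from dataclasses import dataclass


@dataclass
class MLTaskSpec:
    """Hypothetical fields, one per simple setting listed on slide 33."""
    image: str            # which image to use
    training_data: str    # which training data to use
    model_output: str     # where to place the calculated models
    shared_tmp: str       # where to place the temporary shared files
    parallelism: int      # how many tasks to run in parallel
    gpus_per_worker: int  # how many GPU resources to use (assumed per worker)


def to_statefulset(name: str, spec: MLTaskSpec) -> dict:
    """Expand the short MLTask spec into a much longer StatefulSet manifest."""
    labels = {"mltask": name}
    container = {
        "name": "trainer",
        "image": spec.image,
        "env": [
            {"name": "TRAINING_DATA", "value": spec.training_data},
            {"name": "MODEL_OUTPUT", "value": spec.model_output},
            {"name": "SHARED_TMP", "value": spec.shared_tmp},
        ],
        "resources": {"limits": {"nvidia.com/gpu": spec.gpus_per_worker}},
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "serviceName": name,
            "replicas": spec.parallelism,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


task = MLTaskSpec(
    image="example/trainer:latest",
    training_data="/data/train",
    model_output="/models",
    shared_tmp="/shared/tmp",
    parallelism=4,
    gpus_per_worker=2,
)
sts = to_statefulset("mltask-sample", task)
print(sts["spec"]["replicas"])
```

The ML engineer only fills in the six fields; the selector, labels, and resource limits that make the raw manifest complex are generated by the controller.
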
  35. Our in-house Ingress controller
      [Diagram: L4 LB, VM network, Pod network, VMs, and a software LB cluster; the Ingress Controller watches and manages]
  36. Conclusion
      •  As a “Container / Application Execution Platform”
      •  As a “X as a Service Platform”
      •  As a “Framework for Distributed System”
      [Diagram: the Controller (operation knowledge) watches the CustomResource (scheme) and processes it in an Observe → Diff → Act cycle]