Slide 1

Slide 1 text

Extensibility and Possibility of Kubernetes Custom Resource
@Open Source Summit Japan 2019
Masaya Aoyama (@amsy810)
CyberAgent - adtech studio

Slide 2

Slide 2 text

Masaya Aoyama (@amsy810), Infrastructure Engineer
Publicity (excerpt)
•  Books: 『Kubernetes Perfect Guide』, 『Docker/K8s for everyone』
•  Keynotes: 『Japan Container Days v18.04』, 『Google Cloud K8s Day』
•  Invitations: 『IPSJ Computer System Symposium』, 『AWS Dev Day Tokyo』, 『IBM Think Japan』, 『JEITA Committee』
•  Sessions: 『KubeCon + CloudNativeCon China 2019』 and so on
•  Certifications: 『CKAD #2』, 『CKA #138』
Community
•  Co-chair: 『Cloud Native Days Tokyo (formerly Japan Container Days)』
•  Organizer: 『Cloud Native Meetup Tokyo』, 『Kubernetes Meetup Tokyo』, 『KubeCon Japanese exchange meeting』
•  Contributor to OpenStack and Kubernetes
Main job
•  Implementing K8s as a Service
•  Architect for K8s-related systems
+ CREATIONLINE - Technology Advisor
+ SAKURA Internet Research Center - Visiting Researcher

Slide 3

Slide 3 text

Agenda
1.  What is Kubernetes?
2.  Extending Kubernetes with your own CustomResource using kubebuilder
3.  GPUaaS abstraction plan at CyberAgent

Slide 4

Slide 4 text

Do you know Kubernetes?

Slide 5

Slide 5 text

Do you use Kubernetes in production?

Slide 6

Slide 6 text

Do you like Kubernetes?

Slide 7

Slide 7 text

Palm-sized Personal Kubernetes Cluster Everyone has a personal Kubernetes cluster just like a smartphone, right?

Slide 8

Slide 8 text

Kubernetes is anywhere
We can use a physical Kubernetes cluster anywhere

Slide 9

Slide 9 text

Kubernetes is anywhere
Be careful, this is not a bomb, this is Kubernetes.

Slide 10

Slide 10 text

What is Kubernetes?

Slide 11

Slide 11 text

What is Kubernetes?
•  As a “Container / Application Execution Platform”
•  As an “X as a Service Platform”
•  As a “Framework for Distributed Systems”

Slide 12

Slide 12 text

As a “Container / Application Execution Platform”
Sophisticated platform based on Google Borg
•  Self-healing
•  Integration with load balancers
•  Auto scaling
•  Data management
•  Support for several workload types

Slide 13

Slide 13 text

As an “X as a Service Platform”
Platform for Platforms
•  Database as a Service on Kubernetes
•  Queue as a Service on Kubernetes
•  Serverless as a Service on Kubernetes
•  ML as a Service on Kubernetes
Example: oracle/mysql-operator automates “cluster repair on failure” and “operations such as backup”, like a managed service

Slide 14

Slide 14 text

As an “X as a Service Platform”
Kubernetes as a “small public cloud”
Diagram: a Developer uses Relational DB, Key Value Store, Document DB, and Queue services running on Kubernetes
Example: oracle/mysql-operator automates “cluster repair on failure” and “operations such as backup”, like a managed service

Slide 15

Slide 15 text

As a “Framework for distributed systems”
“Declarative API” and “Framework for distributed systems”
Diagram: resources are registered via API request; the Controller (reconcile() { … }) watches the cluster state and creates and deletes pods
※ Strictly speaking, the controller also creates and deletes pods/containers via the API

Slide 16

Slide 16 text

As a “Framework for distributed systems”
The reconcile loop converges to the desired state = it not only launches the replicas, but also keeps them running
The Controller (reconcile() { … }) compares the actual state (cluster state) with the desired state:
•  Observe: observe the actual state
•  Diff: calculate the difference
•  Act: take action to fill the difference

Slide 17

Slide 17 text

As a “Framework for distributed systems”
e.g.) when only 2 pods are running in the cluster
Observe: desired pods=3, actual pods=2
The Controller (reconcile() { … }) compares the actual state (cluster state) with the desired state: Observe (observe the actual state) → Diff (calculate the difference) → Act (take action to fill the difference)

Slide 18

Slide 18 text

As a “Framework for distributed systems”
e.g.) when only 2 pods are running in the cluster
Diff: 1 pod is missing
The Controller (reconcile() { … }) compares the actual state (cluster state) with the desired state: Observe (observe the actual state) → Diff (calculate the difference) → Act (take action to fill the difference)

Slide 19

Slide 19 text

As a “Framework for distributed systems”
e.g.) when only 2 pods are running in the cluster
Act: launch 1 pod that includes an nginx:1.12 container
The Controller (reconcile() { … }) compares the actual state (cluster state) with the desired state: Observe (observe the actual state) → Diff (calculate the difference) → Act (take action to fill the difference)

Slide 20

Slide 20 text

As a “Framework for distributed systems”
The reconcile loop converges to the desired state = it not only launches the replicas, but also keeps them running
Operational knowledge is converted into a program for automation
This program is called a “Controller (Operator)”
The ReplicaSet Controller works on the actual replicas:
•  if pods are missing, it launches new pods
•  if there are too many, it stops some pods
Loop steps: Observe (observe the actual state) → Diff (calculate the difference) → Act (take action to fill the difference)
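The Observe → Diff → Act loop can be sketched in a few lines of Python. This is only a toy model under stated assumptions: `Cluster`, `launch`, and `stop` are hypothetical stand-ins for the real cluster state and API calls, not part of Kubernetes or of the ReplicaSet Controller's actual code.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Cluster:
    """Toy stand-in for the actual cluster state (hypothetical, not a Kubernetes API)."""
    pods: list = field(default_factory=list)
    _ids: count = field(default_factory=lambda: count(1))

    def launch(self, image):
        # Record a new "pod" with a unique name derived from the image.
        self.pods.append(f"{image}-{next(self._ids)}")

    def stop(self):
        # Remove the most recently launched "pod".
        self.pods.pop()


def reconcile(cluster, desired_replicas, image="nginx:1.12"):
    """Run one Observe -> Diff -> Act pass and return the diff that was acted on."""
    actual = len(cluster.pods)            # Observe the actual state
    diff = desired_replicas - actual      # Diff against the desired state
    for _ in range(max(diff, 0)):         # Act: launch the missing pods
        cluster.launch(image)
    for _ in range(max(-diff, 0)):        # Act: stop the surplus pods
        cluster.stop()
    return diff


cluster = Cluster()
cluster.launch("nginx:1.12")
cluster.launch("nginx:1.12")
first = reconcile(cluster, desired_replicas=3)   # 2 running, 3 desired: 1 pod is missing
second = reconcile(cluster, desired_replicas=3)  # already converged: nothing to do
```

Because every pass starts from the observed state rather than from a remembered history, running the same function again after a pod disappears (or after an extra pod appears) brings the count back to the desired value.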

Slide 21

Slide 21 text

As a “Framework for distributed systems”
A lot of controllers are running on Kubernetes
•  ReplicaSet Controller
•  Deployment Controller
•  Endpoints Controller
•  Cloud Controller
•  etc.
There are many controllers that work asynchronously, and they make Kubernetes a distributed system.
Diagram: several Controllers (reconcile() { … }) each watch the actual state and the desired state
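A rough sketch of how independent controllers cooperate only through shared state follows. The dictionary layout and function names are hypothetical simplifications; real controllers watch the API server and run concurrently, whereas here they are simply called in rounds.

```python
def deployment_controller(state):
    """Ensure each Deployment has a ReplicaSet record with the same replica count."""
    changed = False
    for name, spec in state["deployments"].items():
        if state["replicasets"].get(name) != spec["replicas"]:
            state["replicasets"][name] = spec["replicas"]
            changed = True
    return changed


def replicaset_controller(state):
    """Ensure the pod count matches each ReplicaSet's replica count."""
    changed = False
    for name, replicas in state["replicasets"].items():
        if state["pods"].setdefault(name, 0) != replicas:
            state["pods"][name] = replicas
            changed = True
    return changed


def run_until_converged(state, controllers, max_rounds=10):
    """Call every controller each round until none of them changes anything."""
    for round_no in range(1, max_rounds + 1):
        results = [controller(state) for controller in controllers]
        if not any(results):
            return round_no
    raise RuntimeError("state did not converge")


state = {"deployments": {"web": {"replicas": 3}}, "replicasets": {}, "pods": {}}
# The controllers are deliberately listed in the "wrong" order: neither one
# calls the other, yet the state still converges after a few rounds.
rounds = run_until_converged(state, [replicaset_controller, deployment_controller])
```

Neither function knows about the other; each one only reads and writes the part of the state it owns, which is the property that lets many controllers run side by side.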

Slide 22

Slide 22 text

As a “Framework for distributed systems”
“Declarative API” and “Framework for distributed systems”
Diagram: resources are registered via API request; the Controller (reconcile() { … }) watches the cluster state and creates and deletes pods
※ Strictly speaking, the controller also creates and deletes pods/containers via the API

Slide 23

Slide 23 text

Extending Kubernetes resources using CustomResource
Diagram: a CustomResource (extended resource) is registered via API request; its Controller (reconcile() { … }) watches it and manages a MySQL cluster
What to do for a Custom Resource: write a controller that makes operations easy (operational knowledge becomes a program)
※ Strictly speaking, the controller also creates and deletes pods/containers via the API

Slide 24

Slide 24 text

Build your CustomResource and controller

Slide 25

Slide 25 text

How to create your Kubernetes-native app?
We need to create a CustomResource:
1.  Scheme (definition)
2.  Controller
Diagram: the resource is registered via API request; the Controller (reconcile() { … }) watches it and manages a MySQL cluster
※ Strictly speaking, the controller also creates and deletes pods/containers via the API

Slide 26

Slide 26 text

How to create your Kubernetes-native app?
Diagram: the resource is registered via API request; the Controller (reconcile() { … }) watches it and manages a MySQL cluster
What to do for a Custom Resource: write a controller that makes operations easy (operational knowledge becomes a program)
※ Strictly speaking, the controller also creates and deletes pods/containers via the API

Slide 27

Slide 27 text

Example: WebServer Custom Resource
We need to create a CustomResource:
1.  Scheme (definition)
2.  Controller
Diagram: a WebServer resource is registered via API request; the WebServer Controller (reconcile() { … }) watches it and manages simple web server hosting, with three web servers each serving “hello, ossummit!”
※ Strictly speaking, the controller also creates and deletes pods/containers via the API

Slide 28

Slide 28 text

kubebuilder (v1)
    # Initialize project
    $ kubebuilder init \
        --domain example.com \
        --license apache2 \
        --owner "amsy810"
    # Create CustomResource scheme and controller skeleton
    $ kubebuilder create api \
        --group servers \
        --version v1beta1 \
        --kind WebServer
Example CustomResource manifest:
    apiVersion: servers.example.com/v1beta1
    kind: WebServer
    metadata:
      name: webserver-sample
    spec:
      content: "hello, ossummit!"
      replicas: 3
https://github.com/kubernetes-sigs/kubebuilder
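The link between the command-line flags and the manifest can be spelled out with a small sketch: the API group is the `--group` value joined with the `--domain` value, and `apiVersion` appends the `--version` value. The helper names `api_version` and `sample_manifest` are hypothetical, for illustration only.

```python
def api_version(group, domain, version):
    """Compose the apiVersion string from the kubebuilder flag values."""
    return f"{group}.{domain}/{version}"


def sample_manifest(group, domain, version, kind, name, spec):
    """Build a CustomResource manifest as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": api_version(group, domain, version),
        "kind": kind,
        "metadata": {"name": name},
        "spec": spec,
    }


manifest = sample_manifest(
    group="servers",
    domain="example.com",
    version="v1beta1",
    kind="WebServer",
    name="webserver-sample",
    spec={"content": "hello, ossummit!", "replicas": 3},
)
```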

Slide 29

Slide 29 text

Directory structure
    $ tree ./pkg/
    pkg
    ├─ apis
    │  ├─ addtoscheme_servers_v1beta1.go
    │  ├─ apis.go
    │  └─ servers
    │     ├─ group.go
    │     └─ v1beta1
    │        ├─ doc.go
    │        ├─ register.go
    │        ├─ v1beta1_suite_test.go
    │        ├─ webserver_types.go
    │        ├─ webserver_types_test.go
    │        └─ zz_generated.deepcopy.go
    ├─ controller
    │  ├─ add_webserver.go
    │  ├─ controller.go
    │  └─ webserver
    │     ├─ webserver_controller.go
    │     ├─ webserver_controller_suite_test.go
    │     └─ webserver_controller_test.go
    └─ webhook
       └─ webhook.go
1.  API Scheme part (pkg/apis)
2.  Controller logic part (pkg/controller)
Example CustomResource manifest:
    apiVersion: servers.example.com/v1beta1
    kind: WebServer
    metadata:
      name: webserver-sample
    spec:
      content: "hello, ossummit!"
      replicas: 3

Slide 30

Slide 30 text

Default generated scheme code (webserver_types.go)
    kind: WebServer
    metadata:
      name: webserver-sample

Slide 31

Slide 31 text

Update CustomResource scheme (edit here: webserver_types.go)
    kind: WebServer
    metadata:
      name: webserver-sample
    spec:
      content: "hello, ossummit!"
      replicas: 3

Slide 32

Slide 32 text

Update CustomResource scheme (webserver_types.go)
Insert additional fields
    kind: WebServer
    metadata:
      name: webserver-sample
    spec:
      content: "hello, ossummit!"
      replicas: 3

Slide 33

Slide 33 text

Update CustomResource scheme (webserver_types.go)
Of course, we can also insert a nested struct
    kind: WebServer
    metadata:
      name: webserver-sample
    spec:
      content: "hello, ossummit!"
      replicas: 3
      software:
        name: nginx
        version: 1.12
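The shape of such a scheme, including the nested struct, can be sketched with Python dataclasses. This is an illustrative analogue only: the real definition lives in the Go file webserver_types.go generated by kubebuilder, and the class names `Software` and `WebServerSpec` as well as the `from_dict` helper are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Software:
    """Nested struct: which software serves the content."""
    name: str
    version: str


@dataclass
class WebServerSpec:
    """Desired state of a WebServer, mirroring the fields in the manifest."""
    content: str
    replicas: int
    software: Optional[Software] = None

    @classmethod
    def from_dict(cls, raw):
        software = raw.get("software")
        return cls(
            content=raw["content"],
            replicas=int(raw["replicas"]),
            # An unquoted 1.12 in YAML is parsed as a float, so normalize it to a string.
            software=Software(software["name"], str(software["version"])) if software else None,
        )


spec = WebServerSpec.from_dict({
    "content": "hello, ossummit!",
    "replicas": 3,
    "software": {"name": "nginx", "version": 1.12},
})
```

One design point worth noting: a version such as 1.10 written without quotes would be read by a YAML parser as the number 1.1, so quoting version strings in the manifest is the safer choice.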

Slide 34

Slide 34 text

Update controller logic (webserver_controller.go)
When the CustomResource or its related resources change, the Reconcile() method is called

Slide 35

Slide 35 text

Update controller logic (webserver_controller.go)
By default, the controller has a small piece of logic that creates an nginx Deployment

Slide 36

Slide 36 text

Update controller logic
By default, the controller has a small piece of logic that creates an nginx Deployment
    kind: WebServer
    metadata:
      name: webserver-sample
    spec:
      content: "hello, ossummit!"
      replicas: 3

Slide 37

Slide 37 text

Update controller logic
The “instance” variable is the WebServer resource
    kind: WebServer
    metadata:
      name: webserver-sample
    spec:
      content: "hello, ossummit!"
      replicas: 3
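What the controller does with the “instance” can be sketched as a pure function from the WebServer resource to a Deployment manifest. This is a Python illustration, not the generated Go code: the function name `build_deployment`, the `webserver` label key, and the way the content reaches nginx (written into the default document root at container start) are all assumptions made for this sketch.

```python
def build_deployment(instance, image="nginx"):
    """Map a WebServer resource (as a dict) to an apps/v1 Deployment manifest."""
    name = instance["metadata"]["name"]
    labels = {"webserver": name}  # hypothetical label key
    # Assumption: the page content is written to nginx's default document root at startup.
    command = [
        "sh", "-c",
        'printf "%s" "$CONTENT" > /usr/share/nginx/html/index.html '
        "&& exec nginx -g 'daemon off;'",
    ]
    container = {
        "name": "nginx",
        "image": image,
        "command": command,
        "env": [{"name": "CONTENT", "value": instance["spec"]["content"]}],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": f"{name}-deployment", "labels": labels},
        "spec": {
            "replicas": instance["spec"]["replicas"],
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


instance = {
    "kind": "WebServer",
    "metadata": {"name": "webserver-sample"},
    "spec": {"content": "hello, ossummit!", "replicas": 3},
}
deployment = build_deployment(instance)
```

Keeping the mapping free of side effects means a reconcile pass can rebuild the desired Deployment every time and compare it with what currently exists in the cluster.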

Slide 38

Slide 38 text

Example: WebServer CRD
We need to create a CustomResource:
1.  Scheme
2.  Controller
Diagram: a WebServer resource is registered via API request; the WebServer Controller (reconcile() { … }) watches it and manages simple web server hosting, with three web servers each serving “hello, ossummit!”
※ Strictly speaking, the controller also creates and deletes pods/containers via the API

Slide 39

Slide 39 text

Example: WebServer CRD
Diagram: a WebServer resource is registered via API request; the WebServer Controller (reconcile() { … }) watches it and manages a Deployment, with three web servers each serving “hello, ossummit!”

Slide 40

Slide 40 text

MySQL operator
Diagram: a MySQL resource is registered via API request; the MySQL Operator (reconcile() { … }) watches it and manages a StatefulSet

Slide 41

Slide 41 text

At CyberAgent

Slide 42

Slide 42 text

GPU environment
Figure: a shared pool of GPUs with nvidia-docker; GPUs are allocated from the pool to individual developers

Slide 43

Slide 43 text

Complex YAML manifest
Figure: a shared pool of GPUs with nvidia-docker; GPUs are allocated from the pool to individual developers

Slide 44

Slide 44 text

MLTask CustomResource for GPUaaS abstraction
Abstract GPUaaS for ML engineers
It is difficult for most ML engineers to understand complex manifests
Simple settings
•  Which image to use
•  Which training data to use
•  Where to place the calculated models
•  Where to place the temporary shared files
•  How many tasks to run in parallel
•  How many GPU resources to use

Slide 45

Slide 45 text

MLTask CustomResource for GPUaaS abstraction
Diagram: an MLTask resource is registered via API request; the MLTask Controller (reconcile() { … }) watches it and manages a StatefulSet that runs on the GPUs
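The abstraction can be sketched as follows: an ML engineer fills in only the six simple settings, and a controller expands them into the more complex StatefulSet manifest. Every field name, environment variable name, and the `build_statefulset` helper below are hypothetical; the actual MLTask scheme is not shown in the slides. The sketch also assumes that the GPU count applies per worker and that GPUs are requested through the `nvidia.com/gpu` extended resource.

```python
from dataclasses import dataclass


@dataclass
class MLTaskSpec:
    """The simple settings exposed to ML engineers (hypothetical field names)."""
    image: str          # which image to use
    training_data: str  # which training data to use
    model_output: str   # where to place the calculated models
    shared_dir: str     # where to place the temporary shared files
    parallelism: int    # how many tasks to run in parallel
    gpus: int           # how many GPUs each worker uses (assumed to be per worker)


def build_statefulset(name, spec):
    """Expand an MLTaskSpec into an apps/v1 StatefulSet manifest."""
    labels = {"mltask": name}  # hypothetical label key
    container = {
        "name": "trainer",
        "image": spec.image,
        "env": [
            {"name": "TRAINING_DATA", "value": spec.training_data},
            {"name": "MODEL_OUTPUT", "value": spec.model_output},
            {"name": "SHARED_DIR", "value": spec.shared_dir},
        ],
        "resources": {"limits": {"nvidia.com/gpu": spec.gpus}},
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "serviceName": name,
            "replicas": spec.parallelism,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


task = MLTaskSpec(
    image="example/trainer:latest",
    training_data="/data/train",
    model_output="/models",
    shared_dir="/shared",
    parallelism=4,
    gpus=2,
)
statefulset = build_statefulset("mltask-sample", task)
```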

Slide 46

Slide 46 text

Our in-house Ingress controller
Diagram components: L4 LB, VM network, Pod network, software LB cluster (VMs), and the Ingress Controller (Watch / Manage)

Slide 47

Slide 47 text

Conclusion

Slide 48

Slide 48 text

Conclusion
•  As a “Container / Application Execution Platform”
•  As an “X as a Service Platform”
•  As a “Framework for Distributed Systems”
Diagram: a Controller (operational knowledge) watches a CustomResource (scheme) and processes it through the Observe → Diff → Act loop

Slide 49

Slide 49 text

Thank you for your attention
Let’s enjoy Kubernetes
Follow me: @amsy810
