Slide 1

Slide 1 text

Deployment flow of PayPay Apps
PayPay DevTalk Vol.1: DevSecOps & DevOps
Masaya Ozawa

Slide 2

Slide 2 text

About me
Masaya Ozawa
- Joined PayPay in July 2018
- 2018/07 ~ Backend engineer
- 2019/05 ~ Infrastructure engineer
Work From Anywhere
- I'm improving my home environment

Slide 3

Slide 3 text

Deployment flow of PayPay Apps

Slide 4

Slide 4 text

Background

Slide 5

Slide 5 text

Previous deployment flow
1. Pull Request
2. Merge
3. Deploy
Diagram labels: K8s manifests repository, Non-production clusters, Production cluster
Reference: "PayPayでのk8s活用事例" (k8s use cases at PayPay), Kubernetes Meetup Tokyo #22
https://www.slideshare.net/PayPay_career/paypayk8s

Slide 6

Slide 6 text

About Argo CD
- One of the OSS projects under the Argo project
- https://argoproj.github.io/argocd
- Provides a CRD that keeps the state of the cluster in sync with the manifests on GitHub
- Can deploy various middleware and applications, including Argo CD itself
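The idea of keeping the cluster in sync with the manifests can be illustrated with a toy comparison of desired and live state. This is only a sketch of the concept, not Argo CD's implementation; the function name `plan_sync` and the "resource name to content hash" dictionaries are hypothetical.

```python
from typing import Dict, List


def plan_sync(desired: Dict[str, str], live: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare desired state (from Git) with live state (from the cluster).

    Both arguments map a resource name to a content hash. These shapes are
    assumptions made for illustration only.
    """
    # Resources declared in Git but missing from the cluster.
    create = sorted(name for name in desired if name not in live)
    # Resources present in both places whose content differs.
    update = sorted(
        name for name in desired if name in live and desired[name] != live[name]
    )
    # Resources running in the cluster that Git no longer declares.
    prune = sorted(name for name in live if name not in desired)
    return {"create": create, "update": update, "prune": prune}
```

An empty plan for all three keys would mean the cluster already matches the manifests.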

Slide 7

Slide 7 text

Previous deployment flow
1. Pull Request
2. Merge
3. Deploy
Diagram labels: K8s manifests repository, Non-production clusters, Production cluster
Reference: "PayPayでのk8s活用事例" (k8s use cases at PayPay), Kubernetes Meetup Tokyo #22
https://www.slideshare.net/PayPay_career/paypayk8s

Slide 8

Slide 8 text

Previous deployment flow issues
- As the number of applications grew, problems caused by dependencies between deployments increased.
  - For example, the deployment order when a feature is added across multiple services.
- These problems were resolved through communication.
- However, incidents still occurred, so we decided to design a mechanism to prevent them.
Diagram: 1. Pull Request, 2. Merge, 3. Deploy (K8s manifests repository, Non-production clusters, Production cluster)

Slide 9

Slide 9 text

Previous deployment flow issues
Issue
- During a production deployment, it can be difficult to tell whether the versions of the applications already deployed in production meet the requirements of the application being deployed.
- The stg cluster runs newer versions than the production cluster because features are under active development.
Solution
- Build an environment equivalent to production and always run integration tests on it

Slide 10

Slide 10 text

Canary Environment
Diagram labels: Dedicated Load Balancer, Public Load Balancer, Canary Environment, Canary environment App, Public App, Production Environment
- Run and verify automated tests on all releases
- Operation check with a dedicated app
- DB is shared with production
- Safely verify operations equivalent to production

Slide 11

Slide 11 text

Canary Environment Requirements
- Automated testing in this environment is mandatory during a production deployment.
- The canary environment must always be kept in a production-like state
  - Image versions of applications, etc.
Integrate it into our deployment flow!!

Slide 12

Slide 12 text

Deployment flow Design

Slide 13

Slide 13 text

Workflow design
In a production deployment, perform the following steps in one PR.
1. Create a PR in the manifests repository
2. Tech Lead approval
3. Deploy to the canary environment
4. Automatic integration test
5. Manual testing of features not covered by automated testing
6. Deploy to the production environment
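The ordering of the six steps above can be sketched as a small check that refuses to advance when a step is skipped. This is a minimal illustration, not the actual PayPay implementation; the step identifiers and the function `next_step` are hypothetical names.

```python
from typing import List, Optional

# Hypothetical identifiers for the six steps listed on the slide.
STEPS = [
    "create_pr",
    "tech_lead_approval",
    "deploy_canary",
    "integration_test",
    "manual_test",
    "deploy_production",
]


def next_step(completed: List[str]) -> Optional[str]:
    """Return the next step to run, or None when the flow is finished.

    Raises ValueError if the completed steps are not a prefix of STEPS,
    i.e. a step was skipped or run out of order.
    """
    if completed != STEPS[: len(completed)]:
        raise ValueError(f"steps out of order: {completed}")
    if len(completed) == len(STEPS):
        return None
    return STEPS[len(completed)]
```

Rejecting anything that is not a prefix of the step list is what prevents, for example, a production deploy before the canary tests have run.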

Slide 14

Slide 14 text

Which technology to use
What technology should we use to build the deployment flow??
- Jenkins ??
- GitHub Actions ??
- Or other OSS ??

Slide 15

Slide 15 text

Previous deployment flow (repost)
1. Pull Request
2. Merge
3. Deploy
Diagram labels: K8s manifests repository, Non-production clusters, Production cluster

Slide 16

Slide 16 text

Which technology to use
Decision: use GitHub Actions
- High affinity with GitHub features, including Pull Requests
- Shared management mechanisms such as organization secrets have become available
- You don't have to check other GUIs while deploying

Slide 17

Slide 17 text

Workflow design
Because all the processing happens within one PR, a deployment cannot be triggered by a branch merge.
So we decided to use Git tags as the means of tracking deployments.
- https://argoproj.github.io/argo-cd/user-guide/tracking_strategies/#tag-tracking
Create the following tags
- _release: for deployment
- _: for history
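Moving a deployment tag to a new commit can be sketched as follows. This is an illustration under assumptions: the tag names follow the canary_release and prod_release tags mentioned later in this talk, while the function `retag_commands`, the `env` parameter, and the default remote name are hypothetical. The function only builds the git command lines and does not run them.

```python
from typing import List


def retag_commands(env: str, commit: str, remote: str = "origin") -> List[List[str]]:
    """Build the git commands that point the <env>_release tag at a commit.

    `env` is assumed to be "canary" or "prod"; nothing is executed here.
    """
    if env not in ("canary", "prod"):
        raise ValueError(f"unknown environment: {env}")
    tag = f"{env}_release"
    return [
        # Recreate the tag locally, replacing any existing tag of that name.
        ["git", "tag", "--force", tag, commit],
        # Overwrite the tag on the remote so a tag-tracking tool sees the change.
        ["git", "push", "--force", remote, f"refs/tags/{tag}"],
    ]
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` from a workflow job.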

Slide 18

Slide 18 text

Workflow design
About Tech Lead approval
Since the pull request itself has an approve function, we decided to use it.
We prepared a dedicated team for Tech Leads and check the approval in the following two places.
- GitHub Actions
- Argo CD PreSync
  - This covers the case where a tag is created outside the deployment flow
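The approval check could look roughly like the sketch below. It assumes review objects shaped like those returned by the GitHub REST API (a `state` field and a `user.login` field, in chronological order); the function name and the way the Tech Lead team membership is supplied are hypothetical.

```python
from typing import Dict, List, Set


def approved_by_tech_lead(reviews: List[dict], tech_leads: Set[str]) -> bool:
    """Return True if at least one Tech Lead's latest decisive review is an approval.

    `reviews` is assumed to be in chronological order. COMMENTED and PENDING
    reviews are ignored because they do not change an earlier decision.
    """
    decisive = {"APPROVED", "CHANGES_REQUESTED", "DISMISSED"}
    latest: Dict[str, str] = {}
    for review in reviews:
        if review["state"] in decisive:
            # Later reviews by the same user overwrite earlier ones.
            latest[review["user"]["login"]] = review["state"]
    return any(
        state == "APPROVED" and login in tech_leads
        for login, state in latest.items()
    )
```

Running the same check in both GitHub Actions and an Argo CD PreSync hook means a tag created by hand still cannot bypass the approval.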

Slide 19

Slide 19 text

Workflow design
How to proceed through the workflow steps in one PR
- Following the approach used by prow, each step is triggered by a comment starting with a slash
- Setting up prow just for this purpose would be over-engineering, so I decided to implement it with GitHub Actions.
- How to do it with GitHub Actions
  - PR comments can be received via the issue_comment event
  - However, detailed PR information is not available in the issue_comment event.
  - So the comment is converted into the labeled event of pull_request by adding a label to the PR.
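The comment-to-label conversion described above can be sketched like this. The payload fields used (`issue.pull_request` and `comment.body`) are part of GitHub's issue_comment webhook payload; the label names, the `COMMANDS` table, and the function name are assumptions for illustration.

```python
from typing import Optional

# Slash commands from the talk mapped to hypothetical label names.
COMMANDS = {
    "/canary_release": "canary_release",
    "/prod_release": "prod_release",
}


def label_for_comment(event: dict) -> Optional[str]:
    """Return the label to add for an issue_comment event, or None.

    Only comments on pull requests whose first word is a known slash
    command produce a label.
    """
    # issue_comment fires for plain issues too; PRs carry a pull_request key.
    if "pull_request" not in event.get("issue", {}):
        return None
    words = event.get("comment", {}).get("body", "").split()
    if not words:
        return None
    return COMMANDS.get(words[0])
```

A workflow listening for issue_comment would add the returned label to the PR, and a second workflow listening for the labeled event of pull_request would then run the actual step with full PR information.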

Slide 20

Slide 20 text

Workflow design
How should the processing for each step be handled?
- All applications need to perform the operations related to the deployment flow
Manage the scripts that do the processing in another repository
- In the workflow job, use the checkout action to fetch the scripts and run them
- A Personal Access Token is required to access them from GitHub Actions
- In return, the scripts can be managed centrally.

Slide 21

Slide 21 text

Workflow design
How to run automated tests?
- All applications need to perform the operations related to the deployment flow
Test cases are also managed in another repository in the same way
- Fetch the test cases managed in the other repository in the workflow
- Run them in parallel
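Running the fetched test cases in parallel can be sketched with Python's standard thread pool. This is a minimal illustration, not the actual test runner; representing each test case as a callable that returns True on success is an assumption, as is the function name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_in_parallel(
    tests: Dict[str, Callable[[], bool]], max_workers: int = 4
) -> Dict[str, bool]:
    """Run each test callable in a thread pool and collect pass/fail results.

    A test that raises an exception is recorded as failed.
    """
    results: Dict[str, bool] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything first so the tests actually overlap.
        futures = {name: pool.submit(test) for name, test in tests.items()}
        for name, future in futures.items():
            try:
                results[name] = bool(future.result())
            except Exception:
                results[name] = False
    return results
```

A deployment step would then continue only if `all(results.values())` is true.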

Slide 22

Slide 22 text

Deployment flow

Slide 23

Slide 23 text

Deployment flow
1. Create a PR and get approval from Tech-Lead
2. Deploy to Canary and integration test
3. Manual testing
4. Deploy to Production
- As a result of the design, we settled on this deployment flow
- Step 1 can be done in advance
- The actual deployment flow is steps 2-4

Slide 24

Slide 24 text

Deployment flow
1. Create a PR and get approval from Tech-Lead
2. Deploy to Canary and integration test
3. Manual testing
4. Deploy to Production
Create a PR for deployment
- We are using Kustomize for deployment
- Use Kustomize's images feature to specify the image version of the application you want to deploy
- Get approval from Tech-Lead for the created PR
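The change a deployment PR makes can be sketched as an update to the `images` list of a kustomization, here represented as an already-parsed dictionary. The `name` and `newTag` keys are Kustomize's own field names; the function name and the example image name are hypothetical.

```python
def set_image_tag(kustomization: dict, image: str, new_tag: str) -> dict:
    """Return a copy of the kustomization with `image` pinned to `new_tag`.

    If the image has no entry in `images` yet, one is appended.
    """
    # Copy the entries so the caller's data is not modified.
    images = [dict(entry) for entry in kustomization.get("images", [])]
    for entry in images:
        if entry.get("name") == image:
            entry["newTag"] = new_tag
            break
    else:
        # No existing entry matched, so add a new one.
        images.append({"name": image, "newTag": new_tag})
    return {**kustomization, "images": images}
```

For example, `set_image_tag(k, "example/app", "1.2.3")` would produce the content to commit in the PR that the Tech Lead reviews.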

Slide 25

Slide 25 text

Deployment flow
1. Create a PR and get approval from Tech-Lead
2. Deploy to Canary and integration test
3. Manual testing
4. Deploy to Production
The process is started by the /canary_release comment. It runs the following:
- Create canary_release tag
- Deploy to canary environment
- Automatic QA test in Canary environment

Slide 26

Slide 26 text

Deployment flow
1. Create a PR and get approval from Tech-Lead
2. Deploy to Canary and integration test
3. Manual testing
4. Deploy to Production
If checks beyond those covered by automatic QA are required, perform them manually.
Register the check result with the following comment.

Slide 27

Slide 27 text

Deployment flow
1. Create a PR and get approval from Tech-Lead
2. Deploy to Canary and integration test
3. Manual testing
4. Deploy to Production
The process is started by the /prod_release comment. It runs the following:
- Create prod_release tag
- Deploy to production environment
We're also using argo-rollouts, so this deployment is a gradual release, which makes it even safer.

Slide 28

Slide 28 text

Deployment flow (Rollback)
1. Create a PR and get approval from Tech-Lead
2. Deploy to Canary and integration test
3. Manual testing
4. Deploy to Production
5. Production environment rollback
If a rollback is required for some reason, you can roll back with the following comments
This process does the following
- Recreate prod_release tag with previous content
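Choosing what "previous content" means for the recreated tag can be sketched as picking the entry before the current one in a deployment history. This is an assumption about how the history might be represented (a list of commit identifiers, oldest first); the function name is hypothetical.

```python
from typing import List


def rollback_target(history: List[str]) -> str:
    """Return the commit deployed immediately before the current one.

    `history` is assumed to list deployed commits oldest first, with the
    current deployment last.
    """
    if len(history) < 2:
        raise ValueError("no previous deployment to roll back to")
    return history[-2]
```

The release tag would then be recreated on the returned commit, and the same selection would apply to the canary environment's history.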

Slide 29

Slide 29 text

Deployment flow (Rollback)
1. Create a PR and get approval from Tech-Lead
2. Deploy to Canary and integration test
3. Manual testing
4. Deploy to Production
5. Production environment rollback
6. Canary environment rollback
After the production rollback, the canary environment also needs to be rolled back.
This process does the following
- Recreate canary_release tag with previous content

Slide 30

Slide 30 text

Conclusion

Slide 31

Slide 31 text

Conclusion
- A canary environment was prepared to solve the compatibility/dependency problems between services that the existing deployment flow had.
Diagram labels: Dedicated Load Balancer, Public Load Balancer, Canary Environment, Canary environment App, Public App, Production Environment

Slide 32

Slide 32 text

Conclusion
- A deployment flow was established to keep the environment maintained and to make test execution mandatory.
1. Pull Request
2. approve
3. canary_release
4. Canary deploy
5. Integration test
6. Manual test
7. prod_release
8. Prod deploy
Diagram labels: developer, approver, K8s manifests repository, Canary cluster, Production cluster, Canary release, Integration test, Manual test result registration, Prod release

Slide 33

Slide 33 text

Conclusion
- In this way, a canary environment is prepared just before the production deployment, and the final test is performed there to reduce the probability of an incident.
- In the future, we plan to make the following improvements.
  - Reduce deployment time
  - Automatic rollback when a fault is detected
  - Etc, etc...

Slide 34

Slide 34 text

Thank you for your attention!

Slide 35

Slide 35 text

Appendix

Slide 36

Slide 36 text

Icons used in this slide, etc.
- Kubernetes:
  - https://github.com/kubernetes/kubernetes/tree/master/logo
  - https://github.com/kubernetes/community/tree/master/icons
- GitHub: https://github.com/logos
- Argo proj: https://cncf-branding.netlify.app/projects/argo/
- Icon font
  - https://github.com/google/material-design-icons
  - https://fontawesome.com/license/free

Slide 37

Slide 37 text

Assets

Slide 38

Slide 38 text

Deployment flow
1. Pull Request
2. approve
3. canary_release
4. Canary deploy
5. Integration test
6. Manual test
7. prod_release
8. Prod deploy
Diagram labels: developer, approver, K8s manifests repository, Canary cluster, Production cluster, Canary release, Integration test, Manual test result registration, Prod release