Deployment flow of PayPay Apps

PayPay Corporation.

July 07, 2021

Transcript

  1. Deployment flow of PayPay Apps
    PayPay DevTalk Vol.1
    DevSecOps & DevOps
    Masaya Ozawa

  2. About me
    Masaya Ozawa
    - Joined PayPay in July 2018
    - 2018/07 ~ Backend engineer
    - 2019/05 ~ Infrastructure engineer
    Work From Anywhere
    - I'm improving my home environment

  3. Deployment flow of PayPay Apps

  4. Background

  5. Previous deployment flow
    1. Pull Request
    2. Merge
    3. Deploy
    Diagram labels: K8s manifests repository, non-production clusters, production cluster
    Reference: "Case study of k8s usage at PayPay", Kubernetes Meetup Tokyo #22
    https://www.slideshare.net/PayPay_career/paypayk8s

  6. About Argo CD
    - Part of the open-source Argo project
    - https://argoproj.github.io/argocd
    - Provides a CRD that keeps the state of the cluster in sync with the manifests on GitHub
    - Can deploy various middleware and applications, including Argo CD itself

  7. Previous deployment flow
    1. Pull Request
    2. Merge
    3. Deploy
    Diagram labels: K8s manifests repository, non-production clusters, production cluster
    Reference: "Case study of k8s usage at PayPay", Kubernetes Meetup Tokyo #22
    https://www.slideshare.net/PayPay_career/paypayk8s

  8. Previous deployment flow issues
    - As the number of applications grew, problems such as dependencies between deployments increased.
      - For example, the deployment order when a feature is added across multiple services.
    - Until now this was handled through communication.
    - However, incidents still occurred, so we decided to look into a mechanism to prevent them.
    1. Pull Request
    2. Merge
    3. Deploy
    Diagram labels: K8s manifests repository, non-production clusters, production cluster

  9. Previous deployment flow issues
    Issue
    - When deploying to production, it can be difficult to tell whether the versions of the applications already running in production meet the requirements of the application being deployed.
    - Because features are under active development, the stg cluster runs newer versions than the production cluster.
    Solution
    - Build an environment equivalent to production and always run integration tests on it

  10. Canary Environment
    Diagram labels:
    - Canary environment app, Dedicated Load Balancer, Canary Environment
    - Public app, Public Load Balancer, Production Environment
    Notes on the canary environment:
    - Run and verify automated tests on all releases
    - Operation check with a dedicated app
    - DB is shared with production
    - Safely verify operations equivalent to production

  11. Canary Environment
    Requirements
    - Automated tests must be run in this environment as part of every production deployment.
    - The canary environment must always be kept in a production-equivalent state
      - Image versions of applications, etc.
    Integrate these requirements into our deployment flow!
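
    The requirement that the canary environment stay production-equivalent can be checked mechanically. The following is a minimal sketch, assuming each environment's deployed image tags are available as a plain mapping from application name to tag. The function name, application names and tags are hypothetical and do not come from the talk.

```python
def find_version_drift(canary, production):
    """Return {app: (canary_tag, production_tag)} for every app whose tags differ.

    An app missing from one environment is reported with None on that side.
    """
    drift = {}
    for app in sorted(set(canary) | set(production)):
        canary_tag = canary.get(app)
        production_tag = production.get(app)
        if canary_tag != production_tag:
            drift[app] = (canary_tag, production_tag)
    return drift


# Hypothetical application names and image tags, for illustration only.
canary = {"payment": "1.4.2", "wallet": "2.0.0", "notification": "0.9.1"}
production = {"payment": "1.4.2", "wallet": "1.9.8"}

drift = find_version_drift(canary, production)
for app, (canary_tag, production_tag) in drift.items():
    print(f"{app}: canary={canary_tag} production={production_tag}")
```

    An empty result would mean the two environments run the same image versions; anything else is a candidate for blocking the deployment.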

  12. Deployment flow design

  13. Workflow design
    A production deployment performs the following steps within a single PR.
    1. Create a PR in the manifests repository
    2. Tech Lead approval
    3. Deploy to the canary environment
    4. Automated integration tests
    5. Manual testing of features not covered by the automated tests
    6. Deploy to the production environment
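
    The ordering constraint in these steps can be modelled as a small state machine. This is a sketch only: the class and step names are hypothetical, and it treats every step as mandatory, which simplifies the flow described in the talk.

```python
from enum import IntEnum


class Step(IntEnum):
    PR_CREATED = 1
    TECH_LEAD_APPROVED = 2
    CANARY_DEPLOYED = 3
    INTEGRATION_TESTED = 4
    MANUALLY_TESTED = 5
    PRODUCTION_DEPLOYED = 6


class DeploymentPR:
    """Tracks how far one deployment PR has progressed."""

    def __init__(self):
        self.completed = Step.PR_CREATED

    def advance(self, step):
        # Only the immediate next step is allowed; skipping is rejected.
        if step != self.completed + 1:
            raise ValueError(f"cannot run {step.name} after {self.completed.name}")
        self.completed = step


pr = DeploymentPR()
pr.advance(Step.TECH_LEAD_APPROVED)
pr.advance(Step.CANARY_DEPLOYED)
try:
    pr.advance(Step.PRODUCTION_DEPLOYED)  # the tests have not run yet
except ValueError as error:
    print(error)
```

    Rejecting out-of-order steps in one place is what makes the canary tests effectively mandatory before production.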

  14. Which technology to use
    Which technology should we use to build the deployment flow?
    - Jenkins?
    - GitHub Actions?
    - Or other OSS?

  15. Previous deployment flow (repost)
    1. Pull Request
    2. Merge
    3. Deploy
    Diagram labels: K8s manifests repository, non-production clusters, production cluster

  16. Which technology to use
    We chose GitHub Actions
    - Integrates well with GitHub features such as pull requests
    - Shared management mechanisms such as organization secrets have become available
    - No need to check another GUI while deploying

  17. Workflow design
    If all the processing is done within one PR, deployments cannot be triggered by a branch merge.
    So we decided to use tags in the GitHub repository as the means of tracking deployments.
    - https://argoproj.github.io/argo-cd/user-guide/tracking_strategies/#tag-tracking
    Create the following tags:
    - _release (for deployment)
    - _ (for history)
  18. Workflow design
    About Tech Lead approval:
    Since pull requests already have an approval feature, we decided to use it.
    We set up a dedicated team for Tech Leads and check the approval in the following two places.
    - GitHub Actions
    - Argo CD PreSync
      - This covers the case where a tag is created outside the deployment flow
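
    The approval check itself can be sketched as follows. It assumes the reviews are passed in chronological order as dictionaries with `user.login` and `state` fields, which is the shape the GitHub REST API uses for pull request reviews. The function name and the logins are hypothetical, and how the Tech Lead team membership is obtained is left out.

```python
def approved_by_tech_lead(reviews, tech_leads):
    """Return True if at least one Tech Lead's latest decisive review is an approval."""
    latest = {}
    for review in reviews:  # assumed to be in chronological order
        state = review["state"]
        # Plain comments do not change a reviewer's standing decision.
        if state in ("APPROVED", "CHANGES_REQUESTED", "DISMISSED"):
            latest[review["user"]["login"]] = state
    return any(
        state == "APPROVED" and login in tech_leads
        for login, state in latest.items()
    )


# Hypothetical reviewers, for illustration only.
reviews = [
    {"user": {"login": "alice"}, "state": "APPROVED"},
    {"user": {"login": "bob"}, "state": "COMMENTED"},
    {"user": {"login": "alice"}, "state": "CHANGES_REQUESTED"},
    {"user": {"login": "carol"}, "state": "APPROVED"},
]
print(approved_by_tech_lead(reviews, {"alice"}))
print(approved_by_tech_lead(reviews, {"carol"}))
```

    Running the same check in both GitHub Actions and an Argo CD PreSync hook means a tag pushed by hand still cannot deploy an unapproved change.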

  19. Workflow design
    How to advance the workflow steps within one PR
    - Following the example of Prow, each step is triggered by posting a comment that starts with a slash
    - Setting up Prow just for this purpose would be over-engineering, so we decided to implement it with GitHub Actions.
    - How to do it with GitHub Actions
      - PR comments can be received through the issue_comment event
      - However, detailed PR information is not available in the issue_comment event.
      - So the workflow adds a label, which converts the comment into a labeled event of pull_request.
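
    The comment-to-label conversion can be sketched like this. The two commands are the ones named in this talk; the label names they map to and the function name are assumptions.

```python
# Commands named in this talk; the label names they map to are hypothetical.
COMMAND_LABELS = {
    "/canary_release": "canary_release",
    "/prod_release": "prod_release",
}


def label_for_comment(body):
    """Return the label to add for a PR comment, or None if it is not a command."""
    words = body.split()
    if not words:
        return None
    # Only the first word counts: the comment must start with the slash command.
    return COMMAND_LABELS.get(words[0])


print(label_for_comment("/canary_release"))
print(label_for_comment("looks good to me"))
print(label_for_comment("  /prod_release please"))
```

    A workflow listening on issue_comment would add the returned label to the PR, and a second workflow listening on the labeled event of pull_request would then run the step with full PR context.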

  20. Workflow design
    How should the processing for each step be implemented?
    - Every application needs to perform the operations of the deployment flow
    Manage the scripts for these operations in a separate repository
    - In the workflow job, use the checkout action to fetch and run the scripts
    - A personal access token is required to do this from GitHub Actions
    - In return, the scripts can be managed centrally.

  21. Workflow design
    How are the automated tests run?
    - Every application needs to perform the operations of the deployment flow
    Test cases are likewise managed in a separate repository
    - The workflow fetches the test cases from that repository
    - The test cases are run in parallel
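
    Running the fetched test cases in parallel and collecting one pass/fail result per case can be sketched as below. Threads stand in for whatever parallelism the actual workflow uses, and the test case names and bodies are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(test_cases, max_workers=4):
    """Run every test case concurrently and return {name: passed}."""

    def run_one(item):
        name, test = item
        try:
            test()
            return name, True
        except Exception:
            # Any exception, including a failed assertion, marks the case as failed.
            return name, False

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, test_cases.items()))


# Hypothetical test cases standing in for the ones fetched from the test repository.
def check_payment():
    assert 1 + 1 == 2


def check_wallet():
    raise AssertionError("balance mismatch")


results = run_in_parallel({"payment": check_payment, "wallet": check_wallet})
print(results)
print("all passed" if all(results.values()) else "blocked: some tests failed")
```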

  22. Deployment flow

  23. Deployment flow
    1. Create a PR and get approval from Tech-Lead
    2. Deploy to Canary and integration test
    3. Manual testing
    4. Deploy to Production
    - As a result of the design, we settled on this deployment flow
    - Step 1 can be done in advance
    - The actual deployment consists of steps 2-4

  24. Deployment flow
    1. Create a PR and get approval from Tech-Lead
    2. Deploy to Canary and integration test
    3. Manual testing
    4. Deploy to Production
    Create a PR for deployment
    - We use Kustomize for deployment
    - Use Kustomize's images feature to specify the image version of the application you want to deploy
    - Get approval from a Tech Lead for the created PR
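
    The change such a PR makes can be sketched as an update to the `images` list of a parsed kustomization.yaml. `images`, `name` and `newTag` are Kustomize fields; the helper name, image name and tags are hypothetical, and parsing and writing the YAML file is left out.

```python
import copy


def set_image_tag(kustomization, image_name, new_tag):
    """Return a copy of a parsed kustomization.yaml with `image_name` pinned to `new_tag`."""
    updated = copy.deepcopy(kustomization)
    images = updated.setdefault("images", [])
    for entry in images:
        if entry.get("name") == image_name:
            entry["newTag"] = new_tag
            break
    else:
        # No entry for this image yet, so add one.
        images.append({"name": image_name, "newTag": new_tag})
    return updated


# Hypothetical image name and tags, for illustration only.
kustomization = {
    "resources": ["deployment.yaml"],
    "images": [{"name": "example/payment", "newTag": "1.4.1"}],
}
print(set_image_tag(kustomization, "example/payment", "1.4.2")["images"])
```

    Because only the tag changes, the PR diff shows exactly which version is about to be deployed.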

  25. Deployment flow
    1. Create a PR and get approval from Tech-Lead
    2. Deploy to Canary and integration test
    3. Manual testing
    4. Deploy to Production
    This step is triggered by the /canary_release comment.
    It performs the following:
    - Create the canary_release tag
    - Deploy to the canary environment
    - Run the automated QA tests in the canary environment

  26. Deployment flow
    1. Create a PR and get approval from Tech-Lead
    2. Deploy to Canary and integration test
    3. Manual testing
    4. Deploy to Production
    If checks beyond the automated QA cases are required, perform them manually.
    Register the check result with a PR comment.

  27. Deployment flow
    1. Create a PR and get approval from Tech-Lead
    2. Deploy to Canary and integration test
    3. Manual testing
    4. Deploy to Production
    This step is triggered by the /prod_release comment.
    It performs the following:
    - Create the prod_release tag
    - Deploy to the production environment
    We also use Argo Rollouts, so this deployment is rolled out gradually, which makes it even safer.

  28. Deployment flow (Rollback)
    1. Create a PR and get approval from Tech-Lead
    2. Deploy to Canary and integration test
    3. Manual testing
    4. Deploy to Production
    5. Production environment rollback
    If a rollback is required for some reason, it can be triggered with a PR comment.
    It performs the following:
    - Recreate the prod_release tag with the previous content
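
    Picking the "previous content" for a rollback can be sketched as a lookup in the deployment history. This assumes the history is available as a list of deployed commits, oldest first (for example, derived from the history tags). The function name and commit IDs are hypothetical.

```python
def rollback_target(history, current):
    """Return the commit deployed immediately before `current`.

    `history` lists deployed commits, oldest first.
    """
    if current not in history:
        raise ValueError(f"{current} was never deployed")
    # Use the most recent occurrence in case a commit was deployed more than once.
    index = len(history) - 1 - history[::-1].index(current)
    if index == 0:
        raise ValueError("no earlier deployment to roll back to")
    return history[index - 1]


# Hypothetical commit IDs, for illustration only.
history = ["a1", "b2", "c3"]
print(rollback_target(history, "c3"))
```

    The release tag would then be recreated at the returned commit, and the tag-tracking sync takes care of the rest.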

  29. Deployment flow (Rollback)
    1. Create a PR and get approval from Tech-Lead
    2. Deploy to Canary and integration test
    3. Manual testing
    4. Deploy to Production
    5. Production environment rollback
    6. Canary environment rollback
    After the production rollback, the canary environment also needs to be rolled back.
    It performs the following:
    - Recreate the canary_release tag with the previous content

  30. Conclusion

  31. Conclusion
    - A canary environment was set up to solve the compatibility and dependency problems between services that the previous deployment flow had.
    Diagram labels:
    - Canary environment app, Dedicated Load Balancer, Canary Environment
    - Public app, Public Load Balancer, Production Environment

  32. Conclusion
    - A deployment flow was established to keep the canary environment maintained and to make running the tests mandatory.
    Flow through the K8s manifests repository:
    1. Pull Request (developer)
    2. Approve (approver)
    3. canary_release
    4. Canary deploy (Canary cluster)
    5. Integration test
    6. Manual test
    7. prod_release
    8. Prod deploy (Production cluster)
    Stages shown on the PR: Canary release, Integration test, Manual test result registration, Prod release

  33. Conclusion
    - In this way, the canary environment sits directly before the production deployment, and the final tests are run there to reduce the probability of an incident.
    - In the future, we plan to make the following improvements.
      - Reduce deployment time
      - Automatic rollback when a fault is detected
      - And more

  34. Thank you for your attention!

  35. Appendix

  36. Icons used in this slide, etc.
    - Kubernetes :
    - https://github.com/kubernetes/kubernetes/tree/master/logo
    - https://github.com/kubernetes/community/tree/master/icons
    - GitHub: https://github.com/logos
    - Argo proj: https://cncf-branding.netlify.app/projects/argo/
    - Icon font
    - https://github.com/google/material-design-icons
    - https://fontawesome.com/license/free

  37. Assets

  38. Deployment flow
    Flow through the K8s manifests repository:
    1. Pull Request (developer)
    2. Approve (approver)
    3. canary_release
    4. Canary deploy (Canary cluster)
    5. Integration test
    6. Manual test
    7. prod_release
    8. Prod deploy (Production cluster)
    Stages shown on the PR: Canary release, Integration test, Manual test result registration, Prod release
