

[OSS 2022] Improving CI/CD Process for Cloud Native Python Applications with PyPI Cloud

Python remains one of the most popular languages for building data science applications, thanks to its simplicity and its wide support for open-source data science projects. While it is easy to integrate such projects into your own Python applications, the CI/CD pipelines of those applications become heavier and more tightly coupled, even on modern containerized infrastructure. PyPI Cloud is an open-source private PyPI server that lets users store and distribute their own Python packages in a cloud-native way. Besides providing easy-to-use package management similar to the public PyPI, it helps users build flexible, manageable Python applications by decoupling the CI/CD process of Python packages from application containerization. This talk explores how to design cloud infrastructure that lets developers focus on writing interesting Python code without worrying about how to build, deploy, or distribute it. Using a real-world example, we will walk through strategies for improving each step of the CI/CD pipeline of your Python applications with PyPI Cloud.

Jae Sim

June 12, 2024

Transcript

  1. Improving CI/CD Process for Cloud Native Python Applications with PyPI Cloud
    Jaehyun Sim | Ikigai #ossummit @_simjay
  2. Motivation
    As a tech startup:
    1. We did not spend much time on our cloud infrastructure and CI/CD process
    2. We spent lots of time trying too many options
  3. Overview
    • Where We Started
    • CI/CD Challenges
    • PyPI: The Python Package Index
    • Cloud Architecture with PyPI
  4. Where We Started
    To build a data science platform in the cloud, we wanted to set up a microservice ecosystem with Python … and a CI/CD pipeline for the services.
  5. Continuous Integration (CI)
    Write Python code -> Compile Protobuf -> Run unittest -> Merge to main branch -> Build Docker Image
  6. Continuous Deployment (CD)
    Development: Deploy to development Kubernetes cluster -> Run platform tests against the development cluster (written in Python)
    Production: Deploy to production Kubernetes cluster -> Run platform tests against the production cluster (written in Python)
  7. Challenge: Shared Codebase -> Inefficiency
    API Server, ETL Service, ML Service: changes to the shared codebase trigger the entire pipeline.
  8. Libraries in Python
    Module: A bunch of related code saved in a file with the extension `.py`.
    Package: A directory containing a collection of modules. To be considered a regular package, a directory must contain a file named `__init__.py` (since Python 3.3, namespace packages may omit it).
    Library: A collection of packages. "Library" is an umbrella term for a reusable chunk of code.
  9. Why Is This Bad?
    Problems:
    1. We still need to re-deliver services whenever a library update is required
    2. It is hard to keep track of which library versions each service version uses
    -> Decoupling is still required at the CI/CD level
  10. Are We Done?
    We can decouple the CI/CD processes of libraries and services “if” we have a Python package repository.
  11. Cloud Native?
    “Cloud native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure, and declarative APIs exemplify this approach. These techniques enable loosely coupled systems that are resilient, manageable, and observable. Combined with robust automation, they allow engineers to make high-impact changes frequently and predictably with minimal toil.” - Cloud Native Computing Foundation (https://github.com/cncf/foundation/blob/main/charter.md)
  12. PyPI Server: Scaling Issue?
    PyPI Server scans the entire package directory on every HTTP request, which causes significant slow-downs when serving thousands of packages. It can be sped up with caching:
    • pip install pypiserver[cache]
    • Enabling caching at the reverse proxy
    Reference: https://pypi.org/project/pypiserver/#serving-thousands-of-packages
  13. PyPI Server vs PyPI Cloud
                    PyPI Server          PyPI Cloud
    Portability     Yes                  Yes
    Security        Yes                  Yes
    Resiliency      Yes (with effort)    Yes
    Speed           Yes (with effort)    Yes
    PyPI Server: Disk (NFS for resiliency) | Install manually | Containerized
    PyPI Cloud: Hosted solution | Hosted solution | Containerized
  14. In Conclusion
    By using PyPI Cloud, we can:
    1. Organize the codebase into libraries and services without sacrificing the efficiency of the CI/CD pipeline
    2. Host a scalable and secure Python package repository for a cloud native environment
    3. Let engineers focus on building interesting stuff!
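Slide 5 lists the CI stages in order. As a rough illustration, they can be written as an ordered list of commands that stops at the first failure. This is a minimal sketch, not the talk's actual pipeline: the proto path, the `tests/` directory, the `myservice:latest` tag, and the use of `grpc_tools.protoc` (from the grpcio-tools package) are all assumptions. The "merge to main" step is modeled only as a flag, so the Docker image is built only for merged code.

```python
import shlex
import subprocess

# Hypothetical repository layout and image name (assumptions, not from the talk).
PROTO_FILE = "protos/service.proto"
IMAGE_TAG = "myservice:latest"


def ci_commands(on_main_branch: bool) -> list[list[str]]:
    """Return the CI steps in order; the Docker image is built only after merge to main."""
    steps = [
        # Compile Protobuf definitions into Python modules (needs grpcio-tools).
        ["python", "-m", "grpc_tools.protoc", "-Iprotos",
         "--python_out=.", "--grpc_python_out=.", PROTO_FILE],
        # Run the unit tests found under tests/.
        ["python", "-m", "unittest", "discover", "-s", "tests"],
    ]
    if on_main_branch:
        # Build the image only for code that has already been merged to main.
        steps.append(["docker", "build", "-t", IMAGE_TAG, "."])
    return steps


def run_ci(on_main_branch: bool, dry_run: bool = True) -> None:
    """Print each step; with dry_run=False, execute it and stop at the first failure."""
    for cmd in ci_commands(on_main_branch):
        print("$", shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, which aborts the remaining steps.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_ci(on_main_branch=True)
```

The dry-run default keeps the sketch safe to run anywhere; a CI system would call `run_ci(..., dry_run=False)`.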
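Slide 6 pairs each Kubernetes deployment with Python platform tests. A common way to connect the two columns, assumed here rather than taken from the slide, is to promote an image to production only after the development cluster's platform tests pass. In this hypothetical sketch, `deploy` and `platform_tests_pass` are injected callables standing in for real `kubectl`/Helm calls and the Python test suite, so the gating logic can run without a cluster.

```python
from typing import Callable

# Hypothetical cluster names. The deploy and test callables are injected so the
# promotion logic can be exercised without a real Kubernetes cluster.
DEV_CLUSTER = "development"
PROD_CLUSTER = "production"


def continuous_deploy(
    image: str,
    deploy: Callable[[str, str], None],
    platform_tests_pass: Callable[[str], bool],
) -> list[str]:
    """Deploy to development, gate on platform tests, then promote to production.

    Returns the clusters that received the new image.
    """
    deploy(DEV_CLUSTER, image)
    if not platform_tests_pass(DEV_CLUSTER):
        # Never promote an image whose platform tests fail on development.
        return [DEV_CLUSTER]
    deploy(PROD_CLUSTER, image)
    if not platform_tests_pass(PROD_CLUSTER):
        # A real pipeline would alert or roll back here.
        raise RuntimeError(f"platform tests failed on {PROD_CLUSTER} for {image}")
    return [DEV_CLUSTER, PROD_CLUSTER]


if __name__ == "__main__":
    deployed = []
    result = continuous_deploy(
        "myservice:1.0",
        deploy=lambda cluster, image: deployed.append((cluster, image)),
        platform_tests_pass=lambda cluster: True,  # stand-in for the Python platform tests
    )
    print(result)  # ['development', 'production']
```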
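Slide 7's inefficiency comes from how changed files map to pipelines: a change inside one service only needs that service's pipeline, but a change to shared code touches every service. The sketch below assumes a hypothetical monorepo with `services/<name>/` directories and a `shared/` directory; the layout and names are illustrative only.

```python
from pathlib import PurePosixPath

# Hypothetical monorepo layout: one directory per service plus shared code.
SERVICES = {
    "api_server": "services/api_server",
    "etl_service": "services/etl_service",
    "ml_service": "services/ml_service",
}
SHARED_DIR = "shared"


def services_to_rebuild(changed_files: list[str]) -> set[str]:
    """Map the files changed in a commit to the services whose pipeline must run."""
    affected = set()
    for path in changed_files:
        p = PurePosixPath(path)
        if p.parts and p.parts[0] == SHARED_DIR:
            # Any change to shared code invalidates every service that embeds it.
            return set(SERVICES)
        for name, directory in SERVICES.items():
            if p.is_relative_to(directory):
                affected.add(name)
    return affected


if __name__ == "__main__":
    print(services_to_rebuild(["services/etl_service/main.py"]))
    print(services_to_rebuild(["shared/utils.py"]))
```

Publishing the shared code as a versioned package is what breaks the "shared change rebuilds everything" branch: services then depend on a pinned version instead of the source tree.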
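To make slide 8's module and package definitions concrete, the following sketch builds a throwaway regular package on disk and imports it. The package name `mylib`, its `textutils` module, and the `slugify` helper are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package "mylib" (hypothetical name) in a temporary directory:
#   mylib/__init__.py   -> marks the directory as a regular package
#   mylib/textutils.py  -> a module: related code saved in one .py file
root = Path(tempfile.mkdtemp())
pkg = root / "mylib"
pkg.mkdir()
(pkg / "__init__.py").write_text('__version__ = "0.1.0"\n')
(pkg / "textutils.py").write_text(
    "def slugify(text):\n"
    "    return '-'.join(text.lower().split())\n"
)

# Make the temporary directory importable, then import the package and its module.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
mylib = importlib.import_module("mylib")
textutils = importlib.import_module("mylib.textutils")

print(mylib.__version__)                         # 0.1.0
print(textutils.slugify("Cloud Native Python"))  # cloud-native-python
```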
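Slide 9's second problem, tracking which library version each service uses, can at least be made visible. This is a minimal sketch that assumes each service pins the shared library with an exact `name==version` line; the service names, requirement strings, and `mylib` are hypothetical, and real requirement files (ranges, extras, markers) would need a proper parser such as the `packaging` library.

```python
import re
from typing import Optional

# Hypothetical pinned requirements, one requirements.txt per service.
SERVICE_REQUIREMENTS = {
    "api_server": "mylib==1.4.0\nrequests==2.31.0\n",
    "etl_service": "mylib==1.2.1\n",
    "ml_service": "mylib==1.4.0\nnumpy==1.26.4\n",
}

# This sketch only understands exact "name==version" pins.
PIN = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;#]+)")


def normalize(name: str) -> str:
    """Normalize a project name as PEP 503 does (runs of -_. become -, lowercased)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def pinned_version(requirements: str, library: str) -> Optional[str]:
    """Return the version pinned for `library`, or None if it is not pinned."""
    for line in requirements.splitlines():
        match = PIN.match(line)
        if match and normalize(match.group(1)) == normalize(library):
            return match.group(2)
    return None


def version_report(library: str) -> dict[str, Optional[str]]:
    """Map each service to the version of `library` it pins."""
    return {
        service: pinned_version(reqs, library)
        for service, reqs in SERVICE_REQUIREMENTS.items()
    }


if __name__ == "__main__":
    report = version_report("mylib")
    if len(set(report.values())) > 1:
        print("Version drift detected:", report)
```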
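Slide 10's decoupling relies on two independent pipelines meeting at a package repository: the library pipeline builds and uploads a distribution, and each service pipeline installs a pinned version from the private index. The sketch below only assembles the commands. The URLs are placeholders, since the exact upload and simple-index paths depend on how your PyPI Cloud (or pypiserver) instance is configured, and `build` and `twine` are assumed to be installed.

```python
import glob
import shlex
import sys

# Placeholder URLs: real paths depend on your private server's configuration.
PRIVATE_INDEX = "https://pypi.internal.example.com/simple/"
UPLOAD_URL = "https://pypi.internal.example.com/"


def build_command(dist_dir: str = "dist") -> list[str]:
    """Library pipeline, step 1: build an sdist and a wheel (requires `build`)."""
    return [sys.executable, "-m", "build", "--outdir", dist_dir]


def upload_command(upload_url: str = UPLOAD_URL, dist_dir: str = "dist") -> list[str]:
    """Library pipeline, step 2: upload the built files (requires `twine`).

    Call this after the build has run; the glob is expanded here, so no shell is needed.
    """
    files = sorted(glob.glob(f"{dist_dir}/*"))
    return [sys.executable, "-m", "twine", "upload", "--repository-url", upload_url, *files]


def install_command(requirement: str, index_url: str = PRIVATE_INDEX) -> list[str]:
    """Service pipeline (e.g. a Dockerfile RUN step): install a pinned library version.

    --index-url replaces public PyPI, so the private server must also serve (or proxy)
    any third-party dependencies; --extra-index-url would also query public PyPI,
    which risks dependency confusion if package names collide.
    """
    return [sys.executable, "-m", "pip", "install", "--index-url", index_url, requirement]


if __name__ == "__main__":
    print(shlex.join(build_command()))
    print(shlex.join(install_command("mylib==1.4.0")))
```

Because a service now depends on `mylib==1.4.0` rather than on the library's source tree, releasing `mylib` 1.5.0 no longer forces every service to rebuild; each service upgrades when its pin changes.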
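Slide 12's scaling issue is that each request rescans the package directory, so the cost per request grows with the number of packages. The toy model below is not pypiserver's code: it counts directory scans with and without a cache, and invalidates the cache when the directory's modification time changes. That invalidation rule is an assumption made for brevity. On file systems with coarse timestamps a change within the same tick can be missed, so a production cache would rather rely on something like file-system change notifications.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


class PackageIndex:
    """Lists package files in a directory, optionally caching the listing."""

    def __init__(self, root: str, use_cache: bool = True) -> None:
        self.root = root
        self.use_cache = use_cache
        self.scans = 0  # number of full directory scans performed
        self._cached: Optional[list[str]] = None
        self._cached_mtime: Optional[int] = None

    def _scan(self) -> list[str]:
        # O(number of files): this is the per-request cost without a cache.
        self.scans += 1
        with os.scandir(self.root) as entries:
            return sorted(e.name for e in entries if e.is_file())

    def list_packages(self) -> list[str]:
        if not self.use_cache:
            return self._scan()
        # Adding or removing a file updates the directory mtime, invalidating the cache.
        mtime = os.stat(self.root).st_mtime_ns
        if self._cached is None or mtime != self._cached_mtime:
            self._cached = self._scan()
            self._cached_mtime = mtime
        return list(self._cached)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    for name in ("mylib-1.4.0.tar.gz", "mylib-1.4.0-py3-none-any.whl"):
        Path(root, name).write_text("")
    index = PackageIndex(root)
    for _ in range(1000):
        index.list_packages()
    print("directory scans for 1000 requests:", index.scans)  # 1
```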