Slide 1

Slide 1 text

Improving CI/CD Process for Cloud Native Python Applications with PyPI Cloud
Jaehyun Sim | Ikigai
#ossummit @_simjay

Slide 2

Slide 2 text

Introduction
Jaehyun Sim
Boston, MA
LinkedIn: linkedin.com/in/simjay
@ Ikigai since 2018
Ikigai Labs: ikigailabs.io

Slide 3

Slide 3 text

Motivation
As a tech startup,
1. We did not spend much time on our cloud infrastructure and CI/CD process
2. We spent lots of time trying too many options

Slide 4

Slide 4 text

Overview
• Where We Started
• CI/CD Challenges
• PyPI: The Python Package Index
• Cloud Architecture with PyPI

Slide 5

Slide 5 text

Where We Started

Slide 6

Slide 6 text

Where We Started
To build a data science platform in the cloud, we wanted to set up a microservice ecosystem with Python
… and a CI/CD pipeline for the services

Slide 7

Slide 7 text

Continuous Integration (CI)
1. Write Python code
2. Compile Protobuf
3. Run unittest
4. Merge to main branch
5. Build Docker image

Slide 8

Slide 8 text

Continuous Deployment (CD)
1. Deploy to development Kubernetes cluster
2. Run platform tests against the development cluster (written in Python)
3. Deploy to production Kubernetes cluster
4. Run platform tests against the production cluster (written in Python)

Slide 9

Slide 9 text

CI/CD for Python Applications: “what we expected”
Services: API Server, ETL Service, ML Service

Slide 10

Slide 10 text

CI/CD Challenges

Slide 11

Slide 11 text

Challenge: Shared Codebase

Slide 12

Slide 12 text

Challenge: Shared Codebase -> Inefficiency
Services: API Server, ETL Service, ML Service
Changes to the shared codebase trigger the entire pipeline
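The inefficiency can be illustrated with a small sketch of path-based pipeline triggering. The directory layout, the service names, and the `pipelines_to_run` helper are hypothetical and only serve to show why a change to shared code fans out to every service.

```python
from pathlib import PurePosixPath

# Hypothetical repository layout: each service owns one directory and
# all of them import code from a shared directory.
SERVICE_PATHS = {
    "api-server": "services/api_server",
    "etl-service": "services/etl_service",
    "ml-service": "services/ml_service",
}
SHARED_PATH = "shared"


def _is_under(path, directory):
    """Return True if `path` lies inside `directory`."""
    return PurePosixPath(directory) in PurePosixPath(path).parents


def pipelines_to_run(changed_files):
    """Return the sorted names of the services whose pipeline must run."""
    triggered = set()
    for path in changed_files:
        if _is_under(path, SHARED_PATH):
            # Every service depends on the shared code, so all of them rebuild.
            return sorted(SERVICE_PATHS)
        for service, directory in SERVICE_PATHS.items():
            if _is_under(path, directory):
                triggered.add(service)
    return sorted(triggered)


print(pipelines_to_run(["services/etl_service/jobs.py"]))
print(pipelines_to_run(["shared/utils.py"]))
```

A change inside one service directory triggers only that service, while a one-line change under the shared directory triggers all three pipelines.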

Slide 13

Slide 13 text

Decoupling Codebase

Slide 14

Slide 14 text

Libraries in Python
Module: A bunch of related code saved in a file with the extension `.py`
Package: A directory containing a collection of modules. To be considered a regular package, a directory must contain a file named `__init__.py`
Library: A collection of packages. "Library" is an umbrella term for a reusable chunk of code.
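As a minimal sketch of these definitions, the snippet below builds a throwaway package in a temporary directory and imports a module from it. The names `demo_shared_lib` and `greetings` are made up for illustration and are not from the talk.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    # Package: a directory that contains an __init__.py file.
    package_dir = Path(root) / "demo_shared_lib"
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text("")

    # Module: related code saved in a single .py file.
    (package_dir / "greetings.py").write_text(
        "def hello(name):\n    return f'Hello, {name}!'\n"
    )

    # Make the temporary directory importable, then import the module.
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        greetings = importlib.import_module("demo_shared_lib.greetings")
        message = greetings.hello("PyPI")
    finally:
        sys.path.remove(root)

print(message)
```

A library in the sense used here would bundle one or more such packages so that several services can reuse them.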

Slide 15

Slide 15 text

Is This Enough?

Slide 16

Slide 16 text

Challenge: Tightly Coupled CI/CD Process For Services

Slide 17

Slide 17 text

Challenge: Tightly Coupled CI/CD Process For Libraries

Slide 18

Slide 18 text

Why Is This Bad?
Problems
1. We still need to re-deliver services when a library update is required
2. It is hard to keep track of the library versions within each service version
-> Decoupling is still required at the CI/CD level

Slide 19

Slide 19 text

Python Package Repository

Slide 20

Slide 20 text

CI/CD for Libraries

Slide 21

Slide 21 text

CI/CD for Services
ml_model==0.0.2 ml_model==0.0.2 ml_model==0.0.2
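A pin such as `ml_model==0.0.2` makes it possible to check which library version each service uses. The sketch below assumes each service keeps its pins in a requirements-style text; the service names and file contents are hypothetical, and the parser only handles simple `name==version` lines rather than the full requirement syntax.

```python
def parse_pins(requirements_text):
    """Map package name -> pinned version for lines of the form name==version."""
    pins = {}
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def library_versions(service_requirements, library):
    """Return {service: pinned version or None} for one library."""
    return {
        service: parse_pins(text).get(library.lower())
        for service, text in service_requirements.items()
    }


# Hypothetical requirements content for the three services.
SERVICES = {
    "api-server": "ml_model==0.0.2\n",
    "etl-service": "ml_model==0.0.2\n",
    "ml-service": "ml_model==0.0.2  # pinned\n",
}

versions = library_versions(SERVICES, "ml_model")
all_in_sync = len(set(versions.values())) == 1
print(versions, all_in_sync)
```

A check like this could run in a service pipeline to flag services that drift onto different library versions.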

Slide 22

Slide 22 text

Service - Library Versioning
Services: ML Service, ETL Service

Slide 23

Slide 23 text

Are We Done?
We can decouple the CI/CD processes of libraries and services “if” we have a Python package repository
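With a package repository in place, a service build can install a pinned library from it. The sketch below only assembles the `pip` command; the index URL is a hypothetical placeholder, and `--index-url` is the standard pip option for pointing at a different package index.

```python
import sys


def pip_install_command(requirement, index_url):
    """Build a pip command that installs `requirement` from `index_url`."""
    return [
        sys.executable, "-m", "pip", "install",
        "--index-url", index_url,
        requirement,
    ]


# Hypothetical private repository URL; replace with the real one.
PRIVATE_INDEX = "https://pypi.example.internal/simple/"

command = pip_install_command("ml_model==0.0.2", PRIVATE_INDEX)
print(" ".join(command[1:]))
```

Passing the list to `subprocess.run(command, check=True)` would perform the install. It is left out here so the sketch runs without network access.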

Slide 24

Slide 24 text

The Python Package Index

Slide 25

Slide 25 text

Cloud Native?
“Cloud native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure, and declarative APIs exemplify this approach. These techniques enable loosely coupled systems that are resilient, manageable, and observable. Combined with robust automation, they allow engineers to make high-impact changes frequently and predictably with minimal toil.”
- Cloud Native Computing Foundation (https://github.com/cncf/foundation/blob/main/charter.md)

Slide 26

Slide 26 text

Requirements for Python Package Repository
1. Portability
2. Security
3. Resiliency
4. Speed

Slide 27

Slide 27 text

PyPI: The Python Package Index

Slide 28

Slide 28 text

Public PyPI
1. Security: NONE
2. Resiliency: Yes
3. Portability: Yes
4. Speed: Yes

Slide 29

Slide 29 text

PyPI Server

Slide 30

Slide 30 text

PyPI Server OR

Slide 31

Slide 31 text

PyPI Server: Scaling Issue?
PyPI Server scans the entire package directory for every HTTP request
-> This causes significant slow-downs when serving thousands of packages
-> Can be sped up with caching:
   - pip install pypiserver[cache]
   - Enabling caching at the reverse proxy
Reference: https://pypi.org/project/pypiserver/#serving-thousands-of-packages
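To show why caching helps, here is a conceptual sketch of a directory listing that is rescanned only when the directory's modification time changes. It is not how pypiserver or its cache extra is implemented; the class name and file names are made up for illustration.

```python
import os
import tempfile
from pathlib import Path


class CachedPackageListing:
    """Cache a directory listing and rescan only when the directory changes."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self._mtime_ns = None
        self._files = []
        self.scan_count = 0  # how many full directory scans were needed

    def list_files(self):
        mtime_ns = os.stat(self.directory).st_mtime_ns
        if mtime_ns != self._mtime_ns:
            # Directory changed (or first call): do the expensive scan.
            self._files = sorted(p.name for p in self.directory.iterdir())
            self._mtime_ns = mtime_ns
            self.scan_count += 1
        return self._files


with tempfile.TemporaryDirectory() as package_dir:
    # Hypothetical package files standing in for uploaded distributions.
    for name in ("ml_model-0.0.1.tar.gz", "ml_model-0.0.2.tar.gz"):
        (Path(package_dir) / name).write_text("")

    listing = CachedPackageListing(package_dir)
    for _ in range(3):  # simulate three HTTP requests
        files = listing.list_files()
    scans = listing.scan_count

print(files, scans)
```

Three simulated requests need only one scan. Without the cache, each request would walk the whole directory, which is the slow-down described above.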

Slide 32

Slide 32 text

PyPI Server - Demo
https://github.com/simjay/pypi-demo/tree/main/pypi-server

Slide 33

Slide 33 text

PyPI Cloud

Slide 34

Slide 34 text

PyPI Cloud
Portable!

Slide 35

Slide 35 text

PyPI Cloud - Demo
https://github.com/simjay/pypi-demo/tree/main/pypi-cloud

Slide 36

Slide 36 text

PyPI Server vs PyPI Cloud
[PyPI Server]
1. Portability: Yes
2. Security: Yes
3. Resiliency: Yes (with effort)
4. Speed: Yes (with effort)
[PyPI Cloud]
1. Portability: Yes
2. Security: Yes
3. Resiliency: Yes
4. Speed: Yes
Diagram labels:
PyPI Server: Disk (NFS for Resiliency); Install manually; containerized
PyPI Cloud: Hosted Solution; Hosted Solution; containerized

Slide 37

Slide 37 text

Cloud Architecture with PyPI

Slide 38

Slide 38 text

Cloud Architecture with PyPI Server

Slide 39

Slide 39 text

Cloud Architecture with PyPI Server

Slide 40

Slide 40 text

Cloud Architecture with PyPI Cloud

Slide 41

Slide 41 text

In Conclusion…

Slide 42

Slide 42 text

In Conclusion,
By using PyPI Cloud, we can
1. Organize the codebase into libraries and services without sacrificing the efficiency of the CI/CD pipeline
2. Host a scalable and secure Python package repository for a cloud native environment
3. Let engineers focus on building interesting stuff!

Slide 43

Slide 43 text

Thank you!
