Microservices with Python and Flask Miguel Grinberg @miguelgrinberg

Agenda ● First hour ○ Introduction & demo ○ Microservices concepts ○ Class project design discussion ● Remaining time ○ Step-by-step development of the class project ○ (Focus is on techniques more than on a particular implementation)

About the MicroFlack Project ● MicroFlack is a microservices version of Flack ○ Flack is a chat server app I used in the “Flask at Scale” class I gave at PyCon 2016 ● The application lives on 7 (yes, seven!) GitHub repositories ● Runs on Python 3.4+ and Docker ● Not tied to any specific cloud or container orchestration technology

Deploying MicroFlack to your Laptop
● Requirements
  ○ 4GB RAM (8GB recommended)
  ○ Vagrant
  ○ VirtualBox
  ○ Everything is installed in an Ubuntu 16.04 VM (Windows, Mac, Linux laptops are all OK!)
● Deployment commands:
    git clone
    cd microflack_admin
    vagrant up                                      # to create the VM or restart it after shutdown
    vagrant ssh                                     # to open a shell session on the VM
    vagrant halt                                    # to shut down the VM (without destroying it)
    vagrant snapshot save clean                     # to save a snapshot with name “clean”
    vagrant snapshot restore clean --no-provision   # to restore the snapshot
    vagrant destroy                                 # to delete the VM

Monolithic Flack

MicroFlack Features ● Five microservices, four HTTP/REST, one WebSocket ● Each service is a standalone Flask app ● Two of the services use MySQL databases ● Services run in Docker containers ● Services communicate over HTTP, message queue or service registry ● All services are load balanced ● Services scale independently of each other ● Upgrades can be done without downtime

Microservice Benefits ● Less complexity (maybe) ○ Awesome for teams with varying degrees of experience ● Scaling flexibility ● More reliability ● Less coupling ● More choice ● Deploy/upgrade while running So where is performance in this list?

Breaking up a Monolith ● Going from monolith to microservices is very hard ● Several strategies ○ Microservices only going forward ○ Break pieces of functionality into microservices over time ○ Refactor the entire monolith into microservices ● In all cases, a base platform needs to be put in place before refactoring work begins ● Good automated tests are crucial in avoiding bugs when refactoring functionality into a microservice

The Microservices Platform

Load Balancer ● All services are load balanced ● You think you don’t need load balancing? ○ Upgrades without downtime require a load balancer for a seamless transition ○ Autoscaling, A/B testing, green/blue deployments, etc. become possible ● Many options ○ Open source: Nginx, HAProxy, Træfik ○ As a service: AWS, OpenStack, Azure, Google Cloud, etc. ○ Serverless: Load balancing and scaling are implicitly done by the cloud operator ■ The Lambda and API Gateway services on AWS are by far the best in this category ○ Hardware: F5

Service Registry ● A distributed data store that keeps track of running services ● All running services maintain an entry in the service registry ● The load balancer’s configuration is generated and updated from the registry contents ● Many great open source projects to choose from: ○ Etcd (CoreOS) ○ Consul (HashiCorp) ○ Zookeeper (Apache) ○ Eureka (Netflix) ○ SmartStack (Airbnb) ○ and more!

Logging ● Logs from all the services are typically consolidated to a single stream to avoid the complexity of maintaining lots of individual log files ● Open source: ○ ELK stack (elasticsearch + logstash + kibana) ○ Logspout ● Several commercial options: ○ Papertrail ○ Splunk ○ Loggly ○ and more!

Containers ● Provide an additional layer of isolation over processes ● Each container runs a virtualized host ○ You can have containers using different Linux distros on the same host ○ Dependencies that would conflict if installed on the same host can be installed on containers ○ Virtualized network ports ● A container maps to one main process, but there can be additional tasks ● Not a required component of the stack, but very convenient

Application State ● Stateful services are hard to manage ○ No way to avoid them for service registry, databases, message queues, etc. ● Stateless services are easily scalable, replaceable and disposable ● Application-level services should ideally be stateless, and should use stateful services for storage ● The state associated with a service should be private ○ Each service must use its own database ○ This prevents coupled services that are difficult to upgrade independently of each other ○ Database joins across services must be done in the application
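The last point, joins done in the application, can be sketched as follows. This is only an illustration: the field names (`user_id`, `nickname`) and the sample payloads are assumptions, standing in for whatever the messages and users services would return over HTTP.

```python
def join_messages_with_users(messages, users):
    """Application-level equivalent of a SQL join on messages.user_id = users.id."""
    users_by_id = {user["id"]: user for user in users}
    joined = []
    for message in messages:
        user = users_by_id.get(message["user_id"])
        joined.append({
            "id": message["id"],
            "text": message["text"],
            # The user can be missing if the two services are briefly out of sync.
            "nickname": user["nickname"] if user else None,
        })
    return joined


# Hypothetical payloads, as each service might return them.
messages = [
    {"id": 1, "user_id": 10, "text": "hello"},
    {"id": 2, "user_id": 11, "text": "hi there"},
    {"id": 3, "user_id": 99, "text": "orphan"},
]
users = [{"id": 10, "nickname": "susan"}, {"id": 11, "nickname": "david"}]
result = join_messages_with_users(messages, users)
```

Unlike a database join, the application has to decide what to do with rows that have no match, because no foreign key constraint spans the two databases.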

Retries ● Distributed systems have a lot of moving parts ● It is always a good idea to implement retries for all network calls ● The “exponential backoff” strategy works nicely in this environment
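A minimal sketch of a retry helper with exponential backoff, using only the standard library. The function name, its parameters and the choice of exceptions to retry on are assumptions made for illustration, not part of MicroFlack.

```python
import random
import time


def call_with_retries(func, max_attempts=5, base_delay=0.1, max_delay=5.0,
                      retry_on=(ConnectionError, TimeoutError),
                      sleep=time.sleep):
    """Call func(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts, let the caller see the error
            # The delay doubles on each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))


# Demo: a call that fails twice before succeeding.
attempts = []


def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("service not ready")
    return "ok"


delays = []  # record the delays instead of actually sleeping
result = call_with_retries(flaky, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the helper easy to test, since the demo can record the delays without waiting for them.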

The “Backwards-Compatible” Rule ● Changes to services must be backwards compatible ○ Why? Because a distributed system cannot be updated atomically without downtime ● Database migrations must not break any code that may still be deployed ○ Phased micro-upgrades can help with column renames or deletes, constraints, etc. ● API changes must not break any code that may still be deployed ○ Why? Need a way to upgrade the API consumer and producer independently ● Complex changes that span several services must be “orchestrated” so that they can be applied as micro-deployments without breaking the system
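As an illustration of an API change that does not break deployed code, consider renaming a field in a response. The field names below (`nickname`, `display_name`) are hypothetical. During the transition the producer emits both names, and the consumer accepts either one, so the two sides can be upgraded in any order.

```python
def serialize_user_v2(user):
    """Producer during the transition: emit the new field and keep the old one."""
    return {
        "id": user["id"],
        "display_name": user["name"],  # new field (hypothetical)
        "nickname": user["name"],      # old field, kept until all consumers upgrade
    }


def read_display_name(payload):
    """Consumer that works against both old and new producers."""
    if "display_name" in payload:
        return payload["display_name"]
    return payload["nickname"]


old_payload = {"id": 1, "nickname": "susan"}  # from a producer not yet upgraded
new_payload = serialize_user_v2({"id": 1, "name": "susan"})
```

Once no deployed consumer reads the old field any more, a later micro-deployment can stop emitting it.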

Lifecycle of a Microservice ● On startup, the microservice registers with the service registry, or is “discovered” by it ● The load balancer watches the registry and updates itself to include the new microservice ● The new service starts receiving traffic from the load balancer ● If more than one instance of the service exists, the traffic is split among them ● The service sends “keep-alive” signals, or responds to periodic health checks ● When the service is stopped, or stops sending keep-alives, or fails a health check, it is removed from the registry, and in turn from the load balancer
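The lifecycle above can be simulated with a toy in-memory registry. This is a sketch under simplifying assumptions, not how etcd or HAProxy work internally: the class names and addresses are made up, and the balancer queries the registry on every request instead of watching it for changes.

```python
import itertools
import time


class ServiceRegistry:
    """Toy registry: entries expire unless refreshed by keep-alives."""

    def __init__(self, ttl=10.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries = {}  # (service, address) -> expiration time

    def register(self, service, address):
        # Registering and sending a keep-alive are the same operation here:
        # both (re)set the expiration time of the entry.
        self._entries[(service, address)] = self.clock() + self.ttl

    keep_alive = register

    def deregister(self, service, address):
        self._entries.pop((service, address), None)

    def instances(self, service):
        now = self.clock()
        return sorted(addr for (name, addr), expires in self._entries.items()
                      if name == service and expires > now)


class LoadBalancer:
    """Round-robin balancer that reads its backends from the registry."""

    def __init__(self, registry):
        self.registry = registry
        self._counter = itertools.count()

    def pick(self, service):
        instances = self.registry.instances(service)
        if not instances:
            raise LookupError("no instances of " + service)
        return instances[next(self._counter) % len(instances)]


# Demo with a fake clock so that expiration can be shown without waiting.
now = [0.0]
registry = ServiceRegistry(ttl=10.0, clock=lambda: now[0])
lb = LoadBalancer(registry)

registry.register("users", "10.0.0.1:5000")
registry.register("users", "10.0.0.2:5000")
first_two = {lb.pick("users"), lb.pick("users")}  # traffic is split across both

now[0] = 8.0
registry.keep_alive("users", "10.0.0.1:5000")  # only the first instance refreshes
now[0] = 12.0
remaining = registry.instances("users")  # the second instance has expired
```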

Recommended reading: The Twelve-Factor App ● Codebase ● Dependencies ● Config ● Backing services ● Build, release, run ● Processes ● Port binding ● Concurrency ● Disposability ● Dev/prod parity ● Logs ● Admin Processes

MicroFlack Design

Choosing a Stack ● For the services: ○ Flask and friends ● For the platform: ○ Load balancer: HAProxy ○ Service registry: Etcd ○ Confd (configures the load balancer dynamically) ○ Docker containers ○ Logspout log consolidation ○ MySQL databases ○ Redis message queue

Leveraging Public Container Images ● miguelgrinberg/easy-etcd ○ Container image that deploys etcd clusters ● miguelgrinberg/easy-lb-haproxy ○ Load balancer + confd preconfigured ● gliderlabs/logspout ○ Consolidated log output ● mysql:5.7 ○ Official Docker image for MySQL 5.7 ● redis:3.2-alpine ○ Official Docker image for Redis 3.2

Administration Scripts: microflack_admin ● Vagrantfile: deploy to a single-node Vagrant VM ●,, deploy scripts ● mfvars: common environment variables ● mfclone: clone the repositories for all the services ● mfbuild: build Docker images for services ● mfrun: start services ● mfkill: stop services ● mflogs: consolidated log stream of all services ● mfupgrade: upgrade services ● mfenv: generate a .env file with environment needed for development ● mfdev: attach a locally running service to a deployed system for debugging ● etcd-dump: dump the contents of the service registry to the console

Common Code: microflack_common ● There is some functionality that all microservices need ○ Service registration ○ Unit testing helpers ○ Inter-service communication ○ Authentication handlers ● We’ll use a Python package that services can install with pip ● Easy option: install from PyPI (if you don’t mind making it public) ● Less easy option: private package installed from a local file ○ We’ll use the Python wheel format for this (pip install wheel) ○ The --find-links option in pip can install packages from a local file system directory ○ The mkwheel script builds the wheel packages

Authentication: Let’s switch to JWTs ● Tokens stored in a database are inconvenient ○ Services would need to send a request to the tokens service for verification ● JSON Web Tokens (JWTs) can be verified just with cryptography ○ A JWT stores data inside it, such as a username or id ○ When the token is generated, a cryptographic signature is added to it ○ The signature can only be generated or verified if you have the secret key ○ The data in a token can be trusted only if the token has a valid signature ○ Not everything is great with JWTs: token revocations become harder ● Since clients treat tokens as opaque strings, switching to JWT is not a breaking change ● Beware of JWT exploits: always set and check the signing algorithm

MicroFlack Service Boundaries
Microservice | URL(s)
User interface | /, /static/...
Users | /api/users, /api/users/:id
Tokens | /api/tokens
Messages | /api/messages, /api/messages/:id
Socket.IO | /

From Flack to MicroFlack

MicroFlack v0.1: Just the UI

UI Service Summary
● Endpoints
Method | Endpoint | Authentication | Description
GET | / | None | Client HTML page
GET | /static/app.js | None | Main client application code
GET | /static/*.js | None | Client application code
GET | /static/*.css | None | Client application stylesheets

UI Service: microflack_ui ●,, templates/, static/, requirements.txt ○ Ordinary Flask app that serves the index HTML page plus all the JavaScript and CSS files that make up the client application ○ To ease the transition, at this stage we’ll use an older version of the UI that does not use Socket.IO (we’ll add Socket.IO later) ● .env ○ Environment variables ○ This file should not be added to source control, as it can contain secrets ●, tox.ini ○ Unit tests, code coverage and linting ● Dockerfile,, ○ Docker support

Running the UI service ● vagrant ssh (connect to the VM) ● mfkill all (reset your VM to an initial state without any services) ○ Watch the load balancer at ● cd ~/microflack_ui ● git checkout 1 (get version 1 of the UI service) ● ./ (build the service) ● mfrun ui (run the service) ● Connect to the application at ○ Browser errors are expected, as no other services are yet running

Incorporating the Flack Monolith ● The service registry has a simple tree structure (use etcd-dump to see it) ● You can register the Flack monolith with the load balancer in the VM: ○ curl -X PUT $ETCD/v2/keys/services/monolith/location -d value="/api" ○ curl -X PUT $ETCD/v2/keys/services/monolith/upstream/server -d value="" ○ Note: is the IP address the host machine has inside a vagrant VM ● Now the UI is served by the new microservice, while everything else comes from the old Flack+Celery application ● To remove: ○ curl -X DELETE $ETCD/v2/keys/services/monolith?recursive=true

MicroFlack v0.2: Users service

Users Service Summary
● Endpoints (new endpoints marked)
Method | Endpoint | Authentication | Description
POST | /api/users | None | Register a new user
GET | /api/users | Token Optional | Get list of users
GET | /api/users/:id | Token Optional | Get user by id
PUT | /api/users/:id | Token | Modify user by id
GET | /api/users/me | Basic | Authenticate user (new)

Users service: microflack_users ● Same basic structure as the ui microservice ● Includes User model and all /api/users endpoints from original Flack ● Token authentication imported from microflack_common ● Ported existing unit tests and used them as a guide to fix everything up ● Add database migration support (Flask-Migrate) ○ Databases are created by mfrun if they don’t exist yet ○ Migrations are executed in the container startup script ● Add new /api/users/me endpoint to validate username and password and return user information

Running the Users service ● cd ~/microflack_users ● git checkout 1 (select version 1 of the service) ● ./ (build the service) ● mfrun users (run the service) ○ The /api/users family of endpoints should now be working!

MicroFlack v0.3: Tokens service

Tokens Service Summary
● Endpoints (postponed endpoints marked)
Method | Endpoint | Authentication | Description
POST | /api/tokens | Basic | Request a token
DELETE | /api/tokens | Token | Revoke a token (postponed)

Tokens service: microflack_tokens ● Just one endpoint copied from the monolithic app: /api/tokens ● Authentication is relayed to the users service’s /api/users/me endpoint ● Generated JWT token contains the numeric user id ● No token revocations for now

Running the Tokens service ● cd ~/microflack_tokens ● git checkout 1 (select version 1 of the service) ● ./ (build the service) ● mfrun tokens (run the service) ○ The /api/tokens endpoint should now be working

MicroFlack v0.4: Messages service

Messages Service Summary
● Endpoints
Method | Endpoint | Authentication | Description
POST | /api/messages | Token | Post a new message
GET | /api/messages | Token Optional | Get list of messages
GET | /api/messages/:id | Token Optional | Get message by id
PUT | /api/messages/:id | Token | Modify message by id

Messages Service: microflack_messages ● Structure based on the users service ● Models and endpoints copied from original Flack code ● Removed all asynchronous functions for now ○ We want a basic app up and running that we can build on ● Ported unit tests to verify the code works

Running the Messages service ● cd ~/microflack_messages ● git checkout 1 (select version 1 of the service) ● ./ (build the service) ● mfrun messages (run the service) ● The application should be fully functional (though not very performant yet)

MicroFlack v0.5: Async message rendering ● Original Flack used Celery for asynchronous message rendering ○ Unfortunately, Celery workers are by design tightly coupled with the caller process ● Instead of Celery, we will use background threads for rendering ○ Our render task is not CPU intensive, so this works very well ○ For CPU intensive tasks, the multiprocessing module can be used instead ○ If a very high volume of tasks must be supported, an asynchronous server can be used
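A minimal sketch of rendering in a background thread. The `render_message` and `post_message` functions and the message dictionary are stand-ins invented for this example; the actual service stores messages in its database and does more work when rendering.

```python
import threading
import time


def render_message(message):
    """Stand-in for the slow, mostly I/O-bound rendering step (hypothetical)."""
    time.sleep(0.05)  # simulate waiting on the network
    message["html"] = "<p>{}</p>".format(message["source"])


def post_message(source):
    """Store the message and return right away; rendering happens later."""
    message = {"source": source, "html": None}
    thread = threading.Thread(target=render_message, args=(message,),
                              daemon=True)
    thread.start()
    return message, thread


message, thread = post_message("hello")
html_at_return = message["html"]  # usually still None: rendering is in progress
thread.join()  # only for this demo; a request handler would not wait
```

Threads suit this task because the work spends most of its time waiting rather than computing. For CPU-bound rendering the same structure could use `multiprocessing.Process` in place of `threading.Thread`.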

Development Workflow: Common Package ● Build the common packages locally: ○ cd microflack_admin; source mfvars; cd .. ○ git clone ○ cd microflack_common ○ ./mkwheel all ○ cd ..

Development Workflow: Running a Service ● Set up the source code for the desired microservice: ○ cd microflack_admin; source mfvars; cd .. ○ git clone ○ cd microflack_messages ○ python3 -m venv venv ○ source venv/bin/activate ○ pip install -r requirements.txt ○ flask run ● Test by sending requests with curl, httpie, postman, etc. ● For integration testing with an actual system running in a VM: ○ Create .env file (run mfenv inside VM to get the variables you need) ○ mfdev start messages ○ flask run

Upgrading the Messages service ● cd ~/microflack_messages ● git checkout 2 (select version 2 of the service) ● ./ (build the service) ● mfupgrade try messages (start upgrade) ○ After watching the log for a few seconds, hit Ctrl-C ● mfupgrade roll (rolling upgrade)

MicroFlack v0.6: Token revocation

Tokens Service Summary
● Endpoints (new endpoints marked)
Method | Endpoint | Authentication | Description
POST | /api/tokens | Basic | Request a token
GET | /api/tokens | Token | Check if a token is revoked (new)
DELETE | /api/tokens | Token | Revoke a token (new)

Token revocation ● We need to maintain a list of revoked tokens ○ The best place to implement this is the tokens service ○ Revoked tokens need to be kept in a list only until they expire ○ We can keep the list in etcd, and write all entries with the appropriate expiration ● Services need to check tokens against that list ○ We can encapsulate this inside the verify_token function in microflack_common ○ Option 1 (more correct): send a request to the tokens service to check revocation status ○ Option 2 (more performant): check the list in etcd directly ○ Improvement for both options: cache calls to verify_token
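The revocation list with expiring entries, plus the caching improvement, can be sketched with in-memory classes. This is an assumption-laden illustration: a dictionary stands in for etcd, and the class names, the token id "abc" and the timings are invented for the example.

```python
import time


class RevokedTokens:
    """In-memory stand-in for a revocation list whose entries expire."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self._revoked = {}  # token id -> time at which the entry can be dropped

    def revoke(self, token_id, token_expiration):
        # No need to remember the token after it would have expired anyway.
        self._revoked[token_id] = token_expiration

    def is_revoked(self, token_id):
        expires = self._revoked.get(token_id)
        if expires is None:
            return False
        if expires <= self.clock():
            del self._revoked[token_id]  # emulate the store expiring the key
            return False
        return True


class CachedRevocationCheck:
    """Cache lookups for a short time to reduce calls to the store."""

    def __init__(self, store, cache_ttl=5.0, clock=time.time):
        self.store = store
        self.cache_ttl = cache_ttl
        self.clock = clock
        self._cache = {}  # token id -> (result, time the cached result expires)
        self.lookups = 0  # number of calls that reached the store

    def is_revoked(self, token_id):
        now = self.clock()
        cached = self._cache.get(token_id)
        if cached and cached[1] > now:
            return cached[0]
        self.lookups += 1
        result = self.store.is_revoked(token_id)
        self._cache[token_id] = (result, now + self.cache_ttl)
        return result


# Demo with a fake clock.
now = [1000.0]
clock = lambda: now[0]
store = RevokedTokens(clock=clock)
check = CachedRevocationCheck(store, cache_ttl=5.0, clock=clock)

before = check.is_revoked("abc")        # False, looked up in the store
store.revoke("abc", token_expiration=1060.0)
stale = check.is_revoked("abc")         # still False: served from the cache
now[0] = 1006.0
fresh = check.is_revoked("abc")         # True once the cached result expires
now[0] = 1061.0
after_expiry = check.is_revoked("abc")  # False: entry dropped, token has expired
```

The demo also shows the cost of caching: a revoked token can still be accepted for up to `cache_ttl` seconds, so the cache lifetime should be kept short.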

Upgrading the Tokens service ● cd ~/microflack_tokens ● git checkout 2 (select version 2 of the service) ● ./ (build the service) ● mfupgrade roll tokens (rolling upgrade, skipping the “try” step) ● Upgrade the services that work with tokens: ○ users to version 2 ○ messages to version 3

MicroFlack v1.0: Socket.IO service

Socket.IO Service Summary
● Socket.IO client to server events
Event | Authentication | Description
ping_user | Token | Mark a user as online
post_message | Token | Post a message
disconnect | Token (from session) | Mark the user as offline
● Socket.IO server to client events (new events were highlighted in red on the original slide)
Event | Description
updated_model | Render updated user or message
expired_token | Ask user to log in again

System changes for Socket.IO ● Task list for the Socket.IO service ○ Implement “ping_user”, “post_message” and “disconnect” events ○ Push “expired_token” notifications to clients when appropriate ● Task list for the common package ○ Add support for setting sticky sessions in the load balancer ● Task list for the UI service ○ Add Socket.IO support ○ Handle expired tokens in Socket.IO calls (bug in old version) ● Task list for users service ○ Add “ping” and “user offline” endpoints ○ Push “updated_model” notifications to message queue ● Task list for messages service ○ Push “updated_model” notifications to message queue

Socket.IO service: microflack_socketio ● Same structure as a no-database service ● No HTTP endpoints, only the three Socket.IO events ● Needs to be an async service due to the long-lived WebSocket connections ○ Don’t understand why? Come to my talk “Asynchronous Python” on Sunday! ○ We have several options ■ python-socketio supports WSGI-compatible async frameworks (eventlet, gevent) and also asyncio ■ Flask-SocketIO builds on python-socketio, but drops asyncio support ○ While we don’t need Flask for this service, having access to Flask’s user session is handy ○ We’ll go with Flask-SocketIO and eventlet for this service

Users Service Summary
● Endpoints (new endpoints marked)
Method | Endpoint | Authentication | Description
POST | /api/users | None | Register a new user
GET | /api/users | Token Optional | Get list of users
GET | /api/users/:id | Token Optional | Get user by id
PUT | /api/users/:id | Token | Modify user by id
GET | /api/users/me | Basic | Authenticate user
PUT | /api/users/me | Token | Set user online (new)
DELETE | /api/users/me | Token | Set user offline (new)

Running the Socket.IO service ● cd ~/microflack_socketio ● git checkout 1 (select version 1 of the service) ● ./ (build the service) ● mfrun socketio (run the service) ● Upgrade the services that work with Socket.IO (order is important!): ○ users to version 3 ○ messages to version 4 ○ ui to version 2

MicroFlack on Other Platforms

Kubernetes ● Open-source container orchestration, from Google ● Manages a cluster of nodes (container hosts) transparently ● Works with Docker images ● Has its own service registry and load balancer ● Gives each service a DNS name (e.g. http://users connects to the Users service) ● Stores secrets securely ● Handles service replication, and does rolling upgrades ● microflack_admin includes example Kubernetes deployment scripts in install/kubernetes

Amazon ECS ● A cluster of Docker hosts running on EC2 instances ● Access to the AWS ecosystem ● MicroFlack platform and application containers can run without change ● Some effort required in configuring roles and security groups (as with everything done on AWS)

AWS Serverless (Lambda & API Gateway) ● Very different paradigm, only the application logic is uploaded to AWS ○ No need for gunicorn, just the application code that handles the endpoints ○ Tools like Zappa (or my own Slam) enable transparent support for WSGI apps ● AWS Lambda provides automatic load balancing and auto-scaling ● Access to the AWS ecosystem ● Cons: ○ No WebSocket support, since there is no server running all the time ○ Response times are not great

Good ol’ Processes ● Deploying MicroFlack on a host without Docker is possible ● All the core components of the platform can be installed without Docker: ○ etcd, haproxy, confd, mysql, redis ● The MicroFlack application services can run as regular gunicorn processes ○ But a network port assignment strategy needs to be implemented
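One possible port assignment strategy, sketched here as an assumption and not taken from MicroFlack, is to let the operating system pick a free port and then publish that port in the service registry.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the operating system for a currently unused TCP port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 means "pick any free port"
        return sock.getsockname()[1]


port = find_free_port()
```

The chosen port could then be passed to gunicorn through its `--bind` option and written to the registry. There is a small window in which another process could grab the port before the service binds to it, so a start script using this approach should be prepared to retry.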

Improvement Ideas ● Revoke all tokens for a user, and revoke all tokens for everybody ● Use multiple chat rooms instead of just one (this will require some client-side work as well) ● Protect the /stats and /logs endpoints with authentication ● Add SSL termination to (or in front of) the load balancer ● Deploy the ELK stack and configure the logspout container to forward logs to it ● Add a secrets store component (maybe HashiCorp’s Vault) ● Create a multi-host deployment, possibly with redundant load balancers ● Replace the single-node Redis deployment with a Redis or RabbitMQ cluster ● Replace the single-node MySQL with a Galera cluster ● Replace MySQL with NoSQL databases of your choice ● With all services running 3+ instances, create a “chaos monkey” script ● Implement a platform service that recycles application services that die or are unresponsive, and maybe even auto-scales them based on load

