Slide 1

Slide 1 text

Building a Hosting Platform with Python Andrew Godwin http://www.flickr.com/photos/whiskeytango/1431343034/ @andrewgodwin

Slide 2

Slide 2 text

Hi, I'm Andrew. Serial Python developer Django core committer Co-founder of ep.io

Slide 3

Slide 3 text

We're ep.io Python Platform-as-a-Service Easy deployment, easy upgrades PostgreSQL, Redis, Celery, and more

Slide 4

Slide 4 text

Why am I here? Architecture Overview What we use, and how What did we learn?

Slide 5

Slide 5 text

Architectural Overview In short: Ever so slightly mad.

Slide 6

Slide 6 text

Hardware Real colo'd machines Linode EC2 (pretty unreliable) (pretty reliable) (pretty reliable) IPv6 (as much as we can)

Slide 7

Slide 7 text

Network Internal networks are easy Cross-Atlantic latency is less fun Variety of different restrictions

Slide 8

Slide 8 text

Daemons by the Dozen We have lots of small components 17, as of June 2011 They all need to communicate

Slide 9

Slide 9 text

Redundancy, Redundancy, ... It's very important that no site dies. Everything can be run as a pair HA and backups both needed Cannot rely on a centralised state

Slide 10

Slide 10 text

Security User data is paramount Quite a bit of our code runs as root Permissions, chroot, other isolation VM per site is too much overhead

Slide 11

Slide 11 text

Variety Python sites are pretty varied We need other languages to work too Some things (PostgreSQL vs MySQL) we have to be less flexible on

Slide 12

Slide 12 text

What do we use? Lots of exciting things, that's what.

Slide 13

Slide 13 text

Basic Technologies Eventlet, ZeroMQ, PostgreSQL Historically, Redis Ubuntu/Debian packages & tools

Slide 14

Slide 14 text

Moving Data Message-passing (ZeroMQ, was Redis) Stored state (PostgreSQL, plain text)

Slide 15

Slide 15 text

Storage We're testing btrfs and GlusterFS One type needed for app disk images One type needed for app data store (mounted on every app instance)

Slide 16

Slide 16 text

Eventlet A shiny, coroutine-filled future

Slide 17

Slide 17 text

What is eventlet? Greenlet-based async/"threading" Multiple hubs (including libevent) Threads yield cooperatively on any async operations

Slide 18

Slide 18 text

Brief Example

    import eventlet
    # Green (cooperative) version of urlopen; this is the Python 3 import path
    from eventlet.green.urllib.request import urlopen

    results = {}

    def fetch(key, url):
        # The urlopen call will cooperatively yield
        results[key] = urlopen(url).read()

    for i in range(10):
        eventlet.spawn(fetch, i, "http://ep.io/%s" % i)

    # There's also a waitall() method on GreenPools
    while len(results) < 10:
        eventlet.sleep(1)

Slide 19

Slide 19 text

Standard Classes Eventlet-based daemons Multiple main loops, terminates if any die Catches any exceptions Logs to stderr and remote syslog

Slide 20

Slide 20 text

Daemon Example

    from ... import BaseDaemon, resilient_loop

    class Locker(BaseDaemon):

        main_loops = ["heartbeat_loop", "lock_loop"]

        def pre_run(self):
            # Initialise a dictionary of known locks.
            self.locks = {}

        @resilient_loop(1)
        def heartbeat_loop(self):
            self.send_heartbeat(
                self.lock_port,
                "locker-lock",
            )
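
The slides do not show how resilient_loop is implemented. As a rough sketch only, a decorator like it could rerun the method every few seconds and log exceptions instead of letting the loop die. Everything below is hypothetical: the `running` attribute, the `sleep` parameter, and the `Ticker` class are illustration names, and an eventlet-based daemon would presumably pass `eventlet.sleep` so the wait yields to other greenthreads.

```python
import functools
import logging
import time

log = logging.getLogger("daemon")


def resilient_loop(interval, sleep=time.sleep):
    """Hypothetical stand-in for the decorator named on the slide.

    Calls the wrapped method repeatedly while self.running is true,
    logging any exception and carrying on with the next iteration.
    """
    def decorator(func):
        @functools.wraps(func)
        def loop(self, *args, **kwargs):
            while self.running:
                try:
                    func(self, *args, **kwargs)
                except Exception:
                    log.exception("Error in %s; continuing", func.__name__)
                sleep(interval)
        return loop
    return decorator


class Ticker:
    """Toy daemon (hypothetical): fails on every tick, stops after `limit` ticks."""

    def __init__(self, limit):
        self.running = True
        self.ticks = 0
        self.limit = limit

    @resilient_loop(0)
    def tick_loop(self):
        self.ticks += 1
        if self.ticks >= self.limit:
            self.running = False
        raise ValueError("simulated failure")
```

Calling `Ticker(3).tick_loop()` runs three iterations despite every one of them raising, which is the behaviour "Catches any exceptions" suggests.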

Slide 21

Slide 21 text

Greening The World You must use greenlet-friendly libraries Others will work, but just block Eventlet supports most of stdlib Can monkeypatch to support other modules

Slide 22

Slide 22 text

We're Not In Kansas Anymore You can still have race conditions Ungreened modules block everything Some combinations have odd bugs (unpatched Django & psycopg2)

Slide 23

Slide 23 text

Still, it's really useful We've had upwards of 10,000 threads multiprocessing falls over at that level eventlet is easier to use than threading (much less chance of race conditions)

Slide 24

Slide 24 text

Redis Small but perfectly formed.

Slide 25

Slide 25 text

The Beginning Everything in Redis No, really - app disk images too Disk images quickly moved to, uh, disk

Slide 26

Slide 26 text

February - March Doing lots of filtering "queries" Moved user info, permissions to Postgres App info, messaging still there

Slide 27

Slide 27 text

Recently App info moved to Postgres Messaging moved to ZeroMQ Not used by backend any more

Slide 28

Slide 28 text

Why? It's a great database/store, but not for us We may revisit once we get PGSQL issues Looking forward to Redis Cluster

Slide 29

Slide 29 text

ØMQ A møose taught me the symbol.

Slide 30

Slide 30 text

What is ZeroMQ? It's NOT a message queue Basically high-level sockets Comes in many delicious flavours: PUB/SUB REQ/REP PUSH/PULL XREQ/XREP PAIR

Slide 31

Slide 31 text

ZeroMQ Example

    from eventlet.green import zmq

    ctx = zmq.Context()

    # Request-response style socket
    sock = ctx.socket(zmq.REQ)

    # Can connect to multiple endpoints, will pick one
    sock.connect("tcp://1.2.3.4:567")
    sock.connect("tcp://1.1.1.1:643")

    # Send a message, get a message
    sock.send(b"Hello, world!")
    print(sock.recv())

Slide 32

Slide 32 text

How do we use it? Mostly REQ/XREP Custom @zmq_loop decorator JSON + security measures

Slide 33

Slide 33 text

zmq_loop example

    from ... import BaseDaemon, zmq_loop

    class SomeDaemon(BaseDaemon):

        main_loops = ["query_loop", "stats_loop"]
        port = 1234

        @zmq_loop(zmq.XREP, "port")
        def query_loop(data):
            return {"error": "Only a slide demo!"}

        @zmq_loop(zmq.PULL, "stats_port")
        def stats_loop(data):
            # PULL is one-way, so no return data
            print(data)

Slide 34

Slide 34 text

Other Nice ZeroMQ things Eventlet supports it, quite well Can use TCP, PGM, or in-process comms Can be faster than raw messages on TCP Doesn't care if your network isn't up yet

Slide 35

Slide 35 text

PTYs Or, How I Learned To Stop Worrying And Love Unix

Slide 36

Slide 36 text

What is a PTY? It's a process-controllable terminal Used for SSH, etc. We needed them for interactivity

Slide 37

Slide 37 text

Attempt One Just run processes in subprocess Great, until you want to be interactive Some programs insist on a terminal
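
To illustrate why plain subprocess pipes fall short, here is a small sketch (not from the talk) that asks a child process whether it believes it is attached to a terminal. The `CHECK` snippet and the function name are made up for the example.

```python
import subprocess
import sys

# One-liner run in the child: report whether stdin and stdout are terminals.
CHECK = "import sys; print(sys.stdin.isatty(), sys.stdout.isatty())"


def child_sees_terminal():
    """Run CHECK in a child connected through pipes and return its answers."""
    result = subprocess.run(
        [sys.executable, "-c", CHECK],
        input="",             # stdin is a pipe that is closed straight away
        capture_output=True,  # stdout and stderr are pipes as well
        text=True,
        check=True,
    )
    return result.stdout.split()


print(child_sees_terminal())
```

With pipes on both ends the child reports no terminal on either stream, which is what makes programs that insist on a terminal refuse to run or drop their interactive behaviour.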

Slide 38

Slide 38 text

Attempt Two Python has a pty module! Take the raw OS filehandles Try to make it greenlet-compatible Works! Most of the time...

Slide 39

Slide 39 text

Greened pty example

    def run(self):
        # First, fork to a new PTY.
        gc.disable()
        try:
            pid, fd = pty.fork()
        except:
            gc.enable()
            raise
        # If we're the child, run our program.
        if pid == 0:
            self.run_child()
        # Otherwise, do parent stuff
        else:
            gc.enable()
            ...

Slide 40

Slide 40 text

Greened pty example

    fcntl.fcntl(self.fd, fcntl.F_SETFL, os.O_NONBLOCK)
    # Call IO greenthreads
    in_thread = eventlet.spawn(self.in_thread)
    out_thread = eventlet.spawn(self.out_thread)
    out_thread.wait()
    out_thread.kill()
    # Wait for process to terminate
    rpid = 0
    while rpid == 0:
        # WNOHANG makes waitpid return (0, 0) while the child is still
        # running, so the loop polls instead of blocking every greenthread
        rpid, status = os.waitpid(self.pid, os.WNOHANG)
        eventlet.sleep(0.01)
    in_thread.wait()
    in_thread.kill()
    os.close(self.fd)

Slide 41

Slide 41 text

Attempt Three Use subprocess, but with a wrapper Wrapper exposes pty over stdin/stdout Significantly more reliable
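
The slides do not show the wrapper itself. As a hedged sketch of the general idea, the standard library's `pty.spawn()` can play that role: it runs a command inside a fresh PTY and relays the PTY to the wrapper's own stdin and stdout, so the parent only ever deals with ordinary subprocess pipes. The `WRAPPER` one-liner and `run_in_pty` are hypothetical names, and this assumes a Unix system with PTY support.

```python
import subprocess
import sys

# Hypothetical wrapper program: runs its arguments inside a new PTY and
# copies data between that PTY and its own stdin/stdout.
WRAPPER = "import pty, sys; pty.spawn(sys.argv[1:])"


def run_in_pty(argv, timeout=30):
    """Run argv under the wrapper and return the bytes it wrote to the PTY."""
    result = subprocess.run(
        [sys.executable, "-c", WRAPPER] + list(argv),
        stdin=subprocess.DEVNULL,   # no interactive input in this sketch
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
    )
    return result.stdout


# The child now believes it is talking to a terminal.
output = run_in_pty(
    [sys.executable, "-c", "import sys; print(sys.stdout.isatty())"]
)
print(output)
```

The terminal layer turns the child's newline into a carriage return plus newline, so callers that compare output should strip or normalise line endings.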

Slide 42

Slide 42 text

Lesser-Known Modules They just want to be your friend.

Slide 43

Slide 43 text

The resource module Lets you set file handle, nproc, etc. limits Lets you discover limits, too
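
A minimal sketch of both halves, discovering a limit and setting one, using the open-file limit as the example. The function name and the figure of 512 are arbitrary choices for illustration; a hosting platform would more likely apply limits in the child process just before exec.

```python
import resource


def cap_open_files(max_files):
    """Set the soft open-file limit to max_files (never above the hard limit).

    Returns the (soft, hard) pair in force afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard == resource.RLIM_INFINITY:
        new_soft = max_files
    else:
        new_soft = min(max_files, hard)
    # Only the soft limit changes; raising the hard limit would need privileges.
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


print(cap_open_files(512))
```

`resource.RLIMIT_NPROC` works the same way for the process-count limit mentioned on the slide.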

Slide 44

Slide 44 text

The signal module Want to catch Ctrl-C in a sane way? We use it to quit cleanly on SIGTERM Can set handlers for most signals
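
One way to quit cleanly on SIGTERM, sketched under the assumption that the daemon has a main loop it can leave when a flag is set. The `GracefulExit` class is a hypothetical helper, not code from ep.io, and handlers can only be installed from the main thread.

```python
import os
import signal


class GracefulExit:
    """Record that a termination signal arrived instead of dying mid-task."""

    def __init__(self, signals=(signal.SIGTERM, signal.SIGINT)):
        self.requested = False
        # Remember the previous handlers so they can be put back later.
        self.previous = {
            sig: signal.signal(sig, self._handle) for sig in signals
        }

    def _handle(self, signum, frame):
        # Keep the handler tiny: set a flag and let the main loop notice it.
        self.requested = True

    def restore(self):
        for sig, handler in self.previous.items():
            if handler is not None:
                signal.signal(sig, handler)


stopper = GracefulExit()
os.kill(os.getpid(), signal.SIGTERM)  # simulate `kill <pid>` from outside
print(stopper.requested)
stopper.restore()
```

A main loop would then read `while not stopper.requested:` and run its shutdown code after the loop exits.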

Slide 45

Slide 45 text

The atexit module Not terribly useful most of the time Used in our command-line admin client
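
The slide does not say what the admin client registers, so the cleanup function below is purely an assumption. The sketch runs a tiny child program to show that a function registered with `atexit.register` runs after the main code finishes.

```python
import subprocess
import sys

# Child program: registers a cleanup hook, then does its "work".
CHILD = """
import atexit

def cleanup():
    print("cleanup ran")

atexit.register(cleanup)
print("working")
"""


def run_child():
    """Run CHILD in a fresh interpreter and return its output lines."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


print(run_child())
```

Handlers run on normal interpreter exit, including after an unhandled exception, but not when the process is killed by a signal it does not handle.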

Slide 46

Slide 46 text

The shlex module Implements a shell-like lexer shlex.split("command string") gives you arguments for os.exec
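
A short sketch of the split step. The command strings are invented for the example, and `subprocess.run` stands in for the `os.exec*` family here because exec would replace the current process.

```python
import shlex
import subprocess
import sys


def run_command_line(command):
    """Split a shell-style string into argv and run it without a shell."""
    argv = shlex.split(command)
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


# Quoting is honoured, so an argument containing a space stays in one piece.
print(shlex.split('manage.py syncdb --settings "my settings.py"'))
# → ['manage.py', 'syncdb', '--settings', 'my settings.py']
```

The same list can be handed to `os.execvp(argv[0], argv)` when the current process really should be replaced.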

Slide 47

Slide 47 text

The fcntl module The portal to a dark world of Unix We use it for fiddling blocking modes Also contains leases, signals, dnotify, creation flags, and pipe fiddling
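
A minimal sketch of "fiddling blocking modes" on a pipe, with hypothetical helper names. It reads the current flags first and ORs in `O_NONBLOCK`, because `F_SETFL` replaces the descriptor's status flags with whatever value it is given.

```python
import fcntl
import os


def set_nonblocking(fd):
    """Add O_NONBLOCK to fd's status flags, keeping the flags already set."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)


def try_read(fd, size=1024):
    """Return available bytes, or None if reading would block."""
    try:
        return os.read(fd, size)
    except BlockingIOError:
        return None


read_end, write_end = os.pipe()
set_nonblocking(read_end)
print(try_read(read_end))   # nothing written yet, so no data
os.write(write_end, b"hi")
print(try_read(read_end))   # the two bytes just written
```

A greened I/O loop would treat the `None` case as the cue to yield to the hub until the descriptor becomes readable.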

Slide 48

Slide 48 text

Closing Remarks Because stopping abruptly is bad.

Slide 49

Slide 49 text

Adopting fresh technologies can be a pain. Eventlet, ZeroMQ, new Redis are all young OS packaging and bugs not always fully worked out.

Slide 50

Slide 50 text

Don't reinvent the wheel, or optimize prematurely. Old advice, but still good. You really don't want to solve things the kernel solves already.

Slide 51

Slide 51 text

Reinvent the wheel, occasionally Don't necessarily use it Helps you to understand the problem Sometimes it's better (e.g. our balancer)

Slide 52

Slide 52 text

Python is really very capable It's easy to develop and maintain It's not too slow for most jobs There's always PyPy...

Slide 53

Slide 53 text

Questions? Andrew Godwin [email protected] @andrewgodwin