The Perils of Writing a PaaS

A talk I gave at London Devops in May of 2011.

Andrew Godwin

May 10, 2011

Transcript

  1. The Perils of Writing a PaaS Andrew Godwin http://www.flickr.com/photos/jannem/2719976702/

  2. Hi, I'm Andrew. Serial Python developer; Django core committer; sysadmin by night
  3. We're ep.io. Python Platform-as-a-Service; utility billing; PostgreSQL, Redis, Celery, and more
  4. We built a… prototype. Me and Ben Firshman; three or four days' hacking at DjangoCon; ran code, had simple deployment
  5. The last 10%... A month or two of hibernation; went part-time in December; private beta since February; public launch later this year
  6. Why? Why not?

  7. Why? Why not? Lack of good solutions; strong, technical team; writing backend code is fun
  8. It's a challenge. We're still a closed beta; 300+ apps, on 4 servers; some people just have crazy code; security, security, security
  9. Our Architecture

  10. ep.io Cloud Request Sugar XML Response Code Magic

  11. [Diagram] Balancer; Runner, Runner, Runner; App 1, App 2, App 3, App 2, App 4, App 1; Databases; File Storage
  12. Load Balancer: started with HAProxy; moved to custom Python load balancer; still needs refinement
  13. Runners: daemon on each machine; Nginx + gunicorn for each app instance; output captured, CPU time measured
  14. Coordinator: analyses whole system; juggles apps between servers; detects dead servers
  15. PostgreSQL: normal PostgreSQL 9 install; daemon to read query logs, make users
  16. Redis: custom Redis load balancer/manager; starts processes on demand; handles multi-user security
  17. Upload Receiver: SSH endpoint for git, hg, commands; wraps VCSs, extracts uploaded files; creates filesystem images
  18. Other Services: log aggregation; UID assignment; calculate costs

  19. Statistics: queued in Redis; consumed asynchronously; currently stored in Redis, changing soon; graphed and profiled
  20. Configuration Management: Puppet for the simpler stuff; daemons handle complex stuff; don't try to reinvent the wheel
  21. Monitoring Nagios SaaS monitoring Nagios Emails, texts, pager Several custom checks
  22. Backups: currently just rdiff-backup; moving to btrfs snapshots + DRBD; HA is not a backup solution
  23. Perils

  24. Initial bad design (To be fair, it was a prototype)

  25. Networks really aren't reliable (Well, EC2's, at least.)

  26. Memory pressure is bad (Prepare to have a fallback. And another.)
  27. Raw file handles are… fun. (As is the PTY subsystem. Be very careful.)
  28. Write just enough automation (If a server dies, I now just go and get a drink)
  29. HAProxy doesn't like 500+ backends (it's not exactly common)

  30. Single redundancy is only so good (and remember, HA is not backups!)
  31. Future Perils

  32. Payment (Already underway, still hard)

  33. Oversized Sites (we need to get a lot bigger first)

  34. European Servers (people really do want them)

  35. More Databases (how on earth do you measure MongoDB use?)

  36. More Languages (easy to get it working, hard to polish)

  37. The Potential Big Outage (quite useful as a motivational tool)

  38. Thank you. Andrew Godwin @andrewgodwin andrew@ep.io
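
Slide 14 describes a coordinator that detects dead servers and juggles apps between servers, but the slides do not show how ep.io implements it. The following is only a minimal sketch of one way such a component might work, assuming each runner reports a heartbeat timestamp and that apps from a dead server move to whichever live server currently hosts the fewest apps. The names `find_dead_servers`, `reassign_apps`, `HEARTBEAT_TIMEOUT`, and the `runner-N` server names are all hypothetical and do not come from the talk.

```python
import time

# Hypothetical threshold: a server is treated as dead if it has been
# silent for longer than this many seconds.
HEARTBEAT_TIMEOUT = 30.0


def find_dead_servers(last_seen, now, timeout=HEARTBEAT_TIMEOUT):
    """Return the sorted names of servers whose last heartbeat is too old.

    last_seen maps server name -> timestamp of its most recent heartbeat.
    """
    return sorted(
        name for name, seen in last_seen.items() if now - seen > timeout
    )


def reassign_apps(placement, dead):
    """Move apps off dead servers onto the least-loaded live server.

    placement maps server name -> list of app names running there.
    Returns a new mapping that contains only the live servers.
    """
    live = {s: list(apps) for s, apps in placement.items() if s not in dead}
    if not live:
        raise RuntimeError("no live servers left to host apps")
    for server in dead:
        for app in placement.get(server, []):
            # Pick the live server with the fewest apps; ties are broken
            # by server name so the result is deterministic.
            target = min(live, key=lambda s: (len(live[s]), s))
            live[target].append(app)
    return live


if __name__ == "__main__":
    now = time.time()
    last_seen = {
        "runner-1": now - 5,
        "runner-2": now - 120,  # silent for two minutes
        "runner-3": now - 1,
    }
    placement = {
        "runner-1": ["app1", "app2"],
        "runner-2": ["app3", "app4"],
        "runner-3": ["app1"],
    }
    dead = find_dead_servers(last_seen, now)
    print("dead:", dead)
    print("new placement:", reassign_apps(placement, dead))
```

This sketch counts apps rather than measuring CPU time or memory, and it does not try to keep replicas of the same app on different servers. A real coordinator would presumably weigh both.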