
The Perils of Writing a PaaS

A talk I gave at London Devops in May of 2011.

Andrew Godwin

May 10, 2011

Transcript

  1. The Perils of Writing a PaaS
    Andrew Godwin
    http://www.flickr.com/photos/jannem/2719976702/

  2. Hi, I'm Andrew.
    Serial Python developer
    Django core committer
    Sysadmin by night

  3. We're ep.io
    Python Platform-as-a-Service
    Utility billing
    PostgreSQL, Redis, Celery, and more

  4. We built a… prototype.
    Me and Ben Firshman
    Three or four days' hacking at DjangoCon
    Ran code, had simple deployment

  5. The last 10%...
    A month or two of hibernation
    Went part-time in December
    Private beta since February
    Public launch later this year

  6. Why?
    Why not?

  7. Why?
    Why not?
    Lack of good solutions
    Strong, technical team
    Writing backend code is fun

  8. It's a challenge
    We're still a closed beta
    300+ apps, on 4 servers
    Some people just have crazy code
    Security, security, security

  9. Our Architecture

  10. ep.io Cloud
    Request
    Sugar
    XML
    Response
    Code Magic

  11. Balancer
    Runner, Runner, Runner
    App 1, App 2, App 3, App 2, App 4, App 1
    Databases, File Storage

  12. Load Balancer
    Started with HAProxy
    Moved to custom Python load balancer
    Still needs refinement
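
The slides do not say how the custom balancer picks a backend, so the following is only a minimal sketch of one common approach: a per-hostname round-robin over that app's instances. The class name `RoundRobinBalancer`, its methods, and the addresses are hypothetical and not taken from ep.io.

```python
import itertools


class RoundRobinBalancer:
    """Maps a hostname to a rotating list of backend addresses (hypothetical sketch)."""

    def __init__(self):
        self._backends = {}  # host -> list of (ip, port) tuples
        self._cycles = {}    # host -> endless iterator over that list

    def set_backends(self, host, backends):
        # Replace the backend list for a host and restart its rotation.
        self._backends[host] = list(backends)
        self._cycles[host] = itertools.cycle(self._backends[host])

    def pick(self, host):
        # Return the next backend for this host, or None if none are known.
        if not self._backends.get(host):
            return None
        return next(self._cycles[host])
```

A dictionary lookup per request keeps the cost of routing independent of the total number of apps, which is one plausible reason to prefer it once the backend count grows into the hundreds.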

  13. Runners
    Daemon on each machine
    Nginx + gunicorn for each app instance
    Output captured, CPU time measured
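
As a rough illustration of "output captured, CPU time measured", the sketch below runs a short-lived child process, captures its output, and reads the CPU time it consumed. It assumes a Unix host, and the function name `run_and_measure` is hypothetical; a long-running app server would need a different measurement approach than the one shown here.

```python
import resource
import subprocess
import sys


def run_and_measure(argv):
    """Run a command to completion; return (stdout, stderr, cpu_seconds)."""
    # RUSAGE_CHILDREN covers children that have exited and been waited for,
    # so the difference across the call is the CPU time of this child.
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    proc = subprocess.run(argv, capture_output=True, text=True)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu_seconds = (after.ru_utime - before.ru_utime) + (
        after.ru_stime - before.ru_stime
    )
    return proc.stdout, proc.stderr, cpu_seconds


if __name__ == "__main__":
    out, err, cpu = run_and_measure([sys.executable, "-c", "print('hi')"])
    print(out.strip(), cpu)
```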

  14. Coordinator
    Analyses whole system
    Juggles apps between servers
    Detects dead servers
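
One way to picture the coordinator's job is a heartbeat table plus a placement map: a server that has not reported within a timeout is treated as dead, and its apps move to the least-loaded live server. This is a sketch under those assumptions only; the `Coordinator` class, the 30-second timeout, and the placement policy are hypothetical, not the ep.io implementation.

```python
import time


class Coordinator:
    """Tracks server heartbeats and moves apps off dead servers (hypothetical sketch)."""

    def __init__(self, timeout=30.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_seen = {}  # server name -> time of last heartbeat
        self.placement = {}  # app name -> server name

    def heartbeat(self, server):
        self.last_seen[server] = self.clock()

    def dead_servers(self):
        now = self.clock()
        return sorted(
            s for s, seen in self.last_seen.items() if now - seen > self.timeout
        )

    def rebalance(self):
        """Move apps from dead servers to the least-loaded live one."""
        dead = set(self.dead_servers())
        alive = sorted(s for s in self.last_seen if s not in dead)
        moved = {}
        if not alive:
            return moved  # nowhere to move anything
        load = {s: 0 for s in alive}
        for server in self.placement.values():
            if server in load:
                load[server] += 1
        for app, server in sorted(self.placement.items()):
            if server in dead:
                target = min(alive, key=lambda s: (load[s], s))
                self.placement[app] = target
                load[target] += 1
                moved[app] = target
        return moved
```

Passing the clock in as a parameter makes the timeout logic testable without sleeping.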

  15. PostgreSQL
    Normal PostgreSQL 9 install
    Daemon to read query logs, make users

  16. Redis
    Custom Redis load balancer/manager
    Starts processes on demand
    Handles multi-user security

  17. Upload Receiver
    SSH endpoint for git, hg, commands
    Wraps VCSs, extracts uploaded files
    Creates filesystem images

  18. Other Services
    Log aggregation
    UID assignment
    Calculate costs

  19. Statistics
    Queued in Redis
    Consumed asynchronously
    Currently stored in Redis, changing soon
    Graphed and profiled
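
The queue-then-consume pattern on this slide can be sketched without a Redis server by swapping in Python's in-process `queue.Queue` as a stand-in for the Redis queue. Everything here (the `StatsPipeline` class, the event shape, the metric names) is hypothetical and only illustrates producers enqueueing cheaply while a separate worker aggregates.

```python
import queue
import threading
from collections import defaultdict


class StatsPipeline:
    """Producers enqueue events; a background worker aggregates them."""

    def __init__(self):
        self._events = queue.Queue()      # stand-in for a Redis-backed queue
        self.totals = defaultdict(float)  # (app, metric) -> running total
        self._worker = threading.Thread(target=self._consume, daemon=True)
        self._worker.start()

    def record(self, app, metric, value):
        # Cheap for the caller: just enqueue and return.
        self._events.put((app, metric, value))

    def _consume(self):
        while True:
            item = self._events.get()
            try:
                if item is None:  # sentinel: stop the worker
                    return
                app, metric, value = item
                self.totals[(app, metric)] += value
            finally:
                self._events.task_done()

    def close(self):
        # Flush everything queued so far, then stop the worker.
        self._events.put(None)
        self._events.join()
        self._worker.join()
```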

  20. Configuration Management
    Puppet for the simpler stuff
    Daemons handle complex stuff
    Don't try to reinvent the wheel

  21. Monitoring
    Nagios
    SaaS service monitoring Nagios itself
    Emails, texts, pager
    Several custom checks

  22. Backups
    Currently just rdiff-backup
    Moving to btrfs snapshots + DRBD
    HA is not a backup solution

  23. Perils

  24. Initial bad design
    (To be fair, it was a prototype)

  25. Networks really aren't reliable
    (Well, EC2's, at least.)

  26. Memory pressure is bad
    (Prepare to have a fallback. And another.)

  27. Raw file handles are… fun.
    (As is the PTY subsystem. Be very careful.)

  28. Write just enough automation
    (If a server dies, I now just go and get a drink)

  29. HAProxy doesn't like 500+ backends
    (it's not exactly common)

  30. Single redundancy is only so good
    (and remember, HA is not backups!)

  31. Future Perils

  32. Payment
    (Already underway, still hard)

  33. Oversized Sites
    (we need to get a lot bigger first)

  34. European Servers
    (people really do want them)

  35. More Databases
    (how on earth do you measure MongoDB use?)

  36. More Languages
    (easy to get it working, hard to polish)

  37. The Potential Big Outage
    (quite useful as a motivational tool)

  38. Thank you.
    Andrew Godwin
    @andrewgodwin
    [email protected]
