
Open Source Infrastructure


Like any other organization, CakePHP has applications and services it deploys to support the organization. Unlike many companies, its developers are unpaid volunteers in many timezones. How does CakePHP manage its infrastructure in a transparent, distributed fashion? This talk will cover tips and tricks we use to minimize our server overhead while keeping everyone informed of how things work.


Jose Diaz-Gonzalez

June 29, 2017


  1. Lightning Talk: Open Source Infrastructure June 10, 2017

  2. About Me • Systems Engineer @ SeatGeek • Cake Core

    Developer
  3. Some Stats • 25 Core Developers • 5 Different Continents

    • 15+ timezones • 12+ languages
  4. What do we do? • A butcher • A baker

    • A candlestick-maker
  5. What do we do? • Write and translate documentation •

    Maintain existing CakePHP Websites • Investigate new core/community initiatives • Provide support via chat/email/forums • Wrangle Social Media
  6. What day jobs do we have? • Car Parts Salesman

    • Company Owner • Professional Dancer • Software Developer
  7. We have real jobs, With real lives, and concerns other

    than: Is the docs site up?
  8. We have real jobs, With real lives, and concerns other

    than: Did the server get hacked?
  9. We have real jobs, With real lives, and concerns other

    than: Why is the server down?
  10. We have real jobs, With real lives, and concerns other

    than: Who can deploy the bakery?
  11. What is the problem? We need to ensure that

    the CakePHP Sites and related services are highly available to our users with minimal interference
  12. What does this even mean?

  13. • Centralized Logging • Server Metrics and APM • Authentication

    and ACL for infrastructure access • Backups (and backup testing!) • Scaling • Disaster Recovery
  14. All things that are full time jobs

  15. At normal, paid institutions

  16. For teams of dedicated systems engineers

  17. None
  18. Why is this even a problem?

  19. Moar background • 5-10 people online at any time •

    Might be busy with paid work, side projects, or life • Some with tons of infra experience, most with none • Language Barrier • Little to no onboarding time
  20. Can we pay for a service? • CakeDC isn’t made

    out of money • Services are expensive • Still require maintenance, onboarding
  21. What about our systems engineers? • Full time jobs/burnout •

    Different tech than day job • May not be available
  22. Choosing Technology Because technology solves everything™

  23. Solves the problem Not creating interesting ones for the hell

    of it
  24. Familiar to maintainers Or at least some of them

  25. Quick to pick up For those for whom the tech

    is new
  26. Boring Choice is the Best Choice

  27. Easy to extend Needs change, so should infra

  28. Infra as code Why is this setting there? Who applied

    this change?
  29. Configuration Management with Ansible

  30. But why tho? • Everyone can read YAML • Low

    learning curve, lots of tutorials • Maps well to existing server tasks • Easy to write custom modules
  31. But why not tho? • Everyone needs SSH access •

    Repo credentials are in the open, even if encrypted • YAML sucks as an automation language • Moves fast and breaks things
  32. Continuous Integration Via Jenkins

  33. But why tho? • Everyone has used it, everyone hates

    it equally • Jobs can be generated via Groovy DSL • Deployable via Docker • Plugins for everything
  34. But why not tho? • Ecosystem is constantly moving •

    Default UI isn’t great • Really easy to use/abuse plugins and non-pipeline jobs
  35. But why not CircleCI/TravisCI/Wercker? • Expensive • Jobs are usually

    attached to a single repo • Hard to do OSS with secure secrets • At the whim of service providers
  36. Automated Deployments Using Dokku

  37. But why tho? • Already built • Integrates with Ansible,

    and mostly unattended • Designed with Docker in mind • Has an internal Champion • OSS
  38. But why not other solutions? • Does not need to be

    clustered • We can withstand 30 min of downtime during restores • Setup/training costs much larger for K8s/Mesos/Nomad • Custom scripting not necessary, can use Heroku Buildpacks • No need to reinvent the wheel and write build/release code
  39. Considerations For the considerate

  40. Access Control • Everyone has access, but is that appropriate?

    • Smaller circle of trust makes infra control easier, but harder to deal with in distributed context
  41. Access Control • Passwords and keys need to be decrypted

    • Web of trust for initial access must be established
  42. Access Control • Use strong authentication, prefer keys to passwords

    • SSH Session Auditing?
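One way to check the "prefer keys to passwords" advice is to scan an OpenSSH server config for the relevant options. The sketch below is illustrative and not from the talk: the function names and the sample inputs are hypothetical, and only the `PasswordAuthentication` and `PubkeyAuthentication` option names are real OpenSSH settings. It assumes a flat config with whitespace-separated options and no `Match` blocks or `Include` directives.

```python
def parse_sshd_config(text):
    """Return a dict of lowercased option name -> first value seen.

    sshd uses the first occurrence of each keyword, so later
    duplicates are ignored here as well.
    """
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # keyword without a value; nothing to record
        key, value = parts[0].lower(), parts[1].strip()
        options.setdefault(key, value)
    return options


def audit_sshd_config(text):
    """Return a list of findings; an empty list means nothing was flagged."""
    options = parse_sshd_config(text)
    findings = []
    # OpenSSH treats PasswordAuthentication as "yes" when it is unset.
    if options.get("passwordauthentication", "yes").lower() != "no":
        findings.append("password logins are enabled")
    # PubkeyAuthentication is also "yes" when unset.
    if options.get("pubkeyauthentication", "yes").lower() != "yes":
        findings.append("public key logins are disabled")
    return findings
```

A check like this could run from CI against the config kept in the infrastructure repo, so a change that re-enables password logins is caught in review rather than on the server.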
  43. Logging • Logs should be aggregated • Use the same

    format, JSON or logfmt • ISO 8601: don’t make up datetime formats
  44. Logging level=info datetime=2017-06-10T08:01:40+00:00 msg="Stopping all fetchers"

  45. Logging Add metadata to make logs useful

  46. Logging level=info datetime=2017-06-10T08:01:40+00:00 msg="Stopping all fetchers" id=ConsumerFetcherManager-1382721708341 tag=stopping_fetchers module=kafka.consumer.ConsumerFetcherManager

  47. Logging • Self-hosted logging can be cheaper, but more labor

    intensive • Ship first, filter logs later
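The logging slides above can be sketched in a few lines of Python: emit logfmt lines with an ISO 8601 timestamp and extra metadata, then filter them after they have been shipped. This is a minimal sketch, not the tooling used by the CakePHP team; the helper names `format_logfmt`, `parse_logfmt`, and `filter_by_level` are hypothetical, and the quoting rules are a simplification chosen so that `shlex` can read the lines back.

```python
import shlex
from datetime import datetime, timezone

# Characters that force a value to be wrapped in double quotes.
NEEDS_QUOTES = set(" \"'=\\")


def format_logfmt(level, msg, when=None, **metadata):
    """Build one logfmt line: level, ISO 8601 datetime, msg, then metadata."""
    when = when or datetime.now(timezone.utc)
    fields = {
        "level": level,
        "datetime": when.isoformat(timespec="seconds"),
        "msg": msg,
    }
    fields.update(metadata)
    parts = []
    for key, value in fields.items():
        text = str(value)
        if text == "" or NEEDS_QUOTES.intersection(text):
            escaped = text.replace("\\", "\\\\").replace('"', '\\"')
            text = f'"{escaped}"'
        parts.append(f"{key}={text}")
    return " ".join(parts)


def parse_logfmt(line):
    """Split a logfmt line back into a dict of strings."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # ignore bare words that have no "="
            fields[key] = value
    return fields


def filter_by_level(lines, level):
    """Keep only the lines whose level field matches."""
    return [line for line in lines if parse_logfmt(line).get("level") == level]
```

Because every line shares one format, filtering can happen after shipping: the producer writes everything, and the consumer decides what to keep.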
  48. Monitoring • Our site needs to be globally available, does

    yours? • Server metrics via Diamond/StatsD/Graphite • Visualization via Grafana
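For the StatsD part of that pipeline, a metric is a short text line sent over UDP. The sketch below shows the plain StatsD line format (`name:value|type`) and a fire-and-forget sender; the function names and the metric name `bakery.requests` are hypothetical, and the host and port are assumptions (8125 is the conventional StatsD port, but a given deployment may differ).

```python
import socket

# Metric types covered by this sketch: counter, gauge, timer (milliseconds).
METRIC_TYPES = ("c", "g", "ms")


def statsd_line(name, value, metric_type):
    """Format one metric in the StatsD text protocol."""
    if metric_type not in METRIC_TYPES:
        raise ValueError(f"unsupported metric type: {metric_type!r}")
    return f"{name}:{value}|{metric_type}"


def send_metric(line, host="127.0.0.1", port=8125):
    """Send one metric line over UDP; delivery is not confirmed."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


if __name__ == "__main__":
    # Count one request and record a 320 ms response time.
    send_metric(statsd_line("bakery.requests", 1, "c"))
    send_metric(statsd_line("bakery.response_time", 320, "ms"))
```

UDP keeps the application from blocking on the metrics server, at the cost of silently dropping data when the collector is down.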
  49. Monitoring • APM can be expensive • Site Speed and

  50. Backups Do Them and Verify Them

  51. Backupss • Backups go to Rackspace Cloud, manually cleared •

    Manual verification performed semi-monthly • Automated backup verification coming up
  52. Backupsss • No Offsite Backups • Backups not encrypted (yet)

    • No Disaster Recovery
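The slides do not say how the automated backup verification will work. As one possible starting point, the sketch below checks that a backup file exists, is non-empty, and matches a SHA-256 digest recorded when the backup was taken. All names here (`sha256_of`, `verify_backup`, the idea of a stored digest) are assumptions for illustration. A matching checksum only shows the file was not corrupted in storage or transit; it does not prove the backup restores, which still needs a test restore.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(path, expected_sha256):
    """Return (ok, reason) for a single backup file."""
    path = Path(path)
    if not path.is_file():
        return False, "backup file is missing"
    if path.stat().st_size == 0:
        return False, "backup file is empty"
    if sha256_of(path) != expected_sha256.lower():
        return False, "checksum mismatch"
    return True, "ok"
```

Run on a schedule from CI, a check like this would turn the manual verification into an alert when a backup is missing or damaged.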
  53. Questions?

  54. Thank you