CFEngine on AWS: a Stateless Infrastructure

How Percolate uses CFEngine on AWS to build a resilient infrastructure, and leverage it to cut its operating cost using EC2 spot instances.

Laurent Raufaste

October 15, 2013

Transcript

  1. We use a bunch of servers
     • 5% serving data
     • 10% doing chores
     • 85% working on data
  2. Some history
     • 1993 CFEngine
     • 2003 Puppet
     • 2006 EC2
     • 2008 CFEngine 3
     • 2009 Chef
  3. It’s old
     CFEngine v1 was released in 1993. Like a “teddy bear”, it’s reassuring: it has been used for this long without any big problem (cf. OpenBSD’s “2 holes since 1996”).
  4. Dedicated Language
     The CFEngine DSL has been tailored for this purpose: it carries no legacy and is based on promise theory.
  5. Documented Infrastructure
     • grep the whole cluster
     • what's in there is what's live
     • no need to SSH
     • knowledge is shared
     • history is kept
     • company is more valuable
  6. Scalability
     We want to build for success, not failure.
     We hope what we build will succeed.
  7. Scalability
     • Decentralized by nature
     • Can scale both ways
     • Largest cluster is X00,000s
     • m1.small on AWS
  8. Reusability
     • DRY
     • Build service/server blocks
     • Reuse them on live, staging, dev
     • Change them once and for all
  9. Footprint
     • Package to install is < 3 MB
     • Largest binary is 320 kB (96% C, 3% C++)
     • The server just lets clients download policies
     • Clients try to apply the policies locally
  10. It’s GPL
      It’s free (libre) and always will be. It’s in Debian, so it passed the DFSG test: that is the fastest way to check.
  11. Open & active community
      You can open bug reports and submit Pull Requests on GitHub, a must nowadays.
  12. Same infrastructure on all environments
      Live policies are used to build staging, with smaller & fewer instances, and it’s always up to date.
  13. Same infrastructure on all environments
      It takes a few minutes to get a small replica of live on your workstation, and it’s always up to date.
  14. GitHub Flow applied to Ops
      • Develop in a branch
      • Test (Vagrant)
      • Review (Pull Request)
      • Merge
      • Deploy
  15. Ops use IaaS+Metal to provide a PaaS to devs
      Be the Heroku or the GAE of your team.
  16. Bootstrapping
      CFEngine is missing the bootstrap process, but is that really its job? We did it in-house, in Python/Bash.
  17. Bootstrapping
      • Request an instance
      • Name it
      • Install CFEngine
      • CFEngine handles the rest
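The bootstrapping steps on slide 17 can be sketched as follows. This is not the in-house Python/Bash tool the talk refers to, only a minimal illustration under stated assumptions: the instance runs Ubuntu, the Debian/Ubuntu package `cfengine3` is used, and the installed CFEngine version accepts `cf-agent --bootstrap <policy server>`. The function name `build_user_data`, the hostname and the policy server address are hypothetical. Requesting the instance itself is left to whichever EC2 client is in use; the generated script would be handed over as user data.

```python
import shlex


def build_user_data(hostname: str, policy_hub: str) -> str:
    """Build a first-boot shell script: name the host, install CFEngine, bootstrap it.

    Both arguments are illustrative; the naming scheme and the policy
    server address are assumptions, not taken from the talk.
    """
    name = shlex.quote(hostname)   # guard against shell metacharacters
    hub = shlex.quote(policy_hub)
    lines = [
        "#!/bin/bash",
        "set -euo pipefail",
        "# Name it: the name is what later tells CFEngine what this server is.",
        f"echo {name} > /etc/hostname",
        f"hostname {name}",
        "# Install CFEngine from the distribution package (assumed: cfengine3).",
        "apt-get update",
        "apt-get install -y cfengine3",
        "# Point the agent at the policy server; CFEngine handles the rest.",
        f"cf-agent --bootstrap {hub}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_user_data("web-live-01", "10.0.0.10"))
```

Keeping the script this small matches the idea on the slide: bootstrapping only has to get the agent running under the right name, and everything else is expressed as policy.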
  18. We don’t use CFEngine for complex stuff
      3 ordered dependencies max, e.g. “Hell” or deploying a Python app with on-demand pip requirements.
  19. Naming convention to leverage CFEngine classes
      • Our DNS is our inventory
      • We leverage it with a coordination service (AWS Tags (does not scale), ZooKeeper, …)
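To illustrate how a naming convention can drive classes (slide 19), here is a small sketch. The convention `<role>-<environment>-<index>` and the class prefixes `role_` and `env_` are assumptions made for the example; the talk does not spell out the actual scheme. The `canonify` helper mirrors the way CFEngine turns arbitrary strings into legal class names by replacing every character outside letters, digits and underscore with an underscore.

```python
import re


def canonify(text: str) -> str:
    """Replace every character that is not a letter, digit or underscore with '_'."""
    return re.sub(r"[^A-Za-z0-9_]", "_", text)


def classes_from_hostname(fqdn: str) -> set:
    """Derive class names from a host name of the (hypothetical) form role-env-index."""
    short = fqdn.split(".", 1)[0]          # drop the DNS domain
    parts = short.split("-")
    if len(parts) != 3:
        raise ValueError(f"{short!r} does not follow role-env-index")
    role, env, _index = parts
    return {
        canonify(short),                   # the host itself
        f"role_{canonify(role)}",          # specialized layer
        f"env_{canonify(env)}",            # basic layer
    }


if __name__ == "__main__":
    # prints ['env_live', 'role_web', 'web_live_01']
    print(sorted(classes_from_hostname("web-live-01.example.com")))
```

With a scheme like this, the name given at bootstrap time is enough to select which policies apply, and DNS doubles as the inventory.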
  20. Server Structure
      • Application layer
      • CFE: Specialized layer (Role)
      • CFE: Basic layer (Environment)
      • Pristine Ubuntu
      • EC2
  21. CFEngine does not take care of it all
      It makes sure the complex pieces are there and operational.
  22. We started with the simple and obvious
      syslog, smtp, ... you don’t want to fail big.
  23. We finished with the critical
      When we reached the big stuff, it was easy, and we had all the bricks to reuse.
  24. Our infrastructure has no state
      • Policies in git
      • App code in git
      • Data in datastores
      • No backup: Images are cache
  25. No instance backup at all?
      2 exceptions:
      • S3 for cryptic generated config files (Jenkins)
      • EBS for large, non-vital, changing data (RabbitMQ)
  26. We are independent
      No state is left on AWS (no AMI), so we can migrate away for better prices, stability, features, mood.
  27. We know and hear everything
      But tell everyone to shut up (email). When something happens, you'll know. Your goal is silence: 0 email.
  28. We don’t push to deploy
      Pushing does not scale. We update the live version and every server updates itself. You can do this if your infrastructure is limpid, CFEnginized.
  29. We are resilient
      Anything can go down; it will come back up and rebuild itself automatically. It happens nightly.
  30. We can change our shape
      Upgrading a server takes 2 commands:
      1. Launch a beefier instance with the same name
      2. Kill the weak one
  31. We use spot instances, it’s cheap!
      We can launch and kill any server anytime. It happens while we sleep.
  32. Buy Diego’s book
      Don’t bother with anything else; it will give you the “I understand” feeling we all love.