CFEngine on AWS: a Stateless Infrastructure

Laurent Raufaste @_LR_

Hello Ops

I work at Percolate

We are a tech company
Percolate helps brands create content at a social scale

We live in the cloud
We are a SaaS

We use a bunch of servers
• 5% serving data
• 10% doing chores
• 85% working on data

Those 85% do
• ingest data
• digest data
• close to real time

It’s expensive
We need to act smart to keep the business sustainable

CFEngine

#WTF is #CFE ?
A tool to gently "dictate" what your infrastructure should be #dontgetmadmarkburgess

Some history
• 1993 CFEngine
• 2003 Puppet
• 2006 EC2
• 2008 CFEngine 3
• 2009 Chef

A simple example
Our Redis policy

Why CFEngine ?
Chef, Puppet, Ansible

Convergence
It keeps its promises whenever it can. No need to start from a known state.
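A toy sketch of what convergence means, in Python rather than the CFEngine DSL. The `converge` function and the state keys are hypothetical names made up for illustration; the only point is that each run compares the actual state to the desired one and repairs what differs, so the starting state does not matter and repeated runs are harmless.

```python
def converge(actual, desired):
    """Repair only the settings that differ from the desired state.

    `actual` is modified in place. Returns the list of keys that were repaired.
    Both dictionaries are illustrative stand-ins for a real system's state.
    """
    repaired = []
    for key, wanted in desired.items():
        if actual.get(key) != wanted:
            actual[key] = wanted
            repaired.append(key)
    return repaired


# Hypothetical desired state and two very different starting points.
desired = {"redis_installed": True, "redis_running": True}
fresh_host = {}
drifted_host = {"redis_installed": True, "redis_running": False}

first_run_fresh = converge(fresh_host, desired)
first_run_drifted = converge(drifted_host, desired)
second_run_fresh = converge(fresh_host, desired)  # nothing left to repair
```

Both hosts end up in the same state, and the second run on an already converged host repairs nothing.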

Portability
Same policies on Solaris, GNU/Linux, *BSD, AIX, HP-UX, Windows, OSX, …

It’s old
CFEngine v1 was released in 1993. Like a “teddy bear”, it’s reassuring: it has been used for this long without any big problem, cf. OpenBSD’s “2 holes since 1996”.

Here come the deal breakers
Let’s focus

Dedicated Language
The CFEngine DSL has been tailored for this purpose: no legacy, and based on promise theory.

Documented Infrastructure
Solves the outdated and useless doc problem

Documented Infrastructure
• grep the whole cluster
• what's in there is what's live
• no need to SSH
• knowledge is shared
• history is kept
• company is more valuable

Scalability
We want to build for success, not failure. We hope what we build will succeed.

Scalability
• Decentralized by nature
• Can scale both ways
• Largest cluster is X00,000s
• m1.small on AWS

Reusability
It lets us build things that last and can be reused

Reusability
• DRY
• Build service/server blocks
• Reuse them on live, staging, dev
• Change them once and for all

Footprint
It’s tailored for the job

Footprint
• Package to install is < 3MB
• Largest binary is 320kB (96% C, 3% C++)
• The server just lets clients download policies
• Clients try to apply the policies locally

It’s GPL
It’s free (libre) and always will be. It’s in Debian, so it passed the DFSG test: the fastest way to check.

Open & active community
You can open bug reports and submit Pull Requests on GitHub, a must nowadays

Here’s what CFEngine allows us to do

Pwn our infrastructure
We don’t let it pwn us

Normalized Infrastructure
Minimize redundancy and dependency

Being unpredictable, it’s fun
Like the Netflix Chaos Monkey, I randomly kill instances

Maintain costs
2011-2013: Employees x10, Clients x20, Servers x2, Infrastructure cost x1.2

Keep your infrastructure homogeneous
Don’t let exceptions waste your time

Not scared of changes
Ops should not slow things down

Ops at Percolate

Ops are not DevOps
Ops are sysadmins that do their job well: Build+Automate+Maintain+Monitor+Document

Devs are not DevOps
Ask your devs for the commands, then turn them into a policy

Same infrastructure on all environments
Live policies are used to build staging, with smaller and fewer instances, and it’s always up to date

Same infrastructure on all environments
It takes a few minutes to get a small replica of live on your workstation, and it’s always up to date

GitHub Flow applied to Ops
• Develop in a branch
• Test (Vagrant)
• Review (Pull Request)
• Merge
• Deploy

Be the Heroku or the GAE of your team
Ops use IaaS+Metal to provide a PaaS to devs

Pieces we added around CFEngine
It does not solve it all

Bootstrapping
CFEngine is missing a bootstrap process, but is that really its job? We built ours in-house, in Python/Bash.

Bootstrapping
• Request an instance
• Name it
• Install CFEngine
• CFEngine handles the rest

Bootstrapping
We define all our servers in an INI file

Bootstrapping
Everything can be overridden per instance type

Bootstrapping
Easy to define, easy to launch
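The slides do not show the INI file itself, so here is a minimal sketch of how such an inventory could work, using Python's standard `configparser`. The section names, keys and values are all assumptions made up for illustration; the mechanism shown is only that a `[DEFAULT]` section supplies values that any server section can override.

```python
import configparser

# Hypothetical inventory: none of these names or values come from the talk.
INVENTORY = """
[DEFAULT]
instance_type = m1.small
count = 1
image = ubuntu-base

[worker.live]
instance_type = m1.large
count = 4

[smtp.live]
"""


def load_servers(text):
    """Return {server name: settings}, with defaults filled in per section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    servers = {}
    for name in parser.sections():  # sections() skips [DEFAULT]
        section = parser[name]
        servers[name] = {
            "instance_type": section["instance_type"],
            "count": section.getint("count"),
            "image": section["image"],
        }
    return servers


servers = load_servers(INVENTORY)
```

Here `worker.live` overrides the instance type and count, while `smtp.live` is declared with an empty section and inherits every default.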

We don’t use CFEngine for complex stuff
3 ordered dependencies max. Examples of what we keep out of it: “Hell”, or deploying a Python app with on-demand pip requirements.

Naming convention to leverage CFEngine classes
• [id.][subrole.]role.environment
• smtp.live.com
• i-1ab345.worker.live.com
• i-23f432.api.staging.com
• lb.api.staging.com
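A small Python sketch of how a name following this convention can be split into its parts. Several things here are assumptions rather than anything shown in the talk: that the trailing `com` is a domain suffix to strip, that an id can be told apart from a subrole because it looks like an EC2 instance id, and the `role_`, `env_`, `subrole_` and `id_` class prefixes, which are hypothetical.

```python
import re

# Assumption: ids look like EC2 instance ids ("i-" followed by hex digits).
INSTANCE_ID = re.compile(r"^i-[0-9a-f]+$")


def parse_hostname(hostname, domain="com"):
    """Split [id.][subrole.]role.environment.<domain> into its parts."""
    labels = hostname.lower().split(".")
    suffix = domain.split(".")
    if labels[-len(suffix):] != suffix:
        raise ValueError(f"{hostname!r} is not under {domain!r}")
    labels = labels[:-len(suffix)]
    if not 2 <= len(labels) <= 4:
        raise ValueError(f"{hostname!r} does not follow the convention")
    environment = labels.pop()
    role = labels.pop()
    instance_id = labels.pop(0) if labels and INSTANCE_ID.match(labels[0]) else None
    subrole = labels.pop() if labels else None
    if labels:
        raise ValueError(f"{hostname!r} has unexpected extra labels")
    return {"id": instance_id, "subrole": subrole, "role": role, "environment": environment}


def to_classes(parts):
    """Build class-like names, replacing characters that are not letters,
    digits or underscores with "_" (hypothetical prefixes)."""
    names = [f"env_{parts['environment']}", f"role_{parts['role']}"]
    if parts["subrole"]:
        names.append(f"subrole_{parts['subrole']}")
    if parts["id"]:
        names.append(f"id_{parts['id']}")
    return [re.sub(r"[^A-Za-z0-9_]", "_", name) for name in names]


worker = parse_hostname("i-1ab345.worker.live.com")
balancer = parse_hostname("lb.api.staging.com")
```

With names like these, a policy can target a whole environment, a role, or a single instance without keeping a separate inventory file.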

Naming convention to leverage CFEngine classes
• Our DNS is our inventory
• We leverage it with a coordination service (AWS Tags (does not scale), ZooKeeper, …)

Server Structure
• Application layer
• CFE: Specialized layer (Role)
• CFE: Basic layer (Environment)
• Pristine Ubuntu
• EC2

CFEngine does not take care of it all
It takes care of all the basics

CFEngine does not take care of it all
It makes sure the complex pieces are there and operational

We started with the simple and obvious
syslog, smtp, ... you don’t want to fail big

We finished with the critical
When we reached the big stuff, it was easy, and we had all the bricks to reuse

Achievements

Recap of previous benefits
• Documentation
• Scalability
• Reusability
• Easy and fast to change

But our huge win is ...

What’s the big deal ?
Our infrastructure has no state

Our infrastructure has no state
• Policies in git
• App code in git
• Data in datastores
• No backups: images are just a cache

No instance backup at all ?
2 exceptions:
• S3 for cryptic generated config files (Jenkins)
• EBS for large non-vital changing data (RabbitMQ)

We are independent
No state is left on AWS (no AMI), so we can migrate away for better prices, stability, features, or mood.

We know and hear everything
But tell everyone to shut up (email). When something happens, you'll know. Your goal is silence: 0 email.

We don’t push to deploy
Pushing does not scale. We update the live version and every server updates itself. You can do this if your infrastructure is limpid, CFEnginized.
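A toy model of the pull approach described above, in Python. `PolicyHub`, `Node` and the `redis_maxmemory` setting are hypothetical names invented for this sketch, not CFEngine components; it only illustrates that the operator publishes once and each server picks the change up on its own next run.

```python
class PolicyHub:
    """Holds the currently published policy revision (hypothetical model)."""

    def __init__(self):
        self.revision = 0
        self.policy = {}

    def publish(self, policy):
        # The only action the operator takes: update the live version.
        self.revision += 1
        self.policy = dict(policy)


class Node:
    """A server that pulls policy instead of waiting to be pushed to."""

    def __init__(self, name):
        self.name = name
        self.revision = None
        self.state = {}

    def run(self, hub):
        """One agent run: pull and apply only if the hub has a newer revision."""
        if self.revision == hub.revision:
            return False
        self.state = dict(hub.policy)
        self.revision = hub.revision
        return True


hub = PolicyHub()
nodes = [Node(f"node{i}") for i in range(3)]
hub.publish({"redis_maxmemory": "1gb"})
updated = [node.run(hub) for node in nodes]  # each node updates itself
```

The hub never needs to know how many nodes exist, which is why this style copes with servers being launched and killed at any time.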

We are resilient
Anything can go down; it will come back up and rebuild itself automatically. It happens nightly.

We can change our shape
Upgrading a server takes 2 commands:
1. Launch a beefier instance with the same name
2. Kill the weak one

We use spot instances, it’s cheap!
We can launch and kill any server anytime. It happens while we sleep.

For the smaller instance types

Some free tips
We are almost there

Watch Mark’s videos
They’re pretty dense; “The Promise of System Configuration”, for example, enlightened me

Buy Diego’s book
Don’t bother with anything else; it will give you the “I understand” feeling we all love

Work with CFEngine at Percolate
We are hiring JS, Mobile, Python, Data, Ops

Thank you
Laurent Raufaste @_LR_
