Laurent Raufaste @_LR_
CFEngine on AWS:
a Stateless Infrastructure
Slide 2
Hello Ops
Slide 3
I work at Percolate
Slide 4
Percolate helps brands create
content at a social scale
We are a tech company
Slide 5
We are a SaaS
We live in the cloud
Slide 6
• 5% serving data
• 10% doing chores
• 85% working on data
We use a bunch of servers
Slide 7
• ingest data
• digest data
• in close to real time
What those 85% do
Slide 8
We need to act smart
to keep the business
sustainable
It’s expensive
Slide 9
CFEngine
Slide 10
A tool to gently "dictate"
what your infrastructure
should be
#dontgetmadmarkburgess
#WTF is #CFE?
Slide 11
• 1993 CFEngine
• 2003 Puppet
• 2006 EC2
• 2008 CFEngine 3
• 2009 Chef
Some history
Slide 12
Our Redis policy
A simple example
Slide 13
Why CFEngine?
Chef, Puppet, Ansible
Slide 14
Convergence
It keeps its promises whenever it can,
repairing only what has drifted.
No need to start from a known
state.
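Convergence can be illustrated with a toy model. This is not how CFEngine works internally: the sketch assumes a host's state is a flat dictionary and that each promise is a hypothetical key/value pair. Each run repairs only what differs from the promises, so it works from any starting state, and a second run changes nothing.

```python
from typing import Dict, List

# Toy model only: a host's state and its promises are flat key -> value maps.
# The keys used in the demo are hypothetical, not real CFEngine promise syntax.
State = Dict[str, str]


def converge(actual: State, promises: State) -> List[str]:
    """Repair `actual` in place so every promise holds.

    Returns the keys that had to be repaired. Keys that no promise mentions
    are left untouched, so no known starting state is required.
    """
    repaired = []
    for key, desired in promises.items():
        if actual.get(key) != desired:
            actual[key] = desired  # move this item to the promised value
            repaired.append(key)
    return repaired


if __name__ == "__main__":
    promises = {"redis.conf:maxmemory": "2gb", "service:redis": "running"}
    fresh: State = {}  # a brand-new instance
    drifted: State = {"service:redis": "stopped", "motd": "hello"}  # an old one
    for host in (fresh, drifted):
        print(converge(host, promises))  # first run: repairs what differs
        print(converge(host, promises))  # second run: prints [] (nothing to do)
```

Running the agent repeatedly is therefore safe: once the promises hold, every further run is a no-op.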
Slide 15
Portability
Same policies on Solaris,
GNU/Linux, *BSD, AIX,
HP-UX, Windows, OSX, …
Slide 16
CFEngine v1 was released in 1993. Think of it as a
"teddy bear": it's reassuring, since it has been used
this long without any big problem, cf. OpenBSD's
"2 holes since 1996"
It’s old
Slide 17
Here come the
deal breakers
Let’s focus
Slide 18
The CFEngine DSL was tailored for this purpose: no
legacy, and based on Promise Theory
Dedicated Language
Slide 19
Documented Infrastructure
Solves the problem of outdated, useless docs
Slide 20
• grep the whole cluster
• what's in there is what's live
• no need to SSH
• knowledge is shared
• history is kept
• company is more valuable
Documented Infrastructure
Slide 21
We want to build for success, not failure
We hope what we build will succeed
Scalability
Slide 22
• Decentralized by nature
• Can scale both ways
• Largest cluster is X00,000s
• m1.small on AWS
Scalability
Slide 23
It lets us build things that last and can be reused
Reusability
Slide 24
• DRY
• Build service/server blocks
• Reuse them on live, staging, dev
• Change them once, for all environments
Reusability
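One way to picture reusable blocks, sketched in Python rather than the CFEngine DSL: each block is a function of the environment, a server role is a list of blocks, and live, staging and dev differ only by parameters. The block names, promise keys and memory sizes below are all hypothetical.

```python
from typing import Callable, Dict, List

Promises = Dict[str, str]
Block = Callable[[str], Promises]


def base_block(env: str) -> Promises:
    """Basics every server gets, whatever its role (hypothetical keys)."""
    return {"service:syslog": "running", "file:/etc/env_name": env}


def redis_block(env: str) -> Promises:
    """One block reused on live, staging and dev; only parameters change."""
    maxmemory = {"live": "4gb", "staging": "1gb", "dev": "256mb"}[env]
    return {"package:redis-server": "installed", "redis:maxmemory": maxmemory}


# A role is just a list of blocks: editing a block updates every role using it.
ROLES: Dict[str, List[Block]] = {
    "redis": [base_block, redis_block],
    "jenkins": [base_block],
}


def promises_for(role: str, env: str) -> Promises:
    """Merge the blocks of a role; later blocks override earlier ones."""
    merged: Promises = {}
    for block in ROLES[role]:
        merged.update(block(env))
    return merged


if __name__ == "__main__":
    for env in ("live", "staging", "dev"):
        print(env, promises_for("redis", env)["redis:maxmemory"])
```

The design choice is DRY in its simplest form: the Redis knowledge lives in one function, so fixing it there fixes it on every environment at once.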
Slide 25
It’s tailored for the job
Footprint
Slide 26
• Package to install is < 3MB
• Largest binary is 320kB
(96% C, 3% C++)
• The server only lets clients
download policies
• Clients try to apply
the policies locally
Footprint
Slide 27
It’s free (libre) and always will be. It’s in Debian, so
it passed the DFSG test: the fastest way to check.
It’s GPL
Slide 28
You can open bug reports and submit pull
requests on GitHub, a must nowadays
Open & active community
Slide 29
Here’s what CFEngine
allows us to do
Slide 30
We don’t let it pwn us
Pwn our infrastructure
Slide 31
Minimize redundancy and dependency
Normalized Infrastructure
Slide 32
Like Netflix’s Chaos Monkey,
I randomly kill instances
Being unpredictable is fun
• Our DNS is our inventory
• We leverage it with a
coordination service (AWS tags,
which do not scale; ZooKeeper; …)
Naming convention to leverage
CFEngine classes
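A sketch of how a naming convention can turn hostnames into classes, assuming a hypothetical `<role>-<index>.<env>.<domain>` scheme (e.g. `redis-01.live.example.com`). `canonify` here mimics what CFEngine’s `canonify()` function does to make a string a legal class name; the role/environment split itself is an assumption, done in Python for illustration rather than in policy.

```python
import re
from typing import List


def canonify(text: str) -> str:
    """Replace every character that is not a letter, digit or underscore
    with '_', roughly like CFEngine's canonify()."""
    return re.sub(r"[^A-Za-z0-9_]", "_", text)


def classes_from_hostname(fqdn: str) -> List[str]:
    """Derive classes from a hypothetical '<role>-<index>.<env>.<domain>' name."""
    labels = fqdn.split(".")
    if len(labels) < 2:
        raise ValueError(f"expected at least '<name>.<env>', got {fqdn!r}")
    short, env = labels[0], labels[1]
    role, sep, _index = short.rpartition("-")
    if not sep:  # no '-<index>' suffix, e.g. 'jenkins.dev.example.com'
        role = short
    candidates = (role, env, f"{role}_{env}", short)
    # Keep the first occurrence of each class, preserving order.
    return list(dict.fromkeys(canonify(c) for c in candidates))


if __name__ == "__main__":
    print(classes_from_hostname("redis-01.live.example.com"))
    # ['redis', 'live', 'redis_live', 'redis_01']
    print(classes_from_hostname("jenkins.dev.example.com"))
    # ['jenkins', 'dev', 'jenkins_dev']
```

With such a scheme, a policy could target `redis_live` without listing any IP address, and DNS stays the single inventory.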
CFEngine does not
take care of it all
It takes care of all
the basics
Slide 54
CFEngine does not
take care of it all
It makes sure the
complex pieces are
there and operational
Slide 55
syslog, smtp, ... you
don’t want to fail big
We started with the
simple and obvious
Slide 56
When we reached the big stuff, it was easy,
and we had all the bricks to reuse
We finished with the critical
Slide 57
Achievements
Slide 58
• Documentation
• Scalability
• Reusability
• Easy and fast to change
Recap of previous benefits
Slide 59
But our huge win is ...
Slide 60
What’s the big deal?
Our
infrastructure
has no state
Slide 61
• Policies in git
• App code in git
• Data in datastores
• No backups: images are just a cache
Our infrastructure has no state
Slide 62
2 exceptions:
S3 for cryptic generated config files (Jenkins)
EBS for large non-vital changing data (RabbitMQ)
No instance backup at all?
Slide 63
No state is left on AWS (no AMI), so we can migrate away
for better prices, stability, features, or mood
We are independent
Slide 64
But tell everyone to shut up (email). When
something happens, you'll know. Your goal is
silence: 0 email.
We know and hear everything
Slide 65
Pushing does not scale. We update the live version and
every server updates itself. You can do this if your
infrastructure is limpid and CFEnginized.
We don’t push to deploy
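A toy model of the pull approach, assuming a policy server that only publishes a policy set and agents that poll it on their own schedule. The class and method names are hypothetical and the digest comparison is a simplification; real CFEngine agents copy the policy files from the policy server themselves. The point it shows: publishing a new version needs no list of hosts to push to.

```python
import hashlib
from typing import Dict, Optional

Policies = Dict[str, str]  # file name -> file content (hypothetical layout)


def digest(policies: Policies) -> str:
    """Order-independent fingerprint of a policy set."""
    h = hashlib.sha256()
    for name in sorted(policies):
        h.update(name.encode() + b"\0" + policies[name].encode() + b"\0")
    return h.hexdigest()


class PolicyServer:
    """Only publishes policies; it never contacts the clients."""

    def __init__(self) -> None:
        self._policies: Policies = {}

    def publish(self, policies: Policies) -> None:
        self._policies = dict(policies)

    def current_digest(self) -> str:
        return digest(self._policies)

    def download(self) -> Policies:
        return dict(self._policies)


class Agent:
    """Each server decides on its own when to update."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.policies: Policies = {}
        self.applied: Optional[str] = None

    def run(self, server: PolicyServer) -> bool:
        """One scheduled run. Returns True if a new policy set was pulled."""
        remote = server.current_digest()
        if remote == self.applied:
            return False  # already up to date: nothing to download
        self.policies = server.download()
        self.applied = remote
        return True


if __name__ == "__main__":
    server = PolicyServer()
    fleet = [Agent(f"web-{i:02d}") for i in range(3)]
    server.publish({"promises.cf": "v1"})
    print([agent.run(server) for agent in fleet])  # [True, True, True]
    print([agent.run(server) for agent in fleet])  # [False, False, False]
```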
Slide 66
Anything can go down; it will come back up and rebuild
itself automatically. It happens nightly.
We are resilient
Slide 67
Upgrading a server takes 2 commands:
1. Launch a beefier instance with the same name
2. Kill the weak one
We can change our shape
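The two-command upgrade can be simulated in memory, assuming that DNS is the inventory and that the newest instance carrying a name owns it. `Cloud`, `launch` and `kill` are hypothetical stand-ins, not the AWS API; a real new instance would also have to bootstrap itself from the policies, which this sketch leaves out.

```python
import itertools
from dataclasses import dataclass
from typing import Dict


@dataclass
class Instance:
    instance_id: str
    name: str
    instance_type: str


class Cloud:
    """In-memory stand-in for the cloud provider plus DNS (hypothetical)."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.instances: Dict[str, Instance] = {}
        self.dns: Dict[str, str] = {}  # name -> instance_id

    def launch(self, name: str, instance_type: str) -> Instance:
        inst = Instance(f"i-{next(self._ids):04d}", name, instance_type)
        self.instances[inst.instance_id] = inst
        self.dns[name] = inst.instance_id  # the newest instance takes the name
        return inst

    def kill(self, instance_id: str) -> None:
        inst = self.instances.pop(instance_id)
        if self.dns.get(inst.name) == instance_id:
            del self.dns[inst.name]  # drop the record only if it still points here


def upgrade(cloud: Cloud, name: str, bigger_type: str) -> Instance:
    """Replace the instance behind `name` with a beefier one."""
    old_id = cloud.dns[name]
    new = cloud.launch(name, bigger_type)  # 1. same name, bigger instance
    cloud.kill(old_id)                     # 2. kill the weak one
    return new


if __name__ == "__main__":
    cloud = Cloud()
    cloud.launch("redis-01.live", "m1.small")
    new = upgrade(cloud, "redis-01.live", "m1.large")
    print(cloud.dns["redis-01.live"] == new.instance_id)  # True
```

Because no state lives on the instance, killing the old one loses nothing; the same two calls also cover shrinking.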
Slide 68
We can launch and kill any server anytime. It
happens while we sleep.
We use spot instances: they’re
cheap!
Slide 69
For the smaller instance types
Slide 70
Some free tips
We are almost there
Slide 71
They’re pretty dense; “The Promise of System
Configuration”, for example, enlightened me
Watch Mark’s videos
Slide 72
Don’t bother with anything else: it will give you the “I
understand” feeling we all love
Buy Diego’s book (Learning CFEngine 3)
Slide 73
We are hiring: JS, Mobile, Python, Data, Ops
Work with CFEngine at Percolate