Open Source Infrastructure

Lightning Talk: Open Source Infrastructure, June 10, 2017

About Me • Systems Engineer @ SeatGeek • Cake Core Developer • twitter.com/savant • github.com/josegonzalez

Some Stats • 25 Core Developers • 5 Different Continents • 15+ timezones • 12+ languages

What do we do? • A butcher • A baker • A candlestick-maker

What do we do? • Write and translate documentation • Maintain existing CakePHP websites • Investigate new core/community initiatives • Provide support via chat/email/forums • Wrangle social media

What day jobs do we have? • Car Parts Salesman • Company Owner • Professional Dancer • Software Developer

We have real jobs, with real lives, and concerns other than: • Is the docs site up? • Did the server get hacked? • Why is the server down? • Who can deploy the bakery?

What is the problem? We need to ensure that the CakePHP sites and related services are highly available to our users with minimal interference.

What does this even mean?

• Centralized Logging • Server Metrics and APM • Authentication and ACL for infrastructure access • Backups (and backup testing!) • Scaling • Disaster Recovery

All things that are full-time jobs

At normal, paid institutions

For teams of dedicated systems engineers

Why is this even a problem?

Moar background • 5-10 people online at any time • Might be busy with paid work, side projects, or life • Some with tons of infra experience, most with none • Language barrier • Little to no onboarding time

Can we pay for a service? • CakeDC isn’t made out of money • Services are expensive • Services still require maintenance and onboarding

What about our systems engineers? • Full-time jobs/burnout • Different tech than at their day jobs • May not be available

Choosing Technology Because technology solves everything™

Solves the problem Not creating interesting ones for the hell of it

Familiar to maintainers Or at least some of them

Quick to pick up For those for whom the tech is new

Boring Choice is the Best Choice

Easy to extend Needs change, so should infra

Infra as code Why is this setting there? Who applied this change?

Configuration Management with Ansible

But why tho? • Everyone can read YAML • Low learning curve, lots of tutorials • Maps well to existing server tasks • Easy to write custom modules

But why not tho? • Everyone needs SSH access • Repo credentials are in the open, even if encrypted • YAML sucks as an automation language • Moves fast and breaks things

Continuous Integration Via Jenkins

But why tho? • Everyone has used it, everyone hates it equally • Jobs can be generated via Groovy DSL • Deployable via Docker • Plugins for everything

But why not tho? • Ecosystem is constantly moving • Default UI isn’t great • Really easy to use/abuse plugins and non-pipeline jobs

But why not CircleCI/TravisCI/Wercker? • Expensive • Jobs are usually attached to a single repo • Hard to do OSS with secure secrets • At the whim of service providers

Automated Deployments Using Dokku

But why tho? • Already built • Integrates with Ansible and runs mostly unattended • Designed with Docker in mind • Has an internal champion • OSS

But why not other solutions? • Does not need to be clustered • We can withstand 30 min of downtime during restores • Setup/training costs are much larger for K8s/Mesos/Nomad • Custom scripting not necessary, can use Heroku Buildpacks • No need to reinvent the wheel and write build/release code

Considerations For the considerate

Access Control • Everyone has access, but is that appropriate? • A smaller circle of trust makes infra control easier, but is harder to deal with in a distributed context

Access Control • Passwords and keys need to be decrypted • Web of trust for initial access must be established

Access Control • Use strong authentication, prefer keys to passwords • SSH Session Auditing?
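"Prefer keys to passwords" is easy to state and easy to drift away from. A minimal Python sketch of a check, assuming the standard OpenSSH `sshd_config` layout (one `Keyword value` pair per line, `#` comments, case-insensitive keywords, first value wins). The helper names are hypothetical, `Match` blocks are ignored, and only `PasswordAuthentication` and `PubkeyAuthentication` are inspected; a real audit would also cover keyboard-interactive authentication.

```python
from __future__ import annotations


def parse_sshd_config(text: str) -> dict[str, str]:
    """Return {keyword (lowercased): value}, keeping the first occurrence."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        keyword, value = parts[0].lower(), parts[1].strip()
        if keyword == "match":
            break  # settings after a Match line are conditional; ignored in this sketch
        settings.setdefault(keyword, value)  # sshd uses the first value it obtains
    return settings


def audit(text: str) -> list[str]:
    """List the key-vs-password problems found in an sshd_config text."""
    settings = parse_sshd_config(text)
    problems: list[str] = []
    # OpenSSH treats both of these as "yes" when they are not set.
    if settings.get("passwordauthentication", "yes").lower() != "no":
        problems.append("PasswordAuthentication is not disabled")
    if settings.get("pubkeyauthentication", "yes").lower() != "yes":
        problems.append("PubkeyAuthentication is disabled")
    return problems
```

On a server, feeding `audit()` the output of `sshd -T` (which prints the effective configuration, one keyword per line) avoids re-implementing include and override rules; an empty list means password logins are off and key logins are on.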

Logging • Logs should be aggregated • Use the same format, JSON or logfmt • ISO 8601: don’t make up date/time formats

Logging level=info datetime=2017-06-10T08:01:40+00:00 msg="Stopping all fetchers"

Logging Add metadata to make logs useful

Logging level=info datetime=2017-06-10T08:01:40+00:00 msg="Stopping all fetchers" id=ConsumerFetcherManager-1382721708341 tag=stopping_fetchers module=kafka.consumer.ConsumerFetcherManager

Logging • Self-hosted logging can be cheaper, but more labor-intensive • Ship first, filter logs later
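"Ship first, filter later" relies on being able to parse aggregated lines back into fields. A minimal Python sketch, assuming logfmt-style lines; `parse_logfmt` and `filter_lines` are hypothetical helpers, and the parser only handles bare values and double-quoted values with backslash escapes, not every logfmt edge case. In practice the log aggregator would do this filtering.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def parse_logfmt(line: str) -> dict[str, str]:
    """Parse key=value pairs; values may be bare or double-quoted with backslash escapes."""
    fields: dict[str, str] = {}
    i, n = 0, len(line)
    while i < n:
        if line[i] == " ":  # skip separators between pairs
            i += 1
            continue
        start = i
        while i < n and line[i] not in "= ":
            i += 1
        key, value = line[start:i], ""
        if i < n and line[i] == "=":
            i += 1
            if i < n and line[i] == '"':  # quoted value: read until the closing quote
                i += 1
                chars = []
                while i < n and line[i] != '"':
                    if line[i] == "\\" and i + 1 < n:
                        i += 1  # keep the escaped character literally
                    chars.append(line[i])
                    i += 1
                i += 1  # step past the closing quote
                value = "".join(chars)
            else:  # bare value: read until the next space
                start = i
                while i < n and line[i] != " ":
                    i += 1
                value = line[start:i]
        if key:
            fields[key] = value
    return fields


def filter_lines(lines: Iterable[str], **criteria: str) -> Iterator[dict[str, str]]:
    """Yield parsed lines whose fields match every key=value in criteria."""
    for line in lines:
        fields = parse_logfmt(line)
        if all(fields.get(key) == value for key, value in criteria.items()):
            yield fields
```

For example, `filter_lines(open("app.log"), level="error")` (the file name is hypothetical) would yield only error entries as dictionaries, without any filtering having happened at ship time.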

Monitoring • Our site needs to be globally available. Does yours? • Server metrics via Diamond/StatsD/Graphite • Visualization via Grafana
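For application-level metrics, StatsD's plaintext protocol (`<name>:<value>|<type>` over UDP) is simple enough to speak directly. A sketch assuming a StatsD daemon on `127.0.0.1:8125`, its conventional default port, that forwards aggregates to Graphite for Grafana to chart. Only counters, gauges, and timers are handled, and the metric names are hypothetical.

```python
import socket

VALID_TYPES = {"c", "g", "ms"}  # counter, gauge, timer (milliseconds)


def format_metric(name: str, value: float, metric_type: str) -> str:
    """Encode one metric in StatsD's plaintext format: <name>:<value>|<type>."""
    if metric_type not in VALID_TYPES:
        raise ValueError(f"unsupported StatsD metric type: {metric_type!r}")
    return f"{name}:{value}|{metric_type}"


def send_metric(name: str, value: float, metric_type: str,
                host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire one UDP datagram at a StatsD daemon; no reply is expected."""
    payload = format_metric(name, value, metric_type).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # sendto() does not wait for an acknowledgement, so a missing
        # StatsD daemon does not hold up the application sending the metric.
        sock.sendto(payload, (host, port))
```

A request handler might call `send_metric("cakephp.docs.requests", 1, "c")` per request and `send_metric("cakephp.docs.render_time", 42, "ms")` per render. UDP is the usual choice here precisely because losing an occasional sample is cheaper than slowing down a site over its own monitoring.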

Monitoring • APM can be expensive • Site Speed and Analytics

Backups Do Them and Verify Them

Backups • Backups go to Rackspace Cloud, manually cleared • Manual verification performed semi-monthly • Automated backup verification coming up
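One possible shape for automated verification: record a SHA-256 digest when the archive is created, then later confirm the digest still matches and that every file in the tarball reads back. This Python sketch assumes tar-based backups and a hypothetical `<archive>.sha256` manifest (same `digest  name` layout as `sha256sum` output) stored next to each archive. It is not the project's tooling, and it proves integrity only, not that a restored site actually works.

```python
import hashlib
import tarfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large archives need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_path(archive: Path) -> Path:
    return archive.with_name(archive.name + ".sha256")


def write_manifest(archive: Path) -> Path:
    """Run right after the backup is created; stores '<digest>  <name>'."""
    manifest = manifest_path(archive)
    manifest.write_text(f"{sha256_of(archive)}  {archive.name}\n")
    return manifest


def archive_is_readable(archive: Path) -> bool:
    """Read every regular file in the tarball, as a restore would."""
    try:
        with tarfile.open(archive) as tar:  # compression is auto-detected
            for member in tar.getmembers():
                if member.isfile():
                    extracted = tar.extractfile(member)
                    if extracted is None:
                        return False
                    while extracted.read(1 << 20):
                        pass
        return True
    except (tarfile.TarError, OSError, EOFError):
        return False


def verify_backup(archive: Path) -> bool:
    """True only if the digest still matches and the archive reads back cleanly."""
    manifest = manifest_path(archive)
    if not archive.is_file() or not manifest.is_file():
        return False
    parts = manifest.read_text().split()
    if not parts:
        return False
    return sha256_of(archive) == parts[0] and archive_is_readable(archive)
```

Run from cron against each archive pulled back from storage, a `False` result could alert a maintainer. That would replace only the integrity half of the semi-monthly manual check; restoring into a scratch environment would still be the real test.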

Backups • No Offsite Backups • Backups not encrypted (yet) • No Disaster Recovery

Questions?

Thank you

Slides by Jose Diaz-Gonzalez
