Lessons from a DevOps Transformation on AWS

Hrishikesh Barua

January 10, 2017
Transcript

  1. Who Am I?
     ❖ Former appserver developer
     ❖ Started with Java, some Python, some Go
     ❖ Working with applications and operations on AWS since 2007
       ➢ (Dev)Ops fascination from around the same time
       ➢ Led the engineering team for a SaaS product
       ➢ Had the good fortune to work with some extremely smart people
     ❖ Interests lie in distributed systems and scalability
     ❖ DevOps/Cloud practice lead @ImagineaTech
     ❖ DevOps editor at InfoQ.com
     ❖ Elsewhere
       ➢ https://www.linkedin.com/in/hrishikeshbarua
       ➢ https://twitter.com/talonx
  2. The Product in Question
     ❖ Marketing platform for brands to run customer engagement and loyalty campaigns
     ❖ SaaS model
  3. Technology & Infrastructure
     ❖ Hosted on Amazon Web Services, initially in one region, later spread over multiple
     ❖ EC2, S3, EBS, CloudFront
     ❖ External DNS (and later CDN)
     ❖ Mostly Java/JavaScript/MySQL/Kafka/Redis
     ❖ Integration with multiple third-party APIs and services
     ❖ Puppet/Vagrant/Jenkins/Graphite/Collectd/Nagios
  4. To Set Some Context
     ❖ Roughly covers the period 2010-2014, so some things might sound quaint today
     ❖ The DevOps transformation took place over a period of years
       ➢ Started with small-scale AWS infra, legacy tools, and a monolithic app architecture
       ➢ Ended with multi-region one-click deployment, a combination of monolithic + service-oriented architecture, and OSS + custom-built ops tools
       ➢ The following slides are a summary of some key learnings on AWS ops
  5. Monitoring
     ❖ Monitoring-as-a-Service or self-hosted?
       ➢ You might need both, if you have a complex/legacy + modern app or want more flexibility
       ➢ Monitor the self-hosted monitor using the external one (see the probe sketch after this slide)
       ➢ Self-hosted monitoring tools and dashboards should have backups. If the AWS AZ in which you host your monitoring system goes down, you'll be semi-blind
     ❖ Choose the right tools
       ➢ Get rid of the dinosaur. Convincing your traditional IT folks to jettison Nagios might be the toughest part
       ➢ A relational view is important. A single service might depend on others (e.g. a REST API dependent on DNS, LB, backend nodes, database, caching layer) - it's important to be able to see this relationship in your dashboard
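
One way to "monitor the monitor" is a small probe, run from outside the monitored AZ (or triggered by the external Monitoring-as-a-Service provider), that fails loudly when any self-hosted component is unreachable. A minimal Python sketch - the internal Graphite and Nagios URLs are hypothetical, not the deck's actual endpoints:

```python
import sys
import requests

# Hypothetical internal URLs for the self-hosted monitoring components.
CHECKS = {
    "graphite": "http://graphite.internal/render?target=carbon.agents.*.metricsReceived&format=json",
    "nagios": "http://nagios.internal/nagios/cgi-bin/statusjson.cgi?query=programstatus",
}

def probe(url):
    """True if the component answers with HTTP 200 within 5 seconds."""
    try:
        return requests.get(url, timeout=5).status_code == 200
    except requests.RequestException:
        return False

def main():
    # Exit non-zero when anything is down, so an external monitor
    # (or a Nagios-style plugin wrapper) can page us before we go blind.
    failed = [name for name, url in CHECKS.items() if not probe(url)]
    if failed:
        print("DOWN: %s" % ", ".join(failed))
        sys.exit(2)  # Nagios plugin convention for CRITICAL
    print("OK")

if __name__ == "__main__":
    main()
```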
  6. Monitoring
     ❖ Watch out for AWS-specific quirks
       ➢ Steal time? Alerting software needs to take this into account
     ❖ There's no such thing as too much monitoring
       ➢ Monitor the AWS status RSS feed - it can serve as an indicator of potential problems. Caveats:
         ▪ AWS problems are sometimes localized
         ▪ This can at best serve as an early-warning system
       ➢ Collect and plot everything
         ▪ Deployment points (thanks, Etsy) - see the sketch below
         ▪ Graphite is a swallow-all, easy-to-use system
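
Deployment points are cheap to record: Graphite's Carbon listener accepts a plaintext "metric value timestamp" line over TCP, so each deploy can push a spike metric that dashboards overlay on any graph. A minimal sketch, with an assumed internal Graphite hostname:

```python
import socket
import time

# Assumed self-hosted Carbon endpoint; 2003 is Carbon's default plaintext port.
GRAPHITE_HOST = "graphite.internal"
GRAPHITE_PORT = 2003

def record_deployment(app_name):
    """Push a '1' to a deploy metric so dashboards can overlay deploy points."""
    line = "deployments.%s 1 %d\n" % (app_name, int(time.time()))
    sock = socket.create_connection((GRAPHITE_HOST, GRAPHITE_PORT), timeout=5)
    try:
        sock.sendall(line.encode("ascii"))
    finally:
        sock.close()

if __name__ == "__main__":
    record_deployment("marketing-platform")
```

On the dashboard side, Graphite's drawAsInfinite() function renders such a metric as a vertical line, which makes "what changed right before this spike?" easy to answer.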
  7. Monitoring
     ❖ Automate
       ➢ The provisioning process for a server (or a service) should take care of including it in your monitoring system (a sketch follows)
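
What that can look like in practice: the launch step enrolls the new instance into monitoring before handing it over, so nothing reaches production unmonitored. A sketch using boto3 (which postdates part of the 2010-2014 period) and a hypothetical monitoring-registration endpoint:

```python
import boto3
import requests

# Hypothetical API of the self-hosted monitoring system.
MONITORING_API = "http://monitoring.internal/api/hosts"

def provision_and_register(ami_id, instance_type, role):
    """Launch an instance and enroll it in monitoring in the same step."""
    ec2 = boto3.resource("ec2", region_name="us-east-1")
    instance = ec2.create_instances(
        ImageId=ami_id, InstanceType=instance_type,
        MinCount=1, MaxCount=1,
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "role", "Value": role}],
        }],
    )[0]
    instance.wait_until_running()
    instance.reload()  # refresh attributes such as the private IP

    # Register with the monitoring system; payload shape is an assumption.
    requests.post(MONITORING_API, json={
        "host": instance.private_ip_address,
        "role": role,
        "checks": ["cpu", "disk", "role:" + role],
    }, timeout=10)
    return instance
```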
  8. Backups and Disaster Recovery
     ❖ Specifics usually depend on the app architecture and the level of automation
     ❖ Instances
       ➢ Base AMI + configuration management? (Puppet/Chef/Ansible)
       ➢ Golden images + immutable servers? (see the sketch below)
       ➢ All of the above?
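
For the golden-image route, baking is a one-call operation. A minimal boto3 sketch that images a fully configured instance (the naming scheme and region are assumptions):

```python
import datetime
import boto3

def bake_golden_image(instance_id, role):
    """Create a 'golden' AMI from a fully configured instance, so new
    servers can launch immutably instead of converging with CM."""
    ec2 = boto3.client("ec2", region_name="us-east-1")
    stamp = datetime.datetime.utcnow().strftime("%Y%m%d-%H%M")
    resp = ec2.create_image(
        InstanceId=instance_id,
        Name="%s-golden-%s" % (role, stamp),
        NoReboot=False,  # stop briefly for a filesystem-consistent image
    )
    return resp["ImageId"]
```

Launching replacements from the resulting AMI avoids configuration-management convergence time during recovery, at the cost of rebaking on every change.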
  9. Backups & Disaster Recovery
     ❖ Databases
       ➢ Self-hosted vs RDS
         ▪ RDS limitations
       ➢ Replication, EBS snapshots
       ➢ Data consistency
         ▪ Freeze/unfreeze the filesystem
         ▪ Database-specific quirks for snapshotting
         ▪ Snapshotting the read-only slave? Ensure that the lag is low (and monitored) - sketched below
         ▪ Cross-region backups (but is your app cross-region ready? If not, why bother?)
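
A sketch of the replica-snapshot idea: check Seconds_Behind_Master first, and only snapshot the replica's EBS volume when the lag is acceptable. It assumes pymysql and boto3, and elides the table-flush/filesystem-freeze step needed for full consistency:

```python
import boto3
import pymysql  # assumed MySQL client library

MAX_LAG_SECONDS = 60

def snapshot_replica(replica_host, volume_id):
    """Snapshot the read-only replica's data volume, but only when
    replication lag is low enough for the copy to be useful."""
    conn = pymysql.connect(host=replica_host, user="backup",
                           password="...",  # placeholder credential
                           cursorclass=pymysql.cursors.DictCursor)
    try:
        with conn.cursor() as cur:
            cur.execute("SHOW SLAVE STATUS")
            status = cur.fetchone()
            if status is None:
                raise RuntimeError("Not a replica?")
            lag = status["Seconds_Behind_Master"]
            if lag is None or lag > MAX_LAG_SECONDS:
                raise RuntimeError("Replica lag too high: %s" % lag)
            # FLUSH TABLES WITH READ LOCK + fs freeze would go here
            # for full consistency; omitted in this sketch.
    finally:
        conn.close()

    ec2 = boto3.client("ec2", region_name="us-east-1")
    snap = ec2.create_snapshot(VolumeId=volume_id,
                               Description="mysql-replica backup")
    return snap["SnapshotId"]
```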
  10. Security
     ❖ Go with VPC (older AWS accounts have both Classic and VPC)
     ❖ Amazon provides the first level of defence
       ➢ Strong network-level protection against DDoS; the rest depends on you
       ➢ Plan security groups from the beginning (a sketch follows)
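
Planning security groups from the beginning is easier when they are created in code rather than clicked together in the console. A boto3 sketch of a web-tier group; the group name, ports, and admin CIDR are illustrative assumptions:

```python
import boto3

def create_web_tier_sg(vpc_id):
    """Codify a security group up front: HTTPS open to the world,
    SSH only from an assumed admin/VPN network range."""
    ec2 = boto3.client("ec2", region_name="us-east-1")
    sg = ec2.create_security_group(
        GroupName="web-tier",
        Description="Web tier: HTTPS public, SSH from admin network",
        VpcId=vpc_id,
    )
    ec2.authorize_security_group_ingress(
        GroupId=sg["GroupId"],
        IpPermissions=[
            {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
             "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
            # Hypothetical admin CIDR; lock SSH to your office/VPN range.
            {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
             "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},
        ],
    )
    return sg["GroupId"]
```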
  11. Security
     ❖ ssh keys
       ➢ Adopt a tool to manage per-user ssh keys
       ➢ The EC2 metadata for an instance will continue to show the name of the key pair it was created with. The original public key may no longer exist on the instance if it was revoked, but the metadata will still show it, because AWS has no way of knowing that you changed the authorized_keys file (demonstrated below)
       ➢ You can upload your own keys to the AWS console and they will be available for use when launching EC2 instances. Your generated keys have to be RSA keys of 1024, 2048, or 4096 bits
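
The quirk is easy to see from the instance itself: the metadata service reports the launch-time key pair name no matter what has happened to authorized_keys since. A small Python sketch against the classic (IMDSv1-style) metadata endpoint of that era:

```python
import requests

# Instance metadata service; must be called from the instance itself.
METADATA_URL = "http://169.254.169.254/latest/meta-data/public-keys/"

def original_keypair_name():
    """Returns the key pair the instance was *launched* with. This never
    changes, even if ~/.ssh/authorized_keys has since been rewritten."""
    resp = requests.get(METADATA_URL, timeout=2)
    resp.raise_for_status()
    # The response looks like '0=mykeyname'; strip the index prefix.
    return resp.text.strip().split("=", 1)[1]
```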
  12. Security
     ❖ ssh keys
       ➢ Are AWS key pairs confined to a single region? Only by default - you can get around it (sketched below)
         ▪ For keys that you generate, import them into all the regions you want using the AWS console or the CLI tools
         ▪ For keys that AWS generates, take the public key from an EC2 instance launched with that key, and import it in the same way into all the regions you want
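
A sketch of the import approach with boto3: read the public half of the key once and import it under the same name into each target region (the key name, path, and region list are illustrative):

```python
import boto3

def import_key_everywhere(key_name, public_key_path, regions):
    """Import the same public key into each region, so one key pair
    can be used for launches anywhere despite keys being region-scoped."""
    with open(public_key_path, "rb") as f:
        material = f.read()
    for region in regions:
        ec2 = boto3.client("ec2", region_name=region)
        ec2.import_key_pair(KeyName=key_name, PublicKeyMaterial=material)

if __name__ == "__main__":
    import_key_everywhere("ops-key", "/home/ops/.ssh/id_rsa.pub",
                          ["us-east-1", "us-west-2", "eu-west-1"])
```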
  13. Automation
     ❖ CI
       ➢ Easy to set up, no excuses. Once set up, have an owner for incremental improvements
       ➢ Don't let Broken Windows remain broken
       ➢ The move to CD may not be so easy - it needs buy-in from all quarters
     ❖ Configuration management
       ➢ Again, hard to do if not done from the beginning
       ➢ Choose one (Ansible/Puppet/Chef) and master it
  14. People & Architecture
     ❖ Have an owner for system architecture
       ➢ All architecture decisions, however small, matter
       ➢ And most such decisions need to be taken "urgently"
     ❖ Buy-in from management
       ➢ Demonstrate value to the product/business. Visibility is paramount. Don't expect to be understood all the time
       ➢ "Make more awesome" - Jesse Robbins
  15. People & Architecture
     ❖ Adopt uniform abstractions
       ➢ E.g. don't adopt two different queueing systems for two different purposes if one can handle both ("cool stuff syndrome")
     ❖ Cross-region failover is hard if not designed for early
       ➢ Specifics will depend on your product