Surviving technology transitions: Adding and (more importantly) removing tools from an existing stack

Maggie Zhou
October 27, 2015

We share lessons learned from adding and removing technologies, including stories from our time at Etsy. Expect to come away with a clearer picture of the technical and political problems these changes involve.

Your technology stack doesn’t remain static for the life of your company. As you need to scale, as you improve performance, as new tools become available, you will find yourself needing to add new tools to your stack, and hopefully remove their predecessors. The past choices made sense at the time, but now you need to evaluate new choices and make the effort to improve your infrastructure. Here’s what we’ve learned about the process.

The lifecycle of a tool: how we experiment with new infrastructure, upgrade existing things, and get rid of old infrastructure.

Transcript

  1. Melissa Santos | @ansate Maggie Zhou | @zmagg Evaluating Change

    Have a process. It will probably suck at first. MareBearCrafts.etsy.com
  2. Evaluating Change

    Enumerate requirements and advantages of potential solutions
  3. Evaluating: Architecture Review

    EventPipe: a new ETL process for analytics logging, backed by Kafka
  4. EventPipe Architecture Review

    • The problem: the old architecture was brittle (GET requests, logrotate)
    • Wins: a distributed system, near-real-time metrics on event health, and a foundation for streaming analytics later
  5. Evaluating: Operability Review

    Understand:
    • how the system will break
    • how we will know
    • how we will react
  6. Operability Review: EventPipe

    • Scenario: Franz dies on a single beacon box
    • Scenario: Apache dies on a single beacon box
    • Scenario: Franz dies on all beacon boxes
    • Scenario: Franz dies on *most* boxes
    • Scenario: One Kafka box shuts down cleanly
    • Scenario: Cut power to one Kafka box
    • Scenario: Cut power to one chassis (4 Kafka nodes)
    • Scenario: Most Kafka boxes are unreachable (see how many we can shut down and still stay up)
    • Scenario: One ZooKeeper dies
    • Scenario: Two ZooKeepers die
  7. Comfortable upgrades

    Test, test, test! How do you gain confidence?
  8. Comfortable upgrades

    How do you gain confidence?
    • slow ramp-up
    • run an A/B experiment
    • ramp up the things that could go badly by themselves (high-traffic pages, unique features)
  9. Evaluation Takeaways

    • It’s ok to choose NOT to do something
    • But if you do something, you gotta explain what problem it solves (and how)
    • Process helps people see how you made the decision
  10. Upgrading Takeaways

    • testestestestestest
    • Be clear about what’s different
    • Publicize and celebrate how it is better!
  11. Retiring Takeaways

    • It takes longer than you think it will
    • It’s gonna hurt
    • It’s going to feel so good once you’re done
  12. Thanks!

    title slide image from TheWoodChopShoppe.etsy.com
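
The "slow ramp-up" item on the Comfortable upgrades slide can be illustrated with a small Python sketch. This is not the tooling described in the talk; it is one common way to do a percentage ramp, under the assumption that each user has a stable string ID. The function name `in_rollout` and the feature label `"eventpipe"` are hypothetical, chosen only for illustration.

```python
import hashlib


def in_rollout(user_id: str, feature: str, percent: float) -> bool:
    """Return True if this user falls inside the current ramp-up percentage.

    Hypothetical helper: hashes the feature name together with the user ID
    so that each feature gets its own stable, independent bucketing.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    # Map the first 32 bits of the hash onto 10,000 buckets (0.01% steps).
    bucket = int(digest[:8], 16) % 10000
    # percent is on a 0-100 scale, so 1.0 admits buckets 0..99.
    return bucket < percent * 100


if __name__ == "__main__":
    # Ramp the hypothetical "eventpipe" feature from 1% to 100% of users.
    users = [f"user-{i}" for i in range(10000)]
    for percent in (1, 10, 50, 100):
        enabled = sum(in_rollout(u, "eventpipe", percent) for u in users)
        print(f"{percent}% ramp: {enabled} of {len(users)} users enabled")
```

Because a user's bucket never changes, raising the percentage only ever adds users, so nobody flips between the old and new system partway through the ramp. The same bucketing could also back an A/B experiment by treating users inside the percentage as the treatment group.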