Young Lady's Primer to Technical Decision-Making (with speaker notes)

Software is eating the world and probably your brain.

Over the last couple of years, we’ve seen an explosion of complexity in areas like polyglot storage, composable infrastructure, containerization and microservices, and coupling platforms (*aaS). Even five years ago, there was a set of fairly widely accepted best practices (virtualization, config management, RESTful services, and DBMS), but now every element of your stack is a never-ending rabbit hole of possibilities and questions.

Solid technical judgment is more important than ever. You can’t anticipate every problem, but you can identify and head off many of them in advance.

(Includes full speaker notes.)

Charity Majors

June 23, 2016

Transcript

  1. A Young Lady’s Illustrated Primer to Architecture and Technical Decision-Making

    Charity Majors, honeycomb.io @mipsytipsy Good morning, Velocity!!! My name is Charity Majors. I’m the cofounder and engineer of a new little startup called honeycomb.io (up until last week we were called Hound), where we’re working on making impossible machine data problems exploratory and intuitive and accessible to everyone.
  2. The title of my talk comes from a book by

    Neal Stephenson called “The Diamond Age: A Young Lady’s Illustrated Primer”. How many of you have read it? It’s one of my absolute favorite books. <<CLICK>> It’s a cyberpunk story about the future, when the world and our bodies are literally teeming with nanotechnology and nanites and tiny little robots and all their buggy little software.
  3. “Software is eating the world” ~pmarca “and your brain, probably”

    ~me It’s really just the inevitable next step of what every VC will tell you, which is that software is eating the world. <CLICK> And as *I* will tell you, it will soon be eating our brains. So in preparation for that day, let’s talk about technical judgment, also known as making good choices with software. We’ll talk about the problem, where the world is headed, and some quick tips for survival.
  4. Making better choices with software. Technical judgment is a thing

    that we all do every day. I don’t care where you are in the org chart, or where you are in the software lifecycle, whether green field or brown field. We’re all constantly threading the needle between innovation and stability, between making things nimble for development and stable for our users. Making good choices with software is more important than ever, but it’s also getting harder and more complicated, very very quickly. To prove it I did some science, since we’re all scientists here, and produced an extremely sciencey graph for you. ((2:15))
  5. By my precise calculations, you can see that the complexity

    of infrastructure and storage options will be incomprehensible to any human on earth in, mmmm, six months, give or take. And you KNOW IT’S TRUE, I HAVE A GRAPH. How did we get here?
  6. 2000 “u can haz LAMP stack” Well 15 years ago

    we had the LAMP stack, where architectural judgment meant choosing wisely between MySQL and Postgres, and PHP or Python or Perl, right? Who here remembers cgi-bin? Good times.
  7. 2005 “Would you care to come over and discuss service

    oriented architectures and REST APIs? Over tea, perhaps.” “Splendid! And have you heard of this ‘NoSQL’ oddity?” “I am sure tis but a passing fad.” Ok, and then around 10 years ago we were like, this sucks, this is not maintainable. We started unpacking monolith applications into more loosely coupled services, connected by REST APIs. At the time, this was confusing and complicated and kinda scary. We had to refactor immense amounts of infrastructure and code that was more or less WORKING. We got rid of maintenance windows! (unless you’re a bank.) Now this all seems completely basic, but at the time it was a challenging transition to a new way of thinking about infrastructure and software. We did it to increase resiliency, because doom was on the horizon.
  8. 2010-2013ish Infrastructure Virtualization Config management RDBMS Services with REST APIs

    Continuous integration/delivery Agile “DevOps” We got through it, we got used to it. Config management, virtualization, continuous delivery, all these things went from cutting edge to table stakes. Last time I was spinning up an infra from scratch was in 2012 for Parse, and there was a pretty reasonable set of best practices. But over the last couple years, man.
  9. 2016: welcome to the jungle even more logos I feel

    like 2014-2016 has been kind of a tipping point where a lot of crazy trends finally tipped over into viability. In January I sat down to design a new infrastructure from scratch for my new startup, and like, I feel like I’m pretty good at this stuff, but where do you even start?! So out of all this chaos, what kinds of patterns should we be paying attention to? (( 5:10))
  10. Accelerating trends in 2016 Polyglot storage Composable infrastructure Containerization, microservices

    Coupling platforms (*aaS) Storing more and more data forever There are some running themes. Polyglot storage is now something that basically everyone does. Composable infrastructure is a reality, not just a buzzword. Containers are growing up, kind of. We are gluing together more third party platforms than ever. There are more quality services, and fewer in-house engineering cycles, so we’re getting better about focusing on the critical path and outsourcing the rest to experts. What all of these trends have in common is that they increase complexity.
  11. Cambrian Explosion of technical complexity It’s actually really an exciting

    time to be in technology. The open source world is increasingly giving us access to the kind of tools that Google and Facebook etc have been using for years. You might think oh, more choices are great. And it will be! This explosion has so much creativity and energy and exciting new ways of modeling data and interacting with services. Eventually a lot of this stuff will mature, the weak will die off, all of this will become boring. We are headed to a better place. But in the meantime,
  12. the paradox of choice this is actually really hard on

    humans. This is actually really hard on humans who are trying to navigate the space. It can be cognitively overwhelming. There are not very many established best practices. A lot of the new solutions are *not* battle-hardened or production-ready. Many communities are immature. And there are incredibly few reliable narrators because everybody’s trying to sell you their own bleeding edge piece of crap so YOU get to be the one who battle-hardens it. (no judgment, I’m gonna be one of those assholes too pretty soon!)
  13. does this mean you shouldn’t use any of it? god

    no!!! This is the future, and the new ways ARE generally better than the old ways. It means we need to think carefully about our tolerance for risk as an organization, and where to spend it for maximum impact.
  14. tips for making better technical decisions I have a few

    tips for thinking about new technology adoption that can be helpful. Most of it boils down to helping you figure out where to wisely spend your risk budget for maximum impact. These are the kinds of principles that your really good senior engineers have already internalized, and are often bringing to the table as intuition. So let’s unpack it a little bit, and figure out what feeds their intuition. Let’s start with the Prime Directive. This applies to every single one of us who work with technology, so let’s turn to the master himself and find out what Captain Picard has to say about it: ((8:31))
  15. Technology serves the mission Your technology exists to serve your

    mission. What is your mission? I don’t know. Maybe your mission is getting people health care, or making the world more open and connected, or empowering developers to build better mobile apps. Building software is not your mission. Even if your business is LITERALLY writing a database, building software is still not your mission. That’s just a layer of abstraction. Your mission is to do a better job of empowering your customers to achieve *their* goals in life. In fact, it may be more helpful to think of software as the enemy.
  16. Software is the enemy • Every piece of software adds

    fragility and points of failure • Everything you write will need to be debugged and maintained • It is easy to add software, and hard to remove it Every bit of software that you add to your systems adds more fragility and potential points of failure. Every bit of software you write yourself is something other people will be debugging and maintaining for years. And that is the BEST CASE outcome. Code is liability. The best code is no code. The second best code is code that someone else maintains, that lots of other people use, but you can still read and understand yourself if you absolutely have to (in other words, open source). The worst code is … anything else. It’s easy to lose sight of this. But as a technical leader, it is your responsibility to make sure that decisions are made to serve the mission. And generally that translates to: write as little software as possible.
  17. Resist software sprawl. Can you solve the problem with your

    existing tools? h/t @jessitron: http://blog.codeship.com/growing-tech-stack-say-no/ Which leads me straight into the 1st principle (okay obviously I’m counting from zero here), which is that you should reuse solutions and components wherever possible. This *especially* applies to storage systems. The closer you get to the data layer, the heavier the price you pay when supporting an additional component. PLEASE read @jessitron’s amazing blog post on this topic. For every storage component that you add to a production system, you need to monitor it, graph it, have people on call for it, upgrade it, maintain client compatibility with it, debug it, write libraries for it, scale it, deal with migrations and transformations and most importantly have multiple engineers who understand it. Another way of putting the same thing would be, as Dan McKinley says: optimize globally rather than locally.
  18. Optimize globally, not locally If you pick the perfect language/storage

    solution for every local problem, you will have an unmanageable mess. If you optimize locally, picking the perfect language and storage system for every local problem, you end up with an incomprehensible nightmare. Now if you’re a baby startup, some local chaos is fine. You SHOULD be optimizing for experimentation and rapid iteration, because most startups fail and not because they moved too fast. But when your time horizon is longer, you need to shift your focus to maintainability. And how do we do that? Well, #2:
  19. Have a gating process for major new components • What

    is the relative gain? • Manufacture friction if necessary • Don’t micromanage outside the critical path … you must have SOME process for adding any major new components to production, like a new language or a new database. It shouldn’t be impossible, but it should have enough friction to make it *intentional*. The point of adding friction is forcing people to ask themselves questions, like “Can this be solved using our existing tools?” Or like, what is the relative gain of the new tech? If you have Ubuntu and someone wants to add Red Hat for no reason, fuck you NO. If you have concurrency issues with Ruby and want to rewrite in Go, it may be worth the pain. It is NOT worth the pain if somebody is just like “oh Go is cool, let’s rewrite it!” And don’t micromanage outside the critical path. But if you HAVE to add new components, start here: (( 11:38-12:14))
  20. Choose boring technology! • Failure modes are well understood •

    Rich library support for languages • For databases, extensive production hardening • Tooling and support for observability, debugging h/t @mcfunley, http://mcfunley.com/choose-boring-technology Choose boring technology. Dan McKinley wrote one of my favorite blog posts ever, about choosing boring technology. New software is going to break in lots of ways that are unknown and unpredictable, so you should limit your exposure to those unknown unknowns. Boring does not mean bad. Boring software runs the world. Boring just means its failure modes are well understood and the ecosystem is rich and mature. For a language, it means rich library support. For databases, it’s extensive stress testing in production and a robust user base. Use boring technology when you can. BUT. The corollary:
  21. Understand your appetite for risk • Early startups have massively

    greater tolerance for risk. • Use that risk! But spend it on your core differentiators. The earlier-stage your startup, the more risks you can and should take. This is actually your competitive advantage. This is HOW you disrupt the status quo -- by taking more risks, being more nimble, leveraging newer technology, having fewer sunk costs in the old profitable way of doing things. You have fewer humans, so you can get away with more localized chaos and less discipline. You have fewer customers to affect, and frankly they have lower expectations of you. Low expectations are great! Use them to your advantage while you can.
  22. more considerations: • Can you pay someone to do it

    better, for cheaper? Value your own team’s time. • Replacing a thing? Great: define a timeline and get rid of the old thing. • You *should* give preference to things your team has expertise with. • Fuck hacker news. Some good reasons for supporting new components: It’s critical to your mission. If you’re replacing or upgrading, okay, but have a timeline and commit to actually getting rid of that old component, because having to maintain both is the worst of all worlds. It is totally legit to give preference to something because you and your team already know how to use it. You will have fewer unknown unknowns. There will always be gremlins in software, but you will know where they live and this helps a lot. Do not choose a new component because Hacker News likes it or doesn’t like it, or because of vendor benchmarks. All vendors lie, you should know that by now. And don’t forget to ask: how good is the community?
  23. Are they friendly and welcoming? Do they have a code

    of conduct, do they deal with assholes effectively? Do they value new contributors or are they tribal and snobby? It is totally legitimate to make software choices that are influenced by the quality of the community. Quality communities make software more likely to succeed, and stick around, and improve over the long run.
  24. Operational Impact The more mature your company becomes, the more

    your technical choices must be driven by operational impact. Corollary: make as many ops problems as possible not your problem. Lastly: almost all of this advice boils down to considering the operational impact and lifecycle of the technical decisions you make. The best engineers are the ones who do this instinctively. Outsource as many ops problems as you can. Anything that is not core to the success of your business, try to avoid doing it yourself, because problems that are core to your business, you are always going to need to keep in house. Save your precious engineering cycles for those. And remember: ((15:00))
  25. Celebrate the engineers who remove code, deprecate, and refactor, as

    much as those who add features. Culture can do a ton of heavy lifting for you here. If you celebrate developers who remove code, deprecate features, simplify and refactor, reduce duplication, just as much as you do developers who build features and add new things, you’ll get more of it. Remember (I feel like I say this in every single talk I ever give): the patterns you call out and celebrate in your culture are the patterns that will get repeated.
  26. Manifesto: 1. Technology serves the mission. 2. Reuse solutions. 3.

    Create friction for adding new components. 4. Choose boring technology, when you can. 5. Spend your risk tokens on key differentiators. 6. The longer you survive, the more operational impact trumps all. So here is your technical decision making manifesto: But … good news:
  27. Most startups fail! But they mostly fail for lack of

    traction or product/market fit, not because you chose the wrong language. 15 years ago someone was giving this same presentation, except they were complaining about how now they had to choose between C and C++, Perl and Python and PHP, Java, Oracle vs MySQL, etc. However, the choices you make can have a huge impact on how quickly you can move, how much technical debt you accumulate, how happy the engineers on your team are, whether they stay, and your ability to recruit.
  28. remember, it’s gonna be worse by 2017. <3 But cheer

    up; enjoy it while you can, because these are already the good old days.