The Paradox of Software Craftsmanship

Craftsmanship in software tends to erode as team sizes increase. This can happen for many reasons, but it often depends on code base size, team size, and autonomy. In this session I'll talk about some of the challenges companies face as these things change, and how to manipulate teams, architectures, and how people work to maintain software craftsmanship while still delivering product.

Theo Schlossnagle

April 25, 2014

Transcript

  1. Software Engineering and the Paradox of Craftsmanship

    Craftsmanship: Computer systems are broken from top to bottom, yet… we speak of craftsmanship.
  2. The many faces of Theo Schlossnagle: @postwait, CEO of Circonus

  3. The nature of the problem: Software Sucks

    Once you’ve run software at scale, you have a deep understanding of how it is all tied together with loose string and hope.
  4. To write good software is exceptionally hard

    • Specifications are hard to write
    • Projects can be long
    • requirements change
    • stakeholders change
    • resources change
    • Environments change
    • People think laziness is a virtue
  5. Technical debt is non-linear.

    Large projects have more code, higher learning curve, develop dangerous inertia.
    (Diagram label: components)
  6. Large monoliths are more likely to fail. Rule .1

    Big projects have more risk.
  7. An existing tool should be used instead of a new tool. Strawman .h1

    or you’ll end up with a sprawling, unmaintainable architecture in languages no one knows
  8. Diversity is both pre-existing and emergent. Diversity

    • 30 git repos
    • 30 fixes to open-source (external) projects per month
    • 6 programming languages
    • 4 database technologies
    • 2 networking vendors
    • 3 server vendors
    • 3 operating systems
    http://www.bonkersworld.net/building-software/
  9. It happens in the cloud just the same. Diversity

    • 30 git repos
    • 30 fixes to open-source (external) projects per month
    • 6 programming languages
    • 4 database technologies
    • 2 networking vendors
    • 3 server vendors
    • 3 operating systems
    • 4 cloud “services” (RDS, ElastiCache, etc.)
    http://www.bonkersworld.net/building-software/
  10. Diversity is an emergent property of scale. Rule .2

    Different languages & different architectures exist because of different problems.
  11. Engineers are a weird folk. Engineers

    Different temporal universe. World is flat mentality. Interesting rituals.
  12. None
  13. Learn

    ❖ Learn to hit a fucking deadline. Deadlines are arbitrary things… so is your product, your selected solution, & your job.
    ❖ Learn to do a fucking paper search. Masturbation might be fun… not at work, not with your colleagues.
    ❖ Learn to fucking balance. Interruptions that derail your work may just add more value to the team than they cost you.
  14. Engineers function better when autonomous. Rule .3

    Not all perhaps, just the ones you want to hire. Autonomy of approach, not purpose.
  15. Society in toto: Social Contract (open source projects, companies, teams)

  16. The social contracts of software: APIs

    Application Programming Interfaces set expectations between any two components in an architecture. Life is simple when expectations are simple. Strive to minimize the surface area.
  17. You shall be judged by your word API; be terse. Rule .4

    Society is held together by a social contract; software architectures by an interface contract.
  18. Components shall be right sized. Corollary .1

    Too large violates Rule .1; just right leverages Rule .3; too small violates Rule .4.
  19. Data storage backend replacement. Anecdote .α1

    From PostgreSQL to a custom time series DB
  20. APIs are about minimizing surface area. SQL is not an API

    SQL’s surface area is so large it is daunting to replace
  21. Storage growth on our PostgreSQL clusters: Out of control

    Three events of removing retained data
  22. Snowth design

    ❖ Need: zero-downtime
    ❖ Know: Agreement is hard.
    ❖ Know: Consensus is expensive.
    ❖ CAP theorem tradeoffs suck.
    ❖ CRDT (Commutative Replicated Data Type)
    Diagram nodes: n1 n2 n3 n4 n5 n6
  23. [Ring diagram: vnodes n1-1 through n6-4, four per node]
  24. [Ring diagram: vnodes n1-1 through n6-4, plus the label o1]
  25. [Ring diagram: vnodes n1-1 through n6-4, plus the label o1]
  26. [Ring diagram: vnodes n1-1 through n6-4]
  27. [Ring diagram: vnodes n1-1 through n6-4, with Availability Zone 1 and Availability Zone 2]
  28. [Ring diagram: o1 and vnodes n1-1 through n6-4, with Availability Zone 1 and Availability Zone 2]
  29. [Ring diagram: Availability Zone 1 and Availability Zone 2, with o1 and vnodes n1-1 through n6-4]
  30. 6 nodes, 85 vnodes per node. A real ring

    Keep it simple, stupid. We actually don’t do split AZ
  31. Storage growth on our Snowth cluster: Rethinking it all

  32. Production deployment overlap of ~12 months. Time & Safety

    Small, well-defined API allowed for low-maintenance, concurrent operations and ongoing development.
  33. Snowth has its own issues. Craftsmanship

    Where it was needed most. And even there… Snowth is composed of several subsystems that allow for different levels of scrutiny
  34. Message queue replacements. Anecdote .α2: From RabbitMQ to Fq

  35. A love affair with the rabbit… let’s just be friends. RabbitMQ

    RabbitMQ is awesome for certain things, not for everything.
  36. Learning the hard way. Oops… we broke it

    Once we passed 5-digit volumes, the world came crashing down. Rabbit simply turned into an operational nightmare for us at > 60,000 messages/second. We thought it would do better, but we had tested with the wrong message sizes.
  37. Replacing RabbitMQ is actually quite daunting. AMQP is complex

    Large surface area. Many features. Finding a better “thing” is almost impossible.
    https://www.flickr.com/photos/booleansplit/3782726220/
  38. Step back, rethink. Separate

    Separate requirements to isolate used surface area. In this case, we separate the:
    • control plane
    • data plane
    https://www.flickr.com/photos/jeffk/156713228/
  39. Leaving the control plane on RabbitMQ: RabbitMQ lived

    Reduced message volumes stabilized our use.
  40. Fucking Queues: Fq was born

    F@#$*&%Q… Fast, brokered, in-memory and on-disk.
    Doesn’t have XA transactions. Doesn’t have acknowledgments. Doesn’t have delivery confirmations.
    It’s really fast… for us.
  41. Deeper Realizations

    ❖ Autonomy:
        ❖ API deliberation
        ❖ API openness
        ❖ data serialization stability
    ❖ Right sizing components:
        ❖ reduces maintenance
        ❖ increases agility
        ❖ legitimizes rewrites
  42. CRAFT Thank you.
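
Slides 22 through 30 describe placing data on a ring of 6 nodes with 85 vnodes per node. As a rough illustration of that general idea (consistent hashing with virtual nodes), here is a minimal sketch. The slides do not say how Snowth hashes keys or how many copies it keeps, so the SHA-256 hash, the `Ring` class, the `owners` method, and the `copies` parameter are all assumptions made for this example; only the node count, the vnode count, and the `n1`/`o1` style labels come from the slides.

```python
import bisect
import hashlib


def _point(label: str) -> int:
    """Map a label to a ring position (the hash choice is an assumption)."""
    digest = hashlib.sha256(label.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Hypothetical consistent-hash ring; not Snowth's actual implementation."""

    def __init__(self, nodes, vnodes_per_node):
        # Each physical node owns several points ("vnodes") on the ring,
        # labelled n1-1, n1-2, ... as in the slide diagrams.
        self._points = sorted(
            (_point(f"{node}-{i}"), node)
            for node in nodes
            for i in range(1, vnodes_per_node + 1)
        )
        self._keys = [position for position, _ in self._points]

    def owners(self, key, copies=1):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        start = bisect.bisect_right(self._keys, _point(key))
        found = []
        for offset in range(len(self._points)):
            node = self._points[(start + offset) % len(self._points)][1]
            if node not in found:
                found.append(node)
                if len(found) == copies:
                    break
        return found


ring = Ring([f"n{i}" for i in range(1, 7)], vnodes_per_node=85)
print(len(ring._keys))               # 6 nodes * 85 vnodes = 510 ring points
print(ring.owners("o1", copies=2))   # two distinct nodes; which ones depends on the hash
```

Because placement depends only on the hash and the node list, every participant that knows the ring computes the same owners for `o1` without asking anyone else, which fits the slides' point that agreement and consensus are expensive.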