Slide 1

Slide 1 text

Software Engineering and the Paradox of Craftsmanship
Craftsmanship
Computer systems are broken from top to bottom, yet
…we speak of craftsmanship.

Slide 2

Slide 2 text

The many faces of Theo Schlossnagle
@postwait
CEO, Circonus

Slide 3

Slide 3 text

The nature of the problem
Software Sucks
Once you’ve run software at scale, you have a deep understanding of how it is all tied together with loose string and hope.

Slide 4

Slide 4 text

To write good software is exceptionally hard
• Specifications are hard to write
• Projects can be long
• requirements change
• stakeholders change
• resources change
• Environments change
• People think laziness is a virtue

Slide 5

Slide 5 text

Technical debt is non-linear.
Large projects have more code and a higher learning curve, and they develop dangerous inertia.
components

Slide 6

Slide 6 text

Large monoliths are more likely to fail.
Rule .1
Big projects have more risk.

Slide 7

Slide 7 text

An existing tool should be used instead of a new tool.
Strawman .h1
or you’ll end up with a sprawling, unmaintainable architecture in languages no one knows

Slide 8

Slide 8 text

Diversity is both pre-existing and emergent.
Diversity
• 30 git repos
• 30 fixes to open-source (external) projects per month
• 6 programming languages
• 4 database technologies
• 2 networking vendors
• 3 server vendors
• 3 operating systems
http://www.bonkersworld.net/building-software/

Slide 9

Slide 9 text

It happens in the cloud just the same.
Diversity
• 30 git repos
• 30 fixes to open-source (external) projects per month
• 6 programming languages
• 4 database technologies
• 2 networking vendors
• 3 server vendors
• 3 operating systems
• 4 cloud “services” (RDS, ElastiCache, etc.)
http://www.bonkersworld.net/building-software/

Slide 10

Slide 10 text

Diversity is an emergent property of scale.
Rule .2
Different languages & different architectures exist because of different problems.

Slide 11

Slide 11 text

Engineers are a weird folk.
Engineers
Different temporal universe.
World-is-flat mentality.
Interesting rituals.

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

Learn
❖ Learn to hit a fucking deadline
  Deadlines are arbitrary things… so is your product, your selected solution, & your job.
❖ Learn to do a fucking paper search
  Masturbation might be fun… not at work, not with your colleagues.
❖ Learn to fucking balance
  Interruptions that derail your work may just add more value to the team than they cost you.

Slide 14

Slide 14 text

Engineers function better when autonomous.
Rule .3
Not all perhaps, just the ones you want to hire.
Autonomy of approach, not purpose.

Slide 15

Slide 15 text

Society in toto
Social Contract
open source projects
companies
teams

Slide 16

Slide 16 text

The social contracts of software
APIs
Application Programming Interfaces set expectations between any two components in an architecture.
Life is simple when expectations are simple.
Strive to minimize the surface area.

Slide 17

Slide 17 text

You shall be judged by your word API; be terse.
Rule .4
Society is held together by a social contract; software architectures by an interface contract.

Slide 18

Slide 18 text

Components shall be right sized.
Corollary .1
Too large violates .1
Just right leverages .3
Too small violates .4

Slide 19

Slide 19 text

Data storage backend replacement
Anecdote .α1
From PostgreSQL
To a custom time series DB

Slide 20

Slide 20 text

APIs are about minimizing surface area.
SQL is not an API
SQL’s surface area is so large it is daunting to replace.
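For contrast with SQL's surface area, here is a minimal sketch of what a deliberately small storage contract could look like. The `TimeSeriesStore` protocol, its two methods, and the `InMemoryStore` backend are hypothetical names made up for illustration; the deck does not show the real interface.

```python
from typing import Dict, List, Protocol, Tuple

Point = Tuple[int, float]  # (unix timestamp, value)


class TimeSeriesStore(Protocol):
    """Hypothetical two-method storage contract (an assumption, not a real API)."""

    def write(self, metric: str, ts: int, value: float) -> None: ...

    def read(self, metric: str, start: int, end: int) -> List[Point]: ...


class InMemoryStore:
    """One backend that satisfies the contract; another could replace it."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[int, float]] = {}

    def write(self, metric: str, ts: int, value: float) -> None:
        # A later write to the same timestamp overwrites the earlier one.
        self._data.setdefault(metric, {})[ts] = value

    def read(self, metric: str, start: int, end: int) -> List[Point]:
        # Half-open range [start, end), returned in timestamp order.
        series = self._data.get(metric, {})
        return sorted((t, v) for t, v in series.items() if start <= t < end)


def average(store: TimeSeriesStore, metric: str, start: int, end: int) -> float:
    """Caller code depends only on the two-method contract."""
    points = store.read(metric, start, end)
    return sum(v for _, v in points) / len(points) if points else 0.0
```

With only two calls to honor, swapping the backend means reimplementing two methods rather than a query language.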

Slide 21

Slide 21 text

Storage growth on our PostgreSQL clusters
Out of control
Three events of removing retained data

Slide 22

Slide 22 text

Snowth design
❖ Need: zero-downtime
❖ Know: Agreement is hard.
❖ Know: Consensus is expensive.
❖ CAP theorem tradeoffs suck.
❖ CRDT (Commutative Replicated Data Type)
[Diagram: nodes n1 n2 n3 n4 n5 n6]
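Why a CRDT sidesteps agreement can be sketched with a textbook example. The deck does not say which data type Snowth uses, so the grow-only counter below is only an assumption chosen for brevity. Its `merge` is commutative, associative, and idempotent, so replicas can exchange state in any order and still converge without consensus.

```python
from typing import Dict

Counter = Dict[str, int]  # per-node increment totals


def increment(state: Counter, node: str, amount: int = 1) -> Counter:
    """Return a new state with `node`'s own slot raised by `amount`."""
    if amount < 0:
        raise ValueError("a grow-only counter cannot decrease")
    updated = dict(state)
    updated[node] = updated.get(node, 0) + amount
    return updated


def merge(a: Counter, b: Counter) -> Counter:
    """Join two replica states by taking the per-node maximum."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


def value(state: Counter) -> int:
    """The counter's value is the sum of every node's slot."""
    return sum(state.values())
```

Because each node only ever raises its own slot, taking the maximum per slot never loses an update, whichever replica merges first.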

Slide 23

Slide 23 text

[Diagram: ring of vnodes n1-1 through n6-4 (six nodes, four vnodes each)]

Slide 24

Slide 24 text

[Diagram: ring of vnodes n1-1 through n6-4 (six nodes, four vnodes each), plus a point labeled o1]

Slide 25

Slide 25 text

[Diagram: ring of vnodes n1-1 through n6-4 (six nodes, four vnodes each), plus a point labeled o1]

Slide 26

Slide 26 text

[Diagram: ring of vnodes n1-1 through n6-4 (six nodes, four vnodes each)]

Slide 27

Slide 27 text

[Diagram: ring of vnodes n1-1 through n6-4 (six nodes, four vnodes each), split between Availability Zone 1 and Availability Zone 2]

Slide 28

Slide 28 text

[Diagram: ring of vnodes n1-1 through n6-4 (six nodes, four vnodes each), split between Availability Zone 1 and Availability Zone 2, plus a point labeled o1]

Slide 29

Slide 29 text

[Diagram: ring of vnodes n1-1 through n6-4 (six nodes, four vnodes each), split between Availability Zone 1 and Availability Zone 2, plus a point labeled o1]

Slide 30

Slide 30 text

6 nodes, 85 vnodes per node.
A real ring
Keep it simple, stupid.
We actually don’t do split AZ.
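A ring of 6 nodes with 85 vnodes each can be sketched as a sorted list of hashed points. This is a generic consistent-hashing sketch, not Snowth's code: the SHA-256 hash, the `Ring` class, the `owners` method, and the rule of walking clockwise until enough distinct nodes are found are all assumptions made for illustration.

```python
import bisect
import hashlib
from typing import List, Tuple


def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring (hash choice is an assumption)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class Ring:
    def __init__(self, nodes: List[str], vnodes_per_node: int) -> None:
        if vnodes_per_node < 1:
            raise ValueError("each node needs at least one vnode")
        # Each node contributes several points named like "n1-1", "n1-2", ...
        points: List[Tuple[int, str]] = [
            (_hash(f"{node}-{i}"), node)
            for node in nodes
            for i in range(1, vnodes_per_node + 1)
        ]
        points.sort()
        self._points = points
        self._hashes = [h for h, _ in points]
        self._node_count = len(set(nodes))

    def owners(self, key: str, copies: int = 2) -> List[str]:
        """Walk clockwise from the key's position, collecting distinct nodes."""
        copies = min(copies, self._node_count)
        start = bisect.bisect_right(self._hashes, _hash(key))
        found: List[str] = []
        step = 0
        while len(found) < copies:
            node = self._points[(start + step) % len(self._points)][1]
            if node not in found:
                found.append(node)
            step += 1
        return found


ring = Ring([f"n{i}" for i in range(1, 7)], vnodes_per_node=85)
```

Calling `ring.owners("o1")` returns the same two distinct nodes every time, with no coordination between callers.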

Slide 31

Slide 31 text

Storage growth on our Snowth cluster
Rethinking it all

Slide 32

Slide 32 text

Production deployment overlap of ~12 months
Time & Safety
A small, well-defined API allowed for low-maintenance, concurrent operations and ongoing development.

Slide 33

Slide 33 text

Snowth has its own issues.
Craftsmanship
Where it was needed most.
And even there…
Snowth is composed of several subsystems that allow for different levels of scrutiny.

Slide 34

Slide 34 text

Message queue replacements
Anecdote .α2
From RabbitMQ
To Fq

Slide 35

Slide 35 text

A love affair with the rabbit… let’s just be friends
RabbitMQ
RabbitMQ is awesome for certain things, not for everything.

Slide 36

Slide 36 text

Learning the hard way
Oops… we broke it
Once we passed 5-digit volumes, the world came crashing down.
Rabbit simply turned into an operational nightmare for us at > 60,000 messages/second.
We thought it would do better, but we had tested with the wrong message sizes.

Slide 37

Slide 37 text

Replacing RabbitMQ is actually quite daunting.
AMQP is complex
Large surface area. Many features.
Finding a better “thing” is almost impossible.
https://www.flickr.com/photos/booleansplit/3782726220/

Slide 38

Slide 38 text

Step back, rethink.
Separate
Separate requirements to isolate used surface area.
In this case, we separate the:
• control plane
• data plane
https://www.flickr.com/photos/jeffk/156713228/
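The control plane / data plane split can be sketched as a thin router that hides which transport carries which traffic. `PlaneRouter` and the two publish callables are hypothetical; in practice they would wrap whatever client each broker provides, and none of this is taken from the real system.

```python
from typing import Callable, Dict

Publish = Callable[[bytes], None]


class PlaneRouter:
    """Route each message to the transport chosen for its plane (illustrative only)."""

    def __init__(self, control: Publish, data: Publish) -> None:
        self._routes: Dict[str, Publish] = {"control": control, "data": data}

    def publish(self, plane: str, payload: bytes) -> None:
        try:
            send = self._routes[plane]
        except KeyError:
            raise ValueError(f"unknown plane: {plane!r}") from None
        send(payload)


# Stand-ins for two different brokers: plain lists that record what was sent.
control_log: list = []
data_log: list = []
router = PlaneRouter(control=control_log.append, data=data_log.append)
router.publish("control", b"reload-config")  # low volume, feature-rich broker
router.publish("data", b"metric-sample")     # high volume, fast broker
```

Callers name a plane, never a broker, so either transport can be replaced behind the router without touching them.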

Slide 39

Slide 39 text

Leaving the control plane on RabbitMQ
RabbitMQ lived
Reduced message volumes stabilized our use.

Slide 40

Slide 40 text

Fucking Queues
Fq was born
F@#$*&%Q…
Fast, brokered, in-memory and on-disk
Doesn’t have XA transactions
Doesn’t have acknowledgments
Doesn’t have delivery confirmations
It’s really fast… for us.

Slide 41

Slide 41 text

Deeper Realizations
❖ Autonomy:
  ❖ API deliberation
  ❖ API openness
  ❖ data serialization stability
❖ Right sizing components:
  ❖ reduces maintenance
  ❖ increases agility
  ❖ legitimizes rewrites

Slide 42

Slide 42 text

CRAFT
Thank you.