Building Composable Services (with notes)

Slide 1

Slide 1 text

Building Composable Services Noah Kantrowitz

Slide 2

Slide 2 text

What? Why? How? Perils? Press Start This talk is going to cover four aspects of composable services. What they are, why to build them, how to build them, and what common pitfalls to avoid.

Slide 3

Slide 3 text

What? World 1-1

Slide 4

Slide 4 text

What is a Function? Let's start at the beginning.

Slide 5

Slide 5 text

f(x, y) = z A function, in the mathematical sense, is some operation that takes inputs and produces an output.

Slide 6

Slide 6 text

What is Idempotence? So then what is an idempotent function?

Slide 7

Slide 7 text

f(x) = f(f(x)) Idempotence means that a function's output doesn't change when you run it twice. This is important when talking about operations that might be run one or more times. If you click "log in" twice, you should get the same result as if you clicked it once. That is idempotence.
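
As a minimal sketch of that property in Python: the `normalize_email` function below is a made-up example, not something from the talk. Applying it to its own output changes nothing.

```python
def normalize_email(address: str) -> str:
    """Hypothetical example: trim whitespace and lowercase an address."""
    return address.strip().lower()


once = normalize_email("  User@Example.COM ")
twice = normalize_email(once)

# Idempotence: f(x) == f(f(x))
assert once == twice
```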

Slide 8

Slide 8 text

What is Composable? Then what does composable mean?

Slide 9

Slide 9 text

f(x) g(x) f(g(x)) Composability is a property of functions, where you can use the output of one function as an input to another.

Slide 10

Slide 10 text

req_user(x) user_id(x) user_id(req_user(x)) This is already a common pattern in many applications, and maps nicely to an object oriented style.
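
A small sketch of that composition in Python. The bodies of `req_user` and `user_id` are assumptions for illustration (the slide only names them), and the request is a plain dict rather than any framework's request object.

```python
from typing import Any, Callable


def compose(f: Callable[[Any], Any], g: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Return the function x -> f(g(x))."""
    return lambda x: f(g(x))


def req_user(request: dict) -> dict:
    """Hypothetical: pull the user record out of a request."""
    return request["user"]


def user_id(user: dict) -> int:
    """Hypothetical: pull the id out of a user record."""
    return user["id"]


# user_id(req_user(x)), built once and reused.
request_user_id = compose(user_id, req_user)
result = request_user_id({"user": {"id": 42, "name": "me"}})
```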

Slide 11

Slide 11 text

post('http://login') get('http://search') The fun part is applying this idea across a network. Rather than having one big application with code modules for each task, build lots of little services that talk to each other.

Slide 12

Slide 12 text

Why? World 1-2

Slide 13

Slide 13 text

Availability The single biggest reason to use this style is the fault tolerance. A failure can bring down some of the services, but the rest will continue to operate as best they can. If your search service goes down, the search box will be disabled, but users should still be able to log in.

Slide 14

Slide 14 text

Scaling You can also scale up services more easily. If your search service gets overloaded, simply launch more of them behind a load balancer.

Slide 15

Slide 15 text

Testing The difficulty of testing a service goes up exponentially as the service gets bigger and has more interacting features. Small, isolated services lead to easier testing and thus often better test coverage.

Slide 16

Slide 16 text

Logistics As long as APIs are agreed upon between services, the deployment and operations of one need not impact the others. In many organizations, these are handled by different teams so this leads to a natural separation of concerns.

Slide 17

Slide 17 text

How? World 1-3

Slide 18

Slide 18 text

Frameworks Storage Rich Data Discovery Resilience Containers Level Up Select a Skill There are a lot of little things that contribute to successful microservices.

Slide 19

Slide 19 text

µ-frameworks While anything can be used to build a small, self-contained service, some tools are easier than others. I will focus on HTTP and REST-ish tools for most of this talk as they are the most common.

Slide 20

Slide 20 text

Flask (Python) Sinatra (Ruby) Express (JavaScript) Three of the most popular micro-frameworks in their respective languages are Flask, Sinatra, and Express. These all share a simple API, basic URL routing, and minimal integration with things like an ORM or HTML rendering library. As most of our services will be making HTTP queries to other services and rendering results as JSON, this saves on unnecessary complexity.
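
To give a feel for how small such a service can be, here is a rough sketch. To stay dependency-free it uses Python's standard-library `http.server` rather than Flask, and the `/search` route and the response shape are assumptions made up for the example.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse


class SearchHandler(BaseHTTPRequestHandler):
    """Hypothetical search service: one route in, JSON out."""

    def do_GET(self):
        if urlparse(self.path).path == "/search":
            status, payload = 200, {"results": ["a", "b"]}
        else:
            status, payload = 404, {"error": "not found"}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), SearchHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# Another "service" calling this one over HTTP, with a timeout.
conn = http.client.HTTPConnection("127.0.0.1", port, timeout=2.0)
conn.request("GET", "/search?q=x")
results = json.loads(conn.getresponse().read())
conn.close()

server.shutdown()
server.server_close()
```

A micro-framework would replace the handler class with a few decorated functions, but the shape of the service stays the same.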

Slide 21

Slide 21 text

ZeroMQ nanomsg ProtoBufs Cap'nProto While HTTP and JSON are the most common formats used, you should know about a few of the alternatives. ZeroMQ and nanomsg provide a more compact wire protocol than HTTP, and Protocol Buffers and Cap'n Proto provide a more compact message serialization than JSON.

Slide 22

Slide 22 text

Data Storage (aka state) I mentioned most of your services will consume data from other services, but eventually some information does need to be stored somewhere. Just as we build models to wrap the database to control database access, in a composable world we make model services. This helps keep the surface area between the services and the databases to a minimum.

Slide 23

Slide 23 text

AP Database While the specifics of different databases are beyond the scope of this talk, the decentralized nature of this style does mesh very well with AP databases like Riak and Cassandra.

Slide 24

Slide 24 text

Cache is the enemy The conventional wisdom in many web development circles is to cache early and cache often. I am here to tell you down this path lies madness. Each of those caches is really a new database to worry about, and all the earlier issues with service/storage interactions apply again. If something must be cached, perhaps due to being very slow to compute but too big to store ahead of time, just like with the database there should be a model service that wraps the cache and hides it from the rest.

Slide 25

Slide 25 text

Rich Data {id: me, cart: http...} Rather than passing around very large data structures, you can break them into more manageable chunks and include links to them. Having the links included in the data, rather than simply including tokens or opaque identifiers, helps keep all the logic for accessing that bit of data in the service that manages it.
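
A sketch of what following such a link might look like. The URLs and documents are invented, and the `fetch` function reads from an in-memory dict that stands in for real HTTP calls to two services.

```python
# Hypothetical documents keyed by URL, standing in for two services.
FAKE_SERVICES = {
    "http://users.example/me": {"id": "me", "cart": "http://carts.example/17"},
    "http://carts.example/17": {"items": ["book", "pen"]},
}


def fetch(url: str) -> dict:
    """Stand-in for an HTTP GET that returns parsed JSON."""
    return FAKE_SERVICES[url]


def follow(document: dict, field: str) -> dict:
    """Resolve a linked field by fetching the URL stored in it."""
    return fetch(document[field])


user = fetch("http://users.example/me")
cart = follow(user, "cart")
```

The caller never builds a cart URL itself. Where carts live stays the cart service's business.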

Slide 26

Slide 26 text

Hypermedia APIs And as a natural extension of rich data, hypermedia APIs provide some structures for common problems, like related objects and pagination.
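
Pagination is a good illustration. In this sketch each page carries a link to the next one, so the client just follows links until there are none left. The `items` and `next` field names and the URLs are assumptions, not any particular hypermedia format.

```python
from typing import Callable, Iterator, Optional

# Hypothetical paginated responses keyed by URL.
PAGES = {
    "http://search.example/results?page=1": {
        "items": [1, 2],
        "next": "http://search.example/results?page=2",
    },
    "http://search.example/results?page=2": {"items": [3], "next": None},
}


def iter_items(url: Optional[str], fetch: Callable[[str], dict]) -> Iterator[int]:
    """Yield every item, following each page's "next" link until it is empty."""
    while url:
        page = fetch(url)
        yield from page["items"]
        url = page.get("next")


all_items = list(iter_items("http://search.example/results?page=1", PAGES.__getitem__))
```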

Slide 27

Slide 27 text

Service Discovery Now you have two services that want to communicate. How do they know where to find each other? Service discovery provides a way for cooperating services to locate each other.

Slide 28

Slide 28 text

Self-Organization One of the most important properties in any distributed system is self-organization, the ability of the system to shape itself to some extent. This allows the system to effectively route around minor failures, like a load balancer removing servers that are failing a health check.
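
The load balancer case can be sketched in a few lines. This is a toy, with made-up addresses and a health check passed in as a plain function. A real balancer would run the check on a timer against an actual endpoint.

```python
import itertools
from typing import Callable, List


class BackendPool:
    """Round-robin over the backends that passed the latest health check."""

    def __init__(self, backends: List[str], is_healthy: Callable[[str], bool]):
        self.backends = list(backends)
        self.is_healthy = is_healthy
        self.live: List[str] = []
        self._counter = itertools.count()
        self.refresh()

    def refresh(self) -> None:
        """Re-run the health check and rebuild the list of live backends."""
        self.live = [b for b in self.backends if self.is_healthy(b)]

    def pick(self) -> str:
        if not self.live:
            raise RuntimeError("no healthy backends")
        return self.live[next(self._counter) % len(self.live)]


down = {"10.0.0.2"}  # hypothetical: pretend this backend fails its check
pool = BackendPool(["10.0.0.1", "10.0.0.2", "10.0.0.3"], lambda b: b not in down)
picks = [pool.pick() for _ in range(4)]
```

When the failing backend recovers, the next `refresh()` puts it back in rotation without anyone editing a config file.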

Slide 29

Slide 29 text

DNS nslookup('login') DNS is one of the earliest forms of service discovery. In modern systems this takes one of two approaches: either publish multiple records and round-robin on the client side, or map the name to a load balancer like HAProxy and have services register themselves with it. The former often means that new services must be registered manually by an admin, and the latter means you need some other system to handle registration with the load balancer. In cloud platforms that offer DNS or load balancer APIs, this can still be quite powerful.
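
The client-side variant might look roughly like this, using the standard `socket.getaddrinfo` call. The helper names are my own, and a service name like `login` is hypothetical and would only resolve inside your own network.

```python
import itertools
import socket
from typing import Iterator, List


def resolve_all(name: str, port: int) -> List[str]:
    """Return every distinct address the resolver reports for a name."""
    addresses: List[str] = []
    for _family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        name, port, type=socket.SOCK_STREAM
    ):
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def round_robin(addresses: List[str]) -> Iterator[str]:
    """Cycle through the addresses, one per request."""
    return itertools.cycle(addresses)


# Usage inside your own network might be:
#   backends = round_robin(resolve_all("login", 80))
#   address = next(backends)
```

A long-running client would also want to re-resolve periodically, since a list fetched once never notices records being added or removed.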

Slide 30

Slide 30 text

ZooKeeper CP Database If you want more fine-grained control over service registration and discovery, ZooKeeper is the most widely used tool for cluster management. It also enables higher-level operations like leader election and ephemeral registration.

Slide 31

Slide 31 text

Etcd Serf Consul Archaius Many services have grown in the shadow of ZooKeeper. As with databases, the specifics are beyond this talk, but be sure to check out all the options as each makes different tradeoffs and offers different APIs.

Slide 32

Slide 32 text

Resilience Building microservices doesn't automatically make them fault tolerant, but it does make it a lot easier than with a single, monolithic application.

Slide 33

Slide 33 text

Timeouts Idempotent Retries The two most important things in making resilient services are careful control over network timeouts and ensuring that operations are idempotent. This allows you to detect failure quickly, and then simply repeat the failed operation as needed. Some operations can be naturally idempotent, like deleting a record.
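
Put together, the pattern is roughly the following. This is a sketch that assumes the operation is idempotent and signals a timeout by raising `TimeoutError`. The retry count and backoff values are arbitrary.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run an idempotent operation, retrying on timeout with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

Retrying is only safe because the operation is idempotent. A timed-out call may still have succeeded on the far side, so the retry must not do the work twice.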

Slide 34

Slide 34 text

post('chpw', nonce: 314) Others need explicit idempotence checks, such as checking update times or nonces to avoid race conditions.
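
One way a nonce check could work on the receiving side. The class and field names are invented, and the in-memory dict stands in for whatever store a real service would use to remember nonces.

```python
class PasswordService:
    """Hypothetical in-memory service that remembers which nonces it has seen."""

    def __init__(self):
        self.passwords = {}
        self.seen = {}  # nonce -> response already sent for it
        self.changes = 0

    def change_password(self, user: str, new_password: str, nonce: int) -> dict:
        if nonce in self.seen:
            # Replayed request: return the earlier result, change nothing.
            return self.seen[nonce]
        self.passwords[user] = new_password
        self.changes += 1
        response = {"status": "changed", "user": user}
        self.seen[nonce] = response
        return response


service = PasswordService()
first = service.change_password("me", "hunter2", nonce=314)
retry = service.change_password("me", "hunter2", nonce=314)
```

The retry gets the same answer as the first call, but the change is applied only once.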

Slide 35

Slide 35 text

Any service can be down Above all else, always have a strategy for dealing with any service being down. If it was a critical dependency of your service then perhaps you just send back an error message, but degrade gracefully where possible. As before, better to have the search box disabled than the whole site be down. This is the very essence of composable systems.
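
The search box case, as a sketch. Here `search_client` is a stand-in for a call to a hypothetical search service, and the page is reduced to a dict.

```python
from typing import Callable, List


def render_home(search_client: Callable[[], List[str]]) -> dict:
    """Build the home page, disabling search if the search service is down."""
    try:
        suggestions = search_client()
        search_enabled = True
    except (ConnectionError, TimeoutError):
        # Search is optional for this page: degrade instead of failing.
        suggestions = []
        search_enabled = False
    return {"search_enabled": search_enabled, "suggestions": suggestions}
```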

Slide 36

Slide 36 text

Async Messaging Queues When possible, use asynchronous messages instead of direct calls. This allows the queue to serve as a buffer between producer and consumer during failures and keeps latency down on the response to the user.
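
The shape of this, sketched with an in-process `queue.Queue` and a worker thread standing in for a real message broker and a separate consumer service. The job contents are made up.

```python
import queue
import threading

jobs: "queue.Queue" = queue.Queue()
processed = []


def worker() -> None:
    """Consumer: drain the queue until the None sentinel arrives."""
    while True:
        job = jobs.get()
        try:
            if job is None:
                return
            processed.append(job.upper())  # pretend this is the slow part
        finally:
            jobs.task_done()


def handle_request(text: str) -> dict:
    """Producer: enqueue the work and answer the user right away."""
    jobs.put(text)
    return {"status": "accepted"}


threading.Thread(target=worker, daemon=True).start()
reply = handle_request("index me")
jobs.put(None)  # sentinel: tell the worker to stop
jobs.join()     # wait until everything queued has been handled
```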

Slide 37

Slide 37 text

AMQP Kafka Two quick tool recommendations: AMQP, and RabbitMQ in particular, is the gold standard in queueing systems, and Kafka is newer but has an impressive feature set and user base.

Slide 38

Slide 38 text

Containers On the operational side, microservices have some unique requirements.

Slide 39

Slide 39 text

Less RAM Less Problems Compared to traditional applications, these tend to use far fewer resources.

Slide 40

Slide 40 text

LXC Jails Zones Docker/Mesos This pairs very nicely with low-overhead virtualization systems like Linux's LXC, FreeBSD's Jails, and Solaris' Zones. Docker and Mesos both offer higher-level interfaces to these technologies, though both are complex topics to say the least.

Slide 41

Slide 41 text

Security Isolation Using these containerization systems allows putting very hard boundaries between each service, which helps with a defense-in-depth strategy. A compromise of a single vulnerable service is less likely to cascade to others.

Slide 42

Slide 42 text

Immutable Deployment While not a requirement for it, containers also play nicely with immutable deployment. This is the idea that once launched, a container is a read-only object. In turn, this allows for powerful techniques like rolling deploys and rapid-response auto-scaling.

Slide 43

Slide 43 text

Logical Boundaries As your graph of services gets bigger, it is common to find clusters of logically-related services that share an internal support service. Just as public/private functions work in a monolithic application, you can use subnet boundaries and other network-level controls to enforce system boundaries with microservices.

Slide 44

Slide 44 text

Perils? World 1-4

Slide 45

Slide 45 text

Cascade Failures One common issue is cascade failures and overloads. This is especially prevalent when caching slow operations: after a deploy the cache will get flooded with requests and may not be able to handle them all. To address this you can build back-pressure into the system. If a service can't handle incoming requests, it can unregister from service discovery or signal its callers to wait before trying again.
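
A sketch of the "signal callers to wait" option: a service with a bounded backlog that refuses new work once it is full. The class, the capacity, and the one-second retry hint are assumptions. The status codes mirror HTTP's 202 and 503.

```python
import queue


class BoundedService:
    """Accept work up to a fixed backlog, then push back on callers."""

    def __init__(self, capacity: int):
        self.pending: "queue.Queue" = queue.Queue(maxsize=capacity)

    def submit(self, request: str) -> dict:
        try:
            self.pending.put_nowait(request)
        except queue.Full:
            # Back-pressure: tell the caller to come back later.
            return {"status": 503, "retry_after": 1}
        return {"status": 202}


service = BoundedService(capacity=2)
responses = [service.submit(f"req-{i}") for i in range(3)]
```

The third request is turned away instead of piling up behind work the service cannot get to.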

Slide 46

Slide 46 text

Poor Visibility The move from one application to many can impact visibility into the status of the system and hinder debugging.

Slide 47

Slide 47 text

Health & Metrics Central Logging Dashboards While important in any infrastructure, having solid monitoring and log aggregation is absolutely critical with microservices. Specialized reporters and dashboards like Sentry and Graphite can provide important overviews of system health.

Slide 48

Slide 48 text

Complex Deployments Similarly, deploying many microservices requires a lot more coordination than a single codebase. Orchestration tools like Fabric and RunDeck can help with this, though training and documentation are still important, especially for cross-team services.

Slide 49

Slide 49 text

Minimalist Self-organizing Fault-tolerant Just keep these three goals in mind and you'll be well on your way to building better services and better APIs.