
Maintaining Simplicity in Log Based Architectures

At FashionTrade we use (micro-)services that are independently deployed and maintained, and we rely heavily on Kafka-based logs for distribution of data. A key design goal is that services build up local state in service-specific representations. In this talk we'll go into the conceptual and implementation details of several approaches to typical challenges in log- and event-based setups, including schema changes, staleness, decoupling, and mixing asynchronous with synchronous dependencies. In the end we elaborate on the implementation choices we made to strive for a healthy trade-off between simplicity and correctness of design.

FashionTrade.com Engineering

February 21, 2017


Transcript

  1. About Friso van Vollenhoven
     Mostly worked in software dev and related roles. Former CTO at a (big) data analytics and machine learning company. Now CTO at FashionTrade. I am the proud owner of a three-character Twitter handle: @fzk. I have 19 endorsements for Awesomeness on LinkedIn.

  2. About FashionTrade
     A B2B platform for fashion wholesale, where fashion brands and retailers can connect and do business. E-commerce for (fashion) businesses. Tagline: “We simplify wholesale so you can Connect, Trade & Grow.”

  3. The life of a product
     - Product information enters the brand integration API
     - Validation
     - Product information is merged with existing data that applies: price lists, stock levels, existing images, etc.
     - The product enters the search engine, but only if complete (i.e. it has a known price, availability, etc.)
     - Product information is used for orders, confirmations, etc.

  4. State and Truth Examples
     - Truth: product information (master data, stock levels, prices)
     - Derived state: ElasticSearch index, merged (derived) product information, product image sets (thumbnails), calculated product metadata

  5. State and Truth Examples
     - Truth: a brand reached out to a retailer to connect
     - State: match making in progress between brand and retailer
     - On a successful match making workflow, new truth: connection established between brand and retailer

  6. State vs. Truth
     - Derived state is cheap: it can be reconstructed based on truth, which allows for agility (e.g. build a new ElasticSearch index based on existing product information)
     - Truth is hard to reconstruct: it comes from external sources, so invest more in design thinking

  7. Why services
     Separates business concerns. Allows people to work on many things concurrently. Works well with a log based data architecture. Makes for organisational scalability, at the cost of added complexity in delivery.

  8. Services and Dependencies: local state
     Services maintain local state, managed autonomously by the service internally. 99% use CRUD; some use a light version of Event Sourcing.

  9. Services and Dependencies: local state
     Technologies: Google Cloud Datastore and ElasticSearch. The former is often authoritative; the latter is not allowed to be (it can only contain derived state).

  10. Services and Dependencies: communication
      Services depend on other services’ state. Delivery is managed by the transport; contents are provided by the producing service. Two flavours: synchronous calls (RPC) and asynchronous events (messaging).

  11. A Word on Sync vs. Async
      It is sometimes said that sync dependencies are a bad thing. It’s so bad that Google released an open source framework for creating sync / RPC dependencies based on their internal experiences (gRPC, http://www.grpc.io/). Perhaps it’s not about sync vs. async; it’s about autonomy of clients with respect to servers: update the client and server side independently. The same holds for producers / consumers.

  12. Services and Dependencies: communication
      Technologies: HTTP based RPC (based on Swagger and client code generation) and messages on Kafka. The payload is JSON for both (more on that later). The typical path for a service is to use RPC first to get things done, and later move to a Kafka based dependency and maintain local state.

  13. Why do we all like Kafka so much?
      The Kafka protocol is pipelined over TCP: outstanding requests queue up in the TCP send buffer, and TCP already has back pressure (congestion control, slow start). The Kafka producer is async in the happy flow, but becomes blocking under upstream contention (back pressure again). The Kafka consumer is poll based, not push: the client requests the next batch of messages.

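      To make the producer and consumer behaviour concrete, here is a minimal sketch against the Kafka Java client API. The broker address, topic name, group id, and payloads are all assumptions for illustration.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KafkaFlowSketch {
    public static void main(String[] args) {
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092"); // assumption: local broker
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Producer: send() returns immediately in the happy flow; the callback fires
        // once the broker acks. When the client-side buffer fills up (upstream
        // contention), send() blocks -- that is the back pressure described above.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(
                new ProducerRecord<>("products", "sku-123", "{\"name\":\"sneaker\"}"),
                (metadata, exception) -> {
                    if (exception != null) exception.printStackTrace();
                });
        }

        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "product-consumer"); // illustrative group id
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        // Consumer: poll based, not push -- the client asks for the next batch.
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(Collections.singletonList("products"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            records.forEach(r -> System.out.printf("%s -> %s%n", r.key(), r.value()));
        }
    }
}
```
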
  14. Also...
      Utilizes zero copy send for file transfer (it can saturate a 1Gb link). Configurable, durable retention, with configurable per-message durability (the number of ack nodes). Queue compaction saves the last seen message for each key. Local ordering guarantees, which is important when using idempotent messages. A compaction sketch follows below.

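      Queue compaction is a per-topic setting. Below is a hedged sketch of creating a compacted topic with the Kafka AdminClient; the broker address, topic name, and partition / replication numbers are made-up examples.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CompactedTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: local broker

        try (AdminClient admin = AdminClient.create(props)) {
            // cleanup.policy=compact keeps the last seen message per key, so a
            // fresh consumer can rebuild current state by reading the whole topic.
            NewTopic topic = new NewTopic("product-state", 6, (short) 3) // illustrative name and sizing
                .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```
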
  15. Distributed, Persistent, Reactive… BINGO!
      Kafka is as close to a reactive stream as it gets. Use that; there is no need for an additional abstraction. Use it for streaming only, and keep other paradigms for local state.

  16. Things we do with Kafka
      - A service that consumes products, stock levels, and price lists, and produces enriched product information on a separate topic, to be used for search, orders, product details display, etc.
      - When we deploy a new version of search, we build a new index by deploying the new search service with a new consumer group and resetting the consumer offset (see the sketch below).
      - We can translate any business entity creation, update, or deletion into a business metric, using a simple consumer that translates events into metrics and sends them to Datadog.

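      A minimal sketch of the reindexing trick: a brand new consumer group has no committed offsets, so with auto.offset.reset=earliest it replays the topic from the start. The group id, topic name, and indexing function here are hypothetical.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ReindexSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: local broker
        props.put("group.id", "search-indexer-v2");       // illustrative: a fresh group per search deployment
        props.put("auto.offset.reset", "earliest");       // no committed offsets yet -> start at the beginning
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("enriched-products")); // illustrative topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                records.forEach(r -> indexDocument(r.key(), r.value()));
            }
        }
    }

    // hypothetical sink: write the document into the new ElasticSearch index
    static void indexDocument(String id, String json) { }
}
```
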
  17. Solution 1: Strict Versioning
      The server side needs to support all versions that ever existed; the client sticks to a specific version of the service.
      + Easy to understand
      - Simple / small changes are costly

  18. Solution 2: Schema evolution
      When adding a new field to an entity, it must be optional. Removing fields from the schema can’t be done, but a producer can stop populating optional fields. Readers / consumers / clients must have sensible handling of empty optionals: usually default values, sometimes different behaviour. A sketch of that handling follows below.

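      Since the payloads are currently JSON, here is one hedged way to give consumers that "sensible handling of empty optionals" using Jackson. The Product shape and field names are invented for this example.

```java
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;

public class SchemaEvolutionSketch {

    public static class Product {
        public String sku;                 // always present
        public String name;               // always present
        public String season = "UNKNOWN"; // added later: optional, with a default value
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper()
            // old consumers must not break when a *newer* producer adds fields
            .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);

        // an "old" message without the new field: the default kicks in
        Product p = mapper.readValue("{\"sku\":\"sku-123\",\"name\":\"sneaker\"}", Product.class);
        System.out.println(p.season); // prints UNKNOWN
    }
}
```

      Together these two settings cover both directions: a default value handles old messages read by a new consumer, and ignoring unknown properties handles new messages read by an old consumer.
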
  19. Solution 2: Schema evolution
      + Adding fields is relatively simple
      + Clients / consumers are concerned with schemas and projections, not versions
      + Allows for contextual interpretation of older versions
      - Requires discipline
      - Sometimes you want things not to be optional

  20. Solution 3: Schema Translation
      There is some middleware that knows how to translate any version to any other version; this is sometimes used in event sourcing.
      + Transparent to the services
      - Doesn’t allow for contextual translation between versions

  21. Autonomy: Conclusion
      You will end up with a combination of versioning and schema evolution; schema translation doesn’t scale in practice. Sometimes you just want to delete the Kafka topic and start over. But you can’t, so you start a new topic and use a new version on there; consumers can then move to the new topic gradually. This requires coordination, just like monolithic deployments.

  22. Backup and Restore
      Backup the truths as part of service local state, restore service local state, and trigger authoritative services to replay their local state on Kafka; that builds up all derived state (see the replay sketch below). Backup is solved external to the service, which is hard to do when all services use different persistence. Still work in progress...

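      A hedged sketch of the replay step: an authoritative service re-emits every entity it owns onto its topic, keyed by entity id, so downstream services can rebuild derived state. The loadAllProducts() source and the topic name are hypothetical stand-ins for a Datastore scan.

```java
import java.util.Properties;
import java.util.stream.Stream;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ReplaySketch {

    static final class StoredProduct {
        final String id;
        final String json;
        StoredProduct(String id, String json) { this.id = id; this.json = json; }
    }

    // hypothetical stand-in for scanning the service's authoritative store
    static Stream<StoredProduct> loadAllProducts() {
        return Stream.of(new StoredProduct("sku-123", "{\"name\":\"sneaker\"}"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: local broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Keyed by entity id: on a compacted topic, replaying is idempotent --
            // the last message per key wins, so repeated replays converge.
            loadAllProducts().forEach(p ->
                producer.send(new ProducerRecord<>("products", p.id, p.json)));
        }
    }
}
```
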
  23. Breaking Business Changes
      This is when you switch to a new version of something, e.g. in the new design our representation of a brand doesn’t match the old one. Schema evolution makes things overly complex here (union types, etc.); create new entities instead, and change to a new API version / new Kafka topic. This kind of change requires coordination. This is normal.

  24. Schema discipline
      Currently we use JSON for everything, with schemas defined in code. Code attracts logic, and schemas shouldn’t have logic. JSON is more troublesome than anticipated: it is really easy to publish evolution-incompatible messages on a queue. This was a conscious decision to lower the learning curve while bootstrapping development. We will move to a binary message format with formal schema definitions as soon as possible, most likely Avro; gRPC looks very promising for synchronous dependencies.

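      To give a hedged taste of the planned direction: in Avro, the schema itself carries the evolution rules, so a later-added field needs a declared default for old messages to remain readable. The record and field names here are invented.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecordBuilder;

public class AvroSchemaSketch {
    public static void main(String[] args) {
        // A formal schema: the later-added "season" field is optional with a default.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Product\",\"fields\":["
            + "{\"name\":\"sku\",\"type\":\"string\"},"
            + "{\"name\":\"season\",\"type\":[\"null\",\"string\"],\"default\":null}"
            + "]}");

        // Omitting the optional field is fine: the builder applies the default.
        GenericData.Record product = new GenericRecordBuilder(schema)
            .set("sku", "sku-123")
            .build();
        System.out.println(product.get("season")); // null, per the schema default
    }
}
```
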
  25. Consumer lag
      A high level metric for the health of consumers; it determines overall staleness in the system. Be worried when it goes up. We use Burrow (https://github.com/linkedin/Burrow) to keep track, with a custom integration with Datadog for metrics / monitoring / alerting. In Kubernetes we could use consumer lag as an auto-scaling trigger. It doesn’t solve stuck consumers, of course. A sketch of what lag means follows below.

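      For intuition, here is a rough sketch of computing consumer lag by hand with the Kafka AdminClient (Burrow does this continuously and also evaluates the trend). The group id and broker address are assumptions.

```java
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class LagSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: local broker

        try (AdminClient admin = AdminClient.create(props)) {
            // Where the group's consumers are: their committed offsets per partition.
            Map<TopicPartition, OffsetAndMetadata> committed = admin
                .listConsumerGroupOffsets("search-indexer") // illustrative group id
                .partitionsToOffsetAndMetadata().get();

            Properties consumerProps = new Properties();
            consumerProps.put("bootstrap.servers", "localhost:9092");
            consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
                // Where the log ends: the latest offset per partition.
                Map<TopicPartition, Long> ends = consumer.endOffsets(committed.keySet());
                // lag = log end offset - committed offset, per partition
                committed.forEach((tp, offset) ->
                    System.out.printf("%s lag=%d%n", tp, ends.get(tp) - offset.offset()));
            }
        }
    }
}
```
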
  26. Vacancies
      - Back end engineer (JVM, Python): Core Platform, Customer Success, Solutions
      - Front end engineer (JavaScript, React / Redux)
      - Infrastructure / deployment engineer: responsible for infra, the Kafka + ES clusters, and the build + deployment pipeline

  27. To keep it simple
      Mix paradigms, but use idiomatic implementations. Understanding of concepts is more important than implementations; people can read API docs. Think harder about the models of the sources of truth: Kafka allows any derived state to be cheaply recomputed, but that requires cheap deployment, schema evolution, and queue compaction. Remember: no single abstraction deals with all production aspects of a system.