
Toward Predictability and Stability


Data stream processing platforms and microservices platform infrastructure and strategies are converging. As we edge towards larger, more complex, and more decoupled systems, and as the global information graph keeps growing, our frontier of unsolved challenges grows just as fast. Central challenges for distributed systems include persistence strategies across DCs, zones, or regions; network partitions; data optimization; and system stability in all phases.

How does leveraging CRDTs and Event Sourcing address several core distributed systems challenges? What strategies and patterns are useful in the design, deployment, and running of stateful and stateless applications for the cloud, for example with Kubernetes? Combined with code samples, we will see how Akka Cluster, Multi-DC Persistence, the Split Brain Resolver, Sharding, and Distributed Data can help solve these problems.


Helena Edelson

May 16, 2019



Transcript

  1. @helenaedelson Helena Edelson • Principal Engineer @ Lightbend • Member

    of the Akka team • Former: Apple, Crowdstrike, VMware, SpringSource, Tuplejump • github.com/helena • twitter.com/helenaedelson • speakerdeck.com/helenaedelson Data, Analytics & ML Platform Infrastructure and Cloud Engineer Former biologist
  2. @helenaedelson When systems reach a critical level of dynamism we

    have to change our way of modeling and designing them • Stateful in a stateless world • Automation of everything - Ops, *aaS platforms • Persistence strategies across DCs, zones and regions • Data and query optimization • System availability and stability in all states of deployment and rolling restarts • Leveraging AI / ML Rethinking Strategies
  3. @helenaedelson Computational model embracing non-determinism - Actor Model of Computation,

    Carl Hewitt • Mathematical theory treating "Actors" as primitives of concurrent computation • Framework for a theoretical understanding of concurrency • Asynchronous communication • Stateful isolated processes • Non-observable state within • Decoupling in space and time The Network and Autonomous Processes
  4. @helenaedelson Principles that Akka stands on can be traced back

    to the ’70s and ’80s • Carl Hewitt invented the Actor Model, early 70s • Jim Gray and Pat Helland on the Tandem System, 80s • Joe Armstrong, Robert Virding and Mike Williams on Erlang, 1986 Look Back Before Looking Forward
  5. @helenaedelson • From the ’40s and still being heavily developed

    today across many fields of research and application in industry. • 1940s: Cellular automata (CA), originally discovered by Stanislaw Ulam and John von Neumann, Los Alamos National Laboratory • 1970s: Conway's Game of Life • Asynchronous Cellular Automaton Complex Adaptive Systems, Systems Theory, early AI
  6. @helenaedelson Can solve problems difficult or impossible for an individual

    agent or a monolithic system to solve • The foundations for artificial neural networks and NLP • Composed of multiple autonomous agents, interacting to achieve common goals • Decentralized, no central point of decision making • More fault tolerant, no single point of failure • Reach higher degrees of dependability Multi-Agent Systems (MAS)
  7. @helenaedelson @helenaedelson Complex Adaptive Systems (CAS) Self-Organization Theory Emergence Synchronization

    Amplification Distributed Networks cellular automata Feedback Loops Systems Evolution Swarming local Asynchronous Unpredictable Non-Linear Adaptive Versatile
  8. @helenaedelson • Stateful - in-memory yet durable and resilient state

    • Long-lived - lifecycle is not bound to a specific session, context available until explicitly destroyed • Virtual - location transparent and not bound to a physical location • Addressable - referenced through a stable address Akka Actors Also Happen To Be
  9. @helenaedelson Consistency vs Availability Strong Consistency Always Available Node 1

    Node 2 Partition Tolerance Conflicting goals to weigh against each other
  10. @helenaedelson • Complex Event Processing (CEP) - developed 1989-1995 to

    analyze event-driven simulations of distributed systems, abstracting causal event histories, patterns, filtering and aggregation in large, distributed, time-sensitive systems • Stream Processing - mid-1990s research in real-time event data analysis, internet companies processing large numbers of events • Event Sourcing (ES) - from domain-driven design and enterprise development, processing very complex data models with often smaller datasets than internet companies • Command Query Responsibility Segregation (CQRS) - isn't about events, but often combined with ES • Also - CDC Structuring data as a stream of events
  11. @helenaedelson • How data from system behavior is structured •

    Capture all changes as a sequence of events in time • Store events as an immutable event log / append-only storage • Preserves the happened-before causality of events • Replay the event log to reconstruct state within a given time window, or in full Event Sourcing
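
A minimal sketch, not from the deck, of what this looks like in plain Scala; the Account, Deposited, and Withdrawn names are illustrative. State is never updated in place; it is reconstructed by folding the immutable, append-only event log.

    sealed trait AccountEvent
    final case class Deposited(amount: BigDecimal) extends AccountEvent
    final case class Withdrawn(amount: BigDecimal) extends AccountEvent

    final case class Account(balance: BigDecimal) {
      // Apply one event to produce the next state; events themselves are never mutated
      def applyEvent(event: AccountEvent): Account = event match {
        case Deposited(a) => copy(balance = balance + a)
        case Withdrawn(a) => copy(balance = balance - a)
      }
    }

    // Replay: fold the append-only log to reconstruct current state
    val eventLog: Seq[AccountEvent] = Seq(Deposited(100), Withdrawn(30), Deposited(5))
    val current: Account = eventLog.foldLeft(Account(0))(_ applyEvent _)  // Account(75)
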
  12. @helenaedelson Requirements - forensics • Auditable - what is the

    current state and how it arrived there • Causality - observe and analyze a system's causal structure Applications For ES In Distributed Asynchronous Systems For example • Cybersecurity and Vulnerability Detection • Banking - what is the account balance and how did it arrive at that • Click stream • Accounting & Ledgers • Shopping Cart • Anything with a sequence of events that lead to X which must be preserved
  13. @helenaedelson A pattern decoupling the write path (commands) from the

    read path (queries) • Different access patterns and differing ratios of reads to writes are typical • Different schemas / data structures • Typically different teams across the org own the write side and use/own the read side • No reason to share structure, and doing so is bad practice (no monolith, loose coupling, etc.) • Command - Writers / Publishers publish without awareness of who needs to receive it or how to reach them (location, protocol...) • Query - Readers / Subscribers should be able to subscribe and asynchronously receive from topics of interest Command Query Responsibility Segregation (CQRS)
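
A minimal sketch, not from the deck, of the separation CQRS calls for; the AddItem, ItemAdded, and CartView names are illustrative. The command side only validates and emits events; the read side independently folds those events into its own query-optimized view.

    // Write path: commands are validated and turned into events
    sealed trait Command
    final case class AddItem(cartId: String, sku: String, qty: Int) extends Command

    sealed trait Event
    final case class ItemAdded(cartId: String, sku: String, qty: Int) extends Event

    def handle(cmd: Command): Seq[Event] = cmd match {
      case AddItem(cartId, sku, qty) if qty > 0 => Seq(ItemAdded(cartId, sku, qty))
      case _                                    => Seq.empty
    }

    // Read path: a separately owned projection with its own schema,
    // built by subscribing to the event stream
    final case class CartView(cartId: String, itemCount: Int)

    def project(views: Map[String, CartView], event: Event): Map[String, CartView] =
      event match {
        case ItemAdded(cartId, _, qty) =>
          val v = views.getOrElse(cartId, CartView(cartId, 0))
          views.updated(cartId, v.copy(itemCount = v.itemCount + qty))
      }
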
  14. @helenaedelson My old diagram from 3 years ago: Kafka Summit:

    Real Time Bidding (RTB) The write path and model are naturally separate and differ from the read:
  15. @helenaedelson • Ingest large amounts of data, from multiple sources,

    sometimes bursty so it can't overload the system • Write the raw data to a store so that • when algorithms change I can re-run the data stream for new meaning • when nodes or applications fail I can replay data from a checkpoint to recover • Route the event streams to my ML/Analytics streams It Doesn't Matter What We Call It or Whether It's Microservices Or A Streaming Data Pipeline • Process and aggregate inbound data and store aggregates for querying historical data against the stream • Not lose data • Be secure, probably encrypt/decrypt everything • Not pay massive cloud and data storage fees • Be sure my team can handle the infrastructure TCO
  16. @helenaedelson Akka Persistence Stateful Actors • Enables stateful actors to

    persist their state for recovery and replay from failure and error • Events persisted to storage, nothing is mutated (no read-modify-write) • Allows higher transaction rates and efficient replication • Only events received by the actor are persisted • Snapshotting for checkpoint replay • At least once message delivery semantics Event Stream As Replication Fabric
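
A minimal sketch of such a stateful, event-sourced actor using the classic PersistentActor API (Akka 2.5 era, current when this deck was given); the account names and the snapshot interval are illustrative, not from the slides.

    import akka.persistence.{PersistentActor, SnapshotOffer}

    final case class Deposit(amount: Long)    // command
    final case class Deposited(amount: Long)  // event

    class AccountActor extends PersistentActor {
      override def persistenceId: String = "account-1"  // stable address into the journal
      private var balance: Long = 0L

      // Commands: validate, persist the event, then update in-memory state
      override def receiveCommand: Receive = {
        case Deposit(amount) =>
          persist(Deposited(amount)) { evt =>
            balance += evt.amount
            if (lastSequenceNr % 1000 == 0) saveSnapshot(balance)  // checkpoint for faster replay
          }
      }

      // Recovery: replay events (and an optional snapshot) to rebuild state
      override def receiveRecover: Receive = {
        case Deposited(amount)                => balance += amount
        case SnapshotOffer(_, snapshot: Long) => balance = snapshot
      }
    }
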
  17. @helenaedelson Connect different event logs with Event-sourced processors for event

    processing pipelines or graphs • Cassandra, Redis, DynamoDB, Couchbase, MongoDB, Hazelcast, JDBC and more • Built-in: in-memory heap based journal, local file-system based snapshot-store and LevelDB based journal Storage Plugins
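
A sketch of selecting a journal and snapshot-store plugin via configuration, shown here with the built-in LevelDB journal and local file-system snapshot store mentioned above; a Cassandra or other plugin is chosen the same way, using the plugin ID it documents.

    import akka.actor.ActorSystem
    import com.typesafe.config.ConfigFactory

    // Plugin IDs for the built-in journal and snapshot store from the Akka reference config
    val config = ConfigFactory.parseString(
      """
      akka.persistence.journal.plugin = "akka.persistence.journal.leveldb"
      akka.persistence.snapshot-store.plugin = "akka.persistence.snapshot-store.local"
      """).withFallback(ConfigFactory.load())

    val system = ActorSystem("journaled", config)
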
  18. @helenaedelson • Your algorithms have changed, you need to replay

    historic data against the new logic • Rolling upgrade, restart, cluster migration • Error, e.g. after a JVM crash • Failure, e.g. cluster nodes or a DC went down, a network outage or partition • Cloud compute layer planned maintenance restarts • Application throws exception, if a persistent Actor is configured to restart by a supervisor Replay Reasons
  19. @helenaedelson Akka out of the box gives us tooling for

    each of these steps: • Failure awareness and lifecycle • Save state of failed node before failure • Load state that was in flight at time of failure (define time slice) • Replay from a checkpoint in a snapshot or run the full history • Resume operations Failure And Recovery
  20. @helenaedelson Stateful Clusters • Cluster Singleton • Distributed Data •

    Cluster Sharding • Split Brain Resolver • Distributed Lock & Kubernetes • Multi-DC • Cluster Bootstrapping & Service Discovery • Cluster Management APIs
  21. @helenaedelson • Decentralized peer-to-peer • Cluster Formation and membership service

    • Communication and Consensus • Leader and Roles • Cluster Lifecycle and Events • Failure Detector • Self-Healing • CoordinatedShutdown Akka Cluster: Quick Premise
  22. @helenaedelson Cluster User API • What roles am I in,

    what is my address • Join, Leave, Down • Programmatic membership control • Register listeners to cluster events • Startup when configurable cluster size reached • Highly tunable behavior
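
A minimal sketch of that membership API on the classic Cluster extension; the system name and seed address are illustrative.

    import akka.actor.{ActorSystem, Address}
    import akka.cluster.Cluster

    val system  = ActorSystem("demo")
    val cluster = Cluster(system)

    // What roles am I in, what is my address
    println(cluster.selfRoles)
    println(cluster.selfAddress)

    // Programmatic membership control: join a seed node, or leave / down an address
    val seed = Address("akka.tcp", "demo", "10.0.0.1", 2552)  // illustrative address
    cluster.join(seed)
    // cluster.leave(cluster.selfAddress)
    // cluster.down(seed)

    // Run code once this node's member status becomes Up
    cluster.registerOnMemberUp {
      println(s"${cluster.selfAddress} is Up")
    }
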
  23. @helenaedelson Failure Detector

    [diagram: leader node A and peer nodes S; the failure detector observes that A is reachable again]
  24. @helenaedelson • ClusterDomainEvent: base type • MemberUp: member status changed

    to Up • UnreachableMember: member considered unreachable by failure detector • MemberRemoved: member completely removed from the cluster • MemberEvent: member status change Up, Removed • Leader events • Reachability events Cluster Events
  25. @helenaedelson • CurrentClusterState: current snapshot state of the cluster, sent

    to new subscribers, unless InitialStateAsEvents specified • InitialStateAsEvents to receive messages which replay events to restore the current snapshot of the cluster state Cluster State
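
A minimal sketch of subscribing to these events from a classic actor, using InitialStateAsEvents so the current cluster state is replayed as events rather than delivered as one CurrentClusterState snapshot; the listener itself is illustrative.

    import akka.actor.{Actor, ActorLogging, Props}
    import akka.cluster.Cluster
    import akka.cluster.ClusterEvent._

    class ClusterListener extends Actor with ActorLogging {
      private val cluster = Cluster(context.system)

      override def preStart(): Unit =
        cluster.subscribe(self, initialStateMode = InitialStateAsEvents,
          classOf[MemberEvent], classOf[UnreachableMember])

      override def postStop(): Unit = cluster.unsubscribe(self)

      def receive: Receive = {
        case MemberUp(member)          => log.info("Member up: {}", member.address)
        case UnreachableMember(member) => log.warning("Unreachable: {}", member.address)
        case MemberRemoved(member, _)  => log.info("Removed: {}", member.address)
        case _: MemberEvent            => // other membership transitions
      }
    }

    object ClusterListener {
      def props: Props = Props(new ClusterListener)
    }
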
  26. @helenaedelson (leader) • Masterless • No Leader Election • Role

    of the leader: the only one who can change member status • joining to up • exiting to removed Leader decisions are local to the DC Cluster Leader
  27. @helenaedelson Cluster Membership State A CRDT which can be

    deterministically merged [diagram: membership state machine Joining -> Up -> Leaving -> Exiting -> Removed, plus Down; transitions are driven by user actions (Join, Leave, Down) and leader actions]
  28. @helenaedelson Cluster Singleton Single point of cluster-wide decisions or

    coordination [diagram: a ClusterSingletonManager runs on every node; the SingletonActor itself runs under the manager on the oldest node]
  29. @helenaedelson Cluster Singleton: On Failure

    [diagram: the oldest node is downed or cut off by a network partition; the singleton fails over to the next oldest node and the ClusterSingletonProxy keeps routing messages to the new SingletonActor]
  30. @helenaedelson Strong Consistency Always Available Guarantees one instance of a

    particular actor type per cluster Cluster Singleton doc.akka.io/docs/akka/current/scala/cluster-singleton
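
A minimal sketch of running a singleton with the classic ClusterSingletonManager and ClusterSingletonProxy; the Coordinator actor and the message sent to it are illustrative.

    import akka.actor.{Actor, ActorSystem, PoisonPill, Props}
    import akka.cluster.singleton._

    // Hypothetical cluster-wide coordinator
    class Coordinator extends Actor {
      def receive: Receive = { case msg => sender() ! s"handled: $msg" }
    }

    val system = ActorSystem("demo")

    // One manager per node; the actual singleton instance runs on the oldest node only
    system.actorOf(
      ClusterSingletonManager.props(
        singletonProps     = Props[Coordinator],
        terminationMessage = PoisonPill,
        settings           = ClusterSingletonManagerSettings(system)),
      name = "coordinator")

    // The proxy routes messages to wherever the singleton currently lives
    val proxy = system.actorOf(
      ClusterSingletonProxy.props(
        singletonManagerPath = "/user/coordinator",
        settings             = ClusterSingletonProxySettings(system)),
      name = "coordinatorProxy")

    proxy ! "rebalance"
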
  31. @helenaedelson Distributed Data, CRDTs & Eventual Consistency Partition and delay

    tolerant data availability with multi-master replication
  32. @helenaedelson An approach to eventual distributed consistency • Replicate data

    across the network • Concurrent updates from different nodes without coordination • Mathematical properties guarantee eventual consistency • Updates execute immediately, unaffected by network faults • Consistency without consensus • Highly scalable and fault tolerant Conflict-Free Replicated Data Types (CRDT) A comprehensive study of Convergent and Commutative Replicated Data Types
  33. @helenaedelson A replicated counter, which converges because the increment /

    decrement operations commute • Service Discovery • Shopping Cart • Priority on low latency and full availability • Computation in delay-tolerant networks • Data aggregation • Partition-tolerant cloud computing • Collaborative text editing Application Of CRDTs A few implementations: • Riak Data Types • SoundCloud Roshi • Akka Distributed Data
  34. @helenaedelson 1976: The maintenance of duplicate databases, Paul Johnson, Robert

    Thomas 1984: Efficient solutions to the replicated log and dictionary problems, Gene Wuu, Arthur Bernstein 1988: Scale and performance in a distributed file system, J. Howard, M. Kazar, S. Menees, D. Nichols, M. Satyanarayanan, R. Sidebotham, M. West 1988: Commutativity-based concurrency control for abstract data types, W. Weihl 1989: Concurrency control in groupware systems, C. Ellis, S. Gibbs 1994: Resolving file conflicts in the Ficus file system, P. Reiher, J. Heidemann, D. Ratner, G. Skinner, and G. Popek 1994: Detecting causal relationships in distributed computations: In search of the holy grail, R. Schwarz, F. Mattern 1997: Specification of convergent abstract data types for autonomous mobile computing, C. Baquero, F. Moura 1999: Using structural characteristics for autonomous operation, Carlos Baquero, Francisco Moura 2009: A commutative replicated data type for cooperative editing, N. Preguiça, J. Marquès, M. Shapiro, M. Leţia 2011: A comprehensive study of Convergent and Commutative Replicated Data Types, M. Shapiro, N. Preguiça, C. Baquero, M. Zawirski Not New
  35. @helenaedelson • Low latency and high availability • Data availability

    despite network partitions • Nodes concurrently update as multi-master • Async state replication across the cluster • Granular control of consistency level for reads and writes • Key-value store like API Akka Distributed Data doc.akka.io/docs/akka/current/scala/distributed-data Replicated in-memory data store using CvRDT to share data between cluster nodes
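
A minimal sketch of the key-value-store-like Replicator API (classic Distributed Data, Akka 2.5-era signatures); the GCounter key and the messages driving it are illustrative.

    import scala.concurrent.duration._
    import akka.actor.{Actor, ActorRef}
    import akka.cluster.Cluster
    import akka.cluster.ddata._
    import akka.cluster.ddata.Replicator._

    class HitCounter extends Actor {
      implicit val node: Cluster = Cluster(context.system)  // required by CRDT update ops in this API version
      private val replicator: ActorRef = DistributedData(context.system).replicator
      private val Key = GCounterKey("page-hits")

      def receive: Receive = {
        case "hit" =>
          // Update locally; the replicator gossips the change to the other nodes
          replicator ! Update(Key, GCounter.empty, WriteLocal)(_ + 1)
        case "read" =>
          replicator ! Get(Key, ReadMajority(timeout = 3.seconds))
        case g @ GetSuccess(Key, _) =>
          println(s"page hits so far: ${g.get(Key).value}")
        case _: UpdateResponse[_] | _: GetResponse[_] => // ignored in this sketch
      }
    }
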
  36. @helenaedelson Concurrent updates from different nodes resolve via the monotonic

    merge function. Counters GCounter grow-only, PNCounter (2 GCounters) increment decrement Registers Flag toggle boolean, LWWRegister - Last Write Wins register Sets GSet grow-only merge by union, ORSet observed-remove set with version vectors Maps ORMap, ORMultiMap, LWWMap, PNCounterMap Graphs DAG Composable For More Advanced Types A comprehensive study of Convergent and Commutative Replicated Data Types
  37. @helenaedelson Delta State CRDTs (δ-CRDTs) • A way to reduce

    the need for sending the full state for updates • Sending only what changed • Merging done on the receiving side • Eventually consistent by default, and supports opt-in causal consistency Delta State Replicated Data Types GCounter GSet PNCounter PNCounterMap LWWMap ORMap ORMultiMap ORSet LWWRegister
  38. @helenaedelson Granular Consistency Levels • strong consistency • highest latency

    • lowest availability Majority is N/2 + 1 (nodes_written + nodes_read) > N
  39. @helenaedelson Granular Consistency Levels • eventual consistency • low latency

    • high availability (nodes_written + nodes_read) > N
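
For example, with N = 5 replicas a majority is N/2 + 1 = 3, so a majority write plus a majority read overlap on at least one node (3 + 3 = 6 > 5). A sketch of choosing these levels per operation with the Distributed Data API:

    import scala.concurrent.duration._
    import akka.cluster.ddata.Replicator._

    // Strong end of the dial: reads and writes overlap on a majority of nodes
    val writeMajority = WriteMajority(timeout = 3.seconds)
    val readMajority  = ReadMajority(timeout = 3.seconds)

    // Eventual-consistency end of the dial: lowest latency, highest availability
    val writeLocal = WriteLocal
    val readLocal  = ReadLocal
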
  40. @helenaedelson • By default the data is only kept in

    memory and replicated to other nodes • If all nodes are stopped the data is lost • You can configure it to store on the local disk on each node (LMDB) • Or implement your own to another store via the trait • It will be loaded the next time the replicator is started Configurable Durable Storage
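
A sketch of turning on the LMDB-backed durable storage for selected keys; the setting paths follow the Distributed Data reference configuration, and the key names and directory are illustrative.

    import akka.actor.ActorSystem
    import com.typesafe.config.ConfigFactory

    // Only the listed keys (wildcards allowed) are written to local disk and
    // reloaded the next time the replicator starts on this node
    val config = ConfigFactory.parseString(
      """
      akka.cluster.distributed-data.durable.keys = ["page-hits", "settings-*"]
      akka.cluster.distributed-data.durable.lmdb.dir = "target/ddata"
      """).withFallback(ConfigFactory.load())

    val system = ActorSystem("durable-ddata", config)
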
  41. @helenaedelson • Needing high consistency over availability and low latency

    • Big Data - not currently intended for billions of entries • When a new node is added to the cluster all entries are propagated to it, hence top level entries should not exceed 100000 • Data is held in memory • If not using a delta-CRDT, when a data entry is changed the full state of that entry may be replicated to other nodes. Not Designed For
  42. @helenaedelson Cluster Sharding Scale, Resilience & Consistency • Automatically distribute

    entities of the same type over several nodes • Balance resources (memory, disk space, network traffic) across multiple nodes for scalability • Location transparency: Interact by logical ID • Increased fault tolerance - relocation on failure Life beyond Distributed Transactions [diagram: Node 1 with ShardRegion SR1 hosting shards S1, S2, S3]
  43. @helenaedelson Each Entity Is A Consistency Boundary Your Code,

    Supervised By Shards [diagram: a sender on Node 1 sends Message(gid) to the local ShardRegion SR1, which routes it to one of the shards S1, S2, S3; shards are groups of entities]
  44. @helenaedelson N-Shards Per Cluster Node • Creates entity actors

    on demand • Supervises group of entities - defined by the shard ID extraction [diagram: ShardRegions SR1, SR2, SR3 host Shard A (Entity A-1, A-2), Shard B (Entity B-1) and Shard C (Entity C-1); a ShardCoordinator (SC) oversees placement]
  45. @helenaedelson ShardRegion Per Cluster Node • Creates and supervises

    its shards • Knows how to route messages by routing key [diagram: Envelope(“c-1”) arrives at the ShardRegion on one of Nodes 1-3 and is routed to Entity C-1 in Shard C; the ShardCoordinator tracks which region owns which shard]
  46. @helenaedelson Shard Coordination • Stores Shard to Region mappings

    with Akka Persistence • Monitors all cluster node status • If the SC goes down it starts up on another node and replays the state [diagram: the ShardCoordinator, a Cluster Singleton, coordinates ShardRegions 1-3 and their Shards A, B, C]
  47. @helenaedelson Start Cluster Sharding On Node • Sending data •

    Your Entity ID extraction function • Your Shard ID extraction function • Your custom shard allocation strategy • Your Envelope type • Or use the built-in HashExtractor
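
A minimal sketch of what that start-up looks like with the classic ClusterSharding API; the Envelope, DeviceEntity, and shard count are illustrative, and the default shard allocation strategy is used.

    import akka.actor.{Actor, ActorRef, ActorSystem, Props}
    import akka.cluster.sharding.{ClusterSharding, ClusterShardingSettings, ShardRegion}

    // Your envelope type and entity actor
    final case class Envelope(entityId: String, payload: Any)

    class DeviceEntity extends Actor {
      def receive: Receive = { case payload => println(s"${self.path.name} got $payload") }
    }

    // Your entity ID extraction function
    val extractEntityId: ShardRegion.ExtractEntityId = {
      case Envelope(id, payload) => (id, payload)
    }

    // Your shard ID extraction function (a simple hash over a fixed shard count)
    val numberOfShards = 100
    val extractShardId: ShardRegion.ExtractShardId = {
      case Envelope(id, _) => (math.abs(id.hashCode) % numberOfShards).toString
    }

    val system = ActorSystem("demo")

    // Start the local ShardRegion for this entity type on the node
    val region: ActorRef = ClusterSharding(system).start(
      typeName        = "Device",
      entityProps     = Props[DeviceEntity],
      settings        = ClusterShardingSettings(system),
      extractEntityId = extractEntityId,
      extractShardId  = extractShardId)

    // Sending data: messages are routed to the right shard and entity by ID
    region ! Envelope("c-1", "temperature-reading")
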
  48. @helenaedelson Cluster Sharding: Failover Location Transparency

    [diagram: the node hosting Shard C is downed; Entity C-1 fails over to another ShardRegion, the ShardCoordinator reallocates the shard, and Envelope(“c-1”) is transparently re-routed]
  49. @helenaedelson Strong Consistency Always Available Each entity is a boundary

    of consistency Guarantees one instance per entity ID at a time per cluster doc.akka.io/docs/akka/current/scala/cluster-sharding Cluster Sharding
  50. @helenaedelson "Serverless is a new generation of platform-as-a-service offerings where

    the infrastructure provider takes responsibility for receiving client requests and responding to them, capacity planning, task scheduling, and operational monitoring. Developers need to worry only about the logic for processing client requests." - Adzic et al Serverless computing: economic and architectural impact Serverless
  51. @helenaedelson • Automated infrastructure running in a container pool •

    A classic data-shipping architecture - we move data to the code, not the other way round • Pay by execution time • Autoscales with load • Event driven • Stateless • Ephemeral (5-15 minutes) FaaS
  52. @helenaedelson • Load and event spikes needing massive parallelism •

    Scaling from 0 to 10,000s of requests and back down to zero • Simplifies delivery of scale and availability • As an integration layer between various (ephemeral and durable) data sources • Processing stateless, intensive workloads • As a data backbone moving data from A to B and transforming it • Can work well for event-driven use cases What Is FaaS Good At Currently?
  53. @helenaedelson • Functions handle only one event source • Functions

    are stateless, ephemeral, and short-lived • Computational context easily lost • Limited options for managing and coordinating distributed state • Limited options for the right consistency guarantees • Limited options for durable state, that is scalable and available • Expensive to load and store state from storage repeatedly Limitations With Serverless Distributed state is not well supported for complex distributed data workflows
  54. @helenaedelson • No direct communication which means applications must pub-sub

    all data over a storage medium • Too high latency for general purpose distributed computing problems For a discussion on this, and other limitations with FaaS read the paper, “Serverless Computing: One Step Forward, Two Steps Back” by Joe Hellerstein, et al. FaaS Does Not Have Addressability
  55. @helenaedelson Stateful Serverless We Need Better Models For

    Distributed State [diagram: a user function deployment with Message In / Message Out and State In / State Out flowing through the function]
  56. @helenaedelson KNative Serving of Stateful Functions

    [diagram: Knative stateful serving and Knative Events front Kubernetes Pods, each running a User Function (JavaScript, Go, Java,…) reached over gRPC and backed by a Distributed Datastore (Cassandra, DynamoDB, Spanner,…)]
  57. @helenaedelson Powered by Akka Cluster Sidecars

    [diagram: Knative stateful serving in front of Kubernetes Pods, each pairing a User Function (JavaScript, Go, Java,…) with an Akka Sidecar; the sidecars form an Akka Cluster backed by a Distributed Datastore (Cassandra, DynamoDB, Spanner,…)]
  58. @helenaedelson Find Out More • akka.io/docs • developer.lightbend.com - sample

    distributed workers project • github.com/akka/akka-samples - many sample projects • discuss.akka.io - forums • academy.lightbend.com • developer.lightbend.com/docs/akka-commercial-addons • lightbend.com/videos-and-webinars • lightbend.com/learn