
Toward Predictability and Stability


Data stream processing platforms and microservices platform infrastructure and strategies are converging. As we edge towards larger, more complex, and more decoupled systems, and as the global information graph keeps growing, our frontier of unsolved challenges grows just as fast. Central challenges for distributed systems include persistence strategies across DCs, zones, or regions; network partitions; data optimization; and system stability in all phases.

How does leveraging CRDTs and Event Sourcing address several core distributed systems challenges? What strategies and patterns are useful in the design, deployment, and running of stateful and stateless applications for the cloud, for example with Kubernetes? Combined with code samples, we will see how Akka Cluster, Multi-DC Persistence, the Split Brain Resolver, Sharding, and Distributed Data can help solve these problems.


Helena Edelson

May 16, 2019



Transcript

  1. @helenaedelson Helena Edelson • Principal Engineer @ Lightbend • Member

    of the Akka team • Former: Apple, Crowdstrike, VMware, SpringSource, Tuplejump • github.com/helena • twitter.com/helenaedelson • speakerdeck.com/helenaedelson Data, Analytics & ML Platform Infrastructure and Cloud Engineer Former biologist
  2. @helenaedelson When systems reach a critical level of dynamism we

    have to change our way of modeling and designing them • Stateful in a stateless world • Automation of everything - Ops, *aaS platforms • Persistence strategies across DCs, zones and regions • Data and query optimization • System availability and stability in all states of deployment and rolling restarts • Leveraging AI / ML Rethinking Strategies
  3. @helenaedelson Computational model embracing non-determinism - Actor Model of Computation,

    Carl Hewitt • Mathematical theory treating "Actors" as primitives of concurrent computation • Framework for a theoretical understanding of concurrency • Asynchronous communication • Stateful isolated processes • Non-observable state within • Decoupling in space and time The Network and Autonomous Processes
  4. @helenaedelson Principles that Akka stands on can be traced back

    to the ’70s and ’80s • Carl Hewitt invented the Actor Model, early 70s • Jim Gray and Pat Helland on the Tandem System, 80s • Joe Armstrong, Robert Virding and Mike Williams on Erlang, 1986 Look Back Before Looking Forward
  5. @helenaedelson • From the ’40s and still being heavily developed

    today across many fields of research and application in industry. • 1940s: Cellular automata (CA), originally discovered by Stanislaw Ulam and John von Neumann, Los Alamos National Laboratory • 1970s: Conway's Game of Life • Asynchronous Cellular Automaton Complex Adaptive Systems, Systems Theory, early AI
  6. @helenaedelson Can solve problems difficult or impossible for an individual

    agent or a monolithic system to solve • The foundations for artificial neural networks and NLP • Composed of multiple autonomous agents, interacting to achieve common goals • Decentralized, no central point of decision making • More fault tolerant, no single point of failure • Reach higher degrees of dependability Multi-Agent Systems (MAS)
  7. @helenaedelson @helenaedelson Complex Adaptive Systems (CAS) Self-Organization Theory Emergence Synchronization

    Amplification Distributed Networks cellular automata Feedback Loops Systems Evolution Swarming local Asynchronous Unpredictable Non-Linear Adaptive Versatile
  8. @helenaedelson • Stateful - in-memory yet durable and resilient state

    • Long-lived - lifecycle is not bound to a specific session, context available until explicitly destroyed • Virtual - location transparent and not bound to a physical location • Addressable - referenced through a stable address Akka Actors Also Happen To Be
  9. @helenaedelson Consistency vs Availability Strong Consistency Always Available Node 1

    Node 2 Partition Tolerance Conflicting goals to weigh against each other
  10. @helenaedelson • Complex Event Processing (CEP) - developed 1989-1995 to

    analyze event-driven simulations of distributed systems, abstracting causal event histories, patterns, filtering and aggregation in large, distributed, time-sensitive systems • Stream Processing - mid-1990s research in real-time event data analysis, internet companies processing large numbers of events • Event Sourcing (ES) - from domain-driven design and enterprise development, processing very complex data models with often smaller datasets than internet companies • Command Query Responsibility Segregation (CQRS) - isn't about events, but often combined with ES • Also - CDC Structuring data as a stream of events
  11. @helenaedelson • How data from system behavior is structured •

    Capture all changes as a sequence of events in time • Store events as an immutable event log / append-only storage • Preserves the happened-before causality of events • Replay the event log to reconstruct state within a given time window, or in full Event Sourcing
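
A minimal sketch, not from the deck, of what this looks like in plain Scala; the Account, Deposited, and Withdrawn names are illustrative. State is never updated in place; it is reconstructed by folding the immutable, append-only event log.

    sealed trait AccountEvent
    final case class Deposited(amount: BigDecimal) extends AccountEvent
    final case class Withdrawn(amount: BigDecimal) extends AccountEvent

    final case class Account(balance: BigDecimal) {
      // Apply one event to produce the next state; events themselves are never mutated
      def applyEvent(event: AccountEvent): Account = event match {
        case Deposited(a) => copy(balance = balance + a)
        case Withdrawn(a) => copy(balance = balance - a)
      }
    }

    // Replay: fold the append-only log to reconstruct current state
    val eventLog: Seq[AccountEvent] = Seq(Deposited(100), Withdrawn(30), Deposited(5))
    val current: Account = eventLog.foldLeft(Account(0))(_ applyEvent _)  // Account(75)
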
  12. @helenaedelson Requirements - forensics • Auditable - what is the

    current state and how it arrived there • Causality - observe and analyze a system's causal structure Applications For ES In Distributed Asynchronous Systems For example • Cybersecurity and Vulnerability Detection • Banking - what is the account balance and how did it arrive at that • Click stream • Accounting & Ledgers • Shopping Cart • Anything with a sequence of events that lead to X which must be preserved
  13. @helenaedelson A pattern decoupling the write path (commands) from the

    read path (queries) • Different access patterns and differing ratios of reads to writes are typical • Different schemas / data structures • Typically different teams across the org own the write side and use/own the read side • No reason to share structure, and doing so is bad practice (no monolith, loose coupling, etc.) • Command - Writers / Publishers publish without awareness of who needs to receive it or how to reach them (location, protocol...) • Query - Readers / Subscribers should be able to subscribe and asynchronously receive from topics of interest Command Query Responsibility Segregation (CQRS)
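
A minimal sketch, not from the deck, of the separation CQRS calls for; the AddItem, ItemAdded, and CartView names are illustrative. The command side only validates and emits events; the read side independently folds those events into its own query-optimized view.

    // Write path: commands are validated and turned into events
    sealed trait Command
    final case class AddItem(cartId: String, sku: String, qty: Int) extends Command

    sealed trait Event
    final case class ItemAdded(cartId: String, sku: String, qty: Int) extends Event

    def handle(cmd: Command): Seq[Event] = cmd match {
      case AddItem(cartId, sku, qty) if qty > 0 => Seq(ItemAdded(cartId, sku, qty))
      case _                                    => Seq.empty
    }

    // Read path: a separately owned projection with its own schema,
    // built by subscribing to the event stream
    final case class CartView(cartId: String, itemCount: Int)

    def project(views: Map[String, CartView], event: Event): Map[String, CartView] =
      event match {
        case ItemAdded(cartId, _, qty) =>
          val v = views.getOrElse(cartId, CartView(cartId, 0))
          views.updated(cartId, v.copy(itemCount = v.itemCount + qty))
      }
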
  14. @helenaedelson My old diagram from 3 years ago: Kafka Summit:

    Real Time Bidding (RTB) The write path and model are naturally separate and differ from the read:
  15. @helenaedelson • Ingest large amounts of data, from multiple sources,

    sometimes bursty so it can't overload the system • Write the raw data to a store so that • when algorithms change I can re-run the data stream for new meaning • when nodes or applications fail I can replay data from a checkpoint to recover • Route the event streams to my ML/Analytics streams It Doesn't Matter What We Call It or Whether It's Microservices Or A Streaming Data Pipeline • Process and aggregate inbound data and store aggregates for querying historical data against the stream • Not lose data • Be secure, probably encrypt/decrypt everything • Not pay massive cloud and data storage fees • Be sure my team can handle the infrastructure TCO
  16. @helenaedelson Akka Persistence Stateful Actors • Enables stateful actors to

    persist their state for recovery and replay from failure and error • Events persisted to storage, nothing is mutated (no read-modify-write) • Allows higher transaction rates and efficient replication • Only events received by the actor are persisted • Snapshotting for checkpoint replay • At least once message delivery semantics Event Stream As Replication Fabric
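
A minimal sketch of such a stateful, event-sourced actor using the classic PersistentActor API (Akka 2.5 era, current when this deck was given); the account names and the snapshot interval are illustrative, not from the slides.

    import akka.persistence.{PersistentActor, SnapshotOffer}

    final case class Deposit(amount: Long)    // command
    final case class Deposited(amount: Long)  // event

    class AccountActor extends PersistentActor {
      override def persistenceId: String = "account-1"  // stable address into the journal
      private var balance: Long = 0L

      // Commands: validate, persist the event, then update in-memory state
      override def receiveCommand: Receive = {
        case Deposit(amount) =>
          persist(Deposited(amount)) { evt =>
            balance += evt.amount
            if (lastSequenceNr % 1000 == 0) saveSnapshot(balance)  // checkpoint for faster replay
          }
      }

      // Recovery: replay events (and an optional snapshot) to rebuild state
      override def receiveRecover: Receive = {
        case Deposited(amount)                => balance += amount
        case SnapshotOffer(_, snapshot: Long) => balance = snapshot
      }
    }
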
  17. @helenaedelson Connect different event logs with Event-sourced processors for event

    processing pipelines or graphs • Cassandra, Redis, DynamoDB, Couchbase, MongoDB, Hazelcast, JDBC and more • Built-in: in-memory heap based journal, local file-system based snapshot-store and LevelDB based journal Storage Plugins
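
A sketch of selecting a journal and snapshot-store plugin via configuration, shown here with the built-in LevelDB journal and local file-system snapshot store mentioned above; a Cassandra or other plugin is chosen the same way, using the plugin ID it documents.

    import akka.actor.ActorSystem
    import com.typesafe.config.ConfigFactory

    // Plugin IDs for the built-in journal and snapshot store from the Akka reference config
    val config = ConfigFactory.parseString(
      """
      akka.persistence.journal.plugin = "akka.persistence.journal.leveldb"
      akka.persistence.snapshot-store.plugin = "akka.persistence.snapshot-store.local"
      """).withFallback(ConfigFactory.load())

    val system = ActorSystem("journaled", config)
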
  18. @helenaedelson • Your algorithms have changed, you need to replay

    historic data against the new logic • Rolling upgrade, restart, cluster migration • Error, e.g. after a JVM crash • Failure, e.g. cluster nodes or a DC went down, a network outage or partition • Cloud compute layer planned maintenance restarts • Application throws exception, if a persistent Actor is configured to restart by a supervisor Replay Reasons
  19. @helenaedelson Akka out of the box gives us tooling for

    each of these steps: • Failure awareness and lifecycle • Save state of failed node before failure • Load state that was in flight at time of failure (define time slice) • Replay from a checkpoint in a snapshot or run the full history • Resume operations Failure And Recovery
  20. @helenaedelson Stateful Clusters • Cluster Singleton • Distributed Data •

    Cluster Sharding • Split Brain Resolver • Distributed Lock & Kubernetes • Multi-DC • Cluster Bootstrapping & Service Discovery • Cluster Management APIs
  21. @helenaedelson • Decentralized peer-to-peer • Cluster Formation and membership service

    • Communication and Consensus • Leader and Roles • Cluster Lifecycle and Events • Failure Detector • Self-Healing • CoordinatedShutdown Akka Cluster: Quick Premise
  22. @helenaedelson Cluster User API • What roles am I in,

    what is my address • Join, Leave, Down • Programmatic membership control • Register listeners to cluster events • Startup when configurable cluster size reached • Highly tunable behavior
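
A minimal sketch of that membership API on the classic Cluster extension; the system name and seed address are illustrative.

    import akka.actor.{ActorSystem, Address}
    import akka.cluster.Cluster

    val system  = ActorSystem("demo")
    val cluster = Cluster(system)

    // What roles am I in, what is my address
    println(cluster.selfRoles)
    println(cluster.selfAddress)

    // Programmatic membership control: join a seed node, or leave / down an address
    val seed = Address("akka.tcp", "demo", "10.0.0.1", 2552)  // illustrative address
    cluster.join(seed)
    // cluster.leave(cluster.selfAddress)
    // cluster.down(seed)

    // Run code once this node's member status becomes Up
    cluster.registerOnMemberUp {
      println(s"${cluster.selfAddress} is Up")
    }
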
  23. @helenaedelson Failure Detector

    [diagram: leader node A and peer nodes S; the failure detector observes that A is reachable again]
  24. @helenaedelson • ClusterDomainEvent: base type • MemberUp: member status changed

    to Up • UnreachableMember: member considered unreachable by failure detector • MemberRemoved: member completely removed from the cluster • MemberEvent: member status change Up, Removed • Leader events • Reachability events Cluster Events
  25. @helenaedelson • CurrentClusterState: current snapshot state of the cluster, sent

    to new subscribers, unless InitialStateAsEvents specified • InitialStateAsEvents to receive messages which replay events to restore the current snapshot of the cluster state Cluster State
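
A minimal sketch of subscribing to these events from a classic actor, using InitialStateAsEvents so the current cluster state is replayed as events rather than delivered as one CurrentClusterState snapshot; the listener itself is illustrative.

    import akka.actor.{Actor, ActorLogging, Props}
    import akka.cluster.Cluster
    import akka.cluster.ClusterEvent._

    class ClusterListener extends Actor with ActorLogging {
      private val cluster = Cluster(context.system)

      override def preStart(): Unit =
        cluster.subscribe(self, initialStateMode = InitialStateAsEvents,
          classOf[MemberEvent], classOf[UnreachableMember])

      override def postStop(): Unit = cluster.unsubscribe(self)

      def receive: Receive = {
        case MemberUp(member)          => log.info("Member up: {}", member.address)
        case UnreachableMember(member) => log.warning("Unreachable: {}", member.address)
        case MemberRemoved(member, _)  => log.info("Removed: {}", member.address)
        case _: MemberEvent            => // other membership transitions
      }
    }

    object ClusterListener {
      def props: Props = Props(new ClusterListener)
    }
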
  26. @helenaedelson (leader) • Masterless • No Leader Election • Role

    of the leader: the only one who can change member status • joining to up • exiting to removed Leader decisions are local to the DC Cluster Leader
  27. @helenaedelson Cluster Membership State A CRDT which can be

    deterministically merged [diagram: membership state machine Joining -> Up -> Leaving -> Exiting -> Removed, plus Down; transitions are driven by user actions (Join, Leave, Down) and leader actions]
  28. @helenaedelson Cluster Singleton Single point of cluster-wide decisions or

    coordination [diagram: a ClusterSingletonManager runs on every node; the SingletonActor itself runs under the manager on the oldest node]
  29. @helenaedelson Cluster Singleton: On Failure

    [diagram: the oldest node is downed or cut off by a network partition; the singleton fails over to the next oldest node and the ClusterSingletonProxy keeps routing messages to the new SingletonActor]
  30. @helenaedelson Strong Consistency Always Available Guarantees one instance of a

    particular actor type per cluster Cluster Singleton doc.akka.io/docs/akka/current/scala/cluster-singleton
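
A minimal sketch of running a singleton with the classic ClusterSingletonManager and ClusterSingletonProxy; the Coordinator actor and the message sent to it are illustrative.

    import akka.actor.{Actor, ActorSystem, PoisonPill, Props}
    import akka.cluster.singleton._

    // Hypothetical cluster-wide coordinator
    class Coordinator extends Actor {
      def receive: Receive = { case msg => sender() ! s"handled: $msg" }
    }

    val system = ActorSystem("demo")

    // One manager per node; the actual singleton instance runs on the oldest node only
    system.actorOf(
      ClusterSingletonManager.props(
        singletonProps     = Props[Coordinator],
        terminationMessage = PoisonPill,
        settings           = ClusterSingletonManagerSettings(system)),
      name = "coordinator")

    // The proxy routes messages to wherever the singleton currently lives
    val proxy = system.actorOf(
      ClusterSingletonProxy.props(
        singletonManagerPath = "/user/coordinator",
        settings             = ClusterSingletonProxySettings(system)),
      name = "coordinatorProxy")

    proxy ! "rebalance"
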
  31. @helenaedelson Distributed Data, CRDTs & Eventual Consistency Partition and delay

    tolerant data availability with multi-master replication
  32. @helenaedelson An approach to eventual distributed consistency • Replicate data

    across the network • Concurrent updates from different nodes without coordination • Mathematical properties guarantee eventual consistency • Updates execute immediately, unaffected by network faults • Consistency without consensus • Highly scalable and fault tolerant Conflict-Free Replicated Data Types (CRDT) A comprehensive study of Convergent and Commutative Replicated Data Types
  33. @helenaedelson A replicated counter, which converges because the increment /

    decrement operations commute • Service Discovery • Shopping Cart • Priority on low latency and full availability • Computation in delay-tolerant networks • Data aggregation • Partition-tolerant cloud computing • Collaborative text editing Application Of CRDTs A few implementations: • Riak Data Types • SoundCloud Roshi • Akka Distributed Data
  34. @helenaedelson 1976: The maintenance of duplicate databases, Paul Johnson, Robert

    Thomas 1984: Efficient solutions to the replicated log and dictionary problems, Gene Wuu, Arthur Bernstein 1988: Scale and performance in a distributed file system, J. Howard, M. Kazar, S. Menees, D. Nichols, M. Satyanarayanan, R. Sidebotham, M. West 1988: Commutativity-based concurrency control for abstract data types, W. Weihl 1989: Concurrency control in groupware systems, C. Ellis, S. Gibbs 1994: Resolving file conflicts in the Ficus file system, P. Reiher, J. Heidemann, D. Ratner, G. Skinner, and G. Popek 1994: Detecting causal relationships in distributed computations: In search of the holy grail, R. Schwarz, F. Mattern 1997: Specification of convergent abstract data types for autonomous mobile computing, C. Baquero, F. Moura 1999: Using structural characteristics for autonomous operation, Carlos Baquero, Francisco Moura 2009: A commutative replicated data type for cooperative editing, N. Preguiça, J. Marquès, M. Shapiro, M. Leţia 2011: A comprehensive study of Convergent and Commutative Replicated Data Types, M. Shapiro, N. Preguiça, C. Baquero, M. Zawirski Not New
  35. @helenaedelson • Low latency and high availability • Data availability

    despite network partitions • Nodes concurrently update as multi-master • Async state replication across the cluster • Granular control of consistency level for reads and writes • Key-value store like API Akka Distributed Data doc.akka.io/docs/akka/current/scala/distributed-data Replicated in-memory data store using CvRDT to share data between cluster nodes
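
A minimal sketch of the key-value-store-like Replicator API (classic Distributed Data, Akka 2.5-era signatures); the GCounter key and the messages driving it are illustrative.

    import scala.concurrent.duration._
    import akka.actor.{Actor, ActorRef}
    import akka.cluster.Cluster
    import akka.cluster.ddata._
    import akka.cluster.ddata.Replicator._

    class HitCounter extends Actor {
      implicit val node: Cluster = Cluster(context.system)  // required by CRDT update ops in this API version
      private val replicator: ActorRef = DistributedData(context.system).replicator
      private val Key = GCounterKey("page-hits")

      def receive: Receive = {
        case "hit" =>
          // Update locally; the replicator gossips the change to the other nodes
          replicator ! Update(Key, GCounter.empty, WriteLocal)(_ + 1)
        case "read" =>
          replicator ! Get(Key, ReadMajority(timeout = 3.seconds))
        case g @ GetSuccess(Key, _) =>
          println(s"page hits so far: ${g.get(Key).value}")
        case _: UpdateResponse[_] | _: GetResponse[_] => // ignored in this sketch
      }
    }
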
  36. @helenaedelson Concurrent updates from different nodes resolve via the monotonic

    merge function. Counters GCounter grow-only, PNCounter (2 GCounters) increment decrement Registers Flag toggle boolean, LWWRegister - Last Write Wins register Sets GSet grow-only merge by union, ORSet observed-remove set with version vectors Maps ORMap, ORMultiMap, LWWMap, PNCounterMap Graphs DAG Composable For More Advanced Types A comprehensive study of Convergent and Commutative Replicated Data Types
  37. @helenaedelson Delta State CRDTs (δ-CRDTs) • A way to reduce

    the need for sending the full state for updates • Sending only what changed • Merging done on the receiving side • Eventually consistent by default, and supports opt-in causal consistency Delta State Replicated Data Types GCounter GSet PNCounter PNCounterMap LWWMap ORMap ORMultiMap ORSet LWWRegister
  38. @helenaedelson Granular Consistency Levels • strong consistency • highest latency

    • lowest availability Majority is N/2 + 1 (nodes_written + nodes_read) > N
  39. @helenaedelson Granular Consistency Levels • eventual consistency • low latency

    • high availability (nodes_written + nodes_read) > N
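
For example, with N = 5 replicas a majority is N/2 + 1 = 3, so a majority write plus a majority read overlap on at least one node (3 + 3 = 6 > 5). A sketch of choosing these levels per operation with the Distributed Data API:

    import scala.concurrent.duration._
    import akka.cluster.ddata.Replicator._

    // Strong end of the dial: reads and writes overlap on a majority of nodes
    val writeMajority = WriteMajority(timeout = 3.seconds)
    val readMajority  = ReadMajority(timeout = 3.seconds)

    // Eventual-consistency end of the dial: lowest latency, highest availability
    val writeLocal = WriteLocal
    val readLocal  = ReadLocal
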
  40. @helenaedelson • By default the data is only kept in

    memory and replicated to other nodes • If all nodes are stopped the data is lost • You can configure it to store on the local disk on each node (LMDB) • Or implement your own to another store via the trait • It will be loaded the next time the replicator is started Configurable Durable Storage
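
A sketch of turning on the LMDB-backed durable storage for selected keys; the setting paths follow the Distributed Data reference configuration, and the key names and directory are illustrative.

    import akka.actor.ActorSystem
    import com.typesafe.config.ConfigFactory

    // Only the listed keys (wildcards allowed) are written to local disk and
    // reloaded the next time the replicator starts on this node
    val config = ConfigFactory.parseString(
      """
      akka.cluster.distributed-data.durable.keys = ["page-hits", "settings-*"]
      akka.cluster.distributed-data.durable.lmdb.dir = "target/ddata"
      """).withFallback(ConfigFactory.load())

    val system = ActorSystem("durable-ddata", config)
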
  41. @helenaedelson • Needing high consistency over availability and low latency

    • Big Data - not currently intended for billions of entries • When a new node is added to the cluster all entries are propagated to it, hence top level entries should not exceed 100000 • Data is held in memory • If not using a delta-CRDT, when a data entry is changed the full state of that entry may be replicated to other nodes. Not Designed For
  42. @helenaedelson Cluster Sharding Scale, Resilience & Consistency • Automatically distribute

    entities of the same type over several nodes • Balance resources (memory, disk space, network traffic) across multiple nodes for scalability • Location transparency: Interact by logical ID • Increased fault tolerance - relocation on failure Life beyond Distributed Transactions [diagram: Node 1 with ShardRegion SR1 hosting shards S1, S2, S3]
  43. @helenaedelson Each Entity Is A Consistency Boundary Your Code,

    Supervised By Shards [diagram: a sender on Node 1 sends Message(gid) to the local ShardRegion SR1, which routes it to one of the shards S1, S2, S3; shards are groups of entities]
  44. @helenaedelson N-Shards Per Cluster Node • Creates entity actors

    on demand • Supervises group of entities - defined by the shard ID extraction [diagram: ShardRegions SR1, SR2, SR3 host Shard A (Entity A-1, A-2), Shard B (Entity B-1) and Shard C (Entity C-1); a ShardCoordinator (SC) oversees placement]
  45. @helenaedelson ShardRegion Per Cluster Node • Creates and supervises

    its shards • Knows how to route messages by routing key [diagram: Envelope(“c-1”) arrives at the ShardRegion on one of Nodes 1-3 and is routed to Entity C-1 in Shard C; the ShardCoordinator tracks which region owns which shard]
  46. @helenaedelson Shard Coordination • Stores Shard to Region mappings

    with Akka Persistence • Monitors all cluster node status • If the SC goes down it starts up on another node and replays the state [diagram: the ShardCoordinator, a Cluster Singleton, coordinates ShardRegions 1-3 and their Shards A, B, C]
  47. @helenaedelson Start Cluster Sharding On Node • Sending data •

    Your Entity ID extraction function • Your Shard ID extraction function • Your custom shard allocation strategy • Your Envelope type • Or use the built-in HashExtractor
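
A minimal sketch of what that start-up looks like with the classic ClusterSharding API; the Envelope, DeviceEntity, and shard count are illustrative, and the default shard allocation strategy is used.

    import akka.actor.{Actor, ActorRef, ActorSystem, Props}
    import akka.cluster.sharding.{ClusterSharding, ClusterShardingSettings, ShardRegion}

    // Your envelope type and entity actor
    final case class Envelope(entityId: String, payload: Any)

    class DeviceEntity extends Actor {
      def receive: Receive = { case payload => println(s"${self.path.name} got $payload") }
    }

    // Your entity ID extraction function
    val extractEntityId: ShardRegion.ExtractEntityId = {
      case Envelope(id, payload) => (id, payload)
    }

    // Your shard ID extraction function (a simple hash over a fixed shard count)
    val numberOfShards = 100
    val extractShardId: ShardRegion.ExtractShardId = {
      case Envelope(id, _) => (math.abs(id.hashCode) % numberOfShards).toString
    }

    val system = ActorSystem("demo")

    // Start the local ShardRegion for this entity type on the node
    val region: ActorRef = ClusterSharding(system).start(
      typeName        = "Device",
      entityProps     = Props[DeviceEntity],
      settings        = ClusterShardingSettings(system),
      extractEntityId = extractEntityId,
      extractShardId  = extractShardId)

    // Sending data: messages are routed to the right shard and entity by ID
    region ! Envelope("c-1", "temperature-reading")
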
  48. @helenaedelson Cluster Sharding: Failover Location Transparency

    [diagram: the node hosting Shard C is downed; Entity C-1 fails over to another ShardRegion, the ShardCoordinator reallocates the shard, and Envelope(“c-1”) is transparently re-routed]
  49. @helenaedelson Strong Consistency Always Available Each entity is a boundary

    of consistency Guarantees one instance per entity ID at a time per cluster doc.akka.io/docs/akka/current/scala/cluster-sharding Cluster Sharding
  50. @helenaedelson "Serverless is a new generation of platform-as-a-service offerings where

    the infrastructure provider takes responsibility for receiving client requests and responding to them, capacity planning, task scheduling, and operational monitoring. Developers need to worry only about the logic for processing client requests." - Adzic et al Serverless computing: economic and architectural impact Serverless
  51. @helenaedelson • Automated infrastructure running in a container pool •

    A classic data-shipping architecture - we move data to the code, not the other way round • Pay by execution time • Autoscales with load • Event driven • Stateless • Ephemeral (5-15 minutes) FaaS
  52. @helenaedelson • Load and event spikes needing massive parallelism •

    Scaling from 0 to 10,000s of requests and back down to zero • Simplifies delivery of scale and availability • As an integration layer between various (ephemeral and durable) data sources • Processing stateless, intensive workloads • As a data backbone moving data from A to B and transforming it • Can work well for event-driven use cases What Is FaaS Good At Currently?
  53. @helenaedelson • Functions handle only one event source • Functions

    are stateless, ephemeral, and short-lived • Computational context easily lost • Limited options for managing and coordinating distributed state • Limited options for the right consistency guarantees • Limited options for durable state, that is scalable and available • Expensive to load and store state from storage repeatedly Limitations With Serverless Distributed state is not well supported for complex distributed data workflows
  54. @helenaedelson • No direct communication which means applications must pub-sub

    all data over a storage medium • Too high latency for general purpose distributed computing problems For a discussion on this, and other limitations with FaaS read the paper, “Serverless Computing: One Step Forward, Two Steps Back” by Joe Hellerstein, et al. FaaS Does Not Have Addressability
  55. @helenaedelson Stateful Serverless We Need Better Models For

    Distributed State [diagram: a user function deployment with Message In / Message Out and State In / State Out flowing through the function]
  56. @helenaedelson KNative Serving of Stateful Functions

    [diagram: Knative stateful serving and Knative Events front Kubernetes Pods, each running a User Function (JavaScript, Go, Java,…) reached over gRPC and backed by a Distributed Datastore (Cassandra, DynamoDB, Spanner,…)]
  57. @helenaedelson Powered by Akka Cluster Sidecars

    [diagram: Knative stateful serving in front of Kubernetes Pods, each pairing a User Function (JavaScript, Go, Java,…) with an Akka Sidecar; the sidecars form an Akka Cluster backed by a Distributed Datastore (Cassandra, DynamoDB, Spanner,…)]
  58. @helenaedelson Find Out More • akka.io/docs • developer.lightbend.com - sample

    distributed workers project • github.com/akka/akka-samples - many sample projects • discuss.akka.io - forums • academy.lightbend.com • developer.lightbend.com/docs/akka-commercial-addons • lightbend.com/videos-and-webinars • lightbend.com/learn