
The trouble with data sharing

Splitting a system into discrete services with their own storage brings challenges when data owned by a remote service is needed to process a request.

The talk presents several common approaches to sharing data, and the effort needed to guarantee consistency of that data.

It then demonstrates a less common approach, which tends to require the least effort to get data replication right.

Patrik Duditš

May 29, 2019

Transcript

  1. The Trouble with Data Sharing
     [title diagram: several boxes labeled "Domain"]
     • Technical approaches
     • Consistency and Scaling
     • Domain evolution
     • Business evolution

  2. Desired features
     • Loose coupling
       • Consumer should not need to update when Producer updates, and vice versa
       • Producer should not need to update when a new Consumer is created
       • A third system (infrastructure) should not need to change on these occasions
     • (Eventual) consistency
       • Both parties should eventually arrive at the same view of the world
       • Also in the face of outages or concurrent processing
     • Resiliency
       • Failure of a consumer or producer should not block the others

  3. Technical approaches by effort for parties*
     [chart plotting the approaches by Producer effort, Consumer effort, and
     Infrastructure effort (Low / Medium / High): Remote call, Push replication,
     DB replication, Messaging, Event store, Pull replication]
     * Arbitrary subjective classification

  4. Remote call
     Implementations: HTTP RPC, Cache
     Positives
     • (Too) easy and straightforward
     • Consistent (unless cached)

  5. Remote call
     Implementations: HTTP RPC, Cache
     Positives
     • (Too) easy and straightforward
     • Consistent (unless cached)
     Negatives
     • Not self-sufficient
     • Limited query options
     • High network bandwidth
     Consumer efforts
     • Latency of the remote call
     • Needs retries and fallback (see the sketch below)
     • Know your producer
     Producer efforts
     • Endpoint per query
     • Scale with consumers

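A minimal sketch of those consumer efforts, assuming a hypothetical Account endpoint (the deck does not prescribe one): a few retried attempts with a simple backoff, and a fallback value when the producer stays unreachable.

    import java.io.IOException;
    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.time.Duration;

    public class SubscriptionClient {
        private final HttpClient http = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();

        // Fetch a user's subscription from the (hypothetical) Account service.
        public String fetchSubscription(String userId) throws InterruptedException {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("https://account.example.com/users/" + userId + "/subscription"))
                    .timeout(Duration.ofSeconds(2))
                    .GET()
                    .build();
            for (int attempt = 1; attempt <= 3; attempt++) {
                try {
                    HttpResponse<String> response =
                            http.send(request, HttpResponse.BodyHandlers.ofString());
                    if (response.statusCode() == 200) {
                        return response.body();
                    }
                } catch (IOException e) {
                    // transient network failure: fall through and retry
                }
                Thread.sleep(200L * attempt); // simple linear backoff
            }
            return "FREE"; // fallback: assume the least-privileged subscription
        }
    }
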
  6. Push replication
     Implementations: HTTP RPC / web hooks, SOAP
     Positives
     • (Too) easy and straightforward
     • Enterprise compliant

  7. Push replication
     Implementations: HTTP RPC / web hooks, SOAP
     Positives
     • (Too) easy and straightforward
     • Enterprise compliant
     Negatives
     • Not consistent
     • Lost, duplicate, and out-of-order messages
     Consumer efforts
     • Handle out-of-order messages
     • Prevent concurrent processing
     • Handle duplicate messages (see the sketch below)
     Producer efforts
     • Know your consumers
     • Handle consumer unavailability
     • Initial load for a new consumer
     • Prevent concurrent pushes

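A consumer-side sketch for these efforts, assuming each push carries a monotonically increasing per-user version (a hypothetical field; the deck does not prescribe a payload format): stale and duplicate pushes are ignored, which makes the handler idempotent, order-insensitive, and safe under concurrent processing.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class SubscriptionPushHandler {
        record SubscriptionChanged(String userId, String plan, long version) {}
        record Subscription(String plan, long version) {}

        private final Map<String, Subscription> store = new ConcurrentHashMap<>();

        // Keep only the newest version per user; merge() is atomic per key,
        // so concurrent pushes for the same user cannot interleave.
        public void onPush(SubscriptionChanged event) {
            store.merge(event.userId(),
                    new Subscription(event.plan(), event.version()),
                    (current, incoming) ->
                            incoming.version() > current.version() ? incoming : current);
        }
    }
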
  8. Database replication
     Implementations: DB read replicas, Debezium, DB triggers
     Positives
     • Transactionally consistent
     • Full query possibilities
     Negatives
     • Gives out implementation details
     • Failures can block the producer
     • Lots of infrastructure work
     Consumer efforts
     • Know and adapt to the producer's data model
     Producer efforts
     • Evolve the data model carefully
     • Or maintain appropriate views

  9. Messaging
     [sequence diagram: Account handles ChangeSubscription(userId, subscription)
     and sends userSubscriptionChanged to a Message Broker; Content Delivery
     subscribes to user.subscriptions, receives userSubscriptionChanged, and
     handles RequestStreaming(videoId, userId)]

  10. Messaging
      Implementations: JMS (ActiveMQ, OpenMQ), RabbitMQ, cloud messaging
      Positives
      • Decoupled
      • Scalable
      • Consumer-side filtering
      Negatives
      • Out-of-order / duplicate messages
      • Failures can block the producer
      • Poison pills can block the consumer
      • Retry / drop rules live in the infrastructure
      Consumer efforts
      • Handle out-of-order / duplicate messages (see the sketch below)
      • Limit consumption concurrency
      Producer efforts
      • Maintain transactional consistency (if not using XA transactions)
      • Extra replication means for new consumers

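As an illustration of the duplicate handling above, a minimal JMS listener sketch that drops redelivered messages by remembering processed JMSMessageIDs. A production consumer would persist the seen IDs (or a per-entity version) transactionally with its own data; the in-memory set here is only for illustration.

    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;
    import javax.jms.JMSException;
    import javax.jms.Message;
    import javax.jms.MessageListener;
    import javax.jms.TextMessage;

    public class UserSubscriptionChangedListener implements MessageListener {
        private final Set<String> seen = ConcurrentHashMap.newKeySet();

        @Override
        public void onMessage(Message message) {
            try {
                if (!seen.add(message.getJMSMessageID())) {
                    return; // duplicate redelivery: already processed
                }
                String payload = ((TextMessage) message).getText();
                // ... apply the change to the local replica ...
            } catch (JMSException e) {
                throw new RuntimeException(e); // let the broker redeliver
            }
        }
    }
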
  11. Event streaming
      [sequence diagram: Account handles ChangeSubscription(userId, subscription)
      and stores userSubscriptionChanged in an event storage; Content Delivery
      reads (user.subscriptions, cursor) from the event storage and handles
      RequestStreaming(videoId, userId)]

  12. Event streaming
      Implementations: Kafka, relational DB
      Positives
      • Decoupled
      • Scalable
      • Preserves history (solving the initial load)
      Negatives
      • Storage costs / compaction algorithms
      • Retention handled by the infrastructure
      • Not easy to operate
      • Evolution of event payloads is difficult
      Consumer efforts
      • Handle payload versioning (see the sketch below)
      • Limit consumption concurrency
      Producer efforts
      • Backwards-compatible event evolution

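One way to handle payload versioning on the consumer side is upcasting: lift old event shapes to the current one at the boundary, so the rest of the consumer only ever sees the latest version. A sketch with hypothetical v1/v2 payloads and an assumed schemaVersion carried alongside each event:

    public class SubscriptionEventUpcaster {
        record EventV1(String userId, String plan) {}
        record EventV2(String userId, String plan, String validUntil) {}

        // Lift any known schema version to the current (v2) shape.
        EventV2 upcast(int schemaVersion, Object payload) {
            return switch (schemaVersion) {
                case 1 -> {
                    EventV1 v1 = (EventV1) payload;
                    yield new EventV2(v1.userId(), v1.plan(), null); // default for the new field
                }
                case 2 -> (EventV2) payload;
                default -> throw new IllegalArgumentException(
                        "Unknown schema version: " + schemaVersion);
            };
        }
    }
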
  13. Pull replication
      [sequence diagram: Account handles ChangeSubscription(userId, subscription)
      and stores (sequenceNo, event) as a SubscriptionChangedEvent; Content
      Delivery calls ReadEvents(lastSequence) on a <<schedule>>, keeps an
      AccountCursor(lastSequence), and handles RequestStreaming(videoId, userId)]

  14. Pull replication
      Implementations: none (?)
      Positives
      • Simple
      • Transactionally consistent
      • Preserves sequence
      • Data retention controlled by the producer
      Negatives
      • Storage costs / event compaction
      • High latency
      Consumer efforts
      • Handle cross-shard updates
      • Limit consumption concurrency
      • Handle cursor storage (see the sketch below)
      Producer efforts
      • Handle event storage
      • Compact stores
      • Scale with the number of events / producers

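A minimal sketch of the loop behind slides 13–14, with hypothetical names: the producer stores (sequenceNo, event) rows and exposes ReadEvents; the consumer polls on a schedule, applies events in order on a single thread, and persists its cursor, ideally in the same transaction as the replica update.

    import java.util.List;

    public class PullReplicator {
        record StoredEvent(long sequenceNo, String payload) {}

        interface EventSource {                  // producer side
            List<StoredEvent> readEvents(long afterSequence, int limit);
        }

        interface CursorStore {                  // consumer side, durable
            long load();
            void save(long lastSequence);
        }

        private final EventSource source;
        private final CursorStore cursor;

        PullReplicator(EventSource source, CursorStore cursor) {
            this.source = source;
            this.cursor = cursor;
        }

        // Invoked by a scheduler (the <<schedule>> in slide 13); single-threaded.
        public void poll() {
            long last = cursor.load();
            for (StoredEvent event : source.readEvents(last, 100)) {
                apply(event);                        // update the local replica
                cursor.save(event.sequenceNo());     // ideally in the same transaction
            }
        }

        private void apply(StoredEvent event) {
            // ... project the event into the consumer's own data model ...
        }
    }
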
  15. Things needed for production level
      • Event evolution concerns
        • Up/down casting of events
        • Event stream rebuilds
      • Event storage concerns
        • Other serialization means
        • Compact stores (change events are not the only possible representation)
      • Polling improvements
        • Adaptable poll frequency (see the sketch below)
        • WebSocket notification

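For the adaptable poll frequency item, a hedged sketch: back off exponentially while the producer is quiet and poll eagerly while events keep arriving. The bounds and the pollOnce() hook are assumptions, not part of the deck.

    public class AdaptivePoller {
        interface Poller {
            int pollOnce(); // returns the number of events handled
        }

        public void run(Poller poller) throws InterruptedException {
            long delayMs = 500;                              // lower bound while busy
            while (!Thread.currentThread().isInterrupted()) {
                int handled = poller.pollOnce();
                // busy: poll again soon; quiet: double the delay, capped at 30 s
                delayMs = handled > 0 ? 500 : Math.min(delayMs * 2, 30_000);
                Thread.sleep(delayMs);
            }
        }
    }
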
  16. Conclusion
      • Think about how your replication setup can lose consistency
        • Impossible things tend to happen at 17:00 on a Friday
      • Keep your replication code paths single-threaded
        • In your entire system
      • Pull replication is underrated

  17. How to kill a data exchange meeting
      HOW SHALL WE …
      • Prevent duplicate messages
      • Guarantee update ordering
      • Guarantee transactional consistency of updates
      • Restart replication
      • Handle new kinds of data
      • Keep deployments independent
      … WHEN …
      • Multiple threads or instances are involved
      • A new domain enters the system
      • Data representation changes
      • Middleware / endpoint fails
      • An exception in processing occurs
      • There are concurrent updates to the same entity