Alternatives to XA 2PC Transactions

Alternatives to XA Two-Phase Commit Transactions
Paul Done | Executive Solutions Architect | MongoDB Inc. | @TheDonester

Agenda
• Global Transactions Spanning Multiple Heterogeneous Systems
• The Problem With XA
• Alternative Patterns For Updating Heterogeneous Systems
• Comparing Approaches

Global Transactions Spanning Multiple Heterogeneous Systems

Database ACID Transactions
A single unit of logic composed of multiple different database operations, which exhibits the following properties:
• Atomic: either completes in its entirety or has no effect whatsoever (rolls back and is not left only partially complete)
• Consistent: each transaction observes the latest current database state in the correct write ordering
• Isolated: the state of an inflight transaction is not visible to other concurrent inflight transactions (and vice versa)
• Durable: changes are persisted and cannot be lost if there is a system failure
Complete consistency and complete isolation are effectively mutually exclusive for real world databases and workloads - the reasons why are described by me in a different presentation

Database ACID Transactions - Local or Global?
ACID Transactions are commonly provided by all mainstream Relational DBs + just a very few NoSQL DBs (notably MongoDB)
• Where all of a single transaction’s operations are performed against the same database instance - known as ‘Local Transactions’
• Not exclusive to just database technologies - many implementations of other types of stateful resources, like message brokers, also support ACID transactions
But things become much more challenging if a single transaction is composed of operations which update two or more different databases and/or message brokers - known as ‘Global Transactions’

Two-Phase Commit (2PC)
Pattern for enabling an application to perform a distributed global transaction across 2 or more [often heterogeneous] stateful resources
• Resources are usually databases or message brokers (hosting queues or topics)
• Controlled by a Transaction Manager (a.k.a. Transaction Monitor)
  ◦ This might be a responsibility embedded within one of the participant resources or may be a separate standalone system
When an application calls commit(), two key phases are then instigated by the Transaction Manager:
1. PREPARE phase: Each resource votes Yes or No to indicate if it can fulfil the transaction (ensuring all related intermediary state is first persisted, in case it fails in the meantime). If all vote Yes, the Transaction Manager persists the decision to a durable log, in case of system failure. If any resource votes No, every resource is instructed to roll back instead.
2. COMMIT phase: Each resource is then instructed to persist its changes & can’t renege on this
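The two phases can be sketched in a few lines of Python. This is a toy, in-memory illustration only: the `Resource` and `TransactionManager` classes, the `can_commit` flag and the `log` list are hypothetical names invented for this sketch, and nothing here is durable or networked as it would be in a real implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical participant (e.g. a database or a message broker)."""
    name: str
    can_commit: bool = True   # how this resource will vote in the PREPARE phase
    state: str = "active"

    def prepare(self) -> bool:
        # A real resource would durably persist its pending changes here.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "rolled_back"


@dataclass
class TransactionManager:
    """Hypothetical coordinator; `log` stands in for its durable decision log."""
    log: list = field(default_factory=list)

    def commit(self, resources: list) -> bool:
        # Phase 1 (PREPARE): collect a Yes/No vote from every resource.
        votes = [r.prepare() for r in resources]
        decision = all(votes)
        # Record the decision before acting on it, in case of failure.
        self.log.append("commit" if decision else "rollback")
        # Phase 2: apply the same decision to every resource.
        for r in resources:
            if decision:
                r.commit()
            else:
                r.rollback()
        return decision


tm = TransactionManager()
db, queue = Resource("db"), Resource("queue")
all_committed = tm.commit([db, queue])
```

Setting `can_commit=False` on any one resource makes the whole transaction roll back, which is the "all resources rolled back" case.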

Two-Phase Commit (2PC) Examples
There are various different implementations of the 2PC pattern out there, with examples including:
• IBM CICS
  ◦ “Customer Information Control System” - middleware for managing online transaction processing on a Mainframe, where a CICS transaction may span multiple datasources (e.g. IMS, DB2, MQ Series, VSAM) using the SNA “LU6.2” systems communications protocol
• Oracle DBLINKs
  ◦ A multi-statement transaction initiated and managed via one Oracle DB instance, with some of the updates applied there, but the rest of the updates applied in a separate Oracle DB instance, which is linked into by the first DB, under the covers
  ◦ To the client application, it just appears that it is updating a single database within the transaction and it just uses the normal database driver & API
• The X/Open XA standard
  ◦ a.k.a. XA/2PC or XA
  ◦ Described in later slides...

Two-Phase Commit Flow
[Diagrams: “All resources committed” example and “All resources rolled back” example - each resource is a different database instance or message broker]
Diagrams copied from: https://fizalihsan.github.io/technology/transaction.html

XA - a standard for 2PC
A specification released in 1991 by the X/Open consortium (which later merged with the Open Software Foundation to form The Open Group)
Defines a protocol for coordinating transactions between heterogeneous technologies
• Defines global transactions which it calls XA Transactions
• Defines a Transaction Manager / Monitor which it calls the XA Coordinator
• Defines the one or more stateful Resource Managers (e.g. a DB) which it calls XA Resources
Example XA Transaction Manager implementations:
• Tuxedo, CORBA servers, Java Enterprise Edition application servers (e.g. WebLogic, WebSphere, JBoss)
Example XA Transaction Resource implementations:
• Databases: Oracle, IBM DB2, MS SQL Server, Postgres, MySQL (but NOT MongoDB or other NoSQL)
• Message Brokers: IBM MQSeries, ActiveMQ, Java EE app svrs (but NOT Kafka or RabbitMQ)
Resource vendors need to implement XA in their own Drivers which they provide to users
• Example: Oracle provides a version of its JDBC driver which supports the XA API for Java applications to use and a version of its ODBC driver which supports the XA API for C/C++ applications to use

XA & Java Enterprise Edition (Java EE*)
XA is programming language agnostic, but is most commonly seen in combination with Java
* Formerly called J2EE
If you used to build Java EE apps running on Java application servers, you probably came across XA
• Java Transaction API (JTA) was the way your Java code said to start, commit or abort a global XA transaction
• You might have integrated with different databases via their provided XA JDBC Drivers
• You might have configured message brokers via their provided XA Java Message Service (JMS) Drivers
[Diagram: Java EE Application Server]
Diagram copied from: https://fizalihsan.github.io/technology/transaction.html

The Problem With XA

XA - The Challenges
1. Eventual Consistency
2. Reduced High Availability
3. Poor Performance
4. Operational Complexity
5. Interoperability Issues

XA is only Eventually Consistent (1 of 5)
Real world example: An enterprise Java app running on a WebLogic application server, placing a message on an IBM MQSeries queue and inserting a record in an Oracle database within a single distributed transaction
[Diagram: XA Transaction #1 - WebLogic (XA Transaction Manager) performs 1. ENQUEUE Msg on the MQSeries Message Queue (XA Resource Manager) and 2. INSERT Record into the Oracle DB (XA Resource Manager)]

Real world example: An enterprise Java application running on a WebLogic application server, receiving a message off an IBM MQSeries queue and then reading a record from an Oracle database within a single distributed transaction
[Diagram: XA Transaction #2 - WebLogic (XA Transaction Manager) performs 1. DEQUEUE Msg from the MQSeries Message Queue (XA Resource Manager) and 2. READ Record from the Oracle DB (XA Resource Manager)]

Real world example: 2 consecutive transactions - the second transaction receives the message that the first transaction put on the queue and reads the record that the first transaction put into the database
[Diagram: XA Transaction #1 (1. ENQUEUE Msg, 2. INSERT Record) followed by XA Transaction #2 (1. DEQUEUE Msg, 2. READ Record), each run by WebLogic (XA Transaction Manager) against the MQSeries Message Queue and the Oracle DB (both XA Resource Managers)]
SO EVERYTHING WORKS FINE, RIGHT?

Real world example: 2 consecutive transactions - the second transaction receives the message from the queue and attempts to read the record from the database, but sometimes this record is missing
[Diagram: XA Transaction #1 (1. ENQUEUE Msg, 2. INSERT Record) followed by XA Transaction #2 (1. DEQUEUE Msg, 2. READ Record FAILURE), each run by WebLogic (XA Transaction Manager) against the MQSeries Message Queue and the Oracle DB (both XA Resource Managers)]
SO EVERYTHING WORKS FINE, RIGHT? WRONG! XA IS EVENTUALLY CONSISTENT! SOMETIMES THE RECORD DOES NOT EXIST YET

Real world example - so what happened? Surely this should always work?
1. The 1st transaction puts the data in the DB as part of the same transaction that puts the message on the queue
2. Only when the message has been successfully committed to the queue can the 2nd transaction be kicked off
3. Yet the subsequent listener code can’t always find the data in the DB that was inserted by the previous transaction
So why is the database record inserted by the 1st transaction sometimes missing for the 2nd transaction?
• The final commit action performed against each of the 2 resources is initiated in parallel
• This is asynchronous because a resource may have temporarily gone down - the system keeps retrying
• The 2 resources each take non-deterministic durations to make their change durable (time to persist to disk is variable)
• There's no way of guaranteeing that they both achieve this in exactly the same instant - THEY NEVER WILL
• So the solution you’ve just seen is subject to RACE CONDITION* failures!
* Oracle even acknowledges this in the section “Avoiding the XA Race Condition” of its documentation at: https://www.oracle.com/technetwork/products/clustering/overview/distributed-transactions-and-xa-163941.pdf
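The race can be reproduced deterministically with a toy model. The sketch below is an assumption-laden illustration, not real XA code: the `Resource` class and the `visible` / `prepared` lists are invented for this example, and the ordering in which the two commits land is forced by hand to show the unlucky interleaving.

```python
class Resource:
    """Hypothetical resource: prepared changes stay invisible until commit."""

    def __init__(self):
        self.visible = []    # what other transactions can read
        self.prepared = []   # changes voted Yes on, but not yet applied

    def prepare(self, item):
        self.prepared.append(item)

    def commit(self):
        self.visible.extend(self.prepared)
        self.prepared.clear()


queue, db = Resource(), Resource()

# XA Transaction #1: both resources have voted Yes in the PREPARE phase.
queue.prepare("msg-1")
db.prepare("record-1")

# COMMIT phase: the queue happens to finish applying its commit first.
queue.commit()

# XA Transaction #2 is triggered as soon as the message is visible.
message = queue.visible.pop(0)
record_found = "record-1" in db.visible   # the DB commit has not landed yet

# The database's commit completes a moment later.
db.commit()
record_found_later = "record-1" in db.visible
```

Here `record_found` is `False` while `record_found_later` is `True`: the record does arrive, just not necessarily before the listener looks for it.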

XA suffers from reduced High Availability
The Transaction Manager is typically a single point of failure
• If it goes down, it will have to be recovered (which may need to be on a different host, if the host has irrevocably failed)
• Whilst down, the databases will invariably have pending transactions, stuck and holding locks, thus preventing other DB operations from proceeding if attempting to access the same locked data
Risk of deadlocks grinding the overall system to a halt
• XA doesn’t allow the system as a whole to detect deadlocks and automatically & safely back them out
• When deadlock occurs, some of the data will be inaccessible indefinitely due to the held record locks
• Has a cascading effect on other transactions trying to take locks on the same records, which will in turn back up (even happens for ‘livelocks’ where one resource only is temporarily down/blocked - cascading upwards)

XA leads to poor Performance
Chatty protocol between transaction coordinator and participants
• More network hops to negotiate preparedness, adding latency
• More writes to disk, adding latency (transaction manager commit log, each resource’s pending transactions log)
Requires resources (e.g. DBs) to adopt a pessimistic locking approach
• Database locks held for longer, causing backup of work and decreasing throughput
• Messages held pending longer on queues before able to be delivered
• Each component moves in rigid lock step with every other component - the system is only as fast as the slowest component

XA results in increased Operational Complexity
More moving parts
• More technologies to learn, install, configure, monitor, patch & build disaster recovery procedures for
Harder root cause analysis
• Challenging for a single expert to hunt down the ‘rogue’ transaction issue without help from more domain experts
• Almost impossible to fix if heuristics exceptions occur
More frequent emergencies to resolve
• Restore a failed Transaction Manager and its commit log
Consistent backups are hard / impossible
• To take a consistent snapshot across multiple data & log stores, requires periodically taking the whole system down
• Also will face data loss between those snapshots

XA requirement for Interoperability has compromises
Interoperability matrix hell
• Very hard to determine a line of XA compatibility through a set of technologies & their different versions
• Hard to achieve zero compatibility issues due to complexity & ambiguity in the specification and/or implementation bugs
• Technology sprawl must be in lock-step: transaction manager, databases, message brokers, XA drivers
A Microservices architecture becomes almost impossible to achieve
• When transaction boundaries cross domain boundaries
• Example: In eCommerce, how do you take a new customer order for a product using the Orders Microservice and then decrement the stock quantity using the Inventory Microservice, as one atomic operation?

Alternative Patterns For Updating Heterogeneous Systems

Sender Resubmissions + Receiver Dups-Detection: Via DBs
Updating two resources as a “Single Unit” to achieve Exactly-Once delivery
[Diagram]
• Sender Code (pool of threads, each running continuously and taking new jobs when available - naturally performs resubmissions)
• Sender’s Local DB: Business Data + OPTIONAL Jobs Metadata
• Receiver Code (performs duplicates detection)
• Receiver’s Local DB: Business Data + OPTIONAL Receipts Metadata
• Sender calls Receiver via RPC (e.g. REST, GraphQL) - if the request or response is lost in transit, that’s fine
Characteristics:
• Exactly-once delivery of messages (atomic)
• Eventually consistent
• Doesn’t require either DB to be compliant with a specific 2PC protocol

Sender Code steps:
1. Regular code performs business data updates in local DB & puts job with unique id in local DB as part of one normal local DB transaction
2. Pool thread code reads an outstanding job from local DB and makes RPC call to receiver component passing the appropriate business info + job id
3. For each RPC call which responds with success, mark job as completed (or delete its record) in local DB
4. If no response received, the call times out and the job is then eligible to be processed again by a subsequent thread because it has not been marked as complete
Receiver Code steps:
1. For each received RPC call check in local DB if the job’s id has already been processed - if it has, make no DB changes and just return success
2. If not a duplicate, perform business data updates in local DB & record job id in local DB as part of one normal local DB transaction, before returning success
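The sender and receiver steps can be sketched with two independent in-memory SQLite databases standing in for the two local DBs. Everything named here is an assumption made for illustration: the table names (`orders`, `jobs`, `stock_moves`, `receipts`), the function names, and the `lose_response` flag that simulates a response lost in transit. The RPC is modelled as a plain function call.

```python
import sqlite3

# Two independent local databases; table and column names are illustrative.
sender_db = sqlite3.connect(":memory:")
sender_db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, product TEXT)")
sender_db.execute(
    "CREATE TABLE jobs (id TEXT PRIMARY KEY, payload TEXT, done INTEGER DEFAULT 0)")
receiver_db = sqlite3.connect(":memory:")
receiver_db.execute("CREATE TABLE stock_moves (job_id TEXT, product TEXT)")
receiver_db.execute("CREATE TABLE receipts (job_id TEXT PRIMARY KEY)")


def place_order(order_id, product):
    # Sender step 1: business data + job record in ONE local transaction.
    with sender_db:
        sender_db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, product))
        sender_db.execute(
            "INSERT INTO jobs (id, payload) VALUES (?, ?)", (order_id, product))


def receive(job_id, product):
    # Receiver step 1: a job id already in receipts is a duplicate, so skip it.
    seen = receiver_db.execute(
        "SELECT 1 FROM receipts WHERE job_id = ?", (job_id,)).fetchone()
    if seen is None:
        # Receiver step 2: business update + receipt in ONE local transaction.
        with receiver_db:
            receiver_db.execute(
                "INSERT INTO stock_moves VALUES (?, ?)", (job_id, product))
            receiver_db.execute("INSERT INTO receipts VALUES (?)", (job_id,))
    return True


def run_sender_once(rpc, lose_response=False):
    # Sender steps 2-4: submit outstanding jobs; a job whose response never
    # arrives is left unmarked, so a later pass resubmits it.
    rows = sender_db.execute(
        "SELECT id, payload FROM jobs WHERE done = 0").fetchall()
    for job_id, payload in rows:
        ok = rpc(job_id, payload)
        if ok and not lose_response:
            with sender_db:
                sender_db.execute("UPDATE jobs SET done = 1 WHERE id = ?", (job_id,))


place_order("order-1", "widget")
run_sender_once(receive, lose_response=True)   # response lost: job stays pending
run_sender_once(receive)                       # resubmission: duplicate detected

moves = receiver_db.execute("SELECT COUNT(*) FROM stock_moves").fetchone()[0]
pending = sender_db.execute("SELECT COUNT(*) FROM jobs WHERE done = 0").fetchone()[0]
```

Although the receiver is called twice for the same job, the business update is applied only once, and the job ends up marked complete on the sender side.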

Notes on the optional metadata:
• If business data records naturally track their own ‘state’ in one or more business fields (e.g. ‘order status’), then there is no need for jobs metadata to also be stored and the job id can just be the business record’s unique DB id
• Sound familiar? This is what mongod does for its Retryable Writes feature!
• If inserted business data records and their data model are naturally idempotent then there is no need to record what jobs have been processed in this local DB - duplicates will be naturally filtered out by the data model
• If updating more than 2 databases, see Orchestration / Choreography patterns later
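As one possible illustration of a naturally idempotent data model, the sketch below keys each record on a business identifier so that a resubmitted insert is silently ignored. The `payments` table, its columns and the `record_payment` function are hypothetical, and SQLite's `INSERT OR IGNORE` is just one way of getting this behaviour.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# The business key doubles as the duplicate filter: no receipts table needed.
db.execute("CREATE TABLE payments (payment_id TEXT PRIMARY KEY, amount INTEGER)")


def record_payment(payment_id, amount):
    with db:
        # A repeat of the same payment_id violates the primary key and is ignored.
        db.execute("INSERT OR IGNORE INTO payments VALUES (?, ?)", (payment_id, amount))


record_payment("pay-42", 100)
record_payment("pay-42", 100)   # resubmission of the same job

count, total = db.execute("SELECT COUNT(*), SUM(amount) FROM payments").fetchone()
```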

Sender Resubmissions + Receiver Dup-Detection: Via Queues
Variant using Queues from a Messaging Broker
[Diagram: a Message Broker hosting Message Queues A, B and C. “Micro-transaction” 1: Application A’s Code dequeues a msg, updates Application A’s DB and enqueues a msg. “Micro-transaction” 2: Application B’s Code dequeues a msg, updates Application B’s DB and enqueues a msg. Together these form the “Business-transaction”]
Characteristics:
• Exactly-once delivery of messages (atomic)
• Eventually consistent
• Doesn’t require DBs or Queues to be compliant with a specific 2PC protocol

How the queue-based variant works:
• If the same message broker is used for both queues, a local messaging native transaction can span both the enqueue & dequeue operations atomically - if inserting the message in the database fails or times out, the local enqueue/dequeue operation will be rolled back and the messages will automatically be re-delivered soon afterwards
• Like the previous pattern, either the business data inserts are naturally idempotent or the code must also insert the unique msg id into receipts metadata records in the local DB, to filter out duplicate business data updates (if the application code detects a duplicate it should just commit the local message broker enqueue/dequeue transaction without inserting a new record)
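A toy model of one “micro-transaction” is sketched below. It is not real broker code: the queues are plain `deque` objects, the database is a list, the receipts metadata is a set, and the `fail` parameter is a hypothetical switch used to simulate a database failure or a lost broker commit.

```python
from collections import deque


def process_once(broker, db_rows, receipts, fail=None):
    """Dequeue from A, update the DB, enqueue on B as one broker-local transaction."""
    queue_a, queue_b = broker["A"], broker["B"]
    if not queue_a:
        return False
    msg_id, body = queue_a[0]        # peek only: removal happens at broker commit
    if fail == "db":
        return False                 # DB insert failed: broker transaction rolls back
    if msg_id not in receipts:       # duplicate detection via receipts metadata
        db_rows.append(body)         # business data + receipt stand in for
        receipts.add(msg_id)         # one local DB transaction
    if fail == "broker":
        return False                 # broker commit lost: message is redelivered
    queue_a.popleft()                # broker commit: the dequeue from A and the
    queue_b.append((msg_id, body))   # enqueue on B take effect together
    return True


broker = {"A": deque([("m1", "order for widget")]), "B": deque()}
db_rows, receipts = [], set()

first = process_once(broker, db_rows, receipts, fail="db")       # nothing changes
second = process_once(broker, db_rows, receipts, fail="broker")  # DB updated, msg stays
third = process_once(broker, db_rows, receipts)                  # duplicate, then commit
```

On the third attempt the message is recognised as a duplicate, so the broker transaction is simply committed without a second database insert.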

Formalised “Outbox” Pattern
Essentially a combined variation of the previous two patterns
Taken from “Pattern: Transactional outbox” @ Microservices.io
1. An application service updates data in regular tables of the local database and encapsulates the remaining changes in a command it puts in a special Outbox table in the same local database, as part of one single local database transaction
2. A separate Message Relay component/process reads commands from Outbox table
3. The Message Relay process keeps trying to send the command to the target system via a message broker - this can produce duplicates so the command must be inherently idempotent from the receiver’s perspective
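A minimal sketch of the three steps, using an in-memory SQLite database, might look like the following. The schema (`orders`, `outbox`), the `ReserveStock` command shape and the `flaky_publish` function that stands in for a message broker are all assumptions for illustration, not part of the pattern's definition.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, product TEXT)")
db.execute(
    "CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT, "
    "command TEXT, sent INTEGER DEFAULT 0)")


def create_order(order_id, product):
    # Step 1: the business row and the outbox command commit together or not at all.
    command = json.dumps(
        {"type": "ReserveStock", "order_id": order_id, "product": product})
    with db:
        db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, product))
        db.execute("INSERT INTO outbox (command) VALUES (?)", (command,))


def relay_once(publish):
    # Steps 2-3: read unsent commands; anything not acknowledged is retried later.
    rows = db.execute(
        "SELECT seq, command FROM outbox WHERE sent = 0 ORDER BY seq").fetchall()
    for seq, command in rows:
        if publish(command):
            with db:
                db.execute("UPDATE outbox SET sent = 1 WHERE seq = ?", (seq,))


published = []
attempts = {"n": 0}


def flaky_publish(command):
    attempts["n"] += 1
    if attempts["n"] == 1:
        return False                 # simulated broker outage on the first attempt
    published.append(json.loads(command))
    return True


create_order("order-1", "widget")
relay_once(flaky_publish)            # fails: command stays in the outbox
relay_once(flaky_publish)            # retry succeeds

unsent = db.execute("SELECT COUNT(*) FROM outbox WHERE sent = 0").fetchone()[0]
```

If the relay crashed after publishing but before marking the row as sent, the command would be published again on the next pass, which is why the receiver must treat commands idempotently.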

Business Transaction Choreography Vs Orchestration
Distributed changes can be implemented by either approach
[Diagram: Choreography (five services, each with its own Code and DB) OR Orchestration (an Orchestrator with its own Orchestrator Logic and Orchestrator DB, plus five services, each with its own Code and DB); steps labelled 1, 2, 3a, 3b, 4a]

Business Transaction Choreography Vs Orchestration
Both of the mentioned examples (DB oriented vs Queue oriented) for the retries + dups-detection pattern can be implemented by Choreography or by Orchestration
For Orchestration, the Orchestrator performs the job management, tracking and workflow coordination
• In fact, this is essentially what Business Process Management (BPM) tools do (i.e. manage potentially long-lived business transactions - important: make sure the BPM tool can treat each process step as a local transaction)
Both also allow for real world compensation workflows to be included in the business transaction - example:
• If an eCommerce order for a product cannot be fulfilled because it transpires, later on, that the inventory database didn’t accurately reflect the product’s stock quantity in the physical warehouse, a compensation action can then be executed, asynchronously, to cancel the order and to reimburse the customer
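The orchestration approach with compensation can be sketched as a small loop over (action, compensation) pairs. This is a simplified, synchronous illustration under stated assumptions: `run_business_transaction` and the step functions are hypothetical names, each step is assumed to be its own local transaction, and a real orchestrator would persist its progress and run compensations asynchronously.

```python
def run_business_transaction(steps):
    """Run (action, compensation) pairs in order; on failure undo completed steps."""
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            # Compensate in reverse order for every step that already succeeded.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True


log = []


def create_order():
    log.append("order created")


def cancel_order():
    log.append("order cancelled")


def take_payment():
    log.append("payment taken")


def reimburse_customer():
    log.append("customer reimbursed")


def reserve_stock():
    raise RuntimeError("warehouse stock did not match the inventory database")


def release_stock():
    log.append("stock released")


ok = run_business_transaction([
    (create_order, cancel_order),
    (take_payment, reimburse_customer),
    (reserve_stock, release_stock),
])
```

Because the stock reservation fails, the payment is reimbursed and the order is cancelled, in that order; `release_stock` never runs since its step never completed.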

There Is Precedent For This: “Sagas”
A term and approach coined in an industry whitepaper from 1987
English dictionary definition of the word ‘Saga’:
• “A very long story with dramatic events or parts”
The whitepaper’s definition of a ‘Saga’:
• “Long lived transaction that can be broken up into transactions, but still executed as a unit”
Documents approach for enabling long running transactions with compensating actions to revert state
• Was originally designed for use against a single database to break up a long running transaction that would otherwise hold database locks for too long
• However, many microservices related articles now discuss using a distributed adaptation of the pattern, formalised for coordination where transactional boundaries span multiple microservices
• Potentially overkill if you just need a simple integration between just 2 different DBs - if that’s the case just follow the simple application specific patterns highlighted earlier

Comparing Approaches

Comparison: Transactional Patterns For Updating Multiple Resources
Each row lists the values in this order: XA Protocol | Choreography Patterns | Orchestration Patterns
• Throughput Performance: Low | High | Medium
• Mandates Support for a Native 2PC Protocol In All Resources: Yes | No | No
• Complexity of Tracing & Issue Diagnosis: Medium | Medium | Easy
• Ability to Fix State When Things Go Wrong: Very hard (nearly impossible) | OK (compensation) | OK (compensation)
• High Availability of Typical Implementations: Low | Medium | Medium
• Interoperability with Microservices HTTP APIs (e.g. REST/GraphQL): No | Yes | Yes
• Supports Long-running Business Transactions: No | Yes | Yes
• Supports Compensation Workflows: No | Yes | Yes
• Application Code Complexity to Manage ‘Transaction’ Lifecycle: Low | Medium | Medium
ACID Properties
• Atomic: Yes | Yes | Yes
• Consistency: Eventual | Eventual | Eventual
• Isolation: Strong (using locking) | Weak | Weak
• Durable: Yes | Yes | Yes

Hang on, doesn’t MongoDB itself use 2PC internally?
Yes, for distributed transactions across multiple shards in the same cluster, but...
• It’s not addressing the same use cases as XA
  ◦ Transactions across multiple heterogeneous technologies Vs transactions across a single distributed DB
• Many issues are related to the XA protocol specifically rather than the 2PC pattern generally
• Many issues are exacerbated when multiple vendor technologies are involved in a distributed transaction
MongoDB’s distributed transactions within a single cluster (compared to XA’s issues):
• Eventual Consistency: MongoDB favours providing a “snapshot” read concern for a synchronized view of the data across shards (and without requiring locks for isolation)
• Reduced High Availability: Automated replica failover for each shard & its set of inflight transactions
• Poor Performance: Locking not used - higher concurrency (fail fast) & lower latency (no blocking)
• Operational Complexity: Single cluster so easy to manage
• Interoperability Issues: None - all distributed elements are part of same technology stack & version

Further Reading
• Designing Data-Intensive Applications book, Chapter 9, section ‘Distributed Transactions and Consensus’ by Martin Kleppmann (O'Reilly, 2016)
• Your Coffee Shop Doesn’t Use Two-Phase Commit by Gregor Hohpe
• Myth: Why Banks Are BASE Not Acid - Availability Is Revenue by Eric Brewer
• The Hardest Part About Microservices: Your Data by Christian Posta
• SHOCKER: XA Distributed Transactions are only Eventually Consistent! by Paul Done
• It’s Time to Move on from Two Phase Commit by Daniel Abadi
• Pattern: Transactional outbox by Chris Richardson
• Sagas by Hector Garcia-Molina and Kenneth Salem (1987 whitepaper)
• Pattern: Saga by Chris Richardson
• Patterns for distributed transactions within a microservices architecture by Keyang Xiang

That’s all folks Paul Done @TheDonester
