Apidays Paris 2023 - Productizing AsyncAPI for Data Replication and Changed Data Capture, Julien Testut, Oracle

apidays
December 15, 2023

Apidays Paris 2023 - Software and APIs for Smart, Sustainable and Sovereign Societies
December 6, 7 & 8, 2023

Productizing AsyncAPI for Data Replication and Changed Data Capture
Julien Testut, Senior Principal Product Manager, Oracle

------

Check out our conferences at https://www.apidays.global/

Do you want to sponsor or talk at one of our conferences?
https://apidays.typeform.com/to/ILJeAaV8

Learn more on APIscene, the global media made by the community for the community:
https://www.apiscene.io

Explore the API ecosystem with the API Landscape:
https://apilandscape.apiscene.io/

Transcript

  1. Productizing AsyncAPI for Data Replication / CDC

    Julien Testut, Senior Principal Product Manager, Oracle Development, with thanks to Jagdev Dhillon & Tianshu Li. Copyright © 2023, Oracle and/or its affiliates
  2. Safe harbor statement

    The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation. The materials in this presentation pertain to Oracle Health, Oracle, Oracle Cerner, and Cerner Enviza which are all wholly owned subsidiaries of Oracle Corporation. Nothing in this presentation should be taken as indicating that any decisions regarding the integration of any EMEA Cerner and/or Enviza entities have been made where an integration has not already occurred.
  3. Agenda

    1. Brief background on CDC & GoldenGate
    2. Why AsyncAPI?
    3. AsyncAPI with GoldenGate
    4. Roadmap
  4. Databases, Change Data Capture (CDC) & Data Replication

    • In databases, the most important system events are Transactions (Tx’s)
      • DML (data manipulation language): inserts, updates, deletes
      • DDL (data definition language): schema changes, alter table, etc.
    • All OLTP databases, and most databases overall, have centralized logging
    • Users/applications can open short/long-running Tx’s, affecting single rows or billions
    • Commit is the point at which these database events achieve “durability”
    • Change Data Capture (CDC) is fundamentally about “capturing” the Tx’s from the source
      • Typically, at the moment of “commit” in the logs, when the Tx’s become durable
    • Replication is about transmitting the “captured” Tx’s to other places (e.g., Targets)
    • Some databases have their own CDC & Replication layers (e.g., proprietary to only that DB)
    • CDC/Replication tools are built to work with many different databases
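The capture-at-commit idea on this slide can be sketched in a few lines. This is a toy illustration only, assuming an in-memory log whose record fields (`tx`, `op`, `row`) and the function name `capture_committed` are hypothetical; it is not GoldenGate's log format or API. Changes are buffered per transaction and emitted only once the commit record appears.

```python
from collections import defaultdict

def capture_committed(log):
    """Yield (tx_id, changes) only for transactions whose commit appears in the log."""
    pending = defaultdict(list)  # uncommitted changes, keyed by transaction id
    for record in log:
        tx, op = record["tx"], record["op"]
        if op == "commit":
            # The transaction is durable now, so its changes can be captured.
            yield tx, pending.pop(tx, [])
        elif op == "rollback":
            pending.pop(tx, None)  # discard changes that never became durable
        else:
            pending[tx].append((op, record["row"]))

# Hypothetical log: transaction 2 rolls back, transaction 1 commits.
log = [
    {"tx": 1, "op": "insert", "row": {"id": 10}},
    {"tx": 2, "op": "update", "row": {"id": 7}},
    {"tx": 2, "op": "rollback"},
    {"tx": 1, "op": "delete", "row": {"id": 3}},
    {"tx": 1, "op": "commit"},
]
captured = list(capture_committed(log))
```

Only transaction 1 reaches `captured`; the rolled-back update never leaves the source.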
  5. CDC Tools and Oracle GoldenGate (GG)

    • CDC tools are a long-time part of the enterprise software domain
      • GoldenGate was one of the first tools in this area, dating back to the mid-1990s
      • Open-source tools like Debezium have gained popularity since ~2020
    • GoldenGate CDC/Replication technology was acquired by Oracle in 2009
      • To provide a solution for “logical replication” supporting High Availability in distributed DBs
      • To replace older technologies: Oracle Streams, Oracle CDC
      • To become the foundation of Oracle’s data integration portfolio
    • Today, GoldenGate is a ~$1B global ecosystem for mission-critical systems
      • ~10,000 customers, 180+ countries, historically mostly large multi-nationals and governments
      • GG runs in most of your banks, payments systems, ecommerce retailers, telcos, airlines, etc.
      • GG supports 100s of different databases, clouds, warehouses, lakehouses, messaging, etc.
      • GG is more than CDC/Replication; it also includes Data Integration, Streaming Data, Cloud Pipelines, Data Governance, and Real-time Observability
  6. History of GoldenGate and “Why AsyncAPI Now?”

    • 1995–2005: Decade of database replication
      • Use cases focus on DML/DDL replication between databases
    • 2005–2015: Emergence of MPP and Big Data
      • Massive expansions into Data Warehousing and eventually Hadoop-based Big Data tech
    • 2015–today: Shift to Microservices, Cloud and distributed data architecture
      • GG is refactored to a microservices architecture, with massive growth in cloud delivery
    • Adoption of AsyncAPI is a natural part of the evolution for GoldenGate
      • CDC/Replication is inherently an asynchronous activity
      • More and more use cases feature “Event Sourcing” designs (Tx Outbox, Saga Patterns)
      • Event streams are becoming a valued part of a “Data Product” architecture
      • Kafka is non-transactional (e.g., for DMLs), which makes it difficult to maintain the “C” (consistency) in ACID
      • Kafka is sometimes “overkill” / too much overhead for many use cases
  7. AsyncAPI with GoldenGate

    Automated, machine-generated client applications to a stream of exactly-once transaction events: JSON formatted, via REST pub/sub

    Diagram labels: CDC/Replication of Transactions; Real-time data events; AsyncAPI Standardized Pub/Sub APIs; YAML descriptor; Automated code generators
    Message Data (Tx’s):
    • Inserts
    • Updates
    • Deletes
    • GET/PUT
    • Schema Changes
    Benefits:
    • Real-time data events to any data consumers
    • Bypass need for Kafka for data consumers
    • Simple event sourcing & transaction outbox patterns
    • AsyncAPI is the future of event-driven architecture
  8. Two important “big picture” use cases for CDC/Replication + AsyncAPI

    Diagram labels: Data Product Producers; Apps; Tx’s; Data Product Consumers; JSON etc.; Parquet etc.
    • Application Microservice: Consumers are App Microservices, for CQRS/Outbox type design patterns
    • Streaming data products: Analytic or data science consumers, or for bespoke clients
    Why bind from data tier? (a) commit point for durable data, (b) lowest latency transmission, and (c) very high levels of automation
  9. DB Event Streams with CDC and AsyncAPI

    Diagram: Data Product Producers (Apps, DB Transactions/Commits, Base Tables (Application Schema)) → GoldenGate Microservices (DML events; JSON as Tx’s; Data and Schema changes are in stream) → Data Product Consumers (Apps: transform data, react to changes)
    Pros:
    • Easy for Producer (highly automated, very low effort to publish)
    • No application changes are required
    • Schema metadata in payloads (i.e., the Consumer can decide how to handle schema changes)
    Cons:
    • Consumer binding is to Base Tables, exposing some implementation details such as Structure
  10. Transaction Outbox (without CDC or AsyncAPI tooling)

    Diagram: Data Product Producers (Apps with dev code, DB Transactions/Commits, Base Tables, Outbox Table holding JSON as Biz Objects) → Message Relay (Read: polling for changes; Publish: distribute) → Broker → Data Product Consumers (Apps, JSON)
    *JSON may be in consumer or producer formats
    Pros:
    • Outbox pattern ensures data consistency at commit-point
    • JSON schema may be defined by either Producer or Consumer
    Cons:
    • Latency & load when using a polling-based relay service
    • Burden of change lifecycle is on the Producer
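A minimal sketch of the polling-based outbox on this slide, using Python's built-in `sqlite3`. The table names (`orders`, `outbox`), the event shape, and the helper names `place_order` and `relay_once` are assumptions made for illustration; a list stands in for the broker.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT,
                         payload TEXT, published INTEGER DEFAULT 0);
""")

def place_order(order_id):
    # Base-table write and outbox write commit together, or not at all.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, 'NEW')", (order_id,))
        event = {"type": "OrderPlaced", "order_id": order_id}
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))

def relay_once(publish):
    # Polling relay: read unpublished rows, hand them to the broker, mark them done.
    rows = conn.execute(
        "SELECT seq, payload FROM outbox WHERE published = 0 ORDER BY seq"
    ).fetchall()
    for seq, payload in rows:
        publish(json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)

sent = []  # stands in for the message broker
place_order(1)
place_order(2)
first_poll = relay_once(sent.append)
second_poll = relay_once(sent.append)
```

The second poll finds nothing to publish but still queries the source database, which is the latency and load cost listed under Cons.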
  11. Transaction Outbox with CDC & AsyncAPI

    Diagram: Data Product Producers (Apps with dev code, DB Transactions/Commits, Base Tables, Outbox Table holding JSON as Biz Objects) → GoldenGate Microservices (DML events; JSON as Biz Objects in CDC payload) → Data Product Consumers (Apps, JSON)
    *JSON may be in consumer or producer formats
    Pros:
    • CDC + AsyncAPI provides very low latency and less impact on source DB
    • Easy for Consumer (e.g., in many cases, Consumer may define format of the JSON)
    • Outbox pattern may be favored by Producer application developers (for Tx consistency)
    Cons:
    • Burden of change lifecycle is on the Producer
  12. Using AsyncAPI with GoldenGate

    Diagram labels: Data Product Producers; Tx’s; JSON; Data Product Consumers; Apps
    1. Decide which Databases, Tables & Columns to publish
    2. Use GoldenGate Admin Microservice to set up the “capture” trail (GG’s ledger)
    3. Use GoldenGate Distribution Microservice to define the AsyncAPI Channel (and associate it to GG Trail)
    4. Use REST/GUI to browse AsyncAPI Channels
    5. Authenticate using GG user/role and download YAML document
    6. Build or generate client to receive and parse the Tx payload from GG Data Streams
    7. Consume transactions
  13. “Data Producer” using GoldenGate to create AsyncAPI Channels

    Role: Data Product Producers
    • Part of GG microservice called “Distribution Service”
    • Create “Data Streams”
    • Associated to a “GG Trail”
  14. “Data Producer” can filter payloads, per Channel

    Role: Data Product Producers
    Filtering can happen at Object level: Tables, Columns, Data Values (e.g., sensitive data, or JSON payloads etc.)
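To make the three filter levels concrete, here is a small sketch under stated assumptions: the record shape (`table`, `row`), the function `make_channel_filter`, and the sample table and column names are hypothetical and do not reflect how GoldenGate configures channel filters.

```python
def make_channel_filter(tables, hidden_columns=(), row_predicate=None):
    """Build a function that returns a filtered copy of a record, or None to drop it."""
    hidden = set(hidden_columns)

    def apply(record):
        if record["table"] not in tables:
            return None  # table-level filter
        if row_predicate is not None and not row_predicate(record["row"]):
            return None  # data-value filter
        # Column-level filter: strip columns that must not leave the producer.
        row = {k: v for k, v in record["row"].items() if k not in hidden}
        return {"table": record["table"], "row": row}

    return apply

channel = make_channel_filter(
    tables={"HR.EMPLOYEES"},
    hidden_columns={"SALARY"},
    row_predicate=lambda row: row["COUNTRY"] == "FR",
)
kept = channel({"table": "HR.EMPLOYEES", "row": {"ID": 1, "SALARY": 50000, "COUNTRY": "FR"}})
dropped = channel({"table": "HR.JOBS", "row": {"ID": 9, "COUNTRY": "FR"}})
```

Here `kept` carries the row without the `SALARY` column, and `dropped` is `None` because the table is not published on this channel.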
  15. GoldenGate (typically) publishes via WebSocket Secure (WSS)

    Oracle’s objective is to contribute a WSS client template back to AsyncAPI for all to use
  16. “Data Producer” using GoldenGate to create AsyncAPI Channels

    Role: Data Product Producers
    Individual Data Streams consist of a GoldenGate payload, any of 6 possible schema types
  17. GoldenGate generated client code

    Role: Data Product Consumers
    Example JavaScript to define and initialize the WebSocket for streaming
  18. GoldenGate Data Streams Payload

    Role: Data Product Consumers
    Records consist of before/after images and op_type information (type of transaction)
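A consumer might use the before/after images to work out what changed. The sketch below assumes a hypothetical JSON record: only `op_type` is named on the slide, while the `table`, `before`, and `after` keys, the `"U"` value, and the helper `changed_columns` are illustrative guesses rather than the actual Data Streams schema.

```python
import json

def changed_columns(record):
    """Return {column: (old, new)} for columns whose value differs between images."""
    before = record.get("before") or {}
    after = record.get("after") or {}
    return {
        col: (before.get(col), after.get(col))
        for col in sorted(set(before) | set(after))
        if before.get(col) != after.get(col)
    }

# Hypothetical update record; real payload field names may differ.
message = json.loads("""
{"table": "HR.EMPLOYEES", "op_type": "U",
 "before": {"ID": 1, "CITY": "Lyon"},
 "after":  {"ID": 1, "CITY": "Paris"}}
""")
diff = changed_columns(message)
```

For an insert there would be no before image, so every column in the after image shows up as changed from `None`.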
  19. Options for the service

    Diagram labels: Data Product Producers; Data Product Consumers
    • Connection protocol (set by the Producer)
      • ws (WebSocket) or wss (WebSocket Secure)
    • Payload service levels (set by Producer)
      • Exactly-once: GoldenGate will handle all tasks for deduplication of records to guarantee that DML/DDL events are sent exactly one time
      • At-most-once: Will tolerate gaps in streaming data records, e.g., gaps in data that may have been purged by the Producer
      • At-least-once: Service Producer may from time to time reprocess source DML/DDL, and this SLA may send duplicates
    • Start position (set by Consumer)
      • Current: will begin streaming Tx’s from current position
      • Earliest: will fetch Tx’s starting from earliest available in the GoldenGate Trail (retention is defined by Data Producers)
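Under the at-least-once level, a consumer that cannot tolerate duplicates has to drop them itself. A minimal sketch, assuming each record carries a monotonically increasing `position` field (a hypothetical name; the slide does not say how records are identified):

```python
def deduplicate(records):
    """Yield each record once, assuming positions only increase in the stream."""
    last_seen = -1
    for record in records:
        if record["position"] <= last_seen:
            continue  # replayed by the producer; already processed
        last_seen = record["position"]
        yield record

stream = [
    {"position": 1, "op_type": "I"},
    {"position": 2, "op_type": "U"},
    {"position": 2, "op_type": "U"},  # duplicate after producer reprocessing
    {"position": 1, "op_type": "I"},  # older replay
    {"position": 3, "op_type": "D"},
]
unique_positions = [r["position"] for r in deduplicate(stream)]
```

With the exactly-once level this bookkeeping is what the slide says GoldenGate handles on the consumer's behalf.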
  20. Roadmap – what’s on the horizon

    • Formatters
      • For App Consumers: JSON (default), Avro, XML, Protobuf, etc.
      • For Analytic Consumers: Parquet, Iceberg, Delta, etc.
    • CloudEvents payload format option
      • Adds more overhead, latency, etc.
      • May help simplify how some clients can parse the transactions
    • Business object semantics
      • When integrated with Oracle JSON-Relational duality (producers may choose to share Business Object structure, rather than the physical tables)
    • Stream processing sink
      • AsyncAPI channels as output of streaming data pipelines (pipelines enable data integration/prep or analytic actions)
  21. Our mission is to help people see data in new ways, discover insights, unlock endless possibilities.