Upgrade to Pro — share decks privately, control downloads, hide ads and more …

From Postgres to OpenSearch in No Time

From Postgres to OpenSearch in No Time

You've been tasked with implementing a data streaming pipeline for propagating data changes from your operational Postgres database to a search index in OpenSearch. Data views in OpenSearch should be denormalized for fast querying, and of course there should be no noticeable impact on the production database.

In this session we'll discuss how to build this data pipeline using two popular open-source projects: Debezium for log-based change data capture (CDC) and Apache Flink for stream processing. Join us for this talk and learn about

* Setting up change data streams with Debezium
* Efficiently building nested data structures from 1:n joins
* Deployment options: Kafka Connect vs. Flink CDC

Gunnar Morling

December 01, 2023
Tweet

More Decks by Gunnar Morling

Other Decks in Programming

Transcript

  1. Image © massmatt https://flic.kr/p/25eF9D3 (CC BY 2.0)
    From Postgres To OpenSearch
    In No Time
    Gunnar Morling
    Software Engineer, Decodable
    @gunnarmorling

    View full-size slide

  2. From Postgres to OpenSearch | @gunnarmorling
    Today’s Mission
    Learn About…

    View full-size slide

  3. From Postgres to OpenSearch | @gunnarmorling
    ● Software engineer at Decodable
    ● Former project lead of Debezium
    ● kcctl 🧸, JfrUnit, ModiTect,
    MapStruct
    ● Spec Lead for Bean Validation 2.0
    ● Java Champion
    Gunnar Morling

    View full-size slide

  4. From Postgres to OpenSearch | @gunnarmorling
    Updating the Search Index
    One Idea?
    🤔

    View full-size slide

  5. © Kai Schreiber https://flic.kr/p/uecg (CC BY-SA 2.0)

    View full-size slide

  6. From Postgres to OpenSearch | @gunnarmorling
    Debezium
    Log-Based Change Data Capture

    View full-size slide

  7. From Postgres to OpenSearch | @gunnarmorling
    Debezium in a Nutshell
    Open-Source Change Data Capture
    ● A CDC Platform
    ○ Based on transaction logs
    ○ Snapshotting, filtering, etc.
    ○ Outbox support
    ○ Web-based UI
    ● Fully open-source, very active
    community
    ● Large production deployments

    View full-size slide

  8. From Postgres to OpenSearch | @gunnarmorling
    Change Data Capture
    Liberation for Your Data

    View full-size slide

  9. From Postgres to OpenSearch | @gunnarmorling
    Change Data Capture
    Liberation for Your Data

    View full-size slide

  10. From Postgres to OpenSearch | @gunnarmorling
    ● Core
    ○ MySQL, MariaDB
    ○ Postgres
    ○ SQL Server
    ○ MongoDB
    ○ Db2, Informix
    ○ Oracle
    ● Community-led:
    ○ Vitess, Cassandra, Spanner
    ● External: ScyllaDB, Yugabyte
    Debezium
    Supported Databases

    View full-size slide

  11. From Postgres to OpenSearch | @gunnarmorling
    Debezium: Data Change Events
    ● Old and new row state
    ● Metadata on table, TX id, etc.
    ● Operation type, timestamp

    View full-size slide

  12. From Postgres to OpenSearch | @gunnarmorling
    Debezium: Data Change Events
    ● Old and new row state
    ● Metadata on table, TX id, etc.
    ● Operation type, timestamp

    View full-size slide

  13. From Postgres to OpenSearch | @gunnarmorling
    Debezium: Data Change Events
    ● Old and new row state
    ● Metadata on table, TX id, etc.
    ● Operation type, timestamp

    View full-size slide

  14. From Postgres to OpenSearch | @gunnarmorling
    Becoming the De-Facto CDC Standard
    https://debezium.io/blog/2021/09/22/deep-dive-into-a-debezium-community-connector-scylla-cdc-source-connector/
    Debezium

    View full-size slide

  15. Apache Flink
    Colin Howley https://flic.kr/p/698F5j (CC BY-ND 2.0)

    View full-size slide

  16. From Postgres to OpenSearch | @gunnarmorling
    Apache Flink
    Stateful Computations over Data Streams
    https://flink.apache.org/

    View full-size slide

  17. From Postgres to OpenSearch | @gunnarmorling
    ● Real-time reporting/dashboards
    ● Low-latency alerting, notifications
    ● Materialized view maintenance, caches
    ● Real-time cross-database sync, lookup joins,
    windowed joins, aggregations
    ● Machine learning: model serving, feature
    engineering
    ● Change data capture, data integration
    Apache Flink
    Common Use Cases
    https://flink.apache.org/poweredby.html

    View full-size slide

  18. From Postgres to OpenSearch | @gunnarmorling
    Apache Flink
    APIs for Application Development
    Image source: “Change Data Capture with Flink SQL and Debezium” by Marta Paes at DataEngBytes
    (https://noti.st/morsapaes/liQzgs/change-data-capture-with-flink-sql-and-debezium)

    View full-size slide

  19. From Postgres to OpenSearch | @gunnarmorling
    Apache Flink
    APIs for Application Development
    Image source: “Change Data Capture with Flink SQL and Debezium” by Marta Paes at DataEngBytes
    (https://noti.st/morsapaes/liQzgs/change-data-capture-with-flink-sql-and-debezium)

    View full-size slide

  20. From Postgres to OpenSearch | @gunnarmorling
    Apache Flink
    Stream Processing of Change Data Events

    View full-size slide

  21. From Postgres to OpenSearch | @gunnarmorling
    Apache Flink
    Stream Processing of Change Data Events

    View full-size slide

  22. From Postgres to OpenSearch | @gunnarmorling
    Apache Flink
    Stream Processing of Change Data Events

    View full-size slide

  23. From Postgres to OpenSearch | @gunnarmorling
    Apache Flink
    Stream Processing of Change Data Events

    View full-size slide

  24. From Postgres to OpenSearch | @gunnarmorling
    Apache Flink
    Stream Processing of Change Data Events

    View full-size slide

  25. From Postgres to OpenSearch | @gunnarmorling
    Apache Flink
    Stream Processing of Change Data Events

    View full-size slide

  26. From Postgres to OpenSearch | @gunnarmorling
    Debezium and Apache Flink
    Integration Options

    View full-size slide

  27. Demo
    © Luke Jones https://flic.kr/p/sEq4MA (CC BY-SA 2.0)

    View full-size slide

  28. From Postgres to OpenSearch | @gunnarmorling
    Driving Full-Text Search
    Propagating Joined Data to OpenSearch

    View full-size slide

  29. Demo
    © Luke Jones https://flic.kr/p/sEq4MA (CC BY-SA 2.0)

    View full-size slide

  30. From Postgres to OpenSearch | @gunnarmorling
    Nested Data Structures
    UDFs to the Rescue

    View full-size slide

  31. From Postgres to OpenSearch | @gunnarmorling
    Nested Data Structures
    UDFs to the Rescue

    View full-size slide

  32. Demo
    © Luke Jones https://flic.kr/p/sEq4MA (CC BY-SA 2.0)

    View full-size slide

  33. From Postgres to OpenSearch | @gunnarmorling
    Nested Data Structures
    UDFs to the Rescue

    View full-size slide

  34. From Postgres to OpenSearch | @gunnarmorling
    Nested Data Structures
    UDFs to the Rescue

    View full-size slide

  35. From Postgres to OpenSearch | @gunnarmorling
    Nested Data Structures
    UDFs to the Rescue
    https://www.youtube.com/@decodable

    View full-size slide

  36. From Postgres to OpenSearch | @gunnarmorling
    Ingest Once...
    ...Process Multiple Times

    View full-size slide

  37. From Postgres to OpenSearch | @gunnarmorling
    Transactional Aggregation
    Correlating Events From Same Transaction

    View full-size slide

  38. From Postgres to OpenSearch | @gunnarmorling
    Wrap-Up

    View full-size slide

  39. From Postgres to OpenSearch | @gunnarmorling
    ● Debezium: Real-time change event streams for your data
    ● Debezium and Apache Flink: Power house of change stream
    processing
    ○ Data Integration
    ○ Data Cleansing
    ○ Denormalization
    ○ Aggregations
    ○ Pattern Matching
    Take Aways
    🤩

    View full-size slide

  40. From Postgres to OpenSearch | @gunnarmorling
    ● Provisioning and updating infrastructure
    ● Deployment and (auto-)scaling
    ● Observability
    ● State management
    ● Schema management and inference
    ● Developer experience
    ● CI/CD
    ● Security and access control
    Towards Production
    What To Consider

    View full-size slide

  41. From Postgres to OpenSearch | @gunnarmorling
    ● Debezium: @debezium | https://debezium.io/
    ● Apache Flink: @ApacheFlink | https://flink.apache.org/
    ● Getting started with Flink:
    github.com/decodableco/examples
    → flink-learn
    Learn More

    View full-size slide

  42. From Postgres to OpenSearch | @gunnarmorling
    [email protected]
    @gunnarmorling
    📧
    Thank You!
    Q & A

    View full-size slide