Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Query big data with DuckDB in PHP

Sponsored · Ship Features Fearlessly Turn features on and off without deploys. Used by thousands of Ruby developers.
Avatar for asrar asrar
November 28, 2025

Query big data with DuckDB in PHP

Learn how to move data from MySQL to S3 using DuckDB embedded in it's PHP client.

Avatar for asrar

asrar

November 28, 2025

Other Decks in Programming

Transcript

  1. Query Big Data with DuckDB in PHP Introduction • Software

    engineer at Qbil Software B.V., an ERP company in the Netherlands • At Qbil we work extensively with MySQL, PHP and Symfony • Qbil Software is multi-tenant, with advanced querying and reporting needs SymfonyCon Amsterdam 2025 1 / 15
  2. Query Big Data with DuckDB in PHP The Problem 📊

    Database Growing Pains • MySQL as primary data store • Logging/analytics tables growing large (>100MB) • Slowing down backups and database dumps • Goal: Fit more databases on smaller instance sizes SymfonyCon Amsterdam 2025 2 / 15
  3. Query Big Data with DuckDB in PHP Solution Approach Move

    non-critical tables to Object Store (AWS S3) Challenge: How to sync data while maintaining fast, ergonomic querying? SymfonyCon Amsterdam 2025 3 / 15
  4. Query Big Data with DuckDB in PHP Option 1: Pure

    PHP Approach: • Schedule commands to load from MySQL • Convert to CSV • Write to S3 via AWS PHP SDK • Read back via S3 API ✓ Pros: High flexibility ✗ Cons: Lots of code, performance issues, memory concerns SymfonyCon Amsterdam 2025 4 / 15
  5. Query Big Data with DuckDB in PHP Option 2: Infrastructure

    Layer Approach: • AWS DMS to offload MySQL to S3 • AWS Athena for querying ✓ Pros: No app code ✗ Cons: Expensive, dedicated infrastructure SymfonyCon Amsterdam 2025 5 / 15
  6. Query Big Data with DuckDB in PHP Option 3: PHP

    + DuckDB ✓ Run DuckDB from PHP via Symfony command Scheduled periodically for data movement SymfonyCon Amsterdam 2025 6 / 15
  7. Query Big Data with DuckDB in PHP What is DuckDB?

    Developed in the CWI Netherlands 🇳🇱 • Columnar database query engine and file format • Perfect for large analytical data (CSVs > 1GB) • Easy to install, embed, and run • Fast with familiar SQL syntax • Extensions for MySQL, AWS S3, and more • Supports CSV/Parquet/JSON formats SymfonyCon Amsterdam 2025 7 / 15
  8. Query Big Data with DuckDB in PHP DuckDB in PHP

    Client: https://duckdb.org/docs/stable/clients/php Repository: https://github.com/satur-io/duckdb-php SymfonyCon Amsterdam 2025 9 / 15
  9. Query Big Data with DuckDB in PHP The SQL: Setup

    Extensions -- Install DuckDB core extensions INSTALL mysql; LOAD mysql; INSTALL httpfs; LOAD httpfs; INSTALL aws; LOAD aws; -- Setup AWS configuration CREATE OR REPLACE SECRET secret ( TYPE s3, PROVIDER credential_chain, CHAIN 'sso', PROFILE 'readonly', REFRESH auto ); SymfonyCon Amsterdam 2025 10 / 15
  10. Query Big Data with DuckDB in PHP The SQL: Copy

    to S3 COPY ( select * from mysql_query('qt_db', ' SELECT *, date_format(visited_at, ''%Y'') as visit_year, date_format(visited_at, ''%m'') as visit_month, date_format(visited_at, ''%d'') as visit_day FROM page_visits WHERE visited_at < current_date ORDER BY id ASC ') ) TO 's3://qt-analytics/page_visits' (FORMAT PARQUET, PARTITION_BY (visit_year, visit_month, visit_day, tenant_id), APPEND); SymfonyCon Amsterdam 2025 11 / 15
  11. Query Big Data with DuckDB in PHP The SQL: Clean

    Up MySQL -- Delete data from MySQL select * from mysql_execute('qt_db', ' DELETE FROM page_visits WHERE visited_at < current_date '); SymfonyCon Amsterdam 2025 12 / 15
  12. Query Big Data with DuckDB in PHP PHP: Create DuckDB

    Connection <?php $config = new \Saturio\DuckDB\DB\Configuration(); $config->set('extension_directory', $tempDir); $config->set('temp_directory', $tempDir); $config->set('memory_limit', \ini_get('memory_limit') ?: '1GB'); $config->set('max_temp_directory_size', '5GB'); // Initialize connection $duckdb = Saturio\DuckDB\DuckDB::create($config); SymfonyCon Amsterdam 2025 13 / 15
  13. Query Big Data with DuckDB in PHP PHP: Query S3

    Data <?php $sql = <<<SQL select count(*) as Count from read_parquet( 's3://qt-analytics/page_visits/**/*.parquet' ) where visit_year = 2025 and visit_month = 11 SQL; $result = $duckdb->query($sql); $firstRow = $result->rows(true)->current(); echo 'Count: ' . $firstRow['Count']; SymfonyCon Amsterdam 2025 14 / 15
  14. Query Big Data with DuckDB in PHP Thank You! Questions?

    AUH Nahvi GitHub: @aszenz Symfony Slack: @Asrar SymfonyCon Amsterdam 2025 15 / 15