PG-Strom - A FDW module utilizing GPU device

Slides from the lightning talk (LT) session at PGCon 2012

KaiGai Kohei

May 28, 2012

Transcript

  1. FDW is fun (Page 2, PostgreSQL Conference 2012)

    Diagram: for a query such as SELECT * FROM …, the Executor reads regular tables and foreign tables. Each foreign table is served by an FDW module with its own Exec routine: MySQL FDW, Oracle FDW, PG-Strom FDW.
    Callouts: "Run on single thread"; "Utilizing External Computing Resource!"
  2. Idea of Asynchronous Execution using CPU and GPU

    Diagram: execution timelines of vanilla PostgreSQL (CPU only) and PostgreSQL with PG-Strom (CPU and GPU).
    Vanilla PostgreSQL: iteration of scanning tuples and evaluating qualifiers.
    PostgreSQL with PG-Strom: a larger "chunk" to scan the database at once; asynchronous memory transfer and execution; synchronization; finishes earlier than the "Only CPU" scan.
    Legend: red means scanning tuples from the database; green means execution of the qualifiers.
  3. Architecture of PG-Strom

    Diagram labels:
    World of CPU: Postmaster; Backend Processes (Executor, PG-Strom; Preload, Exec); PG-Strom GPU Calculation Server; shared buffer; shared chunks; regular tables; shadow tables.
    World of GPU: GPU Device Memory; GPU Kernel Function; Massive Parallel Execution.
    Link between the two: PCI-E x16 Gen2 (16GB/sec); DMA Transfer.
    Callout: Data Exchange via shared chunk.
  4. Data Density and Column-based Structure

    Diagram labels:
    Foreign Table FT1 (a, b, c, d), backed by Shadow Tables: rowid map; column store of A; column store of B; column store of C; column store of D; row store of FT1.
    Table: my_schema.ft1.b.cs, keyed by rowid (10100, 10200, 10300), holding arrays such as {10.23, 7.54, 5.43, … } and {2.4, 5.6, 4.95, … }.
    Table: my_schema.ft1.c.cs, holding arrays such as {‘2010-10-21’, …}, {‘2011-01-23’, …}, {‘2011-08-17’, …}.
    Chunk Buffer of FT1: rowmap; value a[]; value b[]; value c[]; value d[]; <not used>; <not used>.
    Steps: ① Transfer, ② Calculation, ③ Write-Back.
  5. Benchmark Result

    ▌CPU: Intel Xeon E5504 (2.0GHz/4core), GPU: Nvidia GeForce GTS450 (128 CUDA cores)
    ▌rtbl and ftbl each contain 5 million tuples with the same values.
    ▌All the tuples are already in the shared buffers, so disk I/O seldom happens.

    postgres=# SELECT COUNT(*) FROM rtbl WHERE sqrt((x-256)^2 + (y-128)^2) < 40;
     count
    -------
     25069
    (1 row)

    Time: 3739.492 ms

    postgres=# SELECT COUNT(*) FROM ftbl WHERE sqrt((x-256)^2 + (y-128)^2) < 40;
     count
    -------
     25069
    (1 row)

    Time: 227.023 ms

    GPU Accelerated! More than 10 times faster (3739.492 ms / 227.023 ms ≈ 16.5).
  6. Future Development

    ▌Git URL: https://github.com/kaigai/pg_strom
    ▌v9.3 development
      - Writable Foreign Table
      - Sort / Aggregate acceleration using GPU
      - Inheritance between regular and foreign tables
    ▌Need your help
      - Folks who can review the patches
      - Folks who can provide real-life big data
      - Folks who know the typical workloads of analytic queries
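
The ideas on slides 2, 4 and 5 can be illustrated together with a small sketch. It is not PG-Strom's code: it is plain Python in which a one-worker thread pool merely stands in for the GPU calculation server, and every name (`to_chunks`, `eval_qual`, `count_matching`, `CHUNK_SIZE`) is hypothetical. It assumes rows of the form (rowid, x, y), packs them into column-oriented chunks, and evaluates the benchmark qualifier `sqrt((x-256)^2 + (y-128)^2) < 40` on one chunk while the next chunk is being prepared.

```python
import math
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4  # hypothetical value; a real chunk would hold far more rows


def to_chunks(rows, chunk_size=CHUNK_SIZE):
    """Split row-store tuples (rowid, x, y) into column-oriented chunks."""
    for start in range(0, len(rows), chunk_size):
        part = rows[start:start + chunk_size]
        yield {
            "rowid": [r[0] for r in part],
            "x": [r[1] for r in part],
            "y": [r[2] for r in part],
        }


def eval_qual(chunk):
    """Evaluate sqrt((x-256)^2 + (y-128)^2) < 40 over a whole chunk.

    Returns a rowmap: one boolean per row, True where the row qualifies.
    """
    return [
        math.sqrt((x - 256) ** 2 + (y - 128) ** 2) < 40
        for x, y in zip(chunk["x"], chunk["y"])
    ]


def count_matching(rows, chunk_size=CHUNK_SIZE):
    """Count qualifying rows, overlapping chunk building with evaluation."""
    total = 0
    # The single worker thread is a stand-in for the GPU side.
    with ThreadPoolExecutor(max_workers=1) as worker:
        pending = None
        for chunk in to_chunks(rows, chunk_size):
            # Hand the new chunk off asynchronously ...
            future = worker.submit(eval_qual, chunk)
            # ... then synchronize on the chunk submitted one step earlier.
            if pending is not None:
                total += sum(pending.result())
            pending = future
        if pending is not None:
            total += sum(pending.result())
    return total


rows = [(1, 256, 128), (2, 0, 0), (3, 280, 150),
        (4, 296, 128), (5, 256, 167), (6, 256, 168)]
print(count_matching(rows))
```

With the six sample rows, three lie strictly inside the radius of 40; the rows at a distance of exactly 40 are excluded because the qualifier uses `<`. In this sketch the overlap comes only from submitting a chunk before waiting on the previous one; a real GPU implementation would also need to account for transfer cost across the PCI-E bus.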