CPU and GPU
[Figure: execution timelines of vanilla PostgreSQL (CPU only) and PostgreSQL with PG-Strom (CPU and GPU, joined at a synchronization point)]
▌vanilla PostgreSQL
  Iteration of scan tuples and evaluation of qualifiers
▌PostgreSQL with PG-Strom
  Larger “chunk” to scan the database at once
  Asynchronous memory transfer and execution
  Finishes earlier than the “Only CPU” scan
▌Legend
  Red: scan tuples from the database
  Green: execution of the qualifiers
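The contrast between the two timelines can be sketched in a few lines of Python. This is only an illustration, not PG-Strom code: a worker thread stands in for the GPU, and `CHUNK_SIZE`, `scan_chunks`, `evaluate_chunk`, `pipelined_scan` and `row_by_row_scan` are hypothetical names introduced here.

```python
import queue
import threading

CHUNK_SIZE = 4  # hypothetical value, kept tiny for illustration


def scan_chunks(rows, chunk_size=CHUNK_SIZE):
    """Yield the table in fixed-size chunks instead of one tuple at a time."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]


def evaluate_chunk(chunk, qual):
    """Stand-in for the GPU side: apply the qualifier to a whole chunk."""
    return [row for row in chunk if qual(row)]


def pipelined_scan(rows, qual, chunk_size=CHUNK_SIZE):
    """Overlap scanning (main thread) with qualifier evaluation (worker)."""
    pending = queue.Queue(maxsize=2)  # bounded: at most two chunks waiting
    results = []

    def worker():
        while True:
            chunk = pending.get()
            if chunk is None:  # sentinel: the scan has finished
                break
            results.extend(evaluate_chunk(chunk, qual))

    thread = threading.Thread(target=worker)
    thread.start()
    for chunk in scan_chunks(rows, chunk_size):
        pending.put(chunk)  # hand the chunk off and keep scanning
    pending.put(None)
    thread.join()  # synchronization point between the two sides
    return results


def row_by_row_scan(rows, qual):
    """Baseline: scan one tuple, evaluate it, then move to the next."""
    return [row for row in rows if qual(row)]
```

Both functions return the same rows in the same order; the pipelined version only changes when the qualifier work happens relative to the scan.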
[Figure: Chunk Buffer of FT1 and the shadow tables behind it]
▌Foreign Table FT1 (a, b, c, d)
▌Shadow Tables
  rowid map
  column store of A
  column store of B
  column store of C
  column store of D
  row store of FT1
▌Chunk Buffer of FT1
  rowmap, value a[], value b[], value c[], value d[], with some slots marked <not used>
▌Column store examples (entries labeled 10100, 10200, 10300)
  Table: my_schema.ft1.b.cs holds arrays such as {10.23, 7.54, 5.43, … } and {2.4, 5.6, 4.95, … }
  Table: my_schema.ft1.c.cs holds arrays such as {'2010-10-21', …}, {'2011-01-23', …} and {'2011-08-17', …}
▌Steps
  ① Transfer
  ② Calculation
  ③ Write-Back
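A rough Python sketch of the three steps on a column-oriented chunk buffer follows. It is an interpretation of the figure, not PG-Strom's actual data structure: the `ChunkBuffer` class, the helper names, and the assumption that only the columns referenced by the query are loaded (leaving the others "not used") are all introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class ChunkBuffer:
    rowmap: list   # one flag per row; True means the row still qualifies
    columns: dict  # column name -> list of values for this chunk


def load_chunk(column_store, needed, start, length):
    """(1) Transfer: copy only the referenced columns into the buffer."""
    cols = {name: column_store[name][start:start + length] for name in needed}
    n_rows = len(next(iter(cols.values()))) if cols else 0
    return ChunkBuffer(rowmap=[True] * n_rows, columns=cols)


def calculate(chunk, qual):
    """(2) Calculation: evaluate the qualifier once per row of the chunk."""
    names = list(chunk.columns)
    return [
        bool(qual(**{name: chunk.columns[name][i] for name in names}))
        for i in range(len(chunk.rowmap))
    ]


def write_back(chunk, flags):
    """(3) Write-back: fold the per-row results into the rowmap."""
    chunk.rowmap = [old and new for old, new in zip(chunk.rowmap, flags)]


# Hypothetical column store; values for b and c are taken from the figure.
store = {
    "b": [10.23, 7.54, 5.43, 2.4, 5.6, 4.95],
    "c": ["2010-10-21", "2011-01-23", "2011-08-17", None, None, None],
}
chunk = load_chunk(store, ["b"], start=0, length=3)
write_back(chunk, calculate(chunk, lambda b: b > 6))
```

After these calls `chunk.rowmap` marks which of the first three rows pass the hypothetical qualifier `b > 6`, and column `c` was never copied because the qualifier does not reference it.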
E5504 (2.0GHz/4core), GPU: Nvidia GeForce GTS450 (128 cuda core)
▌rtbl and ftbl each contain 5 million tuples, with the same values.
▌All the tuples are already in the shared buffers, so disk I/O seldom happens.

postgres=# SELECT COUNT(*) FROM rtbl WHERE sqrt((x-256)^2 + (y-128)^2) < 40;
 count
-------
 25069
(1 row)

Time: 3739.492 ms

postgres=# SELECT COUNT(*) FROM ftbl WHERE sqrt((x-256)^2 + (y-128)^2) < 40;
 count
-------
 25069
(1 row)

Time: 227.023 ms

More than 10 times faster (roughly 16x by these timings). GPU accelerated!
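For readers who want to check the numbers, the qualifier and the speedup can be reproduced in plain Python. This is a sketch only: `qual`, `count_matches` and `speedup` are hypothetical helper names, and the code says nothing about how PG-Strom compiles the expression for the GPU. Note that `^` is exponentiation in PostgreSQL but XOR in Python, so the sketch uses `**`.

```python
import math


def qual(x, y):
    """Same predicate as the WHERE clause: inside a circle of radius 40."""
    return math.sqrt((x - 256) ** 2 + (y - 128) ** 2) < 40


def count_matches(xs, ys):
    """COUNT(*) over two equally long columns of x and y values."""
    return sum(1 for x, y in zip(xs, ys) if qual(x, y))


def speedup(baseline_ms, accelerated_ms):
    """Ratio of the two reported query times."""
    return baseline_ms / accelerated_ms


# Timings copied from the psql session on this slide.
ratio = speedup(3739.492, 227.023)
```

With the two timings above, `ratio` comes out between 16.4 and 16.5, which is where the "more than 10 times" claim comes from.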
▌v9.3 development
  Writable Foreign Table
  Sort / Aggregate acceleration using GPU
  Inheritance between regular and foreign tables
▌Need your help
  Folks who can review the patches
  Folks who can provide real-life big data
  Folks who know the typical workload of analytic queries