NeuroBlade’s SPU Accelerates Velox by 10x

1 NeuroBlade's SPU Accelerates Velox
Krishna Maheshwari, Chief Product Officer
VeloxCon 2024

2 NeuroBlade - SQL Processing Unit (SPU)
✦ Purpose-built for analytics
✦ Fits every server with a free PCIe slot
✦ SW integration aligns with query engine architecture
[Diagram label: CUSTOMER ENGINE]

3 NeuroBlade’s seamless integration approach
✓ >10x Performance Advancements
✓ Modified Hive Connector
✓ Seamless Integration

4 All Table Formats supported out of the box
Table formats: ClickHouse, Hive, Iceberg, Delta Lake, Hudi
File formats: Parquet, ClickHouse, Arrow (a proprietary columnar file format can be added)
Velox Block Memory Layouts: results sent back to CPU in native layout
The SPU is a native component in the customer composable infrastructure
Supports most common table & file formats

5 Logical flow from SQL to execution
SQL: Select A From Table B Where B.col1 = 5 Join Table C Join Table D
Query physical plan: DAG of operators
Task assignment: Associate splits w/ task
Task planning: Segregate operators into SPU & CPU parts
[Diagram legend: S = Scan, F = Filter, B = Join Build, E = Exchange]
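The "Segregate operators into SPU & CPU parts" step could be sketched roughly as follows. This is a minimal illustration, not NeuroBlade's or Velox's implementation: the `SPU_CAPABLE` set and the `segregate` function are hypothetical names, and which operators the SPU can actually run is an assumption made here for the example.

```python
# Hypothetical set of operator kinds assumed to be offloadable to the SPU.
# The slide does not state which operators are supported.
SPU_CAPABLE = {"Scan", "Filter"}


def segregate(pipeline: list[str]) -> tuple[list[str], list[str]]:
    """Split a leaf-to-root operator pipeline into an SPU prefix and a CPU remainder.

    Operators are offloaded only while every operator so far is SPU-capable,
    so the SPU part is always a contiguous prefix starting at the scan.
    """
    split = 0
    for op in pipeline:
        if op not in SPU_CAPABLE:
            break
        split += 1
    return pipeline[:split], pipeline[split:]


spu_part, cpu_part = segregate(["Scan", "Filter", "JoinBuild", "Exchange"])
print("SPU:", spu_part)
print("CPU:", cpu_part)
```

Keeping the SPU part as a contiguous prefix means a single hand-off point between accelerator and CPU per pipeline, which is one simple way to read the split shown on the slide.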

6 Our Integration Approach
[Diagram: Velox / DAXL and the SPU on a compute server. Steps shown: Send Splits, Task Plan (S, F, P), Segregate Plan, Register SPU Operator, Queue For Execution, Execute Operator, Stream Results (Results Batch), Apply CPU Operators, Results. Inputs: a.parquet, q.parquet, z.parquet, …; queued work: q1, q2, q3, … qN. Other labels: SJ, ShJ, Coordinator Driven, Optimizer Driven.]
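The flow on this slide (send splits, queue the SPU operator, stream result batches back, then apply the CPU operators) might look like the sketch below. It is a toy model under stated assumptions: `spu_execute` is a hypothetical stand-in for the accelerator that returns made-up rows, and `run_task` and `project_col1` are illustrative names, not Velox or DAXL APIs.

```python
from collections import deque
from typing import Callable, Iterable, Iterator

Batch = list[dict]


def spu_execute(split: str) -> Iterator[Batch]:
    """Hypothetical stand-in for the SPU: scan + filter one split, yield result batches."""
    rows = [{"file": split, "col1": 5}, {"file": split, "col1": 7}]  # made-up data
    yield [r for r in rows if r["col1"] == 5]  # filter assumed to run on the SPU


def run_task(splits: Iterable[str], cpu_ops: list[Callable[[Batch], Batch]]) -> Iterator[Batch]:
    """Queue splits for the SPU operator, then apply CPU operators to each streamed batch."""
    queue = deque(splits)  # splits queued for execution
    while queue:
        split = queue.popleft()
        for batch in spu_execute(split):  # result batches stream back
            for op in cpu_ops:  # remaining operators run on the CPU
                batch = op(batch)
            yield batch


def project_col1(batch: Batch) -> Batch:
    """Example CPU-side operator: keep only col1."""
    return [{"col1": r["col1"]} for r in batch]


results = [row for batch in run_task(["a.parquet", "q.parquet"], [project_col1]) for row in batch]
print(results)
```

Using generators keeps the hand-off batch-at-a-time, so CPU operators can start on the first batch while later splits are still queued.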

7 Benchmarks setup – Prestissimo Clusters

                NeuroBlade Prestissimo (Accelerated)    Prestissimo Cluster (Not Accelerated)
Nodes           1 Coordinator, 4 Workers                1 Coordinator, 4 Workers
vCPU/node       64 vcores                               64 vcores
RAM/node        512GB                                   512GB
Network/node    25Gbps                                  25Gbps
SPU/worker      1                                       -

In both clusters the coordinator and the workers run on lab servers.

8 Prestissimo TPCH 10TB: Performance & Cost
[Chart: non-accelerated vs. SPU-accelerated total execution time [Sec]] 8-12x Performance Speedup
[Chart: non-accelerated vs. SPU-accelerated total execution cost [$]] 80% Cost Saving
Improving Deployment efficiency by 20-30x
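As a rough reading aid for how the speedup and cost-saving figures relate, the sketch below works through the arithmetic under one simplifying assumption that is not stated on the slide: total execution cost equals an hourly cluster rate multiplied by execution time. The function name is hypothetical, and the output is only what that assumption implies, not a reported result.

```python
def implied_hourly_cost_ratio(speedup: float, cost_saving: float) -> float:
    """Accelerated-to-baseline hourly cost ratio, ASSUMING cost = hourly rate x time.

    If the accelerated run takes 1/speedup of the time and costs (1 - cost_saving)
    of the baseline total, its hourly rate is (1 - cost_saving) * speedup times
    the baseline hourly rate.
    """
    return (1.0 - cost_saving) * speedup


for speedup in (8, 12):  # the 8-12x range from the slide
    ratio = implied_hourly_cost_ratio(speedup, 0.80)  # the 80% cost saving from the slide
    print(f"{speedup}x speedup -> {ratio:.1f}x hourly cost under this assumption")
```

The slide does not define how the 20-30x deployment efficiency figure is computed, so it is left out of this sketch.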

9 We are just getting started
• Detailed benchmark setup/results
• Ongoing software integrations
• Test it yourself
• #performance-baseline - Prestissimo / Velox
Contact us

10 Thank You


Ali LeClerc
