NeuroBlade’s SPU Accelerates Velox by 10x

NeuroBlade is collaborating with the Velox community to integrate its SPU (SQL Processing Unit), the first purpose-built acceleration solution for Velox, achieving a 10x increase in computational speed and efficiency. This session uses benchmarks to examine how the integration delivers that performance. Attendees will be introduced to NeuroBlade’s DAXL (Data Analytics Acceleration) Framework, which provides a high-level abstraction of the hardware and simplifies integration. Krishna Maheshwari will discuss the benefits of hardware-accelerated analytics and its potential to unlock new business opportunities.

Krishna Maheshwari
CPO at NeuroBlade

Ali LeClerc

April 05, 2024

Transcript

  1. NeuroBlade - SQL Processing Unit (SPU)

    ✦ Purpose-built for analytics
    ✦ Fits every server with a free PCIe slot
    ✦ SW integration aligns with query engine architecture
    Diagram label: CUSTOMER ENGINE
  2. All Table Formats supported out of the box

    The SPU is a native component in the customer composable infrastructure
    Support most common table & file formats
    Table formats: ClickHouse, Hive, Iceberg, Delta Lake, Hudi
    File formats: Proprietary columnar file format can be added
    Memory layouts: Velox Block; results sent back to CPU in native layout
    Other diagram labels: Parquet, ClickHouse, Arrow
  3. Logical flow from SQL to execution

    SQL: Select A From Table B Where B.col1 = 5 Join Table C Join Table D
    Query physical plan: DAG of operators (nodes labeled S, B, F, J, P, E)
    Task planning: Segregate operators into SPU & CPU parts
    Task assignment: Associate splits w/ task
    Operator legend: Filter, Join, Build, Exchange, Scan
  4. Our Integration Approach

    Components: Velox/DAXL, SPU, Compute Server
    Steps shown: Send Splits, Task Plan (S, F, P), Segregate Plan, Register SPU Operator, Queue For Execution, Execute Operator, Stream Results, Results Batch, Apply CPU Operators (ShJ), Results
    Inputs: a.parquet … q.parquet … z.parquet …
    Queued items: q1, q2, q3 … qN
    Other diagram labels: SPU Operator, SJ, Coordinator Driven, Optimizer Driven
  5. Benchmarks setup – Prestissimo Clusters

    Setting          NeuroBlade Prestissimo (Accelerated)   Prestissimo Cluster (Not Accelerated)
    Nodes            1 Coordinator, 4 Workers               1 Coordinator, 4 Workers
    vCPU / node      64 vcores                              64 vcores
    RAM / node       512 GB                                 512 GB
    Network / node   25 Gbps                                25 Gbps
    SPU / worker     1                                      -

    In both clusters the coordinator and the workers are lab servers.
  6. Prestissimo TPC-H 10TB: Performance & Cost

    Chart: total execution time [sec], non-accelerated vs. SPU accelerated: 8-12x performance speedup
    Chart: total execution cost [$], non-accelerated vs. SPU accelerated: 80% cost saving
    Improving deployment efficiency by 20-30x
  7. We are just getting started

    • Detailed benchmark setup/results
    • Ongoing software integrations
    • Test it yourself
    • #performance-baseline - Prestissimo / Velox
    Contact us
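
The task-planning step on the "Logical flow from SQL to execution" slide (segregate operators into SPU and CPU parts) can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the `PlanNode` class, the `SPU_CAPABLE` set, and the rule that an operator is offloaded only when all of its inputs already run on the SPU are hypothetical and are not taken from Velox, DAXL, or the talk.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical: operator kinds the accelerator is assumed to handle.
# The real SPU operator coverage is not stated in the slides.
SPU_CAPABLE = {"Scan", "Filter", "Project"}


@dataclass
class PlanNode:
    """One operator in a physical plan tree (illustrative, not a Velox type)."""
    op: str
    children: list[PlanNode] = field(default_factory=list)
    device: str = "CPU"


def assign_devices(node: PlanNode) -> str:
    """Post-order walk: a node runs on the SPU only if its operator is
    SPU-capable and every input below it was also placed on the SPU."""
    child_devices = [assign_devices(child) for child in node.children]
    offload = node.op in SPU_CAPABLE and all(d == "SPU" for d in child_devices)
    node.device = "SPU" if offload else "CPU"
    return node.device


def placement(node: PlanNode) -> list[tuple[str, str]]:
    """Flatten the tree into (operator, device) pairs in post-order."""
    pairs: list[tuple[str, str]] = []
    for child in node.children:
        pairs.extend(placement(child))
    pairs.append((node.op, node.device))
    return pairs


# Example plan: a join over a filtered scan and a plain scan.
plan = PlanNode("Join", [PlanNode("Filter", [PlanNode("Scan")]), PlanNode("Scan")])
assign_devices(plan)
for op, device in placement(plan):
    print(f"{op:8s} -> {device}")
```

With this rule, contiguous scan-side subtrees go to the accelerator and everything above the first non-offloadable operator stays on the CPU, which loosely mirrors the slide's split between SPU operators and the CPU operators applied afterwards.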