Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text

Three Mega Trends
Compute: We need data processing engines built to leverage system-level acceleration, from compute to memory, networking, and storage. We call this accelerator-native. The time is now to run distributed ETL workloads on hardware-accelerated infrastructure.
Composability: Composability lets teams augment, adapt, or replace components of their data systems to unlock new use cases. Sometimes you augment an existing component; other times you adapt it for a new paradigm. By embracing composability, you can stay proactive and flexible.
Community: Open standards allow faster innovation. We are building standards for data systems, much like the internet's protocols. We build, maintain, and contribute to core open standards and open source projects such as Arrow, Ibis, RAPIDS, Substrait, Velox, and more.

Slide 3

Slide 3 text

01 Compute
02 Community
03 Composability

Slide 4

Slide 4 text

The Wall: Spark vs. Theseus, TPC-H 10 TB Benchmark
[Chart: Total Runtime (seconds) vs. Cluster Cost per Hour ($) for Spark EMR and Theseus]
CPU performance is capped. No amount of money will jump over this wall.
Note: Theseus: 1 node, 8 x A100 80 GB. Spark: 1 node r5.8xlarge (AWS), 32 vCPU, 32 GB.

Slide 5

Slide 5 text

Scaling Datasets: TPC-H (10 TB, 30 TB, 100 TB)
✓ Up to 10 DGX servers
✓ Parquet files
✓ Remote file system
✓ Lots of spilling
✘ No sorting
✘ No indexing
✘ No caching
✘ No warm-up (cold queries)
[Chart: "The Total Wall", comparing Spark EMR (10 TB, 30 TB) with Theseus (10 TB, 30 TB, 100 TB)]
Note: Theseus: 1 node, 8 x A100 80 GB. Spark: 1 node r5.8xlarge (AWS), 32 vCPU, 32 GB.

Slide 6

Slide 6 text

Theseus
A composable, accelerated data processing engine built on GPUs.
● Scale to problems too big for Spark
● Efficient spilling out of GPU memory
● The only distributed GPU engine: 72x faster, 71x cheaper
● Linearly scale to massive data problems
● Seamlessly move from dev to prod with the same code
● Built from the ground up for accelerators
● Evolve as enterprise needs change
● Upgrade and diversify hardware by leveraging GPUs, x86, ARM, InfiniBand, RoCE, NVLink, and more
● Harness underutilized, generally available GPUs to improve economics
● Reduce data center footprint by 6x
● Confidently leverage GPUs, x86, ARM, and future hardware innovations for data preprocessing
● Process analytics and AI workflows on the same silicon
● Use multiple programming languages (Python, R, Java, Rust, C++)
● Operate on data where it is
What it's NOT: a framework, database, data lake, walled garden, monolith, SaaS, data warehouse, file system, cloud, or support service.

Slide 7

Slide 7 text

Voltron Data Theseus
A compute mesh unifying hardware, languages, and applications. Data formats: CSV, Avro, JSON, Parquet, ORC. Deployment: laptop, servers, cloud.
1. Accelerator-Native: a distributed query engine built from the ground up to take advantage of full-system hardware acceleration.
2. Petabyte Scale: focused on problems too big and too time-sensitive for Spark.
3. Composable: built on open source standards that enable interoperability from storage to application.
4. Evolutionary: a composable engine that seamlessly adapts to new hardware and languages.

Slide 8

Slide 8 text

01 Compute
02 Community
03 Composability

Slide 9

Slide 9 text

Arrow Everywhere
Modular standards, widely adopted: more than 80M monthly downloads.
● Internally, all data is Arrow
● All data is returned as Arrow
This allows users to:
● interoperate with Arrow-native libraries
● compose efficient data pipelines faster
● extend functionality for evolving needs
For more on our thinking about modularity, interoperability, composability, and extensibility, see https://voltrondata.com/codex
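Because everything is Arrow in and Arrow out, any Arrow-native tool can consume the same data directly. The sketch below is illustrative only, not Theseus code; it shows data moving between Arrow-native libraries (PyArrow, Parquet I/O, pandas) through the shared Arrow format, with made-up sample data.

import pyarrow as pa
import pyarrow.parquet as pq

# Build an Arrow table in memory (hypothetical sample data).
table = pa.table({
    "n_name": ["FRANCE", "GERMANY", "FRANCE"],
    "c_custkey": [101, 102, 103],
})

# The same Arrow data can be handed to other Arrow-native tools:
pq.write_table(table, "customers.parquet")  # Parquet I/O via Arrow
df = table.to_pandas()                       # convert for downstream pandas tooling
print(df.groupby("n_name").size())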

Slide 10

Slide 10 text

Design Freedom with Theseus: the Ibis Python API
Write in your language of choice and operate seamlessly on the same data inside more than 22 data engines.
● One analytics interface, many data engines
● No lock-in or rewrites
● Support multiple user skill sets
● No proprietary formats
● Reads and writes all common big data formats
● Fully integrates with existing data lakes
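As a hedged illustration of "one analytics interface, many data engines": the sketch below uses the Ibis Python API with the DuckDB backend. The backend choice, file name, and columns are hypothetical; the same expression could run against any other supported engine by changing only the connection.

import ibis

con = ibis.duckdb.connect()                   # swap in another backend without rewriting the query
orders = con.read_parquet("orders.parquet")   # hypothetical Parquet file

expr = (
    orders.group_by("o_orderstatus")
          .aggregate(total=orders.o_totalprice.sum())
)
print(expr.execute())                         # runs on whichever engine the connection points at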

Slide 11

Slide 11 text

01 Compute
02 Community
03 Composability

Slide 12

Slide 12 text

A Composable Data System

Slide 13

Slide 13 text

A Composable Data System ADBC

Slide 14

Slide 14 text

A Composable Data System ADBC
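One connectivity piece called out here is ADBC, the Arrow-native database connectivity standard. Below is a minimal sketch, assuming the adbc-driver-sqlite Python package is installed; the in-memory SQLite database and trivial query are only placeholders for any ADBC-capable system.

import adbc_driver_sqlite.dbapi

# Connect to an in-memory SQLite database through ADBC.
with adbc_driver_sqlite.dbapi.connect() as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT 1 AS answer")
        table = cur.fetch_arrow_table()   # results come back as an Arrow table
print(table)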

Slide 15

Slide 15 text

Theseus: Velox Backend
Hardware agnostic for greater flexibility
Why make Theseus run on CPU?
● Gives Theseus the flexibility to run on clusters without GPUs, making it easier and cheaper for users to test their workloads before scaling up
● Helps us break down system assumptions in our architectural development, ensuring the composability of our architecture while maintaining performance
● Paves the way for running CPU-only UDFs
● Opens the door to optimizations via hybrid compute: running some operations on the CPU while running others on the GPU

Slide 16

Slide 16 text

Theseus: Query Graph
A query becomes a query graph.

SELECT n.n_name, count(c.c_custkey)
FROM customer AS c
INNER JOIN nation AS n ON c.c_nationkey = n.n_nationkey
WHERE n.n_nationkey < 10
GROUP BY n.n_name
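For intuition, the same query can be written as a composable expression whose operators (scan, filter, join, aggregate) form a graph that an engine then optimizes and executes. The Ibis sketch below is illustrative only, not Theseus's internal planner; it assumes the TPC-H customer and nation tables are already registered with the connection.

import ibis
from ibis import _

con = ibis.duckdb.connect()          # hypothetical connection; any supported backend works
customer = con.table("customer")     # assumes TPC-H tables are loaded
nation = con.table("nation")

small_nations = nation.filter(nation.n_nationkey < 10)
expr = (
    customer.join(small_nations, customer.c_nationkey == small_nations.n_nationkey)
            .group_by("n_name")
            .aggregate(cnt=_.c_custkey.count())
)
# Each method call adds a node to the expression graph; expr.execute() would hand
# that graph to the connected engine.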

Slide 17

Slide 17 text

Theseus-Velox Backend: Development
● cuDF Executor: compute resources and GPU memory management
● Velox Executor: compute resources and CPU memory management
● Implemented each SQL operator using Velox
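To make the two-executor idea concrete, here is a hedged sketch of one operator implemented against a CPU path and a GPU path behind the same interface. All class and method names are hypothetical, not Theseus APIs; the CPU path uses PyArrow compute and the GPU path uses cuDF, which requires a CUDA machine.

from typing import Protocol
import pyarrow as pa
import pyarrow.compute as pc

class Executor(Protocol):
    def filter_less_than(self, table: pa.Table, column: str, value: int) -> pa.Table: ...

class CPUExecutor:
    # Stands in for a Velox-style CPU executor.
    def filter_less_than(self, table, column, value):
        return table.filter(pc.less(table[column], value))

class GPUExecutor:
    # Stands in for a cuDF-style GPU executor (needs cudf and a CUDA GPU).
    def filter_less_than(self, table, column, value):
        import cudf
        gdf = cudf.DataFrame.from_arrow(table)
        return gdf[gdf[column] < value].to_arrow()

def run_operator(executor: Executor, table: pa.Table) -> pa.Table:
    # The query graph invokes the same operator regardless of which executor is plugged in.
    return executor.filter_less_than(table, "n_nationkey", 10)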

Slide 18

Slide 18 text

Theseus - Velox Current Benchmarks

Slide 19

Slide 19 text

Theseus - Velox Current Benchmarks

Slide 20

Slide 20 text

Theseus - Velox Backend: Current Deficiencies
Each SQL operator is independent, so there is no internal pipelining.
[Diagram: Theseus pipeline (without internal pipelines) vs. Velox pipeline]

Slide 21

Slide 21 text

Theseus - Velox Backend: Current Deficiencies
Unnecessary copies and conversions result in a 50%+ performance loss. We currently need to convert between Velox RowVectors and Arrow tables:
● Primitive types -> zero copy
● String columns -> copy
● Columns with dictionary encoding -> flatten (copy)
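To illustrate the last point (not the actual Theseus/Velox bridge code), the PyArrow sketch below dictionary-encodes a string column and then flattens it back; the flatten step materializes every value again, which is the extra copy referred to above.

import pyarrow as pa
import pyarrow.compute as pc

flat = pa.array(["FRANCE", "GERMANY", "FRANCE", "FRANCE"])
encoded = pc.dictionary_encode(flat)   # indices plus a small dictionary of unique values

# A consumer that needs a flat layout has to re-materialize every value (a copy).
flattened = encoded.cast(pa.string())
print(encoded.type, "->", flattened.type)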

Slide 22

Slide 22 text

Apache Arrow - Velox Alignment
Key Accomplishments & Plan (timeline: Jun ’22, Dec ’23, Jan ’24, Mar ’24)
● Meta / VoDa Partnership: announced partnership (see blog) with the goal of aligning and converging Apache Arrow with Velox; drive alignment in the Apache Arrow and Velox OSS communities for seamless interoperability at zero cost
● Prior Arrow Releases: 3 new format layouts developed (StringView, ListView & Run-End Encoding); adding the new layouts to PyArrow & DuckDB is in progress (some of this code has already merged)
● Arrow 15 Release: Velox & Apache Arrow unified on composable data management systems design (see the blog); Apache Arrow ListView and StringView support in the Velox bridge is coming soon
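As a small, hedged illustration of one of the new layouts, the sketch below run-end encodes an Arrow array and decodes it back. It assumes a reasonably recent pyarrow (run-end encoding support landed in Arrow 12; StringView and ListView arrived in later releases) and is independent of the Velox bridge work described above.

import pyarrow as pa
import pyarrow.compute as pc

values = pa.array(["a", "a", "a", "b", "b", "c"])
ree = pc.run_end_encode(values)    # stores run ends plus one value per run
print(ree.type)                    # run-end-encoded layout over string values
print(pc.run_end_decode(ree))      # round-trips back to the flat layout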

Slide 23

Slide 23 text

We are hiring!

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

Theseus - Infrastructure Recommendations
What would an ideal data system look like?

Minimum Hardware
● NVIDIA DGX V100 (32 GB GPUs) servers
● 64 GB RAM per NVIDIA GPU
● 100 Gbps Ethernet networking, 1 NIC shared between 2 GPUs

Better Hardware
● NVIDIA DGX A100 (40 GB GPUs) servers
● 80 GB RAM per NVIDIA GPU
● 100 Gbps RoCE networking, 1 NIC per GPU
● RoCE network-attached storage

Ideal Hardware
● NVIDIA DGX H100/A100 80 GB servers
● 160 GB RAM per NVIDIA GPU
● 200+ Gbps InfiniBand networking
● InfiniBand network-attached storage