Slide 1

Slide 1 text

Introduction to WEKA

Slide 2

Slide 2 text

The AI-Native Data Platform

Slide 3

Slide 3 text

OVER 275 Customers | INCLUDING 11 of the Fortune 50* | OVER $275M Capital Raised | 100% YoY Growth | Backed by World-Class Investors | OVER 350 Employees

Slide 4

Slide 4 text

Software Business, Unique RTM | Cloud: AVAILABLE IN ALL HYPERSCALER MARKETPLACES | On-Premises: RESELLER (PRICE LISTED, COMPED & QUOTA’D), OEM, MEET IN THE CHANNEL

Slide 5

Slide 5 text

Our Long-Term Vision: A Single, Highly Performant, Scalable Software Platform Solution for GenAI, Hybrid Cloud & Edge | Workloads: Containers & Microservices, AI Data Pipelines, Tier 1 Applications (ERP & CRM), Data Lakes & Data Warehouses, Scientific Computing | WEKA Data Platform: Ubiquitous Storage & Data Services across Hybrid Cloud & Edge | Locations: Datacenter Core, Multi-Cloud, Near-Edge

Slide 6

Slide 6 text

The Infrastructure Triangle Compute Network Storage

Slide 7

Slide 7 text

The Infrastructure Triangle: “Vintage” Storage | 400/800 Gigabit | 1000x Faster

Slide 8

Slide 8 text

Top technical inhibitors to AI/ML success… Compute Performance: 20% | Security: 26% | Data Management: 32%

Slide 9

Slide 9 text

AI Requires a Radical New Approach to Data Management: Hybrid Cloud | Data Pipelines | Data Liquidity | Unified Access

Slide 10

Slide 10 text

From Data Silos To Data Pipelines: the key to successfully deploying AI at scale. Ingest → DATA COPY → Pre-Process → DATA COPY → Train → DATA COPY → Validate → DATA COPY → Infer → DATA COPY → Archive

Slide 11

Slide 11 text

The WEKA Data Platform: 100x Performance | 14 Exabyte Scale | 25% Cost. Pipeline: Ingest → Pre-Process → Train → Validate → Infer → Archive. Protocols: GPU Direct, NFS, POSIX, S3, SMB, HDFS. Zero Copy Architecture | Zero Tuning IO Algorithms

Slide 12

Slide 12 text

Four Pillars of a Data Platform | Multi-Workload: effectively leverage data across the organization; eliminate data silos to accelerate innovation | Multi-Performant: massive ingest bandwidth, mixed read/write handling, ultra-low latency | Multi-Location: across on-premises, cloud, or hybrid environments; easy data mobility between locations | Multi-Scale: easily scale along with the project, up and down, elastically, without disruption or degradation | Data Sources → Data Platform

Slide 13

Slide 13 text

One Architecture Delivers on FOUR Promises | Mindbending Speed: unbeatable file and object performance; no tuning required | Effortless Sustainability: reduce carbon emissions; cut data pipeline idle time; extend the usable life of your hardware; move workloads to the cloud | Seductive Simplicity: eliminates storage silos across on-prem and cloud; a single, easy-to-use data platform for the whole pipeline | Infinite Scale: linear scale; scale compute and storage independently; trillions of files of all data types and sizes

Slide 14

Slide 14 text

Pipelines (CV/AI, NLP, Sequencing, Generative AI, HPDA) vary in IOPS/bandwidth, file size, read/write mix, and number of files. Ingest → Pre-Process → Train → Validate → Infer → Archive. Protocols: GPU Direct, NFS, POSIX, S3, SMB, HDFS. Zero Copy Architecture | Zero Tuning IO Algorithms

Slide 15

Slide 15 text

Designed to Run Concurrent Workloads: runs diverse concurrent workloads | industry-leading metadata performance | operational efficiency when storing billions of small files | read and write performance on small and large file operations

Slide 16

Slide 16 text

How Much Faster is WEKA at AI & Deep Learning? 1PB WEKA vs 1PB Other: 88x More Read IOPS | 14x More Write Bandwidth | 7x More Read Bandwidth | 34x More Write IOPS. IOPS at 1PB of storage (more IOPS = faster training): WEKA Data Platform vs Traditional Parallel FS (DDN Lustre AI400X2, IBM Spectrum Scale) and All-Flash NAS & Cloud-Native Storage (Pure FlashBlade, NetApp AFF800, VAST Data). Source: WEKA comparisons based on publicly available data from other vendors

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

And We Guarantee It: 2x Performance vs All-Flash NAS | ½ Price Cloud Guarantee vs Market

Slide 19

Slide 19 text

Top Use Cases: AI/ML | Generative AI | GPU Cloud | Financial Services & Trading | Pharma, Health & Life Sciences | Media & Entertainment

Slide 20

Slide 20 text

WEKA Customers

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

160,000 sq ft LED Display | 4x 16K Video Displays | 1.5PB Data | 412 GB/s Video Streaming Data | 16K Video Resolution | 167,000 Independent Audio Channels

Slide 24

Slide 24 text

Pushing the Frontier of Generative AI: Rethink Infrastructure | >1EB in a Single Dataset | Multiple Models

Slide 25

Slide 25 text

Pushing the Frontier of Generative AI: combining LLMs, text, images, video, and audio into immersive generative experiences. CPU Cores + NVMe SSDs + WEKAFS Global Filesystem | Saturate Existing GPU Resources | Single High-Performance Pool for Apps and Data | Lower Costs and Energy | 80% GPU Utilization | 40x Data Performance

Slide 26

Slide 26 text

AI for Faster Drug Discovery: Cloud-Based GPU Data Pipelines | Data Volumes of 30M Files | Recognition of 4,000 Targets

Slide 27

Slide 27 text

Solving the Challenge of Sluggish Data: 3x faster small-file access | 12x faster epoch times | 2x faster model training. “WEKA running in Cloud is game-changing for us. We’re able to run experiments in less than a week instead of three months or more.” Jon Sorenson, PhD, VP of Technology Development, Atomwise

Slide 28

Slide 28 text

GPU Clouds

Slide 29

Slide 29 text

GPU Cloud: “WEKA is the only data platform that delivers the speed, scale, and simplicity required by large-scale GPU clouds.” Cory Hawkvelt, CTO, NexGen Cloud

Slide 30

Slide 30 text

GPU Cloud: “...One of the things I think WEKA excels at is they deliver really, really quick data to those GPUs so that they can be utilized to the max.” Mike Maniscalco, CTO, Applied Digital

Slide 31

Slide 31 text

Confidential Embargo Notice: The following two sections of this presentation are under embargo and strictly confidential until made public. Embargo dates are marked throughout or will be confirmed by WEKA later next week.

Slide 32

Slide 32 text

IT Press Tour Exclusive: Performance Benchmarks Confidential: Under Embargo Until March 14.

Slide 33

Slide 33 text

Performance overview
• WEKA’s “zero-tuning” architecture is highly performant across all types of IO: massive parallelization of all data and metadata operations; use of kernel-bypass technologies; patented data layout, data protection, and more
• Software-defined, so it can run on-prem or in the cloud: flexibility to use any hardware (physical or virtual) and the latest clients; rapid adoption of new hardware as needed
• Performance benchmarking strategy: use published, audited, repeatable, and transparent benchmarks to show performance leadership; performance may be a raw number, a synthetic metric ($/IOP, jobs/TB, cores per GB/s, etc.), efficiency, and more; mix on-prem with cloud benchmarks to showcase flexibility
Confidential: Under Embargo Until March 14.

Slide 34

Slide 34 text

Results
Benchmark: SPEC_ai_image | Infrastructure: Azure, 64x ls8v3 backends and 16x d64sv3 clients | Target: show efficiency in Azure for AI workloads | Results: beat Qumulo’s SPEC_ai run by 175% in raw performance at only 64% of the infrastructure cost. With a latency factor applied, WEKA can effectively do 2.5x the number of jobs in the time Qumulo takes, and our effective cost per job is only 25% of Qumulo’s.
Benchmark: SPEC_ai_image | Infrastructure: AWS, 40x i3en.24xlarge backends and 40x c5n.18xlarge clients | Target: dominate the Qumulo result; #1 position in category | Results: 6x higher load count than Qumulo. Infrastructure costs impacted the effective cost per job, but it is still only 76% of Qumulo’s. Big or small, in multiple clouds, WEKA is faster and has a better cost per job than the competitor. #1 result.
Benchmark: SPEC_eda_blended | Infrastructure: AWS, 40x i3en.24xlarge backends and 40x c5n.18xlarge clients | Target: beat NetApp’s on-prem result; #1 position in category | Results: NetApp: 6300 jobs at 1.39 ms ORT; WEKA in AWS: 6310 jobs at 0.87 ms ORT. WEKA’s response time is 60% faster, in the cloud, against the fastest 8-node on-prem system NetApp has (A900 NVMe). The effective result is that WEKA can process over 10,000 jobs in the time NetApp does 6300. #1 result.
Confidential: Under Embargo Until March 14.

Slide 35

Slide 35 text

Results
Benchmark: SPEC_vda | Infrastructure: AWS, 40x i3en.24xlarge backends and 40x c5n.18xlarge clients | Target: #1 position in category | Results: WEKA still owns the #1 spot from two years ago (8,000 streams); we beat it with 12,000 streams. On-prem or in the cloud, WEKA is the highest-performing video platform around. #1 result.
Benchmark: SPEC_genomics | Infrastructure: AWS, 40x i3en.24xlarge backends and 40x c5n.18xlarge clients | Target: #1 position in category | Results: no direct competitor in the category, so we take over the #1 spot from a niche DAS player (UBIX technology). 2,200 jobs achieved for the #1 result.
Benchmark: SPEC_swbuild | Infrastructure: AWS, 40x i3en.24xlarge backends and 40x c5n.18xlarge clients | Target: beat NetApp; #1 position in category | Results: raw, WEKA achieved 3,500 builds with an ORT of 0.74 ms, so #2 overall. But NetApp’s 8-node A900 NVMe system did 6,120 builds at 1.58 ms ORT; WEKA’s roughly half latency means an effective 7,472 builds in the same time, so we have the effective #1 result.
Confidential: Under Embargo Until March 14.
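The “effective” job counts quoted on these results slides follow from normalizing the raw count by the ratio of overall response times (ORT): jobs completed in the time the higher-latency system takes. A minimal sketch of that arithmetic, using the numbers on the slides (the helper name is ours, not a WEKA tool):

```python
def effective_jobs(raw_jobs, ort_ms, competitor_ort_ms):
    """Jobs completed in the time the higher-latency system takes,
    assuming per-job time scales with overall response time (ORT)."""
    return raw_jobs * competitor_ort_ms / ort_ms

# SPEC_swbuild: WEKA 3500 builds at 0.74 ms vs NetApp 6120 builds at 1.58 ms
print(int(effective_jobs(3500, 0.74, 1.58)))   # 7472, vs NetApp's 6120

# SPEC_eda_blended: WEKA 6310 jobs at 0.87 ms vs NetApp 6300 jobs at 1.39 ms
print(int(effective_jobs(6310, 0.87, 1.39)))   # over 10,000, vs NetApp's 6300
```

This is why a #2 raw number (3,500 builds) can still be claimed as an effective #1 once latency is factored in.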

Slide 36

Slide 36 text

Performance Summary
• Effective cost leadership in the cloud for AI workloads; ability to scale up for AI workloads as needed
• Overall performance against ANY workload with no ongoing tuning (placement groups, number of FEs)
• Metadata, mixed-IO, and latency performance matter: it’s not just about throughput
• Cloud performance that can beat on-prem
• #1 in all SPEC categories, whether by raw number or effective number
• Futures: STAC-M3, MLPerf, STAC-ML, IO-500, and others; a mix of cloud and on-prem benchmarking including the WEKApod base config; transparent, publicly documented, and repeatable synthetic results (fio, elbencho, vdbench, etc.)
Confidential: Under Embargo Until March 14.
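Repeatable synthetic results of the fio kind mentioned above are typically captured as a job file that anyone can rerun. A minimal illustrative sketch follows; all parameter values and the mount path are our assumptions, not WEKA’s published settings:

```ini
; Hypothetical fio job: 4K random reads at high queue depth.
; direct=1 bypasses the page cache so runs are repeatable.
[global]
ioengine=libaio
direct=1
time_based=1
runtime=60
group_reporting=1

[randread-4k]
rw=randread
bs=4k
iodepth=32
numjobs=8
size=10G
directory=/mnt/weka/bench
```

Publishing the job file alongside the result is what makes a synthetic benchmark transparent and repeatable.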

Slide 37

Slide 37 text

WEKA & NVIDIA Partnership News Confidential: Under Embargo Until Made Public (WEKA TO CONFIRM)

Slide 38

Slide 38 text

2019: NVIDIA invests in WEKA Series C round | 2020: Implement and qualify NVIDIA GPUDirect® Storage (GDS) | 2021: One of the first NVIDIA DGX BasePOD-certified datastores | 2023: Reference architecture for NVIDIA DGX BasePOD with DGX H100 systems | 2024: WEKApod certified for DGX SuperPOD with DGX H100 systems. Confidential: Under Embargo Until Made Public (WEKA TO CONFIRM)

Slide 39

Slide 39 text

Meet WEKApod for NVIDIA DGX SuperPOD™ Systems Confidential: Under Embargo Until Made Public (WEKA TO CONFIRM)

Slide 40

Slide 40 text

The WEKApod: Clustered storage appliance for the WEKA Data Platform | Certified for NVIDIA DGX SuperPOD™ Systems | Integrated with NVIDIA Base Command Manager | From 8 to hundreds of nodes | Simplifies the WEKA experience. Confidential: Under Embargo Until Made Public (WEKA TO CONFIRM)

Slide 41

Slide 41 text

Best-in-Class Performance & Capacity | 8-node WEKApod Base Configuration: 1PB* usable, 765 GB/s, 18.3M IOPS | 4-node Expansion: +0.5PB*, +382 GB/s, +9.1M IOPS. *Usable capacity with 5+2 striping and 1 virtual hot spare. Confidential: Under Embargo Until Made Public (WEKA TO CONFIRM)

Slide 42

Slide 42 text

…and Performance Density: 1 Rack Unit | >95 GB/s Read Bandwidth | >23 GB/s Write Bandwidth | 2.3M IOPS. Confidential: Under Embargo Until Made Public (WEKA TO CONFIRM)

Slide 43

Slide 43 text

The World’s Fastest and Most Sustainable Data Infrastructure for AI Confidential: Under Embargo Until Made Public (WEKA TO CONFIRM)

Slide 44

Slide 44 text

Read Performance Efficiency: 54.5 kW draw, 1.0 TB/s read bandwidth, 4.8M IOPS in 2 racks (80 rack units) vs 7.0 kW draw, 1.0 TB/s read bandwidth, 25.0M IOPS in 1/4 rack (11 rack units). Same Bandwidth | 5x More IOPS | 7x Less Rack Space | 8.4x Less Power Draw. Confidential: Under Embargo Until Made Public (WEKA TO CONFIRM)

Slide 45

Slide 45 text

Write Performance Efficiency: 327 kW draw, 1.0 TB/s write bandwidth, 28.8M IOPS in 10 racks (420 rack units) vs 25.4 kW draw, 1.0 TB/s write bandwidth, 91.3M IOPS in 1 rack (40 rack units). Same Bandwidth | More IOPS | 10x Less Rack Space | 12x Less Power Draw. Confidential: Under Embargo Until Made Public (WEKA TO CONFIRM)

Slide 46

Slide 46 text

Checkmate on Checkpoints: Fit, Eval, Checkpoint → Fit, Eval, Checkpoint → … → Checkpoint, Save, Deploy. Confidential: Under Embargo Until Made Public (WEKA TO CONFIRM)
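The loop on this slide (fit, eval, checkpoint, repeated until a final save and deploy) can be sketched as follows. The function and file names are illustrative stand-ins, not WEKA or NVIDIA APIs; the point is that the checkpoint write is the storage-bound step that stalls training while it completes:

```python
import json
import pathlib
import tempfile

def train(num_epochs, checkpoint_dir):
    ckpt_path = pathlib.Path(checkpoint_dir)
    ckpt_path.mkdir(parents=True, exist_ok=True)
    state = {"epoch": 0, "metric": 0.0}
    for epoch in range(1, num_epochs + 1):
        # fit: one pass over the training data (stand-in state update)
        state["epoch"] = epoch
        # eval: score the current model (stand-in metric)
        state["metric"] = epoch / num_epochs
        # checkpoint: persist full state so training can resume here;
        # datastore write bandwidth gates how long this step takes
        (ckpt_path / f"ckpt_{epoch:04d}.json").write_text(json.dumps(state))
    # save & deploy: the final checkpoint becomes the deployable artifact
    (ckpt_path / "final.json").write_text(json.dumps(state))
    return state

ckpt_dir = tempfile.mkdtemp()
result = train(3, ckpt_dir)
```

At real model scale the per-epoch state is gigabytes to terabytes rather than a small JSON file, which is why checkpoint frequency is often a compromise between GPU idle time and recovery cost.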

Slide 47

Slide 47 text

Ultimate Choice of Deployment | NVIDIA DGX SuperPOD: Sustainable AI at Scale, World’s Fastest AI Infrastructure | The WEKA Data Platform on WEKApod: performant infrastructure for highly parallelized workloads; certified high-performance datastore; better power efficiency and GPU efficiency | On-Premises, Cloud, Hybrid, GPU Cloud, and Turnkey Solutions. Confidential: Under Embargo Until Made Public (WEKA TO CONFIRM)

Slide 48

Slide 48 text

Thank You! @wekaio /wekaio @wekaio