[Figure: two panels, "Low Latency That Users Demand" and "Low Cost the Market Requires," comparing ~100 person-days to deploy, ~10-30 Joules per token, and time to complete* (chart values: 1, 1-3, ~5, 10).]

*Figures are based on a representative performance comparison: 514-token input and 2014-token output with LLaMA 2 70B (int8), with the GPU running in a low-latency mode (batch size 8). See, e.g., https://www.databricks.com/blog/llm-inference-performance-engineering-best-practices
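For scale, a quick back-of-the-envelope check of the energy figure above: at the stated ~10-30 J/token, the 2014-token response in the benchmark scenario works out to roughly 20-60 kJ. The sketch below assumes a 700 W GPU board power, which is an illustrative figure (typical of a high-end accelerator's TDP) and not taken from the source.

```python
# Back-of-the-envelope energy check for the benchmark scenario above.
# The 10-30 J/token range and the 2014-token output come from the figure;
# the 700 W GPU power draw is an assumed value for illustration only.

OUTPUT_TOKENS = 2014            # output length from the benchmark scenario
ENERGY_PER_TOKEN_J = (10, 30)   # Joules per token, from the figure
GPU_POWER_W = 700               # assumed GPU board power (hypothetical)

for j_per_token in ENERGY_PER_TOKEN_J:
    total_joules = j_per_token * OUTPUT_TOKENS
    gpu_seconds = total_joules / GPU_POWER_W
    print(f"{j_per_token:>2} J/token -> {total_joules / 1000:.0f} kJ total, "
          f"~{gpu_seconds:.0f} s of GPU time at {GPU_POWER_W} W")
```

At the low end this is about 20 kJ (~29 s of GPU time at the assumed power draw) per response; at the high end, about 60 kJ (~86 s), which gives a rough sense of why per-token energy matters at deployment scale.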