
Hopsworks - IT Press Tour #57 Sep. 2024


The IT Press Tour

September 03, 2024



Transcript

  1. Hopsworks Origin Story
    Hopsworks was founded in 2018 by a research team from KTH, RISE and MySQL. Considered the pioneering platform for data and AI, Hopsworks integrates data science, data engineering and machine learning into a cohesive platform: an AI Lakehouse.
    Raised €12.25m over 3 funding rounds.
  2. World Class Team
    Team: 36. Offices: ⭐ Stockholm, London, Silicon Valley.
    Technical Leadership
    - Jim Dowling, CEO and Co-Founder. Previously at MySQL and a KTH professor. Feature store community leader and O'Reilly author.
    - Mikael Ronström, VP Data. Inventor of NDB Cluster (MySQL/Oracle). World-renowned database expert.
    - Fabio Buso, VP Engineering. Co-founder and engineering lead. Community speaker.
    Business Leadership
    - Lars Nordwall, Board Chair. Former President and COO of Neo4j; 20+ years in Silicon Valley.
    - Darryl Salas, VP Americas. Former leader at Neo4j, with a legal and technical background.
    - Rik Van Bruggen, VP EMEA. Former VP at Neo4j; community leader in AI.
  3. TREND
    Analytical data is moving to the Lakehouse for lower cost and less vendor lock-in.
    DISAGGREGATION OF THE ANALYTICAL DATA STACK
    The Lakehouse opens up the data stack, separating query engines from storage.
    PROBLEM WITH AI
    However, real-time AI and Python are second-class citizens in the Lakehouse.
  4. VISION
    Better models in production, faster, through a unified data layer for AI.
    MISSION
    Build the highest-performance, open and modular AI Lakehouse that will power all classes of AI systems.
  5. // What Do We Solve: The Real AI Challenges
    - Hard to operationalise: data sources are disparate, teams are siloed, systems are not unified and frameworks are not compatible.
    - Tough to extract value from: 48% of models never make it to production [Gartner 24].
    - Slow to build your own AI platform: an estimated 12 to 24 months to build an AI platform, even with state-of-the-art open-source tooling.
    - Expensive to maintain: FTE costs in the hundreds of thousands of USD per year, cloud costs in the millions, timelines in years.
  6. // Lakehouse and MLOps Are Not "Connected"
    AI Assets: AI Pipelines & AI Apps (Fine-Tuning, RAG, Sklearn, PyTorch, Polars, Pandas)
    Lakehouse:
    - Data Integration (Fivetran, Airbyte, etc.)
    - BI Tools (Tableau, Looker, etc.)
    - Query Engines: Event Bus (Kafka, Kinesis, Redpanda, etc.), Streaming Engine (Flink, Spark, Feldera, etc.), SQL Query Engine (Spark, DuckDB, BigQuery, Trino, Snowflake, etc.)
    - Catalog (Hive, Unity Catalog, Polaris, Iceberg REST API)
    - Table Format (Delta, Iceberg, Hudi)
    - Storage (S3, ADLS, etc.)
    MLOps Platforms, DISCONNECTED from the Lakehouse: Monitoring, Feature Serving & Registry, Model Serving & Registry, Vector Index
  7. // The Hopsworks AI Lakehouse
    AI Pipelines: Fine-Tuning, RAG/Agentic, Sklearn, PyTorch, Polars, Pandas
    Engines:
    - Event Bus (Kafka)
    - Streaming Engine (Flink, Spark Streaming, Feldera, etc.)
    - SQL / Batch Query Engine (Spark, DuckDB, Polars, Pandas, etc.)
    - AI Query Engine (Hopsworks Query Service)
    Data Integration (Fivetran, Airbyte, etc.) and BI Tools (Tableau, Looker, etc.)
    Catalog (Hive, Hopsworks)
    Table Format: Delta, Hudi (Iceberg coming soon)
    Storage (S3)
    AI Infrastructure Services: Feature Serving DB, Model Serving, Vector Index, Unstructured Datasets
    Unified Data/Model Monitoring
  8. // What Do We Solve
    [Chart: effort required over time (3 months to 2 years), comparing building your own platform with Hopsworks. Build: 1st model in production at ≈ 500k - 1m USD. Hopsworks: 1st model in production, then more models in production producing value; predictable cost and output with better performance.]
  9. // Use Cases
    Models and pipelines in production within a few weeks, reaching ROI faster than any other platform.
    Use cases: LLMs, Fraud, Predictive Analytics, Customer Service, RecSys.
    Any source: Data Warehouse, Databases, Real-time, Documents, Vector DB, Graph DB.
    Flow: Data Input → Transform → Model Train → Model Inference → Prediction.
  10. The Leading Real-Time AI Platform: High Performance & Unbreakable
    Extreme performance at any scale and six 9s of availability.
    The only feature store with cross-region replication.
    - Online read: 1 ms latency
    - Offline read: highest throughput
    - Offline write: highest throughput
    Compared to competitors: 7x, 9x*, 12x* (see the linked benchmark, Tables 3, 1 and 2).
    *Published at SIGMOD VLDB 2024
  11. // Case Studies
    “We have improved the productivity of our Data Scientists by a factor of 50% with Hopsworks, who have been a great partner bringing AI to production at Allstate.”
    “We train and run large language models in production thanks to Hopsworks. LLMs analyze CVs and advertisements for vulgarity and bias. Thanks to Hopsworks, we are the only AI-powered Government Body in Sweden.”
    “We have seen 3-4x productivity gain for our Data Scientists and an ROI within 6 Months of using Hopsworks.”
  12. Hopsworks Widely Adopted
    - Cancer Research: Karolinska Institute is one of the world's foremost medical universities.
    - Financial Institution: ~1.2 million members and 120 offices worldwide.
    - Recruitment: ~700,000 new job seekers per year; the largest public employment service in Sweden.
    - Banking: the largest bank in Ecuador; fighting fraud and money laundering with AI.
    - Human Health & Exposome: HEAP provides an open-access technical research platform to assess the impact of the exposome on human health.
    - Sports Betting: part of Flutter Entertainment, the 3rd biggest sports betting company in the world; annual turnover ~$7Bn.
    SERVERLESS / COMMUNITY: 6,000+ global users
    Hopsworks Serverless is the free, managed version of Hopsworks that lets users build ML-powered apps for free. Users can start with Hopsworks Serverless without any infrastructure costs or management and deploy their models to production in minutes.
    Trusted by data scientists from [logos] & more.
  13. Supporting LLMs with Lakehouse Data
    Lakehouse: Engines (Streaming Engine, SQL / Batch Query Engine), Catalog, Table Format, Storage.
    - Fine-Tuning (FT): Lakehouse data → instruction dataset → Large Language Model (LLM).
    - Retrieval Augmented Generation (RAG): create vector embeddings from Lakehouse data using a vector embedding model and store them in a vector index; the LLM retrieves context via ANN search.
    - Function calling: the LLM queries Lakehouse data.
  14. // Roadmap
    - Open Source: self-service version of Hopsworks. No security, no user management, no high availability, limited resource management. Aimed at internal teams and side projects. No SLA.
    - Serverless: managed Hopsworks, no compute (possible future paid options). Data is on Hopsworks infrastructure. Medium SLA.
    - Enterprise: fully featured edition of Hopsworks with high SLA levels and support. Always first to access new releases and features; the only edition with infrastructure management tooling and support.
  15. // Growth Path
    [Chart: growth path over 12 months from Community (Open Source) to Enterprise (PAID): 1 model pipeline → real-time & >10 pipelines → scalability & >10k pipelines; capabilities grow from ingestion and storage to feature engineering infrastructure to the full ML lifecycle.]
    Market Penetration Through Value
    The machine learning segment relies heavily on open source. Our Kubernetes release will fast-track conversations with more enterprise customers, letting teams bypass procurement and team politics to start building on Hopsworks OS and generate value faster.
  16. // Growth Strategy
    Enterprise
    - Experienced GtM: leveraging our GtM team's extensive experience in enterprise sales, we are building relationships with major enterprises worldwide.
    - Thought Leadership: enterprise webinars & events, exclusive roundtables.
    - Resource & Knowledge Sharing: use cases, blueprints, whitepapers, case studies with major clients.
    - Targeted Outreach: automated and account-based outreach, tailored demos.
    Community
    - Solid Community: by participating in tens of communities and building our own base of Serverless and OSS users, we can leverage the user base as a driver for growth.
    - Grassroots Engagement: practitioners, blogs, tutorials, user forums, Q&A sessions.
    - Content & Collaboration: articles & content, integration with third-party tools, open-source contributions.
    - Product Accessibility: Kubernetes & Serverless Hopsworks.
  17. // Subscription-based Pricing (Cloud or On-Premises)
    https://ai-infrastructure.org/wp-content/uploads/2024/03/The-State-of-AI-Infrastructure-at-Scale-2024.pdf
    Dev/Staging (GCP VM types), uptime 40%:
    - Head Node: 1 × n2-standard-8
    - Workers: 1 × n2-standard-4
    - RonDB: 1 × e2-standard-4
    HA Production (Online), uptime 100%:
    - Head Node: 1 × n2-standard-8
    - Workers: 1 × n2-standard-8
    - RonDB: 2 × e2-highmem-8
    - RonDB Query Brokers: 2 × n2-standard-4
    - Model Serving Containers: 10 × e2-highmem-8
  18. // Support Model
    Hopsworks Support Subscription: support level, capabilities and response times.
    - Tier-3 Community Support: community forum, documentation, code examples, educational events. Response time ~1 week.
    - Tier-2 Standard Support: ticketing system, support team access, business-hours cover, certified deployment, break-fix scope, best practices & software validation, 3 x support contacts. Response time <4 hours.
    - Tier-1 Enterprise Support: dedicated, expert support with a named technical support engineer, 24x7x365 production cover, proactive support, consultative scope, customer success manager, 6 x support contacts. Response time <1 hour.
    Also listed: technical advisory, professional services, training.
  19. Building Machine Learning Systems with a Feature Store: Batch, Real-Time, and LLM Systems
    Download the first two chapters for free! hopsworks.ai/lp/oreilly-book-building-ml-systems-with-a-feature-store
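Slide 9 describes a generic flow from data input through a transform step, model training and model inference to a prediction. The Python sketch below walks a toy fraud use case through those stages. The transactions, the ratio feature and the midpoint-threshold "model" are hypothetical stand-ins chosen for brevity; they are not Hopsworks APIs or the models its customers run.

```python
from statistics import mean

# Hypothetical raw transactions (illustrative data, not from the deck):
# (account_id, amount, is_fraud_label)
RAW = [
    ("a1", 20.0, False), ("a1", 25.0, False), ("a1", 900.0, True),
    ("a2", 50.0, False), ("a2", 55.0, False), ("a2", 1200.0, True),
]


def transform(raw):
    """Transform step: express each amount as a ratio to its account's mean amount."""
    amounts_by_account = {}
    for account, amount, _ in raw:
        amounts_by_account.setdefault(account, []).append(amount)
    means = {acct: mean(vals) for acct, vals in amounts_by_account.items()}
    return [amount / means[account] for account, amount, _ in raw]


def train(features, labels):
    """Training step: a toy threshold placed midway between the two class means."""
    fraud = [f for f, y in zip(features, labels) if y]
    legit = [f for f, y in zip(features, labels) if not y]
    return (mean(fraud) + mean(legit)) / 2


def infer(threshold, feature):
    """Inference step: flag a transaction whose ratio exceeds the learned threshold."""
    return feature > threshold


# Data input -> transform -> train -> inference -> prediction
features = transform(RAW)
labels = [label for _, _, label in RAW]
threshold = train(features, labels)
predictions = [infer(threshold, f) for f in features]
```

Keeping transform, training and inference as separate functions mirrors the separate stages on the slide: each stage could run on its own schedule, with the transformed features shared between training and inference.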
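Slide 10 claims "six 9s" of availability, i.e. 99.9999%. With n nines the unavailable fraction is 10^-n, so the permitted downtime is that fraction of the period. The helper names below are illustrative, and a 365-day year is assumed.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a 365-day year (31,536,000 s)


def availability(nines: int) -> float:
    """Availability as a fraction for a given number of nines, e.g. 6 -> 0.999999."""
    return 1 - 10 ** -nines


def max_downtime_seconds(nines: int) -> float:
    """Maximum yearly downtime: the unavailable fraction 10**-nines times one year."""
    return SECONDS_PER_YEAR / 10 ** nines


for n in (5, 6):
    print(f"{n} nines: {max_downtime_seconds(n)} s of downtime per year")
```

Six 9s therefore allows about 31.5 seconds of downtime per year, compared with about 5.3 minutes (315.36 s) for five 9s.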
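Slide 13 describes RAG over Lakehouse data: rows are turned into vector embeddings by an embedding model, stored in a vector index, and the closest matches are retrieved as context for the LLM. The sketch below shows that retrieval loop under loud assumptions: the rows are hypothetical, a bag-of-words counter stands in for a real embedding model, exact cosine search replaces ANN search, and no LLM is called (the function only builds the prompt).

```python
import math
from collections import Counter

# Hypothetical rows read from a Lakehouse table (illustrative data only).
ROWS = [
    "Customer reported a failed card payment at checkout",
    "Fraud alert triggered by unusual transaction volume",
    "Customer asked how to reset the account password",
]


def embed(text: str) -> Counter:
    """Toy stand-in for a vector embedding model: lowercase term counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Build the "vector index" once, ahead of query time.
INDEX = [(row, embed(row)) for row in ROWS]


def search(query: str, k: int = 1) -> list:
    """Exact nearest-neighbour search; a real vector index would use ANN instead."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda item: cosine(q, item[1]), reverse=True)
    return [row for row, _ in ranked[:k]]


def build_prompt(question: str, k: int = 1) -> str:
    """Assemble retrieved rows as context for an LLM prompt."""
    context = "\n".join(search(question, k))
    return f"Answer using this context:\n{context}\n\nQuestion: {question}"


print(build_prompt("how do I reset my password"))
```

Swapping `embed` for a real embedding model and `search` for an ANN index changes quality and scale, but not the shape of the loop: embed offline, retrieve at query time, then prompt.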
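Slide 17 lists the GCP node groups and uptime percentages for the Dev/Staging and HA Production configurations but no prices. As a rough sizing aid, the sketch below converts those node counts and uptimes into VM-hours per month. The 730-hour month, the `NodeGroup` type and the helper names are assumptions for illustration; no cost is computed because the slide gives none.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # assumption: an average month (8,760 h / 12)


@dataclass(frozen=True)
class NodeGroup:
    role: str
    count: int
    machine_type: str  # GCP machine type as listed on the slide


DEV_STAGING = [
    NodeGroup("Head Node", 1, "n2-standard-8"),
    NodeGroup("Workers", 1, "n2-standard-4"),
    NodeGroup("RonDB", 1, "e2-standard-4"),
]

HA_PRODUCTION = [
    NodeGroup("Head Node", 1, "n2-standard-8"),
    NodeGroup("Workers", 1, "n2-standard-8"),
    NodeGroup("RonDB", 2, "e2-highmem-8"),
    NodeGroup("RonDB Query Brokers", 2, "n2-standard-4"),
    NodeGroup("Model Serving Containers", 10, "e2-highmem-8"),
]


def vm_count(groups) -> int:
    """Total number of VMs across all node groups."""
    return sum(g.count for g in groups)


def vm_hours_per_month(groups, uptime_percent: int) -> float:
    """VM-hours per month when every VM runs for uptime_percent of the month."""
    return vm_count(groups) * HOURS_PER_MONTH * uptime_percent / 100


print(vm_hours_per_month(DEV_STAGING, 40), vm_hours_per_month(HA_PRODUCTION, 100))
```

Under these assumptions, Dev/Staging uses 876 VM-hours per month (3 VMs at 40%) and HA Production 11,680 (16 VMs at 100%), roughly 13 times as many.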