
[POSETTE 2024] Postgres-powered AI: Running an End-to-End AI Platform with Postgres on Azure

Ikigai Labs' AI platform is built on Postgres and showcases the database's ability to handle diverse data types and to benefit from a robust community ecosystem.

The platform's success comes from Postgres' ability to combine RDBMS advantages, such as ACID compliance, with support for unstructured JSON data, ensuring compatibility with various data sources and programming languages. This versatility not only simplifies platform implementation but also improves the data interoperability of AI workloads.
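As a rough illustration of mixing relational columns with JSON data, the sketch below defines a table with fixed columns plus a JSONB column and prepares parameters for an insert. The table name `datasets`, its columns, and the helper `dataset_insert_params` are hypothetical and not taken from Ikigai's platform; the `%s` placeholders assume a DB-API driver such as psycopg.

```python
import json

# Hypothetical table: fixed relational columns plus a JSONB column for
# attributes that differ per dataset (all names are illustrative only).
CREATE_DATASETS = """
CREATE TABLE IF NOT EXISTS datasets (
    id          BIGSERIAL PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE,
    file_format TEXT NOT NULL,
    metadata    JSONB NOT NULL DEFAULT '{}'::jsonb
);
"""

# Parameterized insert; %s is the placeholder style used by psycopg.
INSERT_DATASET = (
    "INSERT INTO datasets (name, file_format, metadata) "
    "VALUES (%s, %s, %s::jsonb)"
)

# ->> extracts a JSON field as text, so it can be filtered like a column.
FIND_BY_DELIMITER = (
    "SELECT name FROM datasets WHERE metadata->>'delimiter' = %s"
)


def dataset_insert_params(name, file_format, metadata):
    """Return the parameter tuple for INSERT_DATASET."""
    # The free-form metadata dict is serialized to JSON text; Postgres
    # validates and stores it as JSONB via the ::jsonb cast.
    return (name, file_format, json.dumps(metadata, sort_keys=True))


params = dataset_insert_params(
    "sales_2023", "csv", {"delimiter": ",", "header": True}
)
```

With a live connection, `cursor.execute(INSERT_DATASET, params)` would run inside an ordinary transaction, so the JSON payload gets the same ACID guarantees as the relational columns.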

The second pillar of Postgres' strength is its extensive community support, which lets Ikigai integrate numerous open-source tools. By using Postgres as a metadata store, the platform stays compatible with various data ingestion, processing, and analysis tools. This integration, coupled with Postgres' portability across cloud providers, makes Ikigai's AI platform easy to deploy in any cloud environment, as demonstrated by its recent migration to Azure.
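One way to picture this portability: if the application only knows Postgres through a connection string, moving between cloud providers can come down to changing configuration. The sketch below builds a libpq-style connection URI from environment-like settings. The function `postgres_dsn` and the example host names are hypothetical; the `PG*` variable names follow the standard libpq environment variables.

```python
from urllib.parse import quote


def postgres_dsn(env):
    """Build a libpq-style connection URI from environment-like settings."""
    # Percent-encode values that may contain reserved URI characters.
    user = quote(env["PGUSER"], safe="")
    password = quote(env["PGPASSWORD"], safe="")
    database = quote(env["PGDATABASE"], safe="")
    host = env["PGHOST"]
    port = env.get("PGPORT", "5432")
    sslmode = env.get("PGSSLMODE", "require")
    return (
        f"postgresql://{user}:{password}@{host}:{port}/{database}"
        f"?sslmode={sslmode}"
    )


# Hypothetical settings for two deployments; only the host differs.
common = {"PGUSER": "app", "PGPASSWORD": "p@ss/word", "PGDATABASE": "metadata"}
aws_dsn = postgres_dsn({**common, "PGHOST": "db.aws.example.com"})
azure_dsn = postgres_dsn({**common, "PGHOST": "db.azure.example.com"})
```

In practice the settings would come from `os.environ` or a secrets manager rather than a literal dictionary; the point of the sketch is only that application code stays the same across providers.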

Ikigai recently launched the Azure version of its platform, which was previously exclusive to AWS, after a migration that took under three months. This achievement underscores the benefits of using Postgres as the sole database, which let Ikigai expedite the migration and capitalize on the platform's agility and scalability.

Jae Sim

June 12, 2024

Transcript

  1. Postgres-powered AI: Running an End-to-End AI Platform with Postgres on Azure

    Jaehyun Sim, POSETTE 2024, 5/16/24
  2. Introduction

    Jaehyun Sim, Head of Engineering at Ikigai
    [email protected]
    linkedin.com/in/simjay
  3. Ikigai’s Mission: “Bring AI to Anyone”

    A new answer to old enterprise data problems
    • Designed specifically for multi-variate tabular and time series data
    • Generate forecasts on limited data
    • Operates on your own enterprise data
  4. Where We Started

    Built an End-to-end AI Platform … on AWS as SaaS product
    To enable business users and data scientists to work seamlessly on one platform
  5. Overview

    Ikigai AI Platform Architecture
    An “Enterprise” AI Challenge
    Azure Adaptation Journey
  6. Ikigai AI Platform Architecture: Why Postgres?

    o Fast, scalable, and predictable
    o Standard SQL database
    o Great JSON support
    o Great community support (Python)
  7. Ikigai Platform Features

    Ingest
    o Data Connectors
    o Dataset Mgmt
    Process
    o Data Preparation
    o Data Preprocessing
    AI/ML Workloads
    o Model Training
    o Model Inference
    Interface
    o Data Interaction
    o Data Visualization
  8. Data Ingestion with Postgres

    Data Connectors
    o Various connectors with different substrate definitions
    o Internal states and metadata management for Airbyte
    Dataset Management
    o Metadata management for various filetypes, data types, and data formats
  9. Data Processing with Postgres

    Data Type & Format Handling
    Managing the evolving data types and formats throughout the pipeline executions
    Consistency Between Multiple Contexts
    Ensuring ACID compliance between remote Ray cluster and local execution within Python workers
  10. AI/ML Workloads with Postgres

    Hyperparameters & Version Control
    o Multiple versions exist for each model
    o Every version of the model has its own set of hyperparameters
    o All models have different definitions and requirements, which are stored in DB
    “Need to handle unstructured data but … still want to use traditional RDBMS experience”
  11. Data Visualization with Postgres

    Data Interaction
    o User-written IPython Notebook (JSON format) enables advanced data exploration
    o Internal states and metadata management for Jupyterhub
    Data Visualization
    o Different offerings of the dashboard types to offer more diverse visualizations
    o Internal states and metadata management for Superset and Dash
  12. Ikigai’s Mission: “Bring AI to Anyone”

    A new answer to old enterprise data problems
    • Designed specifically for multi-variate tabular and time series data
    • Generate forecasts on limited data
    • Operates on your own enterprise data
  13. Operating on Enterprise Data

    SaaS (Software-as-a-Service)
    BYOC (Bring Your Own Cloud)
    Powered by
  14. What We Evaluated

    Interoperability between Complex Requirements
    Adaptability for Infrastructural Limitations
    Compatibility with Open-source Tools
  15. Interoperability Between Complex Requirements

    Organization Structure
    User Tiers
    Authentication Credentials
    Postgres’ JSON Support
    o Diverging organization, team structure, and hierarchy management in uniform codebase
    o Easier to interact with the definitions with pre-existing Python toolkits
    o Maintains the API performance
  16. What We Found Out…

    o Postgres boosts flexibility
    o We did not have to change much!
    o Almost zero adaptation effort was expected for components built with Postgres
  17. Challenges

    o We needed to find 1-to-1 mapping of all the AWS components to Azure services within the platform architecture
    o We needed to check if the corresponding components have all the features supported in their AWS counterparts
    o We needed to check if the platform performance metrics were maintained
  18. What We “Actually” Had to Change

    o CI/CD infrastructure for Internal Users and Engineers
    o “Connector modules” to use Python Azure libraries
    o Logging mechanism
    o API gateway implementation
    o Authentication mechanism
    o Credential management
    … In short, everything but data/AI Workloads
  19. Timeline

    January 31st, 2024: Deployed to Azure Environment
    December 26th, 2023: Infrastructure Manifest Implemented
    November 6th, 2023: Development Finished
    September 27th, 2023: Technical Spec
  20. Summary

    o Flexibility: Postgres enables AI Platforms to handle dynamic AI workloads
    o Portability: Postgres-based infrastructure strategy eases the migration constraints
    o Performance: You can get the benefit of NoSQL with the interface of SQL
    o Ease of Use: It is harder to find a modern tool without Postgres Compatibility
    Conclusion: Postgres is an excellent choice of database for AI Platform Implementation
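
The "NoSQL with the interface of SQL" point, together with the hyperparameter and version-control slide, can be sketched roughly as follows. This is a minimal illustration under assumptions, not the schema used in the talk: the `model_versions` table, the `ModelVersion` class, and the `next_version` helper are hypothetical names, and the `%s` placeholders assume a DB-API driver such as psycopg.

```python
import json
from dataclasses import dataclass

# Hypothetical schema: one row per (model, version). Hyperparameters differ
# per model, so they live in a JSONB column instead of fixed columns.
CREATE_MODEL_VERSIONS = """
CREATE TABLE IF NOT EXISTS model_versions (
    model_name      TEXT    NOT NULL,
    version         INTEGER NOT NULL,
    hyperparameters JSONB   NOT NULL,
    PRIMARY KEY (model_name, version)
);
"""

INSERT_MODEL_VERSION = (
    "INSERT INTO model_versions (model_name, version, hyperparameters) "
    "VALUES (%s, %s, %s::jsonb)"
)

# Plain SQL retrieves the newest version together with its JSON payload.
LATEST_VERSION = (
    "SELECT version, hyperparameters FROM model_versions "
    "WHERE model_name = %s ORDER BY version DESC LIMIT 1"
)


@dataclass(frozen=True)
class ModelVersion:
    model_name: str
    version: int
    hyperparameters: dict

    def as_insert_params(self):
        """Return the parameter tuple for INSERT_MODEL_VERSION."""
        return (
            self.model_name,
            self.version,
            json.dumps(self.hyperparameters, sort_keys=True),
        )


def next_version(previous, hyperparameters):
    """Create the following version of a model with new hyperparameters."""
    return ModelVersion(previous.model_name, previous.version + 1, hyperparameters)


v1 = ModelVersion("forecaster", 1, {"horizon": 12})
v2 = next_version(v1, {"horizon": 24, "seasonality": "weekly"})
```

The composite primary key keeps versions unique per model, while the JSONB column lets each model carry its own set of hyperparameters without schema changes.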