Postgres-powered AI
Running an End-to-End AI Platform with Postgres on Azure
Jaehyun Sim
POSETTE 2024
5/16/24
Introduction
Jaehyun Sim
Head of Engineering at Ikigai
linkedin.com/in/simjay
Ikigai’s Mission
“Bring AI to Anyone”
A new answer to old enterprise data problems
• Designed specifically for multivariate tabular and time-series data
• Generates forecasts from limited data
• Operates on your own enterprise data
Where We Started
Built an end-to-end AI platform…
…on AWS, as a SaaS product
To enable business users and data scientists to work seamlessly on one platform
Overview
o Ikigai AI Platform Architecture
o An “Enterprise” AI Challenge
o Azure Adaptation Journey
Ikigai AI Platform Architecture
Why Postgres?
o Fast, scalable, and predictable
o Standard SQL database
o Great JSON support
o Great community support (Python)
Ikigai Platform Features
Ingest
o Data Connectors
o Dataset Management
Process
o Data Preparation
o Data Preprocessing
AI/ML Workloads
o Model Training
o Model Inference
Interface
o Data Interaction
o Data Visualization
Data Ingestion with Postgres
Data Connectors
o Various connectors with different substrate definitions
o Internal states and metadata management for Airbyte
Dataset Management
o Metadata management for various file types, data types, and data formats
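To make the dataset-management point concrete, here is a minimal sketch, assuming a hypothetical metadata record per dataset version. The names `DatasetMetadata`, `schema_changes`, and the field layout are illustrative only, not Ikigai's schema. The idea is that the per-column type map and the format-specific options differ by file type, so they serialize naturally to JSON that could live in a Postgres JSONB column, and comparing two versions' type maps shows how a dataset's schema evolved.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DatasetMetadata:
    """Hypothetical metadata for one dataset version (illustrative fields)."""
    dataset_id: str
    version: int
    file_type: str                     # e.g. "csv", "parquet"
    columns: dict[str, str]            # column name -> logical data type
    format_options: dict[str, object]  # options that only make sense per file type

    def to_json(self) -> str:
        # JSON text suitable for storing in a JSON/JSONB column.
        return json.dumps(asdict(self), sort_keys=True)


def schema_changes(old: DatasetMetadata, new: DatasetMetadata) -> dict[str, list[str]]:
    """Compare the column type maps of two versions of a dataset."""
    added = sorted(set(new.columns) - set(old.columns))
    removed = sorted(set(old.columns) - set(new.columns))
    retyped = sorted(
        c for c in set(old.columns) & set(new.columns)
        if old.columns[c] != new.columns[c]
    )
    return {"added": added, "removed": removed, "retyped": retyped}


v1 = DatasetMetadata("sales", 1, "csv",
                     {"date": "string", "units": "int"},
                     {"delimiter": ",", "header": True})
v2 = DatasetMetadata("sales", 2, "parquet",
                     {"date": "timestamp", "units": "int", "region": "string"},
                     {"compression": "snappy"})
print(schema_changes(v1, v2))
# {'added': ['region'], 'removed': [], 'retyped': ['date']}
```

Keeping the fixed fields (ID, version, file type) as ordinary columns and only the variable parts as JSON is one common way to get the "RDBMS plus flexible documents" balance the deck describes.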
Data Processing with Postgres
Data Type & Format Handling
Managing data types and formats as they evolve throughout pipeline executions
Consistency Between Multiple Contexts
Ensuring ACID guarantees between the remote Ray cluster and local execution within Python workers
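One way the consistency requirement above can play out is a minimal sketch, assuming that both the Ray cluster and the local Python workers record pipeline-step state in a shared database table. The table `step_state` and the functions `claim_step` and `finish_step` are hypothetical. Python's built-in `sqlite3` stands in for Postgres here only so the sketch runs without a server; the same conditional `UPDATE` works in Postgres (with `%s` placeholders under psycopg instead of `?`), where `SELECT ... FOR UPDATE SKIP LOCKED` is another option for queue-style claiming.

```python
import os
import sqlite3
import tempfile

# Hypothetical table: one row per pipeline step, shared by every execution context.
SCHEMA = """
CREATE TABLE IF NOT EXISTS step_state (
    step_id TEXT PRIMARY KEY,
    status  TEXT NOT NULL,  -- 'pending' | 'running' | 'done'
    owner   TEXT
)
"""


def claim_step(conn: sqlite3.Connection, step_id: str, owner: str) -> bool:
    """Atomically move a step from 'pending' to 'running'.

    The WHERE clause acts as a compare-and-set: if another context has already
    claimed the step, no row matches, rowcount is 0, and nothing changes.
    """
    with conn:  # commits on success, rolls back if an exception escapes
        cur = conn.execute(
            "UPDATE step_state SET status = 'running', owner = ? "
            "WHERE step_id = ? AND status = 'pending'",
            (owner, step_id),
        )
    return cur.rowcount == 1


def finish_step(conn: sqlite3.Connection, step_id: str, owner: str) -> bool:
    """Mark a step done, but only if the caller is the context that claimed it."""
    with conn:
        cur = conn.execute(
            "UPDATE step_state SET status = 'done' "
            "WHERE step_id = ? AND status = 'running' AND owner = ?",
            (step_id, owner),
        )
    return cur.rowcount == 1


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "state.db")
    ray_ctx = sqlite3.connect(path)    # stands in for a task on the Ray cluster
    local_ctx = sqlite3.connect(path)  # stands in for a local Python worker
    with ray_ctx:
        ray_ctx.execute(SCHEMA)
        ray_ctx.execute("INSERT INTO step_state VALUES ('preprocess', 'pending', NULL)")
    print(claim_step(ray_ctx, "preprocess", "ray"))       # True
    print(claim_step(local_ctx, "preprocess", "local"))   # False: already claimed
    print(finish_step(local_ctx, "preprocess", "local"))  # False: not the owner
    print(finish_step(ray_ctx, "preprocess", "ray"))      # True
```

Because each state change is a single transactional statement guarded by its expected prior state, neither context can overwrite the other's progress, regardless of which machine it runs on.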
AI/ML Workloads with Postgres
Hyperparameters & Version Control
o Each model has multiple versions
o Every version of a model has its own set of hyperparameters
o Each model has its own definition and requirements, all of which are stored in the database
“Need to handle unstructured data but … still want to use the traditional RDBMS experience”
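A minimal sketch of the "unstructured data with an RDBMS experience" idea, assuming a hypothetical `model_version` table: the columns every version shares are ordinary typed columns, while hyperparameters, whose shape differs per model type, go into a JSONB column. The table, column, and function names are illustrative. The SQL strings use psycopg-style `%s` placeholders and are not executed here; `->>`, `@>`, and GIN indexes on JSONB are standard Postgres features.

```python
import json
from dataclasses import dataclass, field
from typing import Any

# Hypothetical Postgres DDL: fixed columns for what every model version shares,
# a JSONB column for hyperparameters whose shape differs per model type.
MODEL_VERSION_DDL = """
CREATE TABLE model_version (
    model_id        TEXT        NOT NULL,
    version         INTEGER     NOT NULL,
    model_type      TEXT        NOT NULL,
    hyperparameters JSONB       NOT NULL,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (model_id, version)
);
-- A GIN index lets containment (@>) queries on hyperparameters use an index.
CREATE INDEX model_version_hp_idx ON model_version USING GIN (hyperparameters);
"""

# The JSON text is cast to JSONB on insert.
INSERT_SQL = (
    "INSERT INTO model_version (model_id, version, model_type, hyperparameters) "
    "VALUES (%s, %s, %s, %s::jsonb)"
)

# Versions whose hyperparameters contain the given key/value pairs.
FIND_SQL = (
    "SELECT version FROM model_version "
    "WHERE model_id = %s AND hyperparameters @> %s::jsonb ORDER BY version"
)


@dataclass
class ModelVersion:
    model_id: str
    version: int
    model_type: str
    hyperparameters: dict[str, Any] = field(default_factory=dict)

    def insert_params(self) -> tuple[str, int, str, str]:
        # sort_keys only makes the text deterministic; JSONB does not keep key order.
        hp = json.dumps(self.hyperparameters, sort_keys=True)
        return (self.model_id, self.version, self.model_type, hp)


def hyperparameter_filter(**params: Any) -> str:
    """JSON text for the @> containment parameter of FIND_SQL."""
    return json.dumps(params, sort_keys=True)


def next_version(existing: list[ModelVersion], model_id: str) -> int:
    """Next version number for a model, given the versions already recorded."""
    return max((v.version for v in existing if v.model_id == model_id), default=0) + 1


mv = ModelVersion("demand", 1, "gbm", {"max_depth": 6, "learning_rate": 0.1})
print(mv.insert_params())
# ('demand', 1, 'gbm', '{"learning_rate": 0.1, "max_depth": 6}')
print(next_version([mv], "demand"))  # 2
```

With a driver such as psycopg, `cursor.execute(INSERT_SQL, mv.insert_params())` would store a version, and `cursor.execute(FIND_SQL, ("demand", hyperparameter_filter(learning_rate=0.1)))` would look versions up by hyperparameter value, all through plain SQL.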
Data Visualization with Postgres
Data Interaction
o User-written IPython notebooks (JSON format) enable advanced data exploration
o Internal states and metadata management for JupyterHub
Data Visualization
o Multiple dashboard types offer more diverse visualizations
o Internal states and metadata management for Superset and Dash
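Since notebooks are themselves JSON documents, storing them as JSON keeps their internals queryable. Here is a minimal sketch, assuming notebooks are stored whole in a JSONB column; the table `notebook` and column `content` are hypothetical. The SQL string (not executed) uses Postgres's `jsonb_array_elements`, and the Python function does the equivalent on a decoded nbformat-4 notebook, where a cell's `source` may be either a string or a list of line strings.

```python
import json

# Hypothetical Postgres query: one row per code cell of each stored notebook.
CODE_CELLS_SQL = """
SELECT n.id, cell->'source' AS source
FROM notebook AS n,
     jsonb_array_elements(n.content->'cells') AS cell
WHERE cell->>'cell_type' = 'code';
"""


def code_cells(notebook: dict) -> list[str]:
    """Return the source of every code cell in an nbformat-4 notebook dict.

    A cell's "source" may be a single string or a list of strings, so both
    forms are normalized to one string.
    """
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        sources.append("".join(src) if isinstance(src, list) else src)
    return sources


nb = json.loads("""
{"nbformat": 4, "nbformat_minor": 5, "metadata": {},
 "cells": [
   {"cell_type": "markdown", "metadata": {}, "source": "# Explore sales"},
   {"cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [],
    "source": ["import pandas as pd\\n", "df = pd.read_csv('sales.csv')"]}
 ]}
""")
print(code_cells(nb))
```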
Are We Ready to Solve Problems?
An “Enterprise” AI Challenge
Operating on Enterprise Data
o SaaS (Software-as-a-Service)
o BYOC (Bring Your Own Cloud)
What We Evaluated
o Interoperability Between Complex Requirements
o Adaptability for Infrastructural Limitations
o Compatibility with Open-source Tools
Interoperability Between Complex Requirements
Organization Structure
User Tiers
Authentication
Credentials
Postgres’ JSON Support
o Manages diverging organization, team structure, and hierarchy definitions within a uniform codebase
o Makes the definitions easy to work with using pre-existing Python toolkits
o Maintains API performance
Adaptability for Infrastructural Limitations
Compatibility with Open-source Tools
What We Found Out…
o Postgres boosts flexibility
o We did not have to change much!
o Almost zero adaptation effort was expected for components built with Postgres
Challenges
o We needed to find a 1-to-1 mapping from every AWS component in the platform architecture to an Azure service
o We needed to check whether each Azure component supports all the features of its AWS counterpart
o We needed to check whether the platform performance metrics were maintained
What We “Actually” Had to Change
o CI/CD infrastructure for internal users and engineers
o “Connector modules” to use Python Azure libraries
o Logging mechanism
o API gateway implementation
o Authentication mechanism
o Credential management
… In short, everything but the data/AI workloads
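The "connector modules" item suggests isolating cloud-specific calls behind a small interface, so that moving clouds means adding a backend rather than editing platform code. Here is a minimal sketch of that adapter pattern, assuming a hypothetical object-storage interface; `ObjectStore`, `LocalStore`, `make_store`, and `save_report` are illustrative names, not Ikigai's modules. Real S3 (boto3) and Azure Blob Storage (azure-storage-blob) backends would implement the same two methods; they are omitted so the sketch runs with the standard library only.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class ObjectStore(Protocol):
    """Hypothetical interface that platform code is written against."""
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class LocalStore:
    """Filesystem-backed implementation, handy for tests and local runs."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def put(self, key: str, data: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


# Cloud-specific backends would be registered here under keys such as
# "s3" or "azure_blob"; only the local one is included in this sketch.
BACKENDS: dict[str, type] = {"local": LocalStore}


def make_store(kind: str, **kwargs) -> ObjectStore:
    """Build the configured backend; unknown kinds fail loudly."""
    if kind not in BACKENDS:
        raise ValueError(f"unknown storage backend: {kind!r}")
    return BACKENDS[kind](**kwargs)


def save_report(store: ObjectStore, name: str, text: str) -> None:
    # Call sites depend only on the interface, not on a cloud SDK.
    store.put(f"reports/{name}.txt", text.encode("utf-8"))


store = make_store("local", root=tempfile.mkdtemp())
save_report(store, "q1", "forecast ok")
print(store.get("reports/q1.txt"))  # b'forecast ok'
```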
Timeline
September 27th, 2023: Technical Spec
November 6th, 2023: Development Finished
December 26th, 2023: Infrastructure Manifest Implemented
January 31st, 2024: Deployed to Azure Environment
Summary
o Flexibility: Postgres enables AI platforms to handle dynamic AI workloads
o Portability: A Postgres-based infrastructure strategy eases migration constraints
o Performance: You get the benefits of NoSQL with the interface of SQL
o Ease of Use: It is hard to find a modern tool without Postgres compatibility
Conclusion: Postgres is an excellent choice of database for implementing an AI platform
Thank You
www.ikigailabs.io