Slide 1

Slide 1 text

Postgres-powered AI
Running an End-to-End AI Platform with Postgres on Azure
Jaehyun Sim
5/16/24 POSETTE 2024

Slide 2

Slide 2 text

Introduction
Jaehyun Sim
Head of Engineering at Ikigai
linkedin.com/in/simjay

Slide 3

Slide 3 text

Ikigai’s Mission
“Bring AI to Anyone”
A new answer to old enterprise data problems
• Designed specifically for multivariate tabular and time-series data
• Generates forecasts on limited data
• Operates on your own enterprise data

Slide 4

Slide 4 text

Where We Started
We built an end-to-end AI platform… on AWS, as a SaaS product, to enable business users and data scientists to work seamlessly on one platform.

Slide 5

Slide 5 text

Overview
Ikigai AI Platform Architecture
An “Enterprise” AI Challenge
Azure Adaptation Journey

Slide 6

Slide 6 text

Ikigai AI Platform Architecture

Slide 7

Slide 7 text

Ikigai AI Platform Architecture
Why Postgres?
o Fast, scalable, and predictable
o Standard SQL database
o Great JSON support
o Great community support (Python)

Slide 8

Slide 8 text

Ikigai Platform Features
Ingest
o Data Connectors
o Dataset Management
Process
o Data Preparation
o Data Preprocessing
AI/ML Workloads
o Model Training
o Model Inference
Interface
o Data Interaction
o Data Visualization

Slide 9

Slide 9 text

Data Ingestion with Postgres
Data Connectors
o Various connectors with different substrate definitions
o Internal state and metadata management for Airbyte
Dataset Management
o Metadata management for various file types, data types, and data formats
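
Because each file type exposes a different structure, a per-dataset metadata record can live in a single JSONB column instead of one table per format. The sketch below is a hypothetical example, not Ikigai's code: it builds such a record for a CSV file using only the standard library. The field names, the `datasets` table mentioned in the comment, and the coarse integer/float/text inference are all assumptions.

```python
import csv
import io
import json


def _parses_as(cast, values):
    """True if every value in `values` converts cleanly with `cast`."""
    try:
        for value in values:
            cast(value)
    except ValueError:
        return False
    return True


def infer_column_type(values):
    """Coarse type inference over a sample of string cells (an assumption)."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "text"
    if _parses_as(int, non_empty):
        return "integer"
    if _parses_as(float, non_empty):
        return "float"
    return "text"


def csv_metadata(raw_text, filename, sample_rows=100):
    """Build a JSON-serializable metadata record from the first rows of a CSV."""
    reader = csv.DictReader(io.StringIO(raw_text))
    # zip() with range() stops after sample_rows rows without reading further.
    rows = [row for _, row in zip(range(sample_rows), reader)]
    columns = reader.fieldnames or []
    return {
        "filename": filename,
        "file_type": "csv",
        "sampled_rows": len(rows),
        "columns": [
            # Short rows yield None for missing cells; treat them as empty.
            {"name": c, "type": infer_column_type([r[c] or "" for r in rows])}
            for c in columns
        ],
    }


meta = csv_metadata("sku,units,price\nA1,3,9.99\nB2,5,4.50\n", "sales.csv")
# json.dumps(meta) is what would be bound to a hypothetical
# "INSERT INTO datasets (name, metadata) VALUES (%s, %s::jsonb)".
payload = json.dumps(meta)
```

Here `units` is inferred as an integer column and `price` as a float column; other file types would need their own readers but could produce the same record shape.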

Slide 10

Slide 10 text

Data Processing with Postgres
Data Type & Format Handling
o Managing the data types and formats that evolve throughout pipeline executions
Consistency Between Multiple Contexts
o Ensuring ACID guarantees across the remote Ray cluster and local execution within Python workers
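
The slide does not describe how Ikigai keeps the two contexts consistent. One common pattern, shown here as an assumption rather than the platform's actual design, is optimistic concurrency: each row carries a version number, and an UPDATE only succeeds if the version the writer read is still current. The `pipeline_state` table is hypothetical, and Python's built-in sqlite3 stands in for Postgres only so the sketch runs without a server; with a Postgres driver such as psycopg the statement is the same apart from `%s` placeholders.

```python
import sqlite3

# sqlite3 is a stand-in for Postgres here; the UPDATE logic is portable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pipeline_state ("
    " id INTEGER PRIMARY KEY,"
    " state TEXT NOT NULL,"
    " version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO pipeline_state VALUES (1, 'queued', 0)")
conn.commit()


def compare_and_set(conn, row_id, expected_version, new_state):
    """Write new_state only if no other writer bumped the version first.

    Returns True if this writer won, False if the version it read was stale.
    """
    with conn:  # commits on success, rolls back if an exception escapes
        cur = conn.execute(
            "UPDATE pipeline_state SET state = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_state, row_id, expected_version),
        )
        return cur.rowcount == 1


# Both contexts read version 0, then race to update the same row.
ray_won = compare_and_set(conn, 1, 0, "running_on_ray")
local_won = compare_and_set(conn, 1, 0, "running_locally")
final = conn.execute(
    "SELECT state, version FROM pipeline_state WHERE id = 1"
).fetchone()
```

The losing writer gets `False` instead of silently overwriting, and can re-read the row and retry. Pessimistic locking with `SELECT ... FOR UPDATE` is the usual Postgres alternative when conflicts are frequent.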

Slide 11

Slide 11 text

AI/ML Workloads with Postgres
Hyperparameters & Version Control
o Multiple versions exist for each model
o Every model version has its own set of hyperparameters
o Every model has different definitions and requirements, all of which are stored in the database
“Need to handle unstructured data but … still want to use traditional RDBMS experience”
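
Versioned hyperparameters fit the slide’s framing of unstructured data inside a traditional RDBMS: relational columns identify the model and version, while a JSONB column holds whatever hyperparameters that model defines. A minimal sketch, assuming a hypothetical `model_versions` table and psycopg-style `%s` placeholders; it only builds SQL and parameters, so it runs without a database.

```python
import json

# Hypothetical schema: one row per model version, hyperparameters as JSONB.
DDL = """
CREATE TABLE model_versions (
    model_id        TEXT    NOT NULL,
    version         INTEGER NOT NULL,
    hyperparameters JSONB   NOT NULL,
    PRIMARY KEY (model_id, version)
);
CREATE INDEX ON model_versions USING GIN (hyperparameters);
"""


def insert_version(model_id, version, hyperparameters):
    """Return (sql, params) for storing one model version."""
    sql = (
        "INSERT INTO model_versions (model_id, version, hyperparameters) "
        "VALUES (%s, %s, %s::jsonb)"
    )
    return sql, (model_id, version, json.dumps(hyperparameters, sort_keys=True))


def find_versions_with(model_id, subset):
    """Return (sql, params) for versions whose hyperparameters contain `subset`.

    `@>` is the JSONB containment operator; the GIN index above can let the
    planner answer it without scanning every row.
    """
    sql = (
        "SELECT version, hyperparameters FROM model_versions "
        "WHERE model_id = %s AND hyperparameters @> %s::jsonb "
        "ORDER BY version"
    )
    return sql, (model_id, json.dumps(subset, sort_keys=True))


sql, params = insert_version("demand_forecast", 3, {"lr": 0.01, "layers": [64, 32]})
query, qparams = find_versions_with("demand_forecast", {"lr": 0.01})
```

With a live connection, each pair would be passed as `cursor.execute(sql, params)`. Models with entirely different hyperparameter sets share the same table and query path.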

Slide 12

Slide 12 text

Data Visualization with Postgres
Data Interaction
o User-written IPython notebooks (JSON format) enable advanced data exploration
o Internal state and metadata management for JupyterHub
Data Visualization
o Different dashboard types offer more diverse visualizations
o Internal state and metadata management for Superset and Dash
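
Since a notebook is a JSON document in the nbformat layout, storing it in a JSONB column keeps it queryable. The sketch below is a hypothetical helper, not JupyterHub’s own code: it pulls the code cells out of an nbformat v4 notebook in Python. In Postgres 12 or later a similar filter can be written in SQL as `jsonb_path_query(notebook, '$.cells[*] ? (@.cell_type == "code")')`, assuming the document sits in a column named `notebook`.

```python
import json

# A tiny notebook in nbformat v4 layout; "source" may be a string or a
# list of strings, per the format.
NOTEBOOK_JSON = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Explore sales\n"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["df = load('sales')\n", "df.head()"]},
    ],
})


def code_cells(notebook: dict) -> list[str]:
    """Return the source of every code cell, joining list-form sources."""
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            src = cell.get("source", "")
            sources.append("".join(src) if isinstance(src, list) else src)
    return sources


cells = code_cells(json.loads(NOTEBOOK_JSON))
```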

Slide 13

Slide 13 text

Are We Ready to Solve Problems?

Slide 14

Slide 14 text

An “Enterprise” AI Challenge

Slide 15

Slide 15 text

Ikigai’s Mission
“Bring AI to Anyone”
A new answer to old enterprise data problems
• Designed specifically for multivariate tabular and time-series data
• Generates forecasts on limited data
• Operates on your own enterprise data

Slide 16

Slide 16 text

Operating on Enterprise Data
SaaS (Software-as-a-Service)
BYOC (Bring Your Own Cloud)

Slide 17

Slide 17 text

Our Decision
“Flexibility”
Interactivity
Scalability
Source: https://www.statista.com/chart/30489/revenue-from-cloud-services-by-cloud-sector-market-leaders

Slide 18

Slide 18 text

What We Evaluated
Interoperability Between Complex Requirements
Adaptability for Infrastructural Limitations
Compatibility with Open-source Tools

Slide 19

Slide 19 text

Interoperability Between Complex Requirements

Slide 20

Slide 20 text

Interoperability Between Complex Requirements
Requirements: organization structure, user tiers, authentication, credentials
Postgres’ JSON Support
o Manages diverging organization, team structure, and hierarchy requirements in a uniform codebase
o Makes the definitions easier to work with using pre-existing Python toolkits
o Maintains API performance
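
One reading of this slide is that each organization’s structure is stored as a JSON document, so organizations with very different hierarchies share one schema and one code path. A minimal sketch, assuming a hypothetical document layout in which nested teams override their parent’s settings; a driver such as psycopg decodes JSONB into plain Python dicts, which is what `ORG_DOC` stands in for.

```python
from typing import Any

# Hypothetical JSONB document for one organization, as a Postgres driver
# would decode it (JSONB object -> dict).
ORG_DOC: dict[str, Any] = {
    "name": "acme",
    "defaults": {"tier": "standard", "auth": "sso"},
    "teams": {
        "forecasting": {"tier": "enterprise"},
        "finance": {"auth": "password", "teams": {"audit": {"tier": "viewer"}}},
    },
}


def effective_settings(org: dict[str, Any], team_path: list[str]) -> dict[str, Any]:
    """Merge settings from the org defaults down a path of nested teams.

    Deeper teams override keys set by their ancestors; the nested "teams"
    key is structure, not a setting, so it is skipped.
    """
    settings = dict(org.get("defaults", {}))
    node = org
    for team in team_path:
        node = node["teams"][team]
        settings.update({k: v for k, v in node.items() if k != "teams"})
    return settings


audit = effective_settings(ORG_DOC, ["finance", "audit"])
```

For the audit team this yields `{"tier": "viewer", "auth": "password"}`: the tier comes from the team itself and the authentication method from its parent. In SQL, a single node can be fetched without loading the whole document using the `#>` path operator, e.g. `doc #> '{teams,finance}'`.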

Slide 21

Slide 21 text

Adaptability for Infrastructural Limitations

Slide 22

Slide 22 text

Compatibility with Open-source Tools

Slide 23

Slide 23 text

What We Found Out…
o Postgres boosts flexibility
o We did not have to change much!
o Almost zero adaptation effort was expected for components built with Postgres

Slide 24

Slide 24 text

Azure Compatibility Journey

Slide 25

Slide 25 text

Challenges
o We needed to find a 1-to-1 mapping from every AWS component in the platform architecture to an Azure service
o We needed to verify that each Azure counterpart supports all the features of its AWS equivalent
o We needed to confirm that the platform’s performance metrics were maintained

Slide 26

Slide 26 text

What We “Actually” Had to Change
o CI/CD infrastructure for internal users and engineers
o “Connector modules” to use the Python Azure libraries
o Logging mechanism
o API gateway implementation
o Authentication mechanism
o Credential management
… In short, everything but the data/AI workloads
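
The slide lists this change but not its design. One way to keep such a swap contained, sketched here under assumptions, is to have platform code depend on a small interface and put each cloud SDK behind it. `ObjectStore`, `InMemoryStore`, and `archive_dataset` are hypothetical names; the AWS and Azure implementations, which would wrap boto3 and the azure-storage-blob package, are not shown.

```python
from typing import Protocol


class ObjectStore(Protocol):
    """Hypothetical interface that platform code is written against."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Test double; real connector modules would wrap a cloud SDK instead."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def archive_dataset(store: ObjectStore, name: str, payload: bytes) -> str:
    """Platform logic: depends only on the interface, not on AWS or Azure."""
    key = f"datasets/{name}"
    store.put(key, payload)
    return key


store = InMemoryStore()
key = archive_dataset(store, "sales.csv", b"sku,units\nA1,3\n")
```

Because `Protocol` uses structural typing, any class with matching `put` and `get` methods satisfies `ObjectStore`, so moving clouds means adding one connector class rather than editing callers like `archive_dataset`.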

Slide 27

Slide 27 text

Timeline
o September 27th, 2023: Technical Spec
o November 6th, 2023: Development Finished
o December 26th, 2023: Infrastructure Manifest Implemented
o January 31st, 2024: Deployed to Azure Environment

Slide 28

Slide 28 text

Summary
o Flexibility: Postgres enables AI platforms to handle dynamic AI workloads
o Portability: A Postgres-based infrastructure strategy eases migration constraints
o Performance: You get the benefits of NoSQL with a SQL interface
o Ease of Use: It is hard to find a modern tool that lacks Postgres compatibility
Conclusion: Postgres is an excellent choice of database for implementing an AI platform

Slide 29

Slide 29 text

Thank You
www.ikigailabs.io