Simplify Big Data & AI on Spark and Ray with MLSQL (Dong Li, Kyligence)

Dong Li, Head of Product, Kyligence; Apache Kylin PMC Member

Agenda
◆ Background and Challenges
◆ Demo
◆ MLSQL on Spark and Ray
◆ Deep Dive

2 Steps to Combine Big Data & AI

Is Python (based on Spark + Ray) enough?
◆ Python is an advanced programming skill
◆ You must learn Python
◆ You must learn PySpark
◆ You must learn Ray programming
◆ There is no management of data access control (ACL)
◆ Intermediate storage is required to exchange data between Spark and Ray
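To make the last point concrete: without a shared in-memory channel, a Spark job typically lands its output in intermediate storage, and the Ray job then reads it back. The sketch below models that round trip using only the Python standard library. A temporary JSON-lines file stands in for HDFS or an object store, and `spark_stage` and `ray_stage` are hypothetical placeholders, not Spark or Ray APIs.

```python
import json
import os
import tempfile

def spark_stage(rows, path):
    """Hypothetical stand-in for a Spark job: land the rows in intermediate storage."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")

def ray_stage(path):
    """Hypothetical stand-in for a Ray job: read the landed rows back before training."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

rows = [{"id": 1, "label": 0}, {"id": 2, "label": 1}]
with tempfile.TemporaryDirectory() as tmp:
    landing_path = os.path.join(tmp, "features.jsonl")  # plays the role of an HDFS path
    spark_stage(rows, landing_path)
    restored = ray_stage(landing_path)

# Same data arrives, but only after a full write to and read from storage.
print(restored == rows)  # True
```

The extra write and read are exactly the cost the deck attributes to the Python-only approach.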

MLSQL: Open Source SQL Variant for Big Data & AI
Unified Language and Platform for Data Management, Business Intelligence, and Machine Learning / AI

One-Stop for All Data Users

MLSQL Summary
◆ Runs in a notebook
◆ Everything is SQL
◆ Seamless SQL and Python
◆ No PySpark
◆ Analyze and explore multiple data sources
◆ Supports algorithms, feature engineering, and the Python ecosystem
◆ Supports Kylin and other analytical engines
◆ Non-intrusive, out-of-the-box data ACL
◆ Security for plugins, algorithms, data, and directories
◆ Custom desensitization
◆ Hot deployment of UDFs and UDAFs
◆ Pluggable architecture
◆ User-defined extensions
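The summary lists out-of-the-box data ACL and custom desensitization but does not show how they behave. As a rough illustration only, and not MLSQL's implementation or configuration format, the sketch below applies a column-level ACL and a masking rule to query results. `ACL`, `MASKED_COLUMNS`, `mask_phone`, and `select` are invented names.

```python
from typing import Callable, Dict, List, Set

# Hypothetical policy: which columns each user may read.
ACL: Dict[str, Set[str]] = {
    "analyst": {"name", "phone", "city"},
    "intern": {"city"},
}

def mask_phone(value: str) -> str:
    """Hypothetical desensitization rule: keep the last four digits, mask the rest."""
    return "*" * (len(value) - 4) + value[-4:]

# Hypothetical mapping from sensitive column to its masking function.
MASKED_COLUMNS: Dict[str, Callable[[str], str]] = {"phone": mask_phone}

class AccessDenied(Exception):
    pass

def select(user: str, columns: List[str], rows: List[dict]) -> List[dict]:
    """Check the ACL first, then project the columns and mask sensitive values."""
    allowed = ACL.get(user, set())
    denied = [c for c in columns if c not in allowed]
    if denied:
        raise AccessDenied(f"{user} may not read {denied}")
    result = []
    for row in rows:
        out = {}
        for c in columns:
            mask = MASKED_COLUMNS.get(c)
            out[c] = mask(row[c]) if mask else row[c]
        result.append(out)
    return result

rows = [{"name": "Ann", "phone": "13800001234", "city": "Shanghai"}]
print(select("analyst", ["name", "phone"], rows))
# [{'name': 'Ann', 'phone': '*******1234'}]
```

Enforcing the check in one place, before any data reaches the user, is what makes this kind of ACL non-intrusive: queries themselves do not change.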

Ray on Spark vs. Spark on Ray (RayDP)
[Figure: exchanging data through HDFS/object storage (labeled "Slow") vs. through the Ray object store (labeled "Quick"). Components shown: PySpark app, Spark driver, executors, Ray managers, Raylets, Ray cluster.]

The New Way: MLSQL on Spark + Ray
[Figure: architecture diagram. Components shown: apps; JDBC/REST API; proxy server (load balancing); MLSQL engine with a driver and executors (Java executor, Python daemon, Python workers); several Ray clusters; deployment on YARN/K8s/Standalone/Local; the MLSQL cluster alongside an existing Ray cluster.]
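The architecture places a proxy server with load balancing in front of the MLSQL engines, but the balancing policy is not specified. The sketch below assumes simple round-robin over a hypothetical list of engine endpoints and skips engines marked unhealthy. The `EngineProxy` class and the endpoint URLs are illustrative only.

```python
import itertools
from typing import List, Set

class EngineProxy:
    """Hypothetical round-robin proxy in front of several MLSQL engines."""

    def __init__(self, engines: List[str]):
        if not engines:
            raise ValueError("at least one engine is required")
        self.engines = engines
        self.unhealthy: Set[str] = set()
        self._cycle = itertools.cycle(engines)

    def mark_down(self, engine: str) -> None:
        """Exclude an engine from routing, e.g. after a failed health check."""
        self.unhealthy.add(engine)

    def pick(self) -> str:
        # Visit each engine at most once per call before giving up.
        for _ in range(len(self.engines)):
            engine = next(self._cycle)
            if engine not in self.unhealthy:
                return engine
        raise RuntimeError("no healthy MLSQL engine available")

proxy = EngineProxy(["http://engine-a", "http://engine-b"])  # hypothetical endpoints
print([proxy.pick() for _ in range(3)])
# ['http://engine-a', 'http://engine-b', 'http://engine-a']
proxy.mark_down("http://engine-b")
print(proxy.pick(), proxy.pick())  # http://engine-a http://engine-a
```

Round-robin is only the simplest choice. A real proxy might instead weight engines by load or keep a session pinned to one engine.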

Why do it this way?
◆ In fusion mode, Python may impact the stability of the big data cluster
◆ Traditionally, data landing (writing to intermediate storage) is required
◆ MLSQL exchanges data on the fly
◆ Ray is optional
◆ Users can provide multiple Ray clusters to choose from
◆ Traditionally, you need to learn Python, PySpark, and Ray
◆ With MLSQL, there is no need for PySpark
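Exchanging data on the fly, as opposed to landing it first, means the Python side consumes rows while the JVM side is still producing them. The generator pipeline below is a minimal, standard-library model of that idea. `executor_partition` and `python_worker` are hypothetical names. The architecture described above moves batches between separate Java and Python processes, whereas this sketch keeps everything in one interpreter for brevity.

```python
from typing import Iterable, Iterator, List

def executor_partition(n_rows: int, batch_size: int) -> Iterator[List[int]]:
    """Hypothetical JVM-side producer: emits one partition as small batches."""
    batch: List[int] = []
    for i in range(n_rows):
        batch.append(i)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch

def python_worker(batches: Iterable[List[int]]) -> Iterator[int]:
    """Hypothetical Python-side consumer: transforms each batch as it arrives."""
    for batch in batches:
        yield sum(x * x for x in batch)

# Nothing is written to intermediate storage; one batch is in flight at a time.
partial_sums = list(python_worker(executor_partition(n_rows=10, batch_size=4)))
print(partial_sums)  # [14, 126, 145]
```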

Deep Dive: Data Exchange Based on the PyJava Library
Learn more: https://github.com/allwefantasy/pyjava
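The slide points to PyJava for the actual exchange between Java and Python but does not show its wire format. The sketch below therefore uses a simple stand-in: length-prefixed JSON frames over a local socket pair, with a zero-length frame marking the end of the stream. This is not PyJava's protocol. `send_batches` and `recv_batches` are hypothetical helpers that only illustrate the streaming shape of such an exchange.

```python
import json
import socket
import struct
import threading
from typing import Iterator, List

HEADER = struct.Struct(">I")  # 4-byte big-endian frame length

def send_batches(sock: socket.socket, batches: List[list]) -> None:
    """Hypothetical JVM-side writer: one length-prefixed frame per batch."""
    for batch in batches:
        payload = json.dumps(batch).encode("utf-8")
        sock.sendall(HEADER.pack(len(payload)) + payload)
    sock.sendall(HEADER.pack(0))  # zero-length frame signals end of stream

def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since recv() may return fewer."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-frame")
        buf += chunk
    return buf

def recv_batches(sock: socket.socket) -> Iterator[list]:
    """Hypothetical Python-side reader: yields batches until the end frame."""
    while True:
        (length,) = HEADER.unpack(_recv_exact(sock, HEADER.size))
        if length == 0:
            return
        yield json.loads(_recv_exact(sock, length).decode("utf-8"))

jvm_side, python_side = socket.socketpair()
batches = [[1, "a"], [2, "b"], [3, "c"]]
sender = threading.Thread(target=send_batches, args=(jvm_side, batches))
sender.start()
received = list(recv_batches(python_side))
sender.join()
jvm_side.close()
python_side.close()
print(received)  # [[1, 'a'], [2, 'b'], [3, 'c']]
```

A production exchange would use a compact columnar format rather than JSON, but the framing and end-of-stream handling follow the same pattern.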

Deep Dive: Data Exchange (In Detail)
[Figure: components shown: six Read Once servers, a Python worker, a Ray client, and Ray actors 0, 1, and 2.]

"MLSQL is expected to bridge data and AI and to become the industry-standard language interface." --- William Zhu, Author of MLSQL

Contact Us
Kyligence Inc ◆ http://kyligence.io ◆ [email protected] ◆ Twitter: @Kyligence
Apache Kylin ◆ http://kylin.apache.org ◆ [email protected] ◆ Twitter: @ApacheKylin

Published by Anyscale.

Other Decks in Technology

Featured

Transcript

Simplify Big Data & AI on Spark and Ray With

Agenda ◆ Background and Challenges ◆ Demo ◆ MLSQL on

2 Steps to Combine Big Data & AI

Is Python (based on Spark + Ray) enough? ◆ Python

MLSQL: Open Source SQL Variant for Big Data & AI

One-Stop for All Data Users

MLSQL Summary ◆ In Notebook ◆ All about SQL ◆

Ray on Spark vs. Spark on Ray (Ray DP) HDFS/Object

The New Way: MLSQL on Spark + Ray Apps JDBC/Rest

Why do it this way? ◆ Fusion mode, python may

Deep Dive for Data Exchange base on PyJava Lib Learn

Deep Dive for Data Exchange (In detail) Read Once server

MLSQL is expected to bridge Data and AI, and become

Contact Us Kyligence Inc ◆ http://kyligence.io ◆ [email protected] ◆ Twitter: