Slide 1

Slide 1 text

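D2-3-S09: Building a Machine Learning Platform That Uses Large-Scale Real-Time Data at 60,000 Events per Second
Yuki Makino (牧野 祐己), CTO, PLAID
August 1, 2019
Note: This document is an excerpt of the slides presented at the session.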

Slide 2

Slide 2 text

02 Real-Time Data Analytics Platform and Learning Platform

Slide 3

Slide 3 text

Yuki Makino (牧野 祐己), Chief Technology Officer

Slide 4

Slide 4 text

at GINZA SIX

Slide 5

Slide 5 text

Customer Experience Platform
Real-time Analytics & Action Tool
karte.io

Slide 6

Slide 6 text

How much real-time data do we process?
● 65,000 events (rows) / sec
● 3 billion events per day
● 2+ PB stored in BigQuery
● 1 PB streaming insert per month
● 0.x sec to real-time actions
● 60+ PB analysis per month
● 6,000 slots with flat-rate pricing (US & JP)
● 500+ datasets for clients
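As a quick arithmetic check, the per-second and per-day figures on this slide can be related to each other. The sketch below only uses the two numbers from the slide. Reading 65,000 events/sec as a peak rather than an average is an assumption; the slide does not say which it is.

```python
SECONDS_PER_DAY = 24 * 60 * 60

rate_per_sec = 65_000            # from the slide
events_per_day = 3_000_000_000   # from the slide

# Daily volume if 65,000 events/sec were sustained around the clock.
sustained_daily = rate_per_sec * SECONDS_PER_DAY

# Average rate implied by 3 billion events per day.
implied_avg_rate = events_per_day / SECONDS_PER_DAY

print(f"sustained: {sustained_daily:,} events/day")
print(f"implied average: {implied_avg_rate:,.0f} events/sec")
```

A sustained 65,000 events/sec would exceed 3 billion events per day, so the two figures are consistent if the per-second number describes peak load.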

Slide 7

Slide 7 text

Architecture of the data analytics platform (diagram)
Components: Client; Track, Admin, and Analyze (each on Compute Engine with Autoscaling); Cloud Pub/Sub; Cloud Dataflow; Cloud Bigtable; Redis; Cloud Spanner; BigQuery; Cloud Storage
External systems: Object Storage, Salesforce, Others...
Annotations: Reactive Layer; 65,000 events / sec; Streaming Insert (1 PB streaming insert per month); ELT; Import / Export

Slide 8

Slide 8 text

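Key points for putting data to use
● Load all the data first, then think
● Start by thinking about how the data will be used
● When the data is put to use, the client must be able to control it
● Just start using it, then adjust as you go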

Slide 9

Slide 9 text

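The lifecycle the machine learning platform supports
1. Data ingestion and pre-processing
2. Training and evaluation
3. Model deployment
4. Prediction and action
Keep this cycle turning.
Diagram labels: Prepare, Pre-process, Ingest; Train, Evaluation; Deploy; Predict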

Slide 10

Slide 10 text

ML Architecture (diagram)
Lanes: Real-time, Offline
Prepare, Pre-process, Ingest: Analyze, Key-Value Store, Data Lake, Ingest, Prep, Training Data Store, Feature Store
Train, Evaluation: Batch Training Job, Model Store
Deploy: Batch Predict Job, Real-time Predict Service
Predict: Key-Value Store, Prediction Data, End-User

Slide 11

Slide 11 text

ML Architecture (build of the Slide 10 diagram)
Lanes: Real-time, Offline
Prepare, Pre-process, Ingest: Analyze, Key-Value Store, Data Lake, Ingest, Prep, Training Data Store, Feature Store

Slide 12

Slide 12 text

ML Architecture (build of the Slide 10 diagram)
Lanes: Real-time, Offline
Prepare, Pre-process, Ingest: Analyze, Key-Value Store, Data Lake, Ingest, Prep, Training Data Store, Feature Store
Train, Evaluation: Batch Training Job, Model Store

Slide 13

Slide 13 text

ML Architecture (build of the Slide 10 diagram)
Lanes: Real-time, Offline
Prepare, Pre-process, Ingest: Analyze, Key-Value Store, Data Lake, Ingest, Prep, Training Data Store, Feature Store
Train, Evaluation: Batch Training Job, Model Store
Deploy: Batch Predict Job, Real-time Predict Service

Slide 14

Slide 14 text

ML Architecture (complete diagram, same as Slide 10)

Slide 15

Slide 15 text

03 Implementation on GCP and Use Cases

Slide 16

Slide 16 text

ML Architecture (same diagram as Slide 10)

Slide 17

Slide 17 text

Implemented on GCP (the ML Architecture diagram with GCP services overlaid)
Architecture labels: Real-time, Offline; Ingest: Analyze, Key-Value Store, Data Lake, Ingest, Prep, Training Data Store, Feature Store; Train & Eval: Batch Training Job, Model Store; Deploy: Batch Predict Job, Real-time Predict Service; Predict: Key-Value Store, Prediction Data, End-User
GCP services: Cloud Pub/Sub, Cloud Bigtable, BigQuery (several boxes), BigQuery ELT, Cloud Dataflow, Cloud Dataproc, Cloud Dataprep, AI Platform Training, Cloud AutoML, BigQuery ML, AI Platform Prediction, Cloud AutoML API
Annotation: Periodically Update (Daily/Weekly/Monthly)

Slide 18

Slide 18 text

Implemented on GCP (build of the Slide 17 diagram)
GCP services shown at this step: Cloud Pub/Sub, Cloud Bigtable
Annotation: Periodically Update (Daily/Weekly/Monthly)

Slide 19

Slide 19 text

Implemented on GCP (build of the Slide 17 diagram)
GCP services shown at this step: Cloud Pub/Sub, Cloud Bigtable, BigQuery (several boxes), BigQuery ELT
Annotation: Periodically Update (Daily/Weekly/Monthly)

Slide 20

Slide 20 text

Implemented on GCP (build of the Slide 17 diagram)
GCP services shown at this step: Cloud Pub/Sub, Cloud Bigtable, BigQuery (several boxes), BigQuery ELT, Cloud Dataflow, Cloud Dataproc, Cloud Dataprep
Annotation: Periodically Update (Daily/Weekly/Monthly)

Slide 21

Slide 21 text

Implemented on GCP (build of the Slide 17 diagram)
GCP services shown at this step: Cloud Pub/Sub, Cloud Bigtable, BigQuery (several boxes), BigQuery ELT, Cloud Dataflow, Cloud Dataproc, Cloud Dataprep, AI Platform Training, Cloud AutoML, BigQuery ML
Annotation: Periodically Update (Daily/Weekly/Monthly)

Slide 22

Slide 22 text

Implemented on GCP (build of the Slide 17 diagram)
GCP services shown at this step: Cloud Pub/Sub, Cloud Bigtable, BigQuery (several boxes), BigQuery ELT, Cloud Dataflow, Cloud Dataproc, Cloud Dataprep, AI Platform Training, Cloud AutoML, BigQuery ML, plus BigQuery and Cloud Bigtable on the prediction side
Annotation: Periodically Update (Daily/Weekly/Monthly)

Slide 23

Slide 23 text

Implemented on GCP (complete diagram, same as Slide 17; this step adds AI Platform Prediction and Cloud AutoML API)

Slide 24

Slide 24 text

A machine learning platform with BigQuery as the central data store: Before
Diagram labels: BigQuery, Cloud Dataflow, Cloud Bigtable, Cloud Spanner
External Data: AWS S3, Cloud Storage, SFTP, Treasure Data, Salesforce, Cloud Spreadsheet
Flows: ELT, Import, Export

Slide 25

Slide 25 text

A machine learning platform with BigQuery as the central data store: After
Diagram labels: BigQuery, Cloud Dataflow, Cloud Bigtable, Cloud Spanner
External Data: AWS S3, Cloud Storage, SFTP, Treasure Data, Salesforce, Cloud Spreadsheet
Flows: ELT, Import, Export
Added for ML: AI Platform Training, Cloud AutoML, BigQuery ML; Predict

Slide 26

Slide 26 text

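Key points for putting data to use
● Load all the data first, then think
● Start by thinking about how the data will be used
● When the data is put to use, the client must be able to control it
● Just start using it, then adjust as you go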

Slide 27

Slide 27 text

Case 1: Clustering users with BigQuery ML k-means
Cluster the site's users by RFM pattern:
● Recency: time since the last "purchase"
● Frequency: how often "purchases" occur
● Monetary Value: how much the "purchases" are worth

    CREATE OR REPLACE MODEL `cloud_next_session.rfm_cluster`
    OPTIONS (model_type='kmeans', num_clusters=4, standardize_features = TRUE) AS
    SELECT * FROM (
      SELECT
        SUM(revenue) monetary,
        COUNT(*) frequency,
        MAX(_date) recency
      FROM `karte.events`
      WHERE event_name = 'buy'
      GROUP BY keys.visitor_id
    )
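To make the RFM clustering idea concrete outside BigQuery, here is a minimal pure-Python sketch of standardizing features and running k-means (Lloyd's algorithm with random initialization). It is not how BigQuery ML implements `model_type='kmeans'`. The toy rows, the use of "days since last purchase" for recency, and `k=2` are all assumptions chosen for illustration.

```python
import random
from statistics import mean, pstdev

def standardize(rows):
    """Scale each column to zero mean and unit variance (like standardize_features)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [tuple((v - m) / s for v, m, s in zip(r, mus, sds)) for r in rows]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        new_centroids = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            # Keep the old centroid if a cluster ends up empty.
            new_centroids.append(
                tuple(mean(d) for d in zip(*members)) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical per-visitor rows: (monetary, frequency, recency_in_days).
rows = [(1000.0, 10, 2), (1100.0, 12, 1), (900.0, 9, 3),
        (50.0, 1, 200), (60.0, 1, 180), (40.0, 2, 220)]

labels, centroids = kmeans(standardize(rows), k=2)
print(labels)
```

Standardizing first matters here because monetary values are orders of magnitude larger than frequency counts and would otherwise dominate the distance.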

Slide 28

Slide 28 text

Cluster users and use the clusters for segmentation

Slide 29

Slide 29 text

Visualize with a BI tool

Slide 30

Slide 30 text

Case 3: Real-time purchase prediction with AI Platform
Example input URLs: https://aaa.com/category/items, https://aaa.com/top, https://aaa.com/category/item/XXXXXXX
Diagram labels: CNN (four boxes), Embedding, feature vector, True / False, TensorFlow

Slide 31

Slide 31 text

Case 3: Real-time purchase prediction with AI Platform
Services: BigQuery, Cloud Dataflow, Cloud Storage, TensorFlow, AI Platform Training
Steps: Preprocess, Train, Deploy, Ingest
Output: Realtime API

Slide 32

Slide 32 text

No content

Slide 33

Slide 33 text

Real-time Pattern (ML Architecture diagram)
Architecture labels: Real-time, Offline; Ingest: Analyze, Key-Value Store, Data Lake, Ingest, Prep, Training Data Store, Feature Store; Train & Eval: Batch Training Job, Model Store; Deploy: Real-time Predict Service; Predict: End-User
GCP services: Cloud Pub/Sub, Cloud Bigtable, BigQuery (several boxes), BigQuery ELT, Cloud Dataflow, Cloud Dataproc, Cloud Dataprep, AI Platform Training, Cloud AutoML, AI Platform Prediction, Cloud AutoML API
Annotation: Feature (shown twice)

Slide 34

Slide 34 text

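Real-time prediction or batch prediction?

Real-time prediction
Pros
- Models can be updated frequently
- Suits cases where prediction results fluctuate widely
- Computes only when needed
Cons
- The features that can be used are limited

Batch prediction
Pros
- Complex features can be used
- The architecture is simple
- Prediction results can be inspected
Cons
- Models are updated infrequently
- Predictions must be computed in advance for every possible target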
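The trade-off above can be sketched in a few lines. This is a toy illustration, not the presenter's implementation: `score` is a hypothetical stand-in for a trained model, and the dictionaries stand in for the Feature Store and the Prediction Data key-value store.

```python
from typing import Dict, List

def score(features: List[float]) -> bool:
    """Hypothetical stand-in model: predict a purchase if the feature sum exceeds 1.0."""
    return sum(features) > 1.0

def batch_predict(feature_store: Dict[str, List[float]]) -> Dict[str, bool]:
    """Batch pattern: score every known user up front and store the results."""
    return {user_id: score(f) for user_id, f in feature_store.items()}

class RealtimePredictor:
    """Real-time pattern: score only when a request arrives."""
    def __init__(self) -> None:
        self.calls = 0

    def predict(self, features: List[float]) -> bool:
        self.calls += 1
        return score(features)

# Hypothetical feature store keyed by user ID.
feature_store = {"u1": [0.9, 0.4], "u2": [0.1, 0.2], "u3": [0.5, 0.6]}

# Batch: three scores are computed in advance; serving is a plain lookup,
# and the stored results can be inspected before they are used.
prediction_store = batch_predict(feature_store)
served = prediction_store["u1"]

# Real-time: one score is computed, from whatever features are available
# at request time.
rt = RealtimePredictor()
live = rt.predict([0.9, 0.4])
```

The batch path pays for every possible user whether or not they show up, while the real-time path pays per request but can only use features that are ready when the request arrives.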

Slide 35

Slide 35 text

04 Summary

Slide 36

Slide 36 text

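Summary
● Built a machine learning platform on top of the real-time data analytics platform
● Kept the implementation simple by using GCP's various data and machine learning services
● Presented three cases in production use, covering real-time and batch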

Slide 37

Slide 37 text

Thank you