D2-3-S09: Realizing a machine learning platform that uses large-scale real-time data at 60,000 events per second...

PLAID
August 01, 2019

Transcript

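  1. 牧野 祐己 (Yuki Makino), CTO, PLAID. August 1, 2019. D2-3-S09:

    Realizing a machine learning platform that uses large-scale real-time data at 60,000 events per second. Note: this document is an excerpt of the slides presented at the session.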
  2. How much real-time data do we process?

    - 65,000 events (rows) / sec
    - 3 billion events per day
    - 2+ PB stored in BigQuery
    - 1 PB streaming insert per month
    - 0.x sec to real-time actions
    - 60+ PB analysis per month
    - 6,000 slots with flat rate, US & JP
    - 500+ datasets for clients
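
The figures on this slide can be cross-checked with simple arithmetic. The sketch below assumes decimal units (1 PB = 10**15 bytes) and a 30-day month, neither of which is stated on the slide, and the function names are illustrative only.

```python
# Back-of-the-envelope check of the slide's throughput figures.
# Assumptions (not stated on the slide): 1 PB = 10**15 bytes, 30-day month.
SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_PB = 10**15
DAYS_PER_MONTH = 30


def sustained_events_per_day(events_per_sec: int) -> int:
    """Events per day if the per-second rate were held for 24 hours."""
    return events_per_sec * SECONDS_PER_DAY


def average_events_per_sec(events_per_day: int) -> float:
    """Average per-second rate implied by a daily total."""
    return events_per_day / SECONDS_PER_DAY


def average_bytes_per_event(pb_per_month: float, events_per_day: int) -> float:
    """Average streamed bytes per event under the stated assumptions."""
    return pb_per_month * BYTES_PER_PB / (events_per_day * DAYS_PER_MONTH)


if __name__ == "__main__":
    print(sustained_events_per_day(65_000))                  # slide: 65,000 events / sec
    print(round(average_events_per_sec(3_000_000_000)))      # slide: 3 billion events per day
    print(round(average_bytes_per_event(1, 3_000_000_000)))  # slide: 1 PB streaming insert per month
```

Under these assumptions, holding 65,000 events/sec for a full day would give about 5.6 billion events, while 3 billion events per day averages about 34,700 events/sec. This suggests that 65,000 is closer to a peak rate than a daily average, although the slide does not say so. The same assumptions put an average streamed event at roughly 11 KB.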
  3. Architecture of the data analytics platform: Reactive Layer Track Compute Engine Autoscaling BigQuery Cloud Spanner

    Cloud Pub/Sub Cloud Dataflow Admin Compute Engine Autoscaling Analyze Compute Engine Autoscaling Cloud Bigtable 65,000 events / sec Redis Redis Cloud Bigtable Object Storage Salesforce Others... ELT Streaming Insert Client Cloud Storage 1 PB streaming insert per month Import / Export
  4. Real-time Offline Analyze Key - Value Store Data Lake Ingest

    Prep Training Data Store Feature Store Prepare, Pre-process, Ingest Train, Evaluation Batch Training Job Model Store Batch Predict Job Real-time Predict Service Predict Key - Value Store Prediction Data Deploy End-User End-User ML Architecture
  5. Real-time Offline Analyze Key - Value Store Data Lake Ingest

    Prep Training Data Store Feature Store Prepare, Pre-process, Ingest ML Architecture
  6. Real-time Offline Analyze Key - Value Store Data Lake Ingest

    Prep Training Data Store Feature Store Prepare, Pre-process, Ingest Train, Evaluation Batch Training Job Model Store ML Architecture
  7. Real-time Offline Analyze Key - Value Store Data Lake Ingest

    Prep Training Data Store Feature Store Prepare, Pre-process, Ingest Train, Evaluation Batch Training Job Model Store Batch Predict Job Real-time Predict Service ML Architecture Deploy
  8. Real-time Offline Analyze Key - Value Store Data Lake Ingest

    Prep Training Data Store Feature Store Prepare, Pre-process, Ingest Train, Evaluation Batch Training Job Model Store Batch Predict Job Real-time Predict Service Predict Key - Value Store Prediction Data Deploy End-User End-User ML Architecture
  9. Real-time Offline Analyze Key - Value Store Data Lake Ingest

    Prep Training Data Store Feature Store Prepare, Pre-process, Ingest Train, Evaluation Batch Training Job Model Store Batch Predict Job Real-time Predict Service Predict Key - Value Store Prediction Data Deploy End-User End-User ML Architecture
  10. Real-time Offline Analyze Key - Value Store Data Lake Ingest

    Prep Training Data Store Feature Store Ingest Train & Eval Batch Training Job Model Store Batch Predict Job Real-time Predict Service Predict Key - Value Store Prediction Data Deploy Predict End-User End-User Predict Implemented on GCP: Cloud Pub/Sub Cloud Bigtable BigQuery BigQuery BigQuery AI Platform Training Cloud AutoML BigQuery ML BigQuery Cloud Bigtable AI Platform Prediction Cloud AutoML API Cloud Dataflow Cloud Dataproc Cloud Dataprep BigQuery ELT Periodically Update Daily/Weekly/Monthly
  11. Real-time Offline Analyze Key - Value Store Data Lake Ingest

    Prep Training Data Store Feature Store Ingest Train & Eval Batch Training Job Model Store Batch Predict Job Real-time Predict Service Predict Key - Value Store Prediction Data Deploy Predict End-User End-User Predict Implemented on GCP: Cloud Pub/Sub Cloud Bigtable Periodically Update Daily/Weekly/Monthly
  12. Real-time Offline Analyze Key - Value Store Data Lake Ingest

    Prep Training Data Store Feature Store Ingest Train & Eval Batch Training Job Model Store Batch Predict Job Real-time Predict Service Predict Key - Value Store Prediction Data Deploy Predict End-User End-User Predict Implemented on GCP: Cloud Pub/Sub Cloud Bigtable Periodically Update Daily/Weekly/Monthly BigQuery BigQuery BigQuery BigQuery ELT
  13. Real-time Offline Analyze Key - Value Store Data Lake Ingest

    Prep Training Data Store Feature Store Ingest Train & Eval Batch Training Job Model Store Batch Predict Job Real-time Predict Service Predict Key - Value Store Prediction Data Deploy Predict End-User End-User Predict Implemented on GCP: Cloud Pub/Sub Cloud Bigtable Periodically Update Daily/Weekly/Monthly BigQuery BigQuery BigQuery BigQuery ELT Cloud Dataflow Cloud Dataproc Cloud Dataprep
  14. Real-time Offline Analyze Key - Value Store Data Lake Ingest

    Prep Training Data Store Feature Store Ingest Train & Eval Batch Training Job Model Store Batch Predict Job Real-time Predict Service Predict Key - Value Store Prediction Data Deploy Predict End-User End-User Predict Implemented on GCP: Cloud Pub/Sub Cloud Bigtable Periodically Update Daily/Weekly/Monthly BigQuery BigQuery BigQuery BigQuery ELT Cloud Dataflow Cloud Dataproc Cloud Dataprep AI Platform Training Cloud AutoML BigQuery ML
  15. Real-time Offline Analyze Key - Value Store Data Lake Ingest

    Prep Training Data Store Feature Store Ingest Train & Eval Batch Training Job Model Store Batch Predict Job Real-time Predict Service Predict Key - Value Store Prediction Data Deploy Predict End-User End-User Predict Implemented on GCP: Cloud Pub/Sub Cloud Bigtable Periodically Update Daily/Weekly/Monthly BigQuery BigQuery BigQuery BigQuery ELT Cloud Dataflow Cloud Dataproc Cloud Dataprep AI Platform Training Cloud AutoML BigQuery ML BigQuery Cloud Bigtable
  16. Real-time Offline Analyze Key - Value Store Data Lake Ingest

    Prep Training Data Store Feature Store Ingest Train & Eval Batch Training Job Model Store Batch Predict Job Real-time Predict Service Predict Key - Value Store Prediction Data Deploy Predict End-User End-User Predict Implemented on GCP: Cloud Pub/Sub Cloud Bigtable BigQuery BigQuery BigQuery AI Platform Training Cloud AutoML BigQuery ML BigQuery Cloud Bigtable AI Platform Prediction Cloud AutoML API Cloud Dataflow Cloud Dataproc Cloud Dataprep BigQuery ELT Periodically Update Daily/Weekly/Monthly
  17. A machine learning platform with BigQuery as its central data store. Before: BigQuery Cloud Dataflow BigQuery Cloud Bigtable Cloud

    Spanner AWS S3 Cloud Storage SFTP Treasure Data Salesforce Cloud Spreadsheet External Data ELT Import Export BigQuery
  18. A machine learning platform with BigQuery as its central data store. After: BigQuery Cloud Dataflow BigQuery Cloud Bigtable Cloud

    Spanner AWS S3 Cloud Storage SFTP Treasure Data Salesforce Cloud Spreadsheet External Data ELT Import Export BigQuery AI Platform Training Cloud AutoML BigQuery ML ML Predict
  19. Case 1: Clustering users with BigQuery ML k-means

    Cluster the site's users by RFM pattern. Recency: time since the last "purchase". Frequency: how often "purchases" occur. Monetary Value: the size of the "purchase" value.

    CREATE OR REPLACE MODEL `cloud_next_session.rfm_cluster`
    OPTIONS (model_type='kmeans', num_clusters=4, standardize_features = TRUE) AS
    SELECT * FROM (
      SELECT SUM(revenue) monetary, COUNT(*) frequency, MAX(_date) recency
      FROM `karte.events`
      WHERE event_name = 'buy'
      GROUP BY keys.visitor_id
    )
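
To make the feature construction in the query above concrete, here is a minimal Python sketch of the same per-visitor aggregation, followed by z-score standardization. The event dictionaries, the field names `visitor_id` and `date`, and the helper names are hypothetical stand-ins for the `karte.events` rows. Treating `standardize_features` as a plain population z-score and converting the date to an ordinal day number are both assumptions made for illustration; the clustering itself is left to BigQuery ML.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, pstdev

COLUMNS = ("monetary", "frequency", "recency")


def rfm_features(events):
    """Aggregate 'buy' events per visitor, mirroring the GROUP BY in the query."""
    grouped = defaultdict(list)
    for event in events:
        if event["event_name"] == "buy":
            grouped[event["visitor_id"]].append(event)
    features = {}
    for visitor_id, rows in grouped.items():
        features[visitor_id] = {
            "monetary": sum(r["revenue"] for r in rows),  # SUM(revenue)
            "frequency": len(rows),                       # COUNT(*)
            # MAX(_date), converted to an ordinal day number so it is numeric
            "recency": max(r["date"] for r in rows).toordinal(),
        }
    return features


def standardize(features):
    """Z-score every column (population standard deviation); constant columns map to 0."""
    stats = {}
    for col in COLUMNS:
        values = [f[col] for f in features.values()]
        stats[col] = (mean(values), pstdev(values))
    return {
        visitor_id: {
            col: (f[col] - stats[col][0]) / stats[col][1] if stats[col][1] else 0.0
            for col in COLUMNS
        }
        for visitor_id, f in features.items()
    }


if __name__ == "__main__":
    sample = [
        {"visitor_id": "a", "event_name": "buy", "revenue": 100, "date": date(2019, 7, 1)},
        {"visitor_id": "a", "event_name": "buy", "revenue": 50, "date": date(2019, 7, 20)},
        {"visitor_id": "b", "event_name": "buy", "revenue": 300, "date": date(2019, 6, 15)},
        {"visitor_id": "b", "event_name": "view", "revenue": 0, "date": date(2019, 7, 30)},
    ]
    print(standardize(rfm_features(sample)))
```

Standardizing matters here because revenue, purchase counts, and dates live on very different scales, and k-means distances would otherwise be dominated by the largest-valued column.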
  20. Case 3: Real-time purchase prediction with AI Platform

    https://aaa.com/category/items https://aaa.com/top https://aaa.com/category/item/XXXXXXX CNN CNN CNN Embedding feature vector CNN True / False TensorFlow
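
The slide does not say how the visited URLs are turned into model input. One common option, assumed here purely for illustration, is character-level encoding: each URL becomes a fixed-length sequence of integer ids that an embedding layer and a CNN could consume, and a session becomes a small grid of such sequences. The vocabulary, the length limits, and the helper names below are all hypothetical and not taken from the talk.

```python
import string

# Hypothetical character vocabulary: id 0 is padding, id 1 is "unknown character".
PAD_ID, UNK_ID = 0, 1
VOCAB = string.ascii_lowercase + string.digits + ":/.-_?=&%"
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(VOCAB)}

MAX_URL_LEN = 32  # hypothetical: characters kept per URL
MAX_URLS = 3      # hypothetical: most recent page views kept per session


def encode_url(url, max_len=MAX_URL_LEN):
    """Map a URL to exactly max_len integer ids, truncating or right-padding."""
    ids = [CHAR_TO_ID.get(ch, UNK_ID) for ch in url.lower()[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


def encode_session(urls, max_urls=MAX_URLS, max_len=MAX_URL_LEN):
    """Encode the latest page views as a max_urls x max_len grid, left-padded with empty rows."""
    rows = [encode_url(u, max_len) for u in urls[-max_urls:]]
    empty_rows = [[PAD_ID] * max_len for _ in range(max_urls - len(rows))]
    return empty_rows + rows


if __name__ == "__main__":
    session = ["https://aaa.com/top", "https://aaa.com/category/items"]
    for row in encode_session(session):
        print(row)
```

A grid of this shape could feed a per-URL CNN that emits one feature vector per page view, followed by a second CNN over the sequence that outputs the True / False purchase prediction, which is how the slide's diagram appears to be arranged.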
  21. Case 3: Real-time purchase prediction with AI Platform. BigQuery Cloud Dataflow Cloud

    Storage TensorFlow AI Platform Training Preprocess Train Deploy Ingest Realtime API
  22. Real-time Offline Analyze Key - Value Store Data Lake Ingest

    Prep Training Data Store Feature Store Ingest Train & Eval Batch Training Job Model Store Real-time Predict Service Predict Deploy Predict End-User Cloud Pub/Sub Cloud Bigtable BigQuery BigQuery BigQuery AI Platform Training Cloud AutoML AI Platform Prediction Cloud AutoML API Cloud Dataflow Cloud Dataproc Cloud Dataprep BigQuery ELT Feature Feature Real-time Pattern
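  23. Real-time prediction or batch prediction?

    Real-time prediction
    Pros
    - High model update frequency
    - Suited to cases where prediction results fluctuate widely
    - Computes only when needed
    Cons
    - The features that can be used are limited

    Batch prediction
    Pros
    - Complex features can be used
    - Simple architecture
    - Prediction results can be inspected
    Cons
    - Low model update frequency
    - Predictions must be computed in advance for every possible target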
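
The trade-off listed above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the architecture from the talk: `toy_model` is a hypothetical rule standing in for a trained model, and a plain dict stands in for a key-value store such as Cloud Bigtable.

```python
from typing import Callable, Dict, Optional

Features = Dict[str, float]
Model = Callable[[Features], bool]


def toy_model(features: Features) -> bool:
    """Hypothetical stand-in for a trained model (not from the talk)."""
    return features.get("views", 0) >= 3


class BatchPredictor:
    """Precompute predictions for every known user, then serve by key lookup."""

    def __init__(self, model: Model) -> None:
        self.model = model
        self.store: Dict[str, bool] = {}  # stands in for a key-value store

    def run_batch_job(self, feature_table: Dict[str, Features]) -> None:
        # Cost: one prediction per possible target, whether or not it is ever requested.
        for user_id, features in feature_table.items():
            self.store[user_id] = self.model(features)

    def predict(self, user_id: str) -> Optional[bool]:
        # Results can be inspected before serving; users unseen by the job get None.
        return self.store.get(user_id)


class RealtimePredictor:
    """Run the model at request time on whatever features are available right then."""

    def __init__(self, model: Model) -> None:
        self.model = model
        self.calls = 0

    def predict(self, features: Features) -> bool:
        self.calls += 1  # cost is paid only for requests that actually arrive
        return self.model(features)


if __name__ == "__main__":
    batch = BatchPredictor(toy_model)
    batch.run_batch_job({"u1": {"views": 5}, "u2": {"views": 1}})
    print(batch.predict("u1"), batch.predict("u3"))

    realtime = RealtimePredictor(toy_model)
    print(realtime.predict({"views": 4}), realtime.calls)
```

The batch path pays for every entry in the feature table up front and returns nothing for a user it has not seen, whereas the real-time path computes only per request but can use only the features that are ready at that moment.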