D2-3-S09: Realizing a machine learning platform that uses large-scale real-time data at 60,000 events per second (excerpt)

PLAID
August 01, 2019

Transcript

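  1. Yuki Makino (牧野 祐己), CTO, PLAID. August 1, 2019. D2-3-S09:

    Realizing a machine learning platform that uses large-scale real-time data at 60,000 events per second. Note: this deck is an excerpt of the slides presented on the day.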
  2. 02 Real-time data analytics platform and learning platform

  3. Yuki Makino (牧野 祐己), Chief Technology Officer

  4. at GINZA SIX

  5. Customer Experience Platform Real-time Analytics & Action Tool karte.io

  6. How much real-time data do we process?

    - 65,000 events (rows) / sec
    - 3 billion events per day
    - 2+ PB stored in BigQuery
    - 1 PB streaming insert per month
    - 0.x sec to real-time actions
    - 60+ PB analysis per month
    - 6,000 slots with flat rate (US & JP)
    - 500+ datasets for clients
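As a rough sanity check of the figures above, the sketch below converts between per-second and per-day rates. Reading 65,000 events/sec as a peak rate and 3 billion events/day as a daily total is an assumption, not something the slide states, and the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_events_per_sec(events_per_day: float) -> float:
    """Average rate implied by a daily event total."""
    return events_per_day / SECONDS_PER_DAY


def events_per_day_at(rate_per_sec: float) -> float:
    """Daily total if a per-second rate were sustained for a full day."""
    return rate_per_sec * SECONDS_PER_DAY


if __name__ == "__main__":
    avg = average_events_per_sec(3e9)      # slide figure: 3 billion events per day
    sustained = events_per_day_at(65_000)  # slide figure: 65,000 events per second
    print(f"average rate: {avg:,.0f} events/sec")
    print(f"65,000 events/sec sustained: {sustained:,.0f} events/day")
```

Under that assumption, 3 billion events per day averages to roughly 35,000 events/sec, so 65,000 events/sec would be consistent with a peak rather than a sustained rate.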
  7. Architecture of the data analytics platform (diagram)

    - Compute Engine (Autoscaling) groups: Track, Admin, Analyze
    - Reactive Layer, Client
    - Cloud Pub/Sub, Cloud Dataflow
    - BigQuery, Cloud Spanner, Cloud Bigtable (shown twice), Redis (shown twice), Cloud Storage
    - Object Storage, Salesforce, Others...
    - Flow labels: ELT, Streaming Insert, Import / Export
    - Annotations: 65,000 events / sec; 1 PB streaming insert per month
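  8. Key points for putting data to use • Load all the data in first, then think • Start by thinking about how the data will be used • When data is put to use, it must be controllable by the client • Try using it first, then adjust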

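  9. The lifecycle realized by the machine learning platform

    1. Data ingestion and preprocessing
    2. Training and validation
    3. Model deployment
    4. Prediction and action

    Keep this cycle turning. Stages: Prepare, Pre-process, Ingest, Train, Evaluation, Deploy, Predict.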
  10. ML Architecture (diagram), split into Real-time and Offline

    - Prepare, Pre-process, Ingest: Ingest, Analyze, Key-Value Store, Data Lake, Prep, Training Data Store, Feature Store
    - Train, Evaluation: Batch Training Job, Model Store
    - Deploy: Batch Predict Job, Real-time Predict Service
    - Predict: Key-Value Store, Prediction Data, End-User
  11. ML Architecture (diagram build), split into Real-time and Offline. Stage shown: Prepare, Pre-process, Ingest (Ingest, Analyze, Key-Value Store, Data Lake, Prep, Training Data Store, Feature Store).

  12. ML Architecture (diagram build). Adds the Train, Evaluation stage: Batch Training Job, Model Store.

  13. ML Architecture (diagram build). Adds the Deploy stage: Batch Predict Job, Real-time Predict Service.

  14. ML Architecture (complete diagram, identical to slide 10). Adds the Predict stage: Key-Value Store, Prediction Data, End-User.
  15. 03 Implementation on GCP and use cases

  16. ML Architecture (complete diagram, identical to slide 10).
  17. Implemented on GCP: the ML Architecture diagram (Ingest, Analyze, Key-Value Store, Data Lake, Prep, Training Data Store, Feature Store, Train & Eval, Batch Training Job, Model Store, Deploy, Batch Predict Job, Real-time Predict Service, Predict, Prediction Data, End-User) annotated with GCP services

    - Cloud Pub/Sub, Cloud Bigtable
    - BigQuery (repeated), ELT
    - Cloud Dataflow, Cloud Dataproc, Cloud Dataprep
    - AI Platform Training, Cloud AutoML, BigQuery ML
    - BigQuery, Cloud Bigtable
    - AI Platform Prediction, Cloud AutoML API
    - Periodically Update: Daily/Weekly/Monthly
  18. Implemented on GCP (diagram build). Services shown: Cloud Pub/Sub, Cloud Bigtable. Annotation: Periodically Update (Daily/Weekly/Monthly).

  19. Implemented on GCP (diagram build). Adds BigQuery (repeated) and ELT.

  20. Implemented on GCP (diagram build). Adds Cloud Dataflow, Cloud Dataproc, Cloud Dataprep.

  21. Implemented on GCP (diagram build). Adds AI Platform Training, Cloud AutoML, BigQuery ML.

  22. Implemented on GCP (diagram build). Adds BigQuery and Cloud Bigtable.

  23. Implemented on GCP (complete diagram, identical to slide 17). Adds AI Platform Prediction and Cloud AutoML API.
  24. A machine learning platform with BigQuery as the central data store: Before (diagram). Labels: BigQuery, Cloud Dataflow, Cloud Bigtable, Cloud Spanner, AWS S3, Cloud Storage, SFTP, Treasure Data, Salesforce Cloud Spreadsheet, External Data, ELT, Import, Export.

  25. A machine learning platform with BigQuery as the central data store: After (diagram). Same labels as Before, plus AI Platform Training, Cloud AutoML, BigQuery ML, ML Predict.
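  26. Key points for putting data to use • Load all the data in first, then think • Start by thinking about how the data will be used • When data is put to use, it must be controllable by the client • Try using it first, then adjust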

  27. Case 1: Clustering users with BigQuery ML k-means

    Cluster the site's users by their RFM pattern:
    - Recency: time since the last "purchase"
    - Frequency: how often they "purchase"
    - Monetary Value: how large the value of their "purchases" is

        CREATE OR REPLACE MODEL `cloud_next_session.rfm_cluster`
        OPTIONS (model_type='kmeans', num_clusters=4, standardize_features = TRUE) AS
        SELECT * FROM (
          SELECT
            SUM(revenue) monetary,
            COUNT(*) frequency,
            MAX(_date) recency
          FROM `karte.events`
          WHERE event_name = 'buy'
          GROUP BY keys.visitor_id
        )
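To make the RFM features concrete, here is a minimal Python sketch of the same per-visitor aggregation that the inner SELECT performs, followed by a z-score standardization of one feature. The event dictionaries, their field names, and the use of population z-scores are illustrative assumptions; the sketch does not reproduce how BigQuery ML standardizes features or runs k-means.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, pstdev


def rfm_features(events):
    """Aggregate 'buy' events per visitor into monetary, frequency, recency."""
    grouped = defaultdict(list)
    for e in events:
        if e["event_name"] == "buy":
            grouped[e["visitor_id"]].append(e)
    return {
        visitor: {
            "monetary": sum(e["revenue"] for e in rows),  # SUM(revenue)
            "frequency": len(rows),                       # COUNT(*)
            "recency": max(e["date"] for e in rows),      # MAX(_date)
        }
        for visitor, rows in grouped.items()
    }


def standardize(values):
    """Z-score a list of numbers; a constant column maps to 0.0."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


# Hypothetical sample events (not from the talk).
events = [
    {"visitor_id": "a", "event_name": "buy", "revenue": 100, "date": date(2019, 7, 1)},
    {"visitor_id": "a", "event_name": "buy", "revenue": 50, "date": date(2019, 7, 20)},
    {"visitor_id": "b", "event_name": "view", "revenue": 0, "date": date(2019, 7, 21)},
    {"visitor_id": "b", "event_name": "buy", "revenue": 300, "date": date(2019, 7, 10)},
]

features = rfm_features(events)
monetary_z = standardize([f["monetary"] for f in features.values()])
```

Standardizing matters for k-means because revenue totals and purchase counts live on very different scales, and an unscaled distance would be dominated by the larger one.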
  28. Cluster users and use the clusters for segmentation

  29. Visualize with a BI tool

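  30. Case 3: Real-time purchase prediction with AI Platform

    Diagram: the URLs https://aaa.com/category/items, https://aaa.com/top, and https://aaa.com/category/item/X (shown with "XXXXXX") each pass through a CNN, are combined into an Embedding feature vector, and a further CNN outputs True / False. Built with TensorFlow.
  31. Case 3: Real-time purchase prediction with AI Platform

    Diagram labels: BigQuery, Cloud Dataflow, Cloud Storage, TensorFlow, AI Platform Training; stages Ingest, Preprocess, Train, Deploy; Realtime API.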
  32. (No text on this slide.)
  33. Real-time Pattern (diagram). Architecture labels: Ingest, Analyze, Key-Value Store, Data Lake, Prep, Training Data Store, Feature Store, Train & Eval, Batch Training Job, Model Store, Deploy, Real-time Predict Service, Predict, End-User, Feature (shown twice).

    GCP services: Cloud Pub/Sub, Cloud Bigtable, BigQuery (repeated), ELT, Cloud Dataflow, Cloud Dataproc, Cloud Dataprep, AI Platform Training, Cloud AutoML, AI Platform Prediction, Cloud AutoML API.
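  34. Real-time prediction or batch prediction?

    Real-time prediction
    Pros
    - The model is updated frequently
    - Suits cases where prediction results fluctuate widely
    - Computes only when needed
    Cons
    - The features that can be used are limited

    Batch prediction
    Pros
    - Complex features can be used
    - The architecture is simple
    - Prediction results can be checked
    Cons
    - The model is updated infrequently
    - Predictions must be computed in advance for every possible target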
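The trade-off above can be illustrated with a toy Python sketch: the batch path scores every known user ahead of time and serves lookups from a key-value store, while the real-time path scores only on request, using whatever features are available at that moment. The scoring function, the feature names, and the in-memory dict standing in for a key-value store are all hypothetical and are not taken from the talk.

```python
def score(features):
    """Toy purchase-propensity score in [0, 1]; a stand-in for a trained model."""
    raw = 0.1 * features.get("page_views", 0) + 0.3 * features.get("cart_adds", 0)
    return min(1.0, raw)


def batch_predict(feature_store):
    """Batch pattern: precompute a prediction for every candidate user."""
    return {user_id: score(features) for user_id, features in feature_store.items()}


def realtime_predict(request_features):
    """Real-time pattern: compute only when a request arrives."""
    return score(request_features)


# Offline feature store with richer, precomputed features (hypothetical data).
feature_store = {
    "u1": {"page_views": 3, "cart_adds": 1},
    "u2": {"page_views": 1, "cart_adds": 0},
}

prediction_kvs = batch_predict(feature_store)          # written periodically, e.g. daily
served_from_batch = prediction_kvs["u1"]               # cheap lookup at serving time
served_realtime = realtime_predict({"page_views": 5})  # only features present in the request
```

The batch dictionary can be inspected before anything is served, which mirrors the "prediction results can be checked" advantage, at the cost of scoring users who may never return.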
  35. 04 Summary

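  36. Summary • Built a machine learning platform on top of the real-time data analytics platform • Implemented it simply by using a range of GCP data and machine learning services • Introduced three cases, real-time and batch, that are in actual use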
  37. Thank you