D2-3-S09_秒間_6_万イベントの大規模リアルタイムデータを活用する機械学習基盤の実現__抜粋版_.pdf

PLAID
August 01, 2019

Transcript

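  1. Yuki Makino (牧野 祐己), CTO, PLAID
    August 1, 2019
    D2-3-S09:
    Building a machine learning platform that uses
    large-scale real-time data at 60,000 events per second
    Note: this deck is an excerpt from the slides presented on the day.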

  2. 02
    Real-time data analytics platform and learning platform

  3. Yuki Makino (牧野 祐己)
    Chief Technology Officer

  4. at GINZA SIX

  5. Customer Experience Platform
    Real-time Analytics & Action Tool
    karte.io

  6. How much real-time data do we process?
    65,000 events (rows) / sec
    3 billion events per day
    2+ PB stored in BigQuery
    1 PB streaming insert per month
    0.x sec to real-time actions
    60+ PB analysis per month
    6,000 slots with flat rate (US & JP)
    500+ datasets for clients
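
The figures on this slide can be cross-checked with a little arithmetic. The sketch below is not from the talk. It assumes a 30-day month, decimal petabytes, and that the 1 PB of streaming inserts covers the same events as the 3 billion per day. Under those assumptions the daily total implies an average well below 65,000 events per second, which suggests (but does not confirm) that 65,000 is closer to a peak rate.

```python
# Figures quoted on the slide.
EVENTS_PER_SEC = 65_000
EVENTS_PER_DAY = 3_000_000_000
STREAMING_BYTES_PER_MONTH = 1e15  # 1 PB, assuming decimal petabytes

SECONDS_PER_DAY = 24 * 60 * 60

# Average rate implied by the daily total.
avg_events_per_sec = EVENTS_PER_DAY / SECONDS_PER_DAY

# Daily volume if the per-second figure were sustained all day.
sustained_events_per_day = EVENTS_PER_SEC * SECONDS_PER_DAY

# Rough bytes per event, assuming a 30-day month and that the 1 PB
# covers the same events as the 3 billion/day figure.
bytes_per_event = STREAMING_BYTES_PER_MONTH / (EVENTS_PER_DAY * 30)

print(f"average rate: {avg_events_per_sec:,.0f} events/sec")
print(f"65,000/sec sustained: {sustained_events_per_day / 1e9:.2f} billion events/day")
print(f"approx. size: {bytes_per_event / 1e3:.1f} kB/event")
```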

  7. Architecture of the data analytics platform
    [Diagram] Labels: Reactive Layer; Track, Admin, and Analyze (each on Compute Engine with Autoscaling)
    Data services: Cloud Pub/Sub, Cloud Dataflow, Cloud Bigtable, Cloud Spanner, Redis, BigQuery, Cloud Storage
    External: Client, Object Storage, Salesforce, Others...
    Flows: Streaming Insert, ELT, Import / Export
    Annotations: 65,000 events / sec; 1 PB streaming insert per month

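  8. Key points for putting data to use
    ● Load all the data in first, then think
    ● Start by thinking about how the data will be used
    ● When the data is put to use, the client must be able to control it
    ● Just try using it, then adjust as you go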

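  9. The lifecycle realized by the machine learning platform
    1. Data ingestion and preprocessing (Prepare, Pre-process, Ingest)
    2. Training and validation (Train, Evaluation)
    3. Model deployment (Deploy)
    4. Prediction and action (Predict)
    Keep this cycle turning.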
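
The four-step lifecycle on this slide can be sketched as a single loop. This is a toy illustration, not PLAID's implementation. The function names, the `visits` and `bought` fields, the threshold "model", and the `min_accuracy` gate are all hypothetical.

```python
from statistics import mean

def prepare(raw_rows):
    """1. Ingest and pre-process: drop incomplete rows, return (feature, label) pairs."""
    return [(r["visits"], r["bought"]) for r in raw_rows if r.get("visits") is not None]

def train(examples):
    """2a. Train: a toy model that thresholds at the midpoint between class means."""
    pos = [x for x, y in examples if y]
    neg = [x for x, y in examples if not y]
    return {"threshold": (mean(pos) + mean(neg)) / 2}

def evaluate(model, examples):
    """2b. Evaluate: accuracy. A real job would score held-out data, not the training set."""
    hits = sum((x >= model["threshold"]) == y for x, y in examples)
    return hits / len(examples)

def predict(model, x):
    """4. Predict: True means 'likely to buy'."""
    return x >= model["threshold"]

def run_cycle(raw_rows, deployed, min_accuracy=0.8):
    """One turn of the cycle: prepare, train, evaluate, then deploy if good enough."""
    examples = prepare(raw_rows)
    candidate = train(examples)
    accuracy = evaluate(candidate, examples)
    if accuracy >= min_accuracy:
        deployed = candidate  # 3. Deploy: replace the serving model
    return deployed, accuracy

rows = [
    {"visits": 1, "bought": False},
    {"visits": 2, "bought": False},
    {"visits": 8, "bought": True},
    {"visits": 9, "bought": True},
    {"visits": None, "bought": True},  # incomplete row, dropped in prepare()
]
model, accuracy = run_cycle(rows, deployed=None)
print(accuracy, predict(model, 7))
```

Calling `run_cycle` again with fresh rows and the currently deployed model is what "keeping the cycle turning" amounts to in this sketch.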

  10. ML Architecture
    [Diagram] Two lanes, Real-time and Offline, across four stages:
    Prepare, Pre-process, Ingest: Analyze, Key-Value Store, Data Lake, Ingest / Prep, Training Data Store, Feature Store
    Train, Evaluation: Batch Training Job, Model Store
    Deploy: Batch Predict Job, Real-time Predict Service
    Predict: Key-Value Store (Prediction Data), End-User
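
One way to read the ML Architecture diagram is as data moving between a handful of stores. The sketch below uses in-memory dictionaries as stand-ins for those stores. Every name, the event schema, and the fixed-weight "model" are assumptions made for illustration, not details from the talk.

```python
# In-memory stand-ins for the stores in the diagram (all hypothetical).
data_lake = [
    {"visitor_id": "a", "event": "view"},
    {"visitor_id": "a", "event": "buy"},
    {"visitor_id": "b", "event": "view"},
    {"visitor_id": "a", "event": "view"},
]
feature_store = {}     # visitor_id -> feature dict
model_store = {}       # model name -> model
prediction_store = {}  # key-value store read at serving time

def build_features(events):
    """Ingest / Prep: aggregate raw events into per-visitor features."""
    features = {}
    for e in events:
        f = features.setdefault(e["visitor_id"], {"views": 0, "buys": 0})
        if e["event"] == "view":
            f["views"] += 1
        elif e["event"] == "buy":
            f["buys"] += 1
    return features

def batch_training_job(features):
    """Toy stand-in: fixed weights. A real job would fit them to the training data."""
    return {"view_weight": 1, "buyer_bonus": 5}

def score(model, f):
    """Integer score: views plus a bonus for visitors who have bought before."""
    return model["view_weight"] * f["views"] + model["buyer_bonus"] * (f["buys"] > 0)

def batch_predict_job(model, features):
    """Offline path: precompute a prediction for every known visitor."""
    return {vid: score(model, f) for vid, f in features.items()}

def realtime_predict_service(model, visitor_id):
    """Real-time path: score on request from the current features."""
    return score(model, feature_store[visitor_id])

feature_store.update(build_features(data_lake))
model_store["purchase"] = batch_training_job(feature_store)
prediction_store.update(batch_predict_job(model_store["purchase"], feature_store))

print(prediction_store)
print(realtime_predict_service(model_store["purchase"], "b"))
```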

  11. ML Architecture
    [Diagram] Real-time and Offline lanes, showing the first stage only:
    Prepare, Pre-process, Ingest: Analyze, Key-Value Store, Data Lake, Ingest / Prep, Training Data Store, Feature Store

  12. ML Architecture
    [Diagram] Real-time and Offline lanes, showing the first two stages:
    Prepare, Pre-process, Ingest: Analyze, Key-Value Store, Data Lake, Ingest / Prep, Training Data Store, Feature Store
    Train, Evaluation: Batch Training Job, Model Store

  13. ML Architecture
    [Diagram] Real-time and Offline lanes, showing the first three stages:
    Prepare, Pre-process, Ingest: Analyze, Key-Value Store, Data Lake, Ingest / Prep, Training Data Store, Feature Store
    Train, Evaluation: Batch Training Job, Model Store
    Deploy: Batch Predict Job, Real-time Predict Service

  14. ML Architecture
    [Diagram] Real-time and Offline lanes, showing all four stages:
    Prepare, Pre-process, Ingest: Analyze, Key-Value Store, Data Lake, Ingest / Prep, Training Data Store, Feature Store
    Train, Evaluation: Batch Training Job, Model Store
    Deploy: Batch Predict Job, Real-time Predict Service
    Predict: Key-Value Store (Prediction Data), End-User

  15. 03
    Implementation on GCP and use cases

  16. ML Architecture
    [Diagram] Real-time and Offline lanes, showing all four stages:
    Prepare, Pre-process, Ingest: Analyze, Key-Value Store, Data Lake, Ingest / Prep, Training Data Store, Feature Store
    Train, Evaluation: Batch Training Job, Model Store
    Deploy: Batch Predict Job, Real-time Predict Service
    Predict: Key-Value Store (Prediction Data), End-User

  17. Implemented on GCP
    [Diagram] The ML Architecture diagram, with stages relabeled Ingest, Train & Eval, Deploy, Predict, annotated with GCP services:
    Cloud Pub/Sub, Cloud Bigtable, BigQuery, BigQuery ELT, Cloud Dataflow, Cloud Dataproc, Cloud Dataprep, AI Platform Training, Cloud AutoML, BigQuery ML, AI Platform Prediction, Cloud AutoML API
    Annotation: Periodically Update (Daily / Weekly / Monthly)

  18. Implemented on GCP
    [Diagram] The ML Architecture diagram (stages: Ingest, Train & Eval, Deploy, Predict), build-up step.
    GCP services shown: Cloud Pub/Sub, Cloud Bigtable
    Annotation: Periodically Update (Daily / Weekly / Monthly)

  19. Implemented on GCP
    [Diagram] The ML Architecture diagram (stages: Ingest, Train & Eval, Deploy, Predict), build-up step.
    GCP services shown: Cloud Pub/Sub, Cloud Bigtable, BigQuery, BigQuery ELT
    Annotation: Periodically Update (Daily / Weekly / Monthly)

  20. Implemented on GCP
    [Diagram] The ML Architecture diagram (stages: Ingest, Train & Eval, Deploy, Predict), build-up step.
    GCP services shown: Cloud Pub/Sub, Cloud Bigtable, BigQuery, BigQuery ELT, Cloud Dataflow, Cloud Dataproc, Cloud Dataprep
    Annotation: Periodically Update (Daily / Weekly / Monthly)

  21. Implemented on GCP
    [Diagram] The ML Architecture diagram (stages: Ingest, Train & Eval, Deploy, Predict), build-up step.
    GCP services shown: Cloud Pub/Sub, Cloud Bigtable, BigQuery, BigQuery ELT, Cloud Dataflow, Cloud Dataproc, Cloud Dataprep, AI Platform Training, Cloud AutoML, BigQuery ML
    Annotation: Periodically Update (Daily / Weekly / Monthly)

  22. Implemented on GCP
    [Diagram] The ML Architecture diagram (stages: Ingest, Train & Eval, Deploy, Predict), build-up step.
    GCP services shown: Cloud Pub/Sub, Cloud Bigtable, BigQuery, BigQuery ELT, Cloud Dataflow, Cloud Dataproc, Cloud Dataprep, AI Platform Training, Cloud AutoML, BigQuery ML, plus additional BigQuery and Cloud Bigtable boxes
    Annotation: Periodically Update (Daily / Weekly / Monthly)

  23. Implemented on GCP
    [Diagram] The ML Architecture diagram, with stages relabeled Ingest, Train & Eval, Deploy, Predict, annotated with GCP services:
    Cloud Pub/Sub, Cloud Bigtable, BigQuery, BigQuery ELT, Cloud Dataflow, Cloud Dataproc, Cloud Dataprep, AI Platform Training, Cloud AutoML, BigQuery ML, AI Platform Prediction, Cloud AutoML API
    Annotation: Periodically Update (Daily / Weekly / Monthly)

  24. A machine learning platform with BigQuery as the central data store: Before
    [Diagram] Components: BigQuery, Cloud Dataflow, Cloud Bigtable, Cloud Spanner
    External Data: AWS S3, Cloud Storage, SFTP, Treasure Data, Salesforce Cloud, Spreadsheet
    Flows: ELT, Import, Export

  25. A machine learning platform with BigQuery as the central data store: After
    [Diagram] Components: BigQuery, Cloud Dataflow, Cloud Bigtable, Cloud Spanner
    External Data: AWS S3, Cloud Storage, SFTP, Treasure Data, Salesforce Cloud, Spreadsheet
    Flows: ELT, Import, Export
    Added for ML and Predict: AI Platform Training, Cloud AutoML, BigQuery ML

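  26. Key points for putting data to use
    ● Load all the data in first, then think
    ● Start by thinking about how the data will be used
    ● When the data is put to use, the client must be able to control it
    ● Just try using it, then adjust as you go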

  27. Case 1: Clustering users with BigQuery ML k-means
    Cluster the site's users by RFM pattern:
    Recency: time since the last "purchase"
    Frequency: how often "purchases" occur
    Monetary Value: how much the "purchases" are worth

    CREATE OR REPLACE MODEL
      `cloud_next_session.rfm_cluster`
    OPTIONS
      (model_type = 'kmeans',
       num_clusters = 4,
       standardize_features = TRUE) AS
    SELECT
      *
    FROM (
      SELECT
        SUM(revenue) AS monetary,   -- total purchase value per visitor
        COUNT(*) AS frequency,      -- number of purchase events
        MAX(_date) AS recency       -- date of the most recent purchase
      FROM
        `karte.events`
      WHERE
        event_name = 'buy'
      GROUP BY
        keys.visitor_id )
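
To make the `model_type='kmeans'` and `standardize_features` options in the query above more concrete, here is a minimal pure-Python sketch of standardization followed by Lloyd's k-means. It is not BigQuery ML's implementation. The sample rows are made up, recency is expressed as days since the last purchase for simplicity, and the sketch uses two clusters rather than the four on the slide because it has only four rows.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Scale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [tuple((v - m) / s for v, m, s in zip(r, mus, sigmas)) for r in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm, using the first k points as initial centroids."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for each point.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(mean(c) for c in zip(*members))
    return labels

# Hypothetical per-visitor rows: (monetary, frequency, days_since_last_purchase)
rfm = [
    (50000, 12, 2),
    (300, 1, 200),
    (48000, 10, 5),
    (500, 1, 180),
]
labels = kmeans(standardize(rfm), k=2)
print(labels)
```

Without standardization the monetary column would dominate the distance, which is presumably why the query enables `standardize_features`.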

  28. Cluster users and use the clusters for segmentation

  29. Visualize with a BI tool

  30. Case 3: Real-time purchase prediction with AI Platform
    [Diagram] TensorFlow model
    Inputs: https://aaa.com/category/items, https://aaa.com/top, https://aaa.com/category/item/X, XXXXXX
    Layers: CNN (×3), Embedding feature vector, CNN
    Output: True / False
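
One possible reading of this diagram is that each visited URL passes through a character-level CNN to produce a feature vector, and a second CNN over the sequence of vectors yields the purchase decision. The sketch below shows only that data flow and its shapes. It uses plain Python rather than TensorFlow and random, untrained weights. The layer sizes and all function names are assumptions, not details from the talk.

```python
import math
import random

rng = random.Random(0)  # fixed seed; the weights are random, not trained

EMBED_DIM, FILTERS, KERNEL = 4, 3, 3

def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

char_embedding = rand_matrix(128, EMBED_DIM)  # one row per ASCII code
url_filters = [rand_matrix(KERNEL, EMBED_DIM) for _ in range(FILTERS)]
seq_filters = [rand_matrix(2, FILTERS) for _ in range(FILTERS)]
out_weights = [rng.uniform(-1, 1) for _ in range(FILTERS)]

def conv1d_maxpool(seq, filters):
    """Slide each filter over the sequence, apply ReLU, keep the max per filter."""
    width = len(filters[0])
    pooled = []
    for f in filters:
        best = 0.0  # ReLU floor
        for start in range(len(seq) - width + 1):
            window = seq[start:start + width]
            act = sum(w * x for frow, xrow in zip(f, window) for w, x in zip(frow, xrow))
            best = max(best, act)
        pooled.append(best)
    return pooled

def embed_url(url):
    """URL string -> fixed-length feature vector via a character-level CNN."""
    chars = [char_embedding[ord(c) % 128] for c in url]
    return conv1d_maxpool(chars, url_filters)

def predict_purchase(urls):
    """Sequence of visited URLs -> score in (0, 1); above 0.5 would read as True."""
    vectors = [embed_url(u) for u in urls]
    pooled = conv1d_maxpool(vectors, seq_filters)
    logit = sum(w * x for w, x in zip(out_weights, pooled))
    return 1 / (1 + math.exp(-logit))

session = [
    "https://aaa.com/top",
    "https://aaa.com/category/items",
    "https://aaa.com/category/item/X",
]
print(predict_purchase(session))
```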

  31. Case 3: Real-time purchase prediction with AI Platform
    [Diagram] Pipeline stages: Ingest, Preprocess, Train, Deploy
    Services: BigQuery, Cloud Dataflow, Cloud Storage, TensorFlow, AI Platform Training, Realtime API

  32. [Slide with no text]

  33. Real-time Pattern
    [Diagram] The ML Architecture diagram reduced to the real-time path (stages: Ingest, Train & Eval, Deploy, Predict).
    Components: Analyze, Key-Value Store, Data Lake, Ingest / Prep, Training Data Store, Feature Store, Batch Training Job, Model Store, Real-time Predict Service, End-User
    GCP services: Cloud Pub/Sub, Cloud Bigtable, BigQuery, BigQuery ELT, Cloud Dataflow, Cloud Dataproc, Cloud Dataprep, AI Platform Training, Cloud AutoML, AI Platform Prediction, Cloud AutoML API
    Arrow labels: Feature (×2)

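  34. Real-time prediction or batch prediction?
    Real-time prediction
    Pros
    - The model can be updated frequently
    - Suits cases where prediction results fluctuate widely
    - Computes only when needed
    Cons
    - The features that can be used are limited
    Batch prediction
    Pros
    - Complex features can be used
    - The architecture is simple
    - Prediction results can be checked
    Cons
    - The model is updated less frequently
    - Every possible target must be computed in advance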
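
Two of the trade-offs listed on this slide, how much computation each approach does and how fresh its answers are, can be illustrated with a small simulation. The visitor data, the `expensive_model` stand-in, and the request list are hypothetical.

```python
calls = {"n": 0}

def expensive_model(features):
    """Stand-in for model inference; counts how often it runs."""
    calls["n"] += 1
    return features["views"] >= 3

visitors = {f"v{i}": {"views": i} for i in range(1000)}
requests = ["v2", "v5", "v5", "v999"]  # visitors who actually show up

# Batch prediction: score every possible target ahead of time, then look up.
calls["n"] = 0
table = {vid: expensive_model(f) for vid, f in visitors.items()}
batch_answers = [table[vid] for vid in requests]
batch_calls = calls["n"]

# Real-time prediction: score only when a request arrives.
calls["n"] = 0
realtime_answers = [expensive_model(visitors[vid]) for vid in requests]
realtime_calls = calls["n"]

print(batch_calls, realtime_calls)

# A visitor's behaviour changes after the batch run.
visitors["v2"]["views"] = 10
stale = table["v2"]                      # still the precomputed value
fresh = expensive_model(visitors["v2"])  # reflects the new features
print(stale, fresh)
```

The batch table can be inspected before it is served, which matches the "results can be checked" point, but it goes stale as soon as behaviour changes. The real-time path stays current at the cost of running inference on every request.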

  35. 04
    Summary

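  36. Summary
    Built a machine learning platform on top of the real-time data analytics platform
    Implemented it simply by using GCP's various data and machine learning services
    Introduced three real-time and batch use cases that are in actual use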

  37. Thank you
