LINE Developers
March 07, 2019
Anonymize Large-scale Sparse User Features at LINE Corp
2019/3/7 Machine Learning Production Pitch #1
Yeo Chaerim
Transcript
ANONYMIZE LARGE-SCALE SPARSE USER FEATURES AT LINE CORP
CHAERIM YEO, LINE CORPORATION
MACHINE LEARNING PRODUCTION PITCH #1, 2019/03/07
ABOUT ME
• Chaerim Yeo (呂 彩林)
• 2018.12 ~ LINE Corporation
• Account Platform Development Dept.
• Ad performance optimization
AGENDA
• Z-Features
• Y-Features
• Evaluation
• Conclusion
Z-FEATURES
WHAT ARE Z-FEATURES
BENEFIT OF Z-FEATURES
• Reusable
• Flexible
LIMITATION OF Z-FEATURES
• Human interpretable
• Extremely sparse
Y-FEATURES
BEYOND Z-FEATURES
• Obfuscation
• Dimensionality reduction
Both while preserving as much information as possible.
Approach: SCDV (https://arxiv.org/abs/1612.06778)
OVERVIEW OF SCDV
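The slide content for this overview did not survive in the transcript, so the following is only a minimal sketch of the composition step described in the cited SCDV paper, not the speaker's implementation. It assumes the embeddings, the soft-cluster probabilities P(c_k | w) (for example from a Gaussian mixture model), and the IDF weights have already been computed; all function names and toy values are hypothetical. Treating each user as a "document" and each feature as a "word" would be one way to apply it here, but that mapping is an assumption and is not stated in the transcript.

```python
from typing import Dict, List, Sequence


def word_topic_vector(vec: Sequence[float],
                      cluster_probs: Sequence[float],
                      idf: float) -> List[float]:
    """Return idf * concat_k(P(c_k | w) * vec), a vector of length K * d."""
    return [idf * p * x for p in cluster_probs for x in vec]


def scdv_document_vector(tokens: Sequence[str],
                         word_vecs: Dict[str, Sequence[float]],
                         cluster_probs: Dict[str, Sequence[float]],
                         idf: Dict[str, float]) -> List[float]:
    """Sum the word-topic vectors of every known token in one document."""
    first = next(iter(word_vecs))
    dim = len(word_vecs[first]) * len(cluster_probs[first])
    doc = [0.0] * dim
    for tok in tokens:
        if tok not in word_vecs:
            continue  # unknown tokens are skipped (a choice made for this sketch)
        wtv = word_topic_vector(word_vecs[tok], cluster_probs[tok], idf[tok])
        doc = [a + b for a, b in zip(doc, wtv)]
    return doc


def sparsify(doc_vecs: Sequence[Sequence[float]],
             percent: float) -> List[List[float]]:
    """Zero entries whose magnitude is below percent% of (|a_min| + |a_max|) / 2,
    where a_min and a_max are the per-document min and max averaged over documents."""
    a_min = sum(min(v) for v in doc_vecs) / len(doc_vecs)
    a_max = sum(max(v) for v in doc_vecs) / len(doc_vecs)
    threshold = percent / 100.0 * (abs(a_min) + abs(a_max)) / 2.0
    return [[x if abs(x) >= threshold else 0.0 for x in v] for v in doc_vecs]


# Hypothetical toy inputs: two tokens, 2-dimensional embeddings, 2 clusters.
toy_vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
toy_probs = {"a": [0.9, 0.1], "b": [0.2, 0.8]}
toy_idf = {"a": 1.0, "b": 2.0}

toy_doc = scdv_document_vector(["a", "b", "unknown"], toy_vecs, toy_probs, toy_idf)
toy_sparse = sparsify([toy_doc], percent=20.0)
```

The output dimension is K * d (clusters times embedding size) regardless of vocabulary size, which is what makes the approach a candidate for dimensionality reduction, and individual coordinates no longer correspond to a single named feature.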
INTEGRATE Z-FEATURES WITH SCDV
SYSTEM OVERVIEW
EVALUATION
DATA DIMENSION RELATIVE TO Z-FEATURES (LOG-SCALE)
[Chart: type1 to type9, log-scale axis from 0.0001 to 100; values not recoverable from the transcript]
DATA DENSITY (LOG-SCALE)
[Chart: z-features vs. y-features for type1 to type9, log-scale axis from 0.0000001 to 1; values not recoverable from the transcript]
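The transcript does not say how density was measured. A common definition, assumed here, is the fraction of non-zero cells in the user-by-feature matrix. The sketch below uses a hypothetical `density` helper and made-up toy rows to show why very high-dimensional sparse features score many orders of magnitude lower than a compact dense representation.

```python
from typing import Dict, Sequence


def density(rows: Sequence[Dict[int, float]], n_dims: int) -> float:
    """Fraction of non-zero cells, with each row stored as {column_index: value}."""
    if not rows or n_dims <= 0:
        raise ValueError("need at least one row and a positive dimension")
    nonzero = sum(1 for row in rows for value in row.values() if value != 0)
    return nonzero / (len(rows) * n_dims)


# Made-up examples (not data from the talk):
# two users with 3 active features out of 1,000,000 possible ones,
sparse_rows = [{3: 1.0}, {7: 1.0, 42: 1.0}]
sparse_density = density(sparse_rows, n_dims=1_000_000)

# versus two users described by a 2-dimensional vector with 3 non-zero cells.
dense_rows = [{0: 0.2, 1: 0.5}, {0: 0.1, 1: 0.0}]
dense_density = density(dense_rows, n_dims=2)
```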
DATA SIZE RELATIVE TO Z-FEATURES
[Chart: type1 to type9, axis from 0 to 50; values not recoverable from the transcript]
USER DEMOGRAPHICS ESTIMATION METRICS (RELATIVE TO Z-FEATURES)
[Chart: precision, recall, and f1-score for gender, age-group, and region; axis from 0.95 to 1.02; values not recoverable from the transcript]
USER DEMOGRAPHICS ESTIMATION RUNNING TIME (RELATIVE TO Z-FEATURES)
[Chart: training and prediction time for gender, age-group, and region; axis from 0.00 to 0.50; values not recoverable from the transcript]
CONCLUSION
• Anonymize user features based on SCDV
• The anonymized features are good enough to use in ML
• Future work
  • Add the workflow to production
  • Apply further dimensionality reduction
    • Auto encoders, PCA, …
THANK YOU