Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Anonymize Large-scale Sparse User Features at L...
Search
LINE Developers
March 07, 2019
Technology
2
3.7k
Anonymize Large-scale Sparse User Features at LINE Corp
2019/3/7 Machine Learning Production Pitch #1
Yeo Chaerim
LINE Developers
March 07, 2019
Tweet
Share
More Decks by LINE Developers
See All by LINE Developers
LINEスタンプのSREing事例集:大きなスパイクアクセスを捌くためのSREing
line_developers
2
2.3k
Java 21 Overview
line_developers
6
1.2k
Code Review Challenge: An example of a solution
line_developers
1
1.3k
KARTEのAPIサーバ化
line_developers
1
540
著作権とは何か?〜初歩的概念から権利利用法、侵害要件まで
line_developers
5
2.2k
生成AIと著作権 〜生成AIによって生じる著作権関連の課題と対処
line_developers
3
2.1k
マイクロサービスにおけるBFFアーキテクチャでのモジュラモノリスの導入
line_developers
9
3.5k
A/B Testing at LINE NEWS
line_developers
3
990
LINEのサポートバージョンの考え方
line_developers
2
1.3k
Other Decks in Technology
See All in Technology
2025新卒研修・HTML/CSS #弁護士ドットコム
bengo4com
2
3.6k
Power Automate のパフォーマンス改善レシピ / Power Automate Performance Improvement Recipes
karamem0
0
280
마라톤 끝의 단거리 스퍼트: 2025년의 AI
inureyes
PRO
1
200
robocopy の怖い話/scary-story-about-robocopy
emiki
0
420
ユーザー理解の爆速化とPdMの価値
kakehashi
PRO
1
110
With Devin -AIの自律とメンバーの自立
kotanin0
2
940
【CEDEC2025】『Shadowverse: Worlds Beyond』二度目のDCG開発でゲームをリデザインする~遊びやすさと競技性の両立~
cygames
PRO
1
160
Claude Codeが働くAI中心の業務システム構築の挑戦―AIエージェント中心の働き方を目指して
os1ma
8
1.1k
【CEDEC2025】大規模言語モデルを活用したゲーム内会話パートのスクリプト作成支援への取り組み
cygames
PRO
1
530
メモ整理が苦手な者による頑張らないObsidian活用術
optim
1
160
AIに全任せしないコーディングとマネジメント思考
kikuchikakeru
0
300
Kiroから考える AIコーディングツールの潮流
s4yuba
2
540
Featured
See All Featured
How to train your dragon (web standard)
notwaldorf
96
6.1k
The Power of CSS Pseudo Elements
geoffreycrofte
77
5.9k
The MySQL Ecosystem @ GitHub 2015
samlambert
251
13k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
248
1.3M
I Don’t Have Time: Getting Over the Fear to Launch Your Podcast
jcasabona
33
2.4k
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
PRO
21
1.4k
Fireside Chat
paigeccino
37
3.6k
Helping Users Find Their Own Way: Creating Modern Search Experiences
danielanewman
29
2.8k
Fashionably flexible responsive web design (full day workshop)
malarkey
407
66k
Building Better People: How to give real-time feedback that sticks.
wjessup
367
19k
Making the Leap to Tech Lead
cromwellryan
134
9.4k
Visualization
eitanlees
146
16k
Transcript
ANONYMIZE LARGE-SCALE SPARSE USER FEATURES AT LINE CORP CHAERIM YEO,
LINE CORPORATION MACHINE LEARNING PRODUCTION PITCH #1, 2019/03/07
ABOUT ME l Chaerim Yeo(呂 彩林) l 2018.12 ~ LINE
Corporation l Account Platform Development Dept. l Ad performance optimization
Agenda • Z-Features • Y-Features • Evaluation • Conclusion
Z-FEATURES
WHAT ARE Z-FEATURES
WHAT ARE Z-FEATURES
WHAT ARE Z-FEATURES
WHAT ARE Z-FEATURES
WHAT ARE Z-FEATURES
BENEFIT OF Z-FEATURES Reusable Flexible
LIMITATION OF Z-FEATURES Human Interpretable Extremely Sparse
Y-FEATURES
BEYOND Z-FEATURES Obfuscation Dimensionality Reduction
BEYOND Z-FEATURES Obfuscation Dimensionality Reduction With keeping information as far
as possible
BEYOND Z-FEATURES Obfuscation Dimensionality Reduction SCDV https://arxiv.org/abs/1612.06778
OVERVIEW OF SCDV
INTEGRATE Z-FEATURES WITH SCDV
SYSTEM OVERVIEW
EVALUATION
DATA DIMENSION RELATIVE TO Z-FEATURES (LOG-SCALE) 0.0001 0.0010 0.0100 0.1000
1.0000 10.0000 100.0000 type1 type2 type3 type4 type5 type6 type7 type8 type9
DATA DENSITY LOG-SCALE 0.0000001 0.0000010 0.0000100 0.0001000 0.0010000 0.0100000 0.1000000
1.0000000 type1 type2 type3 type4 type5 type6 type7 type8 type9 z-features y-features
DATA SIZE RELATIVE TO Z-FEATURES 0.00 5.00 10.00 15.00 20.00
25.00 30.00 35.00 40.00 45.00 50.00 type1 type2 type3 type4 type5 type6 type7 type8 type9
USER DEMOGRAPHICS ESTIMATION MATRICS (RELATIVE TO Z-FEATURES) 0.95 0.96 0.97
0.98 0.99 1.00 1.01 1.02 gender age-group region precision recall f1-score
USER DEMOGRAPHICS ESTIMATION RUNNING TIME (RELATIVE TO Z-FEATURES) 0.00 0.05
0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 gender age-group region training prediction
CONCLUSION
CONCLUSION l Anonymize user features based on SCDV l Enough
to use in ML l Future works l Add workflow to production l Apply further dimensionality reduction l Auto encoders, PCA, …
THANK YOU