LINE Developers
March 07, 2019
Anonymize Large-scale Sparse User Features at LINE Corp
2019/3/7 Machine Learning Production Pitch #1
Yeo Chaerim
Transcript
ANONYMIZE LARGE-SCALE SPARSE USER FEATURES AT LINE CORP
CHAERIM YEO, LINE CORPORATION
MACHINE LEARNING PRODUCTION PITCH #1, 2019/03/07
ABOUT ME
• Chaerim Yeo (呂 彩林)
• 2018.12 ~ LINE Corporation
• Account Platform Development Dept.
• Ad performance optimization
Agenda
• Z-Features
• Y-Features
• Evaluation
• Conclusion
Z-FEATURES
WHAT ARE Z-FEATURES
BENEFIT OF Z-FEATURES
• Reusable
• Flexible
LIMITATION OF Z-FEATURES
• Human Interpretable
• Extremely Sparse
Y-FEATURES
BEYOND Z-FEATURES
• Obfuscation
• Dimensionality Reduction
While preserving as much information as possible
SCDV: https://arxiv.org/abs/1612.06778
OVERVIEW OF SCDV
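The slide here is a diagram that did not survive in the transcript. As a rough reminder of what SCDV (Sparse Composite Document Vectors) computes, the following is a minimal sketch and not the speaker's implementation. It assumes that each user is treated as a "document" whose "words" are feature IDs, and that embeddings, soft cluster probabilities (from a Gaussian mixture model in the SCDV paper) and IDF weights have already been computed. All names and toy values are hypothetical, and the fixed sparsification threshold is a simplification of the data-dependent threshold used in the paper.

```python
import math


def word_topic_vector(embedding, cluster_probs, idf):
    """Scale the embedding by each cluster probability, concatenate, weight by IDF."""
    return [idf * p * x for p in cluster_probs for x in embedding]


def scdv_vector(tokens, embeddings, cluster_probs, idf, threshold=0.0):
    """Sum word-topic vectors over tokens, L2-normalize, zero out small entries."""
    dim = len(next(iter(embeddings.values())))
    n_clusters = len(next(iter(cluster_probs.values())))
    doc = [0.0] * (dim * n_clusters)
    for tok in tokens:
        if tok not in embeddings:
            continue  # skip tokens without an embedding
        wtv = word_topic_vector(embeddings[tok], cluster_probs[tok], idf[tok])
        doc = [a + b for a, b in zip(doc, wtv)]
    norm = math.sqrt(sum(v * v for v in doc))
    if norm > 0:
        doc = [v / norm for v in doc]
    # Simplified sparsification: fixed absolute threshold (an assumption).
    return [v if abs(v) >= threshold else 0.0 for v in doc]


# Hypothetical toy inputs: two feature IDs, 2-d embeddings, 2 clusters.
embeddings = {"f1": [1.0, 0.0], "f2": [0.0, 1.0]}
cluster_probs = {"f1": [0.9, 0.1], "f2": [0.2, 0.8]}
idf = {"f1": 1.0, "f2": 2.0}

user_vec = scdv_vector(["f1", "f2"], embeddings, cluster_probs, idf, threshold=0.1)
print(len(user_vec), user_vec)
```

The output dimension is the embedding size times the number of clusters, regardless of how many distinct feature IDs exist, and individual entries no longer map to a single human-readable feature.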
INTEGRATE Z-FEATURES WITH SCDV
SYSTEM OVERVIEW
EVALUATION
DATA DIMENSION RELATIVE TO Z-FEATURES (LOG-SCALE)
[Chart: data dimension relative to z-features for type1 to type9; log-scale axis from 0.0001 to 100]
DATA DENSITY (LOG-SCALE)
[Chart: data density of z-features and y-features for type1 to type9; log-scale axis from 0.0000001 to 1]
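The slides do not define density. Assuming it means the fraction of non-zero cells in the user-by-feature matrix, it can be computed as in the sketch below. The row representation (one `{column: value}` dict per user) and the toy data are hypothetical.

```python
def density(rows, n_columns):
    """Fraction of non-zero cells in a matrix stored as one {column: value} dict per row."""
    if not rows or n_columns == 0:
        return 0.0
    nonzero = sum(1 for row in rows for v in row.values() if v != 0)
    return nonzero / (len(rows) * n_columns)


# Hypothetical toy data: a wide, mostly empty matrix versus a narrow, mostly filled one.
sparse_rows = [{3: 1.0}, {7: 1.0, 42: 1.0}, {}]
dense_rows = [{0: 0.2, 1: 0.5}, {0: 0.1, 1: 0.0}, {0: 0.3, 1: 0.4}]

sparse_density = density(sparse_rows, n_columns=1000)
dense_density = density(dense_rows, n_columns=2)
print(sparse_density, dense_density)
```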
DATA SIZE RELATIVE TO Z-FEATURES
[Chart: data size relative to z-features for type1 to type9; axis from 0 to 50]
USER DEMOGRAPHICS ESTIMATION METRICS (RELATIVE TO Z-FEATURES)
[Chart: precision, recall and f1-score for gender, age-group and region; axis from 0.95 to 1.02]
USER DEMOGRAPHICS ESTIMATION RUNNING TIME (RELATIVE TO Z-FEATURES)
[Chart: training and prediction time for gender, age-group and region; axis from 0 to 0.50]
CONCLUSION
• Anonymize user features based on SCDV
• Good enough to use in ML
• Future work
  • Add workflow to production
  • Apply further dimensionality reduction
    • Autoencoders, PCA, …
THANK YOU