LINE Developers
March 07, 2019
Anonymize Large-scale Sparse User Features at LINE Corp
2019/3/7 Machine Learning Production Pitch #1
Yeo Chaerim
Transcript
ANONYMIZE LARGE-SCALE SPARSE USER FEATURES AT LINE CORP
CHAERIM YEO, LINE CORPORATION
MACHINE LEARNING PRODUCTION PITCH #1, 2019/03/07
ABOUT ME
• Chaerim Yeo (呂 彩林)
• 2018.12 ~ LINE Corporation
• Account Platform Development Dept.
• Ad performance optimization
Agenda • Z-Features • Y-Features • Evaluation • Conclusion
Z-FEATURES
WHAT ARE Z-FEATURES
BENEFIT OF Z-FEATURES
• Reusable
• Flexible
LIMITATION OF Z-FEATURES
• Human interpretable
• Extremely sparse
Y-FEATURES
BEYOND Z-FEATURES
• Obfuscation
• Dimensionality reduction
• While preserving as much information as possible
• SCDV: https://arxiv.org/abs/1612.06778
OVERVIEW OF SCDV
INTEGRATE Z-FEATURES WITH SCDV
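The diagrams for these slides did not survive in the transcript, so the following is only a minimal sketch of the composition step described in the SCDV paper linked above. It assumes each user is treated as a "document" and each sparse feature ID as a "word"; the deck does not confirm that mapping. The names `idf_weights`, `token_topic_vector`, and `scdv_vectors` and all toy values are hypothetical. In practice the embeddings would come from word2vec-style training and the cluster probabilities from a Gaussian mixture model; here both are hand-written inputs.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Smoothed inverse document frequency per token (one common variant)."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def token_topic_vector(vec, probs, idf):
    """Concatenate the embedding scaled by each cluster probability, times idf."""
    return [idf * p * x for p in probs for x in vec]


def scdv_vectors(docs, embeddings, cluster_probs, sparsity_pct=4.0):
    """Build one dense-then-sparsified vector per document (here: per user)."""
    idf = idf_weights(docs)
    emb_dim = len(next(iter(embeddings.values())))
    n_clusters = len(next(iter(cluster_probs.values())))
    dim = emb_dim * n_clusters

    doc_vecs = []
    for doc in docs:
        acc = [0.0] * dim
        for tok in doc:
            wtv = token_topic_vector(embeddings[tok], cluster_probs[tok], idf[tok])
            acc = [a + b for a, b in zip(acc, wtv)]
        # L2-normalize; an all-zero vector is left unchanged.
        norm = math.sqrt(sum(a * a for a in acc)) or 1.0
        doc_vecs.append([a / norm for a in acc])

    # Zero out small entries, using a threshold derived from the
    # average per-document minimum and maximum values.
    avg_min = sum(min(v) for v in doc_vecs) / len(doc_vecs)
    avg_max = sum(max(v) for v in doc_vecs) / len(doc_vecs)
    threshold = (sparsity_pct / 100.0) * (abs(avg_min) + abs(avg_max)) / 2.0
    return [[x if abs(x) >= threshold else 0.0 for x in v] for v in doc_vecs]


# Toy inputs (hypothetical): two users, three feature IDs,
# 2-dimensional embeddings, 2 soft clusters.
users = [["feat_a", "feat_b"], ["feat_b", "feat_c"]]
embeddings = {"feat_a": [1.0, 0.0], "feat_b": [0.5, 0.5], "feat_c": [0.0, 1.0]}
cluster_probs = {"feat_a": [0.9, 0.1], "feat_b": [0.5, 0.5], "feat_c": [0.2, 0.8]}

vectors = scdv_vectors(users, embeddings, cluster_probs)
print(len(vectors), len(vectors[0]))
```

The output dimension is the embedding size times the number of clusters, regardless of how many distinct feature IDs exist. That is what would let this kind of composition serve as both obfuscation and dimensionality reduction for very sparse inputs.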
SYSTEM OVERVIEW
EVALUATION
[Chart: DATA DIMENSION RELATIVE TO Z-FEATURES (LOG-SCALE); categories: type1 to type9; axis range 0.0001 to 100]
[Chart: DATA DENSITY (LOG-SCALE); categories: type1 to type9; series: z-features, y-features; axis range 0.0000001 to 1]
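The slide does not define density. A minimal sketch, assuming it means the fraction of non-zero entries in a user-by-feature matrix, might look like this. The `density` helper and the dict-per-row layout are illustrative assumptions, not the deck's implementation.

```python
def density(rows, n_columns):
    """Fraction of non-zero cells in a sparse matrix.

    rows: list of dicts mapping column index -> value, one dict per user.
    n_columns: total number of feature columns, including all-zero ones.
    """
    if not rows or n_columns == 0:
        return 0.0
    nonzero = sum(1 for row in rows for value in row.values() if value != 0)
    return nonzero / (len(rows) * n_columns)


# Two users over 10 possible features, 3 non-zero cells in total.
sparse_rows = [{0: 1.0, 5: 2.0}, {3: 1.0}]
print(density(sparse_rows, 10))
```

With a fixed number of non-zero cells, density falls as the column count grows, which is consistent with a log scale being needed to compare very wide sparse features against a compact dense representation.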
[Chart: DATA SIZE RELATIVE TO Z-FEATURES; categories: type1 to type9; axis range 0.00 to 50.00]
[Chart: USER DEMOGRAPHICS ESTIMATION METRICS (RELATIVE TO Z-FEATURES); tasks: gender, age-group, region; series: precision, recall, f1-score; axis range 0.95 to 1.02]
[Chart: USER DEMOGRAPHICS ESTIMATION RUNNING TIME (RELATIVE TO Z-FEATURES); tasks: gender, age-group, region; series: training, prediction; axis range 0.00 to 0.50]
CONCLUSION
• Anonymize user features based on SCDV
• Good enough to use in ML
• Future work
  • Add workflow to production
  • Apply further dimensionality reduction
    • Autoencoders, PCA, …
THANK YOU