LINE Developers
March 07, 2019
Anonymize Large-scale Sparse User Features at LINE Corp
2019/3/7 Machine Learning Production Pitch #1
Yeo Chaerim
Transcript
ANONYMIZE LARGE-SCALE SPARSE USER FEATURES AT LINE CORP
CHAERIM YEO, LINE CORPORATION
MACHINE LEARNING PRODUCTION PITCH #1, 2019/03/07
ABOUT ME • Chaerim Yeo (呂 彩林) • 2018.12 ~ LINE Corporation • Account Platform Development Dept. • Ad performance optimization
Agenda • Z-Features • Y-Features • Evaluation • Conclusion
Z-FEATURES
WHAT ARE Z-FEATURES
BENEFIT OF Z-FEATURES • Reusable • Flexible
LIMITATION OF Z-FEATURES • Human-interpretable • Extremely sparse
Y-FEATURES
BEYOND Z-FEATURES • Obfuscation • Dimensionality reduction
While preserving as much information as possible
SCDV https://arxiv.org/abs/1612.06778
OVERVIEW OF SCDV
INTEGRATE Z-FEATURES WITH SCDV
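The transcript does not preserve how z-features are fed into SCDV. As a rough sketch only, one plausible reading treats each sparse feature ID as a "word" and each user as a "document", then applies the SCDV composition step: scale each embedding by its soft cluster probabilities, concatenate across clusters, weight by IDF, sum per user, and zero out small entries. Everything below is hypothetical: the function names, the toy embeddings, cluster probabilities, and IDF values, and the fixed `threshold` (the SCDV paper derives its threshold from the data instead).

```python
def word_topic_vector(embedding, cluster_probs, idf):
    """Scale the embedding by each cluster probability, concatenate, weight by IDF."""
    return [idf * p * x for p in cluster_probs for x in embedding]


def user_vector(feature_ids, embeddings, cluster_probs, idf, threshold=0.1):
    """Sum the word-topic vectors of a user's features, L2-normalize, then sparsify."""
    dim = len(next(iter(embeddings.values())))
    num_clusters = len(next(iter(cluster_probs.values())))
    total = [0.0] * (dim * num_clusters)
    for fid in feature_ids:
        wtv = word_topic_vector(embeddings[fid], cluster_probs[fid], idf[fid])
        total = [a + b for a, b in zip(total, wtv)]
    norm = sum(v * v for v in total) ** 0.5
    if norm > 0:
        total = [v / norm for v in total]
    # Hard threshold: entries with small magnitude are set to exactly zero.
    return [v if abs(v) >= threshold else 0.0 for v in total]


# Hypothetical inputs; in practice the embeddings would come from an embedding
# model and the cluster probabilities from a Gaussian mixture fitted on them.
embeddings = {"f1": [1.0, 0.0], "f2": [0.0, 1.0]}
cluster_probs = {"f1": [0.9, 0.1], "f2": [0.2, 0.8]}
idf = {"f1": 1.0, "f2": 2.0}

vec = user_vector(["f1", "f2"], embeddings, cluster_probs, idf)
print(len(vec), vec)
```

The output has length `dim * num_clusters` regardless of how many raw feature IDs exist, and its coordinates no longer map to individual features, which is consistent with the obfuscation and dimensionality-reduction goals named earlier in the deck.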
SYSTEM OVERVIEW
EVALUATION
DATA DIMENSION RELATIVE TO Z-FEATURES (LOG-SCALE)
[Chart: data dimension relative to z-features for type1 through type9; log-scale axis from 0.0001 to 100]
DATA DENSITY (LOG-SCALE)
[Chart: data density of z-features and y-features for type1 through type9; log-scale axis from 0.0000001 to 1]
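The slide does not define "density". A common definition, assumed here, is the fraction of non-zero entries in the user-by-feature matrix. The sketch below computes it for two toy matrices; the data and the `density` helper are hypothetical and not taken from the talk.

```python
def density(rows, num_columns):
    """Fraction of non-zero cells in a matrix stored as one {column: value} dict per row."""
    total_cells = len(rows) * num_columns
    if total_cells == 0:
        return 0.0
    nonzero = sum(1 for row in rows for value in row.values() if value != 0)
    return nonzero / total_cells


# Hypothetical sparse representation: 3 users, 1000 possible feature IDs.
sparse_rows = [{3: 1.0}, {7: 1.0, 42: 1.0}, {}]
# Hypothetical dense representation: 3 users, 4 dimensions.
dense_rows = [
    {0: 0.4, 1: 0.0, 2: 0.9, 3: 0.1},
    {0: 0.2, 1: 0.5, 2: 0.0, 3: 0.3},
    {0: 0.0, 1: 0.7, 2: 0.6, 3: 0.0},
]

sparse_density = density(sparse_rows, 1000)
dense_density = density(dense_rows, 4)
print(sparse_density, dense_density)
```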
DATA SIZE RELATIVE TO Z-FEATURES
[Chart: data size relative to z-features for type1 through type9; axis from 0 to 50]
USER DEMOGRAPHICS ESTIMATION METRICS (RELATIVE TO Z-FEATURES)
[Chart: precision, recall, and f1-score for gender, age-group, and region estimation; axis from 0.95 to 1.02]
USER DEMOGRAPHICS ESTIMATION RUNNING TIME (RELATIVE TO Z-FEATURES)
[Chart: training and prediction time for gender, age-group, and region estimation; axis from 0 to 0.5]
CONCLUSION
• Anonymize user features based on SCDV
• Good enough for use in ML
• Future work
  • Add workflow to production
  • Apply further dimensionality reduction
    • Autoencoders, PCA, …
THANK YOU