Anonymize Large-scale Sparse User Features at LINE Corp
2019/3/7 Machine Learning Production Pitch #1
Yeo Chaerim
LINE Developers
March 07, 2019
Transcript
ANONYMIZE LARGE-SCALE SPARSE USER FEATURES AT LINE CORP
CHAERIM YEO, LINE CORPORATION
MACHINE LEARNING PRODUCTION PITCH #1, 2019/03/07
ABOUT ME
• Chaerim Yeo (呂 彩林)
• 2018.12 ~ LINE Corporation
• Account Platform Development Dept.
• Ad performance optimization
AGENDA
• Z-Features
• Y-Features
• Evaluation
• Conclusion
Z-FEATURES
WHAT ARE Z-FEATURES
BENEFITS OF Z-FEATURES
• Reusable
• Flexible
LIMITATIONS OF Z-FEATURES
• Human interpretable
• Extremely sparse
Y-FEATURES
BEYOND Z-FEATURES
• Obfuscation
• Dimensionality reduction
Both while preserving as much information as possible.
Approach: SCDV https://arxiv.org/abs/1612.06778
OVERVIEW OF SCDV
INTEGRATE Z-FEATURES WITH SCDV
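The slide itself is a diagram, so the transcript does not say how z-features are fed into SCDV. The following is a minimal sketch under one plausible reading: each user is treated as a "document" and each active sparse feature as a "word". Everything here is an assumption for illustration, not the deck's pipeline: the function names, the random feature embeddings, the hand-picked cluster centroids (standing in for a Gaussian mixture fitted by EM), the smoothed IDF variant, and the sparsity percentage.

```python
import numpy as np

def soft_assignments(emb, centroids, sigma=1.0):
    """P(cluster k | feature i) under equal-weight spherical Gaussians.

    Assumption: centroids are given. SCDV proper fits a Gaussian mixture
    with EM; here only the posterior step is shown.
    """
    d2 = ((emb[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2 / (2.0 * sigma ** 2)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def idf_weights(users, n_features):
    """Smoothed inverse document frequency per feature (one common variant)."""
    df = np.zeros(n_features)
    for feats in users:
        for f in set(feats):
            df[f] += 1
    return np.log(len(users) / (1.0 + df)) + 1.0

def scdv_vectors(users, emb, centroids, sparsity_pct=4.0):
    """Dense-then-sparsified user vectors of size n_clusters * emb_dim."""
    n_features, d = emb.shape
    k = centroids.shape[0]
    probs = soft_assignments(emb, centroids)          # (n_features, k)
    idf = idf_weights(users, n_features)              # (n_features,)
    # Per feature: concatenate the embedding scaled by each cluster
    # probability, then weight the whole vector by IDF.
    wtv = (probs[:, :, None] * emb[:, None, :]).reshape(n_features, k * d)
    wtv *= idf[:, None]
    out = np.zeros((len(users), k * d))
    for u, feats in enumerate(users):
        if feats:
            v = wtv[list(feats)].sum(axis=0)
            norm = np.linalg.norm(v)
            out[u] = v / norm if norm > 0 else v
    # Zero out entries whose magnitude falls below a data-dependent threshold.
    a_min = out.min(axis=1).mean()
    a_max = out.max(axis=1).mean()
    threshold = (sparsity_pct / 100.0) * (abs(a_min) + abs(a_max)) / 2.0
    out[np.abs(out) < threshold] = 0.0
    return out

# Hypothetical toy data: 6 sparse features, 4-dim embeddings, 2 clusters.
rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 4))
centroids = emb[[0, 3]]                  # hand-picked, not fitted
users = [[0, 1], [2, 3, 5], [4], [0, 1]]  # active feature ids per user
y = scdv_vectors(users, emb, centroids)
print(y.shape)
```

The output dimension depends only on the number of clusters and the embedding size, not on the size of the original feature space, which is where the dimensionality reduction comes from. Individual output coordinates no longer correspond to a single named feature, which is one way to read "obfuscation" on the earlier slide.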
SYSTEM OVERVIEW
EVALUATION
DATA DIMENSION RELATIVE TO Z-FEATURES (LOG-SCALE)
[Chart: one value per feature type, type1 through type9; log-scale axis from 0.0001 to 100]
DATA DENSITY (LOG-SCALE)
[Chart: z-features vs. y-features for type1 through type9; log-scale axis from 0.0000001 to 1]
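The deck does not define "density", so the sketch below assumes the usual meaning: the fraction of nonzero entries in the user-by-feature matrix. The function name, the row layout (one dict per user, mapping feature index to value), and all numbers are hypothetical and are not taken from the chart.

```python
def density(rows, n_dims):
    """Fraction of nonzero cells in a sparse matrix.

    rows: list of dicts mapping feature index -> value, one dict per user.
    n_dims: total number of feature columns.
    """
    if not rows or n_dims <= 0:
        raise ValueError("need at least one row and a positive dimension")
    nonzero = sum(1 for row in rows for value in row.values() if value != 0)
    return nonzero / (len(rows) * n_dims)

# Hypothetical z-features: a few active flags in a very wide space.
z_rows = [{3: 1.0, 999_999: 1.0}, {42: 1.0}]
z_density = density(z_rows, n_dims=1_000_000)

# Hypothetical y-features: a short vector with most entries filled.
y_rows = [{0: 0.2, 1: -0.1, 2: 0.4}, {0: 0.3, 2: 0.1}]
y_density = density(y_rows, n_dims=4)

print(z_density, y_density)
```

With these made-up inputs the two densities differ by several orders of magnitude, which is why a log scale is the natural choice for this kind of comparison.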
DATA SIZE RELATIVE TO Z-FEATURES
[Chart: one value per feature type, type1 through type9; linear axis from 0 to 50]
USER DEMOGRAPHICS ESTIMATION METRICS (RELATIVE TO Z-FEATURES)
[Chart: precision, recall, and f1-score for gender, age-group, and region; axis from 0.95 to 1.02]
USER DEMOGRAPHICS ESTIMATION RUNNING TIME (RELATIVE TO Z-FEATURES)
[Chart: training and prediction time for gender, age-group, and region; axis from 0 to 0.5]
CONCLUSION
CONCLUSION
• Anonymize user features based on SCDV
• Good enough to use in ML
• Future work
  • Add workflow to production
  • Apply further dimensionality reduction
    • Auto encoders, PCA, …
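PCA is listed only as future work, so the deck gives no details. As a rough illustration of what that step could look like, here is a minimal PCA sketch using a singular value decomposition. The function name, the random matrix standing in for y-features, and the choice of three components are all assumptions.

```python
import numpy as np

def pca_reduce(x, n_components):
    """Project rows of x onto the top principal components.

    Returns the reduced data, the component matrix, the column means,
    and the fraction of variance the kept components explain.
    """
    mean = x.mean(axis=0)
    centered = x - mean
    # Rows of vt are principal directions, ordered by singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    explained = (s[:n_components] ** 2).sum() / (s ** 2).sum()
    return centered @ components.T, components, mean, explained

# Hypothetical stand-in for y-features: 50 users, 8 dimensions.
rng = np.random.default_rng(0)
y_features = rng.normal(size=(50, 8))
reduced, components, mean, explained = pca_reduce(y_features, n_components=3)
print(reduced.shape, round(float(explained), 3))
```

New users can be projected with the stored `mean` and `components`, so the reduction can be fitted once and reused at prediction time.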
THANK YOU