Upgrade to PRO for Only $50/Year—Limited-Time Offer! 🔥
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Anonymize Large-scale Sparse User Features at L...
Search
LINE Developers
March 07, 2019
Technology
2
3.7k
Anonymize Large-scale Sparse User Features at LINE Corp
2019/3/7 Machine Learning Production Pitch #1
Yeo Chaerim
LINE Developers
March 07, 2019
Tweet
Share
More Decks by LINE Developers
See All by LINE Developers
LINEスタンプのSREing事例集:大きなスパイクアクセスを捌くためのSREing
line_developers
3
2.4k
Java 21 Overview
line_developers
6
1.2k
Code Review Challenge: An example of a solution
line_developers
1
1.5k
KARTEのAPIサーバ化
line_developers
1
580
著作権とは何か?〜初歩的概念から権利利用法、侵害要件まで
line_developers
5
2.2k
生成AIと著作権 〜生成AIによって生じる著作権関連の課題と対処
line_developers
3
2.3k
マイクロサービスにおけるBFFアーキテクチャでのモジュラモノリスの導入
line_developers
9
3.7k
A/B Testing at LINE NEWS
line_developers
3
1k
LINEのサポートバージョンの考え方
line_developers
2
1.4k
Other Decks in Technology
See All in Technology
LT登壇を続けたらポッドキャストに呼ばれた話
yamatai1212
0
120
今年のデータ・ML系アップデートと気になるアプデのご紹介
nayuts
1
230
Haskell を武器にして挑む競技プログラミング ─ 操作的思考から意味モデル思考へ
naoya
6
1.3k
AI活用によるPRレビュー改善の歩み ― 社内全体に広がる学びと実践
lycorptech_jp
PRO
1
190
Snowflakeでデータ基盤を もう一度作り直すなら / rebuilding-data-platform-with-snowflake
pei0804
4
1.1k
文字列の並び順 / Unicode Collation
tmtms
3
420
AI時代の開発フローとともに気を付けたいこと
kkamegawa
0
2.6k
Lessons from Migrating to OpenSearch: Shard Design, Log Ingestion, and UI Decisions
sansantech
PRO
1
100
AI 駆動開発勉強会 フロントエンド支部 #1 w/あずもば
1ftseabass
PRO
0
290
SSO方式とJumpアカウント方式の比較と設計方針
yuobayashi
7
580
Reinforcement Fine-tuning 基礎〜実践まで
ch6noota
0
160
MLflowで始めるプロンプト管理、評価、最適化
databricksjapan
1
110
Featured
See All Featured
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
31
2.6k
It's Worth the Effort
3n
187
29k
Unsuck your backbone
ammeep
671
58k
Documentation Writing (for coders)
carmenintech
76
5.2k
Typedesign – Prime Four
hannesfritz
42
2.9k
Fashionably flexible responsive web design (full day workshop)
malarkey
407
66k
Facilitating Awesome Meetings
lara
57
6.7k
Bootstrapping a Software Product
garrettdimon
PRO
307
120k
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
122
21k
Building Flexible Design Systems
yeseniaperezcruz
330
39k
Design and Strategy: How to Deal with People Who Don’t "Get" Design
morganepeng
132
19k
Producing Creativity
orderedlist
PRO
348
40k
Transcript
ANONYMIZE LARGE-SCALE SPARSE USER FEATURES AT LINE CORP CHAERIM YEO,
LINE CORPORATION MACHINE LEARNING PRODUCTION PITCH #1, 2019/03/07
ABOUT ME l Chaerim Yeo(呂 彩林) l 2018.12 ~ LINE
Corporation l Account Platform Development Dept. l Ad performance optimization
Agenda • Z-Features • Y-Features • Evaluation • Conclusion
Z-FEATURES
WHAT ARE Z-FEATURES
WHAT ARE Z-FEATURES
WHAT ARE Z-FEATURES
WHAT ARE Z-FEATURES
WHAT ARE Z-FEATURES
BENEFIT OF Z-FEATURES Reusable Flexible
LIMITATION OF Z-FEATURES Human Interpretable Extremely Sparse
Y-FEATURES
BEYOND Z-FEATURES Obfuscation Dimensionality Reduction
BEYOND Z-FEATURES Obfuscation Dimensionality Reduction With keeping information as far
as possible
BEYOND Z-FEATURES Obfuscation Dimensionality Reduction SCDV https://arxiv.org/abs/1612.06778
OVERVIEW OF SCDV
INTEGRATE Z-FEATURES WITH SCDV
SYSTEM OVERVIEW
EVALUATION
DATA DIMENSION RELATIVE TO Z-FEATURES (LOG-SCALE) 0.0001 0.0010 0.0100 0.1000
1.0000 10.0000 100.0000 type1 type2 type3 type4 type5 type6 type7 type8 type9
DATA DENSITY LOG-SCALE 0.0000001 0.0000010 0.0000100 0.0001000 0.0010000 0.0100000 0.1000000
1.0000000 type1 type2 type3 type4 type5 type6 type7 type8 type9 z-features y-features
DATA SIZE RELATIVE TO Z-FEATURES 0.00 5.00 10.00 15.00 20.00
25.00 30.00 35.00 40.00 45.00 50.00 type1 type2 type3 type4 type5 type6 type7 type8 type9
USER DEMOGRAPHICS ESTIMATION MATRICS (RELATIVE TO Z-FEATURES) 0.95 0.96 0.97
0.98 0.99 1.00 1.01 1.02 gender age-group region precision recall f1-score
USER DEMOGRAPHICS ESTIMATION RUNNING TIME (RELATIVE TO Z-FEATURES) 0.00 0.05
0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 gender age-group region training prediction
CONCLUSION
CONCLUSION l Anonymize user features based on SCDV l Enough
to use in ML l Future works l Add workflow to production l Apply further dimensionality reduction l Auto encoders, PCA, …
THANK YOU