Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Anonymize Large-scale Sparse User Features at L...
Search
LINE Developers
March 07, 2019
Technology
2
3.6k
Anonymize Large-scale Sparse User Features at LINE Corp
2019/3/7 Machine Learning Production Pitch #1
Yeo Chaerim
LINE Developers
March 07, 2019
Tweet
Share
More Decks by LINE Developers
See All by LINE Developers
LINEスタンプのSREing事例集:大きなスパイクアクセスを捌くためのSREing
line_developers
1
2k
Java 21 Overview
line_developers
6
1.1k
Code Review Challenge: An example of a solution
line_developers
1
1.2k
KARTEのAPIサーバ化
line_developers
1
460
著作権とは何か?〜初歩的概念から権利利用法、侵害要件まで
line_developers
5
2.1k
生成AIと著作権 〜生成AIによって生じる著作権関連の課題と対処
line_developers
3
2k
マイクロサービスにおけるBFFアーキテクチャでのモジュラモノリスの導入
line_developers
9
3.1k
A/B Testing at LINE NEWS
line_developers
3
880
LINEのサポートバージョンの考え方
line_developers
2
1.1k
Other Decks in Technology
See All in Technology
Docker Desktop で Docker を始めよう
zembutsu
PRO
0
160
AWS re:Invent 2024 re:Cap Taipei (for Developer): New Launches that facilitate Developer Workflow and Continuous Innovation
dwchiang
0
160
RubyでKubernetesプログラミング
sat
PRO
4
160
dbtを中心にして組織のアジリティとガバナンスのトレードオンを考えてみた
gappy50
0
210
Visual StudioとかIDE関連小ネタ話
kosmosebi
1
370
ゼロからわかる!!AWSの構成図を書いてみようワークショップ 問題&解答解説 #デッカイギ #羽田デッカイギおつ
_mossann_t
0
1.5k
.NET AspireでAzure Functionsやクラウドリソースを統合する
tsubakimoto_s
0
190
Alignment and Autonomy in Cybozu - 300人の開発組織でアラインメントと自律性を両立させるアジャイルな組織運営 / RSGT2025
ama_ch
1
2.4k
30分でわかる「リスクから学ぶKubernetesコンテナセキュリティ」/30min-k8s-container-sec
mochizuki875
3
440
PaaSの歴史と、 アプリケーションプラットフォームのこれから
jacopen
7
1.4k
AWS Community Builderのススメ - みんなもCommunity Builderに応募しよう! -
smt7174
0
170
Copilotの力を実感!3ヶ月間の生成AI研修の試行錯誤&成功事例をご紹介。果たして得たものとは・・?
ktc_shiori
0
340
Featured
See All Featured
What’s in a name? Adding method to the madness
productmarketing
PRO
22
3.2k
The Cost Of JavaScript in 2023
addyosmani
46
7.2k
The Cult of Friendly URLs
andyhume
78
6.1k
Scaling GitHub
holman
459
140k
The World Runs on Bad Software
bkeepers
PRO
66
11k
We Have a Design System, Now What?
morganepeng
51
7.3k
Mobile First: as difficult as doing things right
swwweet
222
9k
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
232
17k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
330
21k
Designing Experiences People Love
moore
139
23k
Why Our Code Smells
bkeepers
PRO
335
57k
The Art of Programming - Codeland 2020
erikaheidi
53
13k
Transcript
ANONYMIZE LARGE-SCALE SPARSE USER FEATURES AT LINE CORP CHAERIM YEO,
LINE CORPORATION MACHINE LEARNING PRODUCTION PITCH #1, 2019/03/07
ABOUT ME l Chaerim Yeo(呂 彩林) l 2018.12 ~ LINE
Corporation l Account Platform Development Dept. l Ad performance optimization
Agenda • Z-Features • Y-Features • Evaluation • Conclusion
Z-FEATURES
WHAT ARE Z-FEATURES
WHAT ARE Z-FEATURES
WHAT ARE Z-FEATURES
WHAT ARE Z-FEATURES
WHAT ARE Z-FEATURES
BENEFIT OF Z-FEATURES Reusable Flexible
LIMITATION OF Z-FEATURES Human Interpretable Extremely Sparse
Y-FEATURES
BEYOND Z-FEATURES Obfuscation Dimensionality Reduction
BEYOND Z-FEATURES Obfuscation Dimensionality Reduction With keeping information as far
as possible
BEYOND Z-FEATURES Obfuscation Dimensionality Reduction SCDV https://arxiv.org/abs/1612.06778
OVERVIEW OF SCDV
INTEGRATE Z-FEATURES WITH SCDV
SYSTEM OVERVIEW
EVALUATION
DATA DIMENSION RELATIVE TO Z-FEATURES (LOG-SCALE) 0.0001 0.0010 0.0100 0.1000
1.0000 10.0000 100.0000 type1 type2 type3 type4 type5 type6 type7 type8 type9
DATA DENSITY LOG-SCALE 0.0000001 0.0000010 0.0000100 0.0001000 0.0010000 0.0100000 0.1000000
1.0000000 type1 type2 type3 type4 type5 type6 type7 type8 type9 z-features y-features
DATA SIZE RELATIVE TO Z-FEATURES 0.00 5.00 10.00 15.00 20.00
25.00 30.00 35.00 40.00 45.00 50.00 type1 type2 type3 type4 type5 type6 type7 type8 type9
USER DEMOGRAPHICS ESTIMATION MATRICS (RELATIVE TO Z-FEATURES) 0.95 0.96 0.97
0.98 0.99 1.00 1.01 1.02 gender age-group region precision recall f1-score
USER DEMOGRAPHICS ESTIMATION RUNNING TIME (RELATIVE TO Z-FEATURES) 0.00 0.05
0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 gender age-group region training prediction
CONCLUSION
CONCLUSION l Anonymize user features based on SCDV l Enough
to use in ML l Future works l Add workflow to production l Apply further dimensionality reduction l Auto encoders, PCA, …
THANK YOU