SHIELD : Protect from Abusers in LINE Timeline

Agenda
- About SHIELD
- Team Tasks
- History of SHIELD
- Team Objective
- Why do we need abusing detection?
- Abusing cases
- How to detect abusers
- Architecture
- Workflow
- Rule based Models
- ML based Models
- History of infra
- Detected Abusers
- What SHIELD will do

About SHIELD: Team Tasks
- Text Filtering (NLP)
- Agent and Admin
- Abusing Detection (Anomaly Detection)

About SHIELD: History of SHIELD
- 2019.02: Start SHIELD
- 2019.03: Release first model
- 2020: Release Story and Birthday models
- 2021: Will release new models

About SHIELD: Team Objective
- Make Brighter Green for LINE

Why do we need abusing detection?
Reason
- Users want to use the service more comfortably.
- Protect users' emotions.
- Users want to see the content they want.

Abusing Cases: Comment abusing cases
- Users who only count meaningless numbers
- Users who list unknown strings
- Users who used the comment feature more than 2,000 times in 1 hour
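
The last case above (more than 2,000 comments in 1 hour) lends itself to a simple threshold rule. The following is a minimal sketch, not the SHIELD implementation: the event format `(user_id, unix_timestamp)` and the function name `find_heavy_commenters` are assumptions made for illustration, and only the two constants come from the slide.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 3600   # 1 hour, as in the slide's example
MAX_COMMENTS = 2000     # threshold from the slide's example


def find_heavy_commenters(events, window=WINDOW_SECONDS, limit=MAX_COMMENTS):
    """Return the user ids that posted more than `limit` comments
    within any sliding window of `window` seconds.

    `events` is an iterable of (user_id, unix_timestamp) pairs,
    sorted by timestamp (hypothetical input format).
    """
    recent = defaultdict(deque)  # user_id -> timestamps inside the window
    flagged = set()
    for user_id, ts in events:
        stamps = recent[user_id]
        stamps.append(ts)
        # Drop timestamps that have fallen out of the sliding window.
        while stamps and stamps[0] <= ts - window:
            stamps.popleft()
        if len(stamps) > limit:
            flagged.add(user_id)
    return flagged
```

A sliding window is used here instead of fixed hourly buckets so that a burst straddling an hour boundary is still counted as one burst; whether the real rule does this is not stated in the deck.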

How to detect abusers
[Architecture diagram. Labels shown: Log, Data stream, Infra aggregation module, Aggregation Table, Data storage, Real time Data processing, Data analytics, Anomaly Pattern Analyse, Abusing Pattern Analyse, Rule base abusing detection, ML base abusing detection, ML base anomaly detection, ML base spam detection, Anomaly detection, Results Table, Warning Tables, Penalty, Alert, Workflow]

How to detect abusers: Workflow
[Flowchart. Labels per track, in slide order:]
- Rule based Models: Request, Analyze, Decision Threshold, Service, Penalty, Tuning, Discard (branches: Yes, No)
- ML based Models: Research, Implement, Verify, Service, Penalty, Tuning, Discard (branches: Yes, No)

Alert system
- Purpose: collect and analyze anomaly cases
- Issues a warning, not a penalty
- Linked with a penalty if a certain pattern is found
- Anomaly cases are currently detected using only the autoencoder
- Isolation Forest will also be operational in 2021 2H

How to detect abusers: Rule based Models
- LINE Timeline Story
- LINE Timeline for Birthday Card
- LINE Timeline

How to detect abusers: ML based Models
- DBSCAN
- Autoencoder
- density_based
- Isolation Forest

Density_based
- Processes each user's usage by time zone
- Divides the data into cells using the standard deviation
- Self-developed algorithm
- Changes a cell from abnormal to normal depending on the surrounding cells
- Uses the number of users in a cell to classify it as normal or abnormal
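
The deck does not give the details of this self-developed algorithm, so the following is only one possible reading of the bullets above, sketched under stated assumptions: cells are one standard deviation wide in each dimension, a cell with fewer than `min_users` users is abnormal, and an abnormal cell is switched to normal when at least `min_normal_neighbors` adjacent cells are normal. The function name and both parameters are hypothetical.

```python
from collections import Counter
from itertools import product
from math import floor
from statistics import mean, pstdev


def density_based_flags(points, min_users=3, min_normal_neighbors=1):
    """Return one bool per point; True means the point's cell is abnormal.

    `points` is a list of equal-length numeric tuples, one per user.
    """
    dims = range(len(points[0]))
    centers = [mean([p[d] for p in points]) for d in dims]
    # Cell width per dimension is one standard deviation (1.0 if constant).
    widths = [pstdev([p[d] for p in points]) or 1.0 for d in dims]
    cells = [
        tuple(floor((p[d] - centers[d]) / widths[d]) for d in dims)
        for p in points
    ]

    # First pass: sparse cells are abnormal.
    users_per_cell = Counter(cells)
    abnormal = {cell: n < min_users for cell, n in users_per_cell.items()}

    # Second pass: an abnormal cell next to enough normal cells becomes normal.
    relabeled = dict(abnormal)
    for cell, is_abnormal in abnormal.items():
        if not is_abnormal:
            continue
        normal_neighbors = 0
        for offset in product((-1, 0, 1), repeat=len(cell)):
            if not any(offset):
                continue  # skip the cell itself
            neighbor = tuple(c + o for c, o in zip(cell, offset))
            if abnormal.get(neighbor) is False:
                normal_neighbors += 1
        if normal_neighbors >= min_normal_neighbors:
            relabeled[cell] = False
    return [relabeled[cell] for cell in cells]
```

The second pass always reads the labels from the first pass, so the result does not depend on the order in which cells are visited.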

Density_based

Density_based
- Ambiguous abusing that does not fall under the rules
- Abusing that the rule models cannot detect as a pattern

Density_based

DBSCAN
- No need to set the number of clusters
- Can find clusters of any shape
- A density based clustering algorithm
- Can identify anomalies

DBSCAN
- Hyperparameters: MinPts, Epsilon
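
To make the roles of the two hyperparameters concrete, here is a compact textbook-style DBSCAN in plain Python. It is a sketch for illustration, not the team's code: `eps` plays the role of Epsilon (neighborhood radius), `min_pts` the role of MinPts (neighbors needed for a core point, counting the point itself), and points labeled `NOISE` are the anomalies. A production system would more likely use an optimized library implementation.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks anomalies."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force radius query; includes the point itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels
```

Note that the number of clusters is never passed in; it falls out of `eps` and `min_pts`, which is the first property listed for DBSCAN in this deck.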

DBSCAN
- Detects abusers missed by the density_based model

Autoencoder
- Easy to implement
- Consists of an encoder and a decoder
- Anomalies have a large reconstruction error
- Intuitive anomaly detection
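
The reconstruction-error idea can be shown with a deliberately tiny model. The sketch below is a toy, not the SHIELD model: it is a linear autoencoder with a single hidden unit and tied encoder/decoder weights, trained with plain gradient descent in pure Python. All function names, the fixed initial weights, the learning rate, and the threshold are assumptions chosen for illustration; a real model would have more layers and be built with a deep learning framework.

```python
def encode(w, x):
    """Encoder: compress x into a single number (the hidden code)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def decode(w, code):
    """Decoder: expand the code back to input space (weights tied to encoder)."""
    return [wi * code for wi in w]


def reconstruction_error(w, x):
    """Squared distance between x and its reconstruction."""
    recon = decode(w, encode(w, x))
    return sum((xi - ri) ** 2 for xi, ri in zip(x, recon))


def train(samples, epochs=200, lr=0.05):
    """Fit the weights on (mostly) normal behavior vectors."""
    w = [0.1 * (i + 1) for i in range(len(samples[0]))]  # fixed start, repeatable
    for _ in range(epochs):
        for x in samples:
            code = encode(w, x)
            residual = [xi - ri for xi, ri in zip(x, decode(w, code))]
            rw = sum(ri * wi for ri, wi in zip(residual, w))
            # Gradient descent step on the squared reconstruction error.
            w = [
                wi + 2 * lr * (code * ri + rw * xi)
                for wi, ri, xi in zip(w, residual, x)
            ]
    return w


def find_anomalies(w, samples, threshold):
    """Return the samples whose reconstruction error exceeds the threshold."""
    return [x for x in samples if reconstruction_error(w, x) > threshold]
```

Trained on vectors where the two features move together, such a model reconstructs similar vectors almost perfectly, while a vector whose features move in opposite directions cannot be squeezed through the single hidden unit and comes back with a large error.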

Autoencoder

Autoencoder
- Cannot know what behavior abusers will use
- Cannot monitor all actions of abusers
- Need preemptive detection
- Detection of changing abuser patterns

Isolation Forest
- Isolates anomalous data quickly
- Useful for high dimensional data sets
- Does not use density or distance
- Splits data randomly using decision trees
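
The random-split idea can be sketched in a few dozen lines of plain Python. This is an illustrative, simplified version of the standard Isolation Forest scoring scheme, not the team's implementation; the function names, the tuple-based tree encoding, and the defaults (`n_trees=100`, `sample_size=256`, `seed=0`) are assumptions. Scores closer to 1 mean a point is isolated after very few splits.

```python
import random
from math import ceil, log, log2

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """Expected depth of an unsuccessful binary search tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random feature at a random value."""
    if depth >= max_depth or len(rows) <= 1:
        return ("leaf", len(rows))
    feature = rng.randrange(len(rows[0]))
    low = min(r[feature] for r in rows)
    high = max(r[feature] for r in rows)
    if low == high:
        return ("leaf", len(rows))
    split = rng.uniform(low, high)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return ("node", feature, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, row, depth=0):
    """Number of splits needed to reach row's leaf, adjusted for leaf size."""
    if tree[0] == "leaf":
        return depth + average_path_length(tree[1])
    _, feature, split, left, right = tree
    child = left if row[feature] < split else right
    return path_length(child, row, depth + 1)


def fit_forest(rows, n_trees=100, sample_size=256, seed=0):
    """Build n_trees random trees; rows must be a list with at least 2 items."""
    rng = random.Random(seed)
    size = min(sample_size, len(rows))
    max_depth = ceil(log2(size))
    trees = [build_tree(rng.sample(rows, size), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, size


def anomaly_score(forest, row):
    """Score in (0, 1]; higher means easier to isolate, i.e. more anomalous."""
    trees, size = forest
    mean_depth = sum(path_length(t, row) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / average_path_length(size))
```

No distance or density is computed anywhere: the only operations are picking a feature, picking a split value, and counting how deep a point ends up.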

Isolation Forest

Isolation Forest
- Need preemptive detection
- Detects abusing patterns according to major behaviors

How to detect abusers: History of infra
- 2021: ClickHouse
- 2020: Elasticsearch
- 2019: LINE common DW

How to detect abusers: Characteristics
- LINE Common DW: stores almost all LINE logs; convenient to use
- Elasticsearch: ELK (Elasticsearch, Logstash, Kibana); search engine
- ClickHouse: column based DB

How to detect abusers: Pros. & Cons.
- LINE Common DW. Pros: data diversity, data persistence. Cons: time delay, slow response with query.
- Elasticsearch. Pros: fast storage speed, fast search. Cons: difficult big data aggregation.
- ClickHouse. Pros: fastest response with query, easy data aggregation. Cons: lack of references.

How to detect abusers: Why do we use ClickHouse?
Purpose of use
- LINE common data warehouse: data analysis
- Elasticsearch: search
- ClickHouse: near real time data processing

Detected Abusers: Results of 2021 1H
[Charts: ML based results and Rule based results for Timeline, by month, Jan. to Jun.]

Detected Abusers: Results of 2021 1H
[Chart: Rule based results for Story and Birthday, by month, Jan. to Jun.]

What SHIELD will do: Next Plan
Text Filter
- Improve text filter performance
- Make user negative behavior score
Manager for abusing and result of text filtering
- Make more convenient system
- Make monitoring system for abusing
Abusing Detection
- Add rule and ML based models
- Make alert system
- Make user negative behavior score

Thank you
