LINE 広告における 8400 万人を対象としたリーチ数の推定 / Estimated reach of 84 million people in LINE Ads

Estimating reach across 84 million users in LINE Ads
Development Center 4 / B2B Platform Development Office, Hokuto Kagaya
LINE Developer Meetup #66 - 2020/07/29 (Wed.)

What’s LINE Ads?

Over 84,000,000 monthly active users (MAU) of LINE in Japan (Mar. 2020)

LINE Ads Overview (from official Business Guide)

User Demographics
- Attributes (e.g. marital status, mobile carrier, estimated salary)
- Behaviors (e.g. “How often do you watch TV?”)
- Interests (e.g. games, sports, fashion, music, books, travel)
- Age, Gender
- Area (spot targeting, radius targeting)

* These demographic data are basically estimates based on some behaviors of LINE users.

Audience (Re)Targeting
- Audience Group
  - IDFA/AAID/Phone number/E-mail address list audience
  - LINE tag audience
  - LINE Official Account friends audience
  - Mobile app reengagement audience
  - Lookalike audience
  - Cross platform audience (LINE Official Account, LINE POINT…)
  - etc.

Example - You want to deliver your ads to users who:
- live or work within 5 km of Shinjuku Station
- are married
- are in their 40s or older
- use Android
- belong to audience group A or audience group B
- do NOT belong to audience group C
- are interested in “Finance” OR “Health and fitness”
- are NOT interested in “Sports”

Example - Seen as set operations, these targeting conditions combine AND, OR and NOT:
- (within 5 km of Shinjuku Station) AND (married) AND (40s or older) AND (Android)
- AND (audience group A OR audience group B)
- AND NOT (audience group C)
- AND (interested in “Finance” OR “Health and fitness”)
- AND NOT (interested in “Sports”)
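
The nested AND / OR / NOT combination in the example can be sketched with plain Python sets. This is only an illustration of the set algebra, not the system described in the talk: the `evaluate` helper, the tuple-based expression format, and the tiny user sets are all hypothetical.

```python
def evaluate(expr, universe):
    """Evaluate a nested set expression against a universe of user IDs.

    An expression is either a set of user IDs (a leaf) or a tuple
    ("and" | "or" | "not", sub_expression, ...). Hypothetical format.
    """
    if isinstance(expr, (set, frozenset)):
        return set(expr) & universe
    op, *args = expr
    if op == "not":
        # NOT is the complement within the universe of all users.
        return universe - evaluate(args[0], universe)
    parts = [evaluate(arg, universe) for arg in args]
    if op == "and":
        return set.intersection(*parts)
    if op == "or":
        return set.union(*parts)
    raise ValueError(f"unknown operator: {op}")


# Toy data: ten users, each condition is the set of users satisfying it.
universe = set(range(1, 11))
near_shinjuku = {1, 2, 3, 4, 5, 6}
married = {1, 2, 3, 4, 8}
android = {1, 2, 3, 5, 9}
group_a, group_b, group_c = {1, 7}, {2, 3}, {3}
finance, sports = {1, 2, 3}, {2}

target = (
    "and",
    near_shinjuku,
    married,
    android,
    ("or", group_a, group_b),
    ("not", group_c),
    finance,
    ("not", sports),
)

reach = evaluate(target, universe)
print(len(reach))
```

With real audiences each leaf holds millions of IDs, which is why materializing the sets in application memory like this does not scale and a dedicated store is needed.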

“Wait, how many users may see my wonderful ads?”
(Photo by Bermix Studio on Unsplash)

We want to know the size of this region!

The difficulties
- Set operation
  - Must support multiple set operations: AND, OR and NOT
  - These set operations can be nested
- Size
  - Multiple input sources
  - Numerous users
- Updatability
  - Each set is updated day-by-day

Objective
- Our awesome system gives feedback: “Your ads will be delivered to XXX users!”

How to provide the simulator? - Architecture

How to provide the simulator? - Main Storage

Handle set operations against massive sets - What’s the problem?
- In other words: “To search (count) users who satisfy multiple conditions”
- We can use any general-purpose search engine!
  - e.g. “A” OR “B” AND “C” -D

"_source": {
  "country": "TH",
  "gender": "2",
  "age_range": { "gte": 25, "lte": 29 },
  "os_code": "ANDROID",
  "os_version": "9.0.0.0",
  "carrier": "9",
  "persona": [ 76, 127 ],
  "interests": [ "1.999", "3.999", "6.999" ],
  "area_geohash": "xxxyyyz",
  "area_updated_at": 1595898442725,
  "area_level_1": "xxx.u",
  "area_level_2": "xxx.u.y",
  "area_level_3": "xxx.u.y.z",
  "audience_groups": [ 1111111111111, 2222222222222, 3333333333333 ]
}

How to provide the simulator? - Data Preparation

Store data to Elasticsearch - Input Sources
- Multiple input sources
  - audience groups ← Redis for Ad delivery
  - estimated attributes ← Hadoop cluster
  - location data ← Hadoop cluster
  - NRT data (audience groups) ← Job server

Store data to Elasticsearch - # of Users
- Numerous users
  - Sampling -> About 29 million documents (≒ users, global)
- Just an “estimation”!
  - We don’t need exact results
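
Because only an estimate is needed, a count over the sampled documents can be scaled back up by the sampling rate. The talk does not say how the sample is drawn or what the rate is, so the following is a minimal sketch under assumptions: a deterministic hash-based sample, with the `in_sample` and `estimate_reach` helpers and the 25% rate all hypothetical.

```python
import hashlib


def in_sample(user_id: str, rate: float) -> bool:
    """Deterministically decide whether a user belongs to the sample.

    Hashing keeps the decision stable across batch runs, so the same
    users stay in the sample every time (an assumed design, not the
    method confirmed by the talk).
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < rate


def estimate_reach(sample_count: int, rate: float) -> int:
    """Scale a count measured on the sample up to the full population."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    return round(sample_count / rate)


# 1,200 matching documents in a 25% sample suggest about 4,800 users.
print(estimate_reach(1200, 0.25))
```

The trade-off is precision for small audiences: the smaller the matching set, the larger the relative error introduced by sampling.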

Store data to Elasticsearch - Audience Groups
- Audience groups are updated day-by-day!
  - Tag events
  - Daily execution of lookalike algorithm
  - List upload by advertisers
  - Changes of Official Account Friends
  - Updated by other platforms
  - etc.

Store data to Elasticsearch - Audience Groups
- We initially planned to adopt stream processing
  - Consume ADD/REMOVE events
  - Store data to Elasticsearch by updating a document
- Elasticsearch is NOT good at UPDATE operations
  - UPDATE ≒ DELETE and INSERT
  - ONE event means one operation on an “array” field (ADD/REMOVE)
  - It’s a costly operation (max 200,000 qps!)

Store data to Elasticsearch - Audience Groups
- Classic batch processing
  - Update “all” audience groups that a user belongs to
- Just an “estimation”!!!
  - We don’t need exact results
- Execute a batch job bi-hourly
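
The batch approach can be sketched as follows: instead of applying every ADD/REMOVE event to a document, fold the events into each user's complete group list and write that list once per user per batch. This is an illustrative sketch, not the actual job from the talk; `fold_events`, `to_bulk_actions`, the event tuple shape, and the index name `users` are assumptions. The action/payload pairs follow the Elasticsearch Bulk API's `update` format.

```python
from collections import defaultdict


def fold_events(current, events):
    """Collapse ADD/REMOVE events into one final group list per touched user.

    current: dict of user_id -> set of audience group IDs (state before the batch)
    events:  iterable of (user_id, "ADD" | "REMOVE", group_id)
    """
    membership = defaultdict(set)
    for user_id, groups in current.items():
        membership[user_id] = set(groups)
    touched = set()
    for user_id, op, group_id in events:
        if op == "ADD":
            membership[user_id].add(group_id)
        elif op == "REMOVE":
            membership[user_id].discard(group_id)
        else:
            raise ValueError(f"unknown event type: {op}")
        touched.add(user_id)
    return {user_id: sorted(membership[user_id]) for user_id in touched}


def to_bulk_actions(index, folded):
    """Build Bulk API action/payload pairs: one partial update per user."""
    actions = []
    for user_id, groups in sorted(folded.items()):
        actions.append({"update": {"_index": index, "_id": user_id}})
        actions.append({"doc": {"audience_groups": groups}})
    return actions


events = [("u1", "ADD", 1), ("u1", "ADD", 2), ("u1", "REMOVE", 1), ("u2", "ADD", 3)]
folded = fold_events({"u1": {9}}, events)
print(to_bulk_actions("users", folded))
```

Here four events become two document writes; at high event rates the reduction is far larger, at the cost of results that lag behind by up to one batch interval.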

How to provide the simulator? - Query Part
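
The transcript does not show the queries themselves. As a hedged sketch, targeting conditions could be translated into an Elasticsearch `bool` query whose body is sent to the `_count` API. The field names `os_code`, `audience_groups`, and `interests` come from the sample document in this deck; the `build_count_query` helper and the way conditions are grouped are assumptions.

```python
import json


def build_count_query(os_code, include_groups, exclude_groups,
                      include_interests, exclude_interests):
    """Build a request body for the Elasticsearch _count API.

    - Clauses under "filter" are ANDed together.
    - A "terms" clause matches documents holding ANY listed value (OR).
    - Clauses under "must_not" exclude matching documents (NOT).
    """
    filters = [{"term": {"os_code": os_code}}]
    if include_groups:
        filters.append({"terms": {"audience_groups": include_groups}})
    if include_interests:
        filters.append({"terms": {"interests": include_interests}})

    must_not = []
    if exclude_groups:
        must_not.append({"terms": {"audience_groups": exclude_groups}})
    if exclude_interests:
        must_not.append({"terms": {"interests": exclude_interests}})

    return {"query": {"bool": {"filter": filters, "must_not": must_not}}}


query = build_count_query(
    os_code="ANDROID",
    include_groups=[1111111111111, 2222222222222],  # group A OR group B
    exclude_groups=[3333333333333],                 # NOT group C
    include_interests=["1.999", "3.999"],
    exclude_interests=["6.999"],
)
print(json.dumps(query, indent=2))
```

Using `filter` rather than `must` skips relevance scoring, which is unnecessary when only the number of matching documents matters.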

Key Takeaways
- We’ve built a system to estimate audience size for LINE Ads
- We’ve used Elasticsearch as the main storage
- Solutions for huge and frequently changing data
  - Sampling
  - Classic batch processing

Hokuto Kagaya
@hokkun_dayo / hokkun-dayo
- Joined as a new graduate
- Worked on LINE GAME PLATFORM
- Working on LINE DMP

Thank you for watching :)

Appendix

Other topics
- Security / Privacy
  - Use one-way hash for user ID (document ID on Elasticsearch)
  - Introduce data retention to support opt-out
    - Periodic removal of obsolete data from Elasticsearch
- ID conversion
  - IDFA/AAID <=> LINE (internal) user ID
  - We have a mapping on HBase
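
The one-way hash for document IDs could look like the sketch below. The talk does not name the hash function or say whether a key is used, so keyed HMAC-SHA256, the `document_id` helper, and the placeholder key are all assumptions made for illustration.

```python
import hashlib
import hmac


def document_id(user_id: str, secret_key: bytes) -> str:
    """Derive a stable, non-reversible document ID from an internal user ID.

    A keyed hash (HMAC) is assumed here so that IDs cannot be recomputed
    by someone who knows a user ID but not the key.
    """
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


key = b"example-key-not-for-production"  # hypothetical placeholder
doc_id = document_id("internal-user-123", key)
print(doc_id)
```

The same input always maps to the same ID, so batch jobs can keep addressing a user's document without storing the raw user ID in Elasticsearch.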

(Appendix) Statistics
- About 15,000 queries per weekday
- About 1,500 unique users per weekday
- Execution time percentiles (milliseconds)
  - 50% (median): 32
  - 90%: 73
  - 95%: 98
  - 99%: 171

(Appendix) Statistics
[Chart: efficient_targeting_option_size / estimated_size]
