Slide 1

Slide 1 text

Estimating Reach for an Audience of 84 Million Users in LINE Ads
Development Center 4 / B2B Platform Development Office
Hokuto Kagaya
LINE Developer Meetup #66 - 2020/07/29 (Wed.)

Slide 2

Slide 2 text

What’s LINE Ads?

Slide 3

Slide 3 text

84,000,000+ LINE monthly active users (MAU) in Japan (Mar. 2020)

Slide 4

Slide 4 text

LINE Ads Overview (from the official Business Guide)

Slide 5

Slide 5 text

LINE Ads Overview (from the official Business Guide)

Slide 6

Slide 6 text

User Demographics
- Attributes (e.g. marital status, mobile carrier, estimated salary)
- Behaviors (e.g. "How often do you watch TV?")
- Interests (e.g. games, sports, fashion, music, books, travel)
- Age, gender
- Area (spot targeting, radius targeting)
* These demographic data are mostly estimated from the behavior of LINE users

Slide 7

Slide 7 text

Audience (Re)Targeting
- Audience groups
  - IDFA/AAID/phone number/email address list audiences
  - LINE tag audiences
  - LINE Official Account friend audiences
  - Mobile app re-engagement audiences
  - Lookalike audiences
  - Cross-platform audiences (LINE Official Account, LINE POINT…)
  - etc.

Slide 8

Slide 8 text

Example
- You want to deliver your ads to users who:
  - live or work within 5 km of Shinjuku Station
  - are married
  - are in their 40s or older
  - use Android
  - belong to audience group A OR audience group B
  - do NOT belong to audience group C
  - are interested in “Finance” OR “health and fitness”
  - are NOT interested in “Sports”
?

Slide 9

Slide 9 text

Example
- You want to deliver your ads to users who:
  - live or work within 5 km of Shinjuku Station
  - are married
  - are in their 40s or older
  - use Android
  - belong to audience group A OR audience group B
  - do NOT belong to audience group C
  - are interested in “Finance” OR “health and fitness”
  - are NOT interested in “Sports”
- All of the conditions above are combined with AND
?
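The example is a nested boolean expression over user sets. The sketch below models the conditions as an AND/OR/NOT tree and evaluates it with plain Python set operations. It assumes every condition is already available as an in-memory set of user IDs. The set names and the eight sample users are hypothetical, and the talk does not describe an evaluator like this.

```python
from dataclasses import dataclass
from typing import FrozenSet, Mapping, Tuple

UserSet = FrozenSet[int]


@dataclass(frozen=True)
class Leaf:
    name: str  # key of a precomputed user set, e.g. "married"


@dataclass(frozen=True)
class And:
    children: Tuple[object, ...]


@dataclass(frozen=True)
class Or:
    children: Tuple[object, ...]


@dataclass(frozen=True)
class Not:
    child: object


def evaluate(expr: object, sets: Mapping[str, UserSet], universe: UserSet) -> UserSet:
    """Recursively evaluate a nested AND/OR/NOT expression over user-ID sets."""
    if isinstance(expr, Leaf):
        return sets.get(expr.name, frozenset())
    if isinstance(expr, And):
        result = universe  # identity element for intersection
        for child in expr.children:
            result = result & evaluate(child, sets, universe)
        return result
    if isinstance(expr, Or):
        result = frozenset()  # identity element for union
        for child in expr.children:
            result = result | evaluate(child, sets, universe)
        return result
    if isinstance(expr, Not):
        return universe - evaluate(expr.child, sets, universe)  # complement
    raise TypeError(f"unknown expression: {expr!r}")


# Hypothetical sample data: users 1..8 and the set each condition selects.
example_universe = frozenset(range(1, 9))
example_sets = {
    "near_shinjuku": frozenset({1, 2, 3, 4, 5, 6}),
    "married": frozenset({1, 2, 3, 4, 5, 7}),
    "age_40_plus": frozenset({1, 2, 3, 4, 8}),
    "android": frozenset({1, 2, 3, 4, 6}),
    "group_A": frozenset({1, 2}),
    "group_B": frozenset({3, 4}),
    "group_C": frozenset({2}),
    "finance": frozenset({1, 3}),
    "health_fitness": frozenset({4}),
    "sports": frozenset({3}),
}

# The slide's example: everything ANDed, with nested OR and NOT.
example_condition = And((
    Leaf("near_shinjuku"),
    Leaf("married"),
    Leaf("age_40_plus"),
    Leaf("android"),
    Or((Leaf("group_A"), Leaf("group_B"))),
    Not(Leaf("group_C")),
    Or((Leaf("finance"), Leaf("health_fitness"))),
    Not(Leaf("sports")),
))

reach = evaluate(example_condition, example_sets, example_universe)
print(sorted(reach), len(reach))  # [1, 4] 2
```

Materializing one set per condition only works for toy data. With tens of millions of users per set, the talk hands this evaluation to a general-purpose search engine instead.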

Slide 10

Slide 10 text

Photo by Bermix Studio on Unsplash “Wait, how many users may see my wonderful ads?”

Slide 11

Slide 11 text

We want to know the size of this region!

Slide 12

Slide 12 text

The difficulties
- Set operations
  - Must support multiple set operations: AND, OR and NOT
  - These set operations can be nested
- Size
  - Multiple input sources
  - Numerous users
- Updatability
  - Each set is updated day by day

Slide 13

Slide 13 text

Objective: Our Awesome System gives feedback such as “Your ads will be delivered to XXX users!”

Slide 14

Slide 14 text

How to provide the simulator? - Architecture

Slide 15

Slide 15 text

How to provide the simulator? - Main Storage

Slide 16

Slide 16 text

The difficulties
- Set operations
  - Must support multiple set operations: AND, OR and NOT
  - These set operations can be nested
- Size
  - Multiple input sources
  - Numerous users
- Updatability
  - Each set is updated day by day

Slide 17

Slide 17 text

Handle set operations against massive sets
- What’s the problem, in other words?
  - “To search (count) users who satisfy multiple conditions”
- We can use a general-purpose search engine!
  - “A” OR “B” AND “C” -D

Slide 18

Slide 18 text

"_source": {
  "country": "TH",
  "gender": "2",
  "age_range": { "gte": 25, "lte": 29 },
  "os_code": "ANDROID",
  "os_version": "9.0.0.0",
  "carrier": "9",
  "persona": [76, 127],
  "interests": ["1.999", "3.999", "6.999"],
  "area_geohash": "xxxyyyz",
  "area_updated_at": 1595898442725,
  "area_level_1": "xxx.u",
  "area_level_2": "xxx.u.y",
  "area_level_3": "xxx.u.y.z",
  "audience_groups": [1111111111111, 2222222222222, 3333333333333]
}
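With documents shaped like this, the Slide 9 example can be written as a standard Elasticsearch `bool` query. That query could be sent to the `_count` endpoint (`GET /<index>/_count`), which returns the number of matching documents. The sketch below builds such a body as a Python dict under several assumptions:

- The `persona` and `interests` codes for "married", "Finance", "health and fitness" and "Sports" are hypothetical, since the talk does not list them.
- The 5 km radius has already been converted into a list of `area_level_3` cells.
- `age_range` is mapped as an `integer_range` field.
- `os_code` is a keyword field.

Because `bool` clauses can nest inside `filter`, `should` or `must_not`, deeper set expressions are expressible the same way.

```python
from typing import Any, Dict, List

# Hypothetical code values: the talk does not say which "persona" or
# "interests" codes mean "married", "Finance", etc.
MARRIED_PERSONA = 76
FINANCE = "1.999"
HEALTH_AND_FITNESS = "3.999"
SPORTS = "6.999"


def build_count_query(
    area_cells: List[str],
    groups_any: List[int],
    groups_none: List[int],
) -> Dict[str, Any]:
    """Translate the Slide 9 example into an Elasticsearch bool query body."""
    return {
        "query": {
            "bool": {
                # AND: every clause in "filter" must match (no scoring needed for counts).
                "filter": [
                    # Assumes the 5 km radius was precomputed into area_level_3 cells.
                    {"terms": {"area_level_3": area_cells}},
                    {"term": {"persona": MARRIED_PERSONA}},
                    # Assumes age_range is mapped as an integer_range field.
                    {"range": {"age_range": {"gte": 40}}},
                    {"term": {"os_code": "ANDROID"}},
                    # OR: "terms" matches if ANY listed value is in the array field.
                    {"terms": {"audience_groups": groups_any}},
                    {"terms": {"interests": [FINANCE, HEALTH_AND_FITNESS]}},
                ],
                # NOT: a document matching any of these clauses is excluded.
                "must_not": [
                    {"terms": {"audience_groups": groups_none}},
                    {"term": {"interests": SPORTS}},
                ],
            }
        }
    }


body = build_count_query(
    area_cells=["xxx.u.y.z"],
    groups_any=[1111111111111, 2222222222222],
    groups_none=[3333333333333],
)
```

Using `filter` rather than `must` skips relevance scoring, which a pure count does not need.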

Slide 19

Slide 19 text

How to provide the simulator? - Data Preparation

Slide 20

Slide 20 text

The difficulties
- Set operations
  - Must support multiple set operations: AND, OR and NOT
  - These set operations can be nested
- Size
  - Multiple input sources
  - Numerous users
- Updatability
  - Each set is updated day by day

Slide 21

Slide 21 text

Store data in Elasticsearch - Input Sources
- Multiple input sources
  - audience groups ← Redis for ad delivery
  - estimated attributes ← Hadoop cluster
  - location data ← Hadoop cluster
  - NRT (near-real-time) audience group data ← job server
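In this design each Elasticsearch document holds every field for one user, so the sources have to be joined on the user ID before indexing. The sketch below shows that join in memory. It assumes each source has already been read into a dict keyed by user ID. The reader code, the NRT path and the function name `merge_sources` are hypothetical and not shown in the talk.

```python
from typing import Any, Dict, Iterable, Mapping


def merge_sources(
    attributes_by_user: Mapping[str, Mapping[str, Any]],
    location_by_user: Mapping[str, Mapping[str, Any]],
    groups_by_user: Mapping[str, Iterable[int]],
) -> Dict[str, Dict[str, Any]]:
    """Join per-source records on user ID into one document per user."""
    user_ids = set(attributes_by_user) | set(location_by_user) | set(groups_by_user)
    documents: Dict[str, Dict[str, Any]] = {}
    for user_id in user_ids:
        doc: Dict[str, Any] = {}
        doc.update(attributes_by_user.get(user_id, {}))  # estimated attributes
        doc.update(location_by_user.get(user_id, {}))    # location fields
        # Deduplicate and sort so the array field is stable between runs.
        doc["audience_groups"] = sorted(set(groups_by_user.get(user_id, ())))
        documents[user_id] = doc
    return documents
```

A user missing from one source simply gets no fields from it. A user missing from the audience-group source gets an empty `audience_groups` array.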

Slide 22

Slide 22 text

Store data in Elasticsearch - # of Users
- Numerous users
  - Sampling → about 29 million documents (≈ users, global)
- Just an “estimation”!
  - We don’t need exact results
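The talk does not say how users are sampled or how sample counts are turned into estimates. One common approach, assumed here, is hash-based sampling with a linear scale-up. A user is kept if a hash of their ID falls into the first `kept_buckets` of `MODULUS` buckets. A count over the sample is then multiplied by `MODULUS / kept_buckets`. The bucket count and the 25% rate in the example are hypothetical.

```python
import hashlib

MODULUS = 10_000  # number of hash buckets (hypothetical)


def in_sample(user_id: str, kept_buckets: int, modulus: int = MODULUS) -> bool:
    """Deterministically keep a user if their hash falls in the first kept_buckets buckets."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % modulus
    return bucket < kept_buckets


def scale_up(sample_count: int, kept_buckets: int, modulus: int = MODULUS) -> int:
    """Turn a count over the sample into an estimate for the full population."""
    return round(sample_count * modulus / kept_buckets)


# Example: keep 2,500 of 10,000 buckets (a hypothetical 25% sample).
print(scale_up(1_000, kept_buckets=2_500))  # 4000
```

Hashing keeps the same users in the sample from one rebuild to the next. The relative error of a scaled estimate grows as the matched sample count shrinks, so very narrow targeting is estimated less precisely.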

Slide 23

Slide 23 text

Store data in Elasticsearch - Audience Groups
- Audience groups are updated day by day!
  - Tag events
  - Daily execution of the lookalike algorithm
  - List uploads by advertisers
  - Changes in Official Account friends
  - Updates from other platforms
  - etc.

Slide 24

Slide 24 text

Store data in Elasticsearch - Audience Groups
- We initially planned to adopt stream processing
  - Consume ADD/REMOVE events
  - Store data in Elasticsearch by updating a document
- Elasticsearch is NOT good at UPDATE operations
  - UPDATE ≈ DELETE and INSERT
  - ONE event means an operation on an “array” field (ADD/REMOVE)
  - It’s a costly operation (max 200,000 qps)!
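This sketch shows why the streaming design was costly. It turns each ADD/REMOVE event into a scripted partial update in the Bulk API format. The index name, document ID and Painless snippets are illustrative, have not been tested against a cluster, and assume every document already has an `audience_groups` field.

Even though the script touches one array element, Elasticsearch fetches the stored document, applies the script, and reindexes the whole document, marking the old version as deleted. Every event therefore costs a full document write.

```python
from typing import Any, Dict, List

# Painless scripts for one ADD or REMOVE event on the "audience_groups" array.
# Sketch only: assumes the field exists on every document.
ADD_SCRIPT = (
    "if (!ctx._source.audience_groups.contains(params.group)) "
    "{ ctx._source.audience_groups.add(params.group); }"
)
REMOVE_SCRIPT = "ctx._source.audience_groups.removeIf(g -> g == params.group);"


def event_to_bulk_lines(index: str, doc_id: str, group_id: int, op: str) -> List[Dict[str, Any]]:
    """Turn one stream event into the two Bulk API lines of a scripted update."""
    if op == "ADD":
        script = ADD_SCRIPT
    elif op == "REMOVE":
        script = REMOVE_SCRIPT
    else:
        raise ValueError(f"unknown op: {op}")
    return [
        {"update": {"_index": index, "_id": doc_id}},
        {"script": {"source": script, "lang": "painless", "params": {"group": group_id}}},
    ]
```

At up to 200,000 events per second, this means up to 200,000 full-document rewrites per second.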

Slide 25

Slide 25 text

Store data in Elasticsearch - Audience Groups
- Classic batch processing
  - Update “all” audience groups that a user belongs to
- Just an “estimation”!!!
  - We don’t need exact results
- Execute a batch job bi-hourly
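The sketch below outlines the batch alternative. It assumes the job can read a full snapshot of (user, group) memberships. The input shape, index name and function names are hypothetical. The job groups memberships per user and emits one partial-document update per user in the Bulk API's newline-delimited JSON format. Each update replaces the whole `audience_groups` array at once.

```python
import json
from typing import Dict, Iterable, List, Set, Tuple


def build_snapshot(memberships: Iterable[Tuple[str, int]]) -> Dict[str, Set[int]]:
    """Collect every audience group each user currently belongs to."""
    snapshot: Dict[str, Set[int]] = {}
    for user_id, group_id in memberships:
        snapshot.setdefault(user_id, set()).add(group_id)
    return snapshot


def snapshot_to_bulk_ndjson(index: str, snapshot: Dict[str, Set[int]]) -> str:
    """One update per user: the full audience_groups array replaces the old one."""
    lines: List[str] = []
    for user_id in sorted(snapshot):
        lines.append(json.dumps({"update": {"_index": index, "_id": user_id}}))
        lines.append(json.dumps({"doc": {"audience_groups": sorted(snapshot[user_id])}}))
    return "\n".join(lines) + "\n"  # the Bulk API expects a trailing newline
```

However many ADD/REMOVE events a user generated in the window, the batch emits one write per user. Users who have left every group do not appear in the snapshot and would need a separate clearing step, which is not shown here.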

Slide 26

Slide 26 text

How to provide the simulator? - Query Part

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

Key Takeaways
- We’ve built a system to estimate audience size for LINE Ads
- We’ve used Elasticsearch as the main storage
- Solutions for huge and frequently changing data
  - Sampling
  - Classic batch processing

Slide 29

Slide 29 text

Hokuto Kagaya
@hokkun_dayo / hokkun-dayo
- Joined as a new graduate
- Worked on LINE GAME PLATFORM
- Working on LINE DMP
Thank you for watching :)

Slide 30

Slide 30 text

Appendix

Slide 31

Slide 31 text

Other topics
- Security / Privacy
  - Use a one-way hash of the user ID (as the document ID in Elasticsearch)
  - Introduce data retention to support opt-out
  - Periodically remove obsolete data from Elasticsearch
- ID conversion
  - IDFA/AAID <=> LINE (internal) user ID
  - We keep a mapping in HBase
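This sketch covers both privacy measures, under stated assumptions. The talk only says “one-way hash”. This version swaps in a keyed HMAC-SHA256 from Python's standard `hmac` module, because an unkeyed hash of a guessable ID could be reversed by hashing candidate IDs.

For retention, the body targets Elasticsearch's `_delete_by_query` endpoint. The timestamp field name and the 30-day window are hypothetical, since the talk does not state the retention period.

```python
import hashlib
import hmac
from typing import Any, Dict


def document_id(internal_user_id: str, secret_key: bytes) -> str:
    """Derive a stable, non-reversible document ID from an internal user ID."""
    return hmac.new(secret_key, internal_user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def retention_query(timestamp_field: str, retention_days: int) -> Dict[str, Any]:
    """Body for POST /<index>/_delete_by_query removing documents older than the window."""
    return {"query": {"range": {timestamp_field: {"lt": f"now-{retention_days}d"}}}}


# Hypothetical usage: a 30-day window on a hypothetical "updated_at" date field.
delete_body = retention_query("updated_at", 30)
```

Keeping the secret key out of the search cluster means the document ID alone cannot be mapped back to a user. Consistent IDs across rebuilds still let batch updates target the same documents.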

Slide 32

Slide 32 text

(Appendix) Statistics
- About 15,000 queries per weekday
- About 1,500 unique users per weekday
- Execution time percentiles (milliseconds)
  - 50% (median): 32
  - 90%: 73
  - 95%: 98
  - 99%: 171
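For scale, 15,000 queries from 1,500 users works out to about 10 queries per user per weekday. The talk does not say how the percentiles were computed. For reference, the nearest-rank definition below is one common choice and is an assumption here; monitoring tools and Elasticsearch's own percentiles aggregation use other, approximate methods. The latency list is hypothetical.

```python
import math
from typing import Sequence


def nearest_rank_percentile(values: Sequence[float], p: int) -> float:
    """Return the p-th percentile (1 <= p <= 100) using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(values)
    # Smallest rank r such that at least p% of the values are <= ordered[r - 1].
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


latencies_ms = list(range(1, 101))  # hypothetical latencies: 1..100 ms
for p in (50, 90, 95, 99):
    print(p, nearest_rank_percentile(latencies_ms, p))
```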

Slide 33

Slide 33 text

(Appendix) Statistics
[Chart: efficient_targeting_option_size / estimated_size]