Estimating reach across 84 million people in LINE Ads

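Hokuto Kagaya (LINE / Development Center 4 / B2B Platform Development Office)
The "Estimated Audience Module", released in March 2020 for the LINE Ads management console, estimates how many users an ad may actually be delivered to, based on targeting settings that express "I want to deliver ads to people like this." Estimating the number of unique users who satisfy complex conditions across LINE's 84 million MAU in Japan is not a simple problem, and a naive approach cannot cope with it at all. This session introduces, from a server-side engineering perspective, how we tackle this problem, which is also known as the "count-distinct problem".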

LINE Developers

July 29, 2020

Transcript

  1. Estimating reach across 84 million people in LINE Ads

    Development Center 4 / B2B Platform Development Office, Hokuto Kagaya, LINE Developer Meetup #66 - 2020/07/29 (Wed.)
  2. What’s LINE Ads?

  3. 84,000,000↑ LINE MAU in Japan (Mar. 2020)

  4. LINE Ads Overview (from official Business Guide)

  5. LINE Ads Overview (from official Business Guide)

  6. User Demographics

    - Attributes (e.g. marital status, mobile carrier, estimated salary)
    - Behaviors (e.g. "How often do you watch TV?")
    - Interests (e.g. games, sports, fashion, music, books, travel)
    - Age, Gender
    - Area (spot targeting, radius targeting)
    * These demographic data are basically estimates based on the behavior of LINE users
  7. Audience (Re)Targeting

    - Audience Group
      - IDFA/AAID/Phone number/E-mail address list audience
      - LINE tag audience
      - LINE Official Account friends audience
      - Mobile app reengagement audience
      - Lookalike audience
      - Cross platform audience (LINE Official Account, LINE POINT…)
      - etc.
  8. Example

    You want to deliver your ads to users who:
    - live or work within 5 km of Shinjuku Station
    - are married
    - are aged 40 or over
    - use Android
    - belong to audience group A or audience group B
    - do NOT belong to audience group C
    - are interested in “Finance” OR “health and fitness”
    - are NOT interested in “Sports”
  9. Example

    You want to deliver your ads to users who:
    - live or work within 5 km of Shinjuku Station
    - are married
    - are aged 40 or over
    - use Android
    - belong to audience group A or audience group B
    - do NOT belong to audience group C
    - are interested in “Finance” or “health and fitness”
    - are NOT interested in “Sports”
    Operator labels shown on the slide: AND, NOT, OR, OR, NOT
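
Read as a set expression, the example above combines AND, OR and NOT over per-condition user sets. The following is a minimal sketch of the naive approach using in-memory Python sets; the user IDs and set names are made up for illustration, and nothing here reflects how LINE actually stores its data. It only shows what is being counted, not a method that scales to tens of millions of users.

```python
# Hypothetical toy data: each targeting condition is a set of user IDs.
near_shinjuku = {1, 2, 3, 4, 5, 6}
married = {1, 2, 3, 4, 7}
over_40 = {1, 2, 3, 4, 8}
android = {1, 2, 3, 9}
group_a = {1, 5}
group_b = {2, 3}
group_c = {3}
finance = {1, 3}
health_fitness = {2}
sports = {1}


def naive_reach():
    """Evaluate the nested AND / OR / NOT expression with plain set algebra."""
    # (A OR B) AND NOT C
    audience = (group_a | group_b) - group_c
    # (Finance OR health and fitness) AND NOT Sports
    interests = (finance | health_fitness) - sports
    # Every remaining condition is ANDed together.
    return near_shinjuku & married & over_40 & android & audience & interests


print(len(naive_reach()))
```

Each operand here must be fully materialized in memory, which is the part that stops working once the sets hold millions of IDs and change every day.
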
  10. “Wait, how many users may see my wonderful ads?”

    (Photo by Bermix Studio on Unsplash)
  11. We want to know the size of this region!

  12. The difficulties

    - Set operation
      - Must support multiple set operations: AND, OR and NOT
      - These set operations can be nested
    - Size
      - Multiple input sources
      - Numerous users
    - Updatability
      - Each set is updated day-by-day
  13. Objective

    Feedback from Our Awesome System: “Your ads will be delivered to XXX users!”
  14. How to provide the simulator? - Architecture

  15. How to provide the simulator? - Main Storage

  16. The difficulties

    - Set operation
      - Must support multiple set operations: AND, OR and NOT
      - These set operations can be nested
    - Size
      - Multiple input sources
      - Numerous users
    - Updatability
      - Each set is updated day-by-day
  17. Handle set operations against massive sets

    - What’s the problem? In other words:
      - “To search (count) users who satisfy multiple conditions”
    - We can use any general-purpose search engine!
      - “A” OR “B” AND “C” -D
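
As a rough illustration of how such a boolean condition might be phrased for a search engine, the sketch below builds a request body for Elasticsearch's `_count` API using the standard `bool` query (`filter` clauses are ANDed, a `terms` clause is an OR over its values, `must_not` expresses NOT). The field names `os_code`, `audience_groups` and `interests` are taken from the sample document in this deck; the helper names, the chosen values, and the assumption that these fields are indexed for exact matching are mine, not the speaker's.

```python
import json


def terms(field, values):
    """Match documents whose field contains any of the given values (OR)."""
    return {"terms": {field: list(values)}}


def build_count_query(os_code, include_groups, exclude_groups,
                      include_interests, exclude_interests):
    """Build a request body for the _count API from targeting conditions."""
    return {
        "query": {
            "bool": {
                # All filter clauses must match (AND).
                "filter": [
                    {"term": {"os_code": os_code}},
                    terms("audience_groups", include_groups),
                    terms("interests", include_interests),
                ],
                # Documents matching any of these clauses are excluded (NOT).
                "must_not": [
                    terms("audience_groups", exclude_groups),
                    terms("interests", exclude_interests),
                ],
            }
        }
    }


body = build_count_query(
    os_code="ANDROID",
    include_groups=[1111111111111, 2222222222222],  # group A OR group B
    exclude_groups=[3333333333333],                 # NOT group C
    include_interests=["1.999", "3.999"],           # placeholder interest codes
    exclude_interests=["6.999"],
)
print(json.dumps(body, indent=2))
```

Nested conditions can be expressed the same way, because a `bool` query may itself appear inside `filter` or `must_not`.
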
  18. "_source": {
        "country": "TH",
        "gender": "2",
        "age_range": { "gte": 25, "lte": 29 },
        "os_code": "ANDROID",
        "os_version": "9.0.0.0",
        "carrier": "9",
        "persona": [ 76, 127 ],
        "interests": [ "1.999", "3.999", "6.999" ],
        "area_geohash": "xxxyyyz",
        "area_updated_at": 1595898442725,
        "area_level_1": "xxx.u",
        "area_level_2": "xxx.u.y",
        "area_level_3": "xxx.u.y.z",
        "audience_groups": [ 1111111111111, 2222222222222, 3333333333333 ]
      }
  19. How to provide the simulator? - Data Preparation

  20. The difficulties

    - Set operation
      - Must support multiple set operations: AND, OR and NOT
      - These set operations can be nested
    - Size
      - Multiple input sources
      - Numerous users
    - Updatability
      - Each set is updated day-by-day
  21. Store data to Elasticsearch - Input Sources

    - Multiple input sources
      - audience groups ← Redis for Ad delivery
      - estimated attributes ← Hadoop cluster
      - location data ← Hadoop cluster
      - NRT data (audience groups) ← Job server
  22. Store data to Elasticsearch - # of Users

    - Numerous users
      - Sampling -> About 29 million documents (≒ users, global)
    - Just “estimation”!
      - We don’t need exact results
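
The slides do not say how a count over the sample is turned into a reach figure. One common way, shown here purely as an assumption, is to scale the sample count by the inverse sampling rate, which is reasonable when the sample is drawn uniformly. The function name and the numbers are illustrative only.

```python
def estimate_reach(sample_hits, sample_size, population_size):
    """Scale a count observed in a uniform sample up to the full population."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= sample_hits <= sample_size:
        raise ValueError("sample_hits must be between 0 and sample_size")
    return round(sample_hits * population_size / sample_size)


# Made-up numbers: 1,000 matching users in a sample of 10,000,
# drawn from a population of 1,000,000.
print(estimate_reach(1_000, 10_000, 1_000_000))  # 100000
```

Because the result is only an estimate, small matching counts in the sample carry a large relative error, which fits the "we don't need exact results" stance of the slide.
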
  23. Store data to Elasticsearch - Audience Groups

    - Audience groups are updated day-by-day!
      - Tag events
      - Daily execution of lookalike algorithm
      - List upload by advertisers
      - Changes of Official Account Friends
      - Updated by other platforms
      - etc.
  24. Store data to Elasticsearch - Audience Groups

    - We were initially going to adopt stream processing
      - Consume ADD/REMOVE events
      - Store data to Elasticsearch by updating a document
    - Elasticsearch is NOT good at UPDATE operations
      - UPDATE ≒ DELETE and INSERT
      - ONE event means an operation on an “array” field (ADD/REMOVE)
      - It’s a costly operation
    - max 200,000 qps!
  25. Store data to Elasticsearch - Audience Groups

    - Classic batch processing
      - Update “all” audience groups that a user belongs to
    - Just “estimation”!!!
      - We don’t need exact results
      - Execute a batch job bi-hourly
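
To illustrate the batch idea of writing a user's complete group list in one operation instead of applying one ADD/REMOVE event at a time, here is a minimal sketch that builds a newline-delimited body for Elasticsearch's `_bulk` API. The index name `users`, the function name, and the choice of the `update` action with a partial `doc` are assumptions for illustration; the slides do not describe the actual job.

```python
import json


def build_bulk_body(index, memberships):
    """Build an NDJSON body for the _bulk API.

    memberships maps a document ID to the complete list of audience groups
    for that user. A partial-document update replaces the whole array field,
    so one action per user is enough regardless of how many events occurred.
    """
    lines = []
    for doc_id, groups in memberships.items():
        # Action line, followed by the partial document to merge.
        lines.append(json.dumps({"update": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"doc": {"audience_groups": sorted(groups)}}))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


bulk_body = build_bulk_body("users", {"hashed-id-1": [2222222222222, 1111111111111]})
print(bulk_body, end="")
```

The number of write operations then depends on how many users changed during the batch window, not on the raw event rate.
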
  26. How to provide the simulator? - Query Part

  27. None
  28. Key Takeaways

    - We’ve built a system to estimate audience size for LINE Ads
    - We’ve used Elasticsearch as the main storage
    - Solutions to huge and frequently changing data
      - Sampling
      - Classic batch processing
  29. Hokuto Kagaya @hokkun_dayo / hokkun-dayo

    - Joined as a new graduate
    - Worked on LINE GAME PLATFORM
    - Working on LINE DMP

    Thank you for watching :)
  30. Appendix

  31. Other topics

    - Security / Privacy
      - Use one-way hash for user ID (document ID on Elasticsearch)
      - Introduce data retention to support opt-out
      - Periodic removal of obsolete data from Elasticsearch
    - ID conversion
      - IDFA/AAID <=> LINE (internal) user ID
      - We have a mapping on HBase
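
The slide only says that a one-way hash of the user ID serves as the document ID. As one possible way to do that, and not necessarily what LINE uses, the sketch below derives the ID with HMAC-SHA-256 from Python's standard library. The function name and the secret key are hypothetical.

```python
import hashlib
import hmac


def document_id(user_id: str, secret_key: bytes) -> str:
    """Derive a stable, non-reversible document ID from an internal user ID."""
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


key = b"example-key-not-for-production"  # hypothetical secret
print(document_id("user-123", key))
```

The same input always yields the same ID, so batch jobs can address a user's document without the raw user ID ever being stored in the index.
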
  32. (Appendix) Statistics

    - About 15,000 queries per weekday
    - About 1,500 unique users per weekday
    - Execution time percentiles (milliseconds)
      - 50% (median): 32
      - 90%: 73
      - 95%: 98
      - 99%: 171
  33. (Appendix) Statistics

    [Chart: efficient_targeting_option_size / estimated_size; axis ticks 0 to 9 and 0 to 50,000,000]