Slide 1

Slide 1 text

Generating a Pairwise Dataset for Click-through Rate Prediction of News Articles Considering Positions and Contents
Shotaro Ishihara (Nikkei, Inc.), Yasufumi Nakama
[email protected]
2022 Computation + Journalism Conference, June 9-11, 2022

Slide 2

Slide 2 text

Research Overview
● Click-through rate (CTR) prediction is a common task and is useful for evaluating the quality of headlines and thumbnail images.
● However, a CTR prediction model trained on user log data is heavily affected by the display position.
● Therefore, this research proposes a method for generating a pairwise dataset to train the CTR prediction model within a pairwise learning-to-rank framework.
● We verified its usefulness through experiments and discussed its potential for editing support.

Slide 3

Slide 3 text

Nikkei Overview
● Nikkei's core business is newspaper publishing. Total print and digital subscribers of the Nikkei number around 3 million.
● The Nikkei is known as the must-read paper for Japanese professionals, with extensive coverage of Japan's economy, industry and markets.
● With more than 40 affiliated companies, the group's business extends to publishing, broadcasting, events, database services and the index business.
● The Financial Times is also part of the Nikkei group.

Slide 4

Slide 4 text

Outline
● Introduction
● Related Works
● Proposed Method
● Experiments
● Use Case for Editing Support
● Conclusion and Future Work

Slide 5

Slide 5 text

Headlines and thumbnail images matter
● Many news services display a list of articles, and individual article pages often point readers to related articles.
● Readers also decide whether to move on to the article page based on the information shown at external inflow sources, such as social networking services and browser searches.

Slide 6

Slide 6 text

One solution for measuring quality
Nikkei, Inc. uses pattern tests, which present multiple options to readers.
[Figure: a randomized controlled trial compared with a pattern test using a multi-armed bandit; labels: Published, Distribution rate]
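The slide does not say which bandit algorithm drives the pattern test. As a minimal sketch, assuming Thompson sampling with a Beta posterior per option (the function name `thompson_distribution_rate` and the click counts are hypothetical), the distribution rate of each option could be estimated like this:

```python
import random

def thompson_distribution_rate(stats, draws=10000, seed=0):
    """Estimate how often each option would be served under Thompson sampling.

    stats maps an option name to (clicks, impressions). This is an
    illustrative assumption, not the algorithm confirmed by the slides.
    """
    rng = random.Random(seed)
    wins = {name: 0 for name in stats}
    for _ in range(draws):
        # Draw one plausible CTR per option from its Beta(1 + clicks, 1 + non-clicks) posterior.
        samples = {
            name: rng.betavariate(1 + clicks, 1 + (impressions - clicks))
            for name, (clicks, impressions) in stats.items()
        }
        # The option with the highest sampled CTR is served for this draw.
        wins[max(samples, key=samples.get)] += 1
    return {name: count / draws for name, count in wins.items()}

# Hypothetical counts for two headline options.
rates = thompson_distribution_rate({"A": (50, 1000), "B": (10, 1000)})
```

Unlike a randomized controlled trial with a fixed split, the share served to each option shifts toward the one that appears to perform better as evidence accumulates.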

Slide 7

Slide 7 text

Practical difficulties in online evaluation
● There are situations where it is desirable to present uniform information to all readers for news of high importance.
● The possibility that low-quality options may negatively affect the user experience during experiments must be taken into account.
● The workload of editors would increase, because they need to produce several candidates of sufficiently high quality to present to the readers.

Slide 8

Slide 8 text

Editing support of the CTR prediction model
[Figure: editing workflow; labels: headline, thumbnail image, model, create/update, feedback, predicted CTR, publish]

Slide 9

Slide 9 text

Position bias: Difficulty in machine learning
● The higher the display position, the higher the CTR.
● If the raw CTR data is simply used as a training dataset, the resulting prediction model may place more weight on the display position than on the content of the article itself.
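One way to check for this bias is to pool clicks and impressions by display position. The sketch below is only illustrative: the log format (position, clicks, impressions) and the numbers are assumptions, not Nikkei data.

```python
from collections import defaultdict

def ctr_by_position(logs):
    """Pool clicks and impressions per display position and return the CTR of each."""
    totals = defaultdict(lambda: [0, 0])  # position -> [clicks, impressions]
    for position, clicks, impressions in logs:
        totals[position][0] += clicks
        totals[position][1] += impressions
    return {p: c / n for p, (c, n) in sorted(totals.items()) if n > 0}

# Hypothetical log rows: (display position, clicks, impressions).
logs = [(1, 50, 1000), (1, 30, 1000), (2, 20, 1000), (3, 5, 1000)]
print(ctr_by_position(logs))
```

If the CTR falls steadily as the position number grows, regardless of which articles were shown, a model trained on raw CTR can learn the position instead of the content.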

Slide 10

Slide 10 text

Pairwise learning-to-rank
● We construct a pairwise dataset using the similarity of display positions and contents.
● We build a model with a learning-to-rank framework, which focuses more on content information by learning to compare the two articles in each pair.
[Figure: a model receiving a pair of articles; labels: model, CTR: 0.05, 0.01]

Slide 12

Slide 12 text

Related works in three perspectives
● CTR prediction: Deep learning [25, 26], Multi-modal [13]
● Position bias: Pairwise learning-to-rank [9, 23]
● Editing support: CTR prediction [17], Headline generation [16, 24]
Case study on Yahoo! News, which is similar in problem setting.

Slide 13

Slide 13 text

Position of this research
● CTR prediction: Deep learning [25, 26], Multi-modal [13]
● Position bias: Pairwise learning-to-rank [9, 23]
● Editing support: CTR prediction [17], Headline generation [16, 24]
1. Consideration of position bias derived from the service UI.
2. Not only headlines but also thumbnail images.
3. Discussion of a use case in headline generation.

Slide 15

Slide 15 text

Overview of the proposed method
[Figure: pipeline. CTR of individual articles → generating a pairwise dataset (candidate sets keyed by display position and cluster number: display position = 1, cluster number = 1; display position = 1, cluster number = 2; …; display position = 10, cluster number = 1000) → extracting pairs of articles from sets that satisfy the set size condition → building a model for predicting CTR using pairwise learning-to-rank (labels: model, CTR: 0.05, 0.01)]

Slide 16

Slide 16 text

Clustering and creating candidate sets
[Figure: the CTRs of individual articles are split into candidate sets by display position and cluster number (display position = 1, cluster number = 1; display position = 1, cluster number = 2; …; display position = 10, cluster number = 1000)]
📝 Notes:
● Clustering: k-means++ [1]
● Vectorizing: TF-IDF [19]
● Hyperparameters: The number of clusters
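The clustering step can be sketched in plain Python as follows. This is a simplified illustration, not the authors' implementation: it assumes headlines are already tokenized, uses one common smoothed TF-IDF variant, and the helper names (`tfidf_vectors`, `sq_dist`, `kmeans_pp`) are hypothetical.

```python
import math
import random
from collections import Counter

def tfidf_vectors(docs):
    """Return one L2-normalized sparse TF-IDF dict per tokenized, non-empty document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
               for t, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({t: v / norm for t, v in vec.items()})
    return vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in set(a) | set(b))

def kmeans_pp(vectors, k, iters=20, seed=0):
    """k-means with k-means++ seeding; returns one cluster number per vector."""
    rng = random.Random(seed)
    # Seeding: first center uniformly at random, later centers with probability
    # proportional to the squared distance to the nearest chosen center.
    centers = [rng.choice(vectors)]
    while len(centers) < k:
        d2 = [min(sq_dist(v, c) for c in centers) for v in vectors]
        centers.append(rng.choices(vectors, weights=d2, k=1)[0])
    for _ in range(iters):
        # Assign each vector to its nearest center, then recompute the centers.
        labels = [min(range(k), key=lambda j: sq_dist(v, centers[j])) for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                keys = set().union(*members)
                centers[j] = {t: sum(m.get(t, 0.0) for m in members) / len(members)
                              for t in keys}
    return labels

# Hypothetical tokenized headlines.
docs = [["stock", "market", "rises"], ["stock", "market", "rises"],
        ["soccer", "team", "wins"], ["rain", "falls", "today"]]
cluster_numbers = kmeans_pp(tfidf_vectors(docs), k=2)
```

Each article's cluster number, combined with its display position, then identifies the candidate set it belongs to. The number of clusters `k` is the hyperparameter named in the notes.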

Slide 17

Slide 17 text

Extracting pairs of articles
[Figure: the CTRs of individual articles are split into candidate sets by display position and cluster number (display position = 1, cluster number = 1; display position = 1, cluster number = 2; …; display position = 10, cluster number = 1000); pairs of articles are extracted from sets that satisfy the set size condition]
📝 Notes:
● Hyperparameters: Maximum set size
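A minimal sketch of the extraction step follows. Several details are assumptions made for illustration: the set size condition is read as "at most the maximum set size", ties in CTR are skipped, and the record fields and the function name `build_pairs` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def build_pairs(articles, max_set_size):
    """Group articles by (display position, cluster number) and emit labeled pairs.

    Each output is (id_a, id_b, label), where label is +1 if article a has the
    higher CTR and -1 otherwise.
    """
    groups = defaultdict(list)
    for article in articles:
        groups[(article["position"], article["cluster"])].append(article)

    pairs = []
    for members in groups.values():
        # Assumed set size condition: at least two articles, at most max_set_size.
        if len(members) < 2 or len(members) > max_set_size:
            continue
        for a, b in combinations(members, 2):
            if a["ctr"] == b["ctr"]:
                continue  # assumption: ties carry no ranking signal
            pairs.append((a["id"], b["id"], 1 if a["ctr"] > b["ctr"] else -1))
    return pairs

# Hypothetical records.
articles = [
    {"id": "a", "position": 1, "cluster": 1, "ctr": 0.05},
    {"id": "b", "position": 1, "cluster": 1, "ctr": 0.01},
    {"id": "c", "position": 1, "cluster": 2, "ctr": 0.03},
]
pairs = build_pairs(articles, max_set_size=2)
```

Because both articles in a pair share a display position and a content cluster, the difference in their CTRs is more likely to reflect the headline and thumbnail than the position.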

Slide 18

Slide 18 text

Building a model by pairwise learning-to-rank
[Figure: same pipeline as on Slide 15, highlighting the final step: building a model for predicting CTR using pairwise learning-to-rank (labels: model, CTR: 0.05, 0.01)]

Slide 19

Slide 19 text

Margin Ranking Loss
The loss function we use for pairwise learning-to-rank is the margin ranking loss.
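The formula itself is not preserved in this transcript. For reference, the commonly used definition of margin ranking loss for two scores x1 and x2 with a label y in {+1, -1} is max(0, -y * (x1 - x2) + margin). The sketch below implements that standard form; the margin value used in the experiments is not stated, so the default here is an assumption.

```python
def margin_ranking_loss(score_a, score_b, label, margin=0.0):
    """Standard margin ranking loss for one pair.

    label = +1 means article a should be ranked higher than article b,
    label = -1 means the opposite. The loss is zero once the scores are
    ordered correctly by at least `margin`.
    """
    return max(0.0, -label * (score_a - score_b) + margin)

# Correct order with enough separation: no loss.
print(margin_ranking_loss(2.0, 1.0, label=1, margin=0.5))  # 0.0
# Wrong order: the loss grows with the size of the violation.
print(margin_ranking_loss(1.0, 2.0, label=1, margin=0.5))  # 1.5
```

Only the relative order of the two scores matters, which is why this loss suits pairs drawn from the same display position.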

Slide 21

Slide 21 text

Dataset from the Nikkei Online Edition
● SingleCTR: Raw CTR data.
● PatternCTR:
○ Pattern test results.
○ We use accuracy on this dataset as the evaluation metric.
● PairwiseCTR:
○ Generated from SingleCTR.
○ We use it for training and validation.
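The slides do not detail how accuracy on PatternCTR is computed. One plausible reading, sketched below under stated assumptions, is that each pattern test compares two options with a known winner, and a prediction counts as correct when the model scores the winner higher. The function name, the data layout, and the toy scoring function are all hypothetical.

```python
def pattern_test_accuracy(tests, predict):
    """Fraction of pattern tests where the model prefers the actual winner.

    tests is a list of (option_a, option_b, winner) with winner in {"a", "b"};
    predict maps an option to a predicted score.
    """
    correct = 0
    for option_a, option_b, winner in tests:
        predicted = "a" if predict(option_a) > predict(option_b) else "b"
        correct += predicted == winner
    return correct / len(tests)

# Toy example: a stand-in "model" that scores a headline by its length.
tests = [
    ("Short headline", "A much longer headline here", "b"),
    ("Another long headline text", "Brief", "a"),
    ("Tiny", "A considerably longer option", "a"),
]
print(pattern_test_accuracy(tests, predict=len))
```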

Slide 22

Slide 22 text

Models
Four types of models are prepared:
● Baseline: with headline and thumbnail image.
● Baseline + display position + published date time: including this information as input.
● Baseline + fixed CTR: correcting the CTR of the training dataset.
● Proposed method: trained with PairwiseCTR.
[Figure: model architecture; labels: headline, BERT, thumbnail image, EfficientNet, display position, published date time, fully connected layer]

Slide 23

Slide 23 text

Result tables

Slide 24

Slide 24 text

Result summary
● Baseline: suggested the existence of position bias.
● Baseline + display position + published date time: showed improvement for headlines, while no clear performance improvement could be confirmed for thumbnail images.
● Baseline + fixed CTR: did not contribute to the performance.
● Proposed method: showed particularly high performance for thumbnail images. There was also a certain improvement for headlines compared to the baseline, in some cases obtaining results as good as 0.720.

Slide 26

Slide 26 text

Workflow in automatic headline generation
● The predicted CTR can assist editors' decision-making.
● It could also serve as one perspective for summarization.
● We can also present a visualization of the weights.

Slide 27

Slide 27 text

Be careful not to create clickbait
● It is necessary to be aware of clickbait issues.
● Even if the CTR is high, headlines and thumbnail images that do not match the body text would damage the user experience.
● We are also tackling this issue, for example by building a recognizing textual entailment model.

Slide 29

Slide 29 text

Conclusion and Future Work
● This research proposed a method for generating a pairwise dataset to train a CTR prediction model within a pairwise learning-to-rank framework that accounts for position bias.
● The experiments suggested the potential for better performance, and the practical use as editing support was explained.
● Future work is to expand the evaluation dataset for larger-scale performance evaluation.
📧 [email protected]
📘 https://speakerdeck.com/upura/